Digital Object Identifier 10.1109/ACCESS.2019.2946342
Corresponding authors: Kang Hao Cheong (kanghao_cheong@sutd.edu.sg) and Neng-Gang Xie (xienenggang@aliyun.com).
This work was supported in part by the Singapore University of Technology and Design Start-up Research Grant under Grant SRG SCI
2019 142, and in part by the Science and Technology Major Project of Anhui Province under Grant 17030901037.
ABSTRACT Data centers are mission-critical infrastructures. Strict service level requirements and standards are imposed on operators and maintainers to ensure reliable round-the-clock operation. In the context of thermal management and data hall environmental control, the formation of hot and cold spots around server cabinets is especially undesirable, and can result in overheating, lifespan reduction, and performance throttling in the former case, and condensation damage in the latter. In this paper, we present a comprehensive multi-pronged methodology for data center environmental control, comprising a first-stage predictive design approach aided by computational fluid dynamics (CFD) simulation, and a complementary Internet of Things (IoT) reactive management system that autonomously monitors and regulates fluctuations in thermal parameters. The novel hybrid methodology is demonstrated on various test scenarios derived from real-world contexts, and prototypes of the IoT system have been experimentally validated. The approach is shown to be efficient in eliminating unfavourable environmental variations, and provides an enhanced understanding of common design problems and their respective mitigation measures.
While stringent measures are typically taken in the design and operation of data centers to ensure that key data hall environmental parameters remain within acceptable levels, undesired temperature variations in the form of hot and cold spots may nevertheless arise. These can be transient in nature [22] or caused by unforeseen on-site infrastructural constraints or loads, and are therefore difficult to suppress through traditional a priori planning. Innovative approaches to address these needs are thus extremely important, and it is the intent of this paper to explore methods to optimize environmental control, such that thermal regulatory failure and related effects can be efficiently avoided. While existing literature and industry best practices already encompass computational simulation-aided design methodologies [23]–[26] and dynamic load-balancing techniques at various software and hardware abstraction levels [27], [28], the fusion of different approaches has remained practically limited, and the exploitation of smart networks also presents much untapped potential [29], [30].

In this paper, we propose a multi-pronged approach comprising a predictive infrastructure design methodology utilizing computational fluid dynamics (CFD) and heat transport simulations, coupled with an Internet of things (IoT) reactive management system to sense transient fluctuations in environmental parameters and effect adjustments in near real-time. An accurate three-dimensional representation of a data hall along with its infrastructural sub-components is constructed, on which the proposed approach is demonstrated. The approach is shown to be effective for various scenarios, encompassing differing equipment placements and configurations, and differing airflow and thermal conditions. A practical engineering prototype of the IoT management system and its reactive operational modes are also demonstrated.

We first present a theoretical review of data center environmental control principles and guidelines and relevant thermal-airflow phenomena in Section II. We then present the implementation and results of the predictive first stage of our approach in Section III and the reactive second stage in Section IV, followed by a discussion in Section V.

II. THEORETICAL REVIEW
A. TECHNICAL GUIDELINES
We make critical reference to the ASHRAE TC9.9 Thermal Guidelines for Data Processing Environments white paper [31] in this study, in order to set appropriate environmental control objectives for the design and evaluation of the proposed CFD/IoT predictive-reactive approach. The ASHRAE TC9.9 white paper was first written by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) technical committee (TC) in 2004 to create a common set of environmental parameter guidelines across hardware manufacturers supplying mission-critical computing hardware, and has been periodically revised to address changes in technology and to enhance the comprehensiveness of the guidelines. Prior to the release of these guidelines, environmental parameters required for the operation of data center computing hardware were unregulated and manufacturer-dependent, often resulting in issues of ambiguity and consistency over correct operating standards [32]. In response to the industrial void on consistent and unambiguous operating conditions, these guidelines provide a common environmental interface for the equipment and its surroundings, guidance on the evaluation and testing of the operational health of data centers, and a standardized methodology for reporting the environmental characteristics of computer systems.

With reference to these guidelines [31], a data center operating environment consisting of mission-critical server infrastructure is classified under class A1 with regard to environmental control measures. It is recommended that operating temperatures of these mission-critical server infrastructures be within a temperature range of 18–27°C at 60% typical relative humidity, with an allowable limit of 15–32°C at 20–80% relative humidity. Parameters beyond the upper boundaries of these specifications are classified as hot spots, while parameters beyond the lower boundaries are classified as cold spots [31].

B. HOT AND COLD SPOTS
Unfavourable thermal conditions in a data center may be binarily classified as hot and cold spots. Hot spots are defined as localized regions of hot air at the intakes of network and server equipment, in this study taken to be greater than the temperature values recommended by the ASHRAE TC9.9 guidelines. Prolonged exposure to hot spots is detrimental to the reliability, performance and longevity of electronic hardware, and may void warranties and maintenance agreements of hardware manufacturers in more severe cases [33].

Likewise, cold spots refer to localized regions of cold air resulting from excessive cooling. Prolonged exposure to cold spots results in increased risks of corrosion or liquid damage to electronic equipment resulting from the condensation of moisture [34], [35]. For the purposes of this study, we take the minimum allowable temperature of cool air to be consistent with the ASHRAE TC9.9 guidelines. In light of the potential damage to equipment and degradation in performance, it is imperative that both hot and cold spots be avoided, or otherwise quickly addressed once discovered.

C. PRINCIPLES OF COOLING IN DATA CENTERS
Server cabinets in a data center are typically arranged in rows with consistent orientations, such that all air intakes face a cold aisle where cool air is supplied, and all air exhausts face a hot aisle where heated air is removed. A sectional view of a data hall arranged in such a hot-cold aisle configuration is shown in Figure 1. In data centers with multiple rows of cabinets, the orientations of the racks are arranged to alternate, such that the cold-hot aisle configuration is preserved. Cold air supplied from computer room air conditioning (CRAC) units is channeled through the raised floor plenum to servers via perforated floor tiles located along the cold aisle;
A. SIMULATION SOFTWARE
We utilize Future Facilities 6 Sigma Room, a specialized virtual facility (VF) and computational fluid dynamics (CFD) software package for the simulation of the data center environment. The software iteratively solves the many simultaneous equations representing the conservation and Navier-Stokes equations for fluid dynamics and heat transport; the numerical scheme defines mass, momentum and energy on a three-dimensional automatically-generated mesh [40]. In comparison to other conventional CFD software choices such as Ansys Fluent and Autodesk CFD, 6 Sigma DC Room is extensively used by data center operators globally [25], and its ease of use, with built-in libraries of data center-specific hardware that are individually modelled, tested and validated to match real-world operational behaviours, renders it preferable in our context. The accuracy of the software has been independently validated by end-users and leading academic institutions prominent in performing audits of CFD tools [41] to yield error differences within 5% of real-world results, thereby lending confidence to the predictions made through simulations.

FIGURE 4. Floor plan of the modelled data hall.

TABLE 1. Simulation constants for the computational fluid dynamics (CFD) study. Symbols Hrm, Hrf, Qnom, Uair, Tair, Pmed and Phigh denote room height, raised floor height, nominal cooling load, air flow rate, air supply temperature set point, low-density thermal power, and medium-density thermal power, respectively. The 42U cabinet form factor corresponds to 2050 mm in height by 600 mm in width by 900 mm in depth.
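To convey the flavour of such mesh-based iterative solution in miniature, the toy sketch below relaxes a steady-state heat-diffusion field on a two-dimensional grid. It is purely schematic, assuming fixed-temperature boundaries, and in no way reproduces the coupled Navier-Stokes/energy solver of 6SigmaRoom.

    # Toy Jacobi relaxation of the steady-state heat equation on a 2D grid,
    # illustrating grid-based iterative solution only; boundary values below
    # are invented for demonstration.
    import numpy as np

    def relax_heat(grid: np.ndarray, iters: int = 5000) -> np.ndarray:
        """Repeatedly replace interior cells by the mean of their neighbours;
        edge cells act as fixed-temperature boundary conditions."""
        t = grid.astype(float).copy()
        for _ in range(iters):
            t[1:-1, 1:-1] = 0.25 * (t[:-2, 1:-1] + t[2:, 1:-1] +
                                    t[1:-1, :-2] + t[1:-1, 2:])
        return t

    room = np.full((50, 50), 24.0)   # uniform 24 °C initial guess
    room[0, :] = 18.0                # assumed cool-supply boundary
    room[-1, :] = 35.0               # assumed hot-exhaust boundary
    steady = relax_heat(room)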
B. SIMULATION ENVIRONMENT
Detailed simulations are conducted on a 91 m², 28-cabinet capacity data hall with 4 CRAC units arranged in a hot-cold aisle configuration (Figure 4). This configuration was chosen as it represents the most commonly used arrangement in the data center industry, and conforms to industrial best practices for the efficient cooling of servers in a data hall setting—specifically, CRAC units at each end of the server rows, a hot aisle width of approximately 1000 mm, and a cold aisle width of approximately 1200 mm [31], [42]. Specific conditions imposed on the infrastructural components of the data hall, such as floorboards, servers, and CRAC units, are presented in Table 1. Likewise, these represent typical values expected in modern data centers.

C. SCENARIOS & RESULTS
Simulation variables and scenarios are aplenty and readily explorable in a virtual setting. We shall focus on common hot-cold spot scenarios encountered in the industry, as well as the feasibility and effectiveness of plausible mitigation measures, in this study. Three distinct scenarios are demonstrated—cold aisle contamination (Section III-C1), inadequate cooling in a semi-occupied data hall (Section III-C2), and cold-hot aisle containment (Section III-C3). For all demonstrated scenarios, baseline simulations are first run to yield a comparison basis, and suggested mitigation measures are evaluated against these.

1) COLD AISLE CONTAMINATION
This scenario investigates undesired temperature variations caused by the contamination of the cold aisle. Such contamination can be caused by the improper placement of servers, or back-flow resulting from excessive heated air unable to be promptly returned to the ceiling plenum and recirculated. The contamination of the cold aisle increases the temperature of ingested air into neighboring server racks, and diminishes cooling efficiency. A baseline simulation run on the data hall with an empty rack position reveals two sites of contamination in the central cold aisle and the formation of a hot spot (Figure 5a), with two server cabinets found to be in non-compliance with ASHRAE 2011 Class A1 standards (Figure 5b). This necessitates corrective action to prevent thermal overload on the affected cabinets.

FIGURE 5. (a) Resultant heat map of the baseline simulation with one empty cabinet position, showing two sites of contamination in the central cold aisle and one hot spot (circled), and (b) the corresponding ASHRAE temperature compliance map. (c) The addition of two VCDG floor grills to mitigate cold aisle contamination. (d) Resultant heat map of the post-mitigation simulation with the two cold aisle contamination sites eliminated (circled), and (e) the corresponding ASHRAE temperature compliance map with a non-compliant cabinet eliminated (circled). All heat maps and temperature compliance maps were captured at a height of ≈1 m from the raised hall floor.
A plausible solution to the observed undesired thermal scenario is to introduce perforated floorboards integrated with volume-control damper grates (VCDGs) to suppress hot air infiltration into the cold aisle. In this implementation, two perforated floorboards integrated with VCDGs were placed at the flow paths of the identified contamination sites (Figure 5c), with the floor grills and constituent dampers set to 50% opening areas. The corresponding simulation results (Figure 5d) show a noticeable decrease in temperatures of approximately 13%, with contamination of the cold aisle largely eradicated and the elimination of one non-compliant server cabinet (Figure 5e).
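The improvement figures quoted in this subsection amount to straightforward post-processing of rack-intake temperatures taken from the baseline and post-mitigation runs. A hedged sketch follows; the sample values are invented for demonstration and are not simulation output.

    # Count class A1 non-compliant cabinets and the mean intake-temperature
    # change between two runs; sample arrays below are illustrative only.
    T_RECOMMENDED_HI = 27.0  # °C, class A1 recommended upper limit

    def non_compliant(intake_temps: list[float]) -> int:
        return sum(t > T_RECOMMENDED_HI for t in intake_temps)

    def mean_change_percent(baseline: list[float], mitigated: list[float]) -> float:
        b = sum(baseline) / len(baseline)
        m = sum(mitigated) / len(mitigated)
        return 100.0 * (m - b) / b

    baseline = [24.1, 25.3, 28.9, 27.6, 24.8]    # two cabinets above 27 °C
    mitigated = [22.0, 22.4, 24.9, 23.8, 22.1]
    print(non_compliant(baseline), non_compliant(mitigated))   # 2 0
    print(round(mean_change_percent(baseline, mitigated), 1))  # -11.9 (% cooler)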
2) INADEQUATE COOLING OF SEMI-OCCUPIED DATA HALL
This second scenario concerns the inadequate cooling of a semi-occupied data center hall. Such occurrences are commonly the result of commissioning and decommissioning processes in data center environments, where the staggered deployment or removal of servers may lead to an underestimation of cooling requirements, as the contamination of cold and hot aisles is not easily taken into account.

As illustrated in the baseline simulation schematic (Figure 6a), only a single 60 kW CRAC unit was powered on to provide cooling to the servers, in emulation of typical real-world operating conditions where the heat load in the room is estimated to be low and naïvely calculated to be sufficiently supportable by one CRAC unit. Perforated floorboards with integrated VCDGs are manually opened only at rack positions with live servers (low-density, rated at 3 kW each) to ensure adequate volumetric air flow and prevent cool air wastage. The baseline simulation reveals significant contamination of the entire data hall, with the indoor environment averaging 25°C in conjunction with extensive dilution of the cold aisles (Figure 6b). Three servers were also found to be in non-compliance with ASHRAE 2011 Class A1 recommended standards, requiring corrective action to prevent thermal overload.

A plausible solution to the unfavourable thermal scenario seen in the baseline simulation is to activate an additional CRAC unit located across the hall to provide additional cooling. Corresponding simulation results (Figure 6c) show the solution to be effective in eliminating the three instances of non-compliance, and in reducing dilution of the cold aisles. Indoor environment temperatures are also observed to be significantly cooler, averaging approximately 22°C, corresponding to a 14% improvement over the baseline. While the results of this solution are adequate, mild contamination can still be observed. This can be further improved through the opening of two VCDGs (Figure 6d) to introduce air flow and serve as an air barrier to contain hot server exhausts within the hot aisle.

FIGURE 6. (a) Simulation schematic of the semi-occupied data hall. (b) Resultant heat map of the baseline simulation, showing inadequate cooling of three server cabinets and dilution of the cold aisles (circled). (c) Lower temperatures and reduced cold aisle contamination were observed with the activation of an additional CRAC unit. (d) Further improvement is observed with the utilization of VCDGs to serve as an air barrier for containment of hot exhaust at hot aisles. All heat maps and temperature compliance maps were captured at a height of ≈1 m from the raised hall floor.

3) COLD/HOT AISLE CONTAINMENT
This third scenario explores the utilization of hot and cold aisle containment methods [1], [43], [44] in reducing undesired temperature variations in a data hall. A baseline simulation comprising 5.5 kW medium-density servers, four CRAC units, an empty cabinet position, and an absence of containment measures was run. Note that in the previous two simulations, 3 kW low-density servers were modelled—this change is to demonstrate the effects that higher-density server racks, a trend that the data center industry in general is moving towards, have on airflow conditions. The results of the baseline simulation are illustrated in Figure 7a, revealing
significant dilution of the cold aisles caused by excessive exhausted heat. Two servers were found to be in non-compliance with ASHRAE 2011 Class A1 standards, with one unit exceeding Class A1 allowable limits (Figure 7b), requiring immediate action.

Simulation results with containment measures implemented are presented in Figures 7c–f. The utilization of hot and cold aisle containment is evidently effective in reducing hot spots. Accelerating demands on computational power, together with their superior space efficiency, have seen rapidly increasing adoption of higher-density server racks in modern times, and this demonstration suggests that cold or hot aisle containment measures, while not strictly essential with lower-density equipment, become important at high densities. For data centers already in operation, this presents a conundrum to operators—to invest in containment measures to boost cooling efficiency and shoulder the potential disruption in service and capital costs during transition, or to remain with the non-containment status quo.

FIGURE 7. (a) Baseline simulation heat map on a data hall with 5.5 kW medium-density server cabinets and one empty cabinet position, and (b) the corresponding ASHRAE temperature compliance map showing two non-compliant cabinets and one beyond the allowable temperature range. (c) Post-mitigation simulation heat map utilizing hot aisle containment, and (d) corresponding temperature compliance map. (e) Post-mitigation simulation heat map utilizing cold aisle containment, and (f) corresponding temperature compliance map. All heat maps and temperature compliance maps were captured at a height of ≈1 m from the raised hall floor.

IV. APPROACH 2: AN INTERNET OF THINGS (IOT) MANAGEMENT SYSTEM
The simulation-based predictive approach presented enables better-informed decision-making in the planning and design of data centers, and also allows changes in configuration to be virtually evaluated before actual implementation in a live setting. While this reduces the likelihood of static design flaws in environmental control, transient effects due to fluctuations in server load and airflow, and unforeseen constraints or changes in infrastructure, cannot be readily dealt with using such an a priori method. A closed feedback control is best suited for these.

We present an autonomous Internet of things (IoT) management system to serve this role, comprising a chain of IoT-enabled sensing and actuating devices to monitor and regulate environmental parameters such as temperature and humidity within a data center environment, with manual overrides available as a safeguard measure. Specifically, we develop the IoT management system based on the simulation results of Sections III-C1 and III-C2, in which the modulation of airflow via volume-control damper grates (VCDGs) at strategic locations was shown to be effective in suppressing cold and hot spots. This concept was inspired by automated fresh air dampers typically found in mechanical ventilation, heating ventilation and air conditioning systems. In its primary operational mode, the IoT management system autonomously controls the percentage area opening of VCDGs throughout the data center, thereby eliminating the need for manual tuning of floor grill dampers and enabling fast response times to changes in thermal conditions. This reduces operational man-hour requirements, and can lead to improved cooling efficiency and data center performance.

The implementation demonstrated in this study utilizes a network of IoT-enabled VCDGs connected to a central communication device, which processes and publishes collated data to an online IoT portal specially developed on Node-RED and linked to Google Sheets for remote monitoring, control and logging purposes. In addition, as a redundant fail-safe measure, the system is also configurable to send regular data updates to operators via the Short Messaging System (SMS), or otherwise send alerts when environmental parameters are beyond acceptable ranges. Within this network, device-to-device communication is achieved through the Message Queuing Telemetry Transport (MQTT) protocol.

A. SYSTEM OVERVIEW
The construction of the proposed IoT management system requires the convergence of numerous technical processes and components that must work in unison to perform its intended function reliably.
A high-level conceptual overview of the intended operation of the IoT-enabled VCDGs is shown in Table 2, implemented programmatically in Python; Figure 8 presents schematics for the VCDG feedback control system and the network topology between the various IoT components. The following subsections (Sections IV-A1–IV-A4) detail the technical specifications of the key IoT devices.

TABLE 2. Functional overview of the IoT-enabled VCDGs. T denotes the air temperature sensed by sensing nodes in proximity to the VCDG device. The data publishing interval can be shortened or lengthened from 3600 s as necessary.

1) CONTROLLER
A Raspberry Pi 3 Model B serves as the controller for each IoT node. The Raspberry Pi 3 platform was selected due to its relative affordability and the expansive range of connectivity options, inclusive of Bluetooth, tethered USB, and 802.11 Wi-Fi, available on-board without additional hardware. The Raspberry Pi can also be remotely connected to via secure shell (SSH) to access its operating system and general-purpose input and output (GPIO) ports, thereby easing maintenance in an IoT context. The computational power available on the Raspberry Pi also enables a larger degree of scalability before additional controllers need to be added. Debian was installed as the operating system, with the necessary Adafruit libraries, Paho-MQTT client and GAMMU SMS daemon for purposes of temperature-humidity monitoring, device-to-device communication, and remote SMS capabilities.

2) ACTUATORS
VCDG damper actuation was achieved through a mounted 5V 28BYJ-48 stepper motor coupled to a ULN2003 motor controller. The motor was selected for its small size and its history of use in the heating, ventilation and air conditioning (HVAC) industry, illustrating similarities in the intended mode of utilization. In our implementation, a half-step motor excitation sequence is utilized.
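The half-step drive referred to above is the standard eight-state excitation pattern for this motor. A minimal sketch is shown below, assuming BCM pin numbering; the pin assignments, step delay and step count are illustrative, not the prototype's actual wiring.

    # Half-step excitation of a 28BYJ-48 stepper through a ULN2003 driver
    # from Raspberry Pi GPIO. Pin numbers and timing are assumptions.
    import time
    import RPi.GPIO as GPIO

    PINS = (17, 18, 27, 22)   # IN1-IN4 on the ULN2003 board (assumed wiring)
    HALF_STEP = [             # eight-state half-step sequence
        (1, 0, 0, 0), (1, 1, 0, 0), (0, 1, 0, 0), (0, 1, 1, 0),
        (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1), (1, 0, 0, 1),
    ]

    def rotate(steps: int, delay: float = 0.001) -> None:
        """Advance the motor by the given number of half-steps."""
        for i in range(steps):
            for pin, level in zip(PINS, HALF_STEP[i % 8]):
                GPIO.output(pin, level)
            time.sleep(delay)

    GPIO.setmode(GPIO.BCM)
    for pin in PINS:
        GPIO.setup(pin, GPIO.OUT, initial=0)
    rotate(4096)   # roughly one output-shaft revolution for the 28BYJ-48
    GPIO.cleanup()

Reversing the damper direction simply traverses the same sequence in the opposite order.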
3) SENSORS
VCDG actuator modules are coupled to sensing nodes comprising a DHT11 temperature-humidity sensor, capable of a temperature detection range of 0–50°C, accurate to ±2°C, and a relative humidity detection range of 20–80%, accurate to ±5%, both of which completely encompass the ranges stipulated by the ASHRAE guidelines and are well within the plausible parameters of a typical data center hall space. During each polling interval, sensor parameters and damper position status are transmitted via MQTT to a central IoT communications device, which collates and publishes the data to the IoT platform via Wi-Fi.

4) SMS MODEM
Notification of actuator operation statuses and environmental parameter data can be sent via SMS to data center operators.
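Drawing the controller, actuator and sensor roles together, one polling cycle of the damper logic summarized in Table 2 might be sketched as follows. The broker address, topic name, thresholds and sensor pin are our own illustrative assumptions and not details published with the prototype; a real node would additionally drive the stepper routine of Section IV-A2 whenever the damper state changes.

    # One sensing/actuation/publishing cycle per Table 2: open the damper on
    # an over-temperature reading, close it on an under-temperature reading,
    # hold otherwise, and publish the reading over MQTT (paho-mqtt 1.x style).
    import json
    import time
    import Adafruit_DHT                # DHT11 driver (Adafruit library)
    import paho.mqtt.client as mqtt

    T_UPPER, T_LOWER = 27.0, 18.0      # °C; assumed to follow the A1 recommended range
    DHT_PIN, NODE_ID = 4, "vcdg-01"    # assumed sensor pin and node name

    client = mqtt.Client(NODE_ID)
    client.connect("192.168.1.10")     # central IoT communications device (assumed address)

    damper_state = -1                  # -1 closed, +1 opened (cf. Figure 12)
    while True:
        humidity, temp = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHT_PIN)
        if temp is not None:
            if temp > T_UPPER:
                damper_state = +1      # open: admit more cool supply air
            elif temp < T_LOWER:
                damper_state = -1      # close: prevent overcooling
            # within limits: the damper remains steady at its prior position
            client.publish("datahall/%s/telemetry" % NODE_ID,
                           json.dumps({"t": temp, "rh": humidity,
                                       "damper": damper_state}))
        time.sleep(3600)               # publishing interval per Table 2 (adjustable)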
FIGURE 12. (a) Temperature data collected over the validation period for IoT-enabled VCDG device #1, and (b) corresponding damper operation status over the same time period. (c) Temperature data for IoT-enabled VCDG device #2, and (d) corresponding damper operation status. A +1 damper state represents an opened position, and a −1 damper state represents a closed position.

When the sensed temperature remains within acceptable limits, the VCDG remains steady at its prior position. The system can be trivially modified to adjust the damper to an interpolated position between 0–100% effective airflow, say, via a linear mapping or a simulation-determined nonlinear map; this simplistic scheme was chosen for demonstrative clarity. Results of the validation are presented in Figure 12, illustrating the autonomous response of two IoT-enabled VCDGs towards changing data center thermal conditions. The results indicate the correct operation of the VCDGs as specified, and successful data transmission and logging functions of the IoT management system.
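A linear variant of the interpolated mapping mentioned above, written purely for illustration, could read:

    # Illustrative linear map from sensed temperature to damper opening:
    # 0% at the lower limit, 100% at the upper, saturating outside the band.
    def damper_opening(temp_c: float, t_lo: float = 18.0, t_hi: float = 27.0) -> float:
        """Return effective airflow opening in percent (0-100)."""
        frac = (temp_c - t_lo) / (t_hi - t_lo)
        return 100.0 * min(1.0, max(0.0, frac))

    assert damper_opening(18.0) == 0.0
    assert damper_opening(27.0) == 100.0
    assert damper_opening(22.5) == 50.0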
Through the integration of the various IoT components detailed in Section IV-A, real-time sensing and autonomous algorithm-based control of environmental parameters is achieved, remote monitoring capabilities are realized via the developed IoT platform and redundant SMS notification, and the enabled data logging options can be exploited for analysis and subsequent data-based improvements to data center cooling and computing infrastructure. The VCDG system can potentially interface with the CRAC units for optimal control of set point temperatures and air flow rates; this motivates future work. In such integrated systems, additional redundancy measures ought to be taken, in consideration of operational needs and the possibility of human errors caused by increased system complexity.

V. CONCLUSION
In this paper, we have demonstrated a multi-pronged approach to enhance and optimize data center cooling, firstly through a simulation-based predictive method leveraging computational fluid dynamics (CFD) and heat transport computational analyses, and secondly through a novel reactive Internet of things (IoT) feedback-controlled autonomous management system. The use of computational fluid dynamics simulations enables accurate predictions of data center operational conditions in a virtual setting, and can provide a better understanding of design flaws as well as the effects of configuration changes, before implementation in a live data hall. In the various archetypal scenarios considered (Section III-C), simulational analysis was demonstrated as an efficient tool both to identify potential thermal non-compliance and to evaluate plausible mitigation measures, across diverse circumstances of cold aisle contamination, inadequate cooling during commissioning and decommissioning processes, and hot-cold aisle containment for higher-density server equipment. Evidently, across the wide spectrum of possible problems in data hall environmental control, there exists no one-size-fits-all solution. Simulation tools such as that utilized here enable the rapid exploration of possible resolution strategies and alternative infrastructure configurations, thereby aiding in efficiently eliminating unfavourable operating conditions.

This a priori simulational branch of the approach is supplemented by a reactive IoT management system that autonomously monitors and regulates environmental parameters, enabling good robustness to transient fluctuations and unforeseen conditions within the data center. A feasible prototype of the IoT system has been demonstrated
on affordable hardware, and its operational behaviour validated on rudimentary real-world tests. Used in conjunction, these predictive and reactive branches can enable effective cooling and environmental control in data centers under a wide range of scenarios. The development of these tools is especially important amidst the current movement towards higher-density data centers that must simultaneously satisfy stringent computational performance and energy sustainability requirements [45]–[48]. In our work, we have studied practical operational situations in the data centre processing environment and the impacts these have on cooling. This is reflected in the first (predictive) approach, where various conditions are simulated and assessed for their effectiveness. The second (reactive) approach further contributes to the research via the development of an IoT-based solution which enhances operator monitoring and work flow, and addresses cooling deficiencies, through the automation of volume-control damper grates that adjust their openings according to hall conditions. This is an advancement over current industrial data centre norms, which are primarily non-adjustable or manually adjustable. In optimizing thermal control, not only are hardware performance, reliability and lifespan enhanced, but energy expenditure can also be reduced.

ACKNOWLEDGMENT
The authors wish to thank the handling editor and two anonymous reviewers for their constructive comments.

COMPETING INTERESTS
The authors declare no competing interests.

REFERENCES
[1] B. Fakhim, M. Behnia, S. Armfield, and N. Srinarayana, "Cooling solutions in an operational data centre: A case study," Appl. Therm. Eng., vol. 31, pp. 2279–2291, Oct. 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1359431111001578
[2] J. Shuja, K. Bilal, S. A. Madani, M. Othman, R. Ranjan, P. Balaji, and S. U. Khan, "Survey of techniques and architectures for designing energy-efficient data centers," IEEE Syst. J., vol. 10, no. 2, pp. 507–519, Jun. 2016.
[3] Y. Joshi and P. Kumar, "Introduction to data center energy flow and thermal management," in Energy Efficient Thermal Management of Data Centers, Y. Joshi and P. Kumar, Eds. Boston, MA, USA: Springer, 2012, pp. 1–38. doi: 10.1007/978-1-4419-7124-1_1.
[4] H. Geng, Data Center Handbook. Hoboken, NJ, USA: Wiley, 2014.
[5] V. Josyula, M. Orr, and G. Page, Cloud Computing: Automating the Virtualized Data Center. San Jose, CA, USA: Cisco, 2011.
[6] C. J. M. Lasance, "Thermally driven reliability issues in microelectronic systems: Status-quo and challenges," Microelectron. Rel., vol. 43, no. 12, pp. 1969–1974, 2003. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271403001835
[7] J. D. Parry, J. Rantala, and C. J. M. Lasance, "Enhanced electronic system reliability—Challenges for temperature prediction," IEEE Trans. Compon. Packag. Technol., vol. 25, no. 4, pp. 533–538, Dec. 2002.
[8] P. Lall, M. Pecht, and E. B. Hakim, Influence of Temperature on Microelectronics and System Reliability: A Physics of Failure Approach. Boca Raton, FL, USA: CRC Press, 1997.
[9] C.-C. Teng, Y.-K. Cheng, E. Rosenbaum, and S.-M. Kang, "ITEM: A temperature-dependent electromigration reliability diagnosis tool," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 16, no. 8, pp. 882–893, Aug. 1997.
[10] M. G. Pecht and F. R. Nash, "Predicting the reliability of electronic equipment," Proc. IEEE, vol. 82, no. 7, pp. 992–1004, Jul. 1994.
[11] R. Rao, S. Vrudhula, and C. Chakrabarti, "Throughput of multi-core processors under thermal constraints," in Proc. Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2007, pp. 201–206.
[12] R. Rao and S. Vrudhula, "Performance optimal processor throttling under thermal constraints," in Proc. Int. Conf. Compil. Archit. Synth. Embedded Syst., New York, NY, USA, 2007, pp. 257–266. doi: 10.1145/1289881.1289925.
[13] D. Brooks and M. Martonosi, "Dynamic thermal management for high-performance microprocessors," in Proc. Int. Symp. High-Perform. Comput. Archit., Jan. 2001, pp. 171–182.
[14] M. K. Patterson, "The effect of data center temperature on energy efficiency," in Proc. 11th Intersoc. Conf. Therm. Thermomech. Phenomena Electron. Syst., May 2008, pp. 1167–1174.
[15] N. El-Sayed, I. A. Stefanovici, G. Amvrosiadis, A. A. Hwang, and B. Schroeder, "Temperature management in data centers: Why some (might) like it hot," ACM SIGMETRICS Perform. Eval. Rev., vol. 40, no. 1, pp. 163–174, 2012. doi: 10.1145/2318857.2254778.
[16] H. S. Gunawi, M. Hao, R. O. Suminto, A. Laksono, A. D. Satria, J. Adityatama, and K. J. Eliazar, "Why does the cloud stop computing?: Lessons from hundreds of service outages," in Proc. 7th Symp. Cloud Comput., New York, NY, USA, 2016, pp. 1–16. doi: 10.1145/2987550.2987583.
[17] S. Pertet and P. Narasimhan, "Causes of failure in Web applications," Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep. CMU-PDL-05-109, 2005.
[18] J. Lienig and M. Thiele, Fundamentals of Electromigration-Aware Integrated Circuit Design. Cham, Switzerland: Springer, 2018.
[19] J. R. Black, "Electromigration—A brief survey and some recent results," IEEE Trans. Electron Devices, vol. ED-16, no. 4, pp. 338–347, Apr. 1969.
[20] E. Pinheiro, W.-D. Weber, and L. A. Barroso, "Failure trends in a large disk drive population," in Proc. FAST, 2007, vol. 7, no. 1, pp. 17–23.
[21] J. D. Moore, J. S. Chase, P. Ranganathan, and R. K. Sharma, "Making scheduling 'cool': Temperature-aware workload placement in data centers," in Proc. USENIX Annu. Tech. Conf., Gen. Track, 2005, pp. 61–75.
[22] M. Kummert, W. Dempster, and K. McLean, "Transient thermal analysis of a data centre cooling system under fault conditions," in Proc. 11th Int. Building Perform. Simulation Assoc. Conf. Exhib. Building Simulation, 2009, pp. 1302–1305.
[23] R. Romadhon, M. Ali, A. M. Mahdzir, and Y. A. Abakr, "Optimization of cooling systems in data centre by computational fluid dynamics model and simulation," in Proc. Innov. Technol. Intell. Syst. Ind. Appl., Jul. 2009, pp. 322–327.
[24] K. C. Karki, A. Radmehr, and S. V. Patankar, "Use of computational fluid dynamics for calculating flow rates through perforated tiles in raised-floor data centers," HVAC&R Res., vol. 9, no. 2, pp. 153–166, 2003.
[25] J. Cho, J. Yang, and W. Park, "Evaluation of air distribution system's airflow performance for cooling energy savings in high-density data centers," Energy Buildings, vol. 68, pp. 270–279, Jan. 2014.
[26] A. Almoli, A. Thompson, N. Kapur, J. Summers, H. Thompson, and G. Hannah, "Computational fluid dynamic investigation of liquid rack cooling in data centres," Appl. Energy, vol. 89, no. 1, pp. 150–155, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0306261911001012
[27] C. B. Bash, C. D. Patel, and R. K. Sharma, "Dynamic thermal management of air cooled data centers," in Proc. 10th Intersoc. Conf. Therm. Thermomech. Phenomena Electron. Syst., May/Jun. 2006, p. 8.
[28] R. K. Sharma, C. E. Bash, C. D. Patel, R. J. Friedrich, and J. S. Chase, "Balance of power: Dynamic thermal management for Internet data centers," IEEE Internet Comput., vol. 9, no. 1, pp. 42–49, Jan. 2005.
[29] Q. Liu, Y. Ma, M. Alhussein, Y. Zhang, and L. Peng, "Green data center with IoT sensing and cloud-assisted smart temperature control system," Comput. Netw., vol. 101, pp. 104–112, Jun. 2016.
[30] I. Lee and K. Lee, "The Internet of Things (IoT): Applications, investments, and challenges for enterprises," Bus. Horizons, vol. 58, no. 4, pp. 431–440, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0007681315000373
[31] "Thermal guidelines for data processing environments—Expanded data center classes and usage guidance," ASHRAE Tech. Committee, Atlanta, GA, USA, 2011.
[32] (2017). About ASHRAE 9.9 Mission Critical Facilities, Data Centers, Technology Spaces Electronic Equipment. Accessed: Jan. 8, 2019. [Online]. Available: http://tc0909.ashraetcs.org/about.php
[33] P. Lin, "How to fix hot spots in the data center," Schneider Electr. Data Center Sci. Center, West Kingston, RI, USA, Tech. Rep. SPD_VAVR-9GNNGR_EN, 2014.
[34] T. Evans, "Humidification strategies for data centers and network rooms," APC by Schneider Electr., West Kingston, RI, USA, Tech. Rep. SPD_NRAN-5TV85S_EN, 2004.
[35] R. A. Steinbrecher and R. Schmidt, "Data center environments: ASHRAE's evolving thermal guidelines," ASHRAE J., vol. 53, no. 12, pp. 42–49, 2011.
[36] W. Lintner, B. Tschudi, and O. VanGeet, "Best practices guide for energy-efficient data center design," U.S. Dept. Energy, Washington, DC, USA, Tech. Rep. NREL/BR-7A40-47201, 2011.
[37] (2018). Fujitsu Launches Liquid Immersion Cooling System That Lowers Total Cost of Ownership. Accessed: Aug. 16, 2019. [Online]. Available: https://www.fujitsu.com/global/about/resources/news/press-releases/2018/0906-01.html
[38] InfoComm Development Authority of Singapore, "Green data centre technology roadmap," Strategy Group Prime Minister's Office, Singapore, 2014.
[39] (2016). Singapore Trials World's First Tropical Data Centre for a Smart Nation. Accessed: Aug. 16, 2019. [Online]. Available: https://www2.imda.gov.sg/news-and-events/Media-Room/archived/ida/Media-Releases/2016/singapore-trials-worlds-first-tropical-data-centre-for-a-smart-nation
[40] (2018). 6SigmaRoom. [Online]. Available: https://www.futurefacilities.com/products/6sigmaroom/
[41] (2016). How Accurate is 6SigmaDCX. [Online]. Available: https://www.futurefacilities.com/resources/whitepapers/how-accurate-is-6sigmadcx/
[42] K. Heslin, "Data center cooling: CRAC/CRAH redundancy, capacity, and selection metrics," Uptime Inst. J., vol. 3, 2014. Accessed: Jan. 8, 2019. [Online]. Available: https://journal.uptimeinstitute.com/data-center-cooling-redundancy-capacity-selection-metrics/
[43] J. Niemann, K. Brown, and V. Avelar, "Impact of hot and cold aisle containment on data center temperature and efficiency," Schneider Electr. Data Center Sci. Center, West Kingston, RI, USA, White Paper SPD_DBOY-7EDLE8_EN, 2011.
[44] R. Schmidt, A. Vallury, and M. Iyengar, "Energy savings through hot and cold aisle containment configurations for air cooled servers in data centers," in Proc. ASME Pacific Rim Tech. Conf. Exhib. Packag. Integr. Electron. Photonic Syst., 2011, pp. 611–616.
[45] C. D. Patel, R. Sharma, C. E. Bash, and A. Beitelmal, "Thermal considerations in cooling large scale high compute density data centers," in Proc. 8th Intersoc. Conf. Therm. Thermomech. Phenomena Electron. Syst., May/Jun. 2002, pp. 767–776.
[46] R. R. Schmidt, E. E. Cruz, and M. Iyengar, "Challenges of data center thermal management," IBM J. Res. Develop., vol. 49, no. 4.5, pp. 709–723, Jul. 2005.
[47] M. Marwah, P. Maciel, A. Shah, R. Sharma, T. Christian, V. Almeida, C. Araújo, E. Souza, G. Callou, B. Silva, S. Galdino, and J. Pires, "Quantifying the sustainability impact of data center availability," ACM SIGMETRICS Perform. Eval. Rev., vol. 37, no. 4, pp. 64–68, 2010.
[48] P. Mahadevan, S. Banerjee, P. Sharma, A. Shah, and P. Ranganathan, "On energy efficiency for enterprise and data center networks," IEEE Commun. Mag., vol. 49, no. 8, pp. 94–100, Aug. 2011.

KANG HAO CHEONG (M'18) received the B.Sc. degree (Hons.) from the Department of Mathematics and University Scholars Programme, National University of Singapore (NUS), in 2007, the Ph.D. degree from the Department of Electrical and Computer Engineering, NUS, in 2015, and the postgraduate Diploma degree in education from the National Institute of Education, Singapore. From 2016 to 2018, he was an Assistant Professor with the Engineering Cluster, Singapore Institute of Technology. He is currently an Assistant Professor with the Science and Math Cluster, Singapore University of Technology and Design (SUTD).

JIN MING KOH received the NUS High School Diploma (High Distinction), in 2016. Since 2017, he has been undertaking research projects offered by K. H. Cheong. He is currently with the California Institute of Technology.

KENNETH JIAN WEI TANG received the B.Eng. degree (Hons.) in sustainable infrastructure engineering (building services) and the M.EngTech. degree from the Singapore Institute of Technology (SIT), in 2018 and 2019, respectively. He is currently pursuing the Ph.D. degree with the Science and Math Cluster, Singapore University of Technology and Design.

SIMON CHING MAN YU received the B.Eng. degree (Hons.) ACGI and Ph.D. DIC degree from the Department of Mechanical Engineering, Imperial College London, London, in 1987 and 1991, respectively. He joined Nanyang Technological University, Singapore, immediately after his graduation as a Lecturer. He became a Senior Lecturer in 1996 and an Associate Professor in 2000. He was also heavily involved in university administration since 2000: Principal Staff Officer (President's Office), from 2000 to 2003; University Council Member, from 2001 to 2003; the Vice Dean of the Admission Office, from 2003 to 2006; and the Head of the Division of Aerospace Engineering, School of Mechanical and Aerospace Engineering, from 2008 to 2013. He moved over to the Singapore Institute of Technology as Professor and the Programme Director, in 2013, to establish one of the first engineering degree programmes offered solely by the university. He is currently the Head of the Interdisciplinary Division of Aeronautical and Aviation Engineering (AAE), The Hong Kong Polytechnic University. He has published more than 200 research articles in archive journals and conferences. He secured more than SGD 50 million in external grants during his tenure at Nanyang Technological University, especially during his period as the Head of the Division of Aerospace Engineering.

U. RAJENDRA ACHARYA received the Ph.D. degree from the National Institute of Technology Karnataka, Surathkal, India, and the D.Eng. degree from Chiba University, Japan. He is currently a Senior Faculty Member with Ngee Ann Polytechnic, Singapore. He is also an Adjunct Professor with Taylor's University, Malaysia, an Adjunct Faculty with the Singapore Institute of Technology, University of Glasgow, Singapore, and an Associate Faculty with the Singapore University of Social Sciences, Singapore. He has published more than 400 articles in refereed international SCI-IF journals (345), international conference proceedings (42), and books (17), with more than 24,000 citations on Google Scholar (h-index of 82). He has been ranked in the top 1% of Highly Cited Researchers in computer science for the last four consecutive years (2016, 2017, 2018, and 2019), according to the Essential Science Indicators of Thomson. He has worked on various funded projects, with grants worth more than SGD 2 million. He has three patents and is on the editorial boards of many journals. He has served as a guest editor for many journals. His current research interests include biomedical signal processing, biomedical imaging, data mining, visualization, and biophysics for better healthcare design, delivery, and therapy.

NENG-GANG XIE received the Ph.D. degree in engineering from Hehai University, in 1999. He is currently a Professor and a Ph.D. Supervisor with the School of Management Science and Engineering, Anhui University of Technology.