Rule-Based Fault Detection Method For Air Handling Units
Abstract
Air handling unit performance assessment rules (APAR) is a fault detection tool that uses a set of expert rules derived from mass and energy balances to detect faults in air handling units (AHUs). Control signals are used to determine the mode of operation of the AHU. A subset of the expert rules which correspond to that mode of operation are then evaluated to determine whether a fault exists. APAR is computationally simple enough that it can be embedded in commercial building automation and control systems and relies only upon the sensor data and control signals that are commonly available in these systems. APAR was tested using data sets collected from a "hardware-in-the-loop" emulator and from several field sites. APAR was also embedded in commercial AHU controllers and tested in the emulator.
Keywords: Fault detection; Diagnostics; Building automation; Direct digital control; Energy management system
1. Introduction
Building heating, ventilation, and air conditioning (HVAC) equipment routinely fails to satisfy performance expectations envisioned at design because of problems caused by improper installation, inadequate maintenance, or equipment failure. These problems, or "faults," include mechanical failures such as stuck, broken, or leaking valves, dampers, or actuators; control problems related to failed or drifting sensors, poor feedback loop tuning, or incorrect sequencing logic; fouled heat exchangers; design errors; or inappropriate operator intervention. Such faults often go unnoticed for extended periods of time until the deterioration in performance becomes great enough to trigger comfort complaints or gross equipment failure. The term "fault detection and diagnostics" (FDD) refers to mathematical techniques used to detect and diagnose these types of faults. By identifying and diagnosing faults to be repaired, FDD techniques can benefit building owners by reducing energy consumption, improving operations and maintenance (O&M), and increasing effective control over environmental conditions in occupied spaces. The energy-saving potential of FDD is estimated at 10-40% of HVAC system energy consumption, depending on the age and condition of the equipment, maintenance practices, climate, and building use [1-4].
There are also significant non-energy benefits of FDD. By identifying minor problems before they become major problems, the useful service life of equipment can be extended. Also, repairs can be scheduled when convenient, avoiding downtime and overtime work. Depending on the building use, better control of the temperature, humidity, and ventilation rate of the occupied spaces can improve employee productivity, guest/customer comfort, and/or product quality control.
There are a number of FDD tools that are currently emerging from research [5,6]. In general, these tools take the form of stand-alone software products in which either trend data files must be processed offline or an interface to the building control system must be developed to enable on-line analysis. A different approach is to embed FDD in the local controller for each piece of equipment so that the FDD algorithm is executed as a component of the control logic. In this case, the algorithm will have local access to sensor data and control signals, eliminating the need to communicate this information over the building control network. This approach is highly scalable and therefore suitable to larger HVAC systems. Any faults that are detected can be reported to the building operator using the building automation system's alarm/event handling capability.
House et al. [7] introduced a rule-based FDD method for air handling units and tested it using simulation and field data. The purpose of the study described here was to extend this work by examining the performance of the FDD method in detecting
Nomenclature
$MT_{max}$  maximum number of mode changes per hour
$T_{co}$  changeover air temperature for switching between modes 3 and 4
$T_{ma}$  mixed air temperature
$T_{oa}$  outdoor air temperature
$T_{ra}$  return air temperature
$T_{sa}$  supply air temperature
$T_{sa,s}$  supply air temperature set point
$\Delta T_{min}$  threshold on the minimum temperature difference between the return and outdoor air
The AHU controller typically controls the supply air temperature to maintain a setpoint temperature at a location in the supply duct downstream of the supply fan. Outdoor air enters the AHU and is mixed with air returned from the building. The mixed air passes over the heating and cooling coils, where, if necessary, it is conditioned prior to being supplied to the building. The typical operating sequence for AHUs consists of four primary modes of operation during occupied periods for maintaining the supply air temperature and the ventilation at preset levels. The relationship of the four operating modes to the control of the heating coil valve, cooling coil valve, and mixing box dampers is shown in Fig. 2. Sequencing logic determines the mode of operation as dictated by various thermal relationships, including the internal and
$\Delta T_{rf}$  temperature rise across the return fan
$\Delta T_{sf}$  temperature rise across the supply fan
$Q_{oa}/Q_{sa}$  outdoor air fraction $= (T_{ma} - T_{ra})/(T_{oa} - T_{ra})$
$(Q_{oa}/Q_{sa})_{min}$  threshold on the minimum outdoor air fraction
$u_{cc}$  normalized cooling coil valve control signal [0,1], where $u_{cc} = 0$ indicates the valve is closed and $u_{cc} = 1$ indicates it is 100% open
$u_d$  normalized mixing box damper control signal [0,1], where $u_d = 0$ indicates the outdoor air damper is closed and $u_d = 1$ indicates it is 100% open
$u_{hc}$  normalized heating coil valve control signal [0,1], where $u_{hc} = 0$ indicates the valve is closed and $u_{hc} = 1$ indicates it is 100% open
Greek symbols
$\varepsilon_{cc}$  threshold parameter for the cooling coil valve control signal
$\varepsilon_d$  threshold parameter for the mixing box damper control signal
$\varepsilon_f$  threshold parameter accounting for errors related to airflows (function of uncertainties in temperature measurements)
$\varepsilon_{hc}$  threshold parameter for the heating coil valve control signal
$\varepsilon_t$  threshold for errors in temperature measurements
commonly found mechanical and control faults under a variety of weather conditions, system types, and usage patterns. This study also evaluated the feasibility of embedding FDD in commercial HVAC controllers.
external loads on the zones served by the AHU. In the heating mode (mode 1 in Fig. 2), the heating coil valve is controlled to maintain the supply air temperature at its setpoint and the cooling coil valve is closed. The mixing box dampers (outdoor, exhaust, and recirculation air dampers) are positioned to allow the minimum outdoor air fraction necessary to satisfy ventilation requirements. As the outdoor air temperature increases, the AHU transitions from heating to cooling with outdoor air (mode 2). In this mode, the heating and cooling coil valves are closed and the mixing box dampers are modulated to maintain the supply air temperature at its setpoint. As the cooling load continues to increase, the mixing box dampers eventually saturate with the outdoor and exhaust air dampers fully open and the recirculation air damper fully closed, and the AHU changes over to mechanical cooling. When the AHU is operating in one of the mechanical cooling modes (modes 3 and 4), the cooling coil valve modulates to maintain the supply air temperature at its setpoint, the heating coil valve is closed, and the mixing box dampers are positioned for either 100% outdoor air fraction or the minimum outdoor air fraction needed to meet ventilation requirements. There are several different types of mixed air controls; generally, the control logic compares the outdoor and return air temperatures or enthalpies to determine the proper position of the mixing box dampers such that mechanical cooling requirements are minimized. Hence, the third primary mode (mode 3) of operation is mechanical cooling with 100% outdoor air and the fourth primary mode (mode 4) of operation is mechanical cooling with minimum outdoor air.
2. Methodology
2.1. System description
The fault detection method described in this paper was developed for application to single duct variable-volume or constant-volume air handling units (AHUs). The rules that are used for FDD focus on temperature control in an AHU. Hence, the system description will be restricted to components and control strategies directly related to temperature control. Fig. 1 is a schematic diagram of a typical single duct air handling unit (AHU).
2.2. AHU performance assessment rules (APAR)
The basis for the fault detection methodology is a set of expert rules used to assess the performance of the AHU. The tool developed from these rules is referred to as APAR (AHU performance assessment rules). APAR uses control signals and occupancy information to identify the mode of operation of the AHU, thereby identifying a subset of the rules that specify temperature relationships that are applicable for that mode. The two main mode classifications are occupied and unoccupied. For occupied periods, the modes are further categorized as described previously. For convenience, the operating modes are:
1: heating,
2: cooling with outdoor air,
3: mechanical cooling with 100% outdoor air,
4: mechanical cooling with minimum outdoor air,
5: unknown.
Because the direct digital control (DDC) outputs to the actuators of the heating and cooling coil valves and the mixing box dampers are known, the mode of operation can be ascertained. Although not depicted in Fig. 2, a fifth mode of operation referred to as "unknown" operation has been defined and listed above. The unknown mode applies to the case in which the AHU is running in an occupied mode, but none of the control output relationships defined for modes 1-4 are satisfied. The unknown mode could be associated with mode transitions and/or with faulty operation such as simultaneous heating and cooling. Once the mode of operation has been established, rules based on conservation of mass and energy can be used along with the sensor information that is typically available for controlling the AHU. For example, normal operation in the mechanical cooling mode with 100% outdoor air (mode 3) dictates that the outdoor and mixed air temperatures must be approximately equal. Defining $T_{oa}$ and $T_{ma}$ as the outdoor air and mixed air temperatures, respectively, the rule (defined as Rule 10) is written as Rule 10:
$$|T_{oa} - T_{ma}| > \varepsilon_t$$
The rules are written such that a fault is indicated if a rule is true. In the example above, the rule states that if the outdoor and mixed air temperatures are not the same (i.e., if the expression is true), a fault has occurred. As a detailed description of the 28 APAR rules and the reasoning behind them is provided elsewhere [7], the rules are simply listed in Table 1, which groups the rules according to mode of operation. As indicated in the column heading for the rule expression, a true expression indicates a fault. The rules are based on mass and energy balances on various subsystems of the AHU. For example, Rules 1, 7, 11, and 16 treat the relationship of temperatures in the coil subsystem of the AHU for modes 1-4, respectively. For these four rules, only the relational operator changes from one mode to another. A typical rule from this group requires the supply air temperature to be lower than the sum of the mixed air temperature and the temperature rise across the supply fan in the mechanical cooling modes. There are also groups of rules treating the mixing box subsystem, the zone subsystem, economizer operation, comfort requirements, and controller logic/tuning. Hence, although there are 28 rules, in reality only a small number of temperature and control signal relationships are used to define the rules.
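The mode identification step and a single rule evaluation can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the `Signals` container, the `identify_mode` and `rule_10` names, the damper position `u_d_min` that corresponds to minimum outdoor air, and the default threshold values are all assumptions made for the example. The source does not give the exact control output relationships that define each mode, so the tests below are one plausible reading of the operating sequence described earlier.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Normalized control signals, each in [0, 1] (hypothetical container)."""
    u_hc: float  # heating coil valve
    u_cc: float  # cooling coil valve
    u_d: float   # mixing box damper


def identify_mode(s, u_d_min=0.2, eps_hc=0.02, eps_cc=0.02, eps_d=0.02):
    """Return operating mode 1-5 from control signals (assumed logic)."""
    heating = s.u_hc > eps_hc
    cooling = s.u_cc > eps_cc
    damper_at_min = abs(s.u_d - u_d_min) <= eps_d
    damper_full = abs(s.u_d - 1.0) <= eps_d
    if heating and not cooling and damper_at_min:
        return 1  # heating
    if not heating and not cooling:
        return 2  # cooling with outdoor air (both valves closed)
    if cooling and not heating and damper_full:
        return 3  # mechanical cooling with 100% outdoor air
    if cooling and not heating and damper_at_min:
        return 4  # mechanical cooling with minimum outdoor air
    return 5      # unknown: no mode 1-4 relationship is satisfied


def rule_10(t_oa, t_ma, eps_t=1.7):
    """Rule 10 (mode 3): True indicates a fault, |T_oa - T_ma| > eps_t."""
    return abs(t_oa - t_ma) > eps_t


mode = identify_mode(Signals(u_hc=0.0, u_cc=0.5, u_d=1.0))
fault = mode == 3 and rule_10(t_oa=30.0, t_ma=33.0)
```

In this sketch, simultaneous heating and economizing (heating valve open while the damper is away from its minimum position) falls through to mode 5, which matches the description of the unknown mode as covering transitions and faulty operation.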
APAR does not search for the existence of a specific set of faults. Rather, any fault that causes a rule to be satisfied would be detected and additional effort would be necessary to isolate the source of the problem. In general, the rule set can identify the following faults: stuck or leaking mixing box dampers, heating coil valves, and cooling coil valves; temperature sensor faults; design faults such as undersized coils; controller programming errors related to tuning, setpoints, and sequencing logic; and inappropriate operator intervention. It is typical of APAR rules that several different faults can cause a single rule to be satisfied. As a consequence, a few simple
where $\varepsilon_t$ is a threshold that depends on the uncertainty (or accuracy) of the measurements.
[Fig. 2: graphic not recoverable; surviving labels: Mode 1, Mode 2, Mode 3, Mode 4, Cooling Coil Valve, Mixing Box Dampers.]
J. Schein et al./Energy
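Table 1
APAR rules
Mode / Rule no. / Rule expression (true implies existence of a fault)

Heating (mode 1)
  1   $T_{sa} < T_{ma} + \Delta T_{sf} - \varepsilon_t$
  2   For $|T_{ra} - T_{oa}| \ge \Delta T_{min}$: $|Q_{oa}/Q_{sa} - (Q_{oa}/Q_{sa})_{min}| > \varepsilon_f$
  3   $|u_{hc} - 1| \le \varepsilon_{hc}$ and $T_{sa,s} - T_{sa} \ge \varepsilon_t$
  4   $|u_{hc} - 1| \le \varepsilon_{hc}$

Cooling with outdoor air (mode 2)
  5   $T_{oa} > T_{sa,s} - \Delta T_{sf} + \varepsilon_t$
  6   $T_{sa} > T_{ra} - \Delta T_{rf} + \varepsilon_t$
  7   $|T_{sa} - \Delta T_{sf} - T_{ma}| > \varepsilon_t$

Mechanical cooling with 100% outdoor air (mode 3)
  8   $T_{oa} < T_{sa,s} - \Delta T_{sf} - \varepsilon_t$
  9   $T_{oa} > T_{co} + \varepsilon_t$
  10  $|T_{oa} - T_{ma}| > \varepsilon_t$
  11  $T_{sa} > T_{ma} + \Delta T_{sf} + \varepsilon_t$
  12  $T_{sa} > T_{ra} - \Delta T_{rf} + \varepsilon_t$
  13  $|u_{cc} - 1| \le \varepsilon_{cc}$ and $T_{sa} - T_{sa,s} \ge \varepsilon_t$
  14  $|u_{cc} - 1| \le \varepsilon_{cc}$

Mechanical cooling with minimum outdoor air (mode 4)
  15  $T_{oa} < T_{co} - \varepsilon_t$
  16  $T_{sa} > T_{ma} + \Delta T_{sf} + \varepsilon_t$
  17  $T_{sa} > T_{ra} - \Delta T_{rf} + \varepsilon_t$
  18  For $|T_{ra} - T_{oa}| \ge \Delta T_{min}$: $|Q_{oa}/Q_{sa} - (Q_{oa}/Q_{sa})_{min}| > \varepsilon_f$
  19  $|u_{cc} - 1| \le \varepsilon_{cc}$ and $T_{sa} - T_{sa,s} \ge \varepsilon_t$
  20  $|u_{cc} - 1| \le \varepsilon_{cc}$

Unknown occupied modes (mode 5)
  21  $u_{cc} > \varepsilon_{cc}$ and $u_{hc} > \varepsilon_{hc}$ and $\varepsilon_d < u_d < 1 - \varepsilon_d$
  22  $u_{hc} > \varepsilon_{hc}$ and $u_{cc} > \varepsilon_{cc}$
  23  $u_{hc} > \varepsilon_{hc}$ and $u_d > \varepsilon_d$
  24  $\varepsilon_d < u_d < 1 - \varepsilon_d$ and $u_{cc} > \varepsilon_{cc}$

All occupied modes (mode 1, 2, 3, 4, or 5)
  25  $|T_{sa} - T_{sa,s}| > \varepsilon_t$
  26  $T_{ma} < \min(T_{ra}, T_{oa}) - \varepsilon_t$
  27  $T_{ma} > \max(T_{ra}, T_{oa}) + \varepsilon_t$
  28  Number of mode transitions per hour $> MT_{max}$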
rules can be used to find many different faults. Although a list of candidate faults is provided based on the satisfied rule(s), further information, such as a plot of trend data, is usually needed to identify the specific cause of the fault.
Heating coil valve control signal;
Mixing box damper control signal;
Return air relative humidity (for enthalpy-based economizers only);
Outdoor air relative humidity (for enthalpy-based economizers only).
This information is generally available for most AHUs controlled with a DDC system. If one or more sensors are not available, certain rules will no longer be applicable. For instance, in the absence of a mixed air temperature sensor, nine rules (Rules 1, 2, 7, 10, 11, 16, 18, 26, and 27) will be eliminated from consideration in APAR. Conversely, the presence of additional sensors would expand the rule set and provide an opportunity to either detect more faults, or to detect faults during modes of operation in which they would normally be hidden. For instance, if a temperature sensor were installed between the heating and cooling coils, leakage through the heating valve could be detected during the mechanical cooling modes (modes 3 and 4), whereas normally it would be masked in these modes.
In addition to the operational data listed above, certain design data are needed to implement the rules. The required design data are:
Minimum and maximum values of control signals for the heating coil valve, cooling coil valve, and mixing box dampers for normalizing the control signals;
Percentage outdoor air necessary to satisfy ventilation requirements;
Changeover temperature from mechanical cooling with 100% outdoor air to mechanical cooling with minimum outdoor air (or equivalent condition for enthalpy-based economizer);
Description of sequencing/economizer cycle strategy (used to verify that the rules are suitable to a particular AHU installation).
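Two of the data requirements above lend themselves to a short illustration: normalizing a raw control signal with its design minimum and maximum, and dropping the rules that depend on a missing mixed air temperature sensor. The following Python sketch is hypothetical; the function names and the clamping of out-of-range values are assumptions, while the list of nine affected rule numbers is taken from the text.

```python
# Rules that need the mixed air temperature, as listed in the text.
RULES_NEEDING_T_MA = {1, 2, 7, 10, 11, 16, 18, 26, 27}


def normalize(raw, raw_min, raw_max):
    """Map a raw control signal onto [0, 1] using design min/max values."""
    if raw_max == raw_min:
        raise ValueError("raw_min and raw_max must differ")
    u = (raw - raw_min) / (raw_max - raw_min)
    # Clamping is an assumption, to guard against slightly out-of-range data.
    return min(1.0, max(0.0, u))


def applicable_rules(has_mixed_air_sensor):
    """Return the rule numbers (1-28) that can be evaluated."""
    all_rules = set(range(1, 29))
    if has_mixed_air_sensor:
        return all_rules
    return all_rules - RULES_NEEDING_T_MA


u_cc = normalize(6.0, raw_min=2.0, raw_max=10.0)  # e.g. a 2-10 V actuator
rules = applicable_rules(has_mixed_air_sensor=False)
```

With these assumed inputs, a 6 V signal on a 2-10 V actuator normalizes to 0.5, and 19 of the 28 rules remain when the mixed air temperature sensor is absent.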
2.4. Threshold selection
In addition to the sensors, control signals, and setpoint information, there are other parameters that must be specified for APAR. For instance, estimates of the temperature rise across the supply fan (and return fan, if one exists) must be provided; a reasonable default is 1.1 °C. A model-based value correlated to the airflow rate or the control signal to the fan could be used as the basis for this estimate; however, some amount of training data would likely be necessary to establish the correlation. Thresholds used in evaluation of rules such as $\varepsilon_t$ in Rule 10 must also be specified. A fault threshold expresses the severity of a fault required to trigger an alarm and is necessary because of uncertainty in the data and operating conditions. If a threshold is too great, the associated fault(s) must be relatively severe to be detected. If, on the other hand, a threshold is too small, normal variation in operating conditions may result in false alarms. These threshold values were determined heuristically for this study.
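The trade-off between tight and loose thresholds can be made concrete with a small Python sketch. Everything here is illustrative: the `AparThresholds` container and `count_rule_10_alarms` helper are hypothetical names, the residual values are invented for the example, and applying the 1.1 °C default to the return fan as well as the supply fan is an assumption. The other defaults are the values quoted in the examples elsewhere in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AparThresholds:
    """Hypothetical parameter set; values follow the paper's examples."""
    delta_t_sf: float = 1.1   # supply fan temperature rise, deg C
    delta_t_rf: float = 1.1   # return fan rise, deg C (assumed same default)
    eps_t: float = 1.7        # temperature threshold, deg C
    eps_f: float = 0.30       # outdoor air fraction threshold
    delta_t_min: float = 5.6  # min |T_ra - T_oa| for fraction rules, deg C
    eps_hc: float = 0.01      # heating coil valve signal threshold
    eps_cc: float = 0.01      # cooling coil valve signal threshold
    eps_d: float = 0.01       # mixing box damper signal threshold


def count_rule_10_alarms(residuals, eps_t):
    """Count samples whose |T_oa - T_ma| residual exceeds eps_t."""
    return sum(1 for r in residuals if abs(r) > eps_t)


# Invented residuals (deg C): three look like noise, one like a real offset.
residuals = [0.3, -0.5, 0.8, 2.5]
tight = count_rule_10_alarms(residuals, eps_t=0.5)
loose = count_rule_10_alarms(residuals, AparThresholds().eps_t)
```

With the tighter 0.5 °C threshold, one of the noise-like samples also raises an alarm; with the 1.7 °C value, only the large offset does.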
APAR uses existing sensor points in the control system to perform the fault detection calculations. It has been demonstrated that the typical industrial grade sensors that are already installed for control purposes have sufficient accuracy [8]. Laboratory grade instruments are not required. Higher quality sensors that have been installed and calibrated properly will allow the use of tighter thresholds (less severe faults detected) than lower quality sensors, or those that have been poorly calibrated or installed.
3. Results and discussion
3.1. Emulation study
A hardware-in-the-loop emulation environment that combines simulations of a building and its HVAC system with actual commercial HVAC equipment controllers was used in order to conduct tests under a wide variety of controlled conditions. Emulation provides a test environment that is closer to a real building because it uses real building controllers but, like simulation, it also provides controlled and reproducible conditions. Details of the emulator design and operation are documented by Bushby et al. [9]. A variety of sensor, actuator, and control logic faults, along with fault free conditions, were imposed:
3.1.1. Temperature sensor drift
A temperature sensor drift was introduced as a sensor offset for a range of 0.0 to 4.0 °C, increased linearly over the emulation period.
3.2. Example: mixed air temperature sensor fault
A mixed air temperature sensor fault was introduced as a sensor offset beginning at 0 °C and increased linearly over the emulation period to +4 °C. The sensor drift was positive, meaning that the measured mixed air temperature was greater than the actual mixed air temperature. Fig. 3 shows AHU data from the occupied portion of 1 day during the emulation of this fault, which was conducted using cooling season (July) weather data. As is typical for cooling season, the outdoor air temperature and humidity are too high for economizer operation, so the mixing box dampers are positioned to bring in the minimum amount of outdoor air needed for ventilation. The AHU controller modulates the cooling coil valve to maintain the supply air temperature at its setpoint, while the heating coil valve remains closed. Based on this combination of control signals, APAR determines the system to be operating in mode 4 (mechanical cooling with minimum outdoor air) and evaluates the
applicable rule set. The mass and energy balance on the mixing box subsystem of the AHU yields the relationship that the mixed air temperature should be between the return and outdoor air temperatures. Two of the rules for the mixing box subsystem that apply to mode 4 are Rules 26 and 27. Rule 26 states that if the mixed air temperature is less than the minimum of the return and outdoor air temperatures then a fault has been detected, while Rule 27 will generate a fault report if the mixed air temperature is greater than the maximum of the return and outdoor air temperatures. Both of these rules are subject to a threshold; in this example, a value of 1.7 °C was used. Of course, the actual mixed air temperature is always between the return air and outdoor air temperatures, since it is the result of blending the outdoor and return air streams. Fig. 3 shows that, due to the sensor drift, the measured mixed air temperature is less than the return air temperature (the minimum of the return and outdoor air temperatures) by approximately 3 °C. Rule 26 is satisfied, indicating that this fault has been successfully detected.
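As a minimal sketch of the two mixing box rules just described, the Python below evaluates Rules 26 and 27 for a single sample. The function names and the specific temperatures are assumptions chosen to mirror the situation in the example (a measured mixed air temperature about 3 °C below the return air temperature, with a 1.7 °C threshold); they are not values read from Fig. 3.

```python
def rule_26(t_ma, t_ra, t_oa, eps_t=1.7):
    """True (fault) if T_ma is below min(T_ra, T_oa) by more than eps_t."""
    return t_ma < min(t_ra, t_oa) - eps_t


def rule_27(t_ma, t_ra, t_oa, eps_t=1.7):
    """True (fault) if T_ma is above max(T_ra, T_oa) by more than eps_t."""
    return t_ma > max(t_ra, t_oa) + eps_t


# Hypothetical cooling-season sample, temperatures in deg C.
t_ra, t_oa = 24.0, 32.0
t_ma_measured = 21.0  # about 3 deg C below the return air temperature

low_fault = rule_26(t_ma_measured, t_ra, t_oa)
high_fault = rule_27(t_ma_measured, t_ra, t_oa)
```

Here Rule 26 is satisfied because 21.0 °C is below 24.0 − 1.7 = 22.3 °C, while Rule 27 is not. A measured value anywhere between 22.3 °C and 33.7 °C would satisfy neither rule.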
Fig. 4 shows AHU data from the occupied portion of 1 day during the emulation of this fault, which was conducted using cooling season (July) weather data. As in the previous example, the outdoor air temperature and humidity are too high for economizer operation, so the mixing box dampers are commanded to bring in the minimum amount of outdoor air needed for ventilation. The outdoor air and exhaust air dampers operate normally, according to the damper control signal from the AHU controller. However, within the emulation, the recirculation air damper is set to the fully closed position, corresponding to 100% outdoor air. The qualitative effect of the stuck recirculation damper can be seen by comparing the mixed air temperature to the return and outdoor air temperatures. If the dampers were positioned correctly, the mixed air temperature should be very close to the return air temperature, but it is actually much closer to the outdoor air temperature, due to the excessive amount of outdoor air being drawn into the AHU. The AHU controller modulates the cooling coil valve to maintain the supply air temperature at its setpoint, while the heating coil valve remains closed. Based on this combination of control signals, APAR determines the system to be operating in mode 4 (mechanical cooling with minimum outdoor air) and evaluates the applicable rule set. One of the rules in the set for mode 4 is Rule 18, which calculates the outdoor air fraction by dividing the difference between the mixed and return air temperatures by the difference between the outdoor and return air temperatures. If the calculated outdoor air fraction is not equal to the minimum amount of outdoor air needed for ventilation, Rule 18 is satisfied, indicating that this fault has been detected. This rule is subject to a threshold; in this example, a value of 0.30 was used. Since the difference between the return and outdoor air temperatures is in the denominator, the accuracy of the calculated outdoor air fraction will decrease as the difference between the return and outdoor air temperatures decreases. In order to prevent false alarms, it is necessary to first check whether there is a sufficient difference between the return and outdoor air temperatures in order to proceed; in this example, a difference of 5.6 °C is required. Fig. 4 shows that there is a sufficient difference between the return and outdoor air temperatures from approximately 250 min after the beginning of the occupied period until the end of the occupied period. Fig. 4 also shows that the calculated outdoor air fraction is approximately 0.7, which differs from the minimum outdoor air fraction (in this example, 0.15) by 0.55, which is greater than the threshold of 0.30. Rule 18 is satisfied, indicating that this fault has been successfully detected.
4. Field study
A field study was conducted in which AHU data was collected from several field sites [11]. The sites included an office building and a restaurant, as well as community college and university campuses, featuring constant- and variable-air-volume systems. Several examples of faults which were detected are presented here.
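Returning to the stuck recirculation damper example from the emulation study, the Rule 18 check, including the guard on the return-to-outdoor temperature difference, can be sketched in Python as follows. The function names and the sample temperatures are assumptions; the temperatures were picked so that the calculated outdoor air fraction comes out near the 0.7 quoted in that example, and the 0.15, 0.30, and 5.6 °C values are the ones given there.

```python
def outdoor_air_fraction(t_ma, t_ra, t_oa):
    """Q_oa/Q_sa estimated as (T_ma - T_ra) / (T_oa - T_ra)."""
    return (t_ma - t_ra) / (t_oa - t_ra)


def rule_18(t_ma, t_ra, t_oa, oa_frac_min=0.15, eps_f=0.30, delta_t_min=5.6):
    """Return True (fault), False (no fault), or None (rule not evaluated).

    The rule is skipped when |T_ra - T_oa| is below delta_t_min, because
    the fraction estimate becomes unreliable as the denominator shrinks.
    """
    if abs(t_ra - t_oa) < delta_t_min:
        return None
    frac = outdoor_air_fraction(t_ma, t_ra, t_oa)
    return abs(frac - oa_frac_min) > eps_f


# Hypothetical temperatures (deg C) giving a fraction of about 0.7.
frac = outdoor_air_fraction(t_ma=31.0, t_ra=24.0, t_oa=34.0)
fault = rule_18(t_ma=31.0, t_ra=24.0, t_oa=34.0)
skipped = rule_18(t_ma=25.0, t_ra=24.0, t_oa=27.0)
```

Returning `None` when the temperature difference is too small keeps "rule not evaluated" distinct from "rule evaluated and no fault found", which is one possible way to represent the precondition in the rule table.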
[Fig. 4: plot not recoverable. Surviving caption fragment: "... damper stuck closed."; time axis measured from the start of occupancy.]
4.2. Example: AHU supply air temperature setpoint fault
Fig. 6 shows a plot of selected temperature and control signal data from one of the AHUs examined in the study. Based on the control signals (the heating coil valve operation is not shown, but remains fully closed; also the mixing box damper operation is not shown, but remains positioned for minimum ventilation), APAR determines that the AHU is operating in mode 4 (mechanical cooling with minimum outdoor air), then applies the rules for this mode. One of the rules for mode 4 is Rule 19, which states that if the average cooling coil valve control signal is fully open (within a threshold, in this example, 1%) and the difference between the supply air temperature and the supply air temperature setpoint is greater than another threshold, in this example 1.7 °C, the cooling coil valve is saturated and a persistent supply air temperature error exists. Fig. 6 shows that the supply air temperature varies from 9 to 13 °C while the supply air temperature setpoint is fixed at 7 °C, an unreasonably low value for this application. Clearly, the supply air temperature error is greater than 1.7 °C; therefore, APAR has detected a fault. Facility personnel confirmed that the fault was the result of inappropriate operator intervention.
4.3. Example: AHU outdoor air temperature sensor fault
Fig. 7 shows a plot of temperatures and control signals from one of the AHUs examined in the study. From 220 min until 700 min after the beginning of the occupied period (approximately an 8 h time span), the mixing box dampers are positioned for 100% outdoor air. The cooling coil valve (not shown) ranges from fully closed to fully open and the heating coil valve (also not shown) remains fully closed. Based on this combination of control signals, APAR places the AHU in mode 3 (mechanical cooling with 100% outdoor air) and evaluates the rules for mode 3. One of these is Rule 10, which states that if the outdoor air and mixed air temperatures differ by more than a threshold value, in this example 1.7 °C, a fault has been detected, since the outdoor air and mixed air temperatures should be the same when the mixing box dampers are positioned for 100% outdoor air. During this time, the mixed air temperature remains approximately 3 °C greater than the outdoor air temperature, satisfying Rule 10. A follow up with facility personnel confirmed the fault and revealed that it was caused by the location of the outdoor air temperature sensor on the roof of the building, remote from the AHU outdoor air plenum.
4.4. Example: AHU simultaneous heating and cooling fault
Fig. 8 shows a plot of the control signals from one of the AHUs examined in the study. During the first 1140 min (19 h) of the day, the heating coil valve varies between 5% and 35% open. Over the same time period, the mixing box dampers vary between 25% and 35% open. This AHU has two separate outdoor air dampers: one allows the minimum amount of outdoor air for ventilation, while the other is for cooling with outdoor air. The mixing box damper position shown in Fig. 8 is from the damper for cooling with outdoor air. The cooling coil remains closed throughout the 19 h time period. This combination of control signals is inconsistent with any known mode of operation, so this period of operation is classified as mode 5 (unknown mode of operation) and APAR evaluates the rules associated with mode 5. Rule 23 states that if the average heating coil valve position is greater than a threshold, in this example 1%, and the average mixing box damper position is greater than another threshold, also 1% in this example, then a fault has been detected, since the AHU is simultaneously heating and cooling/economizing. The cause of this fault was a combination of a sequencing logic error and a temperature sensor error related to the specific control strategy implemented in this AHU, in which the cooling coil valve, heating coil valve, and mixing box damper are controlled by independent PID loops, each with independent temperature sensors and setpoints.
Fig. 8. Field study AHU simultaneous heating and cooling fault. [Plot not recoverable; horizontal axis: Time (min).]
5. Conclusions
The increasing performance demands on building automation and control systems, combined with the growing complexity of these systems, have created a need for automated FDD tools. Tools that can be embedded directly in the equipment controllers offer significant advantages over approaches that depend on collecting and analyzing large amounts of trend data. APAR consists of a set of expert rules, derived from mass and energy balances. Control signals are used to determine the AHU's mode of operation, which identifies the subset of the rules to be evaluated. APAR was tested in an emulation study and a field study. Consistent results detecting a variety of common mechanical and control faults show that APAR is effective at detecting these faults and is suitable for embedding in commercial HVAC equipment controllers.
This work was supported in part by the California Energy Commission Public Interest Energy Research (PIER) Program and the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy. In addition, this project would not have been possible without the assistance of many individuals. Thanks are due to Cheol Park and Michael Galler of the National Institute of Standards and Technology, Xiaohui Zhou of the Iowa Energy Center, Mark Levi and Stephen May of the U.S. General Services Administration Region IX, Louis Coughenour of Enovity, Jim Butler, Olav P. Hegland, and Robert Veelenturf of Cimetrics, Mike Macklin of the Des Moines Area Community College, and Joel Bender, H. Michael Newman, and Elaine Stanton-Hicks of Cornell University.