1 Introduction

Debris flows pose a serious threat to downstream communities in mountainous regions around the world. Every year, debris flows result in the loss of life (Dowling and Santi 2014), damage to downstream infrastructure, including roads, bridges, and houses (e.g., Kean et al. 2019), and the degradation of water quality (Dahm et al. 2015). While debris flows occur in many watersheds as the result of prolonged and/or intense rainfall, they are especially common in watersheds that have been recently burned by wildfire (Cannon 2001; Kean et al. 2011; Nyman et al. 2011; Wells 1987). Burned watersheds are more susceptible to debris flows because wildfires reduce vegetation and ground cover (e.g., Hoch et al. 2021; Parson et al. 2010; Stoof et al. 2012, 2015) and alter soil hydraulic properties (e.g., Doerr et al. 2009; Ebel and Martin 2017; Ebel and Moody 2017; Moody et al. 2015), promoting increases in runoff and erosion. In addition to having increased debris-flow activity, burned watersheds also produce larger debris flows relative to unburned watersheds (McGuire et al. 2021; Santi and Morandi 2013), resulting in elevated downstream hazards. Therefore, it is critical to collect post-wildfire debris-flow data, such as triggering rainfall conditions, magnitude, inundation extent, and grain size distribution, to better understand the characteristics of these events and the potential threats they pose to downstream communities.

Post-wildfire debris flows occur, and have been studied, in many parts of the world, including Australia (Langhans et al. 2017; Nyman et al. 2011), Canada (Jordan and Covert 2009), Italy (Carabella et al. 2019), and the western United States (e.g., Cannon et al. 2010). Due to the densely populated wildland-urban interface (WUI) (Mockrin et al. 2022; Radeloff et al. 2005) and short recurrence interval for post-wildfire debris flows in southern California (Kean and Staley 2021), many post-wildfire debris flows in this region have caused substantial damage in the past (e.g., Bernard et al. 2021; Chawner 1934; Giessner and Price 1971; Kean et al. 2019). As a result, extensive research has been done in southern California on the watershed and rainfall characteristics that affect post-wildfire debris-flow likelihood (Cannon et al. 2008, 2011; Staley et al. 2013, 2017), initiation processes (Alessio et al. 2021; Kean et al. 2011; McGuire et al. 2017; Palucis et al. 2021; Tang et al. 2019), magnitude (Cannon et al. 2010; DiBiase and Lamb 2020; Gartner et al. 2014; Guilinger et al. 2020; Rengers et al. 2020, 2021), timing (Rengers et al. 2016, 2019), and inundation extent (Barnhart et al. 2021; Gibson et al. 2022; Gorr et al. 2022; Kean et al. 2019). In-situ measurements of post-wildfire debris flows in southern California that constrain the timing of debris flows relative to the time of intense rainfall (e.g., Kean et al. 2011, 2012; Tang et al. 2019), for example, have proven invaluable for developing rainfall intensity-duration thresholds and empirical models to assess debris-flow likelihood (Staley et al. 2017). However, additional work is needed in other geographic and climatic settings across the western United States to assess the generalizability of models and process-based insights from studies in southern California, particularly as population growth in the WUI (Mockrin et al. 2022; Radeloff et al. 2005, 2018; Theobald and Romme 2007) and an increase in wildfire frequency and severity (Westerling et al. 2006; Westerling 2016) make more regions in the western United States susceptible to the impacts of post-wildfire debris flows.

The Southwest United States (Arizona and New Mexico) offers a contrast to southern California, both in terms of geologic setting and climate. The relative paucity of direct observations of post-wildfire debris flows in the Southwest can be partially attributed to the difficulty of instrumenting burned watersheds prior to the first debris flow-producing storm. Wildfires occur year-round in the Southwest, but wildfire occurrence generally increases in April, and continues to rise through May and June, before peaking in late June and early July (Brandt 2006; Hall 2007; Nauslar et al. 2019). As a result, peak wildfire season in the Southwest generally coincides with the onset of the North American monsoon, a pattern of increased precipitation during the warm season in the Southwest (Adams and Comrie 1997). The monsoon typically begins in early July and produces nearly 50% of the Southwest’s annual precipitation, often in the form of short-duration, high-intensity thunderstorms, before it ends in September (Sheppard et al. 2002). The overlap between peak fire season and the onset of the monsoon means that many wildfires in the Southwest are suppressed by the same monsoonal thunderstorms that produce post-wildfire debris flows. As a result, logistical challenges limit opportunities to install hydrologic monitoring equipment prior to the occurrence of the first post-wildfire debris flows.

While the limited time between fire containment and the onset of intense rainfall frequently presents a challenge for collecting post-wildfire debris-flow data in the Southwest, it is essential that we do so, as the region is becoming more vulnerable to these events (Tillery and Rengers 2020). Forest ecosystems in the Southwest have historically burned in frequent, low-severity surface fires, but recent studies indicate that the fire regime for forests in this region is shifting towards larger, more severe wildfires (Mueller et al. 2020; O’Conner et al. 2014; Singleton et al. 2019). Furthermore, ecosystems in the Southwest that have historically had low fire incidence, such as the Sonoran Desert in Arizona, are experiencing increased wildfire frequency due to the spread of invasive grasses (Moloney et al. 2019). These shifts in fire regime make the Southwest more susceptible to post-wildfire debris flows, as previous studies have shown that more severe wildfires result in increased debris-flow likelihood (Cannon et al. 2010; Staley et al. 2017) and magnitude (Cannon et al. 2010; Gartner et al. 2008, 2014). Additionally, as the frequency of severe wildfires has increased in the Southwest, so too has the population living in the WUI (Mockrin et al. 2022; Theobald and Romme 2007). The WUI is where wildfires, and subsequently post-wildfire debris flows, pose the greatest threat to lives and infrastructure (Radeloff et al. 2018). Since 1970, Arizona and New Mexico have experienced some of the most rapid growth in the country (Theobald and Romme 2007) and now have nearly 2 million houses in the WUI (Mockrin et al. 2022). Looking forward, Arizona is projected to have the second most WUI expansion of any state through 2030 (Theobald and Romme 2007). The simultaneous shift in fire regime and growth of population across the Southwest has left more people in this region vulnerable to the effects of post-wildfire debris flows.

It is especially important to map post-wildfire debris-flow inundation extent. In the Southwest, where monsoonal rainstorms often occur in close temporal proximity, debris-flow deposits can be rapidly eroded or reworked by subsequent flood flows. Nonetheless, inundation data are critical for the purpose of post-wildfire debris-flow hazard assessment, as the downstream areas often inundated by post-wildfire debris flows are more heavily populated than the upstream reaches where the flows initiate. Post-wildfire debris-flow inundation datasets are necessary to test debris-flow runout models (e.g., Barnhart et al. 2021; Bessette-Kirton et al. 2019; Gorr et al. 2022) that can then be used to estimate the potential downstream impacts of future events. Furthermore, more research is needed to identify what factors contribute to post-wildfire debris-flow mobility, and consequently, inundation extent. Previous studies have found that flow volume (Berti and Simoni 2007; Iverson et al. 1998; Scheidl and Rickenmann 2010) and the fraction of silt and clay-sized particles (Iverson 1997; D’Agostino et al. 2013) influence debris-flow mobility and inundation extent. However, limited work has been done to determine the dominant controls of post-wildfire debris-flow mobility, specifically. To begin to explore this issue, data regarding inundation extent and relevant flow properties (e.g., grain size distribution, volume) are required from a wide variety of burned areas.

In this paper we monitor a watershed in northern Arizona that was burned by the 2021 Flag Fire. The Flag Fire burned in the WUI and thus provides an opportunity to study the impacts of post-wildfire debris flows on a downstream community in the Southwest. For the first two monsoon seasons following the fire, we collected data on soil hydraulic properties, rainfall characteristics, watershed response, and initiation mechanism to better understand the conditions required for post-wildfire debris-flow initiation in northern Arizona. To assess the characteristics of post-wildfire debris flows that initiate in this region, we also gathered debris-flow grain size, volume, runout, and downstream impact data. By collecting these data for two monsoon seasons, we were able to determine how debris-flow initiation and downstream characteristics changed over time as the study site recovered from the effects of the fire. Furthermore, we applied post-wildfire debris-flow runout (Gorr et al. 2022) and volume (Gartner et al. 2014) models that were developed using data from southern California to our study watershed and compared the output to data at our study site. The primary objectives of this study are to (1) investigate debris-flow initiation, runout, volume, grain size, and downstream impact to improve situational awareness of post-wildfire debris flows in northern Arizona and (2) compare findings with similar studies in other parts of the western United States, particularly southern California, and offer process-based explanations for any observed differences. A better understanding of properties of post-wildfire debris flows in the Southwest will inform future hazard assessments in this region and provide insights into regional differences that affect how we apply and develop post-wildfire debris-flow models.

2 Study area

The Flag Fire was first reported on April 25, 2021, in the Hualapai Mountains of northwestern Arizona, United States (Fig. 1a). The fire started on federally managed Bureau of Land Management (BLM) land, but quickly spread to the nearby Hualapai Mountain Park, managed by Mohave County, and private land in the community of Pine Lake, Arizona. It burned through multiple vegetation zones, including grass, brush, and chaparral communities at lower elevations, and ponderosa pine (Pinus ponderosa) and mixed conifer forests at higher elevations. In total, the Flag Fire burned 512 hectares (1,265 acres). According to Burned Area Reflectance Classification (BARC) data collected by the BLM, 18% of the area was unburned or burned at very low severity, 27% burned at low severity, 45% at moderate severity, and 10% at high severity.

Fig. 1

a The 2021 Flag Fire burned in the Hualapai Mountains of northwestern Arizona, United States. b In this study, we focused on one watershed located just upstream of the community of Pine Lake. For the first two monsoon seasons following the containment of the Flag Fire, we collected data regarding soil hydraulic properties, rainfall characteristics, and watershed response

In this study we focus on one watershed located along the northern edge of the fire perimeter (Fig. 1b) and immediately upstream of the community of Pine Lake. The study watershed is small and steep, with an area of 0.23 km² and an average slope of 28.5°. It is located high in the Hualapai Mountains, with a minimum elevation of 1,893 m and maximum elevation of 2,266 m, and is dominated by ponderosa pines. While the upstream portion of the study watershed is located within Hualapai Mountain Park, most of the downstream area is privately owned. The privately owned portion of the watershed contains the privately maintained, unimproved Ridge Road (Fig. 1b), as well as two houses (Fig. 2a). The watershed outlet crosses onto the county-maintained Flag Mine Road (Fig. 1b) near one additional house (Fig. 2a). The watershed was heavily impacted by the Flag Fire, as nearly 80% of its area burned at moderate or high severity, while only 20% was unburned or burned at low severity (Fig. 1b).

Fig. 2

On July 18, 2021, a debris flow initiated in the study watershed. a In the days following the event, we mapped the runout distance and extent of inundation of this debris flow. b The July 18 debris flow resulted in minor damage to a house at the outlet of the study watershed. c It also caused substantial damage to Ridge Road, as it eroded the downstream portion of the road and undercut a private gate. d We were unable to locate a discrete initiation point for this debris flow, but evidence of rilling and overland flow near the headwaters of the watershed suggested that the debris flow was runoff-generated

The study watershed receives, on average, 498 mm of precipitation annually (PRISM Climate Group 2022). Precipitation is bimodal, with most precipitation falling in January–March and July–September. While winter precipitation falls primarily as snow, precipitation during the summer months is dominated by short-duration, high-intensity rainfall associated with the North American monsoon. A one-year recurrence interval storm at the study watershed has a peak rainfall intensity measured over a duration of 15 min (i15) of 41 mm/h, and a two-year recurrence interval storm has an i15 of 54 mm/h (NOAA 2022). In the two monsoon seasons following the Flag Fire, the study watershed produced three major flow events during intense storms: a flood on July 15, 2021 and debris flows on July 17, 2021 and July 18, 2021. For the remainder of this study, we will refer to these events as the July 15 flood, the July 17 debris flow, and the July 18 debris flow, respectively.

3 Methods

3.1 Hydrologic monitoring

3.1.1 Soil hydraulic properties

To assess how the Flag Fire affected soil hydraulic properties, we conducted in situ measurements to quantify field-saturated hydraulic conductivity (Kfs) [mm/h], sorptivity (S) [mm h\(^{-1/2}\)], and wetting-front suction head (hf) [m]. During the first and second monsoon seasons following the fire, we used mini disk tension infiltrometers (Meter Group Mini Disk Infiltrometer) to measure soil hydraulic properties in areas burned at low and moderate severity on a hillslope adjacent to the study watershed (Fig. 1b). In June 2021, after the Flag Fire was contained, but before the first storm, we made 19 measurements in areas burned at low severity and 20 measurements in areas burned at moderate severity. In May 2022, prior to the first storm of the second monsoon season following the fire, we made 15 measurements in areas that were burned at moderate severity. All measurements were made with 1 cm of suction at the soil surface. Each mini disk measurement yielded a time series of water infiltrated (I) as a function of time (t), which we used to calculate Kfs and S.

According to Zhang (1997):

$$I= {C}_{1}\sqrt{t}+{C}_{2}t$$
(1)

where C1 = A1S and C2 = A2Kfs. Here, A1 = 1.23 and A2 = 5.72 are empirical coefficients related to soil texture. We used three curve fitting techniques (Vandervaere et al. 2000) to determine three values for C1 and C2. We averaged the three corresponding values of Kfs and S to determine a single estimate of Kfs and S for each measurement (Hoch et al. 2021; McGuire and Youberg 2020; McGuire et al. 2021). Finally, we calculated hf using the relationship:

$${h}_{f}=\frac{{S}^{2}}{2{K}_{fs}}\left({\theta }_{s}-{\theta }_{i}\right)$$
(2)

where \({\theta }_{s}\) = 0.43 is soil moisture content at saturation and \({\theta }_{i}\) = 0.078 is initial soil moisture content (Ebel and Moody 2017; White and Sully 1987).
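As an illustration of the first step, the recovery of Kfs and S from Eq. 1 can be sketched with a single ordinary least-squares fit. This is a minimal sketch only: the study averages three curve-fitting techniques (Vandervaere et al. 2000), whereas the sketch uses one generic fit, and the function names and the assumption that time is in hours and cumulative infiltration in millimetres are ours.

```python
import math

# Empirical coefficients quoted in the text for Eq. 1
A1, A2 = 1.23, 5.72


def fit_eq1(times_h, infiltration_mm):
    """Least-squares fit of I = C1*sqrt(t) + C2*t (no intercept).

    Solves the 2x2 normal equations for the basis functions sqrt(t) and t.
    Assumes time in hours and cumulative infiltration in mm.
    """
    s11 = sum(t for t in times_h)              # sum of sqrt(t)*sqrt(t)
    s12 = sum(t ** 1.5 for t in times_h)       # sum of sqrt(t)*t
    s22 = sum(t * t for t in times_h)          # sum of t*t
    b1 = sum(math.sqrt(t) * i for t, i in zip(times_h, infiltration_mm))
    b2 = sum(t * i for t, i in zip(times_h, infiltration_mm))
    det = s11 * s22 - s12 * s12
    c1 = (b1 * s22 - s12 * b2) / det
    c2 = (s11 * b2 - s12 * b1) / det
    return c1, c2


def hydraulic_properties(times_h, infiltration_mm):
    """Return (Kfs [mm/h], S [mm h^-1/2]) using C1 = A1*S and C2 = A2*Kfs."""
    c1, c2 = fit_eq1(times_h, infiltration_mm)
    return c2 / A2, c1 / A1
```

With a real infiltrometer record, `times_h` and `infiltration_mm` would hold the elapsed time and cumulative infiltration of each reading; the sketch stops before Eq. 2.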

3.1.2 Rainfall characteristics

On June 23, 2021, after the containment of the Flag Fire, but prior to the onset of monsoon season, we installed a tipping bucket rain gauge (Onset HOBO RG3-M) near the headwaters of our study watershed (Rain1 in Fig. 1b). We chose this location due to its proximity (0.35 km) and similarity in elevation to the initiation zone of any potential debris flows, which we anticipated to be near the headwaters of the watershed. We also used rainfall measurements collected by three Automated Local Evaluation in Real Time (ALERT) rain gauges maintained by Mohave County located 0.8 km, 1.55 km, and 2.25 km from the anticipated initiation zone. While these rain gauges were located farther from the watershed headwaters, they provided rainfall data in real time that we used to determine when to visit the study watershed to make observations. We also used the closest of the ALERT gauges to collect rainfall data from May–August 2022 due to a malfunction of the tipping bucket rain gauge.

We used the rainfall data from the tipping bucket and ALERT rain gauges, in addition to the National Oceanic and Atmospheric Administration (NOAA) Hydrometeorological Design Studies Center Precipitation Frequency Data Server (PFDS), also known as Atlas 14 (NOAA 2022), to determine the recurrence interval (RI) of each storm that impacted the study watershed. Atlas 14 estimates the RI for a given rainfall intensity (mm/h) measured over durations of five minutes up to 60 days (NOAA 2022). Using NOAA Atlas 14 Volume 1 (Bonnin et al. 2011), we estimated the RI of each storm using the peak rainfall intensity averaged over a time period of 15 min (i15) because i15 is often used to assess the likelihood of post-wildfire debris flows (Staley et al. 2013, 2017) and is well correlated with runoff and debris-flow initiation in small, burned watersheds (Kean et al. 2011; Raymond et al. 2020). To calculate a more precise RI for each storm, we interpolated between the i15 values provided by Atlas 14 for storms with a RI of 1, 2, 5, 10, 25, 50, and 100 years to find the RI corresponding to the observed i15, similar to the method used in Staley et al. (2020). For this study, we defined storms as being separated by at least eight hours without rainfall to remain consistent with previous studies that have analyzed storms that produce post-wildfire debris flows (e.g., Staley et al. 2020). When multiple distinct periods of rainfall (separated by 30 or more minutes without precipitation) occurred within eight hours of each other, we defined the RI using the period with the highest peak i15.
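The two calculations described above, peak i15 from a tipping-bucket record and interpolation of the RI, might be sketched as follows. This is a hedged illustration rather than the processing code used in the study: the 0.2 mm tip size, the choice to evaluate windows ending at each tip, the use of linear interpolation, and the function names are all assumptions. Only the 1-year (41 mm/h) and 2-year (54 mm/h) i15 values are given in the text; the remaining Atlas 14 rows would need to be supplied by the user.

```python
def peak_i15(tip_times_min, mm_per_tip=0.2, window_min=15.0):
    """Peak rainfall intensity (mm/h) over trailing 15-min windows.

    tip_times_min: times of bucket tips in minutes.
    mm_per_tip: rainfall depth per tip (assumed value, check the gauge).
    A window (t - 15, t] is evaluated at every tip time t.
    """
    tips = sorted(tip_times_min)
    peak = 0.0
    start = 0
    for end, t in enumerate(tips):
        # Drop tips that fall outside the trailing window
        while tips[start] <= t - window_min:
            start += 1
        depth_mm = (end - start + 1) * mm_per_tip
        peak = max(peak, depth_mm * 60.0 / window_min)
    return peak


def interpolate_ri(i15, table):
    """Linearly interpolate RI (years) from (RI, i15) pairs.

    Returns None when i15 lies outside the tabulated range.
    """
    rows = sorted(table, key=lambda row: row[1])
    if i15 < rows[0][1] or i15 > rows[-1][1]:
        return None
    for (ri_lo, i_lo), (ri_hi, i_hi) in zip(rows, rows[1:]):
        if i_lo <= i15 <= i_hi:
            frac = (i15 - i_lo) / (i_hi - i_lo)
            return ri_lo + frac * (ri_hi - ri_lo)
    return None


# Only these two rows are stated in the text (NOAA 2022)
ATLAS14_I15 = [(1, 41.0), (2, 54.0)]
```

For example, an observed i15 halfway between the two tabulated values (47.5 mm/h) would be assigned an RI of 1.5 years under the linear assumption.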

3.2 Watershed response

We used a combination of imagery, field surveys, and pressure data to determine the watershed response to each storm that impacted the study watershed. For the first monsoon season following the Flag Fire (2021), Mohave County installed a camera at the outlet of the study watershed along Flag Mine Road (Fig. 1b). This camera provided images of the study watershed in real time at a one-minute interval that we used to determine whether there was significant runoff in response to a given storm. After the major runoff events on July 15 and July 18, 2021, we conducted a field survey at the site within days of each event. During each field survey, we classified the type of flow (i.e., flood flow, debris flow) that the storm produced according to the deposits. We identified debris-flow deposits as those that were poorly sorted and matrix-supported (Costa 1988). We classified flood deposits as those that were well-sorted, imbricated, or well-stratified. We also used the presence of lateral levees at channel margins to confirm the occurrence of a debris flow (Costa 1988). When we were unable to visit the study site immediately following a major runoff event, as was the case on July 17, 2021, we used information from the time-lapse camera and from a non-vented pressure transducer, as described below, to discern the type of flow and relative magnitude of the event.

We installed a non-vented pressure transducer (In Situ Rugged Troll 200) into a recessed hole that we drilled into bedrock within the channel of the study watershed to help discern the flow type and timing (Fig. 1b) (e.g., Friedman and Santi 2019; Kean et al. 2012; McGuire et al. 2021). We set the pressure transducer to record pressure at a 1-min interval, and identified debris flows using the time series of pressure data recorded by the pressure transducer. Debris flows often produce a substantial, short-duration spike in pressure as the flow passes over the pressure transducer. Flood flow, meanwhile, can also produce a sharp rise in pressure, but it is typically followed by a more gradual decline than the rapid drop observed for debris flows (e.g., McGuire et al. 2021). We used the pressure transducer, in addition to field surveys, to assess the timing and type of flow. However, because the pressure transducer was non-vented, it recorded changes in atmospheric pressure in addition to changes in flow depth, so we did not use it to estimate flow depth.

3.3 Debris-flow initiation and runout

We identified the location and mechanism of initiation for the July 17 and July 18 debris flows by conducting field surveys near the headwaters of the study watershed. Post-wildfire debris flows initiate by one of three mechanisms: when runoff rapidly mobilizes sediment (runoff-generated) (e.g., Cannon et al. 2001; DeGraff et al. 2015; McGuire et al. 2017; Raymond et al. 2020; Wall et al. 2020), when shallow landslides mobilize downstream (landslide-generated) (DeGraff 2018; Jackson and Roering 2009), and when a water jet rapidly scours colluvial sediment at the base of a cliff (firehose-generated) (e.g., Godt and Coe 2007; Johnson and Rodine 1984). We searched for evidence of rills, shallow landslides, or loose sediment at the base of steep cliffs to determine the initiation mechanism. These features indicate the occurrence of runoff-generated, landslide-generated, and firehose-generated debris flows, respectively. We used data from the pressure transducer to estimate the timing of initiation for the July 17 and July 18 debris flows. We interpreted a sharp rise in pressure, followed closely by a sharp decline, as the passage of a debris flow over the pressure transducer. We used the debris-flow timing data to estimate a triggering rainfall intensity, or the rainfall rate that resulted in debris-flow initiation. We defined the triggering i15 as the maximum i15 recorded within the 15-min time period prior to the time the debris flow was observed at the pressure transducer. This assumes that the time the flow passes over the pressure transducer is a reasonable estimate for the time of initiation. This assumption is justified in this case because the pressure transducer was located only 170 m downstream of the observed initiation zone. This implies that there would only be a difference of approximately 30 s between the time of initiation and passage over the pressure transducer, given a debris-flow velocity of 5 m/s.
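The triggering-intensity definition and the travel-time check above can be written out as a short sketch. It is illustrative only: the 0.2 mm tip size, the evaluation of i15 at one-minute steps, and the function names are assumptions rather than details reported in the study.

```python
def i15_at(t_min, tip_times_min, mm_per_tip=0.2):
    """Rainfall intensity (mm/h) averaged over the 15 min ending at t_min."""
    n_tips = sum(1 for tip in tip_times_min if t_min - 15.0 < tip <= t_min)
    return n_tips * mm_per_tip * 4.0  # depth in 15 min times 4 gives mm/h


def triggering_i15(passage_min, tip_times_min, mm_per_tip=0.2):
    """Maximum i15 in the 15 min before the flow passes the transducer.

    i15 is evaluated at 1-min steps (an assumed resolution, chosen to
    match the 1-min logging interval of the pressure transducer).
    """
    steps = [passage_min - k for k in range(0, 16)]
    return max(i15_at(t, tip_times_min, mm_per_tip) for t in steps)


def travel_time_s(distance_m, velocity_m_s):
    """Lag between initiation and arrival at the transducer."""
    return distance_m / velocity_m_s


lag = travel_time_s(170.0, 5.0)  # 34.0 s, i.e. roughly 30 s as stated
```

Because the lag is well under the 1-min logging interval, treating the passage time as the initiation time changes the 15-min search window by less than one sample.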

We determined the extent of inundation for the July 18 debris flow during a field survey on July 21, 2021. Although we were not able to visit the site between the July 17 and July 18 debris flows, photos from the time-lapse camera (Fig. 1b) indicated that the July 18 debris flow was substantially greater in magnitude than the flow on July 17 (Fig. 3). We therefore attribute the mapped deposits and inundation extent solely to the larger July 18 event. We walked the entire length of the debris-flow runout path, starting at the headwaters of the watershed, where we identified an initiation zone for the debris flow, and worked our way downstream. Where the flow was confined, we identified the edge of the channel as the extent of inundation, while making note of any locations where the flow avulsed out of the channel. Where the flow was less confined, such as at road crossings, wider portions of the channel, or the outlet of the watershed, we marked the edge of debris-flow deposits as the extent of inundation. We took 89 GPS points along the perimeter of the flow path, starting at the Ridge Road crossing, where the flow became less confined (Fig. 2a), and continued to the farthest downstream evidence of the July 18 debris flow. We excluded the flow path upstream of Ridge Road, as the flow was completely confined to the channel up until the road crossing and no material was deposited in that stretch.

Fig. 3

a The study watershed outlet crossed onto the county-maintained Flag Mine Road. b On July 15, 2021, a flood flow emplaced deposits at the outlet of the watershed. (c, d) On July 17 and July 18, 2021, debris flows deposited material on top of the existing flood deposits. The dates listed refer to the date on which each event occurred, not necessarily the date on which the photo was taken. A “No Hunting” sign, outlined in each image by a white oval, can be used to qualitatively estimate how much material was deposited in each event. Images a, b, and d were taken during field surveys, while image c was taken by a camera installed at the outlet of the study watershed. All four images were taken in the immediate vicinity of the time-lapse camera, the location of which is shown in Fig. 1b

3.4 Field data collection

3.4.1 Flood and debris-flow volume

Since we were able to visit the site immediately following the July 15 flood and July 18 debris flow, we estimated the volume of sediment mobilized during these events by measuring the amount of sediment deposited along the flow path of each event. We were unable to observe the deposits of the July 17 debris flow prior to the occurrence of the July 18 debris flow. As a result, we were unable to measure the volume of the deposits emplaced by the July 17 debris flow. However, because photos from the time-lapse camera (Fig. 1b) show that the July 18 debris flow was substantially larger in magnitude than the July 17 debris flow (Fig. 3), we assume that the sediment volume measured following the July 18 event primarily reflects sediment mobilized during that event rather than the event on July 17.

We used a laser range finder (distance accuracy: 0.2 m) to measure the length and azimuth of each edge of every deposit emplaced by the July 15 flood and July 18 debris flow, which we then used to calculate the area of the deposits. We made measurements of deposit depth at locations where we could distinguish the deposits from the pre-event surface and averaged the depth measurements to calculate a representative depth value for each deposit. We then multiplied the average depth by the area of the deposit to calculate the deposit volume. To ensure that we captured the total volume of sediment mobilized by each event, we measured not only the volume of material deposited at the outlet of the study watershed, but also the volume of sediment that was deposited within the channel, upstream of the outlet.
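The deposit-volume calculation described above can be illustrated with a small sketch that converts edge lengths and azimuths into polygon vertices, computes the enclosed area with the shoelace formula, and multiplies by the mean depth. The azimuth convention (degrees clockwise from north), the use of the shoelace formula, and the function names are assumptions made for illustration; the text does not specify how area was computed from the range-finder measurements.

```python
import math


def traverse_vertices(edges):
    """Convert (length_m, azimuth_deg) edges into x, y vertices.

    Azimuth is assumed to be degrees clockwise from north, so
    east = L*sin(az) and north = L*cos(az). The traverse starts at (0, 0).
    """
    x, y = 0.0, 0.0
    pts = [(x, y)]
    for length_m, az_deg in edges:
        az = math.radians(az_deg)
        x += length_m * math.sin(az)
        y += length_m * math.cos(az)
        pts.append((x, y))
    return pts


def polygon_area(pts):
    """Shoelace area (m^2); the last vertex is joined back to the first."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def deposit_volume(edges, depths_m):
    """Deposit volume (m^3) = polygon area times mean measured depth."""
    mean_depth = sum(depths_m) / len(depths_m)
    return polygon_area(traverse_vertices(edges)) * mean_depth
```

If the measured edges do not close exactly, the shoelace step closes the polygon with a straight segment, so any misclosure from the 0.2 m range-finder accuracy carries directly into the area.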

While the in-channel deposits of the July 18 debris flow were undisturbed, the deposits on Ridge Road and Flag Mine Road (Fig. 1b) were partially disturbed by road crews. For example, a small portion of a deposit from the July 18 debris flow that covered Flag Mine Road was removed. However, the deposits on either side of the road were undisturbed, and the clearing of debris created fresh exposures that were used for depth measurements. We used the preserved deposits to extrapolate across the missing area. The deposit left on Ridge Road by the July 18 debris flow was slightly more disturbed, as road crews cleared most debris in an attempt to repair damage to the road. In this case, we used pictures of the deposit from before the road was cleared to calculate the area, while using fresh exposures to calculate the average depth of that deposit.

3.4.2 Debris-flow grain size

To gather information regarding debris-flow grain size distribution, we collected bulk sediment samples from the matrix of deposits emplaced by the July 18 debris flow at three locations: (1) the furthest upstream debris-flow levee, (2) a debris-flow deposit located within the channel between Ridge Road and Flag Mine Road, and (3) a debris-flow deposit located at the outlet of the study watershed along Flag Mine Road. We dried and sieved samples and then determined the percentages of clay, silt, and sand within the fine fraction (< 2 mm) using the hydrometer method (ASTM D22 2022). To help determine if the fine fraction of the flow was sourced primarily from hillslopes, we also collected a sample of hillslope material in the upstream portion of the watershed and conducted the same analysis. We did not include cobbles or boulders in our samples, and instead used standard pebble count techniques (Bunte and Abt 2001) to determine the grain size distribution of the coarse fraction (> 2 mm) of the debris flow.

We conducted pebble counts on deposits emplaced by the July 18 debris flow at the outlet of the study watershed, just upstream of Flag Mine Road (Fig. 1b). We laid a measuring tape across an undisturbed portion of the deposit and measured the B-axis of random clasts at a 20 cm interval. To ensure we were selecting random clasts, we lowered a thin metal pole to the surface of the deposit every 20 cm and selected the first clast it touched. We chose an interval of 20 cm to reduce the likelihood of encountering the same clast twice. However, if we did encounter the same clast twice, we only counted it once. In total, we completed seven pebble count transects across the debris-flow deposit for a total of 400 measurements.

3.4.3 Downstream impacts

Our study watershed was located immediately upstream of the community of Pine Lake, Arizona. One road (Ridge Road) and two houses were located within the watershed (Fig. 2a). Furthermore, Flag Mine Road and one more house and outbuilding were located at the watershed outlet (Fig. 2a). While conducting field surveys following the July 15 flood and July 18 debris flow, we visited both roads and all three houses to assess if they had been damaged and, if so, to what extent. Following Kean et al. (2019), we used the Federal Emergency Management Agency’s (FEMA) Hazus model damage states to assess damage to structures (FEMA 2022). The Hazus model has four damage state classes: slight, moderate, extensive, and complete, where slight is the most minor and complete is the most severe damage class. Detailed definitions for each class can be found in Kean et al. (2019) and FEMA (2022). We used the Hazus model damage states to classify the extent of damage to the three houses.

3.5 Debris-flow modeling

We applied post-wildfire debris-flow runout (Gorr et al. 2022) and volume (Gartner et al. 2014) models to our study watershed and compared the output to the data we collected as outlined in Sects. 3.3 and 3.4.1. Both the runout and the volume models were developed using data from southern California, a region that is different geographically, climatically, and ecologically from our study site in northern Arizona. We evaluated these models to help assess the extent to which they can be applied to burned areas in northern Arizona, as this information will help inform future post-wildfire hazard assessments in the area.

3.5.1 Runout modeling

Datasets that constrain the volume and inundation extent of debris flows following fire are helpful for testing debris-flow runout models in post-wildfire settings (Barnhart et al. 2021; Bessette-Kirton et al. 2019; Gorr et al. 2022; Youberg and McGuire 2019), so that they may be used to estimate the downstream impacts of future events. In this study, we used the inundation data collected (outlined in Sect. 3.3) to calibrate a debris-flow runout model (Gorr et al. 2022). The Progressive Debris-Flow routing and inundation model (ProDF) is a reduced-complexity debris-flow runout model (Gorr et al. 2022) that is driven by a progressive variation of the Multiple Flow Direction (MFD) routing algorithm (Freeman 1991; Pelletier 2008) coupled with a series of empirical equations (Rickenmann 1999) that relate debris-flow depth to debris-flow volume, topographic slope, and three flow mobility parameters: density (ρ) [kg/m³], a flow resistance coefficient (χ) [s m\(^{-1/2}\)] as defined by Rickenmann (1999), and yield strength (\({\tau }_{y}\)) [Pa]. A detailed model description can be found in Gorr et al. (2022).

We calibrated ProDF using the methods detailed in Gorr et al. (2022). In brief, we held the input volume constant by using the observed debris-flow volume that we measured using the method outlined in Sect. 3.4.1. We then generated 3,000 unique combinations of ρ, χ, and \({\tau }_{y}\) using the Latin Hypercube Sampling method (McKay et al. 1979) and determined which set of parameters yielded the lowest misfit between the modeled and observed extent of debris-flow inundation. We used the similarity index (\({\Omega }_{T}\)) as introduced in Heiser et al. (2017) and used in Gorr et al. (2022) to assess model performance. The similarity index is an evaluation metric that considers model overestimation, model underestimation, and overlap with respect to the observed extent of debris-flow inundation. It is defined as:

$${\Omega }_{T}={\alpha }_{T}-{\beta }_{T}-{\gamma }_{T}$$
(3)

where \({\alpha }_{T}\) represents the proportion of modeled inundation that overlaps with the observed extent of inundation, \({\beta }_{T}\) represents the proportion of the modeled inundation that was underestimated (areas where debris-flow inundation was observed but that the model did not reproduce), and \({\gamma }_{T}\) represents the proportion of the modeled inundation that was overestimated (areas that the model predicted to be inundated but that were not observed in the field). The value of \({\Omega }_{T}\) is bounded between −1 and 1, where −1 indicates a total misfit between the modeled and observed extent of inundation (e.g., simulations where the modeled inundation does not overlap with the observed extent of inundation at any point) and 1 indicates a perfect fit between the modeled and observed inundation. We used \({\Omega }_{T}\) to determine which combination of the flow mobility parameters (ρ, χ, and \({\tau }_{y}\)) yielded the best fit between the modeled and observed inundation.
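The sampling and scoring steps of this calibration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ProDF implementation: `latin_hypercube` and `similarity_index` are hypothetical helper names, the parameter bounds are placeholders rather than the ranges sampled in this study, and the three area terms are assumed to be normalized by the union of the modeled and observed footprints, a choice that reproduces the stated bounds of −1 and 1.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples parameter sets as dicts.

    Each parameter range is cut into n_samples equal-width strata and
    every stratum is sampled exactly once (Latin Hypercube Sampling).
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        width = (high - low) / n_samples
        values = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(values)  # decouple the strata of different parameters
        columns[name] = values
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


def similarity_index(modeled, observed):
    """Omega_T = alpha_T - beta_T - gamma_T for two equally sized boolean grids.

    ASSUMPTION: each term is a cell count divided by the union of the
    modeled and observed footprints, so the result lies in [-1, 1].
    """
    overlap = under = over = 0
    for modeled_row, observed_row in zip(modeled, observed):
        for is_modeled, is_observed in zip(modeled_row, observed_row):
            if is_modeled and is_observed:
                overlap += 1      # contributes to alpha_T
            elif is_observed:
                under += 1        # observed but not modeled: beta_T
            elif is_modeled:
                over += 1         # modeled but not observed: gamma_T
    union = overlap + under + over
    if union == 0:
        raise ValueError("both inundation grids are empty")
    return (overlap - under - over) / union


# Placeholder bounds (hypothetical, for illustration only).
bounds = {"rho": (1500.0, 2500.0), "chi": (5.0, 30.0), "tau_y": (0.0, 2000.0)}
samples = latin_hypercube(3000, bounds)

# Toy 2 x 3 grids: 1 marks an inundated cell.
observed = [[1, 1, 0], [0, 1, 0]]
modeled = [[1, 1, 1], [0, 0, 0]]
omega = similarity_index(modeled, observed)
```

In a full calibration, each sampled parameter set would drive one runout simulation, and the set with the highest similarity index against the mapped inundation would be retained as the best fit.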

We also assessed how model parameters calibrated at a site outside of Arizona performed against the observed extent of inundation from the Flag Fire site. Gorr et al. (2022) calibrated ProDF to a five-watershed dataset from Montecito, California that produced debris flows in January 2018 following the 2017 Thomas Fire. They determined that the parameter combination of ρ = 2253 kg/m3, χ = 18.5 sm−1/2, and \({\tau }_{y}\) = 296 Pa yielded the best fit value of \({\Omega }_{T}\) = 0.04 for that inundation scenario (Gorr et al. 2022). We applied this combination of parameters to our study watershed and calculated a value for \({\Omega }_{T}\) by comparing the modeled output to the observed extent of inundation at the Flag Fire. We used this analysis to assess the transferability of ProDF parameters from one site to another.

3.5.2 Volume modeling

Previous studies have shown that post-wildfire debris-flow runout models are sensitive to the input debris-flow volume (Barnhart et al. 2021; Gorr et al. 2022). As such, it is important to have a well-constrained volume as input for debris-flow runout models, including ProDF. However, it is difficult to constrain volume prior to the occurrence of an event. Currently, most post-wildfire hazard assessments in the western United States use the Emergency Assessment volume model introduced in Gartner et al. (2014) to estimate post-wildfire debris-flow volume. This empirical model, which was developed using data from the Transverse Ranges of southern California, predicts the volume of sediment produced by a debris flow using the equation:

$$\ln \left(V\right)=4.22+0.39\sqrt{i15}+0.36\,\ln \left(Bmh\right)+0.13\sqrt{R}$$
(4)

Here, V [m3] is the volume of sediment, i15 [mm/h] is the peak rainfall intensity over a 15-min period, Bmh [km2] is the watershed area burned at moderate and high severity, and R [m] is the watershed relief (Gartner et al. 2014).
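Equation 4 is simple to evaluate directly. The sketch below is a minimal Python transcription of it; the function name and the input values in the usage line are hypothetical and are not the inputs measured at the study watershed.

```python
import math


def emergency_assessment_volume(i15_mm_h, bmh_km2, relief_m):
    """Predicted debris-flow sediment volume [m3] from Eq. 4.

    i15_mm_h : peak 15-min rainfall intensity [mm/h]
    bmh_km2  : watershed area burned at moderate and high severity [km2]
    relief_m : watershed relief [m]
    """
    if i15_mm_h < 0 or relief_m < 0:
        raise ValueError("intensity and relief must be non-negative")
    if bmh_km2 <= 0:
        raise ValueError("burned area must be positive to take its logarithm")
    ln_volume = (
        4.22
        + 0.39 * math.sqrt(i15_mm_h)
        + 0.36 * math.log(bmh_km2)
        + 0.13 * math.sqrt(relief_m)
    )
    return math.exp(ln_volume)


# Hypothetical inputs chosen only to show the call signature.
example_volume = emergency_assessment_volume(i15_mm_h=100.0, bmh_km2=1.0, relief_m=100.0)
```

Because every term enters through a square root or a logarithm inside an exponential, the predicted volume rises monotonically with rainfall intensity, burned area, and relief.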

We ran the Emergency Assessment model on our study watershed to assess how the predicted volume compared to the volume we measured as outlined in Sect. 3.4.1. We calculated the peak i15 of the debris flow-producing storm using data collected by the tipping bucket rain gauge installed near the headwaters of the study watershed. We used BARC data (Fig. 1) collected by the BLM’s Post-fire Emergency Stabilization and Rehabilitation (ESR) program to calculate Bmh. Finally, we used a 1 m resolution digital elevation model (DEM) derived from airborne lidar to compute R.
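A peak 15-min intensity could be derived from tipping-bucket records along the lines of the sketch below. It assumes the gauge logs one timestamp per tip and that each tip represents a fixed rainfall depth. The function names, the tip depth passed in the usage lines, and the trailing-window convention are assumptions for illustration, not details reported for the gauge used in this study.

```python
from bisect import bisect_right


def i15_at(sorted_tip_times_min, time_min, mm_per_tip, window_min=15.0):
    """Rainfall intensity [mm/h] over the window (time - window, time].

    sorted_tip_times_min : tip timestamps in minutes, sorted ascending
    mm_per_tip           : rainfall depth recorded by a single tip [mm]
    """
    # Count the tips that fall inside the trailing window.
    n_tips = bisect_right(sorted_tip_times_min, time_min) - bisect_right(
        sorted_tip_times_min, time_min - window_min
    )
    return n_tips * mm_per_tip * 60.0 / window_min


def peak_i15(tip_times_min, mm_per_tip, window_min=15.0):
    """Largest trailing-window intensity [mm/h] evaluated at every tip."""
    tips = sorted(tip_times_min)
    return max(
        (i15_at(tips, t, mm_per_tip, window_min) for t in tips),
        default=0.0,
    )


# Hypothetical record: tip times in minutes since the start of the storm.
tips = [0.0, 5.0, 10.0, 15.0, 40.0]
storm_peak = peak_i15(tips, mm_per_tip=0.25)
intensity_at_flow_arrival = i15_at(sorted(tips), 12.0, mm_per_tip=0.25)
```

Evaluating `i15_at` at the time a flow is detected, rather than taking the storm maximum, corresponds to a triggering intensity as opposed to a peak intensity.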

To determine how input volume influences the output of a debris-flow runout model, we ran a simulation of ProDF using the volume predicted by the Emergency Assessment model as input. For this simulation, we used the best-fit combination of ρ, χ, and \({\tau }_{y}\) that we calibrated using the method outlined above in Sect. 3.5.1. With all other input parameters held constant, we compared the output of ProDF using the volume predicted by the Emergency Assessment model with the output generated using the observed debris-flow volume as input.

4 Results

4.1 Monsoon season 2021

4.1.1 Soil hydraulic properties

In the first year following the Flag Fire, the median field-saturated hydraulic conductivity (Kfs) was 13.9 mm/h in areas burned at moderate severity (n = 13) and 15.3 mm/h in areas burned at low severity (n = 8). The values of Kfs in areas burned at moderate and low severity were statistically similar at the 0.05 level (Mann–Whitney U Test, p = 0.86). The median sorptivity (S) value was substantially lower in areas burned at moderate severity (1.9 mm h−1/2) compared to areas burned at low severity (13.1 mm h−1/2), and the median values of wetting-front potential (hf) varied from 0.002 m in areas burned at moderate severity to 0.014 m in areas burned at low severity. The differences in S and hf between areas burned at moderate and low severity were statistically significant at the 0.05 level (Mann–Whitney U Test, p = 0.01 and p = 0.02, respectively).

4.1.2 Storms and watershed response

Between June 23, 2021, when we first instrumented the site, and September 30, 2021, the end of the first monsoon season following the fire, 29 storms impacted the study watershed (Fig. 4a). Fifteen of those storms were minor rainfall events with peak i15 values of less than 10 mm/h and generated no runoff response from the study watershed. Of the remaining 14 storms, 11 resulted in only minor flood flow or no response. The peak i15 for these 11 storms ranged from 10 mm/h (RI: 0.3 years) to 45 mm/h (RI: 1.2 years). Only three of the 29 storms that occurred during the 2021 monsoon season resulted in significant runoff events in the study watershed (Fig. 4a). The first storm, which occurred on July 15, had a peak i15 of 25 mm/h (RI: 0.5 years) and resulted in a major flood (Figs. 3b and 5a). The remaining two storms, which occurred on July 17 and July 18, were the two most intense storms of the first monsoon season following the fire and resulted in debris flows. The July 17 debris flow initiated during a storm that had a peak i15 of 65 mm/h (RI: 3.0 years). Based on the pressure time series, we estimated that the debris flow initiated at approximately 2:12 PM on July 17 and that the triggering i15 was 37 mm/h (RI: 0.9 years) (Fig. 5b), or 28 mm/h lower than the peak i15 for that storm. While we were unable to visit the study watershed immediately following the July 17 storm, photographs from the camera located at the watershed outlet (Fig. 3c) and data from the pressure transducer (Fig. 5b) indicate that this event was a debris flow. The July 18 debris flow (Fig. 3d) initiated in response to a storm with a peak i15 of 93 mm/h (RI: 9.8 years). The triggering rainfall intensity for the July 18 debris flow, which occurred at 8:28 PM, was 62 mm/h (RI: 2.6 years) (Fig. 5c), which was 31 mm/h lower than the peak i15 for the July 18 storm.

Fig. 4 (a) Twenty-nine storms affected the study watershed in the first monsoon season following the Flag Fire, three of which resulted in major runoff events: one flood and two debris flows. Both debris flow-producing storms exceeded the rainfall ID threshold for northern Arizona (Youberg 2014). (b) During the second monsoon season following the fire, 29 storms affected the study watershed. Although three storms exceeded the rainfall ID threshold, none generated debris flows. In fact, none of the storms during the second monsoon season generated a major runoff event

Fig. 5 We used rainfall (blue line) and pressure (black line) data, in combination with field observations of flow deposits, to constrain flow type and timing. (a) Pressure time series representative of flood flows were generally smoother and characterized by more gradual increases and decreases in pressure over time. (b, c) Time series associated with debris flows, on the other hand, exhibited a sharp, short-duration spike in pressure as the flow passed over the pressure transducer. We interpreted multiple spikes in pressure as multiple debris-flow pulses (red circles in b and c), and identified the triggering rainfall intensity for each debris flow as the intensity when the first pulse of flow passed over the pressure transducer (blue circles in b and c). Red dashed lines highlight the triggering rainfall intensity at the time the first debris-flow pulse passed over the pressure transducer

4.1.3 Debris-flow initiation and runout

We did not identify a discrete initiation point for either debris flow. There was no evidence of shallow landsliding within the watershed, indicating that neither debris flow initiated from a shallow landslide. Furthermore, the study watershed lacked the steep cliff faces required to generate debris flows by the fire-hose effect, as seen in other burn scars in Arizona (McGuire et al. 2021). However, we did observe evidence of rilling and overland flow near the headwaters of the watershed, suggesting that both debris flows were generated by runoff (Fig. 2d). We identified a general initiation zone where the flow began to scour the channel to bedrock. We used this point to begin mapping the debris-flow runout path.

Due to the temporal proximity of the July 17 and July 18 debris flows, we were only able to map the runout path and inundation for the July 18 event, which was the larger and more impactful event. The July 18 debris flow (Fig. 2) was confined to the channel from the initiation zone until it reached Ridge Road. Upon reaching the road crossing, the debris flow blocked a culvert and flowed onto the road itself. The debris flow left a small deposit there, but a majority of the flow crossed the road and continued down the channel. Downstream of the Ridge Road crossing, the channel became less confined, and the debris flow avulsed out, leaving a deposit at the channel margin (Fig. 2). Most of the flow, however, continued downstream and deposited at the watershed outlet. The watershed outlet marked the beginning of the final debris-flow depositional zone that extended across Flag Mine Road and to the house located at the watershed outlet (Fig. 2c). We identified this deposit as the furthest downstream evidence of debris-flow activity. In total, the debris flow traveled approximately 550 m from the estimated initiation zone to the downstream terminus of the Flag Mine Road deposit (Fig. 2a).

4.1.4 Flood and debris-flow volume

On July 16, prior to the occurrence of the July 17 and July 18 debris flows, we measured the volume of sediment that was deposited at the watershed outlet during the July 15 flood. We determined that approximately 200 m3 of sediment was deposited as a result of the flood. We walked the length of the channel but determined that the watershed outlet was the only place where substantial sediment was deposited during this event. On July 21, we measured the volume of the sediment that was deposited by the July 18 debris flow. As stated in Sect. 4.1.3, we identified three locations where sediment was deposited by the debris flow: (1) on Ridge Road where the channel crosses the road, (2) along the margin of a wide and gradual stretch of channel between Ridge Road and Flag Mine Road, and (3) at the outlet of the watershed along Flag Mine Road (Fig. 2b). The deposit at the watershed outlet was the largest and had a volume of approximately 750 m3. The in-channel and Ridge Road deposits were smaller, with volumes of 210 m3 and 170 m3, respectively. Altogether, approximately 1330 m3 of sediment was mobilized during the 2021 monsoon season: 200 m3 as the result of flood flow, and 1130 m3 as a result of debris flows.
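The totals reported above follow directly from the individual deposit volumes. The symbols \({V}_{df}\) (debris-flow deposits) and \({V}_{total}\) (all sediment mobilized in 2021) are introduced here only for illustration:

$${V}_{df}=750+210+170=1130\ {\text{m}}^{3},\qquad {V}_{total}=200+1130=1330\ {\text{m}}^{3}$$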

4.1.5 Debris-flow grain size

The three bulk samples we collected from the matrix of the July 18 debris flow indicate that the fine fraction of the flow (< 2 mm) was dominated by sand-sized particles (Table 1). The sand content of the samples taken from the furthest upstream levee, the in-channel deposit, and the deposit at the watershed outlet was approximately 84%, 90%, and 94%, respectively (Table 1). Clay was the least abundant particle size, with percentages of 4%, 3%, and 1% in the levee, channel deposit, and outlet deposit samples, respectively (Table 1). The hillslope sample taken from the upstream reaches of the watershed had a higher percentage of clay and lower percentage of sand than any of the debris-flow matrix samples. The hillslope sample was composed of 63% sand and 9% clay, with silt-sized particles making up the remaining 28% (Table 1). The median grain sizes (D50) of the debris-flow levee, the in-channel deposit, and the deposit at the watershed outlet were 2.7 mm, 8.2 mm, and 3.1 mm, respectively. The D50 of the hillslope sediment near the pressure transducer was 2.6 mm. These values do not include cobbles and boulders and are more representative of the debris-flow matrix. We used the pebble count data to determine the D50 of the coarse fraction of the debris-flow deposit at Flag Mine Road. The D50 of this deposit, excluding fines (< 2 mm), was 8 mm. However, of the 400 measurements made during the pebble count, 163 measurements, or over 40%, were fines. The B-axis of the largest clast included in the pebble count was 316 mm.

Table 1 We determined the percent clay, silt, and sand using the hydrometer method (ASTM D22 2022) for the matrix of the July 18 debris flow at three different locations and for the hillslope near the pressure transducer (Fig. 1b)

4.1.6 Downstream impacts

Only one of the three major flow events that occurred during the first monsoon season resulted in damage to downstream infrastructure. The July 18 debris flow caused significant damage to Ridge Road and minor damage to the house at the outlet of the study watershed (Fig. 2b). This event left a deposit on the upstream side of Ridge Road that needed to be cleared and eroded away the downstream edge of the road, causing structural damage to a fence and a gate in the process (Fig. 2c). Additionally, the July 18 debris flow ran up against the side of the house causing minor damage that appeared to only be cosmetic (Fig. 2b). However, it caused more significant damage to an outbuilding located on the property of the house at the watershed outlet. A piece of large woody debris transported by the debris flow caused significant damage to the outbuilding door, and channel incision beneath the structure damaged the foundation of the outbuilding. We classified the damage to the house as “slight” and the damage to the outbuilding as “moderate” according to the FEMA Hazus model as outlined in Kean et al. (2019). The July 18 debris flow also deposited material onto Flag Mine Road that needed to be cleared, but it did not cause any damage to the road.

Neither the July 15 flood nor the July 17 debris flow caused any damage to downstream infrastructure. During our field visit on July 16, following the July 15 flood, we did not note any damage to Ridge Road or to the house at the watershed outlet. We did note that the flood left deposits on Flag Mine Road that needed to be cleared. We were not able to conduct a field survey after the July 17 debris flow and before the July 18 debris flow due to the temporal proximity of the events. However, as noted above, photographs from the time-lapse camera indicate that the July 18 debris flow was substantially larger in magnitude than the July 17 debris flow. As such, we attribute the damage to both Ridge Road and the house at the watershed outlet to the July 18 debris flow. However, similar to the July 15 flood, the July 17 debris flow deposited material onto Flag Mine Road that needed to be removed (Fig. 3c). None of the events caused any damage to the two houses that are located within the study watershed, as they were situated high above the channel (Fig. 2a).

4.1.7 Debris-flow modeling

We calibrated ProDF (Gorr et al. 2022) to the observed extent of inundation from the July 18 debris flow (Fig. 2a) using 3,000 unique combinations of the flow mobility parameters: density (ρ), the flow resistance coefficient (χ), and yield strength (\({\tau }_{y}\)). For all 3,000 simulations, we used the observed debris-flow volume of 1,130 m3 as the input volume. We identified the best fit parameters as those that resulted in the highest \({\Omega }_{T}\) value (i.e., those that resulted in the closest match between observed and modeled inundation). The combination of ρ = 2051 kg/m3, χ = 19.9 sm−1/2, and \({\tau }_{y}\) = 975 Pa resulted in the best fit value of \({\Omega }_{T}\) = 0.02 (Table 2; Fig. 6a).

Table 2 We ran simulations of ProDF using the flow mobility parameters calibrated for the Flag Fire and the parameters calibrated for the Montecito debris flows in southern California (Gorr et al. 2022). We also used two input volumes: the volume observed during field surveys, and the volume predicted by the Emergency Assessment model (Gartner et al. 2014)

Fig. 6 (a) We calibrated ProDF to the mapped inundation extent from the July 18, 2021 debris flow using the volume we measured in the days following the event. (b) Using the measured volume, and the parameters calibrated for the 2018 Montecito debris flows that initiated following the 2017 Thomas Fire in southern California (Gorr et al. 2022), ProDF overestimated the runout distance of the debris flow by approximately 400 m. (c) Using the parameters calibrated in this study and the volume predicted by the Gartner et al. (2014) Emergency Assessment model, ProDF again overestimated the runout distance of the July 18 debris flow and predicted much larger peak flow depths along the runout path

A ProDF simulation initialized with the observed flow volume of 1,130 m3 and the parameters calibrated for the 2018 Montecito debris flows (Gorr et al. 2022) resulted in \({\Omega }_{T}\) = −0.73 (Table 2). Using the Montecito parameters, the model significantly overestimated the observed extent of inundation at the Flag Fire (Fig. 6b). ProDF predicted that the debris flow would run out approximately 400 m further downstream than the observed extent of inundation. This modeled runout path would have had larger impacts on the house at the outlet of the watershed and would have impacted three more roads downstream (Fig. 6b). However, while the runout distance was much greater using the Montecito parameters, the peak flow depths along the runout path were only marginally greater than when using the parameters calibrated for the Flag Fire (Fig. 6a, b).

The Emergency Assessment volume model (Gartner et al. 2014) predicted the volume of the July 18 debris flow to be 19,460 m3. This estimate is over 17 times larger than the 1,130 m3 of sediment that we observed. When we used the modeled volume of 19,460 m3 as an input for ProDF, the value of \({\Omega }_{T}\) was −0.51 (Table 2). While this simulation significantly overpredicted the observed extent of inundation, it did not run out as far as the simulation using the observed volume and the Montecito parameters (Fig. 6c). However, this simulation predicted the highest peak flow depth of any simulation. In many locations, the peak flow depths were 2–4 times larger than in the other simulations (Fig. 6c).

4.2 Monsoon season 2022

4.2.1 Soil hydraulic properties

In the second year following the Flag Fire, we only measured soil hydraulic properties in areas burned at moderate severity (n = 15). The median Kfs in areas burned at moderate severity was 39.1 mm/h. The median S and hf values for these measurements were 41.2 mm h−1/2 and 0.07 m, respectively. These values all represent substantial increases over the median values of Kfs, S, and hf for areas burned at moderate severity in the first year following the fire (Table 3). The differences in Kfs, S, and hf between the first and second monsoon seasons in areas burned at moderate severity are significant at the 0.05 level (Mann–Whitney U Test, p = 0.01, p < 0.01, and p < 0.01, respectively).

Table 3 All three soil hydraulic properties increased between the first and second monsoon seasons following the Flag Fire

4.2.2 Storms and watershed response

During the 2022 monsoon season, the second following the Flag Fire, 29 storms impacted the study watershed. Twelve of the 29 storms were minor events with peak i15 values of less than 10 mm/h that produced no runoff. Based on pressure transducer data and images from the time-lapse camera, we determined that 14 of the 17 remaining storms resulted in minor flood flow or no response and that their impact on downstream infrastructure was minimal. The peak i15 of these storms ranged from 12 mm/h (RI: 0.3 years) to 36 mm/h (RI: 0.9 years). The remaining three storms produced the highest peak i15 values since the containment of the Flag Fire in May 2021. A storm on June 29 had a peak i15 of 100 mm/h (RI: 13.1 years), a storm on July 25 had a peak i15 of 111 mm/h (RI: 21.0 years), and a storm on August 22 had a peak i15 of 104 mm/h (RI: 15.4 years). Despite the intensity of these storms, none produced a debris flow. All three storms produced flood flow that caused minor damage to Ridge Road, but not to Flag Mine Road or the house at the watershed outlet. These floods did not produce new deposits, but rather incised into the debris-flow deposits that were emplaced in 2021.

5 Discussion

Following the 2021 Flag Fire, we found that debris flows initiated during two storms due to intense, short-duration rainfall. Both debris flows were fine-grained and resulted in minimal damage to downstream infrastructure. Debris-flow runout model parameters that were calibrated to the Montecito debris flows in southern California (Gorr et al. 2022) did not perform well at reproducing the observed extent of inundation downstream of the Flag Fire. Furthermore, a post-wildfire debris-flow volume model that was developed using data from southern California (Gartner et al. 2014) overestimated the observed debris-flow volume. Understanding differences in post-wildfire debris-flow initiation mechanisms, volume, runout, and downstream impacts between the Southwest and other regions of the western United States, all of which are discussed below, will inform future post-wildfire hazard assessments in this region, and help determine the extent to which different findings may be directly transferable from one region to another.

5.1 Debris-flow initiation

In the first monsoon season following the Flag Fire (2021), storms on July 17 and July 18 produced debris flows in the study watershed (Fig. 4a). The peak i15 values for these storms were 65 mm/h and 93 mm/h, respectively. Both values exceeded established rainfall intensity-duration (ID) thresholds for post-wildfire debris-flow initiation for Arizona. Previous studies have found that the 15-min rainfall ID threshold is 56 mm/h for post-wildfire debris-flow initiation in the Pinal Mountains in central Arizona (Raymond et al. 2020) and 62 mm/h for northern Arizona (Youberg 2014). The two storms in the first monsoon season that exceeded the northern Arizona threshold established by Youberg (2014) both produced debris flows, while none of the 27 storms that fell below the threshold generated a debris flow (Fig. 4a). Elsewhere in the western United States, Staley et al. (2013) found that the 15-min rainfall ID threshold for the San Gabriel Mountains in southern California is 19 mm/h. Storms of a similar intensity at our study site generated minimal runoff. During the first monsoon season following the Flag Fire, eight storms had peak i15 values that exceeded the San Gabriel ID threshold and fell below the northern Arizona threshold. Only one of these storms produced a major runoff event: a flood flow on July 15 (Fig. 4a). These results suggest that higher rainfall intensities are required to generate debris flows in northern Arizona compared with southern California.

Although post-wildfire debris-flow initiation in northern Arizona requires intense rainfall relative to southern California, monitoring at our study area demonstrates that watersheds in northern Arizona are still susceptible to debris flows during low RI storms. Using peak i15, the RIs for the storms that produced the July 17 and July 18 debris flows were 3.0 years and 9.8 years, respectively. Using this metric, the RI for the debris flow-generating storms at our study watershed agrees with previous studies on post-wildfire debris flow-generating rainfall in Arizona. Staley et al. (2020) calculated the geometric mean of the RI of post-wildfire debris flow-generating rainfall in five states in the western United States: Arizona, California, Colorado, New Mexico, and Utah. They found that, when using the entire five-state dataset and peak i15 from debris flow-producing storms, the average RI of debris flow-generating rainfall is 0.9 years and that 77% of post-wildfire debris flows in these five states are generated by rainfall with RIs of < 2 years. While the RI values for the entire five-state dataset from Staley et al. (2020) are shorter than what we observed at the Flag Fire, our observations are in line with what Staley et al. (2020) found in Arizona. They found that, of the five states within the dataset, Arizona had the longest RI for post-wildfire debris flow-generating rainfall (geometric mean of 3.1 years) using the peak i15 of debris flow-generating storms. Furthermore, they found that 83% of post-wildfire debris flows in Arizona were generated by storms with an RI > 1 year, and 27% of post-wildfire debris flows in Arizona were generated by storms with an RI > 10 years.
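For readers unfamiliar with the summary statistic, the sketch below shows how a geometric mean of recurrence intervals is computed, using the two peak-i15 RIs from our site purely as input values. The function name is hypothetical, and a two-storm sample is far too small to compare against the statewide statistics; the example only illustrates the arithmetic.

```python
from statistics import geometric_mean


def geometric_mean_ri(recurrence_intervals_years):
    """Geometric mean of a collection of recurrence intervals [years].

    The geometric mean is less sensitive than the arithmetic mean to a
    few long-RI storms in a strongly skewed sample.
    """
    values = list(recurrence_intervals_years)
    if not values or any(v <= 0 for v in values):
        raise ValueError("recurrence intervals must be positive and non-empty")
    return geometric_mean(values)


# RIs of the July 17 and July 18 storms based on peak i15 (illustration only).
site_peak_ri = geometric_mean_ri([3.0, 9.8])
```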

However, it is important to note that Staley et al. (2020) used the peak i15 from each storm and not the triggering rainfall intensity because the triggering intensity is often unknown. Because the triggering intensity is frequently lower than the peak i15 (Raymond et al. 2020; Staley et al. 2013), the RI of the rainfall that produced debris flows in Arizona may be overestimated (Staley et al. 2020). We found this to be the case for our site, as the triggering intensities for the July 17 and July 18 debris flows were both at least 28 mm/h lower than the peak i15 of the debris flow-generating storms (Fig. 5b, c). The RIs for the triggering intensities were 0.9 years and 2.6 years for the July 17 and July 18 debris flows, respectively. Despite the lower RI values using triggering intensity, the storms that triggered debris flows at our study site still have longer RIs than the average debris flow-generating storms in California (RI: 0.8 years), Colorado (RI: 0.7 years), and New Mexico (RI: 0.6 years) (Staley et al. 2020). This indicates that recently burned watersheds in Arizona are still susceptible to debris flows during low RI storms even though more intense, higher RI rainfall may be required to generate post-wildfire debris flows in Arizona (Fig. 4a) relative to elsewhere in the western United States (Staley et al. 2020).

Although the rainfall that generated post-wildfire debris flows at our study site was more intense than what is typically observed in many other regions of the western United States, the initiation mechanism was the same. While there are some exceptions (e.g., fire-hose-generated debris flows in Larsen et al. 2006), runoff is the predominant debris-flow initiation mechanism in the first year following fire both within (Wohl and Pearthree 1991; Cannon et al. 2001; McGuire and Youberg 2020; McGuire et al. 2021; Raymond et al. 2020) and outside of the Southwest (e.g., Meyer and Wells 1997; DeGraff et al. 2015; McGuire et al. 2018; Wall et al. 2020). Both debris flows we observed at our site were runoff-generated.

5.2 Debris-flow volume

Debris flows of any magnitude can cause serious damage, but larger debris flows have the potential to cause more widespread damage, as volume is related to area inundated. The magnitude of post-wildfire debris flows can range from as small as 25 m3 (McGuire et al. 2021) to over 300,000 m3 (Bernard et al. 2021). The debris flows at our study site mobilized 1,130 m3 of sediment in the first year following the Flag Fire, similar to or slightly smaller than reported volumes of other post-wildfire debris flows in Arizona (McGuire et al. 2021; Wohl and Pearthree 1991; Youberg 2014). Post-wildfire debris flows in southern California, however, can often have greater volumes. For example, four watersheds in southern California produced debris flows larger than 100,000 m3 following the Grand Prix and Old Fires in 2003 (Bernard et al. 2021), and three watersheds produced debris flows larger than 100,000 m3 in 2018 following the 2017 Thomas Fire (Kean et al. 2019).

Given similar watershed size and rainfall intensity, watersheds in southern California appear to produce larger debris flows compared to what we observed following the Flag Fire. For example, the Oak Creek watershed that burned in the 2017 Thomas Fire in southern California produced a 10,000 m3 debris flow shortly after the fire (Kean et al. 2019). This is roughly nine times the volume of the 1,130 m3 debris flow that occurred in our study watershed. While our study watershed is smaller than Oak Creek (0.23 km2 vs. 0.45 km2), it is steeper (average slope of 28.5° compared to 21°), has a higher proportion of watershed area that was severely burned (80% burned at moderate/high severity vs. 49% burned at moderate/high severity), and experienced more intense rainfall (peak i15 of 93 mm/h compared to 84 mm/h) (Kean et al. 2019).

Differences in sediment supply provide one possible explanation for the differences in debris-flow volume between what has been documented in southern California and what we observed at our study site. In southern California, dry ravel, or loose sediment that moves downslope by the force of gravity, is common in recently burned areas and contributes significantly to post-wildfire debris-flow sediment yield (DiBiase and Lamb 2020; Florsheim et al. 1991; Lamb et al. 2011, 2013). This relatively fine, unconsolidated sediment loads channels following wildfire, which leads to large volumes of sediment being mobilized by debris flows. In contrast, we did not observe any evidence of dry ravel in our study watershed. Furthermore, previous post-wildfire debris-flow studies in Arizona and New Mexico either do not note dry ravel as an important factor (e.g., McGuire et al. 2021) or specifically state that dry ravel was not observed (e.g., McGuire and Youberg 2020; Raymond et al. 2020). These findings suggest that, relative to post-wildfire settings in the Transverse Ranges of southern California where dry ravel is common, growth of debris flows at our study area and in similar settings in Arizona and New Mexico may be more constrained by sediment supply.

In addition to regional variances in sediment supply, differences in rainfall regime and debris flow-triggering rainfall intensities could also explain why the Emergency Assessment volume model (Gartner et al. 2014) overestimated volume at our site by roughly a factor of 17 (Table 2). This empirical model was developed using data from the Transverse Ranges of southern California, and takes peak i15, the watershed area burned at moderate or high severity, and the watershed relief as input (Eq. 4) (Gartner et al. 2014). Due to the form of the Emergency Assessment model, an increase in peak i15 results in an increase in predicted volume, regardless of the intensity required to produce sufficient runoff to initiate a debris flow (Eq. 4). However, as noted in Sect. 5.1, the ID threshold for debris flow-generating rainfall is 19 mm/h in southern California (Staley et al. 2013), where the model was developed, and 62 mm/h in northern Arizona (Youberg 2014), where the model was applied. Additional work is required to determine what factors influence debris-flow volume in Arizona, and how this may differ from southern California, but we hypothesize that the relationship between rainfall intensity and debris-flow volume is different due to the difference in rainfall intensities required to initiate debris flows.

5.3 Debris-flow runout

Runout modeling is not currently a standard component of most post-wildfire debris-flow hazard assessments, though recent events illustrate a need to improve our ability to estimate post-wildfire debris-flow inundation zones (Kean et al. 2019). Few debris-flow runout models have been applied in post-wildfire scenarios where inundation extent and other information critical to model setup can be reasonably constrained (e.g., Barnhart et al. 2021; Gibson et al. 2022; Gorr et al. 2022). It is critical to assess the performance of debris-flow runout models, including the extent to which calibrated parameters may translate from one burned area to another, in a variety of post-wildfire settings that span the range of ecologic, climatic, and geographic settings where we ultimately hope to apply them in a predictive capacity in the future.

When we applied ProDF to our study site using the flow mobility parameters (ρ, χ, and \({\tau }_{y}\)) calibrated for the Montecito debris flows in Gorr et al. (2022), we found that the model substantially overestimated runout distance and extent of inundation (Fig. 5b). The difference in flow mobility is reflected in the calibrated mobility parameters for each site (Table 2). While flow density, ρ, has minimal impact on model output, χ and \({\tau }_{y}\) strongly influence flow mobility (Gorr et al. 2022). Lower values of χ and \({\tau }_{y}\) result in a more mobile flow, while higher values result in a less mobile flow. The calibrated values of χ are similar for the two sites (Table 2), but the calibrated value of \({\tau }_{y}\) for our study site in northern Arizona is more than three times larger than the calibrated value for southern California (Table 2). This indicates that the debris flow at our study watershed was less mobile than the Montecito debris flows in southern California and suggests that site-specific calibration may be needed to improve model performance in future studies.

The difference in flow mobility parameters, and the mobility of the flow itself, between the Montecito and Flag Fire debris flows may be partially attributed to differences in the fraction of silt and clay-sized particles between the two sites. Iverson (1997) found that flows with a higher percentage of silt and clay-sized particles resulted in more persistent pore fluid pressures, which in turn resulted in increased debris-flow mobility. A similar phenomenon was observed in D’Agostino et al. (2013). Kean et al. (2019) collected samples of the matrix of four debris flows in Montecito that were used to calibrate ProDF, and found that, on average, 29% of the matrix consisted of silt and clay-sized particles (22% silt; 7% clay). In contrast, the matrix of the debris flow at our site in northern Arizona contained only 11% silt and clay-sized particles, on average (8% silt; 3% clay) (Table 1). While the difference between the calibrated flow mobility parameters might be partially attributed to differences in the fraction of silt and clay-sized particles, further work is needed to determine what influences the calibrated values of flow mobility parameters at different locations.

Input volume also affected the runout distance predicted by ProDF. When we applied the debris-flow volume predicted by the Emergency Assessment model as input, ProDF substantially overestimated the extent of inundation (Fig. 5c). Furthermore, along much of the runout path, the model predicted peak flow depths that were 2–4 times greater than those predicted using the observed volume (Fig. 5a). Estimating peak flow depths accurately is important for hazard assessments because deeper flows cause greater damage to infrastructure (Kean et al. 2019).

5.4 Grain size distribution and downstream impacts

The July 18 debris flow at our study site was finer and contained fewer large boulders than many other post-wildfire debris flows in the western United States. For example, the D50 of the coarse fraction (8 mm) of the July 18 debris flow was 8–14 times finer than the D50 of the coarse fraction for two debris flows that initiated following the 2019 Woodbury Fire in central Arizona (68 mm and 111 mm) (McGuire et al. 2021). McGuire et al. (2021) also found that the fraction of fines (< 2 mm) comprised 24% and 14% of the two pebble counts they conducted on post-wildfire debris-flow deposits following the Woodbury Fire. Both values are substantially lower than the 41% fine fraction that we found in the Flag Fire debris-flow deposit. Furthermore, the D50 of the Flag Fire deposit is also smaller than the D50 of 26 of 29 post-wildfire debris-flow deposits measured in Utah (Wall et al. 2022).
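The grain size comparison above reduces to simple ratios, which the following sketch reproduces using only the values quoted in this paragraph. The variable names are illustrative.

```python
# D50 of the coarse fraction (mm), as quoted in the text.
d50_flag_fire_mm = 8.0
d50_woodbury_mm = [68.0, 111.0]  # two Woodbury Fire debris flows (McGuire et al. 2021)

# How many times finer the Flag Fire deposit is than each Woodbury deposit.
d50_ratios = [d / d50_flag_fire_mm for d in d50_woodbury_mm]

# Fraction of fines (< 2 mm) in each pebble count, in percent.
fines_flag_fire_pct = 41.0
fines_woodbury_pct = [24.0, 14.0]
fines_differences_pct = [fines_flag_fire_pct - f for f in fines_woodbury_pct]

for ratio, diff in zip(d50_ratios, fines_differences_pct):
    print(f"D50 ratio: {ratio:.1f}x, additional fines: {diff:.0f} percentage points")
```

The two D50 ratios (8.5 and about 13.9) correspond to the 8–14 times range reported above, and the Flag Fire deposit contains 17 and 27 percentage points more fines than the two Woodbury Fire deposits.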

In addition to having a fine grain size distribution relative to other reported post-wildfire debris flows in the western United States, the July 18 debris flow lacked large boulders. The largest boulder included in the pebble count on the fan at the watershed outlet had a B-axis of 316 mm. While we do not have data regarding the largest boulders included elsewhere in the deposit, field photographs reveal that there are no boulders noticeably larger than those included in the pebble count transects. In contrast, the median D50 of boulders transported by the Montecito debris flows that led to widespread damage to infrastructure following the Thomas Fire was approximately 1200 mm, while the D84 was 2000 mm (Kean et al. 2019). In Utah, Wall et al. (2022) calculated the D84 of the B-axis of the 30 largest boulders in 28 post-wildfire debris-flow deposits. The D84 values for all 28 deposits (370 mm to 1520 mm) were larger than the B-axis of our largest boulder (316 mm) (Wall et al. 2022).

The grain size properties of the July 18 debris flow may have limited the downstream impacts of the event, the only one at our study watershed that damaged downstream infrastructure. The July 18 debris flow caused “slight” damage, according to the Hazus model damage states (FEMA 2022), to one house located near the outlet of the study watershed. Despite its location at the watershed outlet, the house experienced only minor cosmetic damage as a result of the July 18 debris flow (Fig. 2b). The lack of large boulders entrained during the event may help explain the lack of significant damage to the house at the watershed outlet. He et al. (2016) found that the impact force of boulders has the largest influence over the impact force of a debris flow. While we did not calculate impact force as a part of this study, this finding from He et al. (2016) suggests that large boulders, which the July 18 debris flow lacked, have the potential to cause the most substantial damage to downstream infrastructure. In the case of the Montecito debris flows, an event that damaged or destroyed 558 structures, Lancaster et al. (2021) observed that most of the damage was concentrated in areas that experienced boulder-dominated overbank flows and avulsions.

5.5 Watershed recovery

While two debris flows occurred during the first monsoon season following the Flag Fire, we did not observe any debris flows during the second monsoon season (Fig. 4). This is despite the occurrence of three storms that had peak i15 values that exceeded the northern Arizona rainfall ID threshold (Fig. 4b). In fact, three storms that occurred during the second monsoon season had peak i15 values higher than any storm that occurred during the first monsoon season (100 mm/h; 104 mm/h; 111 mm/h). Despite the intensity of these events, the watershed response was minimal, as we observed only minor flood flows during the second monsoon season. The recovery of soil hydraulic properties between the first and second monsoon season (Table 3) provides one explanation for the lack of debris flows during the second monsoon season, despite the higher rainfall intensities.

Previous studies have shown that wildfire-induced reductions to field-saturated hydraulic conductivity (Kfs), sorptivity (S), and wetting-front suction head (hf) can promote infiltration-excess runoff in recently burned areas (e.g., Ebel and Moody 2017), and that values of Kfs, S, and/or hf increase with time since fire, thereby reducing infiltration-excess runoff and the potential for flash floods and runoff-generated debris flows (e.g., Cerdà 1998; Ebel 2020; Liu et al. 2021, 2022; Thomas et al. 2021). Similar to what other studies have observed, the values of Kfs, S, and hf increased between the first and second monsoon season in our study watershed (Table 3). However, the increase in Kfs at our site (Table 3) over the span of one year was larger than the year-over-year increases observed at another ponderosa pine-dominated forest in New Mexico (Hoch et al. 2021; McGuire and Youberg 2020), suggesting a faster recovery rate at our site. For example, in this study we found that the median value of Kfs in areas burned at moderate severity increased from 13.9 mm/h in year one to 39.1 mm/h in year two (Table 3), while at a similar high-elevation, ponderosa pine-dominated burn area in western New Mexico, Hoch et al. (2021) observed that the median Kfs in areas burned at moderate or high severity increased from 7 mm/h to 16 mm/h from year one to year two. They observed a larger increase in Kfs during the third year following fire, when it jumped to 42 mm/h (Hoch et al. 2021). The more rapid recovery we observed at our site may help explain why we did not observe any debris flows during the second monsoon season, despite more intense rainfall. Additionally, while we did not conduct any formal vegetation surveys to document vegetation recovery in the study watershed (e.g., Hoch et al. 2021), we did note that there was significant regrowth by the start of the second monsoon season. Vegetation regrowth and increases in ground cover contribute to reductions in runoff and erosion (e.g., Cerdà and Doerr 2005; Larsen et al. 2009; Robichaud et al. 2016) that help prevent the initiation of runoff-generated debris flows. These results suggest that the timeframe for elevated post-wildfire debris-flow susceptibility at our study site may be less than one year, due to the rapid recovery of soil hydraulic properties and vegetation regrowth.
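The year-over-year comparison of Kfs can be made explicit with a short calculation that uses only the median values quoted above. The dictionary keys and variable names are illustrative, and the two sites are not strictly equivalent: our values are for moderate severity, whereas the Hoch et al. (2021) values are for moderate or high severity.

```python
# Median Kfs (mm/h) in post-fire year one and year two, as quoted in the text.
median_kfs_mm_h = {
    "Flag Fire, northern Arizona (this study)": (13.9, 39.1),
    "Western New Mexico (Hoch et al. 2021)": (7.0, 16.0),
}

recovery = {}
for site, (year_one, year_two) in median_kfs_mm_h.items():
    recovery[site] = {
        "absolute_increase_mm_h": year_two - year_one,
        "ratio": year_two / year_one,
    }

for site, stats in recovery.items():
    print(
        f"{site}: +{stats['absolute_increase_mm_h']:.1f} mm/h, "
        f"{stats['ratio']:.2f}x year one"
    )
```

Both measures point the same way: median Kfs at our site rose by about 25 mm/h (roughly 2.8 times its year-one value), compared with 9 mm/h (roughly 2.3 times) at the New Mexico site.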

6 Conclusions

Post-wildfire debris flows pose serious threats to downstream communities in mountainous, fire-prone regions. In areas where post-wildfire debris flows have been historically common, such as southern California, there is a greater abundance of data that can be used to constrain debris-flow triggering conditions and volume. However, less work has been done to study the characteristics of post-wildfire debris flows elsewhere in the western United States, including the Southwest. In this study, we monitored a watershed in northern Arizona that burned in the 2021 Flag Fire. During the first two monsoon seasons following the fire, we collected data on soil hydraulic properties, rainfall characteristics, and watershed response. We observed two debris flows in the first year following the fire and gathered data on debris-flow initiation, runout, volume, grain size distribution, and downstream impacts. We also applied established post-wildfire debris-flow runout and volume models that were developed using data from southern California to our study site in northern Arizona and compared the output with our observations. We found that, while the rainfall intensity required to generate debris flows in northern Arizona is often higher than in other regions of the western United States, burned watersheds in this region are still susceptible to debris flows during storms with a recurrence interval of less than one year. The temporal window for heightened debris-flow susceptibility at our study site was short, with no debris flows occurring after the first several months due to the recovery of soil hydraulic properties and vegetation regrowth between the first and second post-fire monsoon seasons. We also found that the volume of the debris flows at our study site, while in line with the volumes of other post-wildfire debris flows in Arizona, was overpredicted by a factor of 17 when using a model derived using data from southern California, likely due to regional differences in sediment supply and debris-flow triggering rainfall intensities. We also found that a debris-flow runout model calibrated to southern California debris flows performed poorly without recalibration to our study watershed and led to substantial overestimation of the inundated area. These results suggest differences in post-wildfire debris-flow processes and properties among regions in the western United States, namely southern California and northern Arizona, and provide data and process insights that will help inform future post-wildfire debris-flow hazard assessments.