the Creative Commons Attribution 4.0 License.
Probabilistic Hierarchical Interpolation and Interpretable Configuration for Flood Prediction
Abstract. The last few years have witnessed the rise of neural network (NN) applications for hydrological time series modeling. By virtue of their capabilities, NN models can achieve unprecedented levels of performance when learning from data how to represent increasingly complex rainfall-runoff processes, making them pivotal for computational hydrologic tasks such as flood prediction. To be considered practical, NN models should provide a probabilistic understanding of the model mechanisms and predictions, along with hints on what could perturb the model. In this paper, we developed two probabilistic NN models, i.e., Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS) and Network-Based Expansion Analysis for Interpretable Time Series Forecasting (N-BEATS), and benchmarked them against long short-term memory (LSTM) for flood prediction across two headwater streams in Georgia and North Carolina, USA. To generate a probabilistic prediction, a Multi-Quantile Loss was used to assess the 95th percentile prediction uncertainty (95PPU) of multiple flooding events. We conducted extensive flood prediction experiments demonstrating the advantages of hierarchical interpolation and interpretable architecture, where both N-HiTS and N-BEATS provided an average accuracy improvement of almost 5 % (NSE) over the LSTM benchmarking model. On a variety of flooding events with different timing and magnitudes, both N-HiTS and N-BEATS demonstrated significant performance improvements over the LSTM benchmark and showcased their probabilistic predictions by specifying a likelihood parameter.
Status: final response (author comments only)
- CC1: 'Comment on hess-2024-261', Nima Zafarmomen, 18 Oct 2024
The paper introduces a novel application of deep learning architectures, specifically the N-HiTS and N-BEATS models, for flood prediction. This is a pioneering approach in the hydrological domain, demonstrating how advanced neural networks can be adapted to model complex environmental systems. The use of these architectures represents a significant advancement in flood prediction, highlighting their ability to capture intricate rainfall-runoff processes and providing more accurate forecasts compared to traditional models.
One of the key strengths of the paper is its focus on probabilistic predictions through the use of the Multi-Quantile Loss (MQL) function. By incorporating uncertainty quantification, the paper enhances the reliability and interpretability of its flood predictions, which is crucial for decision-makers managing flood risks.
The research is also commendable for its comprehensive benchmarking against long short-term memory (LSTM) models, a standard in time series forecasting. The study clearly demonstrates that the N-HiTS and N-BEATS models outperform LSTM, particularly for short-term flood predictions, with a notable 5% improvement in accuracy (NSE metric).
I am highly interested in the models introduced in this paper and intend to use N-HiTS and N-BEATS in my future research endeavors. I strongly recommend publishing this paper as it offers a well-structured methodology, comprehensive benchmarking against established models like LSTM, and rigorous sensitivity and uncertainty analyses.
Citation: https://doi.org/10.5194/hess-2024-261-CC1
- RC1: 'Comment on hess-2024-261', Anonymous Referee #1, 04 Nov 2024
Saberian et al. applied two new neural networks to flood prediction in two headwater watersheds. The new approaches have the advantage of providing an uncertainty assessment of the prediction. The authors also compared the results with LSTM, and the comparison shows improved prediction performance. This study is novel and important. The manuscript is generally well written and well organized. I have several questions, as follows:
- Are precipitation, temperature, and humidity enough as input variables for your neural networks?
- The forcing station is a single point in the watershed, while runoff generation results from water convergence over a large area of the watershed. Do you think a single station can represent these complex processes over large areas?
- You mentioned that your models predict one hour ahead. Is this meaningful for flood prediction? In other words, is one hour enough time for people to evacuate once they know a flood is coming?
- Did you train each NN model separately for each watershed, train on one watershed and then transfer to the other, or train on both watersheds together?
Citation: https://doi.org/10.5194/hess-2024-261-RC1
- CC2: 'Reply on RC1', Mostafa Saberian, 05 Nov 2024
- AC1: 'Reply on RC1', Vidya Samadi, 06 Nov 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-261/hess-2024-261-AC1-supplement.pdf
- RC2: 'Comment on hess-2024-261', Anonymous Referee #2, 06 Nov 2024
In their study, Saberian et al. present two innovative neural network models, N-HiTS and N-BEATS, aimed at advancing flood prediction capabilities across two headwater watersheds in the southeastern United States. The authors emphasize the interpretability of these models, alongside their ability to quantify prediction uncertainty—a valuable aspect in flood forecasting. By benchmarking against the LSTM model, the study demonstrates notable performance gains in short-term flood prediction accuracy. There are areas where additional clarity and methodological detail would strengthen the findings. I offer the following questions and suggestions for improvement:
1. Methodology and Models
- Interpretability and Model Complexity: The paper claims that N-HiTS and N-BEATS models offer interpretability. However, further elaboration on how these models achieve interpretability would strengthen the paper. Including visual examples or providing a more explicit breakdown of how interpretability manifests in model outputs could clarify this for readers who may be less familiar with these architectures.
- Hyperparameter Selection: The selection process for critical hyperparameters like the lookback window size is not fully justified. Lookback windows are crucial in sequence-based forecasting, and this choice should either be explored as a hyperparameter or explained in greater detail, particularly given the model's dependency on residuals for subsequent window predictions. Additionally, since a 24-hour lookback window is used, further elaboration on how this length captures relevant hydrological features, like seasonality or trends, would enhance clarity.
- Metrics Selection: While NSE, RMSE, and MAE are utilized, the omission of the Kling-Gupta Efficiency (KGE) index is notable. KGE is especially relevant for flood forecasting as it provides insights into peak flow timing, magnitude, and correlation. Including KGE would add robustness to the evaluation by capturing aspects critical to hydrological modeling.
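For readers comparing the two scores, the sketch below computes NSE and the 2009 form of KGE, which combines linear correlation, a variability ratio, and a bias ratio. It is a minimal illustration in plain Python, not code from the manuscript; the function names `nse` and `kge` and the list-based inputs are assumptions made for this example.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors / squared deviation of obs from its mean)."""
    mu = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mu) ** 2 for o in obs)
    return 1.0 - sse / sst

def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form) from correlation r, variability ratio alpha, bias ratio beta."""
    mu_o, mu_s = _mean(obs), _mean(sim)
    sd_o = math.sqrt(_mean([(o - mu_o) ** 2 for o in obs]))
    sd_s = math.sqrt(_mean([(s - mu_s) ** 2 for s in sim]))
    cov = _mean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)   # linear correlation between obs and sim
    alpha = sd_s / sd_o       # ratio of standard deviations
    beta = mu_s / mu_o        # ratio of means
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both scores equal 1 for a perfect simulation. Because KGE keeps the three components separate, reporting `r`, `alpha`, and `beta` alongside the combined score can show whether an error comes from timing, variability, or bias.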
2. Model Evaluation
- Interpretability in Model Outputs: Although the paper claims interpretability for both N-HiTS and N-BEATS, the explanation is somewhat abstract. Providing visual aids or case studies that illustrate interpretability in flood prediction contexts would be beneficial. Specifically, the paper mentions that projections onto harmonic and trend bases improve prediction accuracy, but further clarification on the physical interpretability of these projections would help. Given the use of a 24-hour window, it would be helpful to explain whether trends, network depth, or some other feature captures seasonality and why this choice is appropriate for flood prediction.
- Uncertainty Analysis: The application of Maximum Likelihood Estimation (MLE) for uncertainty quantification is intriguing. However, more details on how MLE is applied in this context would improve reproducibility. A clearer formulation of MLE within the training process or its integration with multi-quantile loss could better inform readers about the strengths and limitations of this approach. Additionally, bootstrapping methods could help quantify uncertainty and assess whether observed performance differences between models are statistically significant, providing a more robust comparison.
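One way such a bootstrap comparison could look is sketched below. It is an illustration under stated assumptions, not the authors' procedure: it uses a moving-block bootstrap (chosen here because hourly streamflow errors are typically autocorrelated), and the function name `bootstrap_nse_diff`, the 24-step block length, and the 95 % percentile interval are all hypothetical choices.

```python
import random

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of sim against obs."""
    mu = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mu) ** 2 for o in obs)
    return 1.0 - sse / sst

def bootstrap_nse_diff(obs, sim_a, sim_b, n_boot=1000, block_len=24, seed=0):
    """Percentile interval for NSE(model A) - NSE(model B) via a moving-block bootstrap.

    The same resampled time indices are applied to the observations and to both
    simulations, so the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(obs)
    block_len = min(block_len, n)
    n_blocks = -(-n // block_len)  # ceiling division
    diffs = []
    for _ in range(n_boot):
        idx = []
        for _ in range(n_blocks):
            start = rng.randrange(n - block_len + 1)
            idx.extend(range(start, start + block_len))
        idx = idx[:n]  # trim to the original series length
        o = [obs[i] for i in idx]
        a = [sim_a[i] for i in idx]
        b = [sim_b[i] for i in idx]
        diffs.append(nse(o, a) - nse(o, b))
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[min(n_boot - 1, int(0.975 * n_boot))]
    return lo, hi
```

If the returned interval excludes zero, the NSE difference between the two models is unlikely to be an artifact of the particular evaluation period. The block length would need to be matched to the persistence of the errors in the actual data.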
3. Data and Experimentation
- Separate Model Training for Each Catchment: Each model was trained separately for each catchment, rather than training a single model on both catchments. This approach limits the assessment of the models' generalizability across different hydrological conditions. Training a unified model on data from both catchments would provide insights into the model’s adaptability and robustness across diverse environments, which is crucial for broader flood prediction applications. I recommend including an analysis of a single model trained across both catchments to evaluate cross-catchment performance.
- Data Splits for Training, Validation, and Testing: It appears the observational data up to October 1, 2022, was used for training, and data from October 1, 2022, to March 28, 2023, was used for validation. However, the absence of an unseen test set to demonstrate generalization capabilities raises concerns. Dividing the dataset into three splits (training, validation, and testing) would allow for hyperparameter optimization on the validation set and final results on an unseen test set, demonstrating the model’s generalization. Including metrics like loss curves for the training and validation sets or evaluation metrics on a test set would help assess model performance and detect overfitting thereby enhancing reliability.
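A chronological three-way split of the kind suggested here could be sketched as follows. This is a minimal illustration: the function name `chronological_split`, the `(timestamp, value)` record layout, and the cut dates in the usage lines are hypothetical placeholders, not the dates used in the study.

```python
from datetime import datetime, timedelta

def chronological_split(records, train_end, val_end):
    """Split (timestamp, value) records into train / validation / test by time.

    Records before train_end go to training, records in [train_end, val_end)
    go to validation, and everything from val_end onward is held out for testing.
    """
    train = [r for r in records if r[0] < train_end]
    val = [r for r in records if train_end <= r[0] < val_end]
    test = [r for r in records if r[0] >= val_end]
    return train, val, test

# Usage with synthetic hourly data; the dates below are placeholders.
start = datetime(2022, 9, 30)
records = [(start + timedelta(hours=h), float(h)) for h in range(72)]
train, val, test = chronological_split(
    records,
    train_end=datetime(2022, 10, 1),
    val_end=datetime(2022, 10, 2),
)
```

Splitting by time rather than at random keeps future observations out of training, so hyperparameters can be tuned on the validation period and the final scores reported once on the held-out test period.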
4. Suggestions for Improvement
- Model Reproducibility: Simplifying the explanation of the Multi-Quantile Loss (MQL) function could make the methodology more accessible. Additionally, code availability or pseudocode in an appendix would enhance reproducibility and facilitate further exploration by other researchers.
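As an illustration of how compact such pseudocode could be, the sketch below averages the standard quantile (pinball) loss over several quantile levels. It is not the authors' implementation; the function names, the list-based inputs, and the quantile levels 0.025, 0.5, and 0.975 (chosen here to bracket a 95 % band) are assumptions.

```python
def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss for one observation y and one predicted quantile at level tau."""
    diff = y - q_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1.0) * diff)

def multi_quantile_loss(y_true, q_preds, taus):
    """Average pinball loss over all time steps and all quantile levels.

    y_true:  list of observations, one per time step
    q_preds: list of lists; q_preds[t][k] is the predicted quantile taus[k] at step t
    taus:    quantile levels in (0, 1)
    """
    total = 0.0
    for y, preds in zip(y_true, q_preds):
        for q, tau in zip(preds, taus):
            total += pinball_loss(y, q, tau)
    return total / (len(y_true) * len(taus))

# Usage: two time steps, three quantile levels (assumed levels).
taus = [0.025, 0.5, 0.975]
y_true = [10.0, 12.0]
q_preds = [[8.0, 10.5, 13.0], [9.0, 11.0, 15.0]]
loss = multi_quantile_loss(y_true, q_preds, taus)
```

The asymmetric weighting is what turns each output into a quantile estimate: a high `tau` penalizes under-prediction more heavily, which pushes that output toward the upper end of the predictive distribution.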
Additional Comments
- Input Sensitivity Inconsistency (Lines 568-569): The statement here suggests that the models are indeed sensitive to input conditions, especially during extreme events. However, in the following section, the paper concludes that the models are not sensitive to input data. This contradiction should be addressed.
Citation: https://doi.org/10.5194/hess-2024-261-RC2
- AC2: 'Reply on RC2', Vidya Samadi, 21 Nov 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-261/hess-2024-261-AC2-supplement.pdf