Conditional normalization in time series analysis

Puwasala Gamakumara
Department of Econometrics & Business Statistics
Monash University, Clayton VIC 3800, Australia
Email: puwasala.gamakumara@gmail.com
Corresponding author
arXiv:2305.12651v1 [stat.ME] 22 May 2023

Edgar Santos-Fernandez
School of Mathematical Sciences
Queensland University of Technology, Brisbane QLD 4000, Australia
Email: edgar.santosfernandez@qut.edu.au

Priyanga Dilini Talagala

Department of Computational Mathematics, University of Moratuwa,
Bandaranayake Mawatha, Moratuwa, 10400, Sri Lanka
Email: priyangad@uom.lk

Rob J Hyndman
Department of Econometrics & Business Statistics
Monash University, Clayton VIC 3800, Australia
Email: Rob.Hyndman@monash.edu

Kerrie Mengersen
Science and Engineering Faculty
School of Mathematical Sciences
Queensland University of Technology, Brisbane QLD 4000, Australia
Email: k.mengersen@qut.edu.au

Catherine Leigh
Biosciences and Food Technology Discipline
School of Science
RMIT University, Bundoora VIC 3082, Australia
Email: catherine.leigh@rmit.edu.au

23 May 2023

Abstract

Time series often reflect variation associated with other related variables. Controlling for the
effect of these variables is useful when modeling or analysing the time series. We introduce a
novel approach to normalize time series data conditional on a set of covariates. We do this by
modeling the conditional mean and the conditional variance of the time series with generalized
additive models using a set of covariates. The conditional mean and variance are then used to
normalize the time series. We illustrate the use of conditionally normalized series using two
applications involving river network data. First, we show how these normalized time series can
be used to impute missing values in the data. Second, we show how the normalized series can
be used to estimate the conditional autocorrelation function and conditional cross-correlation
functions via additive models. Finally we use the conditional cross-correlations to estimate the
time it takes water to flow between two locations in a river network.

Keywords: conditional normalization, missing value imputation, conditional autocorrelation,
conditional cross-correlation, lag time estimation, stream data, water quality

1 Introduction
Normalization of some variables is often required prior to using a statistical or machine learning
algorithm. In this study, we introduce a novel normalization method for time series data which
aims primarily to remove the conditional variation in the time series that is induced by other
sources of variation.

Common data normalization methods, such as min-max transformation or standardization (also
called z-score normalization), are not always applicable to time series data: the normalizing
constants may change in the future, and the methods assume stationary processes. For non-stationary
data, sliding-window normalization has been proposed (e.g., Ogasawara et al. 2010; Vafaeipour
et al. 2014), where the time series is divided into windows of a specified length and data are
normalized within each window. However, these methods do not account for any external
variables that can influence the variation in the time series.


In practice, it is common to work with multiple time series that are inter-related and non-
stationary. We propose a method to normalize univariate time series conditional on a set of
covariates. This method can be considered as a variation of z-score normalization, but where the
mean and variance are functions of the covariates. Thus we refer to this method as conditional
normalization.

In the proposed method, we first estimate the conditional mean of the time series using a
generalized additive model (GAM) (Hastie & Tibshirani 1990) with a set of covariates. The
conditional variance is then estimated via a different GAM fitted to the squared errors from
the conditional mean model, with respect to the same set of covariates. Finally, the estimated
conditional mean and variance are used to standardize the time series. One can choose the most
relevant set of covariates that can explain maximum variation in the time series.

It is relatively common to subtract a conditional mean in order to adjust data for subsequent
analysis, and sometimes this is called “normalization” (e.g., Xie et al. 2019). However, our
approach is much more general as both the conditional mean and conditional variance are
modeled, and we allow for non-linear relationships between the response and covariates in both
models.

We show two possible uses of conditionally normalized time series. First, we describe how the
conditionally normalized time series can be used to impute missing values in a univariate time
series. To do this, we model the normalized series, and use the model to impute the missing
values. The resulting imputations are then “unnormalized” to give estimates on the original
scale.

Second, we show how the conditionally normalized time series can be used to estimate the
conditional Autocorrelation Function (ACF) and conditional Cross-Correlation Function (CCF).
We can define the conditional ACF at lag k as the conditional expectation of the cross-product of the
conditionally normalized time series and its kth lagged series. Similarly, the conditional CCF at lag k
can be defined as the conditional expectation of the cross-product between two conditionally normalized
time series at k lags apart. To estimate the conditional expectations, we propose to fit GAMs for
the cross-product of the normalized time series using the same set of covariates used in the
conditional normalization.

We also highlight two straightforward empirical applications of conditional normalization of
time series. The first application involves a time series of mean daily stream temperatures
observed in multiple locations in Boise River, in the northwestern United States of America
(USA), which has many missing values. We describe how we can impute these missing values
using Bayesian machinery for modeling.

The second application uses the conditional CCF to estimate the lag time between two sensor
locations in the Pringle Creek river network in Texas, USA. The lag time is the time it takes
water to flow downstream from an upstream location. This lag time often depends on the
upstream river behavior. For example, when the upstream water level increases, water flow
will typically be increased and hence the lag time will decline. On the other hand, when the
level is low, water may be flowing more slowly and hence the lag time will increase. Lag time
has been estimated using different approaches in many hydrological applications (see Van
der Velde et al. (2010), Hrachowitz et al. (2016) and Li et al. (2018) for example). We propose to
estimate the lag time as the lag that gives the maximum conditional cross-correlation between
two water-quality variables observed at upstream and downstream locations, conditional on
other, related water-quality variables measured at an upstream location. This will allow the lag
time to be estimated conditional on the upstream river behavior.

The rest of the paper is organized as follows. The underlying methods for conditional
normalization are described in Section 2. Section 3 contains the empirical application on missing value
imputation, while Section 4 discusses the application to estimating lag times between sensor
locations. Finally, we discuss the results and provide some concluding remarks in Section 5.

2 Conditional estimation via GAMs

In this section, we discuss our approach to conditional normalization of a time series, and then
we discuss a couple of scenarios in which these conditionally normalized time series can be
helpful in data analysis.

2.1 Conditional normalization

Let $y_t$ be a variable observed at times $t = 1, \dots, T$, and $z_t = (z_{1,t}, \dots, z_{p,t})$ be a $p$-dimensional
vector of variables measured at the same times. We assume the mean and variance of $y_t$ are
functions of $z_t$; that is, $\mathrm{E}(y_t \mid z_t) = m(z_t)$ and $\mathrm{V}(y_t \mid z_t) = v(z_t)$. Our aim is to normalize $y_t$
conditional on $z_t$, giving

$$y_t^* = \frac{y_t - \hat{m}(z_t)}{\sqrt{\hat{v}(z_t)}}. \tag{1}$$


We estimate $m(z_t)$ and $v(z_t)$ using GAMs (Hastie & Tibshirani 1990). First, we fit the model

$$y_t = \alpha_0 + \sum_{i=1}^{p} f_i(z_{i,t}) + \varepsilon_t,$$

where $f_i(\cdot)$ are smooth functions, and $\varepsilon_1, \dots, \varepsilon_T$ have mean 0 and variance $v(z_t)$, giving

$$\hat{m}(z_t) = \hat{\alpha}_0 + \sum_{i=1}^{p} \hat{f}_i(z_{i,t}). \tag{2}$$

In estimating the model, we ignore any heteroskedasticity and autocorrelation, and use penalized
splines for each $f_i$ function.

Next, we fit the model

$$[y_t - \hat{m}(z_t)]^2 \sim \text{Gamma}(v(z_t), r),$$

$$\log(v(z_t)) = \beta_0 + \sum_{i=1}^{p} g_i(z_{i,t}),$$

where each $g_i(\cdot)$ is a smooth function, and the Gamma parameterization has the first argument,
$v(z_t)$, as the mean, and the second argument, $r$, as the shape parameter. The Gamma family is
not essential here, and it may be replaced by another distribution whose support is in $(0, \infty)$.
The resulting variance estimate is

$$\hat{v}(z_t) = \exp\Big(\hat{\beta}_0 + \sum_{i=1}^{p} \hat{g}_i(z_{i,t})\Big). \tag{3}$$
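The two-stage structure of Equations (1) to (3) can be illustrated with a minimal sketch. It is not the GAM-based estimator described above: as a deliberately crude stand-in, the smooth functions are replaced by equal-width bin averages of a single covariate, and the Gamma model with log link is replaced by a plain average of squared residuals within each bin. The function names, the number of bins and the simulated data are all assumptions made for illustration.

```python
import math
import random


def bin_index(z, z_min, width, n_bins):
    """Map a covariate value to one of n_bins equal-width bins."""
    return min(int((z - z_min) / width), n_bins - 1)


def binned_average(values, bins):
    """Average of `values` within each bin; returns {bin: mean}."""
    totals, counts = {}, {}
    for v, b in zip(values, bins):
        totals[b] = totals.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return {b: totals[b] / counts[b] for b in totals}


def conditional_normalize(y, z, n_bins=8):
    """Two-stage conditional normalization with one covariate.

    Stage 1 estimates m(z) by bin means of y; stage 2 estimates v(z) by bin
    means of the squared residuals. Both are stand-ins for GAM smooths.
    """
    z_min, z_max = min(z), max(z)
    width = (z_max - z_min) / n_bins or 1.0
    bins = [bin_index(zi, z_min, width, n_bins) for zi in z]

    m_hat = binned_average(y, bins)                       # stage 1: mean
    sq_resid = [(yi - m_hat[b]) ** 2 for yi, b in zip(y, bins)]
    v_hat = binned_average(sq_resid, bins)                # stage 2: variance

    eps = 1e-12  # guards against a zero variance estimate in a tiny bin
    y_star = [(yi - m_hat[b]) / math.sqrt(max(v_hat[b], eps))
              for yi, b in zip(y, bins)]
    return y_star, m_hat, v_hat


# Simulated series whose mean and spread both depend on the covariate z.
random.seed(1)
z = [random.uniform(0.0, 10.0) for _ in range(400)]
y = [2.0 * zi + random.gauss(0.0, 0.5 + 0.3 * zi) for zi in z]
y_star, m_hat, v_hat = conditional_normalize(y, z)
```

With this stand-in the normalized series has sample mean zero and unit mean square within every bin by construction; a GAM-based version would only satisfy this approximately, but would vary smoothly in the covariates.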

2.2 Imputation of missing values

The conditionally normalized series can be used when imputing missing values in a univariate
time series, assuming y∗t is a (possibly seasonal) Autoregressive Integrated Moving Average
(ARIMA) process. The normalization removes some of the sources of variation in the data,
allowing the ARIMA model to handle any remaining serial correlation.

We can then impute $y_t$ using

$$y_t = \hat{y}_t^* \sqrt{\hat{v}(z_t)} + \hat{m}(z_t), \tag{4}$$

where $\hat{y}_t^*$ is the imputed value of $y_t^*$, computed using a Kalman smoother.
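The back-transformation in Equation (4) can be sketched as follows. The sketch assumes that imputed normalized values are already available (for example from a Kalman smoother, which is not implemented here) and marks missing observations with `None`; the function name and all numbers are hypothetical.

```python
import math


def impute_on_original_scale(y, y_star_hat, m_hat, v_hat):
    """Fill missing entries of y (None) using Equation (4).

    y_star_hat[t] is the imputed normalized value at time t; how it is
    obtained (e.g. a Kalman smoother) is outside this sketch.
    """
    filled = []
    for yt, ys, m, v in zip(y, y_star_hat, m_hat, v_hat):
        if yt is None:
            filled.append(ys * math.sqrt(v) + m)  # un-normalize
        else:
            filled.append(yt)                     # keep observed value
    return filled


y = [10.2, None, 11.1]
y_star_hat = [0.1, -0.5, 0.3]   # hypothetical smoothed normalized values
m_hat = [10.0, 12.0, 11.0]      # hypothetical conditional means
v_hat = [4.0, 4.0, 1.0]         # hypothetical conditional variances
filled = impute_on_original_scale(y, y_star_hat, m_hat, v_hat)
# The missing value becomes -0.5 * sqrt(4.0) + 12.0 = 11.0.
```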


2.3 Conditional autocorrelation function

We can also use the normalized time series to compute a conditional ACF as

$$r_k(z_t) = \mathrm{E}[y_t^* y_{t-k}^* \mid z_t] \quad \text{for } k = 1, 2, \dots$$

The function $r_k(\cdot)$ can be estimated using a separate GAM for each $k$:

$$y_t^* y_{t-k}^* \sim N(r_k(z_t), \sigma_k^2),$$

$$\eta(r_k(z_t)) = \gamma_0 + \sum_{i=1}^{p} h_i(z_{i,t}),$$

where $h_i(\cdot)$ are smooth functions and

$$\eta^{-1}(u) = \frac{e^u - 1}{e^u + 1}. \tag{5}$$

Other smooth monotonic link functions, η, that map [−1, 1] to the real line, (−∞, ∞), may also
be used.
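As a small illustration of the link in Equation (5): the inverse link is algebraically equal to tanh(u/2), and solving for u gives the link itself, η(r) = log((1 + r)/(1 − r)) for −1 < r < 1. The sketch below codes both directions; the function names are assumptions, and the tanh form is used only because it avoids overflow for large |u|.

```python
import math


def inv_link(u):
    """Equation (5): (e^u - 1) / (e^u + 1), written as tanh(u / 2)."""
    return math.tanh(u / 2.0)


def link(r):
    """Inverse of Equation (5); defined for -1 < r < 1."""
    return math.log((1.0 + r) / (1.0 - r))


# Any real-valued additive predictor is mapped into the correlation range.
for u in (-3.0, 0.0, 0.5, 3.0):
    r = inv_link(u)
    print(f"u = {u:5.2f}  ->  r = {r: .4f}")
```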

2.4 Conditional cross-correlation function

We can use a similar approach to estimate the conditional cross-correlation functions. Suppose
we have a variable $x_t$ observed at the same time as $y_t$. We are interested in estimating the
cross-correlation between $x_t$ and $y_{t+k}$ for $k = 1, 2, \dots$, conditional on a set of variables $z_t$. First
we normalize $x_t$ and $y_t$ with respect to $z_t$ using (1) and get

$$x_t^* = \frac{x_t - \hat{m}_x(z_t)}{\sqrt{\hat{v}_x(z_t)}} \quad \text{and} \quad y_t^* = \frac{y_t - \hat{m}_y(z_t)}{\sqrt{\hat{v}_y(z_t)}},$$

where $m_x(z_t)$ and $m_y(z_t)$ are estimated using (2), and $v_x(z_t)$ and $v_y(z_t)$ are estimated using (3).

Then we can estimate the conditional cross-correlation

$$c_k(z_t) = \mathrm{E}[y_{t+k}^* x_t^* \mid z_t] \quad \text{for } k = 1, 2, \dots$$

using the GAMs

$$y_{t+k}^* x_t^* \sim N(c_k(z_t), u_k^2),$$

$$\eta(c_k(z_t)) = \phi_0 + \sum_{i=1}^{p} s_i(z_{i,t}). \tag{6}$$


3 Application: Stream temperature imputation

3.1 Temperature data

In this case study, we use a dataset comprising daily mean stream temperatures (°C) recorded in
a large-scale dendritic network in northwestern USA (Isaak et al. 2017). The five-year dataset
includes data from 42 in-situ sensors, each deployed in a unique spatial location. It contains
mean daily stream temperature data, with a total of 1825 observations per time series. The
original data contain missing values due to sensor issues, and the goal is to
impute those values. For illustration purposes, we took a subset of the data at evenly spaced
intervals, discarding all but every 5th observation in the time series, which results in 73 observations
per year at each spatial location.
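The sizes implied by this subsetting can be checked with a few lines of arithmetic. The sketch assumes that the retained observations start at the first day of each series, which the text does not specify; the variable names are hypothetical.

```python
DAYS = 1825   # daily observations per series (five years)
YEARS = 5
SITES = 42
STEP = 5      # keep every 5th observation

daily_index = list(range(DAYS))
kept = daily_index[::STEP]            # assumed offset: start at the first day

per_series = len(kept)                # observations kept per site
per_year = per_series // YEARS        # observations kept per site per year
total_periods = SITES * per_series    # site-by-time cells after subsetting

print(per_series, per_year, total_periods)
```

Under these assumptions each series keeps 365 observations (73 per year), and the 42 sites together give 15330 observation periods, the total quoted in the results of this application.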

Figure 1 shows the time series of stream temperature and air temperature. It is known that the
air temperatures are strongly correlated with stream temperatures (Bal et al. 2014), and so these
will be used as a covariate in our models.

[Figure 1 plot: Temperature (y-axis) against Date, 2011 to 2016 (x-axis).]

Figure 1: Stream temperatures (black) and air temperatures (red) (°C) from the 42 spatial locations.

3.2 The conditional normalization model

Let $y_{st}$ represent the temperature at spatial locations $s = 1, 2, \dots, S$, and time points $t =
1, 2, \dots, T$, where $S = 42$ and $T = 1825$. We will use the conditional normalization model:

$$y_{st}^* = \frac{y_{st} - \mu_{st}}{\sigma_t} \sim \text{AR}(p),$$

where $\mu_{st}$ and $\sigma_t$ are the mean and the standard deviation respectively (formulated as functions
of covariates). To avoid overparameterization (i.e., having more parameters than what we can
learn from the model), we assume that sites have a common standard deviation $\sigma_t$ at time $t$. The
standardized response variable $y_{st}^*$ is modeled using an autoregressive process of order $p$ to
account for the remaining serial correlation in the data.

3.3 GAM models

We formulate the expected value ($\mu_{st}$) of stream temperature as a function of air temperatures (at),
as well as spatial covariates and landscape factors that are constant over time: stream slope,
elevation (elev) and cumulative drainage area (cd). The resulting mean is the linear function

$$\begin{aligned}
\mu_{st} = {} & \beta_0 + \beta_1 \text{slope}_s + \beta_2 \text{elev}_s + \beta_3 \text{cd}_s + \beta_4 \text{at}_{st} \\
& + \beta_5 \text{sin}_{1t} + \beta_6 \text{cos}_{1t} + \beta_7 \text{sin}_{2t} + \beta_8 \text{cos}_{2t} + \beta_9 \text{sin}_{3t} + \beta_{10} \text{cos}_{3t} \\
& + \beta_{11} \text{sin}_{4t} + \beta_{12} \text{cos}_{4t} + \beta_{13} \text{sin}_{5t} + \beta_{14} \text{cos}_{5t}.
\end{aligned}$$

The covariates $\text{sin}_{kt} = \sin(2\pi t k/m)$ and $\text{cos}_{kt} = \cos(2\pi t k/m)$, $k = 1, \dots, 5$, are the first five
pairs of Fourier terms (harmonic regression parameters), where $m$ is the seasonal period (Section
7.4, Hyndman & Athanasopoulos 2021).
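The Fourier covariates can be generated with a short helper. This is a sketch only: the function name is hypothetical, and the seasonal period m = 73 is an assumption (one year of the subsampled series, if t indexes the retained observations); the paper does not state the value of m used.

```python
import math


def fourier_terms(t, m, n_pairs=5):
    """Return [sin_1t, cos_1t, ..., sin_Kt, cos_Kt] for seasonal period m."""
    terms = []
    for k in range(1, n_pairs + 1):
        angle = 2.0 * math.pi * t * k / m
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


# One row of the design matrix per time index; m = 73 is assumed here.
design_rows = [fourier_terms(t, m=73.0) for t in range(365)]
```

Each row has ten entries, and rows repeat every m time steps, which is what lets a handful of coefficients describe a smooth annual cycle.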

We model $\sigma_t$ using a gamma distribution with common parameter $a$ and time-specific parameter $b_t$,
$\sigma_t^2 \sim \text{Gamma}(a, b_t)$. We use a non-informative uniform prior $a \sim U(0, 100)$ and set
$b_t = a / \exp(X\gamma)$, where $X$ is a design matrix containing the Fourier covariates and $\gamma$ is a
vector of regression coefficients.
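As a reading aid, the choice of $b_t$ can be unpacked under one assumption: that $\text{Gamma}(a, b_t)$ follows the shape and rate convention (the convention used by Stan, in which the model is estimated; the parameterization is not stated explicitly in the text). Writing $x_t^\top$ for row $t$ of $X$, a notation introduced here only for illustration, the mean and variance are then:

```latex
\mathrm{E}[\sigma_t^2] = \frac{a}{b_t} = \exp(x_t^\top \gamma),
\qquad
\mathrm{Var}[\sigma_t^2] = \frac{a}{b_t^2} = \frac{\exp(2\, x_t^\top \gamma)}{a}.
```

Under this reading, the Fourier terms act on the log of the expected variance, mirroring the log link used for the variance model in Section 2.1, while $a$ controls how tightly $\sigma_t^2$ concentrates around that expectation.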

3.4 Results
The data are missing 1654 temperature values out of 15330 observation periods. To illustrate our
approach, we also remove 20% of the non-missing observations to form a test set. We aim to
estimate these missing values and compare the estimates with the original values. Thus, the
model is trained using 80% of the non-missing data, with 20% used for testing the prediction
accuracy. Figure 2 shows the water temperature values in the training data for each of the spatial
locations.

The model is estimated in Stan using a Hamiltonian Monte Carlo procedure. We use 3 chains
each composed of 6,000 samples and we discard a burn-in of 4,000 samples.

[Figure 2 plot: Spatial locations (y-axis) against Date, 2011 to 2016 (x-axis), with a colour scale for Temperature from 0 to 20.]

Figure 2: Daily mean temperature (°C) values. The gray areas represent periods that are missing from
the training data.

We found that an AR model of order eight offered the best fit in terms of root mean square prediction
error (RMSPE). Figure 3 shows the observed water temperature (training set = blue) and the
estimated values (testing set = red) in each of the 42 spatial locations. We note that the model
captures the periodicity in the data well and produces good estimates of the missing temperature
values when comparing the predictions with the hold-out data.

We also assessed the uncertainty in the estimates using highest posterior density intervals.
Figure 4 shows these for the first two spatial locations. In both locations, the model captures
the periodic patterns, including backcasting to predict the initial missing sections of the time
series in location two. Overall, the proportion of observations in which the nominal 95% highest
density interval of the estimated mean temperature contains the true value is 0.946.
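The coverage figure quoted above is a simple proportion, which can be sketched as follows. The function name and the toy numbers are hypothetical and are not taken from the study; the interval bounds would come from the posterior draws.

```python
def interval_coverage(truth, lower, upper):
    """Fraction of held-out values lying inside their interval."""
    hits = sum(1 for y, lo, hi in zip(truth, lower, upper) if lo <= y <= hi)
    return hits / len(truth)


# Toy example with four held-out temperatures and their interval bounds.
truth = [5.0, 7.5, 9.0, 12.0]
lower = [4.0, 6.0, 9.5, 10.0]
upper = [6.0, 8.0, 11.0, 13.0]
coverage = interval_coverage(truth, lower, upper)  # third value is missed
```

A coverage close to the nominal level, as reported here, suggests the interval widths are well calibrated on the held-out data.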

The posterior distributions of the regression coefficients β indicate that the spatial covariates
and air temperature substantially affect the stream temperature (Figure 5), with the three pairs
of Fourier terms explaining seasonality changes in the response variable well.

The posterior means of the daily standard deviation indicate that the standard deviation in
summer was three times higher than in the winter (Figure 6).

An ACF plot of the standardized time series corresponding to site 2 is presented in Appendix A.

[Figure 3 plot: one panel per spatial location (1 to 42), Temperature (y-axis) against Date, 2011 to 2016 (x-axis).]

Figure 3: Time series of stream temperature in the 42 spatial locations. Points in blue represent the
training set, while the predictions for the missing periods are given in red.

4 Application: Predicting lag time on river flow

4.1 Automated in-situ sensors

Traditional methods of monitoring water quality include collecting water samples at low
frequencies (monthly, bi-monthly) and conducting lab-based assessments to measure water quality.
With advancements in technology, this manual process is being replaced or augmented by
automated, high-frequency in-situ sensors.

In river systems, these in-situ sensors are typically placed at one or more locations to measure
multiple water-quality variables semi-continuously. The resultant data can be used for different
kinds of analyses such as identifying trends in water quality and predicting sediment and
nutrient concentrations through space and time (Leigh et al. 2019).

[Figure 4 plot: two panels, Temperature (y-axis) against Date, 2011 to 2016 (x-axis).]

Figure 4: Time series of two spatial locations. Points in blue represent the training set, while the
predictions for the missing periods are given in red along with the 95% highest posterior
density intervals.

[Figure 5 plot: coefficients βintercept, βslope, βelev, βcum_drain, βair_temp, βsin1, βcos1, βsin2, βcos2, βsin3 and βcos3, on a horizontal axis from −2 to 4.]

Figure 5: Posterior means of the regression coefficients.

4.2 Lag time

When analysing sensor data from multiple locations along the same water flow path, it is useful
to know the lag time between sensor locations, as this facilitates prediction of downstream water
quality using information collected upstream. Lag time can be defined hydrologically in many
ways. For example, Wanielista, Kersten, Eaglin, et al. (1997) defined it as “time from the centroid of
rainfall excess to the time of peak runoff for a watershed”. Here we define the lag time specifically as
“the time it takes water to flow downstream from an upstream location”.

[Figure 6 plot: Posterior mean of the SD (y-axis, 1.0 to 3.5) against Day of the year (x-axis, 0 to 300).]

Figure 6: Posterior means of the standard deviation (σ).

Estimating lag time is necessary in order to compute lagged, explanatory water-quality variables
which can be used as predictors in models of the downstream response variable of interest. One
approach for estimating lag time is to use empirical equations based on the length and slope of
the flow path and other catchment features (Green & Nelson 2002; Li & Chibber 2008). Another
approach uses water level and flow (i.e., discharge) data. For example, Seyam & Othman (2014)
used this approach to estimate lag time between four upstream locations and a downstream
location in the Selangor River basin. Their method involved plotting the hydrograph for the
downstream location and then water level at the upstream location during high flow events and
estimating the time difference between peaks of the two plots. The average time difference was
then considered as the lag time between the two locations.

Field-based methods to estimate the lag time include injecting salt tracers (usually Sodium
Chloride or Sodium Bromide) at an upstream location and measuring the salt concentration
through time at a downstream location, from which the travel time is then estimated. This
manual process has to be carried out several times a year during both high flows and low flows,
which is costly and time-consuming.

4.3 Estimating lag time via conditional cross-correlations

Lag time can be influenced by various environmental conditions upstream, such as the water
level, discharge, temperature and other water-quality variables. Therefore, we propose a
method to estimate the lag time between two sensor locations in a river using conditional
cross-correlations.


Suppose $x_t$ and $y_t$, observed at times $t = 1, \dots, T$, denote the same water-quality variable
measured at upstream and downstream sensors respectively. Let $z_t$ be a $p$-dimensional vector of
other water-quality variables measured at the upstream sensor at time $t$. We estimate $c_k(z_t)$, the
cross-correlation between $y_{t+k}$ and $x_t$ at lags $k = 1, \dots, K$, conditional on $z_t$, using the model
described in Section 2.4. Then, we can use the estimates of conditional cross-correlations to
estimate the river lag time between the two locations. We define this lag time as $d_t$, and estimate
it using

$$\hat{d}_t(z_t) = \underset{k}{\operatorname{argmax}}\; \hat{c}_k(z_t). \tag{7}$$
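Equation (7) amounts to picking the lag with the largest fitted conditional cross-correlation at a given covariate vector. The sketch below assumes those fitted values are already available as a mapping from lag to value; the function name and the numbers are hypothetical, and the conversion to minutes uses the 5-minute sampling interval of the Pringle Creek data described in Section 4.6.

```python
def estimate_lag(cross_corr_by_lag, minutes_per_step=5):
    """Return (lag, lag in minutes) maximizing the conditional CCF.

    cross_corr_by_lag maps each lag k to the fitted value of c_k(z_t) at one
    covariate vector z_t. Ties are resolved in favour of the first lag seen.
    """
    best_lag = max(cross_corr_by_lag, key=cross_corr_by_lag.get)
    return best_lag, best_lag * minutes_per_step


# Hypothetical fitted cross-correlations at a single z_t.
c_hat = {8: 0.41, 9: 0.47, 10: 0.52, 11: 0.50, 12: 0.44}
lag, minutes = estimate_lag(c_hat)
```

Repeating this for every observed covariate vector gives a lag time that varies with upstream conditions rather than a single fixed value.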

4.4 Computing bootstrapped confidence intervals for dt

Computing the standard errors and confidence intervals for dt is not straightforward, so we use
a bootstrap method. We resample the residuals from the various models used in the conditional
cross-correlation calculation to generate new data. Because these residuals are serially correlated,
we use a sieve bootstrap approach (Bühlmann 1997) to capture the autocorrelation structure
in the data in our bootstrap samples. The following algorithm describes our approach for
computing these confidence intervals.

Algorithm (Sieve bootstrap confidence intervals for $d_t$)

Recall that we have fitted the following separate GAMs for each $k$:

$$y_t^* x_{t-k}^* = \eta^{-1}\Big(\phi_0 + \sum_{i=1}^{p} s_i(z_{i,t})\Big) + \varepsilon_{t,k}.$$

Since the $\varepsilon_{t,k}$ are serially correlated, we fit a $p_k$th-order autoregressive model for $\varepsilon_{t,k}$ for each $k$:

$$\varepsilon_{t,k} = \mu_k + \sum_{i=1}^{p_k} \psi_{i,k}\, \varepsilon_{t-i,k} + \zeta_{t,k}, \qquad \zeta_{t,k} \sim N(0, \sigma_k^2).$$

For each model, the order $p_k$ is determined by minimizing the corrected Akaike Information
Criterion (AICc) using the auto.arima function from the forecast package (Hyndman et al.
2022; Hyndman & Khandakar 2008). Then we resample from $\hat{\zeta}_{t,k}$ to generate our bootstrap
sample following these steps.

1. Randomly select with replacement a sample of size $T$ from $\hat{\zeta}_{t,k} = \varepsilon_{t,k} - \hat{\mu}_k - \sum_{i=1}^{p_k} \hat{\psi}_{i,k}\, \varepsilon_{t-i,k}$. Denote this sample as $\hat{\zeta}^b_{t,k}$.

2. Compute $\varepsilon^b_{t,k}$ as $\varepsilon^b_{t,k} = \hat{\mu}_k + \sum_{i=1}^{p_k} \hat{\psi}_{i,k}\, \varepsilon^b_{t-i,k} + \hat{\zeta}^b_{t,k}$ for $k = 1, \dots, K$.

3. Compute $(y_t^* x_{t-k}^*)^b = \eta^{-1}\Big(\hat{\phi}_0 + \sum_{i=1}^{p} \hat{s}_i(z_{i,t})\Big) + \varepsilon^b_{t,k}$ for $k = 1, \dots, K$.

4. Fit the following GAM to the bootstrapped data for each $k$:

$$(y_t^* x_{t-k}^*)^b = \eta^{-1}\Big(\phi_0 + \sum_{i=1}^{p} s_i(z_{i,t})\Big) + \varepsilon_{t,k}.$$

5. Use the models in step 4 to compute $d_t$ for a given set of $z_i$.

6. Repeat steps 1 to 5 for $b = 1, \dots, m$, where $m = 1000$. Thereby we will get a sample of $d_t$
of size $m$ which can be used to form the empirical distribution of $d_t$. Use this sample to
compute the $(\alpha/2)$th and $(1 - \alpha/2)$th quantiles, which represent the lower and upper bounds
of the $100(1 - \alpha)\%$ confidence interval for $d_t$.
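The resampling logic of steps 1 to 3 and the percentile interval of step 6 can be sketched in simplified form. This is not the procedure used in the paper: the autoregressive order is fixed at one instead of being selected by AICc, the AR model is fitted by ordinary least squares, a single series is used rather than one per lag, and the refitting of GAMs in steps 4 and 5 is replaced by a stand-in statistic (the mean of the bootstrapped series). All function names and the simulated data are assumptions.

```python
import random


def fit_ar1(e):
    """Least-squares fit of e[t] = mu + psi * e[t-1] + zeta[t]."""
    x, y = e[:-1], e[1:]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    psi = sxy / sxx
    mu = y_bar - psi * x_bar
    zeta = [yi - mu - psi * xi for xi, yi in zip(x, y)]  # innovations
    return mu, psi, zeta


def sieve_bootstrap_series(fitted, resid, rng):
    """One bootstrap copy of fitted + resid (steps 1 to 3, AR(1) case)."""
    mu, psi, zeta = fit_ar1(resid)
    e_b = [resid[0]]                       # start recursion at observed value
    for _ in range(1, len(resid)):
        # Resample an innovation with replacement and rebuild the residual.
        e_b.append(mu + psi * e_b[-1] + rng.choice(zeta))
    return [f + e for f, e in zip(fitted, e_b)]


def percentile_interval(draws, alpha=0.05):
    """Order-statistic estimates of the alpha/2 and 1 - alpha/2 quantiles."""
    s = sorted(draws)
    lo = s[int((alpha / 2) * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi


rng = random.Random(42)
T = 200
fitted = [0.5] * T                         # hypothetical fitted values
resid = [0.0]
for _ in range(T - 1):                     # simulate AR(1) residuals
    resid.append(0.6 * resid[-1] + rng.gauss(0.0, 0.1))

draws = []
for b in range(500):
    y_b = sieve_bootstrap_series(fitted, resid, rng)
    draws.append(sum(y_b) / T)             # stand-in for steps 4 and 5
lo, hi = percentile_interval(draws)
```

Resampling the innovations rather than the residuals themselves is what preserves the serial correlation in each bootstrap series; in the full algorithm the stand-in statistic would be replaced by the lag time obtained from the refitted models.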

4.5 Study area and water-quality data

We consider Pringle Creek, one of the NEON (National Ecological Observatory Network)
aquatic sites located in Wise County, Texas, and managed by the U.S. Forest Service (see footnote 1). A detailed
description of the study site is given in Appendix B.

Water quality is measured in Pringle Creek using two sensor locations situated about 200 m apart,
with a small tributary entering the main creek between the two sensors. The variables measured
by these sensors include turbidity (Formazin Nephelometric Unit), specific conductance, pH,
dissolved oxygen, and chlorophyll. Measurements are available at 1-minute frequencies and
can be retrieved from NEON Data Portal (National Ecological Observatory Network (NEON)
2021d). Surface water level and water temperature are also available from both locations at
5-minute frequencies and can be retrieved from National Ecological Observatory Network
(NEON) (2021a) and National Ecological Observatory Network (NEON) (2021c), respectively.

The data we consider were collected from 1 October 2019 to 31 December 2019. This time span
avoids the summer period in which surface pools of water disconnect and contains the least
number of missing observations after removing the anomalies.

We will use turbidity to compute the cross-correlations between upstream and downstream
sensors. Turbidity is chosen because it is heavily influenced by fresh inputs of water from
upstream, and hence there should be a strong relationship between upstream and downstream
turbidity. We choose water level and temperature as the covariates from the upstream sensor to
model the cross-correlation between upstream and downstream turbidity.

Appendix B discusses the data pre-processing, anomaly detection, missing value imputation,
and variable selection steps of the analysis.
1 https://www.neonscience.org/field-sites/prin


4.6 Conditional cross-correlation between turbidity at upstream and downstream sensors
Based on the NEON reaeration sampling protocol (National Ecological Observatory Network
(NEON) 2021b) and information gathered from field experts, the time it takes water to travel
downstream between two sensor locations at Pringle Creek is typically about 45 to 60 minutes,
though it may be shorter than 45 minutes during high flows and greater than 60 minutes during
low flows. Considering this information, we choose to compute the cross-correlations up to
24 lags, which will allow for a maximum of two hours of travel time (as the frequency of the
water-quality data is 5 minutes).

Let $x_t$ and $y_t$ denote the time series of turbidity measured at the upstream and downstream
sensors, respectively. Following Section 2.4, we first normalize $x_t$ and $y_{t+k}$ for $k = 1, \dots, 24$
conditional on $z_t$. Figures 7 and 8 visualize the fitted mean and variance models for $y_{t+1}$, respectively.

Figure 7: Visualizing the fitted smooth functions in the conditional mean model for turbidity downstream
with the predictors, water level and temperature from upstream sensor. Each panel visualizes
the relationship between the response and predictor while holding other predictors at their
medians (251.6 m and 9.926 °C for water level and temperature, respectively). The smooth
function is shown in blue with 95% confidence bands. The degrees of the smoothing are shown
in the y-axis label for each plot.

Figure 8: Visualizing the fitted smooth functions in the conditional variance model for turbidity
downstream with the predictors, water level and temperature from upstream sensor. Each panel
visualizes the relationship between the response and predictor while holding other predictors at
their medians (251.6 m and 9.926 °C for water level and temperature, respectively). The smooth
function is shown in blue with 95% confidence bands. The degrees of the smoothing are shown
in the y-axis label for each plot.

Following Equation (6), the fitted conditional cross-correlation function between turbidity at the
upstream and downstream sensors at lag $k$ can be written as

$$\hat{c}_k(z_t) = \eta^{-1}\big(\hat{\phi}_0 + \hat{s}_{1,k}(\text{level\_upstream}_t) + \hat{s}_{2,k}(\text{temperature\_upstream}_t)\big), \tag{8}$$

where $\{\hat{s}_{1,k}, \hat{s}_{2,k}\}$ denote natural cubic splines. Similar to when fitting mean and variance models,
the degrees of freedom for each spline are chosen by examining the relationship between the
response and each covariate.

At lag 1, temperatures above 10 °C have a slightly negative effect on the cross-correlation
between upstream and downstream turbidity, after controlling for water level (see
Figure 9). Plots that visualize relationships at other lags can be obtained similarly.

4.7 Lag time prediction

As we described in Equation (7), lag time is defined as the lag that gives the maximum cross-
correlation conditional on the upstream variables observed at time t. This will allow dt to vary
according to the upstream river behavior.

The 80% and 95% confidence intervals for the relationship between estimated $d_t$ and each
upstream covariate $z_{i,t}$ used in our conditional cross-correlation models (see Figure 10) are
computed using the sieve bootstrap approach (Bühlmann 1997); the algorithm is described
in Section 4.4.

To visualize the relationship between $d_t$ and each covariate, we replace the remaining covariates
with their medians in the original data, and then estimate $d_t$ from the fitted model using this
modified data. Figure 10 displays the relationship between $d_t$ and each covariate. It is clear from
Figure 10 that upstream water level has a negative effect on the lag time: when the water
level increases, the lag time decreases. An increasing water level implies high freshwater inputs
and more flow, so water moves downstream in less time and the lag time decreases.
When the water level is between 251.6 and 251.8 m, the lag time is very low. In fact, there was
only one incident in November that showed a water level within this range, which occurred
during a freshwater inflow event. However, when the water level exceeds 251.8 m, the lag
time increases, deviating from its previous pattern. It is unclear what exactly happens in
that instance; however, the original data indicate that the water level was higher than 251.8 m
only during a single event in November 2019 (see Figure 19 in Section B.4).

On the other hand, when the upstream temperature is below 10 °C, it has a positive effect on the lag
time; that is, when the water temperature increases, the lag time also increases. This pattern is
consistent with river behavior, as water temperature can increase during dry seasons when there
is less inflow to the system, particularly if dry seasons occur in the warmer months. Low inflow
causes the water to move downstream more slowly, resulting in an increase in the lag time.
However, for temperatures greater than 10 °C, which mostly occur during early October and
freshwater inflow events, the lag time remains consistently low. Figure 11 shows the maximum conditional
cross-correlation and the lag time between turbidity-upstream and turbidity-downstream with
the predictors, water level and temperature from upstream sensors.

Figure 9: Visualizing the fitted smooth functions for conditional cross-correlation between
turbidity-upstream and turbidity-downstream at lag 1 with the predictors, water level and temperature
from upstream sensor. Each plot visualizes the relationship between the response and predictor
while holding other predictors at their medians (251.6 m and 9.926 °C for water level and
temperature in upstream respectively). The top panel shows the smooth terms in the predictor
scale whereas the bottom panel is in the response scale.

Figure 10: Visualizing $d_t$ with 80% and 95% bootstrap confidence intervals. Each panel visualizes $d_t$ vs
each upstream covariate while holding the remaining upstream covariates at their medians
(251.6 m and 9.926 °C for water level and temperature, respectively).

4.8 Evaluation
We can use the estimated $d_t$ to compute the lead variable, $y_{t+d_t}$, from the downstream sensor
(or, equivalently, the lag variable, $x_{t-d_t}$, from the upstream sensor). We expect $y_{t+d_t}$ to have the maximum conditional
cross-correlation with $x_t$ compared to any $y_{t+k}$ for $k = 1, \dots, 24$. That is, ideally we expect
$E[y^*_{t+d_t} x^*_t \mid z_t] > E[y^*_{t+k} x^*_t \mid z_t]$ for all lags $k$ and times $t$, where $x^*_t$ and $y^*_{t+d_t}$ are the conditionally
normalized series of $x_t$ and $y_{t+d_t}$ with respect to $z_t$. To evaluate this, we first fit a GAM to $y^*_{t+d_t} x^*_t$
using $z_t$ as the predictors and follow Section 2.4 to compute $E[y^*_{t+d_t} x^*_t \mid z_t]$. These conditional
cross-correlations are then compared with $E[y^*_{t+k} x^*_t \mid z_t]$ for all $k$ and $t$, which were obtained
using Equation (8). The resultant conditional cross-correlations are shown in Figure 12. We can
see that $E[y^*_{t+d_t} x^*_t \mid z_t]$ is greater than $E[y^*_{t+k} x^*_t \mid z_t]$ for the majority of the time.
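
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conditional cross-correlations are taken as already estimated, the function name `share_chosen_lag_wins` and the toy numbers are hypothetical, and "wins" is read as the chosen-lag value strictly exceeding the value at every candidate lag at that time point, which is one plausible reading of the comparison.

```python
def share_chosen_lag_wins(chosen_ccf, ccf_by_lag):
    """Share of time points at which the conditional CCF at the chosen lag d_t
    exceeds the conditional CCF at every candidate lag k.

    chosen_ccf : list with one value per time point (CCF at lag d_t)
    ccf_by_lag : list of rows, one row per time point, one value per lag k
    """
    wins = 0
    for chosen, row in zip(chosen_ccf, ccf_by_lag):
        if all(chosen > value for value in row):
            wins += 1
    return wins / len(chosen_ccf)


# Toy values for three time points and three candidate lags (not real data).
chosen_ccf = [0.9, 0.7, 0.4]
ccf_by_lag = [
    [0.5, 0.8, 0.6],
    [0.2, 0.6, 0.3],
    [0.1, 0.5, 0.2],
]
share = share_chosen_lag_wins(chosen_ccf, ccf_by_lag)
```

In this toy input the chosen lag wins at the first two time points and loses at the third.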

Figure 11: Time series plots of water-quality variables and lag time between upstream and downstream
sensors for the period 01-Oct-2019 to 31-Dec-2019. (a) Time series plot of turbidity-downstream.
(b) Time series plot of turbidity-upstream. (c) Lag time between upstream
and downstream sensors with the predictors, water level and temperature from upstream
sensor. (d) Maximum conditional cross-correlation between upstream and downstream sensors
with the predictors, water level and temperature from upstream sensor.

5 Discussion
In this study we introduce a novel approach to normalize univariate time series conditional
on a set of covariates. The proposed approach uses generalized additive models to estimate
the conditional mean and variance of the time series, given a set of covariates. The conditional
mean is estimated via an additive model fitted to the time series with respect to the covariates.
The residuals from this model are then used to estimate the conditional variance, by fitting a
separate generalized additive model to the squared residuals from the conditional mean model
using the same set of covariates. We assume a gamma family with a log link in the latter model.
The estimated conditional means and variances are then used to normalize the time series.
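
The two-stage structure of this procedure can be illustrated with a small Python sketch. It is not the method used in the paper: to stay dependency-free, the generalized additive models are swapped for a crude piecewise-constant (binned) estimator on a single covariate, and the names `binned_mean` and `conditionally_normalize` are hypothetical. Only the order of the steps (mean model, then a model for the squared residuals, then normalization) mirrors the description above.

```python
from math import sqrt
from statistics import mean


def binned_mean(values, covariate, n_bins=4):
    """Piecewise-constant estimate of E[values | covariate].
    A stand-in for a fitted GAM, used here only for illustration."""
    lo, hi = min(covariate), max(covariate)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant covariate

    def bin_of(z):
        return min(int((z - lo) / width), n_bins - 1)

    groups = {}
    for v, z in zip(values, covariate):
        groups.setdefault(bin_of(z), []).append(v)
    means = {b: mean(vs) for b, vs in groups.items()}
    return [means[bin_of(z)] for z in covariate]


def conditionally_normalize(y, z, n_bins=4):
    """Return (y - m_hat) / sqrt(v_hat), with both moments estimated given z."""
    m_hat = binned_mean(y, z, n_bins)                       # conditional mean
    sq_resid = [(yi - mi) ** 2 for yi, mi in zip(y, m_hat)]
    v_hat = binned_mean(sq_resid, z, n_bins)                # conditional variance
    return [
        (yi - mi) / sqrt(vi) if vi > 0 else 0.0
        for yi, mi, vi in zip(y, m_hat, v_hat)
    ]


# Toy series whose level and spread both depend on the covariate.
z = [0, 0, 0, 0, 10, 10, 10, 10]
y = [1, 3, 1, 3, 10, 30, 10, 30]
y_star = conditionally_normalize(y, z, n_bins=2)
```

In the toy data the level and spread of `y` both change with `z`, yet after normalization both regimes fluctuate between -1 and 1.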

Normalizing a given time series in this manner will reduce some of the variation induced
through the covariates. Thus it will help to effectively model the autocorrelation of the series
via appropriate time series models. Using an empirical application, we have shown that these
normalized time series can be used to impute missing values and make predictions in stream
temperature.

Figure 12: Time plot of the conditional CCF estimated at different lags, i.e., $E[x^*_t y^*_{t+k} \mid z_t]$ for $k = 1, \dots, 24$.
The black line represents the conditional CCF at lag $d_t$, i.e., $E[x^*_t y^*_{t+d_t} \mid z_t]$. Approximately 96%
of the time, $E[x^*_t y^*_{t+d_t} \mid z_t] > E[x^*_t y^*_{t+k} \mid z_t]$.

The conditionally normalized time series can also be used to compute conditional autocorrelation
and conditional cross-correlation functions at different lags. To compute the conditional ACF at lag
$k$, we have proposed to fit an additive model to the cross product of the normalized time series
and its lagged series at $k$, using the same set of covariates used in the normalization. Similarly,
the conditional CCF at $k$ can be estimated via an additive model fitted to the cross product of
two conditionally normalized time series $k$ lags apart.
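
In symbols, the two quantities described in this paragraph can be written as conditional expectations of cross products. This is a notational sketch only: the labels $r_k$ and $c_k$ are introduced here for illustration and may differ from the notation used in Section 2.

```latex
\begin{align*}
r_k(z_t) &= \mathrm{E}\left[\, y^*_t \, y^*_{t-k} \mid z_t \,\right]
  && \text{(conditional ACF of } y_t \text{ at lag } k\text{)} \\
c_k(z_t) &= \mathrm{E}\left[\, x^*_t \, y^*_{t+k} \mid z_t \,\right]
  && \text{(conditional CCF of } x_t \text{ and } y_t \text{ at lag } k\text{)}
\end{align*}
```

Because each series has been normalized to have approximately zero conditional mean and unit conditional variance, the expected cross product plays the role of a correlation.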

We have further shown that the conditional cross-correlations can be used to estimate the water
travel time between two locations in a river. This lag time between two river locations varies in
response to the upstream river behavior. Thus we proposed to estimate this lag time conditional
on the upstream river behavior as observed by the water-quality variables measured at the
upstream location. We first computed the cross-correlation between the same water-quality
variable measured at both upstream and downstream locations at different lags, conditional
on a set of water-quality variables measured at the upstream location. Then the lag time is
computed as the lag that gives the maximum conditional cross-correlation. The significance of
the maximum conditional cross-correlation was evaluated in a probabilistic way by computing
standard errors of the predictions in the link space and then computing t statistics. We used
this approach to estimate the water travel time between two locations in Pringle Creek, one of
the NEON aquatic sites located in Texas, USA. The results show that the estimated time lag
captures the highest correlation between the two water-quality variables measured at upstream
and downstream locations. Lag time estimation using the conditional behavior of the river
and the correlation between variables is useful in developing statistical methods for predicting
other water-quality variables of interest such as sediment and nutrient concentrations in river
networks (Leigh et al. 2019). Such data-driven approaches are also useful to complement or
replace expensive, and time-consuming field-based methods such as salt tracer experiments.
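
The lag selection and significance check described above can be sketched as follows for a single time point. Everything specific here is an assumption made for illustration: the function name `estimate_lag`, the use of 1.96 as the critical value, and the choice to return `None` when the maximum is not significant are not taken from the paper. The inputs are assumed to be link-scale predictions and their standard errors, one per candidate lag.

```python
def estimate_lag(link_fit, link_se, critical_value=1.96):
    """Pick the lag with the largest predicted conditional cross-correlation
    and report its t statistic (prediction divided by standard error).

    link_fit : dict mapping lag -> link-scale prediction at one time point
    link_se  : dict mapping lag -> standard error of that prediction
    Returns (lag or None, t statistic); None marks a non-significant maximum
    (a hypothetical convention, not the paper's).
    """
    best_lag = max(link_fit, key=link_fit.get)
    t_stat = link_fit[best_lag] / link_se[best_lag]
    lag = best_lag if abs(t_stat) > critical_value else None
    return lag, t_stat


# Toy predictions for three candidate lags (not real data).
link_fit = {1: 0.2, 2: 0.9, 3: 0.5}
link_se = {1: 0.1, 2: 0.3, 3: 0.1}
lag, t_stat = estimate_lag(link_fit, link_se)
```

Repeating this at every time point would give a time-varying lag series of the kind plotted in Figure 11(c).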

Further research could extend these approaches, for example, considering vector autoregressions
for multivariate time series problems. Similarly, models can be extended to account for spatial
dependence between sites.

Reproducibility
All code to reproduce the results in this paper is available at
https://github.com/PuwasalaG/Conditional_normalisation_in_TSA. All analysis was conducted using R (R Core Team 2021)
and Stan (Stan Development Team 2021). The methods discussed are available in the conduits
package for R (Gamakumara, Talagala & Hyndman 2023).

Acknowledgments
This project is funded by the Australian Research Council (ARC) Linkage project (grant number:
LP180101151) “Revolutionising high resolution water-quality monitoring in the information
age”. The authors acknowledge the staff members from Aquatic Instruments Science team and
the National Ecological Observatory Network (NEON), especially Guy Litt, Bobby Hensley
and Gary Henson, for their valuable explanations on the background of the Pringle Creek site
and the relationships between water-quality variables. Further, we convey our gratitude to
Erin Peterson and Claire Kermorvant for valuable discussions on the project and water-quality
characteristics.

A Other results from the stream temperature application

[Figure: autocorrelation function (ACF) against lag (0 to 25) for site 2.]

Figure 13: Autocorrelation plot of the standardized $y^*_{s=2,t}$.

B Data cleaning and preliminary analysis

B.1 Study area

Pringle Creek drains a catchment of 48.9 km² and experiences high flows in spring, when rainfall
is heaviest, and low flows during the typically dry summers. Rain can fall in winter,
typically December to January, but snow and ice do not. The average annual temperature is about
17.5 °C and the average annual precipitation is about 898 mm.

Figure 14: Pringle Creek sensor locations. The creek is shown in the blue line and the two pink circles
denote the upstream and downstream sensor locations. Image courtesy National Ecological
Observatory Network.

B.2 Data cleaning

The NEON data we use in Application 2 (Section 4) undergo an automated quality assurance
process for water-quality data as part of the rigorous NEON protocols, and each observation is
given a quality flag. The most commonly used quality flags include range flags, spike flags and
step flags (see Cawley (2021) for a detailed explanation of these quality flags). Range flags mark
obvious technical anomalies, as they identify the out-of-range observations of each water-quality
variable. Therefore we use these range flags and treat the flagged points as anomalies. However,
other quality flags do not necessarily imply technical anomalies. For example, Figure 15 shows
a section of data for turbidity at the downstream sensor at Pringle Creek, colored by the quality
flags given by the automated process. In the event of freshwater inflows, turbidity tends to
increase and then gradually decrease, which is the natural behavior of turbidity in many river systems.
However, these types of points (i.e., sudden increases in turbidity) are flagged by the automated
process, even though they are unlikely to be technical anomalies. Therefore we did not directly
use any quality flags other than range flags in this study, and treat the points shown in Figure 16
as technical anomalies.

[Figure: turbidity (0 to 1500) against timestamp (2019-11-03 to 2019-11-15); legend: valid data, spike flag, null flag, step flag.]

Figure 15: Turbidity at downstream sensor colored by the quality flags. Note that only turbidity series
had technical anomalies. Other series did not show any technical anomalies during the study
period we chose.

Figure 16: Time plots of the variables used in the study for the period spanning from 01-Oct-2019 to
31-Dec-2019. The anomalous points we identified are colored in red.

Apart from these anomalous points, we also noticed that every fifth observation of turbidity
at both sites is anomalous as a result of the wiper on the optical turbidity sensor operating
(i.e., wiping any biofouling off the sensor probe) every five minutes (Figure 17). Therefore we
also discarded these anomalous points from the turbidity series at both sites.
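
As a minimal sketch of this step, the following Python function masks every fifth reading in a regularly sampled series. The function name `mask_wiper_readings` and the `wiper_offset` argument are hypothetical: the position of the wiper reading within each group of five is not stated here and would have to be identified from the data. The sketch marks the readings as missing (`None`) rather than deleting them, an assumption made so that the time index stays aligned.

```python
def mask_wiper_readings(values, wiper_offset, period=5):
    """Return a copy of `values` in which every `period`-th reading,
    starting at index `wiper_offset`, is replaced by None."""
    return [
        None if i % period == wiper_offset else v
        for i, v in enumerate(values)
    ]


# Toy turbidity readings with wiper spikes at positions 2 and 7 (not real data).
turbidity = [1.0, 1.1, 9.0, 1.2, 1.0, 1.1, 1.0, 8.5, 1.1, 1.2]
cleaned = mask_wiper_readings(turbidity, wiper_offset=2)
```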

[Figure: turbidity (FNU) against timestamp, 14-Oct-2019 00:00 to 15-Oct-2019 00:00; legend: valid data, wiper anomaly.]

Figure 17: Wiper anomalies in the downstream turbidity sensor. This plot shows data for 14-Oct-2019 only,
to distinguish the scale difference between wiper anomalies and typical data.

B.3 Preliminary analysis

Prior to the analysis, we removed all the anomalous points (see Appendix B.2), then aggregated
the observations for all water-quality variables into 5-minute frequency data (Figure 18). The
resultant time series reflect the typical behavior of water quality in many rivers. Figure 19
is similar to Figure 18 except that turbidity and water level are plotted on the log scale in Figure 18.
Changing the scale of these variables allows us to see the internal variation in each of
them.

Turbidity tends to increase when freshwater flows into the river (when water level rises) as
this will increase the suspended particles in water. In contrast, conductance tends to decrease
with freshwater inflows as the water becomes diluted (Leigh et al. 2019) (see Figure 18). This
behavior also explains why the relationship between water level at the upstream location and
turbidity at the upstream site is much stronger than that between the upstream and downstream
locations (see Figure 20).

We then chose the set of covariates from the upstream sensor to model the cross-correlation
between upstream and downstream turbidity by visually analysing the relationship between
the variables. From Figure 20, we can see that the upstream water level, temperature and
conductance show non-linear relationships with both turbidity series. In contrast, dissolved-
oxygen does not show much relationship with turbidity. We also see that water level and
conductance have a strong non-linear relationship. Hence, choosing both water level and
conductance as covariates could lead to multicollinearity problems. Given these observations,
we chose water level and temperature as the covariates to compute conditional cross-correlations
between upstream and downstream turbidity series.

Prior to the remaining analysis, we impute the missing values in upstream level and temperature
(see Figure 21 to visualize the percentage of missing values). The missing values in water level
are imputed using linear interpolation whereas the missing values in temperature are imputed
using a Kalman-smoother implemented in the imputeTS R package (Moritz & Bartz-Beielstein
2017), based on a state space representation of the ARIMA model chosen by auto.arima
implemented in the forecast R package (Hyndman et al. 2022; Hyndman & Khandakar 2008).
Figure 22 plots the time series for level and temperature with the imputations.
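
For the water level series, linear interpolation amounts to drawing a straight line between the nearest observed neighbours of each gap. The sketch below shows the idea in plain Python under two assumptions: observations are equally spaced in time, and gaps at the very start or end of the series are left missing. The function name `interpolate_linear` is hypothetical, and the sketch does not reproduce the Kalman-smoother imputation used for temperature.

```python
def interpolate_linear(values):
    """Fill interior runs of None by linear interpolation between the
    nearest observed neighbours; leading and trailing gaps stay None."""
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Toy water levels in metres with two gaps (not real data).
level = [251.5, None, None, 251.8, None, 251.6]
level_filled = interpolate_linear(level)
```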

B.4 Imputing missing values of turbidity upstream via conditional normalization
To estimate the lag time between two sensors, we compute the conditional cross-correlations,
assuming that the lag time depends on the water-quality at the upstream location. First we will
normalize the two turbidity series conditional on the upstream level and temperature.

Figure 21 shows that upstream turbidity has a considerable number of missing values, which
might affect the lag time estimation. Therefore, we first impute these missing values following
the method explained in Section 2.2. Let upstream turbidity be denoted by $x_t$; its conditionally
normalized series is given by $x^*_t = \frac{x_t - \hat{m}_x(z_t)}{\sqrt{\hat{v}_x(z_t)}}$, where $z_t$ contains the water level and
temperature measured at the upstream sensor. The conditional means, $\hat{m}_x(z_t)$, and variances,
$\hat{v}_x(z_t)$, are computed using Equations (2) and (3), respectively. We use the generalized additive
models implemented in the mgcv R package (Wood 2020; Wood 2017). Thin plate regression
splines (Wood 2003) are used as the smooth function for each covariate. We can set the dimension
$k$ of the smoother by observing the relationship between the response and the predictor. Caution
should be taken when choosing $k$, because larger values can lead to overfitting due to the
underlying autocorrelation in the time series. See Wood (2017) for a discussion on choosing $k$ in
GAMs.

Figure 18: Time plots of water-quality variables for the period 01-Oct-2019 to 31-Dec-2019. All variables
are aggregated into 5-minute frequencies, post anomaly removal (see Section B.2). Turbidity
and water level are shown on the log scale. Observations highlighted in orange show examples
of patterns in water quality when there are freshwater inflows (water level rises).

Figure 19: Time plots of water-quality variables for the period 01-Oct-2019 to 31-Dec-2019. All variables
are aggregated into 5-minute frequencies, post anomaly removal (see Section B.2).
All water-quality variables are plotted in their original scale. A couple of instances are
highlighted in orange to show the patterns in water-quality variables when there are freshwater
inflows.

Figure 20: Pairwise scatter plots between the turbidity and other covariates from the upstream sensor.
The upper triangle shows Pearson's correlation coefficient for each pair.

From Figure 23 it can be seen that turbidity has a positive relationship with water level when
adjusted for temperature and the relationship is stronger when the water level is higher. However,
temperature does not seem to have much effect on turbidity when adjusted for water level.

To compute the conditional variance, we model the squared residuals from the conditional
mean models assuming a Gamma family with a log link as in Equation (3). Figure 24 shows the
relationship between the response and each predictor in the fitted conditional variance model.

29
Conditional normalization in time series analysis

[Figure: missing-value map of the variables; y-axis: Observations (0 to 20000); legend: Missing (2.1%), Present (97.9%).]

Figure 21: Visualization of the missing values in the variables.

Figure 22: Time plot of upstream level and temperature with imputed values. Points in orange denote
the imputed missing values.

30
Conditional normalization in time series analysis

Figure 23: Visualizing the fitted smooth functions in the conditional mean model for turbidity upstream
with the predictors, water level and temperature from upstream sensor. Each panel visualizes
the relationship between the response and predictor while holding other predictors at their
medians (251.6 m and 9.926 °C for water level and temperature, respectively). The smooth
function is shown in blue and the black points are the partial residuals. The degrees of the
smoothing are shown in the y-axis label for each plot.

Figure 24: Visualizing the fitted smooth functions in the conditional variance model for turbidity
upstream with the predictors, water level and temperature from upstream sensor. Each panel
visualizes the relationship between the response and predictor while holding other predictors at
their medians (251.6 m and 9.926 °C for upstream water level and temperature, respectively).
The smooth function is shown in blue and the black points are the partial residuals. The
degrees of the smoothing are shown in the y-axis label for each plot.

31
Conditional normalization in time series analysis

The normalized series for turbidity upstream is shown in Figure 25. To impute the missing
values of turbidity upstream, we fit an ARIMA model to $x^*_t$ using the auto.arima function
from the forecast R package (Hyndman et al. 2022; Hyndman & Khandakar 2008). Following
Equation (4) we then impute the missing values of $x_t$ as $\hat{x}^*_t \sqrt{\hat{v}_x(z_t)} + \hat{m}_x(z_t)$, where $\hat{x}^*_t$ is computed
using the Kalman smoother implemented in the imputeTS R package (Moritz & Bartz-Beielstein
2017). Note that this method can produce negative imputed values. Since turbidity
cannot be negative, we have removed those points in the remaining analysis (see Figure 26).
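
The back-transformation from the normalized scale to the original scale is a one-line calculation, sketched here in Python. The function name `back_transform` is hypothetical, the inputs are assumed to be an imputed normalized value together with the estimated conditional mean and variance at the same time point, and returning `None` for a negative result is simply one way to represent the removal described above.

```python
from math import sqrt


def back_transform(x_star_hat, m_hat, v_hat):
    """Map an imputed normalized value back to the original scale as
    x_star_hat * sqrt(v_hat) + m_hat; return None if the result is negative,
    since turbidity cannot be negative."""
    x_hat = x_star_hat * sqrt(v_hat) + m_hat
    return x_hat if x_hat >= 0 else None


# Toy values (not real data).
imputed = back_transform(1.5, m_hat=2.0, v_hat=4.0)      # 1.5 * 2 + 2
discarded = back_transform(-2.0, m_hat=1.0, v_hat=1.0)   # negative, so dropped
```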

[Figure: time plot of turbidity upstream (FNU) against timestamp (Oct to Jan), and histogram of counts against turbidity upstream (FNU).]

Figure 25: Conditionally normalized upstream and downstream turbidity.

32
Conditional normalization in time series analysis

Figure 26: Time plot of upstream turbidity and temperature with imputed values. Points in orange
denote the imputed missing values.

References
Bal, G, E Rivot, JL Baglinière, J White & E Prévost (2014). A hierarchical Bayesian model to
quantify uncertainty of stream water temperature forecasts. PLoS One 9(12), e115659.
Bühlmann, P (1997). Sieve Bootstrap for Time Series. Bernoulli 3(2), 123–148.
http://www.jstor.org/stable/3318584.
Cawley, KM (2021). NEON Algorithm Theoretical Basis Document (ATBD): Water Quality.
https://data.neonscience.org/api/v0/documents/NEON.DOC.004931vB.
Gamakumara, P, PD Talagala & RJ Hyndman (2023). conduits: CONDitional UI for Time Series
normalisation. R package version 1.0.0. https://github.com/PuwasalaG/conduits.
Green, JI & EJ Nelson (2002). Calculation of time of concentration for hydrologic design and
analysis using geographic information system vector objects. Journal of Hydroinformatics 4(2),
75–81.
Hastie, TJ & RJ Tibshirani (1990). Generalized additive models. Vol. 43. CRC press.
Hrachowitz, M, P Benettin, BM Van Breukelen, O Fovet, NJ Howden, L Ruiz, Y Van Der Velde
& AJ Wade (2016). Transit times—The link between hydrology and water quality at the
catchment scale. Wiley Interdisciplinary Reviews: Water 3(5), 629–657.
Hyndman, R, G Athanasopoulos, C Bergmeir, G Caceres, L Chhay, M O’Hara-Wild, F Petropoulos,
S Razbash, E Wang & F Yasmeen (2022). forecast: Forecasting functions for time series and
linear models. R package version 8.16. https://pkg.robjhyndman.com/forecast/.
Hyndman, RJ & G Athanasopoulos (2021). Forecasting: principles and practice. 3rd ed. Melbourne,
Australia: OTexts. OTexts.org/fpp3.
Hyndman, RJ & Y Khandakar (2008). Automatic time series forecasting: the forecast package for
R. Journal of Statistical Software 26(3), 1–22.
Isaak, DJ, SJ Wenger, EE Peterson, JM Ver Hoef, DE Nagel, CH Luce, SW Hostetler, JB Dunham,
BB Roper, SP Wollrab, et al. (2017). The NorWeST summer stream temperature model and
scenarios for the western US: A crowd-sourced database and new geospatial tools foster a
user community and predict broad climate warming of rivers and streams. Water Resources
Research 53(11), 9181–9205.
Leigh, C, S Kandanaarachchi, JM McGree, RJ Hyndman, O Alsibai, K Mengersen & EE Peterson
(2019). Predicting sediment and nutrient concentrations from high-frequency water-quality
data. PloS one 14(8), e0215503.
Li, MH & P Chibber (2008). Overland flow time of concentration on very flat terrains.
Transportation Research Record 2060(1), 133–140.
Li, Y, Y Zhu, L Chen & Z Shen (2018). The time delay of flow and sediment in the Middle and
Lower Yangtze River and its response to the Three Gorges Dam. Journal of Hydrometeorology
19(3), 625–638.
Moritz, S & T Bartz-Beielstein (2017). imputeTS: Time Series Missing Value Imputation in R. The
R Journal 9(1), 207–218.
National Ecological Observatory Network (NEON) (2021a). Elevation of surface water
(DP1.20016.001). en. https://data.neonscience.org/data-products/DP1.20016.001/RELEASE-2021.
National Ecological Observatory Network (NEON) (2021b). Reaeration field and lab collection
(DP1.20190.001). en. https://data.neonscience.org/data-products/DP1.20190.001.
National Ecological Observatory Network (NEON) (2021c). Temperature (PRT) in surface water
(DP1.20053.001). en. https://data.neonscience.org/data-products/DP1.20053.001/RELEASE-2021.
National Ecological Observatory Network (NEON) (2021d). Water quality (DP1.20288.001). en.
https://data.neonscience.org/data-products/DP1.20288.001/RELEASE-2021.

Ogasawara, E, LC Martinez, D De Oliveira, G Zimbrão, GL Pappa & M Mattoso (2010). Adaptive
normalization: A novel data normalization approach for non-stationary time series. In: The
2010 International Joint Conference on Neural Networks (IJCNN). IEEE, pp.1–8.
R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing. Vienna, Austria. https://www.R-project.org/.

Seyam, M & F Othman (2014). The influence of accurate lag time estimation on the performance
of stream flow data-driven based models. Water resources management 28(9), 2583–2597.
Stan Development Team (2021). Stan Modeling Language Users Guide and Reference Manual. Version
2.28. https://mc-stan.org.
Vafaeipour, M, O Rahbari, MA Rosen, F Fazelpour & P Ansarirad (2014). Application of sliding
window technique for prediction of wind velocity time series. International Journal of Energy
and Environmental Engineering 5(2), 1–7.
Van der Velde, Y, G De Rooij, J Rozemeijer, F Van Geer & H Broers (2010). Nitrate response
of a lowland catchment: On the relation between stream concentration and travel time
distribution dynamics. Water Resources Research 46(11).
Wanielista, M, R Kersten, R Eaglin, et al. (1997). Hydrology: Water quantity and quality control. John
Wiley and Sons.
Wood, S (2020). mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation. R
package version 1.8-33. https://cran.r-project.org/package=mgcv.
Wood, SN (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 65(1), 95–114.
Wood, SN (2017). Generalized additive models: an introduction with R. CRC press.
Xie, R, M Zhang, P Venkatraman, X Zhang, G Zhang, R Carmer, SA Kantola, CP Pang, P Ma, M
Zhang, et al. (2019). Normalization of large-scale behavioural data collected from zebrafish.
Plos one 14(2), e0212234.

No ratings yet
Language Learning in Early Childhood - Lightbown & Spada (2006)
14 pages
Econm
No ratings yet
Econm
5 pages
Baker
No ratings yet
Baker
10 pages
Tensor Structures and Applications: Definitive Reference for Developers and Engineers
From Everand
Tensor Structures and Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Explorations in Computational Physics
From Everand
Explorations in Computational Physics
Devang Patil
No ratings yet
Stochastic Calculus and Brownian Motion
From Everand
Stochastic Calculus and Brownian Motion
Tejas Thakur
No ratings yet
Foundations of Mathematical Physics
From Everand
Foundations of Mathematical Physics
Chirag Verma
No ratings yet
Analytical Methods of Optimization
From Everand
Analytical Methods of Optimization
D. F. Lawden
No ratings yet
An Introduction to Probability and Stochastic Processes
From Everand
An Introduction to Probability and Stochastic Processes
James L. Melsa
4.5/5 (2)
Measurement of Length - Screw Gauge (Physics) Question Bank
From Everand
Measurement of Length - Screw Gauge (Physics) Question Bank
Mohmmad Khaja Shareef
No ratings yet
Acceptance-Rejection Sampling and Multi-dimensional Monte Carlo Integrations Utilizing Mathematica®
From Everand
Acceptance-Rejection Sampling and Multi-dimensional Monte Carlo Integrations Utilizing Mathematica®
SUJAUL CHOWDHURY
No ratings yet
Continuum Mechanics
From Everand
Continuum Mechanics
A. J. M. Spencer
3/5 (1)
Computer Methods in Power Systems Analysis with MATLAB
From Everand
Computer Methods in Power Systems Analysis with MATLAB
Sekhar Chandra P.
No ratings yet


Keywords: conditional normalization, missing value imputation, conditional autocorrelation, conditional cross-correlation, lag time estimation, stream data, water quality

Common data normalization methods, such as min-max transformation or standardization (also

We also highlight two straightforward empirical applications of conditional normalization of

2 Conditional estimation via GAMs

2.1 Conditional normalization

Next, we fit the model

$[y_t - \hat{m}(z_t)]^2 \sim \mathrm{Gamma}(v(z_t), r),$
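The two-stage fit above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a cubic polynomial basis stands in for the GAM smooths, a log link is assumed for the Gamma variance model, and the helper names (`poly_basis`, `fit_gamma_log_link`, `conditional_normalize`) are hypothetical. The final step, dividing the residual by the square root of the fitted conditional variance, is the assumed form of the normalization.

```python
import numpy as np

def poly_basis(z, degree=3):
    # Columns 1, z, z^2, ..., z^degree. Assumption: a polynomial basis
    # stands in for the GAM smooth terms.
    return np.vander(np.asarray(z, dtype=float), degree + 1, increasing=True)

def fit_gamma_log_link(X, y, n_iter=50):
    # Fisher scoring for a Gamma GLM with log link (assumed link).
    # For this family/link pair the working weights are constant,
    # so each step is an ordinary least-squares fit.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        working = eta + (y - mu) / mu
        beta, *_ = np.linalg.lstsq(X, working, rcond=None)
    return beta

def conditional_normalize(y, z, degree=3):
    y = np.asarray(y, dtype=float)
    X = poly_basis(z, degree)
    # Stage 1: conditional mean m(z) by least squares.
    coef_m, *_ = np.linalg.lstsq(X, y, rcond=None)
    m_hat = X @ coef_m
    # Stage 2: conditional variance v(z) from the squared residuals.
    coef_v = fit_gamma_log_link(X, (y - m_hat) ** 2)
    v_hat = np.exp(X @ coef_v)
    # Assumed normalization: residual divided by the conditional SD.
    return (y - m_hat) / np.sqrt(v_hat), m_hat, v_hat
```

Calling `conditional_normalize(y, z)` with a scalar covariate `z` returns the normalized series together with the fitted conditional mean and variance; with several covariates the basis would need one block of columns per covariate.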

2.2 Imputation of missing values

We can then impute yt using

2.3 Conditional autocorrelation function

$r_k(z_t) = E[y^*_t y^*_{t-k} \mid z_t]$ for $k = 1, 2, \dots$

$y^*_t y^*_{t-k} \sim N(r_k(z_t), \sigma_k^2),$

where $h_i(\cdot)$ are smooth functions and

2.4 Conditional cross-correlation function

$x_t - \hat{m}_x(z_t)$, $\quad y_t - \hat{m}_y(z_t)$

Then we can estimate the conditional cross-correlation

$c_k(z_t) = E[y^*_{t+k} x^*_t \mid z_t]$ for $k = 1, 2, \dots$

using the GAMs

$y^*_{t+k} x^*_t \sim N(c_k(z_t), u_k^2),$
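As a rough sketch of the idea, the conditional cross-correlation at lag $k$ can be estimated by regressing the lagged products $y^*_{t+k} x^*_t$ on functions of $z_t$. The code below assumes the two series have already been conditionally normalized, uses a quadratic polynomial in a scalar $z_t$ with an identity link in place of the GAM, and the name `conditional_ccf` is hypothetical.

```python
import numpy as np

def conditional_ccf(x_star, y_star, z, k, degree=2):
    """Return a function z_new -> estimated c_k(z_new).

    Assumptions: x_star and y_star are conditionally normalized series,
    z is a scalar covariate, and a polynomial least-squares fit stands
    in for the GAM with smooth terms.
    """
    x_star = np.asarray(x_star, dtype=float)
    y_star = np.asarray(y_star, dtype=float)
    z = np.asarray(z, dtype=float)
    T = len(z)
    # Response: products y*_{t+k} x*_t for t = 1, ..., T - k.
    prod = y_star[k:] * x_star[:T - k]
    # Covariates are taken at time t, matching the conditioning on z_t.
    Z = np.vander(z[:T - k], degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(Z, prod, rcond=None)

    def predict(z_new):
        z_new = np.atleast_1d(np.asarray(z_new, dtype=float))
        return np.vander(z_new, degree + 1, increasing=True) @ coef

    return predict
```

Fitting one such model per lag gives a family of curves $\hat{c}_k(\cdot)$ that can be compared across $k$ at any covariate value of interest.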

3 Application: Stream temperature imputation

3.1 Temperature data


3.2 The conditional normalization model

3.3 GAM models

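$$
\begin{aligned}
\mu_{st} ={}& \beta_0 + \beta_1 \text{slope}_s + \beta_2 \text{elev}_s + \beta_3 \text{cd}_s + \beta_4 \text{at}_{st} \\
&+ \beta_5 \sin_{1t} + \beta_6 \cos_{1t} + \beta_7 \sin_{2t} + \beta_8 \cos_{2t} + \beta_9 \sin_{3t} + \beta_{10} \cos_{3t} \\
&+ \beta_{11} \sin_{4t} + \beta_{12} \cos_{4t} + \beta_{13} \sin_{5t} + \beta_{14} \cos_{5t}.
\end{aligned}
$$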
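The five sine and cosine pairs in this mean model are harmonic (Fourier) regressors. The sketch below builds such columns in the same order as the equation; the function name `fourier_terms` and the annual period of 365.25 days mentioned in the usage note are assumptions, since the period is not stated here.

```python
import numpy as np

def fourier_terms(t, period, K=5):
    """Build columns sin_1, cos_1, ..., sin_K, cos_K for time index t.

    Assumption: the j-th pair is sin(2*pi*j*t/period), cos(2*pi*j*t/period).
    """
    t = np.asarray(t, dtype=float)
    cols = []
    for j in range(1, K + 1):
        angle = 2.0 * np.pi * j * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)
```

For daily data one might call `fourier_terms(day_index, period=365.25)` and append the result to the site-level covariates to form the full design matrix.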

We model $\sigma_t$ using a gamma distribution with common $a$ and time-specific parameter $b_t$,

4 Application: Predicting lag time on river flow

4.1 Automated in-situ sensors


Figure 5: Posterior means of the regression coefficients.

4.2 Lag time


Figure 6: Posterior means of the standard deviation (σ).

4.3 Estimating lag time via conditional cross-correlations

Suppose $x_t$ and $y_t$, observed at times $t = 1, \dots, T$, denote the same water-quality variable

4.4 Computing bootstrapped confidence intervals for $d_t$

Algorithm (Sieve bootstrap confidence intervals for $d_t$)

Since the $\varepsilon_{t,k}$ are serially correlated, we fit a $p$th

4. Fit the following GAM to the bootstrapped data for each k:

5. Use the models in step 4 to compute $d_t$ for a given set of $z_i$.
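The resampling step of a sieve bootstrap can be sketched as follows. This is a generic illustration under assumptions, not the algorithm as specified in the paper: an AR($p$) model is fitted to a residual series by least squares, its centered innovations are resampled with replacement, and a new residual series is regenerated from the fitted recursion. The function names and the burn-in length are hypothetical choices.

```python
import numpy as np

def fit_ar(e, p):
    """Least-squares AR(p) fit; returns coefficients and centered innovations."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    target = e[p:]
    # Column j holds the series lagged by j, aligned with the target.
    lags = np.column_stack([e[p - j:n - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
    innov = target - lags @ phi
    return phi, innov - innov.mean()

def sieve_bootstrap(e, p, rng, burn_in=100):
    """Generate one bootstrap replicate of the serially correlated series e."""
    phi, innov = fit_ar(e, p)
    n = len(e)
    draws = rng.choice(innov, size=n + burn_in, replace=True)
    out = np.zeros(n + burn_in)
    for t in range(n + burn_in):
        # AR recursion driven by the resampled innovations.
        for j in range(1, p + 1):
            if t - j >= 0:
                out[t] += phi[j - 1] * out[t - j]
        out[t] += draws[t]
    # Discard the burn-in so the start-up values have little influence.
    return out[burn_in:]
```

Each replicate would then be added back to the fitted values to form bootstrapped data, the models refitted, and the quantity of interest recomputed; percentiles over many replicates give the interval.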

4.5 Study area and water-quality data

4.6 Conditional cross-correlation between turbidity at upstream and down-

$\hat{c}_k(z_t) = \eta^{-1}\big(\hat{\phi}_0 + \hat{s}_{1,k}(\text{level\_upstream})_t + \hat{s}_{2,k}(\text{temperature\_upstream})_t\big), \qquad (8)$

4.7 Lag time prediction

A Other results from the stream temperature application

Figure 13: Autocorrelation plot of the standardized $y^*_{s=2,t}$.

B Data cleaning and preliminary analysis

B.1 Study area

B.2 Data cleaning

Flag Valid data Spike Flag Null Flag Step Flag

Flag valid data wiper anomaly

B.3 Preliminary analysis

B.4 Imputing missing values of turbidity upstream via conditional normal-

Figure 21: Visualization of the missing values in the variables.


Figure 25: Conditionally normalized upstream and downstream turbidity.

