SEE5211 Chapter10 2017 - P
(SEE5211/SEE8212)
Periodic behavior
Linear fit
• Stationary signals:
• Statistics don’t change with time
• Frequency contents don’t change with time
• Information doesn’t change with time
• Non-stationary signals:
• Statistics change with time
• Frequencies change with time
• Information quantity increases
For example, the likelihood of tomorrow being rainy is greater if today is rainy than if
today is dry. Geophysical time series are frequently autocorrelated because of inertia or
carryover processes in the physical system.
Partial Autocorrelation Function (PACF)
• For a time series, the partial autocorrelation between X(t) and X(t−k) is
defined as the conditional correlation between X(t) and X(t−k),
conditional on X(t−(k−1)), …, X(t−1), the set of observations that come
between the time points t and t−k
• The 1st-order partial autocorrelation is defined to equal the 1st-order
autocorrelation
• The 2nd-order (lag) partial autocorrelation is
$$\frac{\mathrm{Cov}(x_t,\, x_{t-2} \mid x_{t-1})}{\sqrt{\mathrm{Var}(x_t \mid x_{t-1})\,\mathrm{Var}(x_{t-2} \mid x_{t-1})}}$$
ACF and PACF (Partial Autocorrelation)
• ACF: Autocorrelation refers to the correlation of a time series with its
own past and future values. Positive ACF might be considered a specific
form of persistence, a tendency for a system to remain in the same state
from one observation to the next.
• PACF: takes into consideration the correlation between a time series and
each of its intermediate lagged values.
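As a sketch of how these quantities are estimated from data (the AR(1) series and its 0.7 coefficient below are arbitrary illustrative choices, not from the slides):

```python
import numpy as np

def sample_acf(x, k):
    """Sample autocorrelation of x at lag k."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    c0 = np.sum((x - xbar) ** 2)
    return np.sum((x[:-k] - xbar) * (x[k:] - xbar)) / c0

def pacf_lag2(x):
    """Lag-2 partial autocorrelation via the Durbin-Levinson identity:
    phi_22 = (r2 - r1^2) / (1 - r1^2)."""
    r1, r2 = sample_acf(x, 1), sample_acf(x, 2)
    return (r2 - r1 ** 2) / (1 - r1 ** 2)

# Simulate an AR(1) process: x_t = 0.7 * x_{t-1} + e_t
rng = np.random.default_rng(0)
n = 5000
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]

r1 = sample_acf(x, 1)   # near 0.7 for this AR(1)
p2 = pacf_lag2(x)       # near 0: the PACF of an AR(1) cuts off after lag 1
print(r1, p2)
```

The near-zero lag-2 PACF is exactly the "conditional on the intermediate observations" idea above: once x(t−1) is accounted for, x(t−2) carries no extra information about x(t) in an AR(1).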
The first-order autoregression, AR(1):
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + e_t$$
The second-order autoregression, AR(2):
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$$
The moving-average models MA(1) and MA(2):
$$x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$$
$$x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$$
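A quick numerical check of a well-known property of these models, the ACF of an MA(1) process cuts off after lag 1 (a sketch; the θ₁ = 0.6 value is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
e = rng.standard_normal(n + 1)
x = e[1:] + 0.6 * e[:-1]          # MA(1): x_t = e_t + 0.6 * e_{t-1}

xbar = x.mean()
c0 = np.sum((x - xbar) ** 2)
# Sample autocorrelations at lags 1 and 2
r = [np.sum((x[:-k] - xbar) * (x[k:] - xbar)) / c0 for k in (1, 2)]
# Theory: r1 = theta / (1 + theta^2) = 0.6 / 1.36 ~ 0.44, and r2 = 0
print(r)
```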
ACF(k=1,2)
If the analyst does not check for randomness, then the validity of many of the statistical
conclusions becomes suspect. The autocorrelation plot is an excellent way of checking for
such randomness.
Moderate positive autocorrelation
The plot starts with a moderately high autocorrelation at lag 1 (approximately 0.75) that
gradually decreases. The decreasing autocorrelation is generally linear, but with significant
noise. Such a pattern is the autocorrelation plot signature of "moderate autocorrelation", which
in turn provides moderate predictability if modeled properly.
Autocorrelation and Partial autocorrelation
The autocorrelation graph describes the correlation between all the pairs of points in the time series
for a given separation in time (lag). Autocorrelation and partial autocorrelation graphs can help you
determine whether the time series is stationary (meaning it has a fixed mean and standard deviation
over time) and what model might be appropriate to fit the time series.
Partial autocorrelation plots are a commonly used tool for model identification in
Box-Jenkins models.
• The partial autocorrelation at lag p is the autocorrelation between Xt and Xt−p that
is not accounted for by lags 1 through p−1.
• Specifically, partial autocorrelations are useful in identifying the order of an
autoregressive model. The partial autocorrelation of an AR(p) process is zero at lag
p+1 and greater. If the sample autocorrelation plot indicates that an AR model may
be appropriate, then the sample partial autocorrelation plot is examined to help
identify the order. We look for the point on the plot where the partial autocorrelations
essentially become zero.
• The approximate 95% confidence interval for the partial autocorrelations is
±2/√N
Partial Autocorrelation Plot
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t \qquad \text{AR(2)}$$
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \phi_3 Y_{t-3} + e_t \qquad \text{AR(3)}$$
This partial autocorrelation plot shows clear statistical significance for lags 1 and 2 (lag 0 is
always 1). The next few lags are at the borderline of statistical significance. If the
autocorrelation plot indicates that an AR model is appropriate, we could start our modeling
with an AR(2) model. We might compare this with an AR(3) model.
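To illustrate the AR(2) starting point, the model can be fitted by ordinary least squares on the lagged values (a sketch; the 0.5 and 0.3 coefficients are arbitrary, and a real analysis would use a dedicated time-series routine):

```python
import numpy as np

# Simulate a stationary AR(2): Y_t = 0.5*Y_{t-1} + 0.3*Y_{t-2} + e_t
rng = np.random.default_rng(1)
n = 5000
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]

# Least-squares regression of y_t on (1, y_{t-1}, y_{t-2})
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
phi0, phi1, phi2 = np.linalg.lstsq(X, y[2:], rcond=None)[0]
print(phi1, phi2)   # estimates near 0.5 and 0.3
```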
For ARMA models (Box-Jenkins models)
The second difference:
$$Z_t = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$$
ARMA(p, q):
$$(x_t - \mu) = \phi_1 (x_{t-1} - \mu) + \dots + \phi_p (x_{t-p} - \mu) + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$
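A second difference of this form removes a quadratic trend exactly; a minimal check (the quadratic coefficients are arbitrary):

```python
import numpy as np

t = np.arange(10, dtype=float)
x = 3.0 + 2.0 * t + 0.5 * t ** 2   # quadratic trend
z = np.diff(x, n=2)                # Z_t = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
print(z)                           # constant array: the trend is gone
```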
Box-Jenkins Analysis of Seasonal Data
Non-constant variance can be removed by performing a natural log transformation.
We remove trend in the series by taking first differences.
To identify an appropriate model, we plot the ACF of the time series.
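The two transformations can be sketched together (the growth rate and noise level below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
# Exponential trend with multiplicative noise: the variance grows with the level
x = np.exp(0.02 * np.arange(n)) * (1.0 + 0.05 * rng.standard_normal(n))

# Natural log stabilizes the variance; first differencing removes the trend
z = np.diff(np.log(x))
print(z.mean())   # near the 0.02 growth rate, with no remaining trend
```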
The autocorrelation plot has a 95% confidence band, which is constructed based on the assumption
that the process is a moving average process. The autocorrelation plot shows that the sample
autocorrelations are very strong and positive and decay very slowly. The autocorrelation plot
indicates that the process is non-stationary and suggests an ARIMA model. The next step is to
difference the data.
The autocorrelation plot of the differenced data with a 95% confidence band shows that
only the autocorrelation at lag 1 is significant. The autocorrelation plot together with the
run sequence of the differenced data suggest that the differenced data are stationary.
Based on the autocorrelation plot, an MA(1) model is suggested for the differenced data.
The partial autocorrelation plot of the differenced data with 95% confidence bands shows
that only the partial autocorrelations of the first and second lag are significant. This
suggests an AR(2) model for the differenced data.
Ljung-Box Q test
• Used to test whether or not observations over time are random and
independent
----Ho: the autocorrelations up to lag K are all 0
----Ha: the autocorrelations of one or more lags differ from 0.
$$Q_k = n(n+2) \sum_{j=1}^{k} \frac{r_j^2}{n-j}$$
which approximately follows a $\chi^2_k$ distribution.
If k = 1, Q1 = 9.08, p-value = 0.0026 < α = 0.05.
Reject H0 in favor of Ha: there is strong evidence that the
lag-1 autocorrelation is non-zero.
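The Q statistic above can be computed directly from the formula (a sketch; for real work use a statistics package, which also reports the χ² p-value):

```python
import numpy as np

def ljung_box_q(x, k):
    """Q_k = n*(n+2) * sum_{j=1..k} r_j^2 / (n - j), where r_j is the
    sample autocorrelation of x at lag j."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    c0 = np.sum((x - xbar) ** 2)
    q = 0.0
    for j in range(1, k + 1):
        rj = np.sum((x[:-j] - xbar) * (x[j:] - xbar)) / c0
        q += rj ** 2 / (n - j)
    return n * (n + 2) * q

rng = np.random.default_rng(2)
q_white = ljung_box_q(rng.standard_normal(500), 10)   # white noise: Q near 10
e = rng.standard_normal(500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + e[t]                       # strongly autocorrelated
q_ar = ljung_box_q(x, 10)                              # Q very large: reject H0
print(q_white, q_ar)
```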
Augmented Dickey-Fuller (ADF) Test
Test for a unit root with drift and deterministic time trend
• The difference between the three equations concerns the presence of the deterministic
elements α (a drift term) and βt (a linear time trend).
• The focus of testing is whether the coefficient γ equals zero, which means that the original
process has a unit root;
• hence, the null hypothesis of γ = 0 (random walk process) is tested against the alternative
hypothesis γ < 0 of stationarity.
The ADF test is set up so that the null hypothesis is not rejected unless there is strong
evidence against it in favor of the alternative stationarity hypothesis.
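The core of the (non-augmented) Dickey-Fuller regression can be sketched with ordinary least squares; critical values are omitted, so this only illustrates the sign of γ (the two simulated series are arbitrary examples):

```python
import numpy as np

def df_gamma(x):
    """Estimate gamma in the Dickey-Fuller regression with drift:
    dx_t = alpha + gamma * x_{t-1} + e_t."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    alpha, gamma = np.linalg.lstsq(X, dx, rcond=None)[0]
    return gamma

rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(2000))   # random walk: unit root
e = rng.standard_normal(2000)
ar = np.zeros(2000)
for t in range(1, 2000):
    ar[t] = 0.5 * ar[t - 1] + e[t]            # stationary AR(1)

print(df_gamma(walk))   # near 0: cannot reject the unit root
print(df_gamma(ar))     # clearly negative (near 0.5 - 1 = -0.5): stationary
```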
Time series-autocorrelation
• Open CO2.jmp
• Select Analyze > Specialized Modeling > Time Series
• Assign OutputCO2 to Y, Time Series
• Red triangle: Spectral Density
• Red triangle: Autocorrelation, Partial Autocorrelation
• Red triangle: Decomposition > Remove Linear Trend
The two surfaces are similar. The impact of X1 and X2 on the response Y can be visualized.
You can rotate the plot to view it from different angles. Marginal plots are another tool to
use to understand the impact of the factors on the response.
Temporal Variation of Visibility in Hong Kong
(a) The normalized local wavelet power spectrum of SCMR (1910–2001) using the
real-valued Mexican hat wavelet (derivative of a Gaussian; DOG m = 2). The thick
curve on either end indicates the edge effects.
(b) The normalized reconstructed time series of SCMR at periods of 8 yr, 16–32 yr,
and 32–64 yr.
Wavelet Analysis
• Wavelets are functions that satisfy certain mathematical requirements and are used to
represent data or other functions
• The idea is not new: Joseph Fourier, 1800s
• Wavelet: the scale we use to see data plays an important role
• The Fourier transform (FT) is non-local and does a very poor job on sharp spikes
• Wavelet analysis is becoming a common tool for analyzing localized variations of
power within a time series.
• By decomposing a time series into time–frequency space, one is able to determine
both the dominant modes of variability and how those modes vary in time.
• The wavelet transform has been used for numerous studies in geophysics
Sine wave
Wavelet db10
Fourier Analysis
Frequency analysis
Idea: Transforms time-based signals to frequency-based signals.
Drawback:
1.Location information is stored in phases and difficult to extract.
2. The Fourier transform is very sensitive to changes in the function.
Fourier Analysis
F(ν): frequency-domain representation
$$e^{2\pi i \nu t} = \cos(2\pi \nu t) + i \sin(2\pi \nu t), \qquad \nu\,[\mathrm{Hz}] = \frac{\omega}{2\pi}$$
Fourier Series
the sum of simple sinusoids of different frequencies.
$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos(kx)\, dx$$
$$b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin(kx)\, dx$$
Energy of a function f(x):
$$\mathrm{energy} = \frac{1}{2\pi} \int_0^{2\pi} |f(x)|^2\, dx$$
Any periodic function can be decomposed into a sum of sine and cosine waves, i.e.
any periodic function f(x) can be represented by its Fourier series.
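A numerical check of the coefficient formulas above, using a square wave on [0, 2π] (the classic result is b_k = 4/(kπ) for odd k and 0 for even k):

```python
import numpy as np

# Sample [0, 2*pi) densely and approximate the coefficient integrals
x = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
dx = x[1] - x[0]
f = np.where(x < np.pi, 1.0, -1.0)   # square wave: +1 then -1

def a_k(k):
    """a_k = (1/pi) * integral of f(x) * cos(kx) over [0, 2*pi)."""
    return np.sum(f * np.cos(k * x)) * dx / np.pi

def b_k(k):
    """b_k = (1/pi) * integral of f(x) * sin(kx) over [0, 2*pi)."""
    return np.sum(f * np.sin(k * x)) * dx / np.pi

print(b_k(1), b_k(2), b_k(3))   # ~ 4/pi, ~ 0, ~ 4/(3*pi)
```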
From: http://www.bv-elbtal.de/html/was_ist_larm_.html
Vanishing moments: if the average value of $x^k \psi(x)$ is zero
(where ψ(x) is the wavelet function) for k = 0, 1, …, n, then
the wavelet has n + 1 vanishing moments, and polynomials of
degree n are suppressed by this wavelet.
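A numerical illustration with the Haar wavelet (ψ = 1 on [0, ½), −1 on [½, 1)): its 0th moment vanishes but its 1st does not, so it has one vanishing moment and suppresses constants (degree-0 polynomials):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100000, endpoint=False)
dx = x[1] - x[0]
psi = np.where(x < 0.5, 1.0, -1.0)   # Haar wavelet on [0, 1)

def moment(k):
    """Approximate integral of x**k * psi(x) over [0, 1)."""
    return np.sum(x ** k * psi) * dx

print(moment(0), moment(1))   # ~ 0 and ~ -0.25
```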
There are infinitely many wavelet transforms.
Different wavelet families provide different trade-offs between how compactly the
basis functions are localized in space and how smooth they are.
This decomposition can be done with a Fourier transform (or
Fourier series for periodic waveforms)
• Definition:
$$CWT_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt$$
where τ is the translation (the location of the window), s is the scale, and ψ is the mother wavelet.
Scale
Low scale => Compressed wavelet => Rapidly changing details => High frequency.
High scale => Stretched wavelet => Slowly changing, coarse features => Low frequency.
Position
Steps to a Continuous Wavelet Transform
1.Take a wavelet and compare it to a section at the start of the original signal.
2. Calculate C, i.e., how closely correlated the wavelet is with this section of the signal.
3. Shift the wavelet to the right and repeat steps 1 and 2 until you’ve covered the whole
signal.
$$CWT_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt = \int x(t)\, \psi^{*}_{\tau,s}(t)\, dt, \qquad \psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t - \tau}{s}\right)$$
• The CWT is the inner product of the signal and the basis function $\psi_{\tau,s}(t)$
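The three steps above can be sketched as a direct (slow, O(n²)) discretization of the CWT integral, here with the Mexican hat wavelet; the 1 Hz test signal and the scale grid are arbitrary illustrative choices:

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Marr) wavelet: proportional to the second
    derivative of a Gaussian."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt(x, dt, scales):
    """C(tau, s) = (1/sqrt(s)) * sum_t x(t) * psi((t - tau)/s) * dt,
    evaluated at every sample position tau and every scale s."""
    n = len(x)
    t = np.arange(n) * dt
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        for j, tau in enumerate(t):
            out[i, j] = (dt / np.sqrt(s)) * np.sum(x * mexican_hat((t - tau) / s))
    return out

# A 1 Hz sine: the wavelet power should peak at the matching scale
dt = 0.02
t = np.arange(0.0, 4.0, dt)
x = np.sin(2.0 * np.pi * t)
scales = np.array([0.05, 0.225, 1.0])   # 0.225 ~ sqrt(2)/(2*pi), the match
C = cwt(x, dt, scales)
power = (C ** 2).mean(axis=1)
print(scales[np.argmax(power)])
```

This mirrors the low-scale/high-frequency correspondence above: the compressed (s = 0.05) and stretched (s = 1.0) wavelets correlate poorly with the 1 Hz oscillation, while the matched scale gives the largest coefficients.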
Wavelet basis functions
Morlet ($\omega_0$ = frequency): $\psi_0(\eta) = \pi^{-1/4}\, e^{i\omega_0 \eta}\, e^{-\eta^2/2}$
Paul ($m$ = order): $\psi_0(\eta) = \frac{2^m\, i^m\, m!}{\sqrt{\pi\,(2m)!}}\, (1 - i\eta)^{-(m+1)}$
DOG ($m$ = derivative): $\psi_0(\eta) = \frac{(-1)^{m+1}}{\sqrt{\Gamma\!\left(m + \frac{1}{2}\right)}}\, \frac{d^m}{d\eta^m}\left(e^{-\eta^2/2}\right)$
The 2nd derivative of a Gaussian (DOG m = 2)
is the Marr or "Mexican hat" wavelet.
Wavelet basis functions
Frequency domain
Time domain
Example: Run wavetest.m
[Figure: output of wavetest.m. (top) Time series, 1880–2000 (°C); (middle) wavelet power spectrum, periods 2–64 years, with the global wavelet power spectrum (degC²) alongside; (bottom) scale-averaged time series, 1880–2000.]
Project Aims (due time: week12)
• to develop and demonstrate students’ creativity and ability to carry out industrially-
related or research-type project work;
• to demonstrate application of mathematics, science, engineering, economics and
policy knowledge in practical situations to arrive at innovative solution; and
• to develop problem-solving skills, demonstrate teamwork, build self-confidence and
ability to make good oral presentations and report writing.
Group Project --- 20% (group presentation 10%, term paper 10%;
individual participation 2%). Students will first be divided into 10-12
small groups (4-6 students per group).
Each small group will conduct a forum on a topic of its choice. Your
group will select one type of dataset (such as air-pollutant concentrations,
weather data, power data, or others). Group members will work together
to prepare a 15-minute presentation and a term paper (1500 words + 4
figures) about data analysis. Each project should first introduce the
environmental datasets or historical events and discuss the types of
datasets, with a special focus on collecting, analyzing, and drawing
conclusions from data.
Group report (10%)
Group Presentation (10 mins) (10%)