Exercise 6: Time Series Analysis and Stochastic Modelling

Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

9/26/2020 Jupyter Notebook Viewer

Course-work-and-data-analysis (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master)
/ Hydrology-Course (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master/Hydrology-Course)
/
GEO4310_2015 (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master/Hydrology-Course/GEO4310_2015)
/
EX6 (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master/Hydrology-Course/GEO4310_2015/EX6)

Exercise 6: Time series analysis and stochastic


modelling
Name: Bikas Chandra Bhattarai

Date: 2015/10/12

Question 1: Time series analysis

The annual mean runoff data in a station are given in the file Qyearlymean.txt.

1. Investigate if there is a trend using linear regression method for α =10%.


2. If the trend is significant, remove the trend from the data series.
3. Calculate and plot the autocorrelation ρ k and 95% confidence limits for time shifts, k, up to 5 years. Check up
to which time lag is the ρ k significant.

Test for significance of b

Xi = a + bTi

Ho : b = 0; Ha : b ≠ 0

b−o
Test statistic t =
sb

Ho is rejected if |t| ≥ t 1−α/2,n−2 i.e. b is significantally different from zero.

Where
−−−−−−−−− n 2
2 ∑ (χi −χ
^i )
S
sb = √ and S 2 =
i=1
     (1)
n ¯ 2 n−2
∑ (T t −T )
t=1

The equation for autocorrelation coefficient, ρ k is given as


1 n−k ¯ ¯
C ov(X t ,X t+k ) ∑ (X t −X )(X t+k −X )
t=1

(2)
n−1
ρk = =     
1 n ¯ 2
V ar(X t ) ∑ (X t −X )
n−1 t=1

Test the significance of ρ k


https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 1/18
9/26/2020 Jupyter Notebook Viewer

(−1−z1−α/2 √n−2)
l =
n−1

and

(−1+z1−α/2 √n−2)
u =
n−1

Ho : ρ k = 0 ; Ha : ρ k ≠ 0

If the calculated ρ k falls outside these confidence limits, then the hypothesis is rejected, i.e., ρ k is significant for
significant level α and time lag k.

In [2]:

%matplotlib inline
import matplotlib.pyplot as plt
import scipy.stats
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from scipy.stats.stats import pearsonr
from pandas import Series
import statsmodels.tsa.stattools

In [3]:

q=pd.read_table('Qyearlymean.txt')
q.columns = ([u'YEAR', u'T', u'discharge'])
discharge = q['discharge']
print q.head(3)
print len(q)

YEAR T discharge
0 1940 1 398.0
1 1941 2 429.1
2 1942 3 690.2
51

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 2/18
9/26/2020 Jupyter Notebook Viewer

In [4]:

q= pd.read_table('Qyearlymean.txt')
# create a fitted model between discharge as dependent variable and time (T) as indip
fd = smf.ols(formula='discharge ~ T', data= q).fit()
#print summary statistics
print(fd.summary())

OLS Regression Results


==========================================================================
Dep. Variable: discharge R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 2
Date: Tue, 13 Oct 2015 Prob (F-statistic): 0
Time: 09:41:14 Log-Likelihood: -3
No. Observations: 51 AIC: 6
Df Residuals: 49 BIC: 6
Df Model: 1
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept 491.2333 36.817 13.343 0.000 417.247 565
T 2.1136 1.232 1.715 0.093 -0.363 4
==========================================================================
Omnibus: 2.044 Durbin-Watson: 0
Prob(Omnibus): 0.360 Jarque-Bera (JB):
Skew: 0.348 Prob(JB): 0
Kurtosis: 2.390 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

So the regression model becomes:

Xt = a + bTt = 491.23 + 2.11 ∗ Tt

Now formulating the test hypothesis for the coefficients to test wheather the coefficients are significantly different or
not and the test hypothesis can be formulated as given below:

H0 : b = 0

Ha : b ≠ 0

If the |t| > t critical , then H0 is rejected

From above table |t| = 1.715

In [5]:

# Obtaining the value of 't' from t-table using the python function at alpha = 10, de
round(scipy.stats.t.ppf(0.90,len(q)-2),3)

Out[5]:

1.299

From the t-table calculation tcritical = t 1− α ;n − 2 = 1.299 (for α = 10%)


2

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 3/18
9/26/2020 Jupyter Notebook Viewer

Conclusion: From the analysis, |t| value is greater than tcritical so H0 is rejected and concluded that, there is a
trend.

2. If the trend is significant, remove the trend from the data series.

To remove the trend, the difference between observed values and estimated values, the error, is calculated. The
error is always without trend. To get the time series of discharge without trend, interception and each error value is
added.

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 4/18
9/26/2020 Jupyter Notebook Viewer

In [6]:

# Defining x-axis
year = q['T']
#creating plot
fig, ax = plt.subplots(figsize=(15,7))

#creating linear fit


fit = np.polyfit(year, discharge, deg=1)

#plotting trend line to the figure


ax.plot(year, fit[0] * year + fit[1], color='red')
ax.plot(year, discharge,color='red', label = 'with trend')
plt.legend()
#setting the font size for x and y-axis
plt.rc('xtick', labelsize=16)
plt.rc('ytick', labelsize=16)

#priting the linear equation of fit


print 'y = %.2fx + %.3f'%(fit[0],fit[1])

#plot for without trend


#creating x-axis for the data with out trend
x1 = year
#date = range(1940, 1991,1)
#removing the trend from data ( compendium 139, 2013)
y1 = 2.114*x1
without_trend = discharge-y1
ax.plot(x1,without_trend,label='without trend')
plt.legend()

#creating linear fit for non trended data series


fit = np.polyfit(x1, without_trend, deg=1)
ax.plot(x1, fit[0] * x1 + fit[1], color='blue')
ax.set_xlim([0,52])
# display the equation for data without trend
print 'y_without_trend = %.2fx + %.3f'%(fit[0],fit[1])

plt.ylabel('Discharge ($m^3/s$)', fontsize =18)


plt.xlabel('Year',fontsize=18)
plt.show()

y = 2.11x + 491.233
y_without_trend = -0.00x + 491.233

3) Calculate and plot the autocorrelation ρ k and 95% confidence limits for time shifts, k, up to 5 years.

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 5/18
9/26/2020 Jupyter Notebook Viewer

Check up to which time lag is the ρ k significant.

Solution:The equation for autocorrelation coefficient, ρ k is given as


1 n−k ¯ ¯
C ov(X t ,X t+k ) ∑ (X t −X )(X t+k −X )
t=1

Which is given by the python function :


n−1
ρk = =     
1 n ¯ 2
V ar(X t ) ∑ (X t −X )
n−1 t=1

statsmodels.tsa.stattools.acf(Series, nlags= Timelag) is used in this calculation. Conficence intervals are


calculated by using the formula given below:

(−1−z1−α/2 √n−2)
l =
n−1

and

(−1+z1−α/2 √n−2)
u =
n−1

Ho : ρ k = 0 ; Ha : ρ k ≠ 0

If the calculated ρ k falls outside these confidence limits, then the hypothesis is rejected, i.e., ρ k is significant for
significant level α and time lag k.

In [7]:

# generating series of total number of data which is equivalent to n in the confidenc


v = pd.Series(range(51 ,45, -1))
# Defining the time lag as index in the dataframe
index = range(0 ,6)
# creating dataframe
auto = pd.DataFrame( index = index)
# assigning the title to index as time lag
auto.columns.name = 'Time lag'
#calculating autocorrelation by using the function from pytohn where nlags is the all
auto['autocorr (pk)'] = Series(statsmodels.tsa.stattools.acf(without_trend, nlags=5))
# calculating lower confidence interval where 1.96 is the value from normal distribut
auto['CI-lower'] = (-1-1.96*np.sqrt(v-2))/(v-1)
# calculating upper confidence interval
auto['CI-upper'] = (-1+1.96*np.sqrt(v-2))/(v-1)
# displaying the data
auto

Out[7]:

Time lag autocorr (pk) CI-lower CI-upper

0 1.000000 -0.294400 0.254400

1 0.527878 -0.297536 0.256720

2 0.453294 -0.300773 0.259106

3 0.386475 -0.304114 0.261561

4 0.194384 -0.307567 0.264089

5 0.159846 -0.311137 0.266693

Conclustion: From above calculation H0 : pk = 0 is rejected up to time lag of 3 years. This means the
autocorrelation coefficient is significant for α =5% up to 3 years of time lag. The autocorrelation for time lag up to 5
years together with limits for confidence interval with 5% leel of significance is given below:

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 6/18
9/26/2020 Jupyter Notebook Viewer

In [8]:

fig, ax = plt.subplots(figsize=(10,5))
ax.plot(auto, marker='o')
plt.grid()
plt.legend(['Auto correlation', 'Lower CI' , 'Uppper CI'])
plt.ylabel('rL', fontsize =18)
plt.xlabel('Time lag (Years)',fontsize=18)
plt.show()

In python there is a inbuild function instide statsmodels.graphics.tsaplots called plot_acf to calculate and plot the
auto-correlation with 95% significance level. The plot given below is plotted by using the plot_acf function where
area under the shaded region is the confidence intervals.

In [9]:

from statsmodels.graphics.tsaplots import plot_acf


plot_acf(without_trend)
plt.show()

Question 2: Stochastic models for stationary time series

Fit the first order Markov model for annual data (after the removal of trend if there is any)

Xt+1 = μ + β(1)(Xt − μ) + ϵ t+1      (3)


https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 7/18
9/26/2020 Jupyter Notebook Viewer

Parameter estimation:

^
β 1 = ρ1 where ρ 1 is calculated using equation (2).

n 2
2 ∑ (χ −χ̄ )
2 ^
2
^ = σ x (1 − β )
σ ϵ 1
where σ x2 =
i=1 i i

n−1

and

1 n
^ = x̄ =
μ ∑ xi
n i=1

equation (3) becomes


−−−−−
2
^ ^
xt = μ
^
x
+ β (x t−1 − μ
1
^
x
+ z t σ x )√1 − β
1
     (4)

Procedure of using equation (4):

1. Estimate μx , σ x , and β1 and substitute them in (4)


2. Generate random z t series which is from a N(0,1) distribution. N_yrs = 1991-1960+1
random.standard_normal((N_yrs,))
3. Use the 1959’s value as initial value and simulate for the period 1960-1991
4. Compare the simulated time series with the observed time series. Plot two time series in the same graph,
calculate and compare the statistics of the two series, i.e. mean, standard deviation, ρ 1 and ρ 2 , etc.,
5. Use AR(1) model to simulate 100 samples, each having 33 years, calculate the mean values of the statistics
(average, standard deviation, r1) from the 100 samples and compare them with the statistics calculated from
observed 33 years data series

Note that in step (4), one simulation sample is compared with the observed data, while in step (5) the mean values
of 100 samples are compared with the observed values.

Solution:

In [10]:

#Calculating the different statistics for the data:


#Defining the dataframe
frame= pd.DataFrame()
#counting the number of data
n = len(without_trend)
#calculating mean
mean = 491.22
frame['no_obs'] = Series(len(without_trend))
frame['mean']= np.mean(without_trend).round(2)
frame['std'] = (np.sqrt(sum((without_trend-mean)**2)/(n-1))).round(2)
#Displaying the basic statistics of data with out trend
frame

Out[10]:

no_obs mean std

0 51 491.22 128.23

Now Fitting of first order Markov model for annual data after trend is removed. Equation (4) to simulate
discharge.Autocorrelation for time lag of 1 is calculated by using the python function:

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 8/18
9/26/2020 Jupyter Notebook Viewer

In [11]:

#calculating the autocorrelation


auto_1 = statsmodels.tsa.stattools.acf(without_trend, nlags=1)
#converting the array data in to series
auto_1 = Series(auto_1[1,]).round(3)
print auto_1

0 0.528
dtype: float64

Here, the autocorrelation with time lag of 1 year is found to be 0.528

β1 = ρ 1 = auto1 = 0.528

mean = x̄ = 491.22

2
std = σ x = 128.23

and z t is series generated from random normal distribution: N(0,1). The initial value for Xt−1 is selected as the
3
1959 value i.e 552.63 m /s .

In [12]:

mean = 491.22
std = 128.23
#Selecting the data series 1959 to 1990
no_trend = without_trend[19:]
#generating normally distributed random data series
z= np.random.normal(0, 1, 32)
# Simulating the discharge by using equation (4)
simulated = pd.DataFrame(mean+(0.557*(no_trend -mean))+(z*std*(np.sqrt(1-0.557**2))))

calculating the auto-correlation upto 2 year time lag for observed and simulated datasets.

In [13]:

auto_2_obs = pd.DataFrame(statsmodels.tsa.stattools.acf(no_trend, nlags=2))[1:]


print auto_2_obs.T

1 2
0 0.595307 0.614706

In [14]:

auto_2_sim = pd.DataFrame(statsmodels.tsa.stattools.acf(simulated, nlags=2))[1:]


print auto_2_sim.T

1 2
0 0.252234 0.267889

Calculating other statistics and making the dataframe

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 9/18
9/26/2020 Jupyter Notebook Viewer

In [15]:

index = ['Observed','Simulated']
statistic = pd.DataFrame(index = index)
statistics = pd.DataFrame({'Auto_1': [0.595, 0.128], 'Auto_2': [0.6147, 0.0735]},inde
statistics['Mean'] = round(no_trend.mean(),2),round(simulated.mean(),2)
statistics['std'] = round(no_trend.std(),2), round(simulated.std(),2)
statistics['median'] = round(no_trend.median(),2), round(simulated.median(),2)
statistics

Out[15]:

Auto_1 Auto_2 Mean std median

Observed 0.595 0.6147 494.06 126.30 471.80

Simulated 0.128 0.0735 511.54 110.66 511.32

In [16]:

fig, ax = plt.subplots(figsize=(13,5))
ax.plot(no_trend, label='observed')
ax.plot(simulated, label='simulated')
plt.grid()
ax.set_xlim([1 ,31])
plt.legend()
plt.xlabel('Years',fontsize =18)
plt.ylabel('Discharge ($m^3/s$)',fontsize =18)
plt.show()

Conclusion: The statistics for observed and modelled time series (1960 -1990) is in above table. And aslo it is
clear from the plot that the variation between the observed and simulated data series is large. Comparision shows
that the Markov medeled time series is a more ramdom series as is expected when including the random z values
from a normal distribution in the Markov model. Simulated times series has lower mean, and lower standard
deviation then that of the observed series. And the value of mean and standard deviation changes everytimes after
the model because of random number.

5 Use AR(1) model to simulate 100 samples, each having 33 years, calculate the mean values of the statistics
(average, standard deviation, r1) from the 100 samples and compare them with the statistics calculated from
observed 33 years data series

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 10/18
9/26/2020 Jupyter Notebook Viewer

In [395]:

#Defining the dataframe


df = pd.DataFrame()
#defining the function for n times model run
def repeat_fun(times, f, args):
for i in range(times): f(i,args)

def f(iteration,df):
g = np.random.normal(0, 1, 32)
mm = Series(no_trend)
x = 491.22+(0.557*(mm -491.22))+(g*128.23*(np.sqrt(1-0.557**2)))
df[iteration] = x

repeat_fun(100,f,df)
#Dataframe with 100 columns and 31 rows will be obtained by printing df.
#calculating the mean for 100 times run datasets
Simulated_100 =df.mean(axis=1)

Calculating auto-correlation up to 2 year time lag for mean which is calculated from 100 years run datasets.

In [397]:

auto_2_100_sim = pd.DataFrame(statsmodels.tsa.stattools.acf(Simulated_100, nlags=2))[


print auto_2_sim.T

1 2
0 0.219762 0.432154

Calculating and comparing the different statistics for the mean of 100 times run and observed data without trend.

In [398]:

ind = ['Observed','Simulated']
simulated_100 = pd.DataFrame(index = ind)
simulated_100 = pd.DataFrame({'Auto_1': [0.595, 0.128], 'Auto_2': [0.219, 0.432]},ind
simulated_100['Mean'] = round(no_trend.mean(),2),round(Simulated_100.mean(),2)
simulated_100['std'] = round(no_trend.std(),2), round(Simulated_100.std(),2)
simulated_100['median'] = round(no_trend.median(),2), round(Simulated_100.median(),2)
simulated_100

Out[398]:

Auto_1 Auto_2 Mean std median

Observed 0.595 0.219 494.06 126.30 471.80

Simulated 0.128 0.432 491.35 70.28 476.09

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 11/18
9/26/2020 Jupyter Notebook Viewer

In [401]:

fig, ax = plt.subplots(figsize=(13,5))
ax.plot(Simulated_100,label='Simulated_100')
ax.plot(no_trend, label='observed')
plt.grid()
ax.set_xlim([1 ,31])
plt.ylabel('Discharge($m^3/s$)', fontsize =18)
plt.xlabel('Years',fontsize=18)
plt.legend()
plt.show()

Conclusion mean and standard deviation for mean simulated discharge is lower then that of observed one. On the
basis of standard deviation we can say that the variation in simulated data sets is comperatively less then that of
observed datasets.

Question 3: Stochastic model for non-stationary time series

Adapt a stochastic model to monthly discharge COLUMBIA_M1.txt.

Thomas-Fiering is a generalization of the first order Markov model. In general for the stochastic models (MA, AR,
ARMA) stationarity is assumed. For example, the first order Markov model assumes stationarity in the first three
moments. However, there are generalizations of these models that account for non-stationarity. For data sets which
are seasonally non-stationary, you can use Thomas-Fiering.

1. Fit the Thomas-Fiering Model to the data and simulate the monthly discharge from January 1949 to December
1975 using the value of December 1948 as the initial value. Plot the simulated and observed data on the same
graph and compare the statistics of the two data series
2. Simulate the 100-year discharge series from 1976 to 2075 using the value of December 1975 as initial value
and compare the statistics of the simulated data series with the observed data series.

The Thomas-Fiering Model is given as:


−−−−−
^j+1,i = q
q ¯ ) + z j+1,i ⋅ s j+1 √1 − r2     
¯j+1 + bj (qj,i − qj j
(1)

Where i = year, j = month, and z is a random series with N(0, 1).

Calculation procedure:

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 12/18
9/26/2020 Jupyter Notebook Viewer
1
(a) The mean flow: q¯j =
n
∑ qj,i ,     
i
where i = {j, 12 + j, 24 + j, ⋯}

−−−−−−−−
2
∑ (q j,i −q
¯ )
(b) The standard deviation: Sj = √
i j

n−1

(c) the correlation coefficient with flow in the next month,

∑ (q j,i −q
¯ j)(q j+1,i −q
¯ j+1 )
i
rj =
2 2
√∑ (q −q
¯ ) ∑ (q j+1,i −q
¯ )
i j,i j i j+1

(d) the slope of the regression equation relating the month’s flow to flow in the preceding month:

Sj+1
bj = rj
Sj

Solution

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 13/18
9/26/2020 Jupyter Notebook Viewer

In [18]:

def thomas_fiering(q_hist, q_init, j_init, period):


""" Thomas Fiering Model """

q = q_hist

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

months1 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1]

# create a random generator, note we have to call this '()' and take the
# first element returned [0]... so in use it is Z()[0]
Z = np.random.randn

# get the mean monthly discharges for the j months


q_mean_j = [q[q.index.month == i].discharge.mean() for i in months]

# get the mean monthly discharges for the j+1 months


q_mean_j1 = [q[q.index.month == i].discharge.mean() for i in months1]

# do the same for standard deviation


sj = [q[q.index.month == i].discharge.std() for i in months]

sj1 = [q[q.index.month == i].discharge.std() for i in months1]

# calculate the pearson r using built-ins


rj = [pearsonr(q[q.index.month == i].discharge,
q[q.index.month == j].discharge)[0] for i, j in zip(months, months

# the model, not vectorized. assumes each of the variables above are
# of length 12 ... and note that Python is zero-indexed (e.g. first element is '0
def tf1(qi, month):
i = month-1 #account for zero offset
return q_mean_j1[i] + (rj[i] * (sj1[i] / sj[i])) * (qi - q_mean_j[i]) + Z(1)[

# initiate simulated q with q_init and j_init (month)


# simq is a list, and we just call the model function to get the first element
simq = [tf1(q_init, j_init)]

# enumerate (e.g. makes 'i' a counter) and get each month in the simulation perio
for i, m in enumerate(period.month[1:]):

#append to the list by calling the model with the prior value and the month
simq.append(tf1(simq[i-1], m))

# convert it to a dataframe for convenience


return pd.DataFrame(np.array(simq), columns=['discharge'], index=period)

if __name__ == "__main__":

# use pandas to read the data


q = pd.read_table('mon.csv', sep=',',
names=['year', 'month', 'discharge'])[1:].convert_objects(convert_n
#q = q['discharge'].convert_objects(convert_numeric=True)

# create a pandas date range to index the data frame


rng = pd.date_range('1/1/1949', periods=len(q), freq='M')

# set the index (now we can easily get monthly values)


q.index = rng

# create a simulation period


period = pd.date_range('Jan-1949', periods=324, freq='M')
q_init = q.discharge['Dec-1949'] #december 1949

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 14/18
9/26/2020 Jupyter Notebook Viewer

simq = thomas_fiering(q, q_init, 12, period)


# plot the data
plt.figure(figsize=(15,5)); plt.hold(True)
plt.plot(q.index, q.discharge, c='b', label='observed')
plt.plot(simq.index, simq.discharge, c='r', label='simulated')
plt.legend()
plt.grid(True)

Comparing the statistics:

In [19]:

index = ['Observed','Simulated']
stat = pd.DataFrame(index = index)
stat['Mean'] = round(q.discharge.mean(),2),round(simq.mean(),2)
stat['std'] = round(q.discharge.std(),2), round(simq.std(),2)
stat['median'] = round(q.discharge.median(),2), round(simq.median(),2)
stat

Out[19]:

Mean std median

Observed 5551.51 3366.80 4207.00

Simulated 5697.04 2480.31 5043.47

Conclusion: From the plot above it is clear that the there is a variation between observed and simulated discharge
from thomas fiering model. Simulated mean is higher than that of observed while the simulated standard deviation
is lower then that of observed datasets. And also the mean is greater than the median in both case. This is common
for a distribution that is skewed to the right (that is, bunched up toward the left and with a "tail" stretching toward the
right)

Simulating discharge for monthly 100 years by using the Thomas-Fiering model

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 15/18
9/26/2020 Jupyter Notebook Viewer

In [20]:

# Repeat Thomas-Fiering 1200 time steps:


#Define an array of 1200 elements
#sim_100 = np.arange(100)

# Define the period (100 years = 1200 months time steps)


period = pd.date_range('Jan-1975', periods=1200, freq='M')
sim_100 = thomas_fiering(q, q_init, 12, period)
#print(sim_100)

# plot the data


plt.figure(figsize=(15,6)); plt.hold(True)
plt.plot(sim_100.index, sim_100, c='b')
#plt.plot(simq.index, simq.discharge, c='r', label='simulated')
plt.xlabel('Years',fontsize =18)
plt.ylabel('Discharge ($m^3/s$)',fontsize =18)
plt.grid(True)

Calculating the different statistics for 100 years simulated data

In [21]:

sim=pd.DataFrame(sim_100['discharge'])
mean_100 = pd.DataFrame(sim.mean().round(2))
std_100 = pd.DataFrame(sim.std().round(2))
median_100 =pd.DataFrame(sim.median().round(2))
stats_100= (pd.concat([mean_100, std_100,median_100], axis =1))
my_columns = ["mean", "std", "median"]
stats_100.columns = my_columns
stats_100

Out[21]:

mean std median

discharge 5685.43 2622.74 5085.56

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 16/18
9/26/2020 Jupyter Notebook Viewer

In [30]:

plot_acf(sim_yearly)

Out[30]:

Calculating the yearly sum for 100 years simulated data

In [29]:

sim_yearly = sim_100.groupby(sim_100.index.year).sum()

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 17/18
9/26/2020 Jupyter Notebook Viewer

In [40]:

# plotting 100 years simulated data


fig, ax = plt.subplots(figsize=(10,5))
ax.plot(sim_yearly.index,sim_yearly)
plt.grid()
plt.ylabel('Discharge($m^3/s$)', fontsize =18)
ax.set_xlim([1975 ,2075])
plt.xlabel('Years',fontsize=18)
plt.show()

In [ ]:

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 18/18

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy