Pierian Data - Python For Finance & Algorithmic Trading Course Notes
Course Notes
Anaconda
Ipynb files
No space in variables
A tuple (a, b, c) is immutable, whereas a list [a, b, c] is mutable, so its items can be reassigned
set() returns the unique elements of a collection
!= inequality check
def my_func(): # defines a function (the keyword def must be lower case)
list.append(x) adds x to the end of a list
s.lower() turns string into lower case
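Pandas methods such as drop() and dropna() return a new object by default; pass inplace=True or reassign the result to keep the change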
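The notes above on tuples, lists, sets and strings can be checked with a short sketch. The variable names and values are made up for illustration.

```python
# Tuples are immutable: item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Lists are mutable, and append() adds to the end.
prices = [10, 20, 30]
prices[0] = 99
prices.append(40)

# set() keeps only the unique elements.
unique_tickers = set(["F", "GM", "F", "TSLA", "GM"])

# str.lower() returns a new lower-case string.
ticker = "TSLA".lower()

print(tuple_changed, prices, sorted(unique_tickers), ticker)
```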
Python allows you to create anonymous functions, i.e. functions with no name, using the
lambda keyword.
Lambda functions are small, usually no more than one line. A lambda can take any
number of arguments, just like a normal function, but its body consists of a single
expression. The value of that expression is the result when the lambda is called, so no
return statement is needed.
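As a small illustration of the description above, the sketch below compares a named function with an equivalent lambda and then uses lambdas inline. The data is invented for the example.

```python
# A named function and its lambda equivalent.
def square(x):
    return x * x

square_lambda = lambda x: x * x

# Lambdas are most useful inline, e.g. as a sort key or with map().
returns = [("GM", 0.02), ("F", -0.01), ("TSLA", 0.05)]  # made-up (ticker, return) pairs
by_return = sorted(returns, key=lambda pair: pair[1])
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))

print(square(4), square_lambda(4), by_return, doubled)
```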
Numpy
Many large variable assignments consume RAM, so assign variables sparingly.
ax2.set_xlim(20,22) & ax2.set_ylim(30,50). With these commands you can set the x & y
limits for each axis to zoom in on a section of a plot
axes[0].plot(x,y,color="red",lw=5,ls=":"). Can be used to adjust the style and colour of the
chart
fig,axes=plt.subplots(1,2,figsize=(12,2)). The figsize argument sets the size of the figure as
(width, height) in inches
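The three commands above can be combined in one sketch. It assumes made-up data (x and y) and selects the non-interactive "Agg" backend so it runs outside a notebook; inside Jupyter you would normally rely on %matplotlib inline instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so draw off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)  # made-up data
y = x ** 2

# One row, two columns of axes; figsize is (width, height) in inches.
fig, axes = plt.subplots(1, 2, figsize=(12, 2))

# Left panel: styled line (colour, line width, dotted line style).
axes[0].plot(x, y, color="red", lw=5, ls=":")

# Right panel: same data, zoomed in with explicit x and y limits.
axes[1].plot(x, y, color="blue", lw=1, ls="-")
axes[1].set_xlim(2, 4)
axes[1].set_ylim(0, 20)
```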
Pandas Visualisation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
There are several plot types built-in to pandas, most of them statistical plots by nature:
df.plot.area
df.plot.barh
df.plot.density
df.plot.hist
df.plot.line
df.plot.scatter
df.plot.bar
df.plot.box
df.plot.hexbin
df.plot.kde
df.plot.pie
You can also call df.plot(kind='hist'), replacing the kind argument with any of the key terms
shown in the list above (e.g. 'box', 'barh', etc.)
idxmax() and idxmin() return the index label of the maximum and minimum value
%matplotlib notebook (makes the plot interactive)
df3['a'].plot.hist(color="blue",bins=100,alpha=0.5)
df3[['a','b']].plot.box()
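A minimal sketch of the pandas plotting calls above, assuming a made-up DataFrame named df3 with columns 'a' and 'b'. Explicit axes are passed in so each plot lands in its own panel, and the "Agg" backend is an assumption that lets the code run without a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; use %matplotlib inline in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df3 = pd.DataFrame(rng.normal(size=(500, 2)), columns=["a", "b"])  # made-up data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Accessor style: df.plot.<kind>(...)
ax_hist = df3["a"].plot.hist(ax=axes[0], color="blue", bins=100, alpha=0.5)

# Equivalent keyword style: df.plot(kind="<kind>", ...)
ax_kind = df3["a"].plot(kind="hist", ax=axes[1], bins=100)

# Box plot of two columns side by side.
ax_box = df3[["a", "b"]].plot.box(ax=axes[2])
```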
Data Sources
Yahoo and Google have changed their APIs, so those data sources are sometimes unstable. Use the
source codes "iex" or "morningstar" instead
Quandl
import quandl
mydata = quandl.get('EIA/PET_RWTC_D')
Bollinger Bands
Volatility bands placed above and below the moving average line
A 20-day rolling mean is used
.std() returns the standard deviation of data set
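A hedged sketch of how the bands could be computed with pandas. The price series is simulated, and the band width of two standard deviations is an assumption (a common choice) that the notes do not state.

```python
import numpy as np
import pandas as pd

# Simulated closing prices on business days (made-up data).
rng = np.random.default_rng(42)
close = pd.Series(
    100 + rng.normal(0, 1, 250).cumsum(),
    index=pd.bdate_range("2020-01-01", periods=250),
    name="Close",
)

window = 20   # 20-day rolling window, as in the notes
num_std = 2   # assumption: band width in standard deviations

bands = pd.DataFrame({"Close": close})
bands["MA20"] = close.rolling(window).mean()
rolling_std = close.rolling(window).std()
bands["Upper"] = bands["MA20"] + num_std * rolling_std
bands["Lower"] = bands["MA20"] - num_std * rolling_std

print(bands.dropna().head())
```

The first window - 1 rows are NaN because a full 20-day window is not yet available.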
ford['Volume'].idxmax(). Returns the index label of the maximum value in the 'Volume'
column
gm['MA50'] = gm['Open'].rolling(50).mean() and gm['MA200'] =
gm['Open'].rolling(200).mean(). Compute the 50-day and 200-day moving averages of the opening price
df.index = pd.to_datetime(df.index). Converts index to datetime object
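The three calls above (idxmax, rolling mean, to_datetime) can be tried on a tiny hand-made frame. The ford data below is invented, and a 3-day window stands in for the 50-day and 200-day windows so the result is easy to check by hand.

```python
import pandas as pd

# Made-up prices and volumes with string dates as the index.
ford = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0, 11.5, 12.5], "Volume": [100, 500, 300, 200, 400]},
    index=["2012-01-03", "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-09"],
)

# Convert the string index into a DatetimeIndex.
ford.index = pd.to_datetime(ford.index)

# Index label (here a date) of the row with the largest volume.
busiest_day = ford["Volume"].idxmax()

# 3-day moving average of the opening price (short window for illustration).
ford["MA3"] = ford["Open"].rolling(3).mean()

print(busiest_day)
print(ford)
```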
car_comp = pd.concat([tesla['Open'],gm['Open'],ford['Open']],axis=1)
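# Plot it
from matplotlib.dates import DateFormatter, WeekdayLocator, DayLocator, MONDAY
mondays = WeekdayLocator(MONDAY)        # major ticks on Mondays
alldays = DayLocator()                  # minor ticks on every day
weekFormatter = DateFormatter('%b %d')  # assumed label format, e.g. Jan 12
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)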
Statsmodels
Statistics
ETS Models: refer to Error Trend Seasonality models that will take each of those terms for
smoothing. It breaks up data into the following components:
o Trend
o Seasonality
o Residual
EWMA Models (Exponentially Weighted Moving Average): reduce time lag by applying more weight
to more recent values
EWMA
We just showed how to calculate the SMA based on some window. However, the basic SMA has some
"weaknesses".
• It will never reach the full peak or valley of the data due to the averaging.
• It does not really inform you about possible future behaviour; all it really does is describe trends in
your data.
To help fix some of these issues, we can use an EWMA (exponentially weighted moving average).
EWMA reduces the lag effect of the SMA by putting more weight on values that occurred more
recently, hence the name. The amount of weight applied to the most recent values depends on the
parameters used in the EWMA and on the number of periods in the window. Here is the shorter
version of the explanation behind EWMA.
The formula for EWMA is:
$$y_t = \frac{\sum_{i=0}^{t} w_i x_{t-i}}{\sum_{i=0}^{t} w_i}$$
where x_t is the input value, w_i is the applied weight (note how it can change from i=0 to t), and
y_t is the output.
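Now the question is: how do we define the weight term w_i?
This depends on the adjust parameter you provide to the .ewm() method.
When adjust is True (default), weighted averages are calculated using the weights
$$w_i = (1 - \alpha)^i$$
where \alpha is the smoothing factor, 0 < \alpha \le 1, set through the span, com, halflife or alpha
argument of .ewm().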
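As a check on the adjust=True case, the sketch below compares pandas' .ewm() result with a direct calculation of a weighted average in which the weight on the value i periods back is (1 - alpha)**i. The input values and span=3 are made up; the conversion alpha = 2 / (span + 1) is the one pandas documents for span.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])  # made-up input values
span = 3
alpha = 2 / (span + 1)  # pandas' span-to-alpha conversion

# Library result with adjust=True (the default).
pandas_ewma = x.ewm(span=span, adjust=True).mean()

# Manual result: y_t = sum(w_i * x_{t-i}) / sum(w_i), with w_i = (1 - alpha)**i.
manual = []
for t in range(len(x)):
    weights = (1 - alpha) ** np.arange(t + 1)   # w_0 ... w_t
    recent_first = x.iloc[t::-1].to_numpy()     # x_t, x_{t-1}, ..., x_0
    manual.append((weights * recent_first).sum() / weights.sum())
manual_ewma = pd.Series(manual, index=x.index)

print(pd.DataFrame({"pandas": pandas_ewma, "manual": manual_ewma}))
```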
ARIMA
Stationarity Tests
In statistics and econometrics, an augmented Dickey–Fuller test (ADF) tests the null hypothesis
that a unit root is present in a time series sample. The alternative hypothesis is different
depending on which version of the test is used, but is usually stationarity or trend-stationarity.
Basically, we are trying to decide whether to fail to reject the null hypothesis H0 (that the time
series has a unit root, indicating it is non-stationary) or to reject H0 in favour of the alternative
hypothesis (that the time series has no unit root and is stationary).
We decide this based on the returned p-value.
A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis,
so you reject the null hypothesis.
A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail
to reject the null hypothesis.
Let's run the Augmented Dickey-Fuller test on our data:
Differencing
First difference of a time series is the series of changes from one period to the next (pandas can
handle this)
Can continue to take 2nd & 3rd difference and keep going until data reaches stationarity
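In pandas, differencing is a call to .diff(). The sketch below uses a made-up series of squares, whose second difference is constant.

```python
import pandas as pd

series = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])  # made-up data

# First difference: change from one period to the next.
first_diff = series.diff()

# Second difference: difference of the first difference.
second_diff = first_diff.diff()

print(pd.DataFrame({"x": series, "d1": first_diff, "d2": second_diff}))
```

Each round of differencing leaves one more NaN at the start, so call .dropna() before passing the result to a test or model.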
Autocorrelation plot shows the correlation of the series with itself lagged by x time units
These plots are usually run on the differenced/stationary data
The ACF plots will also determine whether the AR or MA functions will be used in the ARIMA
model
o If the ACF plot shows positive autocorrelation at the first lag (lag 1), AR terms are
suggested
o If the ACF plot shows negative autocorrelation at the first lag (lag 1), MA terms are
suggested
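To illustrate the lag-1 rule of thumb above, the sketch below simulates one AR(1) and one MA(1) series and reads the lag-1 autocorrelation with pandas' Series.autocorr. The coefficients (0.7 and -0.7) are arbitrary assumptions; for a full ACF plot, statsmodels provides plot_acf in statsmodels.graphics.tsaplots.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
eps = rng.normal(size=n)  # white-noise shocks

# AR(1): each value depends on the previous value (coefficient 0.7, assumed).
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + eps[t]

# MA(1): each value depends on the previous shock (coefficient -0.7, assumed).
ma = eps.copy()
ma[1:] = eps[1:] - 0.7 * eps[:-1]

ar_lag1 = pd.Series(ar).autocorr(lag=1)
ma_lag1 = pd.Series(ma).autocorr(lag=1)

print(f"AR(1) lag-1 autocorrelation: {ar_lag1:.2f}")
print(f"MA(1) lag-1 autocorrelation: {ma_lag1:.2f}")
```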
Partial Autocorrelation
Portfolio Allocation
Sharpe Ratio
$$S = \frac{R_p - R_f}{\sigma_p}$$
Sigma-p is the portfolio standard deviation, R_p is the expected portfolio return and R_f is the risk-free rate
If the risk-free rate = 0% then Sharpe Ratio is simplified to Mean Return divided by Std.
Deviation
A Sharpe ratio computed from daily, weekly or monthly returns can be annualised
Multiply by a K-factor based on your sampling rate
o Daily: K = sqrt(252)
o Weekly: K = sqrt(52)
o Monthly: K = sqrt(12)
Annualised Sharpe Ratio: ASR = K*SR
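A short sketch of the calculation above on simulated daily returns. The return series and the zero risk-free rate are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
daily_returns = pd.Series(rng.normal(0.0005, 0.01, 252))  # made-up daily returns
risk_free_daily = 0.0  # assumption: risk-free rate of 0%

# Sharpe ratio at the sampling frequency (daily here).
sharpe_daily = (daily_returns.mean() - risk_free_daily) / daily_returns.std()

# Annualise with the K-factor for daily data.
k = np.sqrt(252)
sharpe_annualised = k * sharpe_daily

print(f"daily SR: {sharpe_daily:.4f}, annualised SR: {sharpe_annualised:.4f}")
```

For weekly or monthly returns, swap in np.sqrt(52) or np.sqrt(12).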
Portfolio Optimisation
Efficient Frontier
Order Book: an electronic list of buy and sell orders for an instrument, organised by price
level. It shows the quantity being bid or offered at each price point, also known as market
depth. It also identifies the market participants, although some can remain anonymous. Order
imbalances may become apparent and point toward a certain market trend. The order book does
not show the activity or batches of 'dark pools'.
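As a toy illustration of the structure described above, the sketch below stores resting quantity per price level and derives the best bid, best ask, depth and a simple imbalance measure. All prices and quantities are hypothetical, and the imbalance formula is one common definition, assumed here rather than taken from the course.

```python
# Price level -> total quantity resting at that level (hypothetical numbers).
bids = {99.8: 300, 99.9: 500, 100.0: 200}
asks = {100.1: 150, 100.2: 400, 100.3: 250}

best_bid = max(bids)            # highest price a buyer will pay
best_ask = min(asks)            # lowest price a seller will accept
spread = best_ask - best_bid

bid_depth = sum(bids.values())  # total quantity on the buy side
ask_depth = sum(asks.values())  # total quantity on the sell side

# Assumed definition: +1 means all buyers, -1 means all sellers.
imbalance = (bid_depth - ask_depth) / (bid_depth + ask_depth)

print(best_bid, best_ask, round(spread, 2), bid_depth, ask_depth, round(imbalance, 3))
```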