
Pierian Data – Python for Finance & Algorithmic Trading

Course Notes

Anaconda

Jupyter Notebook system

.ipynb files

nbconvert library for file conversion

 No spaces in variable names
 Tuples (, , ,) are immutable, whereas list [, , ,] elements can be reassigned
 set() returns the unique elements of a collection
 != is the inequality check
 def my_func(): # defines a function
 append() adds an element to the end of a list
 s.lower() converts a string to lower case
 Pass inplace=True when a pandas operation should modify the object in place

Python allows you to create anonymous functions, i.e. functions having no name, using a
facility called the lambda function.

Lambda functions are small functions, usually not more than a line. They can have any
number of arguments, just like a normal function. The body of a lambda function is very
small and consists of only one expression. The result of the expression is the value when
the lambda is applied to an argument. There is also no need for a return statement in a
lambda function.
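
A minimal sketch of a named function and its lambda equivalent:

def times_two(x): # a normal named function
    return x * 2

times_two_lambda = lambda x: x * 2 # the same logic as an anonymous lambda

print(times_two(4)) # 8
print(times_two_lambda(4)) # 8
print(list(map(lambda x: x * 2, [1, 2, 3]))) # [2, 4, 6] - lambdas shine inline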

Numpy

 np.linspace returns evenly spaced (linear) numbers over an interval
 np.eye(2): identity matrix of 1s & 0s
 np.random.rand provides random numbers from a chosen distribution
 np.ones(10)*5 uses broadcasting to multiply each element by 5
 a.reshape(3,3) errors if a does not hold exactly 9 elements; note it returns a new array rather than modifying a in place
 2D slicing such as mat[:2,1:] is required to output a sub-matrix
 mat.sum(axis=0) is the sum of columns
 mat.sum(axis=1) is the sum of rows
 np.random.seed(101) fixes the random number sequence; np.random.rand(1) then draws reproducible values

Multiple large variable assignments will consume RAM, so assign variables judiciously.

 Conditional selection uses a boolean mask, e.g. arr[arr > 5] (see the sketch below)
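
A short sketch pulling these NumPy points together (the array names are illustrative):

import numpy as np

np.random.seed(101) # fix the random sequence so runs are reproducible
print(np.random.rand(1)) # same value every run after seeding

a = np.arange(9)
a = a.reshape(3, 3) # reshape returns a new array, so assign the result

mat = np.arange(1, 10).reshape(3, 3)
print(mat.sum(axis=0)) # column sums
print(mat.sum(axis=1)) # row sums
print(mat[mat > 5]) # conditional selection with a boolean mask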


Pandas

 pandas ("panel data") is an open-source library created by Wes McKinney


 Series are like arrays except they can be given a datetime index
 pandas Series can also hold functions
 axis=1 refers to columns
 df.drop() has its default axis set to zero (rows)
 Conditional statements return Boolean data frames
 Python's and operator can only process Boolean values (True/False). Use & instead
 For an or statement use the pipe operator '|'
 df.set_index() turns a column into the index
 Multi-level index calling: df.loc['G1'].loc[1] (nested indexing)
 df.loc['G2'].loc[2]['B'] (specifies a particular column B)
 df.xs(1, level='Num') returns the rows with value 1 in the index level named 'Num'
 Dealing with missing data:
 df.dropna() by default drops any rows with null values; df.dropna(axis=1) will drop any columns with null values
 df.fillna(value='fill value')
 groupby allows for aggregation based off a column
 Finding unique values in a data frame: df['colname'].unique() returns the unique values in an array
 Data I/O: csv, excel, html, sql files can be linked
 to_csv allows writing to files (to_excel('file name.xlsx', sheet_name='newsheet'))
 pandas can only import data, not macros or formulas
 DataFrame.columns returns the column names
 DataFrame['column name'].nunique() returns the number of unique objects in a certain column; .unique() returns the full array of values. (A sketch of the I/O calls follows below.)
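
A minimal sketch of the I/O and unique-value calls above; the file and column names are hypothetical:

import pandas as pd

df = pd.read_csv('example.csv') # read a CSV into a DataFrame (hypothetical file)
df.to_csv('output.csv', index=False) # write it back out without the index
df.to_excel('file name.xlsx', sheet_name='newsheet') # needs an Excel engine installed

print(df.columns) # column names
print(df['column name'].unique()) # array of unique values (hypothetical column)
print(df['column name'].nunique()) # count of unique values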

 banks.groupby("ST").count().sort_values('Bank Name',ascending=False).iloc[:5]['Bank Name']
o uses iloc to return a subset of values and the sort_values function to rank
o the .count() method counts rows per group
 banks['Acquiring Institution'].value_counts()
o the value_counts() method counts the number of times each value occurs in a specified DataFrame column
 banks[banks['Acquiring Institution']=='State Bank of Texas']
o nesting a condition inside the DataFrame brackets selectively pulls data that fulfils a criterion, such as only entries where the acquiring institution was the State Bank of Texas
 banks[banks['ST']=='CA'].groupby('City').count().sort_values('Bank Name',ascending=False).iloc[:1]
o can count within a particular subset, such as the number of banks per city within a particular state
 sum(banks['Bank Name'].apply(lambda name: 'Bank' not in name))
o counts bank names that do not contain the word 'Bank'
 sum(banks['Bank Name'].apply(lambda name: name[0].upper() == 'S'))
o finds any banks whose first letter is S (case-insensitive via .upper()). name[0] returns the first character of name
 sum(banks['Bank Name'].apply(lambda name: len(name.split())==2))
o the split() function splits a string on whatever is specified; if blank, it splits on whitespace
o the len function checks whether 2 words were returned

Matplotlib and Pandas – Visualisation

 Matplotlib has 2 API structures


o Object oriented structure & Function oriented structure
o Gallery has good examples
 Function oriented
o import matplotlib.pyplot as plt
o %matplotlib inline (for Jupyter notebooks)
o plt.plot(x,y,'b')
o plt.title('my first python chart')
o plt.xlabel('Jerrod')
o plt.ylabel('prpasfqf')
 Object oriented
o fig = plt.figure()
o axes = fig.add_axes([0.1,0.1,0.8,0.8])
o axes.plot(x,y,'b')
o axes.set_xlabel('Set X Label') # Notice the use of set_ to begin methods
o axes.set_ylabel('Set y Label')
o axes.set_title('Set Title')
 Matplotlib allows the aspect ratio, DPI and figure size to be specified when the Figure object is created, using the figsize and dpi keyword arguments:
o figsize is a tuple of the width and height of the figure in inches
o dpi is the dots-per-inch (pixels per inch)
 For example (see the sketch below):
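
A minimal sketch using the object-oriented API with figsize and dpi:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

fig = plt.figure(figsize=(8, 4), dpi=100) # 8x4 inches at 100 dots per inch
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.plot(x, y, 'b')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Figure with explicit figsize and dpi')
plt.show()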

 ax2.set_xlim(20,22) & ax2.set_ylim(30,50). These commands set the x & y limits for each axis to zoom in on a section of a plot
 axes[0].plot(x,y,color="red",lw=5,ls=":") can be used to adjust the style and colour of the chart
 fig,axes=plt.subplots(1,2,figsize=(12,2)). The figsize argument lets the user set the size of the figure

Pandas Visualisation

 import numpy as np; import pandas as pd
 import matplotlib.pyplot as plt
 %matplotlib inline

There are several plot types built-in to pandas, most of them statistical plots by nature:

 df.plot.area
 df.plot.barh
 df.plot.density
 df.plot.hist
 df.plot.line
 df.plot.scatter
 df.plot.bar
 df.plot.box
 df.plot.hexbin
 df.plot.kde
 df.plot.pie

You can also just call df.plot(kind='hist') or replace that kind argument with any of the key terms
shown in the list above (e.g. 'box','barh', etc..)
 .idxmax()/.idxmin() return the index label of the max/min value
 %matplotlib notebook (makes the plot interactive)
 df3['a'].plot.hist(color="blue",bins=100,alpha=0.5)
 df3[['a','b']].plot.box()

Data Sources

 Pandas data-reader (Google’s stock API)


 Quandl (robust python API). A user key is needed for more than 50 accesses a day

Yahoo and Google have changed their APIs and are sometimes unstable. Use the codes "iex" or
"morningstar" instead

 Calls to commence pandas data reader:

import pandas_datareader.data as web
import datetime
 datetime.datetime(2015,1,1) creates the time object
 E.g. facebook = web.DataReader('FB','google',start,end)
 Returns data into a data frame
 For Options:
from pandas_datareader.data import Options
fb_options = Options('FB','google')
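
A consolidated sketch of the calls above, swapping in the 'iex' source per the earlier note (data sources change over time and may now require an API key):

import datetime
import pandas_datareader.data as web

start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2017, 1, 1)

# 'iex' used instead of 'google' per the note above; availability varies
facebook = web.DataReader('FB', 'iex', start, end)
print(facebook.head()) # the data comes back as a DataFrame indexed by date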

Quandl

 import quandl
mydata = quandl.get('EIA/PET_RWTC_D')

Pandas with Time Series Data

 DateTime index, Time Resampling, Time Shifts, Rolling & Expanding


 from datetime import datetime (Python's in-built datetime library). Defaults to 0 hrs 0 mins
 df['name'] = pd.to_datetime(df['name'])
 df.resample(rule='A').mean() will take an annual mean of the data set
 df.tshift() shifts the datetime index by a given time period frequency
 A moving average or rolling mean can be computed using the df.rolling object (df.rolling(7).mean().head(14))
 Plotting a 30-day MA on a stock price would use df.rolling(window=30).mean()['Close'].plot()
 The expanding() object returns the cumulative average

Bollinger Bands

 Volatility bands placed above and below the moving average line
 20-day means are typically used
 .std() returns the standard deviation of the data set (see the sketch below)
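
A minimal sketch of 20-day Bollinger Bands, assuming df is a DataFrame of daily prices with a 'Close' column (as in the course data):

df['Close 20d MA'] = df['Close'].rolling(20).mean() # the middle band
df['Upper Band'] = df['Close 20d MA'] + 2 * df['Close'].rolling(20).std()
df['Lower Band'] = df['Close 20d MA'] - 2 * df['Close'].rolling(20).std()
df[['Close', 'Close 20d MA', 'Upper Band', 'Lower Band']].plot(figsize=(16, 6))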

Capstone Project: Stock Market

 ford['Volume'].idxmax() returns the index value of the maximum value in the 'Volume' column
 gm['MA50'] = gm['Open'].rolling(50).mean() & gm['MA200'] = gm['Open'].rolling(200).mean() compute the moving averages
 df.index = pd.to_datetime(df.index) converts the index to a datetime object
 car_comp = pd.concat([tesla['Open'],gm['Open'],ford['Open']],axis=1)

Candlestick Chart Code

from matplotlib.finance import candlestick_ohlc

from matplotlib.dates import DateFormatter, date2num, WeekdayLocator, DayLocator, MONDAY

# Reset the index to get a column of January dates


ford_reset = ford.loc['2012-01':'2012-01'].reset_index()

# Create a new column of numerical "date" values for matplotlib to use

ford_reset['date_ax'] = ford_reset['Date'].apply(lambda date: date2num(date))

ford_values = [tuple(vals) for vals in ford_reset[['date_ax', 'Open', 'High', 'Low', 'Close']].values]

mondays = WeekdayLocator(MONDAY) # major ticks on the mondays

alldays = DayLocator() # minor ticks on the days

weekFormatter = DateFormatter('%b %d') # e.g., Jan 12

dayFormatter = DateFormatter('%d') # e.g., 12

#Plot it

fig, ax = plt.subplots()

fig.subplots_adjust(bottom=0.2)

ax.xaxis.set_major_locator(mondays)

ax.xaxis.set_minor_locator(alldays)

ax.xaxis.set_major_formatter(weekFormatter)

candlestick_ohlc(ax, ford_values, width=0.6, colorup='g',colordown='r');

 gm['returns'] = gm['Close'].pct_change(1). This method computes the percent change from one value to the next in a certain column
 gm['returns'].plot(kind="kde",label="GM"). This plots a density distribution curve
 box_df=pd.concat([tesla['returns'],ford['returns'],gm['returns']],axis=1)
box_df.columns=['Tesla Returns','Ford Returns','GM Returns']
box_df.plot(kind="box",figsize=(8,11)). This plots a box plot of the data set based on 3 columns.

Statsmodels

Statistics
 ETS Models: refer to Error Trend Seasonality models that will take each of those terms for
smoothing. It breaks up data into the following components:
o Trend
o Seasonality
o Residual

from statsmodels.tsa.seasonal import seasonal_decompose


result = seasonal_decompose(airline['Thousands of Passengers'],model='multiplicative')
result.plot()

 EWMA models (Exponentially Weighted Moving Average): reduce time lag, with more weight applied to more recent values

EWMA

Exponentially-weighted moving average

We just showed how to calculate the SMA based on some window. However, the basic SMA has some "weaknesses":

•Smaller windows will lead to more noise, rather than signal

•It will always lag by the size of the window

•It will never reach the full peak or valley of the data due to the averaging.

•Does not really inform you about possible future behaviour; all it really does is describe trends in your data.

•Extreme historical values can skew your SMA significantly

To help fix some of these issues, we can use an EWMA (Exponentially-weighted moving average).

EWMA will allow us to reduce the lag effect from SMA and will put more weight on values that
occurred more recently (by applying more weight to the more recent values, thus the name). The
amount of weight applied to the most recent values will depend on the actual parameters used in
the EWMA and the number of periods given a window size. Full details on the mathematics behind
this are in the pandas documentation. Here is the shorter version of the explanation behind EWMA.
The formula for EWMA is:

$y_t = \dfrac{\sum_{i=0}^{t} w_i x_{t-i}}{\sum_{i=0}^{t} w_i}$

where $x_t$ is the input value, $w_i$ is the applied weight (note how it can change from $i=0$ to $t$), and $y_t$ is the output.

Now the question is, how do we define the weight term $w_i$?
This depends on the adjust parameter you provide to the .ewm() method.

When adjust is True (the default), weighted averages are calculated using the weights $w_i = (1-\alpha)^i$.
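
A minimal sketch of pandas .ewm(), assuming df has a 'Close' column; span=20 makes it roughly comparable to a 20-period SMA:

df['EWMA20'] = df['Close'].ewm(span=20, adjust=True).mean() # adjust=True is the default
df[['Close', 'EWMA20']].plot(figsize=(12, 5))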

ARIMA

The general process for ARIMA models is the following:


 Visualize the Time Series Data
 Make the time series data stationary
 Plot the Correlation and Auto Correlation Charts
 Construct the ARIMA Model
 Use the model to make predictions

ARIMA is a generalisation of the ARMA model (autoregressive moving average)

o Seasonal & non-seasonal ARIMA
o Non-seasonal ARIMA takes three order terms: p, d, q
o p: autoregression (AR) order
o d: integration (differencing) order
o q: moving average (MA) order
o Stationary data has constant mean & variance over time (covariance should not be a function of time)
 There are tests for stationarity in data. A common one is the Augmented Dickey-Fuller test
 Differencing (first order: the change from one period to the next; second order: the difference of those differences; each pass sacrifices one row of data). See the sketch below
 For seasonal data, you could difference by 12 rather than 1 (the .shift(12) method does that)
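
A sketch of first, second and seasonal differencing in pandas, assuming a monthly series in df['value'] (a hypothetical column name):

df['First Difference'] = df['value'] - df['value'].shift(1) # loses one row
df['Second Difference'] = df['First Difference'] - df['First Difference'].shift(1)
df['Seasonal Difference'] = df['value'] - df['value'].shift(12) # yearly seasonality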

Stationarity Tests

Uses the Augmented Dickey Fuller Test with Statsmodels

 The null hypothesis is that the series is not stationary (i.e. a unit root is present)


 It is a unit root test
 The p-value returned dictates whether the data set is stationary or non-stationary

In statistics and econometrics, an augmented Dickey–Fuller test (ADF) tests the null hypothesis
that a unit root is present in a time series sample. The alternative hypothesis is different
depending on which version of the test is used, but is usually stationarity or trend-stationarity.
Basically, we are trying to decide whether to accept the null hypothesis H0 (that the time series
has a unit root, indicating it is non-stationary) or reject H0 and go with the alternative hypothesis
(that the time series has no unit root and is stationary).
We end up deciding this based on the p-value return.
 A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis,
so you reject the null hypothesis.
 A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail
to reject the null hypothesis.
Let's run the Augmented Dickey-Fuller test on our data:
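
A sketch of the ADF test via statsmodels, assuming the series under test is df['value'] as above:

from statsmodels.tsa.stattools import adfuller

result = adfuller(df['value'].dropna())
labels = ['ADF Test Statistic', 'p-value', '# Lags Used', '# Observations Used']
for value, label in zip(result, labels):
    print(label + ': ' + str(value))

if result[1] <= 0.05:
    print('Strong evidence against H0; reject it. The series is stationary.')
else:
    print('Weak evidence against H0; fail to reject it. The series is non-stationary.')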

Differencing

The first difference of a time series is the series of changes from one period to the next (pandas can handle this).

You can continue to take the 2nd & 3rd differences until the data reaches stationarity.

ACF & PACF

 Autocorrelation plot shows the correlation of the series with itself lagged by x time units
 These plots are usually run on the differenced/stationary data
 The ACF plots will also determine whether AR or MA terms should be used in the ARIMA model (see the sketch below)
o If the ACF plot shows positive autocorrelation at the first lag (lag-1), AR terms are suggested
o If the ACF plot shows negative autocorrelation at the first lag (lag-1), MA terms are suggested
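
A sketch of the ACF/PACF plots with statsmodels, run on the differenced series from the earlier sketch:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(df['First Difference'].dropna()) # drop the NaN created by differencing
plot_pacf(df['First Difference'].dropna())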

Partial Autocorrelation

 Partial autocorrelation is a conditional correlation between 2 variables, under the assumption that we know and account for the values of some other set of variables

Finance Fundamentals

Portfolio Allocation

Sharpe Ratio

 $S = \dfrac{R_p - R_f}{\sigma_p}$
 $\sigma_p$ is the portfolio standard deviation
 If the risk-free rate = 0%, the Sharpe Ratio simplifies to mean return divided by std. deviation
 It can also be applied to annualise daily (or weekly/monthly) returns
 The K-factor is based off your sampling rate:
o K = sqrt(252) for daily data
o K = sqrt(52) for weekly data
o K = sqrt(12) for monthly data
 ASR = K*SR (annualised Sharpe Ratio; see the sketch below)
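
A sketch of the daily and annualised Sharpe Ratio, assuming df has a daily 'Close' column and a risk-free rate of 0:

import numpy as np

df['Daily Return'] = df['Close'].pct_change(1)
sr = df['Daily Return'].mean() / df['Daily Return'].std() # daily Sharpe Ratio
asr = np.sqrt(252) * sr # annualised: K = sqrt(252) for daily sampling
print(asr)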

Portfolio Optimisation

Efficient Frontier

CAPM – Capital Asset Pricing Model



Markowitz efficient portfolio optimisation

Monte Carlo Simulation

 stocks.pct_change(1).corr() returns the Pearson correlation matrix of the daily returns
 Log returns are used when normalising more advanced time series data, as they normalise & de-trend the series (see the sketch below)
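
A minimal sketch, assuming stocks is a DataFrame of daily prices as above:

import numpy as np

log_returns = np.log(stocks / stocks.shift(1)) # log of consecutive price ratios
print(log_returns.head())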

Financial Markets Knowledge

 Order Book: an electronic list of buy & sell orders for an instrument, organised by price level. It shows the number of products being bid/offered at each price point (the market depth). It also identifies the market participants, though some can remain anonymous. Order imbalances may also become apparent & point toward a certain market trend. The order book does not show the activity/batches of 'dark pools'
