Machine Learning for Algorithmic Trading
Second Edition
Stefan Jansen
BIRMINGHAM - MUMBAI
Table of Contents
Preface xiii
Chapter 1: Machine Learning for Trading – From Idea to Execution 1
The rise of ML in the investment industry 2
From electronic to high-frequency trading 3
Factor investing and smart beta funds 5
Algorithmic pioneers outperform humans 7
ML and alternative data 10
Crowdsourcing trading algorithms 11
Designing and executing an ML-driven strategy 12
Sourcing and managing data 13
From alpha factor research to portfolio management 13
Strategy backtesting 15
ML for trading – strategies and use cases 15
The evolution of algorithmic strategies 15
Use cases of ML for trading 16
Summary 19
Chapter 2: Market and Fundamental Data – Sources and Techniques 21
Market data reflects its environment 22
Market microstructure – the nuts and bolts 23
How to trade – different types of orders 23
Where to trade – from exchanges to dark pools 24
Working with high-frequency data 26
How to work with Nasdaq order book data 26
Communicating trades with the FIX protocol 27
The Nasdaq TotalView-ITCH data feed 27
From ticks to bars – how to regularize market data 35
AlgoSeek minute bars – equity quote and trade data 40
API access to market data 44
Remote data access using pandas 44
yfinance – scraping data from Yahoo! Finance 46
Quantopian 48
Zipline 48
Quandl 50
Other market data providers 50
How to work with fundamental data 51
Financial statement data 51
Other fundamental data sources 56
Efficient data storage with pandas 57
Summary 58
Chapter 3: Alternative Data for Finance – Categories and Use Cases 59
The alternative data revolution 60
Sources of alternative data 62
Individuals 62
Business processes 63
Sensors 63
Criteria for evaluating alternative data 65
Quality of the signal content 65
Quality of the data 67
Technical aspects 68
The market for alternative data 69
Data providers and use cases 70
Working with alternative data 72
Scraping OpenTable data 72
Scraping and parsing earnings call transcripts 77
Summary 80
Chapter 4: Financial Feature Engineering – How to Research Alpha Factors 81
Alpha factors in practice – from data to signals 82
Building on decades of factor research 84
Momentum and sentiment – the trend is your friend 84
Value factors – hunting fundamental bargains 88
Volatility and size anomalies 90
Quality factors for quantitative investing 92
Engineering alpha factors that predict returns 94
How to engineer factors using pandas and NumPy 94
How to use TA-Lib to create technical alpha factors 99
Denoising alpha factors with the Kalman filter 100
How to preprocess your noisy signals using wavelets 104
From signals to trades – Zipline for backtests 106
How to backtest a single-factor strategy 106
Combining factors from diverse data sources 109
Separating signal from noise with Alphalens 111
Creating forward returns and factor quantiles 112
Predictive performance by factor quantiles 113
Summary 406
Chapter 13: Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning 407
Dimensionality reduction 408
The curse of dimensionality 409
Linear dimensionality reduction 411
Manifold learning – nonlinear dimensionality reduction 418
PCA for trading 421
Data-driven risk factors 421
Eigenportfolios 424
Clustering 426
k-means clustering 427
Hierarchical clustering 429
Density-based clustering 431
Gaussian mixture models 432
Hierarchical clustering for optimal portfolios 433
How hierarchical risk parity works 433
Backtesting HRP using an ML trading strategy 435
Summary 438
Chapter 14: Text Data for Trading – Sentiment Analysis 439
ML with text data – from language to features 440
Key challenges of working with text data 440
The NLP workflow 441
Applications 443
From text to tokens – the NLP pipeline 443
NLP pipeline with spaCy and textacy 444
NLP with TextBlob 448
Counting tokens – the document-term matrix 449
The bag-of-words model 450
Document-term matrix with scikit-learn 451
Key lessons instead of lessons learned 455
NLP for trading 455
The naive Bayes classifier 456
Classifying news articles 457
Sentiment analysis with Twitter and Yelp data 458
Summary 462
Chapter 15: Topic Modeling – Summarizing Financial News 463
Learning latent topics – Goals and approaches 464
Latent semantic indexing 465
How to implement LSI using sklearn 466
Strengths and limitations 468
Probabilistic latent semantic analysis 469
How to implement pLSA using sklearn 470