Stefan Zohren
Momentum Transformer
Transformers
▶ Based on the concept ‘attention is all you need’, doing away with
convolutions and recurrent neural networks (RNNs).
▶ The attention-based architecture allows the network to focus on significant time steps in the past and on longer-term patterns (a minimal attention sketch follows this list).
▶ Have led to state-of-the-art performance in diverse fields such as natural language processing, computer vision, and speech processing (see Lin et al. [8]).
▶ Have recently been harnessed for time-series modelling (Li et al. [9], Lim et al. [10], Zhou et al. [11]).
▶ Naturally adapt to new market regimes, such as during the SARS-CoV-2 crisis.
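The core operation behind these architectures is scaled dot-product attention (Vaswani et al. [12]). Below is a minimal NumPy sketch for illustration only; the window length, feature count, and causal mask are assumptions, not details of the Momentum Transformer implementation.

```python
# Minimal scaled dot-product attention sketch (illustrative, not the paper's code).
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (output, attention weights)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    if mask is not None:                              # e.g. causal mask for decoder-only models
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over past time steps
    return weights @ V, weights                       # weighted sum of values + attention map

# Hypothetical example: a 63-day window with 8 input features per time step.
rng = np.random.default_rng(0)
x = rng.normal(size=(63, 8))
causal = np.tril(np.ones((63, 63), dtype=bool))       # each step attends only to the past
out, attn = scaled_dot_product_attention(x, x, x, mask=causal)
```

Each output step is a weighted combination of past time steps, and the weight matrix attn is the attention pattern that can later be inspected for interpretability.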
Base Architectures Tested in the Momentum Transformer
▶ Transformer: (Vaswani et al. [12]) consists of an encoder and a decoder, each built from l identical layers comprising a (multi-head) self-attention mechanism followed by a position-wise feed-forward network, with residual connections around each of these two sub-layers.
▶ Decoder-Only Transformer: (Li et al. [9]) uses only the decoder side.
▶ Convolutional Transformer: (Li et al. [9]) incorporates convolutional self-attention and log-sparse self-attention.
▶ Informer Transformer: (Zhou et al. [11]) replaces the naive sparsity rule of the Convolutional Transformer with a measurement based on the Kullback-Leibler divergence to distinguish essential queries, referred to as ProbSparse self-attention.
▶ Decoder-Only Temporal Fusion Transformer (TFT): an attention-LSTM hybrid that uses recurrent LSTM layers for local processing and interpretable self-attention layers for long-term dependencies. We consider the Decoder-Only version of the original TFT (Lim et al. [10]); a minimal sketch of this attention-LSTM pattern follows the list.
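As a concrete illustration of the attention-LSTM hybrid idea behind the Decoder-Only TFT, the sketch below pairs an LSTM for local processing with causal self-attention for long-range dependencies. The layer sizes, input projection, and tanh position head are illustrative assumptions, not the authors' exact architecture (see Lim et al. [10]).

```python
# Illustrative attention-LSTM hybrid block (a sketch, not the original TFT code).
import torch
import torch.nn as nn

class AttentionLSTMBlock(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)        # local processing
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, 1)                               # position size in [-1, 1]

    def forward(self, x):                                               # x: (batch, seq_len, n_features)
        h = self.input_proj(x)
        lstm_out, _ = self.lstm(h)
        h = self.norm1(h + lstm_out)                                    # residual around the LSTM
        seq_len = x.shape[1]
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, attn_weights = self.attn(h, h, h, attn_mask=causal)   # long-range dependencies
        h = self.norm2(h + attn_out)                                    # residual around self-attention
        return torch.tanh(self.head(h)), attn_weights

# Hypothetical batch: 32 assets, 63 daily time steps, 8 features each.
model = AttentionLSTMBlock(n_features=8)
positions, attn = model(torch.randn(32, 63, 8))
```

The returned attention weights are what make such a model interpretable: they show which past time steps drive the current position.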
Momentum Transformer (Decoder-Only TFT)
▶ We observe significant structure in attention patterns.
▶ The attention on momentum turning points is pronounced, segmenting the time series into regimes.
▶ Our model focuses on previous time-steps which are in a similar regime (a small diagnostic sketch follows these points).
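One simple way such regime structure could be examined is to average, over the most recent queries, the attention assigned to each past time step; abrupt shifts in this profile suggest the model is segmenting the history into regimes. The sketch below is an illustrative diagnostic under that assumption, not the interpretability analysis from the paper.

```python
# Illustrative diagnostic: where do the most recent time steps place their attention?
import numpy as np

def attention_profile(attn_weights: np.ndarray, last_k: int = 5) -> np.ndarray:
    """attn_weights: (seq_len, seq_len) map with rows = queries, columns = past keys."""
    return attn_weights[-last_k:].mean(axis=0)        # average focus of the latest queries

# profile = attention_profile(attn)                   # 'attn' taken from a trained model
```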
Papers:
Deep Momentum Networks [1904.04912]
DMNs with Changepoints [2105.13727]
Momentum Transformer [2112.08534]
stefan.zohren@eng.ox.ac.uk
[1] T. J. Moskowitz, Y. H. Ooi, and L. H. Pedersen, “Time series momentum,” Journal of Financial Economics,
vol. 104, no. 2, pp. 228–250, 2012. Special Issue on Investor Sentiment.
[2] N. Jegadeesh and S. Titman, “Returns to buying winners and selling losers: Implications for stock market
efficiency,” The Journal of Finance, vol. 48, no. 1, pp. 65–91, 1993.
[3] A. Y. Kim, Y. Tse, and J. K. Wald, “Time series momentum and volatility scaling,” Journal of Financial Markets,
vol. 30, pp. 103–124, 2016.
[4] J. Baz, N. Granger, C. R. Harvey, N. Le Roux, and S. Rattray, “Dissecting investment strategies in the cross
section and time series,” SSRN, 2015.
[5] A. Garg, C. L. Goulding, C. R. Harvey, and M. Mazzoleni, “Momentum turning points,” Available at SSRN
3489539, 2021.
[6] R. P. Adams and D. J. MacKay, “Bayesian online changepoint detection,” arXiv preprint arXiv:0710.3742, 2007.
[7] C. K. Williams and C. E. Rasmussen, “Gaussian processes for regression,” in Advances in Neural Information Processing Systems, vol. 8, 1996.
[8] T. Lin, Y. Wang, X. Liu, and X. Qiu, “A survey of Transformers,” arXiv preprint arXiv:2106.04554, 2021.
[9] S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, and X. Yan, “Enhancing the locality and breaking the
memory bottleneck of Transformer on time series forecasting,” Advances in Neural Information Processing
Systems (NeurIPS), vol. 32, pp. 5243–5253, 2019.
[10] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time
series forecasting,” International Journal of Forecasting, 2021.
[11] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient Transformer
for long sequence time-series forecasting,” in The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI
2021, Virtual Conference, vol. 35, pp. 11106–11115, AAAI Press, 2021.
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention
is all you need,” arXiv preprint arXiv:1706.03762, 2017.