Machine Learning for Financial Market Forecasting
Citation
Johnson, Jaya. 2023. Machine Learning for Financial Market Forecasting. Master's thesis,
Harvard University Division of Continuing Education.
Permanent link
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37375052
Machine Learning for Financial Market Forecasting
Jaya Johnson
Harvard University
May 2023
Copyright 2023 Jaya Johnson
Abstract
Stock market forecasting is a challenging problem, and in recent years machine learning algorithms have been applied to achieve better predictions. Textual data, including news feeds, analyst calls, and other online content, has been used as an additional input. This thesis compares traditional machine learning methods with more recent ones, including LSTM and FinBERT, to assess their accuracy in predicting market direction.
Acknowledgments

I want to thank my thesis director and advisor, Professor Hongming Wang, for her expertise, advice, encouragement, and understanding. Her guidance was invaluable, and her knowledge and patience helped provide me with an outstanding research experience. My thesis year was an absolute pleasure and a great learning adventure.
Thank you to all the Harvard professors and teaching assistants whose knowledge and teaching contributed to this work.
On a personal note, I want to thank my family, Jayin and Josh, for their invaluable support and encouragement over the past few years.
Table of Contents
List of Figures
List of Tables
Chapter I: Introduction
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Raw Data Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 BERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.7 FinBERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6.1 Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6.2 Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6.3 Experiment 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6.4 Experiment 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6.5 Experiment 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6.6 Experiment 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6.7 Experiment 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.6.8 Experiment 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.6.9 Experiment 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.6.10 Experiment 10 . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.6.11 Experiment 11 . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.6.12 Experiment 12 . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Chapter V: Conclusion
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Appendix A
Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
References
List of References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
List of Figures
2 Scrapy Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
13 Apple Inc. Word Distribution. . . . . . . . . . . . . . . . . . . . . . . 42
List of Tables
8 List of Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
12 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
15 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
17 Confusion Matrix - LSTM (Exp 3.) . . . . . . . . . . . . . . . . . . 55
18 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
21 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
24 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
27 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
30 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
33 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
36 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
38 Confusion Matrix - Logistic Regression (Exp 10.) . . . . . . . . . . . 66
39 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
42 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Chapter I.
Introduction
Stock market prediction has long been a challenging problem in research due to the volatile, non-parametric, and nonlinear nature of financial data sets (Huynh et al., 2017). Earlier research attempted various computational methods to model financial time series, starting with Neural Networks (Zhang & Wu, 2009), Fuzzy Systems (Moghaddam et al., 2016), Hidden Markov Models (jae Kim & Han, 2000), and other hybrid combinations (Huynh et al., 2017). These approaches have achieved varying degrees of success.
News articles and stock research have traditionally been important influences on market trends. Many studies have only recently started to include non-technical sources of information, such as news media, that impact the behavior of investors (Zhai et al., 2007). Recent developments in AI and Natural Language Processing (NLP) have made it possible to quantify their impact and apply them in market prediction.
This thesis applies AI techniques that combine historical price indicators and news articles to predict the direction of individual stocks and market indices. It also provides exploratory research into how well these techniques predict stock and index movements.
1.1. Motivation
Textual data provides additional market insight and has improved stock market prediction. Sentiment analysis has many applications in finance, for instance stock market prediction; it attempts to categorize positive and negative words in their context (Ashtiani, 2018). Some examples of data sources are social media (Twitter, Reddit), news sites (Reuters, Bloomberg, Wall Street Journal), blogs, forums, and financial reports (SEC documents) (Mao et al., 2012).
Sophisticated language models have been introduced in the past few years. Transformer architectures use attention mechanisms to understand the context of a word (Vaswani et al., 2017). Bidirectional encoder models consider a word in its entire context while learning to improve the word embedding models.
One of the fundamental issues with NLP models for sentiment analysis is that these models need the correct financial language and vocabulary in their training corpora; in most cases, the training corpora are more general (Araci, 2019). FinBERT resolves this issue: it is a pre-trained NLP model that analyzes the sentiment of financial text.
Market prediction has also benefited from the development and advancement of algorithms that not only process vast amounts of data but continually address the problems of previously applied solutions. Different Support Vector Machine (SVM) models are proving to be effective and have been used in past studies.
Much recent research has focused on using long short-term memory (LSTM) neural networks to predict market trends (Zhai et al., 2007; Lin & Chen, 2018). Most of these experiments have been effective for short-term forecasting, and in many instances LSTM has performed better than other recurrent networks at predicting index prices and direction, leading us to add fundamental data indicators to the model.
Technical and fundamental indicators are both used in market trend prediction, including for stocks and indices. In stock market prediction, technical indicators have always been the primary source of feature selection (Zhai et al., 2007). Technical indicators are computed using mathematical formulae and historical prices to analyze the patterns of past trends and predict future movements (Murphy, 1999). Fundamental indicators draw on information beyond price history to make the prediction; unstructured data from news articles, financial reports, and micro-blogs is most commonly used to factor in the fundamental indicators (Murphy, 1999).
Rate of change - Rate of change (ROC) refers to how fast the price changes over a given period.

On balance volume - an indicator using the volume and price to determine whether a security is being distributed or accumulated.
Figure 1: Summary of price features (Zhai et al., 2007)
Zhai et al. evaluated the technical indicators described above to predict daily
stock price trends (Zhai et al., 2007). Price momentum was one of the indicators
that had the most impact on the price movement (Zhai et al., 2007). Prior research
also finds news articles to have a significant impact on the behavior of stock prices.
The various NLP techniques for extracting a sentiment value from contextual data that impacts market trends, together with promising AI techniques and algorithms, motivate this work to combine technical and fundamental indicators using machine learning to predict stock and index prices.
1.2. Related Work
This section reviews related literature and summarizes the content relevant to this thesis. Considerable progress has been made in stock market research, in both methodology and performance.
The internet has contributed vast amounts of online data, including financial news, research publications, and stock reviews. Manning et al. discuss how NLP has developed considerably, given the massive volume of contextual data, highly advanced machine learning algorithms, and the availability of increased computing, which together give a much richer understanding of language models in their context. Applying these models to financial text has produced some exciting findings. Techniques such as word vectors and polarity based on both positive and negative sentiment analysis of articles have an impact on stock price changes and help predict future trends (Souma et al., 2019). However, NLP in the financial domain still poses challenges, according to Chen et al. and Li et al. (2021). Many studies have been conducted to review the impact on market predictions. Li, Wang, et al. (Li et al., 2020) used the BERT model to obtain investor sentiment and study its impact on stock yield; they confirmed the relationship between the two (with 97% accuracy). RNNs, including LSTM, are deep learning models used more recently in stock market prediction. In their research, Alexander, Härdle, et al. (Dautel et al., 2020b) compared different recurrent architectures on financial time-series data; performance seemed to be best with LSTMs compared to the more traditional methods (Dautel et al., 2020b).
Sentiment analysis is valuable across various domains of predictive analytics, especially stock market analysis. Stock news headlines, reviews, 10-K filings, and earnings statements are different financial artifacts that can be valuable input features in stock market prediction. Sentiment analysis of news articles, and headlines in particular, helps make more informed trading decisions (Osterrieder, 2023).
Andrawos (Andrawos, 2022) compared and analyzed textual data that might help predict stock market pricing, analyzing 10-K and 10-Q reports of 48 companies in the S&P 500 from 2013 to 2017 and testing on reports from 2018 to 2019. Villamil et al. examine settings where news releases often impact stock prices, using bidirectional language models (Villamil et al., 2023).
Also relevant is the paper by Wójcik et al., which examines the impact of financial statements on stock and foreign exchange markets. They use the tone/sentiment of these statements and measure the direction of the markets, finding non-linear methods to be more effective than linear methods. One of the methods used in this study was support vector regression with polynomial and radial kernels (Wójcik & Osowska, 2023).
In this thesis, we extract the sentiment of news articles from Reuters and analyze their impact in various models that predict the direction of the index. For the baseline methodology, DistilBERT is chosen as the library for sentiment analysis. Sanh et al., who introduced DistilBERT, claim it to be a smaller, faster, and cheaper version of BERT, trained on a general source of text such as Wikipedia and applicable to many NLP tasks. One of the issues with such models is that they are trained on a general corpus, and analyzing financial text is complicated and does not yield the same results as non-financial text. The lack of financial vocabulary in general training corpora has motivated comparisons of finance-specific models with BERT. In their paper, Yang et al. compared BERT and FinBERT on economic sentiment classification; they compiled large-scale corpora that include corporate reports (10-Ks and 10-Qs), analyst reports, and earnings call statements.
Huang et al. compare several machine-learning algorithms against FinBERT using labeled sentences from financial reports (Huang et al., 2022). They conclude that FinBERT achieves higher accuracy than the other approaches. Peng et al. have also compared BERT with FinBERT, specifically the performance of the two models on a variety of financial text processing tasks; FinBERT outperformed all models on the financial data sets. Thus, we are motivated to compare the sentiment score output of BERT and FinBERT as an additional input feature to the model (Peng et al., 2021).
DeSola et al. compared the models on financial text, including 10-K reports, and concluded that BERT performed better on earnings call transcripts; the performance improvement indicated that BERT had better contextualization than FinBERT (DeSola et al., 2019).
Although many studies show significantly better results with the sentiments
derived from FinBERT language models when classifying financial data, Chuang and
Yang note that there are nuances in their preference towards certain domains, and
they highlight this to help NLP practitioners improve the robustness of their models
(Chuang & Yang, 2022). Overall, FinBERT performs better on earnings calls than on 10-Ks.
Based on these studies, we use FinBERT for sentiment analysis and DistilBERT, a condensed form of BERT, as a baseline. This work shows the difference between the two language models, BERT (trained on more general corpora) and FinBERT (trained on more specific corpora), and their impact when applied to news headlines across the various models.
Statistical methods have long been used for stock market predictions. Different
techniques have been used to build models for stock predictions; logistic regression is
the most commonly used classification model for smaller data sets.
Logistic regression is one of the baseline methodologies used for this work. It has long been used as a preliminary algorithm to predict indexes and stock prices. Yang et al. investigate correctly exploring and predicting the up and down trends of stock prices using group-penalized logistic regression to create a model with technical indicators (Yang et al., 2022b). They use the confusion matrix and AUC scores for benchmarking to improve prediction accuracy. In their work, logistic regression fails when there are many parameters and few samples. As the number of parameters in this thesis is limited to only a few, logistic regression was a suitable baseline choice.
Ballings et al. also use single-classifier models (Neural Networks, Logistic Regression, and Support Vector Machines) to compare their impact on predicting the stock prices of 5,767 stocks (Ballings et al., 2015).
Support vector machines have been widely applied to predicting stock market trends in the last few years, and many studies show improved results. Agusta et al. proposed a system to predict the optimal time to buy and sell stocks with SVM (Agusta et al., 2022). Yang et al. used it for stock market forecasting (Chuang & Yang, 2022). Kang et al. proposed using a hybrid Support Vector Machine (SVM) to forecast daily returns of popular stock indices around the world (Kang et al., 2023). Achyutha et al. analyzed the impact of different tweets on stock performance.
One of the essential parameters when tuning an SVM model is the type of kernel to be used. The above papers all had significant success with different kernel types. Yang leveraged kernel function selection and kernel parameter selection to optimize the results (Chuang & Yang, 2022), using the linear kernel function to get the most accurate results. Agusta et al. also leveraged non-linear kernel functions to improve results (Agusta et al., 2022); performance increased to about 77% accuracy when parameter labeling was implemented. For this study, the different kernel methods and their effectiveness for prediction are compared.
SVM has proven to be better than traditional neural network methods. For instance, Tay et al. look at the application of SVM in financial time-series forecasting (Tay & Cao, 2001), comparing the methodology with a back-propagation neural network on five futures contracts from the Chicago Mercantile Market. Their experiments found that Gaussian kernel functions perform better than the polynomial kernel, and they concluded that SVMs are a promising method for financial time-series forecasting.
LSTM is the most recent neural network methodology used for time-series prediction. Many studies have been conducted to compare SVM with LSTM, with LSTM achieving better results on stock market prediction because of the time-series nature of the data. In his paper, Zhang evaluated the accuracy of SVM and LSTM models to assess the efficient markets hypothesis (EMH) by predicting typical stock indexes of the American and Chinese stock markets (Zhang et al., 2021), concluding that random-walk tests show little difference between LSTM and SVM models. In this study, we too compare SVM and LSTM but add another fundamental indicator, the sentiment of the articles, to see if that enhances the performance of the predictions.
1.2.5 LSTM and Stock Market Prediction.
Long short-term memory (LSTM) cells are used in recurrent neural networks
(RNNs); these cells then learn to predict the future from sequences of different lengths.
LSTM models have been the most recent trend in deep learning algorithms
with promise in stock market prediction and financial engineering. Koosha et al. have
used and compared different machine learning models to predict the price of Bitcoin
(Koosha et al., 2022) . The challenge that most pricing models face is the volatility
of the data.
In their paper, Li et al. compare different machine learning models and develop
their own. Their model is O-LGT, where the model’s layers use LSTM and GRU (Li
et al., 2023). Some researchers explore the volatile but profitable foreign exchange
(FX) markets (Yıldırım et al., 2021). Dautel et al. (Dautel et al., 2020a) have also
used LSTM in foreign exchange market predictions. Livieris et al. look at prediction models for gold prices, as gold price volatility significantly impacts many world financial markets.
Although LSTM shows promise, tuning its parameters is an area that could be explored further.
1.3. Research Problem
This thesis combines technical indicators such as price momentum and fundamental indicators such as sentiment analysis of news articles to predict the direction of the S&P 500 index and stock prices. Based on prior research, index prices and stock prices are influenced by both technical and fundamental indicators.
The first baseline method is multiple logistic regression, with BERT used for sentiment classification. SVM is another baseline method used in the experiments. One of our goals (see below) is to determine whether LSTM is better than SVM at predictions for time-series data in the stock market.
We also introduce the sentiment score derived from FinBERT as one of the input features to the LSTM model and compare and contrast the outputs and results.
Since short-term news articles tend to have the most impact on the price of stocks, we go back one year to gather the data. Our data includes stock prices and news articles for three companies that heavily influence the S&P 500 index: Microsoft, Apple, and Amazon. The research questions are:
(i) How does LSTM perform compared to traditional machine learning methods?
(ii) How does FinBERT compare with BERT methods to predict the direction
of the S&P?
(iii) How does using FinBERT and technical indicators from prior research impact the overall prediction of the movement of the S&P?
(iv) What is the impact of different parameters for fine-tuning the algorithm?
Chapter II.
Methodology
2.1. Overview
This chapter describes the methodologies used to predict the direction of the S&P 500 index. Based on previous research, we use stocks of the companies with the most weight in the S&P 500 Index: Microsoft Corp. (MSFT), Apple Inc. (AAPL), and Amazon (AMZN). Different statistical and machine learning algorithms are compared for accuracy and best fit.
The input features consist of technical and fundamental data. The historical price data of the stock tickers of the companies mentioned above constitutes the technical indicators; these prices are downloaded from yahoo.com. NLP methodologies are used to extract sentiment from news articles about the stocks.
Once the sentiment score and class are extracted from the tone of the news articles, they are combined with technical indicators, such as moving averages of the closing prices, to form the model's input features.
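As a concrete sketch of this step, a day's sentiment score can be paired with a technical indicator to form one feature row. The function names and all values below are illustrative, not taken from the thesis code.

```python
# Sketch of assembling input features: each day's sentiment score (from the NLP
# step) is paired with a technical indicator, here a 3-day moving average of
# the closing price. All values below are made up for illustration.

def moving_average(closes, window=3):
    """Simple moving average of the last `window` closes, per day."""
    return [
        sum(closes[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(closes))
    ]

def build_features(sentiments, closes, window=3):
    """Pair each day's sentiment score with that day's moving average."""
    mas = moving_average(closes, window)
    offset = window - 1
    return [
        {"sentiment": sentiments[offset + i], "ma": mas[i]}
        for i in range(len(mas))
    ]

closes = [100.0, 102.0, 101.0, 103.0, 104.0]
sentiments = [0.2, -0.1, 0.3, 0.0, 0.4]       # e.g. FinBERT-style daily scores
rows = build_features(sentiments, closes)
print(rows[0])   # {'sentiment': 0.3, 'ma': 101.0}
```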
This section gives an overview of the different methodologies and the indicators used to analyze and forecast financial trends. In investing, indicators refer to technical chart patterns derived from a given security's price and volume. Indicators can be broadly classified as fundamental or technical.
Fundamental indicators

These indicators are used to calculate the actual intrinsic value of a share based on the company's fundamentals. They are derived from the company's financial reports, reports about various macroeconomic indicators, and textual sources like news articles (Petrusheva & Jordanoski). For the experiments conducted as part of this thesis, NLP techniques like BERT and FinBERT are used to calculate a sentiment score from news articles.
Technical indicators
These indicators are used to predict the future market price of a share. Technical analysis considers past changes in a share's price and attempts to predict its future price movements (Petrusheva & Jordanoski). This thesis uses price momentum as the technical indicator to predict the S&P index.
Momentum is the rate of price change in a stock, security, or tradable instrument. It shows the rate of change in price movement so investors can assess the strength of a trend. Ten-day momentum is calculated by subtracting the closing price ten days ago from the last closing price (Murphy, 1999).
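The ten-day momentum calculation above can be sketched directly (the prices below are made up):

```python
# Ten-day price momentum: the closing price ten days ago subtracted from the
# latest closing price (Murphy, 1999). Prices here are illustrative only.

def momentum(closes, t=10):
    """Return C - C_t, where C is the last close and C_t the close t days earlier."""
    return closes[-1] - closes[-1 - t]

closes = [50, 51, 53, 52, 54, 55, 53, 56, 57, 58, 60]   # 11 made-up closes
print(momentum(closes))   # 60 - 50 = 10
```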
The proposed method for data collection is detailed below. This section is divided into downloading fundamental data, downloading technical data, and pre-processing.

Fundamental data, such as stock reviews, is collected via a web extraction tool from Reuters (www.reuters.com). These include reviews in all categories. This data is collected for six months and one year, and is further processed to extract sentiment.
Technical data includes the six-month and one-year performance of the S&P 500 index: daily performance data including the index's high, low, open, close, and volume. This data is used to compute the price momentum, C − C_t, where C is the latest closing price and C_t is the closing price t days earlier. The data is downloaded from Yahoo Finance.
News articles about the companies that influence the index are extracted from reuters.com. Textual data (news articles) was collected for six months and one year for Microsoft, Apple, and Amazon.
The web-service calls observed in the browser's network tab are used to extract information from the Reuters API. The web-service call is parameterized to download articles for different companies. The library used to develop the web-extraction module is Scrapy, a Python-based fast web extraction framework useful for extracting unstructured content from websites, with many use cases in the field of data mining.
Scrapy has many powerful features that make it the right choice for extracting the articles:

i) Built-in support for selecting and extracting HTML data using regular expressions and selectors.

ii) An interactive shell console that makes it easy to install the library in Colab and try out extraction code.

iii) JSON support that enables run-time validations and basic parsing when downloading the articles, checking the return status code, and downloading select attributes.

iv) Extensible APIs that support module development, reducing the need to build pipelines from scratch. These APIs automate 90% of the orchestration when downloading content.
Figure 2: Scrapy Architecture.
The data flow through the Scrapy architecture is as follows:

i) The Spider sends the initial requests to the engine.

ii) The engine schedules the requests in the order they are received.

iii) The requests are sent from the engine to the downloader via the downloader middleware.

iv) The downloader generates a response after downloading the page and returns it to the engine.

v) The engine sends the response to the Spider via the spider middleware.

vi) The Spider processes the response and returns extracted content and further requests to the engine.

vii) The processed items are sent to the item pipelines, which persist them.

The process repeats until there are no more requests (Zyte, 2023).
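The run-time validation described earlier (checking the return status and keeping only selected attributes) can be sketched with the standard library. The payload shape and field names below are hypothetical stand-ins for whatever the Reuters search API actually returns.

```python
import json

# Sketch of JSON validation and attribute selection for a downloaded page.
# The statusCode / basic_headline / published_time fields are hypothetical.
raw = json.dumps({
    "statusCode": 200,
    "result": {"articles": [
        {"basic_headline": "Apple beats estimates",
         "published_time": "2022-11-01",
         "id": "a1"},
    ]},
})

payload = json.loads(raw)
assert payload["statusCode"] == 200, "skip responses that did not download cleanly"

# Keep only the attributes needed downstream (headline and date).
articles = [
    {"headline": a["basic_headline"], "date": a["published_time"]}
    for a in payload["result"]["articles"]
]
print(articles)
```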
The technical data for the experiments is downloaded from Yahoo Finance.
This data is exported manually into CSV files. The data points include open, close, high, low, and volume.
Pre-processing raw textual data is a key step in natural language processing (NLP); the text identified at this stage is fundamental to NLP modeling. Normalizing is a series of steps in which text documents are processed and cleansed. The fundamental data with the reviews is cleansed and persisted into a data frame with the following columns: stock, date, review, and headline.
iii) Tokenization: the text is split into smaller tokens using sentence and word tokenization.

iv) Stop-word removal: common words that usually do not add to the context of the text are removed, as they do not add meaning to the analysis.

v) Stemming: the words in the text are reduced to their root/base form.
vi) Lemmatization: like stemming, it reduces words to a base form, but ensures the result is a valid dictionary word.
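A minimal standard-library sketch of steps iii) through vi) follows; in practice an NLP library such as NLTK would provide proper tokenization, stop-word lists, stemming, and lemmatization. The stop-word set and suffix rules here are illustrative only.

```python
import re

# Toy normalization pipeline: tokenize, drop stop words, then crudely strip
# suffixes as a stand-in for stemming/lemmatization. Illustrative only.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in"}

def normalize(text):
    tokens = re.findall(r"[a-z']+", text.lower())          # word tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop-word removal
    # crude suffix stripping standing in for stemming/lemmatization
    return [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]

print(normalize("Apple shares are rising after the earnings call"))
```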
2.6. BERT
BERT is built on the Transformer, whose attention mechanism learns relations between input elements; the model is therefore able to add context by assigning weights to the words surrounding a given word. This model has become a widely used deep learning framework for natural language processing (NLP). The advantage BERT has over traditional NLP models is its ability to add context to language from the surrounding text, using the attention weights between the input and output elements mentioned above.
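The weighting idea can be made concrete with a minimal single-head self-attention sketch (random vectors stand in for learned token embeddings, and the learned projection matrices are omitted):

```python
import numpy as np

# Single-head self-attention sketch: each token representation becomes a
# weighted average of all token representations, with weights from a softmax
# over query-key similarity (Vaswani et al., 2017).

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, dimension 8
q, k, v = tokens, tokens, tokens                       # no projection matrices
weights = softmax(q @ k.T / np.sqrt(k.shape[1]))       # (4, 4) attention weights
contextual = weights @ v                               # context-aware vectors

assert np.allclose(weights.sum(axis=1), 1.0)           # each row is a distribution
```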
BERT was trained using text from Wikipedia; it can be tuned further using question-and-answer data sets.
Language models preceding BERT could only read text sequentially, whereas BERT reads in both directions at once; this was enabled by the Transformer architecture. BERT is pre-trained on two tasks: Masked Language Modeling and Next Sentence Prediction. Masked Language Model training obfuscates a word in a sentence and has the model predict that word, while Next Sentence Prediction trains the model to predict whether one sentence follows another.
DistilBERT is a distilled version of the BERT base model: it has 40% fewer parameters, runs 60% faster, and preserves over 95% of BERT's performance.
The following parameters are used to fine-tune the model:

n_heads - the number of attention heads in each attention layer.

hidden_dim - the size of the "intermediate" (often called feed-forward) layer in the Transformer encoder.
2.7. FinBERT
FinBERT is a version of BERT trained on financial text. The main driver for FinBERT is that BERT is trained only on a general corpus and computes sentiment poorly for financial text (Yang et al., 2020). Huang et al. document that FinBERT outperforms many other NLP models on financial-domain tasks.
FinBERT builds on Google's original bidirectional encoder representations from transformers (BERT) model, inheriting the advantages of the transformer architecture.
To train FinBERT, the base BERT model is pre-trained using three types of financial text:

i) 60,490 10-Ks and 142,622 10-Qs of Russell 3000 firms from the SEC's EDGAR website;

ii) S&P 500 firms' analyst reports from the Thomson Investext database; and

iii) earnings call transcripts.
The figure above shows the architecture of FinBERT.
2.8. Multiple Logistic Regression
Multiple logistic regression predicts a binary variable using one or more input variables; here, the direction of the S&P is modeled as a function of the sentiment score and market momentum (Maindonald & Braun). The model outputs the probability of the target variable, which for these experiments is the direction of the S&P. The experiments use the sklearn library's logistic regression module, and the parameters provided by the library are tuned to optimize the model.
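A minimal sketch of this baseline with scikit-learn follows, using synthetic data in place of the thesis features; the two columns stand in for the sentiment score and price momentum, and the label rule is invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Logistic-regression baseline on synthetic data. The columns stand in for
# [sentiment score, price momentum]; the label is a toy "up/down" rule,
# not real market data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                  # [sentiment, momentum]
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # 1 = up, 0 = down (synthetic)

clf = LogisticRegression(C=1.0, solver="lbfgs")
clf.fit(X, y)
print(f"in-sample accuracy: {clf.score(X, y):.2f}")
```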
2.9. Support Vector Machine

The support vector machine (SVM) is a supervised machine learning model. It can perform linear and nonlinear classification, regression, and even pattern detection. SVMs perform best for classification tasks with medium-sized nonlinear data sets; they do not scale well to massive data sets.
Kim (Kim, 2003) finds that SVMs show promise in predicting financial time-series data, proving effective in predicting the stock price index.
SVMs are based on finding a hyperplane that divides the data into classes, which can be challenging for real-world data. The data points closest to the hyperplane are called support vectors.
Fortunately, when using SVMs, one can apply the kernel trick, which implicitly maps the data into a higher-dimensional space to capture non-linear cases. SVM constructs hyperplanes in this higher-dimensional space for classifying data; the optimal hyperplane is the one with the largest distance to the nearest training data point of any class.
The SVM has different kernel functions that can be specified. Standard kernels include:
Figure 5: Maximum-margin hyperplane for an SVM. Samples on the margin are called
the support vectors.
i) Linear - the simple dot product of two input vectors.

ii) Polynomial - computes the similarity of vectors in the training data set over polynomial combinations of the features.

iii) Radial basis function (RBF) - a Gaussian similarity measure between two vectors.

iv) Sigmoid - equates to a two-layer perceptron neural network.

For the experiments below we use scikit-learn's SVM module.
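A small sketch comparing scikit-learn's standard kernels on a synthetic nonlinear data set (not the thesis data):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Compare the standard SVC kernels on a small synthetic nonlinear data set.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

accs = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    accs[kernel] = SVC(kernel=kernel, C=1.0).fit(X, y).score(X, y)
    print(f"{kernel:>8}: {accs[kernel]:.2f}")
```

On data like this, the nonlinear RBF kernel typically separates the classes better than the linear kernel, which mirrors the kernel-selection findings discussed above.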
Figure 6: The process of classifying nonlinear data using kernel methods (Raschka
et al., 2022)
Figure 7: Kernel functions for SVMs
2.10. LSTM

LSTM is a type of RNN. Recurrent Neural Networks (RNNs) are neural networks that specialize in processing temporal data. In an RNN, the output of the recurrent cell at the previous step is used to determine the output at the current step, and the output of each step is used as input for the next; this process resembles a feed-forward network with multiple hidden layers. The errors are calculated for each time step and the weights are updated, a process called back-propagation through time (BPTT) (Pajankar & Joshi, 2022).
Figure 8: RNN architecture (Protopapas et al., 2023)
However, one limitation of RNNs is that gradients propagated across many time steps tend to become zero (or, in some cases, infinite); this is known as the vanishing (or exploding) gradient problem.
LSTMs maintain a cell state that can be referenced as a memory. Because the cell has a memory, information from earlier time steps can be carried to later time steps, preserving long-term dependencies in sequences. An LSTM unit contains a cell, an input gate, an output gate, and a forget gate. These gates address the vanishing gradient problem common in ordinary recurrent neural networks.
The input is passed through a tanh function to determine the candidate values and through a sigmoid function to determine the input gate; these two outputs are multiplied together. The forget gate produces a vector of values between 0 and 1 that is multiplied element-wise with the previous cell state, determining how much of the old memory to keep. The new cell state is the sum of the retained memory and the gated candidate values. Finally, the output gate is combined with the tanh of the new cell state to produce the hidden state passed to the next step. The attention mechanism can further address issues with long-term memory.
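The gate equations above can be written out for a single LSTM step with NumPy; the weights here are random placeholders rather than learned values (in practice they are learned via BPTT).

```python
import numpy as np

# One LSTM step, written out to make the gate equations concrete.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """x: input at time t; h_prev/c_prev: previous hidden and cell state.
    W, U, b hold stacked weights for the forget, input, candidate, output gates."""
    z = W @ x + U @ h_prev + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # gates in (0, 1)
    g = np.tanh(g)                                  # candidate values
    c = f * c_prev + i * g                          # new cell state (the "memory")
    h = o * np.tanh(c)                              # new hidden state / output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5                                  # e.g. 3 features per day
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)   # (5,) (5,)
```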
Figure 10: Multivariate LSTM model (Ismail et al., 2018)
We supply different hyper-parameters to the model to find the optimal ones that increase accuracy and give the best results. Tuning the parameters can be cumbersome, so we used third-party libraries with various techniques for finding the optimal parameter values.

GridSearchCV is a cross-validation technique in which the model and a grid of candidate parameters are entered. The estimator implements fit and score methods, and may also implement predict, decision_function, transform, and inverse_transform; the parameters of the estimator used to apply these methods are optimized by cross-validated grid search over the parameter grid.
This is the tool to perform hyper-parameter tuning for Logistic Regression and
SVM models.
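A sketch of such a cross-validated grid search for the SVM baseline, on synthetic data (the grid values are illustrative, not the thesis settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Cross-validated grid search over an SVM's kernel and regularization strength.
X, y = make_classification(n_samples=120, n_features=4, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]},
    cv=5,                    # 5-fold cross-validation
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```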
Keras Tuner

The Keras Tuner is used to tune the parameters of the TensorFlow LSTM model. It selects the optimal set of hyper-parameters, such as the number of units in the hidden layers and the learning rate of the optimization algorithm.
To evaluate classification algorithms, the most common measure is the model's accuracy, which compares predicted values to actual values. Below are some of the evaluation measures used in the experiments.
Confusion Matrix
A confusion matrix is used to evaluate the performance of the classification algorithm. It is a table that summarizes the performance of the model and is described as follows:
- The rows refer to the instances that belong to that class (ground truth).
- The columns refer to the instances predicted as that class.
The confusion matrix’s configuration allows the user to quickly spot the areas in which the model has greater difficulty (Saleh & Sen, 2019).
By definition, a confusion matrix C is such that C(x, y) is the number of observations known to be in group x and predicted to be in group y.
From the matrix we get the following information (Tay & Cao, 2001):
- True positive (TP): the model predicts the condition when the condition is present.
- True negative (TN): the model does not predict the condition when the condition is absent.
- False positive (FP): the model predicts the condition when the condition is absent.
- False negative (FN): the model does not predict the condition when the condition is present.
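This is the convention followed by sklearn's `confusion_matrix`, which the experiments rely on (the labels below are illustrative):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1]  # ground truth: 1 = price went up
y_pred = [0, 1, 0, 0, 1]  # model predictions
C = confusion_matrix(y_true, y_pred)
# C[x, y] counts observations known to be in class x and predicted as class y,
# so rows are the ground truth and columns are the predictions.
```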
The sklearn library’s metrics module is used to calculate the following (Pedregosa et al., 2011):
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
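Written as a small helper mirroring these formulas (the counts below are illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

acc, prec, rec = classification_metrics(tp=8, tn=5, fp=2, fn=1)
# acc = 13/16 = 0.8125, prec = 8/10 = 0.8, rec = 8/9
```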
ROC Curve
The ROC curve considers all possible thresholds for a given classifier. It displays the false positive rate (FPR) against the true positive rate (TPR). For the ROC
curve, the ideal curve is towards the top left: The classifier that produces a high recall
while keeping a low false positive rate is the one that performs well. (Muller & Guido,
2018)
We use the roc_curve and roc_auc_score functions from the sklearn library for the experiments. Their purpose is to compute the ROC curve and the Area Under the Receiver Operating Characteristic Curve (AUC) from prediction scores.
AUC Score
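A minimal usage sketch with illustrative labels and scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]                 # ground truth labels
y_score = [0.1, 0.4, 0.35, 0.8]      # predicted probabilities for class 1
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)  # 0.75: one of four label pairs is misordered
```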
Chapter III.
Below is the list of system resources and software packages used in the experiments.
Computation
Software
3.2. Data Collection and Analysis
One of the biggest challenges in data collection was downloading news articles from dynamic web pages. Most web article aggregators provide only six months of data at no cost. The callback in the network tab reveals the REST API that serves the article data.
Scrapy, a Python web-extraction framework, was used to develop modules, also known as spiders, that make asynchronous callbacks to the REST APIs and download the articles.
URL: https://www.reuters.com/site-search/?
Performance: 6 months of data (30 secs per ticker); 1 year of data (2 mins per ticker)
Figure 11: Scrapy function call.
There are two types of data used for the experiments.
The companies for which price data is available are Microsoft, Apple, and Amazon.
Price charts from left to right: Row 1. AMZN, MSFT. Row 2. AAPL, S&P 500.
Observations: As expected, the price data is a random time series. The nature of the data is essential for input features like price data. The S&P prices will be processed to determine an increase or decrease from the previous day’s price. The output variable is the increase/decrease in the S&P index price, which makes it binary. However, if the trends are observed, the general direction of the stocks appears to be aligned.
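Deriving that binary target from a series of daily closing prices can be sketched as follows (the prices are illustrative):

```python
def direction_labels(closes):
    """1 if the price increased from the previous day's close, else 0."""
    return [1 if curr > prev else 0 for prev, curr in zip(closes, closes[1:])]

labels = direction_labels([100.0, 101.5, 101.0, 103.2])  # → [1, 0, 1]
```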
Articles from reuters.com are downloaded and parsed to extract the required fields. As the data is from the JSON payload, there is no need for data cleansing.
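Extracting fields from such a JSON payload is straightforward; the payload structure and field names below are hypothetical stand-ins for the actual Reuters response:

```python
import json

payload = json.loads("""
{"result": {"articles": [
  {"basic_headline": "Microsoft beats quarterly earnings estimates",
   "published_time": "2022-10-25T21:00:00Z"}
]}}
""")

# Keep only the fields needed for sentiment analysis.
articles = [
    {"headline": a["basic_headline"], "published": a["published_time"]}
    for a in payload["result"]["articles"]
]
```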
Data exploration is essential, as reviewing the data content manually could be cumbersome. Various tools make it possible to visualize the data and interpret patterns:
pandas, matplotlib, numpy, nltk, seaborn, wordcloud, textblob, spacy, textstat
First, the raw data is analyzed to ensure there are no empty rows and the format is correct.
Next, the word distribution, character count, and word count were reviewed.
Figure 13: Apple Inc. Word Distribution.
These tools help us determine how many of the headlines contain the company searched. While observing the raw data, few headlines mentioned the company name; more often, the company name appears in the detailed description or the body of the article.
The character distribution is between 30 and 90 characters for all articles. This matters for sentiment analysis, as the process is computationally expensive and uses heavy system resources. For articles over 150 characters, the FinBERT classification would need to be batched into 150-200 headlines per batch to fit in memory. For articles under 100 characters, the headlines would be batched into batches of 250-300 to be classified.
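The batching scheme described above amounts to a simple chunking helper (batch sizes from the text; the headline lists are placeholders):

```python
def make_batches(headlines, batch_size):
    """Split a list of headlines into fixed-size batches for classification."""
    return [headlines[i:i + batch_size] for i in range(0, len(headlines), batch_size)]

# Longer headlines get smaller batches to stay within memory limits.
long_batches = make_batches(["a long headline"] * 500, batch_size=200)    # 3 batches
short_batches = make_batches(["a short headline"] * 500, batch_size=300)  # 2 batches
```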
DistilBERT is more efficient than FinBERT when processing 20K rows of 80-100-character headlines.
Word clouds help review the distribution of bi-grams and tri-grams to ensure there is enough context, since BERT and FinBERT rely extensively on context. It is evident from the word cloud that a fair amount of financial terminology is used. If there was not a good distribution of the financial keywords, additional filters might need to be applied.
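The bi-grams and tri-grams behind the word cloud can be produced with a small helper (the tokenized headline is illustrative):

```python
def ngrams(tokens, n):
    """Return all consecutive n-grams over a list of tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "microsoft beats quarterly earnings estimates".split()
bigrams = ngrams(tokens, 2)   # ("microsoft", "beats"), ("beats", "quarterly"), ...
trigrams = ngrams(tokens, 3)
```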
With satisfactory data analysis for all the companies, the data is ready for sentiment analysis. Failing that, one would have to review the sources, meta-tags, and search keywords and repeat the download process to ensure the correct data set for analysis.
BERT is used for the baseline analysis of the article sentiment. BERT and DistilBERT are widely used in NLP projects.
Experimental Setup
Model: distilbert-base-uncased-finetuned-sst-2-english for sentiment analysis.

Sentiment MSFT AMZN AAPL
Positive 112 24 90
A sample of the classified sentiment from the model is shown below.
Observations
i) The ”Negative” sentiment of the news articles for all three companies indicates an overall negative tone in the coverage for the period.
ii) As demonstrated in the tables above, some articles that should fall into the ”Neutral” category make their way into the ”Positive” and ”Negative” categories.
iii) The sentiment distribution of the three companies was similar, with more ”Negative” sentiments than ”Positive.” This lines up with the market trends for the period.
iv) Several model iterations were run, each taking 30-45 minutes. The most performant model was ”distilbert-base-uncased-finetuned-sst-2-english.”
v) There was a need for a third, neutral category based on the output of the news sentiment analysis library. BERT in FinBERT stands for Bidirectional Encoder Representations from Transformers. FinBERT uses the Reuters TRC2 dataset and the Financial PhraseBank for domain-specific training.
The article headlines are passed to the FinBERT analyzer, and three sentiment categories are labeled. These categories are ”Positive,” ”Neutral,” and ”Negative.” There are scores associated with each sentiment. The max of these scores determines the final label.
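Taking the max score as the final label is a one-liner (the scores shown are made up, not actual FinBERT output):

```python
# Hypothetical per-category scores for one headline.
scores = {"Positive": 0.12, "Neutral": 0.30, "Negative": 0.58}
label = max(scores, key=scores.get)  # → "Negative"
```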
Experimental Setup
Model: FinBERT for sentiment analysis.
Processing Time: Avg. 5 mins for 300 articles using high RAM and TPUs.
Sentiment MSFT AMZN AAPL
Positive 110 30 92
A sample of the classified sentiment from the FinBERT model is shown below.
Microsoft’s headlines with positive sentiment
Observations
This indicates DistilBERT was probably classifying many ”Neutral” sentiments as ”Negative”. There was no change to the ”Positive” sentiments.
List of Experiments: This section goes over the different experiments conducted and the results. All experiments will use the following metrics to evaluate the
model.
i) Confusion Matrix
No. Category Algorithm Duration Sentiment Model Target
3.6.1 Experiment 1
A Logistic Regression model that predicts the direction of the MSFT stock price.
Input Parameters:
Fundamental Sentiment of news headlines.
Technical Momentum.
Ticker MSFT.
Data 6 months.
Hyper-parameter tuning was performed using the criteria listed in the table
below.
Class LogisticRegression()
C np.logspace(-4, 4, 20)
solver [’liblinear’]
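The grid above maps directly onto sklearn's `GridSearchCV`; here is a runnable sketch on synthetic data standing in for the momentum-plus-sentiment feature matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the feature matrix used in the experiment.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

param_grid = {"C": np.logspace(-4, 4, 20), "solver": ["liblinear"]}
search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_  # the C value with the best cross-validated score
```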
Predicted
0 1 Total
Actual 0 40 29 69
1 18 30 48
Total 58 60 118
3.6.2 Experiment 2
Input Parameters:
Technical Momentum.
Ticker MSFT.
Data 6 months.
Input Array ’C’: [0.1, 1, 10, 100, 1000, 10000], ’kernel’: [’rbf’, ’sigmoid’, ’poly’]
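The SVM grid can be run the same way with `GridSearchCV`; the sketch below uses synthetic data and a reduced `C` range so it finishes quickly:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# The experiments sweep C up to 10000; a shorter range is used here.
param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["rbf", "sigmoid", "poly"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
```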
Predicted
0 1 Total
Actual 0 50 4 54
1 1 42 43
Total 51 46 97
3.6.3 Experiment 3
An LSTM model that predicts the direction of the stock price. The table below lists the parameters for the experiment.
Input Parameters:
Fundamental Sentiment of news headlines.
Technical Momentum.
Ticker MSFT.
Data 6 months.
input unit: 64
Score: 0.97
Model Recall = 0.63
Predicted
0 1 Total
Actual 0 44 28 72
1 15 28 43
Total 59 56 115
3.6.4 Experiment 4
A Logistic Regression model that predicts the direction of the index price for a period of 6 months.
Input Parameters:
Technical Momentum.
Data 6 months.
Model Accuracy = 0.71
Class LogisticRegression()
C np.logspace(-4, 4, 20)
solver [’liblinear’]
Predicted
0 1 Total
3.6.5 Experiment 5
An SVM model that predicts the direction of the index price for a period of 6 months.
Input Parameters:
Fundamental Sentiment of news headlines.
Technical Momentum.
Data 6 months.
Input Array ’C’: [0.1, 1, 10, 100, 1000, 10000], ’kernel’: [’rbf’, ’sigmoid’, ’poly’]
Predicted
0 1 Total
1 0 614 614
3.6.6 Experiment 6
An LSTM model that predicts the direction of the index price for a period of 6 months.
Input Parameters:
Fundamental Sentiment of news headlines.
Technical Momentum.
Data 6 months.
Score 1.0
Listed below are the Keras Tuner hyper-parameter tuning parameters.
Model Recall = 0.54
Predicted
0 1 Total
1 0 614 614
3.6.7 Experiment 7
A Logistic Regression model that predicts the direction of the stock price (MSFT) over the period of a year. The table below lists the parameters for the experiment.
Input Parameters:
Technical Momentum.
Ticker MSFT.
Data 1 year.
Listed below are the GridSearch Hyper Parameter Tuning parameters.
Class LogisticRegression()
C np.logspace(-4, 4, 20)
solver [’liblinear’]
Predicted
0 1 Total
Actual 0 82 52 134
1 26 68 94
3.6.8 Experiment 8
Input Parameters:
Technical Momentum.
Ticker MSFT.
Data 1 year.
Input Array ’C’: [0.1, 1, 10, 100, 1000, 10000], ’kernel’: [’rbf’, ’sigmoid’, ’poly’]
Predicted
0 1 Total
1 38 71 109
3.6.9 Experiment 9
An LSTM model with FinBERT sentiment analysis that predicts the direction of the stock price (MSFT) over the period of 1 year.
Input Parameters:
Fundamental Sentiment of news headlines.
Technical Momentum.
Ticker MSFT.
Data 1 year.
Hyperparameters:
input unit: 64
Score: 0.8656987249851227
Model Precision = 0.72
Predicted
0 1 Total
1 38 71 109
3.6.10 Experiment 10
A Logistic Regression model that predicts the direction of the index price over the period of a year. The table below lists the parameters for the experiment.
Input Parameters:
Technical Momentum.
Data 1 year.
solver=’newton-cg’, penalty=’l2’
Listed below are the GridSearch Hyper Parameter Tuning parameters.
Class LogisticRegression()
C np.logspace(-4, 4, 20)
solver [’liblinear’]
Predicted
0 1 Total
3.6.11 Experiment 11
An SVM model that predicts the direction of the index price over the period of a year. The table below lists the parameters for the experiment.
Input Parameters:
Technical Momentum.
Data 1 year.
Input Array ’C’: [0.1, 1, 10, 100, 1000, 10000], ’kernel’: [’rbf’, ’sigmoid’, ’poly’]
Predicted
0 1 Total
3.6.12 Experiment 12
An LSTM model with FinBERT sentiment analysis that predicts the direction of the S&P index over the period of 1 year. The table below lists the parameters for the experiment.
Input Parameters:
Fundamental Sentiment of news headlines.
Technical Momentum.
Data 1 year.
Hyperparameters:
input unit: 32
Score: 0.9998965561389923
Model Precision = 0.68
Predicted
0 1 Total
Chapter IV.
Discussion
The table below has the performance metrics for each of the experiments for comparison.
Experiments Performance Matrix - 1 year
Table 45: ROC Curves for Experiments (row 1: Experiments 1-6; row 2: Experiments 7-12)
One of the challenges in the sentiment analysis was to collect articles that were available at a reasonable cost. Many sites accumulate data over time and charge for data that is over six months old, and they offer no filters for companies.
The Reuters API tags headlines in specific categories like business, world, sports, legal, markets, technology, and breaking views; in short, it provided the filters needed to target relevant articles. Using Scrapy was easy, since the orchestration of the pipelines is all handled by the framework. With its auto-throttle features, it helps with faster downloads and managing large data loads.
DistilBERT from Hugging Face was performant and reasonably accurate compared to other BERT-based models. Data for the entire year with all three companies
were classified in less than 30 minutes. The sentiment distribution from the initial
data analysis showed that the data aligned with the market trends. FinBERT per-
formance was good for smaller data sets (six months); however, for larger data sets,
high RAM usage was observed ( > 64GB) and a long processing time. The data
was batched into 200 headlines per batch with TPU and High RAM configuration
in Google Colab. FinBERT had an advantage over DistilBERT in that it added another category for classification. The sentiment was classified into three different categories, and a sentiment score between 0 and 1 is provided for each category. There were 20,000 articles classified over 30 hours in batches of 200 articles per batch.
In conclusion, both libraries had more negative than positive articles for six months. However, FinBERT had a better distribution when classifying articles for a year, with slightly more negative articles than positive ones and more neutral and positive articles combined than negative articles.
4.2. Baseline Models with BERT Sentiment Analysis
The results of the experiments for the baseline metrics are explained below.
The baseline metrics included the Logistic Regression and SVM models for binary
classification. The two use cases evaluated were predicting the stock (MSFT) price direction and the index (S&P 500) price direction.
Stock price prediction 6 months - The baseline metrics for predicting the
direction of the stock price (MSFT) with six months of data showed better results
with SVM (95% accuracy) when compared with Logistic Regression (61%). In the case of stock prediction, high precision is desirable; about 95% of the time, the model predicts the direction of the stock price correctly using BERT sentiment results and the stock price momentum. The SVM model also had a high rate of recall, which means 95% of the time the model was accurately able to predict the direction of the stock price.
Stock price prediction 1 year - The baseline metrics improved marginally for the Logistic Regression model, to 66% from 61%, suggesting that Logistic Regression did not benefit much from the larger data set. The SVM model performed better, with 72% accuracy and a high level of precision and recall, both at 72%.
Index price prediction 6 months - The baseline metrics for predicting the
direction of the index price (S&P 500) with six months of data showed good results
with both Logistic Regression (71% accuracy) and SVM (99.5% accuracy). In the
case of index prediction, high precision is desirable; about 99.5% of the time, the model predicts the direction of the index price correctly using BERT sentiment results and the stock price (MSFT, AMZN, AAPL) momentum. The SVM model also had a high rate of recall, which means 99.5% of the time the model was accurately able to predict the direction of the index price.
Index price prediction 1 year - The baseline metrics remained the same for the Logistic Regression model at 71%, suggesting that adding data did not improve the Logistic Regression model. The SVM model performed better with 95% accuracy and a high level of precision and recall, both at 95%. The SVM model performed well for both six months of data and one year of data when predicting the direction of the index price.
The performance metrics show that the SVM model significantly outperformed
the traditional logistic regression model. The baseline methods performed better
for index price prediction than stock price prediction. DistilBERT classified the sentiments well, given the 95% accuracy of the SVM model for six months of stock data.
4.3. End State Models with FinBERT Sentiment Analysis
The results of the experiments for the end state metrics are explained below.
The end state included the LSTM models for time series classification. The two use
cases evaluated were predicting the stock (MSFT) price and the index (S&P 500)
price direction.
Stock price prediction - Since in stock price prediction classifying the upward direction is equally as crucial as the downward direction, precision and recall must be high. For stock price prediction using FinBERT sentiment analysis, the
LSTM resulted in a 63% accuracy with precision at 65% and recall at 63%. The
model did perform better with more data points, indicating that adding data improved the accuracy (at 72%), precision (at 72%), and recall (at 72%).
Index price prediction - Index price prediction showed weak results for six
months of data with accuracy at 55%; however, with a high rate of precision of 85%,
even with a low accuracy rate, the model could give correct predictions 85% of the
time. However, the recall was low at 54%. When the data set was increased to a year, accuracy improved to 71%, a 16% improvement, with both precision and recall increasing to 70% and 71%, respectively.
LSTM showed similar results for both stock price prediction and index price prediction: lower with smaller data sets and improved with larger data sets. However, the accuracy was still significantly below that of SVM. Hyperparameter tuning
needed careful thought and consideration and was more time-consuming than the
SVM model.
4.4. Stock Price Prediction vs Index Price Prediction
All the experiments had interesting outcomes for market predictions. The
results provide a fascinating insight into the algorithms that work well for index and
stock price predictions. Microsoft price data was used for stock price prediction,
and for index price prediction, the S&P price data was used. Three companies’
fundamental and technical data were input parameters for index price prediction.
These companies included Microsoft, Amazon, and Apple, based on their weights in the index.
The difference in the input parameters was crucial in the different models
used for stock price prediction and index price prediction. The number of stocks used
resulted in the stock price models having fewer input parameters than the index price models. Likewise, the number of articles for a single stock was less than the total number for multiple stocks, resulting in larger data sets for index price prediction.
The stock price prediction models did not perform as well as the index prediction models with most algorithms (except the LSTM model).
Additional data points did result in better model accuracy except in the case
of Logistic Regression, where stock market prediction dropped accuracy with a larger
data set. To summarize, data for one year yielded better accuracy in most experiments.
LSTM models favored stock market prediction when compared to index market prediction. BERT and FinBERT kept the accuracy of the models relatively the same.
Adding data for a whole year brought challenges with performance. The system resource thresholds were reached with the sentiment classification tasks (FinBERT sentiment classification).
As the experiments above showed, not all models performed well with a year’s data. For stock prediction, the SVM model that performed well in most experiments performed poorly with one year’s data, giving a 72% accuracy rate compared to a 95% accuracy rate with six months of data.
For index price predictions, most models performed the same with one year’s
data compared to six months, except for the LSTM model. The Logistic Regression
model had the same accuracy at 71% for six months of data and one year’s worth of
data. The SVM model performed well at 95% with one year of data, just as it did
for six months. The LSTM model, however, improved performance by 20% with one
year of data.
Chapter V.
Conclusion
5.1. Summary
This work compares traditional machine learning methods with more recent ones, including LSTM and FinBERT, to assess the progress made in prediction accuracy.
Our observations show that the FinBERT model had a better distribution of sentiments than the out-of-box BERT or DistilBERT libraries. For both data sets, the distributions from FinBERT were classified more specifically thanks to the additional neutral category; DistilBERT tended to classify articles as positive and negative even when the tone was more neutral, which could impact the outcomes of the predictions.
FinBERT sentiment analysis did not improve the performance of the models as expected. We were able to achieve statistically significant results with DistilBERT; the SVM models with DistilBERT sentiment performed best.
The traditional Logistic Regression model did not perform well, especially
with stock price prediction. Adding features may improve the accuracy of the models. However, this observation supports the existing trend to move towards machine
learning algorithms, given the exponential increase in the volume of data and pro-
cessing power.
The SVM models performed the best with high accuracy ranging from 72% to
99%. These models performed exceptionally well with the S&P index predictions.
The LSTM models required careful tuning of the network composition. With limited literature on optimizing different layers, input values, and error functions, this required many iterations and yielded marginal gains in accuracy.
Improvement was achieved with one year’s data, indicating that additional data points
improved model performance, but it still did not perform as well as the SVM model.
The accuracy of the models remained at 72% at their best. While this accuracy is higher than that of the research listed in Chapter 1, it is lower than the accuracy of the SVM models.
The models built for predicting the direction of the S&P index with SVM led to the best performance statistics (accuracy, precision, and recall). SVM has traditionally done well for classification problems. LSTM and SVM outperformed the traditional Logistic Regression model.
Some lessons learned were that performing sentiment analysis using the FinBERT pre-trained model required heavy system resources and was time-consuming. When data for an entire year was classified, it required over 30 hours for classification. DistilBERT resulted in fairly accurate models and took less time. Given the trade-off between accuracy and processing time, DistilBERT was the more practical choice.
LSTM models performed better with fewer feature variables, which makes us rethink the approach; LSTM is also a better solution for tracking stock price classification.
SVM outperformed all other models, especially for index price movement, demonstrating that it did well in classifying the movement and did better with a larger data set. Future experiments could be run with a single feature set, i.e., only price movement or sentiment score as input parameters. Adding intra-day prices along with the market sentiment might be another avenue to explore.
In conclusion, this thesis has attempted to compare the LSTM algorithm against traditional baselines and provide metrics for future experiments to optimize the model’s structure and parameters.
Source Code
The source code for this project can be found in the github repository.
References
Achyutha, P. N., Chaudhury, S., Bose, S. C., Kler, R., Surve, J., & Kaliyaperumal, K.
2022.
Agusta, I. M. A. I., Barakbah, A., & Fariza, A. (2022). Technical analysis based auto-
matic trading prediction system for stock exchange using support vector machine.
text mining and machine learning: a systematic literature review. Expert Systems
With Applications.
Ballings, M., Van den Poel, D., Hespeels, N., & Gryp, R. (2015). Evaluating multiple
classifiers for stock price direction prediction. Expert Systems with Applications,
42(20), 7046–7056.
Chuang, C. & Yang, Y. (2022). Buy tesla, sell ford: Assessing implicit stock market
ing of the Association for Computational Linguistics (Volume 2: Short Papers) (pp.
100–105).
Dautel, A. J. et al. (2020a). Forex exchange rate forecasting using deep recurrent
Dautel, A. J., Härdle, W. K., Lessmann, S., & Seow, H.-V. (2020b). Forex exchange
rate forecasting using deep recurrent neural networks. Digital Finance, 2(1), 69–96.
DeSola, V., Hanna, K., & Nonis, P. (2019). Finbert: pre-trained model on sec filings
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
Geron, A. (2019). Hands-on machine learning with Scikit-learn, Keras, and Ten-
Gron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow:
Huang, A. H., Wang, H., & Yang, Y. (2020). Finbert: A large language model for
Huang, A. H., Wang, H., & Yang, Y. (2022). Finbert: A large language model for
Huynh, H. D., Dang, L. M., & Duong, D. (2017). A new model for stock price movements prediction using deep neural network. SoICT ’17 (pp. 57–62). New York, NY, USA: Association for Computing Machinery.
Ismail, A., Wood, T., & Bravo, H. (2018). Improving long-horizon forecasts with
jae Kim, K. & Han, I. (2000). Genetic algorithms approach to feature discretization
in artificial neural networks for the prediction of stock price index. Expert Systems
Kang, H., Zong, X., Wang, J., & Chen, H. (2023). Binary gravity search algorithm
and support vector machine for forecasting and trading stock indices. International
Kim, K.-J. (2003). Financial time series forecasting using support vector machines.
Koosha, E., Seighaly, M., & Abbasi, E. (2022). Measuring the accuracy and precision
of random forest, long short-term memory, and recurrent neural network models
in predicting the top and bottom of bitcoin price. Journal of Mathematics and
Modeling in Finance.
Li, C., Shen, L., & Qian, G. (2023). Online hybrid neural network for stock prices
Li, M., Chen, L., Zhao, J., & Li, Q. (2021). Sentiment analysis of chinese stock
Li, M., Li, W., Wang, F., Jia, X., & Rui, G. (2020). Applying bert to analyze investor
Lin, M. & Chen, C. (2018). Short-term prediction of stock market price based on ga
optimization lstm neurons. ICDLT ’18 (pp. 66–70). New York, NY, USA: Association for Computing Machinery.
Liu, Z., Huang, D., Huang, K., Li, Z., & Zhao, J. (2021). Finbert: A pre-trained
Livieris, I. E., Pintelas, E., & Pintelas, P. (2020). A cnn–lstm model for gold price
Maindonald, J. & Braun, W. J. (unknown). Data analysis and graphics using r.
Mao, Y., Wei, W., Wang, B., & Liu, B. (2012). : New York, NY, USA: Association
Pajankar, A. & Joshi, A. (2022). Recurrent Neural Networks, (pp. 285–305). Apress:
Berkeley, CA.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Peng, B., Chersoni, E., Hsu, Y.-Y., & Huang, C.-R. (2021). Is domain adaptation worth your investment? Comparing BERT and FinBERT on financial tasks. In Proceedings of the Third Workshop on Economics and Natural Language Processing. Association for Computational Linguistics.
Protopapas, P., Mark, G., & Chris, T. (2023). Lecture Notes. Harvard University.
Raschka, S., Liu, Y., Mirjalili, V., & Dzhulgakov, D. (2022). Machine Learning with
PyTorch and Scikit-Learn: Develop Machine Learning and Deep Learning Models
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108.
Socher, R., Bengio, Y., & Manning, C. (2012). Deep learning for nlp. Tutorial at
Souma, W., Vodenska, I., & Aoyama, H. (2019). Enhanced news sentiment analysis
using deep learning methods. Journal of Computational Social Science, 2(1), 33–46.
Steven Bird, E. K. & Loper, E. (2023). Natural language processing with python.
Tay, F. E. & Cao, L. (2001). Application of support vector machines in financial time
Vargas, M. R., de Lima, B. S. L. P., & Evsukoff, A. G. (2017). Deep learning for stock
market prediction from financial news articles. In 2017 IEEE International Con-
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems.
Villamil, L., Bausback, R., Salman, S., Liu, T. L., Horn, C., & Liu, X. (2023). Improved stock price movement classification using news articles based on embeddings
Wójcik, P. & Osowska, E. (2023). The impact of federal open market committee
Yang, C., Ou, K., & Hong, S. (2022a). Application of nonstationary time series
prediction to shanghai stock index based on svm. In Proceedings of the 3rd Asia-
Pacific Conference on Image Processing, Electronics and Computers, IPEC ’22 (pp.
Yang, Y., Hu, X., & Jiang, H. (2022b). Group penalized logistic regressions predict up and down trends for stock prices. The North American Journal of Economics and Finance.
Yang, Y., Uy, M. C. S., & Huang, A. (2020). Finbert: A pretrained language model
Yıldırım, D. C., Toroslu, I. H., & Fiore, U. (2021). Forecasting directional movement
of forex data using LSTM with technical and macroeconomic indicators. Financial
Innovation, 7, 1–36.
Zhai, Y. et al. (2007). Combining news and technical indicators in daily stock price
Springer.
Zhang, W., Yan, K., & Shen, D. (2021). Can the baidu index predict realized volatility
Zhang, Y. & Wu, L. (2009). Stock market prediction of sp 500 via combination of
improved bco approach and bp neural network. Expert Systems with Applications,
36, 8849–8854.