

Predicting stock market trends using machine learning and deep learning algorithms via continuous and binary data; a comparative analysis on the Tehran stock exchange
Mojtaba Nabipour 1, Pooyan Nayyeri 2, Hamed Jabani 3, Shahab S. 4, Amir Mosavi 5*

1 Faculty of Mechanical Engineering, Tarbiat Modares University, Tehran, Iran, Mojtaba.nabipour@modares.ac.ir
2 School of Mechanical Engineering, College of Engineering, University of Tehran, Tehran, Iran, pnnayyeri@ut.ac.ir
3 Department of Economics, Payame Noor University, West Tehran Branch, Tehran, Iran, h.jabani@gmail.com
4 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
5 Kalman Kando Faculty of Electrical Engineering, Obuda University, 1034 Budapest, Hungary

Corresponding author: Amir Mosavi (e-mail: amir.mosavi@kvk.uni-obuda.hu), Shahab S. (shamshirbandshahaboddin@duytan.edu.vn)

ABSTRACT The nature of stock market movement has always been ambiguous for investors because of various influential factors. This study aims to significantly reduce the risk of trend prediction with machine learning and deep learning algorithms. Four stock market groups, namely diversified financials, petroleum, non-metallic minerals and basic metals from the Tehran stock exchange, are chosen for experimental evaluations. This study compares nine machine learning models (Decision Tree, Random Forest, Adaptive Boosting (Adaboost), eXtreme Gradient Boosting (XGBoost), Support Vector Classifier (SVC), Naïve Bayes, K-Nearest Neighbors (KNN), Logistic Regression and Artificial Neural Network (ANN)) and two powerful deep learning methods (Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM)). Ten technical indicators computed from ten years of historical data are our input values, and two approaches are considered for employing them: first, calculating the indicators from stock trading values as continuous data, and second, converting the indicators to binary data before use. Each prediction model is evaluated by three metrics under both input approaches. The evaluation results indicate that for continuous data, RNN and LSTM outperform the other prediction models by a considerable margin. The results also show that those deep learning methods remain the best with binary data; however, the difference shrinks because of the noticeable improvement of the other models' performance under the second approach.

KEYWORDS Stock market, Trends prediction, Classification, Machine learning, Deep learning

I. INTRODUCTION
The task of stock prediction has always been a challenging problem for statistics and finance experts. The main motivation behind this prediction is buying stocks that are likely to increase in price and selling stocks that are likely to fall. Generally, there are two approaches to stock market prediction. Fundamental analysis is one of them and relies on a company's fundamentals, such as market position, expenses and annual growth rates. The second is the technical analysis method, which concentrates on previous stock prices and values; this analysis uses historical charts and patterns to predict future prices [1, 2].
Stock markets were traditionally predicted by financial experts. However, data scientists have started solving prediction problems as learning techniques have progressed, and computer scientists have begun applying machine learning methods to improve the performance of prediction models and enhance the accuracy of predictions. Employing deep learning was the next phase in improving prediction models with better performance [3, 4]. Stock market prediction is full of challenges, and data scientists usually confront several problems when they try to develop a predictive model.


Complexity and nonlinearity are two main challenges, caused by the instability of the stock market and the correlation between investment psychology and market behavior [5]. Clearly, there are always unpredictable factors, such as the public image of companies or the political situation of countries, which affect stock market trends. Therefore, if the data gained from stock values are efficiently preprocessed and suitable algorithms are employed, the trend of stock values and indices can be predicted. In stock market prediction systems, machine learning and deep learning approaches can help investors and traders with their decisions. These methods intend to automatically recognize and learn patterns in large amounts of data. The algorithms can be effectively self-learning and can tackle the task of predicting price fluctuations in order to improve trading strategies [6].
In recent years, many methods have been developed to predict stock market trends. The implementation of a model combining Genetic Algorithms (GA), Artificial Neural Networks and a Hidden Markov Model (HMM) was proposed by Hassan et al. [7]; the purpose was transforming the daily stock prices into independent sets of values as input to the HMM. The predictability of financial trends with an SVM model, evaluated on the weekly trend of the NIKKEI 225 index, was investigated by Huang et al. [8]. Their goal was a comparison between SVM, Linear Discriminant Analysis, Elman Backpropagation Neural Networks and Quadratic Discriminant Analysis; the results indicated that SVM was the best classifier. A new financial prediction algorithm based on an SVM ensemble was proposed by Sun et al. [9]; the method chose the SVM ensemble's base classifiers from candidates by considering both diversity analysis and individual performance. Final results showed that the SVM ensemble was significantly better than an individual SVM for classification. Ten data mining methods were employed by Ou et al. [10] to predict value trends of the Hang Seng index of the Hong Kong stock market. The methods involved tree-based classification, K-nearest neighbor, Bayesian classification, SVM and neural networks. Results indicated that SVM outperformed the other predictive models. Price fluctuation was forecasted with a developed Legendre neural network by Liu et al. [11], who modeled investors' positions and decisions by analyzing the prior data on the stock values; they also examined a random (time strength) function in the forecasting model. Araújo et al. [13] proposed the morphological rank linear forecasting approach and compared its results with a time-delay added evolutionary forecasting approach and multilayer perceptron networks.
From the above research background, it is clear that each of the algorithms can effectively solve stock prediction problems. However, it is vital to notice that each of them has specific limitations. The prediction results are not only affected by the representation of the input data but also depend on the prediction method. Moreover, using only prominent features as input data, instead of all features, can noticeably improve the accuracy of the prediction models.
Employing tree-based ensemble methods and deep learning algorithms for predicting stock and stock market trends is a recent research activity. Employing bagging and majority vote methods, Tsai et al. [12] used two different kinds of ensemble classifiers, heterogeneous and homogeneous; they also considered macroeconomic features and financial ratios from the Taiwan stock market to examine the performance of the models. The results demonstrated that, with respect to investment returns and prediction accuracy, ensemble classifiers were superior to single classifiers. Ballings et al. [14] compared the performance of AdaBoost, Random Forest and kernel factory against single models involving SVM, KNN, Logistic Regression and ANN for predicting European companies' prices one year ahead. The final results showed that Random Forest outperformed all other models. Basak et al. [15] employed XGBoost and Random Forest methods for the classification problem of forecasting stock increases or decreases based on previous values; results showed that the prediction performance improved for several companies in comparison with existing approaches. To examine whether macroeconomic indicators can accurately predict the stock market one month ahead, Weng et al. [16] developed four ensemble models: a boosting regressor, a bagging regressor, a neural network ensemble regressor and a random forest regressor. Indeed, another aim was employing a hybrid LSTM approach to prove that macroeconomic features are the most successful predictors for the stock market.
Moving on to deep learning algorithms, Long et al. [17] examined a deep neural network model with public market data and transaction records to evaluate stock price movement. The experimental results showed that bidirectional LSTM could predict the stock price for financial decisions, and the method acquired the best performance compared with other prediction models. Rekha et al. [18] employed CNN and RNN to compare the two algorithms' results with actual results on stock market data. Pang et al. [19] tried to develop an advanced neural network method to obtain better stock market predictions. They proposed an LSTM with an embedded layer and an LSTM with an automatic encoder to evaluate stock market movement. The results showed that the LSTM with an embedded layer performed better, and the two models' accuracies on the Shanghai composite index were 57.2% and 56.9%, respectively. Kelotra and Pandey [20] used a deep convolutional LSTM model as a predictor to effectively examine stock market movements. The model was trained with a Rider-based monarch butterfly optimization algorithm, and they achieved a minimal MSE and RMSE of 7.2487 and 2.6923. Baek and Kim [21] proposed an approach for stock market index forecasting, which included a prediction LSTM module and an overfitting prevention LSTM module. The results confirmed that the proposed model had excellent forecasting accuracy


compared to the model without an overfitting prevention LSTM module. Chung and Shin [22] employed a hybrid approach of LSTM and GA to develop a novel stock market prediction model. The final results showed that the hybrid model of the LSTM network and GA was superior to the benchmark model.
Overall, regarding the above literature, prior studies often concentrated on macroeconomic or technical features with recent machine learning methods to detect stock index or value movements, without considering appropriate preprocessing methods.
Iran's stock market has been highly popular recently because of the rising growth of the Tehran Price Index in the last decades. One of the reasons is that most state-owned firms are being privatized under the general policies of Article 44 of the Iranian constitution, and people are allowed to buy the shares of newly privatized firms under specific circumstances. This market has some specific attributes in comparison with other countries' stock markets; one of them is a price limit of ±5% of the opening price of the day for every index. This limitation hinders abnormal market fluctuations and spreads market shocks, political issues, etc. over a specific time, and it could make the market smoother; however, the effect of fundamental parameters on this market is relatively high, and the task of predicting future movements is not simple.
This study concentrates on the process of predicting future trends for stock market groups, which are crucial for investors. Despite the significant development of the Iran stock market in recent years, there has not been enough research on stock price predictions and movements using novel machine learning methods.
In this paper, we concentrate on comparing the prediction performance of nine machine learning models (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic Regression and ANN) and two deep learning methods (RNN and LSTM) for predicting stock market movement. Ten technical indicators are employed as input values to our models. Our study includes two different approaches for the inputs, continuous data and binary data, to investigate the effect of preprocessing: the former uses stock trading data (open, close, high and low values), while the latter employs a preprocessing step to convert the continuous data to binary data. Each technical indicator has its specific possibility of an up or down movement based on inherent market properties. The performance of the mentioned models is compared for both approaches with three classification metrics, and the best tuning parameter for each model (except Naïve Bayes and Logistic Regression) is reported. All experimental tests are done with ten years of historical data of four stock market groups (diversified financials, petroleum, non-metallic minerals and basic metals) from the Tehran stock exchange, groups which are completely crucial for investors. We believe that this study is a new research effort that incorporates multiple machine learning and deep learning methods to improve the task of predicting the trend and movement of stock groups.
The rest of this paper is organized as follows. Section 2 describes our research data, with some summary statistics, and the two approaches considered for the input values. The eleven prediction models, including nine machine learning and two deep learning algorithms, are introduced and discussed in Section 3. The final prediction results are presented and analyzed in Section 4, and Section 5 concludes the paper.

II. Research data
In this study, ten years of historical data of four stock market groups (diversified financials, petroleum, non-metallic minerals and basic metals) from November 2009 to November 2019 are employed, and all data are obtained from the www.tsetmc.com website. Figures 1-4 show the number of increase and decrease cases for each group during these ten years.

FIGURE 1. The number of increasing and decreasing cases (trading days) in each year for the diversified financials group.

FIGURE 2. The number of increasing and decreasing cases (trading days) in each year for the petroleum group.


FIGURE 3. The number of increasing and decreasing cases (trading days) in each year for the non-metallic minerals group.

FIGURE 4. The number of increasing and decreasing cases (trading days) in each year for the basic metals group.

In the case of predicting stock market movement, there are several technical indicators, and each of them has a specific ability to predict future trends of the market; however, we choose ten technical indicators in this paper based on previous studies [23-25]. Table 1 (in the Appendix) shows the technical indicators and their formulas, and Table 2 (in the Appendix) reports summary statistics of the indicators for the four stock groups. The inputs for calculating the indicators are the open, close, high and low values of each trading day.
This paper involves two approaches for the input information. Continuous data is based on the actual time series, while binary data is produced with a preprocessing step that converts the continuous data to binary values with respect to each indicator's nature.

A. Continuous data
In this method, the input values to the prediction models are computed from the formulas in Table 1 for each technical indicator. The indicators are normalized into the range (0, +1) before use, to prevent smaller values from being overwhelmed by larger ones. Figure 5 shows the process of stock trend prediction with continuous data.

FIGURE 5. Predicting stock movement with continuous data.

B. Binary data
In this approach, a further step is added to convert the continuous values of the indicators to binary data based on each indicator's nature and properties. Figure 6 shows the process of stock trend prediction with binary data. Here, binary data is encoded as +1 for the sign of an upward trend and -1 for the sign of a downward trend.

FIGURE 6. Predicting stock movement with binary data.

Details of how the indicators are interpreted are presented here [25-27]:
SMA is calculated as the average of prices in a selected range, and this indicator can help to determine whether a price will continue its trend. WMA gives a weighted average of the last n values, where the weighting falls with each prior price.
• SMA and WMA: if the current value is below the moving average then the trend is -1, and if the current value is above the moving average then the trend is +1.
MOM calculates the speed of the rise or fall in stock prices, and it is a very useful indicator of weakness or strength in evaluating prices.


• MOM: if the value of MOM is positive then the trend is +1, otherwise it is -1.
STCK is a momentum indicator over a particular period of time that compares a certain closing price of a stock to its price range. The oscillator's sensitivity to market trends can be reduced by modifying that time period or by taking a moving average of the results. STCD measures the relative position of the closing price in comparison with the amplitude of price oscillations in a certain period. This indicator is based on the assumption that as prices increase, the closing price tends towards the values belonging to the upper part of the area of price movements in the preceding period, and when prices decrease, the opposite holds. LWR is a type of momentum indicator that evaluates oversold and overbought levels; sometimes LWR is used to find exit and entry times in the stock market. MACD is another type of momentum indicator that shows the relationship between two moving averages of a share's price. Traders usually use it to buy the stock when the MACD crosses above its signal line and to sell the shares when the MACD crosses below the signal line. ADO is usually used to find the flow of money into or out of a stock. The ADO line is normally employed by traders seeking to determine buying or selling times of a stock or to verify the strength of a trend.
• STCK, STCD, LWR, MACD and ADO: if the current value (time t) is more than the previous value (time t-1) then the trend is +1, otherwise it is -1.
RSI is a momentum indicator that evaluates the magnitude of recent value changes to assess oversold or overbought conditions for stock prices. RSI is shown as an oscillator (a line graph that moves between two extremes) and ranges from 0 to 100.
• RSI: its value is between 0 and 100. If the RSI value surpasses 70 then the trend is -1, and if the value goes below 30 then the trend is +1. For values between 30 and 70, if the current value (time t) is larger than the prior value (time t-1) then the trend is +1, otherwise it is -1.
CCI is employed as a momentum-based oscillator to determine when a stock price is reaching an oversold or overbought condition. CCI also measures the difference between the current price and the historical average price. The indicator determines the time of entry or exit for traders by providing trade signals.
• CCI: if values surpass 200 then the trend is -1, and if values go below -200 then the trend is +1. For values between -200 and 200, if the current value (time t) is larger than the prior value (time t-1) then the trend is +1, otherwise it is -1.
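To make the rules above concrete, the sketch below shows one way to implement the conversion with pandas. This is an illustrative reconstruction rather than the authors' published code; it assumes a DataFrame `df` with one row per trading day, a `close` column and the continuous indicator columns named as in Table 1.

```python
import pandas as pd

def to_trend_signals(df: pd.DataFrame) -> pd.DataFrame:
    """Convert continuous indicator values to +1/-1 trend signals."""
    out = pd.DataFrame(index=df.index)

    # SMA and WMA: +1 when the close is above the moving average, else -1.
    for col in ("SMA", "WMA"):
        out[col] = (df["close"] > df[col]).map({True: 1, False: -1})

    # MOM: +1 when momentum is positive, else -1.
    out["MOM"] = (df["MOM"] > 0).map({True: 1, False: -1})

    # STCK, STCD, LWR, MACD, ADO: +1 when the value rose since t-1, else -1.
    for col in ("STCK", "STCD", "LWR", "MACD", "ADO"):
        out[col] = (df[col].diff() > 0).map({True: 1, False: -1})

    # RSI: -1 above 70 (overbought), +1 below 30 (oversold); in between,
    # follow the day-over-day change.
    rsi = (df["RSI"].diff() > 0).map({True: 1, False: -1})
    rsi[df["RSI"] > 70] = -1
    rsi[df["RSI"] < 30] = 1
    out["RSI"] = rsi

    # CCI: -1 above 200, +1 below -200; otherwise the day-over-day change.
    cci = (df["CCI"].diff() > 0).map({True: 1, False: -1})
    cci[df["CCI"] > 200] = -1
    cci[df["CCI"] < -200] = 1
    out["CCI"] = cci

    return out
```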

III. Prediction models
In this study, we use nine machine learning methods (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic Regression and ANN) and two deep learning algorithms (RNN and LSTM).

A. Decision Tree
Decision Tree is a popular supervised learning approach employed for both regression and classification problems. The purpose is to build a model that is able to predict a target value by learning simple decision rules formed from the data features. There are some advantages to using this method, such as being easy to interpret and understand and being able to handle multi-output problems; in contrast, creating over-complex trees that result in overfitting is a common disadvantage. A schematic illustration of a Decision Tree is shown in Figure 7.

FIGURE 7. Schematic illustration of Decision Tree

B. Random Forest
A large number of decision trees make up a random forest model. The method simply averages the prediction results of the trees, which collectively are called a forest. Also, this model relies on three random concepts: randomly choosing training data when making the trees, selecting random subsets of features when splitting nodes, and considering only a subset of all features for splitting each node in each simple decision tree. During training in a random forest, each tree learns from a random sample of the data points. A schematic illustration of Random Forest is shown in Figure 8.

FIGURE 8. Schematic illustration of Random Forest

C. Adaboost
Boosting methods are a group of algorithms that convert weak learners into a powerful learner. The approach is an ensemble method for improving the predictions of any learning algorithm. The concept of boosting is to train weak learners sequentially, each trying to correct its predecessor's predictions. AdaBoost is a meta-estimator that starts by fitting a model on the main dataset before fitting additional copies of the model on the same dataset. During the process, the samples' weights are adapted based on the current prediction error, so each subsequent model concentrates more on the difficult items.

D. XGBoost
XGBoost is an ensemble tree-based method, and the model applies the principle of boosting to weak learners. XGBoost was introduced for better speed and performance in comparison with other tree-based models. Built-in cross-validation ability, regularization for avoiding overfitting, efficient handling of missing data, cache awareness, tree pruning and parallelized tree building are common advantages of the XGBoost method.

E. SVC
Support Vector Machines (SVMs) are a set of supervised learning approaches that can be employed for classification and regression problems. The classifier version is named SVC. The method's purpose is to find a decision boundary between two classes with vectors. The boundary must be as far as possible from every point in the dataset, and the support vectors are the observation coordinates delimiting the gap, named the margin. An SVM is a boundary that best separates two classes by employing a line or hyperplane. The decision boundary is defined in Equation 1, where SVMs map the input vectors $x_i \in R^d$ into a high-dimensional feature space $\Phi(x_i) \in H$, and $\Phi(\cdot)$ is induced by a kernel function $K(x_i, x_j)$. Figure 9 shows a schematic illustration of the SVM method.

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x, x_i) + b\right) \qquad (1)$$

FIGURE 9. Schematic illustration of SVM

SVMs can perform linear or non-linear classification efficiently, but in the non-linear case they must use a kernel trick, which maps the inputs into high-dimensional feature spaces. SVMs convert non-separable classes into separable ones through kernel functions such as the linear, sigmoid, radial basis function (RBF) and polynomial kernels. The kernel functions are shown in Equations 2-4, where $\gamma$ is the constant of the radial basis function and $d$ is the degree of the polynomial function. Indeed, there are two adjustable parameters in the sigmoid function, the slope $\alpha$ and the intercept constant $c$.

$$\text{RBF:}\quad K(x_i, x_j) = \exp\left(-\gamma \left\| x_i - x_j \right\|^2\right) \qquad (2)$$

$$\text{Polynomial:}\quad K(x_i, x_j) = \left(x_i \cdot x_j + 1\right)^d \qquad (3)$$

$$\text{Sigmoid:}\quad K(x_i, x_j) = \tanh\left(\alpha x_i^{T} x_j + c\right) \qquad (4)$$

SVMs are often effective in high-dimensional spaces and in cases where the number of dimensions is greater than the number of samples; however, when the number of features is much greater than the number of samples, the regularization term and the kernel function must be chosen carefully to avoid over-fitting.
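As an illustration of Equations 2-4, the kernels map directly onto scikit-learn's SVC options; the sketch below sweeps the four kernels later listed in Table 4 and is an assumed reconstruction, not code from the paper. Note that gamma="scale" corresponds to Table 4's 1/(num_f × variance_f).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))       # placeholder indicator features
y = rng.choice([-1, 1], size=300)    # placeholder trend labels

# Table 4: linear, degree-3 polynomial, RBF and sigmoid kernels with C = 1.0;
# gamma="scale" is scikit-learn's 1 / (n_features * X.var()).
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, degree=3, C=1.0, gamma="scale").fit(X, y)
    print(kernel, clf.score(X, y))
```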


F. Naïve Bayes
The Naïve Bayes classifier is a member of the family of probabilistic classifiers based on Bayes' theorem, with strong independence assumptions between the features given the value of the class variable. This method is a set of supervised learning algorithms. Bayes' theorem states the following relationship in Equation 5, where $y$ is the class variable and $x_1$ through $x_n$ are the dependent feature vectors.

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)} \qquad (5)$$

The Naïve Bayes classifier can be highly fast in comparison with more sophisticated algorithms. The separation of the class distributions means that each one can be independently estimated as a one-dimensional distribution, which in turn helps to alleviate problems stemming from the curse of dimensionality.

G. KNN
Two properties are usually suggested for KNN, lazy learning and being non-parametric, because KNN makes no assumption about the underlying data distribution. The method follows several steps to find targets: dividing the dataset into training and test data, selecting the value of K, determining which distance function should be used, choosing a sample from the test data (as a new sample) and computing its distance to the training samples, sorting the distances obtained and taking the k nearest data samples, and finally assigning the class to the test sample by the majority vote of its k neighbors. Figure 10 shows a schematic illustration of the KNN method.

FIGURE 10. Schematic illustration of KNN

H. Logistic Regression
Logistic regression is used as a classifier to assign observations to a discrete set of classes. The algorithm transforms its output with the logistic sigmoid function to return a probability value and predicts the target through the concept of probability. Logistic Regression is similar to the Linear Regression model, but Logistic Regression employs a more complex cost function based on the sigmoid (logistic) function instead of a linear one. The hypothesis behind logistic regression limits the cost function to between 0 and 1.
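The three classifiers described in subsections F-H are available off the shelf in scikit-learn; the following sketch wires them up with the settings reported later in Table 4 (the neighbor count of 47 is one of the tuned values from Table 7) and is illustrative rather than the authors' code.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))      # placeholder indicator features
y = rng.choice([0, 1], size=300)    # placeholder up/down labels

models = {
    "Naive Bayes": GaussianNB(),                       # Gaussian variant
    # Table 4: k-d tree search, uniform weights, Euclidean (L2) metric.
    "KNN": KNeighborsClassifier(n_neighbors=47, algorithm="kd_tree",
                                weights="uniform", leaf_size=30, p=2),
    # Table 4: C = 1.0, L2 penalty, tolerance 1e-4.
    "Logistic Regression": LogisticRegression(C=1.0, penalty="l2", tol=1e-4),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))
```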
I. ANN
ANNs are single- or multi-layer neural nets that are fully connected. Figure 11 shows a sample ANN with an input layer, an output layer and two hidden layers. In a layer, each node is connected to every node in the next layer; by increasing the number of hidden layers, it is possible to make the network deeper.

FIGURE 11. Schematic illustration of ANN

Figure 12 illustrates any of the hidden or output nodes: the node takes the weighted sum of its inputs, adds a bias value, and passes the result through an activation function (usually a non-linear one). The result is the output of the node, which becomes an input to a node in the next layer. The procedure moves from the input to the output, and the final output is determined by carrying out this process for all nodes. Training the neural network is the process of learning the weights and biases associated with all the nodes.

FIGURE 12. An illustration of the relationship between inputs and output for ANN.


Equation 6 shows the relationship between nodes, weights and biases: the weighted sum of the inputs to a node is passed through a non-linear activation function on its way to a node in the next layer. It can be interpreted in vector form, where $x_1, x_2, \ldots, x_n$ are the inputs, $w_1, w_2, \ldots, w_n$ are the respective weights, $n$ is the number of inputs to the final node, $f$ is the activation function and $z$ is the output.

$$z = f(x \cdot w + b) = f\left(\sum_{i=1}^{n} x_i w_i + b\right) \qquad (6)$$

Given this computation, the training process proceeds by some rules: initialize the weights and biases for all nodes randomly; perform a forward pass with the current weights and biases, calculating each node's output; compare the final output with the actual target; and modify the weights and biases accordingly via gradient descent with the backward pass, generally known as the backpropagation algorithm.
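Read as code, Equation 6 and the forward pass amount to a couple of lines of NumPy; the toy sketch below is illustrative only, with randomly initialized weights standing in for trained parameters.

```python
import numpy as np

def node_output(x, w, b, f=np.tanh):
    """Equation 6: z = f(x . w + b) for a single node."""
    return f(np.dot(x, w) + b)

def forward(x, layers, f=np.tanh):
    """Forward pass: each layer is a (W, b) pair with W of shape (n_in, n_out)."""
    for W, b in layers:
        x = f(x @ W + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(10, 20)), np.zeros(20)),   # hidden layer 1
          (rng.normal(size=(20, 20)), np.zeros(20)),   # hidden layer 2
          (rng.normal(size=(20, 1)), np.zeros(1))]     # output node
x = rng.normal(size=10)                                # ten indicator inputs
print(forward(x, layers))
```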

J. RNN
A very prominent version of neural networks is recognized as the RNN, which is extensively used in various processes. In a normal neural network, the input is processed through a number of layers and an output is produced, under the assumption that two consecutive inputs are independent of each other. However, this assumption does not hold in all processes. For example, for the prediction of the stock market at a certain time, it is crucial to consider the previous observations.
An RNN is named recurrent because it performs the same task for every item of a sequence, while the output depends on the previously computed values. As another important point, an RNN has a specific memory that stores previously computed information for a long time. In theory, an RNN can use information from arbitrarily long sequences, but in practice there is a limitation of looking back just a few steps. Figure 13 shows the architecture of an RNN.

FIGURE 13. An illustration of a recurrent network

K. LSTM
LSTM is a specific kind of RNN with a wide range of applications, such as time series analysis, document classification, and voice and speech recognition. In contrast with feedforward ANNs, the predictions made by RNNs are dependent on previous estimations. In practice, plain RNNs are not employed extensively because they have a few deficiencies that cause impractical evaluations.
Without going into too much detail, LSTM solves these problems by employing dedicated gates for forgetting old information and learning new information. An LSTM layer is made of four neural network layers that interact in a specific way. A usual LSTM unit involves three different parts: a cell, an output gate and a forget gate. The main task of the cell is tracking values over arbitrary time intervals, while the task of controlling the information flow into and out of the cell belongs to the gates.
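As a minimal Keras sketch of such a recurrent classifier (an assumption about how the models could be built, not the authors' released code), the layer size, optimizer and early-stopping settings below follow Table 5, and the input is shaped (samples, time_steps, features) as described in the next subsection.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_days, n_features = 20, 10                  # e.g. 20 trading days x 10 indicators
X = np.random.rand(200, n_days, n_features).astype("float32")
y = np.random.randint(0, 2, size=(200, 1))   # 1 = up, 0 = down (placeholder)

model = keras.Sequential([
    layers.Input(shape=(n_days, n_features)),
    layers.LSTM(500, activation="tanh"),     # swap in layers.SimpleRNN(500) for RNN
    layers.Dropout(0.2),                     # dropout against overfitting
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=5e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
stop = keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=100,
                                     restore_best_weights=True)
model.fit(X, y, epochs=10000, validation_split=0.2,
          callbacks=[stop], verbose=0)       # Table 5: max 10000 epochs
```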
L. Models' parameters
Since stock market data are time-series information, there are two approaches to building the training dataset for the prediction models. Because of the recurrent nature of the RNN and LSTM models, the technical indicators of one or more days (up to 30 days) are gathered and rearranged as input data to be fed into those models. For the other models, the ten technical indicators of a single day are fed to the model. The output of all models is the stock trend value corresponding to the input data; for the recurrent models, the output is the stock trend value of the last day of the training sample.
All models (except Naïve Bayes) have one or several parameters, known as hyper-parameters, which should be adjusted to obtain optimal results. In this paper, one or two parameters of every model (except Decision Tree and Logistic Regression, for which fixed parameters are used) are selected to be adjusted for an optimal result based on numerous experimental works.

In Tables 3-5, all fixed and variable parameters of the tree-based models, the traditional supervised models, and the neural-network-based models are presented, respectively.

TABLE 3
TREE-BASED MODELS PARAMETERS
Model               Parameters        Value(s)
Decision Tree       Max Depth         10
Bagging Classifier  Max Depth         10
                    Estimator         Decision Tree
                    Number of Trees   50, 100, 150, ..., 500
Random Forest       Max Depth         10
                    Number of Trees   50, 100, 150, ..., 500
Adaboost            Max Depth         10
                    Estimator         Decision Tree
                    Number of Trees   50, 100, 150, ..., 500
                    Learning Rate     0.1
Gradient Boosting   Max Depth         10
                    Number of Trees   50, 100, 150, ..., 500
                    Learning Rate     0.1
XGBoost             Max Depth         10
                    Number of Trees   50, 100, 150, ..., 500
                    Objective         Logistic Regression for Binary Classification

TABLE 4
TRADITIONAL SUPERVISED MODELS PARAMETERS
Model                Parameters            Value(s)
SVC                  Kernels               Linear, Poly (degree = 3), RBF, Sigmoid
                     C                     1.0
                     Gamma                 1/(num_f × variance_f), f: features
Naïve Bayes          Algorithm             Gaussian
KNN                  Number of Neighbors   1, 2, 3, ..., 100
                     Algorithm             K-dimensional Tree
                     Weights               Uniform
                     Leaf Size             30
                     Metric                Euclidean Distance (L2)
Logistic Regression  C                     1.0
                     Penalty               L2
                     Tolerance             10^-4

TABLE 5
ANN, RNN AND LSTM PARAMETERS
ANN Parameters
Parameters                 Value(s)
Hidden Layer Neuron Count  20, 50, 100, 200, 500
Activation Function        ReLU, Sigmoid, Tanh
Optimizer                  Adam: learning rate = 0.001, β1 = 0.9, β2 = 0.999
Training Stop Condition    Early stopping: monitoring parameter = validation data accuracy, patience = 100 epochs
Max Epochs                 10000

RNN and LSTM Parameters
Parameters                 Value(s)
Hidden Layer Neuron Count  500
Number of Training Days    1, 2, 5, 10, 20, 30
Neuron Type                RNN/LSTM
Activation Function        Tanh, Softmax
Optimizer                  Adam: learning rate = 0.00005, β1 = 0.9, β2 = 0.999
Training Stop Condition    Early stopping: monitoring parameter = validation data accuracy, patience = 100 epochs
Max Epochs                 10000

IV. Experimental results
A. Classification metrics
F1-Score, Accuracy and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) metrics are employed to evaluate the performance of our models. For computing F1-Score and Accuracy, Precision and Recall must first be evaluated from the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) counts. These values are indicated in Equations 7 and 8.

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (7)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (8)$$

From the above quantities, Accuracy and F1-Score are defined in Equations 9 and 10.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (9)$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10)$$

Among classification metrics, Accuracy is a good metric, but it is not sufficient for all classification problems; it is often necessary to look at other metrics to make sure a model is reliable. F1-Score may be a better metric to employ if the results need to achieve a balance between Recall and Precision, especially when there is an uneven class distribution. ROC-AUC is another powerful metric for classification problems, calculated as the area under the ROC curve from prediction scores.
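Equations 7-10 and ROC-AUC are available directly in scikit-learn, so each model's three scores can be computed as in the sketch below; the helper name and the fitted LogisticRegression are illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(model, X_test, y_test):
    """Return (F1-Score, Accuracy, ROC-AUC) as used in this paper."""
    y_pred = model.predict(X_test)
    # ROC-AUC needs scores: use class probabilities when available.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    return (f1_score(y_test, y_pred),
            accuracy_score(y_test, y_pred),
            roc_auc_score(y_test, scores))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.choice([0, 1], size=200)
print(evaluate(LogisticRegression().fit(X, y), X, y))
```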


B. Results
For training the machine learning models, we implement the following steps: normalizing the features (only for the continuous data), randomly splitting the main dataset into train and test data (30% of the dataset was assigned to the test part), fitting the models and evaluating them on validation data (with early stopping) to prevent overfitting, and using the metrics for the final evaluation on test data. Creating the deep models differs from the machine learning ones in that the input values must be three-dimensional (samples, time_steps, features); so, we use a function to reshape the input values. Also, weight regularization and dropout layers are employed here to prevent overfitting. All the coding in this study is implemented in Python 3 with the Scikit-learn and Keras libraries.
Based on extensive experimental works covering both approaches, the following outcomes are obtained.
In the first approach, continuous data for the features is used, and Tables 6-8 show the results of this method. For each model, the prediction performance is evaluated with the three metrics. Also, the best tuning parameter for all models (except Naïve Bayes and Logistic Regression) is reported. To provide a better picture of the experimental works, Figure 14 indicates the average F1-Score against the average running time across the stock market groups. It can be seen that Naïve Bayes and Decision Tree are the least accurate (approximately 68%), while RNN and LSTM are the top predictors (roughly 86%) with a considerable difference compared to the other models. Indeed, the running time of those superior models is greater than that of the other algorithms.
In the second approach, binary data for the features is employed, and Tables 9-11 demonstrate the results of this method. The structure and experimental works here are similar to the first approach, except for the inputs, where we use an extra layer to convert the continuous data to binary data based on the nature and properties of the features. Similarly, for better understanding, Figure 15 shows the average F1-Score against the average running time across the stock market groups. It is clear that there is a significant improvement in the prediction performance of all models in comparison with the first approach, and this achievement is clearly shown in Figure 16. There is no change in the inferior methods (Naïve Bayes and Decision Tree, with roughly 85% F1-Score) and the superior predictors (RNN and LSTM, with approximately 90% F1-Score), but the difference between them becomes smaller with binary data. Also, the prediction process for all models is faster in the second approach.
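Put together, the training procedure described at the start of this subsection can be sketched as follows; this is an assumed reconstruction of the workflow (scaling, a random 70/30 split, fitting, test-set scoring), not the authors' exact script.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2400, 10))        # roughly ten years of daily indicators
y = rng.choice([0, 1], size=2400)      # up/down trend labels (placeholder)

# Continuous approach: normalize the features into (0, 1).
X = MinMaxScaler().fit_transform(X)

# 30% of the dataset is held out for the final test evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
model.fit(X_tr, y_tr)
print("test F1:", f1_score(y_te, model.predict(X_te)))
```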
TABLE 6
TREE-BASED MODELS WITH BEST PARAMETERS FOR CONTINUOUS DATA
Stock Group   Decision Tree                            Random Forest
              F1-score  Accuracy  ROC-AUC  n trees     F1-score  Accuracy  ROC-AUC  n trees
Div. Fin.     0.6993    0.6846    0.6838   1           0.7200    0.7218    0.7224   50
Metals        0.7164    0.6538    0.6347   1           0.7553    0.7077    0.6898   100
Minerals      0.6658    0.6513    0.6519   1           0.7464    0.7282    0.7271   100
Petroleum     0.6459    0.6641    0.6632   1           0.7042    0.7308    0.7288   250

Stock Group   Adaboost                                 XGBoost
              F1-score  Accuracy  ROC-AUC  n trees     F1-score  Accuracy  ROC-AUC  n trees
Div. Fin.     0.7266    0.7231    0.7205   250         0.7213    0.7167    0.7167   100
Metals        0.7553    0.7051    0.6904   250         0.7577    0.7064    0.6906   150
Minerals      0.7277    0.7064    0.7046   100         0.7196    0.7013    0.7005   50
Petroleum     0.7148    0.7218    0.7217   50          0.6964    0.7115    0.7107   250

TABLE 7
TRADITIONAL SUPERVISED MODELS WITH BEST PARAMETERS FOR CONTINUOUS DATA
Stock Group   SVC                                      Naïve Bayes
              F1-score  Accuracy  ROC-AUC  Kernel      F1-score  Accuracy  ROC-AUC
Div. Fin.     0.7312    0.7154    0.7143   Poly        0.6866    0.6782    0.6780
Metals        0.7833    0.7269    0.7029   RBF         0.7223    0.6846    0.6819
Minerals      0.7529    0.7282    0.7248   Linear      0.6658    0.6692    0.6743
Petroleum     0.6917    0.7051    0.7045   RBF         0.6429    0.6795    0.6771

Stock Group   KNN                                      Logistic Regression
              F1-score  Accuracy  ROC-AUC  Neighbors   F1-score  Accuracy  ROC-AUC
Div. Fin.     0.7244    0.7141    0.7136   47          0.7321    0.7167    0.7136
Metals        0.7859    0.7359    0.7171   21          0.7710    0.7167    0.6965
Minerals      0.7353    0.7167    0.7415   41          0.7529    0.7282    0.7248
Petroleum     0.6929    0.7103    0.7092   17          0.6911    0.7077    0.7067


TABLE 8
NEURAL-NETWORK-BASED MODELS WITH BEST PARAMETERS FOR CONTINUOUS DATA
Stock Group   ANN
              F1-score  Accuracy  ROC-AUC  Activation Func./epochs
Div. Fin.     0.7590    0.7500    0.7495   ReLU/245
Metals        0.7932    0.7359    0.7091   ReLU/90
Minerals      0.7671    0.7462    0.7437   ReLU/233
Petroleum     0.6932    0.7128    0.7116   Tanh/148

Stock Group   RNN                                        LSTM
              F1-score  Accuracy  ROC-AUC  ndays/epochs  F1-score  Accuracy  ROC-AUC  ndays/epochs
Div. Fin.     0.8620    0.8643    0.8643   20/842        0.8638    0.8643    0.8643   20/773
Metals        0.8571    0.8282    0.8238   20/772        0.8581    0.8295    0.8254   20/525
Minerals      0.8810    0.8716    0.8702   5/398         0.8798    0.8716    0.8709   5/402
Petroleum     0.8279    0.8224    0.8221   10/373        0.8356    0.8314    0.8312   10/358

FIGURE 14. Average F1-Score against average logarithmic running time per sample for continuous data.

TABLE 9
TREE-BASED MODELS WITH BEST PARAMETERS FOR BINARY DATA
Stock Group   Decision Tree                            Random Forest
              F1-score  Accuracy  ROC-AUC  n trees     F1-score  Accuracy  ROC-AUC  n trees
Div. Fin.     0.8421    0.8462    0.8460   1           0.8508    0.8538    0.8538   450
Metals        0.8738    0.8474    0.8364   1           0.8794    0.8513    0.8360   400
Minerals      0.8660    0.8667    0.8668   1           0.8671    0.8679    0.8680   100
Petroleum     0.8278    0.8346    0.8349   1           0.8402    0.8449    0.8457   150

Stock Group   Adaboost                                 XGBoost
              F1-score  Accuracy  ROC-AUC  n trees     F1-score  Accuracy  ROC-AUC  n trees
Div. Fin.     0.8538    0.8564    0.8564   400         0.8523    0.8551    0.8551   50
Metals        0.8792    0.8513    0.8365   450         0.8788    0.8526    0.8403   50
Minerals      0.8674    0.8679    0.8680   300         0.8668    0.8679    0.8681   150
Petroleum     0.8413    0.8462    0.8470   50          0.8407    0.8436    0.8451   100

TABLE 10
TRADITIONAL SUPERVISED MODELS WITH BEST PARAMETERS FOR BINARY DATA
Stock Group   SVC                                      Naïve Bayes
              F1-score  Accuracy  ROC-AUC  Kernel      F1-score  Accuracy  ROC-AUC
Div. Fin.     0.8553    0.8590    0.8588   Linear      0.8351    0.8410    0.8406
Metals        0.8872    0.8679    0.8645   Poly        0.8466    0.8295    0.8354
Minerals      0.8721    0.8718    0.8718   Linear      0.8313    0.8372    0.8375
Petroleum     0.8544    0.8641    0.8630   Poly        0.8327    0.8423    0.8416

Stock Group   KNN                                      Logistic Regression
              F1-score  Accuracy  ROC-AUC  Neighbors   F1-score  Accuracy  ROC-AUC
Div. Fin.     0.8607    0.8551    0.8563   13          0.8526    0.8564    0.8562
Metals        0.8894    0.8641    0.8502   60          0.8837    0.8603    0.8510
Minerals      0.8649    0.8667    0.8668   27          0.8680    0.8667    0.8666
Petroleum     0.8473    0.8526    0.8532   21          0.8532    0.8641    0.8626

TABLE 11
NEURAL-NETWORK-BASED MODELS WITH BEST PARAMETERS FOR BINARY DATA
Stock Group   ANN
              F1-score  Accuracy  ROC-AUC  Activation Func./epochs
Div. Fin.     0.8691    0.8756    0.8750   Sigmoid/111
Metals        0.8925    0.8718    0.8645   Tanh/6
Minerals      0.8733    0.8705    0.8704   Tanh/305
Petroleum     0.8646    0.8731    0.8722   ReLU/19

Stock Group   RNN                                        LSTM
              F1-score  Accuracy  ROC-AUC  ndays/epochs  F1-score  Accuracy  ROC-AUC  ndays/epochs
Div. Fin.     0.9024    0.9012    0.9016   5/68          0.8994    0.8986    0.8991   5/61
Metals        0.9011    0.8819    0.8727   5/233         0.9017    0.8819    0.8714   5/252
Minerals      0.8943    0.8897    0.8895   2/284         0.8900    0.8846    0.8842   2/143
Petroleum     0.8852    0.8936    0.8923   2/115         0.8828    0.8910    0.8899   2/152
FIGURE 15. Average F1-Score against average logarithmic running time per sample for binary data.

FIGURE 16. The average F1-Score with continuous and binary data for all models.

As a prominent result, the deep learning methods (RNN and LSTM) show a powerful ability to predict stock movement in both approaches, especially for continuous data, where the performance of the machine learning models is much weaker than with the binary method. However, their running time is always greater than that of the others, because of the large number of epochs and the use of input values from several preceding days.
Overall, it is obvious that all the prediction models perform well when they are trained with continuous values (up to 67%), but the models' performance is remarkably improved when they are trained with binary data (up to 83%). The reason behind this improvement can be interpreted as follows: an extra layer is employed in the second approach, and the duty of this layer is comparing each current continuous value (at time t) with the previous value (at time t-1). In this way the future up or down trend is identified, and when binary data is given as the input to the predictors, we feed in data with a recognized trend based on each feature's property. This critical layer is able to convert the non-stationary values of the first approach into trend-deterministic values in the second one, and the algorithms must only find the correlation between the input trends and the output movement, which is an easier prediction task.
Despite noticeable efforts to find valuable studies on the same stock market, there is no significant paper to report, and this deficiency is one of the novelties of this research. We believe that this paper can be a baseline for comparison in future studies.

V. Conclusions
The purpose of this study was the prediction of stock market movement with machine learning and deep learning algorithms. Four stock market groups, namely diversified financials, petroleum, non-metallic minerals and basic metals, from the Tehran stock exchange were chosen, and the dataset was based on ten years of historical records with ten technical features. Also, nine machine learning models (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic Regression and ANN) and two deep learning methods (RNN and LSTM) were employed as predictors. We considered two approaches for the input values to the models, continuous data and binary data, and we employed three classification metrics for the evaluations. Our experimental works showed that there was a significant improvement in the performance of the models when they used binary data instead of continuous data. Indeed, the deep learning algorithms (RNN and LSTM) were our superior models in both approaches.
binary method. However, the running time of those is always
REFERENCES
[1] Murphy, John J. Technical analysis of the financial markets: A comprehensive guide to trading methods and applications. Penguin, 1999.
[2] Turner, Toni. A Beginner's Guide To Day Trading Online, 2nd Edition. Simon and Schuster, 2007.
[3] Maqsood, Haider, et al. "A local and global event sentiment based efficient stock exchange forecasting using deep learning." International Journal of Information Management 50 (2020): 432-451.
[4] Long, Wen, Zhichen Lu, and Lingxiao Cui. "Deep learning-based feature engineering for stock price movement prediction." Knowledge-Based Systems 164 (2019): 163-173.
[5] Duarte, Juan Benjamin Duarte, Leonardo Hernán Talero Sarmiento, and Katherine Julieth Sierra Juárez. "Evaluation of the effect of investor psychology on an artificial stock market through its degree of efficiency." Contaduría y Administración 62.4 (2017): 1361-1376.
[6] Lu, Ning. "A machine learning approach to automated trading." Boston, MA: Boston College Computer Science Senior Thesis (2016).
[7] Hassan, Md Rafiul, Baikunth Nath, and Michael Kirley. "A fusion model of HMM, ANN and GA for stock market forecasting." Expert Systems with Applications 33.1 (2007): 171-180.
[8] Huang, Wei, Yoshiteru Nakamori, and Shou-Yang Wang. "Forecasting stock market movement direction with support vector machine." Computers & Operations Research 32.10 (2005): 2513-2522.
[9] Sun, Jie, and Hui Li. "Financial distress prediction using support vector machines: Ensemble vs. individual." Applied Soft Computing 12.8 (2012): 2254-2265.
[10] Ou, Phichhang, and Hengshan Wang. "Prediction of stock market index movement by ten data mining techniques." Modern Applied Science 3.12 (2009): 28-42.
[11] Liu, Fajiang, and Jun Wang. "Fluctuation prediction of stock market index by Legendre neural network with random time strength function." Neurocomputing 83 (2012): 12-21.
[12] Tsai, Chih-Fong, et al. "Predicting stock returns by classifier ensembles." Applied Soft Computing 11.2 (2011): 2452-2459.
[13] Araújo, Ricardo de A., and Tiago A. E. Ferreira. "A morphological-rank-linear evolutionary method for stock market prediction." Information Sciences 237 (2013): 3-17.
[14] Ballings, Michel, et al. "Evaluating multiple classifiers for stock price direction prediction." Expert Systems with Applications 42.20 (2015): 7046-7056.
[15] Basak, Suryoday, et al. "Predicting the direction of stock market prices using tree-based classifiers." The North American Journal of Economics and Finance 47 (2019): 552-567.
[16] Weng, Bin, et al. "Macroeconomic indicators alone can predict the monthly closing price of major US indices: Insights from artificial intelligence, time-series analysis and hybrid models." Applied Soft Computing 71 (2018): 685-697.
[17] Long, Jiawei, et al. "An integrated framework of deep learning and knowledge graph for prediction of stock price trend: An application in Chinese stock exchange market." Applied Soft Computing (2020): 106205.
[18] Rekha, G., et al. "Prediction of Stock Market Using Neural Network Strategies." Journal of Computational and Theoretical Nanoscience 16.5-6 (2019): 2333-2336.
[19] Pang, Xiongwen, et al. "An innovative neural network approach for stock market prediction." The Journal of Supercomputing (2018): 1-21.
[20] Kelotra, A., and P. Pandey. "Stock Market Prediction Using Optimized Deep-ConvLSTM Model." Big Data 8.1 (2020): 5-24.
[21] Baek, Yujin, and Ha Young Kim. "ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module." Expert Systems with Applications 113 (2018): 457-480.
[22] Chung, H., and K.-s. Shin. "Genetic algorithm-optimized long short-term memory network for stock market prediction." Sustainability 10.10 (2018): 3765.
[23] Kara, Yakup, Melek Acar Boyacioglu, and Ömer Kaan Baykan. "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange." Expert Systems with Applications 38.5 (2011): 5311-5319.
[24] Patel, Jigar, et al. "Predicting stock market index using fusion of machine learning techniques." Expert Systems with Applications 42.4 (2015): 2162-2172.
[25] Patel, Jigar, et al. "Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques." Expert Systems with Applications 42.1 (2015): 259-268.
[26] Majhi, Ritanjali, et al. "Efficient prediction of stock market indices using adaptive bacterial foraging optimization (ABFO) and BFO based techniques." Expert Systems with Applications 36.6 (2009): 10097-10104.
[27] Chen, Yingjun, and Yongtao Hao. "A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction." Expert Systems with Applications 80 (2017): 340-355.


Appendix

TABLE 1
SELECTED TECHNICAL INDICATORS (n IS 10 HERE)

Simple n-day moving average (SMA) $= \dfrac{C_t + C_{t-1} + \ldots + C_{t-n+1}}{n}$

Weighted 14-day moving average (WMA) $= \dfrac{n \cdot C_t + (n-1) \cdot C_{t-1} + \ldots + C_{t-n+1}}{n + (n-1) + \ldots + 1}$

Momentum (MOM) $= C_t - C_{t-n+1}$

Stochastic K% (STCK) $= \dfrac{C_t - LL_{t..t-n+1}}{HH_{t..t-n+1} - LL_{t..t-n+1}} \times 100$

Stochastic D% (STCD) $= \dfrac{K_t + K_{t-1} + \ldots + K_{t-n+1}}{n}$

Relative strength index (RSI) $= 100 - \dfrac{100}{1 + \left(\sum_{i=0}^{n-1} UP_{t-i}\right) \big/ \left(\sum_{i=0}^{n-1} DW_{t-i}\right)}$

Signal(n)_t (SIG) $= MACD_t \times \dfrac{2}{n+1} + Signal(n)_{t-1} \times \left(1 - \dfrac{2}{n+1}\right)$

Larry William's R% (LWR) $= \dfrac{HH_{t..t-n+1} - C_t}{HH_{t..t-n+1} - LL_{t..t-n+1}} \times 100$

Accumulation/Distribution oscillator (ADO) $= \dfrac{H_t - C_t}{H_t - L_t}$

Commodity channel index (CCI) $= \dfrac{M_t - SM_t}{0.015\, D_t}$

where:
$C_t$ is the closing price at time t
$L_t$ and $H_t$ are the low price and high price at time t, respectively
$LL_{t..t-n+1}$ and $HH_{t..t-n+1}$ are the lowest low and highest high prices in the last n days, respectively
$UP_t$ and $DW_t$ are the upward price change and downward price change at time t, respectively
$EMA(k)_t = EMA(k)_{t-1} \times \left(1 - \dfrac{2}{k+1}\right) + C_t \times \dfrac{2}{k+1}$
Moving average convergence divergence: $MACD_t = EMA(12)_t - EMA(26)_t$
$M_t = \dfrac{H_t + L_t + C_t}{3}$
$SM_t = \dfrac{\sum_{i=0}^{n-1} M_{t-i}}{n}$
$D_t = \dfrac{\sum_{i=0}^{n-1} \left| M_{t-i} - SM_t \right|}{n}$
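For reference, most of the formulas in Table 1 can be computed in a few lines of pandas. The sketch below is an illustrative reconstruction (not supplied by the authors) that derives most of the indicators from close/high/low columns with n = 10; the column names are assumptions.

```python
import numpy as np
import pandas as pd

def add_indicators(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Compute most Table 1 indicators from close/high/low price columns."""
    c, h, l = df["close"], df["high"], df["low"]
    w = np.arange(1, n + 1, dtype=float)           # weights rise toward day t

    df["SMA"] = c.rolling(n).mean()
    df["WMA"] = c.rolling(n).apply(lambda x: np.dot(x, w) / w.sum(), raw=True)
    df["MOM"] = c - c.shift(n - 1)

    ll, hh = l.rolling(n).min(), h.rolling(n).max()
    df["STCK"] = 100 * (c - ll) / (hh - ll)
    df["STCD"] = df["STCK"].rolling(n).mean()
    df["LWR"] = 100 * (hh - c) / (hh - ll)
    df["ADO"] = (h - c) / (h - l)

    # RSI from the n-day sums of upward and downward price changes.
    diff = c.diff()
    up = diff.clip(lower=0).rolling(n).sum()
    dw = (-diff.clip(upper=0)).rolling(n).sum()
    df["RSI"] = 100 - 100 / (1 + up / dw)

    # MACD from 12- and 26-day EMAs (pandas ewm uses alpha = 2/(span+1)).
    df["MACD"] = c.ewm(span=12).mean() - c.ewm(span=26).mean()

    # CCI from the typical price M_t, its moving average and mean deviation.
    m = (h + l + c) / 3
    sm = m.rolling(n).mean()
    d = m.rolling(n).apply(lambda x: np.abs(x - x.mean()).mean(), raw=True)
    df["CCI"] = (m - sm) / (0.015 * d)
    return df
```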

TABLE 2
SUMMARY STATISTICS OF INDICATORS.
Feature Max Min Mean Standard Deviation
Diversified Financials
SMA 6969.46 227.5 1471.201 1196.926
WMA 3672.226 119.1419 772.5263 630.0753
MOM 970.8 -1017.8 21.77033 126.5205
STCK 99.93224 0.159245 53.38083 19.18339
STCD 96.9948 14.31843 53.34332 15.28929
RSI 68.96463 27.21497 50.18898 6.471652
SIG 310.5154 -58.4724 16.64652 51.62368
LWR 99.84076 0.06776 46.61917 19.18339
ADO 0.99986 0.000682 0.504808 0.238426
CCI 270.5349 -265.544 14.68813 101.8721
Basic Metals
SMA 322111.5 7976.93 69284.11 60220.95
WMA 169013.9 4179.439 36381.48 31677.51
MOM 39393.8 -20653.8 1030.265 4457.872
STCK 98.47765 1.028891 54.64576 16.41241
STCD 90.93235 12.94656 54.64294 13.25043
RSI 72.18141 27.34428 49.8294 6.113667
SIG 12417.1 -4019.14 803.5174 2155.701
LWR 98.97111 1.522349 45.36526 16.43646
ADO 0.999141 0.00097 0.498722 0.234644
CCI 264.6937 -242.589 23.4683 99.14922
Non-metallic Minerals
SMA 15393.62 134.15 1872.483 2410.316
WMA 8081.05 69.72762 985.1065 1272.247
MOM 1726.5 -2998.3 49.21097 264.0393
STCK 100.00 0.154268 54.71477 20.2825
STCD 96.7883 13.15626 54.68918 16.37712
RSI 70.89401 24.07408 49.67247 6.449379
SIG 848.558 -127.47 37.36441 123.9744
LWR 99.84573 -2.66648 45.28523 20.2825
ADO 0.998941 0.00036 0.501229 0.238008
CCI 296.651 -253.214 20.06145 101.9735
Petroleum
SMA 1349138 16056.48 243334.2 262509.8
WMA 707796.4 8580.536 127839.1 138101
MOM 227794 -136467 4352.208 26797.25
STCK 100.00 0.253489 53.78946 22.0595
STCD 95.93565 2.539517 53.83312 17.46646
RSI 75.05218 23.26627 50.02778 6.838486
SIG 71830.91 -33132 3411.408 11537.98
LWR 99.74651 -1.8345 46.23697 22.02162
ADO 0.999933 0.000288 0.498381 0.239229
CCI 286.7812 -284.298 14.79592 101.8417

