Stock Market Price Prediction
ACKNOWLEDGEMENT
Declaration
Chapter 1 Introduction
Purpose
Scope
Stock analysis: fundamental analysis vs. technical analysis
Problem Statement
Chapter 2 Literature Survey
2.1 Introduction to Literature Review
Chapter 3 Methodology
3.1 PROPOSED SYSTEMS
3.1.1 Time Series Analysis
3.1.2 Long short-term memory network
3.2 SYSTEM ARCHITECTURE
Chapter 4 Design
4.1 Structure Chart
4.2 UML Diagrams
4.2.1 Use Case Diagram
4.2.2 Sequence Diagram
4.2.3 Activity Diagram
4.2.4 Collaboration Diagram
4.2.5 Flow Chart
4.2.6 Component Diagram
Chapter 5 Experiment Analysis
5.1 System configuration
5.2 Sample code
Chapter 6 Conclusion And Future Work
6.1 Conclusion
6.2 Future work
ACKNOWLEDGEMENT
I would like to thank my project guide, Indra Nath Sahu sir, for the
considerable effort he put into providing me with resources and study
materials, for reading the manuscript, and for devoting his valuable
time. He has been a constant source of advice and inspiration,
without which this work would not have been possible.
I would also like to thank Dr. A.K. Mahto, coordinator of MCA/MSC
IT, and all the faculty and other staff of the Department of Master in
Computer Application, DSPMU, for their cooperation and
encouragement throughout the period of my project and my stay at
DSPMU, Ranchi.
Date:
Sawan Kumar
Declaration
I, Sawan Kumar, hereby declare that the project report “Stock
Market Price Prediction” is based on my own work carried out
during the course of my study under the supervision of Pooja Jha
Mam, Department of MCA, DSPMU, Ranchi.
I assert that the statements made and conclusions drawn are an outcome
of my research work. I further certify that:
I. The work contained in the report is original and has been done
by me under the general supervision of my supervisor.
II. The work has not been submitted to any other institution for
any other degree/diploma/certificate in this university or any
other university in India or abroad.
III. I have followed the guidelines provided by the university in
writing the report.
IV. Whenever I have used materials (data, theoretical analysis, and
text) from other sources, I have given due credit to them in the text
of the report and given their details in the references.
Chapter 1 Introduction
Purpose
Stock market prices are known for being volatile, dynamic and nonlinear.
Accurate stock price prediction is extremely challenging because many
macro and micro factors are involved, such as politics, global economic
conditions, unexpected events, and a company’s financial performance.
All of this also means that there is a lot of data in which to find patterns.
Financial analysts, researchers, and data scientists therefore keep exploring
analytics techniques to detect stock market trends. This gave rise to the
concept of algorithmic trading, which uses automated, pre-programmed
trading strategies to execute orders.
Scope
Despite the volatility, stock prices aren’t just randomly generated
numbers. They can be analysed as a sequence of discrete-time data;
in other words, as time-series observations taken at successive points in
time (usually daily). Time series forecasting (predicting
future values based on historical values) therefore applies well to stock
forecasting.
Because time-series data is sequential, we need a way to
aggregate this sequence of information. Of all the potential
techniques, the most intuitive is the moving average (MA), which smooths out
short-term fluctuations.
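As a minimal sketch of how a moving average smooths a price series, the following computes a simple moving average in plain Python. The function name `simple_moving_average` and the sample prices are illustrative assumptions, not part of the project code.

```python
def simple_moving_average(prices, window):
    """Return the simple moving average of `prices` over `window` points.

    Positions that do not yet have a full window are reported as None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the value that has just left the window
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
print(simple_moving_average(closes, 3))  # [None, None, 11.0, 12.0, 13.0]
```

A longer window smooths more strongly but reacts more slowly to a change in direction.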
Fundamental analysis:
o Evaluates a company’s stock by examining its intrinsic
value, including but not limited to tangible assets,
financial statements, management effectiveness, strategic
initiatives, and consumer behaviors; essentially all the
basics of a company.
o Being a relevant indicator for long-term investment,
fundamental analysis relies on both historical and
present data to measure revenues, assets, costs,
liabilities, and so on.
o Generally speaking, the results from fundamental
analysis don’t change with short-term news.
Technical analysis:
o Analyzes measurable data from stock market activities,
such as stock prices, historical returns, and volume of
historical trades; i.e. quantitative information that could
identify trading signals and capture the movement
patterns of the stock market.
o Technical analysis focuses on historical data and current
data just like fundamental analysis, but it’s mainly used
for short-term trading purposes.
o Due to its short-term nature, technical analysis results
are easily influenced by news.
o Popular technical analysis methodologies include moving
average (MA), support and resistance levels, as well
as trend lines and channels.
Problem Statement
Time series forecasting and modelling play an
important role in data analysis. Time series analysis is a
specialized branch of statistics used extensively in fields such as
econometrics and operations research, and it is widely
used in analytics and data science. Stock prices are volatile in
nature and depend on various factors. The main aim of
this project is to predict stock prices using a Long Short-Term
Memory (LSTM) network.
2.1.1.1 Interval
Millions of trades take place every second. Hence, for analysis we need
to classify these trades based on the interval at which they took place.
We can divide these trades into intra-day and long-term intervals.
Intra-day can be sub-classified into 1 minute, 2, 5, 10, 15, 30 and 60
minutes. Long term intervals can be classified into daily, weekly,
monthly and so on.
2.1.1.3 Trend
At any instant, a stock has either higher demand than
supply or lower demand than available supply. Hence, we can classify
the Trend into two types, namely Bearish and Bullish. A stock is said to
be in a Bullish Trend if it has higher demand than supply at that
instant. If the stock has higher available supply
than demand, it is said to be in a Bearish Trend.
Figure 1 Japanese Candle
Figure 1 represents two different candlesticks. The green candle
represents a gain, and the red candle represents a loss. A candlestick
always contains all attributes associated with an interval (for example,
tick prices). Since a candlestick represents the interval we are looking at, we
will refer to a candlestick as an interval in the rest of this report.
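To make the link between ticks, intervals and candles concrete, here is a small sketch that groups tick prices into one candle per interval. It assumes the usual open, high, low and close attributes for a candle and ticks given as (timestamp in seconds, price) pairs; the function names are hypothetical.

```python
def candles_from_ticks(ticks, interval_seconds):
    """Group (timestamp_seconds, price) ticks into one OHLC candle per interval."""
    candles = {}
    for timestamp, price in sorted(ticks):
        # Every tick belongs to the interval that starts at or before it
        start = timestamp - timestamp % interval_seconds
        candle = candles.get(start)
        if candle is None:
            candles[start] = {"open": price, "high": price, "low": price, "close": price}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
    return candles


def candle_color(candle):
    # Green for a gain, red for a loss; an unchanged close is treated as red
    # here, which is an arbitrary choice made for this sketch.
    return "green" if candle["close"] > candle["open"] else "red"


ticks = [(0, 10.0), (20, 12.0), (50, 9.0), (60, 11.0), (90, 11.5)]  # made-up ticks
one_minute = candles_from_ticks(ticks, 60)
for start, candle in one_minute.items():
    print(start, candle, candle_color(candle))
```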
2.1.3 Accuracy
Accuracy is defined as the number of right decisions made
divided by the total number of decisions.
2.1.3.1 Accuracy based on Close Price
When we rely on the next interval's close
price to verify a decision, we refer to the result as Accuracy based
on Close Price in the rest of the report. When we make a Buy call,
we check whether the next interval's close price is
above the current interval's close price. If so, we consider the decision a
right decision. Likewise, when we make a Short sell,
we check whether the next interval's close price is lower than
the current interval's close price. If so, we consider the decision a right
decision.
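The verification rule can be sketched as follows. This is an illustration only: the function name, the "buy"/"short" labels and the sample prices are assumptions introduced for the example.

```python
def close_price_accuracy(closes, decisions):
    """Fraction of decisions confirmed by the next interval's close price.

    closes:    close price of each interval, oldest first.
    decisions: "buy" or "short", one per interval except the last.
    """
    if len(decisions) != len(closes) - 1:
        raise ValueError("need exactly one decision per interval except the last")
    if not decisions:
        return 0.0
    right = 0
    for i, decision in enumerate(decisions):
        current_close, next_close = closes[i], closes[i + 1]
        if decision == "buy" and next_close > current_close:
            right += 1
        elif decision == "short" and next_close < current_close:
            right += 1
    return right / len(decisions)


closes = [100.0, 102.0, 101.0, 101.0, 105.0]  # made-up close prices
decisions = ["buy", "short", "buy", "short"]
print(close_price_accuracy(closes, decisions))  # 0.5
```

Note that an unchanged close confirms neither a Buy call nor a Short sell under this rule.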
2.1.4 Technical Indicators
Technical Indicators are properties of a stock
derived from its tick prices (2.1.1.2). The calculation of each indicator is
described in the following subsections.
The interpretation of each indicator is explained in detail in section
3.0.
EMA =
RSI indicates the strength of the current trend. The longer the
interval we choose, the more stable the RSI values become. We need to find a
threshold value. If RSI falls below its threshold, it is an indication that
sellers are taking over from buyers. If RSI rises above its threshold, it
indicates that buyers are taking over from sellers and stock prices are expected
to rise. [14]
2.1.4.4 Bollinger Band (BB)
Bollinger Band is calculated based on Standard Deviation
(2.1.2.3) and close price of the stock at a given interval. [12] Bollinger
Bands are calculated based on the following formulae:
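A common formulation, assumed here rather than taken from this report, places the middle band at the N-interval simple moving average of the close price and the upper and lower bands k standard deviations above and below it. The sketch below uses the population standard deviation; the function name and defaults (20 periods, 2 standard deviations) are illustrative.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, periods=20, num_std=2.0):
    """Return (lower, middle, upper) band lists; None until a full window exists."""
    lower, middle, upper = [], [], []
    for i in range(len(closes)):
        if i + 1 < periods:
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        window = closes[i + 1 - periods:i + 1]
        mid = mean(window)                 # middle band: moving average
        spread = num_std * pstdev(window)  # band half-width: k standard deviations
        lower.append(mid - spread)
        middle.append(mid)
        upper.append(mid + spread)
    return lower, middle, upper


closes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up close prices
lower, middle, upper = bollinger_bands(closes, periods=8)
print(lower[-1], middle[-1], upper[-1])
```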
stock can rise if it is in Bullish trend. Or, how much can it fall, if the
If, at a given interval, %k’s value falls below %d’s value, the market takes a
Bearish trend.
Otherwise, if %k’s value rises above %d’s value, it indicates a Bullish
trend. However, it is very difficult to identify how long the trend
predicted from FS remains valid.
Whenever lower MACD value falls below upper MACD value it shows
a trend reversal from Bullish to Bearish. If lower MACD value rises
above its upper MACD value it indicates a trend reversal from Bearish
to Bullish.
2.1.4.7 Williams Average (WA)
Williams Average is calculated based on the current close, highest
high and lowest low (2.1.2.5). To calculate this indicator, we first subtract the current
close from the highest high and call the result HC. We then
subtract the lowest low from the highest high and call the result HL. With
these results we calculate Williams Average by the following
formula.
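The steps above can be sketched as follows, assuming the standard Williams %R definition, in which HC is divided by HL and multiplied by -100. The 14-interval look-back and the function name are assumptions made for the example.

```python
def williams_average(highs, lows, closes, periods=14):
    """Williams %R over the last `periods` intervals, in the range -100..0."""
    highest_high = max(highs[-periods:])
    lowest_low = min(lows[-periods:])
    hc = highest_high - closes[-1]   # HC: highest high minus current close
    hl = highest_high - lowest_low   # HL: highest high minus lowest low
    if hl == 0:
        return None  # flat window: the ratio is undefined
    return (hc / hl) * -100.0


highs = [10.0, 12.0, 11.0]   # made-up interval highs
lows = [8.0, 9.0, 9.5]       # made-up interval lows
closes = [9.0, 11.0, 10.0]   # made-up interval closes
print(williams_average(highs, lows, closes, periods=3))  # -50.0
```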
Rate of change is directly proportional to the trend of the stock
market. If ROC is less than 20, it is an indication that the market is in a
bearish trend. If ROC is greater than 80, it is a strong signal to buy
stocks.
back and forth to provide final results. [7] Author in his approach has
argued that, by modifying the model he could minimize the back and
positives. Using this model author is making the decision to hold, sell
data from the given data set.[20] This selected data will be given as an
input to ANN which would classify the labels based on the weight of
feasible than a single layer perceptron. The author [9] has improved
However, for given patterns the algorithm will detect the re-occurrence
opened the doors of fortunes. This improved model can now be used
for prediction on data that contains extreme noise (stock market often
is the case). He has also proved the same with his results in the
publication. [9]
behind any set of sequences that has happened. In stock market ticks
the trading sequence. When he trained this model with ticks whose
likeliness; we predict the future trend for that particular stock [6]. The
This model now can be used to predict next most likely observation.
Chapter 3 Methodology
The approach to problems related to time series analysis is unique in
its own way. Most machine learning problems utilize a dataset with
certain outcomes to be predicted, such as class labels. In other words,
most machine learning tasks use independent
and dependent variables to answer a particular question.
A machine learning algorithm analyses a
dataset (say, X) and learns to predict the outcomes (Y), which are then used
to address the problem statement. Most supervised
machine learning algorithms work in this manner. Time
series analysis is different because its only independent variable is time. We will
look at how to solve the stock market price prediction task
with deep learning later in this report. For now, our primary
objective is to understand the terms and important concepts
required for approaching this task.
Time series analysis is all about the collection of previous data to study
the patterns of various trends. By conducting a detailed analysis of
these time series forecasting patterns, we can determine the future
outcomes with the help of our constructive deep learning models.
While there are other methods to determine the realistic results of
future trends, deep learning models are an outstanding approach to
receive some of the best predictions for each concept.
Let's now focus on the components of time series forecasting, namely
Trend, Seasonality, Cyclicity, and Irregularity. These four components
of time series analysis refer to types of variations in their graphical
representations. Let us look at each of these concepts individually and
try to gain more intuition behind them with the help of some realistic
examples.
Trend
A Trend in time series forecasting is defined as a long period of time with a
consistent increase or decrease in the data. There may be slight fluctuations in
the data points and elements at various instances of time, but the overall
variation and direction of change remain constant for a longer duration of
time. When a Trend goes from a long duration of constant increase to a long
duration of constant decrease, this is often referred to as a "Changing
Direction" trend.
There are a few terms used to describe the type of trend that we are dealing
with in time series forecasting. A slightly or moderately increasing trend over
a long duration of time is referred to as an uptrend, while a slightly or
moderately decreasing trend over a long duration of time is called a
downtrend. If the series shows no sustained increase or decrease and the overall
pattern of the graph changes little, the trend is referred to as a horizontal or
stationary trend. The graphical representation shown in the above image has a
downward trajectory over a long period of time. Hence, this image shows
a downward trend, also called a downtrend.
Let us consider an example to understand the concept of trend better.
Successful companies like Amazon, Apple, Microsoft, Tesla, and similar tech
giants have a reasonably performing upward stock price curve. We have
learned that we can determine this upward rise in stock prices as an uptrend.
While successful companies have an uptrend, some companies that are not
performing that well in the stock market have downward trajectories in stock
prices, or a downtrend. Companies whose profits and losses roughly balance out
at regular intervals are said to follow a
horizontal or stationary trend.
Seasonality
Cyclicity
When a pattern exhibits a rise and fall of mixed frequencies, and the graphical
representation has peaks and troughs that occur randomly over a period of
time, it is called a cyclic component. The duration of these occurrences
usually ranges over the period of at least one year. Stock prices of certain
companies that are hard to predict usually have cyclic patterns where they
flourish during a certain period, while having lower profits at other times.
Cyclic trends are some of the hardest for our deep learning models to predict.
The graphical representation above shows the cyclic behaviour of house sales
over the span of two decades.
A great realistic example of cyclic behavioural patterns is when a person
decides to invest in their own start-up. During the set-up and progression of
start-ups, every business experiences a cyclic phase. These cyclic fluctuations
often occur in business cycles. Usually, the phases of a start-up would include
the investment stage, which would have a slightly negative impact on our
prices. The next phases could include your marketing and profitable stages,
where you start to earn profits from your successful start-up. Here, you
experience an increase in the graphical curve. However, you will eventually
experience a depreciation phase as well. These will show lesser profits until
you make constant improvements and investments. This procedure begins the
cyclic stage again, lasting for periods of a few years. Rinse and repeat.
Irregularity
Irregularity is a component that is almost impossible to make accurate
predictions for with a deep learning model. Irregularity, or random variations
(as the name suggests), involves an abnormal or atypical pattern where it
becomes hard to deduce the occurrences of the data elements with respect to
time. Above is a graphical representation of the Google Stock Price changing
rapidly with an irregular and random pattern; this is hard to read. Despite the
information and data patterns present, the modeling procedure for such kinds
of representations will be hard to crack. The primary objective of the model is
to predict future possibilities based on previous outcomes. Hence, models for
irregular patterns are slightly harder to construct.
To provide a realistic example for irregular patterns, let us analyze the current
status of the world, where many businesses and other industries are affected
on a large scale. The global pandemic is a great example of irregular activity
that is impossible for anyone to predict. These disturbances that occur due to a
natural calamity or phenomenon will affect the trading prices, stock price
charts, companies, and businesses. It is not possible for the model you
construct to detect the occurrences of these situational tragedies. Hence,
irregular patterns are an interesting component of time series forecasting to
analyze and study.
Another significant aspect of time series analysis that we will encounter in
this report is the concept of stationarity. When the mean and variance of a
series of data points
remain constant with time, we call it a stationary series. If
they vary with time, we call it a non-stationary series. Most
forecasting tasks, including stock market prices or cryptocurrency graphs, involve
non-stationary series. Many forecasting methods perform poorly on non-stationary
series, so such a series is usually converted into a stationary one first.
Methods for converting a non-stationary series
into a stationary one include differencing and transformation. Differencing
replaces each data point with the difference between it and the preceding
data point, while transformation applies a function to the series to stabilize its variance.
Typically, a log transform is used for this purpose.
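As a small illustration of differencing and the log transform, the sketch below applies both to a made-up price series that grows by 10% per step. The helper names `log_transform` and `difference` are assumptions for this example.

```python
import math


def log_transform(series):
    """Apply the natural logarithm to every value (values must be positive)."""
    return [math.log(value) for value in series]


def difference(series, lag=1):
    """Subtract from each value the value `lag` steps before it."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


prices = [100.0, 110.0, 121.0, 133.1]  # made-up prices, +10% each step
print(difference(prices))                 # raw differences keep growing
print(difference(log_transform(prices)))  # log differences stay near log(1.1)
```

On this series the raw differences keep increasing with the price level, while the differences of the logged series are nearly constant, which is the behaviour a stationary series should show.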
To test for stationarity in Python, the two main methods
are rolling statistics and the ADF test. Rolling statistics are more of a
visual technique that involves plotting the moving average or moving
variance to check whether it varies with time. The Augmented Dickey-Fuller (ADF)
test returns a test statistic, a p-value and a set of critical values.
These are used to verify stationarity.
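The rolling-statistics check can be sketched in plain Python as below; the function name and the sample series are illustrative assumptions. In practice the two returned lists would be plotted against time, and the Augmented Dickey-Fuller test itself is available, for example, as `adfuller` in `statsmodels.tsa.stattools`.

```python
from statistics import mean, pvariance


def rolling_stats(series, window):
    """Return the rolling mean and rolling (population) variance of a series."""
    means, variances = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        means.append(mean(chunk))
        variances.append(pvariance(chunk))
    return means, variances


series = [1, 2, 3, 4, 5, 6]  # made-up, steadily rising series
means, variances = rolling_stats(series, 3)
print(means)      # the rolling mean drifts upward, so the series is not stationary
print(variances)  # the rolling variance stays constant for this series
```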
Working of LSTM:
LSTM is a special network structure with three
“gate” structures. The three gates placed in an LSTM unit are called the
input gate, forget gate and output gate. As information
enters the LSTM network, it is filtered by learned rules. Only the
information that conforms to these rules is retained, and the
information that does not conform is forgotten through the
forget gate.
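Forget Gate:
$F(t) = \sigma(W_f X(t) + U_f H(t-1) + B_f)$
Input Gate:
$I(t) = \sigma(W_i X(t) + U_i H(t-1) + B_i)$
$\tilde{c}(t) = \tanh(W_c X(t) + U_c H(t-1) + B_c)$
$C(t) = F(t) \odot C(t-1) + I(t) \odot \tilde{c}(t)$
Output Gate:
$O(t) = \sigma(W_o X(t) + U_o H(t-1) + B_o)$
$H(t) = O(t) \odot \tanh(C(t))$
Where:
• X(t): input vector to the LSTM unit
• F(t): forget gate's activation vector
• I(t): input/update gate's activation vector
• O(t): output gate's activation vector
• H(t): hidden state vector, also known as output vector of the
LSTM unit
• c̃(t): cell input activation vector
• C(t): cell state vector
• W, U, and B: weight matrices and bias vector parameters
which need to be learned during training
• σ: sigmoid function; ⊙: element-wise product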
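To show how the quantities listed above interact, here is a minimal sketch of one LSTM time step using the standard LSTM update, reduced to scalar input and state so it runs without any library. The function names and the `params` layout are assumptions made for the example; a real network uses vectors and matrices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for scalar input, hidden state and cell state.

    params maps each of "f", "i", "o", "c" to a (W, U, B) tuple.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(pre_activation("f"))          # forget gate F(t)
    i = sigmoid(pre_activation("i"))          # input gate I(t)
    o = sigmoid(pre_activation("o"))          # output gate O(t)
    c_tilde = math.tanh(pre_activation("c"))  # cell input activation
    c = f * c_prev + i * c_tilde              # new cell state C(t)
    h = o * math.tanh(c)                      # new hidden state H(t)
    return h, c


# With all parameters at zero every gate is 0.5 and the cell input is 0,
# so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "o", "c")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```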
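# LSTM
# Inputs: dataset (1-D sequence of prices, supplied by the caller)
# Outputs: RMSE of the forecasted data
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, lookback):
    # Turn a 1-D series into (samples, lookback, 1) inputs and next-step targets
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = np.array(series[lookback:])
    return X.reshape(-1, lookback, 1), y

# Split dataset into 75% training and 25% testing data
dataset = np.asarray(dataset, dtype="float32")
size = int(len(dataset) * 0.75)
train, test = dataset[:size], dataset[size:]

# Procedure to fit the LSTM model
def lstm_algorithm(train, lookback, epochs, neurons):
    X, y = make_windows(train, lookback)
    model = Sequential()
    # stateful=True from the pseudocode is left out: it needs a fixed batch size
    model.add(LSTM(neurons, input_shape=(lookback, 1)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=epochs, validation_split=0.2)
    return model

# Procedure to make predictions
def get_predictions_from_model(model, X):
    return model.predict(X)

epochs = 100
neurons = 50
lookback = 5  # assumed window length; the pseudocode does not specify one

# Fit the LSTM model
model = lstm_algorithm(train, lookback, epochs, neurons)

# Make predictions on the held-out test windows and compute the RMSE
X_test, y_test = make_windows(test, lookback)
pred = get_predictions_from_model(model, X_test)
rmse = np.sqrt(np.mean((pred.ravel() - y_test) ** 2))

Preprocessing of data
Overall Architecture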
Chapter 4 Design
Fig. 7: Training and prediction
Structure diagrams show the static structure
of the system and its parts on different abstraction and
implementation levels and how they are related to each
other. The elements in a structure diagram represent the
meaningful concepts of a system, and may include abstract,
real world and implementation concepts.
4.2.1 Use Case Diagram
Fig. 8: Using LMS, LSTM and LSTM with LMS in the system
4.2.2 Sequence Diagram
Fig. 9: Execution based on model selection
4.2.3 Activity Diagram
Fig. 10: Execution based on algorithm selection
4.2.4 Collaboration Diagram
Fig. 11: Data transfer between modules
4.2.5 Flow Chart
Fig. 12: Flow of execution
4.2.6 Component Diagram
Fig. 13: Components present in the system
5.1.3 Functional requirements
Functional requirements describe what the software
should do (the functions). Think about the core operations.
Performance: It is a quality attribute of the
stock prediction software that describes its
responsiveness to various user interactions.
ARIMA Model
ADF (Augmented Dickey-Fuller) Test
Predict the closing Stock Price
Machine learning Model
import numpy as np
import tensorflow as tf

def _make_chunks(prices, size=5):
    # Split prices into consecutive, non-overlapping chunks of `size` values,
    # dropping a trailing chunk that is shorter than `size`.
    chunks = [prices[i:i + size] for i in range(0, len(prices), size)]
    if chunks and len(chunks[-1]) < size:
        chunks.pop()
    # Each chunk is the input for predicting the chunk that follows it.
    X = np.array(chunks[:-1]).reshape((-1, size, 1))
    Y = np.array(chunks[1:]).reshape((-1, size, 1))
    return X, Y

def Dataset(Data, Date):
    # Rows before `Date` form the training set; rows on or after it form the test set.
    Train_Data = Data['Adj. Close'][Data['Date'] < Date].to_numpy()
    Test_Data = Data['Adj. Close'][Data['Date'] >= Date].to_numpy()
    Data_Train_X, Data_Train_Y = _make_chunks(Train_Data)
    Data_Test_X, Data_Test_Y = _make_chunks(Test_Data)
    return Data_Train_X, Data_Train_Y, Data_Test_X, Data_Test_Y
def Model():
    # Two stacked LSTM layers followed by dense layers; the last layer emits 5 values
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(200, input_shape=(5, 1), activation=tf.nn.leaky_relu, return_sequences=True),
        tf.keras.layers.LSTM(200, activation=tf.nn.leaky_relu),
        tf.keras.layers.Dense(200, activation=tf.nn.leaky_relu),
        tf.keras.layers.Dense(100, activation=tf.nn.leaky_relu),
        tf.keras.layers.Dense(50, activation=tf.nn.leaky_relu),
        tf.keras.layers.Dense(5, activation=tf.nn.leaky_relu)
    ])
    return model

model = Model()
tf.keras.utils.plot_model(model, show_shapes=True)
Model Fitting
Compile model
Save model in folder
Webpage Code
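# Imports inferred from the calls used in this listing; the module names are
# assumptions (for example, older installs expose Prophet as fbprophet)
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import streamlit as st
import cufflinks as cf
import pandas_datareader.data as web
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
from keras.models import load_model

# App title
st.markdown("""<style>#root>div:nth-child(1)>div>div>div>div>section>div {color:red}
</style>
""", unsafe_allow_html=True)
st.markdown('''
# Stock Price App
Shown are the stock price data for query companies!
**Credits**
- Built in `Python` using `streamlit`,`yfinance`, `cufflinks`, `pandas` and `datetime`
''')
st.write('---')
# Sidebar
st.sidebar.subheader('Query parameters')
start_date = st.sidebar.date_input("Start date", datetime.date(2005, 1, 1))
end_date = st.sidebar.date_input("End date", datetime.date(2022, 4, 30))
n_years = st.slider("years of predictions: ", 1, 4)
period = n_years * 365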
# Ticker information
# tickerSymbol, tickerData, tickerDf and plot_row_data are not defined in this
# listing and are assumed to be set up elsewhere in the app.
string_logo = '<img src=%s>' % tickerData.info['logo_url']
st.markdown(string_logo, unsafe_allow_html=True)
string_name = tickerData.info['longName']
st.header('**%s**' % string_name)
string_summary = tickerData.info['longBusinessSummary']
st.info(string_summary)
# Ticker data
st.header('**Ticker data**')
st.write(tickerDf)

@st.cache
def load_data(ticker):
    # Download the price history from the start date and expose the date index as a column
    data = web.DataReader(ticker, data_source='yahoo', start=start_date)
    data.reset_index(inplace=True)
    return data

data = load_data(tickerSymbol)
# Describing data
st.subheader('Data from 2005 - Till Now')
st.write(data.describe())
plot_row_data()
st.subheader('Closing Price vs time chart 100MA & 200MA')
ma100 = data.Close.rolling(100).mean()
ma200 = data.Close.rolling(200).mean()
fig = plt.figure(figsize=(12, 6))
# Labels give plt.legend() entries to display
plt.plot(ma100, 'r', label='100MA')
plt.plot(ma200, 'g', label='200MA')
plt.plot(data.Close, 'b', label='Close')
plt.legend()
st.pyplot(fig)
# Bollinger bands
st.header('**Bollinger Bands**')
qf = cf.QuantFig(tickerDf, title='First Quant Figure', legend='top', name='GS')
qf.add_bollinger_bands(periods=20, boll_std=2, colors=['cyan', 'gray'], fill=True)
# qf.add_volume(name='volume',up_color='green',down_color='red')
fig = qf.iplot(asFigure=True)
fig.layout.update(title_text="Time Series Data", xaxis_rangeslider_visible=True)
st.plotly_chart(fig)
# Forecasting
df_train = data[['Date', 'Close']]
df_train = df_train.rename(columns={"Date": "ds", "Close": "y"})
m = Prophet()
m.fit(df_train)
future = m.make_future_dataframe(periods=period)
forecast = m.predict(future)
st.subheader("Forecast data")
st.write(forecast.tail())
st.write("Forecast data")
fig1 = plot_plotly(m, forecast)
st.plotly_chart(fig1)
st.write("Forecast components")
fig2 = plot_components_plotly(m, forecast)
st.plotly_chart(fig2)
# Split closing prices: first 70% for training, last 30% for testing.
# `scaler` is not defined in this listing; it is assumed to be a scikit-learn
# style scaler that provides fit_transform and scale_.
data_training = pd.DataFrame(data['Close'][0:int(len(data) * 0.70)])
data_testing = pd.DataFrame(data['Close'][int(len(data) * 0.70):int(len(data))])
data_training_array = scaler.fit_transform(data_training)
# Load model
model = load_model('keras_model.h5')
# Testing part: prepend the last 100 training days so the first test window is complete
past_100_days = data_training.tail(100)
# pd.concat is used because DataFrame.append is no longer available in pandas 2.x
final_df = pd.concat([past_100_days, data_testing], ignore_index=True)
input_data = scaler.fit_transform(final_df)
x_test = []
y_test = []
for i in range(100, input_data.shape[0]):
    x_test.append(input_data[i - 100:i])
    y_test.append(input_data[i, 0])
x_test, y_test = np.array(x_test), np.array(y_test)
# Making predictions
y_predicted = model.predict(x_test)
# Rescale predictions and targets using the scaler's scale factor
scale_factor = 1 / scaler.scale_[0]
y_predicted = y_predicted * scale_factor
y_test = y_test * scale_factor
# Final graph
Web Screen
Main Screen
Company Information
Forecast Component
Chapter 6 Conclusion And Future Work
6.1 Conclusion
6.2 Future work
We want to extend this application to cover
cryptocurrency trading.
We want to add sentiment analysis to improve the predictions.