DECLARATION
I hereby declare that the project work entitled "Stock Market Price Prediction using Machine
Learning" is a record of an original work done by me under the guidance of Mr. Amarnath Awasthi, and
this project work is submitted in partial fulfillment of the requirements for the award of the
degree of Master of Computer Applications. The results embodied in this project have not been
submitted to any other University or Institute for the award of any degree or diploma.
ACKNOWLEDGEMENT
I would like to express my heartfelt gratitude to all those who have contributed to the
completion of this project. Their support and encouragement have been invaluable throughout
this journey.
First and foremost, I am deeply indebted to Mr. Amarnath Awasthi for his unwavering
guidance, patience, and encouragement. His expertise and insights have played a pivotal
role in shaping the direction of this project and refining its outcomes. I am truly grateful for
his mentorship and support.
I extend my sincere thanks to the faculty members of the MCA department for their
continuous support and encouragement. Their dedication to fostering academic excellence
has been a constant source of inspiration.
I would like to acknowledge the assistance and cooperation of my peers and friends.
Their valuable feedback, discussions, and moral support have been immensely helpful in
overcoming challenges and staying motivated throughout this endeavour.
ABSTRACT
Stock market price prediction is a challenging task due to its highly dynamic and complex
nature. Machine learning techniques have emerged as powerful tools to analyze and predict
the future trends of the stock market. In this study, we provide an abstract on stock market price
prediction using machine learning.
The proposed approach involves collecting historical data of stocks, pre-processing and
feature engineering of the data, and developing machine learning models to predict the future
stock prices. Different machine learning algorithms such as regression, support vector
machines, decision trees, and neural networks can be used for this purpose.
To evaluate the performance of the models, different metrics such as mean squared error, root
mean squared error, and correlation coefficient are used. In addition, various data
visualization techniques are employed to analyze the trends and patterns in the stock market
data.
The results show that machine learning models can effectively predict the stock market prices
with high accuracy. The accuracy of the predictions depends on the quality of the data, feature
selection, and the choice of the machine learning algorithm. This study demonstrates the
potential of machine learning techniques to provide valuable insights and aid in decision-
making for investors and traders in the stock market.
Stock market price prediction is a complex and challenging task that has attracted significant
attention from researchers and practitioners in recent years. With the availability of large
amounts of financial data, machine learning techniques have emerged as a promising approach
for predicting stock prices.
In this study, we review the current state of research on stock market price prediction using
machine learning. We first discuss the challenges and limitations of traditional methods for
predicting stock prices and the potential advantages of machine learning-based approaches. We
then provide an overview of the various machine learning algorithms that have been used for
stock price prediction, including regression, decision trees, neural networks, and support
vector machines.
We also discuss the key factors and variables that affect stock prices and the strategies used to
preprocess and select relevant features for machine learning models. We further examine the
evaluation metrics and performance measures used to assess the accuracy and effectiveness of
machine learning models for stock price prediction.
Finally, we highlight some of the major trends and directions in the field, including the use
of deep learning techniques, the incorporation of alternative data sources, and the
integration of human expertise and judgment in machine learning models. Overall, we
conclude that machine learning holds great promise for improving the accuracy and
efficiency of stock market price prediction and that further research is needed to fully
realize its potential in this field.
Stock market price prediction using machine learning is an area of research that involves the
use of statistical algorithms and techniques to forecast the future value of a particular stock
or the overall stock market. The goal of this approach is to help investors and traders
make informed decisions about their investments by providing them with accurate
predictions of future stock prices.
Machine learning models are well-suited to this task because they can analyze large
amounts of data, identify patterns and trends, and make predictions based on those patterns.
The most common machine learning techniques used in stock market price prediction
include
regression analysis, decision trees, random forests, neural networks, and support vector
machines.
To build a machine learning model for stock market price prediction, historical data on
stock prices and other relevant factors, such as economic indicators and news events, are
fed into the model. The model is then trained on this data to identify patterns and
relationships that can be used to predict future prices. As more data becomes available and
algorithms continue to improve, these models will become even more accurate and effective
in predicting stock prices.
CONTENTS
Introduction 01
Problem Definition 02
Purpose 03-05
Hardware and Software Specifications 06-07
Problem Statement & Proposed Solutions 08-10
Project Design 35
Software Requirement Specifications 35
Software Functional Specifications 35
Data Flow Diagram 36-38
Use Case Diagram 38-39
Activity Diagram 40
Collaboration Diagram 41
System Implementation 42-43
Testing 44-47
System Input and Output Screenshots 48-55
Limitations & Scope of Project 56-59
Gantt Chart 60
Impact of Proposed System In Academics and Industry 61
Conclusion 62-63
Reference 64
INTRODUCTION
Stock market price prediction using machine learning is the process of using algorithms
and statistical models to forecast the future value of stocks. The stock market is a
complex system influenced by various factors, such as economic indicators, company
performance, news, and geopolitical events. Predicting the stock market's future
movements accurately can be challenging, but machine learning algorithms can help
improve the accuracy of predictions.
Machine learning models are trained on historical stock market data, which includes
a variety of financial indicators and technical analysis measures. The models then use
this data to identify patterns and trends, which can be used to make predictions about
future stock prices.
There are different machine learning techniques used in stock market prediction,
including linear regression, decision trees, random forest, artificial neural networks, and
deep learning. These techniques help in analyzing large datasets, and they can
automatically identify complex patterns that are difficult for humans to detect.
Stock market price prediction using machine learning has many potential applications,
including portfolio optimization, risk management, and trading strategy development.
However, it's essential to note that while machine learning models can provide valuable
insights, they are not perfect, and predictions can be influenced by unexpected events
that are difficult to predict accurately.
Overall, stock market price prediction using machine learning is an exciting field with
enormous potential for investors, traders, and financial analysts. As machine learning
algorithms continue to evolve, we can expect more accurate predictions and insights
that can help investors make better decisions.
1.1 PROBLEM DEFINITION
The problem definition for stock market price prediction using machine learning
involves developing a model that can accurately forecast the future value of a particular
stock or the overall stock market based on historical data and other relevant factors.
This problem is challenging due to several reasons. First, the stock market is highly
complex and influenced by numerous factors, such as economic indicators, news events,
and investor sentiment. Second, the stock market is highly volatile and subject to
sudden changes, making it difficult to predict with certainty. Finally, there is a vast
amount of data available for analysis, and it can be challenging to identify which factors
are most important for accurate predictions.
To address these challenges, machine learning models are employed, which can analyze
large amounts of data and identify patterns and relationships that are not easily detected
by human analysts. The objective is to develop a model that can accurately predict
future stock prices, enabling investors and traders to make informed decisions about
their investments.
The key to developing an effective machine learning model for stock market price
prediction is to identify the most relevant features and factors that contribute to price
movements, as well as selecting an appropriate algorithm that can effectively learn from
the data. It is also crucial to evaluate the model's accuracy on historical data and in real-
world scenarios, using metrics such as mean squared error, root mean squared error, and
accuracy.
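As a rough illustration of how such metrics can be computed with scikit-learn, consider the snippet below; the price arrays are made-up values used only to demonstrate the calls:

from sklearn.metrics import mean_squared_error
import numpy as np

# hypothetical actual and predicted closing prices
y_true = np.array([101.2, 102.5, 100.8, 103.1])
y_pred = np.array([100.9, 102.9, 101.4, 102.6])

mse = mean_squared_error(y_true, y_pred)   # mean squared error
rmse = np.sqrt(mse)                        # root mean squared error
print(f"MSE: {mse:.4f}, RMSE: {rmse:.4f}")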
Overall, the problem of stock market price prediction using machine learning is a
challenging but important one, with potential implications for investors, traders, and
financial markets.
1.2 PURPOSE
The purpose of stock market price prediction using machine learning is to provide
investors and traders with accurate forecasts of future stock prices, enabling them to
make informed decisions about their investments. By using machine learning
techniques to analyze historical data on stock prices and other relevant factors, such as
economic indicators and news events, the aim is to identify patterns and relationships
that can be used to predict future prices.
The accurate prediction of stock prices can have a significant impact on investment
decisions. For example, if a stock is predicted to increase in value in the future, investors
may choose to buy the stock to take advantage of the potential gains. Similarly, if a stock
is predicted to decrease in value, investors may choose to sell the stock to avoid potential losses.
In addition to helping individual investors and traders, stock market price prediction
can also be useful for financial institutions, such as banks and hedge funds, that manage
large portfolios of stocks. By accurately predicting future stock prices, these institutions
can make informed decisions about their investments and potentially improve
their overall returns.
Overall, the purpose of stock market price prediction using machine learning is to help
investors and traders make more informed investment decisions by providing them with
accurate forecasts of future stock prices.
The purpose of stock market price prediction using machine learning is to develop
models that can accurately forecast the future value of a particular stock or the overall
stock market. These models are designed to help investors and traders make
informed decisions about their investments, based on the predicted future prices.
The use of machine learning in stock market price prediction offers several
benefits, including:
1. Improved accuracy: Machine learning models can analyze large amounts of data and
identify patterns and trends that may not be apparent to human analysts. This can lead
to more accurate predictions of future stock prices.
2. Faster decision-making: Machine learning models can process data quickly and make
predictions in real-time, allowing investors and traders to make faster and more
informed decisions about their investments.
3. Reduced human bias: Machine learning models are not subject to the same biases as
human analysts, such as emotional biases or cognitive biases, which can impact the
accuracy of stock price predictions.
4. Better risk management: Machine learning models can be used to identify potential
risks and opportunities in the stock market, helping investors and traders to manage
their risks more effectively.
Overall, the purpose of stock market price prediction using machine learning is
to provide investors and traders with accurate and timely information that can help
them make better-informed decisions about their investments, leading to improved
financial outcomes.
The primary purpose of stock market price prediction using machine learning is to
improve the accuracy of forecasting future stock prices. Accurately predicting stock
prices is crucial for investors, traders, and financial institutions to make informed
decisions regarding buying and selling securities. Machine learning models can process
vast amounts of historical data to identify patterns and trends that are not apparent to
humans, thus improving the accuracy of predictions.
The use of machine learning for stock market price prediction has several potential
applications. For example, it can help investors identify undervalued stocks, develop
profitable trading strategies, and optimize portfolio management. Machine learning
algorithms can also help financial institutions manage risk by identifying potential
market shifts or changes in company performance.
Another purpose of using machine learning for stock market price prediction is to
reduce the impact of human biases on decision-making. Human biases, such as
emotional attachments to particular stocks or a tendency to overlook critical
information, can lead to poor investment decisions. Machine learning models can
process vast amounts of data without any biases and provide objective insights.
Overall, the primary purpose of stock market price prediction using machine learning
is to provide investors and financial institutions with accurate insights and predictions
that can help them make informed decisions, reduce risks, and improve their returns.
1.3 HARDWARE AND SOFTWARE SPECIFICATIONS
SOFTWARE SPECIFICATIONS
NumPy
Wordcloud
Plost
Pathlib
Collection
❖ The connections of your software with other tools or plugins:
● The server is hosted on Heroku.
HARDWARE SPECIFICATIONS
CPU type: Intel i3, i5, i7 or equivalent AMD processor
1.4 PROBLEM STATEMENT
Stock market prediction is basically defined as trying to determine the stock value and
offer a robust idea for people to know and predict the market and the stock prices.
It is generally presented using quarterly financial ratios from a dataset. Thus, relying
on a single dataset may not be sufficient for prediction and can give an inaccurate result.
Hence, we are working towards a study of machine learning with the integration of various
datasets to predict the market and stock trends. The problem of estimating the stock price
will remain a problem if a better stock market prediction algorithm is not proposed.
Predicting how the stock market will perform is quite difficult. The movement in the stock
market is usually determined by the sentiments of thousands of investors. Stock market
prediction calls for an ability to predict the effect of recent events on investors. These events
can be political events like a statement by a political leader or a piece of news about a scam.
They can also be international events like sharp movements in currencies and commodities.
All these events affect corporate earnings, which in turn affects the sentiment of investors.
It is beyond the scope of almost all investors to correctly and consistently predict these
factors. All of this makes stock price prediction very difficult. Once the right data is
collected, it can then be used to generate predictions.
1.5 PROPOSED SOLUTION
Data pre-processing: the initial part of the project is to understand the implementation and
usage of various Python modules. This helps us understand why the different modules are
helpful, rather than implementing those functions from scratch. These modules provide
better code representation and user understandability. The following libraries are used:
NumPy, SciPy, pandas, csv, sklearn, matplotlib, sys, re, emoji, nltk, seaborn, etc.
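A minimal pre-processing sketch using some of these libraries is shown below; the file name and column names are illustrative assumptions, and the 60/40 split ratio is taken from the testing plan later in this report:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# hypothetical CSV of historical prices with OHLCV columns
df = pd.read_csv("stock_history.csv").dropna()

features = df[["Open", "High", "Low", "Volume"]]
target = df["Close"]

scaler = MinMaxScaler()                    # scale features to the [0, 1] range
X = scaler.fit_transform(features)

# keep the time order intact and hold out 40 percent of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, target, test_size=0.4, shuffle=False)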
Exploratory data analysis: the first step is to apply a sentiment analysis algorithm, which
identifies the positive, negative, and neutral parts of the chat and is used to plot a pie chart
based on these parameters. Line graphs are then plotted showing the message count per date
and per author, along with an ordered graph of date versus message count and the media sent
by each author with their counts.
The dataset is a simple text file extracted from a WhatsApp group or a one-to-one individual
chat. The more text messages there are, the higher the accuracy in identifying the data. A chat
can be extracted from WhatsApp using a feature called export chat, which mails a compressed
zip file containing a text file of the chat from the beginning; all undeleted messages are
included in this text file. A lot of pre-processing needs to be done.
The goal of this project is to predict the stock price of a company according to its
previous historical data. Stock market prediction is composed of one main component: a
company's historical stock data, which helps to analyze the current and previous changes in
the stock price. The above proposed model is easy to implement considering the available
technology infrastructure. The model is simple,
secure and scalable. The proposed model is based on serial communication. These
models will help the investors to invest their money according to the predicted value.
This project will focus exclusively on predicting the daily trend (price movement) of
individual stocks. The project will make no attempt to decide how much money
to allocate to each prediction.
A system needs to be built that works with maximum accuracy and considers all the
important factors that could influence the result.
PROJECT ANALYSIS
This system named “Stock Buy/Sell Predictive Analytics for Using Predictive
Algorithms & Machine Learning Techniques” is a web application that aims to predict
stock market value using technical stock indicators and Prediction models: Decision
Tree & Multiple Linear Regression. This project is intended to solve the economic
dilemma faced by individuals who want to invest in the stock market.
Stock market prediction:
Stock price movements are somewhat repetitive in nature in the time series of stock
values. The prediction feature of this system tries to predict the stock return in the time
series by training the Decision Tree/Regression model or analyzing the trend charts of
technical indicators, which involves producing an output and correcting the error.
The system tries to automate the stock analysis for the user by downloading latest data,
analyzing technical indicator trends, creating prediction models, validating the
prediction models and giving the end results to the users as to whether the stock should
be bought/sold or whether the stock is stable/risky, just at the click of a button.
After an extensive analysis of the problems in the system, we are familiarized with
the requirements that the current system needs. These requirements are categorized into
functional and non-functional requirements, listed below:
Functional Requirements:
Functional requirements are the functions or features that must be included in any system
to satisfy the business needs and be acceptable to the users. Based on this, the functional
requirements that the system must satisfy are as follows:
The system should be able to predict the approximate share price movement.
The system should collect accurate data from the Yahoo Finance website in a consistent
manner.
Non-Functional Requirements:
2.1 FEASIBILITY STUDY
Stock market cannot be accurately predicted. The future, like any complex problem, has
far too many variables to be predicted. The stock market is a place where buyers and
sellers converge. When there are more buyers than sellers, the price increases. When
there are more sellers than buyers, the price decreases. So, there is a factor which causes
people to buy and sell. It has more to do with emotion than logic. Because emotion is
unpredictable, stock market movements will be unpredictable. It is futile to try to predict
where markets are going; they are designed to be unpredictable.
The proposed system will not always produce accurate results since it does not account
for human behavior. Factors like changes in a company's leadership, internal matters,
strikes, protests, natural disasters, and changes in authority cannot be taken into account
by the machine when relating them to changes in the stock market.
The objective of the system is to give an approximate idea of where the stock market
might be headed. It does not give a long-term forecast of a stock's value. There are
way too many factors to account for in the long-term output of a stock. Many
parameters may affect it along the way, due to which long-term forecasting is just
not feasible.
Feasibility studies undergo the following major analyses to predict whether the system will
be a success:
Operational Feasibility
Technical Feasibility
Economic Feasibility
1. TECHNICAL FEASIBILITY
The analyst must find out whether current technical resources can be upgraded or added
to in a manner that fulfills the request under consideration. This is where the expertise of
system analysts is beneficial, since using their own experience and their contact with
vendors they will be able to answer the question of technical feasibility. The essential
questions that help in testing the technical feasibility of a system include the following:
The Automated Stock Prediction system deals with modern technology and needs an
efficient technical environment to run this project. All the resource constraints must be in
favor of a better-performing system. Keeping all these facts in mind, we selected favorable
hardware and software utilities to make it more feasible.
Back-end Tools:
The application needs latest stock OHLVC (Open Price, High Price, Low Price, Volume,
and Close Price) data for each NIFTY stock. Since most Finance sites tend to be
overloaded with advertising and tetchy about scrapers, this can be a little challenging at
first.
As the purpose of this project is to develop an automated data fetch application, the stock
data should be automatically retrieved from the web, which can be done through web
scraping in R. In such a case, Yahoo Finance is a good source for extracting financial
data, as the format of the data is mostly consistent for this source and is easily accessible
through R packages like "rvest".
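On the Python side of the project, a comparable automated fetch can be sketched with the third-party yfinance package, assuming it is installed; the ticker symbol and date range below are only examples:

import yfinance as yf

# download daily OHLCV data for one NIFTY stock from Yahoo Finance
data = yf.download("RELIANCE.NS", start="2020-01-01", end="2023-01-01")
print(data[["Open", "High", "Low", "Close", "Volume"]].tail())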
VBA in excel is used to generate, format and print reports using graphical representations
like charts. The reports are generated with ease and it is simple with the help of VBA.
The reports are generated using various options as per the need of the management.
Why R?
R is a programming language and free software environment for statistical computing
and graphics that is supported by the R Foundation for Statistical Computing.
One of the great advantages of using R for data analysis is the amount of data that can
be imported over the web. This is practical because a database can be downloaded or
updated with a simple command, avoiding all the manual and tedious work of collecting
data manually. It is also easy to share code, as anyone can download the exact same
dataset with a single line of code.
Importation of stock data from Yahoo Finance can be performed using specific packages
in CRAN (the Comprehensive R Archive Network) and web scraping techniques. R also
provides a broad variety of statistical techniques (linear and nonlinear modelling), which
can be used for the Decision Tree analysis for the purpose of this project.
Why D3.JS?
D3.JS is a JavaScript library for producing dynamic, interactive data visualizations in
web browsers. It makes use of the widely implemented SVG, HTML5, and CSS
standards. TechanJS is a visual stock charting (candlestick, OHLC, indicators) and
technical analysis library built on D3.
For the purpose of this project, an attempt has been made to enhance the Technical Trend
Charts' visuals in Excel by integrating Excel with D3 and presenting the D3 visuals on the
Excel dashboard with user-friendly tooltips and labels.
2. OPERATIONAL FEASIBILITY
Operational feasibility is a measure of how well a proposed system solves the problems,
and takes advantage of the opportunities identified during scope definition and how it
satisfies the requirements identified in the requirements analysis phase of system
development. Operational feasibility reviews the willingness of the organization to
support the proposed system. This is probably the most difficult of the feasibilities to
gauge. In order to determine this feasibility, it is important to understand the management
commitment to the proposed project. If the request was initiated bymanagement, it is
likely that there is management support and the system will be accepted and used.
However, it is also important that the employee base will be accepting of the change.
Operational feasibility also considers whether the system will actually be used effectively
after it has been developed. If users have difficulty with a new system, it will not produce the expected
benefits. It measures the viability of a system in terms of the PIECES framework. The
PIECES framework can help in identifying operational problems to be solved, and their
urgency:
Performance: Does current mode of operation provide adequate throughput and response
time?
As compared to traditional methods of manually retrieving stock data from the web
and forecasting stock prices with a large number of manual calculations, this system
plays a very important role by automating the process of data retrieval and stock
movement/price prediction with the help of a user-friendly dashboard, thus making the
process easier and faster.
Information: Does current mode provide end users and managers with timely, pertinent,
accurate and usefully formatted information?
System provides end users with timely, pertinent, accurate and usefully formatted
information. Since all the stock related information is being pulled from Yahoo Finance
against a unique NSE Stock Symbol, it will provide for meaningful and accurate data
to the investor. In the traditional approach, investing decisions are made manually by
investors, which results in a loss of data validity due to human error. The information
handling and the investing decision in the proposed system will be driven by computerized and
automatically updated prediction and validation of stock data. The human errors will be
minimal. The data will be automatically updated from time to time and will be
validated before the data is processed into the system.
Economy: Determines whether the system offers adequate service level and capacity to
reduce the cost of the business or increase its profit. With the deployment of the proposed
system, manual work will be reduced and replaced by an IT-savvy approach. Moreover, it
has also been shown in the economic feasibility report that the recommended solution is
definitely going to be economically beneficial in the long run. The system is built on Excel,
R and JavaScript; Excel and JavaScript do not need any additional installation, and R needs
installation but is free software. So, overall, the application is very economically feasible.
Control: Does current mode of operation offer effective controls to protect against fraud
and to guarantee accuracy and security of data and information?
As all the data is pulled from Yahoo Finance, which is a public stock data provider, it
does not contain any confidential information that could be misused, so no additional
security measures are required for this system.
Efficiency: Efficiency work is to ensure a proper workflow structure for storing stock data
and the proper utilization of all the resources. It determines whether the system makes
maximum use of available resources, including time, people, flow of forms, and minimum
processing delay. In the current system a lot of time is wasted, as investing decisions are
made manually by traditional investors. The proposed system will be a lot more efficient,
as it will be driven by computerized and automatically updated prediction and validation
of stock data. The data will be automatically updated from time to time and will be
validated before it is processed into the system.
Services: Does current mode of operation provide reliable service? Is it flexible and
expandable?
The system should provide desirable and reliable services to those who need it, and it
should be flexible and expandable. The proposed system is very flexible for
better efficiency and performance of the organization. The scalability of the proposed
system will be inexhaustible as the storage capacity of the system can be increased as
per requirement. This will provide a strong base for expansion. The new system will
provide a high level of flexibility.
3. ECONOMIC FEASIBILITY
The concerned business must be able to see the value of the investment it is pondering
before committing to an entire system study. If short-term costs are not overshadowed
by long-term gains or produce no immediate reduction in operating costs, then the system
is not economically feasible, and the project should not proceed any further. If the
expected benefits equal or exceed costs, the system can be judged to be
economically feasible. Economic analysis is used for evaluating the effectiveness of the
proposed system. The economic feasibility review checks whether the expected costs are
in line with the projected budget and whether the project has an acceptable return on
investment. At this point, the projected costs will only be a rough estimate; exact costs are
not required to determine economic feasibility. It is only required to determine whether it
is feasible that the project costs will fall within the target budget or return on investment.
A rough estimate of the project schedule is required to determine whether it would be
feasible to complete the systems project within the required timeframe, which would need
to be set by the organization.
It is the process of analyzing the financial facts associated with the system development
projects performed when conducting a preliminary investigation. The purpose of a
cost/benefit analysis is to answer questions such as:
2.2 TOOLS USED TO GATHER INFORMATION
PyCharm:
PyCharm is an integrated development environment (IDE) used for programming in Python.
It provides code analysis, a graphical debugger, an integrated unit tester, integration with
version control systems, and supports web development with Django. PyCharm is
developed by the Czech company JetBrains. It is cross-platform, working on Microsoft
Windows, macOS and Linux.
Rapid application development: - Because of its concise code and literal syntax, the
development of applications gets accelerated. The reason for its wide usability is its
simple and easy-to-master syntax. The simplicity of the code helps reduce the time and cost
of development.
Let us dive deeper into some of the unique features that make Python the most ubiquitous
language among the developer community. Here are a few of the many features of Python:
GIT:
Git is a distributed version control system that tracks changes in any set of computer files,
usually used for coordinating work among programmers collaboratively developing source
code during software development. Its goals include speed, data integrity, and support for
distributed, non-linear workflows (thousands of parallel branches running on different
systems).
Git was originally authored by Linus Torvalds in 2005 for development of the Linux kernel,
with other kernel developers contributing to its initial development. Since 2005, Junio Hamano
has been the core maintainer.
As with most other distributed version control systems, and unlike most client–server
systems, every Git directory on every computer is a full-fledged repository with complete
history and full version-tracking abilities, independent of network access or a central server.
Git is free and open-source software distributed under the GPL-2.0-only license.
Distributed development
Like Darcs, BitKeeper, Mercurial, Bazaar, and Monotone, Git gives each developer a local
copy of the full development history, and changes are copied from one such repository to
another. These changes are imported as additional development branches and can be merged
in the same way as a locally developed branch.
Toolkit-based design
Git was designed as a set of programs written in C and several shell scripts that provide
wrappers around those programs. Although most of those scripts have since been rewritten
in C for speed and portability, the design remains, and it is easy to chain the components
together.
TECHNOLOGIES
STREAMLIT: -
Streamlit is an open-source Python library that makes it very easy to host data-driven apps
and scripts as a web app, and it is not just limited to data dashboards and ML models.
The trend of data science and analytics is increasing day by day. In the data science
pipeline, one of the most important steps is model deployment. We have a lot of options in
Python for deploying our model; some popular frameworks are Flask and Django. The
issue with using these frameworks is that they require some knowledge of HTML, CSS,
and JavaScript. Keeping these prerequisites in mind, Adrien Treuille, Thiago Teixeira, and
Amanda Kelly created Streamlit. Using Streamlit, you can deploy any machine learning
model and any Python project with ease, without writing front-end code. Streamlit is very
user-friendly.
The Streamlit library documentation includes a Get Started guide, an API reference, and
more advanced features of the core library, including caching, theming, and Streamlit
Components.
Streamlit Community Cloud is an open and free platform for the community to deploy,
discover, and share Streamlit apps and code with each other. Create a new app, share it with
the community, get feedback, iterate quickly with live code updates, and have an impact!
The knowledge base is a self-serve library of tips, step-by-step tutorials, and articles that
answer your questions about creating and deploying Streamlit apps.
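A minimal sketch of how a stock dashboard page could be hosted with Streamlit is shown below; the default ticker and the CSV-loading step are placeholders rather than the project's actual code. The script would be launched with "streamlit run app.py".

import pandas as pd
import streamlit as st

st.title("Stock Market Price Prediction")

ticker = st.text_input("Enter the stock symbol", "RELIANCE.NS")  # example default
df = pd.read_csv("stock_history.csv")       # placeholder: load historical prices

st.subheader("Closing price history")
st.line_chart(df["Close"])                  # interactive line chart of closing prices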
WORD CLOUD
A word cloud is a data visualization technique used for representing text data in which the size
of each word indicates its frequency or importance. Many times, you might have seen a cloud
filled with lots of words in different sizes, which represent the frequency or the importance
of each word.
This is called a Tag Cloud or word cloud. For this tutorial, you will learn how to create a word
cloud in Python and customize it as you see fit. This tool will be handy for exploring text data
and making your report livelier.
It's important to remember that while word clouds are useful for visualizing common words
in a text or data set, they're usually only useful as a high-level overview of themes. They're
similar to bar plots but are often more visually appealing (albeit at times harder to interpret).
Word clouds can be particularly helpful when you want to communicate the key ideas or
concepts in a visually engaging way. However, it's important to keep in mind that word clouds
don't provide any context or deeper understanding of the words and phrases being used.
Therefore, they should be used in conjunction with other methods for analyzing and
interpreting text data.
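A short example of generating a word cloud, assuming the wordcloud and matplotlib packages are installed; the input text is a stand-in for real chat or news text:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = "stock market price prediction machine learning stock price trend"  # sample text
wc = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.imshow(wc, interpolation="bilinear")    # render the word cloud image
plt.axis("off")
plt.show()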
MATPLOTLIB:
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python.
Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK. There is
also a procedural "pylab" interface based on a state machine (like OpenGL), designed to
closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of
Matplotlib.
Matplotlib was originally written by John D. Hunter. Since then it has had an active
development community and is distributed under a BSD- style license. Michael Droettboom
was nominated as matplotlib's lead developer shortly before John Hunter's death in August
2012 and was further joined by Thomas Caswell. Matplotlib is a NumFOCUS fiscally
sponsored project.
Matplotlib 2.0.x supports Python versions 2.7 through 3.10. Python 3 support started with
Matplotlib 1.2. Matplotlib 1.4 is the last version to support Python 2.6. Matplotlib has
pledged not to support Python 2 past 2020.
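A basic Matplotlib example in the spirit of this project, plotting a closing-price series; the values are dummy data, not real quotes:

import matplotlib.pyplot as plt

days = list(range(1, 8))
close = [101.2, 102.5, 100.8, 103.1, 104.0, 103.4, 105.2]   # dummy closing prices

plt.plot(days, close, marker="o", label="Close price")
plt.xlabel("Trading day")
plt.ylabel("Price")
plt.title("Closing price trend")
plt.legend()
plt.show()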
SEABORN:
Seaborn is a library mostly used for statistical plotting in Python. It is built on top of
Matplotlib and provides beautiful default styles and color palettes to make statistical plots
more attractive.
Seaborn helps you explore and understand your data. Its plotting functions operate on data
frames and arrays containing whole datasets and internally perform the necessary semantic
mapping and statistical aggregation to produce informative plots. Its dataset- oriented,
declarative API lets you focus on what the different elements of your plots mean, rather than
on the details of how to draw them.
Seaborn makes it easy to switch between different visual representations by using a
consistent dataset-oriented API.
The function relplot() is named that way because it is designed to visualize many different
statistical relationships. While scatter plots are often effective, relationships where one
variable represents a measure of time are better represented by a line. The relplot() function
has a convenient kind parameter that lets you easily switch to this alternate representation:
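For instance, a time-indexed relationship can be drawn as a line through relplot's kind parameter; the DataFrame below is a small made-up example:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "close": [101.2, 102.5, 100.8, 103.1, 104.0],   # dummy closing prices
})

# switch from the default scatter representation to a line plot
sns.relplot(data=df, x="day", y="close", kind="line")
plt.show()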
As a data visualization library, Seaborn requires that you provide it with data, and it
supports several different dataset formats. Most functions accept data represented with
objects from the pandas or NumPy libraries as well as built-in Python types like lists and
dictionaries. Understanding the usage patterns associated with these formats makes it
easier to prepare data for plotting.
URL EXTRACT:
It tries to find any occurrence of a TLD in the given text. If a TLD is found, it starts from
that position to expand boundaries in both directions, searching for a "stop character"
(usually whitespace, comma, single or double quote).
PANDAS :
Pandas is an open-source library that is made mainly for working with relational or labeled
data both easily and intuitively.
It is free software released under the three-clause BSD license. The name is derived from the
term "panel data", an econometrics term for data sets that include observations over multiple
time periods for the same individuals. Its name is a play on the phrase "Python data analysis"
itself. Wes McKinney started building what would become pandas at AQR Capital while he
was a researcher there from 2007 to 2010.
Library features
Many inbuilt methods for fast data manipulation, made possible with vectorisation.
Data alignment and integrated handling of missing data.
Label-based slicing, fancy indexing, and subsetting of large data sets.
DataFrames
Pandas is mainly used for data analysis and the associated manipulation of tabular data in
DataFrames. Pandas allows importing data from various file formats such as comma-separated
values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
Pandas allows various data manipulation operations such as merging, reshaping, and
selecting, as well as data cleaning and data wrangling features. The development of pandas
introduced into Python many comparable features for working with DataFrames that were
established in the R programming language. The pandas library is built upon another
library, NumPy, which is oriented towards efficiently working with arrays rather than
DataFrames.
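A small illustration of the DataFrame operations described above, computing a moving average and daily returns of the closing price; the file and column names are assumptions:

import pandas as pd

# load historical prices exported as CSV; parse the Date column as datetimes
df = pd.read_csv("stock_history.csv", parse_dates=["Date"], index_col="Date")

df["MA20"] = df["Close"].rolling(window=20).mean()   # 20-day moving average
df["Return"] = df["Close"].pct_change()              # day-over-day percentage change

print(df[["Close", "MA20", "Return"]].tail())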
TENSORFLOW:
TensorFlow is an open-source machine learning framework developed by Google. It
allows developers to build and train machine learning models for a variety of tasks,
including image and speech recognition, natural language processing, and more.
TensorFlow provides a high-level API for building neural networks and other machine
learning models, as well as a low-level API for more advanced users who want to
customize their models.
One of the key benefits of TensorFlow is its ability to efficiently utilize hardware
accelerators such as GPUs and TPUs, which can greatly speed up the training and
inference of machine learning models. TensorFlow also has a large and active
community of developers, which means that there are many resources and libraries
available to help developers build and optimize their models.
KERAS:
Keras is a high-level neural networks API written in Python that is designed to be user-
friendly, modular, and extensible. It is built on top of other machine learning
frameworks, including TensorFlow, Theano, and CNTK. Keras allows developers to
build and train deep learning models with minimal code and provides a simplified
interface for implementing complex neural networks.
Keras was developed with the aim of making deep learning accessible to a wider
audience, including researchers, developers, and data scientists who are not
necessarily experts in machine learning. Keras provides a set of pre-built layers,
activation functions, loss functions, and optimizers that can be easily combined to
create a neural
network. It also supports a wide range of data formats and can be used for a variety of tasks,
including image classification, natural language processing, and more.
One of the key benefits of Keras is its simplicity and ease of use. With Keras, developers can
quickly prototype and iterate on their models without having to worry about the low- level
details of building and training a neural network. Keras also supports a wide range of
customization options for advanced users who want to fine-tune their models or implement
custom layers and loss functions.
Keras has become a popular choice for building deep learning models and has a large and
active community of developers who contribute to its development and provide support for
users.
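A minimal Keras sketch of the kind of model such a project might train, predicting the next closing price from a window of past prices; the window length, layer sizes, and random placeholder data are arbitrary illustrative choices, not the project's tuned configuration:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window = 30                                # look-back window of 30 days (assumption)
X = np.random.rand(200, window, 1)         # placeholder training sequences
y = np.random.rand(200, 1)                 # placeholder next-day prices

model = Sequential([
    LSTM(50, input_shape=(window, 1)),     # recurrent layer over the price window
    Dense(1),                              # single output: predicted next close
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)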
PYTORCH:
PyTorch is an open-source machine learning library developed by Facebook that is used for
building and training deep neural networks. It is built on top of Torch, a scientific computing
framework, and provides a Python interface for building and training machine learning
models.
PyTorch is known for its dynamic computational graph, which allows developers to change
the structure of their neural networks on the fly during training. This makes it easy to
implement complex neural networks and experiment with different architectures. PyTorch
also provides a wide range of pre-built modules, including convolutional and recurrent layers,
activation functions, and loss functions.
One of the key benefits of PyTorch is its flexibility and ease of use. PyTorch provides a
simple and intuitive API that makes it easy for developers to build and train deep learning
models. It also supports a wide range of data formats and can be used for a variety of tasks,
including computer vision, natural language processing, and more.
PyTorch has become a popular choice for building deep learning models and has a large and
active community of developers who contribute to its development and provide support for
users. PyTorch is also widely used in research settings and is often the library of choice for
academic researchers and students.
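An equivalent sketch in PyTorch, fitting a small feed-forward network on placeholder data; the architecture and sizes are illustrative only:

import torch
import torch.nn as nn

X = torch.rand(200, 4)          # placeholder features: Open, High, Low, Volume
y = torch.rand(200, 1)          # placeholder target: Close price

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):        # simple full-batch training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()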
PROPHET:
Prophet is an open-source time series forecasting library developed by Facebook that is used
for modeling and forecasting time series data. It is designed to be easy to use, fast, and
highly customizable, making it a popular choice for both beginners and advanced users.
Prophet uses an additive model that consists of three main components: trend, seasonality,
and holidays. The trend component models the underlying long-term growth or decline in the
time series, while the seasonality component models the periodic fluctuations in the time
series, such as daily, weekly, or monthly patterns. The holidays component allows for the
modeling of specific events or holidays that may affect the time series.
Prophet also provides a range of customization options, including the ability to include
custom seasonality and holiday effects, adjust the sensitivity of the trend and seasonality
components, and specify the number of Fourier terms used to model the seasonality
component.
One of the key benefits of Prophet is its ease of use and speed. Prophet provides a simple and
intuitive API that makes it easy for users to build and evaluate time series models. It also
includes built-in functionality for visualizing time series data and evaluating model
performance.
Prophet has become a popular choice for time series forecasting and has been used in a wide
range of applications, including financial forecasting, demand forecasting, and weather
forecasting.
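Prophet expects a DataFrame with 'ds' (date) and 'y' (value) columns; a minimal forecasting sketch is given below, assuming the prophet package is installed and using placeholder file and column names:

import pandas as pd
from prophet import Prophet

# reshape historical closing prices to Prophet's expected ds/y columns
history = pd.read_csv("stock_history.csv")
df = history.rename(columns={"Date": "ds", "Close": "y"})[["ds", "y"]]
df["ds"] = pd.to_datetime(df["ds"])

m = Prophet()                                  # default trend + seasonality model
m.fit(df)

future = m.make_future_dataframe(periods=30)   # extend 30 days beyond the history
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())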
PROJECT DESIGN
Requirement Specification:
● Functionality
● Performance
● Design constraints imposed on an implementation
● External interfaces
The software is meant to accept a valid user identification through an ID which provides a
unique identity to each individual user. It is through this user ID that each user's data can be
accessed on the platform. The requirements under the proposed system are to maintain
information relevant to the following fields:
● User Profile - The full information of each and every stock must be maintained
in the system, along with its price, and regularly updated at regular intervals, which is
easily possible through each user's unique ID.
● Record of Results - This phase will maintain information about the stocks' track record.
All the results of stocks will be kept.
3.3 DATA FLOW DIAGRAM
Data Flow Diagrams (DFD) are graphical representations of a system that
illustrate the flow of data within the system. DFDs can be divided into
different levels, which provide varying degrees of detail about the system.
LEVEL 1 DATA FLOW DIAGRAM
This level provides a more detailed view of the system by breaking down the major processes
identified in the level 0 DFD into sub-processes. Each sub-process is depicted as a separate
process on the level 1 DFD. The data flows and data stores associated with each sub-
process are also shown.
(LEVEL 1 DFD)
LEVEL 1 DFD — (PREDICTIVE MODEL ALGORITHMS)
3.4 USE CASE DIAGRAM
● Users can make use of the chat upload use case to give input to the system.
● Select time format use case describes that user can input the time
format of the file in the system.
● Users can make use of the Show analysis use case to see the result of
the entire analysis done by the system.
USE CASE DIAGRAM
3.5 ACTIVITY DIAGRAM
In the activity diagram, as the initial activity starts, the user uploads the file as input, which is
an action, and in the next action the time format is selected.
● The decision box "check chat format" represents the validity of the time format of the file.
● If the time format is correct then analysis will be done and process will end.
ACTIVITY DIAGRAM
3.6 COLLABORATION DIAGRAM
The collaboration diagram is used to show the relationship between the objects in a system.
Both the sequence and the collaboration diagrams represent the same information but
differently. Instead of showing the flow of messages, it depicts the architecture of the objects
residing in the system, as it is based on object-oriented programming. An object consists of
several features. Multiple objects present in the system are connected to each other. The
collaboration diagram, which is also known as a communication diagram, is used to
portray the object architecture of the system.
● This collaboration diagram shows the relationship between the objects in a system.
SYSTEM IMPLEMENTATION
PYTHON:-
Python is an interpreted, high-level, general-purpose programming language, created by
Guido van Rossum and first released in 1991. Its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small and large-scale projects.
1. Python programs are generally shorter than programs written in other languages such as
Java. Programmers have to type relatively less, and the indentation requirement of the
language keeps the code readable.
When exchanging data between a browser and a server, the data can only be text. JSON is
text, so we can convert any JavaScript object into JSON and send that JSON to the server.
We can also convert any JSON received from the server into JavaScript objects. This way we
work with the data as JavaScript objects, with no complicated parsing and translations.
DART:
By optimizing the compiled JavaScript output to avoid expensive checks and operations,
code written in Dart can, in some cases, run faster than equivalent code hand-written using
JavaScript idioms.
Daily Timeline
Similarly, we can create a daily timeline where the data is grouped according to date and
the number of messages is counted. A line chart is well suited to displaying this analysis.
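A sketch of that grouping step with pandas is shown below; it assumes a DataFrame with one row per message and a date column, which is not the project's actual variable naming:

import pandas as pd
import matplotlib.pyplot as plt

# hypothetical message log with a date column
messages = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"]),
    "text": ["hello", "hi there", "good morning"],
})

daily_counts = messages.groupby("date").size()    # number of messages per day
daily_counts.plot(kind="line", title="Daily timeline")
plt.show()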
TESTING
Testing is the major quality control that can be used during software development. Its basic
function is to detect the errors in the software. During requirement analysis and design, the
output is a document that is usually textual and non-executable. After the coding phase, a
computer program is available that can be executed for testing purposes.
Functional Testing
Functional testing is a type of testing which verifies that each function of the software
application operates in conformance with the requirement specification. This testing
involves checking the user interface, APIs, database, security, client-server applications and
functionality of the application under test. The testing can be done either manually or using
automation.
Testing Plan:
Test Type: Performance — Required: Yes
Rationale: Performance is the major criterion for evaluating any type of system. It holds importance and is tested accordingly.
Approach: The performance of the different prediction models and algorithms is measured in combination using the true rates for the following two approaches:
- Individual approach (prediction model trained on 60 percent of the data and tested on the remaining 40 percent)
- Statistical testing (model validations: accuracy percentage of the Decision Tree model and error rate of the Regression model)

Test Type: Stress — Required: No
Test Type: Compliance — Required: No
Test Type: Security — Required: No
Test Environment:
Software Items:
IIS Server
Windows 7
Internet connection
Microsoft Excel
R for Windows 3.4.3
Hardware Items:
Personal Computer/Laptop
Wireless connection or connecting cable
Test Cases:
Test ID T1
Input Enter the Stock Symbol to update the data
Expected Output Data fetched from Yahoo! Finance
Status Pass
Test ID T2
Input Predict the stock rate for the very next trading day
Expected Output We get Close Price and Price movement for the next trading day
Status Pass
Test ID T3
Input Check the precision of the output by predicting the data for a date whose values are
already known
Expected Output Outputs are partially precise
Status Pass
SYSTEM INPUT AND OUTPUT SCREENSHOTS
INPUT SCREENSHOTS
APP.PY
# Reconstructed outline of app.py from the fragments captured above;
# route names and handler bodies are approximate, not the author's exact code.
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    return render_template('index.html')

@app.errorhandler(404)
def page_not_found(error):
    return render_template('404 error.html'), 404

if __name__ == "__main__":
    app.run(debug=True)
TEMPLATES:
404 ERROR.HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport"
content="width=device-width, initial-
scale=1.0">
<title>404</title>
<style>
body{
font-size: 2rem;
margin: 0;
padding: 0;
display: flex;
justify-content: center;
align-items: center;
background-color: black;
color: antiquewhite;
font-family: 'Gill Sans', 'Gill Sans MT', Calibri, 'Trebuchet MS', sans-serif;
}
div{
width: 100%;
height: 100%;
display: flex;
align-items: center;
justify-content: center;
}
</style>
</head>
<body>
<div>
<h1>404 Error</h1>
</div>
</body>
</html>
INDEX.HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"
/>
<title>Predictor</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet"
integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3"
crossorigin="anonymous" />
<style>
body,
html {
margin: 0;
padding: 0;
height: 100%;
/* background: #60a3bc !important; */
background: #0000FF !important;
}
.user_card {
height: 400px;
width: 350px;
margin-top: auto;
margin-bottom: auto;
background: #f39c12;
position: relative;
display: flex;
justify-content: center;
flex-direction: column;
padding: 10px;
box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0
rgba(0, 0, 0, 0.19);
-webkit-box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px
20px 0 rgba(0, 0, 0, 0.19);
-moz-box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px
0 rgba(0, 0, 0, 0.19);
border-radius: 5px;
}
.brand_logo_container {
position: absolute;
height: 170px;
width: 170px;
top: -75px;
border-radius: 50%;
background: #60a3bc;
padding: 10px;
text-align: center;
}
.brand_logo {
height: 150px;
width: 150px;
border-radius: 100%;
border: 2px solid white;
}
.form_container {
margin-top: 100px;
}
.login_btn {
width: 100%;
background: #c0392b !important;
color: white !important;
}
.login_btn:focus {
box-shadow: none !important;
outline: 0px !important;
}
.login_container {
padding: 0 2rem;
}
.input-group-text {
background: #c0392b !important;
color: white !important;
border: 0 !important;
border-radius: 0.25rem 0 0 0.25rem !important;
}
.input_user,
.input_pass:focus {
box-shadow: none !important;
outline: 0px !important;
}
</style>
</head>
<body>
<!-- <div class="container">
<div class="d-flex align-items-center">
<h1>Stock Price Prediction</h1>
</div>
<form action="/predict/" method="post">
<div class="mb-3 from-group">
<label for="Open" class="form-label">Open</label>
<input type="text" class="form-control" id="Open"
name="Open">
</div>
<div class="mb-3 from-group">
<label for="High" class="form-label">High</label>
<input type="text" class="form-control" id="High"
name="High">
</div>
<div class="mb-3 from-group">
<label for="Low" class="form-label">Low</label>
<input type="text" class="form-control" id="Low"
name="Low">
</div>
<div class="mb-3 from-group">
<label for="Volume" class="form-label">Volume</label>
<input type="text" class="form-control" id="Volume"
name="Volume">
</div>
<button type="submit" class="btn btn-primary">Submit</button>
</form>
</div> -->
</body>
</html>
PREDICTION.HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="vi content="width=devic i
scale
<title>Close Price Predict</title>
link
href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/boot
strap.min.css" rel="stylesheet"
integrity=
"sha384-
1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3
"
crossorigin="anonymous" />
<style>
body{
display: flex;
align-items: center;
justify-content: center;
color: aliceblue;
background: #001e29 !important;
}
.row, h1,h2{
display: flex;
align-items: center;
color: aliceblue;
}
</style>
</head>
<body>
{% block content %}
<div class="row justify-content-md-center mb-4">
<div class="text-primary">
<h1>Closing Price</h1>
<h2> Prediction is {{ prediction }}</h2>
</div>
</div>
{% endblock %}
</body>
</html>
UTILITY.PY:
import joblib
import numpy as np
OUTPUT SCREENSHOTS
LIMITATIONS AND SCOPE OF PROJECT
2. Non-stationary data: Stock market data is often non-stationary, meaning that the
statistical properties of the data change over time. This can make it difficult to build
accurate machine learning models that can adapt to changing market conditions (a brief
sketch of a common mitigation, differencing prices into returns, follows at the end of this
subsection).
Such issues can also make it difficult to trust and act on the model's recommendations.
Given these limitations, it is important to use machine learning as one of several tools
for predicting stock market prices, rather than relying on it exclusively. It is also
important to incorporate fundamental analysis and other forms of market analysis into
the decision-making process.
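As a brief illustration of the differencing idea mentioned above, raw prices can be converted into simple or log returns before modelling; the file and column names are assumptions:

import numpy as np
import pandas as pd

df = pd.read_csv("stock_history.csv")            # placeholder historical prices

df["return"] = df["Close"].pct_change()          # simple daily returns
df["log_return"] = np.log(df["Close"]).diff()    # log returns, often closer to stationary
df = df.dropna()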
7.2 SCOPE OF PROJECT
The scope of using machine learning for predicting stock market prices is quite broad,
and it has the potential to be a valuable tool for investors and traders. Here are some
of the key areas where machine learning can be applied:
2 .Sentiment analysis: Machine learning algorithms can be used to analyze social media
sentiment and news articles to gauge investor sentiment and identify potential market
trends. This information can be used to inform trading decisions and develop
investment strategies.
Overall, the scope of machine learning in stock market price prediction is broad, and it has
the potential to significantly improve the accuracy and efficiency of investment strategies.
However, it is important to remember that machine learning should be used as one tool
among many, and that human judgment and expertise remain critical for successful investing.
GANTT CHART
IMPACT OF PROPOSED SYSTEM IN ACADEMICS AND INDUSTRY
1. Conducting a stock price prediction project can help students develop a range of
skills, including data analysis, statistical modeling, programming, and problem-
solving. These skills are highly valuable in a variety of fields, particularly in
finance and related industries.
3. Stock price prediction projects can involve collaboration between students and
faculty members, as well as with industry professionals. This can help students to
develop professional networks and gain exposure to different perspectives and
approaches.
4. Stock price prediction can help investors make informed investment decisions.
By analyzing trends and patterns in stock prices, investors can decide whether to
buy, hold or sell shares of a particular company. Accurate predictions can result in
better investment outcomes and higher returns on investment.
5. Predicting stock prices can help companies manage their risk exposure by
identifying potential risks and opportunities. For example, if a company predicts
a decline in its stock price, it may decide to sell its shares before the price drops
further, thereby reducing its exposure to risk.
6. Companies that are able to accurately predict stock prices may have a competitive
advantage over their rivals by making better investment decisions.
CONCLUSION
To summarize, in this project, we attempt to build an automated trading system based on
Machine Learning algorithms. Based on historical price information, the machine
learning models will forecast next day returns of the target stock. A customized trading
strategy will then take the model prediction as input and generate actual buy/sell orders
and send them to a market simulator where the orders are executed. After training on
available data at a particular time interval, our application will back-test on out-of-sample
data at a future time interval.
Following are some of the important Findings that were discovered after building this
project:
We found that only looking at a company's past stock price by itself is not sufficient to
predict its future returns. A better way to do so is to look at the entire sector which the
target company is part of, and use historical price information of all companies within
the sector to predict the target’s next day return.
The Decision Tree model has achieved approximately 66 – 70 percent accuracy for most
of the stocks with statistical significance.
The Regression Model has achieved an error rate close to 1% for many stocks, so steps
should be taken in a real-time environment to increase the independent variables for this
analysis. For future work, variables about company fundamentals such as revenues and
earnings, and about macroeconomic issues such as interest rates, exchange rates and
unemployment reports, should also help in predicting stock prices.
Automated trading should not be just about algorithms, programming and mathematics:
an awareness of fundamental market and macroeconomic issues is also needed to help
us decide whether the back test is predictive and the automated trading system will
continue to be predictive.
Machine learning has the potential to improve the accuracy and efficiency of investment strategies, and
it can be applied in a wide range of areas, including predictive modeling, sentiment
analysis, fraud detection, portfolio management, and trading algorithms.
However, it is important to recognize the limitations of machine learning for predicting
stock market prices. The stock market is complex and influenced by a wide range of
factors, and machine learning models may struggle to account for all of these
factors. It is important to use machine learning as one tool among many, and to
incorporate fundamental analysis and other forms of market analysis into the decision-
making process.
Additionally, it is important to approach machine learning with a critical eye and to
recognize the potential for bias and inaccuracies in the data and models.
Machine learning models should be constantly evaluated and updated to ensure that
they remain accurate and effective.
Overall, machine learning has the potential to significantly improve the accuracy and
efficiency of investment strategies, but it should be used as part of a comprehensive and
well-informed investment approach.
REFERENCES