Investment Analysis Documentation


CHAPTER 1: INTRODUCTION

1.1. PROBLEM STATEMENT

The objective of this work is to improve the decision-making ability and awareness of naïve users investing in the stock market. A naïve investor struggles to choose valuable stocks, largely because of a lack of knowledge about the market.

1.2. THE PROPOSED SOLUTION

Using Qlik, we can analyse the historical stock data of selected companies together with the parameters that affect stock value. This analysis also helps estimate the values a particular stock is likely to take in the near future.

1.3 INTRODUCTION TO DATA ANALYTICS

OVERVIEW

Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

Data analytics is the science of analysing data to convert information into useful knowledge. This knowledge can help us understand our world better and, in many contexts, enable us to make better decisions. While this is the broad and grand objective, the last 20 years have seen steeply decreasing costs to gather, store, and process data, creating an even stronger motivation for the use of empirical approaches to problem solving. Data analysis covers a wide range of techniques and is structured around the broad contours of the different types of analytics, namely descriptive, inferential, predictive, and prescriptive analytics.

Data mining is a particular data analysis technique that focuses on modelling and knowledge
discovery for predictive rather than purely descriptive purposes, while business intelligence covers data
analysis that relies heavily on aggregation, focusing on business information. In statistical applications,
data analysis can be divided into descriptive statistics, exploratory data analysis (EDA),
and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data and CDA
on confirming or falsifying existing hypotheses. Predictive analytics focuses on application of statistical
models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and
structural techniques to extract and classify information from textual sources, a species of unstructured
data.

THE PROCESS OF DATA ANALYSIS

Analysis refers to breaking a whole into its separate components for individual examination.
Data analysis is a process for obtaining raw data and converting it into information useful for decision-
making by users. Data is collected and analyzed to answer questions, test hypotheses or disprove
theories.

Statistician John Tukey defined data analysis in 1961 as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analysing data."

There are several phases that can be distinguished, described below. The phases are iterative, in
that feedback from later phases may result in additional work in earlier phases.

DATA REQUIREMENTS:

The data needed as inputs to the analysis are specified based upon the requirements of those directing the analysis or of the customers who will use the finished product of the analysis. The general type of entity upon which the data will be collected is referred to as an experimental unit (e.g., a person or population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical (i.e., a text label for numbers).

DATA COLLECTION:

Data is collected from a variety of sources. The requirements may be communicated by analysts
to custodians of the data, such as information technology personnel within an organization. The data
may also be collected from sensors in the environment, such as traffic cameras, satellites, recording
devices, etc. It may also be obtained through interviews, downloads from online sources, or reading
documentation.

DATA PROCESSING:

Data initially obtained must be processed or organised for analysis. For instance, this may involve placing data into rows and columns in a table format (i.e., structured data) for further analysis, such as within a spreadsheet or statistical software.

DATA CLEANING:

Once processed and organised, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning arises from problems in the way that data is entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, identifying inaccuracies, assessing the overall quality of existing data, deduplication, and column segmentation. Such data problems can also be identified through a variety of analytical techniques. For example, with financial information, the totals for particular variables may be compared against separately published numbers believed to be reliable. Unusual amounts above or below pre-determined thresholds may also be reviewed. There are several types of data cleaning, depending on the type of data, such as phone numbers, email addresses, or employers. Quantitative methods for outlier detection can be used to remove data that was likely entered incorrectly. For textual data, spell checkers can be used to reduce the number of mistyped words, but it is harder to tell whether the words themselves are correct.
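
As a concrete illustration of these cleaning steps, the following short Python sketch (using pandas, which later chapters also rely on) deduplicates records and reviews values against pre-determined thresholds. The column names, sample values and thresholds are assumptions made purely for illustration and are not the project's actual data.

import pandas as pd

# Hypothetical raw stock prices; column names and values are assumed for illustration.
raw = pd.DataFrame({
    "Date":  ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03"],
    "Close": [105.2, 105.2, 107.1, 9999.0],   # the last value looks like an entry error
})

# Deduplication: drop exact duplicate records.
clean = raw.drop_duplicates()

# Review unusual amounts against pre-determined thresholds, as described above.
lower, upper = 1.0, 1000.0                    # assumed plausible price range
suspect = clean[(clean["Close"] < lower) | (clean["Close"] > upper)]
print(suspect)                                # rows to be reviewed or corrected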

EXPLORATORY DATA ANALYSIS:

Once the data is cleaned, it can be analyzed. Analysts may apply a variety of techniques referred
to as exploratory data analysis to begin understanding the messages contained in the data. The process
of exploration may result in additional data cleaning or additional requests for data, so these activities
may be iterative in nature. Descriptive statistics, such as the average or median, may be generated to
help understand the data. Data visualization may also be used to examine the data in graphical format, to
obtain additional insight regarding the messages within the data.

INITIAL DATA ANALYSIS:

The most important distinction between the initial data analysis phase and the main analysis phase is that during initial data analysis one refrains from any analysis that is aimed at answering the original research question. The initial data analysis phase is guided by the following four questions:

QUALITY OF DATA:

The quality of the data should be checked as early as possible. Data quality can be assessed in several ways, using different types of analysis: frequency counts, descriptive statistics (mean, standard deviation, median), and normality checks (skewness, kurtosis, frequency histograms). Variables are also compared with coding schemes of variables external to the data set, and possibly corrected if the coding schemes are not comparable.

QUALITY OF MEASUREMENTS:

The quality of the measurement instruments should only be checked during the initial data analysis phase when this is not the focus or research question of the study. One should check whether the structure of the measurement instruments corresponds to the structure reported in the literature.

One common way to assess measurement quality is:

 Analysis of homogeneity (internal consistency), which gives an indication of the reliability of a measurement instrument. During this analysis, one inspects the variances of the items and the scales, the Cronbach's α of the scales, and the change in Cronbach's α when an item would be deleted from a scale. (A small illustrative computation follows.)
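
As noted above, a minimal Python sketch of this internal-consistency check follows. It computes Cronbach's α from a matrix of item scores using the standard variance-based formula; the scores are invented for illustration.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the scale totals)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of five people to a four-item scale.
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(scores), 3))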

Initial transformations:

After assessing the quality of the data and of the measurements, one might decide to impute missing data or to perform initial transformations of one or more variables, although this can also be done during the main analysis phase. Possible transformations of variables are listed below (a short Python sketch follows the list):

 Square root transformation (if the distribution differs moderately from normal)
 Log-transformation (if the distribution differs substantially from normal)
 Inverse transformation (if the distribution differs severely from normal)
 Make categorical (ordinal / dichotomous) (if the distribution differs severely from normal, and
no transformations help)
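
A minimal Python sketch of these transformations follows, assuming a positively skewed numeric variable held in a pandas Series; the values are invented for illustration.

import numpy as np
import pandas as pd

# Hypothetical positively skewed variable (e.g., traded volume).
x = pd.Series([120, 150, 180, 240, 900, 4000], dtype=float)

sqrt_x    = np.sqrt(x)     # moderate departure from normality
log_x     = np.log(x)      # substantial departure (values must be positive)
inverse_x = 1.0 / x        # severe departure
# If no transformation helps, make the variable categorical instead:
categories = pd.qcut(x, q=3, labels=["low", "medium", "high"])
print(categories.value_counts())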

Did the implementation of the study fulfil the intentions of the research design?

One should check the success of the randomization procedure, for instance by checking whether
background and substantive variables are equally distributed within and across groups.
If the study did not need or use a randomization procedure, one should check the success of the non-random sampling, for instance by checking whether all subgroups of the population of interest are represented in the sample. Other possible data distortions that should be checked are:

 Dropout (this should be identified during the initial data analysis phase)
 Item nonresponse (whether this is random or not should be assessed during the initial data analysis phase)
 Treatment quality (using manipulation checks).

CHARACTERISTICS OF DATA SAMPLE:

In any report or article, the structure of the sample must be accurately described. It is especially
important to exactly determine the structure of the sample (and specifically the size of the subgroups)
when subgroup analyses will be performed during the main analysis phase.
The characteristics of the data sample can be assessed by looking at:

 Basic statistics of important variables
 Scatter plots
 Correlations and associations
 Cross-tabulations

Final stage of the initial data analysis:

During the final stage, the findings of the initial data analysis are documented, and necessary,
preferable, and possible corrective actions are taken.
Also, the original plan for the main data analyses can and should be specified in more detail or
rewritten. In order to do this, several decisions about the main data analyses can and should be made:

 In the case of non-normality: should one transform variables, make variables categorical (ordinal/dichotomous), or adapt the analysis method?
 In the case of missing data: should one neglect or impute the missing data, and which imputation technique should be used? (A small imputation sketch follows this list.)
 In the case of outliers: should one use robust analysis techniques?
 In case items do not fit the scale: should one adapt the measurement instrument by omitting
items, or rather ensure comparability with other (uses of the) measurement instrument(s)?
 In the case of (too) small subgroups: should one drop the hypothesis about inter-group
differences, or use small sample techniques, like exact tests or bootstrapping?
 In case the randomization procedure seems to be defective: can and should one
calculate propensity scores and include them as covariates in the main analyses?
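
For the missing-data decision in the list above, the sketch below illustrates two simple imputation options in pandas (mean imputation and carrying the last observation forward). The series is invented for illustration, and the choice of technique remains an analysis decision.

import numpy as np
import pandas as pd

# Hypothetical series with missing closing prices.
close = pd.Series([101.0, np.nan, 103.5, np.nan, 104.2])

mean_imputed    = close.fillna(close.mean())   # simple mean imputation
carried_forward = close.ffill()                # last observation carried forward
print(pd.DataFrame({"raw": close, "mean": mean_imputed, "locf": carried_forward}))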

ANALYSIS:

Several analyses can be used during the initial data analysis phase:

 Univariate statistics (single variable)
 Bivariate associations (correlations)
 Graphical techniques (scatter plots)

It is important to take the measurement levels of the variables into account for the analyses, as special statistical techniques are available for each level (a short illustrative Python sketch follows the list):

 Nominal and ordinal variables:
 Frequency counts (numbers and percentages)
 Associations
 Crosstabulations
 Hierarchical loglinear analysis (restricted to a maximum of 8 variables)
 Loglinear analysis (to identify relevant/important variables and possible confounders)
 Exact tests or bootstrapping (in case subgroups are small)
 Computation of new variables
 Continuous variables:
 Distribution
 Statistics (M, SD, variance, skewness, kurtosis)
 Stem-and-leaf displays
 Box plots
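
The short Python sketch referred to above illustrates these analyses on an invented sample: frequency counts for a nominal variable, descriptive statistics and a correlation for continuous variables, and a crosstabulation. All names and values are assumptions used only for illustration.

import pandas as pd

# Hypothetical sample mixing one nominal and two continuous variables.
df = pd.DataFrame({
    "sector": ["IT", "IT", "Bank", "Bank", "Auto", "Auto"],
    "open":   [100.0, 102.0, 55.0, 54.5, 210.0, 208.0],
    "close":  [101.5, 103.0, 54.0, 55.5, 212.5, 207.0],
})

print(df["sector"].value_counts())                          # frequency counts (nominal)
print(df[["open", "close"]].describe())                     # M, SD, quartiles (continuous)
print(df["open"].corr(df["close"]))                         # bivariate association
print(pd.crosstab(df["sector"], df["close"] > df["open"]))  # crosstabulation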

1.4 A SURVEY ON PYTHON, TABLEAU AND R:

R PROGRAMMING:

R is the leading tool for statistics, data analysis, and machine learning. It is more than a
statistical package; it’s a programming language, so you can create your own objects, functions, and
packages. R programs explicitly document the steps of your analysis and make it easy to reproduce
and/or update analysis, which means you can quickly try many ideas and/or correct issues. It’s platform-
independent, so you can use it on any operating system. Not only is R free, but it’s also open-source.
That means anyone can examine the source code to see exactly what it’s doing. R allows you to
integrate with other languages (C/C++, Java, Python) and enables you to interact with many data
sources: ODBC-compliant databases (Excel, Access) and other statistical packages.

PYTHON PROGRAMMING:

Python and Ruby have become especially popular in recent years for building websites using their numerous web frameworks, such as Rails (Ruby) and Django (Python). Such languages are often called scripting languages, as they can be used to write quick-and-dirty small programs, or scripts, although the term carries a connotation that they cannot be used for building mission-critical software. Among interpreted languages, Python is distinguished by its large and active scientific computing community. Adoption of Python for scientific computing in both industry applications and academic research has increased significantly since the early 2000s.

For data analysis and interactive, exploratory computing and data visualization, Python will
inevitably draw comparisons with the many other domain-specific open source and commercial
programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent
years, Python’s improved library support (primarily pandas) has made it a strong alternative for data
manipulation tasks. Combined with Python’s strength in general purpose programming, it is an excellent
choice as a single language for building data-centric applications.

TABLEAU

Tableau is a Business Intelligence tool for visually analysing data. Users can create and distribute interactive and shareable dashboards, which depict the trends, variations, and density of the data in the form of graphs and charts. Tableau can connect to files, relational databases, and Big Data sources to acquire and process data. The software allows data blending and real-time collaboration, which makes it unique. It is used by businesses, academic researchers, and many government organizations for visual data analysis. It is also positioned as a Leader in Gartner's Magic Quadrant for Business Intelligence and Analytics Platforms.

Tableau has many desirable and unique features. Its powerful data discovery and exploration application allows you to answer important questions in seconds. You can use Tableau's drag-and-drop interface to visualize any data, explore different views, and even combine multiple databases easily. It does not require any complex scripting. Anyone who understands the business problem can address it with a visualization of the relevant data. After analysis, sharing with others is as easy as publishing to Tableau Server.

1.5. INTRODUCTION TO QLIK AND ITS FEATURES:

Qlik Sense is a visual analysis platform that unlocks the potential of every user to harness information
and uncover insight. Users of all types and skill levels can generate the insight they need to instantly
answer questions and solve problems in their lines of business, problems that require much more than
just an initial overview of information.

Traditional BI and standalone data visualization tools limit exploration and discovery for business users.
These tools are good at providing answers for predefined questions, but do not offer any way for users
to ask new, follow-up questions. Users either have to settle for what is in the report or visualization or
wait for IT to create new SQL queries or reports for them.

Qlik Sense is different. With Qlik Sense, users of any skill level are empowered to follow their own
paths to insight. Through the Associative Experience, you can ask question after question, from any
object, in any direction, using simple selections and searches. Qlik Sense provides instant feedback on
associated and unrelated data and updated analytics after every step. The result is a vehicle for discovery
that delivers the right insight at all stages of the exploratory process.

FEATURES

 Data visualizations
 Dynamic BI ecosystem
 Interact with dynamic apps, dashboards and analytics
 Search across all data
 Natural search
 Default and custom connectors
 App scripts and Workbench
 Roles & Permissions
 Secure, real-time collaboration
 Advanced reporting templates
 Custom reports
 Mobile-ready
 Ability to develop custom data visualizations
 Ability to build dashboards on ad-hoc data sources
 Ability to create data stories

1.6. PRESENT STATUS AND WHY QLIK:


All the data related to the share market analysis was gathered from the Yahoo Finance website. The data has been processed and visualized on various platforms, and all the visualizations and analysis were carried out in Qlik, R and Python.
We chose Qlik for this project for the following reasons:

GREAT VISUALIZATIONS:

A couple of years ago Qlik acquired NcomVA, a company renowned for its data visualization work. The company was formed by PhDs who specialized in developing graphic visualizations to communicate what is happening in your data. The fruits of their labour appear in the Qlik Sense product, and as a result we get award-winning visualizations to help us understand our data.

EASY TO USE:
The product is easy to use. It’s designed for the end user so they can truly perform self-serve BI with no
more waiting on IT to build a report. In fact, it’s so easy even our salespeople can build solutions and tell their
story using the tool. A member of our sales team, with no previous experience or training with Qlik Sense,
grabbed the tool and in an afternoon had built an application to communicate to the rest of the team his sales
activity, where we were getting traction, what activities were generating results and what activities weren’t.
Imagine what your team could do with some training!

MOBILITY:
Although you could browse QlikView using tablets, Qlik Sense takes mobility to a new level. The entire
platform is designed for mobility, allowing end users to consume their business intelligence on any
device. But it goes even further. You can develop Qlik Sense applications and visualizations using any
device as well.
10
STORYTELLING:

Qlik Sense has a new feature called storytelling. It allows you to build presentations within the
tool so you can share the insights you gain in your analysis. The benefit of their storytelling feature is
the ability to instantly return to the analysis application to answer questions your audience may have.

ASSOCIATIVE DATA ENGINE:

Qlik Sense still has the Associative Data Engine allowing us to see what is associated with the
data you have selected. This is extremely powerful in providing insight into the relationships between
items in your business. Adding to this is the “Smart Search” which allows you to search for and find
anything regardless of where it is in your data.

GOVERNED DATA AND SECURITY:


Although Qlik Sense is extremely easy to use and allows the business user to perform self-serve business intelligence, it is also enterprise-ready. Qlik Sense provides for governed data, so the organization can regulate the data and ensure everyone is using the same data sources and formulas. This ensures there is a single version of the truth. In addition to the governed data, there is enterprise security to ensure that only those who are allowed to see the information have access to it.

CHAPTER 2: SOFTWARE REQUIREMENT SPECIFICATION

WHAT IS SRS?
Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended; hence the need for the requirement phase arose. A software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirement phase).
The SRS phase consists of two basic activities:

PROBLEM/REQUIREMENT ANALYSIS:
This is the harder and more nebulous of the two activities; it deals with understanding the problem, the goal, and the constraints.

REQUIREMENT SPECIFICATION:
Here the focus is on specifying what has been found during analysis. Issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity. The requirement phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.

2.1 FUNCTIONAL REQUIREMENTS:
● Find the major tables available in the data model.
● Find the main fields in the tables, along with their measures and dimensions.
● Identify the different dimensions such as Date, Open value, Close value, High value, Low value, and Growth.
● Work on the KPIs; if a KPI is a formula, identify the different fields it uses (a small sketch follows this list).
● Work on the major data sources and their types.
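
As a small illustration of these requirements, the sketch below loads a stock extract, treats Date as a dimension, and computes a Growth KPI as a formula over the Open and Close fields. The file name and the exact Growth formula are assumptions made for illustration.

import pandas as pd

# Assumed layout of the extract: Date, Open, High, Low, Close columns.
prices = pd.read_csv("stock_data.csv", parse_dates=["Date"])   # hypothetical file name

# Dimension derived from Date.
prices["Month"] = prices["Date"].dt.to_period("M")

# KPI built as a formula over the Open and Close fields.
prices["Growth"] = (prices["Close"] - prices["Open"]) / prices["Open"] * 100

# Aggregate the KPI over the Month dimension.
print(prices.groupby("Month")["Growth"].mean())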

2.2 NON-FUNCTIONAL REQUIREMENTS:
● This provides a good availability of data
● The data can be recovered easily
● Maintainability
● Serviceability
● Usability

2.3 ARCHITECTURAL DIAGRAMS:

2.4 SDLC:
SDLC METHODOLOGIES:
This document plays a vital role in the software development life cycle (SDLC) as it describes the complete requirements of the system. It is meant for use by developers and will be the basis during the testing phase. Any changes made to the requirements in the future will have to go through a formal change approval process.

The SPIRAL MODEL was defined by Barry Boehm in his 1988 article, “A Spiral Model of Software Development and Enhancement”. This model was not the first to discuss iterative development. The spiral model is similar to the incremental model, with more emphasis placed on risk analysis. The spiral model has four phases: Planning, Risk Analysis, Engineering and Evaluation. A software project repeatedly passes through these phases in iterations (called spirals in this model). In the baseline spiral, starting in the planning phase, requirements are gathered and risk is assessed. Each subsequent spiral builds on the baseline spiral.

As originally envisioned, the iterations were typically 6 months to 2 years long. Each phase starts with a design goal and ends with the client reviewing the progress thus far. Analysis and engineering efforts are applied at each phase of the project, with an eye toward the end goal of the project.

The steps for Spiral Model can be generalized as follows:

● The new system requirements are defined in as much detail as possible. This usually involves interviewing a number of users representing all the external or internal users and other aspects of the existing system.
● A preliminary design is created for the new system.

● A first prototype of the new system is constructed from the preliminary design. This is usually a scaled-down system and represents an approximation of the characteristics of the final product.
● A second prototype is evolved by a fourfold procedure:

1. Evaluating the first prototype in terms of its strengths, weakness, and risks.

2. Defining the requirements of the second prototype.

3. Planning and designing the second prototype.

4. Constructing and testing the second prototype.


● At the customer’s option, the entire project can be aborted if the risk is deemed too great. Risk factors might involve development cost overruns, operating-cost miscalculation, or any other factor that could, in the customer’s judgment, result in a less-than-satisfactory final product.
● The existing prototype is evaluated in the same manner as was the previous
prototype, and if necessary, another prototype is developed from it according to
the fourfold procedure outlined above.
● The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.
● The final system is constructed, based on the refined prototype.

● The final system is thoroughly evaluated and tested. Routine maintenance is carried out on a continuing basis to prevent large-scale failures and to minimize downtime.

The following diagram shows how the spiral model works:

● Planning Phase: Requirements such as the ‘BRS’ (Business Requirement Specification) and the ‘SRS’ (System Requirement Specification) are gathered during the planning phase.
● Risk Analysis: In the risk analysis phase, a process is undertaken to identify risks and alternate solutions. A prototype is produced at the end of the risk analysis phase. If any risk is found during the risk analysis, then alternate solutions are suggested and implemented.

● Engineering Phase: In this phase software is developed, along with testing at the end
of the phase. Hence in this phase the development and testing are done.
● Evaluation phase: This phase allows the customer to evaluate the output of the
project to date before the project continues to the next spiral.

2.5 AGILE METHOD AND ITS PROJECT:

Agile Project Management is one of the revolutionary methods introduced for the
practice of project management. This is one of the latest project management strategies that is
mainly applied to project management practice in software development. Therefore, it is best
to relate agile project management to the software development process when understanding
it.
Since the inception of software development as a business, a number of processes have been followed, such as the waterfall model. With the advancement of software development technologies and business requirements, the traditional models are not robust enough to cater to the demands.

Therefore, more flexible software development models were required in order to address the agility of the requirements. As a result, the information technology community developed agile software development models.

'Agile' is an umbrella term used for identifying various models used for agile
development, such as Scrum. Since the agile development model is different from conventional models, agile project management is a specialized area in project management.

PROCESS:

It is required for one to have a good understanding of the agile development process in order
to understand agile project management.

There are many differences in agile development model when compared to traditional
models:

● The agile model emphasizes that the entire team should be a tightly integrated unit. This includes the developers, quality assurance, project management, and the customer.
● Frequent communication is one of the key factors that makes this integration possible. Therefore, daily meetings are held in order to determine the day's work and dependencies.
● Deliveries are short-term. Usually a delivery cycle ranges from one week to four weeks. These are commonly known as sprints.
● Agile project teams follow open communication techniques and tools which enable the team members (including the customer) to express their views and feedback openly and quickly. These comments are then taken into consideration when shaping the requirements and implementation of the software.

2.6 ADVANTAGES OF AGILE:

The advantages of an agile project management function stem from the responsibilities of the agile project manager, given below. From one project to another, these responsibilities can change slightly and may be interpreted differently.

● Responsible for maintaining the agile values and practices in the project team.
● Removes impediments as the core function of the role.
● Helps the project team members to turn the requirements backlog into working software functionality.
● Facilitates and encourages effective and open communication within the team.
● Responsible for holding agile meetings that discuss the short-term plans and the plans to overcome obstacles.
● Enhances the tools and practices used in the development process.
● Acts as the chief motivator of the team and plays the mentor role for the team members as well.

2.7 UML
UML, short for Unified Modeling Language, is a standardized modeling language consisting
of an integrated set of diagrams, developed to help system and software developers for
specifying, visualizing, constructing, and documenting the artifacts of software systems, as
well as for business modeling and other non-software systems. The UML represents a
collection of best engineering practices that have proven successful in the modeling of large
and complex systems. The UML is a very important part of developing object-oriented
software and the software development process. The UML uses mostly graphical notations to
express the design of software projects. Using the UML helps project teams communicate,
explore potential designs, and validate the architectural design of the software.

GOALS OF UML
1. Provide users with a ready-to-use, expressive visual modelling language so they can develop
and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.

ALL UML DIAGRAMS

USE CASE DIAGRAM


A use-case model describes a system's functional requirements in terms of use cases. It is a
model of the system's intended functionality (use cases) and its environment (actors). Use
cases enable you to relate what you need from a system to how the system delivers on those
needs.

CLASS DIAGRAM

The class diagram is a central modelling technique that runs through nearly all object-oriented
methods. This diagram describes the types of objects in the system and various kinds of static
relationships which exist between them.
Relationships
There are three principal kinds of relationships which are important:
1. Association - represents relationships between instances of types (a person works for a company, a company has a number of offices).
2. Inheritance - the most obvious addition to ER diagrams for use in OO. It has an immediate correspondence to inheritance in OO design.
3. Aggregation - a form of object composition in object-oriented design.

SEQUENCE DIAGRAM
UML sequence diagrams are used to represent the flow of messages, events and actions
between the objects or components of a system. Time is represented in the vertical direction

showing the sequence of interactions of the header elements, which are displayed horizontally
at the top of the diagram. Sequence Diagrams are used primarily to design, document and
validate the architecture, interfaces and logic of the system by describing the sequence of
actions that need to be performed to complete a task or scenario. UML sequence diagrams are
useful design tools because they provide a dynamic view of the system behavior which can be
difficult to extract from static diagrams or specifications.

CHAPTER 3: IMPLEMENTATION

3.1 THE MULTIDIMENSIONAL APPROACH:

The strength of OLTP databases is that they can perform large numbers of small transactions, keeping the database available and the data consistent at all times. The normalization discussed in Chapter 2 helps keep the data consistent, but it also introduces a higher degree of complexity to the database, which causes huge databases to perform poorly when it comes to composite aggregation operations. In a business context it is desirable to have historical data covering years of transactions, which results in a vast number of database records to be analyzed. It is not very difficult to see that performance issues will arise when processing analytical queries that require complex joins on such databases. Another issue with doing analysis on OLTP is that it requires rather complex queries, specially composed for each request, in order to get the desired result.

OLAP:
In order to handle the above issues, the concept of Online Analytical Processing (OLAP) has been proposed and widely discussed through the years, and many papers have been written on the subject. OLTP is often used to handle large amounts of short and repetitive transactions in a constant flow, such as bank transactions or order entries. The database systems are designed to keep the data consistent and to maximize transaction throughput. OLAP databases, on the other hand, are used to store historical data over a long period of time, often collected from several data sources, and the size of a typical OLAP database is often orders of magnitude larger than that of an ordinary OLTP database. OLAP databases are not updated constantly, but are loaded on a regular basis, such as every night, every weekend, or at the end of the month. This leads to few and large transactions, and query response time is more important than transaction throughput, since querying is the main usage of an OLAP database. The core of the OLAP technology is the data cube, which is a multidimensional database model. The model consists of dimensions and numeric metrics which are referred to as measures. The measures are numerical data such as revenue, cost, sales and budget. These depend upon the dimensions, which are used to group the data, similar to the group-by operator in relational databases.
Typical dimensions are time, location and product, and they are often organized in hierarchies. A
hierarchy is a structure that defines levels of granularity of a dimension and the relationship between
those levels. A time dimension can for example have hours as the finest granularity, and higher up the
hierarchy can contain days, months and years. When a cube is queried for a certain measure, ranges of
one or several dimensions can be selected to filter the data. The data cube is based on a data warehouse,
which is a central data storage possibly loaded with data from multiple sources. Data warehouses tend to
be very large in size, and the design process is a quite complex and time-demanding task. Some companies could settle for a data mart instead, which is a data warehouse restricted to a departmental subset of the whole data set. The data warehouse is usually implemented as a relational database with
tables grouped into two categories: dimension tables and fact tables. A dimension table contains data that defines the dimension. A time dimension, for example, could contain dates, names of the days of the week, week numbers, month names, month numbers and years. A fact table contains the measures, that is, aggregatable data that can be counted, summed, multiplied, etc. Fact tables also contain references (foreign keys) to the dimension tables in the cube so the facts can be grouped by the dimensional data. A data warehouse is generally structured as a star schema or a snowflake schema. Figure 3.1 illustrates an example of a data warehouse with the star schema structure. As seen in the figure, a star schema has a fact table in the middle and all dimension tables are referenced from this table. With a little imagination the setup could be thought of as star-shaped. In a star schema, the dimension tables do not have references to other dimension tables. If they do, the structure is called a snowflake schema instead. A star schema generally violates 3NF by having each dimension table consist of what would otherwise be several tables joined together, which is often preferred because of the performance loss that 3NF causes when the data sets are very large. If it is for some reason desirable to keep the data warehouse in 3NF, the snowflake schema can be used.
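
To make the cube idea concrete, the following Python sketch joins a tiny fact table to a product dimension and rolls the revenue measure up by month and product category, analogous to selecting dimensions of an OLAP cube. The table layout and values are invented for illustration and are not the project's data warehouse.

import pandas as pd

# A tiny fact table with foreign keys to assumed time and product dimensions.
fact_sales = pd.DataFrame({
    "date_key":    ["2018-01", "2018-01", "2018-02", "2018-02"],
    "product_key": ["A", "B", "A", "B"],
    "revenue":     [120.0, 80.0, 150.0, 95.0],   # the measure
})
dim_product = pd.DataFrame({
    "product_key": ["A", "B"],
    "category":    ["Hardware", "Software"],     # one level of the product hierarchy
})

# Join the fact table to the dimension, then roll the measure up by month and category.
cube = (fact_sales.merge(dim_product, on="product_key")
        .pivot_table(values="revenue", index="date_key",
                     columns="category", aggfunc="sum"))
print(cube)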

3.2 The Qlik Sense Platform and Implementation:

Qlik Sense Platform:

The Qlik Analytics Platform is a developer platform for building custom analytic applications
based on rich front-end and back-end APIs. It gives you full API access to the Qlik engine to
build rich and smart data-driven analytic applications. You can take advantage of the Qlik
Analytics Platform and build web applications for extranet and Internet deployment.

Components of the Qlik Analytics Platform:

The Qlik Analytics Platform consists of the following components:

 Qlik Management Console (QMC) and Dev Hub

 Qlik Sense APIs and SDKs

 Qlik engine and Qlik Sense supporting services

Qlik Sense product family:

The Qlik Sense product family includes Qlik Sense Enterprise, Qlik Sense Desktop and Qlik Sense Cloud, all built on top of the Qlik Analytics Platform.

Qlik Analytics Platform:

The Qlik Analytics Platform is a package that gives developers and OEM partners the ability
to embed Qlik's visual analytics capabilities into any application through web mashups or by
embedding into custom applications. It includes the Qlik engine and the full suite of product
APIs and SDKs.

Qlik Sense Enterprise:

Qlik Sense Enterprise is a self-service environment built on top of the Qlik Analytics Platform, with a focus on empowering end users to do self-service exploration, discovery and storytelling.

Qlik Sense Cloud:

Qlik Sense Cloud is a solution for sharing Qlik Sense apps, so that you can collaborate with
others and make data discoveries together. Additionally, users can access the cloud and the
apps from any device, including mobile devices, with an Internet connection and a modern
web browser.

Qlik Sense Desktop:

Qlik Sense Desktop is a Windows application that gives individuals the possibility to try
out Qlik Sense and create personalized, interactive data visualizations, reports and
dashboards from multiple data sources with drag-and-drop ease. It’s free for personal and
internal business use.

Qlik Sense Architecture:

The Qlik Sense architecture consists of a number of services that perform various activities in the software. The services can be distributed across one or more nodes, that is, server machines, that together form a site. One node assumes the role of central node, which is used as the central point of control, and the other nodes can be configured to perform specific roles within the site, such as adding capacity for users or running reloads.

Technologies used in Qlik Sense:

Qlik Sense relies on the following technologies:

 HTML5

 WebSocket

 CSS3

 JSON

 Canvas

 REST

In addition to this, Qlik Sense uses the following libraries:

 AngularJS

 RequireJS
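
Because the engine exposes a JSON-RPC interface over WebSocket, a client can talk to it directly. The sketch below is an assumption-laden illustration only: it presumes a local Qlik Sense Desktop engine on its default ws://localhost:4848/app/ endpoint and uses the third-party websocket-client Python package to ask the Global object for the list of available apps; it is not part of this project's implementation.

import json
from websocket import create_connection   # pip install websocket-client

# Assumption: a local Qlik Sense Desktop engine listening on its default port.
ws = create_connection("ws://localhost:4848/app/")

# JSON-RPC request to the Global object (handle -1) asking for the list of apps.
request = {"jsonrpc": "2.0", "id": 1, "handle": -1,
           "method": "GetDocList", "params": []}
ws.send(json.dumps(request))

# The engine may first push a session notification, so read until our reply arrives.
while True:
    message = json.loads(ws.recv())
    if message.get("id") == 1:
        print(message.get("result"))
        break
ws.close()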

Advantages of using Qlik Sense Platform:

 Easy to use - simple and intuitive use of the platform
 Transparent reporting and analysis - data visualization in a meaningful and innovative way
 Scalability - immediate response time, with no restrictions on the amount of data
 Data integration - fast integration of all data from various sources into a single application
 Identify trends - recognition of trends and information (which may not be immediately apparent) to help make innovative decisions
 Teamwork - common decision-making process and secure, real-time collaboration
 Full data filtering - search across all data, directly and indirectly
 Various forms of data presentation - use of dynamic applications, reports, dashboards and analyses
 Mobility - access, analyse and retrieve data from mobile devices
 Fast implementation - usually takes about a week
 Low cost - quick return on investment, thanks to the short implementation period

3.4 Use of R

 R is similar to other programming languages, like C, Java and Perl, in that it helps
people perform a wide variety of computing tasks by giving them access to various
commands. For statisticians, however, R is particularly useful because it contains a
number of built-in mechanisms for organizing data, running calculations on the
information and creating graphical representations of data sets.
 What makes R so useful — and helps explain its quick acceptance — is that
statisticians, engineers and scientists can improve the software’s code or write variations for specific tasks. Packages written for R add advanced algorithms, coloured and textured graphs and mining techniques to dig deeper into databases.
 “The great beauty of R is that you can modify it to do all sorts of things,” said Hal
Varian, chief economist at Google. “And you have a lot of pre-packaged stuff that’s
already available, so you’re standing on the shoulders of giants.”

3.5 Use of Python

 Python is an open source tool that can be used to manipulate data and present it in the form of statistics, charts, and insights (a short illustrative sketch follows this list).

 Python can also be used to identify patterns in data and develop models that can be used to
predict future results based on past values

 One of the major reasons for using python is the availability of libraries

 NumPy is used for fast mathematical operations such as performing matrix multiplications and many other mathematical functions for computations on arrays of any dimension.
 SciPy is also a scientific computing library that adds a collection of
algorithms and high-level commands for manipulating and visualizing
data. It also contains modules for optimization, linear algebra,
integration, Fast Fourier Transform, signal and image processing and
much more.
 Pandas provides easy to use data analysis tools and contains functions
designed to make data analysis fast and easy.
 Matplotlib is a Python library that supports 2D and 3D graphics. It is used
to produce publication figures like histograms, power spectra, bar charts,
box plots, pie charts and scatter plots with just few lines of code.
Matplotlib easily integrates with Pandas data frames to make
visualisations quickly and conveniently.
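
The sketch referred to above shows how these libraries fit together for this project's kind of data: pandas loads a Yahoo-Finance-style CSV and Matplotlib draws a line graph of the closing price and a histogram of traded volume. The file name and column names are assumptions for illustration.

import pandas as pd
import matplotlib.pyplot as plt

# Assumed input: a CSV with Date, Open, High, Low, Close and Volume columns.
stock = pd.read_csv("stock_data.csv", parse_dates=["Date"])   # hypothetical file name
print(stock.head())                                            # first five rows

# Line graph of the closing price over time.
stock.plot(x="Date", y="Close", kind="line", title="Closing price over time")
plt.xlabel("Date")
plt.ylabel("Close price")
plt.show()

# Histogram of traded volume.
stock["Volume"].plot(kind="hist", bins=30, title="Volume histogram")
plt.show()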

3.6. RESULT AND COMPARISON:
DATABASE USED

VISUALIZATION IN QLIK

VISUALIZATION IN QLIK

VISUALIZATION IN QLIK
CHAPTER 4: TEST CASES AND TESTING

4.1 TESTING

In a software development project, errors can be injected at any stage during development. The development of software involves a series of production activities where the opportunities for injection of human fallibilities are enormous.

Because of the human inability to perform and communicate with perfection, software development is accompanied by a quality assurance activity.

Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Testing presents an interesting anomaly for the software engineer: the engineer creates a series of test cases that are intended to demolish the software that has been built, making testing the one step in the software engineering process that could be viewed as destructive rather than constructive.

TESTING OBJECTIVE

 Testing is a process of executing a program with the intent of finding an error.

 A good test case is one that has a high probability of finding an as-yet undiscovered error.

 A successful test is one that uncovers an as-yet undiscovered error.

4.2 THE TYPE OF TESTING

UNIT TESTING:

Unit testing focuses verification effort on the smallest unit of software design, that is, the module. Using the procedural design description as a guide, important control paths are tested to uncover errors within the boundaries of the module. The unit test is normally white-box oriented, and the step can be conducted in parallel for multiple modules.
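
A minimal unit-test sketch in Python is shown below. The daily_growth helper and its expected values are hypothetical, invented only to illustrate how a single module can be exercised with positive, negative and error cases.

import unittest

def daily_growth(open_price: float, close_price: float) -> float:
    # Hypothetical helper: percentage growth of a stock over one day.
    return (close_price - open_price) / open_price * 100

class TestDailyGrowth(unittest.TestCase):
    def test_positive_growth(self):
        self.assertAlmostEqual(daily_growth(100.0, 105.0), 5.0)

    def test_negative_growth(self):
        self.assertAlmostEqual(daily_growth(100.0, 95.0), -5.0)

    def test_zero_open_price_rejected(self):
        with self.assertRaises(ZeroDivisionError):
            daily_growth(0.0, 95.0)

if __name__ == "__main__":
    unittest.main()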

INTEGRATION TESTING:

Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with the interfaces. The objective is to take unit-tested modules and build a program structure that has been dictated by the design.

 TOP-DOWN INTEGRATION:
Top-down integration is an incremental approach to the construction of the program structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main control program. Modules subordinate to the main program are incorporated into the structure in either a breadth-first or depth-first manner.

 BOTTOM-UP INTEGRATION:
This method, as the name suggests, begins construction and testing with atomic modules, i.e., modules at the lowest level. Because the modules are integrated in a bottom-up manner, the processing required for modules subordinate to a given level is always available, and the need for stubs is eliminated.

SYSTEM TESTING:

System testing is actually a series of different tests whose primary purpose is to fully exercise
the computer-based system. Although each test has a different purpose, all work to verify that
all system elements have been properly integrated to perform allocated functions.

VALIDATION TESTING:

At the end of integration testing, the software is completely assembled as a package. Validation testing is the next stage, and it can be defined as successful when the software functions in the manner reasonably expected by the customer. Reasonable expectations are those defined in the software requirements specification, and the information contained there forms the basis for the validation testing approach.

SECURITY TESTING:

Attempts to verify the protection mechanisms built into the system.

PERFORMANCE TESTING:

This method is designed to test runtime performance of software within the context of an
integrated system.

4.3 THE TYPES OF TESTING USED

4.4 TEST CASES:
 Testers have two main jobs: the first is to create test cases and the second is to use them for test execution. Testers create cases for various types of testing and provide details on what the tester should do and what the expected results of the test are.
 Test cases are documents, maintained either in a spreadsheet or directly in a test management tool, written so that even a novice can read them and test the application or product. Note that a test case created by one tester may be used by another tester.
 It is therefore essential that the author of the test case writes it in enough detail that everyone can understand it.
 A test case is a document that lists the steps to be executed by the tester to test a feature and the expected output from the application or product under test. The test case is a living document and must be updated and used as per the changing requirements.
 Test cases should be prepared for constructive as well as destructive purposes, which is what we usually call positive and negative test cases.
 Ideally, constructive testing is carried out from a functionality, system and performance point of view, whereas destructive testing has an emphasis on breaking the system, usually by supplying invalid inputs.

CHAPTER 5: CONCLUSION AND FUTURE ENHANCEMENT

CONCLUSION
In this project we have been successful in producing a rough estimation for stock market analysis. However, precision becomes more significant as the sensitivity of the data rises. Regarding future work, one idea for gaining better quality is to consider a weighted ranking of the most similar past data when searching for the likelihood tolerance; intuitively, this would help control the deviation from the actual values that are seen over time. Additionally, further boundary checks could be applied to the predicted data to prevent undesired deviations in the predictions. Another idea is "continuous training", as opposed to the current situation in which a period of time is considered, the corresponding data is collected and used to train an HMM, and the trained HMM is then used for prediction. A better approach is to persist the trained HMM and, over time, optimize and tune it according to the latest data as it emerges. In this way we would be improving the HMM over time without losing what was learned in the past.

ANN is a well-researched and established method that has been successfully used to predict time series behaviour from past datasets. In this work we proposed the use of HMM, a newer approach, to predict unknown values in a time series (the stock market). The mean absolute percentage error (MAPE) values of the two methods are quite similar, whilst the primary weakness of ANNs is the inability to properly explain the models; according to Ripley, the design and learning of feed-forward networks remain difficult. The proposed method, which uses an HMM to forecast the opening, high, low and closing prices, is explainable and has a solid statistical foundation. The results show the potential of using HMM for time series prediction. In future work we plan to develop hybrid systems combining AI paradigms with HMM to further improve the accuracy and efficiency of our forecasts.
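
For reference, the MAPE measure mentioned above can be computed as follows; the sketch is a generic illustration with made-up numbers, not the project's actual results.

import numpy as np

def mape(actual, predicted) -> float:
    # Mean absolute percentage error, in percent.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Illustrative closing prices versus model predictions (made-up numbers).
print(mape([101.2, 102.5, 99.8], [100.9, 103.1, 100.4]))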

CHAPTER 6: BIBLIOGRAPHY

REFERENCE BOOKS
1. Bharti V. Pathak, “The Indian Financial System”, Pearson Education [India] Ltd. 2nd Edition, Year
2006.
2. V. K. Bhalla, “Investment Management”, New Delhi, Sultan Chand & Sons Publication, 10th Edition, Year 2004.
3. Prasanna Chandra, “Investment analysis & Portfolio Management”, New-Delhi, The McGraw Hill
Company Ltd. 6th edition, year 2006.

WEBSITES
1. http://qcc.qlik.com/course/view.php?id=507
2. https://www.bseindia.com/
3. https://www.edupristine.com/blog/stock-price-analysis-in-tableau
4. https://in.udacity.com/course/data-analysis-with-r--ud651
5. https://www.tutorialspoint.com/index.htm
6. https://www.tutorialspoint.com/python/index.htm

CHAPTER 7: SCREENSHOTS OF PROJECT WORK

DATABASE USED

VISUALIZATION IN QLIK
LINE GRAPH

VISUALIZATION IN QLIK
BAR GRAPH

PIE CHART

VISUALIZATION IN QLIK
QLIK SHEET

VISUALIZATION IN PYTHON
Loading dataset and printing

First five rows of dataset

Line graph between frequency and price

Bar Graph

CLOSE PRICE PREDICTION

HISTOGRAM OPEN-HIGH-CLOSE

IMPORTING LIBRARY

LOAD DATASETS

OPEN-ADJCLOSE-PLOT

OPEN-PRICE-PREDICTION

PNW-PREDICTION

PRICE-ESTIMATION

SCATTER-PLOT

SELECTING ROWS & COLUMNS FROM DATASETS

VOLUME-HISTOGRAM

VISUALIZATION IN TABLEAU

Bar graph

Line graph

Plotted Circles graph

Packed bubbles chart

VISUALIZATION IN R:

Source code for Reading and Viewing the Data

Scatter Plot Depicting Open vs. Close Price

Scatter Plot Depicting Low vs. High Price

Summarising the dataset

Knowing the Structure and the Type of Data

Cleaning the Data

Data before Cleaning

Data after Cleaning the NA values

Implementing KNN Algorithm on the data

Preparing Training dataset to Perform KNN

Preparing Testing dataset for Predicting the Classification

Predicting the Classification

Implementing the K Means Algorithm


Identifying the Clusters formed

Clustering on the basis of K Means

Plotting Low vs. High price after performing K Means

Plotting Open vs. Close price after performing K Means


Plotting Open vs. Close price with ggplot2

Plotting the graph between the Growth Rate and Date
