DataAnalyticsCh 1


UNIT- 1

Introduction to Data Analytics

Dr. Rashmi M
Department of Computer Science, GFGC T. Dasarahalli
NEP V Sem Data Analytics

Content: Evolution of Data Analytics, Data Analytics Overview, Types of Data Analytics (Descriptive
Analytics, Diagnostic Analytics, Predictive Analytics, Prescriptive Analytics), Importance and Benefits of
Data Analytics, Different Applications of Analytics in Business, Text Analytics and Web Analytics, Skills for
Business Analytics.

Definition: Data Analytics is a strategy or method to investigate, analyse, and present data in order to
find useful information and support decisions. Data analysis involves the extraction, cleaning,
transformation, analysis, modelling, and visualization of data, with the objective of extracting useful
information from which conclusions can be drawn and decisions made. Hence Data Analytics is known as a
Data-Driven Decision Making strategy that supports business growth. Data experts in many companies use
data analytics in their core research. Data Mining is regarded as a step (subset) of Data Analytics,
because it is the process of exploring and analysing large volumes of data to discover hidden patterns
and rules. Hence Data Mining is also known as Knowledge Discovery in Databases (KDD).

Data analytics includes numerous types of data analysis. Any type of data can be subjected to data
analytics techniques, and the resulting insights can be used to make improvements, which in turn
improves business growth.

Some examples of Data Analytics in various fields:

1. Game companies can use data analytics to recommend new games to players based on their past
gaming behaviour. This can help to increase player engagement and retention.
2. Data analytics can be used to balance game mechanics and difficulty levels to ensure that the game
is fun and challenging for all kinds of players.
3. Game companies can use data analytics to detect and prevent fraud, such as cheating and account
hacking.
4. Retailers use data analytics to track customer behaviour and identify trends in order to optimize
their product offerings and marketing campaigns.
5. Financial sectors use data analytics to evaluate risk, detect fraud, and make investment decisions.
6. Healthcare providers use data analytics to improve patient care, develop new treatments, and
reduce costs.
7. Manufacturers use data analytics to optimize production processes, improve quality control, and
reduce waste.


Types of Data Analytics

There are four main types of data analytics:

1. Descriptive analytics: Descriptive analytics is the simplest type of data analysis and the foundation
on which the other types are built. It allows you to pull trends from raw data (for example, classifying
customers into groups based on their product-choice patterns) to describe what happened or is currently
happening. It mines historical data to understand past successes and failures. Hence we say
descriptive analytics deals with what happened in the past or is happening currently. Management
reports (sales, marketing, operations, finance), data queries, data dashboards, and descriptive
statistics most commonly use this kind of analysis.

Examples:

• A retail company uses descriptive analytics to track sales data and identify which
products are selling well and which are not.
• By tracking the cases and deaths recorded in a COVID-19 dataset, descriptive analytics can
identify the infected population of a country.
• A social media company uses descriptive analytics to track user engagement data and identify
which types of content users browse.
• A healthcare provider uses descriptive analytics to track patient data and identify trends
in disease prevalence and treatment outcomes.

Tools used in Descriptive Analytics

• Statistical Summary: It provides statistical descriptions of a given business metric, e.g.
mean, median, standard deviation, percentiles, interquartile range, etc.
• Z-Score: It tells us how far (in terms of standard deviations) a particular value of x is
from the mean.
• Coefficient of Variation: It is the ratio of the standard deviation to the mean.
• Interquartile Range: It is the difference between the third and first quartiles, and is an
important measure of the variation in the dataset.

Data Dashboard: A tool used to track, organise, visualize, and analyse data. Its overall purpose is to
make it easier for data analysts, decision makers, and average users to understand their data, gain
deeper insights, and make better data-driven decisions.

Descriptive Statistics: Includes the central tendency, variability, and frequency distribution of the dataset.
The frequency distribution records how often each value occurs, central tendency records the centre
point of the distribution, and variability records the degree of dispersion of the dataset.
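
The descriptive measures above can be illustrated with a short sketch using only Python's standard library. The sales figures are hypothetical, and the population standard deviation is used here as an assumption (the notes do not say whether the sample or population form is intended).

```python
import statistics
from collections import Counter

# Hypothetical daily sales figures (illustrative data, not from the notes)
sales = [12, 15, 11, 14, 13, 40, 12, 16]

# Central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)

# Variability (population standard deviation is an assumption of this sketch)
std_dev = statistics.pstdev(sales)

# Z-score: distance of each value from the mean, in standard deviations
z_scores = [(x - mean) / std_dev for x in sales]

# Coefficient of variation: standard deviation divided by the mean
coeff_variation = std_dev / mean

# Interquartile range: third quartile minus first quartile
q1, _, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

# Frequency distribution: how often each value occurs
frequency = Counter(sales)

print(f"mean={mean}, median={median}, std_dev={std_dev:.2f}")
print(f"coefficient of variation={coeff_variation:.2f}, IQR={iqr}")
print(f"largest z-score={max(z_scores):.2f}")
```

In this made-up dataset the value 40 has by far the largest z-score, which is how a z-score can flag an unusual day in a sales report.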

2. Diagnostic analytics: Diagnostic analytics takes descriptive analytics a step further by trying to
understand why something happened. It uses a variety of statistical techniques to identify patterns,
relationships, and dependencies in the data for a particular problem. Hence diagnostic analytics deals
with why something happened in the past.
Examples:
• A marketing company uses diagnostic analytics to identify which marketing campaigns or
promotions are most effective at driving sales (for example, a particular promotional month or a
particular theme tied to a region).
• A footwear company uses diagnostic analytics to find out why April has the highest sales.
It identifies that children's beach footwear has the highest reviews, as April is a vacation
month for children.
• A manufacturing company uses diagnostic analytics to identify the root cause of product
defects.
• A financial institution uses diagnostic analytics to identify customers who are at risk of
defaulting on their loans.

Tools used in Diagnostic Analytics

• Correlation Analysis: It is a statistical measure that indicates the strength of the
relationship between two variables.
• 5 Whys Analysis: It is a structured approach in which we dig into a problem and
peel it back layer by layer to reach its root cause.
• Cause and Effect Analysis: Here, we identify all possible reasons for a problem, then
treat each reason as a problem in turn and try to find the causes behind it.
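
As an illustration of correlation analysis, the following minimal sketch computes the Pearson correlation coefficient by hand. Pearson's coefficient is one common choice and is an assumption here, since the notes do not name a specific measure; the function name and the figures are hypothetical.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures (illustrative only)
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

r = pearson_correlation(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship and a value near 0 a weak one. A strong correlation alone does not prove that one variable causes the other, which is why it is usually combined with root-cause techniques such as those listed above.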

3. Predictive analytics: Predictive analytics uses historical data to predict future outcomes. It uses a
variety of machine learning techniques to develop models that can predict things like customer
churn, product demand, and fraud risk. Hence we say predictive analytics deals with what will
happen in the future.
Examples:
• An e-commerce company uses predictive analytics to recommend products to
customers based on their past purchase history.
• In elections, predictive analytics is used to predict the winning candidate (this requires
historical polling data and current polling data).
• A telecommunications company uses predictive analytics to identify customers who
are likely to switch to a competitor.
• An insurance company uses predictive analytics to set premiums for different types of
customers.
Tools used in Predictive Analytics

• Regression Analysis: It establishes the mathematical relationship between input variables
and output variables, so that we can calculate the future value of the output for any
given input, e.g. a sales forecast for next month.
• Logistic Regression: It is a classification technique that can predict
the output class for any given set of inputs. E.g. given customer demographics,
logistic regression can indicate whether or not the customer will default on a bank loan in the
future.
• Decision Tree: Most of the time, we use a decision tree as a classification technique; it
gives the probability of each output class for various combinations of our
input variables. It can also be used for continuous output variables.
• Clustering Techniques: These techniques segregate our customers into a few logical
segments so that we can create tailored offers for different types of customers according to
their needs and interests.
• Random Forest: It is another very popular business analytics technique that uses an
ensemble approach, building a large number of decision trees and combining their
predictions. Its accuracy is generally better than that of a single tree.
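
To make the regression idea concrete, here is a minimal sketch of simple linear regression (one input, one output) fitted by ordinary least squares. The function name `fit_line` and the monthly sales figures are hypothetical, chosen only to illustrate a next-month forecast.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales for months 1 to 6 (illustrative only)
months = [1, 2, 3, 4, 5, 6]
monthly_sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, monthly_sales)

# Use the fitted relationship to forecast the next month
forecast_month_7 = slope * 7 + intercept
print(f"sales = {slope:.1f} * month + {intercept:.1f}")
print(f"forecast for month 7: {forecast_month_7:.1f}")
```

Real sales data is rarely this regular; the perfectly linear figures are used only so the fitted line is easy to check by hand.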

4. Prescriptive analytics: Prescriptive analytics takes predictive analytics a step further by
recommending actions that can be taken to improve outcomes. It uses optimization techniques to
identify the best way to achieve a desired outcome. Hence we say prescriptive analytics deals with
how we can make it happen.

Examples:
• A retailer uses prescriptive analytics to optimize their inventory levels and pricing
strategies.
• A manufacturing company uses prescriptive analytics to optimize their production
processes and supply chain management.
• A healthcare provider uses prescriptive analytics to develop personalized treatment plans
for patients.
• On social media, a person's content interests are tracked and similar content is recommended
to that person in order to increase views, subscriptions, and marketing reach.

Tools used in Prescriptive Analytics

• Linear Programming: In linear programming, we optimize an objective function such as
revenue, market share, or customer feedback ratings, subject to constraints such as budget or
number of people deployed, with both the objective and the constraints expressed as linear functions.
• Analytic Hierarchy Process: We apply this technique in scenarios where we have to
identify the best solution among various available options against a list of criteria,
e.g. selecting the best cloud service provider among the top 5 organizations by
taking into consideration multiple factors such as budget, customer service, flexibility to
upgrade, backup services, maintenance cost, etc.
• Combinatorial Optimization: It involves identifying the optimal solution from a large but
finite set of candidate solutions, e.g. the travelling salesman problem, the vehicle routing problem, etc.
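
The travelling salesman problem mentioned above can be sketched with a brute-force search that tries every possible route. This is only practical for a handful of cities and is shown purely to illustrate what "choosing the best from a finite set of solutions" means; the distance matrix and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical symmetric distance matrix between four cities (illustrative only)
distances = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

def tour_length(tour):
    """Total length of a round trip that visits the cities in `tour` order."""
    return sum(distances[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def shortest_tour(n_cities):
    """Try every ordering that starts at city 0 and keep the shortest one."""
    best = None
    for rest in permutations(range(1, n_cities)):
        tour = (0,) + rest
        if best is None or tour_length(tour) < tour_length(best):
            best = tour
    return best, tour_length(best)

best_tour, best_length = shortest_tour(len(distances))
print(f"best tour: {best_tour}, length: {best_length}")
```

The number of routes grows factorially with the number of cities, so practical tools rely on heuristics or specialised solvers rather than exhaustive search.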

Data analytics is a powerful tool that can be used to improve decision-making in all industries. By
understanding the different types of data analytics and how they can be used, businesses can gain
valuable insights from their data and make better decisions about how to allocate resources, improve
products and services, and grow their business.

Overview of the Data Analytics Process

Data analytics is the process of collecting, cleaning, and analyzing data to extract meaningful insights. It
is a broad field that encompasses a variety of techniques and tools, and it is used in a wide range of
industries. The data analytics process can be broadly divided into the following steps:

1. Data collection: The first step is to collect the data that will be analysed. This data can come
from a variety of sources, such as internal databases, customer surveys, and social media.
2. Data cleaning: Once the data has been collected, it needs to be cleaned to remove any errors
or inconsistencies. This may involve correcting typos, filling in missing values, and removing
outliers.
3. Data preparation: Once the data has been cleaned, it needs to be prepared for analysis. This
may involve converting the data to a different format or splitting the data into different
subsets.
4. Data analysis: This is the step where the data is actually analysed to extract meaningful
insights. This can be done using a variety of statistical and machine learning techniques.
5. Data visualization: Once the data has been analysed, the insights need to be communicated
to the required people in a clear and concise way. This can be done using data visualization
tools to create charts, graphs, and other visuals.
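
The steps above can be sketched end to end on a tiny in-memory dataset. The records, field names, and the plain-text bar chart are all hypothetical simplifications; in this sketch, records with missing values are dropped rather than filled in.

```python
import statistics

# 1. Data collection: hypothetical survey records (illustrative only)
raw_records = [
    {"region": "North", "amount": "120"},
    {"region": "south", "amount": "95"},
    {"region": "North", "amount": ""},        # missing value
    {"region": "South", "amount": "105"},
    {"region": "north ", "amount": "130"},
]

# 2. Data cleaning: drop records with a missing amount, normalize region names
cleaned = [
    {"region": r["region"].strip().title(), "amount": r["amount"]}
    for r in raw_records
    if r["amount"].strip()
]

# 3. Data preparation: convert amounts to numbers and group them by region
by_region = {}
for r in cleaned:
    by_region.setdefault(r["region"], []).append(float(r["amount"]))

# 4. Data analysis: average amount per region
averages = {region: statistics.mean(vals) for region, vals in by_region.items()}

# 5. Data visualization: a plain-text bar chart, one '#' per 10 units
for region, avg in sorted(averages.items()):
    print(f"{region:<6} {'#' * round(avg / 10)} {avg:.1f}")
```

In practice each step is usually handled by dedicated tools (databases, data-frame libraries, charting packages), but the order of the steps stays the same.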

Evolution of Data Analytics

The evolution of data analytics can be broadly divided into four eras:

1. Era 1 (1960s to 1980s): This era was dominated by early data processing technologies, such
as punch cards and mainframe computers. Data analytics was largely limited to descriptive
analytics, which involved using simple statistical techniques to analyze historical data.
2. Era 2 (1990s to early 2000s): The rise of relational databases and business intelligence tools
made it possible to analyse larger and more complex datasets. This led to the development
of more sophisticated data analytics techniques, such as diagnostic and predictive analytics.
3. Era 3 (mid-2000s to early 2010s): The emergence of big data and cloud computing
made it possible to analyse unprecedented volumes of data. This led to the development of
new data analytics techniques, such as machine learning and deep learning.
4. Era 4 (present day): Data analytics is now becoming increasingly pervasive and accessible. AI-
powered data analytics tools are enabling businesses of all sizes to extract insights from their
data and make better decisions.

Some of the key trends that have driven the evolution of data analytics:

• The rise of big data: The volume, velocity, and variety of data generated today are
extraordinary. This has created a need for new data analytics tools and techniques that
can handle big data.

• The rapid growth of cloud computing: Cloud computing has made it easier and more
affordable to access and analyse large datasets. This has democratized data analytics and
made it available to businesses of all sizes.

• The advancement of artificial intelligence: Artificial intelligence (AI) is being used to
develop new data analytics tools and techniques that are more powerful and efficient
than traditional methods. AI-powered data analytics tools can automate many tasks, such
as data cleaning, feature engineering, and model building.

The evolution of data analytics is having a major impact on businesses of all sizes. Data analytics is now
being used to improve decision-making in all aspects of business, from marketing and sales to product
development and operations.


Importance and Benefits of Data Analytics

1. Improved decision-making: Data analytics can help businesses make better decisions by
providing them with insights into their data. For example, a company can use data
analytics to identify which marketing campaigns are most effective or which products are
most popular with customers. This information can then be used to make better
decisions about how to allocate resources and improve business operations.
2. Increased efficiency: Data analytics can help businesses automate tasks and streamline
processes. For example, a company can use data analytics to automate customer service
tasks or to optimize production schedules. This can free up employees to focus on more
strategic initiatives.

3. Reduced costs: Data analytics can help businesses identify and reduce costs. For
example, a company can use data analytics to identify areas where they are wasting
money or to identify opportunities to negotiate better deals with suppliers.

4. Improved customer satisfaction: Data analytics can help businesses improve customer
satisfaction by providing them with deeper insights into their customers' needs and
preferences. For example, a company can use data analytics to identify which products or
services are most popular with customers or to identify areas where they can improve
the customer experience.
5. New product development: Data analytics can help businesses to develop new products
and services that meet the needs of their customers. For example, a technology company
can use data analytics to identify which features are most important to their customers
and to prioritize the development of new features. This information can then be used to
develop new products and services that are more likely to be successful.

Application Areas of Data Analytics

1. In Business: Data analytics can be used in a variety of ways to improve business performance.
Here are a few examples:
• Marketing and sales: Data analytics can be used to understand customer behaviour,
segment customers, target customers with relevant marketing campaigns, and measure
the effectiveness of marketing campaigns.

• Product development: Data analytics can be used to identify customer needs, prioritize
product features, and test new product concepts.

• Operations: Data analytics can be used to optimize production processes, improve
quality control, and reduce waste.

• Finance: Data analytics can be used to assess risk, detect fraud, and make investment
decisions.

• Human resources: Data analytics can be used to identify top talent, improve employee
engagement, and reduce turnover.

2. In Text Analytics: Text analytics is a type of data analytics that focuses on extracting insights
from unstructured text data. Unstructured text data can come from a variety of sources, such as
social media posts, customer reviews, and product descriptions. Text analytics can be used to:

• Understand customer sentiment: Text analytics can be used to identify the overall
sentiment of customer feedback, as well as the specific topics that customers are most
concerned about.

• Identify emerging trends: Text analytics can be used to identify emerging trends in the
market, such as new products or services that customers are interested in.

• Improve customer service: Text analytics can be used to identify customer support issues
and to develop targeted solutions.

• Improve marketing campaigns: Text analytics can be used to improve the effectiveness of
marketing campaigns by identifying the keywords and phrases that are most likely to
resonate with customers.
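
A very small sketch of two of these text analytics tasks, sentiment scoring and keyword counting, is shown below. The word lists and reviews are hypothetical, and counting positive and negative words is only one simple approach; real systems typically rely on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical word lists; a real system would use a much larger lexicon
POSITIVE = {"good", "great", "love", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "poor", "broken", "late"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical customer reviews (illustrative only)
reviews = [
    "Great phone, love the camera",
    "Delivery was late and the box was broken",
    "Good battery but slow charging",
]

# Sentiment per review: above 0 is positive, below 0 is negative
scores = [sentiment_score(r) for r in reviews]

# Keyword frequency across all reviews
word_counts = Counter(w for r in reviews for w in tokenize(r))

print(scores)
print(word_counts.most_common(3))
```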

3. In Web Analytics: Web analytics is a type of data analytics that focuses on extracting insights
from website data. Website data can include things like page views, visitor demographics, and
traffic sources. Web analytics can be used to:

• Understand website traffic: Web analytics can be used to identify which pages are most
visited, which pages are leading to conversions (turning targeted visitors into actual
customers), and where visitors are coming from.

• Improve website performance: Web analytics can be used to identify areas where the
website can be improved, such as pages that are loading slowly or pages that have a high
bounce rate.

• Optimize marketing campaigns: Web analytics can be used to optimize marketing
campaigns by tracking the performance of different campaigns and identifying the
campaigns that are driving the most traffic to the website.
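
The web metrics mentioned above can be computed from a simple session log, as in the following sketch. The log structure and field names are hypothetical. Bounce rate is taken here as the share of sessions that viewed only one page, and conversion rate as the share of sessions that ended in a purchase.

```python
from collections import Counter

# Hypothetical session log: traffic source, pages viewed, and purchase outcome
sessions = [
    {"source": "search", "pages": ["/home", "/product", "/checkout"], "converted": True},
    {"source": "social", "pages": ["/home"], "converted": False},
    {"source": "search", "pages": ["/product"], "converted": False},
    {"source": "email", "pages": ["/home", "/product"], "converted": False},
]

total = len(sessions)

# Bounce rate: sessions with a single page view, divided by all sessions
bounces = sum(1 for s in sessions if len(s["pages"]) == 1)
bounce_rate = bounces / total

# Conversion rate: sessions that ended in a purchase, divided by all sessions
conversion_rate = sum(s["converted"] for s in sessions) / total

# Most visited pages and where visitors come from
page_views = Counter(page for s in sessions for page in s["pages"])
traffic_sources = Counter(s["source"] for s in sessions)

print(f"bounce rate: {bounce_rate:.0%}, conversion rate: {conversion_rate:.0%}")
print(page_views.most_common(2))
print(traffic_sources.most_common(1))
```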

4. Skills for Business Analytics: Business analysts need a variety of skills to be successful. Here
are a few of the most important:
• Technical skills: Business analysts need to have strong technical skills, including
knowledge of statistical analysis, machine learning, and data visualization tools.

• Problem-solving skills: Business analysts need to be able to identify and solve complex
business problems.

• Communication skills: Business analysts need to be able to communicate their findings to
both technical and non-technical audiences.

• Business knowledge: Business analysts need to have a good understanding of business
principles and practices.

Difference between Data Mining and Data Analytics

| Data Mining | Data Analytics |
|---|---|
| Data mining is a process of extracting useful information, patterns, and trends from raw data. | Data analysis is a method that can be used to investigate, analyse, and present data to find useful information and support decisions. |
| The data mining output gives the data pattern. | The data analysis output is a verified hypothesis or insights based on the data. |
| It includes the intersection of databases, machine learning, and statistics. | It requires expertise in computer science, mathematics, statistics, and AI. |
| It is known as Knowledge Discovery in Databases (KDD). | It is known as a Data-Driven Decision Making strategy. |
| The datasets are generally large and structured. | The dataset can be large, medium, or small, and structured, semi-structured, or unstructured. |
| It generally does not require visualization. | It requires data visualization. |
| The prime goal is to make data usable. | The goal is to make data-driven decisions. |

Note: Datum means "one piece of information" or "one numerical result." Data is the plural form of
datum and should not be used as a singular noun. Data is a collection of facts from which conclusions
are drawn, and a datum is an item of factual information derived from measurement or research.


Structured, Unstructured and Semi structured Data

Structured data: Also known as quantitative data, structured data is information that is highly organized
and readable by machine learning algorithms, making it easy to search, manipulate, and analyse.
Structured data exists in a predefined format. You find structured data in relational databases, which
contain tables, rows, and columns.

Unstructured data: Also known as qualitative data, unstructured data is information that is subjective
and that traditional analytics tools and methods cannot handle. Unstructured data comes in the form of
photos, audio, video, engineering CAD drawings, social media text streams, HTML documents, or any form
of data that is not captured in a fixed-record, field-defined format.
Unstructured data lacks any predefined model or format. It requires a lot of storage
space, and it is hard to keep secure. It cannot be presented in a data model or schema, which is
why managing, analysing, or searching unstructured data is hard. It is qualitative in nature and is
sometimes stored in a non-relational (NoSQL) database.

Examples of human-generated unstructured data are Text files, Email, social media, media, mobile
data, business applications, and others. The machine-generated unstructured data includes satellite
images, scientific data, sensor data, digital surveillance, and many more.

Semi-structured data: occupies the middle ground between structured and unstructured data as data
that has some degree of organization but is not fully organized into a fixed record format found in a
traditional system or database.
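
The three categories can be contrasted with a short sketch. CSV and JSON are used here as commonly cited examples of structured and semi-structured data respectively; the notes do not name specific formats, so this choice and the sample values are assumptions.

```python
import csv
import io
import json

# Structured: every row follows the same predefined columns
csv_text = "id,name,city\n1,Asha,Bengaluru\n2,Ravi,Mysuru\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records are labelled with keys, but fields may vary per record
json_text = '[{"id": 1, "name": "Asha", "tags": ["vip"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)

# Unstructured: free text with no predefined fields
review = "The delivery was quick, but the packaging could be better."

# Structured data can be queried directly by column name
cities = [row["city"] for row in rows]

# Semi-structured data needs a fallback for fields that may be absent
tags = [rec.get("tags", []) for rec in records]

# Unstructured data must be processed (here, split into words) before analysis
word_count = len(review.split())

print(cities, tags, word_count)
```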

Note: Structured data can be extracted from unstructured data using business intelligence (BI) tools that
rely on artificial intelligence (AI) and natural language processing (NLP).

Difference between Structured Data and Unstructured Data

| Structured Data | Unstructured Data |
|---|---|
| It is quantitative data. Hence it can be processed and analysed using conventional/traditional tools. | It is qualitative data. It cannot be processed and analysed using conventional tools. |
| It has a predefined format. | It has a variety of formats, i.e., it comes in a variety of shapes and sizes. |
| It is easy to search. | Searching unstructured data is more difficult. |
| It requires less storage space. | It requires a lot of storage space. |
| It is based on a relational database. | It is based on character and binary data. |
| Examples: Tabular data such as customer relationship management (CRM), invoicing systems, product databases, contact lists. | Examples: Text files, email, social media data, satellite images, scientific data, sensor data. |

Data Science: Data science and data analytics are closely related, but there are key differences between
the two fields. While both fields involve working with data to gain insights, Data Analytics tends to focus
more on analysing past data to inform decisions in the present, while Data Science focuses on using data
to build models that can predict future outcomes.

Data science is a broad field that encompasses data analytics and includes other areas such as data
engineering and machine learning. Data scientists use statistical and computational methods to extract
insights from data, build predictive models, and develop new algorithms. Data analytics involves
analysing data to gain insights and derive business decisions.

Difference between Data Science and Data Analytics

| Data Science | Data Analytics |
|---|---|
| Data science deals with exploration and new innovations. | Data analytics makes use of existing resources. |
| Data science is a multidisciplinary field including data engineering, computer science, statistics, machine learning, and predictive analytics, in addition to presentation of findings. | Data analytics is a broad field which includes data integration, data analysis, and data presentation. |
| Data scientists produce both broad insights by exploring the data and actionable insights (by building data models) that answer specific questions. | Data analytics is more focused on producing insights that answer specific questions and can be put into action. |
| Data scientists prepare, manage, and explore large datasets and then develop custom analytical models and algorithms to produce the required business insights. | Data analysts prepare, manage, and analyse well-defined datasets to identify trends and create visual presentations to help organizations make better, data-driven decisions. |
| Python is the most commonly used language for data science, along with other languages such as C++, Java, Perl, etc. | Knowledge of Python and the R language is essential for data analytics. |

Relation between Big Data and Data Analytics

Big data plays a crucial role in data analysis solutions by providing organizations with large amounts of data that
can be used to uncover insights and support decision-making. Big data can be integrated with other data sources
such as structured data, semi-structured data, and unstructured data, to form a holistic view of the organization’s
data landscape, which can lead to more accurate predictions, better decision-making, and more effective
outcomes. The purpose of Big Data is to store and process huge volumes of data, whereas the purpose of
Data Analytics is to analyse the raw data and extract insights from it.


Data Analytics and Business Analytics: Data Analytics and Business Analytics share a common goal, but the
skills needed and the strategies used are different. Data analytics focuses on processing data and drawing
conclusions (or deriving decisions), whereas business analytics focuses on implementing changes and
communicating the results. Data analysts are more likely to work independently, while business analysts need to
work directly with people in different departments and roles. A data analyst should have knowledge of data
structures (or patterns), whereas a business analyst should have knowledge of business structures.

Business analysts use data to identify problems and solutions, but do not perform a deep technical analysis of
the data. They operate at a conceptual level, defining strategy and communicating with stakeholders, and are
concerned with the business implications of data. Data analysts, on the other hand, spend the majority of their
time gathering raw data from various sources, cleaning and transforming it, and applying a range of specialized
techniques to extract useful information and develop conclusions.

Business analysts typically have extensive domain or industry experience in areas such as e-commerce,
manufacturing, or healthcare. People in this role rely less on the technical aspects of analysis than data
analysts, although they do need a working knowledge of statistical tools, common programming languages,
networks, and databases.
