
Overview of The Subject: Honours - TE - Mathematics For Data Science (Theory: HDSC501)

This document discusses exploratory data analysis (EDA) in data science. It defines EDA, explains its importance, objectives, and need, and outlines the key steps involved, such as data collection, data cleaning, understanding variables, correlation analysis, and visualizing and analyzing results.


Overview of the Subject
Honours – TE – Mathematics for Data Science
(Theory: HDSC501)
Prof. Shyamala Mathi
Assistant Professor
Dept. of Electronics and Telecommunication,
SIES Graduate School of Technology

https://wikidocs.net/185538

1. What is Exploratory Data Analysis in Data Science?
2. Objective of Exploratory Data Analysis
3. Role of EDA in Data Science
4. Types of Exploratory Data Analysis
5. Steps Involved in Exploratory Data Analysis (EDA)
6. Exploratory Data Analysis Tools
7. Advantages of Using EDA
8. Example of Exploratory Data Analysis
9. Conclusion

• Data analysis involves cleaning, transforming, and analyzing data, and building models to extract specific, relevant insights.

• These insights support important business decisions in real-time situations.

• Exploratory Data Analysis is important for any business.

Data science is the study of data to extract meaningful insights for business.

What is Exploratory Data Analysis in Data Science?

• Exploratory Data Analysis (EDA) is one of the techniques used in data science to extract the vital features and trends that machine learning and deep learning models rely on.
• Data scientists widely use EDA to analyze and investigate datasets; it helps them discover patterns and characteristics of the data, often in visual form.

Importance of EDA in Data Science


• Data science is now very important in the business world because analyzing large volumes of collected data creates many opportunities to make vital business decisions.
• Understanding data thoroughly requires exploring it from every angle.
• Identifying the most impactful features enables meaningful, beneficial decisions; this is why EDA holds an invaluable place in data science.
Objective of Exploratory Data Analysis

The overall objective of exploratory data analysis is to obtain vital insights; it therefore usually includes the following sub-objectives:

• Identifying outliers and, where appropriate, removing them
• Identifying trends in time and space
• Uncovering patterns related to the target variable
• Creating hypotheses and testing them through experiments
• Identifying new sources of data
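The first sub-objective, identifying outliers, is often done with Tukey's interquartile-range (IQR) fences. The sketch below is a minimal illustration, assuming the conventional multiplier k = 1.5 and quartiles from Python's `statistics.quantiles` (default "exclusive" method); the `iqr_outliers` helper and the readings are hypothetical. Whether flagged points are actually removed is a judgment call, so the function returns both the outliers and the remaining values.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Returns (outliers, kept, (low_fence, high_fence)).
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default "exclusive" method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    kept = [v for v in values if low <= v <= high]
    return outliers, kept, (low, high)

# Hypothetical sensor readings with one obvious spike
readings = [10, 12, 11, 13, 12, 11, 14, 12, 95, 13]
outliers, kept, fences = iqr_outliers(readings)
print("fences:", fences)
print("outliers:", outliers)
print("kept:", kept)
```

A larger k (for example 3) flags only more extreme points; the choice depends on the context of the data.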

Need for EDA

EDA is needed because it uses statistical techniques to summarize and visualize data efficiently, to assess the importance and quality of the data, and to derive new perspectives and suggestions from the analysis. EDA always aims to answer questions about the data.
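A first, mechanical pass at judging data quality is a per-column profile: how many rows there are, which types each column holds, how many values are missing, and how many distinct values occur. The sketch below assumes records arrive as a list of Python dicts with `None` marking a missing value; the records and the `profile` helper are hypothetical. With pandas, `DataFrame.info()` and `DataFrame.isna().sum()` report similar information.

```python
# Hypothetical records; None marks a missing value
records = [
    {"age": 72, "city": "Mumbai", "income": 41000},
    {"age": 85, "city": "Pune", "income": None},
    {"age": None, "city": "Mumbai", "income": 38000},
    {"age": 79, "city": "Thane", "income": 52000},
]

def profile(rows):
    """Summarize structure and quality: the row count, plus per-column
    value types, missing counts, and number of distinct non-missing values."""
    columns = sorted({key for row in rows for key in row})
    report = {"rows": len(rows), "columns": {}}
    for col in columns:
        present = [row.get(col) for row in rows if row.get(col) is not None]
        report["columns"][col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(rows) - len(present),
            "distinct": len(set(present)),
        }
    return report

report = profile(records)
print("rows:", report["rows"])
for col, info in report["columns"].items():
    print(col, info)
```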

Age group       No. of persons   Hearing loss   Mobility issues
70-79 years           87              17              46
80-100 years         136              70              90
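The figures that accompanied these counts did not survive extraction, so their exact purpose is not stated here. One plausible reading, assumed for this sketch, is a comparison across age groups: because the groups differ in size (87 vs. 136 persons), raw counts are best converted to proportions first. The counts come from the slide; the `rates` helper is hypothetical. The slide does not say whether a person can have both conditions, so each rate is computed on its own rather than as part of a single breakdown.

```python
# Counts transcribed from the slide's two age groups
groups = {
    "70-79 years": {"persons": 87, "hearing_loss": 17, "mobility_issues": 46},
    "80-100 years": {"persons": 136, "hearing_loss": 70, "mobility_issues": 90},
}

def rates(counts):
    """Convert condition counts into proportions of the group size."""
    total = counts["persons"]
    return {k: v / total for k, v in counts.items() if k != "persons"}

for group, counts in groups.items():
    r = rates(counts)
    print(f"{group}: hearing loss {r['hearing_loss']:.1%}, "
          f"mobility issues {r['mobility_issues']:.1%}")
# 70-79 years: hearing loss 19.5%, mobility issues 52.9%
# 80-100 years: hearing loss 51.5%, mobility issues 66.2%
```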

A run chart is a line graph of data plotted over time. By collecting and charting data over time, you can find trends or patterns in the process.
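Run charts are commonly drawn with a median center line, and runs of consecutive points on one side of the median are a common way to spot a shift or trend. The sketch below computes those numbers for a time-ordered series; the weekly values and the `run_chart_summary` helper are hypothetical, and skipping points that fall exactly on the median is one common convention, not the only one.

```python
from statistics import median

def run_chart_summary(values):
    """Return the median center line and the runs above/below it.

    `values` must be in time order. Points equal to the median are
    skipped: they neither start nor break a run.
    """
    center = median(values)
    runs = []          # list of [side, length]
    current_side = None
    for v in values:
        if v == center:
            continue
        side = "above" if v > center else "below"
        if side != current_side:
            runs.append([side, 1])   # a new run starts
            current_side = side
        else:
            runs[-1][1] += 1         # the current run continues
    return center, [(side, length) for side, length in runs]

# Hypothetical weekly measurements, in time order
weekly = [12, 14, 13, 15, 17, 16, 18, 19, 17, 20]
center, runs = run_chart_summary(weekly)
print("median:", center)
print("runs:", runs)
```

To draw the chart itself, matplotlib's `pyplot.plot` over the time index, followed by `pyplot.axhline` at the median, is one option.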

Steps Involved in Exploratory Data Analysis (EDA)

The key components of EDA are the main steps undertaken to perform it:

1. Data Collection
2. Finding All Variables and Understanding Them
3. Cleaning the Dataset
4. Identifying Correlated Variables
5. Choosing the Right Statistical Methods
6. Visualizing and Analyzing Results
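Step 4, identifying correlated variables, can be sketched with the Pearson correlation coefficient r = cov(x, y) / (s_x s_y), computed for every pair of numeric columns. This is a minimal sketch: the column names, the data, and the 0.8 cut-off are illustrative assumptions, and the columns are assumed to be already cleaned (step 3). Pearson's r captures only linear association; a rank-based measure such as Spearman's is an alternative for monotonic but non-linear relationships.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    The default threshold is an illustrative assumption.
    """
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical, already-cleaned numeric columns
data = {
    "hours_studied": [2, 4, 6, 8, 10],
    "exam_score": [55, 62, 70, 78, 88],
    "shoe_size": [9, 7, 10, 8, 9],
}
print(correlated_pairs(data))
```

Strongly correlated pairs are candidates for a scatterplot (step 6) or for dropping one of the two variables before modelling.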

One can follow these steps:
1. Look at the structure of the data: number of data points, number of features, feature names, data types, etc.
2. When dealing with multiple data sources, check for consistency across datasets.
3. Identify what the data signifies (called measures) for each data point, and keep this in mind when computing metrics.
4. Calculate key metrics for each variable (summary analysis):
   a. Measures of central tendency (mean, median, mode)
   b. Measures of dispersion (range, quartile deviation, mean deviation, standard deviation)
   c. Measures of skewness and kurtosis
5. Investigate visuals:
   a. A histogram for each variable
   b. Scatterplots to examine correlation between pairs of variables
6. Calculate metrics and visuals per category for categorical variables (nominal, ordinal).
7. Identify outliers and mark them. Depending on the context, either discard outliers or analyze them separately.
8. Estimate missing values using data imputation techniques.
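Steps 4 and 8 can be sketched for a single numeric variable as follows. This assumes population formulas throughout: quartile deviation as (Q3 - Q1) / 2, mean deviation about the mean, the moment coefficient of skewness m3 / m2^(3/2), and excess kurtosis m4 / m2^2 - 3; other texts and libraries use sample-adjusted versions, so values may differ slightly. Median imputation is just one simple imputation technique, chosen here for illustration. The ages and both helper functions are hypothetical.

```python
from statistics import mean, median, mode, pstdev, quantiles

def impute_median(values):
    """Step 8: replace missing entries (None) with the median of the rest."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def summarize(values):
    """Step 4 metrics for one numeric variable (population formulas)."""
    n = len(values)
    mu = mean(values)
    q1, _, q3 = quantiles(values, n=4)
    m2 = sum((v - mu) ** 2 for v in values) / n  # 2nd central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # 3rd central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # 4th central moment
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    return {
        # a. central tendency
        "mean": mu,
        "median": median(values),
        "mode": mode(values),
        # b. dispersion
        "range": max(values) - min(values),
        "quartile_deviation": (q3 - q1) / 2,
        "mean_deviation": sum(abs(v - mu) for v in values) / n,
        "std_dev": pstdev(values),
        # c. shape
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }

# Hypothetical ages with one missing entry
ages = [72, 85, None, 79, 91, 72, 88]
filled = impute_median(ages)
print("imputed:", filled)
for name, value in summarize(filled).items():
    print(f"{name}: {value:.3f}")
```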
Thank You!
(shyamalae@sies.edu.in)
