EDA

Uploaded by

ads112822

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is the process of studying and exploring a data set
(its records or cases) to understand its main characteristics, discover patterns,
locate outliers, and identify relationships between variables.

EDA is primarily used to see what data can reveal beyond the formal modeling or
hypothesis testing task, and it provides a better understanding of the variables in
a data set and the relationships between them. It can also help determine whether
the statistical techniques you are considering for data analysis are appropriate.
Originally developed by American mathematician John Tukey in the 1970s, EDA
techniques continue to be widely used in the data discovery process today.

The objectives of EDA

1) Data Cleaning: Examining the data for errors, missing values, and
inconsistencies. It includes techniques such as data imputation, handling
missing values, and identifying and removing outliers.
2) Descriptive Statistics:
• Measures of Central Tendency / Location
• Measures of Dispersion
• Measures of Shape

3) Data Visualizations
• Line chart
• Bar chart
• Histogram
• Scatter plot
• Box plot
4) Feature Engineering: EDA allows for the exploration of variables and their
transformations in order to create new features or derive meaningful insights.
Feature engineering can include scaling, normalization, binning, and creating
interaction or derived variables.
5) Understanding relationships
• Correlation coefficients
• Scatter diagram
• Bubble plot

6) Hypothesis generation
7) Data quality assessment
• Reliability
• Validity
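
Several of the objectives above (data cleaning, descriptive statistics, and scaling as a feature engineering step) can be illustrated with a minimal Python sketch. It uses only the standard library. The function names, the made-up `ages` list, the choice of median imputation, and the 1.5 x IQR outlier rule are all assumptions for illustration, not methods prescribed by these notes.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def describe(values):
    """Central tendency (mean, median), dispersion (stdev), shape (skewness)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    skew = sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)
    return {"mean": mean, "median": statistics.median(values),
            "stdev": sd, "skewness": skew}

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample: one missing value and one suspiciously large value.
ages = [23, 25, None, 27, 29, 31, 95]

clean = impute_median(ages)      # fill the gap before computing statistics
outliers = iqr_outliers(clean)   # candidates to inspect, not delete blindly
summary = describe(clean)
scaled = min_max_scale(clean)

print("cleaned:", clean)
print("outliers:", outliers)
print("summary:", summary)
```

Whether a flagged value should be removed, corrected, or kept is a judgment call that depends on the data; the sketch only flags candidates.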

Types of EDA
Depending on the number of columns (variables) being analyzed, EDA can be divided
into three types.
I. Univariate analysis
This is the simplest form of data analysis, where the data being analyzed consists
of just one variable. Since there is only a single variable, it does not deal with
causes or relationships. The main purpose of univariate analysis is to describe the
data and find patterns that exist within it.

Non-graphical methods do not always provide a full picture of the data, so
graphical methods are also required. Common types of univariate graphics
include:
a. Stem-and-leaf plots, which show all data values and the shape of
the distribution.
b. Histograms, bar plots in which each bar represents the frequency
(count) or proportion (count/total count) of cases for a range of
values.
c. Box plots, which graphically depict the five-number summary:
minimum, first quartile, median, third quartile, and maximum.
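
The numbers behind these three graphics can be sketched in plain Python. This is an illustration under stated assumptions: the `scores` list is made up, the function names are hypothetical, and the quartiles follow the default "exclusive" method of `statistics.quantiles`, so other tools may report slightly different quartile values.

```python
import statistics
from collections import Counter, defaultdict

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the values a box plot draws."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))

def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def stem_and_leaf(values):
    """Group non-negative integers by tens (stem) and units digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical exam scores used only for illustration.
scores = [52, 55, 61, 64, 64, 68, 70, 73, 75, 79, 81]

summary = five_number_summary(scores)
hist = histogram_counts(scores, bin_width=10)
stems = stem_and_leaf(scores)

print("five-number summary:", summary)

# Text histogram: one '#' per observation in the bin.
for start, count in hist.items():
    print(f"{start}-{start + 9}: " + "#" * count)

# Text stem-and-leaf plot: stem | leaves.
for stem, leaves in stems.items():
    print(f"{stem} | " + " ".join(str(leaf) for leaf in leaves))
```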

II. Bivariate analysis

Bivariate analysis shows the relationship between two variables through
cross-tabulation, scatter diagrams, grouped bar charts, and correlation coefficients.
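
Two of these bivariate tools, cross-tabulation and the correlation coefficient, can be sketched in a few lines of Python. The data below is invented for illustration, the function names are hypothetical, and Pearson's r is assumed as the correlation coefficient because the notes do not say which one is meant.

```python
import math
from collections import Counter

def cross_tab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical data: two categorical variables and two numeric variables.
gender = ["F", "M", "F", "M", "F"]
passed = ["yes", "no", "yes", "yes", "no"]
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]

table = cross_tab(gender, passed)
r = pearson_r(hours, marks)

for (g, p), count in sorted(table.items()):
    print(f"gender={g} passed={p}: {count}")
print(f"Pearson r between hours and marks: {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship, but it does not by itself show that one variable causes the other.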

III. Multivariate analysis

More than two variables are studied together. Common techniques include:
• Bubble chart, which is a data visualization that displays multiple circles
(bubbles) in a two-dimensional plot.
• Heat map, which is a graphical representation of data where values are
depicted by color.
• Multiple regression, etc.
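
One common use of a heat map in multivariate analysis is to color a correlation matrix. The sketch below computes such a matrix in plain Python and prints it as text; drawing the actual colored map would need a plotting library, which is left out here. The column names and values are made up, and using Pearson's r for each cell is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Correlate every column with every other column (and with itself)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical data set with three numeric variables.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "marks": [52, 55, 61, 70, 72],
    "absences": [9, 7, 6, 3, 1],
}

matrix = correlation_matrix(data)

# Print one row per variable; a heat map would color each cell by its value.
for name, row in matrix.items():
    print(name.ljust(14), " ".join(f"{value:+.2f}" for value in row.values()))
```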

Exploratory Data Analysis Tools:

• MS Excel
• SPSS
• Python
• R, etc.
