EDA
EDA is primarily used to see what data can reveal beyond the formal modeling or
hypothesis testing task, and it provides a better understanding of data set
variables and the relationships between them. It can also help determine whether
the statistical techniques you are considering for data analysis are appropriate.
Originally developed by American mathematician John Tukey in the 1970s, EDA
techniques continue to be widely used in the data discovery process
today.
3) Data Visualizations
Line chart
Bar chart
Histogram
Scatter plot
Box plot
4) Feature Engineering: EDA allows for the exploration of variables and their
transformations to create new features or derive meaningful insights.
Feature engineering can include scaling, normalization, binning, and creating
interaction or derived variables.
5) Understanding relationships
Correlation coefficients
Scatter diagram
Bubble plot
6) Hypothesis generation
7) Data quality assessment
Reliability
Validity
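The feature engineering steps and the correlation coefficient named in the list above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the height and weight values, the bin edges and labels, and the helper names (min_max_scale, standardize, bin_values, pearson_r) are all assumptions made up for the example, not part of any particular EDA toolkit.

```python
import bisect
import math
import statistics

# Hypothetical toy data, invented purely for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 58.0, 69.0, 80.0, 93.0]


def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Normalization: shift to zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def bin_values(values, edges, labels):
    """Binning: a value v gets labels[i], where i counts the edges below v."""
    return [labels[bisect.bisect_left(edges, v)] for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


scaled = min_max_scale(heights_cm)
standardized = standardize(heights_cm)
# Assumed bin edges: up to 160 is "short", up to 180 is "medium", above is "tall".
height_bins = bin_values(heights_cm, [160.0, 180.0], ["short", "medium", "tall"])
# Derived variable: body mass index computed from the two raw columns.
bmi = [w / (h / 100.0) ** 2 for h, w in zip(heights_cm, weights_kg)]
r = pearson_r(heights_cm, weights_kg)

print(scaled, height_bins, round(r, 3))
```

A coefficient close to +1 or -1 suggests a strong linear relationship worth examining in a scatter plot, while a value near 0 suggests little linear association. In practice a data-frame library would usually handle these steps, but the arithmetic is the same.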
Types of EDA
Depending on the number of columns we are analyzing, we can divide EDA into
three types.
I. Univariate analysis
This is the simplest form of data analysis, where the data being analyzed consists of
just one variable. Since it’s a single variable, it doesn’t deal with causes or
relationships. The main purpose of univariate analysis is to describe the data and
find patterns that exist within it.
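As a rough illustration of describing a single variable, the sketch below computes common summary statistics and a coarse frequency table with Python's standard library. The scores list, the describe helper, and the bin width of 10 are hypothetical choices made for this example only.

```python
import statistics
from collections import Counter

# Hypothetical exam scores for one variable, invented for illustration.
scores = [55, 62, 62, 70, 71, 75, 78, 80, 85, 98]


def describe(values):
    """Return basic descriptive statistics for one numeric variable."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,
    }


summary = describe(scores)

# Frequency table with bins of width 10 (50-59, 60-69, ...), keyed by the
# lower edge of each bin; this is the data behind a simple histogram.
histogram = Counter((v // 10) * 10 for v in scores)

for name, value in summary.items():
    print(f"{name}: {value}")
for lower_edge in sorted(histogram):
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * histogram[lower_edge]}")
```

The central-tendency values (mean, median, mode) and the spread values (standard deviation, range, interquartile range) describe the variable, while the frequency table hints at the shape of its distribution.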