Data Cleaning Workshop:: Club Data Science and Cloud Computing
Important Concepts :
Missing values :
● Missing values occur when there is no data or value stored for the
variable in an observation.
● Missing data are a common occurrence and can have a significant
effect on the conclusions drawn from the data.
● There can be several causes for the occurrence of missing values
in a data set.
● Most statistical procedures require a value for each variable.
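As a minimal sketch of how missing values might be detected, the snippet below counts missing entries per column in a small table. The records, the column names, and the helpers `is_missing` and `count_missing` are hypothetical, invented for illustration; it assumes a missing value is stored as `None` or as a float NaN.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def count_missing(records, columns):
    """Return {column: number of missing entries} for a list of dict rows."""
    return {
        col: sum(1 for row in records if is_missing(row.get(col)))
        for col in columns
    }

# Hypothetical sample data, invented for illustration
records = [
    {"name": "A", "age": 19, "score": 81.5},
    {"name": "B", "age": None, "score": 77.0},
    {"name": "C", "age": 21, "score": float("nan")},
    {"name": "D", "age": None, "score": 90.0},
]

missing = count_missing(records, ["name", "age", "score"])
print(missing)  # {'name': 0, 'age': 2, 'score': 1}
```

A count like this is usually the first step before deciding whether to drop or fill the affected rows.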
Mode :
Median :
Mean :
● Use the mean when the data does not contain any outliers.
● The mean is very sensitive to outliers.
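The sensitivity of the mean can be seen with a small worked example. The ages below are hypothetical, and the median is shown only as a point of comparison; this is a sketch using Python's standard `statistics` module, not a prescribed workflow.

```python
import statistics

# Hypothetical ages, invented for illustration
ages = [18, 19, 20, 21, 22]
ages_with_outlier = ages + [95]  # one extreme value added

mean_clean = statistics.mean(ages)
median_clean = statistics.median(ages)
mean_with_outlier = statistics.mean(ages_with_outlier)
median_with_outlier = statistics.median(ages_with_outlier)

print(mean_clean, median_clean)                # 20 20
print(mean_with_outlier, median_with_outlier)  # 32.5 20.5
```

A single extreme value moves the mean from 20 to 32.5, while the median barely changes.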
Outliers :
● Outliers are extreme values that fall far outside the other
observations.
● In other words, they differ greatly from most of the values in the
data.
● Example: Consider an “Age” column for a B.Tech college.
○ Students are generally around 17 to 24 years old, so an age far
outside that range would be an outlier.
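One common way to flag such values in a single column is the interquartile range (IQR) rule, which is not named in the slides and is used here only as an illustrative choice. The ages, the 1.5 multiplier, and the variable names are assumptions of this sketch.

```python
import statistics

# Hypothetical "Age" column, invented for illustration
ages = [17, 18, 18, 19, 19, 20, 20, 21, 22, 23, 24, 61]

# Quartiles from the standard library (default "exclusive" method)
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Conventional 1.5 * IQR fences (an assumed threshold, not from the slides)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [age for age in ages if age < lower_fence or age > upper_fence]
print(outliers)  # [61]
```

Whether a flagged value is an error or a genuine observation still has to be decided by looking at the data.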
Types of Outliers :
1. Univariate outliers.
2. Bivariate outliers.
● Univariate outliers are points that lie beyond the normal values
of a single variable.
● Bivariate outliers are points that lie far from the expected values
when two variables are plotted against each other.
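A bivariate outlier can look normal in each variable on its own. The sketch below fits a least-squares line to hypothetical (hours studied, marks) pairs and flags points whose residual is unusually large. The data, the helper `fit_line`, and the cutoff of two standard deviations of the residuals are all assumptions chosen for illustration.

```python
import statistics

# Hypothetical (hours studied, marks) pairs, invented for illustration.
# The pair (9, 20) has an ordinary value in each column taken alone.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
marks = [10, 20, 30, 40, 50, 60, 70, 80, 20, 100]

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(hours, marks)

# Residual = observed marks minus marks predicted by the fitted line
residuals = [y - (intercept + slope * x) for x, y in zip(hours, marks)]

# Assumed cutoff: two standard deviations of the residuals
cutoff = 2 * statistics.pstdev(residuals)

bivariate_outliers = [
    (x, y) for x, y, r in zip(hours, marks, residuals) if abs(r) > cutoff
]
print(bivariate_outliers)  # [(9, 20)]
```

Here 9 hours and 20 marks each fall inside the range of their own column, so a univariate check would miss the point; only the relationship between the two variables exposes it.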