CURE Project Deliverable 1 Sep 17
You must submit this deliverable on time to be able to submit the upcoming deliverables.
This is not a group project. Each student needs to submit his/her own report.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Research skills:
(i) Looking at data critically (Who collected the data? Why? What was the context? Are there missing
data? How useful is the dataset?)
(ii) Preparing data for developing machine learning algorithms with Python; graphical and numerical
diagnostics and analysis of data
Relevant course content: Descriptive Statistics (Ch 1)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
iloc[] method:
(b) Using the .iloc[] method, we can access any part of the dataframe. Run the following
commands and show the outputs:
emissions.head()
emissions.iloc[0,0]
emissions.iloc[1,1]
emissions.iloc[0:2,0:2]
emissions.iloc[2:4,:]
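To see what each of these selections returns, here is a minimal sketch. The emissions dataset itself is not reproduced in this handout, so the dataframe below is a hypothetical stand-in: its column names and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the emissions dataframe (names and values are made up).
emissions = pd.DataFrame(
    {
        "plant_a": [10, 20, 30, 40, 50],
        "plant_b": [11, 21, 31, 41, 51],
        "plant_c": [12, 22, 32, 42, 52],
    }
)

print(emissions.head())          # first five rows (here, the whole frame)
print(emissions.iloc[0, 0])      # row 0, column 0 -> 10
print(emissions.iloc[1, 1])      # row 1, column 1 -> 21
print(emissions.iloc[0:2, 0:2])  # rows 0-1 and columns 0-1 (end index is excluded)
print(emissions.iloc[2:4, :])    # rows 2-3, all columns
```

Note that .iloc selects by integer position, counting from 0, and that a slice such as 0:2 stops before position 2.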
(b) Use the pandas library in Python to generate a comparative boxplot of the emissions dataset.
Interpret the boxplot (max 50 words).
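One possible way to draw a comparative boxplot with pandas is sketched below. It assumes the emissions data sit in a dataframe with one numeric column per group; the column names, values, and axis label are hypothetical, and matplotlib is assumed to be installed because pandas uses it for plotting.

```python
import pandas as pd
import matplotlib.pyplot as plt  # pandas plotting relies on matplotlib

# Hypothetical stand-in for the emissions dataframe (names and values are made up).
emissions = pd.DataFrame(
    {
        "plant_a": [10, 20, 30, 40, 50],
        "plant_b": [11, 21, 31, 41, 90],
        "plant_c": [12, 22, 32, 42, 52],
    }
)

# DataFrame.boxplot() draws one box per numeric column on shared axes,
# which is what makes the plot "comparative".
ax = emissions.boxplot()
ax.set_ylabel("emissions (hypothetical units)")
```

In a Jupyter notebook the figure appears automatically; in a plain script, add plt.show() at the end to display it.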
(c) Use Excel to compute the statistics discussed in part (a) and draw the comparative boxplot
mentioned in part (b). For the summary statistics part, your screenshot may look like the
following.
(a) Define a variable ‘dataset1’ of type dataframe using the bivariate dataset I given above. Find
the summary statistics using dataset1.describe().
[Hint: You can create the dataframe using any of the techniques mentioned in Task A. Alternatively, load
the dataset ‘anscombe’ from seaborn as shown below, then define dataset1 using the .iloc method.]
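The .iloc step in the hint can be sketched as follows. To keep the example runnable without an internet connection, it uses a small hypothetical table laid out the same way as seaborn's ‘anscombe’ data (a group label column followed by x and y); the values and row counts here are made up and are not the real dataset.

```python
import pandas as pd

# Hypothetical long-format table: a group label plus x and y columns.
# With seaborn installed and internet access, sns.load_dataset("anscombe")
# returns a frame with the same three columns: dataset, x, y.
table = pd.DataFrame(
    {
        "dataset": ["I", "I", "I", "II", "II", "II"],
        "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
        "y": [2.0, 4.0, 6.0, 3.0, 3.0, 6.0],
    }
)

# In this stand-in, rows 0-2 belong to group I and columns 1-2 are x and y.
dataset1 = table.iloc[0:3, 1:3]

# describe() reports count, mean, std, min, quartiles, and max per column.
print(dataset1.describe())
```

With the real data, check which row positions belong to each group (for example with table.head(15)) before choosing the slice.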
(b) Define a variable ‘dataset2’ of type dataframe using the bivariate dataset II given above.
Find the summary statistics using dataset2.describe().
(c) Do you see any difference between the statistics that summarize the y variables in the two
datasets?
(d) Draw a diagram showing two scatterplots on the same axes to display dataset1 and
dataset2 above. Do you see any difference between the two datasets? Comment in fewer than 50
words.
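Drawing two scatterplots on the same axes could look like the sketch below. The two dataframes are hypothetical placeholders with made-up values, and the markers and labels are just one reasonable choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical placeholder data (values are made up for illustration).
dataset1 = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.1, 5.9, 8.2]})
dataset2 = pd.DataFrame({"x": [1, 2, 3, 4], "y": [3.0, 5.5, 6.0, 5.0]})

# One figure, one set of axes; each scatter() call adds a layer to the same axes.
fig, ax = plt.subplots()
ax.scatter(dataset1["x"], dataset1["y"], marker="o", label="dataset1")
ax.scatter(dataset2["x"], dataset2["y"], marker="x", label="dataset2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()  # the legend distinguishes the two datasets
```

Using different markers as well as the legend keeps the two datasets distinguishable even in a grayscale screenshot. In a plain script, add plt.show() at the end to display the figure.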
A list of some small standard datasets available in sklearn is given below, with the functions
to load them in Python.
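As an illustration of how such a loader function is used, the sketch below loads the iris dataset with load_iris. It assumes scikit-learn version 0.23 or later, where the as_frame=True option is available; other loaders in sklearn.datasets, such as load_wine, load_diabetes, and load_breast_cancer, follow the same pattern.

```python
from sklearn.datasets import load_iris

# as_frame=True asks the loader to return pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)

# .frame holds the feature columns together with the target column.
df = iris.frame
print(df.shape)   # (150, 5): 150 rows, 4 features plus the target
print(df.head())
```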
Note: You can take a screenshot directly and paste it as your answer.