Training Report On Data Analysis With Python
PYTHON
A Report submitted in partial fulfilment of the requirement for the award
of degree of
Bachelor of Technology
In
Electronics and Communication Engineering
Submitted by
ARYAN MISHRA
ACKNOWLEDGEMENT
CANDIDATE’S DECLARATION
Aryan Mishra
(02015002820)
Chapter 1: Introduction to Data Analysis
Data analysis is the science of analyzing raw data to draw conclusions about that information.
Many of its techniques and processes have been automated into mechanical processes and
algorithms that turn raw data into a form suited to human consumption.
Data analysis helps a business optimize its performance, operate more efficiently,
maximize profit, and make more strategically guided decisions.
Project Description-
In this project, I have visualized and made calculations from medical examination data using
matplotlib, seaborn, and pandas. The dataset values were collected during medical examinations.
Data description
Feature                                        Variable Type        Variable     Value Type
Age                                            Objective Feature    age          int (days)
Height                                         Objective Feature    height       int (cm)
Weight                                         Objective Feature    weight       float (kg)
Gender                                         Objective Feature    gender       categorical code
Systolic blood pressure                        Examination Feature  ap_hi        int
Diastolic blood pressure                       Examination Feature  ap_lo        int
Cholesterol                                    Examination Feature  cholesterol  1: normal, 2: above normal, 3: well above normal
Glucose                                        Examination Feature  gluc         1: normal, 2: above normal, 3: well above normal
Smoking                                        Subjective Feature   smoke        binary
Alcohol intake                                 Subjective Feature   alco         binary
Physical activity                              Subjective Feature   active       binary
Presence or absence of cardiovascular disease  Target Variable      cardio       binary
Tasks
Use the data to complete the following tasks in medical_data_visualizer.py:
Create a chart similar to examples/Figure_1.png that shows the counts of good and bad
outcomes for the cholesterol, gluc, alco, active, and smoke variables, with patients with
cardio=1 and cardio=0 in separate panels.
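The reshaping behind such a chart can be sketched roughly as follows. This is only an illustration, not the report's actual code: the sample rows are made up, the helper names to_long_counts and draw_cat_plot are hypothetical, and it assumes that cholesterol and gluc are recoded so that 1 becomes 0 (good) and any higher value becomes 1 (bad).

```python
import pandas as pd

FEATURES = ["cholesterol", "gluc", "smoke", "alco", "active"]

# Made-up sample rows that reuse the column names from the data description.
sample = pd.DataFrame({
    "cardio":      [0, 0, 1, 1, 1],
    "cholesterol": [1, 2, 3, 1, 2],
    "gluc":        [1, 1, 2, 1, 1],
    "smoke":       [0, 1, 0, 0, 1],
    "alco":        [0, 0, 0, 1, 0],
    "active":      [1, 1, 0, 1, 0],
})


def to_long_counts(df):
    """Return one row per (cardio, variable, value) with its count."""
    data = df.copy()
    # Assumption: 1 means "normal" (good -> 0), anything above is bad (-> 1).
    for col in ("cholesterol", "gluc"):
        data[col] = (data[col] > 1).astype(int)
    # Wide to long: one row per patient and feature.
    long_df = pd.melt(data, id_vars=["cardio"], value_vars=FEATURES)
    return (long_df.groupby(["cardio", "variable", "value"])
            .size()
            .reset_index(name="total"))


def draw_cat_plot(counts):
    """Draw one bar panel per cardio value (requires seaborn)."""
    # Imported lazily so the data helper above works without seaborn.
    import seaborn as sns
    return sns.catplot(data=counts, x="variable", y="total",
                       hue="value", col="cardio", kind="bar")


counts = to_long_counts(sample)
print(counts)
```

Calling draw_cat_plot(counts) would then produce the two panels, one for cardio=0 and one for cardio=1.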
Clean the data by filtering out the patient records in which:
o height is less than the 2.5th percentile (keep the correct data with
(df['height'] >= df['height'].quantile(0.025)))
o height is more than the 97.5th percentile
o weight is less than the 2.5th percentile
o weight is more than the 97.5th percentile
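The four percentile conditions above could be combined into a single boolean filter along these lines. This is a minimal sketch under stated assumptions: the function name drop_outliers and the sample values are hypothetical, and only the height and weight conditions listed above are applied.

```python
import pandas as pd


def drop_outliers(df, columns=("height", "weight"), lower=0.025, upper=0.975):
    """Keep rows whose values lie within [lower, upper] quantiles of each column."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        low, high = df[col].quantile(lower), df[col].quantile(upper)
        # Same pattern as df['height'] >= df['height'].quantile(0.025).
        keep &= (df[col] >= low) & (df[col] <= high)
    return df[keep]


# Made-up data: 100 patients with evenly spaced heights and weights.
sample = pd.DataFrame({
    "height": range(100, 200),
    "weight": [h - 60.0 for h in range(100, 200)],
})

cleaned = drop_outliers(sample)
print(len(sample), "->", len(cleaned))
```

Building one mask and indexing once keeps all percentiles computed on the original data, rather than on a frame that an earlier filter has already shortened.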
Create a correlation matrix using the dataset. Plot the correlation matrix using
seaborn's heatmap(). Mask the upper triangle. The chart should look
like examples/Figure_2.png.
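One way to build the correlation matrix and the upper-triangle mask is sketched here. The helper names masked_corr and draw_heat_map, the random sample data, and the styling arguments (annot, fmt, square) are assumptions made for illustration, not details taken from the project files.

```python
import numpy as np
import pandas as pd


def masked_corr(df):
    """Return the correlation matrix and a mask hiding its upper triangle."""
    corr = df.corr()
    # True marks cells to hide: the diagonal and everything above it.
    mask = np.triu(np.ones(corr.shape, dtype=bool))
    return corr, mask


def draw_heat_map(corr, mask):
    """Plot the lower triangle of the matrix (requires matplotlib and seaborn)."""
    # Imported lazily so masked_corr can be used without the plotting stack.
    import matplotlib.pyplot as plt
    import seaborn as sns
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".1f", square=True, ax=ax)
    return fig


# Made-up numeric data that reuses three column names from the dataset.
rng = np.random.default_rng(0)
sample = pd.DataFrame(rng.normal(size=(50, 3)),
                      columns=["height", "weight", "ap_hi"])

corr, mask = masked_corr(sample)
print(mask)
```

Because a correlation matrix is symmetric, masking the upper triangle removes only redundant cells.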
CATPLOT-
HEATPLOT-
PROJECT 2- Page View Time Series Visualizer
For this project I visualized time series data using a line chart, a bar chart, and box plots. I
used Pandas, Matplotlib, and Seaborn to visualize a dataset containing the number of page views
each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03. The visualizations
help reveal patterns in visits and identify yearly and monthly growth.
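The data preparation for the three chart types might look like the sketch below. It does not use the real forum data: the daily values are synthetic, the column name value and the helper names are hypothetical, and only the date range is taken from the description above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the page-view data ("value" is an assumed name).
dates = pd.date_range("2016-05-09", "2019-12-03", freq="D", name="date")
views = pd.DataFrame({"value": np.arange(len(dates))}, index=dates)


def monthly_means_by_year(df):
    """Average daily views per month: one row per year, one column per month."""
    data = df.assign(year=df.index.year, month=df.index.month)
    return data.pivot_table(index="year", columns="month",
                            values="value", aggfunc="mean")


def box_plot_frame(df):
    """Flatten the index and add year and month labels for box plots."""
    out = df.reset_index()
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.strftime("%b")
    return out


def draw_line_and_bar(df):
    """Line chart of daily views and bar chart of monthly means (needs matplotlib)."""
    # Imported lazily so the data helpers work without matplotlib.
    import matplotlib.pyplot as plt
    fig, (ax_line, ax_bar) = plt.subplots(2, 1, figsize=(12, 10))
    ax_line.plot(df.index, df["value"])
    ax_line.set_xlabel("Date")
    ax_line.set_ylabel("Page Views")
    monthly_means_by_year(df).plot(kind="bar", ax=ax_bar)
    ax_bar.set_xlabel("Years")
    ax_bar.set_ylabel("Average Page Views")
    return fig


table = monthly_means_by_year(views)
boxes = box_plot_frame(views)
print(table.round(1))
```

The frame returned by box_plot_frame could be passed to seaborn, for example sns.boxplot(data=boxes, x="year", y="value"), to draw the year-wise box plot.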
DATA DESCRIPTION:
LINE PLOT-
BAR PLOT-
BOX PLOT-
PROJECT 3- Sea Level Predictor
A dataset of the global average sea level change since 1880 was analyzed and used to predict
the sea level change through the year 2050.
TASK-
Use matplotlib to create a scatter plot using the Year column as the x-axis and the CSIRO
Adjusted Sea Level column as the y-axis.
Use the linregress function from scipy.stats to get the slope and y-intercept of the line of
best fit. Plot the line of best fit over the top of the scatter plot. Make the line go through
the year 2050 to predict the sea level rise in 2050.
Plot a new line of best fit just using the data from year 2000 through the most recent year
in the dataset. Make the line also go through the year 2050 to predict the sea level rise in
2050 if the rate of rise continues as it has since the year 2000.
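The three tasks above can be sketched as follows. The sea-level values here are synthetic (a made-up series whose slope doubles after 2000), and the helper names fit_and_extend and draw_sea_level are hypothetical. Only the use of linregress, the Year and CSIRO Adjusted Sea Level axes, and the extension to 2050 come from the task description.

```python
import numpy as np
from scipy.stats import linregress


def fit_and_extend(years, levels, start_year=None, end_year=2050):
    """Fit a line (optionally from start_year onward) and extend it to end_year."""
    years = np.asarray(years, dtype=float)
    levels = np.asarray(levels, dtype=float)
    if start_year is not None:
        recent = years >= start_year
        years, levels = years[recent], levels[recent]
    fit = linregress(years, levels)
    x = np.arange(years.min(), end_year + 1)
    return x, fit.slope * x + fit.intercept


def draw_sea_level(years, levels):
    """Scatter plot with both lines of best fit (requires matplotlib)."""
    # Imported lazily so fit_and_extend works without matplotlib.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(years, levels, s=10)
    ax.plot(*fit_and_extend(years, levels), label="all years")
    ax.plot(*fit_and_extend(years, levels, start_year=2000), label="2000 onward")
    ax.set_xlabel("Year")
    ax.set_ylabel("CSIRO Adjusted Sea Level")
    ax.legend()
    return fig


# Synthetic data: 0.05 units per year until 2000, then 0.10 units per year.
years = np.arange(1880, 2011)
levels = 0.05 * (years - 1880) + 0.05 * np.clip(years - 2000, 0, None)

x_all, y_all = fit_and_extend(years, levels)
x_recent, y_recent = fit_and_extend(years, levels, start_year=2000)
print("2050 estimate, all years:   ", round(y_all[-1], 2))
print("2050 estimate, 2000 onward: ", round(y_recent[-1], 2))
```

With data like this, where the rise accelerates, the fit restricted to recent years gives a higher 2050 estimate than the fit over the full record.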
RESULT-
BIBLIOGRAPHY
https://www.investopedia.com/terms/d/data-analytics.asp
https://makemeanalyst.com/data-science-with-python/python-libraries-for-data-analysis/
https://numpy.org/doc/stable/user/quickstart.html
https://www.w3schools.com/python/pandas/pandas_intro.asp
https://www.w3schools.com/python/matplotlib_pyplot.asp
https://www.freecodecamp.org/learn/data-analysis-with-python/