
Training Report On Data Analysis With Python

The document is a training report on data analysis using Python, submitted by Aryan Mishra as part of his Bachelor of Technology degree requirements. It covers the significance, types, and processes of data analysis, along with practical projects utilizing Python libraries such as NumPy, Pandas, and Matplotlib. The projects include a Medical Data Visualizer, Page View Time Series Visualizer, and Sea Level Predictor, demonstrating the application of data analysis techniques.

Uploaded by

rynme2k

TRAINING REPORT ON DATA ANALYSIS WITH PYTHON

A report submitted in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology
in
Electronics and Communication Engineering

Submitted by

ARYAN MISHRA

(ENROLL NO.- 02015002820)

MAHARAJA SURAJMAL INSTITUTE OF TECHNOLOGY

C-4, Janakpuri, New Delhi-58

Affiliated to Guru Gobind Singh Indraprastha University, Delhi


CONTENTS

Acknowledgement
Candidate’s Declaration
Chapter 1: Introduction to Data Analysis
1.1 Significance of Data Analysis
1.2 Types of Data Analysis
1.3 Data Analysis Process
Chapter 2: Data Analysis Using Python
2.1 Why use Python?
2.2 Python Libraries Used in the Course
2.2.1 NumPy
2.2.2 Pandas
2.2.3 Matplotlib
Chapter 3: Projects
3.1 Medical Data Visualizer
3.2 Page View Time Series Visualizer
3.3 Sea Level Predictor
Bibliography

ACKNOWLEDGEMENT

It is my pleasure to acknowledge the many people who directly or indirectly contributed to the development of this work and who influenced my thinking, behaviour, and actions during the course of study.

I express my sincere gratitude to Ms. Archana Balyan (HOD, Department of Electronics and Communication Engineering, Maharaja Surajmal Institute of Technology, New Delhi) for providing me with the opportunity to undergo training at freeCodeCamp. I am thankful for the support, cooperation, and motivation provided to me during the training, and for the constant inspiration, presence, and blessings.

Finally, I express my heartfelt gratitude to all the staff members of the ECE Department who helped me directly or indirectly during the course of this work.

CANDIDATE’S DECLARATION

I, Aryan Mishra, Roll No. 02015002820, B.Tech (6th Semester) of the Maharaja Surajmal Institute of Technology, New Delhi, hereby declare that the Training Report entitled “DATA ANALYSIS WITH PYTHON” is an original work and that the data provided in the study is authentic to the best of my knowledge. This report has not been submitted to any other institute for the award of any other degree.

Aryan Mishra
(02015002820)

Chapter 1: Introduction to Data Analysis

Data analysis is the science of analysing raw data to draw conclusions about that information. Many of the techniques and processes of data analysis have been automated into mechanical processes and algorithms that work over raw data for human consumption.

• Data analysis helps a business optimize its performance, perform more efficiently, maximize profit, or make more strategically guided decisions.

• Approaches to data analysis include looking at what happened (descriptive analysis), why something happened (diagnostic analysis), what is going to happen (predictive analysis), or what should be done next (prescriptive analysis).

• Data analysis relies on a variety of software tools, including spreadsheets, data visualization and reporting tools, data mining programs, and open-source languages, which offer the greatest flexibility for data manipulation.

PROJECT 1- MEDICAL DATA VISUALIZER

Project Description-
In this project, I visualized medical examination data and performed calculations on it using matplotlib, seaborn, and pandas. The dataset values were collected during medical examinations.

Data description

File name: medical_examination.csv

Feature                                        Variable Type        Variable     Value Type
Age                                            Objective Feature    age          int (days)
Height                                         Objective Feature    height       int (cm)
Weight                                         Objective Feature    weight       float (kg)
Gender                                         Objective Feature    gender       categorical code
Systolic blood pressure                        Examination Feature  ap_hi        int
Diastolic blood pressure                       Examination Feature  ap_lo        int
Cholesterol                                    Examination Feature  cholesterol  1: normal, 2: above normal, 3: well above normal
Glucose                                        Examination Feature  gluc         1: normal, 2: above normal, 3: well above normal
Smoking                                        Subjective Feature   smoke        binary
Alcohol intake                                 Subjective Feature   alco         binary
Physical activity                              Subjective Feature   active       binary
Presence or absence of cardiovascular disease  Target Variable      cardio       binary

Tasks

Create a chart similar to examples/Figure_1.png that shows the counts of good and bad outcomes for the cholesterol, gluc, alco, active, and smoke variables, with patients with cardio=1 and cardio=0 in separate panels. Use the data to complete the following tasks in medical_data_visualizer.py:

• Add an overweight column to the data. To determine if a person is overweight, first calculate their BMI by dividing their weight in kilograms by the square of their height in meters. If that value is > 25, then the person is overweight. Use the value 0 for NOT overweight and the value 1 for overweight.
• Normalize the data by making 0 always good and 1 always bad. If the value of cholesterol or gluc is 1, make the value 0. If the value is more than 1, make the value 1.
• Convert the data into long format and create a chart that shows the value counts of the categorical features using seaborn's catplot(). The dataset should be split by 'Cardio' so there is one chart for each cardio value. The chart should look like examples/Figure_1.png.
• Clean the data. Filter out the following patient segments that represent incorrect data:
    o diastolic pressure is higher than systolic (keep the correct data with (df['ap_lo'] <= df['ap_hi']))
    o height is less than the 2.5th percentile (keep the correct data with (df['height'] >= df['height'].quantile(0.025)))
    o height is more than the 97.5th percentile
    o weight is less than the 2.5th percentile
    o weight is more than the 97.5th percentile
• Create a correlation matrix using the dataset. Plot the correlation matrix using seaborn's heatmap(). Mask the upper triangle. The chart should look like examples/Figure_2.png.
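
The data-preparation tasks above (overweight column, normalization, and cleaning) can be sketched with pandas as follows. This is a minimal illustration rather than the project's actual code: the helper names add_overweight, normalize, and clean are hypothetical, and the three-row sample frame is made up for demonstration. The column names come from the data description table.

```python
import pandas as pd


def add_overweight(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an 'overweight' column (1 if BMI > 25, else 0)."""
    out = df.copy()
    # height is stored in centimetres, so convert to metres before squaring
    bmi = out["weight"] / (out["height"] / 100) ** 2
    out["overweight"] = (bmi > 25).astype(int)
    return out


def normalize(df: pd.DataFrame, columns=("cholesterol", "gluc")) -> pd.DataFrame:
    """Return a copy where 1 (normal) becomes 0 and anything above 1 becomes 1."""
    out = df.copy()
    for col in columns:
        out[col] = (out[col] > 1).astype(int)
    return out


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that pass the blood-pressure and percentile filters."""
    keep = (
        (df["ap_lo"] <= df["ap_hi"])
        & (df["height"] >= df["height"].quantile(0.025))
        & (df["height"] <= df["height"].quantile(0.975))
        & (df["weight"] >= df["weight"].quantile(0.025))
        & (df["weight"] <= df["weight"].quantile(0.975))
    )
    return df[keep]


# Made-up sample rows, for illustration only (not taken from the dataset)
sample = pd.DataFrame(
    {
        "height": [170, 160, 180],
        "weight": [80.0, 50.0, 90.0],
        "cholesterol": [1, 2, 3],
        "gluc": [1, 1, 2],
        "ap_hi": [120, 110, 80],
        "ap_lo": [80, 70, 120],
    }
)
prepared = normalize(add_overweight(sample))
print(prepared[["overweight", "cholesterol", "gluc"]])
```

Each helper returns a copy so the original frame is left untouched, which makes it easier to use the uncleaned data for one chart and the cleaned data for another.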

CATPLOT-
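
One possible way to produce a chart of this kind is sketched below, assuming the data has already been normalized to 0/1 values. The function names to_long_counts and draw_cat_plot are hypothetical, the sample frame is invented, and the choice of a bar-type catplot with one panel per cardio value is an assumption about how the figure was drawn.

```python
import pandas as pd

# Variables named in the task description
CATEGORICAL = ["cholesterol", "gluc", "smoke", "alco", "active"]


def to_long_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Melt to long format and count each (cardio, variable, value) group."""
    long_df = pd.melt(df, id_vars=["cardio"], value_vars=CATEGORICAL)
    return (
        long_df.groupby(["cardio", "variable", "value"])
        .size()
        .reset_index(name="total")
    )


def draw_cat_plot(df: pd.DataFrame):
    """Draw one bar panel per cardio value and return the seaborn FacetGrid."""
    # Imported here so the counting helper works without the plotting stack
    import seaborn as sns

    counts = to_long_counts(df)
    return sns.catplot(
        data=counts, x="variable", y="total", hue="value", col="cardio", kind="bar"
    )


# Made-up, already-normalized rows for illustration only
sample = pd.DataFrame(
    {
        "cardio": [0, 0, 1],
        "cholesterol": [0, 0, 1],
        "gluc": [0, 1, 1],
        "smoke": [0, 0, 0],
        "alco": [0, 0, 1],
        "active": [1, 1, 0],
    }
)
counts = to_long_counts(sample)
print(counts)
```

Counting before plotting keeps the chart input small and makes the numbers behind each bar easy to inspect.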

HEATPLOT-
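
A heat map with the upper triangle masked could be produced along the following lines. This is a sketch under assumptions: the function names masked_correlation and draw_heat_map are hypothetical, the demo frame is invented, and the annotation and figure-size settings are illustrative choices rather than the settings used for the report's figure.

```python
import numpy as np
import pandas as pd


def masked_correlation(df: pd.DataFrame):
    """Return the correlation matrix and a boolean mask of its upper triangle."""
    corr = df.corr()
    # True marks cells to hide: the diagonal and everything above it
    mask = np.triu(np.ones(corr.shape, dtype=bool))
    return corr, mask


def draw_heat_map(df: pd.DataFrame):
    """Plot the masked correlation matrix and return the matplotlib figure."""
    # Imported here so the correlation helper works without the plotting stack
    import matplotlib.pyplot as plt
    import seaborn as sns

    corr, mask = masked_correlation(df)
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".1f", square=True, ax=ax)
    return fig


# Made-up numeric columns for illustration only
demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})
corr, mask = masked_correlation(demo)
print(corr.round(2))
```

Because a correlation matrix is symmetric, hiding the upper triangle removes duplicate cells without losing information.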

PROJECT 2- Page View Time Series Visualizer
For this project, I visualized time series data using a line chart, a bar chart, and box plots. I used Pandas, Matplotlib, and Seaborn to visualize a dataset containing the number of page views each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03. The visualizations help to understand the patterns in visits and to identify yearly and monthly growth.
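
As a rough illustration of how monthly growth can be summarized before drawing a bar chart, the sketch below averages daily page views per month and year. It assumes a frame indexed by date with a single column named value; that column name, the helper name monthly_averages, and the four sample rows are assumptions made for this example, not details taken from the dataset.

```python
import pandas as pd


def monthly_averages(df: pd.DataFrame) -> pd.DataFrame:
    """Average daily page views per month, with years as rows and months as columns."""
    years = df.index.year.rename("year")
    months = df.index.month.rename("month")
    return df.groupby([years, months])["value"].mean().unstack()


# Made-up daily counts for illustration only
views = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.to_datetime(["2016-05-09", "2016-05-10", "2016-06-01", "2017-05-01"]),
)
table = monthly_averages(views)
print(table)
```

A table in this shape can be passed to the DataFrame plot method with kind="bar" to get one group of bars per year; months with no data appear as NaN and are simply not drawn.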

DATA DESCRIPTION:

LINE PLOT-


BAR PLOT-
BOX PLOT-

PROJECT 3- Sea Level Predictor
The dataset of global average sea level change since 1880 was analyzed, and the data was used to predict the sea level change through the year 2050.

TASK-

• Use matplotlib to create a scatter plot using the Year column as the x-axis and the CSIRO Adjusted Sea Level column as the y-axis.
• Use the linregress function from scipy.stats to get the slope and y-intercept of the line of best fit. Plot the line of best fit over the top of the scatter plot. Make the line go through the year 2050 to predict the sea level rise in 2050.
• Plot a new line of best fit just using the data from year 2000 through the most recent year in the dataset. Make the line also go through the year 2050 to predict the sea level rise in 2050 if the rate of rise continues as it has since the year 2000.
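
The fitting step in the tasks above can be sketched with linregress from scipy.stats. The helper name predict_level and the synthetic, perfectly linear data are assumptions for illustration; the real dataset is not reproduced here, so the numbers below say nothing about actual sea levels.

```python
import numpy as np
from scipy.stats import linregress


def predict_level(years, levels, target_year=2050, since=None):
    """Fit a line of best fit and evaluate it at target_year.

    If 'since' is given, only data from that year onward is used for the fit.
    """
    years = np.asarray(years, dtype=float)
    levels = np.asarray(levels, dtype=float)
    if since is not None:
        keep = years >= since
        years, levels = years[keep], levels[keep]
    fit = linregress(years, levels)
    return fit.slope * target_year + fit.intercept


# Synthetic data: a made-up constant rise of 0.05 units per year from 1880
demo_years = np.arange(1880, 2014)
demo_levels = 0.05 * (demo_years - 1880)

full_fit_2050 = predict_level(demo_years, demo_levels)
recent_fit_2050 = predict_level(demo_years, demo_levels, since=2000)
print(full_fit_2050, recent_fit_2050)
```

To draw either line through 2050, the same slope and intercept can be evaluated over a range of years that extends to 2050 and passed to matplotlib's plot function on top of the scatter plot.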

RESULT-

BIBLIOGRAPHY

• https://www.investopedia.com/terms/d/data-analytics.asp

• https://makemeanalyst.com/data-science-with-python/python-libraries-for-data-analysis/

• https://numpy.org/doc/stable/user/quickstart.html

• https://www.w3schools.com/python/pandas/pandas_intro.asp#:~:text=Pandas%20is%20a%20Python%20library,by%20Wes%20McKinney%20in%202008.

• https://www.w3schools.com/python/matplotlib_pyplot.asp

• https://www.freecodecamp.org/learn/data-analysis-with-python/
