(CS-103L- Introduction to Programming for Data Science)

Report # Open Ended Lab

Analysis of a Biomedical Dataset CSV File



(Spring-2024)

Submitted By
Hassan Mukhtiar
(2023-BME-5)

Submitted To

Mr. Farhan Yousaf


Mr. Ali Noman

Department of Biomedical Engineering,


University of Engineering and Technology, Lahore,
New Campus
Objective:
❖ To learn how to store and retrieve diverse biomedical data.
❖ To learn how to use streamlined data analysis for pattern discovery.
❖ To learn how to examine, analyze, share and interpret biomedical data.
❖ To understand how a biomedical CSV file stores structured data related to biomedical
research, healthcare or clinical studies.

Database:
A database is an organized collection of structured data that makes the data easy to access, manage and
update. In simple words, a database is a place where data is stored. The best analogy is a library: the
library holds a huge collection of books of different genres, so the library is the database and the
books are the data. [1]
Example:
Examples of databases include those behind grocery stores, banks, e-commerce platforms,
healthcare systems and social media platforms.

Biomedical Database:
Biomedical databases store and maintain biomedical data such as gene and protein sequences. Named
entity recognition (NER) is used extensively on biomedical data for gene identification, DNA
identification, and the identification of drug names and disease names. These experiments use
conditional random fields (CRFs) with features engineered for their domain data. [2]
Example: [3]
❖ Generic gene expression databases
❖ Nucleosome positioning region database
❖ Protein structure database
Biomedical Datasets:
Healthcare data sets include a vast amount of medical data, various measurements, financial data,
statistical data, demographics of specific populations, and insurance data, to name just a few, gathered
from various healthcare data sources. The following examples illustrate data sets used in the healthcare industry.
Example: [4]
❖ The Uniform Hospital Discharge Data Set (UHDDS)
❖ The Human Mortality Database (HMD)
❖ HealthData.gov
❖ SEER cancer incidence
❖ BROAD Institute Cancer Program Datasets
❖ Chronic Disease Data

Analyze Biomedical Datasets CSV File

Introduction to the CSV file:

The CSV file used in this report contains insurance data that records whether each person is a smoker.
It was downloaded from the Kaggle website and saved on a laptop, and different tasks were then
performed on it. A screenshot of the file is shown below, and the tasks are described step by step.

Figure 1-Download CSV (Excel) file.

Read CSV file:

The statement f = pd.read_csv('OEL.csv') reads the data from a CSV file named 'OEL.csv' into a pandas
DataFrame and assigns it to the variable 'f'. This allows us to analyze the data using the pandas
library in Python.
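
A minimal sketch of this step is given below. Because 'OEL.csv' itself is not reproduced in this report, a few made-up rows stand in for it and are read from a string. The column names (age, sex, smoker, region, charges) are assumed from the later sections, and the values are illustrative only.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""

# With the real file on disk this line would be: f = pd.read_csv('OEL.csv')
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(type(f).__name__)  # DataFrame
print(f.shape)           # (8, 5): 8 rows and 5 columns
```

read_csv accepts a file path as well as a file-like object, so the same call works unchanged once the string buffer is replaced by the file name.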

Figure 2-Read CSV file.

Info of CSV file:

The call f.info() prints a summary of the DataFrame 'f', including the number of entries, the column
names, the non-null counts and the data types. Note that info() only prints this summary and returns
None, so in the expression info = f.info() the variable 'info' holds None rather than the summary itself.
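
The sketch below illustrates this behaviour on a few made-up rows that stand in for 'OEL.csv' (column names assumed from this report, values illustrative). It uses the buf parameter of info() to capture the printed summary as text, which is one way to keep the summary in a variable.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

buffer = io.StringIO()
result = f.info(buf=buffer)       # write the summary into the buffer, not the screen
summary_text = buffer.getvalue()  # the summary as an ordinary string

print(summary_text)
print(result is None)  # True: info() itself returns nothing
```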

Figure 3-Info of CSV file.

Columns of CSV file:

The line columns = f.columns in Python returns the column names (labels) of the DataFrame 'f' and
assigns them to the variable 'columns'. This gives easy access to the names of the columns within
the DataFrame, which helps further data manipulation or analysis based on column names.
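
As a small illustration, the sketch below reads a few made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative) and shows that the returned labels can be listed and then used to select a column.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

columns = f.columns            # an Index object holding the column labels
print(list(columns))           # ['age', 'sex', 'smoker', 'region', 'charges']

first_column = f[columns[0]]   # a label can be used to select its column
print(first_column.tolist())   # [19, 23, 31, 37, 45, 52, 58, 64]
```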

Figure 4-Print column names of CSV file.

Head of CSV file:


f.head() shows the first few rows (5 by default) of the DataFrame 'f', providing an overview of its
structure and content. This method is often used to inspect the data and understand its format before
performing further analysis or operations.

Figure 5-Head of CSV file

Tail of CSV file:

The tail() method of a pandas DataFrame, when called as f.tail(), returns and displays the last few rows
(5 by default) of the DataFrame 'f'.
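
The head() and tail() calls can be tried on a small example. The sketch below uses a few made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative) and also passes an explicit row count, which both methods accept.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

first_rows = f.head()    # first 5 rows by default
last_rows = f.tail(3)    # an explicit count overrides the default

print(first_rows)
print(last_rows)
```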

Figure 6-Tail of CSV file

Describe CSV file:


The describe() method of a pandas DataFrame, when called as f.describe(), generates summary statistics
for the numerical columns in the DataFrame 'f'. These consist of measures such as count, mean,
standard deviation, minimum, quartiles and maximum, providing a comprehensive summary of the
distribution of the numerical data within the DataFrame.
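
The sketch below shows describe() on a few made-up rows standing in for 'OEL.csv' (column names assumed from this report, values illustrative). Only the numerical columns, here age and charges, appear in the result.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

summary = f.describe()   # one column of statistics per numerical column
print(summary)

# Individual statistics can be read out by row and column label.
print(summary.loc['mean', 'charges'])  # 12250.0
print(summary.loc['max', 'age'])       # 64.0
```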

Figure 7-Describe all factors of CSV file.

f['region'].value_counts():

The expression f['region'].value_counts() in pandas counts the occurrences of each unique value in the
'region' column of the DataFrame 'f'. It returns a Series whose index holds each unique value in the
'region' column, and whose values indicate how many times each value appears in the
column.
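
A short sketch of this count follows, again on made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative).

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

region_counts = f['region'].value_counts()  # sorted from most to least frequent
print(region_counts)

print(region_counts['southeast'])  # 3 rows of the sample are from the southeast
```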

Figure 8-Count of each unique region name in CSV file.

Graph analysis CSV file:

As age increases, charges show an increasing trend, which suggests that older individuals tend to
have higher medical expenses. The markers on the line represent individual data points, showing the
specific charges associated with each age.
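
The exact plotting code behind the figure is not reproduced in the text, so the following is only one possible sketch. It assumes matplotlib is used, draws one line per smoking status, and works on made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative).

```python
import io
import matplotlib
matplotlib.use("Agg")  # draw off-screen so that no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV)).sort_values('age')

fig, ax = plt.subplots()
for status, group in f.groupby('smoker'):   # one line for 'no', one for 'yes'
    ax.plot(group['age'], group['charges'], marker='o', label=f"smoker={status}")
ax.set_xlabel('age')
ax.set_ylabel('charges')
ax.legend()

image = io.BytesIO()
fig.savefig(image, format='png')  # a file name could be given here instead
```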

Figure 9-Plot of the graph between age, charges and smoking.

Males in CSV file:

The following program loops over the 'sex' column. Whenever an entry i equals 'male', it adds 1 to an
integer counter, and after the last entry it prints the total number of males in the CSV file.
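
A sketch of such a counting loop is given below on made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative). The variable names are chosen for illustration, and a one-line pandas comparison is added as a cross-check.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

males = 0
for i in f['sex']:        # visit every entry of the 'sex' column
    if i == 'male':
        males += 1        # count one more male
print(males)              # 4 in this sample

# The same count without a loop: True values are summed as 1.
males_check = int((f['sex'] == 'male').sum())
print(males_check)        # 4
```

Replacing 'male' with 'female' in the comparison gives the number of females in the same way.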

Figure 10-Number of males in CSV file.



Females in CSV file:

The following program loops over the 'sex' column. Whenever an entry i equals 'female', it adds 1 to an
integer counter, and after the last entry it prints the total number of females in the CSV file.

Figure 11-Number of females in CSV file.

Females smoker in CSV file:

In the given program, we count the females in the file who are also smokers. When both
conditions are fulfilled, the row is counted in a variable, and this variable is printed to show the
number of females who smoke, as given below.
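
One way to express the two conditions is sketched below on made-up rows in place of 'OEL.csv' (column names and the 'yes'/'no' smoker labels are assumed, values illustrative). Each comparison is wrapped in parentheses because & binds more tightly than == in Python.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# True only for rows where BOTH conditions hold.
is_female_smoker = (f['sex'] == 'female') & (f['smoker'] == 'yes')

female_smokers = int(is_female_smoker.sum())
print(female_smokers)          # 2 in this sample
print(f[is_female_smoker])     # the matching rows themselves
```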

Figure 12-Number of females who are smoking.

Males smoker in CSV file:

In the given program, we count the males in the file who are also smokers. When both
conditions are fulfilled, the row is counted in a variable, and this variable is printed to show the
number of males who smoke, as given below.
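
As an alternative to counting each combination separately, pandas can tabulate sex against smoking status in one call. The sketch below uses pd.crosstab, which is not the method shown in the figure, on made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative).

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Rows are the values of 'sex', columns are the values of 'smoker'.
table = pd.crosstab(f['sex'], f['smoker'])
print(table)

male_smokers = int(table.loc['male', 'yes'])
print(male_smokers)  # 1 in this sample
```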

Figure 13-Number of males who are smoking.

Total number of Smokers and Nonsmokers:



After reading the CSV file, the following program iterates over all rows of the 'smoker' column. An
entry of 'yes' is counted as a smoker and an entry of 'no' is counted as a nonsmoker, and at the end the
program prints the total number of smokers and nonsmokers.
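
A sketch of this loop is shown below on made-up rows in place of 'OEL.csv'. The column names and the 'yes'/'no' labels are assumptions, the values are illustrative, and the counter names are chosen for the example.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

smokers = 0
nonsmokers = 0
for status in f['smoker']:    # visit every entry of the 'smoker' column
    if status == 'yes':
        smokers += 1
    elif status == 'no':
        nonsmokers += 1

print("Smokers:", smokers)         # Smokers: 3
print("Nonsmokers:", nonsmokers)   # Nonsmokers: 5
```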

Figure 14-Total number of smoker and non-smoker in CSV file.

Total number of Smokers in different regions:

The following program shows how many smokers are present in each region. The counts differ between
the regions, which are southeast, northeast, northwest and southwest, as shown below.
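
One possible way to obtain these counts is to keep only the smoker rows and then count their regions. The sketch below does this on made-up rows in place of 'OEL.csv' (column names and labels assumed, values illustrative). In this small sample two regions have no smokers, so they do not appear in the result.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Select the 'region' entries of smoker rows only, then count each region.
smoker_regions = f.loc[f['smoker'] == 'yes', 'region']
smokers_by_region = smoker_regions.value_counts()
print(smokers_by_region)
```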

Figure 15-Number of smokers in different region.

Sex and charges & smoker and charges relationship:

• The following program shows the relationship between average charges and sex.
• In this CSV file the average charge for females is 12569.578844 and the average charge for males is
13956.751178.
• The program also shows the relationship between smoking status and charges.
• The average charges of nonsmokers are much lower, while the average charges of smokers are
very high.
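
Averages of this kind are commonly computed with groupby. The sketch below assumes that approach and reuses the names Average_charges_sex and Average_charges_smoker from this report. It runs on made-up rows in place of 'OEL.csv', so its numbers differ from the averages quoted above.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Split the rows by group, then average the 'charges' of each group.
Average_charges_sex = f.groupby('sex')['charges'].mean()
Average_charges_smoker = f.groupby('smoker')['charges'].mean()

print(Average_charges_sex)      # female 12000.0, male 12500.0
print(Average_charges_smoker)   # no 7600.0, yes 20000.0
```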

Figure 16-Relationship Charges with Sex and smoker.

Average_charges_sex.describe():
• The describe() method, when called as Average_charges_sex.describe(), generates summary
statistics for the numerical values in Average_charges_sex.
• These consist of measures such as count, mean, standard deviation, minimum, quartiles and
maximum, providing a comprehensive summary of the distribution of those values.
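
The sketch below assumes that Average_charges_sex is a Series of group means produced by groupby, which the text does not state. On made-up rows in place of 'OEL.csv' it shows that describe() then summarizes the two averages only, and it contrasts this with calling describe() on the grouped charges to get statistics per group.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumed definition: one average charge per sex (two values in total).
Average_charges_sex = f.groupby('sex')['charges'].mean()

summary_of_means = Average_charges_sex.describe()
print(summary_of_means)   # count is 2.0: only the two averages are summarized

# Statistics of the original charges, computed separately for each sex.
per_group = f.groupby('sex')['charges'].describe()
print(per_group)          # count is 4.0 for each sex in this sample
```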

Figure 17-Describe detail of Average charges with sex.

Average_charges_smoker.describe():

• The describe() method, when called as Average_charges_smoker.describe(), generates summary
statistics for the numerical values in Average_charges_smoker.
• These consist of measures such as count, mean, standard deviation, minimum, quartiles and
maximum, providing a comprehensive summary of the distribution of those values.

Figure 18-Describe detail of Average charges with smoker.

Line Graph between charges and age:

• This line graph shows the relationship between age and medical charges in the
OEL/insurance dataset. As age increases, charges show an increasing trend,
which suggests that older individuals tend to have higher medical expenses.
• The markers on the line represent individual data points, showing the specific charges
associated with each age.
• The line's upward slope suggests a positive correlation between age and medical charges,
indicating that age is an important factor influencing healthcare costs.
• This highlights the importance of age as a driver of increasing medical expenses.
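
A line graph of this kind could be drawn as sketched below. The sketch assumes matplotlib and averages the charges per age before plotting, which may differ from the code behind the figure. It uses made-up rows in place of 'OEL.csv' (column names assumed from this report, values illustrative).

```python
import io
import matplotlib
matplotlib.use("Agg")  # draw off-screen so that no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# One point per age: the mean charge of all rows with that age, sorted by age.
mean_by_age = f.groupby('age')['charges'].mean()

fig, ax = plt.subplots()
ax.plot(mean_by_age.index, mean_by_age.values, marker='o')
ax.set_xlabel('age')
ax.set_ylabel('charges')
ax.set_title('Charges by age')

image = io.BytesIO()
fig.savefig(image, format='png')  # a file name could be given here instead
```

Averaging first keeps the line readable when several people share the same age, because each age then contributes a single point.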

Figure 19-Plot line graph between Age and charges



Conclusion:
We learnt how to handle CSV files and perform different operations on them. Handling the
biomedical dataset CSV file in Python involved reading the data using pandas, exploring its structure
and content using methods like head(), tail() and describe(), and conducting various analyses such as
statistical summaries, visualization or machine learning modeling. To examine relationships between
different columns in the file, such as sex and charges or smoking and charges, we could use
correlation analysis to identify any linear relationships between pairs of columns.
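
As a sketch of the correlation analysis mentioned above, the code below computes Pearson correlations with the corr() method of a pandas Series. The smoker column is first encoded as 0 and 1, which is an illustrative choice. The rows are made up in place of 'OEL.csv', so the resulting numbers say nothing about the real dataset.

```python
import io
import pandas as pd

# Made-up stand-in for 'OEL.csv' (illustrative values, not the Kaggle data).
SAMPLE_CSV = """age,sex,smoker,region,charges
19,female,yes,southwest,16000.0
23,male,no,southeast,2000.0
31,female,no,northwest,4000.0
37,male,yes,southeast,24000.0
45,female,no,northeast,8000.0
52,male,no,northwest,10000.0
58,female,yes,southeast,20000.0
64,male,no,northeast,14000.0
"""
f = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Pearson correlation between two numerical columns (a value from -1 to 1).
age_charges_corr = f['age'].corr(f['charges'])

# Text columns must be turned into numbers first: 1 for 'yes', 0 otherwise.
smoker_flag = (f['smoker'] == 'yes').astype(int)
smoker_charges_corr = smoker_flag.corr(f['charges'])

print(round(age_charges_corr, 3))
print(round(smoker_charges_corr, 3))
```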

References:

[1] https://www.edureka.co/blog/what-is-a-database/#Database
[2] https://www.sciencedirect.com/topics/computer-science/biomedical-data
[3] https://en.wikipedia.org/wiki/List_of_biological_databases
[4] https://www.kaggle.com/datasets?search=biomedical+datasets
