Case Study Data Science

The document outlines a case study on diabetes prevention, detailing a six-step process to predict diabetes occurrences using patient medical history data. It emphasizes data collection, cleaning, analysis, and the implementation of a decision tree model for prediction. Additionally, it highlights the skills required to become a successful data scientist and the growing demand for data science professionals.

Uploaded by

2bpcskygcx

Now, I will take a case study to explain to you the various phases described above.

Case Study: Diabetes Prevention

What if we could predict the occurrence of diabetes and take appropriate measures
beforehand to prevent it?

In this use case, we will predict the occurrence of diabetes using the entire lifecycle we
discussed earlier. Let’s go through the various steps.

Step 1:

● First, we will collect the data based on the medical history of the patient as discussed
in Phase 1. You can refer to the sample data below.

● As you can see, we have the various attributes as mentioned below.

Attributes:

1. npreg – Number of times pregnant
2. glucose – Plasma glucose concentration
3. bp – Blood pressure
4. skin – Triceps skinfold thickness
5. bmi – Body mass index
6. ped – Diabetes pedigree function
7. age – Age
8. income – Income
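
To make the attribute list concrete, here is a minimal sketch of how one patient row could be represented in Python. The class name PatientRecord, the field types and the sample values are assumptions made for illustration; they do not come from the original data set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    """One row of the patient table, one field per attribute (hypothetical type)."""
    npreg: int                      # number of times pregnant
    glucose: float                  # plasma glucose concentration
    bp: float                       # blood pressure
    skin: float                     # triceps skinfold thickness
    bmi: float                      # body mass index
    ped: float                      # diabetes pedigree function
    age: int                        # age
    income: Optional[float] = None  # income; allowed to be missing


# Made-up values, for illustration only.
patient = PatientRecord(npreg=2, glucose=120.0, bp=70.0, skin=30.0,
                        bmi=32.5, ped=0.45, age=35)
print(patient)
```

Giving every attribute an explicit type makes format problems, such as a count written in words, easier to spot when the data is loaded.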

Step 2:

● Once we have the data, we need to clean and prepare it for analysis.
● This data has a lot of inconsistencies, such as missing values, blank columns, implausible
values and incorrect data formats, which need to be cleaned.
● Here, we have organized the data into a single table under different attributes, making it
look more structured.
● Let’s have a look at the sample data below.

The inconsistencies in this data are:

1. In the column npreg, “one” is written in words, whereas it should be in numeric
form, that is, 1.
2. In the column bp, one of the values is 6600, which is impossible (at least for humans),
as blood pressure cannot reach such a huge value.
3. As you can see, the income column is blank and, being empty, contributes nothing to
predicting diabetes. It is therefore redundant and should be removed from the
table.

● So, we will clean and preprocess this data by removing the outliers, filling in the null
values and normalizing the data types. If you remember, this is our second phase,
data preprocessing.
● Finally, we get the clean data shown below, which can be used for analysis.
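
The cleaning steps above can be sketched in plain Python as follows. The rows, the WORD_TO_NUMBER mapping, the blood pressure bounds and the choice of median imputation are all assumptions made for this example; in practice a library such as pandas would usually be used, and dropping an implausible row is an equally valid alternative to re-filling it.

```python
from statistics import median

# Made-up raw rows showing the problems listed above (not the article's data).
raw_rows = [
    {"npreg": "6",   "glucose": "148", "bp": "72",   "income": ""},
    {"npreg": "one", "glucose": "85",  "bp": "66",   "income": ""},
    {"npreg": "8",   "glucose": "",    "bp": "6600", "income": ""},
    {"npreg": "1",   "glucose": "89",  "bp": "64",   "income": ""},
]

WORD_TO_NUMBER = {"one": 1}   # assumed mapping; extend as needed
BP_PLAUSIBLE = (30, 250)      # assumed sanity bounds, not clinical guidance


def to_number(text):
    """Convert a cell to a float; return None for blanks or unparseable text."""
    text = text.strip().lower()
    if text in WORD_TO_NUMBER:
        return float(WORD_TO_NUMBER[text])
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows):
    # 1. Drop the blank income column and normalize every cell to a number.
    cleaned = [{k: to_number(v) for k, v in row.items() if k != "income"}
               for row in rows]
    # 2. Treat implausible blood pressure values as missing.
    low, high = BP_PLAUSIBLE
    for row in cleaned:
        if row["bp"] is not None and not low <= row["bp"] <= high:
            row["bp"] = None
    # 3. Fill missing values with the median of the observed values in the column.
    for column in cleaned[0]:
        observed = [row[column] for row in cleaned if row[column] is not None]
        fill = median(observed)
        for row in cleaned:
            if row[column] is None:
                row[column] = fill
    return cleaned


clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```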

Step 3:

Now let’s do some analysis as discussed earlier in Phase 3.

● First, we will load the data into the analytical sandbox and apply various statistical
functions to it. For example, R has functions like describe (from the Hmisc package),
which reports the number of missing values and distinct values. We can also use the
summary function, which gives statistical information such as the mean, median,
quartiles, and min and max values.
● Then, we use visualization techniques like histograms, line graphs and box plots to get a
fair idea of the distribution of the data.
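
The text refers to R functions; as a rough Python analogue, the sketch below computes the same kind of information with the standard library. The function names describe, summarize and histogram are defined here for illustration only (they are not the R functions), and the glucose readings are made up.

```python
from collections import Counter
from statistics import mean, median

# Made-up glucose readings; None marks a missing value.
glucose = [148, 85, None, 89, 137, 116, 85, None, 197, 125]


def describe(values):
    """Count observed, missing and distinct values."""
    observed = [v for v in values if v is not None]
    return {"n": len(observed),
            "missing": len(values) - len(observed),
            "distinct": len(set(observed))}


def summarize(values):
    """Basic location statistics over the observed values."""
    observed = [v for v in values if v is not None]
    return {"min": min(observed), "median": median(observed),
            "mean": mean(observed), "max": max(observed)}


def histogram(values, bin_width):
    """Count observed values per bin, keyed by the lower edge of each bin."""
    observed = [v for v in values if v is not None]
    counts = Counter((v // bin_width) * bin_width for v in observed)
    return dict(sorted(counts.items()))


print(describe(glucose))
print(summarize(glucose))
# A simple text histogram gives a first look at the distribution.
for start, count in histogram(glucose, 50).items():
    print(f"{start:>3}-{start + 49:<3} {'#' * count}")
```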

Step 4:

Now, based on the insights derived from the previous step, a decision tree is a good fit for
this kind of problem. Let’s see why.

● Since we already have the major attributes for analysis, such as npreg and bmi, along
with a known outcome for each patient, we will use a supervised learning technique to
build the model.
● Further, we have chosen a decision tree in particular because it takes all attributes into
consideration in one go, both those that have a linear relationship and those that have
a non-linear relationship. In our case, there is a linear relationship between npreg and
age, and a non-linear relationship between npreg and ped.
● Decision tree models are also very robust, as we can use different combinations of
attributes to build various trees and then implement the one that performs
best.
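
To illustrate how a tree-building algorithm can decide which attribute to test first, here is a minimal sketch that searches for the single split with the lowest weighted Gini impurity. Gini impurity is one common criterion and is an assumption here; the article does not say which criterion was used. The patient rows are made up.

```python
def gini(labels):
    """Gini impurity of a list of 'pos'/'neg' labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p_pos = labels.count("pos") / len(labels)
    return 1.0 - p_pos ** 2 - (1.0 - p_pos) ** 2


def best_split(rows, features):
    """Return (feature, threshold, weighted impurity) of the best single split."""
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lower, upper in zip(values, values[1:]):
            threshold = (lower + upper) / 2
            left = [r["label"] for r in rows if r[feature] <= threshold]
            right = [r["label"] for r in rows if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Made-up labeled patients (assumption: not the article's data).
patients = [
    {"glucose": 85,  "bmi": 26.6, "label": "neg"},
    {"glucose": 89,  "bmi": 28.1, "label": "neg"},
    {"glucose": 116, "bmi": 25.6, "label": "neg"},
    {"glucose": 137, "bmi": 43.1, "label": "pos"},
    {"glucose": 148, "bmi": 33.6, "label": "pos"},
    {"glucose": 183, "bmi": 23.3, "label": "pos"},
]

feature, threshold, impurity = best_split(patients, ["glucose", "bmi"])
print(feature, threshold, impurity)
```

In this toy data a glucose threshold separates the two classes perfectly, so glucose is selected. A full tree repeats the same search on each resulting subset.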

Let’s have a look at our decision tree.

Here, the most important parameter is the level of glucose, so it is our root node. At each
node, the value of the current attribute determines which attribute is tested next. This goes
on until we reach a result of pos or neg. Pos means the patient is predicted to have a
tendency toward diabetes, and neg means the patient is not.
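
The walk from the root node down to a pos or neg result can be sketched as below. The tree structure and every threshold are invented for illustration and are not the tree from the article; only the idea of glucose as the root node comes from the text.

```python
# Hypothetical tree: thresholds and structure are invented for illustration.
# An internal node is a dict; a leaf is the string "pos" or "neg".
tree = {
    "feature": "glucose", "threshold": 127.0,
    "left": {                                  # glucose <= 127.0
        "feature": "bmi", "threshold": 30.0,
        "left": "neg",
        "right": {"feature": "age", "threshold": 40,
                  "left": "neg", "right": "pos"},
    },
    "right": {                                 # glucose > 127.0
        "feature": "ped", "threshold": 0.3,
        "left": "neg",
        "right": "pos",
    },
}


def predict(node, patient):
    """Walk from the root to a leaf, following the test at each node."""
    while isinstance(node, dict):
        go_left = patient[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


print(predict(tree, {"glucose": 150, "bmi": 28.0, "age": 50, "ped": 0.6}))  # pos
print(predict(tree, {"glucose": 100, "bmi": 25.0, "age": 50, "ped": 0.6}))  # neg
```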

If you want to learn more about the implementation of the decision tree, refer to the blog
post “How To Create A Perfect Decision Tree”.

Step 5:

In this phase, we will run a small pilot project to check if our results are appropriate. We will
also look for performance constraints if any. If the results are inaccurate, we need to replan
and rebuild the model.
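
One way to check whether the pilot results are appropriate is to compare the model's predictions with the known outcomes of patients it has not seen. The sketch below does this with made-up outcomes; the metrics chosen and the MIN_ACCURACY threshold are assumptions for illustration, not values from the article.

```python
def evaluate(actual, predicted):
    """Compare predictions with known outcomes on held-out pilot data."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "pos" and p == "pos" for a, p in pairs)
    tn = sum(a == "neg" and p == "neg" for a, p in pairs)
    fp = sum(a == "neg" and p == "pos" for a, p in pairs)
    fn = sum(a == "pos" and p == "neg" for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of truly positive patients that the model catches.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positives": fp,
        "false_negatives": fn,
    }


# Made-up pilot outcomes and model predictions.
actual    = ["pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg"]
predicted = ["pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg"]

report = evaluate(actual, predicted)
print(report)

MIN_ACCURACY = 0.8  # assumed acceptance threshold for the pilot
decision = "deploy" if report["accuracy"] >= MIN_ACCURACY else "replan and rebuild"
print(decision)
```

For a screening problem like this one, false negatives (missed cases) usually matter more than overall accuracy, which is why recall is reported alongside it.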

Step 6:
Once we have executed the project successfully, we will share the output for full deployment.

Being a Data Scientist is easier said than done. So, let’s see what you need to become one.
A Data Scientist requires skills from three major areas, as shown below.

As you can see in the above image, you need to acquire various hard and soft skills.
You need to be good at statistics and mathematics to analyze and visualize data. Machine
Learning forms the heart of Data Science, so you need to be good at it too.
You also need a solid understanding of the domain you are working in to
understand the business problems clearly. Your task does not end there. You should be
capable of implementing various algorithms, which requires good coding skills. Finally, once
you have reached certain key decisions, it is important to communicate them to the
stakeholders, so good communication skills are a definite plus.

I urge you to watch this Data Science video tutorial, which explains what Data Science is and
everything we have discussed in the blog. Go ahead, enjoy the video and tell me what you think.

The video covers the need for data science, what data science is, data science use cases
for business, BI vs data science, data analytics tools, and the data science lifecycle,
along with a demo.

In the end, it won’t be wrong to say that the future belongs to Data Scientists. It is predicted
that by the end of 2018, there will be a need for around one million Data Scientists.
More and more data will provide opportunities to drive key business decisions, and it will
soon change how we look at the data-deluged world around us. Therefore, a Data Scientist
should be highly skilled and motivated to solve the most complex problems. By incorporating
data science methods into their operations in the coming years, businesses can forecast
their growth, anticipate potential problems, and develop data-driven strategies to
achieve success. This is the best opportunity to kick off your career in the field of data
science by taking the Data Science Masters Program.
