Name:Silpa Batch Id: Analysis: WDEO 171220 Topic: Principal Component

This document outlines steps to perform principal component analysis (PCA) and clustering on pharmaceutical patient data to determine optimal patient groupings. It describes loading the data, applying PCA to reduce dimensions, plotting variance and scatter plots, performing hierarchical and K-means clustering on original and PCA-transformed data, and using a scree plot to identify optimal cluster numbers and check if the pre- and post-PCA clusterings are the same. The objective is to segment patients by factors like age to help a drug company study a new heart disease treatment.

Uploaded by

Sravani Adapa


Name: Silpa    Batch Id: WDEO 171220    Topic: Principal Component Analysis

Principal Component Analysis with Python

PROBLEM STATEMENT:

Perform principal component analysis, then perform hierarchical clustering using the first 3
principal component scores. Use a scree plot or elbow curve to obtain the optimum number of
clusters, and check whether the same number of clusters is obtained with the original data.

Objective:

Use a scree plot or elbow curve to obtain the optimum number of clusters, and check whether
the same number of clusters is obtained with the original data.

STEPS FOLLOWED TO OBTAIN THE SOLUTION:

STEP1:

The wine data, to which PCA is applied, is loaded. The wine data set has 8 rows and 15
columns. The details of the data are:
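The following is a minimal sketch of this loading step. It uses scikit-learn's bundled copy of the UCI wine data (`sklearn.datasets.load_wine`) as a stand-in, because the original wine file and its exact columns are not shown. Note that `DataFrame.describe()` always returns 8 summary rows (count, mean, std, min, quartiles, max), which may be the 8 rows reported above.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Stand-in for the assignment's wine file: 13 numeric features plus a "target" class column
wine = load_wine(as_frame=True).frame

print(wine.shape)          # rows x columns of the raw data
summary = wine.describe()  # count, mean, std, min, 25%, 50%, 75%, max per column
print(summary)
```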

STEP2:

After describing the data, the PCA class is imported from sklearn.decomposition to obtain the
principal components. PCA is applied only to normalized data; for normalization, the scale
function is imported from sklearn.preprocessing.
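The normalization step can be sketched as follows, again using the `load_wine` stand-in. As an assumption, the sketch drops the class column (`target`) so that only the measured features are scaled. `scale` centres each column and divides it by its population standard deviation.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import scale

wine = load_wine(as_frame=True).frame
# Assumption: exclude the class label so only the 13 measured features are normalized
features = wine.drop(columns="target")

# scale() centres each column to mean 0 and divides by its (population) standard deviation
wine_norm = scale(features)

print(wine_norm.mean(axis=0).round(6))  # about 0 for every column
print(wine_norm.std(axis=0).round(6))   # about 1 for every column
```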

STEP3:
With n_components = 6, PCA is obtained and fitted to the normalized wine data. A variance plot
of the PCA components is then obtained.
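The sketch below fits a 6-component PCA to the normalized stand-in data and plots the cumulative percentage of variance explained. The plot styling and the Agg backend (so it runs without a display) are illustrative choices, not taken from the original.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

features = load_wine(as_frame=True).frame.drop(columns="target")
wine_norm = scale(features)

pca = PCA(n_components=6)
pca_values = pca.fit_transform(wine_norm)    # principal component scores, shape (n_rows, 6)

var = pca.explained_variance_ratio_          # share of total variance per component
cum_var = np.cumsum(np.round(var * 100, 2))  # cumulative % of variance explained

fig, ax = plt.subplots()
ax.plot(range(1, 7), cum_var, marker="o", color="red")
ax.set_xlabel("Number of components")
ax.set_ylabel("Cumulative % variance explained")
print(cum_var)
```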
STEP4:

Next, the PCA scores are concatenated with the index column of the main data, and a scatter
diagram of the principal component scores is plotted.
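Here is one way to do the concatenation and the scatter plot. The original does not name its "index column", so this sketch assumes it keeps the row index and the `target` column of the stand-in data. The component names `comp1` to `comp6` are also illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

wine = load_wine(as_frame=True).frame
wine_norm = scale(wine.drop(columns="target"))
scores = PCA(n_components=6).fit_transform(wine_norm)

# Name the score columns and align them with the original row index
pca_df = pd.DataFrame(scores, columns=[f"comp{i}" for i in range(1, 7)], index=wine.index)
# Assumption: "target" plays the role of the identifying column kept from the main data
final = pd.concat([wine[["target"]], pca_df], axis=1)

# Scatter diagram of the first two principal component scores
fig, ax = plt.subplots()
ax.scatter(final["comp1"], final["comp2"], s=10)
ax.set_xlabel("comp1")
ax.set_ylabel("comp2")
```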

STEP 5:

To check whether the same number of clusters is obtained with the original data, clustering
techniques are applied to the original wine data.

A dendrogram is drawn for the original wine data.
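This sketch shows hierarchical clustering of the normalized original data with SciPy. Complete linkage matches the method named in STEP7, and Euclidean distance is an assumption. `no_plot=True` computes the tree layout without drawing it; remove that argument (with matplotlib installed) to render the dendrogram.

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_wine
from sklearn.preprocessing import scale

wine_norm = scale(load_wine(as_frame=True).frame.drop(columns="target"))

# Complete-linkage hierarchical clustering on Euclidean distances between rows
Z = linkage(wine_norm, method="complete", metric="euclidean")

# Compute the dendrogram layout only; omit no_plot=True to draw the figure
tree = dendrogram(Z, no_plot=True)
print(len(tree["leaves"]))  # one leaf per observation
```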

STEP 6:

Hierarchical clustering is performed on the wine data after PCA, and a dendrogram is drawn.
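The same clustering can be applied to the PCA scores. Following the problem statement, this sketch keeps only the first 3 principal component scores out of the 6 fitted in STEP3. That slicing is our reading of the task, not code from the original.

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

wine_norm = scale(load_wine(as_frame=True).frame.drop(columns="target"))
scores = PCA(n_components=6).fit_transform(wine_norm)

# The problem statement asks for clustering on the first 3 principal component scores
pc3 = scores[:, :3]

Z_pca = linkage(pc3, method="complete", metric="euclidean")
tree_pca = dendrogram(Z_pca, no_plot=True)  # omit no_plot=True to draw the dendrogram
```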

STEP7:

After plotting the dendrograms for both data sets, the cluster groups were obtained with
AgglomerativeClustering, imported from sklearn.cluster, using the complete linkage method.

Cluster grouping for the main data:

Cluster grouping for the data after PCA:
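This sketch obtains complete-linkage cluster labels for both data sets and compares them. `n_clusters=3` is illustrative only; in practice the count is read off the dendrograms. The cross-tabulation and the adjusted Rand index (1.0 means identical partitions) are added comparison aids that the original does not use.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import scale

wine_norm = scale(load_wine(as_frame=True).frame.drop(columns="target"))
pc3 = PCA(n_components=6).fit_transform(wine_norm)[:, :3]

k = 3  # illustrative; choose the cluster count from the dendrograms
labels_orig = AgglomerativeClustering(n_clusters=k, linkage="complete").fit_predict(wine_norm)
labels_pca = AgglomerativeClustering(n_clusters=k, linkage="complete").fit_predict(pc3)

# Cluster sizes for each grouping
print(pd.Series(labels_orig).value_counts())
print(pd.Series(labels_pca).value_counts())

# How the two groupings line up, and a single agreement score
print(pd.crosstab(labels_orig, labels_pca))
ari = adjusted_rand_score(labels_orig, labels_pca)
print(ari)
```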

STEP8:

In the same way, k-means clustering is applied to both the given data and the data formed after PCA.

To do k-means, the KMeans class is imported from sklearn.cluster.
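Below, k-means is run on the normalized original data and on the first 3 PCA scores. The values `n_clusters=3`, `n_init=10` and `random_state=0` are illustrative settings chosen so that reruns give the same labels; the original does not state its parameters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

wine_norm = scale(load_wine(as_frame=True).frame.drop(columns="target"))
pc3 = PCA(n_components=6).fit_transform(wine_norm)[:, :3]

# Same k on both data sets; fixed n_init and random_state make the result reproducible
km_orig = KMeans(n_clusters=3, n_init=10, random_state=0).fit(wine_norm)
km_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pc3)

print(km_orig.labels_[:10])
print(km_pca.labels_[:10])
```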

STEP 9:

A scree plot (elbow curve) is plotted of the number of clusters against the total
within-cluster sum of squares.

The scree plot is plotted for the PCA-applied data.
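An elbow curve can be sketched by fitting k-means for k = 1 to 10 and recording `inertia_`, which is scikit-learn's total within-cluster sum of squares. The range of k and the k-means settings are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

wine_norm = scale(load_wine(as_frame=True).frame.drop(columns="target"))
pc3 = PCA(n_components=6).fit_transform(wine_norm)[:, :3]

ks = list(range(1, 11))
wss = []  # total within-cluster sum of squares for each k
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pc3)
    wss.append(km.inertia_)

fig, ax = plt.subplots()
ax.plot(ks, wss, "ro-")
ax.set_xlabel("Number of clusters")
ax.set_ylabel("Total within-cluster sum of squares")
```

The "elbow", where the curve stops dropping steeply, suggests the number of clusters. Running the same loop on `wine_norm` gives the curve for the original data, so the two choices can be compared.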

Summary:

Using the scree plot (elbow curve), we can determine the optimum number of clusters; the same
number of clusters was obtained as with the original data.

PROBLEM STATEMENT:
A pharmaceutical drug manufacturing company is studying a new medicine to treat heart
diseases. It has gathered data from secondary sources and would like you to provide
high-level analytical insights on the data. Its aim is to segregate patients by age group
and the other factors given in the data. Perform PCA and a clustering machine learning
algorithm on the given dataset, check whether the clusters formed before and after PCA are
the same, and provide a brief report on your model. You can also explore ways to improve
your model.

Objective:

The aim is to segregate patients by age group and the other factors given in the data,
perform PCA and a clustering machine learning algorithm on the given dataset, check whether
the clusters formed before and after PCA are the same, and provide a brief report on the
model. You can also explore ways to improve your model.

STEPS FOLLOWED TO OBTAIN THE SOLUTION:

STEP1:

The pharma data, to which PCA is applied, is loaded. The pharma data set has 8 rows and
14 columns. The details of the data are:

STEP 2:

PCA is imported from sklearn.decomposition, and the principal components are obtained.

A variance plot of the PCA components is plotted.
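This step does not mention normalization, but the wine steps above scale the data before PCA. The sketch below assumes the same is done here and chains `StandardScaler` and `PCA` in a scikit-learn pipeline; `StandardScaler` standardizes the same way as `scale`. The helper `pca_variance` is hypothetical. The pharma data is not available, so the demo runs on a synthetic placeholder with 14 numeric columns, not real patient data. The returned cumulative variance can be plotted in the same way as the wine variance plot.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pca_variance(df: pd.DataFrame, n_components: int):
    """Hypothetical helper: standardize numeric columns, fit PCA,
    return the component scores and the cumulative % variance explained."""
    pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    scores = pipe.fit_transform(df.select_dtypes("number"))
    pca = pipe.named_steps["pca"]  # make_pipeline names steps after the lowercased class
    cum_var = np.cumsum(pca.explained_variance_ratio_) * 100
    return scores, cum_var


# Synthetic placeholder with 14 numeric columns (NOT the pharma data)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 14)), columns=[f"x{i}" for i in range(1, 15)])

scores, cum_var = pca_variance(demo, n_components=6)
print(scores.shape, cum_var.round(2))
```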

STEP 3:

The principal components are concatenated with the index column of the given data, and a
scatter plot of the first two components of the final data is plotted.
