Name: Silpa    Batch Id: WDEO 171220    Topic: Principal Component Analysis

This document outlines steps to perform principal component analysis (PCA) and clustering on pharmaceutical patient data to determine optimal patient groupings. It describes loading the data, applying PCA to reduce dimensions, plotting variance and scatter plots, performing hierarchical and K-means clustering on original and PCA-transformed data, and using a scree plot to identify optimal cluster numbers and check if the pre- and post-PCA clusterings are the same. The objective is to segment patients by factors like age to help a drug company study a new heart disease treatment.

Uploaded by Sravani Adapa


Principal Component Analysis with Python

PROBLEM STATEMENT:

Perform principal component analysis (PCA), then perform hierarchical clustering using the first 3
principal component scores. Use a scree plot (elbow curve) to obtain the optimum number of clusters,
and check whether the same number of clusters is obtained with the original data.

Objective:

Use a scree plot (elbow curve) to obtain the optimum number of clusters, and check whether the
same number of clusters is obtained with the original data.

STEPS TO BE FOLLOWED TO OBTAIN SOLUTION:

STEP 1:

The wine data, to which the PCA method is applied, is loaded. The wine data set has 8 rows and
15 columns. The details of the data are:

STEP 2:

After describing the data, the PCA class is imported from sklearn.decomposition to obtain the
principal components. PCA is applied only to normalized data; for normalization, the scale function
is imported from sklearn.preprocessing.
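Steps 1 and 2 can be sketched in Python as follows. This is a minimal sketch, not the author's notebook: it assumes scikit-learn's bundled wine data (load_wine) as a stand-in for the author's CSV file, so its shape (178 rows, 13 numeric columns) differs from the file described above. The scale function standardizes every column to mean 0 and standard deviation 1, so no single feature dominates the principal components.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import scale

# Assumption: scikit-learn's bundled wine data stands in for the author's CSV.
wine = load_wine()
X = wine.data  # numeric features only; the class label is kept separately in wine.target

# Standardize each column to zero mean and unit variance before PCA.
X_norm = scale(X)

print(X_norm.shape)
print(X_norm.mean(axis=0).round(6))  # approximately 0 for every column
print(X_norm.std(axis=0).round(6))   # approximately 1 for every column
```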

STEP 3:

With n_components = 6, PCA is obtained and fitted to the normalized wine data. A variance plot of
the PCA components is obtained.
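A sketch of step 3 under the same assumption (load_wine in place of the author's file). The fitted PCA's explained_variance_ratio_ gives each component's share of the total variance; its running total, here in percent, is what a cumulative variance plot shows. The matplotlib call in the last comment is illustrative and is not executed.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

X_norm = scale(load_wine().data)  # stand-in for the normalized wine data

# Keep the first 6 principal components, as in the text.
pca = PCA(n_components=6)
pca_scores = pca.fit_transform(X_norm)  # one row per sample, one column per component

# Fraction of total variance explained by each component (largest first)
var_ratio = pca.explained_variance_ratio_
# Cumulative explained variance in percent: the values a variance plot displays
var_cum = np.cumsum(np.round(var_ratio, 4) * 100)

print(var_ratio.round(4))
print(var_cum)
# To draw the plot with matplotlib: plt.plot(range(1, 7), var_cum, marker="o")
```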
STEP 4:

Later, the PCA scores are concatenated with the index column of the main data, and a scatter diagram is plotted.
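The concatenation step might look like this with pandas. The column names Type and comp1 to comp3 are hypothetical, and the wine class label stands in for the original file's index column. The commented plot call uses pandas' DataFrame.plot.scatter, which needs matplotlib installed.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

wine = load_wine()
scores = PCA(n_components=6).fit_transform(scale(wine.data))

# Name the score columns comp1..comp6 (hypothetical names).
pca_df = pd.DataFrame(scores, columns=[f"comp{i + 1}" for i in range(scores.shape[1])])

# Assumption: the class label plays the role of the original file's index column.
type_col = pd.Series(wine.target, name="Type")

# Concatenate side by side (axis=1) with the first 3 component scores.
final_df = pd.concat([type_col, pca_df.iloc[:, :3]], axis=1)
print(final_df.head())

# Scatter plot of the first two components (requires matplotlib):
# final_df.plot.scatter(x="comp1", y="comp2")
```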

STEP 5:

To check whether the same number of clusters is obtained with the original data, clustering
techniques are also applied to the original wine data.

A dendrogram is drawn for the original wine data.
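A dendrogram for the original data can be built with SciPy's hierarchical clustering functions. This sketch assumes complete linkage with Euclidean distance (matching the complete method mentioned in step 7) and an illustrative cut into 3 clusters; the text does not state which linkage was used for the dendrogram itself.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import load_wine
from sklearn.preprocessing import scale

X_norm = scale(load_wine().data)  # normalized original features

# Complete-linkage hierarchical clustering with Euclidean distances.
Z = linkage(X_norm, method="complete", metric="euclidean")

# no_plot=True computes the dendrogram layout without drawing it;
# remove it and call matplotlib.pyplot.show() to display the tree.
tree = dendrogram(Z, no_plot=True)

# Cut the tree into at most 3 flat clusters (the 3 is an assumption).
labels_orig = fcluster(Z, t=3, criterion="maxclust")
print(np.unique(labels_orig, return_counts=True))
```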


STEP 6:

Hierarchical clustering is performed on the wine data after PCA, and a dendrogram is drawn.
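For the PCA data, the same procedure runs on the first 3 component scores only. As an assumed, heuristic way of reading the tree without eyeballing the plot, this sketch places the cut at the largest jump between successive merge heights; that is a common rule of thumb, not the author's stated method.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

X_norm = scale(load_wine().data)
# Keep only the first 3 principal component scores, as the problem statement asks.
pc3 = PCA(n_components=6).fit_transform(X_norm)[:, :3]

Z_pca = linkage(pc3, method="complete", metric="euclidean")
tree_pca = dendrogram(Z_pca, no_plot=True)  # layout only; no drawing

# Heuristic: cut inside the largest gap between successive merge heights.
# Cutting after merge i (0-based) leaves n - (i + 1) clusters, n = len(heights) + 1.
heights = Z_pca[:, 2]
gaps = np.diff(heights)
k_suggested = len(heights) - int(np.argmax(gaps))
print("clusters suggested by the largest gap:", k_suggested)
```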

STEP 7:

After plotting the dendrograms for both data sets, cluster groups were obtained with
AgglomerativeClustering, imported from sklearn.cluster, using complete linkage.

Cluster grouping for the original data:

Cluster grouping for the data after PCA:
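A sketch of step 7 with scikit-learn's AgglomerativeClustering and linkage="complete" on both data sets. The choice of 3 clusters is an assumption standing in for whatever number the dendrograms suggested.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

X_norm = scale(load_wine().data)
pc3 = PCA(n_components=6).fit_transform(X_norm)[:, :3]

# Assumption: 3 clusters; complete linkage as stated in the text.
n_clusters = 3
h_orig = AgglomerativeClustering(n_clusters=n_clusters, linkage="complete").fit(X_norm)
h_pca = AgglomerativeClustering(n_clusters=n_clusters, linkage="complete").fit(pc3)

# labels_ holds the cluster index (0..n_clusters-1) assigned to each row.
print("original: ", np.bincount(h_orig.labels_))
print("after PCA:", np.bincount(h_pca.labels_))
```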

STEP 8:

In the same way, k-means clustering is applied to both data sets: the given data and the data
formed after PCA.

To do k-means, the KMeans class is imported from sklearn.cluster.
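Step 8 in the same style, again assuming k = 3. The n_init and random_state arguments are added here so the k-means runs are repeatable; they are not mentioned in the original.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

X_norm = scale(load_wine().data)
pc3 = PCA(n_components=3).fit_transform(X_norm)

# Same k (3, an assumption) for both data sets; n_init restarts guard against
# poor initial centroids, and random_state makes the runs repeatable.
km_orig = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_norm)
km_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pc3)

print("original: ", np.bincount(km_orig.labels_), "inertia:", round(km_orig.inertia_, 2))
print("after PCA:", np.bincount(km_pca.labels_), "inertia:", round(km_pca.inertia_, 2))
```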

STEP 9:

A scree plot (elbow curve) is plotted of the number of clusters chosen against the total
within-cluster sum of squares.

The scree plot is plotted for the PCA-applied data.
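The scree (elbow) curve of step 9 can be computed by fitting KMeans for a range of k and recording inertia_, which scikit-learn defines as the sum of squared distances of samples to their closest cluster center, i.e. the total within-cluster sum of squares. The range 1 to 10 is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

pc3 = PCA(n_components=3).fit_transform(scale(load_wine().data))

ks = list(range(1, 11))
twss = []  # total within-cluster sum of squares for each k
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pc3)
    twss.append(km.inertia_)  # sum of squared distances to the nearest centroid

for k, w in zip(ks, twss):
    print(k, round(w, 1))
# Scree plot: plt.plot(ks, twss, marker="o"); the bend ("elbow") suggests the k to use.
```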


Summary:

Using the scree plot (elbow curve), we can determine the optimum number of clusters; the same
number of clusters was obtained with the original data.
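Because cluster ids are arbitrary (cluster 0 in one run may be cluster 2 in another), comparing raw labels can mislead. One way to check whether the groupings before and after PCA agree, sketched here as a suggestion rather than the author's method, is the adjusted Rand index (ARI), which is 1.0 for identical partitions and near 0 for chance-level agreement, together with a cross-tabulation of the two label sets. The 3 clusters are again an assumption.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import scale

X_norm = scale(load_wine().data)
pc3 = PCA(n_components=3).fit_transform(X_norm)

labels_orig = AgglomerativeClustering(n_clusters=3, linkage="complete").fit_predict(X_norm)
labels_pca = AgglomerativeClustering(n_clusters=3, linkage="complete").fit_predict(pc3)

# ARI ignores how clusters are numbered and compares only the groupings.
ari = adjusted_rand_score(labels_orig, labels_pca)
print("ARI:", round(ari, 3))
print(pd.crosstab(labels_orig, labels_pca, rownames=["original"], colnames=["after PCA"]))
```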

PROBLEM STATEMENT:
A pharmaceutical drug manufacturing company is studying a new medicine to treat heart disease.
It has gathered data from secondary sources and would like you to provide high-level analytical
insights on the data. Its aim is to segregate patients by age group and the other factors given
in the data. Perform PCA and a clustering machine learning algorithm on the given dataset, check
whether the clusters formed before and after PCA are the same, and provide a brief report on your
model. You can also explore ways to improve your model.

Objective:

The aim is to segregate patients by age group and the other factors given in the data, perform
PCA and a clustering machine learning algorithm on the given dataset, check whether the clusters
formed before and after PCA are the same, and provide a brief report on the model. Ways to improve
the model can also be explored.

STEPS TO BE FOLLOWED TO OBTAIN SOLUTION:

STEP 1:

The pharma data, to which the PCA method is applied, is loaded. The pharma data set has 8 rows and
14 columns. The details of the data are:

STEP 2:

The PCA class is imported from sklearn.decomposition, and the principal components are obtained with it.

A variance plot of the PCA components is plotted.


STEP 3:

The principal component scores are concatenated with the index column of the given data, and a
scatter plot is plotted between the first two components of the final data.
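The pharma data itself is not reproduced here, so the following is a hedged, reusable sketch of the whole before-versus-after-PCA check for any DataFrame with one identifier column and numeric features. The function name compare_before_after_pca, the patient_id column, and the make_blobs demo data are all hypothetical. Non-numeric columns are assumed to be dropped, and k-means is used for both runs; hierarchical clustering could be swapped in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import scale


def compare_before_after_pca(df, id_col, n_clusters=3, n_pcs=3, seed=0):
    """Hypothetical helper: cluster numeric columns before and after PCA; return (scores_df, ARI)."""
    # Assumption: only numeric feature columns are used; the identifier is excluded.
    features = df.drop(columns=[id_col]).select_dtypes(include="number")
    X_norm = scale(features)
    scores = PCA(n_components=n_pcs).fit_transform(X_norm)

    # Concatenate the identifier column with the component scores.
    pcs = pd.DataFrame(scores, columns=[f"comp{i + 1}" for i in range(n_pcs)], index=df.index)
    scores_df = pd.concat([df[id_col], pcs], axis=1)

    before = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_norm)
    after = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scores)
    return scores_df, adjusted_rand_score(before, after)


# Synthetic stand-in for the pharma data: 200 rows, 8 numeric columns, 3 groups.
X_demo, _ = make_blobs(n_samples=200, n_features=8, centers=3, random_state=1)
demo = pd.DataFrame(X_demo, columns=[f"x{j}" for j in range(8)])
demo.insert(0, "patient_id", np.arange(len(demo)))

scores_df, ari = compare_before_after_pca(demo, id_col="patient_id")
print(scores_df.head())
print("ARI before vs after PCA:", round(ari, 3))
```

An ARI close to 1.0 would indicate that PCA preserved the patient groupings; a low value would suggest revisiting the number of components or clusters.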
