Advanced Statistics - Project Report

The document describes a factor analysis conducted on a dataset with 13 variables related to customer satisfaction. Four factors were identified through principal component analysis with varimax rotation: 1) Customer Service, 2) Marketing, 3) Technical Support, and 4) Product Value. A multiple linear regression model was then developed using the factor scores, with the factors of Customer Service, Marketing, and Product Value found to significantly influence customer satisfaction.


Advanced Statistics Module Mini-Project Rohan Kanungo

MINI PROJECT
ADVANCED
STATISTICS MODULE
Submitted by Rohan Kanungo
5th June 2019


TABLE OF CONTENTS
Project Objective
Problem Analysis
Evidence of Multicollinearity
Factor Analysis
Naming of Factors
Multiple Regression Analysis
R-Code


Project Objective
The project is focused on market segmentation in the context of product service
management. The data file Factor-Hair-Revised is to be used for performing the analysis.


Problem Analysis
The data set consists of 13 variables and 100 observations. Satisfaction is the
dependent variable; the others are the independent variables that determine
satisfaction.
For the purposes of market segmentation, Principal Component/Factor Analysis can
be used to identify the structure of a set of variables as well as provide a process for
data reduction.
We therefore examine and analyze the data set -
 Understand whether these variables can be “grouped.” By grouping the
variables, we will be able to see the big picture in terms of understanding the
customer
 Reduce the 13 variables to a smaller number of composite variables

str(Hairdata_original)
'data.frame': 100 obs. of 13 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
$ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...


Evidence of Multicollinearity
The sample size is 100, which provides an adequate basis to calculate the correlation
between variables.
To determine the existence of collinearity, we compute and plot the correlation matrix.
## Find the correlation
cor(Hairdata)
cor.plot(Hairdata,numbers=TRUE,xlas = 2,upper=FALSE)

The correlation plot shows evidence of multicollinearity: the cells marked in blue
indicate variable pairs with a high degree of correlation, and hence a strong
possibility of multicollinearity.


To determine the significance of collinearity, we run Bartlett’s test.


## Significance of correlation
## Bartlett's Test
cortest.bartlett(Hairdata,n=100)

$chisq
[1] 619.2726
$p.value
[1] 1.79337e-96
$df
[1] 55

Conclusion:
Since the p-value (1.79e-96) is far below 0.05, the test rejects the hypothesis that the
correlation matrix is an identity matrix; statistically significant correlations, and
hence multicollinearity, exist in the data set.
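The chi-square statistic reported above can be reproduced from the determinant of the correlation matrix. A minimal sketch, assuming the standard Bartlett sphericity formula (the same statistic psych::cortest.bartlett reports):

```r
# Sketch of Bartlett's test of sphericity (assumption: the standard formula):
# chisq = -(n - 1 - (2p + 5)/6) * ln(det(R)),  df = p(p - 1)/2
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Sanity check: with p = 11 variables, df = 11 * 10 / 2 = 55, matching $df above.
# For a perfectly uncorrelated (identity) correlation matrix the statistic is 0:
bartlett_sphericity(diag(11), 100)$chisq   # 0
```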


Factor Analysis
1. Eigen Value Computation

eigen() decomposition

$values
3.426971 2.550897 1.690976 1.086556 0.609424 0.551884 0.401518 0.246952
0.203553 0.132842 0.098427

2. Scree Plot

## Scree Plot
HairScree<-data.frame(Hairfactor,HairEigenValue)
plot(HairScree,col="RED",pch=18,main="Scree Plot")
lines(HairScree,col="Blue")
abline(h=1,col="PURPLE")

pg. 7
Advanced Statistics Module Mini-Project Rohan Kanungo

Using the Kaiser rule, we determine that there are four factors, which are the
principal factors.
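The Kaiser rule can be applied directly to the eigenvalues printed above: retain every component whose eigenvalue exceeds 1, since such a component explains more variance than a single standardized variable. A quick check:

```r
# Eigenvalues copied from the eigen() output above
ev <- c(3.426971, 2.550897, 1.690976, 1.086556, 0.609424, 0.551884,
        0.401518, 0.246952, 0.203553, 0.132842, 0.098427)

sum(ev)       # ~11: the eigenvalues of a correlation matrix sum to the
              # number of variables (the trace of the matrix)
sum(ev > 1)   # 4: the number of factors retained by the Kaiser rule
```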

3. Rotation of Loadings

## Loadings
## Unrotated Principal Loadings
Hair_unrotate <- principal(Hairdata,nfactors = 4,rotate = "none")
print(Hair_unrotate,digits=5)
UnRotatedprofile <-plot(Hair_unrotate,row.names(Hair_unrotate$loadings))
UnRotatedprofile

PC1 PC2 PC3 PC4
SS loadings 3.42697 2.55090 1.69098 1.08656
Proportion Var 0.31154 0.23190 0.15373 0.09878
Cumulative Var 0.31154 0.54344 0.69717 0.79595
Proportion Explained 0.39141 0.29135 0.19314 0.12410
Cumulative Proportion 0.39141 0.68276 0.87590 1.00000

To make the boundaries sharper, we perform an orthogonal rotation to clearly identify the factors.

## Rotate Principal Loadings


Hair_rotate <- principal(Hairdata,nfactors=4,rotate="varimax")
print(Hair_rotate,digits=5)
Rotatedprofile <-plot(Hair_rotate,row.names(Hair_rotate$loadings),cex=1.0)
Rotatedprofile


RC1 RC2 RC3 RC4
SS loadings 2.89268 2.23362 1.85551 1.77359
Proportion Var 0.26297 0.20306 0.16868 0.16124
Cumulative Var 0.26297 0.46603 0.63471 0.79595
Proportion Explained 0.33039 0.25511 0.21193 0.20257
Cumulative Proportion 0.33039 0.58550 0.79743 1.00000
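Because varimax is an orthogonal rotation, it only redistributes variance among the four components; the total variance explained (Cumulative Var = 0.79595) is unchanged. This can be verified from the SS loadings in the two tables:

```r
# SS loadings copied from the unrotated (PC) and rotated (RC) tables above
unrotated <- c(3.42697, 2.55090, 1.69098, 1.08656)
rotated   <- c(2.89268, 2.23362, 1.85551, 1.77359)

sum(unrotated)      # ~8.755
sum(rotated)        # ~8.755: the same total after rotation
sum(rotated) / 11   # ~0.79595, the Cumulative Var reported above
```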

4. Plot and Determine the Factors


par(mfrow=c(1,2))
fa.diagram(Hair_unrotate,main="Unrotated factors")
fa.diagram(Hair_rotate,main="Rotated factors")


Naming of Factors
RC1 RC2 RC3 RC4
ProdQual 0.00152 -0.01274 -0.03282 0.87566
Ecom 0.0568 0.87056 0.04735 -0.11746
TechSup 0.01833 -0.02446 0.93919 0.10051
CompRes 0.92582 0.11593 0.0486 0.09123
Advertising 0.13876 0.74152 -0.0816 0.01467
ProdLine 0.59122 -0.06397 0.14598 0.642
SalesFImage 0.13252 0.90045 0.07559 -0.15924
ComPricing -0.08515 0.22563 -0.24551 -0.72258
WartyClaim 0.10982 0.05483 0.93099 0.10218
OrdBilling 0.86376 0.10683 0.0839 0.03931
DelSpeed 0.9382 0.17734 -0.00463 0.05227

Factor 1: Customer Service
i. CompRes
ii. DelSpeed
iii. OrdBilling

Factor 2: Marketing
i. SalesFImage
ii. Ecom
iii. Advertising

Factor 3: Technical Support
i. TechSup
ii. WartyClaim

Factor 4: Product Value
i. ProdQual
ii. ComPricing


Multiple Regression Analysis


Using the factor scores of the four factors identified, we build a data set and
formulate a multiple linear regression model.
## Multiple Regression Analysis
## Create the data set using the factor scores from PCA/FA process

mydata=data.frame(Hair_rotate$score)
mydataforregression=cbind(mydata,Hairdata_original$Satisfaction)
names(mydataforregression) <- c("customerservice","marketing","techsupport","productvalue","customersatisfaction")
str(mydataforregression)
attach(mydataforregression)

## Build Multiple Linear Regression Model

SLM=lm(customersatisfaction~customerservice+marketing+techsupport+productvalue, data=mydataforregression)
summary(SLM)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91800 0.07089 97.589 < 2e-16 ***
customerservice 0.61805 0.07125 8.675 1.12e-13 ***
marketing 0.50973 0.07125 7.155 1.74e-10 ***
techsupport 0.06714 0.07125 0.942 0.348
productvalue 0.54032 0.07125 7.584 2.24e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7089 on 95 degrees of freedom


Multiple R-squared: 0.6605, Adjusted R-squared: 0.6462
F-statistic: 46.21 on 4 and 95 DF, p-value: < 2.2e-16


R-squared Interpretation
 Multiple R-squared: 0.6605 means that 66.05% of the variance in the dependent
variable is explained by the independent variables; i.e. 66.05% of the variation in
customer satisfaction is accounted for by the four factors identified.
 Probability (F-statistic > 46.21) = p-value < 2.2e-16, which is much smaller than 5%
 Hence, REJECT the NULL HYPOTHESIS that all betas are zero
 Conclude that at least one beta is non-zero, i.e. ACCEPT the ALT. HYPOTHESIS
 The individual coefficients for customer service, marketing and product value are
also highly significant, as evidenced by large t-statistics and extremely low p-values
(each much less than 5%); these individual betas exist. Technical support
(p = 0.348) is not significant.

Overall, the regression model holds in the population, meaning that the linear model of
customer satisfaction depending on customer service, marketing and product value is
statistically valid; technical support does not contribute significantly.
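Since the regressors are factor scores (standardized, with mean 0), the intercept 6.918 is simply the mean satisfaction score, and each coefficient is the expected change in satisfaction per one-standard-deviation change in that factor. A hypothetical worked example using the fitted coefficients from the summary above:

```r
# Fitted coefficients copied from summary(SLM) above
b <- c(intercept = 6.91800, customerservice = 0.61805,
       marketing = 0.50973, techsupport = 0.06714, productvalue = 0.54032)

# Hypothetical customer: one SD above average on customer service and
# marketing, exactly average (factor score 0) on the other two factors
pred <- b[["intercept"]] + b[["customerservice"]] * 1 + b[["marketing"]] * 1
pred   # 8.04578
```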


R-Code
## =======================================================================
## MINI-PROJECT 2
## MODULE - ADVANCED STATISTICS
## =======================================================================
## Environment Set up
## Read Input File "Factor-Hair-Revised"
## Install libraries nFactors and Psych for Factor Analysis

library(nFactors)
library(psych)
getwd()
Hairdata_original <- read.csv("Factor-Hair-Revised.csv",header=TRUE)
View(Hairdata_original)
attach(Hairdata_original)
str(Hairdata_original)

## The first column is an ID number, so we ignore it before analysis
## The last column, Satisfaction, is the dependent variable, so we also remove it before analysis
Hairdata <- Hairdata_original[,2:12]
str(Hairdata)

## Find the correlation


cor(Hairdata)
cor.plot(Hairdata,numbers=TRUE,xlas = 2,upper=FALSE)

## Significance of correlation
## Bartlett's Test
cortest.bartlett(Hairdata,n=100)

## How many factors are applicable


## Eigen Values is the basis for selecting the number of factors
## Eigen Value Computation
Hairev<-eigen(cor(Hairdata))
print(Hairev,digits=5)
HairEigenValue=Hairev$values
HairEigenValue

## There are 11 independent variables, hence 11 eigenvalues (candidate factors)


Hairfactor <-seq(1,11,by=1)
Hairfactor

## Scree Plot
HairScree<-data.frame(Hairfactor,HairEigenValue)
plot(HairScree,col="RED",pch=18,main="Scree Plot")


lines(HairScree,col="Blue")
abline(h=1,col="PURPLE")

## Loadings
## Unrotated Principal Loadings
Hair_unrotate <- principal(Hairdata,nfactors = 4,rotate = "none")
print(Hair_unrotate,digits=5)
UnRotatedprofile <-plot(Hair_unrotate,row.names(Hair_unrotate$loadings))
UnRotatedprofile

## Rotate Principal Loadings


Hair_rotate <- principal(Hairdata,nfactors=4,rotate="varimax")
print(Hair_rotate,digits=5)
Rotatedprofile <-plot(Hair_rotate,row.names(Hair_rotate$loadings),cex=1.0)
Rotatedprofile
Hair_rotate$scores
Hair_rotate$loadings

par(mfrow=c(1,2))
fa.diagram(Hair_unrotate,main="Unrotated factors")
fa.diagram(Hair_rotate,main="Rotated factors")

## Multiple Regression Analysis


## Create the data set using the factor scores from PCA/FA process
mydata=data.frame(Hair_rotate$score)
mydataforregression=cbind(mydata,Hairdata_original$Satisfaction)
names(mydataforregression) <- c("customerservice","marketing","techsupport","productvalue","customersatisfaction")
str(mydataforregression)
attach(mydataforregression)

## Build Multiple Linear Regression Model

SLM=lm(customersatisfaction~customerservice+marketing+techsupport+productvalue, data=mydataforregression)
summary(SLM)

## =======================================================================
## END MINI-PROJECT 2
## =======================================================================

