RM Lab Main File BBA Project
Objectives
Learn about SPSS
Open SPSS
Layout of SPSS
Become familiar with Menus
Exit SPSS
Recoding Variables
Frequency Distribution
Cross Tabulation
Graphs
What is SPSS
SPSS is the abbreviation of ‘Statistical Package for the Social Sciences’. The software package was created in 1968 by SPSS Inc. and was acquired by IBM in 2009. In 2014 the software was officially renamed IBM SPSS Statistics, but it is still commonly referred to as SPSS.
The software was originally meant for the social sciences, but it has become popular in other fields such as the health sciences and especially in marketing, marketing research and data mining. SPSS is a Windows-based program that can be used to perform data entry and analysis and to create tables and graphs.
Features of SPSS
It is easy to learn.
SPSS includes a wide range of data management and editing tools.
It offers in-depth statistical capabilities.
It offers excellent plotting, reporting and presentation features.
Opening SPSS
Click the left mouse button on the Start button on your screen, then put your cursor on Programs or All Programs and left click the mouse. Select SPSS 17.0 for Windows by clicking the left mouse button.
Start → Programs → SPSS 17
Layout of SPSS
The Data Editor window has two views that can be selected from the lower left hand side of the screen.
Data View is where you see the data you are using. Variable View is where you can specify the format of
your data when you are creating a file or where you can check the format of a pre-existing file. The data
in the Data Editor is saved in a file with the extension .sav.
A dialogue box will appear in front of the SPSS grid listing several options to choose from. The following options will appear in the dialogue box.
SPSS Menus
File: includes all of the options we use in other programs, such as open, save, and exit.
Edit: includes the cut, copy, and paste commands, and allows us to specify various options for displaying data and output.
View: allows us to select which toolbars to show, select font size, add or remove the gridlines that separate each piece of data, and choose whether to display the raw data or the data labels.
Data: allows us to select several options ranging from displaying data that is sorted by a specific variable to selecting certain cases for subsequent analyses.
Transform: includes several options to change current variables. For example, we can change continuous variables to categorical variables, change scores into rank scores, add a constant to variables, etc.
Analyze: includes all of the commands to carry out statistical analyses and to calculate descriptive statistics.
Graphs: includes the commands to create various types of graphs including box plots, histograms, line graphs, and bar charts.
Utilities: allows us to list file information, which is a list of all variables, their labels, values, locations in the data file, and type.
Add-ons: are programs that can be added to the base SPSS package. We do not have access to any of those.
Window: can be used to select which window you want to view (i.e., Data Editor, Output Viewer, or Syntax).
Help: has many useful options including a link to the SPSS homepage, a statistics coach, and a syntax guide. Using Topics, you can use the index option to type in any keyword and get a list of options, or you can view the categories and subcategories available under Contents. This is an excellent tool and can be used to troubleshoot most problems.
Exiting SPSS
To close SPSS, we can either left click on the close button located on the upper right hand corner of the
screen or select Exit from the File menu. A dialog box will appear for every open window asking you if
you want to save it before exiting. We must always save the data files.
1. Data View
2. Variable View
Variable view is used to define variables that will store the data.
The first step is to open the Variable View window of the Data Editor and define variables. Let us take an example of faculty data of an educational institution that needs to be analyzed. The objective is to create a small data file for employees that consists of six variables.
2 Empl_Name String
4 Age Numeric
6 Type_appointment Numeric(Regular=1,Contractual=2,Guest=3)
7 Income Numeric
8 Marital_Status Numeric(Married=1,Unmarried=2)
I. Recoding Variables
To convert a variable into another variable, SPSS has the Transform command. A variable can be converted using any of the following tools:
Example
1. Click Transform → Recode into Different Variables.
2. In the dialogue box that appears, select the input variable and the output variable.
3. In the new dialogue box, define how to transform using old and new value rules.
This will create a new variable whose data will appear in the Data Editor window.
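Since SPSS drives this through a dialogue box, the underlying recode logic can be sketched in plain Python. The data and the regrouping below are hypothetical, reusing the appointment codes from the faculty example (Regular=1, Contractual=2, Guest=3):

```python
# Sketch of SPSS "Recode into Different Variables" logic in plain Python.
# Hypothetical old-to-new value rules: collapse Regular=1 into group 1,
# and Contractual=2 / Guest=3 into a "temporary" group 2.
old_to_new = {1: 1, 2: 2, 3: 2}

type_appointment = [1, 3, 2, 1, 3]                             # input variable
appointment_group = [old_to_new[v] for v in type_appointment]  # output variable
print(appointment_group)
```

The dictionary plays the role of the "old and new values" rules in the dialogue box; the original variable is left untouched, just as Recode into Different Variables leaves the input column intact.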
II. Frequency Distribution
To convert the data into a frequency distribution, SPSS has the Analyze command.
Example
Click Analyze → Descriptive Statistics → Frequencies. Select the variables from the left column of the dialogue box and move them to the right-hand “Variable(s)” window. Click OK.
The frequency distribution will be shown as output in the Output window as given below:
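What the Frequencies procedure counts can be sketched with Python's standard library. The gender codes below are hypothetical (1 = female, 2 = male, matching the coding convention used later in this manual):

```python
from collections import Counter

# Sketch of a frequency table like SPSS Frequencies output.
gender = [1, 2, 2, 1, 2, 1, 2, 2, 1, 2]   # hypothetical codes: 1=female, 2=male
counts = Counter(gender)
n = len(gender)
for value in sorted(counts):
    pct = 100 * counts[value] / n
    print(value, counts[value], f"{pct:.1f}%")
```

Each row printed corresponds to one row of the SPSS frequency table: the value, its count, and its percentage of all cases.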
III. Cross Tabulation
To cross-tabulate two categorical variables, SPSS has the Crosstabs command.
Example
In the Crosstabs dialogue box, from the left column select the variables for the rows and columns on the right. Then click OK.
The crosstab will appear in the Output window.
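The counting behind a crosstab can be sketched in Python: every (row category, column category) pair gets its own cell count. The two code lists below are hypothetical:

```python
from collections import Counter

# Sketch of a crosstab: count every (row, column) category pair.
# Hypothetical codes: gender (1=female, 2=male) x marital status (1=married, 2=unmarried).
gender  = [1, 1, 2, 2, 2, 1]
marital = [1, 2, 1, 1, 2, 1]
table = Counter(zip(gender, marital))
print(table[(1, 1)], table[(2, 1)])
```

`table[(r, c)]` is the count in the cell for row category `r` and column category `c`, exactly what each cell of the SPSS crosstab reports.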
IV. Graphs
To make a graph for any variable, SPSS has the Graphs command.
Example
A new dialogue box appears. Click on the appropriate radio button to tell SPSS what type of data is in the variable list:
Let us take an example of students’ data for two coaching institutes that needs to be analyzed. The objective is to create a small data file for students that consists of six variables.
3 C3 Age Numeric
Value Labels
We can assign descriptive value labels for each value of a variable. This process is particularly useful if your data file uses numeric codes to represent non-numeric categories (e.g., code 1 for female and code 2 for male).
To verify that we have assigned the correct value labels, use the following steps (taking the example of the values assigned to Gender).
Example
1. Click Data → Sort Cases.
2. A Sort Cases dialogue box will appear with the variables on the left side and Sort by and Sort Order on the right side. Transfer the required variable (Gender) from the left to the Sort by column. Then select the order (ascending) from the Sort Order column. Then click OK.
3. The result is that all females appear together followed by all males, in ascending order of the codes.
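The effect of Sort Cases can be sketched in Python by sorting records on the grouping variable. The names and codes here are hypothetical (1 = female, 2 = male):

```python
# Sketch of Data -> Sort Cases: order records by one grouping variable, ascending.
# Hypothetical records: (name, gender code) with 1=female, 2=male.
cases = [("Ravi", 2), ("Asha", 1), ("Amit", 2), ("Neha", 1)]
cases.sort(key=lambda rec: rec[1])   # ascending: code 1 (females) first
print(cases)
```

Python's sort is stable, so within each gender the original order of cases is preserved, much like SPSS keeps the prior ordering within groups.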
Frequencies
A frequency distribution is an overview of all distinct values of a variable and the number of times they occur. Frequency distributions are mostly used for summarizing categorical variables.
The vast majority of the descriptive statistics available in the Frequencies: Statistics window are never appropriate for nominal variables, and are rarely appropriate for ordinal variables. There are two exceptions to this:
1. The mode (which is the most frequent response) has a clear interpretation when applied
to most nominal and ordinal categorical variables.
2. The values are group mid points option can be applied to certain ordinal variables that
have been coded in such a way that their value takes on the midpoint of a range.
Example
To draw a frequency table and apply various statistical tools on the variable, use the following steps:
1. Click Analyze → Descriptive Statistics → Frequencies.
2. A Frequencies dialogue box will open. Transfer the variable (Gender) from left to right.
3. Click Statistics on the right.
4. A new dialogue box will open showing various statistical tools. Select among
Central Tendency: Mean, Median, Mode
Dispersion: Maximum and Minimum
5. Then click Continue.
Interpretation
The frequency table shows the frequency of each gender. Among the 10 students, 4 are female and 6 are male. It also shows that there were more males than females (males are 60% and females are 40%). It also shows the cumulative percentage.
The Statistics table shows the Mean, which is 1.60 and does not signify anything as Gender is a nominal variable, and the Median, which is the middle value, 2. The table also reports the Mode, which is 2, meaning that the most frequent category is male. The mode and median of the variable are the same, i.e. 2.
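The numbers in this interpretation can be checked with Python's `statistics` module, using the same coding as the example (4 females coded 1, 6 males coded 2):

```python
import statistics

# Reproducing the interpretation's figures: 4 females (code 1), 6 males (code 2).
gender = [1] * 4 + [2] * 6
print(statistics.mean(gender))    # 1.6, meaningless for a nominal variable
print(statistics.median(gender))  # 2
print(statistics.mode(gender))    # 2
```

This confirms the point made above: the mean of a nominal code (1.6) has no substantive meaning, while the mode (2 = male) does.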
1. Click Analyze → Descriptive Statistics → Frequencies.
2. A “Frequencies” dialogue box will open. Now transfer the variables (Age, Session and Level of Satisfaction) from left to right by clicking the arrow in the middle.
3. Then, to use statistical tools like the mean, median, etc., click Statistics on the right.
4. A new dialogue box will open showing various statistical tools. Select among
Central Tendency: Mean, Median, Mode
Dispersion: Maximum and Minimum
5. Then click Continue.
The result will appear in the Output window showing three different frequency tables for the three variables selected, with their percentages, and a Statistics table with the Mean, Median, Mode, Maximum and Minimum of all three variables in one.
Interpretation
Age in years: It shows number of students of different ages i.e. frequencies, percentage
and cumulative percentage of all the students.
Coaching Sessions: It shows number of students taking different number of coaching
sessions.
Level of satisfaction: It shows number of students satisfied at different levels.
Age in years: The mean age is 21.80, which indicates that most of the students who take coaching sessions are around the age of 22 years. The age in the middle (median) is 21.50 years. The footnote given at the end of the table indicates that multiple modes exist, so the table reports the smallest value as the mode. The table shows that students of age 21 years are most likely to take coaching sessions.
Coaching Sessions: The mean number of sessions taken by a student is 5.50, i.e. approx. 6. This means that on average 6 coaching sessions are attended by each student. The middle (median) number of coaching sessions is also 5.50, i.e. 6. Sessions also have multiple modes; the table reports the smallest, suggesting that one coaching session is the most common count among students.
Level of satisfaction: The mean level of satisfaction is 5.50, i.e. approx. 6. The middle value (median) is also 5.50, the same as the mean. We can conclude that most students are neutral towards the level of satisfaction. The mode of the data is also 5.50, which means that the students are neutral towards the coaching sessions taken by them.
Graphs
Bar Graph
It displays the number of cases in particular categories, or the score of a continuous variable for different categories. When comparing group means, the best type of graph to use is a bar chart.
Use the following steps to present a bar graph for session variable
1. Click Graphs → Chart Builder.
2. The first thing you will see is a pop-up box asking you to define the level of measurement for each variable (i.e. explain to SPSS whether it is Interval, Ordinal, or Nominal). To stop this pop-up from displaying again, tick the ‘Don’t show this dialogue box again’ box.
3. Click on OK to Continue
4. This will open up the Chart Builder dialogue box. The chart builder is an interactive drag
and drop dialog box where we can define exactly what type of graph we would like.
5. Now select from the gallery the image of bar graph we want and drag and drop it to the
chart preview window.
6. Then from the variables select coaching institute and drag and drop it in the place of x
axis. Similarly drag the other variable coaching session from the variables and drop in
place of y axis.
7. In the Element Properties window that appears next to the Chart Builder, Coaching Institute will appear as the axis label. In the Categories column of the same window, TIME and IMS will appear under Order.
8. Click OK.
The output window will show a bar graph.
Interpretation
The bar graph shows the mean number of coaching sessions conducted by each coaching institute. From the bar graph we can see that TIME and IMS both provide an average of approximately 6 sessions to their students.
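The quantity a mean bar chart displays is simply the average of the continuous variable within each category. A minimal sketch, on hypothetical session counts for the two institutes:

```python
import statistics

# Sketch of what a mean bar chart summarizes: average sessions per institute.
# The session counts below are hypothetical.
sessions_by_institute = {"TIME": [5, 6, 7, 6], "IMS": [6, 5, 6, 7]}
means = {inst: statistics.mean(vals) for inst, vals in sessions_by_institute.items()}
print(means)
```

Each bar's height corresponds to one of these group means; here both institutes average 6 sessions, matching the interpretation above.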
Histogram
A histogram displays the distribution of one continuous variable. It provides information about the shape of the distribution of scores.
Example
To represent the age of the students in form of histogram, we use following steps:
The histogram shows that there are multiple modes in the data, i.e. most of the students are of age 20 and 25.
Box Plot
A box plot is another useful visualization for viewing how the data are distributed. A box plot, also called a box-and-whisker plot, is a way to show the spread and center of a data set. A box plot is a summary plot of our data set, graphically depicting the median, quartiles and extreme values. The box represents the interquartile (IQ) range, which contains the middle 50% of records. The whiskers are lines that extend from the upper and lower edges of the box to the highest and lowest values that are no greater than 1.5 times the IQ range.
The following steps are performed to create a box plot for age in years of different genders.
The thick line in the middle is the median. The top and bottom lines of the box show the third and first quartiles. The whiskers show the maximum and minimum values, with the exception of outliers (circles) and extremes (asterisks). Outliers are at least 1.5 box lengths from the median and extremes are at least 3 box lengths from the median.
The lower line of the box plot shows the minimum age and the upper line shows the maximum age. The bottom of the box shows quartile 1, the darkened black line shows quartile 2, i.e. the median, and the top of the box shows quartile 3 of the age (for both female and male). It also shows that the median for females is much higher than that for males. Since the median is located below the center of the box and the upper tail is longer than the lower tail, the data are skewed right.
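The five summary numbers a box plot draws can be computed directly. A sketch on hypothetical ages, using Python's `statistics.quantiles` (its default "exclusive" method; SPSS may use a slightly different quartile formula, so values can differ marginally):

```python
import statistics

# Sketch of the numbers behind a box plot, on hypothetical ages.
ages = [19, 20, 20, 21, 21, 22, 23, 24, 25, 25]
q1, q2, q3 = statistics.quantiles(ages, n=4)   # quartiles (exclusive method)
iqr = q3 - q1                                  # box height; whiskers reach 1.5*iqr
print(q1, q2, q3, iqr)
```

Here the box would run from 20.0 to 24.25 with the median line at 21.5, and any age beyond 1.5 × IQR past the box edges would be flagged as an outlier.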
Descriptive
In SPSS, the Descriptives procedure computes a select set of basic descriptive statistics for one or more continuous numeric variables. Additionally, the Descriptives procedure can optionally compute Z scores and save them as new variables in our data set. The Descriptives procedure cannot compute medians or quartiles. It is best used when you want to compare the descriptive statistics of several numeric variables side by side. Standardizing variables means rescaling them so that they have a mean of zero and a standard deviation of one. This is done by subtracting the variable’s mean from each value and dividing the difference by the variable’s standard deviation. The resulting values are called z scores.
The Descriptives table represents the values of the mean, maximum, minimum, standard deviation, kurtosis (the sharpness of the peak of a frequency distribution curve) and skewness (the degree to which a statistical distribution is not balanced around the mean, i.e. is asymmetrical or lopsided; a perfectly symmetrical distribution has a value of zero).
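The z-score transformation described above is one line of arithmetic per value. A sketch on hypothetical scores (note: SPSS standardizes with the sample standard deviation; the population standard deviation is used here only to keep the numbers round):

```python
import statistics

# Sketch of the z-score transformation saved by Descriptives:
# subtract the mean, divide by the standard deviation.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(scores)    # 5.0
sd = statistics.pstdev(scores)    # 2.0 (population SD, for round numbers)
z = [(x - mean) / sd for x in scores]
print(z[0], z[-1])                # -1.5 and 2.0
```

The resulting list has mean 0 and standard deviation 1, so a z of 2.0 reads directly as "two standard deviations above the mean".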
1. Click Analyze → Descriptive Statistics → Descriptives.
2. A Descriptives dialogue box will appear with the variables on the left side. Select the variables (Coaching Session and Level of Satisfaction) and transfer them from the left to the right column. Then click “Options”.
3. A new Descriptives: Options dialogue box will open. Select mean, minimum, maximum, kurtosis, skewness and variable list. Then click Continue.
The result will appear in output window showing the descriptive statistics table.
Interpretation
Valid N (listwise): This is the number of cases with no missing values on any listed variable. The table shows that there are no missing values.
N: This is the number of valid observations for the variable. The total number of observations is the sum of N and the number of missing values. The number of valid observations is ten for both coaching sessions and level of satisfaction.
Minimum: This is the minimum or smallest value of the variable. The minimum number of coaching sessions taken is one and the minimum level of satisfaction is also one, which means least satisfied.
Maximum: This is the maximum or highest value of the variable. The maximum number of sessions is ten and the maximum satisfaction level is ten, which means highly satisfied.
Mean: This is the arithmetic mean across the observations. The mean is sensitive to extremely small and extremely large values. The mean number of coaching sessions taken is 6.67 (approx. 7) and the same is the value for level of satisfaction.
Standard Deviation: The standard deviation is the square root of the variance. It measures the spread of a set of observations: it is a measure of the average distance between the values of the data in the set and the mean. A low standard deviation indicates that most of the numbers are very close to the average. A high standard deviation indicates that the numbers are spread out. The standard deviation of the coaching sessions and level of satisfaction is 1.472.
Skewness: Skewness is asymmetry in a statistical distribution, in which the curve appears distorted either to the left or to the right. A normal distribution has mean = mode and zero skewness. When the distribution is skewed to the left (also called negative skewness), the mean is less than the mode. When the distribution is skewed to the right (also called positive skewness), the mean is greater than the mode. The table represents that the skewness of coaching sessions and level of satisfaction is zero, which means that the distribution is normal and there is a complete absence of skewness.
Kurtosis: Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. Data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or a lack of outliers. The kurtosis for both coaching sessions and level of satisfaction is negative, which indicates lighter tails and a flatter peak.
Explore
The Explore command produces an extensive set of statistics for a variable that describe the shape, center, spread and the presence of any outliers in a distribution. It also produces a box-and-whisker plot that provides these summary statistics in visual form. The Explore command also allows you to compare two or more groups in terms of the various statistics for a variable.
1. Click Analyze → Descriptive Statistics → Explore.
2. A dialogue box will appear showing various columns. Select the variable “Coaching Session” and transfer it to the Dependent List, and select Level of Satisfaction to transfer it to Label Cases by. Select Both in the Display column.
3. Then return to the main Explore dialogue box and click Plots.
4. A new Explore: Plots dialogue box will appear. Select Factor levels together among the box plots and Histogram from Descriptive. Then click Continue.
5. Then, from the first dialogue box, select Options. A new dialogue box will appear. From this new Options box select Exclude cases pairwise. Then click Continue.
6. Now click OK in the first dialogue box (Explore).
The result will appear in the output window representing case processing summary,
Descriptive tables, Percentile, Extreme values, Histogram and Boxplot representing
Coaching Sessions.
Module 3
Correlation
Objectives
Scatter/Dot Plot
Correlation
Split File
Scatter Plot for Split File
Correlation for Split Data
Scatter/Dot Plot
A scatter plot (or scatter graph) is a graphic tool used to display the relationship between two quantitative variables. A scatter plot consists of an X axis (the horizontal axis), a Y axis (the vertical axis) and a series of dots. Usually the first variable is independent and the second is dependent on the first variable. The scatter diagram is used to find the correlation between two variables.
Let us take an example of students’ data for counseling sessions that needs to be analyzed. The objective is to create a small data file for students that consists of six variables.
Interpretation
The scatter diagram shows that there is a high degree of positive relationship, as the dots move upward from left to right. This means that with a change in counseling sessions the satisfaction level of students will change.
Correlation
Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1. A value of +1 indicates a perfect degree of association between the two variables. The bivariate Pearson correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson correlation is a parametric measure.
-1.0 to -0.5: strong negative correlation
-0.5 to 0.0: weak negative correlation
0.0 to +0.5: weak positive correlation
+0.5 to +1.0: strong positive correlation
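The definition of r can be sketched directly from its formula: the covariance of the two variables divided by the product of their deviation norms. The data lists below are hypothetical, chosen to be perfectly linear so r comes out as +1:

```python
import math

# Sketch of the Pearson correlation coefficient r for two lists.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

sessions = [1, 2, 3, 4, 5]
satisfaction = [2, 4, 6, 8, 10]   # hypothetical, perfectly linear in sessions
print(pearson_r(sessions, satisfaction))
```

Because satisfaction is an exact linear function of sessions here, the result is a perfect positive correlation; real survey data would fall somewhere inside the scale above.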
Statistical Significance
Statistical significance is the likelihood that a relationship between two or more variables is
caused by something other than chance. Statistical hypothesis testing is used to determine
whether the result of a data set is statistically significant. This test provides a p-value,
representing the probability that random chance could explain the result; in general, a p-value of
5% or lower is considered statistically significant. Statistical significance is used to accept or reject the null hypothesis. When the p-value is greater than the chosen significance level (e.g., 0.05), the null hypothesis is accepted; when the p-value is less than the significance level, the null hypothesis is rejected.
We can conclude that there is statistically significant correlation between two variables.
The correlation between counseling sessions and level of satisfaction is -.769, which means that there is a high degree of correlation between the two variables.
The table also shows that the significance (2-tailed) level is 0.000, which is less than 0.05, so we can conclude that there is a statistically significant correlation between counseling sessions and level of satisfaction.
Split File
Split File splits the data file into separate groups for analysis based on the values of one or more
grouping variables. If you select multiple grouping variables, cases are grouped by each variable
within categories of the preceding variable on the Groups Based on list.
Compare Groups: Split-file groups are presented together for comparison purposes. For pivot tables, a single pivot table is created and each split-file variable can be moved between table dimensions. For charts, a separate chart is created for each split-file group and the charts are displayed together in the Viewer.
1. Click Data → Split File.
2. A Split File dialogue box will appear. From the options, select Compare groups. Then, from the variables on the left, transfer Gender and Name of Counselor to Groups Based on. Also select Sort the file by grouping variables.
3. Click OK.
Interpretation
The result shows that all females who prefer Dr. Dhawan as counselor appear together and all those who prefer Dr. Dabswagger appear together; similarly, the data for males are shown.
5. After this, select Gender and Name of Counselor (one by one) and drag and drop them in the place of Filter.
6. Click OK.
Interpretation
The above graph shows the scatter plot for the relation between counseling sessions taken by females from Dr. Dhawan and their level of satisfaction. Only one female joined Dr. Dhawan for counseling, so a correlation cannot be calculated.
Interpretation
The graph shows the relation between counseling sessions taken by females from Dr. Dabswagger and their level of satisfaction. It represents a high degree of positive correlation.
Interpretation
The graph shows the relation between counseling sessions taken by males from Dr. Dhawan and their level of satisfaction. It represents a high degree of positive correlation.
Interpretation
The graph shows the relation between counseling sessions taken by males from Dr. Dabswagger and their level of satisfaction. It represents a high degree of positive correlation.
3. Click OK.
The result will appear in the Output window representing a correlation table which shows the correlation between counseling sessions and level of satisfaction for females and males individually, with both counselors.
Interpretation
Only one female has taken counseling from Dr. Dhawan, so it is not possible to find the correlation in that case. The correlation table shows that the correlation between sessions (with Dr. Dabswagger) and level of satisfaction for females is 0.967, i.e. a high degree of positive correlation.
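The essence of a split-file analysis is computing the same statistic separately per group instead of over the whole file. A minimal sketch on hypothetical (gender, satisfaction) records, with 1 = female and 2 = male:

```python
import statistics

# Sketch of split-file analysis: one statistic per group, not one overall.
# Hypothetical (gender, satisfaction) records; 1=female, 2=male.
records = [(1, 3), (1, 5), (1, 8), (2, 2), (2, 3), (2, 7)]

groups = {}
for gender, satisfaction in records:
    groups.setdefault(gender, []).append(satisfaction)

for gender in sorted(groups):
    print(gender, statistics.mean(groups[gender]))
```

Replacing `statistics.mean` with a correlation function would give the per-gender, per-counselor correlations described above; a group with only one case (like the single female with Dr. Dhawan) would have nothing to correlate, which is exactly why SPSS reports no coefficient there.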
Module 4
Cross Tabulation and Chi Square Statistics
Objectives
Cross Tabs
Chi Square
CROSS TABS
To describe a single categorical variable, we use frequency tables. To describe the relationship between two categorical variables, we use a special type of table called a cross tabulation. This type of table is also known as a crosstab or two-way table.
In a crosstab, the categories of one variable determine the rows of the table, and the categories of the other variable determine the columns. The cells of the table contain the number of times that a particular combination of categories occurred.
Crosstabs is an SPSS procedure that cross-tabulates two variables, thus displaying their relationship in tabular form. In contrast to Frequencies, which summarizes information about one variable, Crosstabs generates information about bivariate relationships.
Crosstabs creates a table that contains a cell for every combination of categories in the two variables. Inside each cell is the number of cases that fit that particular combination of responses.
SPSS can also report the row, column and total percentages for each cell of the table.
Because Crosstabs creates a row for each value in one variable and a column for each value in the other, the procedure is not suitable for continuous variables that assume many values. Crosstabs is designed for discrete variables, usually those measured on nominal or ordinal scales.
DATA REQUIREMENTS
Data must meet the following requirements:
Two categorical variables
Two or more categories for each variable
DATA SET UP
The categorical variables can be numeric or string variables in the SPSS dataset and their measurement level can be defined as nominal, ordinal or scale. However, crosstabs should only be used when there is a limited number of categories.
In most cases the row and column variables in a crosstab can be used interchangeably. The choice of
row/column variable is usually dictated by space requirements or interpretation of the results.
Let us take an example of students’ data and their future aspirations for an MBA. The objective is to create a small data file that consists of four variables.
2 Gender Numeric
3 Age Numeric
1. Click Analyze → Descriptive Statistics → Crosstabs.
2. A new Crosstabs dialogue box will open. Transfer the variable Gender to Row(s) and Coaching Institute to Column(s) on the right.
3. Then click Cells. A new Crosstabs: Cell Display dialogue box will appear. From Counts select Observed and from Percentages select Row. From Noninteger Weights select Round cell counts. All these selections are already done by default. Then click Continue. From the previous box select OK.
Interpretation
The crosstab shows the distribution of females and males across the Indian and foreign universities. The table shows that 3 females prefer an Indian university and 2 females prefer a foreign university, while 2 males prefer an Indian university and 3 males prefer a foreign university. The distribution also includes percentages.
It shows that the preferences of females and males were not distributed evenly across universities. Most females (60%) prefer an Indian university and most males (60%) prefer a foreign university.
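The row percentages quoted here follow from dividing each cell by its row total. A sketch using counts consistent with the 60% figures in the interpretation (each gender has 5 cases, split 3/2):

```python
# Row percentages of a crosstab: each cell divided by its row total.
# Counts consistent with the 60% figures quoted in the interpretation.
females = {"Indian": 3, "Foreign": 2}
males = {"Indian": 2, "Foreign": 3}

def row_percent(row):
    total = sum(row.values())
    return {k: 100 * v / total for k, v in row.items()}

print(row_percent(females))
print(row_percent(males))
```

Row percentages answer "of the females, what share chose each university?", which is the comparison the interpretation makes.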
CHI-SQUARE TEST
A chi-square test, also written as the χ² test, is a test of significance. It was developed by Karl Pearson in 1900. The chi-square test determines whether there is an association between categorical variables (i.e. whether the variables are independent or related). It is a non-parametric test.
The test uses a contingency table to analyze the data. A contingency table (also known as cross tabulation,
crosstab or two way table) is an arrangement in which data is classified according to two categorical
variables. The categories for one variable appear in the rows, and the categories for the other variable
appear in the columns. Each variable must have two or more categories. Each cell reflects the total count of
cases for a specific pair of categories.
Chi-Square Formula
χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count in each cell of the contingency table.
Uses
Data Requirements
Hypotheses
The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.
Null Hypothesis
The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts.
The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of independence
can be expressed in two different but equivalent ways:
Use the following steps to perform a chi-square test:
1. Click Analyze → Descriptive Statistics → Crosstabs.
2. A new Crosstabs dialogue box will open. Transfer the variable Gender to Row(s) and Coaching Institute to Column(s) on the right.
3. Click the Cells which will display a new dialogue box.
4. Select Observed and Expected from the Counts area and Row from the Percentages area. Also select Round cell counts from Noninteger Weights.
5. Click Continue.
6. Now, from the Crosstabs dialogue box, click Statistics.
7. A new ‘Crosstabs: Statistics’ dialogue box will appear.
8. Select Chi-square and then click Continue.
The footnote (a) reports an important assumption for chi-square, which is satisfied here. The chi-square value is 7.38 with a significance probability of 0.007. This means that, according to the test, the probability of this distribution of values occurring by chance is less than 0.01, or 1 in 100. So the probability (p) = 0.007 is < 0.05. It means that the null hypothesis that the variables are independent is rejected. This implies that the preference for a university depends on gender: females are more likely to prefer an Indian university and males a foreign university. There is a relationship between gender and type of university.
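The cell-by-cell formula Σ (O − E)² / E can be sketched directly in Python. The 2×2 counts below are hypothetical (larger than the small example above, so the statistic is clearly significant); with df = 1, the critical value at α = .05 is 3.841:

```python
# Sketch of the chi-square statistic for a 2x2 table: sum over cells of
# (observed - expected)^2 / expected, with expected = row total * col total / grand total.
observed = [[30, 10],   # hypothetical counts, row 1
            [15, 25]]   # hypothetical counts, row 2

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi_sq += (obs - expected) ** 2 / expected

print(round(chi_sq, 3))   # compare against 3.841 (df = 1, alpha = .05)
```

Since the statistic here far exceeds 3.841, the null hypothesis of independence would be rejected for these hypothetical counts, the same decision logic the interpretation applies to its SPSS output.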
Module 5
Comparing Means Using One Way Anova
Objectives
One Way ANOVA
ANOVA(Analysis of Variance)
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups. ANOVA must have a dependent variable, which should be metric (measured using an interval or ratio scale). ANOVA must also have one or more independent variables, which should be categorical in nature. In ANOVA, categorical independent variables are called factors. A particular combination of factor levels, or categories, is called a treatment. The one-way ANOVA test will be able to inform us whether there is a significant difference between the three groups. However, it cannot directly state which group(s) differ from each other. So, if a one-way ANOVA test indicates a significant result, further post-hoc testing is required to investigate specifically which groups are significantly different.
Let us take an example of employees’ data regarding the type of course attended and the time taken to complete the course, which needs to be analyzed.
Hypotheses
1. Click Analyze → Compare Means → One-Way ANOVA.
2. A new One-Way ANOVA dialogue box will open. Transfer the variable Time to the Dependent List and Type of Course to Factor on the right.
3. Then click Post Hoc. A new One-Way ANOVA: Post Hoc Multiple Comparisons dialogue box will appear.
4. Select Tukey and click Continue.
5. Select Options from One-Way ANOVA. A new One-Way ANOVA: Options tab will open.
6. Select Descriptive, Homogeneity of variance test and Means plot, and click Continue.
7. From the previous box select OK.
Interpretation
Descriptive
In the Descriptives table, N in the first column refers to the number of cases used for calculating the descriptive statistics. These numbers being equal to our sample sizes tells us that there are no missing values. The means are the core of our output; our question is whether they differ for different types of courses. On average, 25.67 minutes were used by Beginners, the Intermediate course results in an average of 26.33 minutes, and the Advanced course has a mean time of 26.50 minutes.
The F value for Levene’s test is 0.096 with a Sig. (p) value of .909. Because the Sig. value is greater
than our alpha of .05 (p > .05), we retain the null hypothesis (no difference) for the assumption
of homogeneity of variance and conclude that there is not a significant difference between the three
groups’ variances. That is, the assumption of homogeneity of variance is met.
The p value (denoted by “Sig.”) is .000. The null hypothesis is accepted if p > .05 and
rejected if p < .05. Here p < .05, so we conclude that the mean time of the three
groups of courses is unequal.
From the results so far, we know that there are statistically significant differences between the
groups as a whole. The Multiple Comparisons table shows which groups
differed from each other. The Tukey post hoc test is generally the preferred test for conducting
post hoc tests on a one-way ANOVA. The post hoc table indicates that the mean of the advanced
course is significantly (p < .05; Sig. values 0.03 and 0.01) different from the means of the other courses.
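The same sequence of checks (Levene's test, the ANOVA itself, then Tukey's post hoc test) can be reproduced outside SPSS. Below is a minimal sketch in Python using scipy; the completion times are made-up illustrative values, not the module's dataset, and tukey_hsd requires a recent SciPy release.

```python
# Hypothetical completion times (minutes) for three course types.
# These numbers are illustrative only, not the module's data.
from scipy import stats

beginner     = [24, 26, 25, 27, 25, 27]
intermediate = [26, 27, 25, 27, 26, 27]
advanced     = [29, 30, 28, 31, 30, 30]

# Levene's test checks the homogeneity-of-variance assumption.
lev_stat, lev_p = stats.levene(beginner, intermediate, advanced)

# One-way ANOVA tests whether at least one group mean differs.
f_stat, p_value = stats.f_oneway(beginner, intermediate, advanced)

# Tukey's HSD post hoc test shows which pairs of groups differ.
tukey = stats.tukey_hsd(beginner, intermediate, advanced)

print(f"Levene p = {lev_p:.3f}, ANOVA F = {f_stat:.2f}, p = {p_value:.4f}")
print(tukey.pvalue)  # matrix of pairwise p-values
```

As in the SPSS output, a small ANOVA p-value only says that some group differs; the pairwise p-value matrix from Tukey's test identifies which ones.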
Homogeneous Subsets
The Means plot is a visual representation of what we saw in the Compare Means output. The
points on the chart are the averages of each group. It is much easier to see from this graph that the
employees who joined the advanced course have the lowest mean time taken to complete the task,
while employees in the intermediate course took the highest mean time.
Module 6
ONE SAMPLE T TEST
Objectives
The One Sample t Test determines whether the sample mean is statistically different from a
known or hypothesized population mean. The One Sample t Test is a parametric test. This test is
also known as Single Sample t Test.
In a One Sample t Test, the test variable is compared against a "test value", which is a known or
hypothesized value of the mean in the population.
DATA SETUP
Our data should include one continuous, numeric variable (represented in a column) that will be
used in the analysis. The variable's measurement level should be defined as Scale in the Variable
View window.
DATA REQUIREMENTS
Before performing the test, it is important to check that your data satisfies the assumptions of the
one-sample t-test. These are:
1. The test variable should be continuous (interval or ratio).
2. The observations should be independent of each other.
3. There should be no significant outliers.
4. The test variable should be approximately normally distributed.
Example
College-aged adults need at least 7 hours of sleep each night to stay healthy. Sleep deprivation
can lead to decreased immune system function, lack of concentration, and poor memory. In the
data set, a simple random sample of college students reports the number of hours of sleep they
had. A researcher wants to test whether the average sleep hours of a group of
20 college students differ from the 7-hour average. He wants to know whether his sample is
representative of the normal population.
HYPOTHESES
The null hypothesis (H0) and (two-tailed) alternative hypothesis (H1) of the one sample T test can
be expressed as:
H0: µ = x ("the sample mean is equal to the [proposed] population mean")
Or
H0: The average sleep of students is equal to 7 hours.
H1: µ ≠ x ("the sample mean is not equal to the [proposed] population mean")
Or
H1: The average sleep of students is not equal to 7 hours.
where
µ = the constant proposed for the population mean (7 hrs)
and x = the sample mean.
Test Variable(s): The variable whose mean will be compared to the hypothesized population
mean (i.e., Sleep Hours).
Test Value: The hypothesized population mean against which your test variable(s) will be
compared.(i.e 7 )
3. Click Options; a new dialogue box, One Sample T Test: Options, will open. Select 95%
confidence Interval percentage and, under the missing values heading, select Exclude cases
analysis by analysis. (These are selected by default.)
4. Click Continue.
Interpretation
SPSS Statistics generates two main tables of output for the one-sample t-test that contain all the
information we need to interpret the results of a one-sample t-test.
Descriptive Statistics
We can make an initial interpretation of the data using the One-Sample Statistics table, which
presents relevant descriptive statistics:
The mean sleep hours of students (6.60) was lower than the population ‘normal’ sleep hours
of 7.0.
One Sample T Test
The One-Sample Test table reports the result of the one-sample t-test. The top row provides the
value of the known or hypothesized population mean you are comparing your sample data to.
Moving from left-to-right, you are presented with the observed t-value ("t" column), the degrees
of freedom ("df"), and the statistical significance (p-value) ("Sig. (2-tailed)") of the one-sample
t-test. Since p > .05 (p = .119), we accept the null hypothesis that the sample mean is equal to
the hypothesized population mean and conclude that the mean sleep hours of the sample is not
significantly different from the average sleep hours of the population.
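For reference, the same test can be run in Python with scipy.stats.ttest_1samp. The sleep values below are illustrative, chosen so the sample mean works out to 6.60 like the module's data, though they are not the module's actual records:

```python
# Hypothetical sleep hours for 20 students (illustrative values only);
# the test value is the 7-hour benchmark from the example.
from scipy import stats

sleep_hours = [6.5, 7.0, 6.0, 7.5, 6.0, 6.5, 7.0, 5.5, 6.5, 7.0,
               6.0, 7.5, 6.5, 6.0, 7.0, 6.5, 7.5, 6.0, 6.5, 7.0]

# One-sample t test: compare the sample mean against popmean = 7.0.
t_stat, p_value = stats.ttest_1samp(sleep_hours, popmean=7.0)

mean = sum(sleep_hours) / len(sleep_hours)
print(f"mean = {mean:.2f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

With these particular made-up values the result happens to come out significant, unlike the module's p = .119; the point here is the procedure (a single sample tested against a fixed value), not the numbers.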
Module 7
Independent Sample T Test
Objectives
The Independent Samples t Test compares the means of two independent groups in order to
determine whether there is statistical evidence that the associated population means are
significantly different. The Independent Samples t Test is a parametric test. This test is also
known as:
Independent t Test
Independent Measures t Test
Independent Two-sample t Test
Student t Test
Two-Sample t Test
Uncorrelated Scores t Test
Unpaired t Test
Unrelated t Test
DATA SET UP
Data should include two variables (represented in columns) that will be used in the analysis.
The independent variable should be categorical and include exactly two groups. (Note that
SPSS restricts categorical indicators to numeric or short string values only.)
SPSS can only make use of cases that have no missing values for the independent and the
dependent variables, so if a case has a missing value for either variable, it cannot be
included in the test.
DATA REQUIREMENTS
Conclusions from an independent samples t test can be trusted if the following assumptions are
met:
1. The dependent variable should be measured on a continuous scale (either interval or
ratio).
2. The independent variable should consist of two independent (non-related) groups.
3. There are no outliers present in the dependent variable.
4. The dependent variable should be normally distributed for each group.
5. The two groups should have homogeneity of variances. In other words, their
standard deviations need to be approximately the same. This can be investigated with
Levene’s Test for Equality of Variances.
Example
We want to understand whether graduates’ salary differs based on gender (i.e. the dependent variable
would be "graduate salary" and the independent variable would be "gender", which has two groups: "male"
and "female").
2. Salary Scale
HYPOTHESES
H0: µ1 = µ2 ("the two population means are equal")
H1: µ1 ≠ µ2 ("the two population means are not equal")
where µ1 and µ2 are the population means for group 1 and group 2, respectively.
1. Click Analyze → Compare Means → Independent Samples T Test.
2. A new dialogue box Independent Sample T test will appear. Add Salary of Graduates in Test
Variable(s) and Gender in Grouping Variable
3. Now click on Define Groups. This will open up a small dialogue box. Here, under Use
specified values, add 1 to Group 1 and 2 to Group 2. This is to tell SPSS that you want to
compare males (Group 1) against females (Group 2). This will look like this:
The output in the Independent Samples Test table includes two rows: Equal variances
assumed and Equal variances not assumed. If Levene’s test indicates that the variances are equal across
the two groups (i.e., p-value large), we will rely on the first row of output, Equal variances assumed,
when we look at the results for the actual Independent Samples t Test (under t-test for Equality of Means).
If Levene’s test indicates that the variances are not equal across the two groups (i.e., p-value small), we
will need to rely on the second row of output, Equal variances not assumed.
This section has the test results for Levene's Test. From left to right:
The p-value of Levene's test is .230 (p > 0.05), so we accept the null hypothesis of Levene's test and
conclude that the variance in salary of males is not significantly different from that of females. This tells
us that we should use the "Equal variances assumed" row for the test. (If this test result
had been significant -- that is, if we had observed p < 0.05 -- then we would have used the "Equal
variances not assumed" output.)
This section provides the results for the actual Independent Samples t Test. From left to right:
Since p < 0.05, we reject the null hypothesis and conclude that there is a
significant difference between the mean salary for males and females.
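The Levene-then-t-test decision described above can be sketched in Python with scipy. The salaries below are invented for illustration (not the module's data); the equal_var argument mirrors SPSS's choice between the "Equal variances assumed" and "Equal variances not assumed" rows.

```python
# Hypothetical graduate salaries (in thousands) by gender; illustrative only.
from scipy import stats

male   = [52, 55, 58, 54, 60, 57, 53, 56, 59, 55]
female = [48, 50, 47, 52, 49, 51, 46, 50, 48, 49]

# Levene's test decides which t-test variant to trust.
lev_stat, lev_p = stats.levene(male, female)

# equal_var=True corresponds to SPSS's "Equal variances assumed" row;
# equal_var=False is Welch's t test ("Equal variances not assumed").
t_stat, p_value = stats.ttest_ind(male, female, equal_var=(lev_p > 0.05))

print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

Tying equal_var to the Levene result reproduces the two-row logic of the SPSS Independent Samples Test table in a single call.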
Module 8
Dependent/Paired T Test
Objectives
Dependent/Paired T Test
The Paired Samples t Test compares two means that are from the same object, or related units.
The two means typically represent two different times (e.g., pre-test and post-test with an
intervention between the two time points) or two different but related conditions or units (e.g.,
left and right ears, twins). The purpose of the test is to determine whether there is statistical
evidence that the mean difference between paired observations on a particular outcome is
significantly different from zero. The Paired Samples t Test is a parametric test. This test is also
known as:
Dependent t Test
Paired t Test
Repeated Measures t Test
The data should include one dependent variable, or test variable (continuous), measured at two
different times or for two related conditions or units.
COMMON USES
The Paired Samples t Test is commonly used to test the following:
The Paired Samples t Test can only compare the means for two related (paired) units on a
continuous outcome that is normally distributed. The Paired Samples t Test is not appropriate for
analyses involving the following:
1) Unpaired data
4) An ordinal/ranked outcome.
Example
A new fitness program is devised for ten obese people. Each participant's weight was measured
before and after the program to see if the fitness program is effective in reducing their weights. In
this example, our null hypothesis is that the program is not effective, i.e., there is no difference
between the weight measured before and after the program. The alternative hypothesis is that the
program is effective and the weight measured after is less than the weight measured before the
program. In the data view, the first column is the weight measured before the program and the
second column is the weight after the fitness program.
2. After Scale
HYPOTHESES
The paired sample t-test has two competing hypotheses, the null hypothesis and the alternative
hypothesis. The null hypothesis assumes that the true mean difference between the paired
samples is zero. Conversely, the alternative hypothesis assumes that the true mean difference
between the paired samples is not equal to zero. The paired sample t-test hypotheses are formally
defined below:
The null hypothesis would be:
H0: µ1 = µ2 ("the paired population means are equal") or “There is no significant difference
between the weight before and after the fitness program.”
H1: µ1 ≠ µ2 ("the paired population means are not equal") or “There is significant difference
between the weight before and after the fitness program.”
DATA REQUIREMENTS
Variable1: The first variable, representing the first group of matched values. Move the variable
that represents the first group to the right where it will be listed beneath the “Variable1” column.
Variable2: The second variable, representing the second group of matched values. Move the
variable that represents the second group to the right where it will be listed beneath
the “Variable2” column.
The calculated value 17.418 is greater than the table value 2.262; hence, the difference is significant.
The 95% confidence interval does not include zero, as the upper and lower values are both negative; thus,
the difference is significant.
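The paired comparison can be sketched in Python with scipy.stats.ttest_rel. The before/after weights below are invented for illustration, not the study's measurements:

```python
# Hypothetical before/after weights (kg) for ten participants; illustrative only.
from scipy import stats

before = [95, 102, 98, 110, 105, 99, 101, 97, 108, 100]
after  = [91,  98, 95, 104, 100, 96,  97, 93, 103,  96]

# Paired (dependent) samples t test: tests whether the mean of the
# per-participant differences is zero.
t_stat, p_value = stats.ttest_rel(before, after)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Because every participant contributes a before/after pair, the test works on the differences within pairs rather than treating the two columns as independent groups.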
Module 9
Mann Whitney U Test
Objectives
The Mann-Whitney U test is used to compare differences between two independent groups when the
dependent variable is either ordinal or continuous, but not normally distributed. It is the non-parametric
equivalent of the independent samples t-test: it does not assume any particular distribution of the
dependent variable, and it compares the median scores of two samples rather than their mean scores.
This makes it much more robust against outliers and heavy-tailed distributions, and the appropriate
analysis to use when comparing groups whose dependent variable is not normally distributed and at
least of ordinal scale.
Example
We will compare the marks given by Professor A and Professor B. The sample contains 40 marks
in total. The data is as follows.
Data View:
Variable View
HYPOTHESES
H0 “There is no significant difference between the marks given by Professor A and Professor
B.”
H1:“There is significant difference between the marks given by Professor A and Professor B .”
1. Click Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples.
2. From “2 Independent Samples Test” dialogue box enter Marks in Test Variable list
then enter Professor Name in Grouping Variable, in Test Type select Mann Whitney
U.
3. Next click Define Groups, then type 1 for Group 1 and type 2 for Group 2. Then click
Continue.
The Test Statistics table shows that the Sig. value (two-tailed) is 0.968. This value is more than
0.05 (0.968 > 0.05), so we can conclude that there is no significant difference between the marks
given by Professor A and Professor B. Hence our null hypothesis is accepted.
Module 10
WILCOXON MATCHED PAIR RANK SUM TEST
Objectives
Wilcoxon Matched Pair Rank Sum Test
The Wilcoxon signed-rank test is a non-parametric test that looks for differences between two dependent
samples. That is, it tests whether the populations from which two related samples are drawn have the
same location. It is the non-parametric equivalent of the dependent (or matched-pairs) t-test. As the
Wilcoxon signed-rank test does not assume normality in the data, it can be used when this assumption has
been violated and the use of the dependent t-test is inappropriate.
WHEN TO USE
ASSUMPTIONS
HYPOTHESES
H0 “There is no significant difference between the Pain Score before and after the
acupuncture.”
H1:“There is significant difference between the Pain Score before and after the
acupuncture.”
Example
A researcher is interested in finding methods to reduce lower back pain in individuals without
having to use drugs. The researcher thinks that having acupuncture in the lower back might
reduce back pain. To investigate this, the researcher recruits 25 participants to their study. At the
beginning of the study, the researcher asks the participants to rate their back pain on a scale of 1
to 10, with 10 indicating the greatest level of pain. After 4 weeks of twice weekly acupuncture,
the participants are asked again to indicate their level of back pain on a scale of 1 to 10, with 10
indicating the greatest level of pain. The researcher wishes to understand whether the
participants' pain levels changed after they had undergone the acupuncture, so a Wilcoxon
signed-rank test is run.
Variable View:
Data View:
4. Click on Options and then select Descriptive and Quartiles for the
variables. Under missing values, select Exclude cases test by test (selected by
default).
SPSS Statistics generates a number of tables in the Output Viewer under the title NPar Tests.
Descriptive Table
The Descriptive Statistics table is where SPSS Statistics generates descriptive and quartile
statistics for our variables if we selected these options. If we did not select these options, this table
will not appear in our results. We can use the results from this table to describe the Pain
Scores before and after the acupuncture treatment.
Ranks Table
The Ranks table provides data on the comparison of participants' Before (Pre) and After (Post)
Pain Score. We can see from the table that 24 participants had a higher pre-acupuncture
treatment Pain Score than after their treatment. However, 1 participant had a higher Pain Score
after treatment and there is no participant who saw no change in their Pain Score.
By examining the final Test Statistics table, we can discover whether these changes, due to
acupuncture treatment, led overall to a statistically significant difference in Pain Scores. We are
looking for the "Asymp. Sig. (2-tailed)" value, which in this case is 0.000. This is the p-value
for the test.
Based on the results above, we could report the results of the study as follows:
At the 0.05 level of significance (because p < 0.05), there is enough evidence to conclude that
there is a change in pain level after the acupuncture. Hence we reject the null hypothesis
and accept the alternative hypothesis.
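The Wilcoxon signed-rank test can be sketched with scipy.stats.wilcoxon. The pain scores below are invented, but they are constructed to mirror the module's Ranks table: 24 participants improve and 1 gets worse, with no ties:

```python
# Hypothetical pain scores (1-10) before and after acupuncture for
# 25 participants; illustrative values matching the module's rank pattern.
from scipy import stats

pre  = [8, 7, 9, 6, 8, 7, 9, 8, 6, 7, 8, 9, 7, 8, 6, 9, 7, 8, 7, 9, 6, 8, 7, 9, 5]
post = [5, 4, 6, 4, 5, 5, 6, 5, 4, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 7, 4, 6, 5, 7, 6]

# Wilcoxon signed-rank test: non-parametric paired comparison that
# ranks the within-pair differences instead of assuming normality.
w_stat, p_value = stats.wilcoxon(pre, post)

print(f"W = {w_stat}, p = {p_value:.4f}")
```

With 24 decreases against a single small increase, the signed ranks pile up heavily on one side, which is why the resulting p-value is far below 0.05, just as in the module's output.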