RM Lab Main File BBA Project
Objectives
Learn about SPSS
Open SPSS
Layout of SPSS
Become familiar with Menus
Exit SPSS
Recoding Variables
Frequency Distribution
Cross Tabulation
Graphs
What is SPSS
SPSS is the abbreviation of ‘Statistical Package for the Social Sciences’. The software package was created in 1968 by SPSS Inc. and was acquired by IBM in 2009. In 2014 the software was officially renamed IBM SPSS Statistics, but it is still commonly referred to as SPSS.
The software was originally meant for the social sciences, but it has become popular in other fields such as the health sciences and especially in marketing, marketing research and data mining. SPSS is a Windows-based program that can be used to perform data entry and analysis and to create tables and graphs.
Features of SPSS
It is easy to learn.
SPSS includes a wide range of data management and editing tools.
It offers in-depth statistical capabilities.
It offers excellent plotting, reporting and presentation features.
Opening SPSS
Click the left mouse button on the Start button on your screen, then put your cursor on Programs or All Programs and left click the mouse. Select SPSS 17.0 for Windows by clicking the left mouse button.
Start → Programs → SPSS 17
Layout of SPSS
The Data Editor window has two views that can be selected from the lower left hand side of the screen.
Data View is where you see the data you are using. Variable View is where you can specify the format of
your data when you are creating a file or where you can check the format of a pre-existing file. The data
in the Data Editor is saved in a file with the extension .sav.
A dialogue box will appear in front of the SPSS grid listing several options to choose from. The following options will appear in the dialogue box.
SPSS Menus
File: includes all of the options we use in other programs, such as open, save, and exit.
Edit: includes the cut, copy, and paste commands, and allows us to specify various options for displaying data and output.
View: allows us to select which toolbars to show, select font size, add or remove the gridlines that separate each piece of data, and choose whether to display the raw data or the data labels.
Data: allows us to select several options ranging from displaying data that is sorted by a specific variable to selecting certain cases for subsequent analyses.
Transform: includes several options to change current variables. For example, we can change continuous variables to categorical variables, change scores into rank scores, add a constant to variables, etc.
Analyze: includes all of the commands to carry out statistical analyses and to calculate descriptive statistics.
Graphs: includes the commands to create various types of graphs including box plots, histograms, line graphs, and bar charts.
Utilities: allows us to list file information, which is a list of all variables, their labels, values, locations in the data file, and type.
Add-ons: are programs that can be added to the base SPSS package. We do not have access to any of those.
Window: can be used to select which window you want to view (i.e., Data Editor, Output Viewer, or Syntax).
Help: has many useful options including a link to the SPSS homepage, a statistics coach, and a syntax guide. Using Topics, you can use the index option to type in any keyword and get a list of options, or you can view the categories and subcategories available under Contents. This is an excellent tool and can be used to troubleshoot most problems.
Exiting SPSS
To close SPSS, we can either left click on the close button located on the upper right hand corner of the
screen or select Exit from the File menu. A dialog box will appear for every open window asking you if
you want to save it before exiting. We must always save the data files.
1. Data View
2. Variable View
Variable view is used to define variables that will store the data.
The first step is to open the Variable View window of the Data Editor and define variables. Let us take an example of faculty data of an educational institution that needs to be analyzed. The objective is to create a small data file for employees that consists of six variables.
2 Empl_Name String
4 Age Numeric
6 Type_appointment Numeric(Regular=1,Contractual=2,Guest=3)
7 Income Numeric
8 Marital_Status Numeric(Married=1,Unmarried=2)
I. Recoding Variables
To convert a variable into another variable, SPSS has the Transform command. A variable can be converted using any of the following tools:
Example
1. Click Transform → Recode into Different Variables.
2. In the dialogue box that appears, select the input variable and the output variable.
3. In the new dialogue box, define how to transform using old and new value rules.
This will create a new variable whose data will appear in the Data Editor window.
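Since SPSS drives this through a dialogue box, the underlying recode logic can be sketched in plain Python. The data and the regrouping below are hypothetical, reusing the appointment codes from the faculty example (Regular=1, Contractual=2, Guest=3):

```python
# Sketch of SPSS "Recode into Different Variables" logic in plain Python.
# Hypothetical old-to-new value rules: collapse Regular=1 into group 1,
# and Contractual=2 / Guest=3 into a "temporary" group 2.
old_to_new = {1: 1, 2: 2, 3: 2}

type_appointment = [1, 3, 2, 1, 3]                             # input variable
appointment_group = [old_to_new[v] for v in type_appointment]  # output variable
print(appointment_group)
```

The dictionary plays the role of the "old and new values" rules in the dialogue box; the original variable is left untouched, just as Recode into Different Variables leaves the input column intact.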
II. Frequency Distribution
To convert the data into a frequency distribution, SPSS has the Analyze command.
Example
Click Analyze → Descriptive Statistics → Frequencies. Select the variables from the left column of the dialogue box and move them to the right-hand “Variable(s)” window. Click OK.
The frequency distribution will be shown as output in the Output window as given below:
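What the Frequencies procedure counts can be sketched with Python's standard library. The gender codes below are hypothetical (1 = female, 2 = male, matching the coding convention used later in this manual):

```python
from collections import Counter

# Sketch of a frequency table like SPSS Frequencies output.
gender = [1, 2, 2, 1, 2, 1, 2, 2, 1, 2]   # hypothetical codes: 1=female, 2=male
counts = Counter(gender)
n = len(gender)
for value in sorted(counts):
    pct = 100 * counts[value] / n
    print(value, counts[value], f"{pct:.1f}%")
```

Each row printed corresponds to one row of the SPSS frequency table: the value, its count, and its percentage of all cases.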
III. Cross Tabulation
To cross-tabulate two categorical variables, SPSS has the Crosstabs command.
Example
In the Crosstabs dialogue box, from the left column select the variables for the rows and columns on the right. Then click OK.
The crosstab will appear in the Output window.
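The counting behind a crosstab can be sketched in Python: every (row category, column category) pair gets its own cell count. The two code lists below are hypothetical:

```python
from collections import Counter

# Sketch of a crosstab: count every (row, column) category pair.
# Hypothetical codes: gender (1=female, 2=male) x marital status (1=married, 2=unmarried).
gender  = [1, 1, 2, 2, 2, 1]
marital = [1, 2, 1, 1, 2, 1]
table = Counter(zip(gender, marital))
print(table[(1, 1)], table[(2, 1)])
```

`table[(r, c)]` is the count in the cell for row category `r` and column category `c`, exactly what each cell of the SPSS crosstab reports.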
IV. Graphs
To make a graph for any variable, SPSS has the Graphs command.
Example
A new dialogue box appears. Click on the appropriate radio button to tell SPSS what type of data is in the variable list:
Let us take an example of students’ data for two coaching institutes that needs to be analyzed. The objective is to create a small data file for students that consists of six variables.
3 C3 Age Numeric
Value Labels
We can assign descriptive value labels for each value of a variable. This process is particularly useful if your data file uses numeric codes to represent non-numeric categories (e.g., code 1 for female and code 2 for male).
To verify that we have assigned the correct value labels, use the following steps (taking the example of the values assigned to Gender).
Example
1. Click Data → Sort Cases.
2. A Sort Cases dialogue box will appear with the variables on the left side and Sort by and Sort Order on the right side. Transfer the required variable (Gender) from the left to the Sort by column. Then select the order (ascending) from the Sort Order column. Then click OK.
3. The result is that all females appear together followed by all males, in ascending order of the codes.
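The effect of Sort Cases can be sketched in Python by sorting records on the grouping variable. The names and codes here are hypothetical (1 = female, 2 = male):

```python
# Sketch of Data -> Sort Cases: order records by one grouping variable, ascending.
# Hypothetical records: (name, gender code) with 1=female, 2=male.
cases = [("Ravi", 2), ("Asha", 1), ("Amit", 2), ("Neha", 1)]
cases.sort(key=lambda rec: rec[1])   # ascending: code 1 (females) first
print(cases)
```

Python's sort is stable, so within each gender the original order of cases is preserved, much like SPSS keeps the prior ordering within groups.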
Frequencies
A frequency distribution is an overview of all distinct values of a variable and the number of times they occur. Frequency distributions are mostly used for summarizing categorical variables.
The vast majority of the descriptive statistics available in the Frequencies: Statistics window are never appropriate for nominal variables, and are rarely appropriate for ordinal variables. There are two exceptions to this:
1. The mode (which is the most frequent response) has a clear interpretation when applied
to most nominal and ordinal categorical variables.
2. The values are group mid points option can be applied to certain ordinal variables that
have been coded in such a way that their value takes on the midpoint of a range.
Example
To draw a frequency table and apply various statistical tools on the variable, use the following steps:
1. Click Analyze → Descriptive Statistics → Frequencies.
2. A Frequencies dialogue box will open. Transfer the variable (Gender) from left to right.
3. Click Statistics on the right.
4. A new dialogue box will open showing various statistical tools. Select among
Central Tendency: Mean, Median, Mode
Dispersion: Maximum and Minimum
5. Then click Continue.
Interpretation
The frequency table shows the frequency of each gender. Among the 10 students, 4 are female and 6 are male. It also shows that there were more males than females (males are 60% and females are 40%). It also shows the cumulative percentage.
The Statistics table shows the Mean, which is 1.60 and does not signify anything as Gender is a nominal variable, and the Median, which is the middle value, 2. The table also reports the Mode, which is 2, meaning that the most frequent category is male. The mode and median of the variable are the same, i.e. 2.
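The numbers in this interpretation can be checked with Python's `statistics` module, using the same coding as the example (4 females coded 1, 6 males coded 2):

```python
import statistics

# Reproducing the interpretation's figures: 4 females (code 1), 6 males (code 2).
gender = [1] * 4 + [2] * 6
print(statistics.mean(gender))    # 1.6, meaningless for a nominal variable
print(statistics.median(gender))  # 2
print(statistics.mode(gender))    # 2
```

This confirms the point made above: the mean of a nominal code (1.6) has no substantive meaning, while the mode (2 = male) does.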
1. Click Analyze → Descriptive Statistics → Frequencies.
2. A “Frequencies” dialogue box will open. Now transfer the variables (Age, Session and Level of Satisfaction) from left to right by clicking the arrow in the middle.
3. Then, to use statistical tools like the mean, median, etc., click Statistics on the right.
4. A new dialogue box will open showing various statistical tools. Select among
Central Tendency: Mean, Median, Mode
Dispersion: Maximum and Minimum
5. Then click Continue.
The result will appear in the Output window showing three different frequency tables for the three variables selected, with their percentages, and a Statistics table with the Mean, Median, Mode, Maximum and Minimum of all three variables in one.
Interpretation
Age in years: It shows number of students of different ages i.e. frequencies, percentage
and cumulative percentage of all the students.
Coaching Sessions: It shows number of students taking different number of coaching
sessions.
Level of satisfaction: It shows number of students satisfied at different levels.
Age in years: The mean age is 21.80, which indicates that most of the students who take coaching sessions are around the age of 22 years. The age in the middle (median) is 21.50 years. The footnote given at the end of the table indicates that multiple modes exist, so the table reports the smallest value as the mode. The table shows that students of age 21 years are most likely to take coaching sessions.
Coaching Sessions: The mean number of sessions taken by a student is 5.50, i.e. approx. 6. This means that on average 6 coaching sessions are attended by each student. The middle (median) number of coaching sessions is also 5.50, i.e. 6. Sessions also have multiple modes; the table reports the smallest, suggesting that one coaching session is the most common count among students.
Level of satisfaction: The mean level of satisfaction is 5.50, i.e. approx. 6. The middle value (median) is also 5.50, the same as the mean. We can conclude that most students are neutral towards the level of satisfaction. The mode of the data is also 5.50, which means that the students are neutral towards the coaching sessions taken by them.
Graphs
Bar Graph
It displays the number of cases in particular categories, or the score of a continuous variable for different categories. When comparing group means, the best type of graph to use is a bar chart.
Use the following steps to present a bar graph for session variable
1. Click Graphs → Chart Builder.
2. The first thing you will see is a pop-up box asking you to define the level of measurement for each variable (i.e. explain to SPSS whether it is Interval, Ordinal, or Nominal). To stop this pop-up from displaying again, tick the ‘Don’t show this dialogue box again’ box.
3. Click on OK to Continue
4. This will open up the Chart Builder dialogue box. The chart builder is an interactive drag
and drop dialog box where we can define exactly what type of graph we would like.
5. Now select from the gallery the image of bar graph we want and drag and drop it to the
chart preview window.
6. Then from the variables select coaching institute and drag and drop it in the place of x
axis. Similarly drag the other variable coaching session from the variables and drop in
place of y axis.
7. In the Element Properties window that appears next to the Chart Builder, Coaching Institute will appear as the axis label. In the Categories column of the same window, TIME and IMS will appear under Order.
8. Click OK.
The output window will show a bar graph.
Interpretation
The bar graph shows the mean number of coaching sessions conducted by each coaching institute. From the bar graph we can see that TIME and IMS both provide an average of approximately 6 sessions to their students.
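The quantity a mean bar chart displays is simply the average of the continuous variable within each category. A minimal sketch, on hypothetical session counts for the two institutes:

```python
import statistics

# Sketch of what a mean bar chart summarizes: average sessions per institute.
# The session counts below are hypothetical.
sessions_by_institute = {"TIME": [5, 6, 7, 6], "IMS": [6, 5, 6, 7]}
means = {inst: statistics.mean(vals) for inst, vals in sessions_by_institute.items()}
print(means)
```

Each bar's height corresponds to one of these group means; here both institutes average 6 sessions, matching the interpretation above.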
Histogram
A histogram displays the distribution of one continuous variable. It provides information about the shape of the distribution of scores.
Example
To represent the age of the students in form of histogram, we use following steps:
The histogram shows that there are multiple modes in the data, i.e. most of the students are of age 20 and 25.
Box Plot
A box plot is another useful visualization for viewing how the data are distributed. A box plot, also called a box-and-whisker plot, is a way to show the spread and center of a data set. A box plot is a summary plot of our data set, graphically depicting the median, quartiles and extreme values. The box represents the interquartile (IQ) range, which contains the middle 50% of records. The whiskers are lines that extend from the upper and lower edges of the box to the highest and lowest values that are no greater than 1.5 times the IQ range.
The following steps are performed to create a box plot for age in years of different genders.
The thick line in the middle is the median. The top and bottom lines of the box show the third and first quartiles. The whiskers show the maximum and minimum values, with the exception of outliers (circles) and extremes (asterisks). Outliers are at least 1.5 box lengths from the median and extremes are at least 3 box lengths from the median.
The lower line of the box plot shows the minimum age and the upper line shows the maximum age. The bottom of the box shows quartile 1, the darkened black line shows quartile 2, i.e. the median, and the top of the box shows quartile 3 of the age (for both female and male). It also shows that the median for females is much higher than that for males. Since the median is located below the center of the box and the upper tail is longer than the lower tail, the data are skewed right.
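The five summary numbers a box plot draws can be computed directly. A sketch on hypothetical ages, using Python's `statistics.quantiles` (its default "exclusive" method; SPSS may use a slightly different quartile formula, so values can differ marginally):

```python
import statistics

# Sketch of the numbers behind a box plot, on hypothetical ages.
ages = [19, 20, 20, 21, 21, 22, 23, 24, 25, 25]
q1, q2, q3 = statistics.quantiles(ages, n=4)   # quartiles (exclusive method)
iqr = q3 - q1                                  # box height; whiskers reach 1.5*iqr
print(q1, q2, q3, iqr)
```

Here the box would run from 20.0 to 24.25 with the median line at 21.5, and any age beyond 1.5 × IQR past the box edges would be flagged as an outlier.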
Descriptive
In SPSS, the Descriptives procedure computes a select set of basic descriptive statistics for one or more continuous numeric variables. Additionally, the Descriptives procedure can optionally compute Z scores and save them as new variables in our data set. The Descriptives procedure cannot compute medians or quartiles. It is best used when you want to compare the descriptive statistics of several numeric variables side by side. Standardizing variables means rescaling them so that they have a mean of zero and a standard deviation of one. This is done by subtracting the variable’s mean from each value and dividing the difference by the variable’s standard deviation. The resulting values are called z scores.
The Descriptives table represents the values of the mean, maximum, minimum, standard deviation, kurtosis (the sharpness of the peak of a frequency distribution curve) and skewness (the degree to which a statistical distribution is not balanced around the mean, i.e. is asymmetrical or lopsided; a perfectly symmetrical distribution has a value of zero).
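The z-score transformation described above is one line of arithmetic per value. A sketch on hypothetical scores (note: SPSS standardizes with the sample standard deviation; the population standard deviation is used here only to keep the numbers round):

```python
import statistics

# Sketch of the z-score transformation saved by Descriptives:
# subtract the mean, divide by the standard deviation.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(scores)    # 5.0
sd = statistics.pstdev(scores)    # 2.0 (population SD, for round numbers)
z = [(x - mean) / sd for x in scores]
print(z[0], z[-1])                # -1.5 and 2.0
```

The resulting list has mean 0 and standard deviation 1, so a z of 2.0 reads directly as "two standard deviations above the mean".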
1. Click Analyze → Descriptive Statistics → Descriptives.
2. A Descriptives dialogue box will appear with the variables on the left side. Select the variables (Coaching Session and Level of Satisfaction) and transfer them from the left to the right column. Then click “Options”.
3. A new Descriptives: Options dialogue box will open. Select mean, minimum, maximum, kurtosis, skewness and variable list. Then click Continue.
The result will appear in output window showing the descriptive statistics table.
Interpretation
Valid N (listwise): This is the number of cases with no missing values on any listed variable. The table shows that there are no missing values.
N: This is the number of valid observations for the variable. The total number of observations is the sum of N and the number of missing values. The number of valid observations is ten for both coaching sessions and level of satisfaction.
Minimum: This is the minimum or smallest value of the variable. The minimum number of coaching sessions taken is one and the minimum level of satisfaction is also one, which means least satisfied.
Maximum: This is the maximum or highest value of the variable. The maximum number of sessions is ten and the maximum satisfaction level is ten, which means highly satisfied.
Mean: This is the arithmetic mean across the observations. The mean is sensitive to extremely small and extremely large values. The mean number of coaching sessions taken is 6.67 (approx. 7) and the same is the value for level of satisfaction.
Standard Deviation: The standard deviation is the square root of the variance. It measures the spread of a set of observations: it is a measure of the average distance between the values of the data in the set and the mean. A low standard deviation indicates that most of the numbers are very close to the average. A high standard deviation indicates that the numbers are spread out. The standard deviation of the coaching sessions and level of satisfaction is 1.472.
Skewness: Skewness is asymmetry in a statistical distribution, in which the curve appears distorted either to the left or to the right. A normal distribution has mean = mode and zero skewness. When the distribution is skewed to the left (also called negative skewness), the mean is less than the mode. When the distribution is skewed to the right (also called positive skewness), the mean is greater than the mode. The table represents that the skewness of coaching sessions and level of satisfaction is zero, which means that the distribution is normal and there is a complete absence of skewness.
Kurtosis: Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. Data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or a lack of outliers. The kurtosis for both coaching sessions and level of satisfaction is negative, which indicates lighter tails and a flatter peak.
Explore
The Explore command produces an extensive set of statistics for a variable that describe the shape, center, spread and the presence of any outliers in a distribution. It also produces a box-and-whisker plot that provides these summary statistics in visual form. The Explore command also allows you to compare two or more groups in terms of the various statistics for a variable.
1. Click Analyze → Descriptive Statistics → Explore.
2. A dialogue box will appear showing various columns. Select the variable “Coaching Session” and transfer it to the Dependent List, and select Level of Satisfaction to transfer it to Label Cases by. Select Both in the Display column.
3. Then return to the main Explore dialogue box and click Plots.
4. A new Explore: Plots dialogue box will appear. Select Factor levels together among the box plots and Histogram from Descriptive. Then click Continue.
5. Then, from the first dialogue box, select Options. A new dialogue box will appear. From this new Options box select Exclude cases pairwise. Then click Continue.
6. Now click OK in the first dialogue box (Explore).
The result will appear in the output window representing case processing summary,
Descriptive tables, Percentile, Extreme values, Histogram and Boxplot representing
Coaching Sessions.
Module 3
Correlation
Objectives
Scatter/Dot Plot
Correlation
Split File
Scatter Plot for Split File
Correlation for Split Data
Scatter/Dot Plot
A scatter plot (or scatter graph) is a graphic tool used to display the relationship between two quantitative variables. A scatter plot consists of an X axis (the horizontal axis), a Y axis (the vertical axis) and a series of dots. Usually the first variable is independent and the second is dependent on the first variable. The scatter diagram is used to find the correlation between two variables.
Let us take an example of students’ data for counseling sessions that needs to be analyzed. The objective is to create a small data file for students that consists of six variables.
Interpretation
The scatter diagram shows that there is a high degree of positive relationship, as the dots move upward from left to right. This means that with a change in counseling sessions the satisfaction level of students will change.
Correlation
Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1. A value of +1 indicates a perfect degree of association between the two variables. The bivariate Pearson correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson correlation is a parametric measure.
-1.0 to -0.5: strong negative correlation
-0.5 to 0.0: weak negative correlation
0.0 to +0.5: weak positive correlation
+0.5 to +1.0: strong positive correlation
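The definition of r can be sketched directly from its formula: the covariance of the two variables divided by the product of their deviation norms. The data lists below are hypothetical, chosen to be perfectly linear so r comes out as +1:

```python
import math

# Sketch of the Pearson correlation coefficient r for two lists.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

sessions = [1, 2, 3, 4, 5]
satisfaction = [2, 4, 6, 8, 10]   # hypothetical, perfectly linear in sessions
print(pearson_r(sessions, satisfaction))
```

Because satisfaction is an exact linear function of sessions here, the result is a perfect positive correlation; real survey data would fall somewhere inside the scale above.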
Statistical Significance
Statistical significance is the likelihood that a relationship between two or more variables is
caused by something other than chance. Statistical hypothesis testing is used to determine
whether the result of a data set is statistically significant. This test provides a p-value,
representing the probability that random chance could explain the result; in general, a p-value of
5% or lower is considered statistically significant. Statistical significance is used to accept or reject the null hypothesis. When the p-value is greater than the chosen significance level (e.g., 0.05), the null hypothesis is accepted; when the p-value is less than the significance level, the null hypothesis is rejected.
We can conclude that there is statistically significant correlation between two variables.
The correlation between counseling sessions and level of satisfaction is -.769, which means that there is a high degree of correlation between the two variables.
The table also shows that the significance (2-tailed) level is 0.000, which is less than 0.05, so we can conclude that there is a statistically significant correlation between counseling sessions and level of satisfaction.
Split File
Split File splits the data file into separate groups for analysis based on the values of one or more
grouping variables. If you select multiple grouping variables, cases are grouped by each variable
within categories of the preceding variable on the Groups Based on list.
Compare Groups: Split-file groups are presented together for comparison purposes. For pivot tables, a single pivot table is created and each split-file variable can be moved between table dimensions. For charts, a separate chart is created for each split-file group and the charts are displayed together in the Viewer.
1. Click Data → Split File.
2. A Split File dialogue box will appear. From the options, select Compare groups. Then, from the variables on the left, transfer Gender and Name of Counselor to Groups Based on. Also select Sort the file by grouping variables.
3. Click OK.
Interpretation
The result shows that all females who prefer Dr. Dhawan as counselor appear together and all those who prefer Dr. Dabswagger appear together; similarly, the data for males are shown.
5. After this, select Gender and Name of Counselor (one by one) and drag and drop them in the place of Filter.
6. Click OK.
Interpretation
The above graph shows the scatter plot for the relation between counseling sessions taken by females from Dr. Dhawan and their level of satisfaction. Only one female joined Dr. Dhawan for counseling, so a correlation cannot be calculated.
Interpretation
The graph shows the relation between counseling sessions taken by females from Dr. Dabswagger and their level of satisfaction. It represents a high degree of positive correlation.
Interpretation
The graph shows the relation between counseling sessions taken by males from Dr. Dhawan and their level of satisfaction. It represents a high degree of positive correlation.
Interpretation
The graph shows the relation between counseling sessions taken by males from Dr. Dabswagger and their level of satisfaction. It represents a high degree of positive correlation.
3. Click OK.
The result will appear in the Output window representing a correlation table which shows the correlation between counseling sessions and level of satisfaction for females and males individually, with both counselors.
Interpretation
Only one female has taken counseling from Dr. Dhawan, so it is not possible to find the correlation in that case. The correlation table shows that the correlation between sessions (with Dr. Dabswagger) and level of satisfaction for females is 0.967, i.e. a high degree of positive correlation.
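The essence of a split-file analysis is computing the same statistic separately per group instead of over the whole file. A minimal sketch on hypothetical (gender, satisfaction) records, with 1 = female and 2 = male:

```python
import statistics

# Sketch of split-file analysis: one statistic per group, not one overall.
# Hypothetical (gender, satisfaction) records; 1=female, 2=male.
records = [(1, 3), (1, 5), (1, 8), (2, 2), (2, 3), (2, 7)]

groups = {}
for gender, satisfaction in records:
    groups.setdefault(gender, []).append(satisfaction)

for gender in sorted(groups):
    print(gender, statistics.mean(groups[gender]))
```

Replacing `statistics.mean` with a correlation function would give the per-gender, per-counselor correlations described above; a group with only one case (like the single female with Dr. Dhawan) would have nothing to correlate, which is exactly why SPSS reports no coefficient there.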
Module 4
Cross Tabulation and Chi Square Statistics
Objectives
Cross Tabs
Chi Square
CROSS TABS
To describe a single categorical variable, we use frequency tables. To describe the relationship between two categorical variables, we use a special type of table called a cross tabulation. This type of table is also known as a crosstab or two-way table.
In a crosstab, the categories of one variable determine the rows of the table, and the categories of the other variable determine the columns. The cells of the table contain the number of times that a particular combination of categories occurred.
Crosstabs is an SPSS procedure that cross-tabulates two variables, thus displaying their relationship in tabular form. In contrast to Frequencies, which summarizes information about one variable, Crosstabs generates information about bivariate relationships.
Crosstabs creates a table that contains a cell for every combination of categories in the two variables. Inside each cell is the number of cases that fit that particular combination of responses.
SPSS can also report the row, column and total percentages for each cell of the table.
Because Crosstabs creates a row for each value in one variable and a column for each value in the other, the procedure is not suitable for continuous variables that assume many values. Crosstabs is designed for discrete variables, usually those measured on nominal or ordinal scales.
DATA REQUIREMENTS
Data must meet the following requirements:
Two categorical variables
Two or more categories for each variable
DATA SET UP
The categorical variables can be numeric or string variables in the SPSS dataset and their measurement level can be defined as nominal, ordinal or scale. However, crosstabs should only be used when there is a limited number of categories.
In most cases the row and column variables in a crosstab can be used interchangeably. The choice of
row/column variable is usually dictated by space requirements or interpretation of the results.
Let us take an example of students’ data and their future aspirations for an MBA. The objective is to create a small data file that consists of four variables.
2 Gender Numeric
3 Age Numeric
1. Click Analyze → Descriptive Statistics → Crosstabs.
2. A new Crosstabs dialogue box will open. Transfer the variable Gender to Row(s) and Coaching Institute to Column(s) on the right.
3. Then click Cells. A new Crosstabs: Cell Display dialogue box will appear. From Counts select Observed and from Percentages select Row. From Noninteger Weights select Round cell counts. All these selections are already done by default. Then click Continue. From the previous box select OK.
Interpretation
The crosstab shows the distribution of females and males across the Indian and foreign universities. The table shows that 3 females prefer an Indian university and 2 females prefer a foreign university, while 2 males prefer an Indian university and 3 males prefer a foreign university. The distribution also includes percentages.
It shows that the preferences of females and males were not distributed evenly across universities. Most females (60%) prefer an Indian university and most males (60%) prefer a foreign university.
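The row percentages quoted here follow from dividing each cell by its row total. A sketch using counts consistent with the 60% figures in the interpretation (each gender has 5 cases, split 3/2):

```python
# Row percentages of a crosstab: each cell divided by its row total.
# Counts consistent with the 60% figures quoted in the interpretation.
females = {"Indian": 3, "Foreign": 2}
males = {"Indian": 2, "Foreign": 3}

def row_percent(row):
    total = sum(row.values())
    return {k: 100 * v / total for k, v in row.items()}

print(row_percent(females))
print(row_percent(males))
```

Row percentages answer "of the females, what share chose each university?", which is the comparison the interpretation makes.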
CHI-SQUARE TEST
A chi-square test, also written as the χ² test, is a test of significance. It was developed by Karl Pearson in 1900. The chi-square test determines whether there is an association between categorical variables (i.e. whether the variables are independent or related). It is a non-parametric test.
The test uses a contingency table to analyze the data. A contingency table (also known as cross tabulation,
crosstab or two way table) is an arrangement in which data is classified according to two categorical
variables. The categories for one variable appear in the rows, and the categories for the other variable
appear in the columns. Each variable must have two or more categories. Each cell reflects the total count of
cases for a specific pair of categories.
Chi-Square Formula
χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count in each cell of the contingency table.
Uses
Data Requirements
Hypotheses
The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.
Null Hypothesis
The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts.
The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of independence
can be expressed in two different but equivalent ways:
Use the following steps to perform a chi-square test:
1. Click Analyze → Descriptive Statistics → Crosstabs.
2. A new Crosstabs dialogue box will open. Transfer the variable Gender to Row(s) and Coaching Institute to Column(s) on the right.
3. Click the Cells which will display a new dialogue box.
4. Select Observed and Expected from the Counts area and Row from the Percentages area. Also select Round cell counts from Noninteger Weights.
5. Click Continue.
6. Now, from the Crosstabs dialogue box, click Statistics.
7. A new ‘Crosstabs: Statistics’ dialogue box will appear.
8. Select Chi-square and then click Continue.
The footnote (a) reports an important assumption for chi-square, which is satisfied here. The chi-square value is 7.38 with a significance probability of 0.007. This means that, according to the test, the probability of this distribution of values occurring by chance is less than 0.01, or 1 in 100. So the probability (p) = 0.007 is < 0.05. It means that the null hypothesis that the variables are independent is rejected. This implies that the preference for a university depends on gender: females are more likely to prefer an Indian university and males a foreign university. There is a relationship between gender and type of university.
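The cell-by-cell formula Σ (O − E)² / E can be sketched directly in Python. The 2×2 counts below are hypothetical (larger than the small example above, so the statistic is clearly significant); with df = 1, the critical value at α = .05 is 3.841:

```python
# Sketch of the chi-square statistic for a 2x2 table: sum over cells of
# (observed - expected)^2 / expected, with expected = row total * col total / grand total.
observed = [[30, 10],   # hypothetical counts, row 1
            [15, 25]]   # hypothetical counts, row 2

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi_sq += (obs - expected) ** 2 / expected

print(round(chi_sq, 3))   # compare against 3.841 (df = 1, alpha = .05)
```

Since the statistic here far exceeds 3.841, the null hypothesis of independence would be rejected for these hypothetical counts, the same decision logic the interpretation applies to its SPSS output.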
Module 5
Comparing Means Using One Way Anova
Objectives
One Way ANOVA
ANOVA(Analysis of Variance)
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups. ANOVA must have a dependent variable, which should be metric (measured using an interval or ratio scale). ANOVA must also have one or more independent variables, which should be categorical in nature. In ANOVA, categorical independent variables are called factors. A particular combination of factor levels, or categories, is called a treatment. The one-way ANOVA test will be able to inform us whether there is a significant difference between the three groups. However, it cannot directly state which group(s) differ from each other. So, if a one-way ANOVA test indicates a significant result, further post-hoc testing is required to investigate specifically which groups are significantly different.
Let us take an example of employees’ data regarding the type of course attended and the time taken to complete the course, which needs to be analyzed.
Hypotheses
1. Click Analyze → Compare Means → One-Way ANOVA.
2. A new One-Way ANOVA dialogue box will open. Transfer the variable Time to the Dependent List and Type of Course to Factor on the right.
3. Then click Post Hoc. A new One-Way ANOVA: Post Hoc Multiple Comparisons dialogue box will appear.
4. Select Tukey and click Continue.
5. Select Options from One-Way ANOVA. A new One-Way ANOVA: Options tab will open.
6. Select Descriptive, Homogeneity of variance test and Means plot, and click Continue.
7. From the previous box select OK.
Interpretation
Descriptive
In the Descriptives table, N in the first column refers to the number of cases used for calculating the descriptive statistics. These numbers being equal to our sample sizes tells us that there are no missing values. The means are the core of our output; our question is whether they differ for different types of courses. On average, 25.67 minutes were used by Beginners, the Intermediate course results in an average of 26.33 minutes, and the Advanced course has a mean time of 26.50 minutes.
The F value for Levene’s test is 0.096 with a Sig. (p) value of .909. Because the Sig. value is greater
than our alpha of .05 (p > .05), we retain the null hypothesis (no difference) for the assumption
of homogeneity of variance and conclude that there is not a significant difference between the three
groups’ variances. That is, the assumption of homogeneity of variance is met.
The p value (denoted by “Sig.”) is .000. The null hypothesis is accepted if p > .05 and
rejected if p < .05. Here p < .05, so we conclude that the mean time of the three
groups of courses is unequal.
From the results so far, we know that there are statistically significant differences between the
groups as a whole. The Multiple Comparisons table shows which groups
differed from each other. The Tukey post hoc test is generally the preferred test for conducting
post hoc tests on a one-way ANOVA. The post hoc table indicates that the mean of the advanced
course is significantly (p < .05; Sig. values 0.03 and 0.01) different from the means of the other courses.
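The same sequence of checks (Levene's test, the ANOVA itself, then Tukey's post hoc test) can be reproduced outside SPSS. Below is a minimal sketch in Python using scipy; the completion times are made-up illustrative values, not the module's dataset, and tukey_hsd requires a recent SciPy release.

```python
# Hypothetical completion times (minutes) for three course types.
# These numbers are illustrative only, not the module's data.
from scipy import stats

beginner     = [24, 26, 25, 27, 25, 27]
intermediate = [26, 27, 25, 27, 26, 27]
advanced     = [29, 30, 28, 31, 30, 30]

# Levene's test checks the homogeneity-of-variance assumption.
lev_stat, lev_p = stats.levene(beginner, intermediate, advanced)

# One-way ANOVA tests whether at least one group mean differs.
f_stat, p_value = stats.f_oneway(beginner, intermediate, advanced)

# Tukey's HSD post hoc test shows which pairs of groups differ.
tukey = stats.tukey_hsd(beginner, intermediate, advanced)

print(f"Levene p = {lev_p:.3f}, ANOVA F = {f_stat:.2f}, p = {p_value:.4f}")
print(tukey.pvalue)  # matrix of pairwise p-values
```

As in the SPSS output, a small ANOVA p-value only says that some group differs; the pairwise p-value matrix from Tukey's test identifies which ones.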
Homogeneous Subsets
The Means plot is a visual representation of what we saw in the Compare Means output. The
points on the chart are the averages of each group. It is much easier to see from this graph that the
employees who joined the advanced course have the lowest mean time taken to complete the task,
while employees in the intermediate course took the highest mean time.
Module 6
ONE SAMPLE T TEST
Objectives
The One Sample t Test determines whether the sample mean is statistically different from a
known or hypothesized population mean. The One Sample t Test is a parametric test. This test is
also known as Single Sample t Test.
In a One Sample t Test, the test variable is compared against a "test value", which is a known or
hypothesized value of the mean in the population.
DATA SETUP
Our data should include one continuous, numeric variable (represented in a column) that will be
used in the analysis. The variable's measurement level should be defined as Scale in the Variable
View window.
DATA REQUIREMENTS
Before performing the test, it is important to check that your data satisfies the assumptions of the
one-sample t-test. These are:
1. The test variable should be continuous (interval or ratio).
2. The observations should be independent of each other.
3. There should be no significant outliers.
4. The test variable should be approximately normally distributed.
Example
College-aged adults need at least 7 hours of sleep each night to stay healthy. Sleep deprivation
can lead to decreased immune system function, lack of concentration, and poor memory. In the
data set, a simple random sample of college students reports the number of hours of sleep they
had. A researcher wants to test whether the average sleep hours of a group of
20 college students differ from the 7-hour average. He wants to know whether his sample is
representative of the normal population.
HYPOTHESES
The null hypothesis (H0) and (two-tailed) alternative hypothesis (H1) of the one sample T test can
be expressed as:
H0: µ = x ("the sample mean is equal to the [proposed] population mean")
Or
H0: The average sleep of students is equal to 7 hours.
H1: µ ≠ x ("the sample mean is not equal to the [proposed] population mean")
Or
H1: The average sleep of students is not equal to 7 hours.
where
µ = the constant proposed for the population mean (7 hrs)
and x = the sample mean.
Test Variable(s): The variable whose mean will be compared to the hypothesized population
mean (i.e., Sleep Hours).
Test Value: The hypothesized population mean against which your test variable(s) will be
compared.(i.e 7 )
3. Click Options; a new dialogue box, One Sample T Test: Options, will open. Select 95%
confidence Interval percentage and, under the missing values heading, select Exclude cases
analysis by analysis. (These are selected by default.)
4. Click Continue.
Interpretation
SPSS Statistics generates two main tables of output for the one-sample t-test that contain all the
information we need to interpret the results of a one-sample t-test.
Descriptive Statistics
We can make an initial interpretation of the data using the One-Sample Statistics table, which
presents relevant descriptive statistics:
The mean sleep hours of students (6.60) was lower than the population ‘normal’ sleep hours
of 7.0.
One Sample T Test
The One-Sample Test table reports the result of the one-sample t-test. The top row provides the
value of the known or hypothesized population mean you are comparing your sample data to.
Moving from left-to-right, you are presented with the observed t-value ("t" column), the degrees
of freedom ("df"), and the statistical significance (p-value) ("Sig. (2-tailed)") of the one-sample
t-test. Since p > .05 (p = .119), we accept the null hypothesis that the sample mean is equal to
the hypothesized population mean and conclude that the mean sleep hours of the sample is not
significantly different from the average sleep hours of the population.
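For reference, the same test can be run in Python with scipy.stats.ttest_1samp. The sleep values below are illustrative, chosen so the sample mean works out to 6.60 like the module's data, though they are not the module's actual records:

```python
# Hypothetical sleep hours for 20 students (illustrative values only);
# the test value is the 7-hour benchmark from the example.
from scipy import stats

sleep_hours = [6.5, 7.0, 6.0, 7.5, 6.0, 6.5, 7.0, 5.5, 6.5, 7.0,
               6.0, 7.5, 6.5, 6.0, 7.0, 6.5, 7.5, 6.0, 6.5, 7.0]

# One-sample t test: compare the sample mean against popmean = 7.0.
t_stat, p_value = stats.ttest_1samp(sleep_hours, popmean=7.0)

mean = sum(sleep_hours) / len(sleep_hours)
print(f"mean = {mean:.2f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

With these particular made-up values the result happens to come out significant, unlike the module's p = .119; the point here is the procedure (a single sample tested against a fixed value), not the numbers.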
Module 7
Independent Sample T Test
Objectives
The Independent Samples t Test compares the means of two independent groups in order to
determine whether there is statistical evidence that the associated population means are
significantly different. The Independent Samples t Test is a parametric test. This test is also
known as:
Independent t Test
Independent Measures t Test
Independent Two-sample t Test
Student t Test
Two-Sample t Test
Uncorrelated Scores t Test
Unpaired t Test
Unrelated t Test
DATA SET UP
Data should include two variables (represented in columns) that will be used in the analysis.
The independent variable should be categorical and include exactly two groups. (Note that
SPSS restricts categorical indicators to numeric or short string values only.)
SPSS can only make use of cases that have no missing values for the independent and the
dependent variables, so if a case has a missing value for either variable, it cannot be
included in the test.
DATA REQUIREMENTS
Conclusions from an independent samples t test can be trusted if the following assumptions are
met:
1. The dependent variable should be measured on a continuous scale (either interval or
ratio).
2. The independent variable should consist of two independent (non-related) groups.
3. There are no outliers present in the dependent variable.
4. The dependent variable should be normally distributed for each group.
5. The two groups should have homogeneity of variances. In other words, their
standard deviations need to be approximately the same. This can be investigated with
Levene’s Test for Equality of Variances.
Example
We want to understand whether graduates’ salary differs based on gender (i.e. the dependent variable
would be "graduate salary" and the independent variable would be "gender", which has two groups: "male"
and "female").
2. Salary Scale
HYPOTHESES
H0: µ1 = µ2 ("the two population means are equal")
H1: µ1 ≠ µ2 ("the two population means are not equal")
where µ1 and µ2 are the population means for group 1 and group 2, respectively.
1. Click Analyze → Compare Means → Independent Samples T Test.
2. A new dialogue box Independent Sample T test will appear. Add Salary of Graduates in Test
Variable(s) and Gender in Grouping Variable
3. Now click on Define Groups. This will open up a small dialogue box. Here, under Use
specified values, add 1 to Group 1 and 2 to Group 2. This is to tell SPSS that you want to
compare males (Group 1) against females (Group 2). This will look like this:
The output in the Independent Samples Test table includes two rows: Equal variances
assumed and Equal variances not assumed. If Levene’s test indicates that the variances are equal across
the two groups (i.e., p-value large), we will rely on the first row of output, Equal variances assumed,
when we look at the results for the actual Independent Samples t Test (under t-test for Equality of Means).
If Levene’s test indicates that the variances are not equal across the two groups (i.e., p-value small), we
will need to rely on the second row of output, Equal variances not assumed.
This section has the test results for Levene's Test. From left to right:
The p-value of Levene's test is .230 (p > 0.05), so we accept the null hypothesis of Levene's test and
conclude that the variance in salary of males is not significantly different from that of females. This tells
us that we should use the "Equal variances assumed" row for the test. (If this test result
had been significant -- that is, if we had observed p < 0.05 -- then we would have used the "Equal
variances not assumed" output.)
This section provides the results for the actual Independent Samples t Test. From left to right:
Since p < 0.05, we reject the null hypothesis and conclude that there is a
significant difference between the mean salary for males and females.
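The Levene-then-t-test decision described above can be sketched in Python with scipy. The salaries below are invented for illustration (not the module's data); the equal_var argument mirrors SPSS's choice between the "Equal variances assumed" and "Equal variances not assumed" rows.

```python
# Hypothetical graduate salaries (in thousands) by gender; illustrative only.
from scipy import stats

male   = [52, 55, 58, 54, 60, 57, 53, 56, 59, 55]
female = [48, 50, 47, 52, 49, 51, 46, 50, 48, 49]

# Levene's test decides which t-test variant to trust.
lev_stat, lev_p = stats.levene(male, female)

# equal_var=True corresponds to SPSS's "Equal variances assumed" row;
# equal_var=False is Welch's t test ("Equal variances not assumed").
t_stat, p_value = stats.ttest_ind(male, female, equal_var=(lev_p > 0.05))

print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

Tying equal_var to the Levene result reproduces the two-row logic of the SPSS Independent Samples Test table in a single call.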
Module 8
Dependent/Paired T Test
Objectives
Dependent/Paired T Test
The Paired Samples t Test compares two means that are from the same object, or related units.
The two means typically represent two different times (e.g., pre-test and post-test with an
intervention between the two time points) or two different but related conditions or units (e.g.,
left and right ears, twins). The purpose of the test is to determine whether there is statistical
evidence that the mean difference between paired observations on a particular outcome is
significantly different from zero. The Paired Samples t Test is a parametric test. This test is also
known as:
Dependent t Test
Paired t Test
Repeated Measures t Test
The data should include one dependent variable, or test variable (continuous), measured at two
different times or for two related conditions or units.
COMMON USES
The Paired Samples t Test is commonly used to test the following:
The Paired Samples t Test can only compare the means for two related (paired) units on a
continuous outcome that is normally distributed. The Paired Samples t Test is not appropriate for
analyses involving the following:
1) Unpaired data
4) An ordinal/ranked outcome.
Example
A new fitness program is devised for ten obese people. Each participant's weight was measured
before and after the program to see if the fitness program is effective in reducing their weights. In
this example, our null hypothesis is that the program is not effective, i.e., there is no difference
between the weight measured before and after the program. The alternative hypothesis is that the
program is effective and the weight measured after is less than the weight measured before the
program. In the data view, the first column is the weight measured before the program and the
second column is the weight after the fitness program.
2. After Scale
HYPOTHESES
The paired sample t-test has two competing hypotheses, the null hypothesis and the alternative
hypothesis. The null hypothesis assumes that the true mean difference between the paired
samples is zero. Conversely, the alternative hypothesis assumes that the true mean difference
between the paired samples is not equal to zero. The paired sample t-test hypotheses are formally
defined below:
The null hypothesis would be:
H0: µ1 = µ2 ("the paired population means are equal") or “There is no significant difference
between the weight before and after the fitness program.”
H1: µ1 ≠ µ2 ("the paired population means are not equal") or “There is significant difference
between the weight before and after the fitness program.”
DATA REQUIREMENTS
Variable1: The first variable, representing the first group of matched values. Move the variable
that represents the first group to the right where it will be listed beneath the “Variable1” column.
Variable2: The second variable, representing the second group of matched values. Move the
variable that represents the second group to the right where it will be listed beneath
the “Variable2” column.
The calculated value 17.418 is greater than the table value 2.262; hence, the difference is significant.
The 95% confidence interval does not include zero, as the upper and lower values are both negative; thus,
the difference is significant.
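The paired comparison can be sketched in Python with scipy.stats.ttest_rel. The before/after weights below are invented for illustration, not the study's measurements:

```python
# Hypothetical before/after weights (kg) for ten participants; illustrative only.
from scipy import stats

before = [95, 102, 98, 110, 105, 99, 101, 97, 108, 100]
after  = [91,  98, 95, 104, 100, 96,  97, 93, 103,  96]

# Paired (dependent) samples t test: tests whether the mean of the
# per-participant differences is zero.
t_stat, p_value = stats.ttest_rel(before, after)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Because every participant contributes a before/after pair, the test works on the differences within pairs rather than treating the two columns as independent groups.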
Module 9
Mann Whitney U Test
Objectives
The Mann-Whitney U test is used to compare differences between two independent groups when the
dependent variable is either ordinal or continuous, but not normally distributed. It is the non-parametric
equivalent of the independent samples t-test: it does not assume any particular distribution of the
dependent variable, and it compares the median scores of two samples rather than their mean scores.
This makes it much more robust against outliers and heavy-tailed distributions, and the appropriate
analysis to use when comparing groups whose dependent variable is not normally distributed and at
least of ordinal scale.
Example
We will compare the marks given by Professor A and Professor B. The sample contains 40 marks
in total. The data is as follows.
Data View:
Variable View
HYPOTHESES
H0 “There is no significant difference between the marks given by Professor A and Professor
B.”
H1:“There is significant difference between the marks given by Professor A and Professor B .”
1. Click Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples.
2. From “2 Independent Samples Test” dialogue box enter Marks in Test Variable list
then enter Professor Name in Grouping Variable, in Test Type select Mann Whitney
U.
3. Next click Define Groups, then type 1 for Group 1 and type 2 for Group 2. Then click
Continue.
The Test Statistics table shows that the Sig. value (two-tailed) is 0.968. This value is more than
0.05 (0.968 > 0.05), so we can conclude that there is no significant difference between the marks
given by Professor A and Professor B. Hence our null hypothesis is accepted.
Module 10
WILCOXON MATCHED PAIR RANK SUM TEST
Objectives
Wilcoxon Matched Pair Rank Sum Test
The Wilcoxon signed-rank test is a non-parametric test that looks for differences between two dependent
samples. That is, it tests whether the populations from which two related samples are drawn have the
same location. It is the non-parametric equivalent of the dependent (or matched-pairs) t-test. As the
Wilcoxon signed-rank test does not assume normality in the data, it can be used when this assumption has
been violated and the use of the dependent t-test is inappropriate.
WHEN TO USE
ASSUMPTIONS
HYPOTHESES
H0 “There is no significant difference between the Pain Score before and after the
acupuncture.”
H1:“There is significant difference between the Pain Score before and after the
acupuncture.”
Example
A researcher is interested in finding methods to reduce lower back pain in individuals without
having to use drugs. The researcher thinks that having acupuncture in the lower back might
reduce back pain. To investigate this, the researcher recruits 25 participants to their study. At the
beginning of the study, the researcher asks the participants to rate their back pain on a scale of 1
to 10, with 10 indicating the greatest level of pain. After 4 weeks of twice weekly acupuncture,
the participants are asked again to indicate their level of back pain on a scale of 1 to 10, with 10
indicating the greatest level of pain. The researcher wishes to understand whether the
participants' pain levels changed after they had undergone the acupuncture, so a Wilcoxon
signed-rank test is run.
Variable View:
Data View:
4. Click on Options and then select Descriptive and Quartiles for the
variables. Under missing values, select Exclude cases test by test (selected by
default).
SPSS Statistics generates a number of tables in the Output Viewer under the title NPar Tests.
Descriptive Table
The Descriptive Statistics table is where SPSS Statistics generates descriptive and quartile
statistics for our variables if we selected these options. If we did not select these options, this table
will not appear in our results. We can use the results from this table to describe the Pain
Scores before and after the acupuncture treatment.
Ranks Table
The Ranks table provides data on the comparison of participants' Before (Pre) and After (Post)
Pain Score. We can see from the table that 24 participants had a higher pre-acupuncture
treatment Pain Score than after their treatment. However, 1 participant had a higher Pain Score
after treatment and there is no participant who saw no change in their Pain Score.
By examining the final Test Statistics table, we can discover whether these changes, due to
acupuncture treatment, led overall to a statistically significant difference in Pain Scores. We are
looking for the "Asymp. Sig. (2-tailed)" value, which in this case is 0.000. This is the p-value
for the test.
Based on the results above, we could report the results of the study as follows:
At the 0.05 level of significance (because p < 0.05), there is enough evidence to conclude that
there is a change in pain level after the acupuncture. Hence we reject the null hypothesis
and accept the alternative hypothesis.
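The Wilcoxon signed-rank test can be sketched with scipy.stats.wilcoxon. The pain scores below are invented, but they are constructed to mirror the module's Ranks table: 24 participants improve and 1 gets worse, with no ties:

```python
# Hypothetical pain scores (1-10) before and after acupuncture for
# 25 participants; illustrative values matching the module's rank pattern.
from scipy import stats

pre  = [8, 7, 9, 6, 8, 7, 9, 8, 6, 7, 8, 9, 7, 8, 6, 9, 7, 8, 7, 9, 6, 8, 7, 9, 5]
post = [5, 4, 6, 4, 5, 5, 6, 5, 4, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 7, 4, 6, 5, 7, 6]

# Wilcoxon signed-rank test: non-parametric paired comparison that
# ranks the within-pair differences instead of assuming normality.
w_stat, p_value = stats.wilcoxon(pre, post)

print(f"W = {w_stat}, p = {p_value:.4f}")
```

With 24 decreases against a single small increase, the signed ranks pile up heavily on one side, which is why the resulting p-value is far below 0.05, just as in the module's output.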