Practical
SPSS WINDOWS
Data Editor Window: It displays the contents of the data file. This is the window that
opens automatically when you start an SPSS session. In this window, you can create new data files
or modify existing ones. When you open more than one data file, each data file has a separate Data Editor
Window.
The Data Editor Window provides two views of the data, or two worksheets
(you can switch between them by clicking the tabs at the bottom of the SPSS window):
The Data View, which contains the data.
The Variable View, which contains information about each variable.
Variable View
Some columns must be filled in and some are optional. The ones that must be filled in are:
Name, Type, Label, Values, Measure
Defining Variables in SPSS
1. Name:
A short description of the variable (with no spaces or punctuation!), or what you want to call your variable.
The only restriction is that special characters are not allowed in the name.
2. Label:
A fuller description of the variable. This is what will appear in tables and graphs to describe the
variable.
For example, if your variable NAME is “Type” then you can use the label “Type of Pet.” This can help you
a year down the road when you have forgotten what the name “Type” means!
3. Type: What type of data you want to enter. Click the square to the right of the box to open a dialog box with the
options. Most of these types are self-explanatory (like dollar, date, or scientific notation), but there are a
couple that aren’t.
String variable: Use when you want to type letters. For example, people’s names, breeds of dog,
occupations.
Comma: Numeric values whose digits are grouped in threes by commas, with a dot as the decimal mark. For example,
100,000.00
Dot: Similar to comma, but a dot is used to separate each group of three digits and a comma is used to
indicate the decimal. For example, 100.000,00 and 999.988.565,21. Not used in the UK or USA, but
common in some other countries.
You are most likely to use either string variables, or numeric variables.
5. Decimals: The number of digits shown after the decimal point. For example, when entering percentage values, this
setting determines how many decimal places are displayed.
6. Values: You have to tell SPSS what each number represents (for example, does a 1 represent females, or males?)
7. Missing: This lets you flag values that should be treated as missing, so that they are left out of the analysis.
Measure: You must tell SPSS whether each variable is nominal, ordinal, or scale (i.e. quantitative).
The variables have to be defined in the sheet named “Variable View”. It allows us to customize the data type.
To analyze the data, one needs to fill in the different columns, such as Name, Label, Type, Width, Decimals,
Values, Missing and Measure.
These headings are the different attributes which help to characterize the data accordingly.
When you plan to perform a complicated operation, like a 'Z test', you must define variables in SPSS before you
perform the test. In other words, you need to tell SPSS what your variables represent.
For example, you may have a list of boys and girls in a classroom study, with a “1” representing a boy in your
data list and a “2” representing a girl. SPSS won’t know what a “1” or a “2” means until you tell it by defining
the variables.
Sample Problem:
Define the variables for gender in the following data sheet, where “1” represents a boy and “2” represents a girl:
Step 2) Click a variable in the left window that you want to define. In this sample problem, we want to define
the “gender” variable, so click “Gender” and then click the center arrow.
Step 3) Click “Continue.” The main Define Variable Properties window will open.
Step 4) Type your value labels in the “Label” section of the Define Variable Properties window. For this sample
problem, type “boy” to the right of the value 1.00 and “girl” to the right of the value 2.00.
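The idea behind value labels can be sketched outside SPSS as a simple lookup table. This is only an illustration in Python, not SPSS syntax; the names GENDER_LABELS and label_values and the data are made up for the example.

```python
# Hypothetical value labels for the "gender" variable: 1 = boy, 2 = girl.
GENDER_LABELS = {1: "boy", 2: "girl"}


def label_values(codes, labels):
    """Replace each numeric code with its label; unknown codes are kept as-is."""
    return [labels.get(code, code) for code in codes]


gender_codes = [1, 2, 2, 1]  # made-up data for illustration
print(label_values(gender_codes, GENDER_LABELS))
```

A code with no label (for example a 3) is passed through unchanged, which is roughly what happens in SPSS output when a value has no label defined.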
Data View
The Data View is structured as rows and columns. Data can be brought into SPSS either by importing a file or by
entering it manually.
There are three types of SPSS files that we will use during this class: data files, which end in .sav; syntax
files, which end in .sps; and output files, which end in .spv.
When you open the SPSS program, you will see a blank spreadsheet in Data View.
If you already have another dataset open but want to create a new one, click File > New > Data to open a blank
spreadsheet.
You will notice that each of the columns is labeled “var.” The column names will represent the variables that
you enter in your dataset. You will also notice that each row is labeled with a number (“1,” “2,” and so on).
The rows will represent cases that will be a part of your dataset. When you enter values for your data in the
spreadsheet cells, each value will correspond to a specific variable (column) and a specific case (row).
1. Click the Variable View tab. Type the name for your first variable under the Name column. You can also
enter other information about the variable, such as the type (the default is “numeric”), width, decimals,
label, etc.
Example: I will type “School_Class” since I plan to include a variable for the class level of each student
(i.e., 1 = first year, 2 = second year, 3 = third year, and 4 = fourth year). I will also specify 0 decimals since
my variable values will only include whole numbers. (The default is two decimals.)
2. Click the Data View tab. Any variable names that you entered in Variable View will now be included in the
columns (one variable name per column).
Example: You can see that School_Class appears in the first column in this example.
4. Repeat these steps for each variable that you will include in your dataset. Don't forget to periodically save
your progress as you enter data.
INSERTING A CASE
1. In Data View, click a row number or individual cell below where you want your new row to be inserted.
DELETING A CASE
1. In the Data View tab, click the case number (row) that you wish to delete. This will highlight the row
for the case you selected.
2. Press Delete on your keyboard, or right-click on the case number and select “Clear”. This will
remove the entire row from the dataset.
Inserting or Deleting Single Variables
Sometimes you may need to add new variables or delete existing variables from your dataset. For example,
perhaps you are in the process of creating a new dataset and you must add many new variables to your growing
dataset. Alternatively, perhaps you decide that some variables are not very useful to your study and you decide
to delete them from the dataset.
1. In the Data View window, click the name of the column to the right of where you want your new variable
to be inserted.
2. You can now insert a variable in several ways:
A new, blank column will appear to the left of the column or cell you selected.
3. New variables will be given a generic name (e.g. VAR00001). You can enter a new name for the variable
on the Variable View tab.
DELETING A VARIABLE
1. In the Data View tab, click the column name (variable) that you wish to delete. This will highlight the
variable column.
2. Press Delete on your keyboard, or right-click on the selected variable and click “Clear.” The variable
and associated values will be removed.
Alternatively, you can delete a variable through the Variable View window:
1. Click on the row number corresponding to the variable you wish to delete. This will highlight the row.
2. Press Delete on your keyboard, or right-click on the row number corresponding to the variable you
wish to delete and click "Clear".
If you already have data that are in an SPSS file format (file extension “.sav”), you can simply open that file to
begin working with your data in SPSS. However, if you have data stored in other types of files, such as an Excel
spreadsheet or a text file, you will need to instruct SPSS how to read the file and then save it in the SPSS file
format (“.sav”).
1. Excel to SPSS
Both Excel and SPSS have a similar feel, with pull-down menus, a host of built-in statistical functions and a
spreadsheet format for easy data entry. However, SPSS is specifically built for statistics and surpasses Excel in
many ways, including:
1. Faster and easier access to basic functions like descriptive statistics (e.g. mean, standard deviation or median).
While Excel does have built-in functions, SPSS has these basic statistics in pull-down menus.
2. Wider variety of graphs and charts in SPSS. Excel does have a wide range of basic charts, but if you want
to create more complex output such as contingency tables, this is much easier in SPSS with the pull-down menus.
3. Easier to find statistical tests. While Excel does have a wide range of statistical tests built in, the
pull-down menus in SPSS make for faster access.
Importing Data from an Excel File
To import data from an Excel spreadsheet into SPSS, first make sure your Excel spreadsheet is formatted
according to these criteria:
The spreadsheet should have a single row of variable names across the top of the spreadsheet in the
first row.
Variable names should include only ordinary letters, numbers, and underscores
(e.g., Gender, Grad_Date, Test_1) and should not include spaces or special characters (e.g., "Graduation Date"
would not be a valid variable name because it contains a space).
The data should begin in the first column, second row (beneath the variable names row) of the
spreadsheet.
Anything that is not part of the data itself (e.g., extra text, labels, graphs, Pivot Tables) should be
removed.
Missing values for string or numeric variables should be left as blank (empty) cells.
Once the data in your Excel file is formatted properly it can be imported into SPSS by following these
steps:
1) Click File > Open > Data. The Open Data window will appear.
2) In the Files of type list select Excel (*.xls, *.xlsx, *.xlsm) to specify that your data are in an Excel file. If
you do not specify the type of file that you wish to open, your file will not appear in the list of available
files. Locate and click on your file. The file name will appear in the File name field. Click Open.
3) If you are using SPSS version 24 or earlier, you will instead see the Opening Excel Data Source window:
If your variable names are in the first row of data, select the Read variable names from the first row of
data check box.
The Maximum width for string columns option determines how wide a string variable should be; it is suggested
to keep the default value unless you have a reason for altering it.
Click on OK.
Summary:
File > Open > Data
After selecting the Excel file to be imported for the analysis, make sure that the “Read variable names from the
first row of data” check box is selected in the dialog box.
Finally, click OK. Your file has now been imported into SPSS.
Though Excel offers a good way of organizing data, SPSS is more suitable for in-depth data analysis.
ID,Age,Gender
A001,41,F
A009,36,M
C321,27,F
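To make the structure of the comma-delimited sample above concrete, here is a minimal sketch of how such text can be read programmatically. It uses Python's standard csv module rather than SPSS; the function name read_delimited is invented for the example, and treating Age as a whole number is an assumption.

```python
import csv
import io

# The comma-delimited sample: first row holds the variable names.
SAMPLE = """ID,Age,Gender
A001,41,F
A009,36,M
C321,27,F
"""


def read_delimited(text):
    """Return one dictionary per case, keyed by variable name."""
    reader = csv.DictReader(io.StringIO(text))
    cases = []
    for row in reader:
        row = dict(row)
        row["Age"] = int(row["Age"])  # assumption: Age is a whole number
        cases.append(row)
    return cases


cases = read_delimited(SAMPLE)
for case in cases:
    print(case)
```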
Fixed-width data: Rather than using delimiters between values, the values of the variables are aligned
vertically, so that a given variable always begins in a certain column position. In the example below, ID always
begins in column 1; Age always begins in column 10; and Gender always begins in column 16.
ID       Age   Gender
A001R    41    F
Z009     36    M
C321BC   27    F
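Reading fixed-width data comes down to cutting each line at known column positions. The sketch below assumes the positions stated above (ID from column 1, Age from column 10, Gender from column 16) and is written in Python for illustration; parse_fixed_width is a made-up name, not an SPSS feature.

```python
# Sample lines laid out so that ID starts in column 1, Age in column 10
# and Gender in column 16 (columns counted from 1).
SAMPLE_LINES = [
    "ID       Age   Gender",
    "A001R    41    F",
    "Z009     36    M",
    "C321BC   27    F",
]


def parse_fixed_width(lines):
    """Slice each data line at the fixed column positions; skip the header."""
    cases = []
    for line in lines[1:]:
        cases.append({
            "ID": line[0:9].strip(),       # columns 1-9
            "Age": int(line[9:15]),        # columns 10-15
            "Gender": line[15:].strip(),   # column 16 onwards
        })
    return cases


fixed_cases = parse_fixed_width(SAMPLE_LINES)
for case in fixed_cases:
    print(case)
```

Because the IDs have different lengths, splitting on spaces would happen to work here, but slicing by position is what keeps fixed-width data readable even when a value is blank.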
SPSS has two primary options for calculating statistics: Descriptives (for basic statistics
like mean, range and standard deviation) and Frequencies (additional options, such as median, mode and quartiles).
Descriptive statistics could also include charts and graphs such as a frequency distribution or histogram,
among others.
questions like:
The descriptive statistics feature of SPSS can give summary statistics such as the mean, median, mode and
standard deviation.
The mean is a measure of average (sum of the values divided by the number of values).
Standard deviation measures the spread of the data and can be used to describe normal distributions.
Skewness is a measure of how symmetrical the distribution is. Values of skewness close to 0 represent
symmetry; positive values mean that there are some high-valued outliers, and negative values mean that there are
some low-valued outliers.
Kurtosis values refer to how peaked the distribution is. A normal distribution would have a value of 0. Negative
values mean that the distribution is flat (i.e., many cases in the extremes), and positive values mean that the
distribution is clustered in the centre.
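The definitions above can be checked by hand on a small data set. The following Python sketch uses the simple moment-based formulas for skewness and (excess) kurtosis; SPSS applies small-sample adjustments, so its output will differ slightly from these values. The function name describe and the scores are invented for the example.

```python
import math


def describe(values):
    """Mean, sample standard deviation, moment skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    return {
        "mean": mean,
        "sd": math.sqrt(sum(d ** 2 for d in deviations) / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores
summary = describe(scores)
print(summary)
```

For these scores the skewness is positive, reflecting the single high value (9) that pulls the right tail out.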
Descriptive Statistics SPSS: Descriptives Option
Mean
Steps:
Step 1: Click Analyze, mouse over Descriptive Statistics, and then click Descriptives to open the Descriptives
box.
Step 2: Select the variables for which you want descriptive statistics.
Click the blue arrow in the center to move the selected variables from the left to the right-hand Variables box.
Step 3: Click Options in the right-hand column of the Descriptives box. This action opens the Options window.
Place a check mark next to any additional descriptives you would like, or uncheck those that you don’t want.
Step 4: Click Continue, then click OK. The descriptive statistics SPSS calculates will be displayed in the
output window.
Percentile Values: e.g., Quartiles
Central Tendency: Mean, Median, Mode
Dispersion: Standard Deviation, Variance, Range
Distribution: Skewness, Kurtosis
Step 1: Click Analyze, then mouse over Descriptive Statistics, and then click Frequencies to open the
Frequencies box.
Step 2: Move the variables you want to analyze. Click the variables one at a time (or all together), then click
the blue arrow in the center.
Step 3: Click Statistics in the right-hand column (the top blue button) to open the Frequencies: Statistics box.
Step 4: Click the box to check the statistics you would like. More options are available here, like Quartiles,
which were not available in the Descriptives box in Part 1 above.
Step 5: Click Continue, then click OK. The Descriptive Statistics SPSS output window will display the
requested results.
Explore
Used to statistically analyze Univariate data.
The results are in the form of either graphs or numerical results.
These can also be compared against standard charts and data.
Analyze > Descriptive Statistics > Explore
Crosstab
A crosstab is a table showing the relationship between two or more variables. Crosstabs are useful for finding
patterns and associations in data.
Analyze > Descriptive Statistics > Crosstabs
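What a crosstab counts can be illustrated with a few lines of Python: every combination of the two variables gets a cell, and the cell holds the number of cases with that combination. The records and the function name crosstab below are made up for the example.

```python
from collections import Counter

# Made-up records: (gender, type of pet) for five cases.
records = [
    ("F", "dog"),
    ("M", "cat"),
    ("F", "cat"),
    ("F", "dog"),
    ("M", "dog"),
]


def crosstab(pairs):
    """Count how many cases fall into each (row value, column value) cell."""
    return Counter(pairs)


table = crosstab(records)
row_values = sorted({row for row, _ in records})
col_values = sorted({col for _, col in records})

print("   ", col_values)
for row in row_values:
    print(row, [table[(row, col)] for col in col_values])
```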
Charts
SPSS can be used to create bar graphs, histograms, line graphs, and scatterplots.
Steps:
Graphs > Legacy Dialogs > select Bar, Line, Area, Pie, etc.
You can select from three types of bar graphs: simple, clustered, and stacked.
The simple bar graph has space between each bar on the graph.
The clustered bar graph does not put space between bars that are related, but does put space between bars that
are not related. This type of bar graph is useful when plotting the results of studies with more than one IV.
The stacked bar graph places related bars on top of each other. This type of bar graph is often difficult to read
correctly and probably should not be used in most situations.
Select the type of bar chart that you want by clicking on it.
T-test: A t-test is a type of inferential statistic used to determine if there is a significant difference between the
means of two groups, which may be related in certain features.
One Sample T Test (used to see whether the mean of the sample is the same as the mean of the population,
i.e., whether the sample represents the population)
Analyze > Compare Means > One-Sample T Test.
The One-Sample T Test window opens where you will specify the variables to be used in the analysis. All of the
variables in your dataset appear in the list on the left side. Move variables to the Test Variable(s) area by
selecting them in the list and clicking the arrow button.
Options: Clicking Options will open a window where you can specify the Confidence Interval
Percentage and how the analysis will address Missing Values.
Click Continue when you are finished making specifications.
Click OK to run the One Sample t Test.
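The statistic behind the One Sample t Test can be worked out by hand: the difference between the sample mean and the test value, divided by the standard error of the mean. The Python sketch below shows only that calculation on made-up data; it does not compute the p-value, which SPSS reports in its output. The name one_sample_t is invented for the example.

```python
import math


def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    standard_error = math.sqrt(variance / n)
    return (mean - test_value) / standard_error, n - 1


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
t_stat, df = one_sample_t(scores, test_value=4)
print(round(t_stat, 3), df)
```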
Independent Samples t Test (used to compare the means of two independent groups)
Analyze > Compare Means > Independent-Samples T Test.
The Independent-Samples T Test window opens where you will specify the variables to be used in the analysis.
All of the variables in your dataset appear in the list on the left side. Move variables to the right by selecting
them in the list and clicking the blue arrow buttons. You can move a variable(s) to either of two
areas: Grouping Variable or Test Variable(s).
Test Variable(s): The dependent variable(s)
Grouping Variable: The independent variable.
Define Groups: Click Define Groups
You must define the categories of your grouping variable before you can run the Independent Samples t Test
procedure.
Options: The Options section is where you can set your desired confidence level for the confidence interval for
the mean difference and how the analysis will address Missing Values.
Click OK.
Paired Samples T Test:
The Paired Samples t Test compares the means of two measurements taken from the same individual or object.
E.g., a measurement taken at two different times (e.g., pre-test and post-test score with an intervention
administered between the two time points)
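A paired test works on the differences within each pair, which is why the two measurements must come from the same individual. As a rough sketch (Python, made-up pre-test and post-test scores, hypothetical function name paired_t), the t statistic is the mean difference divided by its standard error:

```python
import math


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("each case needs both measurements")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = sum(differences) / n
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    return mean_diff / math.sqrt(variance / n), n - 1


pre_test = [10, 12, 9, 11]    # made-up scores before the intervention
post_test = [12, 14, 10, 14]  # made-up scores after the intervention
t_paired, df_paired = paired_t(pre_test, post_test)
print(round(t_paired, 3), df_paired)
```

The p-value is left out here; SPSS reports it alongside the t statistic.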
One-way ANOVA:
One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to
determine whether there is statistical evidence that the associated population means are significantly different.
One-Way ANOVA is a parametric test.
Steps:
Analyze > Compare Means > One-Way ANOVA.
The One-Way ANOVA window opens, where you will specify the variables to be used in the analysis. All of the
variables in your dataset appear in the list on the left side. Move variables to the right by selecting them in the
list and clicking the blue arrow buttons. You can move a variable(s) to either of two areas: Dependent
List or Factor.
Dependent List: The dependent variable(s).
Factor: The independent variable.
Test: By default, a 2-sided hypothesis test is selected. Alternatively, a directional, one-sided hypothesis test can
be specified
Significance level: The desired cutoff for statistical significance. By default, significance is set to 0.05.
Click OK to run the One-Way ANOVA.
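The F value in the One-Way ANOVA output is the ratio of the variation between group means to the variation within groups. The Python sketch below shows that arithmetic on three made-up groups; the function name one_way_anova_f is invented, and the significance (p) value that SPSS reports is not computed.

```python
def one_way_anova_f(groups):
    """Return (F, df between, df within) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of scores around their own group mean
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up scores for three groups
f_value, df_between, df_within = one_way_anova_f(groups)
print(round(f_value, 3), df_between, df_within)
```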
Non parametric tests
A non-parametric test:
Is used when the data is not normally distributed.
Uses the median as the measure of central tendency.
Doesn’t require previous knowledge about the population.
Doesn’t make assumptions about the population.
Normality tests:
Meaning:
In a normal distribution, data is symmetrically distributed with no skew.
When plotted on a graph, the data follows a bell shape, with most values clustering around a central
region and tapering off as they go further away from the center.
All kinds of variables in natural and social sciences are normally or approximately normally distributed.
Height, birth weight, reading ability, job satisfaction, or SAT scores are just a few examples of such
variables.
Because normally distributed variables are so common, many statistical tests are designed for normally
distributed populations.
1) Symmetric :
A normal distribution is perfectly symmetrical around its center. That is, the right side of the center is
a mirror image of the left side.
2) Unimodal:
There is also only one mode, or peak, in a normal distribution
3) Asymptotic:
Normal distributions are continuous and have tails that are asymptotic, which means that they approach
but never touch the x-axis.
No finite range can cover all possible cases, so the curve never reaches 0.
4) The mean, median, and mode are all the same.
The center of a normal distribution is located at its peak, and 50% of the data lies above the mean,
while 50% lies below. It follows that the mean, median, and mode are all equal in a normal distribution.
Correlation and regression
Correlation
Correlation is a statistical technique that is used to measure and describe a relationship between two
variables. Usually the two variables are simply observed, not manipulated.
Correlation is used to find the relationship between 2 quantitative variables, without being able to infer
causal (cause and effect) relationships.
Correlation means that there is a relationship between two or more variables, but this relationship doesn’t
necessarily imply cause and effect.
When two variables are correlated, it simply means that as one variable changes, so does the other.
Characteristics/ Types
Correlations have four important characteristics/types
1. They can tell us about the direction of the relationship
2. The form (shape) of the relationship
3. The degree (strength) of the relationship between two variables.
4. Number of Variables
1. Positive: In a positive relationship both variables tend to move in the same direction: If one variable
increases, the other tends to also increase. If one decreases, the other tends to also.
2. Negative: In a negative relationship the variables tend to move in the opposite directions: If one variable
increases, the other tends to decrease, and vice-versa.
The direction of the relationship between two variables is identified by the sign of the correlation coefficient for
the variables. Positive relationships have a "plus" sign, whereas negative relationships have a "minus" sign.
2. The Form (Shape) of a Relationship: The form or shape of a relationship refers to whether the relationship
is straight or curved.
1. Linear Correlation: A straight relationship is called linear, because it approximates a straight line.
2. Curvilinear Correlation: A curved relationship is called curvilinear, because it approximates a
curved line.
Finally, a correlation coefficient measures the degree (strength) of the relationship between two variables. The
measures we discuss only measure the strength of the linear relationship between two variables. Two specific
strengths are:
1. Perfect Relationship: When two variables are exactly (linearly) related the correlation coefficient is
either +1.00 or -1.00. They are said to be perfectly linearly related, either positively or negatively.
2. No relationship: When two variables have no linear relationship at all, their correlation is 0.00.
All other strengths fall between -1.00 and +1.00. Note, though, that +1.00 is the largest positive correlation
and -1.00 is the largest negative correlation that is possible.
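The points about direction, form and strength can be seen in a small calculation of the Pearson correlation coefficient. This is a Python sketch on made-up numbers (the function name pearson_r is invented); the last example shows a perfectly curved relationship whose linear correlation is 0.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
r_positive = pearson_r(x, [1, 3, 5, 7, 9])    # y rises with x in a straight line
r_negative = pearson_r(x, [9, 7, 5, 3, 1])    # y falls as x rises
r_curved = pearson_r(x, [4, 1, 0, 1, 4])      # y = x squared: curvilinear
print(r_positive, r_negative, r_curved)
```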
4. Number of Variables
Simple Correlation: Two variables (bivariate).
Multiple Correlation: An extension of simple correlation to more than two variables.
Partial Correlation: More than two variables, where the relationship between two of them is examined while the
others are held constant.
Exploratory Factor Analysis (EFA) is used when you don’t have any idea about the structure of your data or how
many dimensions are in a set of variables.
Confirmatory Factor Analysis (CFA) is used for verification when you have a specific idea about
the structure of your data or how many dimensions are in a set of variables.
If you want to explore patterns, use EFA.
If you want to perform hypothesis testing, use CFA.
Steps:
Analyze > Dimension Reduction > Factor
Cluster Analysis
Cluster analysis is a multivariate data mining technique whose goal is to group objects (e.g., products,
respondents, or other entities) based on a set of user-selected characteristics or attributes. It is a basic and
important step of data mining and a common technique for statistical data analysis.
Clusters should exhibit high internal homogeneity and high external heterogeneity.
That is, when plotted geometrically, objects within a cluster should be very close together and the clusters
themselves should be far apart.
Before you can analyze your data it is important to check your data file for possible errors. First, it is
important to see if you have made typos (see above). In addition, it is essential to investigate whether there are
other errors in your data. To do this, follow these steps:
Step 1: Checking for errors. First it is necessary to check all scores of all variables. You then investigate
whether there are certain scores that fall outside the normal range.
Step 2: Finding and checking error in the data file. It is then necessary to find out where the error is in
the data file. This error must then be corrected or removed.
Data Cleaning:
Data Cleaning is the process of preparing data for analysis by removing or modifying data that is
incorrect, missing, irrelevant, duplicated or improperly formatted.
When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled.
If this data is not cleaned, it might distort the results.
Therefore, before performing any statistical analysis, we first have to clean the data.
Importance of cleaning data
This data is usually not necessary or helpful when it comes to analyzing data because it may hinder the
process or provide inaccurate results.
Analyze → Descriptive Statistics → Frequencies > Select all variables and transfer them from the left box to the
right box using the blue button > Click OK
In the output window, the table will show the number of missing values for each variable.
Repeat the process for all the variables that have missing values.
To cross-check, a new column will have been added in the Data View worksheet after the
replace-missing-values procedure is performed.
Analyze → Descriptive Statistics → Frequencies
Now, the number of missing values in the frequency table should be 0.
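The missing-value count that the Frequencies table reports is simply a tally of empty cells per variable. A minimal Python sketch of that tally, assuming blank cells are represented as None and using made-up cases and a hypothetical function name count_missing:

```python
# Made-up cases; None stands for a blank (missing) cell.
survey_cases = [
    {"Gender": 1, "Age": 21, "School_Class": 2},
    {"Gender": None, "Age": 19, "School_Class": 1},
    {"Gender": 2, "Age": None, "School_Class": 4},
    {"Gender": None, "Age": 22, "School_Class": 3},
]


def count_missing(rows):
    """Return the number of missing values for each variable."""
    counts = {}
    for row in rows:
        for variable, value in row.items():
            counts.setdefault(variable, 0)
            if value is None:
                counts[variable] += 1
    return counts


missing_counts = count_missing(survey_cases)
print(missing_counts)
```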
For example: when the variable 'gender' is coded with 0 or 1 (where 0 = male and 1 = female), no valid score can be
anything other than 0 or 1. Scores with any other number (for example 2 or 3) should therefore
be removed or corrected.
Another example can be that the person filling the form wanted to enter 1, but by mistake entered 11.
Steps:
Analyze → Descriptive Statistics → Frequencies > Select all variables and transfer them from the left box to the
right box using the blue button > Click Statistics > Select Minimum and Maximum in the “Dispersion” box > Click
Continue > Click OK
In the output window, the table will show the minimum and maximum value of each variable.
For example: if the Minimum value is 1 and the Maximum value is 22 (but the only values we assigned were 1
and 2), then we look up the case that contains the error and manually correct its data entry
from “22” to “2”.
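The minimum/maximum check above amounts to asking which cases hold a value outside the set of valid codes. As an illustration only (Python, made-up data, hypothetical function name find_out_of_range), the offending case numbers can be listed like this:

```python
def find_out_of_range(values, valid_codes):
    """Return (case number, value) for every value that is not a valid code."""
    return [
        (case_number, value)
        for case_number, value in enumerate(values, start=1)
        if value not in valid_codes
    ]


gender_column = [1, 2, 22, 1, 2]  # made-up data; 22 is a typing error for 2
errors = find_out_of_range(gender_column, valid_codes={1, 2})
print(min(gender_column), max(gender_column))  # the maximum of 22 reveals a problem
print(errors)                                  # which case to correct
```

Case numbers start at 1 so that they line up with the row numbers shown in Data View.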
4) Typing errors
It is always very important to check your data for typing errors. You can of course check all
entered data again against the original data, but this takes a lot of time. An easier way is to request Frequencies.
You do this by following these steps: Analyze → Descriptive Statistics → Frequencies.