
MBA Compre Stats Reviewer

Researcher: Dr. Luisito Hagos

Terminology Used in Statistics

Like every subject, statistics has its own language. The language is what
helps you know what a problem is asking for, what results are needed,
and how to describe and evaluate the results in a statistically correct
manner. Here’s an overview of the types of statistical terminology:

Four big terms in statistics are population, sample, parameter, and statistic:

A population is the entire group of individuals you want to study, and a sample is a subset of that group.

A parameter is a quantitative characteristic of the population that you're interested in estimating or testing (such as a population mean or proportion).

A statistic is a quantitative characteristic of a sample that often helps estimate or test the population parameter (such as a sample mean or proportion).

Descriptive statistics are single results you get when you analyze a set
of data — for example, the sample mean, median, standard deviation,
correlation, regression line, margin of error, and test statistic.

Statistical inference refers to using your data (and its descriptive statistics) to make conclusions about the population. Major types of inference include regression, confidence intervals, and hypothesis tests.

In statistics, we generally want to study a population. You can think of a population as a collection of persons, things, or objects under study. To study the population, we select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Data are the result of sampling from a population.

From the sample data, we can calculate a statistic. A statistic is a number that represents a property of the sample. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. The statistic is an estimate of a population parameter. A parameter is a number that is a property of the population. Since we considered all math classes to be the population, the average number of points earned per student over all the math classes is an example of a parameter.

Population: All math classes

Sample: One of the math classes

Parameter: Average number of points earned per student over all math classes

Statistic: Average number of points earned per student in the one sampled math class

One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. The accuracy really depends on how well the sample represents the population. The sample must contain the characteristics of the population in order to be a representative sample. We are interested in both the sample statistic and the population parameter in inferential statistics. In a later chapter, we will use the sample statistic to test the validity of the established population parameter.

A variable, notated by capital letters such as X and Y, is a characteristic of interest for each person or thing in a population. Variables may be numerical or categorical. Numerical variables take on values with equal units, such as weight in pounds and time in hours. Categorical variables place the person or thing into a category.

1. Z-test

Comparing two sample means

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

Where: X̄1 – mean of the first sample

X̄2 – mean of the second sample

n1/n2 – number of items in the first/second sample

S1/S2 – standard deviation of the first/second sample

Table for Critical Values of Z at Varying Significance Levels

                   Significance Level
Test Type          .10      .05      .025     .01

One-tailed test    +1.28    +1.645   +1.96    +2.33

Two-tailed test    ±1.645   ±1.96    ±2.24    ±2.58
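The sketch below computes each critical value from the inverse of the standard normal CDF. It uses only the standard library and needs Python 3.8+ for `statistics.NormalDist`. A one-tailed test puts all of α in one tail, and a two-tailed test splits α equally between the two tails. The helper `z_critical` is a hypothetical name used only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist()

def z_critical(alpha: float, two_tailed: bool) -> float:
    """Return the positive critical z for significance level alpha.

    One-tailed: all of alpha is in one tail.
    Two-tailed: alpha is split equally between the two tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.025, 0.01):
    one = z_critical(alpha, two_tailed=False)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha:<6} one-tailed={one:.3f} two-tailed={two:.3f}")
```

Tables usually round these values to two or three decimals, so small differences in the last digit are expected.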

Problem: A researcher wishes to find out whether or not there is a significant difference between the IQs of morning and afternoon students in his school. By random sampling, he took a sample of 239 students in the morning session; these students were found to have a mean IQ of 80. The researcher also took a sample of 209 students in the afternoon session; they were found to have a mean IQ of 82. The standard deviations of the students' IQs in the morning and afternoon sessions are 13.19 and 11.17, respectively. Test the hypothesis.

Solution:

1. H(O): There is no significant difference in the I.Q. between the students in the morning and afternoon sessions.

2. α = 0.05

3. Use the z-test (why? Both sample sizes are greater than 30.)

4. Tabular value (TV) = ±1.96 (two-tailed)

Afternoon        Morning

X̄1 = 82          X̄2 = 80
n1 = 209         n2 = 239
s1 = 11.17       s2 = 13.19

5. Compute z:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{82 - 80}{\sqrt{\dfrac{11.17^2}{209} + \dfrac{13.19^2}{239}}} = \frac{2}{\sqrt{0.597 + 0.728}} \approx 1.74$$

6. Compare the computed value (CV) and the tabular value (TV).

CV is less than TV: 1.74 is less than 1.96, so do not reject (accept) the null hypothesis.

Therefore, there is no significant difference in the I.Q. between the students in the morning and afternoon sessions.
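The sketch below reproduces the z computation with the values from the solution table. Each standard deviation is paired with its own sample size: 11.17 with 209 (afternoon) and 13.19 with 239 (morning). The function name `two_sample_z` is hypothetical.

```python
from math import sqrt

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two independent sample means.

    z = (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2),
    a large-sample test (both n > 30).
    """
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Afternoon session = sample 1, morning session = sample 2
z = two_sample_z(82, 11.17, 209, 80, 13.19, 239)
critical = 1.96  # two-tailed tabular value at alpha = 0.05

print(f"z = {z:.2f}")
if abs(z) > critical:
    print("Reject H0: the mean IQs differ significantly.")
else:
    print("Do not reject H0: no significant difference in mean IQ.")
```

If the standard deviations are swapped between the two samples, z drops slightly to about 1.72. The decision does not change.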

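2. SPEARMAN'S RANK CORRELATION COEFFICIENT

(British psychologist Charles Spearman, 1863-1945)

Formula:

1. $$r = 1 - \frac{6\sum (X - Y)^2}{n(n^2 - 1)}$$

2. $$r = \left[\frac{12}{n(n^2 - 1)}\right]\left[\sum XY - \frac{n(n+1)^2}{4}\right]$$

where X and Y are the two ranks of each individual and n is the number of individuals ranked.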

Problem:

Ten applicants for the editorship of a school were ranked according to their essay-writing ability. They were also ranked on a test that measured their reading comprehension. The data are tabulated below. The highest grade receives rank 1 and the lowest receives rank 10. Tied grades share the average of the ranks they occupy.

Student  Grade in   Grade in         Rank   Rank   XY    X-Y    (X-Y)²
         essay      reading
         writing    comprehension    (X)    (Y)
1        85         90               5      3      15    2      4
2        80         92               8.5    2      17    6.5    42.25
3        90         89               3      4      12    -1     1
4        83         85               6      6.5    39    -0.5   0.25
5        75         86               10     5      50    5      25
6        81         96               7      1      7     6      36
7        80         75               8.5    10     85    -1.5   2.25
8        92         85               2      6.5    13    -4.5   20.25
9        93         81               1      8      8     -7     49
10       88         78               4      9      36    -5     25
TOTAL                                              282          205

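1. Formulate the hypotheses.

H(O): There is no significant relationship between the applicants' grades in writing ability and reading comprehension.

H(a): There is a significant relationship between the applicants' grades in writing ability and reading comprehension.

2. Use the 0.05 level of significance.

3. Use the t-statistic.

4. Computation:

a. $$r = \left[\frac{12}{10(10^2 - 1)}\right]\left[282 - \frac{10(10+1)^2}{4}\right] = \frac{12}{990}(-20.5) = -0.2485$$

b. $$r = 1 - \frac{6(205)}{10(10^2 - 1)} = 1 - \frac{1230}{990} = -0.2424$$

The two results differ slightly because some ranks are tied. Without ties, the two formulas give identical results.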

5. Interpretation:
There is a low negative correlation between the applicants' writing ability and reading comprehension.

6. Conclusion:
As the applicants' writing ability increases, their reading comprehension tends to decrease slightly.
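The sketch below ranks the raw grades so that the highest grade gets rank 1 and ties share the mean rank. It then evaluates both rank-correlation formulas: 1 - 6Σ(X-Y)²/(n(n²-1)) and [12/(n(n²-1))][ΣXY - n(n+1)²/4].

The reviewer names the t-statistic but does not show that computation. This sketch assumes the common conversion t = r√((n-2)/(1-r²)) with df = n - 2. Under that assumption, |t| ≈ 0.71 is below the two-tailed 0.05 critical value of 2.306 for df = 8, so the correlation would not be significant. The helper `average_ranks` is hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values so the highest gets rank 1; tied values share the mean rank."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1              # first 1-based position of v
        count = order.count(v)                  # number of tied copies of v
        ranks.append(first + (count - 1) / 2)   # mean of the tied positions
    return ranks

essay = [85, 80, 90, 83, 75, 81, 80, 92, 93, 88]
reading = [90, 92, 89, 85, 86, 96, 75, 85, 81, 78]

x = average_ranks(essay)    # Rank (X)
y = average_ranks(reading)  # Rank (Y)
n = len(x)

sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Formula 1: squared rank differences
r1 = 1 - 6 * sum_d2 / (n * (n**2 - 1))
# Formula 2: sum of rank products
r2 = (12 / (n * (n**2 - 1))) * (sum_xy - n * (n + 1) ** 2 / 4)

# Assumed significance test: t with df = n - 2
t = r1 * sqrt((n - 2) / (1 - r1**2))

print(f"sum (X-Y)^2 = {sum_d2}, sum XY = {sum_xy}")
print(f"r (formula 1) = {r1:.4f}, r (formula 2) = {r2:.4f}, t = {t:.2f}")
```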
3. t-test for Correlated Samples

Why do we use the t-test for correlated samples?

• The t-test for correlated samples is used to find out whether a significant difference exists between the before (pretest) and after (posttest) means of the same group. If there is a significant difference in favor of the posttest, the treatment or intervention is considered effective. If there is no significant difference, there is no evidence that the treatment is effective.

• This is an appropriate test for the evaluation of government programs. It is used in experimental designs to test the effectiveness of a technique, method, or program that has been developed.

How do we use the t-test for correlated samples?

The formula is

$$t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \dfrac{\left(\sum D\right)^2}{n}}{n(n-1)}}}$$

Where:

D̄ = the mean difference between the pretest and the posttest.

ΣD² = the sum of the squares of the differences between the pretest and the posttest.

ΣD = the summation of the differences between the pretest and the posttest.

n = the sample size.

Example 1: An experimental study was conducted on the effect of programmed materials in English on the performance of 20 selected college students. Before the program was implemented, the pretest was administered. After 5 months, the same instrument was used to obtain the posttest results. The results of the experiment are as follows.

Pretest   Posttest   D (Pretest - Posttest)   D²
20        25         -5                       25
30        35         -5                       25
10        25         -15                      225
15        25         -10                      100
20        20         0                        0
10        20         -10                      100
18        22         -4                       16
14        20         -6                       36
15        20         -5                       25
20        15         5                        25
18        30         -12                      144
15        10         5                        25
15        16         -1                       1
20        25         -5                       25
18        10         8                        64
40        45         -5                       25
10        15         -5                       25
10        10         0                        0
12        18         -6                       36
20        25         -5                       25

ΣD = -81   ΣD² = 947

1. H(O): There is no significant difference in the effect of programmed materials in English on the performance of the 20 selected students based on the pretest and the posttest.

H(a): There is a significant difference in the effect of programmed materials in English on the performance of the 20 selected students based on the pretest and the posttest.

2. Use the t-test for correlated samples.

3. α = 0.05

4. df = n - 1 = 20 - 1 = 19

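5. Compute t:

$$\bar{D} = \frac{\sum D}{n} = \frac{-81}{20} = -4.05$$

$$t = \frac{-4.05}{\sqrt{\dfrac{947 - \dfrac{(-81)^2}{20}}{20(20-1)}}} = \frac{-4.05}{\sqrt{\dfrac{618.95}{380}}} \approx -3.17$$

6. |t| = 3.17 > 2.093 (tabular value for df = 19, two-tailed, α = 0.05); reject the null hypothesis.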

Conclusion:
There is a significant difference between the pretest and the
posttest in the English performance of the 20 selected college
students.
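The sketch below recomputes the example from the raw scores. D is taken as pretest minus posttest, matching the table. It applies the formula given above and cross-checks it against the equivalent form D̄ / (s_D / √n), using `statistics.stdev` for the sample standard deviation of the differences.

```python
from math import sqrt
from statistics import mean, stdev

pretest = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20, 18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
posttest = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15, 30, 10, 16, 25, 10, 45, 15, 10, 18, 25]

# D = pretest - posttest (same sign convention as the table)
d = [pre - post for pre, post in zip(pretest, posttest)]
n = len(d)

sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

# t = D-bar / sqrt((sum D^2 - (sum D)^2 / n) / (n (n - 1)))
t = mean(d) / sqrt((sum_d2 - sum_d**2 / n) / (n * (n - 1)))

# Equivalent form: mean difference over its standard error
t_check = mean(d) / (stdev(d) / sqrt(n))

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}, df = {n - 1}, t = {t:.2f}")
```

The negative sign only reflects that posttest scores were higher on average. The decision uses |t|.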

PEARSON PRODUCT MOMENT CORRELATION

Correlation is a statistical measure of the association between two or more quantitative variables.

To estimate roughly whether a relationship exists between two variables, a scatter diagram is made by plotting each pair of observations as a point. A straight line is then drawn through, or as close as possible to, as many points as possible.
The most widely used computational formula for correlation is the Pearson Product Moment Correlation Coefficient.
The formula for Pearson r:

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Where:
X - The observed data for the independent variable
Y - The observed data for the dependent variable
N - Sample size
r - Degree of relationship between X and Y

The qualitative interpretation of the degree of linear relationship is shown in the following range of values:
Perfect Positive/Negative Correlation
Very High Positive/Negative Correlation
High Positive/ Negative Correlation
Moderate Positive/Negative Correlation
Low Positive/ Negative Correlation
Negligible Positive/ Negative Correlation
0.00 No Correlation

Problem no. 1
The data below were obtained from a study conducted by a researcher on the relationship between grades in Algebra (X) and Computer (Y) for a sample of 10 students.

Student   X    Y    XY     X²     Y²
1         75   82   6150   5625   6724
2         80   78   6240   6400   6084
3         93   86   7998   8649   7396
4         65   72   4680   4225   5184
5         87   91   7917   7569   8281
6         71   80   5680   5041   6400
7         98   95   9310   9604   9025
8         68   72   4896   4624   5184
9         84   89   7476   7056   7921
10        77   74   5698   5929   5476
ΣX = 798   ΣY = 819   ΣXY = 66045   ΣX² = 64722   ΣY² = 67675

H(O): There is no significant relationship between the grades of the students in Algebra and Computer.
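The sketch below evaluates the Pearson r formula on the Algebra (X) and Computer (Y) grades, building every column sum from the raw data. The reviewer's own test of H(O) is not shown. The sketch assumes the common test t = r√((n-2)/(1-r²)) with df = n - 2. Under that assumption, t ≈ 5.04 exceeds the two-tailed 0.05 critical value of 2.306 for df = 8, which would lead to rejecting H(O).

```python
from math import sqrt

algebra = [75, 80, 93, 65, 87, 71, 98, 68, 84, 77]    # X
computer = [82, 78, 86, 72, 91, 80, 95, 72, 89, 74]   # Y
n = len(algebra)

sum_x, sum_y = sum(algebra), sum(computer)
sum_xy = sum(x * y for x, y in zip(algebra, computer))
sum_x2 = sum(x * x for x in algebra)
sum_y2 = sum(y * y for y in computer)

# Computational formula for Pearson r
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

# Assumed significance test: t with df = n - 2
t = r * sqrt((n - 2) / (1 - r**2))

print(f"sums: X={sum_x}, Y={sum_y}, XY={sum_xy}, X^2={sum_x2}, Y^2={sum_y2}")
print(f"r = {r:.4f}, t = {t:.2f}")
```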

Chi-square

Example:

Suppose we wish to assess whether there is a relationship between exercise on campus and students' living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the university). The data are shown below.

                       No Regular   Sporadic   Regular    Total
                       Exercise     Exercise   Exercise
Dormitory              32           30         28         90
On-Campus Apartment    74           64         42         180
Off-Campus Apartment   110          25         15         150
At Home                39           6          5          50
Total                  255          125        90         470
Based on the data, is there a relationship between exercise and students' living arrangements? Does where a person lives affect their exercise status? Here we have four independent comparison groups (living arrangements) and a discrete (ordinal) outcome variable with three response options. We specifically want to test whether living arrangement and exercise are independent. We will run the test using the five-step approach.

Step 1. Set up hypotheses and determine the level of significance.

H0: Living arrangement and exercise are independent.

H1: H0 is false.

α = 0.05

The null and research hypotheses are written in words rather than in
symbols. The research hypothesis is that the grouping variable (living
arrangement) and the outcome variable (exercise) are dependent or
related.

Step 2. Select the appropriate test statistic.

The formula for the test statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency in each cell of the table.

The condition for appropriate use of the above test statistic is that each expected frequency is at least 5. In Step 4 we will compute the expected frequencies and ensure that the condition is met.

Step 3. Set up the decision rule.

The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable is living arrangement, and 4 arrangements are considered, thus r = 4. The column variable is exercise, and 3 responses are considered, thus c = 3. For this test, df = (4-1)(3-1) = 3(2) = 6. With χ² tests there is no distinction between upper-tailed, lower-tailed, and two-tailed tests. If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ² statistic will be close to zero. If the null hypothesis is false, the χ² statistic will be large. The rejection region for the χ² test of independence is always in the upper (right-hand) tail of the distribution.

For df = 6 and a 5% level of significance, the appropriate critical value is 12.59, and the decision rule is as follows: Reject H0 if χ² > 12.59.

Step 4. Compute the test statistic.

We now compute the expected frequencies using the formula

Expected Frequency = (Row Total × Column Total) / N

The computations can be organized in a two-way table. The top number in each cell of the table is the observed frequency and the bottom number is the expected frequency. The expected frequencies are shown in parentheses.

                       No Regular   Sporadic   Regular    Total
                       Exercise     Exercise   Exercise
Dormitory              32           30         28         90
                       (48.8)       (23.9)     (17.2)
On-Campus Apartment    74           64         42         180
                       (97.7)       (47.9)     (34.5)
Off-Campus Apartment   110          25         15         150
                       (81.4)       (39.9)     (28.7)
At Home                39           6          5          50
                       (27.1)       (13.3)     (9.6)
Total                  255          125        90         470

Notice that the expected frequencies are taken to one decimal place. Apart from rounding, the sums of the observed frequencies equal the sums of the expected frequencies in each row and column of the table.

Recall that in Step 2, a condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 9.6), and therefore it is appropriate to use the test statistic.

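The test statistic is computed as follows:

$$\chi^2 = \frac{(32-48.8)^2}{48.8} + \frac{(30-23.9)^2}{23.9} + \frac{(28-17.2)^2}{17.2} + \frac{(74-97.7)^2}{97.7} + \frac{(64-47.9)^2}{47.9} + \frac{(42-34.5)^2}{34.5} + \frac{(110-81.4)^2}{81.4} + \frac{(25-39.9)^2}{39.9} + \frac{(15-28.7)^2}{28.7} + \frac{(39-27.1)^2}{27.1} + \frac{(6-13.3)^2}{13.3} + \frac{(5-9.6)^2}{9.6}$$

$$\chi^2 = 5.78 + 1.56 + 6.78 + 5.75 + 5.41 + 1.63 + 10.05 + 5.56 + 6.54 + 5.23 + 4.01 + 2.20 = 60.5$$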
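The sketch below carries out Step 4: it computes each expected frequency as (row total × column total) / N and sums (O - E)²/E over the twelve cells. It keeps the expected frequencies unrounded, so the statistic comes out near 60.4 rather than the hand value of 60.5 obtained with one-decimal expected frequencies. The critical value 12.59 is taken from Step 3 rather than computed.

```python
# Observed frequencies: rows = living arrangement, columns = exercise level
# (no regular, sporadic, regular)
observed = [
    [32, 30, 28],    # Dormitory
    [74, 64, 42],    # On-Campus Apartment
    [110, 25, 15],   # Off-Campus Apartment
    [39, 6, 5],      # At Home
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = (row total * column total) / N
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 12.59  # df = 6, alpha = 0.05 (value from Step 3)

print(f"df = {df}, smallest expected = {min(min(row) for row in expected):.1f}")
print(f"chi-square = {chi_square:.2f}, reject H0: {chi_square > critical}")
```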

Step 5. Conclusion.

We reject H0 because 60.5 > 12.59. We have statistically significant evidence at α = 0.05 to show that H0 is false, that is, living arrangement and exercise are not independent (they are dependent or related), p < 0.005.

The χ² test of independence is used to test whether the distribution of the outcome variable is similar across the comparison groups. Here we rejected H0 and concluded that the distribution of exercise is not independent of living arrangement, or that there is a relationship between living arrangement and exercise. The test provides an overall assessment of statistical significance. When the null hypothesis is rejected, it is important to review the sample data to understand the nature of the relationship. Consider again the sample data.

Least Squares Regression Line

Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things: (1) Does a set of predictor variables do a good job of predicting an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what way (indicated by the magnitude and sign of the beta estimates) do they affect the outcome variable? These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables. The simplest form of the regression equation, with one dependent and one independent variable, is defined by the formula y = a + bx, where y = estimated dependent variable score, a = constant (intercept), b = regression coefficient (slope), and x = score on the independent variable.
Problem 5:

Year   Time Period (X)   Sales in Units (Y)   X²    XY
2001   1                 100                  1     100
2002   2                 110                  4     220
2003   3                 122                  9     366
2004   4                 130                  16    520
2005   5                 139                  25    695
2006   6                 152                  36    912
2007   7                 164                  49    1148

ΣX = 28   ΣY = 917   ΣX² = 140   ΣXY = 3961

$$\bar{x} = \frac{\sum x}{n} = \frac{28}{7} = 4$$

$$\bar{y} = \frac{\sum y}{n} = \frac{917}{7} = 131$$

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{3961 - (7)(4)(131)}{140 - (7)(4^2)} = \frac{293}{28} = 10.46$$

$$a = \bar{y} - b\bar{x} = 131 - (10.46)(4) = 89.16$$

Therefore, the least squares trend equation is:

$$y = a + bx = 89.16 + 10.46x$$

To project demand in 2008, we denote the year 2008 as x = 8, and:

Sales in 2008 = 89.16 + 10.46 × 8 = 172.84
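The sketch below fits the least squares trend line to the sales data, coding the years 2001 to 2007 as time periods 1 to 7. It uses the same slope and intercept formulas as the worked solution: b = (Σxy - n x̄ ȳ) / (Σx² - n x̄²) and a = ȳ - b x̄. The `forecast` helper is a hypothetical name for illustration.

```python
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007]
sales = [100, 110, 122, 130, 139, 152, 164]   # Y, in units

x = list(range(1, len(years) + 1))   # time period: 2001 -> 1, ..., 2007 -> 7
n = len(x)

x_bar = sum(x) / n
y_bar = sum(sales) / n
sum_xy = sum(xi * yi for xi, yi in zip(x, sales))
sum_x2 = sum(xi * xi for xi in x)

# Least squares slope and intercept
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar**2)
a = y_bar - b * x_bar

def forecast(period):
    """Trend value y = a + b * x for a given time period."""
    return a + b * period

print(f"y = {a:.2f} + {b:.2f}x")
print(f"Forecast for 2008 (x = 8): {forecast(8):.2f}")
```

Without intermediate rounding, the intercept is about 89.14 and the 2008 forecast is about 172.86. The hand solution gets 89.16 and 172.84 because it rounds b to 10.46 before computing a.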
