Lesson 6 - Statistics For Data Science - II

The document discusses hypothesis testing and different types of parametric and non-parametric tests. It explains concepts like z-test, t-test, ANOVA test and provides examples of their usage in R.

Uploaded by

rimbrahim

Data Science with R

Lesson 6— Statistics for Data Science – II

© Simplilearn. All rights reserved.

Learning Objectives

Discuss Hypothesis Test

Explain Parametric test and its types

Explain Non-Parametric test and its types

Perform Hypothesis Tests on Population Means

Perform Hypothesis Tests on Population Variance

Perform Hypothesis Tests on Population Proportions

Statistics for Data Science – II
Topic 1—Hypothesis Test
What Is Hypothesis Test?

A hypothesis test is a formal statistical procedure for deciding whether sample data provide enough evidence to reject a stated hypothesis.

It is used to generalize the results obtained from sample data to the larger population.

The testing methodology depends on the data used and the reason for the analysis.
Types of Hypothesis Test

• Simple hypothesis test
• Complex hypothesis test
• Null hypothesis test
• Alternative hypothesis test
• Statistical hypothesis test
• Parametric test
• Non-parametric test

You have already learned about simple, complex, null, alternative, and statistical
hypotheses in the previous lesson. This lesson will focus on discussing parametric and
non-parametric tests.
Topic 2—Parametric Test
What Is a Parametric Test?

A parametric statistical test is one that makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data is drawn.

In these tests, inferences are based on the assumptions made about the nature of the population distribution. The tests are typically used for data that are normally distributed.
Types of Parametric Tests

• Z-Test and T-Test: Two population means or proportions are compared and tested.
• Analysis of Variance (ANOVA) Test: Equality of several population means is tested.

There are many tests that are parametric. We will limit our attention to the tests mentioned above.
Types of Parametric Tests
Z-TEST

A Z-Test is performed in cases where the test statistic is z, σ is known, and either the population is normal or the sample size is large (at least 30 is a common rule of thumb).

The formula to calculate z (standard statistic) is:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Where,
n: Sample size
$\bar{X}$: Sample mean from a sample X1, X2, …, Xn
μ: Population mean
σ: Population standard deviation
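
As a minimal sketch of the z formula, the snippet below plugs in summary numbers that are purely hypothetical (the names x_bar, mu, sigma, and n are chosen here for illustration and do not come from the lesson).

```r
# Hypothetical summary numbers, chosen only for illustration
x_bar <- 75    # sample mean
mu    <- 72    # hypothesized population mean
sigma <- 15.2  # known population standard deviation
n     <- 36    # sample size

# z = (x_bar - mu) / (sigma / sqrt(n))
z <- (x_bar - mu) / (sigma / sqrt(n))
z
```

The larger |z| is, the further the sample mean lies from the hypothesized mean in units of its standard error.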
Types of Parametric Tests
EXAMPLE IN R

Problem statement

The test scores of an entrance exam fit a normal distribution with a mean test score of 72 and a standard deviation of 15.2. Compute the percentage of students scoring 84 or more.
Types of Parametric Tests
EXAMPLE IN R

Calculation in R

Let’s use the pnorm function (the cumulative distribution function of the normal distribution) to find the required percentage of students. We need the upper tail of the normal distribution, since the given score criterion is 84 or more.

pnorm(84, mean = 72, sd = 15.2, lower.tail = FALSE)
[1] 0.21492

lower.tail = TRUE is used to find the probability of values no larger than the given value, whereas lower.tail = FALSE is used to find the probability of values larger than it. For a continuous distribution, including or excluding the value itself does not change the probability.
Types of Parametric Tests
EXAMPLE IN R

Solution

The required percentage is 21.5%.
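
The same percentage can be reached by standardizing the score first, which ties this example back to the z formula. The sketch below uses only the numbers from the problem statement; the variable names are chosen here for illustration.

```r
# Numbers from the entrance-exam example
x     <- 84
mu    <- 72
sigma <- 15.2

# Standardize the score, then use the standard normal upper tail
z <- (x - mu) / sigma
p_standardized <- pnorm(z, lower.tail = FALSE)

# Direct call, as in the example
p_direct <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

all.equal(p_standardized, p_direct)  # both routes agree
round(100 * p_direct, 1)             # percentage of students
```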
Types of Parametric Tests
T-TEST

A T-Test is performed in cases where the test statistic is t, σ is unknown, the sample standard deviation s is known, and the population is normal.

The formula to calculate t is:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Where,
n: Sample size
$\bar{X}$: Sample mean from a sample X1, X2, …, Xn
μ: Population mean
s: Sample standard deviation
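
A short sketch of the t formula on a small made-up sample follows. The data vector x and the value mu0 are hypothetical; t.test is the base R function for this test, and its statistic should match the manual calculation.

```r
# Hypothetical sample, for illustration only
x   <- c(68, 74, 71, 77, 80, 69, 73, 75)
mu0 <- 72                      # hypothesized population mean
n   <- length(x)

# Manual t statistic: (sample mean - mu0) / (s / sqrt(n))
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Same test with base R
res <- t.test(x, mu = mu0)
res$statistic   # t value
res$parameter   # degrees of freedom, n - 1
```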
Types of Parametric Tests
EXAMPLE IN R

Problem statement

Find out the 2.5th and 97.5th percentiles of the Student’s t-distribution, assuming 5 degrees of freedom.
Types of Parametric Tests
EXAMPLE IN R

Calculation in R

Let’s use the quantile function qt (applied to compute percentiles) with the decimal values 0.025 and 0.975.

qt(c(.025, .975), df = 5) # 5 degrees of freedom
[1] -2.5706 2.5706

Degrees of freedom refers to the number of values in the final calculation of a test statistic that are free to vary. For a one-sample t statistic, it is calculated using the formula df = N - 1 (where N is the number of values in the dataset).
Types of Parametric Tests
EXAMPLE IN R

Solution

The required 2.5th and 97.5th percentiles are -2.5706 and 2.5706, respectively.
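
As a quick check of these percentiles, the sketch below feeds them back into pt, the t-distribution's cumulative distribution function, which is the inverse of qt. It also shows that the two values mirror each other because the t-distribution is symmetric around zero.

```r
# Percentiles from the example
q <- qt(c(0.025, 0.975), df = 5)

# pt is the inverse of qt, so this recovers 0.025 and 0.975
pt(q, df = 5)

# Symmetry: the lower percentile is the negative of the upper one
all.equal(q[1], -q[2])
```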
Types of Parametric Tests
ANOVA

The ANOVA test is used for hypothesis tests that compare the averages of two or more groups.

For example, consider the following statements:
• An environmentalist wants to know if the average amount of pollution varies in several bodies of water.
• A sociologist wants to find out if a person’s income varies according to his/her upbringing.
Types of Parametric Tests
TYPES

ANOVA has two types:
• One-way ANOVA
• Two-way ANOVA
Types of Parametric Tests

One-way ANOVA:
• Uses variances to determine if a statistically significant difference exists among several group means or not
• Tests H0: μ1 = μ2 = μ3 = ... = μk (where µ = group mean and k = number of groups)

For one-way ANOVA, the ratio of the between-group variability to the within-group variability follows an F-distribution when the null hypothesis is true.
Types of Parametric Tests
ASSUMPTIONS

1. All samples are random and independent.
2. Each population is normal.
3. The factor is a categorical variable.
4. The populations have equal standard deviations.
5. The result is a numerical variable.
Types of Parametric Tests
EXAMPLE 1

Problem statement

Find out if there is a difference in the mean grades among the sororities, assuming μ1, μ2, μ3, and μ4 are the population means of the sororities.
Types of Parametric Tests
EXAMPLE 1

Calculation

Test:
• H0: μ1 = μ2 = μ3 = μ4
• H1: Not all of the means μ1, μ2, μ3, and μ4 are equal
• Distribution for the test: F3,16
   o df(num) = k – 1 = 4 – 1 = 3
   o df(denom) = n – k = 20 – 4 = 16
• Calculate the test statistic: F = 2.23
• Define probability statement: p-value = P(F > 2.23) = 0.1241
• Compare α and the p-value: α = 0.01
   o p-value = 0.1241
   o α < p-value
• Decide: Since α < p-value, you cannot reject H0.
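
The p-value in this example can be reproduced from the F statistic and the two degrees of freedom with pf, the cumulative distribution function of the F-distribution in base R. This is a small sketch using only the numbers quoted in the steps above; because F = 2.23 is a rounded value, the result may differ slightly in the last digits.

```r
f_stat   <- 2.23      # test statistic from the example
df_num   <- 4 - 1     # k - 1
df_denom <- 20 - 4    # n - k
alpha    <- 0.01

# Upper-tail probability P(F > 2.23) under H0
p_value <- pf(f_stat, df1 = df_num, df2 = df_denom, lower.tail = FALSE)
p_value

# Decision rule: reject H0 only if the p-value is below alpha
p_value < alpha
```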
Types of Parametric Tests
EXAMPLE 1

Solution

Without sufficient evidence, you cannot conclude that there is a difference among the mean grades for the sororities.
Types of Parametric Tests
EXAMPLE 2

Problem statement

A fast food chain wants to test and market three of its new menu items. To analyze if they are equally popular, consider:
• 18 random restaurants for the study
• 6 of the restaurants to test market the first menu item, another 6 for the second one, and the remaining 6 for the last one

The table below shows the sales figures of the menu items in the 18 restaurants. At .05 level of significance, test whether the mean sales volumes for these menu items are equal.

Item 1   Item 2   Item 3
22       52       16
42       33       24
44       8        19
52       47       18
45       43       34
37       32       39
Types of Parametric Tests
EXAMPLE 2

Calculation in R

1. Copy and paste the sales figures into a table file "fastfood-1.txt" using a text editor.

2. Load the file into a data frame df1 using the read.table function.

df1 = read.table("fastfood-1.txt", header = TRUE); df1
  Item1 Item2 Item3
1    22    52    16
2    42    33    24
3    44     8    19
4    52    47    18
5    45    43    34
6    37    32    39
Types of Parametric Tests
EXAMPLE 2

Calculation in R

3. Concatenate the data rows of df1 into a single vector r.

r = c(t(as.matrix(df1))) # response data
r
[1] 22 52 16 42 33 ...

4. Assign new variables for the treatment levels and number of observations.

f = c("Item1", "Item2", "Item3") # treatment levels
k = 3 # number of treatment levels
n = 6 # observations per treatment
Types of Parametric Tests
EXAMPLE 2

Calculation in R

5. Create a vector of treatment factors, corresponding to each element of r in step 3, using the gl function.

tm = gl(k, 1, n*k, factor(f)) # matching treatments
tm
[1] Item1 Item2 Item3 Item1 Item2 ...

6. Apply the function aov to a formula that describes the response r by the treatment factor tm.

av = aov(r ~ tm)

7. Print out the ANOVA table with the summary function.

summary(av)
            Df Sum Sq Mean Sq F value Pr(>F)
tm           2    745     373    2.54   0.11
Residuals   15   2200     147
Types of Parametric Tests
EXAMPLE 2

Solution

The p-value of 0.11 is greater than the .05 significance level, so do not reject H0. There is not enough evidence to conclude that the mean sales volumes of the new menu items differ.
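
For readers who do not want to create the text file, the following is a self-contained sketch of the same analysis. It types the sales figures in directly and uses the base R function stack to build the long format. This is an alternative to the gl-based steps, not the lesson's own code, and the names long, f_value, and p_value are chosen here for illustration.

```r
# Sales figures from the problem statement
df1 <- data.frame(
  Item1 = c(22, 42, 44, 52, 45, 37),
  Item2 = c(52, 33, 8, 47, 43, 32),
  Item3 = c(16, 24, 19, 18, 34, 39)
)

# stack() returns one row per observation:
# 'values' holds the sales, 'ind' holds the menu item as a factor
long <- stack(df1)

# One-way ANOVA of sales by menu item
av  <- aov(values ~ ind, data = long)
tab <- summary(av)[[1]]

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
round(c(F = f_value, p = p_value), 2)
```

The rounded F value and p-value should agree with the ANOVA table printed in step 7.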
F-Distribution

The F distribution, or the Fisher–Snedecor distribution, is a continuous probability distribution that arises frequently as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA).

The F-ratio is the ratio of two estimates of the variance σ², as described below:

o Variance between samples (MSbetween): An estimate of σ² equal to n times the variance of the sample means when the sample sizes are the same. When the sizes are different, the variance is weighted to account for the different sample sizes.

o Variance within samples (MSwithin): An estimate of σ² equal to the average of the sample variances. When the sizes are different, the variance within samples is weighted.
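
To make the two variance estimates concrete, here is a minimal sketch that computes them by hand for the fast food data of Example 2 (equal group sizes, so no weighting is needed) and forms their ratio. The variable names are chosen here for illustration.

```r
# Fast food sales from Example 2, one vector per menu item
groups <- list(
  Item1 = c(22, 42, 44, 52, 45, 37),
  Item2 = c(52, 33, 8, 47, 43, 32),
  Item3 = c(16, 24, 19, 18, 34, 39)
)
n <- 6                # observations per group
k <- length(groups)   # number of groups

# Variance between samples: n times the variance of the group means
ms_between <- n * var(sapply(groups, mean))

# Variance within samples: average of the group variances
ms_within <- mean(sapply(groups, var))

# F-ratio and its upper-tail probability with (k - 1, k * (n - 1)) df
f_ratio <- ms_between / ms_within
p_value <- pf(f_ratio, df1 = k - 1, df2 = k * (n - 1), lower.tail = FALSE)
round(c(F = f_ratio, p = p_value), 2)
```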
Types of Parametric Tests

Two-way ANOVA refers to a hypothesis test where the classification of data is based on two independent variables.

For example:
A company classifies its sales both by salesman and by region.
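
A rough sketch of a two-way ANOVA for the sales example is shown below. The data are simulated and entirely hypothetical (three salesmen, two regions, four observations per combination); only the aov call with two factors illustrates the technique.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical balanced design: 3 salesmen x 2 regions x 4 replicates
sales <- expand.grid(
  salesman = c("A", "B", "C"),
  region   = c("North", "South"),
  rep      = 1:4
)
sales$amount <- round(rnorm(nrow(sales), mean = 100, sd = 10))

# Two-way ANOVA with main effects for both factors
fit <- aov(amount ~ salesman + region, data = sales)
tab <- summary(fit)[[1]]
tab
```

Each factor gets its own row in the ANOVA table, with its own F value and p-value. An interaction term could be requested with salesman * region instead of salesman + region.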
Types of Parametric Tests
ASSUMPTIONS

1. Normal distribution of the population from which the sample is drawn
2. Measurement of the dependent variable at a continuous level
3. Categorical independent groups that have the same size
4. Independence of observations
5. Homogeneity of the variance of the population

https://keydifferences.com/difference-between-one-way-and-two-way-anova.html
Topic 3—Non-Parametric Test
What Is a Non-Parametric Test?

A non-parametric test (sometimes called a distribution-free test) does not assume anything about the underlying distribution. It is used when the data is not distributed normally.

It refers to a null category, since virtually all statistical tests assume one thing or another about the properties of the source population(s).

http://www.statisticshowto.com/parametric-and-non-parametric-data/
Types of Non-Parametric Tests

• Kruskal-Wallis test (alternative to the one-way ANOVA)

• Mann-Whitney test (alternative to the two-sample t test)

• Chi-square test

Chi-square test is the most commonly used non-parametric test. We will limit our
scope to learning chi-square test in this course.
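
Although the course does not cover them further, both alternatives are available in base R. The sketch below applies them to the fast food sales figures from the ANOVA example, purely to show the calls; in R, the Mann-Whitney test is run through wilcox.test with two samples.

```r
# Fast food sales from the ANOVA example
item1 <- c(22, 42, 44, 52, 45, 37)
item2 <- c(52, 33, 8, 47, 43, 32)
item3 <- c(16, 24, 19, 18, 34, 39)

# Kruskal-Wallis test: rank-based alternative to one-way ANOVA
kw <- kruskal.test(list(item1, item2, item3))
kw$parameter   # degrees of freedom: number of groups - 1
kw$p.value

# Mann-Whitney test: rank-based alternative to the two-sample t test
mw <- wilcox.test(item1, item3)
mw$p.value
```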
What Is Chi-square Test?

Chi-square test is a nonparametric test used to compare two or more variables for randomly selected data.
Chi-Square Test
FEATURES

1. Considers the square of a standard normal variate
2. Evaluates if frequencies observed in different categories vary significantly from the frequencies expected under a specified set of assumptions
3. Determines how well an assumed distribution fits the data
4. Uses contingency tables (in market research, these tables are called cross-tabs)
5. Supports nominal-level measurements
Types of Chi-square Test

1. Chi-square test for goodness of fit

2. Chi-square test for independence of two variables
Types of Chi-square Test
CHI-SQUARE TEST FOR GOODNESS OF FIT

It is used to assess how closely a sample matches a population. The Chi-square test statistic ($\chi^2$) is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

with k-1 degrees of freedom.

Where Oi is the observed count, k is the number of categories, and Ei is the expected count.

Goodness of fit of a statistical model refers to how well the model fits a set of observations.

https://www.chegg.com/homework-help/definitions/chi-square-test-14
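
A minimal sketch of a goodness-of-fit test follows. The observed counts are hypothetical (100 observations over five categories) and are tested against equal expected proportions. The manual sum mirrors the formula, and chisq.test with the p argument is the base R call for this test.

```r
# Hypothetical observed counts in k = 5 categories
observed   <- c(18, 22, 20, 25, 15)
expected_p <- rep(1 / 5, 5)               # assumed: all categories equally likely
expected   <- sum(observed) * expected_p  # expected counts, 20 each

# Manual statistic: sum of (O - E)^2 / E
chi_manual <- sum((observed - expected)^2 / expected)

# Same test with base R
gof <- chisq.test(observed, p = expected_p)
gof$statistic   # matches chi_manual
gof$parameter   # k - 1 degrees of freedom
gof$p.value
```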
Types of Chi-square Test
USE CASES

Goodness of fit test is used to identify the relation between two attributes, as in the cases below:

• Credit worthiness of borrowers based on their age groups and personal loans
• Relation between the performance of salesmen and training received
• Return on a single stock and on stocks of a sector like pharmaceutical or banking
• Category of viewers and impact of a TV campaign

Types of Chi-square Test
CHI-SQUARE TEST FOR INDEPENDENCE OF TWO VARIABLES

It is used to check whether the variables are independent of each other or not. The Chi-square test statistic ($\chi^2$) is

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

with (r-1)(c-1) degrees of freedom, where the sum runs over all cells of the table.

Where Oi is the observed count, r is the number of rows, c is the number of columns, and Ei is the expected count.

Two random variables are called independent if the probability distribution of one variable is not affected by the other.

https://www.chegg.com/homework-help/definitions/chi-square-test-14
Types of Chi-square Test
USE CASES

Test of independence is suitable for the following situations:

• There is one categorical variable.
• There are two categorical variables, and you will need to determine the relation between them.
• There are cross-tabulations, and relation between two categorical variables needs to be found.
• There are non-quantifiable variables (for example, answers to questions like, do employees in different age groups choose different types of health plans?)
Types of Chi-square Test
EXAMPLE

Problem statement

The manager of a restaurant wants to find the relation between customer satisfaction and the salaries of the people waiting tables.

• She takes a random sample of 100 customers asking if the service was excellent, good, or poor.
• She then categorizes the salaries of the people waiting as low, medium, and high.

Her findings are shown in the table below:

                    Salary
Service      Low   Medium   High   Total
Excellent      9       10      7      26
Good          11        9     31      51
Poor          12        8      3      23
Total         32       27     41     100
Types of Chi-square Test
EXAMPLE

Calculation

Assume the level of significance is 0.05. Here, H0 and H1 denote the independence and dependence of the service quality on the salaries of people waiting tables.

Test: DF = (3-1)(3-1) = 4
• Under H0, expected frequencies are:
   o E11 = (26 × 32)/100 = 8.32, E12 = 7.02, E13 = 10.66
   o E21 = 16.32, E22 = 13.77, E23 = 20.91
   o E31 = 7.36, E32 = 6.21, E33 = 9.43

Therefore, χ²(calculated) = (9-8.32)²/8.32 + (10-7.02)²/7.02 + (7-10.66)²/10.66 + (11-16.32)²/16.32 + (9-13.77)²/13.77 + (31-20.91)²/20.91 + (12-7.36)²/7.36 + (8-6.21)²/6.21 + (3-9.43)²/9.43 = 18.658

• χ²(0.05, 4) = 9.48773
• χ²(calculated) > χ²(tabulated)
• Reject H0, accept H1.
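
The hand calculation can be cross-checked in R. The sketch below enters the observed counts from the problem statement as a matrix (the name service and the dimnames are chosen here for illustration) and uses chisq.test for the statistic and qchisq for the tabulated critical value.

```r
# Observed counts: rows = service rating, columns = salary level
service <- matrix(
  c( 9, 10,  7,
    11,  9, 31,
    12,  8,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(Service = c("Excellent", "Good", "Poor"),
                  Salary  = c("Low", "Medium", "High"))
)

res <- chisq.test(service)
round(res$expected, 2)   # expected frequencies under H0
res$statistic            # calculated chi-square
res$parameter            # degrees of freedom, (3-1)(3-1)

# Tabulated critical value at the 0.05 level with 4 degrees of freedom
critical <- qchisq(0.95, df = 4)
unname(res$statistic) > critical   # TRUE means reject H0
```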
Types of Chi-square Test
EXAMPLE

Solution

Service quality is dependent on the salaries of the people waiting.
Types of Chi-square Test
EXAMPLE IN R

Problem statement

To perform this test in R, let’s consider a table that is the result of a survey conducted among students about their smoking habits.

This table has:
• The "Smoke" variable, which records the smoking habits of students (Allowed values: "Heavy," "Regul," "Occas," and "Never")
• The "Exer" variable, which records the exercise levels of students (Allowed values: "Freq," "Some," and "None")

Assuming .05 as the significance level, test the hypothesis whether the smoking habits of students are independent of their exercise levels or not.
Types of Chi-square Test
EXAMPLE IN R

Calculation in R

Let’s build the contingency table in R:

library(MASS) # load the MASS package
head(survey)
tbl = table(survey$Smoke, survey$Exer)
tbl
        Freq None Some
  Heavy    7    1    3
  Never   87   18   84
  Occas   12    3    4
  Regul    9    1    7

Let’s use the chisq.test function for the contingency table and find the value of p (calculated probability).

chisq.test(tbl)

Output:
data: table(survey$Smoke, survey$Exer)
X-squared = 5.4885, df = 6, p-value = 0.4828
Types of Chi-square Test
EXAMPLE IN R

Solution

As p > significance level, H0 is not rejected. There is not enough evidence to conclude that the smoking habits of students depend on their exercise levels.
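
The sketch below rebuilds the contingency table from the counts printed in the example, so it runs without the MASS package, and then checks the expected counts. Several of them fall below 5, which is why R warns that the chi-square approximation may be inaccurate for this table. The warning is suppressed here only to keep the output clean.

```r
# Counts copied from the printed contingency table
tbl <- matrix(
  c( 7,  1,  3,
    87, 18, 84,
    12,  3,  4,
     9,  1,  7),
  nrow = 4, byrow = TRUE,
  dimnames = list(Smoke = c("Heavy", "Never", "Occas", "Regul"),
                  Exer  = c("Freq", "None", "Some"))
)

res <- suppressWarnings(chisq.test(tbl))
res$statistic
res$parameter
res$p.value

# Cells whose expected count is below 5
small_cells <- sum(res$expected < 5)
small_cells
```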
Hypothesis Test around Mean, Variance, and Proportion

Both parametric and non-parametric hypothesis tests are used to check whether the mean, variance, or proportion of a population matches a pre-determined value.

Let’s discuss them in detail.

Topic 4—Hypothesis Tests about Population Means
Hypothesis Tests about Population Means

Hypothesis tests about population means involve testing the hypothesis that
compares the population mean of interest with a specified value.
Hypothesis Tests about Population Means
ASSUMPTION

X1, X2, …, Xn is a sample of size n from a normal population with mean μ and variance σ². The sample mean $\bar{X}$ is distributed normally with mean μ and variance σ²/n, that is, $\bar{X}$ ~ N(μ, σ²/n).
If n is large, $\bar{X}$ is approximately normally distributed in the same way, even if the sample is from a non-normal population. Therefore, for large samples, the standard normal variable corresponding to $\bar{X}$ is Z (as calculated in the Z-test).
Hypothesis Tests about Population Means
WHEN POPULATION VARIANCE IS KNOWN

Consider a large random sample of size n, with a sample mean $\bar{X}$.

Test the hypothesis that the sample has been drawn from a population whose mean μ equals a specified value μ0, that is:

• H0 : μ = μ0
• H1 : μ ≠ μ0, or
• H1 : μ > μ0, or
• H1 : μ < μ0

Under the null hypothesis, Z = ($\bar{X}$ – μ0)/S.E.($\bar{X}$), where S.E.($\bar{X}$) = σ/√n, follows the standard normal distribution approximately.

When population variance is known, Z test is used.
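
Each of the three alternative hypotheses corresponds to a different tail of the standard normal distribution. The sketch below computes the matching p-values with pnorm, using hypothetical summary numbers (x_bar, mu0, sigma, and n are assumptions made for illustration).

```r
# Hypothetical summary numbers
x_bar <- 75     # sample mean
mu0   <- 72     # value specified under H0
sigma <- 15.2   # known population standard deviation
n     <- 36     # sample size

z <- (x_bar - mu0) / (sigma / sqrt(n))

p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)  # H1: mu != mu0
p_greater   <- pnorm(z, lower.tail = FALSE)           # H1: mu >  mu0
p_less      <- pnorm(z)                               # H1: mu <  mu0

c(two_sided = p_two_sided, greater = p_greater, less = p_less)
```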

Hypothesis Tests about Population Means
WHEN POPULATION VARIANCE IS UNKNOWN

Consider the following hypothesis formation:

• H0 : μ = μ0
• H1 : μ ≠ μ0

If μ0 falls in the confidence interval, the test result is “failing to reject the null hypothesis”; if
not, the result is “reject the null hypothesis.”

When population variance is unknown, T test is used.
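
The confidence-interval rule can be checked against the p-value from t.test. In this sketch the sample x and the value mu0 are hypothetical; the point is that μ0 lies inside the 95% confidence interval exactly when the two-sided test does not reject H0 at the .05 level.

```r
# Hypothetical sample, for illustration only
x   <- c(68, 74, 71, 77, 80, 69, 73, 75)
mu0 <- 72

res <- t.test(x, mu = mu0, conf.level = 0.95)
ci  <- res$conf.int

inside <- mu0 >= ci[1] && mu0 <= ci[2]   # does mu0 fall in the interval?
reject <- res$p.value < 0.05             # two-sided test at the .05 level

# The two decisions always agree
inside == !reject
```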

Topic 5—Hypothesis Tests about Population Variance
Hypothesis Tests about Population Variance

A hypothesis test about population variance compares the population variance with a specified value. Variance is the expected squared deviation of a random variable from its mean; it measures how far a set of (random) numbers is spread out from their average value.
Hypothesis Tests about Population Variance
FORMULA

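Consider the case where data consists of a simple random sample of size n drawn from a normally distributed population. The test statistic for testing hypotheses about a single population variance is calculated as:

$$\chi^2 = \frac{(n - 1)\, s^2}{\sigma_0^2}$$

Where s² is the sample variance and σ0² is the hypothesized population variance. Under H0, the statistic follows a chi-square distribution with n-1 degrees of freedom.

Chi-square test is used in hypothesis tests of population variance.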
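
As a hedged sketch, the standard chi-square statistic for a single variance is (n - 1) times the sample variance divided by the hypothesized variance, with n - 1 degrees of freedom. Base R has no dedicated one-sample variance test, so the snippet computes it by hand; the data x and the value sigma0_sq are hypothetical.

```r
# Hypothetical sample and hypothesized variance
x         <- c(12.1, 9.8, 11.4, 10.2, 13.0, 8.9, 10.7, 11.9)
sigma0_sq <- 4
n         <- length(x)

# Test statistic: (n - 1) * s^2 / sigma0^2
chi_sq <- (n - 1) * var(x) / sigma0_sq

# Upper-tail p-value for H1: variance > sigma0_sq
p_upper <- pchisq(chi_sq, df = n - 1, lower.tail = FALSE)
c(statistic = chi_sq, df = n - 1, p_value = p_upper)
```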

Topic 6—Hypothesis Tests about Population Proportions
Hypothesis Tests about Population Proportions

A population proportion is the ratio of the number of values in a subset S to the number of values in a set R. Hypothesis tests about population proportions compare this proportion with a specified value.
Hypothesis Tests about Population Proportions
FORMULA

Consider a random sample of the size n and the proportion of members with a certain attribute p.

You need to test the hypothesis that the proportion P in the population has a specified value P0,
that is:

• H0 : P = P0
• H1 : P ≠ P0
• H1 : P > P0
• H1 : P < P0

For a large sample, Z = (p - P0)/S.E.(p) ~ N(0,1) (under H0)

Where,
p = X/n = Number of successes in sample/Sample size
P0 = Hypothesized proportion of successes in the population
S.E.(p) = √(P0(1 - P0)/n) = Standard error of p under H0
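
The following sketch works through the Z statistic for a proportion with hypothetical numbers (58 successes in 100 trials, tested against P0 = 0.5). It takes the standard error under H0 as sqrt(P0 * (1 - P0) / n) and compares the result with prop.test from base R. Without continuity correction, prop.test reports the square of this Z value as its X-squared statistic.

```r
# Hypothetical data
x  <- 58    # number of successes in the sample
n  <- 100   # sample size
P0 <- 0.5   # hypothesized population proportion

p_hat <- x / n
se    <- sqrt(P0 * (1 - P0) / n)   # standard error under H0
z     <- (p_hat - P0) / se

p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Cross-check with base R (no continuity correction)
pt_res <- prop.test(x, n, p = P0, correct = FALSE)
all.equal(unname(pt_res$statistic), z^2)
all.equal(pt_res$p.value, p_two_sided)
```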
Key Takeaways

A hypothesis test is a formal statistical procedure for deciding whether sample data provide enough evidence to reject a stated hypothesis.

The Z-test is performed in cases where the test statistic is z and σ is known.

The T-test is performed in cases where the test statistic is t and σ is unknown.

The degree of freedom is the number of independent variates that make up the statistic.

The Chi-Square Test considers the square of a standard normal variate.

The ANOVA test is used for hypothesis tests that compare the averages of two or more groups.

Both parametric and non-parametric tests are used to check whether the mean, variance, or proportion of a population matches a pre-determined value.
