F, T Test
Hypothesis Testing
Q. Statistical Abstracts (117th edition) reports that the average annual expenditure for health care by individuals 25 to 34 years old is $1,596. A random sample of 34 people in Detroit between the ages of 25 and 34 had a sample mean expenditure of $1,425 with a sample standard deviation of $425. Test to see if the mean expenditure for health care for people in Detroit between the ages of 25 and 34 is different from the national average. Use a 1% significance level.
Ans. To test H0: μ = 1596 vs H1: μ ≠ 1596 (two-tailed test)
μ0 = 1596
x̄ = 1425
s = 425
n = 34
We use the t distribution because the population standard deviation σ is unknown. The test statistic is
t = ( x̄ - μ0 ) / ( s / sqrt( n ) ) = ( 1425 - 1596 ) / ( 425 / sqrt( 34 ) ) = -2.35
which follows a t distribution with n - 1 = 33 degrees of freedom.
P-value = 2 * P( t < -2.35 ) = 0.0251. Since the P-value of 0.0251 > 0.01, we do not reject H0. The result is not statistically significant.
Conclusion
At the 1% level of significance, the data do not provide enough evidence to reject the null hypothesis. Thus we cannot conclude that the mean expenditure for health care for people in Detroit between the ages of 25 and 34 differs from the national average.
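The one-sample t test above can be checked numerically. The sketch below uses only the Python standard library; because that library has no t distribution, the helper names t_pdf and t_cdf are illustrative and the cumulative probability is approximated with Simpson's rule rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

mu0, xbar, s, n = 1596, 1425, 425, 34
t_stat = (xbar - mu0) / (s / math.sqrt(n))      # test statistic
p_value = 2 * t_cdf(-abs(t_stat), n - 1)        # two-tailed P-value
print(round(t_stat, 2), round(p_value, 4))
```

The P-value should land close to the 0.0251 quoted in the answer, which is above the 0.01 significance level.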
Q. The MBA department is concerned that dual degree students may be receiving lower grades than the regular MBA students. Two independent random samples have been selected: 100 observations from population 1 (dual degree students) and 100 from population 2 (MBA students). The sample means obtained are x̄1 = 84 and x̄2 = 87. It is known from previous studies that the population variances are 4.0 and 5.0 respectively. Using a level of significance of 0.10, is there evidence that the dual degree students are receiving lower grades? Fully explain your answer.
Ans. To test H0: μ1 = μ2 vs H1: μ1 < μ2 (one-tailed test)
Because the population variances are known, the test statistic
z = ( x̄1 - x̄2 ) / sqrt( σ1²/n1 + σ2²/n2 )
follows the standard normal distribution N(0,1) under H0. Putting x̄1 = 84, x̄2 = 87, n1 = 100, n2 = 100, σ1² = 4 and σ2² = 5 we get
z = ( 84 - 87 ) / sqrt( 4/100 + 5/100 ) = -3 / 0.3 = -10
P-value = P( z < -10 ) ≈ 0. Since the P-value is less than 0.10, we reject H0. The result is statistically significant.
Conclusion
At the 10% level of significance, the data provide enough evidence to reject the null hypothesis. Thus we conclude at the 0.10 level of significance that the dual degree students are receiving lower grades.
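As a quick check of the two-sample z test with known variances, the following sketch recomputes the statistic and its lower-tail P-value. The helper normal_cdf is an illustrative name; it is written with math.erfc so that a tail as extreme as z = -10 does not simply round to zero.

```python
import math

def normal_cdf(z):
    """Standard normal P(Z <= z), via erfc so far tails keep their precision."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

xbar1, xbar2 = 84, 87        # sample means (dual degree, regular MBA)
var1, var2 = 4.0, 5.0        # known population variances
n1, n2 = 100, 100

z = (xbar1 - xbar2) / math.sqrt(var1 / n1 + var2 / n2)
p_value = normal_cdf(z)      # lower tail, matching H1: mu1 < mu2
print(round(z, 2), p_value)
```

The P-value is vanishingly small rather than exactly zero, which is why the answer reports it as approximately 0.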
Q. The board of real estate developers claims that 56% of all voters will vote for a bond issue to construct a massive new water project. A random sample of 80 voters was taken and 51 said that they would vote for the new water project. (Note: carry out the calculations for this problem to 3 decimal places.) Test to see if these data indicate that more than 56% of all voters favor the project. Use a 5% significance level.
Ans. To test H0: P = 0.56 vs H1: P > 0.56 (one-tailed test)
The test statistic
z = ( p - P ) / sqrt( P * Q / n )
follows N(0,1) under H0, where p = 51/80 = 0.6375 (0.638 to 3 decimal places) is the sample proportion, P = 0.56, n = 80 and
Q = 1 - P = 0.44
z = ( 0.6375 - 0.56 ) / sqrt( 0.56 * 0.44 / 80 ) = 0.0775 / 0.0555 ≈ 1.40
P-value = P( z > 1.40 ) ≈ 0.081
Since the P-value is greater than α = 0.05, we fail to reject H0. The result is not statistically significant.
Conclusion
At the 5% level of significance, the data do not provide enough evidence to reject the null hypothesis. Thus we cannot conclude that more than 56% of all voters favor the project.
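The one-proportion z test can be reproduced in a few lines. This is a minimal sketch using statistics.NormalDist from the standard library; the variable names are illustrative, and the standard error is computed under H0 (using the claimed 56%, not the sample proportion).

```python
import math
from statistics import NormalDist

p0 = 0.56                     # proportion claimed under H0
n, successes = 80, 51

p_hat = successes / n                         # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)             # standard error under H0
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)             # upper tail for H1: P > 0.56
print(round(p_hat, 3), round(z, 3), round(p_value, 3))
```

Because the P-value exceeds 0.05, the sketch agrees with the decision not to reject H0.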
Q. In a particular market there are three commercial television stations, each with its own evening news program from 6:00 to 6:30 P.M. According to a report in this morning's local newspaper, a random sample of 150 viewers last night revealed 53 watched the news on WNAE (channel 5), 64 watched on WRRN (channel 11), and 33 on WSPD (channel 13). At the .05 significance level, is there a difference in the proportion of viewers watching the three channels?
Ans. We use the chi-square goodness-of-fit test to test whether there is a difference in the proportion of viewers watching the three channels.
Null hypothesis: There is no difference in the proportion of viewers watching the three channels, so each channel has an expected count of 150/3 = 50.
Channel       Observed (O)   Expected (E)   (O - E)²   (O - E)²/E
Channel 5         53             50             9         0.18
Channel 11        64             50           196         3.92
Channel 13        33             50           289         5.78
The test statistic χ² = Σ (O - E)²/E follows a chi-square distribution with n - 1 = 2 d.f., where n = 3 is the number of categories.
χ² = 0.18 + 3.92 + 5.78 = 9.88
P-value = P( χ² > 9.88 ) = 0.007155, which is significant at the 0.05 level of significance.
Conclusion
We reject the null hypothesis at the 0.05 level of significance, as 0.007155 < 0.05. Thus there is a difference in the proportion of viewers watching the three channels.
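The goodness-of-fit calculation can be sketched as follows. The dictionary layout is an illustrative choice, and the P-value line relies on a shortcut that holds only for exactly 2 degrees of freedom, where the chi-square upper tail equals exp(-x/2).

```python
import math

observed = {"Channel 5": 53, "Channel 11": 64, "Channel 13": 33}
total = sum(observed.values())
expected = total / len(observed)          # equal shares under H0

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

# Shortcut valid only for df == 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 2), df, round(p_value, 6))
```

For other degrees of freedom a general chi-square tail function would be needed instead of the exponential shortcut.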
Confidence Interval
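Q. Not all seeds germinate. In a random sample of 115 sunflower seeds, 87 of them germinated. Let p be the proportion of seeds that germinate. Find a 99% confidence interval for p.
Ans. The sample proportion is p̂ = 87/115 = 0.7565. The 99% confidence interval for p is
p̂ ± z * sqrt( p̂ * ( 1 - p̂ ) / n ) = 0.7565 ± 2.576 * sqrt( 0.7565 * 0.2435 / 115 ) = 0.7565 ± 0.1031
that is, approximately (0.653, 0.860).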
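A confidence interval of this kind can be sketched with the normal approximation. The function name proportion_ci is illustrative, and the interval shown is the simple large-sample (Wald) form, which is an assumption about the method intended by the question.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(87, 115, 0.99)
print(round(low, 3), round(high, 3))
```

Changing the confidence argument to 0.95 or 0.90 narrows the interval, since a smaller critical value z is used.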
Sample Size
Q. A preliminary study showed that out of 60 associates, 12 have used the gift shop catalog. What sample size does the director need in order to say with 95% confidence that the sample estimate is within 2% of the population proportion? Ans.
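n = z² * p * q / E²
where z = 1.96 is the critical value for 95% confidence, p = 12/60 = 0.2 is the preliminary estimate of the proportion, q = 1 - p = 0.8, and E = 0.02 is the required margin of error.
n = ( 1.96² * 0.2 * 0.8 ) / 0.02² = 0.6147 / 0.0004 = 1536.6
Rounding up, the director needs a sample of 1537 associates.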
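The sample-size requirement can be computed directly from the margin-of-error formula n = z² p q / E². The sketch below assumes the usual normal-approximation formula for a proportion; the function name sample_size_for_proportion is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_est, margin, confidence):
    """Smallest whole n for which z * sqrt(p*q/n) does not exceed the margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_est * (1 - p_est) / margin ** 2)

n_needed = sample_size_for_proportion(12 / 60, 0.02, 0.95)
n_worst_case = sample_size_for_proportion(0.5, 0.02, 0.95)   # no prior estimate
print(n_needed, n_worst_case)
```

The second call shows the conservative choice p = 0.5, which gives the largest sample size for a given margin when no preliminary study is available.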
Probability
Q. A very large logging operation has serious problems keeping its skidders operating properly. The equipment fails at the rate of 3 breakdowns every 48 hours. Assume that x is the time between breakdowns and is exponentially distributed. What is the probability of two or fewer breakdowns in the next 48-hour period?
Ans. Let y be the number of breakdowns in the next 48 hours. Because the time between breakdowns is exponentially distributed, y follows a Poisson distribution with mean λ = 3:
f(y) = e^(-λ) * λ^y / y!
Putting λ = 3 we get
y         0         1         2       P(y<=2)
f(y)    0.0498    0.1494    0.2240    0.4232
Thus the probability of two or fewer breakdowns in the next 48-hour period is P(y<=2) = 0.423.
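The Poisson probabilities in the table can be reproduced with a short sketch. The function name poisson_pmf is illustrative; the rate of 3 breakdowns per 48 hours comes from the question.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3                                        # expected breakdowns per 48 hours
probs = [poisson_pmf(k, lam) for k in range(3)]
p_at_most_2 = sum(probs)
print([round(p, 4) for p in probs], round(p_at_most_2, 4))
```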
Q. Sharon makes telephone appeals for donations of used clothing and household goods for charitable organizations. She knows from experience that about 30% of the calls result in donations. How many calls must she make to be 90% sure of getting at least two donations?
Ans. Let X be the number of donations. X follows Binomial(n, 0.3). We have to find the smallest n such that P(X ≥ 2) ≥ 0.90, that is, 1 - P(X < 2) ≥ 0.90, or P(X < 2) ≤ 0.10. Using binomial tables or Excel functions we find that for n = 11, P(X < 2) = 0.113, and for n = 12, P(X < 2) = 0.085. Thus she must make 12 calls to be 90% sure of getting at least two donations.
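Instead of reading binomial tables, the smallest n can be found by a simple search. The sketch below is one way to do it; the function name prob_fewer_than_two is illustrative.

```python
def prob_fewer_than_two(n, p):
    """P(X < 2) = P(X = 0) + P(X = 1) for X ~ Binomial(n, p)."""
    q = 1 - p
    return q ** n + n * p * q ** (n - 1)

p_donate, target = 0.30, 0.90
n_calls = 2                                    # at least two calls are needed
while 1 - prob_fewer_than_two(n_calls, p_donate) < target:
    n_calls += 1
print(n_calls, round(prob_fewer_than_two(n_calls, p_donate), 3))
```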
Q. Susan is taking an exam. There are three multiple choice questions where she has no idea which answer is correct. There are four choices for each multiple choice question and only one is correct. What are the chances that Susan would guess all three questions right?
Ans. Let X be the number of correct answers. X follows Binomial(n, p), where n = 3 and p = 0.25.
P(X = 3) = (0.25)³ = 0.015625
Q. There are 5 candidates in an election. How many different ways can a committee of 3 be selected from among them?
Ans. 3 candidates can be selected out of 5 in C(5, 3) = 5! / ( 3! * 2! ) = ( 5 * 4 ) / 2 = 10 ways.
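Both counting answers above can be verified with math.comb from the standard library (Python 3.8 or later). The helper binomial_pmf is an illustrative name.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

all_three_right = binomial_pmf(3, 3, 0.25)     # Susan guesses every question
committees = math.comb(5, 3)                   # committees of 3 from 5 candidates
print(all_three_right, committees)
```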
Student's t Distribution
The t distribution (also known as Student's t distribution) is a probability distribution that is used to estimate population parameters when the sample size is small and/or when the population variance is unknown.
Degrees of Freedom
There are actually many different t distributions. The particular form of the t distribution is determined by its degrees of freedom. The degrees of freedom refers to the number of independent observations in a set of data. When estimating a mean score or a proportion from a single sample, the number of independent observations is equal to the sample size minus one. Hence, the distribution of the t statistic from samples of size 8 would be described by a t distribution having 8 - 1 or 7 degrees of freedom. Similarly, a t distribution having 15 degrees of freedom would be used with a sample of size 16. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.
The t distribution has the following properties: (1) The mean of the distribution is equal to 0. (2) The variance is equal to v / ( v - 2 ), where v is the degrees of freedom (see last section) and v > 2. (3) The variance is always greater than 1, although it is close to 1 when there are many degrees of freedom. With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.
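The t distribution can be used when the sampling distribution of the statistic is approximately normal, which is the case if any of the following conditions apply: (1) The population distribution is normal. (2) The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less. (3) The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40. (4) The sample size is greater than 40, without outliers.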
The t distribution should not be used with small samples from populations that are not approximately normal.
A sample mean can be transformed into a t score, using the equation
t = [ x - μ ] / [ s / sqrt( n ) ]
where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, n is the sample size, and degrees of freedom are equal to n - 1. The t score produced by this transformation can be associated with a unique cumulative probability. This cumulative probability represents the likelihood of finding a sample mean less than or equal to x, given a random sample of size n. The easiest way to find the probability associated with a particular t score is to use the T Distribution Calculator, a free tool provided by Stat Trek.
T Distribution Calculator
The T Distribution Calculator solves common statistics problems, based on the t distribution. The calculator computes cumulative probabilities, based on simple inputs. Clear instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found under the Stat Tables tab, which appears in the header of every Stat Trek web page.
Problem 1 A CEO claims that the company's light bulbs last an average of 300 days. Fifteen bulbs are randomly selected for testing, and the sampled bulbs have a standard deviation of 50 days. If the CEO's claim were true, what is the probability that 15 randomly selected bulbs would have an average life of no more than 290 days?
Note: There are two ways to solve this problem, using the T Distribution Calculator. Both approaches are presented below. Solution A is the traditional approach. It requires you to compute the t score, based on data presented in the problem description. Then, you use the T Distribution Calculator to find the probability. Solution B is easier. You simply enter the problem data into the T Distribution Calculator. The calculator computes a t score "behind the scenes", and displays the probability. Both approaches come up with exactly the same answer.
Solution A: The first thing we need to do is compute the t score, based on the following equation:
t = [ x - μ ] / [ s / sqrt( n ) ]
t = ( 290 - 300 ) / [ 50 / sqrt( 15 ) ] = -10 / 12.909945 = -0.7745966
where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size. Now, we are ready to use the T Distribution Calculator. Since we know the t score, we select "T score" from the Random Variable dropdown box. Then, we enter the following data:
The degrees of freedom are equal to 15 - 1 = 14. The t score is equal to -0.7745966.
The calculator displays the cumulative probability: 0.226. Hence, if the true bulb life were 300 days, there is a 22.6% chance that the average bulb life for 15 randomly selected bulbs would be less than or equal to 290 days.
Solution B: This time, we will work directly with the raw data from the problem. We will not compute the t score; the T Distribution Calculator will do that work for us. Since we will work with the raw data, we select "Sample mean" from the Random Variable dropdown box. Then, we enter the following data:
The degrees of freedom are equal to 15 - 1 = 14. Assuming the CEO's claim is true, the population mean equals 300. The sample mean equals 290. The standard deviation of the sample is 50.
The calculator displays the cumulative probability: 0.226. Hence, there is a 22.6% chance that the average sampled light bulb will burn out within 290 days.
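The cumulative probability reported by the calculator can also be approximated by simulation, which makes its meaning concrete: draw many samples of 15 from a normal population, compute each sample's t score, and count how often it falls at or below -0.7746. This is a rough sketch, not how the calculator works; the function name simulate_t_cdf, the seed, the number of trials, and the population standard deviation of 50 used to generate data are all assumptions (the distribution of the t score does not depend on that last choice).

```python
import math
import random

def simulate_t_cdf(t_cut, n, trials=20000, seed=1):
    """Estimate P(T <= t_cut) from simulated normal samples of size n."""
    rng = random.Random(seed)
    mu, sigma = 300.0, 50.0          # assumed population used to generate data
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = (mean - mu) / (s / math.sqrt(n))
        hits += t <= t_cut
    return hits / trials

t_score = (290 - 300) / (50 / math.sqrt(15))
estimate = simulate_t_cdf(t_score, n=15)
print(round(t_score, 4), round(estimate, 3))
```

The estimate should fall near 0.226, with some simulation noise that shrinks as the number of trials grows.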
Problem 2 Suppose scores on an IQ test are normally distributed, with a population mean of 100. Suppose 20 people are randomly selected and tested. The standard deviation in the sample group is 15. What is the probability that the average test score in the sample group will be at most 110?
Solution: To solve this problem, we will work directly with the raw data from the problem. We will not compute the t score; the T Distribution Calculator will do that work for us. Since we will work with the raw data, we select "Sample mean" from the Random Variable dropdown box. Then, we enter the following data:
The degrees of freedom are equal to 20 - 1 = 19. The population mean equals 100. The sample mean equals 110. The standard deviation of the sample is 15.
We enter these values into the T Distribution Calculator. The calculator displays the cumulative probability: 0.996. Hence, there is a 99.6% chance that the sample average will be no greater than 110.
F Distribution
The F distribution is the probability distribution associated with the f statistic. In this lesson, we show how to compute an f statistic and how to find probabilities associated with specific f statistic values.
The f Statistic
The f statistic, also known as an f value, is a random variable that has an F distribution. (We discuss the F distribution in the next section.) Here are the steps required to compute an f statistic:
Select a random sample of size n1 from a normal population, having a standard deviation equal to σ1. Select an independent random sample of size n2 from a normal population, having a standard deviation equal to σ2. The f statistic is the ratio of s1²/σ1² and s2²/σ2².
The f statistic can be written in any of the following equivalent forms:
f = [ s1²/σ1² ] / [ s2²/σ2² ]
f = [ s1² * σ2² ] / [ s2² * σ1² ]
f = [ χ1² / v1 ] / [ χ2² / v2 ]
f = [ χ1² * v2 ] / [ χ2² * v1 ]
where σ1 is the standard deviation of population 1, s1 is the standard deviation of the sample drawn from population 1, σ2 is the standard deviation of population 2, s2 is the standard deviation of the sample drawn from population 2, χ1² is the chi-square statistic for the sample drawn from population 1, v1 is the degrees of freedom for χ1², χ2² is the chi-square statistic for the sample drawn from population 2, and v2 is the degrees of freedom for χ2². Note that degrees of freedom v1 = n1 - 1, and degrees of freedom v2 = n2 - 1.
The F Distribution
The distribution of all possible values of the f statistic is called an F distribution, with v1 = n1 - 1 and v2 = n2 - 1 degrees of freedom. The curve of the F distribution depends on the degrees of freedom, v1 and v2. When describing an F distribution, the number of degrees of freedom associated with the standard deviation in the numerator of the f statistic is always stated first. Thus, f(5, 9) would refer to an F distribution with v1 = 5 and v2 = 9 degrees of freedom; whereas f(9, 5) would refer to an F distribution with v1 = 9 and v2 = 5 degrees of freedom. Note that the curve represented by f(5, 9) would differ from the curve represented by f(9, 5). The F distribution has the following properties:
The mean of the distribution is equal to v2 / ( v2 - 2 ) for v2 > 2. The variance is equal to [ 2 * v2² * ( v1 + v2 - 2 ) ] / [ v1 * ( v2 - 2 )² * ( v2 - 4 ) ] for v2 > 4.
F Distribution Calculator
The F Distribution Calculator solves common statistics problems, based on the F distribution. The calculator computes cumulative probabilities, based on simple inputs. Clear instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found under the Stat Tables tab, which appears in the header of every Stat Trek web page.
Sample Problems
Example 1 Suppose you randomly select 7 women from a population of women, and 12 men from a population of men. The table below shows the standard deviation in each sample and in each population.
Population   Population standard deviation   Sample standard deviation
Women                     30                            35
Men                       50                            45
Compute the f statistic.
Solution: The f statistic can be computed from the population and sample standard deviations, using the following equation:
f = [ s1²/σ1² ] / [ s2²/σ2² ]
where σ1 is the standard deviation of population 1, s1 is the standard deviation of the sample drawn from population 1, σ2 is the standard deviation of population 2, and s2 is the standard deviation of the sample drawn from population 2. As you can see from the equation, there are actually two ways to compute an f statistic from these data. If the women's data appears in the numerator, we can calculate an f statistic as follows:
f = ( 35² / 30² ) / ( 45² / 50² ) = (1225 / 900) / (2025 / 2500) = 1.361 / 0.81 = 1.68
For this calculation, the numerator degrees of freedom v1 are 7 - 1 or 6; and the denominator degrees of freedom v2 are 12 - 1 or 11. On the other hand, if the men's data appears in the numerator, we can calculate an f statistic as follows:
f = ( 45² / 50² ) / ( 35² / 30² ) = (2025 / 2500) / (1225 / 900) = 0.81 / 1.361 = 0.595
For this calculation, the numerator degrees of freedom v1 are 12 - 1 or 11; and the denominator degrees of freedom v2 are 7 - 1 or 6. When you are trying to find the cumulative probability associated with an f statistic, you need to know v1 and v2. This point is illustrated in the next example.
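The two f statistics from Example 1 can be recomputed with a small helper. The function name f_statistic and the dictionary layout are illustrative.

```python
def f_statistic(s_num, sigma_num, s_den, sigma_den):
    """Ratio of s^2 / sigma^2 for the numerator sample to that of the denominator sample."""
    return (s_num ** 2 / sigma_num ** 2) / (s_den ** 2 / sigma_den ** 2)

women = {"n": 7, "sigma": 30, "s": 35}
men = {"n": 12, "sigma": 50, "s": 45}

f_women_over_men = f_statistic(women["s"], women["sigma"], men["s"], men["sigma"])
f_men_over_women = f_statistic(men["s"], men["sigma"], women["s"], women["sigma"])
df_women, df_men = women["n"] - 1, men["n"] - 1
print(round(f_women_over_men, 2), round(f_men_over_women, 3), df_women, df_men)
```

The two values are reciprocals of each other, which is why the order of the degrees of freedom has to be swapped along with the samples.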
Example 2 Find the cumulative probability associated with each of the f statistics from Example 1, above. Solution: To solve this problem, we need to find the degrees of freedom for each sample. Then, we will use the F Distribution Calculator to find the probabilities.
The degrees of freedom for the sample of women is equal to n - 1 = 7 - 1 = 6. The degrees of freedom for the sample of men is equal to n - 1 = 12 - 1 = 11.
Therefore, when the women's data appear in the numerator, the numerator degrees of freedom v1 is equal to 6; and the denominator degrees of freedom v2 is equal to 11. And, based on the computations shown in the previous example, the f statistic is equal to 1.68. We plug these values into the F Distribution Calculator and find that the cumulative probability is 0.78. On the other hand, when the men's data appear in the numerator, the numerator degrees of freedom v1 is equal to 11; and the denominator degrees of freedom v2 is equal to 6. And, based on the computations shown in the previous example, the f statistic is equal to 0.595. We plug these values into the F Distribution Calculator and find that the cumulative probability is 0.22.
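Without the online calculator, the cumulative probabilities in Example 2 can be approximated by integrating the F density numerically. This is a sketch under stated assumptions: the helper names f_pdf and f_cdf are illustrative, Simpson's rule stands in for whatever method the calculator uses, and the code assumes v1 is at least 2 so that the density is finite at zero.

```python
import math

def f_pdf(x, v1, v2):
    """Density of the F distribution with (v1, v2) degrees of freedom."""
    log_c = (math.lgamma((v1 + v2) / 2) - math.lgamma(v1 / 2) - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1 / v2))
    return math.exp(log_c) * x ** (v1 / 2 - 1) * (1 + v1 * x / v2) ** (-(v1 + v2) / 2)

def f_cdf(f, v1, v2, steps=4000):
    """Approximate P(F <= f) with Simpson's rule on [0, f]; assumes v1 >= 2."""
    h = f / steps
    total = f_pdf(0.0, v1, v2) + f_pdf(f, v1, v2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, v1, v2)
    return total * h / 3

p_women = f_cdf(1.68, 6, 11)     # women's data in the numerator
p_men = f_cdf(0.595, 11, 6)      # men's data in the numerator
print(round(p_women, 2), round(p_men, 2))
```

The two probabilities should come out close to the 0.78 and 0.22 quoted above, and they sum to roughly 1 because the two f statistics are reciprocals with swapped degrees of freedom.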
Chi-Square Distribution
The distribution of the chi-square statistic is called the chi-square distribution. In this lesson, we learn to compute the chi-square statistic and find the probability associated with the statistic.
Suppose we select a random sample of size n from a normal population having a standard deviation equal to σ, and find that the standard deviation in the sample is equal to s. Given these data, we can define the chi-square statistic using the following equation:
χ² = [ ( n - 1 ) * s² ] / σ²
If we repeated this experiment an infinite number of times, we could obtain a sampling distribution for the chi-square statistic. The chi-square distribution is defined by the following probability density function:
Y = Y0 * ( χ² )^( v/2 - 1 ) * e^( -χ² / 2 )
where Y0 is a constant that depends on the number of degrees of freedom, χ² is the chi-square statistic, v = n - 1 is the number of degrees of freedom, and e is a constant equal to the base of the natural logarithm system (approximately 2.71828). Y0 is defined, so that the area under the chi-square curve is equal to one. In the accompanying figure, the red curve shows the distribution of chi-square values computed from all possible samples of size 3, where degrees of freedom is n - 1 = 3 - 1 = 2. Similarly, the green curve shows the distribution for samples of size 5 (degrees of freedom equal to 4); and the blue curve, for samples of size 11 (degrees of freedom equal to 10).
The chi-square distribution has the following properties: (1) The mean of the distribution is equal to the number of degrees of freedom: μ = v. (2) The variance is equal to two times the number of degrees of freedom: σ² = 2 * v. (3) When the degrees of freedom are greater than or equal to 2, the maximum value for Y occurs when χ² = v - 2. (4) As the degrees of freedom increase, the chi-square curve approaches a normal distribution.
Fortunately, we don't have to compute the area under the curve to find the probability. The easiest way to find the cumulative probability associated with a particular chi-square statistic is to use the Chi-Square Distribution Calculator, a free tool provided by Stat Trek.
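Problem 1 A manufacturing department runs a quality control test on 7 randomly selected batteries. The standard deviation of the population is 4 minutes, and the standard deviation in the sample is 6 minutes. What is the chi-square statistic for this test?
Solution: We know the following: The standard deviation of the population is 4 minutes. The standard deviation of the sample is 6 minutes. The number of sample observations is 7.
To compute the chi-square statistic, we plug these data in the chi-square equation, as shown below.
χ² = [ ( n - 1 ) * s² ] / σ² = [ ( 7 - 1 ) * 6² ] / 4² = 13.5
where χ² is the chi-square statistic, n is the sample size, s is the standard deviation of the sample, and σ is the standard deviation of the population.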
Problem 2 Let's revisit the problem presented above. The manufacturing department ran a quality control test, using 7 randomly selected batteries. In their test, the standard deviation was 6 minutes, which equated to a chi-square statistic of 13.5. Suppose they repeated the test with a new random sample of 7 batteries. What is the probability that the standard deviation in the new test would be greater than 6 minutes? Solution We know the following:
The sample size n is equal to 7. The degrees of freedom are equal to n - 1 = 7 - 1 = 6. The chi-square statistic is equal to 13.5 (see Problem 1 above).
Given the degrees of freedom, we can determine the cumulative probability that the chi-square statistic will fall between 0 and any positive value. To find the cumulative probability that a chi-square statistic falls between 0 and 13.5, we enter the degrees of freedom (6) and the chi-square statistic (13.5) into the Chi-Square Distribution Calculator. The calculator displays the cumulative probability: 0.96. This tells us that the probability that a standard deviation would be less than or equal to 6 minutes is 0.96. This means (by the subtraction rule) that the probability that the standard deviation would be greater than 6 minutes is 1 - 0.96 or 0.04.
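The battery example can be checked without the calculator. The sketch below computes the chi-square statistic and then its cumulative probability using a closed form that applies only when the degrees of freedom are even (as they are here, with 6); the function names are illustrative.

```python
import math

def chi_square_statistic(n, s, sigma):
    """(n - 1) * s^2 / sigma^2 for a sample of size n."""
    return (n - 1) * s ** 2 / sigma ** 2

def chi_square_cdf_even_df(x, df):
    """P(chi-square <= x) for even df, via 1 - exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only covers positive even degrees of freedom")
    half = x / 2
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1 - tail

chi2 = chi_square_statistic(n=7, s=6, sigma=4)
p_less = chi_square_cdf_even_df(chi2, df=6)
print(chi2, round(p_less, 2), round(1 - p_less, 2))
```

For odd degrees of freedom this shortcut does not apply, and a general incomplete-gamma routine or a table would be needed.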
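Chi-Square Test for Independence
The chi-square test for independence is applied to two categorical variables from a single population. For example, given survey data on voters, we could use a chi-square test for independence to determine whether gender is related to voting preference. The sample problem at the end of the lesson considers this example.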
The test procedure described in this lesson is appropriate when the following conditions are met: (1) The sampling method is simple random sampling. (2) Each population is at least 10 times as large as its respective sample. (3) The variables under study are each categorical. (4) If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.
Formulate an Analysis Plan
The analysis plan should specify the following elements. Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used. Test method. Use the chi-square test for independence to determine whether there is a significant relationship between two categorical variables.
Degrees of freedom. The degrees of freedom (DF) is equal to: DF = (r - 1) * (c - 1) where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula. Er,c = (nr * nc) / n where Er,c is the expected frequency count for level r of Variable A and level c of Variable B, nr is the total number of sample observations at level r of Variable A, nc is the total number of sample observations at level c of Variable B, and n is the total sample size.
Test statistic. The test statistic is a chi-square random variable (χ²) defined by the following equation. χ² = Σ [ (Or,c - Er,c)² / Er,c ] where Or,c is the observed frequency count at level r of Variable A and level c of Variable B, and Er,c is the expected frequency count at level r of Variable A and level c of Variable B.
P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
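Problem A simple random sample of 1000 voters was classified by gender and by voting preference. The observed counts are shown in the contingency table below.
                  Preference 1   Preference 2   Preference 3   Row total
Gender level 1        200            150             50           400
Gender level 2        250            300             50           600
Column total          450            450            100          1000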
Is there a gender gap? Do the men's voting preferences differ significantly from the women's preferences? Use a 0.05 level of significance. Solution The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis. H0: Gender and voting preferences are independent. Ha: Gender and voting preferences are not independent.
Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a chi-square test for independence.
Analyze sample data. Applying the chi-square test for independence to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic. Based on the chi-square statistic and the degrees of freedom, we determine the P-value.
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n
E1,1 = (400 * 450) / 1000 = 180000/1000 = 180
E1,2 = (400 * 450) / 1000 = 180000/1000 = 180
E1,3 = (400 * 100) / 1000 = 40000/1000 = 40
E2,1 = (600 * 450) / 1000 = 270000/1000 = 270
E2,2 = (600 * 450) / 1000 = 270000/1000 = 270
E2,3 = (600 * 100) / 1000 = 60000/1000 = 60
χ² = Σ [ (Or,c - Er,c)² / Er,c ]
χ² = (200 - 180)²/180 + (150 - 180)²/180 + (50 - 40)²/40 + (250 - 270)²/270 + (300 - 270)²/270 + (50 - 60)²/60
χ² = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60
χ² = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2
where DF is the degrees of freedom, r is the number of levels of gender, c is the number of levels of the voting preference, nr is the number of observations from level r of gender, nc is the number of observations from level c of voting preference, n is the number of observations in the sample, Er,c is the expected frequency count when gender is level r and voting preference is level c, and Or,c is the observed frequency count when gender is level r and voting preference is level c.
The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme than 16.2. We use the Chi-Square Distribution Calculator to find P(χ² > 16.2) = 0.0003.
Interpret results. Since the P-value (0.0003) is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.
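The whole test for independence can be sketched from the observed counts used in the solution. The list layout and variable names are illustrative, and the P-value line again uses the exp(-x/2) shortcut, which is valid here only because the table has exactly 2 degrees of freedom.

```python
import math

observed = [
    [200, 150, 50],    # first gender level, by voting preference
    [250, 300, 50],    # second gender level, by voting preference
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-chi2 / 2)      # upper tail; shortcut valid only for df == 2
print(expected, round(chi2, 1), df, round(p_value, 4))
```

Every expected count is at least 5, which is one of the conditions the lesson lists for using this test.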
Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, each population was more than 10 times larger than its respective sample, the variables under study were categorical, and the expected frequency count was at least 5 in each cell of the contingency table.