0% found this document useful (0 votes)
4 views18 pages

T Exercises

The document contains a series of exercises related to probability and statistics, covering topics such as probability density functions, expected values, cumulative distribution functions, and various distributions including exponential and uniform. It includes questions on specific scenarios, such as the life expectancy of lamps, waiting times, and random samples from distributions. Additionally, it addresses concepts like unbiased estimators, mean squared error, and confidence intervals.

Uploaded by

yoncakoymen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views18 pages

T Exercises

The document contains a series of exercises related to probability and statistics, covering topics such as probability density functions, expected values, cumulative distribution functions, and various distributions including exponential and uniform. It includes questions on specific scenarios, such as the life expectancy of lamps, waiting times, and random samples from distributions. Additionally, it addresses concepts like unbiased estimators, mean squared error, and confidence intervals.

Uploaded by

yoncakoymen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

T-EXERCISES

This file might be adapted during the course.

1. Is there an interval [a, b] such that f (x) = sin(x) for a ≤ x ≤ b is a probability density
function?

2. The edge X of a cube is a random variable with probability density function



2(3 − x) if 2 < x < 3,
fX (x) =
0 elsewhere.

a) What is the probability that the edge is longer than 2.5?


b) What is the expected value of the edge of the cube?
c) What is the expected value of the volume of the cube?
d) Let W be the volume of the cube. What is the cumulative distribution function of
W?

3. A random variable X denotes the time between two crashes at the stock exchange and
has the following density: fX (x) = (b − 1)/xb for 1 ≤ x < ∞, and fX (x) = 0 for x < 1.
(We have b > 1).

a) Check that fX is a probability density function.


b) Compute the expected value of X. Assume b > 2.
c) Consider b = 4 and suppose we have ten independent random variables X1 . . . , X10
with probability density function fX . Use the Central Limit Theorem to approxi-
mate the probability that the average (X1 + X2 + . . . + X10 )/10 is larger than 2.
You may use the fact that V (Xi ) = 3/4.
d) Is it possible to approximate this probability if b = 3? Why?

4. Correct or wrong? For the life time of a car the exponential distribution is a good
model.

5. Correct or wrong? The uniform distribution on the interval 0 ≤ x ≤ 1 has the lack of
memory property: P (X < t + t0 |t > t0 ) = P (X < t).

6. Correct or wrong? If X is exponentially distributed, then 2X is also exponentially


distributed.

7. Philips sells lamps and Philips claims that the life time is at least one year with a
probability of more than 0.95. Assume that the lamps do not wear out (but fail because
of incidents).

a) Show that the expected life time of such a lamp is at least −1/ ln(0.95) ' 19.49
year.
b) Now assume that the expected life time is 20 year. Now assume that a lamps burns
continuously and that it is immediately replaced when it fails. Is it sufficient to
have four lamps to guarantee that one has light for more than 3 years with a
probability of at least 0.95?

1
8. Consider a supermarket with pay desks. The probability that one has to wait longer
than 5 minutes before paying the bill at a pay desk equals 0.2. A man and a woman go
shopping and choose different pay desks at the super market. They arrive at the same
time at their pay desk.

a) What is the probability that the man and the woman both have to wait less than
5 minutes? Assume independency.

Assume that the waiting time is exponentially distributed with parameter λ.

b) Show that λ = 0.32 (in two digits).


c) The man is waiting two minutes. What is the (conditional) probability that the
(total) waiting time (including these two minutes) is more than 5 minutes?

9. You call a helpdesk. You hear: there are three people waiting before you. Give a
probability distribution that can be used as a model for the waiting time before it is
your turn.

10. Correct or wrong? If X has an Erlang distribution with parameters (r, λ) and Y is
independent from X and has also an Erlang distribution with parameters (r0 , λ0 ), then
the sum X + Y is also Erlang distributed.

11. The time between two arrivals of patients at a clinic in a hospital is an important
variable. So is the treatment time of the patient. Very often it is assumed that both
times are exponentially distributed. Assume that the mean time between two arrivals
is 4 minutes.

a) What is the probability that the time between two arrivals is less than one minute?
b) What is the distribution of the number of arrivals in 3 hour? Give also the para-
meter(s) of the distribution.
c) Give (using the normal distribution) an approximation for the probability that
less than 60 patients arrive in 3 hours.

Assume that the mean treatment time is 5 minutes. Melvin arrives and two people are
waiting.

d) What is the expected value of the time Melvin has to wait before it is his turn?

2
12. Two employees (A and B) have flexible work-hours. They have to work at least 6 hours
on a day, but they can choose to work 7, 8 or 9 hours. So, they can choose to work
0, 1, 2 or 3 hours extra. The number of hours that A works extra is denoted by X and
the number of hours that B works extra with Y . The joint probability mass function
of the discrete random variables X and Y is given by

X/Y 0 1 2 3
0 1/8 1/8 0 0
1 1/8 1/16 1/16 0
2 0 1/16 1/16 1/8
3 0 0 1/8 1/8

So, for example P (X = 0, Y = 1) = 1/8.

a) Show that the marginal probability distribution of X is uniform.


b) It is given that on a specific day B works more than 7 hours (so 8 or 9). What is
the probability that A works more than 7 hours (so 8 or 9)?
c) Compute the covariance between X and Y .
d) Compute the correlation between X and Y .

13. Correct or not? The variance of the sum of two random variables is equal to the sum
of the variances.

14. Correct or not? If X, Y are normally distributed and independent with µ = 1, σ 2 = 1,


then X − Y is normally distributed with µ = 0, σ 2 = 2.

15. A mechanical assembly used in an automobile engine contains four major components.
The weights of the components are independent and normally distributed with the
following means and standard deviations (in grams).

Component Mean Standaard Deviation


Left case 4 0.4
Right case 5.5 0.5
Bearing assembly 10 0.2
Bolt assembly 8 0.5

a) What is the probability that the weight of an assembly exceeds 28.5 ounces?
b) What is the probability that the total weight of 8 independent assemblies exceeds
8 · 28.5 = 228 ounces?

Now assume that the weights of the ’Left case’ and the ’Right case’ are dependent. The
correlation is equal to −0.5. The other independencies still hold. The weight of an
assembly is normally distributed.

c) What is the probability that the weight of an assembly exceeds 28.5 ounces?

3
16. Consider n uniformly distributed random variables on the interval [0, 1]. The random
variables are independent.

a) What is the cumulative distribution function of the maximum of the n random


variables? What is the expected value?
b) What is the cumulative distribution function of the minimum of the n random
variables? What is the expected value?

Application of item a): a production process where a product has to pass n stations on
a production line. So, at the same moment n products are on the line (in each station
one product). The process can continue if on all the stations the operation has been
completed. Let Xi be the operation time in station i. Let T be the time it takes before
the production process can continue. Then T = max{X1 , · · · , Xn }.

17. -

18. A company sells bags of 25 kg special sand used for making cement. The company
purchases these bags for 1.50 euro per sack. The company sells the bags for 2.50, but
gives some discount if a customer buys more bags. See the table below.

Number of bags Price Profit Profit per


the customer buys per bag per bag customer
i w(i) y
1 2.50 1.00 1.00
2 2.45 0.95 1.90
3 2.40 0.90 2.70
4 2.36 0.86 3.44
5 2.32 0.82 4.10
6 2.28 0.78 4.68
7 2.25 0.75 5.25
8 2.22 0.72 5.76
9 2.19 0.69 6.21
10 or more 2.16 0.66 i · 0.66

The company wants to know what the expected profit per customer is. A random
sample of size 76 is taken and one makes a list of the number of bags each customer
buys.

4
Number of Number
bags of customers
i a(i)
1 13
2 18
3 9
4 6
5 9
6 7
7 6
8 2
9 3
10 2
12 1

Use the empirical distribution function to answer the questions below.

a) Estimate the expected value of the number of bags that a customer buys.
b) The answer for a) is 304/76 = 4. Somebody argues: ’now we also know an estimate
for the expected profit per customer’; it equals 4 · 0.86 = 3.44’, where 0.86 is the
profit per bag if a customer buys 4 bags. Why is this not correct?
c) Give a proper estimate for the expected profit per customer.

19. Consider the density fX (x) = (b − 1)/xb for 1 ≤ x < ∞, and fX (x) = 0 for x < 1.
Assume b > 1.

a) Verify that fX is a density.


b) Compute the expected value of X. Assume b > 2.
c) Now assume b = 4 and and consider 10 independent random variables X1 . . . , X10
with density fX . Use the Central Limit Theorem to approximate the probability
that the average (X1 + X2 + . . . + X10 )/10 is larger than 2. Use the fact that
V (Xi ) = 3/4.

20. The number of telephone calls at call center A follow a Poisson-process with expected
value µ per hour. The number of calls at call center B follow a Poisson-process with
expected value 2µ. One has n observations at call center A (X1 , X2 , · · · , Xn ) and n
observations at call center B (Y1 , Y2 , · · · , Yn ). All observations are independent.
Consider the following estimators for µ

W1 = (X + Y )/2,
W2 = (2X + Y )/4.

a) Are W1 and W2 unbiased estimators for µ?


b) Which of two estimators is better with respect to the MSE?

21. Let X1 , X2 , · · · , Xn a random sample from a uniform distribution on the interval [0, θ]
with θ unknown. The density of the distribution is
1
f (x) = (0 ≤ x ≤ θ) .
θ

5
An estimator θb for the unknown θ is
Pn
Xi
θb = 2X = 2 i=1 .
n

a) Show that the estimator θb is unbiased for θ.


b) Determine the Mean Squared Error (MSE).

22. The waiting time for a ticket office is exponentially distributed on the interval [θ, ∞).
The density is

f (x) = e−(x−θ) , (x ≥ θ) .

This means that the waiting time is the sum of a threshold θ and an exponentially
distributed waiting time on the interval [0, ∞). The parameter θ is unknown. One has
a random sample of size 15 (X1 , X2 , · · · , X15 ) to estimate θ. It is known that a good
estimator is Y =min(X1 , X2 , · · · , X15 ) . For the density f (y) we have

f (y) = 15e−15(y−θ) , (y ≥ θ) .

This means: if we define V = Y −θ, then V is exponentially distributed with parameter


λ = 15.
1
a) Show that this estimator is biased and that the bias is 15 .

In the following question it can be used that


2 2θ
E(Y 2 ) = 2 + + θ2 .
15 15
b) Determine the MSE of the estimator

23. With a computer 15 random samples of size 4 are taken from a normal distribution
with µ = 11 en σ = 1. These are used to construct 15 90%-confidence intervals (CI)
for µ, where it is assumed that one does not know µ, but σ is known.

a) Is the width (upper bound minus lower bound) of these CI’s the same?
b) Do these 15 intervals all contain the value 11? Why.
c) What is the probability that the 15 random samples will be generated in such a
way that they all will contain the value 11?
d) Consider the first random sample. What is the probability that it will be generated
in such a way that it will contain the value 12?
e) The upper bound of the third CI is the realisation of a random variable; we notate
it with R3 . Compute E(R3 ).

We construct 15 CI’s as described above, but now we have σ = 2.

f) What do you expect? Will there be more or less CI’s that will contain the value
11?

6
24. In a lake one investigates the mercury (Dutch: kwik) contamination using a random
sample of fish with size 53. A normal probability plot gives

a) Can the normal distribution be used to find a confidence interval on µ (the popu-
lation mean)? Give an argument.
A statistical package gives the following output.

Summary Statistics for mercury (kwik)

Count = 53
Average = 0.524981
Variance = 0.12154
Standard deviation = 0.348625
Minimum = 0.04
Maximum = 1.33
Range = 1.29

b) Give a 95% confidence interval for µ.


c) Is the following statement correct? If one has many observations, then 95% of the
observations will be in interval from b). Give an argument.

7
25. The waiting time X in minutes at a service desk is exponentially distributed with
unknown parameter λ > 0. So the expected value µ equals 1/λ. A random sample
of size 20 was selected and the mean sample waiting time was 10 minutes. A good
estimator for the expected value µ is X, the average of the sample. For the variance of
X we have
1 µ2
V (X) = = .
nλ2 n
a) Construct a 95% two-sided confidence interval for the expected value µ of the
waiting time using the normal approximation.

It is also possible to construct an exact confidence interval using the χ2 -distribution.


It is known that the quantity

2nX
(0.1)
µ

has a χ2 -distribution with 2n degrees of freedom.

b) Construct the 95% confidence interval by use of (1).


Hint: first find values l and u such that
 
2nX
P l< < u = 0.95.
µ

26. A manufacturer wants to know the mean of the lifetime of the batteries he produces.
He wants to have a 95% confidence interval on the expected lifetime. It is known that
the lifetime has a normal distribution with standard deviation 10 hour.

a) What should the sample size be if the width of the interval should not exceed 2?

A random sample of size 20 gives as results


20
X 20
X
x2i = 47399.2 xi = 956 .
i=1 i=1

b) Give a 95-% lower-confidence bound (of the form (l, ∞) for the expected lifetime.
Assume that the standard deviation is unknown.

Assume that the interval in b) equals (47, ∞). The manufacturer claims ’If you buy
a battery at our company, the probability that the lifetime is more than 47.0 hour is
0.95’.

c) Do you support this claim?

8
27. In a complicated production process the proportion of defective products is high (on
average 10%). One wants to investigate if the proportion has become even higher using
a random sample of size 20. The null hypothesis is rejected if the number of defect
products is equal to 4 or more.
(Remark: Answer the following questions without using the normal approximation)
a) What is the significance level (or α-error or type I error probability) of this test?
b) The sample yields 3 defective products. What is the p-value?
c) What is the type II error probability if the true proportion is 20%?
28. Consider 500g packs of butter. The weight has a normal distribution with population
mean µ = 500. One wants to test the null hypothesis H0 : µ = 500. The standard
deviation is known (but not given here yet) and a random sample is taken. The 95%
two-sided confidence interval, based on this random sample, is (501.1, 510.9). Consider
the test µ = 500 against the alternative µ 6= 500 with confidence level α = 0.05.
a) Is the null hypothesis rejected?
b) What is the acceptance region for this test?
Now assume that the standard deviation equals σ = 10 g and that the random sample
has size n = 25. In that case the acceptance region for the mentioned test (with
confidence level 0.05) is (496.08, 503.92).
c) What is the power for the test if the true value of µ equals 495?
d) What sample size do we need to have a power of 0.90 if µ = 495?
29. The number of orders a company has each week can be modeled with a Poisson process
with expected value µ. Until now it is assumed that µ equals 2. One wants to test
if the number of orders is larger than it used to be and one wants to have a strong
conclusion. One decides to use the number of orders during 5 weeks to test the null
hypothesis H0 : µ = 2. The null hypothesis is rejected if the total number is 16 or more.
Remark: the sum of n independent Poisson distributed random variables with expected
value µ is also Poisson distributed (with expected value n · µ).
a) What is the significance level (or α-error or type I error probability) of this test?
b) What is the type II error probability if the true proportion of µ is 3?
c) The number of orders that the company got in the 5 weeks was 14. What is the
p-value?
30. The weight (x) of 110 randomly chosen men is measured. They are classified as follows:

Weight (x) in kg Number of men in this class


x ≤ 60 10
60 < x ≤ 70 20
70 < x ≤ 80 46
80 < x ≤ 90 27
x > 90 7

One wants to test if the weight of men is normally distributed with σ = 10. The
expected value of the normal distribution is estimated with the sample. The sample
mean is 76.

9
a) What is the probability distribution of the test statistic (give also the parameter(s)
of the distribution)?
b) What is the expected number of men with a weight less or equal than 60 kg?
c) The value of the test statistic is 3.52. Is the null hypothesis rejected (α = 0.05)?

31. A company sells vegetables in cans. Some of the cans do not meet the specifications.
There are several reasons for this: a stain(1), a dint(2), the eye to open it is on the
wrong place(3), there is no eye to open it (4), other (5). We call these deviations. One
takes a random sample and classifies the wrong scans.

(1) (2) (3) (4) (5) Sum


89 145 58 54 29 375

One wants to investigate if the number of deviations for the 5 classes follow the pattern
2:3:2:2:1.

a) Formulate the null hypothesis in terms of probabilities and compute the number of
expected cans for each class when the null hypothesis is true.
b) The value of the test statistic is 23.7. Is the null hypothesis rejected (α = 0.05)?
Why?
c) Give a 95% two-sided confidence interval for the proportion of tins with a dint.

32. Consider crates with 20 bottles with beer. Define X as the number of bottles that
contain not enough beer. The number of bottles with not enough beer is counted for
50 crates and the results are given below.

Value of X 0 1 2 3
Number of crates 15 22 11 2

a) One wants to investigate if the binomial model is a good model for the number of
bottles with not enough beer. Estimate the expected number of crates in class 0
(so all bottles contain enough beer) if the binomial model is a good model.
b) The value of the test statistic is 2.09. Give the critical value for a confidence level
of α = 0.05? Is the null hypothesis rejected? Why (not)?

33. A company operates four machines in three shifts. From production records the follo-
wing data on the number of breakdowns are collected.

Machines
Shift A B C D
1 41 20 12 16
2 31 11 9 14
3 15 17 16 10

One wants to test the null hypothesis that the breakdowns are independent of the shift.

a) Compute the expected number of breakdowns for machine B for shift 2 if the
hypothesis is true.

10
b) The value of the test statistic is 11.65. Is the null hypothesis rejected? Why (not)?
Use α = 0.05.

34. One adds a certain ingredient to an laundry detergent to try to improve the effect of
the detergent. At random 10 dirty overalls are chosen and cut in two parts. One part
is washed with the detergent with the ingredient and the other part is washed with the
detergent without the ingredient. One measures in some unit how clean the (parts of)
the overalls are. The observations are

Overall 1 2 3 4 5 6 7 8 9 10
Detergent without (’OUD’) 5 5 12 29 10 33 33 17 2 8
Detergent with (’NIEUW’) 7 9 17 36 8 40 29 27 5 19

With a statistical package two analyses are done (one for paired observations and one
for independent observations). The results are given below

a1) Which of the two methods (paired or independent) should be applied here for the
analysis? Explain your choice.
a2) Explain the difference in p-values for the two methods.
b) Test ONE-SIDED the null hypothesis that the new detergent (with the ingredient)
is better. Use α = 0.05.

11
35. Consider simple linear regression and the data.

x y
−2 −5
−1 0
0 −1
1 3
2 3

a) Give the model and assumptions.


b) Compute the estimates for the intercept (β0 ) and slope (β1 ) of the model.

It is given that the estimate for the variance of the error term is equal to 2.6333.

c) Give a 95% prediction interval for Y if x = 0.5.

36. One wants to investigate the relation between blood pressure Y and noise x. The
following 20 observations are available.

y 1 0 1 2 5 1 4 6 2 3
x 60 63 65 70 70 70 80 90 80 80
y 5 4 6 8 4 5 7 9 7 6
x 85 89 90 90 90 90 94 100 100 100

Here y is the increase of the blood pressure (in mm Hg) and x is the level of the noise
in decibel. A statistical package gives the following results

Regression Analysis - Linear model: Y = a + b*X


---------------------------------------------------------------------
Dependent variable: pressure
Independent variable: noise
---------------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
---------------------------------------------------------------------
Intercept ???????? 1.9949 -5.07872 0.0001
Slope 0.174294 0.0238286 7.31447 0.0000
---------------------------------------------------------------------

a) Give the estimate for the intercept of the model.


b) Give a 95%-confidence interval for the coefficient of x in this model.
c) Someone wants to know the increase of the blood pressure if x has the value 120.
What is the answer according to the linear model. Would you give this answer if
someone poses you this question? Why (not) ?

12
37. Corrosion of iron in reinforced concrete is a problem for its sustainability. For a number
of constructions the strength y (in MPa) is measured and also the so called ’depth of
carbonification’ x (in mm) which is an important variable for corrosion. A statistical
package gives as result

Regression Analysis - Linear model: Y = a + b*X


-----------------------------------------------------------------------------
Dependent variable: y
Independent variable: x
-----------------------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
-----------------------------------------------------------------------------
Intercept 27.1829 1.65135 16.4611 0.0000
Slope -0.297561 0.0411642 ??????? ??????
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 428.615 1 ??????? ????? ?????
Residual 131.242 16 ???????
-----------------------------------------------------------------------------
Total (Corr.) 559.858 17

Correlation Coefficient = -0.874974


R-squared = 76.5579 percent
R-squared (adjusted for d.f.) = 75.0928 percent
Standard Error of Est. = 2.86403

a) What is the number of observations n?


(xi − x)2 .
P
b) Compute, using the above results, the sum of squares Sxx =
c) Give the 95%-confidence interval for the intercept.
d) Test the null hypothesis β1 = 0. Use α = 0.05.

13
38. The height of a tree y (in cm) is measured during a period of 10 year. The years x are
coded with 1, 2, 3, · · · , 10. The table is (partially) given here.

yi (hoogte) 150 161 167 ?? ?? ?? ?? ?? ?? 208


xi (jaar) 1 2 3 4 5 6 7 8 9 10

Regression analysis (based on this 10 observations) gives the following results (’hoogte’=height
and ’jaar’=year).

a) Give a two sided 95%-confidence interval for σ 2 .


b) Give an estimate of the height of the tree in the next year (year 11).
c) Give a 95% prediction interval for the height of the tree in the next year (year 11).

The next picture shows the residuals against the number of the year.

d) Use the picture to comment on the model assumptions.


What are the consequences for the prediction of the height in the next year (11)?

14
39. One wants to optimize a production process. Two factors are important: temperature
and the type of the machine. In an experiment the temperature has two values (60o C
en 70o C). There are three types of machines. An observation is done for all possible
combinations. The (coded) observations are

Temp Machine Waarneming


60 A −2
60 B 0
60 C 0
60 A −2
60 B 1
60 C 1
70 A 1
70 B 1
70 C 0
70 A 1
70 B 0
70 C 1

Formulate this problem as a regression model (including cross products between tem-
perature and machine).

40. One wants to relate the quality of wine to a number of variables: ’Clarity’ (x1 or A),
’Aroma’ (x2 or B), ’Body’ (x3 or C), ’Flavor’ (x4 or D) and ’Oakiness’ (x5 or E);
38 samples of wine have been tested. The dependent variable y is a score which is a
measure for the quality. Consider the model

Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε ,

with the usual assumptions for ε.


A statistical package gives the following output.

Multiple Regression Analysis Dependent variable: Quality


----------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
----------------------------------------------------------------
CONSTANT 3.99686 2.23177 1.79089 0.0828
Clarity 2.33945 1.73483 1.34852 0.1870
Aroma 0.482551 0.272447 1.77117 0.0861
Body 0.273161 0.332561 0.821388 0.4175
Flavor 1.16832 0.304481 3.8371 0.0006
Oakiness -0.68401 0.271193 -2.52223 0.0168
----------------------------------------------------------------

15
Analysis of Variance
----------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
----------------------------------------------------------------
Model 111.54 5 22.3081 16.51 ??????
Residual 43.248 32 1.3515
----------------------------------------------------------------
Total (Corr. 154.788 37

And for a model with only three variables

Multiple Regression Analysis


---------------------------------------------------------------
Dependent variable: Quality
---------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
---------------------------------------------------------------
CONSTANT 6.46719 1.33279 4.85238 0.0000
Aroma 0.58012 0.262185 2.21264 0.0337
Flavor 1.19969 0.274881 4.36441 0.0001
Oakiness -0.602325 0.264401 -2.27807 0.0291
---------------------------------------------------------------

Analysis of Variance
---------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
---------------------------------------------------------------
Model 108.935 3 36.3117 26.92 0.0000
Residual 45.8534 34 1.34863
---------------------------------------------------------------
Total (Corr.) 154.788 37

a) Test the hypothesis β1 = β2 = β3 = β4 = β5 = 0.


b) Test the hypothesis β1 = β3 = 0.

See next page

16
All possible 32 models (using the 5 regressors) are considered. This gives the following
output

Regression Model Selection

Dependent variable: Quality


Independent variables:
A=Clarity B=Aroma
C=Body D=Flavor
E=Oakiness

Number of models fit: 32


Model Results
-------------------------------------------
Adjusted Included
MSE R-Squared R-Squared Variables
-------------------------------------------
4.18347 0.0 0.0
4.18347 2.7027 0.0 A
2.14852 50.0308 48.6427 B
3.00516 30.1074 28.1659 C
1.61593 62.4174 61.3735 D
4.18347 2.7027 0.0 E
2.20886 50.0544 47.2004 AB
2.9001 34.4245 30.6773 AC
1.62128 63.3404 61.2456 AD
4.18347 5.40541 0.0 AE
2.04696 53.7151 51.0703 BC
1.51006 65.8552 63.904 BD
2.04406 53.7807 51.1396 BE
1.65123 62.6632 60.5296 CD
3.01392 31.8508 27.9566 CE
1.49874 66.1112 64.1747 DE
2.08516 54.1984 50.1571 ABC
1.53649 66.2504 63.2725 ABD
2.10258 53.8158 49.7407 ABE
1.63487 64.0893 60.9207 ACD
2.82341 37.9826 32.5105 ACE
1.45631 68.0116 65.1891 ADE
1.55192 65.9114 62.9036 BCD
1.91841 57.8612 54.143 BCE
1.34863 70.3767 67.7629 BDE
1.52707 66.4572 63.4975 CDE
1.57108 66.5054 62.4454 ABCD
1.91353 59.2046 54.2597 ABCE
1.33818 71.4708 68.0128 ABDE
1.43902 69.3209 65.6022 ACDE
1.38502 70.4721 66.893 BCDE
1.3515 72.0599 67.6943 ABCDE
------------------------------------------

17
c) If you have to choose one of these models, which model will you choose? Give
arguments.

One uses stepwise regression (where we can use all 5 variables).

d) Which variable is the first variable that will be chosen in the model? Why?
e) Must another variable be added to the model? Give arguments.
Consider the model with only the variables ’Aroma’ (x2 ), ’Flavor’ (x4 ) and ’Oakiness’
(x5 ). The output is

Multiple Regression Analysis Dependent variable: Quality


------------------------------------
Parameter Estimate
------------------------------------
CONSTANT 6.46719
Aroma 0.58012
Flavor 1.19969
Oakiness -0.602325
------------------------------------

Regression Results for Quality with ’Aroma’=5, ’Flavor’=5, ’Oakiness’=4.


-----------------------------------------------------------------------------------
Lower Upper Lower Upper
Fitted Stnd. Error 95.0% CL 95.0% CL 95.0% CL 95.0% CL
Value for Forecast Forecast Forecast for Mean for Mean
-----------------------------------------------------------------------------------
????? 1.17966 ????? ??????? 12.5357 13.3782
-----------------------------------------------------------------------------------

f) You buy a bottle of wine with ’Aroma’= 5, ’Flavor’= 5, ’Oakiness’= 4. Give a 95%
prediction interval for the quality of the wine.

18

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy