T Exercises
1. Is there an interval [a, b] such that f (x) = sin(x) for a ≤ x ≤ b is a probability density
function?
3. A random variable X denotes the time between two crashes at the stock exchange and
has the following density: $f_X(x) = (b - 1)/x^b$ for 1 ≤ x < ∞, and $f_X(x) = 0$ for x < 1.
(We have b > 1.)
4. Correct or wrong? For the lifetime of a car the exponential distribution is a good
model.
5. Correct or wrong? The uniform distribution on the interval 0 ≤ x ≤ 1 has the lack of
memory property: $P(X < t + t_0 \mid X > t_0) = P(X < t)$.
7. Philips sells lamps and claims that the lifetime is at least one year with a
probability of more than 0.95. Assume that the lamps do not wear out (but fail because
of incidents).
a) Show that the expected lifetime of such a lamp is at least −1/ln(0.95) ≈ 19.5
years.
b) Now assume that the expected lifetime is 20 years, that a lamp burns
continuously, and that it is immediately replaced when it fails. Is it sufficient to
have four lamps to guarantee that one has light for more than 3 years with a
probability of at least 0.95?
8. Consider a supermarket with pay desks. The probability that one has to wait longer
than 5 minutes before paying the bill at a pay desk equals 0.2. A man and a woman go
shopping and choose different pay desks at the supermarket. They arrive at the same
time at their pay desks.
a) What is the probability that the man and the woman both have to wait less than
5 minutes? Assume independence.
9. You call a helpdesk. You hear: there are three people waiting before you. Give a
probability distribution that can be used as a model for the waiting time before it is
your turn.
10. Correct or wrong? If X has an Erlang distribution with parameters (r, λ) and Y is
independent of X and also has an Erlang distribution with parameters (r′, λ′), then
the sum X + Y is also Erlang distributed.
11. The time between two arrivals of patients at a clinic in a hospital is an important
variable. So is the treatment time of the patient. Very often it is assumed that both
times are exponentially distributed. Assume that the mean time between two arrivals
is 4 minutes.
a) What is the probability that the time between two arrivals is less than one minute?
b) What is the distribution of the number of arrivals in 3 hours? Also give the
parameter(s) of the distribution.
c) Using the normal distribution, give an approximation for the probability that
fewer than 60 patients arrive in 3 hours.
Assume that the mean treatment time is 5 minutes. Melvin arrives and two people are
waiting.
d) What is the expected value of the time Melvin has to wait before it is his turn?
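One possible way to check items a) to c) of exercise 11 numerically is sketched below. It assumes exponential gaps with mean 4 minutes, as stated, and uses a continuity correction in c), which is a choice made here and not something the exercise prescribes. All variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_gap_min = 4.0                  # mean time between arrivals (from the exercise)
rate_per_min = 1.0 / mean_gap_min   # exponential rate per minute

# a) P(gap < 1 minute) for an exponentially distributed gap
p_gap_under_1 = 1.0 - math.exp(-rate_per_min * 1.0)

# b) Number of arrivals in 3 hours (180 minutes): Poisson with this mean
poisson_mean = rate_per_min * 180.0

# c) Normal approximation to P(N < 60) = P(N <= 59);
#    the continuity correction (59.5) is an assumption of this sketch
approx = NormalDist(mu=poisson_mean, sigma=math.sqrt(poisson_mean))
p_fewer_than_60 = approx.cdf(59.5)

print(round(p_gap_under_1, 4), poisson_mean, round(p_fewer_than_60, 4))
```

Item d) is left out on purpose: its answer depends on whether a patient is already being treated when Melvin arrives, which the exercise text leaves to the reader.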
12. Two employees (A and B) have flexible work-hours. They have to work at least 6 hours
on a day, but they can choose to work 7, 8 or 9 hours. So, they can choose to work
0, 1, 2 or 3 hours extra. The number of extra hours that A works is denoted by X and
the number of extra hours that B works by Y. The joint probability mass function
of the discrete random variables X and Y is given by
X \ Y    0      1      2      3
0        1/8    1/8    0      0
1        1/8    1/16   1/16   0
2        0      1/16   1/16   1/8
3        0      0      1/8    1/8
13. Correct or not? The variance of the sum of two random variables is equal to the sum
of the variances.
15. A mechanical assembly used in an automobile engine contains four major components.
The weights of the components are independent and normally distributed with the
following means and standard deviations (in ounces).
a) What is the probability that the weight of an assembly exceeds 28.5 ounces?
b) What is the probability that the total weight of 8 independent assemblies exceeds
8 · 28.5 = 228 ounces?
Now assume that the weights of the 'Left case' and the 'Right case' are dependent, with
correlation equal to −0.5. The other independence assumptions still hold. The weight of an
assembly is normally distributed.
c) What is the probability that the weight of an assembly exceeds 28.5 ounces?
16. Consider n uniformly distributed random variables on the interval [0, 1]. The random
variables are independent.
Application of item a): a production process where a product has to pass n stations on
a production line. So, at the same moment n products are on the line (one product in
each station). The process can continue once the operation has been completed at all
stations. Let $X_i$ be the operation time in station i. Let T be the time it takes before
the production process can continue. Then $T = \max\{X_1, \dots, X_n\}$.
17. -
18. A company sells bags of 25 kg special sand used for making cement. The company
purchases these bags for 1.50 euro per sack. The company sells the bags for 2.50, but
gives some discount if a customer buys more bags. See the table below.
The company wants to know what the expected profit per customer is. A random
sample of size 76 is taken and one makes a list of the number of bags each customer
buys.
Number of bags i    Number of customers a(i)
 1                  13
 2                  18
 3                   9
 4                   6
 5                   9
 6                   7
 7                   6
 8                   2
 9                   3
10                   2
12                   1
a) Estimate the expected value of the number of bags that a customer buys.
b) The answer for a) is 304/76 = 4. Somebody argues: 'Now we also know an estimate
for the expected profit per customer; it equals 4 · 0.86 = 3.44', where 0.86 is the
profit per bag if a customer buys 4 bags. Why is this not correct?
c) Give a proper estimate for the expected profit per customer.
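The estimate quoted in item b) can be reproduced from the frequency table with a short sketch like the following. The dictionary simply restates the table of exercise 18; the variable names are illustrative.

```python
# Frequency table from exercise 18: number of bags i -> number of customers a(i)
bags_to_customers = {1: 13, 2: 18, 3: 9, 4: 6, 5: 9, 6: 7,
                     7: 6, 8: 2, 9: 3, 10: 2, 12: 1}

n_customers = sum(bags_to_customers.values())
total_bags = sum(i * a for i, a in bags_to_customers.items())

# Sample mean of the number of bags per customer
mean_bags = total_bags / n_customers

print(n_customers, total_bags, mean_bags)
```

The same weighted-average pattern would apply to the profit in item c), with the profit for a purchase of i bags in place of i. That profit depends on the discount table, which is not reproduced in this text, so it is not computed here.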
19. Consider the density $f_X(x) = (b - 1)/x^b$ for 1 ≤ x < ∞, and $f_X(x) = 0$ for x < 1.
Assume b > 1.
20. The number of telephone calls at call center A follows a Poisson process with expected
value µ per hour. The number of calls at call center B follows a Poisson process with
expected value 2µ. One has n observations at call center A ($X_1, X_2, \dots, X_n$) and n
observations at call center B ($Y_1, Y_2, \dots, Y_n$). All observations are independent.
Consider the following estimators for µ, where $\bar{X}$ and $\bar{Y}$ denote the two sample means:
$W_1 = (\bar{X} + \bar{Y})/2$,
$W_2 = (2\bar{X} + \bar{Y})/4$.
21. Let $X_1, X_2, \dots, X_n$ be a random sample from a uniform distribution on the interval [0, θ]
with θ unknown. The density of the distribution is
$$f(x) = \frac{1}{\theta} \quad (0 \le x \le \theta).$$
An estimator $\hat{\theta}$ for the unknown θ is
$$\hat{\theta} = 2\bar{X} = 2\,\frac{\sum_{i=1}^{n} X_i}{n}.$$
22. The waiting time for a ticket office is exponentially distributed on the interval [θ, ∞).
The density is
$$f(x) = e^{-(x-\theta)} \quad (x \ge \theta).$$
This means that the waiting time is the sum of a threshold θ and an exponentially
distributed waiting time on the interval [0, ∞). The parameter θ is unknown. One has
a random sample of size 15 ($X_1, X_2, \dots, X_{15}$) to estimate θ. It is known that a good
estimator is $Y = \min(X_1, X_2, \dots, X_{15})$. For the density f(y) we have
$$f(y) = 15\,e^{-15(y-\theta)} \quad (y \ge \theta).$$
23. With a computer, 15 random samples of size 4 are taken from a normal distribution
with µ = 11 and σ = 1. These are used to construct 15 90% confidence intervals (CIs)
for µ, where it is assumed that one does not know µ, but σ is known.
a) Is the width (upper bound minus lower bound) of these CIs the same?
b) Do these 15 intervals all contain the value 11? Why?
c) What is the probability that the 15 random samples will be generated in such a
way that all 15 intervals contain the value 11?
d) Consider the first random sample. What is the probability that it will be generated
in such a way that its interval contains the value 12?
e) The upper bound of the third CI is the realisation of a random variable; we denote
it by $R_3$. Compute $E(R_3)$.
f) What do you expect? Will there be more or fewer CIs that contain the value
11?
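As a rough numerical companion to exercise 23, the sketch below computes the half-width of a 90% interval with known σ and the probability that 15 independent intervals all cover the true mean. It assumes the usual interval x̄ ± z·σ/√n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma = 1.0    # known standard deviation (from the exercise)
n = 4          # sample size
level = 0.90   # confidence level

# Two-sided z-value: 5% in each tail
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Half-width of each interval; it does not depend on the sample itself
half_width = z * sigma / sqrt(n)

# Each interval covers the true mean with probability 0.90, independently
p_all_15_cover = level ** 15

print(round(half_width, 4), round(p_all_15_cover, 4))
```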
24. In a lake one investigates the mercury (Dutch: kwik) contamination using a random
sample of fish with size 53. A normal probability plot gives
a) Can the normal distribution be used to find a confidence interval on µ (the
population mean)? Give an argument.
A statistical package gives the following output.
Count = 53
Average = 0.524981
Variance = 0.12154
Standard deviation = 0.348625
Minimum = 0.04
Maximum = 1.33
Range = 1.29
25. The waiting time X in minutes at a service desk is exponentially distributed with
unknown parameter λ > 0. So the expected value µ equals 1/λ. A random sample
of size 20 was selected and the sample mean waiting time was 10 minutes. A good
estimator for the expected value µ is $\bar{X}$, the average of the sample. For the variance of
$\bar{X}$ we have
$$V(\bar{X}) = \frac{1}{n\lambda^2} = \frac{\mu^2}{n}.$$
a) Construct a 95% two-sided confidence interval for the expected value µ of the
waiting time using the normal approximation.
$$\frac{2n\bar{X}}{\mu} \qquad (0.1)$$
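For item a) of exercise 25 there is more than one way to turn the normal approximation into an interval. The sketch below shows two common variants under the stated variance µ²/n: plugging the estimate x̄ into the standard error, and solving the inequality |x̄ − µ| ≤ z·µ/√n for µ. Which variant is intended is not stated in the exercise, so both are assumptions, and the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 20         # sample size (from the exercise)
x_bar = 10.0   # observed sample mean in minutes

z = NormalDist().inv_cdf(0.975)   # two-sided 95%

# Variant 1: estimate the standard error mu/sqrt(n) by x_bar/sqrt(n)
half = z * x_bar / sqrt(n)
ci_plug_in = (x_bar - half, x_bar + half)

# Variant 2: solve |x_bar - mu| <= z * mu / sqrt(n) for mu
ci_solved = (x_bar / (1 + z / sqrt(n)), x_bar / (1 - z / sqrt(n)))

print([round(v, 3) for v in ci_plug_in], [round(v, 3) for v in ci_solved])
```

The second interval is not symmetric around x̄, which reflects that the standard deviation of the sample mean grows with µ.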
26. A manufacturer wants to know the mean lifetime of the batteries he produces.
He wants to have a 95% confidence interval on the expected lifetime. It is known that
the lifetime has a normal distribution with standard deviation 10 hours.
a) What should the sample size be if the width of the interval should not exceed 2?
b) Give a 95% lower confidence bound (of the form (l, ∞)) for the expected lifetime.
Assume that the standard deviation is unknown.
Assume that the interval in b) equals (47, ∞). The manufacturer claims: 'If you buy
a battery at our company, the probability that the lifetime is more than 47.0 hours is
0.95'.
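Item a) of exercise 26 comes down to requiring that the interval width 2·z·σ/√n stays below 2. A minimal sketch of that calculation, with illustrative variable names:

```python
from math import ceil
from statistics import NormalDist

sigma = 10.0      # known standard deviation in hours (from the exercise)
max_width = 2.0   # maximum allowed width of the two-sided interval

z = NormalDist().inv_cdf(0.975)   # two-sided 95%

# Width = 2 * z * sigma / sqrt(n) <= max_width, solved for n and rounded up
n_required = ceil((2 * z * sigma / max_width) ** 2)

print(n_required)
```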
27. In a complicated production process the proportion of defective products is high (on
average 10%). One wants to investigate if the proportion has become even higher using
a random sample of size 20. The null hypothesis is rejected if the number of defective
products is equal to 4 or more.
(Remark: Answer the following questions without using the normal approximation)
a) What is the significance level (or α-error or type I error probability) of this test?
b) The sample yields 3 defective products. What is the p-value?
c) What is the type II error probability if the true proportion is 20%?
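The three probabilities asked for in exercise 27 are exact binomial tail probabilities. The sketch below evaluates them directly from the binomial formula, without the normal approximation; `binom_cdf` is a small helper defined here, not a library function.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n = 20

# a) Significance level: P(X >= 4) when p = 0.10
alpha = 1 - binom_cdf(3, n, 0.10)

# b) p-value for 3 observed defectives: P(X >= 3) when p = 0.10
p_value = 1 - binom_cdf(2, n, 0.10)

# c) Type II error: P(X <= 3), i.e. no rejection, when p = 0.20
beta = binom_cdf(3, n, 0.20)

print(round(alpha, 4), round(p_value, 4), round(beta, 4))
```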
28. Consider 500 g packs of butter. The weight has a normal distribution with population
mean µ = 500. One wants to test the null hypothesis H0 : µ = 500. The standard
deviation is known (but not given here yet) and a random sample is taken. The 95%
two-sided confidence interval, based on this random sample, is (501.1, 510.9). Consider
the test of µ = 500 against the alternative µ ≠ 500 with significance level α = 0.05.
a) Is the null hypothesis rejected?
b) What is the acceptance region for this test?
Now assume that the standard deviation equals σ = 10 g and that the random sample
has size n = 25. In that case the acceptance region for the mentioned test (with
significance level 0.05) is (496.08, 503.92).
c) What is the power for the test if the true value of µ equals 495?
d) What sample size do we need to have a power of 0.90 if µ = 495?
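Items c) and d) of exercise 28 can be checked with the sketch below. The power in c) uses the acceptance region given in the exercise. For d) the usual approximate formula n ≈ ((z_{α/2} + z_β)·σ/δ)² is used, which ignores the small probability of rejecting in the far tail; that formula is an assumption of this sketch, and the names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma, n = 10.0, 25
mu_true = 495.0
lower, upper = 496.08, 503.92   # acceptance region from the exercise

# c) Power: probability that the sample mean falls outside the acceptance region
x_bar_dist = NormalDist(mu=mu_true, sigma=sigma / sqrt(n))
power = x_bar_dist.cdf(lower) + (1 - x_bar_dist.cdf(upper))

# d) Approximate sample size for power 0.90 at a shift of delta = 5
std_normal = NormalDist()
delta = 500.0 - mu_true
z_sum = std_normal.inv_cdf(0.975) + std_normal.inv_cdf(0.90)
n_needed = ceil((z_sum * sigma / delta) ** 2)

print(round(power, 4), n_needed)
```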
29. The number of orders a company has each week can be modeled with a Poisson process
with expected value µ. Until now it is assumed that µ equals 2. One wants to test
if the number of orders is larger than it used to be and one wants to have a strong
conclusion. One decides to use the number of orders during 5 weeks to test the null
hypothesis H0 : µ = 2. The null hypothesis is rejected if the total number is 16 or more.
Remark: the sum of n independent Poisson distributed random variables with expected
value µ is also Poisson distributed (with expected value n · µ).
a) What is the significance level (or α-error or type I error probability) of this test?
b) What is the type II error probability if the true value of µ is 3?
c) The number of orders that the company got in the 5 weeks was 14. What is the
p-value?
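Using the remark in exercise 29 that the 5-week total is Poisson distributed with expected value 5µ, the requested probabilities are Poisson tail sums. A minimal sketch follows; `poisson_cdf` is a helper defined here, not a library function.

```python
from math import exp, factorial

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))

weeks = 5

# a) Significance level: P(total >= 16) when mu = 2, so the total has mean 10
alpha = 1 - poisson_cdf(15, weeks * 2.0)

# b) Type II error: P(total <= 15) when mu = 3, so the total has mean 15
beta = poisson_cdf(15, weeks * 3.0)

# c) p-value for an observed total of 14: P(total >= 14) when mu = 2
p_value = 1 - poisson_cdf(13, weeks * 2.0)

print(round(alpha, 4), round(beta, 4), round(p_value, 4))
```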
30. The weight (x) of 110 randomly chosen men is measured. They are classified as follows:
One wants to test if the weight of men is normally distributed with σ = 10. The
expected value of the normal distribution is estimated with the sample. The sample
mean is 76.
a) What is the probability distribution of the test statistic (give also the parameter(s)
of the distribution)?
b) What is the expected number of men with a weight less than or equal to 60 kg?
c) The value of the test statistic is 3.52. Is the null hypothesis rejected (α = 0.05)?
31. A company sells vegetables in cans. Some of the cans do not meet the specifications.
There are several reasons for this: a stain (1), a dent (2), the eye to open it is in the
wrong place (3), there is no eye to open it (4), other (5). We call these deviations. One
takes a random sample and classifies the defective cans.
One wants to investigate if the numbers of deviations for the 5 classes follow the pattern
2:3:2:2:1.
a) Formulate the null hypothesis in terms of probabilities and compute the expected
number of cans for each class when the null hypothesis is true.
b) The value of the test statistic is 23.7. Is the null hypothesis rejected (α = 0.05)?
Why?
c) Give a 95% two-sided confidence interval for the proportion of cans with a dent.
32. Consider crates with 20 bottles of beer. Define X as the number of bottles that
do not contain enough beer. This number is counted for
50 crates and the results are given below.
Value of X          0    1    2    3
Number of crates   15   22   11    2
a) One wants to investigate if the binomial model is a good model for the number of
bottles with not enough beer. Estimate the expected number of crates in class 0
(so all bottles contain enough beer) if the binomial model is a good model.
b) The value of the test statistic is 2.09. Give the critical value for a significance level
of α = 0.05. Is the null hypothesis rejected? Why (not)?
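For item a) of exercise 32, one possible approach is to estimate the per-bottle probability p from the sample mean of X and then evaluate the binomial probability of zero under-filled bottles in a crate of 20. The sketch below follows that approach; the choice of estimator is an assumption, and the names are illustrative.

```python
# Observed data from exercise 32: value of X -> number of crates
crates_by_x = {0: 15, 1: 22, 2: 11, 3: 2}
bottles_per_crate = 20

n_crates = sum(crates_by_x.values())
mean_x = sum(x * c for x, c in crates_by_x.items()) / n_crates

# Estimate of the probability that a single bottle is under-filled
p_hat = mean_x / bottles_per_crate

# Expected number of crates with X = 0 under Binomial(20, p_hat)
expected_class_0 = n_crates * (1 - p_hat) ** bottles_per_crate

print(n_crates, p_hat, round(expected_class_0, 2))
```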
33. A company operates four machines in three shifts. From production records the
following data on the number of breakdowns are collected.
         Machines
Shift    A    B    C    D
1       41   20   12   16
2       31   11    9   14
3       15   17   16   10
One wants to test the null hypothesis that the breakdowns are independent of the shift.
a) Compute the expected number of breakdowns for machine B for shift 2 if the
hypothesis is true.
b) The value of the test statistic is 11.65. Is the null hypothesis rejected? Why (not)?
Use α = 0.05.
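Under independence, the expected count in a cell of the table in exercise 33 is (row total × column total) / grand total. The sketch below applies this to machine B in shift 2 and also reports the degrees of freedom (rows − 1)(columns − 1) of the chi-square statistic; variable names are illustrative.

```python
# Breakdown counts from exercise 33: rows are shifts 1-3, columns are machines A-D
counts = [
    [41, 20, 12, 16],
    [31, 11, 9, 14],
    [15, 17, 16, 10],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

shift_idx, machine_idx = 1, 1   # zero-based indices for shift 2 and machine B
expected_b_shift2 = row_totals[shift_idx] * col_totals[machine_idx] / grand_total

# Degrees of freedom of the chi-square test of independence
dof = (len(counts) - 1) * (len(counts[0]) - 1)

print(round(expected_b_shift2, 3), dof)
```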
34. One adds a certain ingredient to a laundry detergent to try to improve the effect of
the detergent. At random 10 dirty overalls are chosen and cut in two parts. One part
is washed with the detergent with the ingredient and the other part is washed with the
detergent without the ingredient. One measures in some unit how clean the (parts of)
the overalls are. The observations are as follows (the Dutch labels 'OUD' and 'NIEUW' mean old and new):
Overall                      1   2   3   4   5   6   7   8   9  10
Detergent without ('OUD')    5   5  12  29  10  33  33  17   2   8
Detergent with ('NIEUW')     7   9  17  36   8  40  29  27   5  19
With a statistical package two analyses are done (one for paired observations and one
for independent observations). The results are given below
a1) Which of the two methods (paired or independent) should be applied here for the
analysis? Explain your choice.
a2) Explain the difference in p-values for the two methods.
b) Perform a ONE-SIDED test of whether the new detergent (with the ingredient)
is better. Use α = 0.05.
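Since the statistical output is not reproduced here, the paired analysis of exercise 34 can be recomputed from the data. The sketch below forms the differences new minus old per overall and computes the paired t statistic; it assumes the standard paired t-test with n − 1 degrees of freedom, and the names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Cleanliness scores from exercise 34, one pair per overall
old = [5, 5, 12, 29, 10, 33, 33, 17, 2, 8]    # detergent without the ingredient
new = [7, 9, 17, 36, 8, 40, 29, 27, 5, 19]    # detergent with the ingredient

diffs = [n_i - o_i for n_i, o_i in zip(new, old)]
n = len(diffs)

mean_diff = mean(diffs)
sd_diff = stdev(diffs)   # sample standard deviation (divisor n - 1)

# Paired t statistic for H0: mean difference = 0, with n - 1 degrees of freedom
t_stat = mean_diff / (sd_diff / sqrt(n))

print(mean_diff, round(sd_diff, 4), round(t_stat, 3), n - 1)
```

The statistic is then compared with the one-sided critical value of the t distribution with 9 degrees of freedom from a table.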
35. Consider simple linear regression and the data.
x y
−2 −5
−1 0
0 −1
1 3
2 3
It is given that the estimate for the variance of the error term is equal to 2.6333.
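The stated variance estimate in exercise 35 can be verified with a least-squares fit computed by hand. The sketch below uses the textbook formulas slope = S_xy / S_xx and σ̂² = SSE / (n − 2); variable names are illustrative.

```python
# Data from exercise 35
x = [-2, -1, 0, 1, 2]
y = [-5, 0, -1, 3, 3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates of slope and intercept
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual sum of squares and the estimate of the error variance
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sse / (n - 2)

print(slope, intercept, round(sigma2_hat, 4))
```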
36. One wants to investigate the relation between blood pressure Y and noise x. The
following 20 observations are available.
y    1    0    1    2    5    1    4    6    2    3
x   60   63   65   70   70   70   80   90   80   80
y    5    4    6    8    4    5    7    9    7    6
x   85   89   90   90   90   90   94  100  100  100
Here y is the increase of the blood pressure (in mm Hg) and x is the level of the noise
in decibels. A statistical package gives the following results
37. Corrosion of iron in reinforced concrete is a problem for its sustainability. For a number
of constructions the strength y (in MPa) is measured and also the so-called 'depth of
carbonation' x (in mm), which is an important variable for corrosion. A statistical
package gives the following result
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 428.615 1 ??????? ????? ?????
Residual 131.242 16 ???????
-----------------------------------------------------------------------------
Total (Corr.) 559.858 17
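The entries marked with question marks in the table of exercise 37 follow from the sums of squares and degrees of freedom: each mean square is a sum of squares divided by its degrees of freedom, and the F-ratio is the model mean square divided by the residual mean square. A minimal sketch with illustrative names (the p-value is omitted because it needs an F distribution, which the Python standard library does not provide):

```python
# Values taken from the analysis-of-variance table in exercise 37
ss_model, df_model = 428.615, 1
ss_resid, df_resid = 131.242, 16
ss_total = 559.858

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

# F-ratio for testing whether the slope differs from zero
f_ratio = ms_model / ms_resid

# Proportion of the total variation explained by the model
r_squared = ss_model / ss_total

print(ms_model, round(ms_resid, 4), round(f_ratio, 2), round(r_squared, 4))
```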
38. The height of a tree y (in cm) is measured during a period of 10 years. The years x are
coded as 1, 2, 3, ..., 10. The table is (partially) given here.
Regression analysis (based on these 10 observations) gives the following results ('hoogte' = height
and 'jaar' = year).
The next picture shows the residuals against the number of the year.
39. One wants to optimize a production process. Two factors are important: temperature
and the type of the machine. In an experiment the temperature has two values (60 °C
and 70 °C). There are three types of machines. An observation is done for all possible
combinations. The (coded) observations are
Formulate this problem as a regression model (including cross products between
temperature and machine).
40. One wants to relate the quality of wine to a number of variables: ’Clarity’ (x1 or A),
’Aroma’ (x2 or B), ’Body’ (x3 or C), ’Flavor’ (x4 or D) and ’Oakiness’ (x5 or E);
38 samples of wine have been tested. The dependent variable y is a score which is a
measure for the quality. Consider the model
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon.$$
Analysis of Variance
----------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
----------------------------------------------------------------
Model 111.54 5 22.3081 16.51 ??????
Residual 43.248 32 1.3515
----------------------------------------------------------------
Total (Corr.) 154.788 37
Analysis of Variance
---------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
---------------------------------------------------------------
Model 108.935 3 36.3117 26.92 0.0000
Residual 45.8534 34 1.34863
---------------------------------------------------------------
Total (Corr.) 154.788 37
All possible 32 models (using the 5 regressors) are considered. This gives the following
output
c) If you have to choose one of these models, which model will you choose? Give
arguments.
d) Which variable is the first variable that will be chosen in the model? Why?
e) Must another variable be added to the model? Give arguments.
Consider the model with only the variables ’Aroma’ (x2 ), ’Flavor’ (x4 ) and ’Oakiness’
(x5 ). The output is
f) You buy a bottle of wine with ’Aroma’= 5, ’Flavor’= 5, ’Oakiness’= 4. Give a 95%
prediction interval for the quality of the wine.