PSLP notes

1. Which of the following relations is correct for a negatively skewed distribution?
(a) Mean=Mode=Median
(b) Mean>Median>Mode
(c) Mode>Median>Mean
(d) Mean>Mode=Median
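Solution: (c)
Explanation: In a negatively skewed distribution the long left tail pulls the mean below the median, and the
median in turn typically lies below the mode, so Mode > Median > Mean.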
2. In the symmetric covariance matrix:
(a) Diagonal elements must be positive and other elements are always zero.
(b) Diagonal elements can never be negative and other elements are always positive.
(c) Diagonal elements can never be negative and other elements can be negative or positive.
(d) Diagonal elements can be negative and positive and other elements are always negative.
Solution: (c)
Explanation: In a covariance matrix, the diagonal entries are the covariance of each variable with itself,
which equals the variance of that variable (the square of its standard deviation). Since a variance can never
be negative, the diagonal entries are never negative. The off-diagonal entries are covariances between
different variables, which can be positive, negative, or zero.
3. The presence of outliers in a dataset does not affect:
(a) Standard deviation
(b) Range
(c) Mean
(d) Inter-quartile Range(IQR)
Solution: (d)
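Explanation: The IQR is the range of the middle 50% of the data (third quartile minus first quartile). Because
it ignores the lowest 25% and the highest 25% of values, it is largely unaffected by outliers, whereas the
mean, standard deviation, and range all depend on the extreme values.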
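A quick numerical check can make this concrete. The sketch below uses a made-up sample and replaces its largest value with an extreme one; `spread_summary` is a hypothetical helper name, and the quartiles come from the standard library's default (exclusive) method.

```python
import statistics

def spread_summary(values):
    """Return mean, standard deviation, range and IQR of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

clean = [10, 12, 13, 14, 15, 16, 18, 20]   # illustrative data
with_outlier = clean[:-1] + [200]          # largest value replaced by an outlier

before = spread_summary(clean)
after = spread_summary(with_outlier)
for name in before:
    print(f"{name:>6}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

In this toy sample the IQR stays the same while the other three measures grow; with other data the IQR may shift slightly, but far less than the rest.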
4. If X and Y are independent random variables, then which of the following is TRUE?
(a) E(XY)=E(X)E(Y) [ E represents Expectation value ]
(b) Cov(X,Y)=0 [ Cov represents covariance between variables ]
(c) Var(X+Y)=Var(X)+Var(Y) [ Var represents variance ]
(d) All of the above
Solution: (d)
Explanation: If X and Y are independent, then E(XY) = E(X)E(Y), so Cov(X,Y) = E(XY) - E(X)E(Y) = 0 and
Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y) = Var(X) + Var(Y).
5. For a normal distribution Z, which option is TRUE?
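(a) Coefficient of skewness (E(Z^3)) = 0
(b) E(Z) = 0 ; E(Z^2) = Var(Z) = 1
(c) Kurtosis (E(Z^4)) = 3
(d) Its density is symmetric about the mean.
Solution: (d)
Explanation: Every normal density is symmetric about its mean. The equalities in (a), (b), and (c), as
written in terms of raw moments, hold for the standard normal distribution (mean 0, variance 1) but not for
a general normal distribution.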
6. Let X and Y be normal random variables with their respective means 3 and 4 and variances
9 and 16, then 2X-Y will have normal distribution with parameters:
(a) Mean=2 and Variance=52
(b) Mean=0 and Variance=1
(c) Mean=2 and Variance=1
(d) None of the above
Solution: (d)
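Hint: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y). The mean is 2(3) - 4 = 2, but the variance is
52 - 4Cov(X,Y), which cannot be determined because Cov(X,Y) is not given (independence is not stated).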
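The hint's formula is easy to evaluate for different covariances. The following is a minimal sketch; `var_linear_combination` is a hypothetical helper, and the covariance value 6 is an arbitrary assumption used only to show that the answer changes.

```python
def var_linear_combination(a, b, var_x, var_y, cov_xy=0.0):
    """Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

# Question 6: 2X - Y with Var(X) = 9 and Var(Y) = 16.
print(var_linear_combination(2, -1, 9, 16))            # if X and Y were independent
print(var_linear_combination(2, -1, 9, 16, cov_xy=6))  # assumed Cov(X, Y) = 6
```

The two calls give different variances, which is why the covariance must be known before the distribution of 2X-Y can be stated.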
7. Suppose X and Y take values in {0,1} and are independent with P(X=1)=1/2 and P(Y=1)=1/3.
What is P(X+Y=1)?
(a) 5/18
(b) 1/2
(c) 5/6
(d) 1/6
Solution: (b)
Explanation: P(X+Y=1) = P(X=0)P(Y=1) + P(X=1)P(Y=0) = (1/2)(1/3) + (1/2)(2/3) = 1/2.
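The same result can be obtained by enumerating the four joint outcomes. This is a small sketch using exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product

p_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_y = {0: Fraction(2, 3), 1: Fraction(1, 3)}

# Independence lets us multiply the marginals to get each joint probability.
p_sum_is_one = sum(
    p_x[x] * p_y[y] for x, y in product(p_x, p_y) if x + y == 1
)
print(p_sum_is_one)  # 1/2
```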
8. Let X and Y be random variables with E(X)=μ/2 and E(Y)=μ. Which one of the following is TRUE?
(a) g = X+Y is an unbiased estimator of μ
(b) g = X+Y is a biased estimator of μ with bias equal to μ
(c) h = X+(Y/2) is an unbiased estimator of μ
(d) h = X+(Y/2) is a biased estimator of μ with bias equal to μ/2
Solution: (c)
Explanation: E(g) = E(X+Y) = E(X) + E(Y) = 3μ/2, so Bias(g) = E(g) - μ = μ/2.
E(h) = E(X+(Y/2)) = E(X) + (1/2)E(Y) = μ, so Bias(h) = E(h) - μ = 0.
9. Suppose that X takes values between 0 and 1 and has probability density function (PDF)
f(x) = 2x. Then the variance of X^2 is:
(a) 1/12
(b) 1/18
(c) 1/6
(d) 5/18
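Solution: (a)
Hint: Use Var(X^2) = E(X^4) - (E(X^2))^2, where E(X^k) = ∫ x^k (2x) dx over [0, 1] = 2/(k+2).
This gives E(X^2) = 1/2 and E(X^4) = 1/3, so Var(X^2) = 1/3 - 1/4 = 1/12.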
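The arithmetic can be verified with exact fractions. In this sketch, `moment` is a hypothetical helper that encodes the closed form of the integral of x^k times 2x over [0, 1].

```python
from fractions import Fraction

def moment(k):
    """E(X^k) for density f(x) = 2x on [0, 1]: integral of 2 x^(k+1) = 2 / (k + 2)."""
    return Fraction(2, k + 2)

var_x_squared = moment(4) - moment(2) ** 2
print(var_x_squared)  # 1/12
```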
10. For random variables X and Y, we have Var(X)=1, Var(Y)=4, and Var(2X-3Y)=34, then the
correlation between X and Y is:
(a) 1/2
(b) 1/4
(c) 1/3
(d) None of the above
Solution: (b)
Explanation: Var(2X-3Y) = 4Var(X) + 9Var(Y) - 12Cov(X, Y)
= 4(1) + 9(4) - 12Cov(X, Y) = 34
∴ Cov(X, Y) = 1/2, and Corr(X, Y) = Cov(X, Y) / (Std(X) Std(Y)) = (1/2) / (1 × 2) = 1/4
11. A fair die is rolled repeatedly until a number larger than 4 is observed. If K is the total
number of times that the die is rolled, then P(K=4) is equal to:
(a) 16/81
(b) 8/81
(c) 8/27
(d) 16/27
Solution: (b)
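Explanation: Each roll is larger than 4 with probability 2/6 = 1/3. K=4 means the first three rolls are at
most 4 and the fourth roll is larger than 4, so P(K=4) = (2/3)^3 (1/3) = 8/81.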
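K follows a geometric distribution, so the same number comes out of its probability mass function. A minimal sketch, with `geometric_pmf` as an illustrative helper name:

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(first success occurs on trial k) for success probability p."""
    return (1 - p) ** (k - 1) * p

p_success = Fraction(2, 6)  # rolling a 5 or a 6 on a fair die
print(geometric_pmf(4, p_success))  # 8/81
```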
12. Let X and Y be independent uniform (0, 1) random variables. Define A=X+Y and B=X-Y.
Then,
(a) A and B are independent random variables
(b) A and B are uncorrelated random variables
(c) A and B are both uniform (0, 1) random variables.
(d) None of these
Solution: (b)
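Explanation: Cov(X+Y, X-Y) = Cov(X, X) - Cov(X, Y) + Cov(Y, X) - Cov(Y, Y) = Var(X) - Var(Y) = 0, since X and
Y have the same variance. A and B are not independent, however: for example, A close to 2 forces both X and Y
to be close to 1, and hence B to be close to 0.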
13. If g is a point estimator of X, then the mean squared error (MSE) of g is:
(a) Variance(g) + Bias(g)
(b) Variance(g) + Bias(g^2)
(c) Variance(g) + (Bias(g))^2
(d) Variance(g^2) + Bias(g)
Solution: (c)
Explanation: MSE(g) = E[(g-X)^2] = Var(g-X) + (E[g-X])^2 = Var(g) + (Bias(g))^2, because X is a fixed
quantity (so Var(g-X) = Var(g)) and Bias(g) = E[g] - X.
14. Let X and Y be two random variables and let a, b, c, d be real numbers, then which one
of the following is FALSE?
(a) Cov(X+b, Y+d) = Cov(X, Y)
(b) Cov(aX, cY) = ac*Cov(X, Y)
(c) Cov(aX+b, cY+d) = ac*Cov(X, Y)
(d) Corr(aX+b, cY+d) = ac*Corr(X, Y) for a,c>0
Solution: (d)
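Explanation: Correlation is unchanged by shifting and positive scaling. For a, c > 0,
Corr(aX+b, cY+d) = ac Cov(X, Y) / (a Std(X) · c Std(Y)) = Corr(X, Y), so the factor ac in (d) is wrong.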
15. Let X and Y be jointly(bivariate) normal with Var(X) = Var(Y), then:
(a) X+Y and X-Y are jointly normal
(b) X+Y and X-Y are uncorrelated
(c) X+Y and X-Y are independent
(d) All of the above
Solution: (d)
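Explanation: If X and Y are bivariate normal, then any pair of linear combinations of X and Y, such as X+Y
and X-Y, is also jointly normal. Cov(X+Y, X-Y) = Var(X) - Var(Y) = 0, so they are uncorrelated, and jointly
normal variables that are uncorrelated are independent.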
16. Let X1, X2, ..., Xn be a random sample from a distribution with E(Xi) = μ and
Var(Xi) = σ^2. Now, consider two estimators of μ:
g1 = X1 and g2 = X’ = (X1 + X2 + ... + Xn)/n
Which of these estimators has the higher mean squared error (MSE)?
(a) g1
(b) g2
(c) Same for both g1 and g2
(d) None of the above
Solution: (a)
Explanation: MSE(g1) = E[(g1 - μ)^2] = E[(X1 - E(X1))^2] = Var(X1) = σ^2
MSE(g2) = E[(g2 - μ)^2] = E[(X’ - μ)^2] = Var(X’ - μ) + (E[X’ - μ])^2 = Var(X’) = σ^2/n
For n > 1, σ^2/n < σ^2, so g1 has the higher MSE.
17. A random sample of n=6 taken from the population has the elements 6, 10, 13, 14 ,18, 20.
Then, which option is False?
(a) Point estimate for population mean is 13.5
(b) Point estimate for population standard deviation is 4.68
(c) Point estimate for population standard deviation is 3.5
(d) Point estimate for standard error of mean is 1.91
Solution: (c)
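Explanation: Point estimate of the population mean (sample mean X’) = Σ Xi/n = 81/6 = 13.5
Point estimate of the population standard deviation (S) = sqrt( (Σ Xi^2/n) - (Σ Xi/n)^2 ) = sqrt(1225/6 - 13.5^2) = 4.68
Standard error of the mean = S/sqrt(n) = 4.68/sqrt(6) = 1.91
(With the divisor n-1 instead of n, S = 5.13; either way, 3.5 is not the estimate.)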
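These figures can be reproduced with the standard library. The sketch below computes the standard deviation with both divisors, since the explanation uses the divisor n while many textbooks use n - 1; the variable names are illustrative.

```python
import math
import statistics

sample = [6, 10, 13, 14, 18, 20]

mean = statistics.mean(sample)
sd_n = statistics.pstdev(sample)          # divisor n, as in the explanation
sd_n_minus_1 = statistics.stdev(sample)   # divisor n - 1 (unbiased variance)
se = sd_n / math.sqrt(len(sample))        # standard error of the mean

print(mean, round(sd_n, 2), round(sd_n_minus_1, 2), round(se, 2))
```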
18. True or False: If the Pearson’s correlation between 2 variables is zero, then they are
necessarily independent.
Solution: False
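Explanation: Correlation measures only linear dependence between the variables. Zero correlation rules out a
linear relationship but not a nonlinear one: for example, if X is standard normal and Y = X^2, then
Cov(X, Y) = E(X^3) = 0 even though Y is completely determined by X.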
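A small discrete example shows zero covariance alongside obvious dependence. This sketch assumes X is uniform on {-1, 0, 1} and Y = X^2, a construction chosen purely for illustration.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X.
support = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in support)
e_y = sum(p * x**2 for x in support)
e_xy = sum(p * x * x**2 for x in support)
cov_xy = e_xy - e_x * e_y

# Dependence check: P(X=0, Y=0) versus P(X=0) * P(Y=0).
p_joint = p          # X = 0 forces Y = 0, so the joint probability is 1/3
p_product = p * p    # P(X=0) = 1/3 and P(Y=0) = 1/3
print(cov_xy, p_joint == p_product)  # 0 False
```

The covariance is exactly zero, yet the joint probability does not factor into the product of the marginals, so X and Y are dependent.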
19. True or False: Let g be an unbiased estimator of X and U be a random variable with zero
mean, then h=g+U is also unbiased for X.
Solution: True.
Explanation: E(h) = E(g) + E(U) = X + 0 = X (∵ E(g) = X because g is unbiased, and E(U) = 0)
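20. True or False: Let X and Y be two independent standard normal random variables and
T = XY^2 + X + 1 and P = X - 3, then Cov(T, P) = 1.
Solution: False.
Hint: Use the properties mentioned in Question 14. Dropping the constants,
Cov(T, P) = Cov(XY^2, X) + Var(X) = E(X^2)E(Y^2) - E(X)E(Y^2)E(X) + 1 = 1 - 0 + 1 = 2.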
21. True or False: Let X have a normal distribution with parameters μ and σ^2, then X^2 follows
a chi-square distribution with 1 degree of freedom.
Solution: False.
Explanation: For the statement to be true, X must be standard normal (μ=0, σ^2=1). In general, it is
((X-μ)/σ)^2 that follows a chi-square distribution with 1 degree of freedom.
22. True or False: If the characteristic function of a random variable exists, then its
expectation and variance will also exist.
Solution: False.
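Hint: The characteristic function exists for every random variable, including ones with no mean or variance
(for example, the Cauchy distribution). It is the existence of the moment generating function (MGF) in a
neighborhood of zero that guarantees that all moments exist.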
23. True or False: Let X have a uniform distribution U(a, b) such that E(X)=2 and Var(X)=3/4, then
P(X<1)=1/6.
Solution: True.
Explanation: E(X) = (a+b)/2 = 2 ⇒ a+b = 4 ; Var(X) = (b-a)^2/12 = 3/4 ⇒ b-a = 3 ⇒ X ~ U(0.5, 3.5),
so P(X<1) = (1 - 0.5)/(3.5 - 0.5) = 1/6.
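The two moment equations can be solved mechanically. The following is a rough sketch; `uniform_bounds` is a hypothetical helper that inverts the mean and variance formulas of U(a, b).

```python
import math

def uniform_bounds(mean, variance):
    """Solve (a + b) / 2 = mean and (b - a)^2 / 12 = variance for a < b."""
    width = math.sqrt(12 * variance)
    return mean - width / 2, mean + width / 2

a, b = uniform_bounds(2, 0.75)
p_below_1 = (1 - a) / (b - a)   # uniform CDF at x = 1, valid because a <= 1 <= b
print(a, b, p_below_1)
```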
24. True or False: The correlation coefficient between X+Y and X-Y, where X and Y are
independent random variables with variances 36 and 16 respectively is 6/13.
Solution: False.
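Explanation: Corr(X+Y, X-Y) = Cov(X+Y, X-Y) / (Std(X+Y) Std(X-Y)) [Std = standard deviation]. Here
Cov(X+Y, X-Y) = Var(X) - Var(Y) = 20 and Var(X+Y) = Var(X-Y) = 52, so the correlation is 20/52 = 5/13, not 6/13.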
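25. True or False: In interval estimation, as the confidence level increases, the margin of error
decreases.
Solution: False.
Explanation: The confidence interval is defined as X’ ± Z(s/√n), so the margin of error is Z(s/√n). A higher
confidence level requires a larger critical value Z, which widens the interval.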
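To see how the margin of error responds to the confidence level, one can evaluate Z(s/√n) for a few levels. This sketch assumes a two-sided interval based on the standard normal distribution; `margin_of_error` and the values s = 10, n = 100 are illustrative.

```python
from statistics import NormalDist

def margin_of_error(confidence, s, n):
    """z * s / sqrt(n), with z the two-sided standard normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / n ** 0.5

for level in (0.90, 0.95, 0.99):
    print(level, round(margin_of_error(level, s=10, n=100), 3))
```

The printed margins grow as the level moves from 90% to 99%, while increasing n would shrink them.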
26. What is the difference between quantitative and qualitative data?

Quantitative data is data defined by a numeric value such as a count or range—for example, a person’s
height in cm. Qualitative data is described as a quality or characteristic and is usually presented in words.
For example, using words like ‘tall’ or ‘short’ to describe a person’s height.
27. What is the Central Limit Theorem?
The Central Limit Theorem states that as the sample size gets larger, the distribution of the sample mean
approaches a normal distribution, regardless of the shape of the population distribution (provided the
population has finite variance). The standard error of the sample mean, σ/√n, also shrinks as the sample
size n increases.
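A simulation gives a feel for how sample means behave as n grows. The sketch below draws from a skewed Exponential(1) population (an arbitrary choice for illustration, with mean 1 and standard deviation 1) and reports the mean and spread of the sample means; `sample_means` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_means(n, repeats=2000):
    """Means of `repeats` samples of size n from an Exponential(1) population."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(repeats)]

for n in (1, 10, 100):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

The spread of the sample means should fall roughly like 1/√n, and a histogram of `means` for n = 100 would look close to a bell curve even though the population is strongly skewed.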
28. What Is Hypothesis Testing?
We’ve already touched on this topic with some of the previous statistics and probability interview
questions. But since it’s a fundamental part of data analysis, we wish to cover it in more detail.
Hypothesis testing allows us to evaluate a hypothesis about the population based on sample data. It is
conducted as follows:
First, we formulate a null hypothesis (H0), which assumes no difference or relationship between the variables.
For each null hypothesis, there is an alternative hypothesis stating the opposite. If H0 is rejected, the
alternative hypothesis is supported.
Next, we choose an appropriate statistical test and compute its p-value: the probability, assuming H0 is
true, of observing data at least as extreme as the sample. If the p-value is below a predetermined
significance level, we reject H0.
29. Explain the process of bootstrapping.
When only a limited sample of the population is available, bootstrapping resamples repeatedly, with
replacement, from that observed sample. The sample mean (or any other statistic) varies from resample to
resample, and the collection of these values approximates the sampling distribution of the statistic.
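The procedure can be sketched in a few lines. This example assumes resamples of the same size as the original sample and uses a simple percentile interval; `bootstrap_means` is a hypothetical helper, and the data are reused from Question 17 purely for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed for reproducibility

def bootstrap_means(sample, n_resamples=1000):
    """Resample with replacement (same size as the sample) and record each mean."""
    return [statistics.fmean(random.choices(sample, k=len(sample)))
            for _ in range(n_resamples)]

sample = [6, 10, 13, 14, 18, 20]
means = sorted(bootstrap_means(sample))
# Percentile interval: the middle 95% of the bootstrap means.
lower, upper = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
print(round(statistics.fmean(means), 2), (lower, upper))
```

The sorted list `means` is the bootstrap approximation to the sampling distribution of the mean; other statistics (median, standard deviation) can be bootstrapped by swapping the function applied to each resample.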
30. List three ways to mitigate overfitting.
L1 and L2 regularization
Collect more samples
Using K-fold cross-validation instead of a regular train-test split
31. How do you deal with missing data?
There are several ways you can handle missing data based on the number of missing values and type of
variable:
Deleting missing values

Imputing missing values with the mean/median/mode
Building a machine learning model to predict the missing value based on other values in the dataset
Replacing missing values with a constant
32. What Are the Main Measures of Variability?
Variability measures are also crucial in describing data distribution. They show how spread-out data points
are and how far away they are from the mean.
Some basic questions during a statistics interview might require you to explain the meaning and usage of
variability measures. Here’s your cheat sheet:
Variance measures the average squared distance of data points from the mean. A small variance
corresponds to a narrow spread of the values, while a big variance implies that data points are far
from the mean.
Standard deviation is the square root of the variance. It shows the amount of variation of values in a
dataset.
Range is the difference between the maximum and minimum data value. It’s a good indicator of
variability when there are no outliers in a dataset, but when there are, it can be misleading.
Interquartile range (IQR) measures the spread of the middle part of a dataset. It’s essentially the
difference between the third and the first quartile.
33. What is A/B testing? Explain with an example.
A/B testing is a randomized experiment used to compare two versions of a user experience. For
example, a company wants to test two versions of its landing page with different backgrounds to
understand which version drives more conversions. A controlled experiment is created in which visitors are
randomly assigned to one of the two variations, and the conversion rates of the two groups are compared.
34. Explain three different types of sampling techniques.

1. Simple random sampling: The individual is selected from the true population entirely by chance, and
every individual has an equal opportunity to get selected.
2. Stratified sampling: The population is first divided into multiple strata that share similar characteristics,
and a random sample is drawn from each stratum, commonly in proportion to its size. This ensures that every
sub-group is represented.
3. Systematic sampling: Individuals are selected from the sampling frame at regular intervals. For
example, every 10th member is selected from the sampling frame. This is one of the easiest sampling
techniques but can introduce bias into the sample population if there is an underlying pattern in the
true population.
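The three techniques can be sketched on a toy population. Everything here is made up for illustration: the population of 100 records, the two groups "A" and "B", and the choice of proportional allocation (10% of each stratum) for the stratified step, which is one common option among several.

```python
import random

random.seed(2)  # fixed seed for reproducibility
population = [{"id": i, "group": "A" if i < 80 else "B"} for i in range(100)]

# 1. Simple random sampling: every individual is equally likely to be chosen.
simple = random.sample(population, k=10)

# 2. Stratified sampling (proportional allocation): sample 10% within each stratum.
strata = {}
for person in population:
    strata.setdefault(person["group"], []).append(person)
stratified = [p for members in strata.values()
              for p in random.sample(members, k=len(members) // 10)]

# 3. Systematic sampling: random start, then every 10th individual.
start = random.randrange(10)
systematic = population[start::10]

print(len(simple), len(stratified), len(systematic))  # 10 10 10
```

Only the stratified sample is guaranteed to contain members of both groups; the simple random sample could, by chance, miss the smaller group entirely.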
35. What Are Skewness and Kurtosis?
Next on our list of statistics questions for a data science interview are the measures of the shape of data
distribution: skewness and kurtosis.
Let’s start with the former.
Skewness is an excellent way to measure the symmetry of distribution and the likelihood of a given value
falling in the tails. With symmetrical distribution, the mean and median coincide. If the data distribution
isn’t symmetrical, it’s skewed.
There are two types of skewness:
Positive is when the right tail is longer. Most values are clustered around the left tail, and the median
is smaller than the mean.
Negative is when the left tail is longer. Most values are clustered around the right tail, and the
median is greater than the mean.
Kurtosis, on the other hand, reveals how heavy or light-tailed data is compared to the normal distribution.
There are three types of kurtosis:
Mesokurtic distributions approximate a normal distribution.
Leptokurtic distributions have a pointy shape and heavy tails, indicating a high probability of extreme
events occurring.
Platykurtic distributions have a flat shape and light tails. They reveal a low probability of the
occurrence of extreme events.
Knowing the meaning and calculations of these measures may be enough for an entry-level job. But statistics
interview questions for advanced data science positions may revolve around using these concepts in practice.
36. Explain the terms confidence interval and confidence level.
A confidence interval is a range of values, computed from sample data, that is likely to contain the true
population parameter. The confidence level (for example, 95% or 99%) is the proportion of such intervals
that would contain the true parameter if samples were taken repeatedly and an interval computed from each.
37. What is a p-value?
A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null
hypothesis is true. We set a significance threshold (commonly 0.05) before testing, and if the p-value falls
below this threshold, the observed result would be unlikely under the null hypothesis. This gives us enough
evidence to reject the null hypothesis.
38. What is standardization? Under which circumstances should data be standardized?
Standardization is the process of putting different variables on the same scale. Each value is transformed to
z = (x - mean) / standard deviation, so that the variable has a mean of 0 and a standard deviation of 1. The
shape of the distribution does not change.
Standardizing data can give us a better idea of extreme outliers, as it is easy to identify values that are
more than 2 to 3 standard deviations away from the mean. Standardization is also used as a pre-processing
technique before feeding data into machine learning models so that variables measured on different scales
contribute comparably.
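As a minimal sketch of the z-score transformation, the example below standardizes a made-up list of heights and flags values more than 2 standard deviations from the mean. The helper name `standardize`, the data, and the cutoff of 2 are all assumptions for illustration.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

heights_cm = [150, 160, 165, 170, 175, 180, 210]   # made-up values
z = standardize(heights_cm)
flagged = [x for x, score in zip(heights_cm, z) if abs(score) > 2]
print([round(score, 2) for score in z], flagged)
```

After the transformation the z-scores have mean 0 and standard deviation 1, but their histogram has the same shape as the original data.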
39. What are some properties of a normal distribution? Give some examples of data points
that follow a normal distribution.
The mean, median, and mode of a normal distribution are equal.
The distribution is symmetric about the mean: there is a 50% probability that a value will fall below the
mean, and a 50% probability that it will fall above.
The total area under the curve is 1.
Example: Values like a population’s height and IQ are approximately normally distributed.
40. What Is the Normal Distribution?
Normal distribution is a central concept in mathematics and data analysis. As such, it often appears in
statistics interview questions.
The normal (or Gaussian) distribution is the most important probability distribution in statistics. It’s often
called a “bell curve” because of its shape: tall in the middle, flat toward the ends.
41. What is a correlation coefficient?
A correlation coefficient is an indicator of how strong the linear relationship between two variables is. It
ranges from -1 to +1: a coefficient near +1 indicates a strong positive correlation, a coefficient of 0
indicates no linear correlation, and a coefficient near -1 indicates a strong negative correlation.
42. What is covariance?
Covariance is a statistical term that describes how two random variables vary together, that is, whether
a change in one variable tends to accompany a change in the other. The covariance value can range from -∞
to +∞, with a negative value indicating an inverse relationship and a positive value indicating a direct
relationship. Its magnitude depends on the units of the variables, so it does not by itself measure the
strength of the relationship; the correlation coefficient is used for that.
43. What Is A Covariance Matrix?
A covariance matrix is a square matrix whose diagonal entries are the variances of the individual variables
and whose off-diagonal entries are the covariances between each pair of variables. Variance measures how far
the values of a variable spread from its mean. Covariance between two variables measures how the two
variables fluctuate together.
44. What Is A Correlation Matrix?
A correlation matrix can be defined as a matrix with correlation coefficients among different variables. The
connection between the two variables is represented by each cell in the table. A correlation matrix can be
used to summarize data, as an input to a more advanced analysis, or as a diagnostic for further studies.
45. Explain the Difference Between Probability Distribution and Sampling Distribution
As noted, you may be asked various statistics interview questions regarding sampling and the
generalizability of results. The difference between probability and sampling distribution is just one example.
A probability distribution is a function used to calculate the probability of a random variable X taking
different values. There are two main types depending on the variable: discrete and continuous.
Examples of the former are the binomial and Poisson distributions, and of the latter: normal and
uniform distributions.
A sampling distribution is the probability distribution of a statistic based on a range of random samples
from a population. The definition sounds confusing, but it’s encountered often in practice.
For example, imagine you’re a clinical data analyst working on developing a new treatment for patients
with Alzheimer’s. You’ll likely be working with samples from the entire population of individuals with the
disease. So, you’ll use the sampling distribution during the data analysis.
