Statistics Interview Questions
Questions
Answers:
The central limit theorem states that if any random variable with
finite variance, regardless of its distribution, is sampled a large
enough number of times, the sample mean will be approximately normally
distributed. This allows for studying the sample mean of any such
distribution as long as the sample size is large enough.
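The statement above can be checked with a small simulation. This is a minimal sketch, not a proof: the exponential distribution, the sample size of 50, the seed, and the helper names `sample_means` and `draw_exponential` are all illustrative assumptions.

```python
import random
import statistics

def sample_means(draw, sample_size, n_samples, seed=0):
    """Return the means of n_samples independent samples of size sample_size."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def draw_exponential(rng):
    # Exponential distribution with rate 1: strongly right-skewed,
    # with mean 1 and standard deviation 1.
    return rng.expovariate(1.0)

means = sample_means(draw_exponential, sample_size=50, n_samples=2000)

# The CLT predicts that the sample means cluster around the population
# mean (1) with standard deviation close to 1 / sqrt(50).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std of sample means: ", round(statistics.stdev(means), 3))
print("CLT prediction:      ", round(1 / 50 ** 0.5, 3))
```

Even though single draws are far from normal, a histogram of `means` should look roughly bell-shaped.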
Important remark:
Applications:
NOTE: You will have to split your traffic randomly (to avoid sampling
bias) into two versions. The split doesn’t have to be symmetric; you
just need to set a minimum sample size for each version so that
neither version is undersampled.
Now if version A gives better results than version B, we still have
to check statistically that the results derived from our sample are
likely to hold for the entire population. One very common test used to
do so is the two-sample t-test, where we compare the p-value with the
significance level (alpha) to decide between the hypotheses. If
p-value < alpha, the null hypothesis H₀ is rejected.
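The decision rule can be sketched in code. The example below rests on several assumptions: it uses Welch's variant of the two-sample t-test (unequal variances), approximates the p-value with the standard normal distribution, which is reasonable only for large samples, and runs on simulated data. The function name `welch_t_test_approx` and all numbers are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def welch_t_test_approx(a, b):
    """Welch's t statistic with a two-sided, large-sample normal p-value."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    t = (mean_a - mean_b) / se
    # Assumption: for large samples the t distribution is close to normal.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Simulated metric (for example, minutes per session) for each version.
rng = random.Random(42)
version_a = [rng.gauss(11.0, 2.0) for _ in range(500)]
version_b = [rng.gauss(10.0, 2.0) for _ in range(500)]

alpha = 0.05
t_stat, p_value = welch_t_test_approx(version_a, version_b)
print("reject H0" if p_value < alpha else "fail to reject H0")
```

For small samples, an exact t-distribution p-value from a statistics library would be the safer choice.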
Common pitfalls:
In Layman’s terms:
The rule of thumb is to reject the null hypothesis if the p-value <
0.05, which means that the probability of getting results at least
this extreme under the existing approach (the null hypothesis) is
below 5%. But this threshold changes according to task and domain.
The mode (the most frequent value) will be near the peak, while the
median is the middle element regardless of the skewness of the
distribution. The median therefore falls between the two: it will be
smaller than the mode and larger than the mean.
The mean, median, and mode for distributions with different skews.
Answer:
2. Population of cities
3. Pageviews of articles
Answer:
KPIs are an important way to ensure your teams are supporting the
overall goals of the organization. Here are some of the biggest
reasons why you need key performance indicators.
Answer:
The null hypothesis is that the coin is fair, and the alternative
hypothesis is that the coin is biased. The p-value is the probability of
observing results at least as extreme as those obtained, given that the
null hypothesis is true, in this case, that the coin is fair.
In total for 10 flips of a coin, there are 2¹⁰ = 1024 possible outcomes,
and in only 10 of them are there 9 tails and one head.
Hence, the exact probability of the given result is 10/1024 = 0.0098.
Counting the single outcome with 10 tails as well gives a one-sided
p-value of 11/1024 = 0.0107, and allowing for bias in either direction
doubles it to 22/1024 = 0.0215. Therefore, with a significance level
set, for example, at 0.05, we can reject the null hypothesis.
Answer:
The main consideration when we have a large number of tests is that
the probability of getting a significant result due to chance alone
increases. This inflates the overall type I error rate (rejecting the
null hypothesis when it’s actually true).
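To make the effect concrete, the sketch below computes the chance of at least one false positive across many tests. It assumes the tests are independent. The Bonferroni correction shown is one common remedy and is not named in the text above; the function names are illustrative.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

alpha, n_tests = 0.05, 20

uncorrected = family_wise_error_rate(alpha, n_tests)
corrected = family_wise_error_rate(bonferroni_alpha(alpha, n_tests), n_tests)

print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"with Bonferroni correction:         {corrected:.3f}")
```

With 20 tests at alpha = 0.05, the uncorrected chance of a spurious significant result is well over one half, while the corrected rate stays below 0.05.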
Answer:
In order to apply the central limit theorem, there are four conditions
that must be met:
Answer:
The mean of positively skewed data will be greater than the median.
In a negatively skewed distribution, the exact opposite is the case:
the mean of negatively skewed data will be less than the median. If
the data graphs symmetrically, the distribution has zero skewness,
regardless of how long or fat the tails are.
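These relationships can be checked on small hand-made samples. The data below is invented for illustration, and `sample_skewness` is a hypothetical helper that computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second).

```python
import statistics

def sample_skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean((x - mean) ** 2 for x in data)
    m3 = statistics.fmean((x - mean) ** 3 for x in data)
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]  # long tail to the right
left_skewed = [-x for x in right_skewed]        # mirror image
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right", right_skewed),
                   ("left", left_skewed),
                   ("symmetric", symmetric)]:
    print(name,
          "mean:", statistics.fmean(data),
          "median:", statistics.median(data),
          "skewness:", round(sample_skewness(data), 3))
```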
Answer:
Intuitively, it is the maximum of the sample points. The
mathematical proof is in the figure below:
Q13: Discuss the Chi-square, ANOVA, and t-test
Answer:
Due to relationship
3. Paired t-test:
Used to compare the means of two related sets of measurements taken
on the same group (for example, before and after a treatment).
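A paired t-test reduces to a one-sample test on the per-subject differences. The sketch below computes only the t statistic and degrees of freedom. The before/after scores are invented, and `paired_t_statistic` is a hypothetical helper name.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the differences within each pair.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.fmean(diffs)
    # Standard error of the mean difference (sample std, n - 1 denominator).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se, len(diffs) - 1

# Hypothetical scores for the same six subjects before and after training.
before = [72, 68, 75, 80, 66, 71]
after = [75, 70, 76, 85, 69, 72]

t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The resulting statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value.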