Statistics Interview Questions

Uploaded by sairamesht

Statistics Interview Questions & Answers for Data Scientists

Questions

• Q1: Explain the central limit theorem and give examples of when you can use it in a real-world problem.

• Q2: Briefly explain A/B testing and its applications. What are some common pitfalls encountered in A/B testing?

• Q3: Describe briefly hypothesis testing and the p-value in layman’s terms, and give a practical application for them.

• Q4: Given a left-skewed distribution that has a median of 60, what conclusions can we draw about the mean and the mode of the data?

• Q5: What is the meaning of selection bias and how to avoid it?

• Q6: Explain the long-tailed distribution and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?

• Q7: What is the meaning of KPI in statistics?

• Q8: Say you flip a coin 10 times and observe only one head. What would be the null hypothesis and p-value for testing whether the coin is fair or not?

• Q9: You are testing hundreds of hypotheses, each with a t-test. What considerations would you take into account when doing this?

• Q10: What general conditions must be satisfied for the central limit theorem to hold?

• Q11: What is skewness? Discuss two methods to measure it.

• Q12: You sample from a uniform distribution [0, d] n times. What is your best estimate of d?

• Q13: Discuss the Chi-square, ANOVA, and t-test

Questions & Answers

Q1: Explain the central limit theorem and give examples of when you can use it in a real-world problem.

Answer:
The central limit theorem (CLT) states that if independent samples are drawn from any distribution with finite variance, then, regardless of the shape of that distribution, the sample mean is approximately normally distributed once the sample size is large enough. This allows us to make inferences about the mean of almost any distribution, as long as the sample is large enough.

Important remarks:

1. We can rely on the CLT for means only if summarizing the data by its mean makes sense. That is the case only for unimodal, roughly symmetric data coming from additive processes. For skewed or multi-modal data, mixtures of distributions, data coming from multiplicative processes, and data with non-trivial mean-variance relationships, the arithmetic mean is not a meaningful summary. In those cases, using the CLT (or, e.g., the bootstrap) gives a valid answer to an invalid question.

2. The distribution of the means isn’t enough. Every kind of inference requires the entire test statistic to follow a certain distribution, and the test statistic also includes the estimate of the variance. Never assume that a sample size sufficient for the means will suffice for the entire test statistic. See the attached excerpt from Rand Wilcox. In particular, never believe in magic numbers like N=30.

3. Think first about how to sensibly describe your data, state the hypothesis of interest, and then apply a valid method.

Examples of real-world usage of the CLT:

1. The CLT can be used at any company with a large amount of data. Consider a company like Uber or Lyft that wants to test, using hypothesis testing, whether adding a new feature will increase the number of booked rides. Each individual ride request X is a Bernoulli random variable (the rider either books a ride or does not). With a large number of such requests, the CLT lets us treat the total number of bookings (or the booking rate) as approximately normal and estimate its statistical properties. These estimates are what hypothesis testing needs in order to tell whether the new feature increases the number of booked rides.

2. Manufacturing plants often use the central limit theorem to estimate how many of the products produced by the plant are defective.
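
To make the booking example concrete, here is a minimal simulation sketch. The booking rate `p`, the sample size `n`, and the helper `sample_means` are all illustrative assumptions, not values from the text. It draws many samples of Bernoulli outcomes and checks that the sample means behave as the CLT predicts: centered on `p`, with standard deviation close to sqrt(p(1-p)/n).

```python
import random
import statistics

def sample_means(p, n, trials, seed=0):
    """Return `trials` sample means, each computed over n Bernoulli(p) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

p, n = 0.1, 1000          # hypothetical booking rate and requests per sample
means = sample_means(p, n, trials=1000)

theory_sd = (p * (1 - p) / n) ** 0.5     # standard error predicted by the CLT
within_2sd = sum(abs(m - p) <= 2 * theory_sd for m in means) / len(means)

print(f"mean of sample means: {statistics.mean(means):.4f} (true p = {p})")
print(f"sd of sample means:   {statistics.stdev(means):.4f} (theory {theory_sd:.4f})")
print(f"share within 2 sd:    {within_2sd:.3f} (about 0.95 for a normal distribution)")
```

Even though each individual outcome is 0 or 1, the share of sample means within two standard errors of `p` should come out close to the roughly 95% expected under a normal distribution.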

Q2: Briefly explain A/B testing and its applications. What are some common pitfalls encountered in A/B testing?

A/B testing helps us determine whether a change causes a statistically significant change in performance. In other words, you aim to statistically estimate the impact of a given change within, for example, your digital product. You measure success metrics and counter metrics on at least one treatment group versus one control group (there can be more than one experiment (XP) group in multivariate tests).

Applications:

1. Consider a general store that has sold bread packets, but not butter, for a year. To check whether bread sales depend on butter, suppose the store also starts selling butter and observes sales for the next year. We can then determine whether selling butter significantly increases, decreases, or does not affect the sale of bread.

2. While developing the landing page of a website, you create two different versions of the page. You define a criterion for success, e.g. conversion rate. Then you define your hypotheses. Null hypothesis (H): there is no difference between the performance of the two versions. Alternative hypothesis (H’): version A performs better than version B.

NOTE: You have to split your traffic randomly between the two versions (to avoid sampling bias). The split doesn’t have to be symmetric; you just need to reach the minimum sample size for each version so that the test is not underpowered.

Even if version A gives better results than version B, we still have to show statistically that the results derived from our sample are unlikely to be due to chance, so that they can be generalized to the entire population. A very common test for this is the two-sample t-test, where we compare the p-value against the significance level (alpha) to decide between the hypotheses. If p-value < alpha, H is rejected.
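
The decision rule above can be sketched in a few lines. Note that this sketch swaps in a one-sided two-proportion z-test rather than the two-sample t-test named in the text, since conversion rates are proportions. The visitor and conversion counts, and the function name `two_proportion_z_test`, are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H: p_a == p_b against H': p_a > p_b."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # conversion rate under H
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))       # P(Z >= z) for a standard normal
    return z, p_value

# Hypothetical experiment: 2000 visitors per version.
z, p_value = two_proportion_z_test(conv_a=260, n_a=2000, conv_b=200, n_b=2000)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H: {p_value < alpha}")
```

The normal approximation used here relies on the CLT, so it is only reasonable when both groups are large enough to contain a fair number of conversions and non-conversions.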

Common pitfalls:

1. Wrong success metrics: metrics that are inadequate for the business problem.

2. Lack of counter metrics: a change might add friction to the product alongside its positive impact.

3. Sample mismatch: heterogeneous control and treatment groups, unequal variances.

4. Underpowered test: too small a sample, or an experiment (XP) that runs for too short a time.

5. Not accounting for network effects, which introduce bias into the measurement.

Q3: Describe briefly hypothesis testing and the p-value in layman’s terms, and give a practical application for them.

In layman’s terms:

• A hypothesis test is where you have a current state (null hypothesis) and an alternative state (alternative hypothesis). You assess the results of both states and see some differences. You want to decide whether the difference is due to the alternative approach or simply due to chance.

You use the p-value to decide this. The p-value is the probability of getting results at least as extreme as those the alternative approach achieved if, in reality, nothing had changed and you had kept using the existing approach. In other words, it tells you how far out in the tails your result lies in the distribution of results you could get from the existing approach (this distribution is often, but not always, approximately Gaussian).

The rule of thumb is to reject the null hypothesis if the p-value < 0.05, which means that the probability of getting results this extreme from the existing approach is less than 5%. This threshold changes according to the task and domain.

To explain hypothesis testing in layman’s terms with an example, suppose we have two drugs, A and B, and we want to determine whether they have the same effect or different effects. This procedure of trying to determine whether the drugs are the same or different is called hypothesis testing. The null hypothesis is that the drugs are the same, and the p-value helps us decide whether we should reject the null hypothesis or not.

p-values are numbers between 0 and 1, and in this particular case they help us quantify how strong the evidence is that drug A is different from drug B. The closer the p-value is to 0, the stronger the evidence against the null hypothesis that drugs A and B are the same. Note that the p-value is not the probability that the drugs are the same.

Q4: Given a left-skewed distribution that has a median of 60, what conclusions can we draw about the mean and the mode of the data?

Answer: A left-skewed distribution has its tail on the left and its peak on the right. The mean, which is pulled toward outliers (very large or very small values), is shifted toward the left, in other words toward the tail.

The mode (the most frequent value) lies near the peak, and the median, the middle element of the sorted data, is much less affected by the skew. The median therefore typically falls between the two: it is smaller than the mode and larger than the mean.

Mean < 60 and Mode > 60 (as a rule of thumb)

Figure: The mean, median, and mode for distributions with different skews.
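
As a quick numeric check of this ordering, the sketch below uses a small made-up sample (not data from the text) whose median is 60 and whose tail extends to the left.

```python
import statistics

# Hypothetical left-skewed sample: most values sit near 60-65,
# while a few low values form the left tail.
data = [5, 30, 50, 58, 60, 62, 65, 65, 65]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

For this sample the mean falls below the median of 60 and the mode lies above it, matching the ordering described above.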

Q5: What is the meaning of selection bias and how to avoid it?

Answer:

Selection bias arises when the individuals or observations that end up in a study are not representative of the target population. Its most common form is sampling bias, the phenomenon that occurs when a research study design fails to collect a representative sample of a target population. This typically occurs because the selection criteria for respondents failed to capture a wide enough sampling frame to represent all viewpoints.

Sampling bias is almost always caused by one of two conditions.

1. Poor methodology: In most cases, non-representative samples appear when researchers set improper parameters for survey research. One of the most accurate and repeatable sampling methods is simple random sampling, where a large number of respondents are chosen at random. When researchers stray from random sampling (also called probability sampling), they risk injecting their own selection bias into recruiting respondents.

2. Poor execution: Sometimes researchers craft scientifically sound sampling methods, but their work is undermined when field workers cut corners. By reverting to convenience sampling (where the only people studied are those who are easy to reach) or giving up on reaching non-responders, a field worker can jeopardize the careful methodology set up by data scientists.

The best way to avoid sampling bias is to stick to probability-based sampling methods. These include simple random sampling, systematic sampling, cluster sampling, and stratified sampling. In these methodologies, respondents are chosen only through processes of random selection, even if they are sometimes sorted into demographic groups along the way.
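
As an illustration of one of these methods, here is a minimal sketch of stratified sampling. The helper `stratified_sample`, the two groups, and the 10% sampling fraction are assumptions made up for the example. Respondents are first sorted into groups, and then the same fraction is drawn at random from each group.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, fraction, seed=0):
    """Draw the same fraction at random from every stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))   # keep at least one per stratum
        sample.extend(rng.sample(members, k))        # random selection within the stratum
    return sample

# Hypothetical population: 900 respondents in group "A" and 100 in group "B".
population = [("A", i) for i in range(900)] + [("B", i) for i in range(100)]
sample = stratified_sample(population, key=lambda person: person[0], fraction=0.1)
counts = {g: sum(1 for grp, _ in sample if grp == g) for g in ("A", "B")}
print(counts)  # {'A': 90, 'B': 10}
```

The sample keeps the 9:1 group proportions of the population, which a small simple random sample would only match on average.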

Q6: Explain the long-tailed distribution and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?

Answer: A long-tailed distribution is a type of heavy-tailed distribution that has a tail (or tails) that drops off gradually and asymptotically.

Three examples of relevant phenomena that have long tails:

1. Frequencies of languages spoken

2. Population of cities

3. Pageviews of articles

All of these follow something close to the 80–20 rule: roughly 80% of outcomes (or outputs) result from 20% of all causes (or inputs). The remaining 80% of causes, each of which contributes only a little, form the long tail of the distribution.

It’s important to be mindful of long-tailed distributions in classification and regression problems because the infrequently occurring values, taken together, can make up a large share (sometimes the majority) of the population. This can change the way that you deal with outliers, and it also conflicts with machine learning techniques that assume the data is normally distributed.

Q7: What is the meaning of KPI in statistics?

Answer:

KPI stands for key performance indicator, a quantifiable measure of performance over time for a specific objective. KPIs provide targets for teams to shoot for, milestones to gauge progress, and insights that help people across the organization make better decisions. From finance and HR to marketing and sales, key performance indicators help every area of the business move forward at the strategic level.

KPIs are an important way to ensure your teams are supporting the overall goals of the organization. Here are some of the biggest reasons why you need key performance indicators.

• Keep your teams aligned: Whether measuring project success or employee performance, KPIs keep teams moving in the same direction.

• Provide a health check: Key performance indicators give you a realistic look at the health of your organization, from risk factors to financial indicators.

• Make adjustments: KPIs help you clearly see your successes and failures so you can do more of what’s working, and less of what’s not.

• Hold your teams accountable: Make sure everyone provides value with key performance indicators that help employees track their progress and help managers move things along.

Types of KPIs

Key performance indicators come in many flavors. While some are used to measure monthly progress against a goal, others have a longer-term focus. The one thing all KPIs have in common is that they’re tied to strategic goals. Here’s an overview of some of the most common types of KPIs.

• Strategic: These big-picture key performance indicators monitor organizational goals. Executives typically look to one or two strategic KPIs to find out how the organization is doing at any given time. Examples include return on investment, revenue, and market share.

• Operational: These KPIs typically measure performance in a shorter time frame, and are focused on organizational processes and efficiencies. Some examples include sales by region, average monthly transportation costs, and cost per acquisition (CPA).

• Functional Unit: Many key performance indicators are tied to specific functions, such as finance or IT. While IT might track time to resolution or average uptime, finance KPIs track gross profit margin or return on assets. These functional KPIs can also be classified as strategic or operational.

• Leading vs Lagging: Regardless of the type of key performance indicator you define, you should know the difference between leading indicators and lagging indicators. While leading KPIs can help predict outcomes, lagging KPIs track what has already happened. Organizations use a mix of both to ensure they’re tracking what’s most important.

Q8: Say you flip a coin 10 times and observe only one head. What would be the null hypothesis and p-value for testing whether the coin is fair or not?

Answer:

The null hypothesis is that the coin is fair, and the alternative hypothesis is that the coin is biased. The p-value is the probability of observing a result at least as extreme as the one obtained, given that the null hypothesis is true, in this case that the coin is fair.

In total, for 10 flips of a coin there are 2¹⁰ = 1024 equally likely outcomes. In 10 of them there is exactly one head and nine tails, and in 1 of them there are no heads at all.

Hence the probability of the exact observed result is 10/1024 ≈ 0.0098, and the one-sided p-value (one head or fewer) is 11/1024 ≈ 0.0107. Because the alternative hypothesis is two-sided (the coin may be biased either way), we also count the equally extreme results with nine or more heads, which gives a p-value of 22/1024 ≈ 0.0215. Therefore, with a significance level set, for example, at 0.05, we can reject the null hypothesis.
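
The counting argument can be checked with a short exact binomial calculation. This is a minimal sketch, and the function names are illustrative. The two-sided version sums the probabilities of every outcome that is no more likely than the observed one under a fair coin.

```python
from math import comb

def one_sided_p(heads, flips):
    """P(at most `heads` heads in `flips` flips of a fair coin)."""
    return sum(comb(flips, k) for k in range(heads + 1)) / 2 ** flips

def two_sided_p(heads, flips):
    """Sum of the probabilities of all head counts no more likely than the observed one."""
    counts = [comb(flips, k) for k in range(flips + 1)]
    return sum(c for c in counts if c <= counts[heads]) / 2 ** flips

print(comb(10, 1) / 2 ** 10)   # probability of exactly one head: 10/1024
print(one_sided_p(1, 10))      # 11/1024
print(two_sided_p(1, 10))      # 22/1024
```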

Q9: You are testing hundreds of hypotheses, each with a t-test. What considerations would you take into account when doing this?

Answer:
The main consideration when we run a large number of tests is that the probability of getting at least one significant result due to chance alone increases. This inflates the type 1 error rate (rejecting the null hypothesis when it is actually true).

Therefore we need to account for the multiple comparisons problem, for example with the Bonferroni correction. E.g., if our significance level is 0.05 and we run 100 tests in which every null hypothesis is true, we expect about 5 of them to fall inside the rejection region by chance, and the probability of at least one false positive is 1 − 0.95¹⁰⁰ ≈ 0.99 (for independent tests), not 0.05. So we need to use a stricter per-test significance level, called alpha star = significance level / K, where K is the number of tests. Here alpha star = 0.05 / 100 = 0.0005.
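
A minimal sketch of the correction, assuming independent tests and a made-up list of p-values: it computes the chance of at least one false positive with and without the correction, then applies the per-test threshold alpha / K.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold alpha / K and a reject flag for each p-value."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

alpha, k = 0.05, 100
# Probability of at least one false positive when all K null hypotheses are true
# (these formulas assume the tests are independent).
fwer_uncorrected = 1 - (1 - alpha) ** k
fwer_bonferroni = 1 - (1 - alpha / k) ** k

# Hypothetical p-values: 2 strong effects among 98 unremarkable results.
p_values = [0.0001, 0.0004] + [0.03] * 98
threshold, rejected = bonferroni(p_values, alpha)

print(f"uncorrected chance of a false positive: {fwer_uncorrected:.3f}")
print(f"with Bonferroni correction:             {fwer_bonferroni:.3f}")
print(f"per-test threshold: {threshold}, hypotheses rejected: {sum(rejected)}")
```

Without the correction, all 100 of these p-values would pass the 0.05 cutoff. With it, only the two smallest do.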

Q10: What general conditions must be satisfied for the central limit theorem to hold?

Answer:

In order to apply the central limit theorem, there are four conditions that must be met:

1. Randomization: The data must be sampled randomly, such that every member of the population has an equal probability of being selected for the sample.

2. Independence: The sample values must be independent of each other.

3. The 10% Condition: When the sample is drawn without replacement, the sample size should be no larger than 10% of the population.

4. Large Sample Condition: The sample size needs to be sufficiently large.

Q11: What is skewness? Discuss two methods to measure it.

Answer:

Skewness refers to a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data. If the curve is shifted to the left or to the right, it is said to be skewed. Skewness quantifies the extent to which a given distribution departs from symmetry. There are two main types of skewness: negative skew refers to a longer or fatter tail on the left side of the distribution, while positive skew refers to a longer or fatter tail on the right. These two skews refer to the direction or weight of the tail.

The mean of positively skewed data will typically be greater than the median. In a negatively skewed distribution, the opposite is the case: the mean of negatively skewed data will typically be less than the median. If the data graphs symmetrically, the distribution has zero skewness, regardless of how long or fat the tails are.

There are several ways to measure skewness. Pearson’s first and second coefficients of skewness are two common methods. Pearson’s first coefficient of skewness, or Pearson mode skewness, subtracts the mode from the mean and divides the difference by the standard deviation. Pearson’s second coefficient of skewness, or Pearson median skewness, subtracts the median from the mean, multiplies the difference by three, and divides the product by the standard deviation.
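
Both coefficients translate directly into code. This is a minimal sketch that assumes the sample standard deviation (n − 1 in the denominator) and uses a made-up right-skewed sample; the function names are illustrative.

```python
import statistics

def pearson_first_skewness(data):
    """Pearson mode skewness: (mean - mode) / standard deviation."""
    return (statistics.mean(data) - statistics.mode(data)) / statistics.stdev(data)

def pearson_second_skewness(data):
    """Pearson median skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)

# Hypothetical right-skewed sample: mean 4, median 3, mode 2.
data = [1, 2, 2, 2, 3, 3, 4, 6, 13]

print(f"first coefficient:  {pearson_first_skewness(data):.3f}")
print(f"second coefficient: {pearson_second_skewness(data):.3f}")
```

Both coefficients come out positive for this sample, consistent with its long right tail. For a symmetric sample such as [1, 2, 3, 3, 3, 4, 5] both are zero.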

Q12: You sample from a uniform distribution [0, d] n times. What is your best estimate of d?

Answer:
Intuitively, it is the maximum of the sample points.

[Figure: mathematical proof of this result]
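
A short derivation sketch of why the sample maximum is the natural estimate, written in LaTeX. The notation (x_1, …, x_n for the sample, M for its maximum) is introduced here for illustration and is not taken from the original figure. The sample maximum is the maximum-likelihood estimate, but it is slightly biased downward, and a standard correction rescales it by (n + 1)/n.

```latex
% Likelihood of d given i.i.d. samples x_1, ..., x_n from Uniform[0, d]
L(d) = \prod_{i=1}^{n} \frac{1}{d}\,\mathbf{1}\{0 \le x_i \le d\}
     = \begin{cases} d^{-n}, & d \ge \max_i x_i \\ 0, & \text{otherwise} \end{cases}

% d^{-n} is decreasing in d, so the likelihood is maximized at the smallest admissible d
\hat{d}_{\mathrm{MLE}} = M = \max_i x_i

% Distribution and expectation of the maximum
P(M \le m) = \left(\frac{m}{d}\right)^{n}, \qquad
E[M] = \int_0^d m \cdot \frac{n\,m^{n-1}}{d^{n}}\,dm = \frac{n}{n+1}\,d

% Unbiased correction
\hat{d}_{\mathrm{unbiased}} = \frac{n+1}{n}\,M
```

Which of the two counts as the "best" estimate depends on the criterion: the maximum is the most likely value of d given the data, while the rescaled version is correct on average.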

Q13: Discuss the Chi-square, ANOVA, and t-test

Answer:

Chi-square test: a statistical method used to assess whether the observed frequencies of categorical variables differ from the frequencies expected under a hypothesis, for example the hypothesis that two categorical variables are independent.

Example: A food delivery company wants to find the relationship between gender, location, and food choices of people.

It is used to determine whether an observed association between two categorical variables is:

• Due to chance, or

• Due to a relationship between the variables

Analysis of Variance (ANOVA) is a statistical method that compares the means (averages) of different groups by analyzing the variance between groups relative to the variance within groups. It is used in a range of scenarios to determine whether there is any difference between the means of different groups.

The t-test is a statistical method for comparing the means of two groups of (approximately) normally distributed samples.

It comes in various types, such as:

1. One-sample t-test:

Used to compare the mean of a sample with a known or hypothesized population mean.

2. Two-sample t-test:

Used to compare the means of two independent samples and decide whether their population means are statistically different.

3. Paired t-test:

Used to compare the means of two related sets of measurements taken on the same group (for example, before and after a treatment).
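
To show how the chi-square test compares observed and expected counts, here is a minimal sketch for a test of independence. The contingency table (two locations versus two food choices) is made up, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom, as in this 2×2 case.

```python
import math

def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = two locations, columns = (pizza, salad) orders.
table = [[30, 20], [10, 40]]
stat, dof = chi_square_independence(table)
p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square, valid only for dof == 1

print(f"chi-square = {stat:.3f}, dof = {dof}, p-value = {p_value:.6f}")
```

A small p-value here suggests that the association between location and food choice is unlikely to be due to chance alone. For larger tables, a statistics library would be needed to compute the p-value.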
