Classx DS Unit 3
SCIENCE
Grade X
Chapter 3: Identifying
Patterns
This chapter aims at teaching
students how to identify
partiality, preference & prejudice.
At the end of this chapter,
students should be able to
understand:
1. What is the Data Science term used to describe partiality, preference, and prejudice?
a) Bias
b) Favoritism
c) Influence
d) Unfairness
Answer: a
2. Which of the following is not a type of bias?
a) Selection Bias
b) Linearity Bias
c) Recall Bias
d) Trial Bias
Answer: d
Exercises: Objective Type Questions
3. Which of the following is not a correct statement about probability?
a) It must have a value between 0 and 1
b) It can be reported as a decimal or a fraction
c) A value near 0 means that the event is not likely to occur/happen
d) It is the collection of several experiments
Answer: d
4. The central limit theorem states that the sampling distribution of the sample
mean is approximately normal if
a) All possible samples are selected
b) The sample size is large
c) The standard error of the sampling distribution is small
Answer: b
5. The central limit theorem says that the mean of the sampling distribution of the
sample mean is
a) Equal to the population mean divided by the square root of the sample size
b) Close to the population mean if the sample size is large
c) Exactly equal to the population mean
Answer: c
6. Samples of size 25 are selected from a population with mean 40 and standard
deviation 7.5. The mean of the sampling distribution of the sample mean is
a) 7.5
b) 8
c) 40
Answer: c
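The answer to question 6 can be checked with a small simulation. This is only a sketch: the question does not say how the population is distributed, so a normal population is assumed here, and the function name `simulate_sample_means` and the number of samples (5000) are illustrative choices.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, sample_size, num_samples, seed=0):
    """Draw many samples and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        # Assumption: the population is normal; the question does not say so.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means = simulate_sample_means(pop_mean=40, pop_sd=7.5, sample_size=25, num_samples=5000)

# The mean of the sample means should be close to the population mean, 40.
print("Mean of sample means:", round(statistics.mean(means), 2))
# Their spread should be close to 7.5 / sqrt(25) = 1.5.
print("Standard deviation of sample means:", round(statistics.stdev(means), 2))
```

The first printed value should land near 40, which is why option c is correct, while the second should land near 1.5 rather than 7.5.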
Standard Questions
1. Explain what bias is and why it occurs in data science.
Ans: When we have a special fondness for a particular thing,
we tend to be slightly partial towards it. In most cases this
affects the outcome, shifting it in favor of that thing.
Naturally, this is not the right way of dealing with data on
a larger scale.
This partiality, preference and prejudice towards a set of data
is called bias.
In Data Science, bias is a deviation from the expected
outcome in the data. Fundamentally, bias can also be described
as an error in the data. However, this error is often subtle
and goes unnoticed.
2. Explain Selection Bias with the help of an example.
Ans: Selection bias occurs when the sample data that is gathered is not
representative of the true future population of cases that the model will
see. It often arises when a model itself influences the creation of the
data that is used to train it. This bias occurs mostly in systems that
rank content, such as recommendation systems, polls or personalized
advertisements. This is because user responses are collected only for
the content that is displayed, while the responses for content that is
not displayed remain unknown.
Example: Suppose Apple launches a new iPhone and, on the same day,
Samsung launches a new Galaxy Note. You send out surveys to 1000
people to collect their reviews. Instead of randomly selecting
responses for analysis, you choose the first 100 customers who
responded to your survey. This leads to selection bias, since those
first 100 customers are more likely to be enthusiastic about the
product and to give good reviews.
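The survey example can be sketched as a simulation. Everything in the model below is an assumption made for illustration: each person gets a made-up "enthusiasm" score, more enthusiastic people are assumed to reply sooner and rate higher, and the function names are hypothetical.

```python
import random
import statistics


def simulate_survey(num_people, rng):
    """Return a list of (response_delay_in_days, rating) pairs."""
    people = []
    for _ in range(num_people):
        enthusiasm = rng.random()        # hypothetical score between 0 and 1
        rating = 1 + 4 * enthusiasm      # rating on a 1 to 5 scale
        # Assumption: enthusiastic customers answer the survey sooner.
        delay = 10 * (1 - enthusiasm) + rng.uniform(0, 2)
        people.append((delay, rating))
    return people


def mean_rating(responses):
    return statistics.mean([rating for _, rating in responses])


rng = random.Random(7)
people = simulate_survey(1000, rng)

first_100 = sorted(people)[:100]      # biased: the earliest responders only
random_100 = rng.sample(people, 100)  # unbiased: a simple random sample

print("All 1000 responses :", round(mean_rating(people), 2))
print("First 100 responses:", round(mean_rating(first_100), 2))
print("Random 100 responses:", round(mean_rating(random_100), 2))
```

Under these assumptions the first 100 responders give a noticeably higher average rating than the full group, while the random sample stays close to it.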
3. Explain Recall Bias with the help of an example.
Ans: Recall bias is a type of measurement bias. It is common
at the data labeling stage of a project. This type of bias
occurs when similar data is labeled inconsistently, which
results in lower accuracy. For example, let us say we
have a team labeling images of damaged laptops. The
images are tagged as damaged, partially damaged, or
undamaged. Now, if someone in the team labels an image as
damaged and a similar image as partially damaged, the
data will be inconsistent.
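One way such inconsistency might be detected is to have two people label the same images and compare the results. The sketch below assumes that setup; the image names, the labels, and the function `compare_labels` are all hypothetical.

```python
def compare_labels(first, second):
    """Return (agreement rate, list of images labeled differently)."""
    shared = sorted(first.keys() & second.keys())
    if not shared:
        raise ValueError("the two label sets have no images in common")
    conflicts = [image for image in shared if first[image] != second[image]]
    agreement = 1 - len(conflicts) / len(shared)
    return agreement, conflicts


# Made-up labels from two team members for the same five images.
annotator_a = {
    "laptop_01": "damaged",
    "laptop_02": "partially damaged",
    "laptop_03": "undamaged",
    "laptop_04": "damaged",
    "laptop_05": "undamaged",
}
annotator_b = dict(annotator_a, laptop_04="partially damaged")

agreement, conflicts = compare_labels(annotator_a, annotator_b)
print("Agreement rate:", agreement)
print("Images to review:", conflicts)
```

Images that appear in the conflict list are the ones where the labeling rules probably need to be clarified before the data is used for training.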
Where,
μ = Population mean
σ = Population standard deviation
μx̄ = Mean of the sampling distribution of the sample mean
σx̄ = Standard deviation of the sampling distribution of the sample mean (standard error)
n = Sample size
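These symbols are the ones used in the standard central limit theorem relations. Assuming those are the formulas this legend belongs to, they can be written as follows, with the values from objective question 6 (μ = 40, σ = 7.5, n = 25) as a worked example:

```latex
\mu_{\bar{x}} = \mu,
\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

% Worked example with \mu = 40, \sigma = 7.5, n = 25:
\mu_{\bar{x}} = 40,
\qquad
\sigma_{\bar{x}} = \frac{7.5}{\sqrt{25}} = \frac{7.5}{5} = 1.5
```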
8. What are some real-life applications of the central limit theorem?
Ans: Practical applications of the Central Limit Theorem
include:
1. Opinion polls estimate the share of people who support a
particular election candidate. The confidence intervals
reported with poll results on news channels are calculated
using the Central Limit Theorem.
2. The Central Limit Theorem can also be used to estimate
the mean family income for a specific region.
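The polling case can be illustrated with a short calculation. This sketch uses the common normal-approximation interval for a proportion, which relies on the Central Limit Theorem. The poll numbers (520 supporters out of 1000 respondents) and the function name are made up, and z = 1.96 is the usual multiplier for a 95% interval.

```python
import math


def poll_confidence_interval(supporters, respondents, z=1.96):
    """Approximate confidence interval for the share of supporters."""
    share = supporters / respondents
    # Standard error of a sample proportion.
    standard_error = math.sqrt(share * (1 - share) / respondents)
    margin = z * standard_error
    return share - margin, share + margin


# Hypothetical poll: 520 of 1000 people support the candidate.
low, high = poll_confidence_interval(520, 1000)
print(f"Estimated support: between {low:.1%} and {high:.1%}")
```

With these made-up numbers the interval runs from roughly 48.9% to 55.1%, so the poll alone cannot confirm that the candidate has majority support.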
9. Why is the central limit theorem important?
Ans: The Central Limit Theorem states that, whatever the
distribution of the population (provided it has a finite
variance), the shape of the sampling distribution of the
sample mean approaches normality as the sample size
increases.
This is helpful because a researcher never knows which mean
in the sampling distribution equals the population mean.
However, when many random samples are selected from the
population, the sample means cluster together, allowing
the researcher to make a good estimate of the population
mean.
In addition, as the sample size increases, the standard error
of the sample mean decreases.
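Both claims can be seen in a small experiment with a population that is clearly not normal. The sketch below assumes an exponential population (strongly right-skewed, mean 1) and compares sample sizes 2 and 30. The helper names and the choice of 4000 samples are illustrative.

```python
import random
import statistics


def sample_means(sample_size, num_samples, rng):
    """Draw num_samples samples from a skewed population; return their means."""
    means = []
    for _ in range(num_samples):
        # Assumed population: exponential with rate 1 (mean 1, right-skewed).
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric, bell-shaped distribution."""
    centre = statistics.mean(values)
    spread = statistics.pstdev(values)
    return statistics.mean([((v - centre) / spread) ** 3 for v in values])


rng = random.Random(42)
for n in (2, 30):
    means = sample_means(n, 4000, rng)
    print(
        f"n={n:>2}  mean={statistics.mean(means):.3f}  "
        f"std error={statistics.stdev(means):.3f}  skewness={skewness(means):.3f}"
    )
```

For the larger sample size, the sample means should stay centred near the population mean of 1, spread out less (smaller standard error), and show skewness closer to 0, meaning their distribution looks more like a normal curve.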
10. Coaches of various sports around the world use
probability to improve their game and create game
strategies. Can you explain how probability is applied in this
case and how it helps players?
Ans: Coaches use probability to decide the best possible
strategy to pursue in a game. For example, when a particular
batter goes up to bat in a baseball game, the players and
coach can look up that player’s batting average to estimate
how the player is likely to perform. The coach can then plan
their approach accordingly.
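A batting average is simply the fraction of at-bats that ended in a hit, so it can be read as an estimated probability. The sketch below uses made-up numbers (45 hits in 150 at-bats) and assumes each at-bat is independent of the others, which is a simplification. The function names are hypothetical.

```python
def batting_average(hits, at_bats):
    """Fraction of at-bats that resulted in a hit."""
    return hits / at_bats


def prob_at_least_one_hit(average, at_bats):
    """Chance of one or more hits, assuming independent at-bats."""
    return 1 - (1 - average) ** at_bats


# Hypothetical batter: 45 hits in 150 at-bats.
average = batting_average(45, 150)
print("Batting average:", average)
print("Chance of at least one hit in 4 at-bats:",
      round(prob_at_least_one_hit(average, 4), 4))
```

A coach could compare such figures across batters when deciding the batting order or how to position fielders against a particular player.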
Higher Order Thinking Skills
1. As per reports, in October 2019, researchers found that an
algorithm used on more than 200 million people in US hospitals
to predict which patients would likely need extra medical
care heavily favored white patients over black patients. Can
you reason about what might have caused this bias and
categorize it into the types of bias that you learnt in this
chapter?