Probability Distribution
Probability distributions are theoretical. They’re idealized versions of frequency distributions that
aim to describe the population the sample was drawn from.
Random variables:
Variables that follow a probability distribution are called random variables. There’s special
notation you can use to say that a random variable follows a specific distribution:
X ~ N(µ, σ²) means “the random variable X follows a normal distribution with a mean of µ and a variance of σ².”
Types of distributions:
1. Discrete distribution:
a. Discrete probability distributions only include the probabilities of values that are
possible.
b. The probabilities of all possible values in a discrete probability distribution add up to one.
2. Continuous distribution:
a. A continuous variable can take any value between its lowest and highest values. Therefore, continuous probability distributions include every number in the variable’s range.
b. You can determine the probability that a value will fall within a certain interval by calculating the area under the curve within that interval.
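As a sketch of the area-under-the-curve idea, the simplest continuous case is the uniform distribution, where the density is flat and the area over an interval is just the interval’s width divided by the full range. The wait-time numbers here are hypothetical, not from the notes:

```python
# Probability that a continuous uniform variable falls in an interval
# equals the area under its flat density over that interval.
# Hypothetical example: a wait time uniform on [0, 60] seconds.

def uniform_interval_prob(a, b, low, high):
    """Area under the uniform density on [low, high] between a and b."""
    a = max(a, low)       # clip the interval to the variable's range
    b = min(b, high)
    if b <= a:
        return 0.0
    return (b - a) / (high - low)

# P(10 <= wait <= 30) for a wait uniform on [0, 60]:
print(uniform_interval_prob(10, 30, 0, 60))  # 1/3
```

The whole range has area 1, matching the rule that all probabilities in a distribution sum to one.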
Distribution         Probability function           Notes / example
Continuous uniform   Probability density function   Equal-sized intervals have equal probability,
                                                    e.g. the amount of time a car waits at a red light
Discrete uniform     Probability mass function
Poisson              Probability mass function
Normal               Probability density function
Exponential          Probability density function
NORMAL DISTRIBUTION:
In a normal distribution, data is symmetrically distributed with no skew. When plotted on a
graph, the data follows a bell shape, with most values clustering around a central region and
tapering off as they go further away from the centre.
Normal distributions are also called Gaussian distributions or bell curves because of their
shape.
Properties of the normal distribution:
The empirical rule, or the 68-95-99.7 rule, tells you where most of your values lie in a
normal distribution:
Around 68% of values are within 1 standard deviation from the mean.
Around 95% of values are within 2 standard deviations from the mean.
Around 99.7% of values are within 3 standard deviations from the mean.
Example:
The data follows a normal distribution with a mean score (M) of 1150 and a standard
deviation (SD) of 150.
Around 68% of scores are between 1,000 and 1,300, 1 standard deviation above and
below the mean.
Around 95% of scores are between 850 and 1,450, 2 standard deviations above and
below the mean.
Around 99.7% of scores are between 700 and 1,600, 3 standard deviations above and
below the mean.
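The empirical-rule ranges above can be checked with a few lines of arithmetic, using the mean of 1150 and standard deviation of 150 from the example:

```python
# Empirical rule (68-95-99.7) intervals for the SAT example:
# mean 1150, standard deviation 150.
mean, sd = 1150, 150

for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    low, high = mean - k * sd, mean + k * sd
    print(f"~{pct}% of scores lie between {low} and {high}")
# ~68%  -> 1000 and 1300
# ~95%  -> 850 and 1450
# ~99.7% -> 700 and 1600
```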
The central limit theorem is the basis for how normal distributions work in statistics.
In research, to get a good idea of a population’s mean, ideally you’d collect data from multiple
random samples within the population. A sampling distribution of the mean is the
distribution of the means of these different samples.
Law of Large Numbers: As you increase the sample size (or the number of samples), the
sample mean approaches the population mean.
With multiple large samples, the sampling distribution of the mean is normally distributed,
even if your original variable is not normally distributed.
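A quick simulation illustrates this. The setup below is a hypothetical example, not from the notes: the underlying variable is exponential (strongly skewed), yet the means of many large samples cluster symmetrically around the population mean, with spread close to σ/√n:

```python
import random
import statistics

# Hypothetical simulation: draw many large samples from a skewed
# (exponential) variable with population mean 2.0 and look at the
# distribution of the sample means.
random.seed(0)
pop_mean = 2.0
n = 100            # size of each sample ("large": n >= 30)
num_samples = 2000

sample_means = [
    statistics.mean(random.expovariate(1 / pop_mean) for _ in range(n))
    for _ in range(num_samples)
]

grand_mean = statistics.mean(sample_means)   # close to 2.0
spread = statistics.stdev(sample_means)      # close to sigma/sqrt(n) = 2/10 = 0.2
print(round(grand_mean, 2), round(spread, 2))
```

Plotting `sample_means` as a histogram would show the familiar bell shape even though the original exponential variable has none.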
Parametric statistical tests typically assume that samples come from normally distributed
populations, but the central limit theorem means that this assumption isn’t necessary to meet
when you have a large enough sample.
You can use parametric tests for large samples from populations with any kind of distribution
as long as other assumptions are met. A sample size of 30 or more is generally considered
large.
For small samples, the assumption of normality is important because the sampling
distribution of the mean isn’t known. For accurate results, you have to be sure that the
population is normally distributed before you can use parametric tests with small samples.
The probability density function of the normal distribution is:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

where:
f(x) = probability density
x = value of the variable
μ = mean
σ = standard deviation
σ² = variance
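The density formula translates directly into code. This is a minimal sketch using only the standard library; the evaluation points reuse the SAT example’s mean and standard deviation:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The density peaks at the mean, at 1 / (sigma * sqrt(2*pi)):
print(normal_pdf(1150, 1150, 150))
# Symmetry: points one SD below and above the mean have equal density.
print(normal_pdf(1000, 1150, 150), normal_pdf(1300, 1150, 150))
```

Note that f(x) is a density, not a probability: probabilities come from the area under this curve over an interval.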
Standard Normal Distribution:
The standard normal distribution, also called the z-distribution, is a special normal distribution
where the mean is 0 and the standard deviation is 1.
Every normal distribution can be converted to the standard normal distribution by turning the
individual values into z-scores.
Z-scores tell you how many standard deviations away from the mean each value lies.
Z-score formula:

z = (x − μ) / σ

where:
x = individual value
μ = mean
σ = standard deviation
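Converting values to z-scores is a one-line calculation. The sketch below applies it to the SAT example’s mean of 1150 and standard deviation of 150:

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Using the SAT example (mean 1150, SD 150):
for x in [1000, 1150, 1300]:
    print(x, z_score(x, 1150, 150))  # -1.0, 0.0, 1.0
```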
Example:
To find the probability of SAT scores in your sample exceeding 1380, you first find the z-score.
The mean of our distribution is 1150, and the standard deviation is 150. The z-score tells you
how many standard deviations away 1380 is from the mean: z = (1380 − 1150) / 150 ≈ 1.53.
For a z-score of 1.53, the p-value is 0.937. This is the probability of SAT scores being 1380 or less
(93.7%), and it’s the area under the curve to the left of the shaded area.
To find the shaded area, subtract 0.937 from 1 (the total area under the curve): 1 − 0.937 = 0.063.
That means only about 6.3% of SAT scores in your sample exceed 1380.
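The whole calculation can be reproduced without z-tables. A minimal sketch using the standard library’s error function, which gives the standard normal CDF exactly:

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, via the error function (no external libraries)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 1150, 150
z = (1380 - mean) / sd          # about 1.53
p_below = std_normal_cdf(z)     # about 0.937: P(score <= 1380)
p_above = 1 - p_below           # about 0.063: P(score > 1380)
print(round(z, 2), round(p_below, 3), round(p_above, 3))
```

The erf-based value is slightly more precise than the two-decimal z-table lookup, but it agrees with the 93.7% and 6.3% figures above.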