Evans - Analytics2e - PPT - 05 Data Modelling
Probability Distributions
and Data Modeling
Basic Concepts of Probability
Probability is the likelihood that an outcome
occurs. Probabilities are expressed as values
between 0 and 1.
An experiment is the process that results in an
outcome.
The outcome of an experiment is a result that
we observe.
The sample space is the collection of all possible outcomes of an experiment.
Example: two respondents each either like or dislike a product, giving four equally likely outcomes:
1. like, like
2. like, dislike
3. dislike, like
4. dislike, dislike
Probability at least one dislikes product = 3/4
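As a quick check, this enumeration can be sketched in Python (the like/dislike labels mirror the four outcomes listed above):

```python
from itertools import product

# Enumerate the sample space: two respondents, each of whom
# either likes or dislikes the product.
sample_space = list(product(["like", "dislike"], repeat=2))
print(len(sample_space))  # 4 equally likely outcomes

# Event: at least one respondent dislikes the product.
event = [o for o in sample_space if "dislike" in o]
p_at_least_one_dislike = len(event) / len(sample_space)
print(p_at_least_one_dislike)  # 0.75
```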
Example 5.2: Relative Frequency
Definition of Probability
Use relative frequencies as probabilities
Probability a computer is repaired in 10 days = 0.076
Probability Rules and Formulas
Label the n outcomes in a sample space as O1, O2, …,
On, where Oi represents the ith outcome in the sample
space. Let P(Oi) be the probability associated with the
outcome Oi.
The probability associated with any outcome must be
between 0 and 1.
0 ≤ P(Oi) ≤ 1 for each outcome Oi (5.1)
The sum of the probabilities over all possible outcomes
must be equal to 1.
P(O1) + P(O2) + … + P(On) = 1 (5.2)
Probabilities Associated with Events
An event is a collection of one or more
outcomes from a sample space.
Rule 1. The probability of any event is the sum of the probabilities of the outcomes that comprise that event.
Rule 2. The probability of the complement of any event A is P(Ac) = 1 − P(A).
Rule 3. If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B).
Dice example: A = {7 or 11}: P(A) = 8/36
Ac = {2, 3, 4, 5, 6, 8, 9, 10, 12}: using Rule 2, P(Ac) = 1 − 8/36 = 28/36
Using Rule 3 with B = {2, 3, or 12}, P(B) = 4/36:
P(A or B) = P(A) + P(B)
= 8/36 + 4/36 = 12/36
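These rules are easy to verify by brute-force enumeration; a minimal Python sketch for the two-dice example:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes (sums) of rolling two dice.
rolls = [a + b for a, b in product(range(1, 7), repeat=2)]

def prob(sums):
    # Rule 1: an event's probability is the sum of the probabilities
    # of its outcomes (each outcome has probability 1/36 here).
    return Fraction(sum(1 for s in rolls if s in sums), 36)

A = {7, 11}
B = {2, 3, 12}
print(prob(A))            # 8/36 = 2/9
print(1 - prob(A))        # Rule 2: P(Ac) = 28/36 = 7/9
print(prob(A) + prob(B))  # Rule 3 (mutually exclusive): 12/36 = 1/3
```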
Non-Mutually Exclusive Events
The notation (A and B) represents the intersection of
events A and B – that is, all outcomes belonging to
both A and B .
Rule 4. If two events A and B are not mutually
exclusive, then P(A or B) = P(A)+ P(B) - P(A and B).
Example 5.6: Computing the Probability
of Non-Mutually Exclusive Events
Dice Example:
A = {2, 3, 12}: P(A) = 4/36
B = {even number} : P(B) = 18/36
(A and B) = {2, 12}: P(A and B) = 2/36
P(A or B) = P(A) + P(B) − P(A and B) = 4/36 + 18/36 − 2/36 = 20/36
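Counting outcomes directly confirms the inclusion-exclusion calculation; a quick Python sketch:

```python
from itertools import product
from fractions import Fraction

# Sums of all 36 equally likely two-dice outcomes.
rolls = [a + b for a, b in product(range(1, 7), repeat=2)]

n_A = sum(1 for s in rolls if s in (2, 3, 12))   # 4 outcomes
n_B = sum(1 for s in rolls if s % 2 == 0)        # 18 outcomes (even sum)
n_AB = sum(1 for s in rolls if s in (2, 12))     # 2 outcomes in both

# Rule 4: P(A or B) = P(A) + P(B) - P(A and B)
p_or = Fraction(n_A + n_B - n_AB, 36)
print(p_or)  # 5/9, i.e., 20/36
```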
Joint probabilities
Example 5.7: Continued
The marginal probabilities for gender and brand
preference are calculated by adding the joint
probabilities across the rows and columns
◦ E.g., the event F, (respondent is female) is comprised of the
outcomes O1, O2, and O3, and therefore P(F) = P(F and B1) +
P(F and B2) + P(F and B3) = 0.37
Marginal probabilities
Joint/Marginal Probability Rule
Calculating marginal probabilities leads to the following probability rule:
P(Ai) = P(Ai and B1) + P(Ai and B2) + … + P(Ai and Bn)
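This rule can be checked with a small joint table in Python. Only P(F and B1) = 0.09, P(M and B1) = 0.25, and the marginals P(F) = 0.37, P(M) = 0.63, P(B1) = 0.34, P(B2) = 0.23 appear in these slides; the remaining cell values below are illustrative numbers chosen to be consistent with those totals:

```python
# Joint probabilities for gender (F/M) x brand (B1/B2/B3).
# P(F and B1) = 0.09 and P(M and B1) = 0.25 are from the slides;
# the other cells are illustrative values consistent with the
# stated marginal probabilities.
joint = {
    ("F", "B1"): 0.09, ("F", "B2"): 0.06, ("F", "B3"): 0.22,
    ("M", "B1"): 0.25, ("M", "B2"): 0.17, ("M", "B3"): 0.21,
}

# Marginal rule: P(F) = P(F and B1) + P(F and B2) + P(F and B3)
p_F = sum(p for (g, _), p in joint.items() if g == "F")
p_B1 = sum(p for (_, b), p in joint.items() if b == "B1")
print(round(p_F, 2), round(p_B1, 2))  # 0.37 0.34
```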
Example 5.7 Continued
Events F and M are mutually exclusive, as are events B1, B2, and B3
since a respondent may be only male or female and prefer exactly
one of the three brands. We can use Rule 3 to find, for example,
P(B1 or B2) = 0.34 + 0.23 = 0.57.
Events F and B1, however, are not mutually exclusive because a
respondent can be both female and prefer brand 1. Therefore, using
Rule 4, we have P(F or B1) = P(F) + P(B1) – P(F and B1) = 0.37 +
0.34 – 0.09 = 0.62.
Conditional Probability
Conditional probability is the probability of
occurrence of one event A, given that another
event B is known to be true or has already
occurred.
Example 5.8 Computing a Conditional
Probability in a Cross-Tabulation
Suppose we know a respondent is male. What is the probability that
he prefers Brand 1?
Using cross-tabulation: Of 63 males, 25 prefer Brand 1, so the
probability of preferring Brand 1 given that a respondent is male =
25/63
Using joint probability table: divide the joint probability 0.25 (the
probability that the respondent is male and prefers brand 1) by the
marginal probability 0.63 (the probability that the respondent is male).
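Both routes give the same number; in Python:

```python
# Cross-tabulation counts: of 63 male respondents, 25 prefer Brand 1.
p_b1_given_m_counts = 25 / 63

# Joint-probability route: P(B1|M) = P(M and B1) / P(M)
p_b1_given_m_joint = 0.25 / 0.63

print(round(p_b1_given_m_counts, 3))  # 0.397
print(round(p_b1_given_m_joint, 3))   # 0.397
```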
Example 5.9: Conditional Probability in
Marketing
Apple Purchase History
The PivotTable shows the count of the
type of second purchase given that
each product was purchased first.
Probability of purchasing an
iPad given that a customer already
purchased an iMac = 2/13
Conditional Probability Formula
The conditional probability of an event A given that event B is known to have occurred is
P(A|B) = P(A and B) / P(B)
For Example 5.8: P(B1|M) = P(M and B1) / P(M) = 0.25 / 0.63 = 0.397
A back-of-the-envelope expected value calculation would have easily predicted the winner.
Deal or No Deal
Contestant had 5 briefcases left with $100, $400, $1000, $50,000 or
$300,000 in them.
Expected value of briefcases is $70,300.
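The expected value is just the probability-weighted sum of the amounts; in Python:

```python
# Remaining briefcase amounts, each equally likely (probability 1/5).
amounts = [100, 400, 1_000, 50_000, 300_000]
expected_value = sum(amounts) / len(amounts)
print(expected_value)  # 70300.0
```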
Bernoulli Distribution
Models a single trial with two outcomes: success (x = 1) with probability p, and failure (x = 0) with probability 1 − p.
E[X] = p
Var[X] = p(1 − p)
Example 5.24: Using the Bernoulli
Distribution
The Bernoulli distribution can be used to model whether
an individual responds positively (x = 1), or negatively
(x = 0) to a telemarketing promotion.
For example, if you estimate that 20% of customers
contacted will make a purchase, the probability
distribution that describes whether or not a particular
individual makes a purchase is Bernoulli with p = 0.2
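Simulation gives a feel for these properties (a sketch; the sample size and seed are arbitrary choices):

```python
import random

random.seed(1)

# Simulate many Bernoulli(p = 0.2) trials: 1 = purchase, 0 = no purchase.
p = 0.2
trials = [1 if random.random() < p else 0 for _ in range(100_000)]

mean = sum(trials) / len(trials)
var = sum((x - mean) ** 2 for x in trials) / len(trials)
print(round(mean, 2), round(var, 2))  # close to E[X] = 0.2, Var[X] = 0.16
```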
Binomial Distribution
Models n independent replications of a Bernoulli experiment, each
with a probability p of success.
◦ X represents the number of successes in these n experiments
Probability mass function:
f(x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, 2, …, n
Excel function:
=BINOM.DIST(number_s, trials, probability_s, cumulative)
If cumulative is set to TRUE, then this function will provide
cumulative probabilities; otherwise the default is FALSE, and it
provides values of the probability mass function, f(x).
Example 5.26: Using Excel’s Binomial
Distribution Function
The probability that exactly 3 of 10 individuals will make
a purchase is P(x = 3): =BINOM.DIST(3,10,0.2,FALSE) =
0.20133
The probability that 3 or fewer of 10 individuals will make
a purchase is P(x ≤ 3): =BINOM.DIST(3,10,0.2,TRUE)
= 0.87913
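The same numbers can be reproduced without Excel; a Python sketch of the pmf and its cumulative sum:

```python
from math import comb

def binom_pmf(x, n, p):
    # f(x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    # Cumulative probability P(X <= x)
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

print(round(binom_pmf(3, 10, 0.2), 5))  # 0.20133 (exactly 3 purchases)
print(round(binom_cdf(3, 10, 0.2), 5))  # 0.87913 (3 or fewer purchases)
```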
Shapes and Skewness of the Binomial
Distribution
The binomial distribution is symmetric when
p = 0.5; positively skewed when p < 0.5,
and negatively skewed when p > 0.5.
Example of a negatively skewed distribution (p > 0.5)
Poisson Distribution
Models the number of occurrences in some unit of
measure (often time or distance).
There is no limit on the number of occurrences.
The average number of occurrence per unit is a constant
denoted as λ.
Probability mass function:
f(x) = (λ^x e^(−λ)) / x!, for x = 0, 1, 2, …
Excel function:
◦ =POISSON.DIST(x, mean, cumulative)
Exponential Distribution
Models the time between randomly occurring events.
Mean = µ = 1/λ
Excel function:
◦ =EXPON.DIST(x, lambda, cumulative)
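The Poisson pmf, f(x) = e^(−λ) λ^x / x!, is easy to evaluate directly (a sketch; the λ = 3 example value is arbitrary):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # f(x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

# e.g., probability of exactly 2 occurrences when lambda = 3 per unit
print(round(poisson_pmf(2, 3), 4))  # 0.224

# The pmf sums to 1 over all nonnegative x (Equation 5.2).
print(round(sum(poisson_pmf(k, 3) for k in range(100)), 6))  # 1.0
```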
If the number of events
occurring during an interval of
time has a Poisson distribution,
then the time between events is
exponentially distributed.
Example 5.34: Using the Exponential
Distribution
The mean time to failure of a critical engine component is µ = 8,000
hours. What is the probability of failing before 5000 hours?
P(X < x) =EXPON.DIST(x, lambda, cumulative)
λ = 1/8000
P(X < 5000) =EXPON.DIST(5000, 1/8000, TRUE)
= 0.4647
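The Excel result matches the closed-form exponential CDF, P(X < x) = 1 − e^(−λx); in Python:

```python
from math import exp

def expon_cdf(x, lam):
    # P(X < x) = 1 - e^(-lambda * x),
    # equivalent to =EXPON.DIST(x, lambda, TRUE)
    return 1 - exp(-lam * x)

print(round(expon_cdf(5000, 1 / 8000), 4))  # 0.4647
```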
Other Useful Distributions
Triangular Distribution
Lognormal Distribution
Beta Distribution
Random Sampling from Probability Distributions
◦ It is not clear what the distribution might be. It does not appear to
be exponential, but it might be lognormal or another distribution.
Goodness of Fit
A better approach than simply visually examining a
histogram and summary statistics is to analytically fit the
data to the best type of probability distribution.
Three statistics measure goodness of fit:
◦ Chi-square (need at least 50 data points)
◦ Kolmogorov-Smirnov (works well for small samples)
◦ Anderson-Darling (puts more weight on the differences between
the tails of the distributions)
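Analytic Solver Platform computes these fit statistics internally. As a rough sketch of what the Kolmogorov-Smirnov statistic measures, the largest gap between the empirical CDF and the fitted CDF, here is a stdlib-only Python version against a fitted exponential (the sample service times below are hypothetical):

```python
from math import exp

def ks_stat_exponential(data, lam):
    # One-sample Kolmogorov-Smirnov statistic against an
    # exponential distribution with rate lam.
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fitted = 1 - exp(-lam * x)          # fitted exponential CDF
        d = max(d,
                abs((i + 1) / n - fitted),  # empirical CDF just after x
                abs(i / n - fitted))        # empirical CDF just before x
    return d

# Hypothetical service times (minutes); rate estimated as 1 / sample mean.
times = [1.2, 0.4, 2.7, 0.9, 3.5, 1.1, 0.2, 4.8, 1.9, 0.6]
lam_hat = len(times) / sum(times)
print(round(ks_stat_exponential(times, lam_hat), 3))
```

A small statistic indicates a close fit; fitting software compares it against critical values to rank candidate distributions.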
Analytic Solver Platform has the capability of fitting a
probability distribution to data.
Example 5.42: Fitting a Distribution to
Airport Service Times
1. Highlight the data
Analytic Solver Platform >
Tools > Fit
2. Fit Options dialog
Type: Continuous
Test: Kolmogorov-Smirnov
Click Fit button
Example 5.42 Continued
The best-fitting distribution is called an Erlang
distribution.