Business Statistics: Lecture 1: Course Introduction & Descriptive Statistics
Goals for this Lecture
Contact Information
Course Goals
• Be able to:
• Apply basic statistical methods to business
problems
• Understand more advanced statistical
techniques and how they are properly
applied
• Judge good statistics and statistical
practice from bad
• Know when to call in statistical experts
Course Outline
• Eleven lectures over nine class
meetings:
• Descriptive statistics
• Basic probability
• Confidence intervals
• Hypothesis testing
• Course texts:
Descriptive Statistics
• Numerical
• Mean, median, mode
• Variance, standard deviation, range
• Graphical
• Histograms
• Boxplots
• Scatter plots
Probability
• Basic concepts
• Discrete distributions
• Continuous distributions
• Conditional probability
Inferential Statistics
• Point Estimation
• Interval Estimation
• E.g., confidence intervals
• Hypothesis testing
• Testing sample means and variances
How to Study Statistics
• Do the reading in multiple passes
• First skim for major ideas before the lecture
• After the lecture, go back for details
• Re-read as necessary to solidify concepts
• Do practice problems (homework)!
• Only after first completing reading assignment
• If necessary, make up simple data to see what
equations are doing
• Don’t just depend on your colleagues
to explain the concepts to you…
How Not to Study for this Course
Calvin & Hobbes by Bill Watterson
“Statistics”
• “Statistics” has two uses in English:
• Can mean “a collection of numerical data”
• Also refers to a branch of mathematics that
deals with the analysis of statistical data
• This class is all about the latter
• Though we must use “collections of
numerical data” to do our analyses
Why Study Statistics?
• The world is an uncertain place
• Your company is recruiting a new CEO.
What compensation should you offer?
• What GMAT score do you need to get in to
an MBA program?
• Statistics gives you the tools to make
informed decisions in uncertain
conditions
Statistics Uses Data
Variability
• Statistics is more than tabulating numbers
• Data exhibit variability
• CEOs have different backgrounds, work in different industries, etc.
• Students vary in ability and luck
• Standard statistics question: “Given the data I
have seen, what is the truth likely to be?”
Samples versus Populations
• A population consists of all possible
observations
• Example: All students enrolled in an MBA
program
• A sample is a subset of the population
• Example: Global MBA students are a
sample of all MBA students
• A random sample is a subset not drawn in any systematic way from the population
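Drawing a random sample can be sketched in a few lines of Python. This is only an illustration: the population of 200 student ID numbers, the sample size of 10, and the seed are all made up for the example.

```python
import random

# Hypothetical population: ID numbers for 200 MBA students (made-up values).
population = list(range(1, 201))

# Fix the seed only so the example is reproducible from run to run.
rng = random.Random(42)

# Draw 10 students without replacement. Every subset of size 10 is equally
# likely, so no systematic rule decides who is chosen.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Changing the seed gives a different sample from the same population, which is the variability the rest of the course deals with.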
Samples versus Populations
[Figure: Population and Sample]
Why Sample?
If we could see these:
• The TV viewing preferences for every individual in the US
• The diameter of every shaft ever produced by a manufacturing process
• The proportion of potential customers who know of your product
We wouldn’t need these:
• Nielsen survey of a sample of US television viewers
• The diameters of 100 shafts produced by the same process
• The proportion of individuals in a survey claiming knowledge of your product
A Descriptive Question:
[Figure: Population and Sample]
Data
• Qualitative
• Quantitative: discrete or continuous
Notation
• Capital roman letters usually represent
an unknown quantity
• Example: What is the outcome of a die roll?
• Label this outcome “X”
• X can be 1, 2, 3, 4, 5, or 6
$$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3$$
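The summation notation for adding up X1, X2 and X3 can be sketched in Python. The three die-roll outcomes below are made-up values chosen only for illustration.

```python
# Hypothetical outcomes of three die rolls: X_1, X_2, X_3 (made-up values).
rolls = [2, 5, 3]

# Sigma notation: the sum of X_i for i = 1, 2, 3.
total = sum(rolls)

# The same sum written out term by term: X_1 + X_2 + X_3.
# Python lists are indexed from 0, so X_1 is rolls[0].
written_out = rolls[0] + rolls[1] + rolls[2]

print(total, written_out)
```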
Continuous Data
• Numerical Summaries
• Location: mean, median
• Spread or variability: variance, standard deviation, range, percentiles, quartiles, interquartile range
• Graphical Descriptions (next class)
• Histogram
• Boxplot
• Scatterplot
Sample Mean ($\bar{x}$)
• Sample average or sample mean
• Often denoted by $\bar{x}$ (spoken “x-bar”)
• From previous example:
$$\bar{x} = \frac{1}{3}\sum_{i=1}^{3} x_i = \frac{x_1 + x_2 + x_3}{3}$$
• In general:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Excel tip. Use the built-in function:
=AVERAGE(cell reference)
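For readers working outside Excel, the sample mean can be computed the same way in Python. This is a minimal sketch; the three observations are made-up values.

```python
from statistics import mean

# Hypothetical sample of n = 3 observations (made-up values).
x = [4, 8, 6]
n = len(x)

# x-bar = (1/n) * (x_1 + x_2 + ... + x_n)
x_bar = sum(x) / n

# The standard library's mean() gives the same answer.
print(x_bar, mean(x))
```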
Population Mean ($\mu$)
• Population mean
• Often denoted by $\mu$ (Greek letter “mu”)
• In general:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The Median
• The median is the “typical” value
• Steps to calculate the median:
• Order your data from smallest to largest
• If the number of observations is odd, the middle observation is the median
• Example: 1 3 5 6 12 12 99 (median = 6)
• If the number is even, then the average of the two middle observations is the median
• Example: 1 3 5 6 12 12 (median = 5.5)
Excel tip. Built-in function:
=MEDIAN(cell reference)
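The steps for calculating the median can be sketched as a short Python function and checked against the two data sets on this slide. The function name `median_by_steps` is an illustrative choice, not part of any library.

```python
from statistics import median

def median_by_steps(data):
    """Follow the slide's recipe: sort, then take the middle value(s)."""
    ordered = sorted(data)           # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                   # odd count: the middle observation
        return ordered[mid]
    # even count: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [1, 3, 5, 6, 12, 12, 99]
even_data = [1, 3, 5, 6, 12, 12]

print(median_by_steps(odd_data))     # 6
print(median_by_steps(even_data))    # 5.5
```

The standard library's `statistics.median` returns the same values for both data sets.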
Mean vs. Median
Population Variance ($\sigma^2$)
• Population variance measures data variability too
• For N observations, the population variance is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
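The population variance, the average squared deviation from the population mean, can be checked on a small data set. This is a sketch only; the eight values below are made up and are treated as the entire population.

```python
from statistics import pvariance

# Hypothetical population of N = 8 values (made up for illustration).
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)

# Population mean: sum of all values divided by N.
mu = sum(population) / N

# Population variance: average of the squared deviations from mu.
sigma_sq = sum((x - mu) ** 2 for x in population) / N

# statistics.pvariance() divides by N as well, so it should agree.
print(mu, sigma_sq, pvariance(population))
```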
Standard Deviation (s or $\sigma$)
• The standard deviation is the square root of the variance
$$s = \sqrt{s^2}$$
• Summed: $\sum_{i=1}^{n}(X_i - \bar{X})^2 = 16 + 4 + 4 + 16 = 40$
• Divide by n-1: $\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = 40/3 = 13.3333$
• Standard deviation: SD = $\sqrt{13.333} \approx 3.65$
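The arithmetic in this worked example can be reproduced in Python. The squared deviations come from the slide; the raw data are not shown there, so the data set 1, 3, 7, 9 used for the cross-check is an assumption, chosen only because it produces the same squared deviations.

```python
import math
from statistics import stdev

# Squared deviations from the worked example on the slide.
squared_deviations = [16, 4, 4, 16]
n = len(squared_deviations)

sum_sq = sum(squared_deviations)       # 16 + 4 + 4 + 16
sample_variance = sum_sq / (n - 1)     # divide by n - 1
sd = math.sqrt(sample_variance)        # square root of the variance

# Assumption: one hypothetical data set consistent with those squared
# deviations (mean 5, deviations -4, -2, 2, 4). Not taken from the slide.
hypothetical_data = [1, 3, 7, 9]

print(sum_sq, round(sample_variance, 4), round(sd, 2))
print(round(stdev(hypothetical_data), 2))
```

`statistics.stdev` uses the n-1 divisor, so it matches the hand calculation.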
The Range
• Range is another measure of variability
• Denoted by R
• In words, it is the largest observation in
the sample minus the smallest
observation
• Example: Imagine we collect the ages of
students in the class
• Data: 21, 23, 23, 25, 25, 26, 27, 31, 33, 33, 35
• Range = 35 - 21 = 14
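The same calculation in Python, using the ages listed in the example above:

```python
# Ages of students from the slide's example.
ages = [21, 23, 23, 25, 25, 26, 27, 31, 33, 33, 35]

# Range R = largest observation minus smallest observation.
R = max(ages) - min(ages)

print(R)  # 14
```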
Other Measures of Variation
• Percentiles
• pth percentile: value of x such that p% of
the data is less than or equal to x
• Special Percentiles:
• Max: 100th percentile
• Min: 0th percentile
• Median: 50th percentile
• Quartiles: 25th and 75th percentiles
• Interquartile Range (IQR):
IQR = 75th percentile - 25th percentile
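Quartiles and the IQR can be sketched in Python with the student ages from the Range example. One caveat: software packages use different interpolation rules for percentiles, so JMP or Excel may report slightly different quartiles. The `method="inclusive"` setting below is an assumption made for this illustration, not the course's prescribed rule.

```python
from statistics import quantiles

# Ages of students from the Range example.
ages = [21, 23, 23, 25, 25, 26, 27, 31, 33, 33, 35]

# n=4 asks for the three cut points that split the data into quarters:
# the 25th, 50th and 75th percentiles.
# Assumption: method="inclusive" treats the smallest and largest values as
# the 0th and 100th percentiles; other conventions give other cut points.
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")

# Interquartile range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```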
Categorical Data
• Numerical Measures:
• Mode: most commonly occurring value
• Frequency table: how often each value
occurs
• Graphics (next class):
• Bar chart of frequencies (histogram)
• Mosaic chart (stacked bar chart)
• Pareto chart
Mode
• Mode is the most frequently occurring
value in the sample or population
• It is the “typical” or “common” value
• For example, in the following data
1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 7
the mode is “1”
• “1” occurs 4 times
• All other observations occur fewer than 4 times
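The mode of this example data can be found by counting how often each value occurs. This is a minimal sketch using Python's standard library.

```python
from collections import Counter
from statistics import mode

# Data from the slide's example.
data = [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 7]

# Count how often each value occurs (a one-way frequency table).
counts = Counter(data)

# The most common value and its count.
value, frequency = counts.most_common(1)[0]

print(value, frequency)   # 1 4
print(mode(data))         # 1
```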
Frequency Tables
• Tables of counts by two or more categorical variables
• Example: Executive compensation (Forbes94.jmp)
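A table of counts by two categorical variables can be sketched in Python. The six records below are hypothetical and are not taken from Forbes94.jmp; the variable names (industry, MBA) are likewise made up for illustration.

```python
from collections import Counter

# Hypothetical records: (industry, holds an MBA?) for six executives.
# These values are made up; they are not taken from Forbes94.jmp.
executives = [
    ("Finance", "Yes"),
    ("Finance", "No"),
    ("Retail", "No"),
    ("Finance", "Yes"),
    ("Retail", "Yes"),
    ("Retail", "No"),
]

# Count each (industry, MBA) combination.
table = Counter(executives)

rows = sorted({industry for industry, _ in executives})
cols = sorted({mba for _, mba in executives})

# Print the counts as a small two-way table.
print("Industry".ljust(10) + "".join(c.rjust(5) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(table[(r, c)]).rjust(5) for c in cols))
```

Each cell is the number of records with that combination of categories, and the cells sum to the number of records.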
Introduction to JMP
• Demonstration using GMAT case study (GMAT.jmp)
Remember the Notation
• Summation
• Σ notation and subscripts
• Size
• n denotes size of sample
• N denotes size of population
What We’ve Covered
• Introduced professor & course
• Defined some basic statistics
terminology
• Populations vs. samples
• Descriptive vs. inferential statistics
• Learned about some numerical
descriptive statistics
• Measures of location
• Measures of dispersion
• Introduced JMP