Math Written Report: Group 4
It then issues statistical reports that indicate changes and trends in the U.S. population. For instance, according
to The World Factbook, published by the Central Intelligence Agency (CIA), in 2015 there were
approximately 105 males for every 100 females between the ages of 15 and 24.
However, in the category of people 65 years old and older, there were approximately 79
men for every 100 women.
STATISTICS
BRANCHES OF STATISTICS
Descriptive Statistics - is the branch of statistics that involves the collection, organization,
summarization and presentation of data.
Inferential Statistics - is the branch of statistics that interprets and draws conclusions from the
data.
One of the most basic statistical concepts involves finding measures of central tendency of a set
of numerical data. It is often helpful to find numerical values that locate, in some sense, the
center of a set of data.
Suppose Elle is a senior at a university. In a few months she plans to graduate and start a
career as a landscape architect. A survey of five landscape architects from last year's
senior class shows that they received job offers with the following yearly salaries.
$43,750   $39,500   $38,000   $41,250   $44,000
• Before Elle interviews for a job, she wishes to determine an average of these 5 salaries.
This average should be a “central” number around which the salaries cluster. We will
consider three types of averages, known as the arithmetic mean, the median and the
mode. Each of these averages is a measure of central tendency for the numerical data.
• The arithmetic mean is the most commonly used measure of central tendency. The
arithmetic mean of a set of numbers is often referred to as simply the mean.
• To find the mean for a set of data, find the sum of the data values and divide by
the number of data values. For instance, to find the mean of the 5 salaries listed,
Elle would divide the sum of the salaries by 5.
$$\text{mean} = \frac{\$43{,}750 + \$39{,}500 + \$38{,}000 + \$41{,}250 + \$44{,}000}{5} = \frac{\$206{,}500}{5} = \$41{,}300$$
• In statistics it is often necessary to find the sum of a set of numbers. The traditional
symbol used to indicate a summation is the Greek letter sigma, Σ.
• Thus the notation Σx, called summation notation, denotes the sum of all the
numbers in a given set. We can define the mean using summation notation.
• The mean of n numbers is the sum of the numbers divided by n.
$$\text{Mean} = \frac{\sum x}{n}$$
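The summation definition can be checked on Elle's five salaries with a short Python sketch; the function name `mean` is an illustrative choice, not something defined in the report.

```python
def mean(values):
    """Arithmetic mean: the sum of the data values (Σx) divided by their count (n)."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Yearly salary offers (in dollars) from Elle's survey of five landscape architects
salaries = [43_750, 39_500, 38_000, 41_250, 44_000]
print(mean(salaries))  # 41300.0
```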
The Median
• Another type of average is the median. Essentially, the median is the middle number or
the mean of two middle numbers in a list of numbers that have been arranged in
numerical order from smallest to largest or largest to smallest. Any list of numbers that is
arranged in numerical order from smallest to largest or largest to smallest is a ranked
list.
MEDIAN
The median of a ranked list of n numbers is:
• the middle number if n is odd
• the mean of the two middle numbers if n is even.
Find a Median
Find the median of the data in the following lists.
a.) 4,8,1,14,9,21,12 b.) 46,23,92,89,77,108
Solution:
a.) The list 4,8,1,14,9,21,12 contains 7 numbers. The median of a list with an odd number of
entries is found by ranking the numbers and finding the middle number.
Ranking the numbers from smallest to largest gives
1,4,8,9,12,14,21
The middle number is 9. Thus 9 is the median.
b.) The list 46, 23, 92, 89, 77, 108 contains 6 numbers. The median of a list of data with an even
number of entries is found by ranking the numbers and computing the mean of the two middle
numbers. Ranking the numbers from smallest to largest gives
23, 46, 77, 89, 92, 108
The two middle numbers are 77 and 89. The mean of 77 and 89 is 83. Thus 83 is the median of the
data.
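The two-case median rule translates directly into code. This is a minimal sketch (the name `median` is illustrative) that ranks the list and then applies the odd or even case:

```python
def median(values):
    """Median of a list: rank it, then take the middle number (odd n)
    or the mean of the two middle numbers (even n)."""
    ranked = sorted(values)  # build the ranked list, smallest to largest
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd n: the middle number
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: mean of the two middle numbers


print(median([4, 8, 1, 14, 9, 21, 12]))   # 9
print(median([46, 23, 92, 89, 77, 108]))  # 83.0
```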
The Mode
• A third type of average is the mode.
• The mode of a list of numbers is the number that occurs most frequently.
– Some lists of numbers do not have a mode. For instance, in the list
1, 6, 8, 10, 32, 15, 49, each number occurs exactly once. Because no number occurs
more often than the other numbers, there is no mode.
– A list of numerical data can have more than one mode. For instance, in the list
4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7, the number 2 occurs three times and the number 9 occurs
three times. Each of the other numbers occurs fewer than three times. Thus 2 and 9
are both modes for the data.
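A small Python sketch of finding the mode(s), using `collections.Counter` to tally how often each number occurs. The tie rule is an assumption that generalizes the "each number occurs exactly once" case: if every distinct value occurs equally often, the function reports no mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    frequencies = set(counts.values())
    # Assumed convention: if every distinct value occurs equally often (and there is
    # more than one distinct value), no number occurs more often than the others.
    if len(counts) > 1 and len(frequencies) == 1:
        return []
    highest = max(frequencies)
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 6, 8, 10, 32, 15, 49]))             # []
print(modes([4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7]))  # [2, 9]
```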
THE RANGE
The range of a set of data values is the difference between the greatest data value and the
least data value.
TABLE 4.5 Soda dispensed (ounces)
Machine 1    Machine 2
9.52         8.01
6.41         7.99
10.07        7.95
5.85         8.03
8.15         8.02
x̄ = 8.0      x̄ = 8.0
EX. 1
• Find the range of the numbers of ounces dispensed by machine 1 in table 4.5
SOLUTION:
• Greatest number of ounces = 10.07
• Least number of ounces = 5.85
• Range = 10.07 − 5.85 = 4.22 oz
– The range of the numbers of ounces dispensed by machine 1 is 4.22 oz.
EX. 2
• Find the range of the numbers of ounces dispensed by machine 2 in Table 4.5.
SOLUTION:
Greatest number of ounces dispensed = 8.03
Least number of ounces dispensed = 7.95
Range = 8.03 − 7.95 = 0.08 oz
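Both ranges can be reproduced with a few lines of Python. The sketch below (the helper name `data_range` is illustrative) also prints each machine's mean, which shows that the two machines share a mean of 8.0 oz but differ sharply in range.

```python
def data_range(values):
    """Range: greatest data value minus least data value."""
    return max(values) - min(values)


# Ounces dispensed, from Table 4.5
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

for name, data in (("Machine 1", machine_1), ("Machine 2", machine_2)):
    average = sum(data) / len(data)
    # Rounding to 2 decimals hides floating-point noise from the sums and subtraction
    print(name, "mean:", round(average, 2), "range:", round(data_range(data), 2))
```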
• Find the standard deviation of the sample 2, 4, 7, 12, 15.
SOLUTION:
STEP 1: The mean of the numbers is
$$\bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8$$
STEP 2: For each number, calculate the deviation between the number and the mean.
x     x − x̄
2     2 − 8 = −6
4     4 − 8 = −4
7     7 − 8 = −1
12    12 − 8 = 4
15    15 − 8 = 7
STEP 3: Calculate the square of each deviation in Step 2, and find the sum of these
squared deviations.
x     x − x̄         (x − x̄)²
2     2 − 8 = −6    (−6)² = 36
4     4 − 8 = −4    (−4)² = 16
7     7 − 8 = −1    (−1)² = 1
12    12 − 8 = 4    4² = 16
15    15 − 8 = 7    7² = 49
                    Sum = 118
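STEP 4: Because we have a sample of n = 5 values, divide the sum 118 by n − 1, which is
4.
$$s^2 = \frac{118}{4} = 29.5$$
This quotient is the sample variance.
STEP 5: The standard deviation is the square root of the variance: s = √29.5 ≈ 5.43.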
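The steps above can be sketched in Python as follows; the function name is illustrative. The standard deviation is taken as the square root of the sample variance, and the result is cross-checked against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics


def sample_standard_deviation(values):
    """Mean, deviations, sum of squared deviations, divide by n - 1, then square root."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    x_bar = sum(values) / n                   # Step 1: the mean
    deviations = [x - x_bar for x in values]  # Step 2: x - x̄ for each value
    sum_sq = sum(d * d for d in deviations)   # Step 3: sum of squared deviations
    variance = sum_sq / (n - 1)               # Step 4: divide by n - 1
    return math.sqrt(variance)                # square root of the variance gives s


data = [2, 4, 7, 12, 15]
s = sample_standard_deviation(data)
print(round(s ** 2, 4), round(s, 2))  # 29.5 5.43
# Cross-check against the standard library's sample standard deviation
assert math.isclose(s, statistics.stdev(data))
```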
z-Score
The number of standard deviations between a data value and the mean is known as the
data value's z-score or standard score.
The z-score for a given data value x is the number of standard deviations that x is above
or below the mean of the data. The following formulas show how to calculate the z-score
for a data value x in a population and in a sample.
• Population: $z_x = \dfrac{x - \mu}{\sigma}$
• Sample: $z_x = \dfrac{x - \bar{x}}{s}$
Here μ and σ are the population mean and standard deviation, and x̄ and s are the sample mean and standard deviation.
Example 1
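Raul has taken two tests in his chemistry class. He scored 72 on the first test, for which
the mean of all scores was 65 and the standard deviation was 8. He received a 60 on a
second test, for which the mean of all scores was 45 and the standard deviation was 12.
Compared with the other students, did Raul do better on the first test or the second test?
SOLUTION:
Find the z-score for each test.
First test: $z_{72} = \dfrac{72 - 65}{8} = 0.875$
Second test: $z_{60} = \dfrac{60 - 45}{12} = 1.25$
Raul's second score is 1.25 standard deviations above the mean, while his first is 0.875 standard deviations above the mean, so relative to the other students he did better on the second test.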
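The z-score comparison can be sketched in Python; `z_score` is an illustrative helper name. A larger z-score means a result further above the class mean, measured in standard deviations.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_first = z_score(72, 65, 8)    # first test
z_second = z_score(60, 45, 12)  # second test
better = "second" if z_second > z_first else "first"
print(z_first, z_second, better)  # 0.875 1.25 second
```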
Example 2
A consumer group tested a sample of 100 light bulbs. It found that the mean life
expectancy of the bulbs was 842 h, with a standard deviation of 90 h. One particular light
bulb from the DuraBright Company had a z-score of 1.2. What was the life span of this
light bulb?
SOLUTION:
Substitute the given values into the z-score equation and solve for x.
Given:
$z_x = 1.2$, $\bar{x} = 842$, $s = 90$
$$1.2 = \frac{x - 842}{90} \quad\Rightarrow\quad x = 842 + 1.2(90) = 842 + 108 = 950$$
The light bulb had a life span of 950 h.
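Solving the sample z-score formula for x gives x = x̄ + z·s. A minimal Python sketch of that rearrangement (the helper name `value_from_z` is illustrative):

```python
def value_from_z(z, mean, std_dev):
    """Solve z = (x - mean) / std_dev for x."""
    return mean + z * std_dev


life_span = value_from_z(1.2, 842, 90)  # hours
print(round(life_span, 1))  # 950.0
```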
Percentiles
A value x is called the pth percentile of a data set provided p% of the data values are less than x.
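The percentile definition amounts to counting how many data values fall below x. A minimal Python sketch; the data list is purely illustrative (it reuses the sample 2, 4, 7, 12, 15), and for such a short list only a few percentiles can occur exactly.

```python
def percent_below(values, x):
    """Percent of the data values that are strictly less than x."""
    if not values:
        raise ValueError("no data values")
    return 100 * sum(1 for v in values if v < x) / len(values)


sample = [2, 4, 7, 12, 15]  # illustrative data only
print(percent_below(sample, 12))  # 60.0, so 12 is the 60th percentile of this list
```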
Example 3
In a recent year, the median annual salary for a physical therapist was $74,480. If the
90th percentile for the annual salary of a physical therapist was $105,900, find the
percent of physical therapists whose annual salary was
a. more than $74,480
b. less than $105,900
c. between $74,480 and $105,900
Solution:
a) By definition, the median is the 50th percentile. Therefore, 50% of physical
therapists earned more than $74,480 per year.
b) Because $105,900 is the 90th percentile, 90% of all physical therapists made less than
$105,900.
c) From parts a and b, 90% − 50% = 40% of physical therapists earned between $74,480
and $105,900.
Large sets of data are often displayed using a grouped frequency distribution or a
histogram.
Download time (in seconds)    Number of subscribers
0-5                           6
5-10                          17
10-15                         43
15-20                         92
20-25                         151
25-30                         192
30-35                         190
35-40                         149
40-45                         90
45-50                         45
50-55                         15
55-60                         10
The graph of a frequency distribution is called a histogram. A histogram provides a pictorial
view of how the data are distributed.
Download time (in seconds)    Percent of subscribers
0-5                           0.6
5-10                          1.7
10-15                         4.3
15-20                         9.2
20-25                         15.1
25-30                         19.2
30-35                         19.0
35-40                         14.9
40-45                         9.0
45-50                         4.5
50-55                         1.5
55-60                         1.0
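The percent column follows from the counts: the twelve classes hold 1000 subscribers in total, so each percent is 100 × count / 1000. A short Python sketch of that conversion (the variable names are illustrative):

```python
# (download-time class in seconds, number of subscribers), from the frequency table
frequency_table = [
    ("0-5", 6), ("5-10", 17), ("10-15", 43), ("15-20", 92),
    ("20-25", 151), ("25-30", 192), ("30-35", 190), ("35-40", 149),
    ("40-45", 90), ("45-50", 45), ("50-55", 15), ("55-60", 10),
]

total = sum(count for _, count in frequency_table)
# Relative frequency of each class, expressed as a percent of all subscribers
percents = [(label, 100 * count / total) for label, count in frequency_table]

print(total)  # 1000
for label, pct in percents:
    print(f"{label:>6}  {pct:4.1f}%")
```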
Normal Distribution and the Empirical Rule
A normal distribution forms a bell-shaped curve that is symmetric about a vertical line
through the mean of the data.
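The empirical rule states that, in a normal distribution, about 68% of the data lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. As a sketch, those percentages can be recovered in Python from the error function `math.erf`, since the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2).

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```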
Table 4.11 gives bivariate data showing the time between two eruptions and the duration of the second
eruption for 10 eruptions of the geyser Old Faithful.
Once the data are collected, a scatter diagram, or scatter plot, can be drawn, as shown in Figure
4.15.
[Figure 4.15: scatter diagram of the 10 eruptions. Plotted points (x, y): (203, 81), (218, 78), (226, 81), (227, 79), (237, 83), (238, 82), (245, 79), (250, 85), (270, 85), (272, 89).]
The least-squares regression line for a set of bivariate data is the line that minimizes the
sum of the squares of the vertical deviations from each data point to the line. For the 10 Old Faithful data points, that sum is
$$d_1^2 + d_2^2 + d_3^2 + d_4^2 + d_5^2 + d_6^2 + d_7^2 + d_8^2 + d_9^2 + d_{10}^2$$
In this expression, each $d_n$ represents the vertical distance from data point n to the line.
The least-squares line is the line of best fit: it fits the data better than any other line that might be drawn.
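A Python sketch of the quantity being minimized. The (x, y) pairs are read from the point labels of the Figure 4.15 scatter diagram (they reproduce the sums Σx = 2386 and Σy = 822 used in the least-squares calculation). The two candidate lines are illustrative: a horizontal line through the mean y-value, and the rounded line y = 0.119x + 53.817 labelled on the fitted plot. The fitted line gives the smaller sum.

```python
# (x, y) pairs read from the point labels of the Figure 4.15 scatter diagram
points = [(272, 89), (250, 85), (270, 85), (237, 83), (203, 81),
          (238, 82), (226, 81), (227, 79), (245, 79), (218, 78)]


def sum_of_squared_deviations(points, a, b):
    """d_1^2 + ... + d_n^2, where d_n is the vertical gap between point n and y = ax + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


flat_line = sum_of_squared_deviations(points, 0.0, 82.2)        # y = 82.2, the mean y-value
fitted_line = sum_of_squared_deviations(points, 0.119, 53.817)  # y = 0.119x + 53.817
print(round(flat_line, 1), round(fitted_line, 1))
```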
[Figure: scatter diagram of the Old Faithful data. Vertical axis: Length of eruption (seconds); horizontal axis: Seconds between eruptions.]
The Formula for the Least-Squares Line
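The equation of the least-squares line for the n ordered pairs
(x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ)
is $\hat{y} = ax + b$, where
$$a = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} \qquad\text{and}\qquad b = \bar{y} - a\bar{x}$$
Here $\bar{x}$ is the mean of the x-values and $\bar{y}$ is the mean of the y-values.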
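To apply this formula to the data for Old Faithful, we first find the value of each summation: n = 10, Σx = 2386, Σy = 822, Σxy = 196,636, and Σx² = 573,560. Then
$$a = \frac{(10)(196{,}636) - (2386)(822)}{(10)(573{,}560) - (2386)^2} \approx 0.1189559666$$
$$b = \bar{y} - a\bar{x} \approx 82.2 - 0.1189559666(238.6) \approx 53.81710637$$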
[Figure: Old Faithful scatter diagram with the least-squares line y = 0.119x + 53.817. Vertical axis: Length of eruption (seconds); horizontal axis: Seconds between eruptions.]
Using this line to predict y when x = 200:
$$\hat{y} = 0.1189559666x + 53.81710637 \approx 0.1189559666(200) + 53.81710637 \approx 78$$
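The whole least-squares calculation can be reproduced in Python. This is a sketch assuming the (x, y) pairs read from the Figure 4.15 point labels; it recomputes the four summations, the slope a, the intercept b = ȳ − a·x̄, and the prediction at x = 200.

```python
# (x, y) pairs read from the point labels of the Figure 4.15 scatter diagram
points = [(272, 89), (250, 85), (270, 85), (237, 83), (203, 81),
          (238, 82), (226, 81), (227, 79), (245, 79), (218, 78)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

# Slope and intercept from the least-squares formulas
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - a * (sum_x / n)  # b = ȳ - a·x̄

print(sum_x, sum_y, sum_xy, sum_x2)  # 2386 822 196636 573560
print(round(a, 4), round(b, 3))      # 0.119 53.817
print(round(a * 200 + b))            # 78
```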
Example
Use a Least-Squares Line to Make a Prediction
Table 4.12a
a. 2.8 m    b. 4.8 m
Solution
a. $\hat{y} = 2.730263158(2.8) - 3.316447368 \approx 4.3$
b. $\hat{y} = 2.730263158(4.8) - 3.316447368 \approx 9.8$
The procedure in example a made use of an equation to determine a point between given
data points. This procedure is referred to as interpolation. In example b, an equation was used to
determine a point to the right of the given data points. The process of using an equation to
determine a point to the right or left of the given data points is referred to as extrapolation.
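A Python sketch of the distinction, using the coefficients from the Solution above. Because Table 4.12a's x-values are not reproduced here, `observed_x` is a HYPOTHETICAL placeholder chosen only so that 2.8 falls inside the data and 4.8 falls to the right of it, as the text describes.

```python
# Coefficients of the least-squares line from the Table 4.12a example
A = 2.730263158
B = -3.316447368

# HYPOTHETICAL placeholder x-values: stand-ins for "the given data points",
# not the actual Table 4.12a data.
observed_x = [1.0, 2.0, 3.0, 4.0]


def predict(x):
    return A * x + B


def method(x, xs):
    """Interpolation inside the span of the observed x-values, extrapolation outside it."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


for x in (2.8, 4.8):
    print(x, round(predict(x), 1), method(x, observed_x))
# 2.8 4.3 interpolation
# 4.8 9.8 extrapolation
```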
[Figure: the least-squares line with the interpolated point (2.8, 4.3) and the point (4.8, 9.8) predicted by extrapolation.]
Linear Correlation Coefficient
To determine the strength of a linear relationship between two variables, statisticians use
a statistic called the linear correlation coefficient, which is denoted by the variable r and is
defined as follows.
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
If the linear correlation coefficient r is positive, the relationship between the variables
has a positive correlation. In this case, if one variable increases, the other variable also
tends to increase.
If r is negative, the linear relationship between the variables has a negative correlation.
In this case, if one variable increases, the other variable tends to decrease.
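A minimal Python sketch of r, assuming the standard computational formula built from the same sums as the least-squares line plus Σy², and applied to the Old Faithful (x, y) pairs read from the Figure 4.15 point labels. The result is positive, which matches the upward trend of the scatter diagram, and like every value of r it lies between −1 and 1.

```python
import math

# (x, y) pairs read from the point labels of the Figure 4.15 scatter diagram
points = [(272, 89), (250, 85), (270, 85), (237, 83), (203, 81),
          (238, 82), (226, 81), (227, 79), (245, 79), (218, 78)]


def correlation(points):
    """Linear correlation coefficient r via the computational (sum) formula."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_y2 = sum(y * y for _, y in points)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r = correlation(points)
print(round(r, 3))  # 0.763
```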
In your work with applications that involve the linear correlation coefficient r, it is
important to remember the following properties.
By: GROUP 4
JAYNERALE ENRIQUEZ
MATT LENARD FACUN
HARDY GITTABAO
MELCHOR GRAGASIN
APPLE ANNE GUERERRO