Math Written Report: Group 4


The U.S. government collects data on the population of the United States. It then issues
statistical reports that indicate changes and trends in the U.S. population. For instance, according
to The World Factbook, published by the Central Intelligence Agency (CIA), in 2015 there were
approximately 105 males for every 100 females between the ages of 15 and 24.
However, in the category of people 65 years old and older, there were approximately 79
men for every 100 women.

STATISTICS

4.1 Measures of Central Tendency
4.2 Measures of Dispersion
4.3 Measures of Relative Position
4.4 Normal Distributions
4.5 Linear Regression and Correlation

4.1 Measures of Central Tendency


The Arithmetic Mean
Statistics involves the collection, organization, summarization, presentation and
interpretation of data.

BRANCHES OF STATISTICS
Descriptive statistics is the branch of statistics that involves the collection, organization,
summarization and presentation of data.
Inferential statistics is the branch of statistics that interprets and draws conclusions from the
data.
One of the most basic statistical concepts involves finding measures of central tendency of a set
of numerical data. It is often helpful to find numerical values that locate, in some sense, the
center of a set of data.

• Suppose Elle is a senior at a university. In a few months she plans to graduate and start a
career as a landscape architect. A survey of five landscape architects from last year's
senior class shows that they received job offers with the following yearly salaries.

$43,750 $39,500 $38,000 $41,250 $44,000

• Before Elle interviews for a job, she wishes to determine an average of these 5 salaries.
This average should be a “central” number around which the salaries cluster. We will
consider three types of averages, known as the arithmetic mean, the median and the
mode. Each of these averages is a measure of central tendency for the numerical data.
• The arithmetic mean is the most commonly used measure of central tendency. The
arithmetic mean of a set of numbers is often referred to as simply the mean.

• To find the mean for a set of data, find the sum of the data values and divide by
the number of data values. For instance, to find the mean of the 5 salaries listed,
Elle would divide the sum of the salaries by 5.

\[ \text{mean} = \frac{\$43{,}750 + \$39{,}500 + \$38{,}000 + \$41{,}250 + \$44{,}000}{5} = \frac{\$206{,}500}{5} = \$41{,}300 \]

• In statistics it is often necessary to find the sum of a set of numbers. The traditional
symbol used to indicate a summation is the Greek letter sigma, \(\Sigma\).

• Thus the notation \(\sum x\), called summation notation, denotes the sum of all the
numbers in a given set. We can define the mean using summation notation.
• The mean of n numbers is the sum of the numbers divided by n.

\[ \text{mean} = \frac{\sum x}{n} \]
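As a quick check of the definition, the mean of Elle's five salary offers can be computed in Python. This is only a sketch; the variable names are illustrative and not part of the report.

```python
from statistics import mean

# Yearly salary offers (in dollars) from the survey of five landscape architects.
salaries = [43_750, 39_500, 38_000, 41_250, 44_000]

# Mean = (sum of the data values) / (number of data values).
manual_mean = sum(salaries) / len(salaries)

# The standard library's statistics.mean computes the same quantity.
library_mean = mean(salaries)

print(manual_mean)   # 41300.0
```

Both approaches agree with the hand calculation of $41,300.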

The Median
• Another type of average is the median. Essentially, the median is the middle number or
the mean of two middle numbers in a list of numbers that have been arranged in
numerical order from smallest to largest or largest to smallest. Any list of numbers that is
arranged in numerical order from smallest to largest or largest to smallest is a ranked
list.

MEDIAN
The median of a ranked list of n numbers is:
• the middle number if n is odd
• the mean of the two middle numbers if n is even.

Find a Median
Find the median of the data in the following lists.
a.) 4,8,1,14,9,21,12 b.) 46,23,92,89,77,108
Solution:
a.) The list 4,8,1,14,9,21,12 contains 7 numbers. The median of a list with an odd number of
entries is found by ranking the numbers and finding the middle number.
Ranking the numbers from smallest to largest gives
1,4,8,9,12,14,21
The middle number is 9. Thus 9 is the median.

b.) The list 46, 23, 92, 89, 77, 108 contains 6 numbers. The median of a list of data with an even
number of entries is found by ranking the numbers and computing the mean of the two middle
numbers. Ranking the numbers from smallest to largest gives
23, 46, 77, 89, 92, 108
The two middle numbers are 77 and 89. The mean of 77 and 89 is 83. Thus 83 is the median of the
data.
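The two cases in the definition (odd n and even n) can be sketched as a short Python function. The function name `median` is chosen here for illustration, and the sketch assumes a non-empty list.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ranked = sorted(values)          # ranked list, smallest to largest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # odd n: the single middle number
        return ranked[middle]
    # even n: the mean of the two middle numbers
    return (ranked[middle - 1] + ranked[middle]) / 2

print(median([4, 8, 1, 14, 9, 21, 12]))     # 9
print(median([46, 23, 92, 89, 77, 108]))    # 83.0
```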

The Mode
• A third type of average is the mode.
• The mode of a list of numbers is the number that occurs most frequently.
– Some lists of numbers do not have a mode. For instance, in the list
1, 6, 8, 10, 32, 15, 49, each number occurs exactly once. Because no number occurs
more often than the other numbers, there is no mode.
– A list of numerical data can have more than one mode. For instance, in the list
4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7, the number 2 occurs three times and the number 9 occurs
three times. Each of the other numbers occurs fewer than three times. Thus 2 and 9
are both modes for the data.
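The two situations described above (no mode, and more than one mode) can be sketched in Python by counting occurrences. The helper name `modes` is illustrative, and the sketch assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # each number occurs exactly once: no mode, following the text
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 6, 8, 10, 32, 15, 49]))               # []
print(modes([4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7]))    # [2, 9]
```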

4.2 MEASURES OF DISPERSION


In the preceding section we introduced three types of averages: the mean, the median and the
mode. Some characteristics of a set of data may not be evident from an examination of
averages. For instance, consider a soft-drink dispensing machine that should dispense 8 oz of
your selection into a cup. Table 4.5 shows data for two of these machines. Both machines dispense
a mean of 8.0 oz, but the amounts from Machine 1 are far more spread out. To measure the spread or
dispersion of data, we must introduce statistical values known as the range and the standard deviation.

THE RANGE
The range of a set of data values is the difference between the greatest data value and the
least data value.
TABLE 4.5
Soda dispensed (ounces)

MACHINE 1            MACHINE 2
9.52                 8.01
6.41                 7.99
10.07                7.95
5.85                 8.03
8.15                 8.02
\(\bar{x} = 8.0\)    \(\bar{x} = 8.0\)

EX. 1
• Find the range of the numbers of ounces dispensed by Machine 1 in Table 4.5.

SOLUTION:
• Greatest number of ounces dispensed = 10.07
• Least number of ounces dispensed = 5.85
• Range = 10.07 - 5.85 = 4.22 oz
– The range of the numbers of ounces dispensed by Machine 1 is 4.22 oz.
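The range in this example, and the Machine 2 range worked in EX. 2, can be checked with a few lines of Python. This is a minimal sketch; `data_range` is a name chosen here, and the results are rounded because decimal values such as 10.07 are not stored exactly in binary floating point.

```python
def data_range(values):
    """Range = greatest data value - least data value."""
    return max(values) - min(values)

# Ounces dispensed, as listed in Table 4.5.
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

# Round to two decimals to hide binary floating-point noise.
print(round(data_range(machine_1), 2))  # 4.22
print(round(data_range(machine_2), 2))  # 0.08
```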

EX. 2

• Find the range of the numbers of ounces dispensed by Machine 2 in Table 4.5.

SOLUTION:
• Greatest number of ounces dispensed = 8.03
• Least number of ounces dispensed = 7.95
• Range = 8.03 - 7.95 = 0.08 oz
– The range of the numbers of ounces dispensed by Machine 2 is 0.08 oz.

The Standard Deviation


The standard deviation of a set of numerical data makes use of the amount by which each individual data
value deviates from the mean. These deviations, represented by \((x - \bar{x})\), are positive when the
data value x is greater than the mean \(\bar{x}\) and are negative when x is less than the mean \(\bar{x}\). The
sum of all the deviations \((x - \bar{x})\) is 0 for all sets of data.
Because the sum of all the deviations of the data is always 0 (zero), we cannot use the sum
of the deviations as a measure of dispersion. Instead, the standard deviation uses the sum of
the squares of the deviations. The following table shows the deviations for the Machine 2 data in Table 4.5.

\(x\)       \(x - \bar{x}\)
8.01        8.01 - 8 = 0.01
7.99        7.99 - 8 = -0.01
7.95        7.95 - 8 = -0.05
8.03        8.03 - 8 = 0.03
8.02        8.02 - 8 = 0.02
Mean = 8    Sum of deviations = 0

Procedure for Computing a Standard Deviation


1. Determine the mean of the n numbers.
2. For each number, calculate the deviation (difference) between the number and the mean
of the numbers.
3. Calculate the square of each deviation and then find the sum of these squared deviations.
4. If the data is a population, then divide the sum by n. If the data is a sample, then divide
the sum by n - 1.
5. Find the square root of the quotient in step 4.
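The five steps above can be sketched as a Python function. The name `standard_deviation` and the `sample` flag are choices made for this illustration, and treating the Machine 2 data as a sample is an assumption, since the report does not say whether it is a sample or a population.

```python
from math import sqrt

def standard_deviation(values, sample=True):
    """Follow the five-step procedure above for a list of numbers."""
    n = len(values)
    mean = sum(values) / n                            # step 1
    deviations = [x - mean for x in values]           # step 2
    sum_of_squares = sum(d ** 2 for d in deviations)  # step 3
    divisor = n - 1 if sample else n                  # step 4
    return sqrt(sum_of_squares / divisor)             # step 5

# Machine 2 data from Table 4.5; treating it as a sample is an assumption.
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]
print(round(standard_deviation(machine_2, sample=True), 4))   # 0.0316
```

Passing `sample=False` divides by n instead of n - 1, which is the population case in step 4.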

Example 3: Find the Standard Deviation

• The following numbers were obtained by sampling a population.

2, 4, 7, 12, 15

• Find the standard deviation of the sample.

SOLUTION:
STEP 1: The mean of the numbers is

\[ \bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8 \]

STEP 2: For each number, calculate the deviation between the number and the mean.

\(x\)    \(x - \bar{x}\)
2        2 - 8 = -6
4        4 - 8 = -4
7        7 - 8 = -1
12       12 - 8 = 4
15       15 - 8 = 7

STEP 3: Calculate the square of each deviation in Step 2, and find the sum of these
squared deviations.

\(x\)    \(x - \bar{x}\)    \((x - \bar{x})^2\)
2        2 - 8 = -6         (-6)^2 = 36
4        4 - 8 = -4         (-4)^2 = 16
7        7 - 8 = -1         (-1)^2 = 1
12       12 - 8 = 4         4^2 = 16
15       15 - 8 = 7         7^2 = 49
                            Sum = 118

STEP 4: Because we have a sample of n = 5 values, divide the sum 118 by n - 1, which is
4.

\[ \frac{118}{4} = 29.5 \]

STEP 5: The standard deviation of the sample is \( s = \sqrt{29.5} \).

To the nearest hundredth, the standard deviation is \( s \approx 5.43 \).
The Variance
• A statistic known as the variance is also used as a measure of dispersion. The variance
for a given set of data is the square of the standard deviation of the data. The following
chart shows the mathematical notations that are used to denote standard deviations and
the variances.
Notations for Standard Deviation and Variance
• \(\sigma\) is the standard deviation of a population
• \(\sigma^2\) is the variance of a population
• \(s\) is the standard deviation of a sample
• \(s^2\) is the variance of a sample
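The relationship between the variance and the standard deviation can be checked with Python's standard `statistics` module, using the sample 2, 4, 7, 12, 15 from Example 3. Treating the same numbers as a population in the last two lines is an assumption made only for comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

sample = [2, 4, 7, 12, 15]     # the sample from Example 3

s_squared = variance(sample)   # sample variance, divides by n - 1
s = stdev(sample)              # sample standard deviation

print(s_squared)               # 29.5
print(round(s, 2))             # 5.43

# Treating the same five numbers as a whole population (an assumption made
# only for comparison) divides by n instead of n - 1.
sigma_squared = pvariance(sample)
sigma = pstdev(sample)
```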

4.3 MEASURES OF RELATIVE POSITION


Measures of Position. Statisticians often talk about the position of a value relative to other
values in a set of data. The most common measures of position are standard scores (also known as
z-scores), percentiles, and quartiles.

z-Score

• The number of standard deviations between a data value and the mean is known as the
data value's z-score or standard score.

• The z-score for a given data value x is the number of standard deviations that x is above
or below the mean of the data. The following formulas show how to calculate the z-score
for a data value x in a population and in a sample.

Population: \[ z_x = \frac{x - \mu}{\sigma} \]

Sample: \[ z_x = \frac{x - \bar{x}}{s} \]
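Example 1

• Raul has taken two tests in his chemistry class. He scored 72 on the first test, for which
the mean of all scores was 65 and the standard deviation was 8. He received a 60 on the
second test, for which the mean of all scores was 45 and the standard deviation was 12. In
comparison to the other students, did Raul do better on the first test or the second test?
SOLUTION:
Find the z-score for each test.

\[ z_{72} = \frac{72 - 65}{8} = 0.875 \qquad z_{60} = \frac{60 - 45}{12} = 1.25 \]

Raul scored 0.875 standard deviation above the mean on the first test and 1.25 standard
deviations above the mean on the second test. In comparison to the other students, Raul did
better on the second test.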
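The comparison in Example 1 can be sketched in Python. The function name `z_score` and its parameter names are illustrative choices, not notation from the report.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

first_test = z_score(72, mean=65, std_dev=8)
second_test = z_score(60, mean=45, std_dev=12)

print(first_test)    # 0.875
print(second_test)   # 1.25

# The larger z-score marks the better result relative to the class.
better = "second" if second_test > first_test else "first"
print(better)        # second
```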

Example 2

• A consumer group tested a sample of 100 light bulbs. It found that the mean life
expectancy of the bulbs was 842 h, with a standard deviation of 90 h. One particular light
bulb from the DuraBright Company had a z-score of 1.2. What was the life span of this
light bulb?

SOLUTION:
Substitute the given values into the z-score equation and solve for x.

Given: \( z_x = 1.2 \), \( \bar{x} = 842 \), \( s = 90 \)

\[ z_x = \frac{x - \bar{x}}{s} \]
\[ 1.2 = \frac{x - 842}{90} \]
\[ 108 = x - 842 \]
\[ 950 = x \]

The light bulb had a life span of 950 h.
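Solving the sample z-score equation for x gives x = mean + z · s, which is the rearrangement used in Example 2. A minimal Python sketch, with `value_from_z` as an illustrative name:

```python
def value_from_z(z, mean, std_dev):
    """Solve z = (x - mean) / std_dev for x."""
    return mean + z * std_dev

# Light bulb with z-score 1.2, sample mean 842 h, standard deviation 90 h.
life_span = value_from_z(1.2, mean=842, std_dev=90)
print(round(life_span, 6))   # 950.0
```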

Percentiles
A value x is called the pth percentile of a data set provided p% of the data values are less than x.

Example 3

• In a recent year, the median annual salary for a physical therapist was $74,480. If the
90th percentile for the annual salary of a physical therapist was $105,900, find the
percent of physical therapists whose annual salary was
a. more than $74,480
b. less than $105,900
c. between $74,480 and $105,900

Solution:
a) By definition, the median is the 50th percentile. Therefore, 50% of the physical
therapists earned more than $74,480 per year.
b) Because $105,900 is the 90th percentile, 90% of all physical therapists made less than
$105,900.
c) From parts a and b, 90% - 50% = 40% of the physical therapists earned between $74,480
and $105,900.
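The percentile definition can be illustrated in Python. The report gives no individual salaries, so the data set below (the whole numbers 1 to 100) is hypothetical and chosen only so that the percents are easy to verify; `percent_below` is likewise an illustrative name.

```python
def percent_below(data, x):
    """Percent of the data values that are strictly less than x."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

# Hypothetical data set for illustration only: the whole numbers 1 to 100.
scores = list(range(1, 101))

print(percent_below(scores, 91))   # 90.0, so 91 is the 90th percentile here
print(percent_below(scores, 51))   # 50.0
# Percent of values from 51 up to (but not including) 91:
print(percent_below(scores, 91) - percent_below(scores, 51))   # 40.0
```

The last line mirrors part c of the example, where the percent between two percentiles is found by subtraction.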

4.4 Normal Distributions

• In probability theory, the normal distribution is a very common continuous probability
distribution. Normal distributions are important in statistics and are often used in the
natural and social sciences to represent real-valued random variables whose distributions
are not known.

Frequency Distributions and Histograms

 Large sets of data are often displayed using a grouped frequency distribution or a
histogram.

Grouped Frequency Distributions

A grouped frequency distribution with 12 classes:

DOWNLOAD TIME (in seconds)    NUMBER OF SUBSCRIBERS
0-5                           6
5-10                          17
10-15                         43
15-20                         92
20-25                         151
25-30                         192
30-35                         190
35-40                         149
40-45                         90
45-50                         45
50-55                         15
55-60                         10

• The table is called a grouped frequency distribution.

The graph of a frequency distribution is called a histogram. A histogram provides a pictorial
view of how the data are distributed.

[Figure: histogram of the download-time data. Horizontal axis: download time (in seconds); vertical axis: number of subscribers.]

Relative Frequency Distribution

The type of frequency distribution that lists the percent of data in each class is called a
relative frequency distribution.
A relative frequency distribution:

DOWNLOAD TIME (in seconds)    PERCENT OF SUBSCRIBERS
0-5                           0.6
5-10                          1.7
10-15                         4.3
15-20                         9.2
20-25                         15.1
25-30                         19.2
30-35                         19.0
35-40                         14.9
40-45                         9.0
45-50                         4.5
50-55                         1.5
55-60                         1.0
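The percents in the relative frequency distribution follow from the subscriber counts in the grouped frequency distribution: each class count is divided by the total number of subscribers. A short Python sketch of that conversion, with illustrative variable names:

```python
# Class limits and subscriber counts from the grouped frequency distribution.
classes = ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30",
           "30-35", "35-40", "40-45", "45-50", "50-55", "55-60"]
counts = [6, 17, 43, 92, 151, 192, 190, 149, 90, 45, 15, 10]

total = sum(counts)   # 1000 subscribers in all

# Relative frequency of a class, as a percent = 100 * (class count) / (total).
percents = [round(100 * c / total, 1) for c in counts]

for label, pct in zip(classes, percents):
    print(f"{label:>6}  {pct:5.1f}")
```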
Normal Distribution and the Empirical Rule

• A normal distribution forms a bell-shaped curve that is symmetric about a vertical line
through the mean of the data.

Properties of a Normal Distribution

Every normal distribution has the following properties:
• The graph is symmetric about a vertical line through the mean of the distribution.
• The mean, median and mode are equal.
• The y-value of each point on the curve is the percent (expressed as a decimal) of the data at the corresponding x-value.
• Areas under the curve that are symmetric about the mean are equal.
• The total area under the curve is 1.
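Some of these properties can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean of 100 and standard deviation of 15 below are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters chosen only for illustration.
dist = NormalDist(mu=100, sigma=15)

# The mean, median and mode coincide.
print(dist.mean, dist.median, dist.mode)

# Areas that are symmetric about the mean are equal.
left = dist.cdf(100) - dist.cdf(85)     # area from 85 to the mean
right = dist.cdf(115) - dist.cdf(100)   # area from the mean to 115
print(abs(left - right) < 1e-12)

# Half of the total area of 1 lies on each side of the mean.
print(dist.cdf(100))
```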

4.5 Linear Regression and Correlation


Linear Regression attempts to model the relationship between two variables by fitting a
linear equation to observed data. For instance, a geologist might want to know whether there is a
relationship between the duration of an eruption of a geyser and the time between eruptions. A first step
in this determination is to collect some data. Data involving two variables are called bivariate data.

Table 4.11 gives bivariate data showing the time between two eruptions and the duration of the second
eruption for 10 eruptions of the geyser Old Faithful.

Table 4.11
Seconds between eruptions (x)    Length of eruption in seconds (y)
272                              89
250                              85
270                              85
237                              83
203                              81
238                              82
226                              81
227                              79
245                              79
218                              78

Once the data are collected, a scatter diagram or scatter plot can be drawn, as shown in Figure
4.15.

[Figure 4.15: scatter diagram of the data in Table 4.11. Horizontal axis: seconds between eruptions; vertical axis: length of eruption (seconds).]

The Least-Squares Regression Line

The least-squares regression line for a set of bivariate data is the line that minimizes the
sum of the squares of the vertical deviations from each data point to the line. For the 10 data
points for Old Faithful, this sum is

\[ d_1^2 + d_2^2 + d_3^2 + d_4^2 + d_5^2 + d_6^2 + d_7^2 + d_8^2 + d_9^2 + d_{10}^2 \]

In this expression, each \(d_n\) represents the vertical distance from data point n to the line.
The least-squares regression line is also called the line of best fit: it is the line that fits
the data better than any other line that might be drawn.

[Figure: scatter diagram of the Old Faithful data with a fitted line. Horizontal axis: seconds between eruptions; vertical axis: length of eruption (seconds).]
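The Formula for the Least-Squares Line
The equation of the least-squares line for the n ordered pairs

\((x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\)

is \(\hat{y} = ax + b\), where

\[ a = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{n \sum x^2 - \left(\sum x\right)^2} \qquad \text{and} \qquad b = \bar{y} - a\bar{x} \]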

To apply this formula to the data for Old Faithful, we first find the value of each summation.

\[ \sum x = 2386 \qquad \sum y = 822 \qquad \sum x^2 = 573{,}560 \qquad \sum xy = 196{,}636 \]

Next, we use these values to find the value of a.

\[ a = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{n \sum x^2 - \left(\sum x\right)^2} = \frac{(10)(196{,}636) - (2386)(822)}{(10)(573{,}560) - (2386)^2} \approx 0.1189559666 \]

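We then find the values of \(\bar{x}\) and \(\bar{y}\),

\[ \bar{x} = \frac{\sum x}{n} = \frac{2386}{10} = 238.6 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{822}{10} = 82.2 \]

and use them to find the y-intercept, b.

\[ b = \bar{y} - a\bar{x} \approx 82.2 - 0.1189559666(238.6) \approx 53.81710637 \]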

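[Figure: scatter diagram of the Old Faithful data with the least-squares line y = 0.119x + 53.817. Horizontal axis: seconds between eruptions; vertical axis: length of eruption (seconds).]

The equation of the least-squares line is

\[ \hat{y} = 0.1189559666x + 53.81710637 \]

For a time of x = 200 seconds between eruptions, the equation gives

\[ \hat{y} \approx 0.1189559666(200) + 53.81710637 \approx 78 \]

The approximate duration of the eruption is 78 seconds.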
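The least-squares calculation can be reproduced in Python. This sketch uses the ten Old Faithful data points as they are labeled on the scatter diagram, and the function name `least_squares_line` is chosen here for illustration.

```python
# (seconds between eruptions, length of eruption in seconds) for the 10
# Old Faithful data points.
data = [(272, 89), (250, 85), (270, 85), (237, 83), (203, 81),
        (238, 82), (226, 81), (227, 79), (245, 79), (218, 78)]

def least_squares_line(pairs):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)     # b = y-bar - a * x-bar
    return a, b

a, b = least_squares_line(data)
print(round(a, 4), round(b, 4))     # 0.119 53.8171
print(round(a * 200 + b))           # 78
```

The intermediate sums in the function match the summations given in the text (2386, 822, 573,560 and 196,636).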


Table 4.12
Speeds for Selected Stride Lengths

a. Adult man

Example
Use a Least-Squares Line to Make a Prediction

Use the equation of the least-squares line for the adult man data in Table 4.12a to predict
the average speed of an adult man for each of the following stride lengths.
a. 2.8 m b. 4.8 m

Solution

a. \(\hat{y} = 2.730263158x - 3.316447368\)

\(= 2.730263158(2.8) - 3.316447368 \approx 4.328\)

The predicted average speed of an adult man with a stride length of 2.8 m is 4.3 m/s.

b. \(\hat{y} = 2.730263158x - 3.316447368\)

\(= 2.730263158(4.8) - 3.316447368 \approx 9.789\)

The predicted average speed of an adult man with a stride length of 4.8 m is 9.8 m/s.

The procedure in part a made use of an equation to determine a point between given
data points. This procedure is referred to as interpolation. In part b, an equation was used to
determine a point to the right of the given data points. The process of using an equation to
determine a point to the right or left of the given data points is referred to as extrapolation.
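The two predictions, and the distinction between interpolation and extrapolation, can be sketched in Python. The coefficients are those of the adult-man line quoted in the solution. The helper `kind_of_prediction` is an illustrative name, and because the stride lengths of Table 4.12 are not available in this text, it is demonstrated on the Old Faithful x-values instead.

```python
def predict_speed(stride_length_m):
    """Predicted average speed (m/s) from the adult-man least-squares line."""
    return 2.730263158 * stride_length_m - 3.316447368

print(round(predict_speed(2.8), 1))   # 4.3
print(round(predict_speed(4.8), 1))   # 9.8

def kind_of_prediction(x, observed_x):
    """Label a prediction by where x falls relative to the observed x-values."""
    if min(observed_x) <= x <= max(observed_x):
        return "interpolation"
    return "extrapolation"

# Old Faithful x-values (seconds between eruptions) serve as the observed data,
# since the stride lengths of Table 4.12 are not available in this text.
old_faithful_x = [272, 250, 270, 237, 203, 238, 226, 227, 245, 218]
print(kind_of_prediction(240, old_faithful_x))   # interpolation
print(kind_of_prediction(300, old_faithful_x))   # extrapolation
```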

[Figure: graph of the least-squares line showing the predicted points (2.8, 4.3) and (4.8, 9.8); the point (4.8, 9.8) is labeled "Predicted by extrapolation".]

Linear Correlation Coefficient
To determine the strength of a linear relationship between two variables, statisticians use
a statistic called the linear correlation coefficient, which is denoted by the variable r and is
defined as follows.

Linear Correlation Coefficient

For the n ordered pairs \((x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\), the linear correlation coefficient
r is given by

\[ r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n \sum x^2 - \left(\sum x\right)^2} \cdot \sqrt{n \sum y^2 - \left(\sum y\right)^2}} \]

• If the linear correlation coefficient r is positive, the relationship between the variables
has a positive correlation. In this case, if one variable increases, the other variable also
tends to increase.
• If r is negative, the linear relationship between the variables has a negative correlation.
In this case, if one variable increases, the other variable tends to decrease.

In your work with applications that involve the linear correlation coefficient r, it is
important to remember the following properties.

Properties of the Linear Correlation Coefficient

1. The linear correlation coefficient r is always a real number between -1 and 1, inclusive. In the case
in which
• all of the ordered pairs lie on a line with positive slope, r is 1.
• all of the ordered pairs lie on a line with negative slope, r is -1.
2. For any set of ordered pairs, the linear correlation coefficient r and the slope of the least-squares
line both have the same sign.
3. Interchanging the variables in the ordered pairs does not change the value of r. Thus the value of r
for the ordered pairs \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) is the same as the value of r for the ordered pairs
\((y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)\).
4. The value of r does not depend on the units used. You can change the units of a variable from, for
example, feet to inches, and the value of r will remain the same.
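The formula for r and two of these properties can be checked in Python on the ten Old Faithful data points. The report does not state a value of r for these data, so the printed value is the result of this sketch rather than a figure from the text; the function name `correlation` is illustrative.

```python
from math import sqrt

def correlation(pairs):
    """Linear correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Old Faithful data: (seconds between eruptions, length of eruption in seconds).
data = [(272, 89), (250, 85), (270, 85), (237, 83), (203, 81),
        (238, 82), (226, 81), (227, 79), (245, 79), (218, 78)]

r = correlation(data)
print(round(r, 3))   # 0.763

# Property 3: interchanging the variables leaves r unchanged.
swapped = [(y, x) for x, y in data]
# Property 4: changing units (here, seconds to minutes for x) leaves r unchanged.
in_minutes = [(x / 60, y) for x, y in data]
print(round(correlation(swapped), 3), round(correlation(in_minutes), 3))
```

The positive value of r agrees with property 2, since the slope of the least-squares line for these data is also positive.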
STATISTIC
(WRITTEN REPORT)

By: GROUP 4

JAYNERALE ENRIQUEZ
MATT LENARD FACUN
HARDY GITTABAO
MELCHOR GRAGASIN
APPLE ANNE GUERERRO
