Stat Descr

A box-plot can be used to visually compare the distributions of two or more data sets. By looking at the boxes and whiskers, you can see if the data sets have similar spreads (variability), centers, or outliers. This allows someone to quickly understand differences or similarities between groups. For example, a researcher may create box-plots to compare test scores between male and female students. They could see if one gender tended to score higher on average or if the scores were more spread out. Box-plots are useful for distribution shape comparisons.

Descriptive Statistics (2)

#3

Niniet Indah A., MT


niniet@ie.its.ac.id
Industrial Engineering Department
Sepuluh Nopember Institute of Technology
INDONESIA 2010
Summary #2
• There are two ways of presenting data: graphical (qualitative) and numerical (quantitative).
• A histogram is a graphical view of a frequency distribution that summarizes a sample by placing the values into groups (classes).
• A stem-and-leaf diagram is a graphical representation of an entire sample.
• A bar chart summarizes categorical (nominal) or ordinal data.
• A pie chart presents a percentage breakdown of a particular quantity.
• The purpose of these graphs is to convey information at a glance about the distribution of the values in a sample.
Background of #3
• Every sample data set is a small part of a much larger population, even if we don't always mention it.
• Every population has properties (called parameters) that describe it.
• #2 → how to reduce a set of sample data to a graph.
• #3 → how to reduce data to one or more numbers (called descriptive measures).
Numerical Data Properties
1. NUMERICAL DATA PROPERTIES
FOR UNGROUPED DATA
THINKING CHALLENGE

[Figure: company pay chart with salary levels $400,000, $70,000, $50,000, $30,000, and $20,000]

... employees cite low pay: most workers earn only $20,000.
... President claims average pay is $70,000!
STANDARD NOTATION

Measure              Population (parameter)   Sample (statistic)
Mean                 µ                        X̄
Variance             σ²                       s²
Standard deviation   σ                        s
Data size            N                        n
NUMERICAL DATA PROPERTIES

Central Tendency (Location)
Concerned with where values are concentrated: which data value occurs most often? Where is the middle of my data?

Variation (Dispersion)
Concerned with the extent to which values vary: how spread out are my data values?

Shape
Concerned with the extent to which values are symmetrically distributed.
NUMERICAL DATA PROPERTIES & MEASURES

Properties
Measurement
CENTRAL TENDENCY
NUMERICAL DATA PROPERTIES & MEASURES
MEAN

1. Measure of Central Tendency


2. Most Common Measure
3. Acts as ‘Balance Point’
4. Affected by Extreme Values (‘Outliers’)
5. Formula (Sample Mean)
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
MEAN
EXAMPLE:
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$$
NUMERICAL DATA PROPERTIES & MEASURES
MEDIAN
1. Measure of Central Tendency
2. Middle Value In Ordered Sequence
If Odd n, Middle Value of Sequence
If Even n, Average of 2 Middle Values
3. Position of Median in Sequence
$$\text{Positioning point} = \frac{n + 1}{2}$$
4. Not Affected by Extreme Values

Notation: m or Md (sample), τ (population)
MEDIAN

EXAMPLE (ODD-SIZED):
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered:  21.5 22.6 22.6 23.7 24.1
Position: 1    2    3    4    5

$$\text{Positioning point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$$
Median = 22.6
MEDIAN

EXAMPLE (EVEN-SIZED):
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position: 1   2   3   4   5    6

$$\text{Positioning point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$
$$\text{Median} = \frac{7.7 + 8.9}{2} = 8.30$$
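The positioning-point rule for the median can be sketched in Python as follows. The function name `median_by_position` is hypothetical. A whole-number positioning point picks one value, and a half position averages the two neighbouring values.

```python
def median_by_position(values):
    """Median via the positioning point (n + 1) / 2 on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median_by_position needs at least one value")
    position = (n + 1) / 2          # 1-based position in the ordered sequence
    lower = int(position)           # whole part of the position
    if position == lower:           # odd n: a single middle value
        return ordered[lower - 1]
    # even n: average of the two middle values
    return (ordered[lower - 1] + ordered[lower]) / 2


odd_sized = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sized = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(median_by_position(odd_sized))             # 22.6
print(f"{median_by_position(even_sized):.2f}")   # 8.30
```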
NUMERICAL DATA PROPERTIES & MEASURES
MODE

1. Measure of Central Tendency


2. Value That Occurs Most Often
3. Not Affected by Extreme Values
4. May Be No Mode or Several Modes
5. May Be Used for Numerical & Categorical
Data
MODE

EXAMPLE:
No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

One Mode (4.9)
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9

More Than One Mode (28 and 43)
Raw Data: 21 28 28 41 43 43
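The three cases above can be reproduced with a small counting sketch. The function name `modes` is illustrative, and returning an empty list when every value occurs exactly once is a convention adopted here to match the "No Mode" case.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                 # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # []
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # [4.9]
print(modes([21, 28, 28, 41, 43, 43]))          # [28, 43]
```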
THINKING CHALLENGE

You're a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues:
17, 16, 21, 18, 13, 16, 12, 11.

Describe the stock prices in terms of central tendency.
CENTRAL TENDENCY SOLUTION*

MEAN
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$$
CENTRAL TENDENCY SOLUTION*

MEDIAN
Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Position: 1  2  3  4  5  6  7  8

$$\text{Positioning point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$$
$$\text{Median} = \frac{16 + 16}{2} = 16$$
CENTRAL TENDENCY SOLUTION*

MODE
Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Mode = 16 (the only value that occurs more than once)
SUMMARY OF CENTRAL TENDENCY MEASURES

Measure   Equation                 Description
Mean      ΣXi / n                  Balance point
Median    Position (n + 1) / 2     Middle value when ordered
Mode      none                     Most frequent
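As a cross-check, the three measures for the stock-price exercise can be computed with Python's standard `statistics` module. This is a sketch rather than part of the original solution. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean_price = statistics.mean(prices)        # balance point
median_price = statistics.median(prices)    # middle value when ordered
mode_prices = statistics.multimode(prices)  # list of most frequent values

print(mean_price, median_price, mode_prices)
```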
VARIATION
NUMERICAL DATA PROPERTIES & MEASURES
RANGE

1. Measure of Dispersion
2. Difference Between Largest & Smallest
Observations

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$


DISADVANTAGES of the RANGE

• Ignores the way in which data are distributed

[Figure: two dot plots on the same 7 to 12 axis, with the values distributed differently]
Range = 12 - 7 = 5 in both cases

• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
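The sensitivity of the range to a single outlier can be reproduced with the two data sets above. The helper name `value_range` is chosen for illustration only.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# 11 ones, 8 twos, 4 threes, then 4 and 5 (25 values in total)
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
# Same data with the last value replaced by the outlier 120
with_outlier = base[:-1] + [120]

print(value_range(base))          # 4
print(value_range(with_outlier))  # 119
```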
NUMERICAL DATA PROPERTIES & MEASURES
QUARTILES

1. Measure of Noncentral Tendency
2. Split Ordered Data into 4 Quarters

   25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

3. Position of i-th Quartile
$$\text{Positioning point of } Q_i = \frac{i\,(n + 1)}{4}$$
QUARTILE (Q1) EXAMPLE

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position: 1   2   3   4   5    6

$$Q_1 \text{ position} = \frac{1\,(n + 1)}{4} = \frac{1\,(6 + 1)}{4} = 1.75 \approx 2$$
$$Q_1 = 6.3$$
QUARTILE (Q2) EXAMPLE

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position: 1   2   3   4   5    6

$$Q_2 \text{ position} = \frac{2\,(n + 1)}{4} = \frac{2\,(6 + 1)}{4} = 3.5$$
$$Q_2 = \frac{7.7 + 8.9}{2} = 8.3$$
QUARTILE (Q3) EXAMPLE

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position: 1   2   3   4   5    6

$$Q_3 \text{ position} = \frac{3\,(n + 1)}{4} = \frac{3\,(6 + 1)}{4} = 5.25 \approx 5$$
$$Q_3 = 10.3$$
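The three quartile examples suggest a rounding rule: round the positioning point to the nearest whole position, and average the two neighbouring values when it falls exactly halfway. The sketch below encodes that reading. The rule is inferred from the examples and is an assumption, since other textbooks and software interpolate differently. The function name `quartile` is hypothetical.

```python
import math


def quartile(values, i):
    """Quartile Q_i (i = 1, 2, 3) using the positioning point i(n + 1)/4."""
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    position = i * (n + 1) / 4           # 1-based position
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:                  # halfway: average the two neighbours
        return (ordered[lower - 1] + ordered[lower]) / 2
    # otherwise round to the nearest whole position
    index = lower if fraction < 0.5 else lower + 1
    return ordered[index - 1]


raw_data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(raw_data, i) for i in (1, 2, 3))
print(q1, round(q2, 2), q3)  # 6.3 8.3 10.3
```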
PERCENTILES

The pth percentile in an ordered array of n values is the value in the ith position, where
$$i = \frac{p}{100}\,(n + 1)$$

Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
$$i = \frac{p}{100}\,(n + 1) = \frac{60}{100}\,(19 + 1) = 12$$
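The position formula is a one-liner in code. In this sketch the name `percentile_position` is illustrative. Note that p = 25, 50 and 75 give the same positions as the quartile formula i(n + 1)/4.

```python
def percentile_position(p, n):
    """1-based position of the p-th percentile in an ordered array of n values."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    return p * (n + 1) / 100


print(percentile_position(60, 19))  # 12.0
print(percentile_position(25, 6))   # 1.75, the Q1 position for n = 6
```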
INTERQUARTILE RANGE

1. Measure of Dispersion
2. Also Called Midspread
3. Difference Between Third & First Quartiles

$$\text{Interquartile Range} = Q_3 - Q_1$$

4. Not Affected by Extreme Values


INTERQUARTILE RANGE

Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(each of the four intervals between these values holds 25% of the data)

Interquartile range = 57 - 30 = 27
BOX PLOT

• Graphical Display of Data Using 5-Number Summary

[Figure: box plot drawn over an axis marked 4, 6, 8, 10, 12, labelled Xsmallest, Q1, Median, Q3, Xlargest]

The box-plot concept is developed from the quartile concept.


SHAPE of BOX and WHISKER PLOTS

• The box and central line are centered between the endpoints if the data are symmetric around the median
• A box and whisker plot can be shown in either vertical or horizontal format

What is a box-plot used for?
Box-plot Application 1:
DISTRIBUTION SHAPE AND BOX AND WHISKER PLOT

[Figure: three box plots, labelled Left-Skewed, Symmetric, and Right-Skewed, each marked Q1, Q2, Q3]

1. Identifying the shape of the data distribution

Box-plot Application 2:
OUTLIER DETECTION METHODS

Outlier:
Unnatural data (data considered unusual)

Sources of outliers:
• Measurement or recording error
• The value comes from a different population
• Rare event

DETECTION METHOD 1: using the z value

A data value with |z| > 3 is considered an outlier:
$$z = \frac{y - \bar{y}}{s}$$
Detection method 1 does not use the box-plot approach.
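Detection method 1 can be sketched with the standard library. The function name `z_score_outliers` and the reuse of the 25-value data set from the range example are choices made here for illustration. The sample standard deviation (divisor n - 1) is used for s.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return the values whose |z| = |y - mean| / s exceeds the threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)     # sample standard deviation
    if s == 0:
        return []                    # all values identical: nothing stands out
    return [y for y in values if abs((y - mean) / s) > threshold]


data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]
outliers = z_score_outliers(data)
print(outliers)  # [120]
```

For this data set the mean is 6.52 and s is about 23.66, so the value 120 has z of roughly 4.8 while every other value has |z| well below 1.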
Box-plot Application 2:
OUTLIER DETECTION METHODS

DETECTION METHOD 2: by constructing a box plot.

Steps for constructing a box plot:
1. Compute Q1, Q2 (the median), and Q3, as well as the IQR
2. Draw a box of width IQR, with left edge = Q1 and right edge = Q3
3. Draw a vertical line inside the box to mark the median
4. Set the inner fences (IF) at a distance of 1.5 IQR to the left of Q1 and to the right of Q3
5. Set the outer fences (OF) at a distance of 3 IQR to the left of Q1 and to the right of Q3
6. Mark with an x the most extreme data values that still lie inside the inner fences

2. Detecting outliers
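The fence calculations in steps 4 and 5 can be sketched as below, using Q1 = 30 and Q3 = 57 from the interquartile range example. The function names and the three labels returned by `classify` are assumptions for illustration. Treating values between the inner and outer fences as suspect and values beyond the outer fences as extreme is a common convention, not something the steps above state.

```python
def fences(q1, q3):
    """Inner fences at 1.5 IQR and outer fences at 3 IQR beyond Q1 and Q3."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


def classify(value, q1, q3):
    """Label a value by its position relative to the fences (assumed labels)."""
    f = fences(q1, q3)
    if f["inner"][0] <= value <= f["inner"][1]:
        return "inside inner fences"
    if f["outer"][0] <= value <= f["outer"][1]:
        return "suspect outlier"
    return "extreme outlier"


limits = fences(30, 57)
print(limits["inner"])  # (-10.5, 97.5)
print(limits["outer"])  # (-51, 138)
print(classify(70, 30, 57), classify(100, 30, 57), classify(150, 30, 57))
```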


NUMERICAL DATA PROPERTIES &
MEASURES
VARIANCE & STANDARD
DEVIATION

1. Measures of Dispersion
2. Most Common Measures
3. Consider How Data Are Distributed
4. Show Variation About Mean (X̄ or µ)

[Figure: data points on an axis marked 4, 6, 8, 10, 12, with X̄ = 8.3]
Calculate the data mean first, before calculating the variance and standard deviation.
SAMPLE VARIANCE FORMULA
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$

Note the n - 1 in the denominator: use (n - 1) as the divisor for the sample variance (s²), and use N as the divisor for the population variance (σ²). For a population, also replace X̄ with µ.
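SAMPLE STANDARD DEVIATION FORMULA

The standard deviation is the square root of the variance:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$$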
VARIANCE EXAMPLE

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$$
$$s^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$$
THINKING CHALLENGE

You're a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues:
17, 16, 21, 18, 13, 16, 12, 11.

What are the variance and standard deviation of the stock prices?
VARIATION SOLUTION*

Sample Variance
Raw Data: 17 16 21 18 13 16 12 11

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$$
$$s^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$$
STANDARD DEVIATION SOLUTION*

Sample Standard Deviation
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34$$
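The stock-price solution can be cross-checked with Python's `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor (the population versions are `pvariance` and `pstdev`). This is a sketch for verification only.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

s2 = statistics.variance(prices)   # sample variance, divisor n - 1
s = statistics.stdev(prices)       # sample standard deviation

print(f"{s2:.2f} {s:.2f}")  # 11.14 3.34
```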
COMPARING STANDARD DEVIATIONS

[Figure: three dot plots (Data A, Data B, Data C) on a common axis from 11 to 21]

Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Which sample data set is the best?



SUMMARY OF VARIATION MEASURES

Measure                           Equation                                         Description
Range                             $X_{largest} - X_{smallest}$                     Total spread
Interquartile Range               $Q_3 - Q_1$                                      Spread of middle 50%
Standard Deviation (Sample)       $\sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$        Dispersion about sample mean
Standard Deviation (Population)   $\sqrt{\sum (X_i - \mu)^2 / N}$                  Dispersion about population mean
Variance (Sample)                 $\sum (X_i - \bar{X})^2 / (n - 1)$               Squared dispersion about sample mean
SHAPE
NUMERICAL DATA PROPERTIES & MEASURES
SHAPE
1. Describes How Data Are Distributed
2. Measures of Shape
Skew = Symmetry
SHAPE
Concerned with extent to which values are symmetrically distributed.
SHAPE

Left-Skewed (negative skew): Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed (positive skew): Mode < Median < Mean

Pearson's coefficient of skewness (Sk):
$$Sk = \frac{3(\bar{x} - m)}{s}, \quad \text{where } m \text{ is the median}$$
(The mode-based version, $(\bar{x} - \text{mode})/s$, omits the factor 3.)

Skew:
The extent to which a distribution is symmetric or has a tail. The value is 0 for a symmetric distribution such as the normal distribution. If the value is negative, the distribution is negatively (left-) skewed; if it is positive, the distribution is positively (right-) skewed.

Skewness coefficient < 0 → negative skew (left-skewed): Mean < Median < Mode
Skewness coefficient > 0 → positive skew (right-skewed): Mode < Median < Mean
Skewness coefficient = 0 → symmetric: Mean = Median = Mode
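As an illustration, the median-based form of Pearson's coefficient, 3(mean - median)/s, can be applied to the stock prices used earlier. The function name `pearson_skewness` is hypothetical, and the choice of the median-based form is an assumption of this sketch.

```python
import statistics


def pearson_skewness(values):
    """Pearson's median-based skewness coefficient: 3 (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)     # sample standard deviation
    return 3 * (mean - median) / s


prices = [17, 16, 21, 18, 13, 16, 12, 11]
sk = pearson_skewness(prices)
print(f"{sk:.3f}")  # -0.449
```

Here the mean (15.5) lies below the median (16), so the coefficient is negative and points to a slight left skew.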


2. NUMERICAL DATA PROPERTIES
FOR GROUPED DATA
GROUPED DATA

Raw data are unavailable; they have been grouped into a frequency distribution.

CLASS INTERVAL     FREQUENCY   CUM. FREQ.
95 - under 100     7           7
100 - under 105    23          30
105 - under 110    22          52
110 - under 115    17          69
115 - under 120    4           73
                   n = 73

So, suppose the only data available have already been grouped (the raw, original data are not available).
PROPERTIES of GROUPED DATA

Mean
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
where: n = total number of data values
       f_i = frequency of class i
       x_i = midpoint of class i
       k = number of classes

Median
$$Md = L_i + \left[\frac{\frac{n}{2} - (\sum f)_i}{f_{med}}\right] c$$
where: L_i = lower boundary of the median class
       n = total number of data values
       (∑f)_i = cumulative frequency of all classes before the median class
       f_med = frequency of the median class
       c = width of the median class interval
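PROPERTIES of GROUPED DATA

Mode
$$Mo = L_i + \left[\frac{\Delta_1}{\Delta_1 + \Delta_2}\right] c$$
where: L_i = lower boundary of the modal class
       ∆1 = excess of the modal class frequency over the frequency of the next lower class
       ∆2 = excess of the modal class frequency over the frequency of the next higher class
       c = width of the modal class interval

Variance
$$s^2 = \frac{\sum f_i x_i^2 - n\bar{X}^2}{n - 1}$$

Standard Deviation
$$s = \sqrt{s^2}$$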
THINKING CHALLENGE
1. Mean?

Given data:
Class          Freq. (fi)   Cum. freq.   xi   fi.xi
14.5 - 19.5    18           18           17   306
19.5 - 24.5    74           92           22   1628
24.5 - 29.5    62           154          27   1674
29.5 - 34.5    26           180          32   832
34.5 - 39.5    20           200          37   740
               n = 200                        5180

Mean = ∑ fi.xi / n = 5180 / 200 = 25.9
where: n = total number of data values (sum of the frequencies)
       fi = frequency of class i
       xi = midpoint of class i
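The grouped mean can be reproduced from the class boundaries and frequencies alone. In this sketch the tuple layout `(lower, upper, frequency)` and the function name `grouped_mean` are choices made for illustration.

```python
# Each class: (lower boundary, upper boundary, frequency)
classes = [
    (14.5, 19.5, 18),
    (19.5, 24.5, 74),
    (24.5, 29.5, 62),
    (29.5, 34.5, 26),
    (34.5, 39.5, 20),
]


def grouped_mean(classes):
    """Sum of frequency times class midpoint, divided by total frequency."""
    n = sum(freq for _, _, freq in classes)
    total = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
    return total / n


mean = grouped_mean(classes)
print(f"{mean:.1f}")  # 25.9
```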
THINKING CHALLENGE
2. Median?

Class                         Freq. (fi)   Cum. freq.   xi   fi.xi
14.5 - 19.5                   18           18           17   306
19.5 - 24.5                   74           92           22   1628
24.5 - 29.5 (median class)    62           154          27   1674
29.5 - 34.5                   26           180          32   832
34.5 - 39.5                   20           200          37   740
                              n = 200                        5180

Median (first find the median class): n/2 = 200/2 = 100, and the cumulative frequency first reaches 100 in the class 24.5 - 29.5.

Median = 24.5 + [ (200/2 - 92) / 62 ] × 5 = 25.15

where: Li = lower boundary of the median class
       n = total number of data values
       (∑f)i = cumulative frequency of all classes before the median class
       fmed = frequency of the median class
       c = width of the median class interval
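The median-class search and the interpolation step can be sketched as follows. The tuple layout `(lower, upper, frequency)` and the function name `grouped_median` are assumptions made for this example.

```python
# Each class: (lower boundary, upper boundary, frequency)
classes = [
    (14.5, 19.5, 18),
    (19.5, 24.5, 74),
    (24.5, 29.5, 62),
    (29.5, 34.5, 26),
    (34.5, 39.5, 20),
]


def grouped_median(classes):
    """Median = L + [(n/2 - cumulative freq before class) / f_med] * width."""
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0                   # frequency of all classes before this one
    for lower, upper, freq in classes:
        if cumulative + freq >= n / 2:   # first class that reaches n/2
            return lower + ((n / 2 - cumulative) / freq) * (upper - lower)
        cumulative += freq


median = grouped_median(classes)
print(f"{median:.2f}")  # 25.15
```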
THINKING CHALLENGE
3. Mode?

Class                        Freq. (fi)   Cum. freq.   xi   fi.xi
14.5 - 19.5                  18           18           17   306
19.5 - 24.5 (modal class)    74           92           22   1628
24.5 - 29.5                  62           154          27   1674
29.5 - 34.5                  26           180          32   832
34.5 - 39.5                  20           200          37   740
                             n = 200                        5180

Mode (first find the modal class, the class with the highest frequency):
∆1 = 74 - 18 = 56
∆2 = 74 - 62 = 12
Mode = 19.5 + [ 56 / (56 + 12) ] × 5 = 23.62

where: Li = lower boundary of the modal class
       ∆1 = excess of the modal class frequency over the frequency of the next lower class
       ∆2 = excess of the modal class frequency over the frequency of the next higher class
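The grouped mode calculation can be sketched in the same style. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption added here so that a modal class at either end still works.

```python
# Each class: (lower boundary, upper boundary, frequency)
classes = [
    (14.5, 19.5, 18),
    (19.5, 24.5, 74),
    (24.5, 29.5, 62),
    (29.5, 34.5, 26),
    (34.5, 39.5, 20),
]


def grouped_mode(classes):
    """Mode = L + [d1 / (d1 + d2)] * width, for the highest-frequency class."""
    freqs = [freq for _, _, freq in classes]
    k = freqs.index(max(freqs))                   # index of the modal class
    lower, upper, f_mode = classes[k]
    prev_f = freqs[k - 1] if k > 0 else 0         # assumed 0 if no lower class
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = f_mode - prev_f
    d2 = f_mode - next_f
    if d1 + d2 == 0:
        raise ValueError("flat distribution: the formula is undefined")
    return lower + d1 / (d1 + d2) * (upper - lower)


mode = grouped_mode(classes)
print(f"{mode:.2f}")  # 23.62
```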
THINKING CHALLENGE
4. Variance and standard deviation?

Class          Freq. (fi)   Cum. freq.   xi   fi.xi   fi.xi²
14.5 - 19.5    18           18           17   306     5202
19.5 - 24.5    74           92           22   1628    35816
24.5 - 29.5    62           154          27   1674    45198
29.5 - 34.5    26           180          32   832     26624
34.5 - 39.5    20           200          37   740     27380
               n = 200                        5180    140220

Variance
$$s^2 = \frac{\sum f_i x_i^2 - n\bar{X}^2}{n - 1}$$

Standard Deviation
$$s = \sqrt{s^2}$$
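The slide leaves the final numbers to the reader. A minimal sketch of the computation from the table totals follows. The variable names are illustrative, and the results printed by the code are derived here from the formula above rather than taken from the slides.

```python
import math

# Column totals from the table above
n = 200             # total frequency
sum_fx = 5180       # sum of f_i * x_i
sum_fx2 = 140220    # sum of f_i * x_i squared

mean = sum_fx / n                                  # 25.9
variance = (sum_fx2 - n * mean ** 2) / (n - 1)     # shortcut formula for s^2
std_dev = math.sqrt(variance)

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

With these totals the numerator is 140220 - 200 × 25.9² = 6058, which gives a variance of about 30.44 and a standard deviation of about 5.52.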
