Report 1

This document contains a chapter summary on graph representation and measures of central tendency from a probability and statistics textbook. It discusses several types of graphs that can be used to visually represent numerical data, including pie charts, bar diagrams, histograms, frequency polygons, and cumulative frequency curves. Steps are provided for constructing each graph type using exam score data from 40 students as an example. It also covers calculating the arithmetic mean and harmonic mean, which are measures used to find the average or central value within a data set. Formulas and worked examples are shown for determining the arithmetic mean from the exam score data.

Uploaded by

skbtemp12


Report

Probability and Statistics

Fall 233 MATH 2205

Section: E

Submitted by:

Sadat Reza Apon – 011221035

Sheikh Shakib Hossain – 011221031
Sheikh Md. Sajjad – 011221007
Sauda Binti Noor – 011221049
Abrar Zahin Arian - 011221018
Chapter 1 : Graph Representation

Graphical representation is an alternative way to analyze numerical information. It
uses a graph, which displays statistical data visually as lines or curves drawn
through points plotted on a coordinate plane.
Typically, there are four techniques for visually representing a frequency distribution. These
methods include the histogram, smoothed frequency graph, Ogive (Cumulative Frequency)
graph, and pie chart.
Various types of graphical representation include:
• Pie Chart
• Bar Diagram
• Histogram
• Frequency Polygon
• Cumulative Frequency Curve
• Cumulative Percentage Curve

Before drawing any graph, we need some numerical data to represent. Here we use the
English exam marks of 40 students. The data are listed below:

171, 57, 78, 159, 44, 23, 111, 17, 46, 143, 53, 96, 102, 14, 66, 9, 22, 156, 89,
117, 60, 39, 174, 24, 77, 108, 51, 132, 19, 168, 62, 67, 198, 124, 59, 5, 71, 165,
75, 74

Pie Graph:

The "pie chart," alternatively termed as the "circle graph," partitions a circular statistical
illustration into segments or slices to depict numerical data. Each segment represents a
proportional fraction of the entirety. When examining the makeup of a whole, the pie chart
proves particularly effective. Often, pie charts substitute other graphical representations such
as bar graphs, line plots, and histograms in various scenarios.
Steps taken:
1. Enter the data into a table. In this case we group the marks into class intervals of
width 25.
2. Add all the values in the table to get the total.
3. Divide each value by the total and multiply by 100 to get a percentage.
4. To find how many degrees each “pie sector” needs, take the full circle of 360° and
apply the formula below:
Central angle of each component = (value of the component / sum of the values of all
components) × 360°

Exam Marks    Number of Students    Percent

1 - 25        8                     (8/40) × 100 = 20
26 - 50       3                     (3/40) × 100 = 7.5
51 - 75       11                    (11/40) × 100 = 27.5
76 - 100      4                     (4/40) × 100 = 10
101 - 125     5                     (5/40) × 100 = 12.5
126 - 150     2                     (2/40) × 100 = 5
151 - 175     6                     (6/40) × 100 = 15
176 - 200     1                     (1/40) × 100 = 2.5
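
The percentage and central-angle steps can be sketched in Python. This is only a minimal illustration, assuming the frequency table from the text; the names `frequencies` and `pie_shares` are hypothetical and chosen for this example.

```python
# Frequency table assumed from the text: class interval -> number of students.
frequencies = {
    "1 - 25": 8,
    "26 - 50": 3,
    "51 - 75": 11,
    "76 - 100": 4,
    "101 - 125": 5,
    "126 - 150": 2,
    "151 - 175": 6,
    "176 - 200": 1,
}


def pie_shares(freqs):
    """Return {label: (percent, central_angle_in_degrees)} for each class."""
    total = sum(freqs.values())
    return {
        label: (100 * count / total, 360 * count / total)
        for label, count in freqs.items()
    }


shares = pie_shares(frequencies)
for label, (percent, angle) in shares.items():
    print(f"{label:>9}  {percent:5.1f} %  {angle:6.1f} degrees")
```

The central angles should add up to 360°, which is a quick check that no class was dropped.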
Bar Diagram:
A bar graph, also called a bar diagram, visually displays data using rectangular shapes. These
rectangles are evenly spaced apart and share the same width, key characteristics defining a
bar graph.

Using the following data, we can make a bar diagram.

Exam Marks    Exam Grade    Number of Students (frequency)

1 - 25        D             8
26 - 50       C             3
51 - 75       B-            11
76 - 100      B             4
101 - 125     B+            5
126 - 150     A-            2
151 - 175     A             6
176 - 200     A+            1

If we plot the frequency (the number of students) on the y-axis and the grades on the
x-axis, we get a graph that resembles the one below.
The rectangles here are called bars. Note that the bars have equal width and are equally
spaced, as mentioned above. This is a simple bar diagram.
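
As a rough sketch, the same bar diagram can be rendered as text in Python. For simplicity the bars are drawn horizontally here (grades down the side, frequency as bar length), which differs from the orientation described above; `grade_counts` and `text_bars` are hypothetical names used only for this illustration.

```python
# (grade, number of students) pairs, assumed from the grade table in the text.
grade_counts = [
    ("D", 8), ("C", 3), ("B-", 11), ("B", 4),
    ("B+", 5), ("A-", 2), ("A", 6), ("A+", 1),
]


def text_bars(counts, symbol="#"):
    """Build one text line per category; bar length equals the frequency."""
    return [f"{label:<3}| {symbol * count} {count}" for label, count in counts]


for line in text_bars(grade_counts):
    print(line)
```

Every bar uses the same symbol width and one line per grade, mirroring the equal-width, equally spaced bars of a simple bar diagram.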

Histogram:
A histogram, a type of non-cumulative frequency graph, is constructed on a natural scale,
portraying frequencies of various value ranges using closely spaced vertical rectangles. This
graph facilitates the easy determination of the mode, a measure of central tendency, within
the data.
Steps to create a histogram:
1. Plot class intervals on the X-axis and their frequencies on the Y-axis using a natural
scale
2. Begin the X-axis with the lowest limit of the lowest class interval. If this limit is far
from the origin, create a break in the X-axis to indicate its displacement.
3. Draw bars parallel to the Y-axis over each class interval, using the class widths as
their bases, so that the areas of the rectangles represent the frequencies of their
respective classes.
In this graph we take the class intervals on the X-axis and the frequencies on the
Y-axis. Before plotting the graph, we must convert the class intervals into their
exact limits.

Class Interval (exact limits)    Frequency (f)

53.5-57.5                        5
57.5-61.5                        5
61.5-65.5                        9
65.5-69.5                        14
69.5-73.5                        7

Frequency Polygon:

In this graph we take the class intervals (marks in English) on the X-axis and the
frequencies (number of students) on the Y-axis. Before plotting the graph, we must convert
the class intervals (C.I.) into their exact limits and add one extra class interval at each
end with a frequency of 0.
Steps to draw a frequency polygon:

1. Draw the X-axis with the class intervals marked. If the lowest score is large, put a
break in the axis to adjust. Add one extra class interval at each end.
2. Draw the Y-axis vertically, marking units for the class frequencies. Scale it so that
the highest frequency is about 75% of the figure's width.
3. Plot a point above the midpoint of each class interval, at a height proportional to its
frequency.
4. Connect these points with straight line segments to form the frequency polygon. Include
the extra intervals at both ends, with a frequency of zero, to close the graph.

Class Marks (x) Frequency (f)

55.5 5
59.5 5
63.5 9
67.5 14
71.5 7

Cumulative Frequency Curve:

To plot this graph first we must convert the class intervals into their exact limits. Then we
must calculate the cumulative frequencies of the distribution.

Class Upper Boundary CF (F)

57.5 5
61.5 10
65.5 19
69.5 33
73.5 40
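
The class marks used for the frequency polygon and the cumulative frequencies used for the ogive can both be derived from the exact class limits. The following is a minimal sketch, assuming the five classes from the histogram table; the variable names are illustrative only.

```python
from itertools import accumulate

# (lower exact limit, upper exact limit, frequency), assumed from the histogram table.
classes = [
    (53.5, 57.5, 5),
    (57.5, 61.5, 5),
    (61.5, 65.5, 9),
    (65.5, 69.5, 14),
    (69.5, 73.5, 7),
]

# Class mark (midpoint) of each interval: the x-values of the frequency polygon.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

# Upper boundaries and running totals: the points of the cumulative frequency curve.
upper_bounds = [upper for _, upper, _ in classes]
cumulative = list(accumulate(freq for _, _, freq in classes))

for bound, cf in zip(upper_bounds, cumulative):
    print(f"{bound:5.1f}  {cf:3d}")
```

The last cumulative frequency must equal the total number of observations, here 40.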

Chapter 2 : Measures of Central Tendency

Arithmetic Mean:
A value obtained by dividing the sum of all observations by the number of observations is
called arithmetic mean.
Calculating the arithmetic mean involves adding up all the values in a dataset and then
dividing that sum by the total number of values. This process gives you the average or
typical value within the dataset.
Exam Marks    Midpoint (xi)    Frequency (fi)    fi·xi
1 - 25        13               8                 104
26 - 50       38               3                 114
51 - 75       63               11                693
76 - 100      88               4                 352
101 - 125     113              5                 565
126 - 150     138              2                 276
151 - 175     163              6                 978
176 - 200     188              1                 188
Total                          40                3270

$$\text{Arithmetic mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{3270}{40} = 81.75$$
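
The grouped-data mean can be checked with a few lines of Python. This is a sketch under the assumption that each class is represented by its midpoint, as in the table; `table` and `grouped_mean` are hypothetical names.

```python
# (midpoint xi, frequency fi) pairs, assumed from the exam-marks table.
table = [
    (13, 8), (38, 3), (63, 11), (88, 4),
    (113, 5), (138, 2), (163, 6), (188, 1),
]


def grouped_mean(rows):
    """Arithmetic mean of grouped data: sum(fi * xi) / sum(fi)."""
    total_f = sum(f for _, f in rows)
    total_fx = sum(x * f for x, f in rows)
    return total_fx / total_f


mean = grouped_mean(table)
print(mean)  # 3270 / 40 = 81.75
```

Because every observation in a class is replaced by the class midpoint, the grouped mean is an approximation of the mean of the raw marks.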

The arithmetic mean is a fundamental statistical concept used for various purposes:
Central Tendency: It provides a central or representative value within a dataset, helping to
understand the "average" or typical value of a set of numbers.
Comparative Analysis: It allows for easy comparison between different sets of data, making
it useful in various fields like finance, economics, and science.
Prediction and Estimation: It's often used to predict future values or estimate missing values
within a dataset.
Basis for Further Analysis: The mean serves as a foundational statistic, often alongside other
measures like standard deviation, forming the basis for more advanced statistical analyses.
Understanding Patterns: It helps identify trends or patterns within data, aiding in decision-
making processes.

Harmonic Mean:
The harmonic mean is a type of average that is particularly useful when dealing with rates or
ratios. It's the reciprocal of the arithmetic mean of the reciprocals of a set of numbers.
Here are the steps to calculate the harmonic mean:
1. Reciprocal of Each Number: Find the reciprocal of each number in the dataset.
2. Find the Mean of the Reciprocals: Calculate the arithmetic mean of these reciprocals.
3. Reciprocal of the Result: Finally, take the reciprocal of the arithmetic mean obtained in
step 2 to find the harmonic mean.

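Exam Marks    Midpoint (xi)    Frequency (fi)    fi/xi
1 - 25        13               8                 0.6154
26 - 50       38               3                 0.0789
51 - 75       63               11                0.1746
76 - 100      88               4                 0.0455
101 - 125     113              5                 0.0442
126 - 150     138              2                 0.0145
151 - 175     163              6                 0.0368
176 - 200     188              1                 0.0053
Total                          40                1.0153

(The fi/xi entries are rounded to four decimal places; the total is computed from the
unrounded values.)

$$\text{Harmonic mean} = \frac{\sum f_i}{\sum \dfrac{f_i}{x_i}} = \frac{40}{1.0153} \approx 39.40$$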
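
A short Python sketch of the grouped harmonic mean, assuming the same midpoints and frequencies as the table above. The function name `grouped_harmonic_mean` is hypothetical.

```python
# (midpoint xi, frequency fi) pairs, assumed from the exam-marks table.
table = [
    (13, 8), (38, 3), (63, 11), (88, 4),
    (113, 5), (138, 2), (163, 6), (188, 1),
]


def grouped_harmonic_mean(rows):
    """Harmonic mean of grouped data: sum(fi) / sum(fi / xi)."""
    total_f = sum(f for _, f in rows)
    reciprocal_sum = sum(f / x for x, f in rows)
    return total_f / reciprocal_sum


reciprocal_sum = sum(f / x for x, f in table)
hm = grouped_harmonic_mean(table)
print(round(reciprocal_sum, 4), round(hm, 2))
```

For positive data the harmonic mean never exceeds the arithmetic mean, so comparing the two is a useful sanity check on the calculation.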

The harmonic mean offers several advantages in specific contexts:

• Dealing with Rates: It's particularly useful when dealing with rates, ratios, or
frequencies. For instance, it's beneficial in calculating average speeds, average rates
of return on investments, or average ratios.
• Sensitivity to Extreme Values: Unlike the arithmetic mean, the harmonic mean gives
less weight to large values and more weight to small ones. This makes it resistant to
unusually large values in a dataset, but sensitive to values close to zero.
• Accuracy in Averaging Rates: When averaging rates or ratios (like speed or
efficiency), the harmonic mean provides a more accurate representation than the
arithmetic mean. For example, the average speed over two equal distances driven at
40 km/h and 60 km/h is the harmonic mean, 48 km/h, not the arithmetic mean, 50 km/h.
• Consistency in Relationships: The harmonic mean preserves reciprocal relationships.
If the reciprocals 1/A, 1/B and 1/C are equally spaced (that is, A, B and C are in
harmonic progression), then the harmonic mean of A and C is B.
• Weighted Averages: It's useful for calculating weighted averages when different
components have different weights, especially in fields like finance and economics.

Median:
The median is the middle value in a dataset when the values are arranged in ascending or
descending order. If there's an odd number of values, the median is the middle number. If
there's an even number of values, the median is the average of the two middle numbers.
Steps to calculate the median:
• Order the Data: Arrange the values in ascending or descending order.
• Identify the Middle Value: For an odd number of values, locate the single middle
value. For an even number, locate the two middle values.
• Calculate the Median: The median is the middle value, or the average of the two
middle values.

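Exam Marks    Midpoint (xi)    Frequency (fi)    Cumulative frequency
1 - 25        13               8                 8
26 - 50       38               3                 11
51 - 75       63               11                22
76 - 100      88               4                 26
101 - 125     113              5                 31
126 - 150     138              2                 33
151 - 175     163              6                 39
176 - 200     188              1                 40

Here n/2 = 20, so the median class is 51 - 75, with lower boundary l = 50.5, class width
h = 25, frequency f = 11, and cumulative frequency of the preceding classes c = 8 + 3 = 11.

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 50.5 + \frac{25}{11}(20 - 11) \approx 70.95$$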

Quartile, Percentile and Decile:

Exam Marks    Midpoint (xi)    Frequency (fi)    Cumulative frequency
1 - 25        13               8                 8
26 - 50       38               3                 11
51 - 75       63               11                22
76 - 100      88               4                 26
101 - 125     113              5                 31
126 - 150     138              2                 33
151 - 175     163              6                 39
176 - 200     188              1                 40

Quartile:
Quartiles divide a dataset into four equal parts. There are three quartiles: Q1, Q2 (which is
also the median), and Q3. Q1 represents the value below which 25% of the data falls, Q2 is
the median (50% of the data falls below and 50% above), and Q3 represents the value below
which 75% of the data falls.

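$$Q_i = l + \frac{h}{f}\left(\frac{i \times n}{4} - C\right)$$

Here l is the lower boundary of the class containing the quartile, h its width, f its
frequency, and C the cumulative frequency of the classes before it. For Q2,
2 × 40 / 4 = 20, which falls in the class 51 - 75 (l = 50.5, h = 25, f = 11,
C = 8 + 3 = 11):

$$Q_2 = 50.5 + \frac{25}{11}(20 - 11) \approx 70.95$$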

Decile:
Deciles divide a dataset into ten equal parts. There are nine deciles in a dataset: D1, D2,
D3...D9. D1 represents the value below which 10% of the data falls, D2 represents the value
below which 20% of the data falls, and so on until D9, which represents the value below
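which 90% of the data falls.

$$D_i = l + \frac{h}{f}\left(\frac{i \times n}{10} - C\right)$$

Here l, h and f are the lower boundary, width and frequency of the class containing the
decile, and C is the cumulative frequency of the classes before it. For D8,
8 × 40 / 10 = 32, which falls in the class 126 - 150 (l = 125.5, h = 25, f = 2,
C = 8 + 3 + 11 + 4 + 5 = 31):

$$D_8 = 125.5 + \frac{25}{2}(32 - 31) = 138$$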

Percentile:
Percentiles divide a dataset into hundred equal parts. A percentile is a measure indicating the
value below which a given percentage of points in a dataset fall. For example, the 25th
percentile represents the value below which 25% of the data falls.

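$$P_i = l + \frac{h}{f}\left(\frac{i \times n}{100} - C\right)$$

Here l, h and f are the lower boundary, width and frequency of the class containing the
percentile, and C is the cumulative frequency of the classes before it. For P18,
18 × 40 / 100 = 7.2, which falls in the first class, 1 - 25 (l = 0.5, h = 25, f = 8,
C = 0):

$$P_{18} = 0.5 + \frac{25}{8}(7.2 - 0) = 23.0$$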
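
Quartiles, deciles and percentiles of grouped data all use the same interpolation formula with a different divisor (4, 10 or 100). The sketch below assumes the 40-student frequency table with class boundaries 0.5, 25.5, 50.5 and so on; the function `grouped_quantile` and its parameters are hypothetical names for this illustration.

```python
# (lower boundary, upper boundary, frequency), assumed from the exam-marks table.
classes = [
    (0.5, 25.5, 8), (25.5, 50.5, 3), (50.5, 75.5, 11), (75.5, 100.5, 4),
    (100.5, 125.5, 5), (125.5, 150.5, 2), (150.5, 175.5, 6), (175.5, 200.5, 1),
]


def grouped_quantile(rows, i, parts):
    """Return l + (h / f) * (i * n / parts - C) for the class containing the quantile.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(f for _, _, f in rows)
    target = i * n / parts
    below = 0  # cumulative frequency C of the classes before the current one
    for lower, upper, freq in rows:
        if freq > 0 and below + freq >= target:
            return lower + (upper - lower) / freq * (target - below)
        below += freq
    raise ValueError("i must satisfy 0 < i <= parts")


q2 = grouped_quantile(classes, 2, 4)
d8 = grouped_quantile(classes, 8, 10)
p18 = grouped_quantile(classes, 18, 100)
print(round(q2, 2), round(d8, 2), round(p18, 2))
```

Since Q2, D5 and P50 all mark the halfway point, they should agree with each other and with the median.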

Mode:
In statistics, the mode refers to the value that appears most frequently in a dataset. It's a
measure of central tendency alongside the mean and median. Unlike the mean and median,
which are concerned with the average or middle value, the mode focuses on the most
common value or values within a dataset.
For example, consider a dataset representing the number of pets owned by households in a
neighborhood:

2,1,3,2,5,2,1,4,2,3
In this dataset, the number "2" appears most frequently—it occurs four times, more than any
other number. Therefore, the mode of this dataset is "2". If there were two values tied for the
most frequent occurrence, the dataset would be described as "bimodal" (two modes). If more
than two values occurred with equal frequency and more frequently than any other values, it
could be described as "multimodal."
The mode is particularly useful when dealing with categorical or nominal data, such as
colors, types of cars, or categories of products, where identifying the most common category
can be informative.
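
The pets example can be reproduced with Python's standard library. This is a small sketch; `multimode` returns every value tied for the highest count, which covers the bimodal and multimodal cases described above.

```python
from collections import Counter
from statistics import multimode

# Number of pets per household, taken from the example in the text.
pets = [2, 1, 3, 2, 5, 2, 1, 4, 2, 3]

counts = Counter(pets)    # how often each value occurs
modes = multimode(pets)   # all values sharing the highest frequency

print(counts[2], modes)

# A dataset with two equally frequent values is bimodal.
bimodal_example = multimode([1, 1, 2, 2, 3])
```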
What is a measure of location? What is the purpose served by it? What are its desirable
qualities?

A measure of location, also known as a measure of central tendency, is a statistic that
represents a single value that best describes the center of a dataset. The primary purpose of a
measure of location is to provide a representative or typical value around which the data is
centered.
The desirable qualities of a measure of location include:
• Representativeness: It should accurately represent the central value or tendency of
the dataset, giving a meaningful summary of the data.
• Robustness: It should be relatively unaffected by outliers or extreme values in the
dataset. A robust measure of location won't be heavily skewed by extreme
observations.
• Ease of Interpretation: A good measure of location should be easily understood and
interpreted, making it useful for conveying information about the dataset to others.
• Applicability: It should be applicable to different types of data distributions, whether
the data is normally distributed, skewed, or has other characteristics.
• Mathematical Properties: It should possess desirable mathematical properties that
allow for meaningful statistical calculations and analyses.
Common measures of location include the mean (arithmetic average), median (middle
value), mode (most frequent value), quartiles, percentiles, and deciles. Different measures
have their strengths and weaknesses, making them suitable for various scenarios based on the
nature of the dataset and the specific context of analysis. The choice of the measure of
location often depends on the characteristics of the data and the objectives of the analysis.
Name : Abrar Zahin Arian
ID: 011221018

Measures of Dispersion
In the realm of statistics, effective data presentation and analysis play a pivotal role,
particularly when exploring measures of dispersion. Measures of dispersion, such as
range and standard deviation, provide crucial insights into the variability and spread of a
dataset, offering a deeper understanding beyond central tendencies like the mean.
Accurate depiction and interpretation of dispersion are essential in making informed
decisions, identifying patterns, and drawing meaningful conclusions from data. Whether
in scientific research, business analytics, or various other domains, a comprehensive
grasp of measures of dispersion enhances the ability to assess the reliability and
consistency of data, facilitating more robust and reliable statistical inferences. As data-
driven decision-making becomes increasingly prevalent, proficiency in presenting and
analyzing measures of dispersion becomes an indispensable skill for professionals and
researchers alike, contributing to the robustness and credibility of statistical findings.

Measures of Dispersion Analysis: Daily Commute Times

We examined a dataset representing the daily commute times (in minutes) for a group of
individuals over the course of a week. Two key measures of dispersion, the range and
standard deviation, were employed to assess the variability and spread within the data.

1. Range:
   o Definition: The range is the difference between the maximum and
     minimum values in a dataset.
   o Calculation: Range = Max − Min
   o Result: For the given commute times, the range is 40 − 20 = 20
     minutes, the span between the shortest and longest commute
     durations.
2. Standard Deviation:
   o Definition: Standard deviation quantifies the amount of variation or
     dispersion in a set of values.
   o Calculation:

     $$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

   o Results:
     • Mean: 30 minutes
     • Calculation of Squared Differences and Summation:

       $$\sigma = \sqrt{\frac{(20-30)^2 + (25-30)^2 + (30-30)^2 + (35-30)^2 + (40-30)^2}{5}} = \sqrt{\frac{250}{5}} = \sqrt{50} \approx 7.07 \text{ minutes}$$

     • Standard Deviation: The calculated standard deviation is a
       numerical measure of the typical deviation of each commute time
       from the mean, offering insights into the overall variability within
       the dataset.
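
The range and standard deviation of the commute times can be checked with Python's `statistics` module. This sketch assumes the five commute times that appear in the calculation (20, 25, 30, 35 and 40 minutes) and uses `pstdev`, the population form that divides by n as in the formula.

```python
from statistics import mean, pstdev

# Daily commute times in minutes, assumed from the worked calculation.
commute_minutes = [20, 25, 30, 35, 40]

data_range = max(commute_minutes) - min(commute_minutes)
average = mean(commute_minutes)
sigma = pstdev(commute_minutes)  # population standard deviation (divides by n)

print(data_range, average, round(sigma, 2))
```

If the five days were treated as a sample from a longer period, `statistics.stdev`, which divides by n − 1, would be the usual choice instead.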

This analysis not only highlights the spread of commute times but also provides a
foundation for informed decision-making and a deeper understanding of the dataset's
characteristics. Such clarity in measures of dispersion contributes to the robustness and
reliability of statistical insights, crucial for data-driven decision-making in various fields.

Dispersion: “The variability (spread) that exists between the value of a data is called
dispersion”

Types of Measures of Dispersion: There are two types of measure of dispersion

I) Absolute Measure of Dispersion
II) Relative Measure of Dispersion
Absolute Measure of Dispersion: “An absolute measure of dispersion measures the
variability in terms of the same units of the data”
e.g. if the units of the data are Rs, meters, kg, etc. The units of the measures of
dispersion will also be Rs, meters, kg, etc. The common absolute measures of dispersion
are:
• Range
• Quartile Deviation or Semi Inter-Quartile Range
• Average Deviation or Mean Deviation
• Standard Deviation
Relative Measure of Dispersion: “A relative measure of dispersion compares the
variability of two or more data that are independent of the units of measurement”
The common relative measures of dispersion are:
• Coefficient of Dispersion or Coefficient of Range
• Coefficient of Quartile Deviation
• Coefficient of Mean Deviation
• Coefficient of Standard Deviation or Coefficient of Variation (C.V)
Coefficient of Range or Coefficient of Dispersion: The coefficient of range or coefficient
of dispersion is a relative measure of dispersion and is given by: Coefficient of Range =
(Xm - X0) / (Xm + X0), where Xm is the largest value and X0 is the smallest value in the data.
Quartile Deviation or Semi-inter-quartile Range: “half of the difference between the
upper quartile and lower quartile is called the semi-inter quartile range or quartile
deviation.” i.e. Quartile deviation = (Q3-Q1)/2
Example: Calculate the quartile deviation for continuous grouped data.
Class boundaries    Midpoint (xi)    Frequency (fi)    Cumulative frequency (c.f.)
29.5 - 39.5         34.5             8                 8
39.5 - 49.5         44.5             85                93
49.5 - 59.5         54.5             184               277
59.5 - 69.5         64.5             369               646
69.5 - 79.5         74.5             210               856
79.5 - 89.5         84.5             89                945
89.5 - 99.5         94.5             24                969

n = Σ fi = 969

Q1 = l + (h/f)(n/4 - c) = 49.5 + (10/184)(242.25 - 93) = 57.611
Here, l = 49.5 ; h = 10 ; f = 184 ; c = 93
Q3 = l + (h/f)(3n/4 - c) = 69.5 + (10/210)(726.75 - 646) = 73.345
Here, l = 69.5 ; h = 10 ; f = 210 ; c = 646
Quartile deviation = (Q3 - Q1)/2 = (73.345 - 57.611)/2 = 7.867 (Ans).
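
The quartile deviation for this grouped table can be verified with a short Python sketch. It assumes the class boundaries and frequencies listed in the example; the helper `quartile` is a hypothetical name and simply applies l + (h/f)(i·n/4 − c).

```python
# (lower boundary, upper boundary, frequency), assumed from the example table.
classes = [
    (29.5, 39.5, 8), (39.5, 49.5, 85), (49.5, 59.5, 184), (59.5, 69.5, 369),
    (69.5, 79.5, 210), (79.5, 89.5, 89), (89.5, 99.5, 24),
]


def quartile(rows, i):
    """i-th quartile (i = 1, 2, 3) of grouped data by linear interpolation."""
    n = sum(f for _, _, f in rows)
    target = i * n / 4
    c = 0  # cumulative frequency of the classes before the current one
    for lower, upper, freq in rows:
        if freq > 0 and c + freq >= target:
            return lower + (upper - lower) / freq * (target - c)
        c += freq
    raise ValueError("i must be 1, 2 or 3")


q1 = quartile(classes, 1)
q3 = quartile(classes, 3)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 3), round(q3, 3), round(quartile_deviation, 3))
```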
Mean Absolute Deviation or Mean Deviation (Average Deviation): “The arithmetic
mean of the absolute deviations from an average (mean, median, etc.) is called the mean
deviation or average deviation.”

                    Grouped                                          Ungrouped
M.D from Mean       $M.D = \frac{\sum f_i |x_i - \bar{x}|}{n}$        $M.D = \frac{\sum |x_i - \bar{x}|}{n}$
M.D from Median     $M.D = \frac{\sum f_i |x_i - Med|}{n}$            $M.D = \frac{\sum |x_i - Med|}{n}$

Calculate the mean deviation and coefficient of mean deviation from (i) the mean and (ii)
the median for the following ungrouped data set.
Xi: 46, 33, 38, 47, 40, 37, 42, 49, 37
n = 9

Xi     Xi − x̄    |Xi − x̄| (x̄ = 41)    |Xi − Med| (Med = 40)
33     −8        8                     7
37     −4        4                     3
37     −4        4                     3
38     −3        3                     2
40     −1        1                     0
42     1         1                     2
46     5         5                     6
47     6         6                     7
49     8         8                     9
Σ Xi = 369       Σ|Xi − x̄| = 40       Σ|Xi − Med| = 39

$$\bar{x} = \frac{\sum x_i}{n} = \frac{369}{9} = 41$$

Median = value of the ((n + 1)/2)-th item of the ordered data = value of the 5th item = 40.

M.D (mean) = 40/9 = 4.44 (Ans)

M.D (Med) = 39/9 = 4.33 (Ans)

Coefficient of M.D (mean) = M.D (mean) / mean = 4.44/41 ≈ 0.108 (Ans)

Coefficient of M.D (Med) = M.D (Med) / median = 4.33/40 ≈ 0.108 (Ans)
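
The mean deviation calculation for this ungrouped set can be sketched in Python as follows. The helper `mean_deviation` is a hypothetical name; it averages the absolute deviations from whichever center is passed in.

```python
from statistics import mean, median

# Ungrouped data from the example.
values = [46, 33, 38, 47, 40, 37, 42, 49, 37]


def mean_deviation(data, center):
    """Average absolute deviation of the data from the given center."""
    return sum(abs(x - center) for x in data) / len(data)


x_bar = mean(values)
med = median(values)
md_mean = mean_deviation(values, x_bar)
md_median = mean_deviation(values, med)

print(x_bar, med, round(md_mean, 2), round(md_median, 2))
```

The mean deviation taken from the median is never larger than the one taken from the mean, which matches the two results here.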
Standard Deviation: “The positive square root of the variance is called standard
deviation.”
Coefficient of Standard Deviation or Coefficient of Variation: The coefficient of
standard deviation is a relative measure of dispersion and is given by:
C.V = (Standard Deviation / Mean) × 100

Calculate the variance, standard deviation and coefficient of variation for the
following weights of 60 mangoes (continuous grouped data):
Weight       Midpoint (xi)    Frequency (fi)    fi·xi     fi·xi²
65 - 84      74.5             9                 670.5     49 952.25
85 - 104     94.5             10                945       89 302.50
105 - 124    114.5            17                1946.5    222 874.25
125 - 144    134.5            10                1345      180 902.50
145 - 164    154.5            5                 772.5     119 351.25
165 - 184    174.5            4                 698       121 801.00
185 - 204    194.5            5                 972.5     189 151.25
Total                         Σ fi = 60         Σ fi xi = 7350    Σ fi xi² = 973 335

Variance:

$$S^2 = \frac{\sum f_i x_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2 = 16222.25 - 15006.25 = 1216 \text{ unit}^2 \text{ (Ans)}$$

Standard deviation:

$$S = \sqrt{1216} \approx 34.87 \text{ unit}$$

Coefficient of Variation:

$$C.V = \frac{S}{\bar{x}} \times 100 = \frac{34.87}{122.5} \times 100 \approx 28.47 \text{ (Ans)}$$
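
The mango example can be reproduced step by step in Python. This is a minimal sketch, assuming the midpoints and frequencies from the table and the population form of the variance (dividing by Σ fi); the variable names are illustrative.

```python
from math import sqrt

# (midpoint xi, frequency fi) pairs, assumed from the mango weight table.
rows = [
    (74.5, 9), (94.5, 10), (114.5, 17), (134.5, 10),
    (154.5, 5), (174.5, 4), (194.5, 5),
]

n = sum(f for _, f in rows)
sum_fx = sum(f * x for x, f in rows)
sum_fx2 = sum(f * x * x for x, f in rows)

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2   # S^2 = sum(f x^2)/n - (sum(f x)/n)^2
std_dev = sqrt(variance)
cv = std_dev / mean * 100            # coefficient of variation, in percent

print(mean, variance, round(std_dev, 2), round(cv, 2))
```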
Measures of dispersion, such as range and standard deviation, have various practical
applications across different fields. Here are some key applications:

1. Risk Assessment in Finance:

o In finance, standard deviation is often used to measure the volatility of
stock prices. Higher standard deviation indicates greater price variability,
which can be seen as a measure of risk. Investors and financial analysts
use this information to assess and manage investment risk.
2. Quality Control in Manufacturing:
o Measures of dispersion are employed to assess the consistency and
reliability of manufacturing processes. For example, in the production of
goods, a low standard deviation in product dimensions indicates that
items are consistently manufactured to meet specified standards.
3. Education Assessment:
o In educational testing, measures of dispersion can be applied to evaluate
the consistency of student scores. A low standard deviation in test scores
suggests a more consistent performance across students, while a higher
standard deviation may indicate greater variability in student
performance.
4. Public Health and Epidemiology:
o In epidemiological studies, measures of dispersion help assess the
variability in health-related data, such as disease prevalence or patient
response to treatments. This information aids in understanding the range
of outcomes and planning public health interventions accordingly.
5. Market Research:
o Measures of dispersion are essential in market research to analyze
consumer preferences and behaviors. For instance, the range or standard
deviation of responses to a survey question can indicate the diversity of
opinions within a target population.
6. Project Management:
o In project management, measures of dispersion can be applied to evaluate
the variability in project timelines or costs. Understanding the range of
possible outcomes helps project managers make more accurate
predictions and set realistic expectations.
7. Climate Studies:
o Meteorologists use measures of dispersion to analyze weather data. For
example, the standard deviation of temperatures over a period can
provide insights into the variability of the climate in a specific region.

8. Sports Analytics:
o In sports, measures of dispersion are used to assess the consistency of
player performance. Coaches and analysts may use standard deviation to
evaluate how consistently a player performs over a series of games.

In all these applications, measures of dispersion contribute to a more comprehensive

understanding of data variability, helping professionals make informed decisions,
manage risks, and plan interventions or strategies tailored to the characteristics of the
data at hand.

Measures of dispersion, such as range and standard deviation, play a crucial role in
various fields. In finance, standard deviation is employed to gauge stock price volatility,
aiding investors and analysts in risk assessment. In manufacturing, these measures
ensure product consistency by assessing the reliability of processes, while in education,
they evaluate the consistency of student scores, offering insights into performance
variations. In public health, measures of dispersion assist in understanding data
variability in epidemiological studies, guiding the planning of interventions. Market
researchers use them to analyze consumer preferences, project managers apply them to
assess project variability, and meteorologists utilize them in climate studies to
understand weather data variability. Even in sports analytics, measures of dispersion
help assess the consistency of player performance. Across these diverse applications,
measures of dispersion contribute to a comprehensive understanding of data variability,
facilitating informed decision-making, risk management, and tailored intervention
strategies.
Moments
In statistics, central moments quantify the shape of a dataset in terms of
deviations from the mean. The r-th central moment, denoted Mr, is calculated
as:

$$M_r = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^r$$

where:

● Mr is the r-th central moment,

● n is the total number of data points,

● Xi is the i-th data point,

● X̄ is the mean of the dataset.

This report explores the calculation and significance of the first four central
moments.

First-Order Central Moment

The first-order central moment (r = 1) is given by:

$$M_1 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})$$

This moment is the average deviation of the data points from the mean. Because
positive and negative deviations cancel, it is always zero.

Second-Order Central Moment

The second-order central moment (r = 2) is defined as:

$$M_2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

This moment is the variance. It quantifies the variability or spread of the
dataset through the squared deviations from the mean.

Third-Order Central Moment

The third-order central moment (r = 3) is calculated by:

$$M_3 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^3$$

This moment captures the skewness of the dataset, indicating whether the
distribution is symmetric or skewed.

Fourth-Order Central Moment

The fourth-order central moment (r = 4) is expressed as:

$$M_4 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^4$$

This moment provides information about the kurtosis, highlighting the tails and
peakedness of the distribution.

Example: Exam Scores Dataset

Consider a larger dataset of exam scores: {60, 75, 80, 85, 90, 95, 100, 105, 110,
115}. The first four central moments for this dataset are set up using the formula
above.

1. First-Order Central Moment:

$$M_1 = \frac{(60 - \bar{X}) + (75 - \bar{X}) + \dots + (115 - \bar{X})}{10}$$

2. Second-Order Central Moment:

$$M_2 = \frac{(60 - \bar{X})^2 + (75 - \bar{X})^2 + \dots + (115 - \bar{X})^2}{10}$$

3. Third-Order Central Moment:

$$M_3 = \frac{(60 - \bar{X})^3 + (75 - \bar{X})^3 + \dots + (115 - \bar{X})^3}{10}$$

4. Fourth-Order Central Moment:

$$M_4 = \frac{(60 - \bar{X})^4 + (75 - \bar{X})^4 + \dots + (115 - \bar{X})^4}{10}$$
r-th Moment about the Origin (O)

The r-th moment about the origin is given by:

$$M_r(O) = \frac{1}{n}\sum_{i=1}^{n} X_i^{\,r}$$

These moments describe the distribution of data points with respect to the origin,
providing insights into symmetry and concentration.

r-th Moment about Arbitrary Origin (A)

The r-th moment about an arbitrary origin A is expressed as:

$$M_r(A) = \frac{1}{n}\sum_{i=1}^{n}(X_i - A)^r$$

These moments offer insights into the distribution of data points relative to the
chosen origin A, allowing for a more flexible analysis.

Dimensionless Forms: Skewness and Kurtosis

The skewness (SK) and kurtosis (K) can be expressed in dimensionless form:

$$SK = \frac{M_3}{M_2^{3/2}}$$

$$K = \frac{M_4}{M_2^{2}} - 3$$
These dimensionless measures provide standardized indicators of skewness and
kurtosis, making them comparable across different datasets.

Conditions for Skewness and Kurtosis

1. Skewness (SK):

● If SK = 0, the distribution is perfectly symmetrical.

● If SK > 0, the distribution is positively skewed (tail on the right).

● If SK < 0, the distribution is negatively skewed (tail on the left).

2. Kurtosis (K):

● If K = 0, the distribution has the same kurtosis as a normal distribution
(mesokurtic).

● If K > 0, the distribution is leptokurtic (heavier tails and a sharper peak).

● If K < 0, the distribution is platykurtic (lighter tails and a flatter peak).

Example:

Dataset of exam scores: {60, 75, 80, 85, 90, 95, 100, 105, 110, 115}. For this
dataset (n = 10) we set up the first four central moments, the r-th moment about
the origin (O), the r-th moment about an arbitrary origin (A = 90), and the
dimensionless forms of skewness and kurtosis.

1. First-Order Central Moment: $M_1 = \frac{1}{10}\sum (X_i - \bar{X})$

2. Second-Order Central Moment: $M_2 = \frac{1}{10}\sum (X_i - \bar{X})^2$

3. Third-Order Central Moment: $M_3 = \frac{1}{10}\sum (X_i - \bar{X})^3$

4. Fourth-Order Central Moment: $M_4 = \frac{1}{10}\sum (X_i - \bar{X})^4$

5. r-th Moment about the Origin (O): $M_r(O) = \frac{1}{10}\sum X_i^{\,r}$

6. r-th Moment about Arbitrary Origin (A = 90): $M_r(90) = \frac{1}{10}\sum (X_i - 90)^r$

7. Dimensionless Skewness (SK): $SK = M_3 / M_2^{3/2}$

8. Dimensionless Kurtosis (K): $K = M_4 / M_2^{2} - 3$
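
The quantities listed above can be evaluated numerically. The following Python sketch computes the moments about the mean (usually called central moments), the moments about the origin and about A = 90, and the dimensionless skewness and kurtosis for the exam scores. The function names `central_moment` and `moment_about` are hypothetical.

```python
# Exam scores from the example.
scores = [60, 75, 80, 85, 90, 95, 100, 105, 110, 115]


def moment_about(data, r, origin=0.0):
    """r-th moment about a chosen origin: sum((x - origin)^r) / n."""
    return sum((x - origin) ** r for x in data) / len(data)


def central_moment(data, r):
    """r-th moment about the mean of the data."""
    return moment_about(data, r, origin=sum(data) / len(data))


m1, m2, m3, m4 = (central_moment(scores, r) for r in (1, 2, 3, 4))

mean_score = moment_about(scores, 1)                # first moment about the origin
first_about_90 = moment_about(scores, 1, origin=90)

skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2 - 3

print(m1, m2, m3)
print(round(skewness, 3), round(kurtosis, 3))
```

For this dataset the first moment about the mean is zero and the third is negative, so the skewness is negative: the low score of 60 stretches the left tail.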
Box-and-Whisker Plot

Summary: The box-and-whisker plot, commonly referred to as a box plot, is a powerful

graphical tool used in statistical analysis to depict the distribution and central tendencies of a
dataset. It provides a visual summary that aids in understanding key statistical measures,
enabling researchers, analysts, and decision-makers to gain insights into the variability and
patterns within the data.

Components of a Box-and-Whisker Plot:

1. Box: Represents the interquartile range (IQR), encompassing the middle 50% of the data.
The length of the box indicates the spread of the central part of the data.
2. Whiskers: Extend from the box to the minimum and maximum values within a specified
range. Provide information about the overall range of the data.
3. Median (Q2): A line inside the box denotes the median, representing the middle value of
the dataset and dividing it into two equal halves.
4. Outliers: Individual data points beyond the whiskers are considered outliers and may be
marked separately.

Characteristics of a Box-and-Whisker Plot:

1. Symmetry and Skewness: Symmetry or skewness is evident from the position of the
box within the whiskers. Asymmetry indicates skewness in the data distribution.

2. Outliers: Outliers are easily identifiable, aiding in the detection of unusual data points
that might significantly impact the analysis.

3. Spread and Dispersion: The length of the box and whiskers provides insights into the
spread and dispersion of the data. A longer box and whiskers suggest greater variability.

4. Central Tendency: The position of the median within the box indicates the central
tendency. A median closer to one quartile than the other signifies skewness in the data.

Advantages:
1. Comparison: Facilitates the comparison of multiple datasets, enabling a quick
overview of their distributions.

2. Outlier Detection: Offers a straightforward method for identifying outliers, helping to
pinpoint data points that deviate significantly from the norm.

3. Summary of Statistics: Provides a concise summary of key statistical measures,
including median, quartiles, and potential outliers.

4. Visual Representation: Presents a visually intuitive representation of data, making it
accessible to a broad audience.

Example: Exam Scores

Suppose you have the exam scores of two different classes, Class A and Class B, to compare
their performance. The scores are as follows:
Class A Scores: 65,70,72,75,78,80,82,85,90,92,95,65,70,72,75,78,80,82,85,90,92,95
Class B Scores: 55,60,68,70,75,78,82,88,92,96,98,55,60,68,70,75,78,82,88,92,96,98
Now, let's create a box-and-whisker plot to visualize and compare the distribution of scores
between the two classes.
1. Calculate Quartiles (taking Q1 and Q3 as the medians of the lower and upper halves):
• Class A: Q1 = 72, Q2 (Median) = 80, Q3 = 90
• Class B: Q1 = 68, Q2 (Median) = 78, Q3 = 92
2. Interquartile Range (IQR):
• Class A: IQR = Q3 − Q1 = 90 − 72 = 18
• Class B: IQR = Q3 − Q1 = 92 − 68 = 24
[Figure: box-and-whisker plots of the Class A and Class B scores, with Q1, Q2 and Q3 marked on each box]
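As a rough cross-check of the quartile and IQR steps, the sketch below recomputes them from the raw scores with `statistics.quantiles` from the Python standard library. Choosing `method="exclusive"` is an assumption: it places the k-th quartile at position k(n+1)/4 in the ordered data, and other quartile conventions can give slightly different values. The name `box_summary` is illustrative only.

```python
from statistics import quantiles

# Each class list in the example repeats the same eleven scores twice.
class_a = [65, 70, 72, 75, 78, 80, 82, 85, 90, 92, 95] * 2
class_b = [55, 60, 68, 70, 75, 78, 82, 88, 92, 96, 98] * 2


def box_summary(scores):
    """Five-number summary plus IQR, the values a box-and-whisker plot draws."""
    # method="exclusive" puts the k-th quartile at position k(n+1)/4.
    q1, q2, q3 = quantiles(scores, n=4, method="exclusive")
    return {
        "min": min(scores),
        "Q1": q1,
        "Q2": q2,
        "Q3": q3,
        "max": max(scores),
        "IQR": q3 - q1,
    }


summary_a = box_summary(class_a)
summary_b = box_summary(class_b)
for name, summary in (("Class A", summary_a), ("Class B", summary_b)):
    print(name, summary)
```

A larger IQR means the middle half of the scores is more spread out, so comparing the two `IQR` entries shows which class is more variable.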
Conclusion: The box-and-whisker plot is a valuable tool in data analysis, providing a clear
and concise representation of dataset characteristics. Its simplicity, ability to highlight outliers, and
effectiveness in comparing datasets make it an indispensable asset in statistical exploration. Understanding
and utilizing box-and-whisker plots enhance the interpretability of data, supporting informed decision-
making processes across various disciplines.
Stem and Leaf Plots

Id: 011221007

Stem and Leaf Plots:

A stem and leaf plot uses the digits of the data values to organize a data set. The data are
placed in order from lowest to highest, so the plot shows how the data are distributed.
Each data value is split into a stem (the digit or digits to the left of the vertical line) and a
leaf (the digit to the right of the vertical line). For two-digit data, the stems represent the
tens place and the leaves represent the ones place.

Example-1: Draw a stem-and-leaf diagram for the following data

26 45 32 27 29 30 40 36 37

(i) Make an ordered list of the 9 values.

Sol:

Stem | leaves

2 | 6 7 9
3 | 0 2 6 7
4 | 0 5

[key: 2|6 means 26]
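The construction can be sketched in a few lines of Python. This is a minimal illustration that assumes non-negative integer data where the last digit is the leaf and the remaining digits are the stem; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the plot as a list of 'stem | leaves' lines."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)    # tens digit(s) and ones digit
        groups[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


data = [26, 45, 32, 27, 29, 30, 40, 36, 37]
plot = stem_and_leaf(data)
print("\n".join(plot))
```

Sorting before splitting is what makes the plot double as an ordered list of the data.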

(ii) Find least value, greatest value, mean, median, mode and range.

Sol:

Least value = 26

Greatest value = 45

Mean = ΣXi / N = (26+45+32+27+29+30+40+36+37) / 9 = 302 / 9 ≈ 33.56

Median = (n+1)/2-th value = (9+1)/2 = 5th value of the ordered data = 32

Mode: there is no mode, since no value is repeated.

Range = greatest value - least value = 45 - 26 = 19.
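These summary values can be checked with the Python standard library. The sketch below is illustrative; the variable names are not from the report, and treating "every value equally frequent" as "no mode" is an assumption about the convention used.

```python
from statistics import mean, median, multimode

data = [26, 45, 32, 27, 29, 30, 40, 36, 37]

least, greatest = min(data), max(data)
avg = mean(data)             # sum of the values divided by their count
mid = median(data)           # middle value of the ordered data
modes = multimode(data)      # list of the most frequent value(s)
# When all values are equally frequent, multimode returns every distinct
# value; that case is treated here as "no mode".
has_mode = len(modes) < len(set(data))
value_range = greatest - least

print(least, greatest, round(avg, 2), mid, has_mode, value_range)
```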

Back-to-Back Stem and Leaf Plot:

A back-to-back stem and leaf plot is used to compare two distributions side by side. It
contains three columns separated by vertical lines: the center column contains the stems,
and the left and right columns contain the leaves of the two data sets. Leaves on the left
are read outward from the stem.

Example-4: The following back-to-back stem and leaf diagram shows the times taken by some
girls and boys to complete a level on a computer game.

     Girls | Stem | Boys
     9 8 8 |  1   |
   6 4 3 1 |  2   | 3 5 7 8 8
 7 6 5 4 2 |  3   | 0 4 7
     3 0 0 |  4   | 0 1 2

[key: 8 | 1 | means 18 for girls; | 2 | 3 means 23 for boys]

(a) Compare the times taken to complete the level by the girls and the boys.
Sol:
For Girls,

Q1 = 1*(15+1)/4 = 4th data value = 21

Q2 = 2*(15+1)/4 = 8th data value = 32

Q3 = 3*(15+1)/4 = 12th data value = 37

I.Q.R = Q3 - Q1 = 37 - 21 = 16

For Boys,

Q1 = 1*(11+1)/4 = 3rd data value = 27

Q2 = 2*(11+1)/4 = 6th data value = 30

Q3 = 3*(11+1)/4 = 9th data value = 40

I.Q.R = Q3 - Q1 = 40 - 27 = 13

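Here the boys' I.Q.R is 13 and the girls' I.Q.R is 16. Since the boys' I.Q.R is smaller, the
boys' times to complete the level were more consistent. The boys' median (30) is also lower
than the girls' median (32), so the boys were slightly faster on average.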
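The position rule used in this solution, where the k-th quartile is the k(n+1)/4-th ordered value, can be written out directly. In the sketch below the two lists are the times as read from the diagram, and the linear interpolation for fractional positions is an added assumption, since every position in this example happens to be a whole number.

```python
def quartile(sorted_values, k):
    """k-th quartile taken as the k(n+1)/4-th value of the ordered data."""
    n = len(sorted_values)
    position = k * (n + 1) / 4       # 1-based position in the ordered data
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part, 0 when exact
    if lower < 1:                    # position before the first value
        return sorted_values[0]
    if lower >= n:                   # position at or past the last value
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


girls = sorted([18, 18, 19, 21, 23, 24, 26, 32, 34, 35, 36, 37, 40, 40, 43])
boys = sorted([23, 25, 27, 28, 28, 30, 34, 37, 40, 41, 42])

iqr_girls = quartile(girls, 3) - quartile(girls, 1)
iqr_boys = quartile(boys, 3) - quartile(boys, 1)
print(iqr_girls, iqr_boys)
```

The group with the smaller IQR has the more consistent middle half of times.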
CORRELATION THEORY AND REGRESSION ANALYSIS

Types of Correlation
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear

Comment on Correlation Coefficient (r)

r = 1 : Perfect positive correlation

0.7 ≤ r < 1 : Strong positive correlation

0.4 ≤ r < 0.7 : Moderate positive correlation

0 < r < 0.4 : Weak positive correlation

r = 0 : No correlation

0 > r > -0.4 : Weak negative correlation

-0.4 ≥ r > -0.7 : Moderate negative correlation

-0.7 ≥ r > -1 : Strong negative correlation

r = -1 : Perfect negative correlation

Application Problem-1: A research physician submerged the faces of ten small children in
cold water to control abnormally rapid heartbeats and recorded, for each child, the
temperature of the water and the reduction in pulse rate. The results are presented in the
following table. Calculate the correlation coefficient between temperature of water and
reduction in pulse rate.

Temp. of water            68  65  70  62  60  55  58  65  69  63
Reduction in pulse rate    2   5   1  10   9  13  10   3   4   6

x        y        x^2         y^2        xy
68       2        4624        4          136
65       5        4225        25         325
70       1        4900        1          70
62       10       3844        100        620
60       9        3600        81         540
55       13       3025        169        715
58       10       3364        100        580
65       3        4225        9          195
69       4        4761        16         276
63       6        3969        36         378
Σx=635   Σy=63    Σx^2=40537  Σy^2=541   Σxy=3835

We know,

r_xy = r = (nΣxy - ΣxΣy) / sqrt{(nΣx^2 - (Σx)^2) * (nΣy^2 - (Σy)^2)}

r = ((10*3835) - (635*63)) / sqrt{(10*40537 - (635)^2) * (10*541 - (63)^2)}

  = -1655 / sqrt(2145 * 1441)

  ≈ -0.94

The result, r ≈ -0.94, indicates that temperature of water and reduction in pulse rate are
strongly negatively correlated: the colder the water, the greater the reduction in pulse rate.
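The same computation can be sketched in Python using the raw-sums form of the formula. The function name `pearson_r` and the variable names are illustrative, not part of the report.

```python
from math import sqrt

temperature = [68, 65, 70, 62, 60, 55, 58, 65, 69, 63]   # x
reduction = [2, 5, 1, 10, 9, 13, 10, 3, 4, 6]            # y


def pearson_r(x, y):
    """Correlation coefficient from the column sums of the working table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(temperature, reduction)
print(round(r, 2))   # -0.94
```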

RANK CORRELATION

Rank correlation: In some situations it is difficult to measure the values of the variables of a
bivariate distribution numerically, but they can be ranked. The correlation coefficient between
the two sets of ranks is called the rank correlation coefficient, introduced by Spearman (1904).
It is denoted by R. It is useful for finding the relationship between two qualitative variables
such as beauty, honesty, intelligence, efficiency and so on.
Interpretation of Rank Correlation Coefficient (R)
The value of the rank correlation coefficient R ranges from -1 to +1.
If R = +1, there is complete agreement in the order of the ranks and the ranks are in the
same direction.
If R = -1, there is complete disagreement in the order of the ranks and the ranks are in
opposite directions.
If R = 0, there is no correlation.

R = 1 - 6Σd^2 / (N^3 - N)

where d is the difference between the two ranks of each pair and N is the number of pairs.

Application Problem-1: Obtain the rank correlation coefficient for the following data:
A 80 75 90 70 65 60
B 65 70 60 75 85 80
Sol:

A    B    R1   R2   D=R1-R2   D^2
80   65   2    5    -3        9
75   70   3    4    -1        1
90   60   1    6    -5        25
70   75   4    3    1         1
65   85   5    1    4         16
60   80   6    2    4         16
                    Σd=0      Σd^2=68

R = 1 - 6Σd^2 / (N^3 - N)

R = 1 - (6*68) / (6^3 - 6) = 1 - 408/210 ≈ -0.94

There is a strong negative relation between A and B.
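A minimal Python sketch of this calculation follows. It assumes there are no repeated values and ranks the largest value as 1, matching the R1 and R2 columns of the table; the helper names are hypothetical.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(a, b):
    """R = 1 - 6*sum(d^2) / (N^3 - N) for untied data."""
    n = len(a)
    d_squared = sum(
        (ra - rb) ** 2
        for ra, rb in zip(ranks_descending(a), ranks_descending(b))
    )
    return 1 - 6 * d_squared / (n ** 3 - n)


a = [80, 75, 90, 70, 65, 60]
b = [65, 70, 60, 75, 85, 80]
R = spearman_no_ties(a, b)
print(round(R, 2))   # -0.94
```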

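Example with tied ranks: when a value is repeated, each tied value receives the average of
the ranks it occupies, and (m^3 - m)/12 is added to Σd^2 for each group of m tied values.

x    y    R1    R2   D=R1-R2   D^2
20   30   3.5   4    -0.5      0.25
80   60   8     8    0         0
40   20   6     2    4         16
12   30   1     4    -3        9
28   50   5     7    -2        4
20   30   3.5   4    -0.5      0.25
15   40   2     6    -4        16
60   10   7     1    6         36
                               Σd^2=81.5

Here x has one group of 2 tied values (20, 20) and y has one group of 3 tied values (30, 30, 30).

R = 1 - 6{81.5 + (1/12)(2^3 - 2) + (1/12)(3^3 - 3)} / (8^3 - 8)

  = 1 - 6(81.5 + 0.5 + 2) / 504 = 1 - 504/504

R = 0

No correlation between x and y.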
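The tied-rank version can be sketched as follows. It assumes ranks are assigned in ascending order, as in the R1 and R2 columns of this table, that tied values share the average of the positions they occupy, and that (m^3 - m)/12 is added for each group of m tied values. All function names are illustrative.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    n = len(x)
    d_squared = sum(
        (rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y))
    )
    adjusted = d_squared + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n ** 3 - n)


x = [20, 80, 40, 12, 28, 20, 15, 60]
y = [30, 60, 20, 30, 50, 30, 40, 10]
R = spearman_with_ties(x, y)
print(R)
```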
REGRESSION ANALYSIS
What is regression?
Ans: The probable movement of one variable in terms of the other variables is called
regression.
In other words, the statistical technique by which we can estimate the unknown value of
one variable (dependent) from the known value of another variable (independent) is called
regression.

Regression analysis.
Ans: Regression analysis is a mathematical measure of the average relationship between
two or more variables in terms of the original units of data.

Example-1: Find the regression lines of x on y and of y on x for the following data.

x        y        x^2         y^2        xy
68       2        4624        4          136
65       5        4225        25         325
70       1        4900        1          70
62       10       3844        100        620
60       9        3600        81         540
55       13       3025        169        715
58       10       3364        100        580
65       3        4225        9          195
69       4        4761        16         276
63       6        3969        36         378
Σx=635   Σy=63    Σx^2=40537  Σy^2=541   Σxy=3835
Line x on y:

b(xy) = (nΣxy - ΣxΣy) / (nΣy^2 - (Σy)^2)

b(xy) = ((10*3835) - (635*63)) / ((10*541) - (63)^2) = -1655/1441

b(xy) ≈ -1.15
x = a + b(xy)*y
Σx/n = a + b(xy)*(Σy/n), so a = 63.5 - (-1655/1441)*6.3
a ≈ 70.74
x = 70.74 - 1.15y [regression line]

Line y on x:

b(yx) = (nΣxy - ΣxΣy) / (nΣx^2 - (Σx)^2)

b(yx) = ((10*3835) - (635*63)) / ((10*40537) - (635)^2) = -1655/2145

b(yx) ≈ -0.77
y = a + b(yx)*x
Σy/n = a + b(yx)*(Σx/n), so a = 6.3 - (-1655/2145)*63.5
a ≈ 55.29
y = 55.29 - 0.77x [regression line]
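Both regression lines can be computed with one helper, since each is a least-squares fit with the roles of the two variables swapped. The sketch below keeps the slope unrounded when computing the intercept; rounding the slope first can change the last digits of the intercept. The name `regression_line` is illustrative.

```python
x = [68, 65, 70, 62, 60, 55, 58, 65, 69, 63]
y = [2, 5, 1, 10, 9, 13, 10, 3, 4, 6]


def regression_line(dependent, independent):
    """Intercept a and slope b for dependent = a + b * independent."""
    n = len(dependent)
    sum_d, sum_i = sum(dependent), sum(independent)
    sum_di = sum(d * i for d, i in zip(dependent, independent))
    sum_i2 = sum(i * i for i in independent)
    b = (n * sum_di - sum_d * sum_i) / (n * sum_i2 - sum_i ** 2)
    a = sum_d / n - b * sum_i / n    # the line passes through both means
    return a, b


a_xy, b_xy = regression_line(x, y)   # line of x on y
a_yx, b_yx = regression_line(y, x)   # line of y on x
print(round(a_xy, 2), round(b_xy, 2))
print(round(a_yx, 2), round(b_yx, 2))
```

As a consistency check, the product of the two slopes equals the square of the correlation coefficient for the same data.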
