MFCS

The document discusses different types of populations including finite, infinite, existent, and hypothetical populations. It also discusses probability and non-probability sampling methods. Key points include random sampling ensuring each population member has an equal chance of selection, and statistics estimated from samples having sampling distributions that can be used to infer about the population.

Uploaded by

Mahesh Rao

Unit-II

Population and Sample


Population
A population includes all the elements of a data set; measurable characteristics of the population, such as the mean and standard deviation, are known as parameters. For example, all the people living in India constitute the population of India.
There are different types of population:

• Finite Population
• Infinite Population
• Existent Population
• Hypothetical Population

Let us discuss each type in turn.

Finite Population
A finite population, also known as a countable population, is one whose members can be counted. In other words, it is the population of all individuals or objects that are finite in number. For statistical analysis, a finite population is more convenient to work with than an infinite one. Examples of finite populations are the employees of a company and the potential consumers in a market.

Infinite Population
An infinite population, also known as an uncountable population, is one whose units cannot be counted. An example of an infinite population is the number of germs in a patient's body.

Existent Population
An existent population is a population of concrete individuals; in other words, a population whose units are available in physical form. Examples are books, students etc.

Hypothetical Population
A population whose units are not available in physical form is known as a hypothetical population. A population consists of sets of observations, objects etc. that all have something in common, and in some situations the population is only hypothetical. Examples are the outcomes of rolling a die or tossing a coin.

Sample
A sample includes one or more observations drawn from the population, and a measurable characteristic of a sample is called a statistic. Sampling is the process of selecting a sample from the population. For example, a group of people living in India is a sample of the population of India.
Basically, there are two types of sampling:

• Probability sampling
• Non-probability sampling
Probability Sampling
In probability sampling, population units are not selected at the discretion of the researcher. Instead, selection follows procedures that ensure every unit of the population has a fixed, known probability of being included in the sample. Such a method is also called random sampling. Some of the techniques used for probability sampling are:

• Simple random sampling
• Cluster sampling
• Stratified sampling
• Disproportionate sampling
• Proportionate sampling
• Optimum allocation stratified sampling
• Multi-stage sampling
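As a small illustration of simple random sampling, Python's standard library can draw a sample in which every unit has an equal chance of selection. The population of 1,000 employee IDs below is a made-up example:

```python
import random

# Hypothetical finite population: 1,000 employee IDs
population = list(range(1, 1001))

# Draw a simple random sample of 50 units without replacement;
# random.sample gives every subset of size 50 an equal chance of selection
random.seed(42)  # fixed seed so this sketch is reproducible
sample = random.sample(population, k=50)

print(len(sample))       # 50 units
print(len(set(sample)))  # all distinct, since sampling is without replacement
```

Because `random.sample` draws without replacement, no unit can appear twice in the sample.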

Non-Probability Sampling

In non-probability sampling, population units can be selected at the discretion of the researcher. Such samples rely on human judgement for selecting units and have no theoretical basis for estimating the characteristics of the population. Some of the techniques used for non-probability sampling are:

• Quota sampling
• Judgement sampling
• Purposive sampling

Population and Sample Examples

• All the people who have ID proofs form the population, and the group of people who have only a voter ID is the sample.
• All the students in a class are the population, whereas the top 10 students in the class are a sample.
• All the members of parliament form the population, and the female members among them are a sample.

Population and Sample Formulas

We demonstrate here the formulas for mean absolute deviation (MAD), variance and standard deviation based on a population and on a sample. If N denotes the population size and n denotes the sample size (the sample variance uses the divisor n − 1), then the formulas are:

Population: MAD = ∑|xi − μ|/N, variance σ² = ∑(xi − μ)²/N, standard deviation σ = √(∑(xi − μ)²/N)
Sample: MAD = ∑|xi − x̄|/n, variance s² = ∑(xi − x̄)²/(n − 1), standard deviation s = √(∑(xi − x̄)²/(n − 1))
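These population-versus-sample divisors can be checked with a short Python sketch on a small made-up data set; the standard-library statistics module implements the same distinction (divisor N for the population, n − 1 for the sample):

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # made-up observations

n = len(data)
mean = sum(data) / n

# Mean absolute deviation (divisor n)
mad = sum(abs(x - mean) for x in data) / n

# Population variance divides by N; sample variance divides by n - 1
pop_var = sum((x - mean) ** 2 for x in data) / n
samp_var = sum((x - mean) ** 2 for x in data) / (n - 1)

# The statistics module agrees with the hand-rolled formulas
assert abs(pop_var - statistics.pvariance(data)) < 1e-12
assert abs(samp_var - statistics.variance(data)) < 1e-12
print(mean, mad, pop_var, samp_var)
```

Note that the sample variance (3.5 here) is always a little larger than the population variance of the same numbers, because of the n − 1 divisor.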
Statistical Inference Definition
Statistical inference is the process of analysing results and drawing conclusions from data subject to random variation. It is also called inferential statistics. Hypothesis testing and confidence intervals are applications of statistical inference. Statistical inference is a method of making decisions about the parameters of a population based on random sampling. It helps assess the relationship between dependent and independent variables. The purpose of statistical inference is to estimate the uncertainty, or sample-to-sample variation; it allows us to provide a probable range of values for the true value of a quantity in the population. The components used for making a statistical inference are:

• Sample size
• Variability in the sample
• Size of the observed differences

Types of Statistical Inference

There are different types of statistical inferences that are extensively used for drawing conclusions:

• One-sample hypothesis testing
• Confidence intervals
• Pearson correlation
• Bivariate regression
• Multivariate regression
• Chi-square statistics and contingency tables
• ANOVA or t-test

Statistical Inference Procedure

The procedure involved in inferential statistics is:

• Begin with a theory
• Create a research hypothesis
• Operationalize the variables
• Recognize the population to which the study results should apply
• Formulate a null hypothesis for this population
• Accumulate a sample from the population and carry out the study
• Conduct statistical tests to see whether the properties of the collected sample differ sufficiently from what would be expected under the null hypothesis to justify rejecting the null hypothesis

Statistical Inference Solution

Statistical inference solutions make efficient use of statistical data relating to groups of individuals or trials. They deal with all aspects of the data, including its collection, investigation, analysis and organization. Through statistical inference, people working in diverse fields can turn data into knowledge. Some facts about statistical inference solutions:

• It is common to assume that the observed sample consists of independent observations from a population of a known type, such as a Poisson or normal population
• A statistical inference solution is used to estimate the parameter(s) of the assumed model, such as a normal mean or a binomial proportion

Sampling Distributions
A statistic, such as the sample mean or the sample standard deviation, is a number computed from a
sample. Since a sample is random, every statistic is a random variable: it varies from sample to
sample in a way that cannot be predicted with certainty. As a random variable it has a mean, a
standard deviation, and a probability distribution. The probability distribution of a statistic is called its
sampling distribution. Typically sample statistics are not ends in themselves, but are computed in
order to estimate the corresponding population parameters. This chapter introduces the concepts of
the mean, the standard deviation, and the sampling distribution of a sample statistic, with an
emphasis on the sample mean.

• 6.1: The Mean and Standard Deviation of the Sample Mean
The sample mean is a random variable, and as a random variable it has a probability distribution, a mean, and a standard deviation. There are formulas that relate the mean and standard deviation of the sample mean to the mean and standard deviation of the population from which the sample is drawn.
• 6.2: The Sampling Distribution of the Sample Mean
The sampling distribution of the mean takes on a bell shape even when the population distribution is not bell-shaped, and this happens in general. The importance of the Central Limit Theorem is that it allows us to make probability statements about the sample mean, specifically in relation to its value in comparison to the population mean, as we will see in the examples.
• 6.3: The Sample Proportion
Often sampling is done in order to estimate the proportion of a population that has a specific characteristic.
• 6.E: Sampling Distributions (Exercises)
These are homework exercises to accompany the Textmap created for "Introductory Statistics" by Shafer and Zhang.
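The claims in 6.1 and 6.2 can be illustrated with a small simulation: draw many samples from a decidedly non-bell-shaped (exponential) population and check that the sample means centre on the population mean with standard deviation close to σ/√n. The sample size and trial count below are arbitrary choices:

```python
import random
import statistics

random.seed(0)
n = 30          # size of each sample
trials = 20000  # number of repeated samples

# Exponential population with mean 1.0 (its standard deviation is also 1.0)
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(round(mean_of_means, 2))  # close to the population mean 1.0
print(round(sd_of_means, 3))    # close to sigma/sqrt(n) = 1/sqrt(30) ≈ 0.183
```

A histogram of `sample_means` would look roughly bell-shaped even though the exponential population is strongly skewed, which is exactly the Central Limit Theorem at work.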

Populations, Samples, Parameters, and Statistics


The field of inferential statistics enables you to make educated guesses about the numerical characteristics of large groups. The logic of sampling gives you a way to test conclusions about such groups using only a small portion of their members.

A population is a group of phenomena that have something in common. The term often refers to a group of people, as in the following examples:

• All registered voters in Crawford County
• All members of the International Machinists Union
• All Americans who played golf at least once in the past year

But populations can refer to things as well as people:

• All widgets produced last Tuesday by the Acme Widget Company
• All daily maximum temperatures in July for major U.S. cities
• All basal ganglia cells from a particular rhesus monkey

A sample is a smaller group of members of a population selected to represent the population. In order to use statistics to learn things about the population, the sample must be random. A random sample is one in which every member of a population has an equal chance of being selected. The most commonly used sample is a simple random sample. It requires that every possible sample of the selected size has an equal chance of being used.

A parameter is a characteristic of a population. A statistic is a characteristic of a sample. Inferential statistics enables you to make an educated guess about a population parameter based on a statistic computed from a sample randomly drawn from that population (see Figure 1).

Figure 1. Illustration of the relationship between samples and populations.

For example, say you want to know the mean income of the subscribers to a particular magazine—a
parameter of a population. You draw a random sample of 100 subscribers and determine that their
mean income is $27,500 (a statistic). You conclude that the population mean income μ is likely to be
close to $27,500 as well. This example is one of statistical inference.
Different symbols are used to denote statistics and parameters, as Table 1 shows.

Frequency distribution
The frequency of a value is the number of times it occurs in a dataset. A frequency distribution is the
pattern of frequencies of a variable. It’s the number of times each possible value of a variable occurs
in a dataset.

Types of frequency distributions

There are four types of frequency distributions:

• Ungrouped frequency distributions: The number of observations of each value of a variable.
  o You can use this type of frequency distribution for categorical variables.
• Grouped frequency distributions: The number of observations of each class interval of a variable. Class intervals are ordered groupings of a variable's values.
  o You can use this type of frequency distribution for quantitative variables.
• Relative frequency distributions: The proportion of observations of each value or class interval of a variable.
  o You can use this type of frequency distribution for any type of variable when you're more interested in comparing frequencies than the actual number of observations.
• Cumulative frequency distributions: The sum of the frequencies less than or equal to each value or class interval of a variable.
  o You can use this type of frequency distribution for ordinal or quantitative variables when you want to understand how often observations fall below certain values.

How to make a frequency table

Frequency distributions are often displayed using frequency tables. A frequency table is an effective way to summarize or organize a dataset. It's usually composed of two columns:

• The values or class intervals
• Their frequencies

The method for making a frequency table differs between the four types of frequency distributions. You can follow the guides below or use software such as Excel, SPSS, or R to make a frequency table.

How to make an ungrouped frequency table

1. Create a table with two columns and as many rows as there are values of the
variable. Label the first column using the variable name and label the second column
“Frequency.” Enter the values in the first column.
o For ordinal variables, the values should be ordered from smallest to largest in the
table rows.
o For nominal variables, the values can be in any order in the table. You may wish to
order them alphabetically or in some other logical order.
2. Count the frequencies. The frequencies are the number of times each value occurs. Enter the
frequencies in the second column of the table beside their corresponding values.
o Especially if your dataset is large, it may help to count the frequencies by tallying. Add
a third column called “Tally.” As you read the observations, make a tick mark in the
appropriate row of the tally column for each observation. Count the tally marks to
determine the frequency.
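The two steps above can be sketched in Python with collections.Counter, which tallies the frequencies in one pass; the species list is a made-up stand-in for the gardener's observations:

```python
from collections import Counter

# Hypothetical observations of bird species visiting a feeder
observations = [
    "finch", "chickadee", "finch", "sparrow", "finch",
    "chickadee", "sparrow", "finch", "cardinal", "sparrow",
]

# Steps 1-2: one row per value, with the number of times it occurs
frequencies = Counter(observations)

print(f"{'Species':<10} Frequency")
for species, count in frequencies.most_common():
    print(f"{species:<10} {count}")
```

`Counter` plays the role of the "Tally" column: it increments a count for each observation as it is read.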

Example: Making an ungrouped frequency table
A gardener set up a bird feeder in their backyard. To help them decide how much and what type of birdseed to buy, they decide to record the bird species that visit their feeder. Over the course of one morning, the following birds visit their feeder:

How to make a grouped frequency table

1. Divide the variable into class intervals. Below is one method to divide a variable into class
intervals. Different methods will give different answers, but there’s no agreement on the best
method to calculate class intervals.
o Calculate the range. Subtract the lowest value in the dataset from the highest.
o Decide the class interval width. There are no firm rules on how to choose the width, but the following rule of thumb is common:

    class interval width ≈ range / √(sample size)

You can round this value to a whole number or a number that's convenient to add (such as a multiple of 10).

o Calculate the class intervals. Each interval is defined by a lower limit and an upper limit. Observations in a class interval are greater than or equal to the lower limit and less than the upper limit.

The lower limit of the first interval is the lowest value in the dataset. Add the class interval width to find the upper limit of the first interval and the lower limit of the second interval. Keep adding the interval width to calculate more class intervals until you exceed the highest value.

2. Create a table with two columns and as many rows as there are class intervals. Label the first
column using the variable name and label the second column “Frequency.” Enter the class
intervals in the first column.
3. Count the frequencies. The frequencies are the number of observations in each class interval.
You can count by tallying if you find it helpful. Enter the frequencies in the second column of
the table beside their corresponding class intervals.

Example: Grouped frequency distribution
A sociologist conducted a survey of 20 adults. She wants to report the frequency distribution of the ages of the survey respondents. The respondents were the following ages in years:
52, 34, 32, 29, 63, 40, 46, 54, 36, 36, 24, 19, 45, 20, 28, 29, 38, 33, 49, 37

The range is 63 − 19 = 44, and 44/√20 ≈ 9.8, so round the class interval width to 10.

The class intervals are 19 ≤ a < 29, 29 ≤ a < 39, 39 ≤ a < 49, 49 ≤ a < 59, and 59 ≤ a < 69.
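Using the ages from this example, a grouped frequency table can be built in Python; the bounds follow the 19 ≤ a < 29 convention described in the steps:

```python
ages = [52, 34, 32, 29, 63, 40, 46, 54, 36, 36,
        24, 19, 45, 20, 28, 29, 38, 33, 49, 37]

width = 10
lower = min(ages)  # 19: the lowest value starts the first interval

# Count observations with lower limit <= age < upper limit
intervals = []
while lower <= max(ages):
    upper = lower + width
    count = sum(lower <= a < upper for a in ages)
    intervals.append((lower, upper, count))
    lower = upper

for lo, hi, count in intervals:
    print(f"{lo} <= a < {hi}: {count}")
```

The loop keeps adding the interval width until the highest value (63) is covered, exactly as the guide describes.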

How to make a relative frequency table

1. Create an ungrouped or grouped frequency table.
2. Add a third column to the table for the relative frequencies. To calculate the relative frequencies, divide each frequency by the sample size. The sample size is the sum of the frequencies.

Example: Relative frequency distribution

From this table, the gardener can make observations, such as that 19% of the bird feeder visits were
from chickadees and 25% were from finches.

How to make a cumulative frequency table

1. Create an ungrouped or grouped frequency table for an ordinal or quantitative variable. Cumulative frequencies don't make sense for nominal variables because the values have no order: one value isn't more than or less than another value.
2. Add a third column to the table for the cumulative frequencies. The cumulative frequency is the number of observations less than or equal to a certain value or class interval. To calculate the cumulative frequencies, add each frequency to the frequencies in the previous rows.
3. Optional: If you want to calculate the cumulative relative frequency, add another column and divide each cumulative frequency by the sample size.
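These steps amount to keeping a running total over the grouped counts. The counts below are the grouped frequencies for the age intervals from the earlier example:

```python
# Grouped frequencies for the age intervals 19-29, 29-39, 39-49, 49-59, 59-69
counts = [4, 9, 3, 3, 1]
n = sum(counts)  # sample size: 20

cumulative = []
running = 0
for c in counts:
    running += c          # add each frequency to the previous rows' total
    cumulative.append(running)

# Optional step 3: cumulative relative frequency
cumulative_relative = [round(c / n, 2) for c in cumulative]

print(cumulative)           # → [4, 13, 16, 19, 20]
print(cumulative_relative)  # → [0.2, 0.65, 0.8, 0.95, 1.0]
```

The last cumulative value always equals the sample size, and the last cumulative relative frequency always equals 1.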

Example:Cumulative frequency distribution

From this table, the sociologist can make observations such as 13 respondents (65%) were under 39
years old, and 16 respondents (80%) were under 49 years old.
How to graph a frequency distribution
Pie charts, bar charts, and histograms are all ways of graphing frequency distributions. The best
choice depends on the type of variable and what you’re trying to communicate.

Pie chart
A pie chart is a graph that shows the relative frequency distribution of a nominal variable.

A pie chart is a circle that’s divided into one slice for each value. The size of the slices shows their
relative frequency.

This type of graph can be a good choice when you want to emphasize that one value is especially frequent or infrequent, or you want to present the overall composition of a variable.

A disadvantage of pie charts is that it’s difficult to see small differences between frequencies. As a
result, it’s also not a good option if you want to compare the frequencies of different values.

Bar chart

A bar chart is a graph that shows the frequency or relative frequency distribution of a categorical
variable (nominal or ordinal).

The y-axis of the bars shows the frequencies or relative frequencies, and the x-axis shows the values.
Each value is represented by a bar, and the length or height of the bar shows the frequency of the
value.

A bar chart is a good choice when you want to compare the frequencies of different values. It’s much
easier to compare the heights of bars than the angles of pie chart slices.

Histogram

A histogram is a graph that shows the frequency or relative frequency distribution of a quantitative
variable. It looks similar to a bar chart.

The continuous variable is grouped into interval classes, just like a grouped frequency table. The y-
axis of the bars shows the frequencies or relative frequencies, and the x-axis shows the interval
classes. Each interval class is represented by a bar, and the height of the bar shows the frequency or
relative frequency of the interval class.

Although bar charts and histograms are similar, there are important differences:

                   Bar chart                     Histogram
Type of variable   Categorical                   Quantitative
Value grouping     Ungrouped (values)            Grouped (interval classes)
Bar spacing        Can be a space between bars   Never a space between bars
Bar order          Can be in any order           Can only be ordered from lowest to highest
A histogram is an effective visual summary of several important characteristics of a variable. At a
glance, you can see a variable’s central tendency and variability, as well as what probability
distribution it appears to follow, such as a normal, Poisson, or uniform distribution.

Relative Frequencies and Their Distributions


A relative frequency indicates how often a specific kind of event occurs within the total number of
observations. It is a type of frequency that uses percentages, proportions, and fractions.

In this post, learn about relative frequencies, the relative frequency distribution, and its cumulative
counterpart.

Frequencies vs. Relative Frequencies

A frequency is a count of a particular event. For example, Jim read ten statistics books this year.
The football team won 12 games. For more information, read my post about frequency tables.

In contrast, relative frequencies do not use raw counts. Instead, they relate the count for a particular
type of event to the total number of events using percentages, proportions, or fractions. That’s
where the term “relative” comes in—a specific tally relative to the total number. For instance, 25% of
the books Jim read were about statistics. The football team won 85% of its games.

If you see a count, it’s a frequency. If you see a percentage, proportion, ratio, or fraction, it’s a relative
frequency.

Relative frequencies help you place a type of event into a larger context. For example, a survey indicates that 30 students like their statistics course the most. From this raw count, you don't know if that's a large or small proportion. However, if you knew that it was 30 out of 40 respondents (75%) who indicated that statistics was their favorite, you'd consider it a high number!

Additionally, they allow you to compare values between studies. Imagine that different sized schools
surveyed their students and obtained different numbers of respondents. If 30 students indicate that
statistics is their favorite, that could be a high percentage in one school but a low percentage in
another, depending on the total number of responses.

Relative frequencies facilitate apples-to-apples comparisons.

Learn more about percentages in my posts How to Calculate a Percentage, Percent Change, and Percent Error.

How to Find a Relative Frequency

To calculate relative frequencies, you must know both of the following:

o The count of events for a category.
o The total number of events.

Relative frequency calculations convert counts into percentages by taking the count of a specific type of event and dividing it by the total number of observations. Its formula is the following:

    Relative frequency = (count of events of a given type) / (total number of events)

For example, imagine a school surveys 50 students and asks them to name their favorite course. Thirty-six students state that statistics is their favorite.

o The frequency of "statistics" responses is 36.
o The total number of responses is 50.

To find the relative frequency for the statistics course, perform the following division: 36 / 50 = 0.72 = 72%.

Relative Frequencies as Empirical Probabilities

Relative frequencies also serve as empirical probabilities. Probabilities define the likelihood of events
occurring. Probability calculations often rely heavily on theory. However, when you observe the
relative frequency of an event, it’s an empirical probability. In other words, analysts calculate them
using real-world observations rather than theory.

An empirical probability is the number of events out of the total number of observations.

Related post: Probability Fundamentals

Relative Frequency Distributions: Tables and Graphs

A relative frequency distribution describes the relative frequencies for all possible outcomes in a
study. While a single value is for one type of event, the distribution displays percentages for all
possible results. Analysts typically present these distributions using tables and bar charts.

When you’re assessing two categorical variables together, you can use relative frequencies in a
contingency table. Learn more about Contingency Tables: Definition, Examples & Interpreting.

Let’s bring them to life by working through an example!

Table example

The relative frequency distribution table below displays the percentage of students in each grade at a small school with 88 students.

School Grade   Count of Students   Relative Frequency
1              23                  26.1%
2              20                  22.7%
3              15                  17.0%
4              12                  13.6%
5              10                  11.4%
6              8                   9.1%
Total          88                  100%

If the table had only the first two columns, grade level and count of students, it would be a frequency
distribution. A frequency distribution describes the counts for all possible outcomes. It’s the
percentage column that makes it a relative frequency distribution. You can see how the two types of
distributions are related.

To create a relative frequency distribution table, take the count of students in a row (one grade level) and divide it by the total number of students. For example, in the first row, there are 23 students in the first grade: 23 out of 88 = 26.1%. For second graders, it's 20 out of 88 = 22.7%. Repeat this process for all rows in the table.

Because these tables consider all possible outcomes, the total percentage must sum to 100%, excepting rounding error.

They are handy because you instantly know the percentage of the total for each outcome, and you can identify trends and patterns. For example, first graders account for just over a quarter (26.1%) of the entire school by themselves. Conversely, 6th graders make up only 9.1% of the school. There's a downward trend in values as grade levels increase.

Cumulative Relative Frequency Distributions

A cumulative relative frequency distribution sums the progression of relative frequencies through all
the possible outcomes. Creating this type of distribution entails adding one more column to the table
and summing the values as you move down the rows to create a running, cumulative total.

For this example, we’ll return to school students. The cumulative relative frequency table below adds
the final column.

School Grade   Count of Students   Relative Frequency   Cumulative Relative Frequency
1              23                  26.1%                26.1%
2              20                  22.7%                48.8%
3              15                  17.0%                65.8%
4              12                  13.6%                79.4%
5              10                  11.4%                90.8%
6              8                   9.1%                 100%
Total          88                  100%

To find the cumulative value for each row, sum the relative frequencies as you work your way down
the rows. The first value in the cumulative row equals that row’s relative frequency. For the 2nd row,
add that row’s value to the previous row. In the table, we add 26.1 + 22.7 = 48.8%. In the third row, add
17% to the previous cumulative value, 17 + 48.8 = 65.8%. And so on through all the rows.
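Working through the same school data in Python gives the running totals. Note that this sketch divides the cumulative counts by the total, while the table sums the already-rounded percentages, so one or two intermediate rows differ from the table by 0.1:

```python
counts = [23, 20, 15, 12, 10, 8]  # students in grades 1-6
total = sum(counts)               # 88 students

relative = [round(100 * c / total, 1) for c in counts]

# Cumulative relative frequency: cumulative count divided by the total
running = 0
cumulative = []
for c in counts:
    running += c
    cumulative.append(round(100 * running / total, 1))

print(relative)    # → [26.1, 22.7, 17.0, 13.6, 11.4, 9.1]
print(cumulative)  # → [26.1, 48.9, 65.9, 79.5, 90.9, 100.0]
```

Either way, the final cumulative value comes out to 100%, as it must.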

The final cumulative value must equal 1 or 100%, excepting rounding error.

You can also display cumulative relative frequency distributions on graphs. In the chart below, I added
the orange cumulative line. Use these cumulative distributions to determine where most of the
events/observations occur. In the example data, the first and second graders comprise about half the
school.
Benford’s law is a fascinating relative frequency distribution that describes how often numbers in
datasets start with each digit from 1 to 9.

Mean, Median and Mode of Grouped Data


Suppose we want to compare the ages of students in two schools and determine which school has older students. If we compare on the basis of individual students, we cannot conclude anything. However, if for the given data we get a representative value that signifies the characteristics of the data, the comparison becomes easy.
A certain value representative of the whole data and signifying its characteristics is called an average of the data. Three types of averages are useful for analyzing data:

• Mean
• Median
• Mode

In this article, we will study these three types of averages for the analysis of the data.
Mean
The mean (or average) of observations is the sum of the values of all the observations divided by the total number of observations. For grouped data, the mean is given by x̄ = (f1x1 + f2x2 + … + fnxn)/(f1 + f2 + … + fn). The mean formula is given by,
Mean = ∑(fi.xi)/∑fi
Methods for Calculating Mean
Method 1: Direct Method for Calculating Mean
Step 1: For each class, find the class mark xi, as
xi = 1/2 (lower limit + upper limit)
Step 2: Calculate fi.xi for each i.
Step 3: Use the formula Mean = ∑(fi.xi)/∑fi.
Example: Find the mean of the following data.

Class Interval   0-10   10-20   20-30   30-40   40-50
Frequency        12     16      6       7       9

Solution:
We may prepare the table given below:

Class Interval   Frequency fi   Class Mark xi   fi.xi
0-10             12             5               60
10-20            16             15              240
20-30            6              25              150
30-40            7              35              245
40-50            9              45              405
                 ∑fi = 50                       ∑fi.xi = 1100

Mean = ∑(fi.xi)/∑fi = 1100/50 = 22
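The direct method can be verified with a short Python sketch using the class intervals and frequencies from this example:

```python
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [12, 16, 6, 7, 9]

# Step 1: class marks are the midpoints of each interval
marks = [(lo + hi) / 2 for lo, hi in intervals]

# Steps 2-3: weighted sum of class marks divided by total frequency
mean = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)
print(mean)  # → 22.0
```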
Method 2: Assumed-Mean Method for Calculating Mean
For calculating the mean in such cases we proceed as under.
Step 1: For each class interval, calculate the class mark xi by using the formula: xi = 1/2 (lower limit + upper limit).
Step 2: Choose a suitable value of xi in the middle of the xi column as the assumed mean and denote it by A.
Step 3: Calculate the deviations di = (xi − A) for each i.
Step 4: Calculate the product (fi × di) for each i.
Step 5: Find n = ∑fi.
Step 6: Calculate the mean by using the formula: x̄ = A + ∑fidi/n.
Example: Using the assumed-mean method, find the mean of the following data:

Class Interval   0-10   10-20   20-30   30-40   40-50
Frequency        7      8       12      13      10

Solution:
Let A = 25 be the assumed mean. Then we have,

Class Interval   Frequency fi   Mid value xi   Deviation di = (xi − 25)   fi × di
0-10             7              5              −20                        −140
10-20            8              15             −10                        −80
20-30            12             25 = A         0                          0
30-40            13             35             10                         130
40-50            10             45             20                         200
                 ∑fi = 50                                                 ∑(fi × di) = 110

Mean = x̄ = A + ∑fidi/n = 25 + 110/50 = 27.2
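A quick Python check of the assumed-mean arithmetic above:

```python
marks = [5, 15, 25, 35, 45]
freqs = [7, 8, 12, 13, 10]
A = 25  # assumed mean, chosen near the middle of the class marks

deviations = [x - A for x in marks]      # -20, -10, 0, 10, 20
n = sum(freqs)                           # 50
mean = A + sum(f * d for f, d in zip(freqs, deviations)) / n
print(mean)  # → 27.2
```

Working with the small deviations instead of the raw class marks gives the same answer as the direct method but with easier arithmetic.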


Method 3: Step-Deviation Method for Calculating Mean
When the values of xi and fi are large, the calculation of the mean by the above methods becomes tedious. In such cases, we use the step-deviation method, given below.
Step 1: For each class interval, calculate the class mark xi, where xi = 1/2 (lower limit + upper limit).
Step 2: Choose a suitable value of xi in the middle of the xi column as the assumed mean and denote it by A.
Step 3: Calculate h = [(upper limit) − (lower limit)], which is the same for all the classes.
Step 4: Calculate ui = (xi − A)/h for each class.
Step 5: Calculate fi × ui for each class and hence find ∑(fi × ui).
Step 6: Calculate the mean by using the formula: x̄ = A + {h × ∑(fi × ui)/∑fi}.
Example: Find the mean of the following frequency distribution:

Class       50-70   70-90   90-110   110-130   130-150   150-170
Frequency   18      12      13       27        8         22

Solution:
We prepare the table given below,

Class      Frequency fi   Mid Value xi   ui = (xi − 100)/20   fi × ui
50-70      18             60             −2                   −36
70-90      12             80             −1                   −12
90-110     13             100 = A        0                    0
110-130    27             120            1                    27
130-150    8              140            2                    16
150-170    22             160            3                    66
           ∑fi = 100                                          ∑(fi × ui) = 61

A = 100, h = 20, ∑fi = 100 and ∑(fi × ui) = 61
x̄ = A + {h × ∑(fi × ui)/∑fi}
= 100 + {20 × 61/100} = 100 + 12.2 = 112.2
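The step-deviation calculation can be checked in Python:

```python
marks = [60, 80, 100, 120, 140, 160]
freqs = [18, 12, 13, 27, 8, 22]
A, h = 100, 20  # assumed mean and common class width

u = [(x - A) // h for x in marks]  # step deviations: -2, -1, 0, 1, 2, 3
mean = A + h * sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)
print(mean)  # → 112.2
```

Dividing the deviations by the class width h shrinks them to small integers, which is the whole point of the step-deviation method.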
Median
We first arrange the given data values of the observations in ascending order. Then, if n is odd, the median is the ((n + 1)/2)th observation; and if n is even, the median is the average of the (n/2)th and the (n/2 + 1)th observations. For grouped data, the formula for calculating the median is:
Median, Me = l + {h × (N/2 − cf)/f}
Where,
• l = lower limit of the median class.
• h = width of the median class.
• f = frequency of the median class.
• cf = cumulative frequency of the class preceding the median class.
• N = ∑fi
Example: Calculate the median for the following frequency distribution.

0- 8- 16- 24- 32- 40-


Class
Interval
8 16 24 32 40 48

Frequency
8 10 16 24 15 7
Solution:
We may prepare cumulative frequency table as given below,

Class    Frequency   Cumulative frequency
0-8              8                      8
8-16            10                     18
16-24           16                     34
24-32           24                     58
32-40           15                     73
40-48            7                     80

N = ∑fi = 80
Now, N = 80, so N/2 = 40.
The cumulative frequency just greater than 40 is 58 and the corresponding class is 24-32.
Thus, the median class is 24-32.
l = 24, h = 8, f = 24, cf = c.f. of preceding class = 34, and (N/2) = 40.
Median, Me = l+ h{(N/2-cf)/f}
= 24 + 8 {(40 – 34)/ 24}
= 26
Hence, median = 26.
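Plugging the same numbers into the grouped-median formula in code (a sketch; l, h, f, cf, and N come from the table above):

```python
# Grouped median: Me = l + h * (N/2 - cf) / f
l, h, f, cf, N = 24, 8, 24, 34, 80     # median class is 24-32
median = l + h * (N / 2 - cf) / f
print(median)  # 26.0
```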
Mode
It is that value of the variable that occurs most often. More precisely, the mode is the value of the
variable at which the concentration of the data is maximum.
Modal Class: In a frequency distribution, the class having the maximum frequency is called the modal
class. The formula for Calculating Mode:
Mo = xk + h{(fk – fk-1)/(2fk – fk-1 – fk+1)}
Where,
 xk = lower limit of the modal class interval.
 fk = frequency of the modal class.
 fk-1= frequency of the class preceding the modal class.
 fk+1 = frequency of the class succeeding the modal class.
 h = width of the class interval.
Example 1: Calculate the mode for the following frequency distribution.

Class:        0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency:       5       8       7      12      28      20      10      10
Solution:
Class 40-50 has the maximum frequency, so it is called the modal class.
xk = 40, h = 10, fk = 28, fk-1 = 12, fk+1 = 20
Mode, Mo= xk + h{(fk – fk-1)/(2fk – fk-1 – fk+1)}
= 40 + 10{(28 – 12)/(2 × 28 – 12 – 20)}
= 46.67
Hence, mode = 46.67
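The same mode computation as a small script (a sketch; values taken from the worked example):

```python
# Mode: Mo = xk + h * (fk - fk_prev) / (2*fk - fk_prev - fk_next)
xk, h = 40, 10                       # modal class 40-50
fk, fk_prev, fk_next = 28, 12, 20
mode = xk + h * (fk - fk_prev) / (2 * fk - fk_prev - fk_next)
print(round(mode, 2))  # 46.67
```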
Important Result
Relationship among mean, median and mode,
Mode = 3(Median) – 2(Mean)
Example 2: Find the mean, mode, and median for the following data,

Class:        0-10   10-20   20-30   30-40   40-50   Total
Frequency:       8      16      36      34       6     100
Solution:
We have,

Class    Mid value xi   Frequency fi   Cumulative frequency   fi × xi
0-10                5              8                      8        40
10-20              15             16                     24       240
20-30              25             36                     60       900
30-40              35             34                     94      1190
40-50              45              6                    100       270
Total                      ∑fi = 100                   ∑(fi × xi) = 2640
Mean = ∑(fi.xi)/∑fi
= 2640/100
= 26.4
Here, N = 100 ⇒ N / 2 = 50.
Cumulative frequency just greater than 50 is 60 and corresponding class is 20-30.
Thus, the median class is 20-30.
Hence, l = 20, h = 10, f = 36, c = c. f. of preceding class = 24 and N/2=50
Median, Me = l + h{(N/2 – cf)/f}
= 20+10{(50-24)/36}
Median = 27.2.
Mode = 3(median) – 2(mean) = (3 × 27.2 – 2 × 26.4) = 28.8.
Unbiased and Biased Estimators
We now define unbiased and biased estimators. We want our estimator to match our parameter, in
the long run. In more precise language we want the expected value of our statistic to equal the
parameter. If this is the case, then we say that our statistic is an unbiased estimator of the parameter.

If an estimator is not an unbiased estimator, then it is a biased estimator. Although a biased
estimator does not have a good alignment of its expected value with its parameter, there are many
practical instances when a biased estimator can be useful. One such case is when a plus four
confidence interval is used to construct a confidence interval for a population proportion.

Example for Means

To see how this idea works, we will examine an example that pertains to the mean. The statistic

(X1 + X2 + . . . + Xn)/n

is known as the sample mean. We suppose that the random variables are a random sample from the
same distribution with mean μ. This means that the expected value of each random variable is μ.

When we calculate the expected value of our statistic, we see the following:

E[(X1 + X2 + . . . + Xn)/n] = (E[X1] + E[X2] + . . . + E[Xn])/n = (nE[X1])/n = E[X1] = μ.

Since the expected value of the statistic matches the parameter that it estimated, this means that the
sample mean is an unbiased estimator for the population mean.
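A small simulation illustrates this unbiasedness (a sketch; the uniform distribution, sample size, and trial count are arbitrary choices, not from the text): averaging many sample means from a distribution with known mean μ gives a value very close to μ.

```python
import random

random.seed(0)
mu, n, trials = 5.0, 10, 20000
total = 0.0
for _ in range(trials):
    # Uniform(0, 10) has mean mu = 5.0
    sample = [random.uniform(0, 2 * mu) for _ in range(n)]
    total += sum(sample) / n           # one sample mean
avg_of_means = total / trials
print(abs(avg_of_means - mu) < 0.05)   # True: no systematic bias appears
```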

What are Point Estimators?

Point estimators are functions that are used to find an approximate value of a population parameter
from random samples of the population. They use the sample data of a population to calculate a
point estimate or a statistic that serves as the best estimate of an unknown parameter of a
population.

Most often, the existing methods of finding the parameters of large populations are unrealistic. For
example, when finding the average age of kids attending kindergarten, it will be impossible to collect
the exact age of every kindergarten kid in the world. Instead, a statistician can use the point estimator
to make an estimate of the population parameter.
Properties of Point Estimators

The following are the main characteristics of point estimators:

1. Bias

The bias of a point estimator is defined as the difference between the expected value of the estimator
and the value of the parameter being estimated. When the estimated value of the parameter and the
value of the parameter being estimated are equal, the estimator is considered unbiased.

Also, the closer the expected value of a parameter is to the value of the parameter being measured,
the lesser the bias is.

2. Consistency

Consistency tells us how close the point estimator stays to the value of the parameter as the sample
size increases. The point estimator requires a large sample size for it to be more consistent and accurate.

You can also check if a point estimator is consistent by looking at its corresponding expected value
and variance. For the point estimator to be consistent, the expected value should move toward the
true value of the parameter.

3. Most efficient or unbiased

The most efficient point estimator is the one with the smallest variance of all the unbiased and
consistent estimators. The variance measures the level of dispersion from the estimate, and the
estimator with the smallest variance varies the least from one sample to the other.

Generally, the efficiency of the estimator depends on the distribution of the population. For example,
in a normal distribution, the mean is considered more efficient than the median, but the same does
not apply in asymmetrical distributions.

Point Estimation vs. Interval Estimation

The two main types of estimators in statistics are point estimators and interval estimators. Point
estimation is the opposite of interval estimation. It produces a single value while the latter produces a
range of values.

A point estimator is a statistic used to estimate the value of an unknown parameter of a population. It
uses sample data when calculating a single statistic that will be the best estimate of the unknown
parameter of the population.

On the other hand, interval estimation uses sample data to calculate the interval of the possible
values of an unknown parameter of a population. The interval of the parameter is selected in a way
that it falls within a 95% or higher probability, also known as the confidence interval.

The confidence interval is used to indicate how reliable an estimate is, and it is calculated from the
observed data. The endpoints of the intervals are referred to as the upper and lower confidence limits.

Common Methods of Finding Point Estimates

The process of point estimation involves utilizing the value of a statistic that is obtained from sample
data to get the best estimate of the corresponding unknown parameter of the population. Several
methods can be used to calculate the point estimators, and each method comes with different
properties.
1. Method of moments

The method of moments of estimating parameters was introduced in 1887 by Russian
mathematician Pafnuty Chebyshev. It starts by taking known facts about a population and then
applying the facts to a sample of the population. The first step is to derive equations that relate the
population moments to the unknown parameters.
applying the facts to a sample of the population. The first step is to derive equations that relate the
population moments to the unknown parameters.

The next step is to draw a sample of the population to be used to estimate the population moments.
The equations derived in step one are then solved using the sample mean of the population moments.
This produces the best estimate of the unknown population parameters.

2. Maximum likelihood estimator

The maximum likelihood estimator method of point estimation attempts to find the unknown
parameters that maximize the likelihood function. It takes a known model and uses the values to
compare data sets and find the most suitable match for the data.

For example, a researcher may be interested in knowing the average weight of babies born
prematurely. Since it would be impossible to measure all babies born prematurely in the population,
the researcher can take a sample from one location.

Because the weight of pre-term babies follows a normal distribution, the researcher can use the
maximum likelihood estimator to find the average weight of the entire population of pre-term babies
based on the sample data.

What are Confidence Intervals?

Often in statistics we’re interested in measuring population parameters – numbers that describe
some characteristic of an entire population.
Two of the most common population parameters are:
1. Population mean: the mean value of some variable in a population (e.g. the mean height of males
in the U.S.)
2. Population proportion: the proportion of some variable in a population (e.g. the proportion of
residents in a county who support a certain law)
Although we’re interested in measuring these parameters, it’s usually too costly and time-consuming
to actually go around and collect data on every individual in a population in order to calculate the
population parameter.
Instead, we typically take a random sample from the overall population and use data from the sample
to estimate the population parameter.
For example, suppose we want to estimate the mean weight of a certain species of turtle in Florida.
Since there are thousands of turtles in Florida, it would be extremely time-consuming and costly to go
around and weigh each individual turtle.
Instead, we might take a simple random sample of 50 turtles and use the mean weight of the turtles
in this sample to estimate the true population mean:

The problem is that the mean weight of turtles in the sample is not guaranteed to exactly match the
mean weight of turtles in the whole population. For example, we might just happen to pick a sample
full of low-weight turtles or perhaps a sample full of heavy turtles.
In order to capture this uncertainty we can create a confidence interval. A confidence interval is a
range of values that is likely to contain a population parameter with a certain level of confidence. It is
calculated using the following general formula:
Confidence Interval = (point estimate) +/- (critical value)*(standard error)
This formula creates an interval with a lower bound and an upper bound, which likely contains a
population parameter with a certain level of confidence.
Confidence Interval = [lower bound, upper bound]
For example, the formula to calculate a confidence interval for a population mean is as follows:
Confidence Interval = x +/- z*(s/√n)
where:
 x: sample mean
 z: the chosen z-value
 s: sample standard deviation
 n: sample size
The z-value that you will use is dependent on the confidence level that you choose. The following
table shows the z-value that corresponds to popular confidence level choices:
Confidence Level z-value

0.90 1.645

0.95 1.96

0.99 2.58

For example, suppose we collect a random sample of turtles with the following information:
Sample size n = 25
Sample mean weight x = 300
Sample standard deviation s = 18.5
Here is how to calculate the 90% confidence interval for the true population mean weight:
90% Confidence Interval: 300 +/- 1.645*(18.5/√25) = [293.91, 306.09]
We interpret this confidence interval as follows:
There is a 90% chance that the confidence interval of [293.91, 306.09] contains the true population
mean weight of turtles. Another way of saying the same thing is that there is only a 10% chance that
the true population mean lies outside of the 90% confidence interval. That is, there’s only a 10%
chance that the true population mean weight of turtles is greater than 306.09 pounds or less than
293.91 pounds.
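The turtle interval above can be reproduced in a few lines (a sketch; n, x, s, and z come from the example):

```python
import math

n, xbar, s, z = 25, 300, 18.5, 1.645   # values from the turtle example
margin = z * s / math.sqrt(n)          # z * (s / sqrt(n))
lo, hi = xbar - margin, xbar + margin
print(round(lo, 2), round(hi, 2))      # 293.91 306.09
```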
It’s worth noting that there are two numbers that can affect the size of a confidence interval:
1. The sample size: The larger the sample size, the more narrow the confidence interval.
2. The confidence level: The larger the confidence level, the wider the confidence interval.
Types of Confidence Intervals
There are many types of confidence intervals. Here are the most commonly used ones:
Confidence Interval for a Mean
A confidence interval for a mean is a range of values that is likely to contain a population mean with a
certain level of confidence. The formula to calculate this interval is:
Confidence Interval = x +/- z*(s/√n)
where:
 x: sample mean
 z: the chosen z-value
 s: sample standard deviation
 n: sample size
Confidence Interval for the Difference Between Means
A confidence interval (C.I.) for a difference between means is a range of values that is likely to
contain the true difference between two population means with a certain level of confidence. The
formula to calculate this interval is:
Confidence interval = (x1 – x2) +/- t*√((sp²/n1) + (sp²/n2))
where:
 x1, x2: sample 1 mean, sample 2 mean
 t: the t-critical value based on the confidence level and (n1+n2-2) degrees of freedom
 sp²: pooled variance
 n1, n2: sample 1 size, sample 2 size
where:
 The pooled variance is calculated as: sp² = ((n1-1)s1² + (n2-1)s2²) / (n1+n2-2)
 The t-critical value t can be found using the Inverse t Distribution calculator
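A sketch of the two-sample interval with made-up summary statistics (the sample values and the t-critical value of 2.048 for 28 degrees of freedom at the 95% level are illustrative assumptions):

```python
import math

# Illustrative (made-up) two-sample summary statistics:
n1, n2 = 15, 15
x1, x2 = 310.0, 300.0
s1, s2 = 18.5, 16.4
t = 2.048                              # assumed t-critical, 28 df, 95% level

# Pooled variance: sp^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
margin = t * math.sqrt(sp2 / n1 + sp2 / n2)
interval = (x1 - x2 - margin, x1 - x2 + margin)
```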
Confidence Interval for a Proportion
A confidence interval for a proportion is a range of values that is likely to contain a population
proportion with a certain level of confidence. The formula to calculate this interval is:
Confidence Interval = p +/- z*(√p(1-p) / n)
where:
 p: sample proportion
 z: the chosen z-value
 n: sample size
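A quick sketch of the proportion interval with assumed numbers (p = 0.56, n = 100, and the 95% z-value are illustrative, not from the text):

```python
import math

p, n, z = 0.56, 100, 1.96              # assumed sample proportion and size
margin = z * math.sqrt(p * (1 - p) / n)
lo, hi = p - margin, p + margin        # roughly (0.463, 0.657)
```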
Confidence Interval for the Difference in Proportions
A confidence interval for the difference in proportions is a range of values that is likely to contain the
true difference between two population proportions with a certain level of confidence. The formula to
calculate this interval is:

Confidence interval = (p1–p2) +/- z*√(p1(1-p1)/n1 + p2(1-p2)/n2)

 p1, p2: sample 1 proportion, sample 2 proportion


 z: the z-critical value based on the confidence level
 n1, n2: sample 1 size, sample 2 size

Maximum likelihood estimation

Maximum likelihood estimation (MLE) is an estimation method that allows us to use a sample to
estimate the parameters of the probability distribution that generated the sample.
This lecture provides an introduction to the theory of maximum likelihood, focusing on its
mathematical aspects, in particular on:
 its asymptotic properties;
 the assumptions that are needed to prove the properties.
At the end of the lecture, we provide links to pages that contain examples and that treat practically
relevant aspects of the theory, such as numerical optimization and hypothesis testing.

The sample and its likelihood


The main elements of a maximum likelihood estimation problem are the following:
 a sample x, that we use to make statements about the probability distribution that generated
the sample;
 the sample x is regarded as the realization of a random vector X, whose distribution is
unknown and needs to be estimated;
 there is a set Θ of real vectors (called the parameter space) whose elements θ
(called parameters) are put into correspondence with the possible distributions of X; in particular:
 if X is a discrete random vector, we assume that its joint probability mass function
p(x; θ) belongs to a set of joint probability mass functions indexed by the
parameter θ; when the joint probability mass function is considered as a function of θ for
fixed x, it is called likelihood (or likelihood function) and it is denoted by L(θ; x) = p(x; θ);
 if X is a continuous random vector, we assume that its joint probability density
function f(x; θ) belongs to a set of joint probability density functions indexed by
the parameter θ; when the joint probability density function is considered as a function of
θ for fixed x, it is called likelihood and it is denoted by L(θ; x) = f(x; θ);
 we need to estimate the true parameter θ0, which is associated with the unknown
distribution that actually generated the sample (we rule out the possibility that several different
parameters are put into correspondence with the true distribution).
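As a concrete sketch (the data values are illustrative, not from the text): for an i.i.d. normal sample, the likelihood is maximized in closed form by the sample mean and the average squared deviation.

```python
# Closed-form MLE for a normal sample:
# mu_hat = sample mean, sigma2_hat = mean squared deviation.
data = [4.9, 5.1, 5.0, 4.8, 5.2]       # illustrative sample
mu_hat = sum(data) / len(data)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)
```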

UNIT-V
Graph Theory Basics
A graph is a data structure that is defined by two components :
1. A node or a vertex.
2. An edge E or ordered pair is a connection between two nodes u and v, identified by the unique
pair (u, v). The pair (u, v) is ordered because (u, v) is not the same as (v, u) in the case of a directed
graph. The edge may have a weight, or is set to one in the case of an unweighted graph.
“Graph representation”
Applications: Graph is a data structure which is used extensively in our real-life.
1. Social Network: Each user is represented as a node and all their activities,suggestion and friend
list are represented as an edge between the nodes.
2. Google Maps: Various locations are represented as vertices or nodes and the roads are
represented as edges and graph theory is used to find shortest path between two nodes.
3. Recommendations on e-commerce websites: The “Recommendations for you” section on various
e-commerce websites uses graph theory to recommend items of similar type to user’s choice.
4. Graph theory is also used to study molecules in chemistry and physics.
More on graphs: Characteristics of graphs:
1. Adjacent node: A node ‘v’ is said to be adjacent node of node ‘u’ if and only if there exists an edge
between ‘u’ and ‘v’.
2. Degree of a node: In an undirected graph, the number of edges incident on a node is the degree of
the node. In the case of a directed graph, the indegree of a node is the number of edges arriving at
the node, and the outdegree is the number of edges departing from the node.
Note: (1) a self-loop is counted twice; (2) the sum of the degrees of all the vertices in a graph G is even.

3. Path: A path of length 'n' from node 'u' to node 'v' is defined as a sequence of n+1 nodes,
P(u,v) = (v0, v1, v2, v3, ..., vn). A path is simple if all the nodes are distinct; the only exception
allowed is that the source and destination may be the same.
4. Isolated node: A node with degree 0 is known as an isolated node. Isolated nodes can be found by
breadth-first search (BFS). This finds its application in LAN networks in finding whether a system is
connected or not.

Types of graphs:

1. Directed graph: A graph in which the direction of each edge is defined towards a particular node is
a directed graph.
 Directed Acyclic Graph (DAG): A directed graph with no cycle. For a vertex 'v' in a DAG, there is
no directed path that starts and ends at 'v'. Applications: critical game analysis, expression
tree evaluation, game evaluation.
 Tree: A tree is just a restricted form of graph. That is, it is a DAG with the restriction that a child
can have only one parent.
2. Undirected graph: A graph in which the direction of the edge is not defined. So if an edge exists
between nodes 'u' and 'v', then there is a path from node 'u' to 'v' and vice versa.
 Connected graph: A graph is connected when there is a path between every pair of vertices. In
a connected graph there is no unreachable node.
 Complete graph: A graph in which each pair of vertices is connected by an edge. In other
words, every node 'u' is adjacent to every other node 'v' in graph 'G'. A complete graph has
n(n-1)/2 edges. See below for proof.
 Biconnected graph: A connected graph which cannot be disconnected by the deletion of any
single vertex. It is a graph with no articulation point.

Proof for complete graph:


1. Consider a complete graph with n nodes. Each node is connected to the other n-1 nodes. Thus it
becomes n * (n-1) edges. But this counts each edge twice because this is an undirected graph, so
we divide it by 2.
2. Thus it becomes n(n-1)/2.
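The counting argument above can be sketched as a one-line function:

```python
def complete_graph_edges(n):
    # Each node connects to n-1 others; dividing by 2 removes double-counting.
    return n * (n - 1) // 2

print(complete_graph_edges(5))  # 10
```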

Consider the complete graph on the five nodes A, B, C, D, E (omitting repeated edges):
Edges on node A = (A,B), (A,C), (A,D), (A,E). Edges on node B = (B,C), (B,D), (B,E). Edges on node C =
(C,D), (C,E). Edges on node D = (D,E). Edges on node E = none.
Total edges = 4 + 3 + 2 + 1 + 0 = 10. Number of nodes = 5. Thus n(n-1)/2 = 10 edges. Thus proven.
Some more graphs :
1. Regular graph: A graph in which every vertex has the same degree. A k-regular graph means every
vertex has degree k.
Every complete graph Kn is (n-1)-regular, which means every vertex has degree n-1.

Regular graphs

2. Bipartite graph: A graph G in which the vertex set can be partitioned into two subsets U and V such
that each edge of G has one end in U and the other end in V.
Bipartite graph

3. Complete bipartite graph: A simple graph with vertex set partitioned into two subsets
U = {v1, v2, ..., vm} and W = {w1, w2, ..., wn} such that:
i. there is an edge from each vi to each wj;
ii. there are no self-loops.

Complete Bipartite graph

4. Cycle graph: A graph of n vertices (n ≥ 3): v1, v2, ..., vn with edges (v1,v2), (v2,v3), ..., (vn-1,vn), (vn,v1).

Subgraph:
A graph G1 = (V1, E1) is called a subgraph of a graph G(V, E) if V1(G) is a subset of V(G) and E1(G)
is a subset of E(G) such that each edge of G1 has same end vertices as in G.
Types of Subgraphs:
 Vertex disjoint subgraph: Any two graph G1 = (V1, E1) and G2 = (V2, E2) are said to be vertex
disjoint of a graph G = (V, E) if V1(G1) intersection V2(G2) = null. In the figure, there is no
common vertex between G1 and G2.
 Edge disjoint subgraph: A subgraph is said to be edge-disjoint if E1(G1) intersection E2(G2) =
null. In the figure, there is no common edge between G1 and G2.
Note: Edge disjoint subgraph may have vertices in common but a vertex disjoint graph cannot have
a common edge, so the vertex disjoint subgraph will always be an edge-disjoint subgraph.
Spanning Subgraph
Consider the graph G(V,E) as shown below. A spanning subgraph is a subgraph that contains all the
vertices of the original graph G that is G'(V’,E’) is spanning if V’=V and E’ is a subset of E.

So one of the spanning subgraph can be as shown below G'(V’,E’). It has all the vertices of the
original graph G and some of the edges of G.
This is just one of the many spanning subgraphs of graph G. We can obtain various other spanning
subgraphs by different combinations of edges. Note that if we consider a graph G'(V’,E’) where V’=V
and E’=E, then graph G’ is a spanning subgraph of graph G(V,E).

Matrix Representation of Graphs


A graph can be represented using an adjacency matrix.

Adjacency Matrix

An adjacency matrix A[V][V] is a 2D array of size V × V, where V is the number of vertices in the
graph. For an undirected graph, if there is an edge between Vx and Vy, then A[Vx][Vy] = 1 and
A[Vy][Vx] = 1; otherwise the value is zero.
For a directed graph, if there is an edge from Vx to Vy, then A[Vx][Vy] = 1; otherwise the
value is zero.
Adjacency Matrix of an Undirected Graph
Let us consider the following undirected graph and construct the adjacency matrix −

Adjacency matrix of the above undirected graph will be −

a b c d

a 0 1 1 0

b 1 0 1 0

c 1 1 0 1

d 0 0 1 0

Adjacency Matrix of a Directed Graph


Let us consider the following directed graph and construct its adjacency matrix −

Adjacency matrix of the above directed graph will be −

a b c d

a 0 1 1 0

b 0 0 1 0

c 0 0 0 1

d 0 0 0 0
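The directed matrix above can be built in code from its edge list (a sketch; vertex labels and edges are read off the table):

```python
# Directed adjacency matrix from an edge list (matches the table above).
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
idx = {v: i for i, v in enumerate(vertices)}

A = [[0] * len(vertices) for _ in vertices]
for u, v in edges:
    A[idx[u]][idx[v]] = 1      # directed: only A[u][v] is set
```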
Incidence matrix
The incidence matrix is a matrix that represents a graph; the graph can be reconstructed from it. We
will use the symbol [Ac] to represent the incidence matrix. Just like all other matrices, this matrix
contains rows and columns.

In the graph, the number of nodes is indicated with the help of rows of incidence matrix [Ac], and the
number of branches is indicated with the help of columns of that matrix. If the given incidence matrix
contains the n number of rows, then it will show that the graph of this matrix has n number of nodes.
Similarly, if the given incidence matrix contains the m number of columns, then it will show that the
graph of this matrix has the m number of branches.

The above graph is a directed graph that has 6 branches and 4 nodes. So we can say that this graph
contains the 6 columns and 4 rows for the incidence matrix. The incidence matrix will always take the
entries as -1, 0, +1. The incidence matrix is always analogous to the KCL, which stands for the
Kirchhoff Current Law. Thus, the following things can be derived from the KCL:

Type of branch                        Value
Outgoing branch from the kth node      +1
Incoming branch to the kth node        -1
Others                                  0

Steps to Construct Incidence matrix

The incidence matrix can be drawn with the help of some steps, which are described as follows:

1. We will write +1 if there is an outgoing branch from the kth node.
2. We will write -1 if there is an incoming branch to the kth node.
3. We will write 0 otherwise.
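The three steps can be sketched as a small helper (node and branch labels are illustrative; each branch is given as a (from, to) pair):

```python
def incidence_matrix(nodes, branches):
    # Rows = nodes, columns = branches; +1 outgoing, -1 incoming, 0 otherwise.
    M = [[0] * len(branches) for _ in nodes]
    for j, (u, v) in enumerate(branches):
        M[nodes.index(u)][j] = 1    # outgoing branch from u: +1
        M[nodes.index(v)][j] = -1   # incoming branch to v: -1
    return M
```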

Examples of Incidence matrix

In this example, we have a directed graph, and we have to draw the incidence matrix of this graph.
The incidence matrix of the above graph is described as follows:

[AC] =

Branches a b c d e f

Nodes

1 1 -1 0 0 1 0

2 -1 0 -1 0 0 -1

3 0 0 0 1 -1 1

4 0 1 1 -1 0 0

Reduced Incidence matrix

If we delete any arbitrary row from the given incidence matrix, the newly created matrix
will be known as the reduced incidence matrix. The reduced matrix is
indicated by the symbol [A]. The order of this reduced incidence matrix will be (n-1) × b, where b is
the number of branches and n is the number of nodes. The reduced
incidence matrix for the above incidence matrix is described as follows:

[A] =

Branches a b c d e f

Nodes

1 1 -1 0 0 1 0

2 -1 0 -1 0 0 -1

3 0 0 0 1 -1 1

In this matrix, we have deleted node no 4 of the incidence matrix [AC].


Example of Reduced incidence matrix

To show the example of a reduced incidence matrix, we will consider an incidence graph. Now we
have to write the reduced incidence matrix for this incidence graph.

Solution:

We have to first draw an incidence matrix of the given graph to draw the reduced incidence matrix.
The incidence matrix of the above graph is described as follows:

[AC] =

Branches a B C d e f

Nodes

1 -1 1 0 0 1 0

2 1 0 -1 0 0 1

3 0 0 0 1 -1 -1

4 0 -1 1 -1 0 0

Now we will draw the reduced incidence matrix of this matrix, which is described as follows:

[A] =

Branches a B c d e f

Nodes

1 -1 1 0 0 1 0

3 0 0 0 1 -1 -1

4 0 -1 1 -1 0 0
Graph Isomorphism in Discrete Mathematics
Isomorphism describes the situation in which a single graph can have more than one form. That
means two different-looking graphs can have the same number of vertices, the same number of
edges, and the same edge connectivity. Such graphs are known as isomorphic graphs. An example
of isomorphic graphs is described as follows:

The above graph contains the following things:

o The same graph is represented in more than one form.


o Hence, we can say that these graphs are isomorphism graphs.

Conditions for graph isomorphism

Any two graphs will be known as isomorphism if they satisfy the following four conditions:

1. There will be an equal number of vertices in the given graphs.


2. There will be an equal number of edges in the given graphs.
3. The degree sequences of the given graphs will be identical.
4. If the first graph forms a cycle of length k with the help of vertices {v1, v2, v3, ..., vk}, then the
other graph must also form a cycle of the same length k with the help of the corresponding
vertices.

Important Points
o For any two graphs to be an isomorphism, the necessary conditions are the above-defined
four conditions.
o It is not necessary that the above-defined conditions will be sufficient to show that the given
graphs are isomorphic.
o If two graphs satisfy the above-defined four conditions, even then, it is not guaranteed that
the graphs are isomorphic.
o If the graphs fail to satisfy any condition, then we can say that the graphs are surely not
isomorphic.
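The first three necessary conditions can be sketched as a quick check on adjacency matrices (a sketch; passing this check does NOT prove isomorphism, as the points above explain):

```python
def could_be_isomorphic(adj1, adj2):
    # Compares vertex count, edge count, and degree sequence only.
    degrees = lambda adj: sorted(sum(row) for row in adj)
    return (len(adj1) == len(adj2)
            and sum(map(sum, adj1)) == sum(map(sum, adj2))
            and degrees(adj1) == degrees(adj2))
```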

Sufficient Conditions for Graph isomorphism

If we want to prove that two graphs are isomorphic, there are some sufficient conditions that
guarantee that the two graphs are surely isomorphic. Only when the two graphs have successfully
cleared all the above four conditions do we check them against the sufficient conditions, which are
described as follows:

If the complement graphs of both the graphs are isomorphic, then these graphs will surely be
isomorphic.
If the adjacency matrices of both the graphs are the same, then these graphs will be isomorphic.

If the subgraphs obtained by deleting some vertices of one graph and their corresponding images
in the other graph are isomorphic, then these graphs will surely be isomorphic.

When two graphs satisfy any of the above conditions, then we can say that those graphs are surely
isomorphic.

Example 1:

Solution: For this, we will check all the four conditions of graph isomorphism, which are described as
follows:

Condition 1:

o Graph G1 has a total of 4 vertices.
o Graph G2 has a total of 4 vertices.

Here,

There are an equal number of vertices in both graphs G1 and G2. So these graphs satisfy condition 1.
Now we will check the second condition.

Condition 2:

o Graph G1 has a total of 5 edges.
o Graph G2 has a total of 6 edges.

Here,

The graphs G1 and G2 do not have an equal number of edges. So these graphs do not satisfy
condition 2, and we need not check the remaining conditions.

Since, these graphs violate condition 2. So these graphs are not an isomorphism.

∴ Graph G1 and graph G2 are not isomorphism graphs.


Example 2:

Solution: For this, we will check all the four conditions of graph isomorphism, which are described as
follows:

Condition 1:

o Graph G1 has a total of 4 vertices.
o Graph G2 has a total of 4 vertices.
o Graph G3 has a total of 4 vertices.

Here,

There are an equal number of vertices in all graphs G1, G2 and G3. So these graphs satisfy condition 1.
Now we will check the second condition.

Condition 2:

o Graph G1 has a total of 5 edges.
o Graph G2 has a total of 5 edges.
o Graph G3 has a total of 4 edges.

Here,

o There are an equal number of edges in graphs G1 and G2. So these graphs satisfy
condition 2.
o But the graphs (G1, G2) and G3 do not have an equal number of edges. So the
graphs (G1, G2) and G3 do not satisfy condition 2.

Since, the graphs (G1, G2) and G3 violate condition 2. So we can say that these graphs are not an
isomorphism.

∴ Graph G3 is neither isomorphism with graph G1 nor with graph G2.

Since the graphs, G1 and G2 satisfy condition 2. So we can say that these graphs may be an
isomorphism.

∴ Graphs G1 and G2 may be an isomorphism.

Now we will check the third condition for graphs G1 and G2.

Condition 3:

o In graph 1, the degree sequence is {2, 2, 3, 3}.
o In graph 2, the degree sequence is {2, 2, 3, 3}.

Here

There are an equal number of degree sequences in both graphs G1 and G2. So these graphs satisfy
condition 3. Now we will check the fourth condition.

Condition 4:

Graph G1 forms a cycle of length 3 with vertices of degrees {2, 3, 3}.

Graph G2 also forms a cycle of length 3 with vertices of degrees {2, 3, 3}.

Here,

It shows that both the graphs contain the same cycle, because both graphs G1 and G2 form a
cycle of length 3 with vertices of degrees {2, 3, 3}. So these graphs satisfy condition 4.

Thus,

o The graphs G1 and G2 satisfy all the above four necessary conditions.
o So G1 and G2 may be an isomorphism.

Now we will check sufficient conditions to show that the graphs G1 and G2 are an isomorphism.

Checking Sufficient Conditions

As we have learned, if the complement graphs of the two graphs are isomorphic, then the two
graphs are surely isomorphic as well.

So we will draw the complement graphs of G1 and G2, which are described as follows:

In the above complement graphs of G1 and G2, we can see that the two complements are isomorphic.

∴ Graphs G1 and G2 are isomorphic.
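The three countable checks above (vertex count, edge count, degree sequence) are easy to automate. Below is a small Python sketch, not from the text; the adjacency lists G1 and G2 are hypothetical stand-ins for the figures, which are not reproduced here:

```python
# Sketch: test the first three necessary conditions for isomorphism.
# Graphs are given as adjacency lists (dict of vertex -> list of neighbors).
def may_be_isomorphic(g1, g2):
    if len(g1) != len(g2):                                   # condition 1: vertex count
        return False
    edge_count = lambda g: sum(len(n) for n in g.values()) // 2
    if edge_count(g1) != edge_count(g2):                     # condition 2: edge count
        return False
    degree_seq = lambda g: sorted(len(n) for n in g.values())
    return degree_seq(g1) == degree_seq(g2)                  # condition 3: degree sequence

# Two relabelled copies of the same 4-vertex, 5-edge graph (degree sequence {2, 2, 3, 3})
G1 = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
G2 = {'a': ['b', 'c'], 'b': ['a', 'c', 'd'], 'c': ['a', 'b', 'd'], 'd': ['b', 'c']}
print(may_be_isomorphic(G1, G2))  # True
```

Passing all three checks only means the graphs may be isomorphic; as the text notes, a sufficient test (for example, comparing complements or trying explicit vertex mappings) is still needed.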

Path:
A path is a type of open walk in which neither edges nor vertices are allowed to repeat. (A
closed walk in which only the starting and ending vertices coincide is a cycle, not a path.)
The length of a path must be greater than 0.

So for a path, the following two points are important, which are described as follows:

o Edges cannot be repeated


o Vertex cannot be repeated

In the above graph, there is a path, which is described as follows:

1. Path: F, H, C, A, B, D
o In discrete mathematics, every path is a trail, but it is not true that every trail is a
path.
o In discrete mathematics, every cycle is a circuit, but it is not true that every circuit
is a cycle.
o If the graph is directed, we add the term "directed" in front of all the definitions
given above.
o Walks and paths underlie many graph-theoretic concepts. For example, suppose we have a graph
and want to determine the distance between two vertices. This distance is the length of the
shortest path that begins at one vertex and ends at the other, where the length of a path
equals the number of edges in the path.

Circuit
A circuit can be described as a closed walk where no edge is allowed to repeat. In the circuit, the
vertex can be repeated. A closed trail in the graph theory is also known as a circuit.

So for a circuit, the following two points are important, which are described as follows:

o Edges cannot be repeated


o Vertex can be repeated

In the above graph, there is a circuit, which is described as follows:

1. Circuit: A, B, D, C, F, H, C, A
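The definitions of path and circuit can be checked mechanically. The sketch below is illustrative only and uses a small hypothetical square graph, since the figure is not reproduced here:

```python
# Sketch: classify a vertex sequence as a 'path', 'circuit', or plain 'walk'.
def classify_walk(adj, walk):
    steps = list(zip(walk, walk[1:]))
    assert all(v in adj[u] for u, v in steps), "not a walk in this graph"
    edges = [frozenset(s) for s in steps]
    no_repeated_edge = len(edges) == len(set(edges))
    closed = len(walk) > 1 and walk[0] == walk[-1]
    if closed and no_repeated_edge:
        return 'circuit'                      # closed walk, no edge repeats
    if no_repeated_edge and len(walk) == len(set(walk)):
        return 'path'                         # open walk, no vertex or edge repeats
    return 'walk'

adj = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
print(classify_walk(adj, ['A', 'B', 'D', 'C']))       # path
print(classify_walk(adj, ['A', 'B', 'D', 'C', 'A']))  # circuit
```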
Euler Graph
If all the vertices of a connected graph have an even degree, then that graph is known
as an Euler graph. In other words, we can say that an Euler graph is a connected graph that
has an Euler circuit. A simple example of an Euler graph is described as follows:

The above graph is a connected graph, and all the vertices of this graph have an even degree.
Hence we can say that this graph is an Euler graph.

In other words, we can say that this graph is an Euler graph because it has the Euler circuit as
BACEDCB.

If there is a connected graph with a trail that has all the edges of the graph, then that type of trail will
be known as the Euler trail.

If there is a connected graph, which has a walk that passes through each and every edge of the graph
only once, then that type of walk will be known as the Euler walk.

Note: If exactly two vertices of a connected graph have an odd degree, then the graph contains an
Euler path (but not an Euler circuit).

Examples of Euler path:

There are a lot of examples of the Euler path, and some of them are described as follows:

Example 1: In the following image, we have a graph with 4 nodes. Now we have to determine whether
this graph contains an Euler path.

Solution:

The above graph will contain the Euler path if each edge of this graph must be visited exactly once,
and the vertex of this can be repeated. So if we begin our path from vertex B and then go to vertices C,
D, B, A, and D, then in this process, each and every edge is visited exactly once, and it also contains
repeated vertex. So the above graph contains an Euler path, which is described as follows:

Euler path = BCDBAD


Euler Circuit:

An Euler circuit is also called an Euler tour or an Euler cycle. There are various definitions of the
Euler circuit, which are described as follows:

o If there is a connected graph with a circuit that has all the edges of the graph, then that type
of circuit will be known as the Euler circuit.
o If there is a connected graph, which has a walk that passes through each and every edge of
the graph only once, then that type of walk will be known as the Euler circuit. In this walk, the
starting vertex and ending vertex must be the same, and this walk can contain the repeated
vertex, but it is not compulsory.
o If an Euler trail contains the same vertex at the start and end of the trail, then that type of trail
will be known as the Euler Circuit.
o A closed Euler trail will be known as the Euler Circuit.

Note: If all the vertices of a connected graph have an even degree, then the graph contains an Euler
circuit.

Examples of Euler Circuit

There are a lot of examples of the Euler circuit, and some of them are described as follows:

Example 1: In the following image, we have a graph with 4 nodes. Now we have to determine whether
this graph contains an Euler circuit.

Solution:

The above graph will contain the Euler circuit if the starting vertex and end vertex are the same, and
this graph visits each and every edge only once. The Euler circuit can contain the repeated vertex. If
we begin our path from vertex A and then go to vertices B, C, D, and A, then in this process, the
condition of same starting and end vertex is satisfied, but another condition of covering all edges is
not satisfied because there is one edge from vertex D to B, which is not covered. If we try to cover this
edge, then the edges will be repeated. So the above graph does not contain an Euler circuit.

Semi Euler Graph

If there is a connected graph that does not have an Euler circuit, but it has an Euler trail, then that type
of graph will be known as the semi-Euler graph. Any graph will be known as semi Euler graph if it
satisfies two conditions, which are described as follows:

o For this, the graph must be connected


o This graph must contain an Euler trail
Example of Semi-Euler graph

In this example, we have a graph with 4 nodes. Now we have to determine whether this graph is a
semi-Euler graph.

Solution:

Here,

o There is an Euler trail in this graph, i.e., BCDBAD.


o But there is no Euler circuit.
o Hence, this graph is a semi-Euler graph.

Important Notes:

When we learn the concept of Euler graph, then there are some points that we should keep in our
minds, which are described as follows:

Note 1:

We have two ways through which we can check whether any graph is Euler or not. We can use any of
the ways to do this, which are described as follows:

o If there is a connected graph with an Euler circuit, then that type of graph will be an Euler
graph.
o If there is an even degree for all the vertices of the graph, then that type of graph will be an
Euler graph.

Note 2:

We can use the following way through which we can check whether any graph has an Euler circuit:

o We will check whether all the vertices of a graph have an even degree.
o If it contains an even degree, then that type of graph will be an Euler circuit. Otherwise, it will
not be an Euler circuit.

Note 3:

We can use the following way through which we can check whether any graph is a Semi Euler graph:
o We will check whether the graph is connected and has an Euler trail but no Euler circuit.
o If it is a connected graph with an Euler trail but no Euler circuit, then it is a Semi
Euler Graph. Otherwise, it is not a Semi Euler graph.

Note 4:

We can use the following way through which we can check whether any graph has an Euler trail:

o We will check how many vertices of the graph have an odd degree.
o If at most two vertices (i.e., exactly zero or two) have an odd degree, then the connected
graph contains an Euler trail. Otherwise, it does not contain an Euler trail.

Note 5:

o If there is a connected graph, which contains an Euler circuit, then that graph will also contain
an Euler trail.
o If there is a connected graph, which contains an Euler trail, then that graph may or may not
have an Euler circuit.

Note 6:

o If there is an Euler graph, then that graph surely contains an Euler trail.
o But it is not compulsory that a graph with an Euler trail is also an Euler graph.
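The notes above all reduce to counting odd-degree vertices. A minimal sketch, assuming the input graph is connected and given as an adjacency list:

```python
# Sketch: classify a connected graph by its number of odd-degree vertices.
def euler_type(adj):
    odd = sum(1 for v in adj if len(adj[v]) % 2 == 1)
    if odd == 0:
        return "Euler graph (has an Euler circuit)"
    if odd == 2:
        return "semi-Euler graph (has an Euler trail, no Euler circuit)"
    return "neither"

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}  # all degrees even
print(euler_type(square))           # Euler graph (has an Euler circuit)
print(euler_type({1: [2], 2: [1]})) # semi-Euler graph (has an Euler trail, no Euler circuit)
```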

Example of Euler Graph:

There are a lot of examples of the Euler graphs, and some of them are described as follows:

Example 1: In the following graph, we have 6 nodes. Now we have to determine whether this graph is
an Euler graph.

Solution:

If the above graph contains the Euler circuit, then it will be an Euler Graph. A graph will contain an
Euler circuit if the starting vertex and end vertex are the same, and this graph visits each and every
edge only once. So when we begin our path from vertex A and then go to vertices E, D, F, B, C, D and
then A in this process, the starting and end vertex are the same. This path covers all the edges only
once and contains the repeated vertex. So this graph contains the Euler circuit. Hence, it is an Euler
Graph.
Hamiltonian Graph in Discrete mathematics
A graph is known as a Hamiltonian graph if there is a closed walk in a connected graph that
passes through each and every vertex of the graph exactly once, except the root (starting) vertex.
The Hamiltonian walk must not repeat any edge. One more definition of a Hamiltonian graph says
that a graph is a Hamiltonian graph if it is a connected graph that contains a Hamiltonian
circuit. The vertices of a graph are a set of points interconnected by a set of lines, and
these lines are known as edges. An example of a Hamiltonian graph is described as follows:

o In the above graph, there is a closed walk ABCDEFA.


o Except for the starting vertex, it passed through every vertex of the graph exactly once.
o At the time of walk, the edges are not repeating.
o Due to all the reasons, we can say that this graph is a Hamiltonian graph.

In other words, we can say that the above graph contains a Hamiltonian circuit. That's why this graph
is a Hamiltonian graph.

Hamiltonian Path

In a connected graph, if there is a walk that passes each and every vertex of a graph only once, this
walk will be known as the Hamiltonian path. In this walk, the edges should not be repeated. There is
one more definition to describe the Hamiltonian path: if a connected graph contains a Path with all
the vertices of the graph, this type of path will be known as the Hamiltonian path.

Examples of Hamiltonian Path

There are a lot of examples of the Hamiltonian path, which are described as follows:

Example 1: In the following graph, we have 5 nodes. Now we have to determine whether this graph
contains a Hamiltonian path.

Solution:

In the above graph, we can see that when we start from A, then we can go to B, C, D, and then E. So
this is the path that contains all the vertices (A, B, C, D, and E) only once, and there is no repeating
edge. That's why we can say that this graph has a Hamiltonian path, which is described as follows:
Hamiltonian path = ABCDE

Hamiltonian Circuit

In a connected graph, if there is a walk that passes each and every vertex of the graph only once and
after completing the walk, return to the starting vertex, then this type of walk will be known as a
Hamiltonian circuit. For the Hamiltonian circuit, there must be no repeated edges. A
Hamiltonian circuit is also called a Hamiltonian cycle.

There are some more definitions of the Hamiltonian circuit, which are described as follows:

o If there is a Hamiltonian path that begins and ends at the same vertex, then this type of cycle
will be known as a Hamiltonian circuit.
o In the connected graph, if there is a cycle with all the vertices of the graph, this type of cycle
will be known as a Hamiltonian circuit.
o A closed Hamiltonian path will also be known as a Hamiltonian circuit.

Examples of Hamiltonian Circuit

There are a lot of examples of the Hamiltonian circuit, which are described as follows:

Example 1: In the following graph, we have 5 nodes. Now we have to determine whether this graph
contains a Hamiltonian circuit.

Solution: The above graph contains the Hamiltonian circuit if there is a path that starts and ends at
the same vertex. So when we start from the A, then we can go to B, C, E, D, and then A. So this is the
path that contains all the vertices (A, B, C, D, and E) only once, except the starting vertex, and there is
no repeating edge. That's why we can say that this graph has a Hamiltonian circuit, which is described
as follows:

Hamiltonian circuit = ABCDEA

Important Points
o If we remove one of the edges of a Hamiltonian circuit, it is converted into a
Hamiltonian path.
o If a graph has a Hamiltonian circuit, then it also has a Hamiltonian path. But the
converse is not always true.
o A graph can contain more than one Hamiltonian path and Hamiltonian circuit.
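Since no simple degree condition characterizes Hamiltonian graphs, a direct way to test small graphs is backtracking search. The following sketch is not from the text, and the 5-vertex graph is a hypothetical example:

```python
# Sketch: backtracking search for a Hamiltonian path in an adjacency-list graph.
def hamiltonian_path(adj):
    n = len(adj)
    def extend(path):
        if len(path) == n:                  # every vertex visited exactly once
            return path
        for nxt in adj[path[-1]]:
            if nxt not in path:             # vertices must not repeat
                found = extend(path + [nxt])
                if found:
                    return found
        return None
    for start in adj:                       # try every starting vertex
        found = extend([start])
        if found:
            return found
    return None

G = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C', 'E'], 'E': ['D']}
print(hamiltonian_path(G))  # ['A', 'B', 'C', 'D', 'E']
```

Adding a final check that the last vertex is adjacent to the first would turn this into a Hamiltonian circuit search; the running time is exponential in general.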

Examples of Hamiltonian Graph:

There are a lot of examples of the Hamiltonian graphs, which are described as follows:
Example 1: In the following graph, we have 6 nodes. Now we have to determine whether this graph is
a Hamiltonian graph.

Solution: This graph does not contain a Hamiltonian path because when we start from A, then we can
go to vertices B, C, D, and E, but in this path, vertex F is not covered. This graph does not contain the
Hamiltonian circuit also because when we start from A, we can go to vertices B, C, D, E, and A. This
can make a circuit, but vertex F is also not covered in this path. So we can say that this graph does
not contain a Hamiltonian path and Hamiltonian circuit. Hence, this graph is not a Hamiltonian Graph.

Multi-Graph: A graph is known as a multi-graph if the same pair of vertices is joined by
multiple (parallel) edges. Such a graph may also contain one or more loops. The diagram
of a multi-graph is described as follows:

In the above graph, the vertex pairs among a, b, and c are joined by more than one edge, and the
graph does not contain a loop. So this graph is a multi-graph.

Planar Graph: A graph is known as a planar graph if it can be drawn in a single plane such
that no two of its edges cross each other. In such a graph, all the nodes and edges can be drawn
in a plane. The diagram of a planar graph is described as follows:

In the above graph, no two edges cross each other, and the graph is drawn in a single
plane. So this graph is a planar graph.
Graph Coloring
The graph coloring problem is to assign colors to certain elements of a graph subject to certain
constraints.
Vertex coloring is the most common graph coloring problem. The problem is: given m colors, find a
way of coloring the vertices of a graph such that no two adjacent vertices are colored using the same
color. The other graph coloring problems, like Edge Coloring (no vertex is incident to two edges of
the same color) and Face Coloring (geographical map coloring), can be transformed into vertex
coloring.
Chromatic Number: The smallest number of colors needed to color a graph G is called its
chromatic number, denoted X(G). For example, the following graph can be colored with a
minimum of 2 colors (say, red and green), so its chromatic number is 2, i.e., X(G) = 2.

Some families of graphs with well-known chromatic numbers are:
1) Cycle graphs
2) Planar graphs
3) Complete graphs
4) Bipartite graphs
5) Trees
Finding the chromatic number of a given graph is an NP-complete problem. The graph coloring
problem is both a decision problem and an optimization problem. The decision problem is stated
as, "Given M colors and a graph G, is such a coloring scheme possible?"
The optimization problem is stated as, "Given a graph G, find the minimum number of colors
required to color it." The graph coloring problem is a very interesting problem of graph
theory and it has many diverse applications.
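Because computing the chromatic number exactly is NP-complete, a common practical compromise is the greedy heuristic sketched below; it produces a valid coloring and an upper bound on X(G), not necessarily the minimum (the example graph is hypothetical):

```python
# Sketch: greedy vertex coloring — each vertex gets the smallest color
# index not already used by its previously colored neighbors.
def greedy_coloring(adj):
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Triangle a-b-c plus a pendant vertex d: chromatic number is 3.
G = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c']}
coloring = greedy_coloring(G)
print(max(coloring.values()) + 1)  # 3 colors used
```

The number of colors the greedy method uses can depend on the vertex order; here it happens to match the true chromatic number.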
Applications of Graph Coloring:

1) Making Schedule or Time Table


2) Mobile Radio Frequency Assignment
3) Sudoku
4) Register Allocation
5) Bipartite Graphs: We can check whether a graph is bipartite by coloring the graph using two
colors. If a given graph is 2-colorable, then it is bipartite; otherwise, it is not.
6) Map Coloring: Geographical maps of countries or states where no two adjacent regions can be
assigned the same color. Four colors are sufficient to color any planar map.

Coverings
A covering of a graph G is a sub-graph of G which contains either all the vertices or all the
edges of G.

A sub-graph which contains all the vertices is called a line/edge covering. A sub-graph which contains
all the edges is called a vertex covering.

1. Edge Covering

A set of edges which covers all the vertices of a graph G, is called a line cover or edge cover of G.

Edge covering does not exist if and only if G has an isolated vertex.

An edge covering of a graph G with n vertices has at least ⌈n/2⌉ edges.

Example

In the above graph, the red edges represent the edges in the edge cover of the graph.

Minimal Line covering

A line covering M of a graph G is said to be a minimal line covering if no edge can be deleted
from M while M still covers all the vertices.

In other words, a minimal edge cover is an edge cover of graph G of which no proper subset is
also an edge cover.

No minimal line covering contains a cycle.

Example

From the above graph, the sub-graphs forming edge coverings are:

M1 = {{a, b}, {c, d}}


M2 = {{a, d}, {b, c}}
M3 = {{a, b}, {b, c}, {b, d}}
M4 = {{a, b}, {b, c}, {c, d}}

Here, M1, M2, M3 are minimal line coverings, but M4 is not because we can delete {b, c}.

Minimum Line Covering

A minimal line covering with minimum number of edges is called a minimum line covering of graph G.
It is also called smallest minimal line covering.

Every minimum edge cover is a minimal edge cover, but the converse is not necessarily true.

The number of edges in a minimum line covering in G is called the line covering number of G and it is
denoted by α1.

Example

From the above graph, the sub-graphs forming edge coverings are:

M1 = {{a, b}, {c, d}}


M2 = {{a, d}, {b, c}}
M3 = {{a, b}, {b, c}, {b, d}}
M4 = {{a, b}, {b, c}, {c, d}}

In the above example, M1 and M2 are the minimum edge covering of G and α1 = 2.

2. Vertex Covering

A set of vertices which covers (touches) all the edges of a graph G is called a vertex cover for G.

Example

In the above example, each red marked vertex is the vertex cover of graph. Here, the set of all red
vertices in each graph touches every edge in the graph.

Minimal Vertex Covering

A vertex covering M of graph G is said to be a minimal vertex covering if no vertex can be deleted
from M while M still covers all the edges.

Example
The sub-graphs that can be derived from the above graph are:

M1 = {b, c}
M2 = {a, b, c}
M3 = {b, c, d}

Here, M1 and M2 are minimal vertex coverings, but M3 is not, because vertex 'd' can be deleted from it.

Minimum Vertex Covering

A minimal vertex covering with the minimum number of vertices is called a minimum vertex covering
of graph G. It is also called the smallest minimal vertex covering.

The number of vertices in a minimum vertex covering in a graph G is called the vertex covering
number of G and it is denoted by α2.

Example 1

In the above graphs, the vertices in the minimum vertex covered are red.

α2 = 3 for first graph.

And α2 = 4 for the second graph.

Example 2

The sub-graphs that can be derived from the above graph are:

M1 = {b, c}
M2 = {a, b, c}
M3 = {b, c, d}

Here, M1 is a minimum vertex cover of G, as it has only two vertices. Therefore, α2 = 2
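For small graphs, a minimum vertex cover can be found by brute force, trying vertex subsets in increasing size until one touches every edge. This sketch uses a hypothetical 4-vertex path graph, since the figures are not reproduced here:

```python
# Sketch: exact minimum vertex cover by brute force (exponential in general).
from itertools import combinations

def minimum_vertex_cover(vertices, edges):
    for k in range(len(vertices) + 1):          # try smaller covers first
        for subset in combinations(vertices, k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return s                         # first hit is minimum-size
    return set(vertices)

V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c'), ('c', 'd')]         # a path on 4 vertices
cover = minimum_vertex_cover(V, E)
print(len(cover))  # 2
```

The cover found has two vertices, consistent with α2 = 2 for a graph of this size.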

Algorithms for Spanning Trees:-


A spanning tree of a connected undirected graph G is a tree that minimally includes all of the vertices
of G. A graph may have many spanning trees.
Example

Minimum Spanning Tree

A spanning tree of a weighted, connected, undirected graph G whose weight is less than or equal
to the weight of every other possible spanning tree of G is called a minimum spanning tree (MST).
The weight of a spanning tree is the sum of the weights assigned to the edges of the spanning tree.
Example

Kruskal's Algorithm
Kruskal's algorithm is a greedy algorithm that finds a minimum spanning tree for a connected
weighted graph. It finds a tree of that graph which includes every vertex and the total weight of all the
edges in the tree is less than or equal to every possible spanning tree.
Algorithm
Step 1 − Arrange all the edges of the given graph G(V, E) in ascending order of their edge weight.
Step 2 − Choose the smallest weighted remaining edge of the graph and check whether it forms a
cycle with the spanning tree formed so far.
Step 3 − If there is no cycle, include this edge in the spanning tree; else discard it.
Step 4 − Repeat Steps 2 and 3 until the spanning tree contains (V−1) edges.
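The four steps above can be sketched in Python using a union-find structure for the cycle check; the edge list below is a hypothetical example:

```python
# Sketch of Kruskal's algorithm with union-find for cycle detection.
def kruskal(n, edges):
    """edges: list of (weight, u, v); vertices are 0 .. n-1."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    mst, total = [], 0
    for w, u, v in sorted(edges):           # Step 1: ascending weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # Steps 2-3: no cycle -> keep edge
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
            if len(mst) == n - 1:           # Step 4: spanning tree complete
                break
    return mst, total

edges = [(1, 0, 1), (3, 1, 2), (2, 0, 2), (4, 2, 3)]
tree, weight = kruskal(4, edges)
print(weight)  # 7
```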
Prim's Algorithm
Prim's algorithm, developed in 1930 by the mathematician Vojtěch Jarník and rediscovered in 1957
by Robert C. Prim, is a greedy algorithm that finds a minimum spanning tree for a connected
weighted graph. It finds a tree of that graph which includes every vertex, such that the total
weight of all the edges in the tree is less than or equal to that of every possible spanning tree.
Prim’s algorithm is faster on dense graphs.
Algorithm
 Initialize the minimum spanning tree with a single vertex, chosen arbitrarily from the graph.
 Select an edge that connects the tree with a vertex not yet in the tree, such that the weight of
the edge is minimal and including the edge does not form a cycle.
 Add the selected edge and the vertex that it connects to the tree.
 Repeat the previous two steps until all the vertices are included in the tree.
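These steps can be sketched with a min-heap of candidate edges that leave the tree; the weighted adjacency list below is a hypothetical example:

```python
# Sketch of Prim's algorithm using a min-heap of (weight, neighbor) edges.
import heapq

def prim(adj, start):
    """adj: {vertex: [(weight, neighbor), ...]} for an undirected graph."""
    in_tree = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    total = 0
    while heap and len(in_tree) < len(adj):
        w, v = heapq.heappop(heap)          # minimal edge leaving the tree
        if v in in_tree:
            continue                        # would form a cycle -> discard
        in_tree.add(v)
        total += w
        for edge in adj[v]:
            heapq.heappush(heap, edge)
    return total

adj = {
    0: [(1, 1), (2, 2)],
    1: [(1, 0), (3, 2)],
    2: [(2, 0), (3, 1), (4, 3)],
    3: [(4, 2)],
}
print(prim(adj, 0))  # 7
```

On this example graph, the tree weight agrees with what Kruskal's algorithm would produce, as it must for any graph with a unique MST.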
