Advanced Bank Management
This book is produced in DAISY text for people with
bona fide print disabilities by the National
Association for the Blind, Delhi state branch (NAB).
This book is created for special distribution for the
print disabled in accordance with Section 52 (1) (zb)
of the "Copyright Act of India 1957 as amended in
• To develop professionally qualified and competent bankers and finance professionals primarily through
a process of education, training, examination, consultancy/counselling and continuing professional
development programs.
• To be premier Institute for developing and nurturing competent professionals in banking and finance
• To facilitate study of theory and practice of banking and finance.
• To test and certify attainment of competence in the profession of banking and finance.
• To collect, analyse and provide information needed by professionals in banking and finance.
• To promote continuous professional development.
• To promote and undertake research relating to Operations, Products, Instruments, Processes, etc., in
banking and finance and to encourage innovation and creativity among finance professionals so that
they could face competition and succeed.
(This book has been published by Indian Institute of Banking & Finance. Permission of
the Institute is essential for reproduction of any portion of this book. The views expressed
herein are not necessarily the views of the Institute.)
All rights reserved. No part of this publication may be reproduced or transmitted, in any
form or by any means, without permission. Any person who does any unauthorised act
in relation to this publication may be liable to criminal prosecution and civil claims for
ISBN: 978-93-5666-027-4
“This book is meant for educational and learning purposes. The author(s) of the book has/have taken all reasonable
care to ensure that the contents of the book do not violate any copyright or other intellectual property rights of any person
in any manner whatsoever. In the event the author(s) has/have been unable to track any source and if any copyright has
been inadvertently infringed, please notify the publisher in writing for any corrective action.”
Formal education will make you a living; self-education will make you a fortune.
-Jim Rohn
The banking sector, currently, is experiencing a transformation catalysed by digitalization and information
explosion with the customer as the focal point. Besides, competition from NBFCs, FinTechs, changing
business models, growing importance of risk and compliance, along with disruptive technologies, have
contributed to this radical shift. Such an ever-evolving ecosystem requires strategic agility and constant
upgradation of skill levels on the part of the Banking & Finance professionals to chart a clear pathway
for their professional development.
The mission of the Indian Institute of Banking & Finance is to develop professionally qualified
and competent bankers and finance executives primarily through a process of education, training,
examination, counseling and continuing professional development programs. In line with the Mission,
the Institute has been offering a bouquet of courses and certifications for capacity building of the
banking personnel.
The flagship courses/examinations offered by the Institute are the JAIIB, CAIIB and the Diploma in
Banking & Finance (DB&F) which have gained wide recognition among banks and financial institutions.
With banking witnessing tectonic shifts, there was an imperative need to revisit the existing syllabi for
the flagship courses.
The pivotal point for revising the syllabi was to ensure that, in addition to acquiring basic knowledge,
the candidates develop concept-based skills in line with the developments happening in the financial
ecosystem and to ensure greater value addition to the flagship courses and to make them more practical
and contemporary. This will culminate in creating a rich pool of knowledgeable and competent
banking & finance professionals who are capable of contributing to the sustainable growth of their
Keeping in view the above objectives, the Institute had constituted a high-level Syllabi Revision
Committee comprising members from public sector banks, private sector banks, co-operative banks
and academicians. On the basis of the feedback received from various banks and changes suggested by
the Committee, the syllabi of JAIIB & CAIIB have since been finalized.
The revised CAIIB syllabi will now have four compulsory subjects and one elective subject to be chosen
from the five elective subjects. The subjects under the revised CAIIB Syllabi are:
Compulsory subjects:
1. Advanced Bank Management
2. Bank Financial Management
3. Advanced Business & Financial Management
4. Banking Regulations and Business Laws
Elective subjects:
1. Risk Management
2. Information Technology & Digital Banking
3. Central Banking
4. Human Resources Management
5. Rural Banking
A new module on Compliance has been introduced in Advanced Bank Management with Compliance,
Corporate Governance and Audit becoming the focal point for a resilient banking system. New units
covering Risks in Foreign Trade, GIFT City, etc., have been added in Bank Financial Management.
The new subject on Advanced Business & Financial Management will cover the management principles,
the advanced concepts of Financial Management and emerging business solutions including Green
Finance and Sustainable Financing.
The subject Banking Regulations & Business Laws (BRBL) is designed to familiarise the professionals
with various laws concerning banking and finance with increased focus on case laws, court judgements
covering different areas of banking and finance.
The elective subjects on Risk Management, Information Technology & Digital Banking and Rural Banking
have also been thoroughly revised and will include new units to make the courses more contemporary.
Insofar as the electives on Central Banking and Human Resources Management are concerned, new
modules on NBFCs and Emerging Scenarios in HRM have been introduced respectively.
As is the practice followed by the Institute, a dedicated courseware for every paper/subject is published.
The present courseware on Advanced Bank Management has now been authored in line with the revised
syllabus for the subject. The book follows the same modular approach adopted by the Institute in the
earlier editions/publications.
While the Institute is committed to revising and updating the courseware from time to time, the book should,
however, not be considered as the only source of information/reading material while preparing for
the examinations, due to the rapid changes being witnessed in all the areas concerning banking & finance.
The students have to keep themselves abreast of current developments by referring to economic
newspapers/journals, articles, books and Government/Regulators’ publications/websites, etc. Questions
will be based on the recent developments related to the syllabus.
Considering that the courseware cannot be published frequently, the Institute will continue the practice
of keeping candidates informed about the latest developments by placing important updates/Master
Circulars/ Master Directions on its website and through publications like IIBF Vision, Bank Quest, etc.
The courseware has been updated with the help of Subject Matter Experts (SMEs) drawn from respective
fields and vetted by practitioners to ensure accuracy and correctness. The Institute acknowledges with
gratitude the valuable contributions rendered by the SMEs in updating/vetting the courseware.
We welcome suggestions for improvement of the courseware.
Biswa Ketan Das
Chief Executive Officer
Mumbai, 2023
The Institute has prepared comprehensive courseware in the form of study kits to facilitate
preparation for the examination without intervention of the teacher. An attempt has been made
to cover fully the syllabus prescribed for each module/subject and the presentation of topics may
not always be in the same sequence as given in the syllabus.
Candidates are also expected to take note of all the latest developments relating to the subject
covered in the syllabus by referring to Financial Papers, Economic Journals, Latest Books and
Publications in the subjects concerned.
Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation
Importance of Statistics; Functions of Statistics; Limitation or Demerits of Statistics; Definitions;
Collection of Data; Classification and Tabulation; Frequency Distribution
Sampling Techniques
Random Sampling; Sampling Distributions; Sampling from Normal Populations; Sampling from Non
Normal Populations; Central Limit Theorem; Finite Population Multiplier
Measures of Central Tendency & Dispersion, Skewness, Kurtosis
Arithmetic Mean; Combined Arithmetic Mean; Geometric Mean; Harmonic Mean; Median and Quartiles;
Mode; Introduction to Measures of Dispersion; Range and Coefficient of Range; Quartile Deviation
and Coefficient of Quartile Deviation; Standard Deviation and Coefficient of Variation; Skewness and
Correlation and Regression
Scatter Diagrams; Correlation; Regression; Standard Error of Estimate
Time Series
Variations in Time Series; Trend Analysis; Cyclical Variation; Seasonal Variation; Irregular Variation;
Forecasting Techniques
Theory of Probability
Mathematical Definition of Probability; Conditional Probability; Random Variable; Probability Distribution of
Random Variable; Expectation and Standard Deviation of Random Variable; Binomial Distribution; Poisson
Distribution; Normal Distribution; Credit Risk; Value at Risk (VaR); Option Valuation
Estimation
Estimates; Estimator and Estimates; Point Estimates; Interval Estimates; Interval Estimates and
Confidence Intervals; Interval Estimates of the Mean from Large Samples; Interval Estimates of the
Proportion from Large Samples
Linear Programming
Graphic Approach; Simplex Method
Simulation
Simulation Exercise; Simulation Methodology
Term Loans
Important Points about Term Loans; Deferred Payment Guarantees (DPGs); Difference between Term
Loan Appraisal and Project Appraisal; Project Appraisal; Appraisal and Financing of Infrastructure
Credit Delivery and Straight Through Processing
Documentation; Third-Party Guarantees; Charge over Securities; Possession of Security; Disbursal of
Loans; Lending under Consortium/Multiple Banking Arrangements; Syndication of Loans; Straight-
Through Loan Processing or Credit Underwriting Engines
Credit Control and Monitoring
Importance and Purpose; Available Tools for Credit Monitoring/Loan Review Mechanism (LRM)
Risk Management and Credit Rating
Meaning of Credit Risk; Factors Affecting Credit Risk; Steps taken to Mitigate Credit Risks; Credit
Ratings; Internal and External Ratings; Methodology of Credit Rating; Use of Credit Derivatives for
Risk Management; RBI guidelines on Credit Risk Management; Credit Information System
Restructuring/Rehabilitation and Recovery
Credit Default/Stressed Assets/NPAs; Wilful Defaulters; Non-cooperative borrowers; Options Available
to Banks for Stressed Assets; RBI Guidelines on Restructuring of Advances by Banks; Available
Frameworks for Restructuring of Assets; Sale of Financial Assets
Resolution of Stressed Assets under Insolvency and Bankruptcy Code 2016
Definition of Insolvency and Bankruptcy; To Whom the Code is Applicable; Legal Elements of the Code;
Paradigm Shift; Corporate Insolvency Resolution Process; Liquidation process; Pre-packed Insolvency
Resolution Process for stressed MSMEs
Foreword
1. Definition of Statistics, Importance & Limitations &
Data Collection, Classification & Tabulation
2. Sampling Techniques
3. Measures of Central Tendency & Dispersion, Skewness, Kurtosis
4. Correlation and Regression
5. Time Series
6. Theory of Probability
7. Estimation
8. Linear Programming
9. Simulation
After studying this unit, you will be able to:
• Understand how to use proper methods to collect data, employ the correct analyses,
and effectively present the results.
• Familiarise yourself with Data Management techniques by using Classification, Tabulation and
• Statistics is a crucial process behind how we make discoveries in science, make decisions based
on data, and make predictions, so collecting correct data, presenting it properly and making it ready
for analysis and interpretation is very important. From this chapter, you will understand how to
get data ready to be analysed and interpreted.
The word ‘Statistics’ has been derived from the Latin word ‘statisticum’, the Italian word ‘statista’ and the German
word ‘statistik’, each of which means a group of numbers or figures that represent some information of
human interest. It was first used by Professor Achenwall in 1749 to refer to the subject-matter as a whole.
Achenwall defined statistics as the political science of many countries.
In the early years, statistics was used only by kings to collect facts about the state, its revenue
or its people for administrative or political purposes.
Gradually the use of statistics, meaning data or information, has increased and widened. It is now used
in almost all fields of human knowledge and skill, such as Business, Commerce, Economics, Social
Sciences, Politics, Planning, Medicine and other sciences, physical as well as natural.
In many practical situations in life, we come across different types of data which need to be
understood, analysed, compared and interpreted correctly. For example, in a college we need to analyse
the data of marks obtained; in a hospital we need to analyse the data of the number of patients having different
diseases, the rate of mortality, and so on. Different types of data need to be analysed in Economics, Government and
Private organisations, Sports and in many other fields. Data means information, which can be of two
types: Qualitative and Quantitative. Statistics means quantitative or numerical data, which can be used
for further calculations.
Statistical analysis of data comprises four distinct phases:
1. Collection of data: In this first stage of investigation, numerical data is collected from different
published or unpublished sources, primary or secondary.
2. Classification and Tabulation of data: The raw data collected is to be represented properly for
further calculations. The raw data is divided into different groups or classes and represented in a
form of a table.
3. Analysis of data: Classified and Tabulated data is analysed using different formulas and methods
according to purpose of the study or investigation.
4. Interpretation of data: At the final stage, relevant conclusions are drawn after the data is thoroughly
1.2.2 Medical
Statistics has extensive application in clinical research and the medical field. Clinical research
involves investigating proposed medical treatments, assessing the relative benefits of competing therapies,
and establishing optimal treatment combinations.
1.2.5 Bank
In the banking industry, credit policies are decided based on statistical analysis of profitability, demand
deposits, time deposits, credit ratio, number of customers and many other ratios. The credit policies are
based on the application of probability theory.
1.2.6 Sports
Players use statistics to identify or rectify their mistakes.
A proper understanding of the statistics determines the success of a team or a single athlete.
Population: It is the entire collection of observations (persons, animals, plants or things actually studied by a
researcher) from which we may collect data. It is the entire group we are interested in and about which
we need to draw conclusions.
1. If we are studying the weight of adult men in India, the population is the set of weights of all men in
2. If we are studying the grade point average of students of Mumbai University, the population is the
set of GPAs of all students of Mumbai University.
Sample: Sometimes the population from which we need to draw conclusion is too large to study. At
times collecting data from too large a population becomes time-consuming and expensive. To save time
and money, generally a part of population is selected for study. A sample is a part (a group of units) of
population which is representative of the actual population. By studying the sample, it is expected that
valid conclusions are drawn about the whole group.
Example: The population for a study of infant health might be all children born in India in one particular
year. The sample might be all babies born on one particular day in that year.
Data can be classified into two types, based on their characteristics. They are:
1. Variates
2. Attributes
A characteristic that varies from one individual to another and can be expressed in numerical terms is
called a variate.
Example: Prices of a given commodity, wages of workers, heights and weights of students in a class,
marks of students, etc.
A characteristic that varies from one individual to another but can’t be expressed in numerical terms is
called an attribute.
Example: Colour of the ball (black, blue, green, etc.), religion of human, etc.
Quantitative or Numerical variables can be further classified as discrete and continuous. A variate that
takes distinct values, that is, only a countable and usually finite number of
values, is called a Discrete Variable.
Example: Number of members in a family, Number of accidents, Age in years.
A variate that can take any value within a range (integral/fractional) is called a Continuous Variable.
Example: Percentage of marks, Height, Weight.
A parameter is a numerical value or function of the observations of the entire population being studied.
A parameter is usually an unknown value that is fixed.
Example: Population mean, population median, population standard deviation, etc.
Since a parameter is unknown, it has to be estimated from a sample. A statistic is used to
estimate a parameter. A statistic is a quantity or function of the observations in a sample of data. It
is used to give information about unknown values in the corresponding population. For example, the
sample mean is used to estimate the parameter population mean. A statistic used in this way is also called an estimator.
Types of Classification
1. If we classify observed data for a single characteristic, it is known as One-way Classification. Ex:
Population can be classified by Religion - Hindu, Muslim, Christians, etc.
2. If we consider two characteristics at a time to classify the observed data, it is known as a Two-way
classification. Ex: Population can be classified according to Religion and sex.
3. If we consider more than two characteristics at a time in order to classify the observed data, it is
known as Multi-way Classification. Ex: Population can be classified by Religion, sex and literacy.
2. Continuous Frequency Distribution: Variable takes values which are expressed in class intervals
within certain limits.
Problem 2: Marks obtained by 20 students in an exam for 50 marks are given below-convert the
data into continuous frequency distribution form.
18, 23, 28, 29, 44, 28, 48, 33, 32, 43, 24, 29, 32, 39, 49, 42, 27, 33, 28, 29.
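One possible way to build the continuous frequency distribution for Problem 2 is sketched below in Python. The class width of 10 and the exclusive-type classes 10-20, 20-30, 30-40 and 40-50 (upper limit excluded) are assumptions made for illustration; the text does not show which classes its own solution uses.

```python
from collections import Counter

# Marks of the 20 students from Problem 2
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]

CLASS_WIDTH = 10   # assumed class width
LOWER_BOUND = 10   # assumed lower limit of the first class

def class_interval(value, lower=LOWER_BOUND, width=CLASS_WIDTH):
    """Return (lower, upper) limits of the exclusive class that contains value."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)

# Count how many marks fall in each class
frequency = Counter(class_interval(m) for m in marks)

for (low, high), count in sorted(frequency.items()):
    print(f"{low}-{high}: {count}")
print("Total:", sum(frequency.values()))
```

With these assumed classes the frequencies come to 1, 9, 5 and 5, which add up to the 20 students. A different class width would give a different but equally valid table.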
Problem 3: Following data reveals information about the number of children per family for 25
families. Prepare frequency distribution of number of children (say variable x, taking distinct values
0, 1, 2, 3, 4).
3 2 1 1 2
4 0 1 2 3
1 2 0 4 2
2 1 2 3 2
1 3 4 0 1
Solution: Frequency distribution of number of children in 25 families
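The tally for Problem 3 can be reproduced with a short Python sketch. It assumes the data grid is read as five rows of five single-digit values, since the variable only takes the values 0 to 4; that reading is an assumption about the printed layout.

```python
from collections import Counter

# Number of children in each of the 25 families (assumed reading of the grid)
children = [
    3, 2, 1, 1, 2,
    4, 0, 1, 2, 3,
    1, 2, 0, 4, 2,
    2, 1, 2, 3, 2,
    1, 3, 4, 0, 1,
]

frequency = Counter(children)

print("x  f")
for x in range(5):               # x takes the distinct values 0, 1, 2, 3, 4
    print(f"{x}  {frequency[x]}")
print("Total", sum(frequency.values()))
```

Under this reading the frequencies for x = 0, 1, 2, 3, 4 are 3, 7, 8, 4 and 3, which total 25 families.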
Problem 4: For the following frequency distribution, prepare cumulative frequency distribution of less
than and greater than type
1 2 3 4 5
6 4 44 + 4 = 48 4
Total 48
Problem 5: Following is the marks of 50 students. Prepare cumulative frequency distribution of both
the types. Also find Relative Frequencies.
Problem 6: For the following frequency distribution, obtain cumulative frequencies, relative frequencies
and relative cumulative frequencies.
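The quantities asked for in Problems 4 to 6 can be computed as in the following Python sketch. The classes and frequencies used here are illustrative values only, not the data of those problems, whose tables are not reproduced in this text.

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50"]   # illustrative classes
frequencies = [1, 9, 5, 5]                       # illustrative frequencies
total = sum(frequencies)

# "Less than" type: add frequencies from the lowest class upwards
less_than = list(accumulate(frequencies))

# "Greater than" type: add frequencies from the highest class downwards
greater_than = list(accumulate(reversed(frequencies)))[::-1]

# Relative frequency = class frequency / total frequency
relative = [f / total for f in frequencies]

# Relative cumulative frequency = "less than" cumulative frequency / total
relative_cumulative = [c / total for c in less_than]

for row in zip(classes, frequencies, less_than, greater_than,
               relative, relative_cumulative):
    print(row)
```

The last "less than" value and the first "greater than" value both equal the total frequency, which is a quick check on the arithmetic.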
Skill   Men                         Women                       Total
Year    Skilled  Unskilled  Total   Skilled  Unskilled  Total   Skilled  Unskilled  Total
2020    2250     450        2700    50       250        300     2300     700        3000
2021    2500     460        2960    250      300        550     2750     760        3510
Problem 9: Out of a total of 2000 candidates interviewed for employment in a company, 628
were from Pune and the rest from Nashik. Amongst the graduates from Pune, 350 were experienced and
80 were inexperienced, while the corresponding figures for undergraduates from Nashik were 615 and
52 respectively. The total numbers of inexperienced candidates from Pune and Nashik were 175 and 192
respectively.
Present the above information in a suitable tabular form.
Solution: Distribution of candidates from Pune and Nashik according to Education and Experience
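The missing cells of the table for Problem 9 follow from the stated figures by subtraction. The Python sketch below shows that arithmetic; it assumes that every candidate is either a graduate or an undergraduate and either experienced or inexperienced, and the variable names are illustrative.

```python
# Figures stated in Problem 9
total_candidates = 2000
pune_total = 628
pune_grad = {"experienced": 350, "inexperienced": 80}
nashik_undergrad = {"experienced": 615, "inexperienced": 52}
inexperienced_total = {"Pune": 175, "Nashik": 192}

# Assumption: the categories are exhaustive, so missing cells follow by subtraction.
nashik_total = total_candidates - pune_total

pune_undergrad_total = pune_total - sum(pune_grad.values())
pune_undergrad = {
    "inexperienced": inexperienced_total["Pune"] - pune_grad["inexperienced"],
}
pune_undergrad["experienced"] = pune_undergrad_total - pune_undergrad["inexperienced"]

nashik_grad_total = nashik_total - sum(nashik_undergrad.values())
nashik_grad = {
    "inexperienced": inexperienced_total["Nashik"] - nashik_undergrad["inexperienced"],
}
nashik_grad["experienced"] = nashik_grad_total - nashik_grad["inexperienced"]

table = {
    ("Pune", "Graduate"): pune_grad,
    ("Pune", "Undergraduate"): pune_undergrad,
    ("Nashik", "Graduate"): nashik_grad,
    ("Nashik", "Undergraduate"): nashik_undergrad,
}

print(f"{'City':<8}{'Education':<15}{'Experienced':>12}{'Inexperienced':>15}{'Total':>7}")
for (city, education), cell in table.items():
    exp, inexp = cell["experienced"], cell["inexperienced"]
    print(f"{city:<8}{education:<15}{exp:>12}{inexp:>15}{exp + inexp:>7}")
```

As a check, the eight cells add up to the 2000 candidates interviewed.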
1. The salaries of some workers are given as 2300, 2400, 2500, 2100, 2000, 2000, 2300, 2800,
3000, 2300, 2700, 2400, 2500. If the desired number of class intervals is 10, the class width is
(a) 100
(b) 200
(c) 300
(d) 400
2. The largest and smallest values of a data set are 60 and 40 respectively. If the desired number of class
intervals is 5, the class width is
(a) 25
(b) 20
(c) 4
(d) 5
3. The class intervals where upper and lower limits are also in the class interval are called
(a) Exclusive type
(b) Inclusive type
(c) Discrete type
(d) Continuous type
4. The type of cumulative frequencies where the frequencies are added starting from the highest class
to the lowest class are called ______
(a) Relative Frequency
(b) Percentage Frequency
(c) Less than Cumulative Frequency
(d) Greater than Cumulative Frequency
5. The data classification, which is based on variables, like, demand, supply, height and weight is
considered as
(a) Qualitative data
(b) Quantitative data
(c) Time series data
(d) Discrete data
6. The data which is classified or arranged by their time of occurrence, such as years, months, weeks,
days, etc., is called
(a) Time series data
(b) Geographical data
(c) Historical data
(d) Both (a) and (c)
1. (a); 2. (c); 3. (b); 4. (d); 5. (b); 6. (d)
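The two class-width questions above can be checked with the usual rule, class width = (largest value - smallest value) / number of classes. The Python sketch below applies that rule; the function name is illustrative, and in practice the result is often rounded up to a convenient figure.

```python
def class_width(largest, smallest, number_of_classes):
    """Class width as range divided by the desired number of class intervals."""
    return (largest - smallest) / number_of_classes

# Question 1: salaries with 10 desired class intervals
salaries = [2300, 2400, 2500, 2100, 2000, 2000, 2300,
            2800, 3000, 2300, 2700, 2400, 2500]
width_q1 = class_width(max(salaries), min(salaries), 10)

# Question 2: largest value 60, smallest value 40, 5 desired class intervals
width_q2 = class_width(60, 40, 5)

print(width_q1)   # 100.0
print(width_q2)   # 4.0
```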
Sampling Techniques
2.0. Objectives
2.1. Introduction
2.2. Random Sampling
2.3. Sampling Distributions
2.4. Sampling from Normal Populations
2.5. Sampling from Non-Normal Populations
2.6. Central Limit Theorem
2.7. Finite Population Multiplier
Keywords/Glossary
Annexure - Probability Table
The objectives of this unit are as follows:
• Learn to take a sample from an entire population and use it to describe the population
• Make sure the samples represent the population.
• Introduce the concepts of sampling distributions
• Understand the tradeoff between costs of larger samples and accuracy
• Introduce experimental design - sampling procedures.
• Estimation - data analysis and interpretation of sample data
• Testing of hypotheses - one-sample data
• Testing of hypotheses - two-sample data
As you know, statistics is a tool used in business and finance. Statistics is an appreciated and maligned
tool, depending on how it is used. We need statistical methods to reduce risk and uncertainty and improve
our decision-making skills. Decision making in a situation involves collecting data and then using it for
the future strategy. Usually, we cannot use the entire data because of its sheer size. Therefore,
we take a sample and test it. For example, if a milk plant processes 1 lakh litres of milk every day, one
cannot break open each packet and test the milk for quality. Here we take samples from each batch. If
we want to do a market survey, we cannot interview each and every household. We take a representative
sample. Thus, sampling becomes an integral tool of the quantitative methods we use. We take a sample,
collect data from the sample, and attempt to generalise the results to the whole population.
Tea tasters at tea auctions are very highly paid employees of tea companies. They sample a small portion
of the tea produced from the plantation before the auction. Food products are often tasted before being
sold. Before buying Diwali sweets, you may take a small bite. Obviously, everything
cannot be opened and tasted or tested as there would be nothing left to sell. We have to select a sample
and test that only. If you want to write a report on why many people are migrating from India to Canada
or Australia, contacting each Indian who migrated would be time consuming and expensive. So you
would choose a sample and make a report accordingly. Thus, time and size are decisive factors which
make it necessary to take business decisions on a sample. If your sample is properly chosen, your report
would truly reflect the reasons of the entire population for migration. Of course, the best results would
be available in any situation if we collected data from the entire population. Such complete enumeration
or census is used in the population census carried out in our country every ten years.
Statisticians use the word population to refer not only to people but to all items that are to be studied.
A sample is a part or subset of the population selected to represent the entire group.
We can describe samples and populations using measures such as the mean, median, mode, and standard deviation.
When these terms describe a sample, they are called statistics; they are not taken from the population but estimated
from the sample. When these terms describe a population, they are called parameters.
A statistic is a characteristic of a sample; a parameter is a characteristic of a population.
Conventionally, statisticians use lower case Roman letters to denote sample statistics and Greek or Capital
letters to denote population parameters.
Table 2.1 lists these symbols.
                  Population                Sample
Definition        Collection of all items   Part of the population
Characteristics   Parameters                Statistics
Symbols           Size - N                  Size - n
                  Mean - μ                  Mean - x̄
                  Standard Deviation - σ    Standard Deviation - s
Types of sampling
The process of selecting respondents is known as "sampling". The units under study are called
sampling units, and the number of units in a sample is called a sample size. There are two
methods of selecting samples from populations: non-random or judgement sampling and random or
probability sampling. In probability sampling, all the items in the population have a chance of being
chosen in the sample. In judgement sampling, personal knowledge or opinions are used to identify
the items from the population that are to be included in the sample. A sample selected by judgement
sampling is based on someone’s experience with the population. An oil drilling company would ask
an experienced geologist to test different terrains or land beneath the sea before deciding where to
explore for oil. Sometimes a judgement sample is used as a pilot or trial sample to decide how to
take a random sample later. If we want to launch a new city newspaper, a pilot test of the paper can
be run with a judgement sample to see the response. However, the rigorous statistical analysis which
can be done with random probability samples, cannot be done with judgement samples. On the other
hand, they are more convenient and can be used successfully even if we cannot test their validity.
But, if a study uses judgement sampling and loses a significant degree of representativeness, it will
have purchased convenience at a high price.
Biased samples
Suppose the Parliament is debating on the women’s bill. You are asked to conduct an opinion survey.
Because women are the most affected by the women’s bill, you interviewed many women in different
cities, towns and rural areas of India. Then you report that an overwhelming 95 per cent are in favour of
reservation for women in Parliament.
Sometime later, the government has to take up the issue of Foreign Direct Investment (FDI) in print media.
Since newspaper publishers are the most affected, you contact all of them, both national and regional,
in India and report that the majority is not in favour of FDI in print media.
In both these cases, you picked a biased sample by choosing people who would have strong feelings on
this issue. You have to have sound samples.
A report based on the data collected from such a biased sample would not truly reflect public opinion.
If we follow random sampling, it is possible to statistically determine the reliability of the estimates
obtained from the sample to avoid such errors.
The example illustrates a finite population of four teenagers. If we write A, B, C, and D on four identical
slips of paper, fold the papers, and randomly pick any two, we get a sample. While picking up two paper
slips, we may pick up one, keep it away, and pick another from the remaining three. This type is called
sampling without replacement.
There is another way of doing it. Suppose after picking the first slip, we note the name on it and put the
slip back in the lot, i.e., replace the paper slip. Then we draw the second slip. There is a chance that we
may draw the same name again. This is called sampling with replacement.
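The difference between the two schemes can be made concrete with a small Python sketch for the four slips A, B, C and D. Counting unordered pairs for the first scheme and ordered draws for the second is a choice made here for illustration; the seed is arbitrary.

```python
import random
from itertools import combinations, product

population = ["A", "B", "C", "D"]

# Without replacement: the first slip is kept aside, so a name cannot repeat.
samples_without = list(combinations(population, 2))

# With replacement: the slip is put back, so the second draw sees all four names.
draws_with = list(product(population, repeat=2))
repeats = [d for d in draws_with if d[0] == d[1]]

print(len(samples_without))   # 6 possible pairs of different names
print(len(draws_with))        # 16 possible ordered draws
print(len(repeats))           # 4 of them pick the same name twice

# Drawing one sample of each kind at random
rng = random.Random(0)                 # arbitrary seed for repeatability
print(rng.sample(population, 2))       # without replacement
print(rng.choices(population, k=2))    # with replacement, may repeat a name
```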
Theoretically, it is possible to have an infinite population. For example, the population of all prime
numbers is infinite. Although many populations seem exceedingly large, no truly infinite population of
physical objects actually exists. After all, given unlimited resources and time, one can enumerate any
finite population. As a practical matter, we will use the term infinite population when we are talking about
a population that could not be enumerated in a reasonable period of time. Thus, we use a theoretical
concept of infinite population as an approximation of a large finite population.
Let us see how to use this table. We assign each employee a number from 00 to 99, look up the table
above and pick a systematic method of selecting two-digit numbers, like the first two digits. So, we have
15, 09, and so on till we get our 10 numbers.
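The same selection can be sketched in Python, with a pseudo-random number generator standing in for the printed random number table. The seed is an arbitrary assumption, and the numbers drawn will differ from those read off the table.

```python
import random

# Employees numbered 00 to 99, as in the text
employee_ids = [f"{i:02d}" for i in range(100)]

rng = random.Random(2023)               # arbitrary seed so the draw can be repeated
sample = rng.sample(employee_ids, 10)   # 10 distinct employees, each equally likely

print(sample)
```

Because `sample` draws without replacement, no employee number can appear twice, which matches skipping a repeated number when reading the table.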
Systematic Sampling
In systematic sampling, elements are selected from the population at a uniform interval that is measured in
time, order, or space. If we wanted to interview every twentieth student on a college campus, we would
choose a random starting point in the first twenty names in the student directory and then pick every
twentieth name after that.
Systematic sampling differs from simple random sampling in that each element has an equal chance
of being selected, but each sample does not have an equal chance of being selected. This would have
been the case if, in our earlier example, we had assigned numbers between 00 and 99 to our employees
and then begun to choose our sample of 10 by picking every tenth number beginning 1, 11, 21, 31,
and so forth. Employees numbered 2, 3, 4 and 5 would have no chance of being selected together.
In systematic sampling, there is a risk of introducing an error into the sampling process, because the system
chosen may itself cause a problem. Suppose we want to estimate how often people eat out on different days of
the week and our interval happens to select only Fridays. Because Friday is the beginning of the weekend,
more people eat out then, and we get an inflated result.
Systematic sampling has some advantages. Even though systematic sampling may be inappropriate when
the elements lie in a sequential pattern, this method may require less time and sometimes results in lower
costs than the simple random sample method.
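A minimal Python sketch of systematic sampling for the employee example follows. The function name and the seed are illustrative assumptions.

```python
import random

def systematic_sample(population, interval, rng):
    """Pick a random start within the first `interval` items, then every `interval`-th item."""
    start = rng.randrange(interval)     # random starting point: 0 .. interval - 1
    return population[start::interval]

employees = list(range(100))            # employees numbered 00 to 99
rng = random.Random(7)                  # arbitrary seed
sample = systematic_sample(employees, 10, rng)

print(sample)
```

Every employee has the same 1-in-10 chance of selection, but only 10 distinct samples are possible, one for each starting point. Neighbouring employees such as 2, 3, 4 and 5 can never appear in the same sample.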
Stratified Sampling
To use stratified sampling, we divide the population into relatively homogenous groups, called strata. Then
we use one of the following two approaches. Either we select at random from each stratum a specified
number of elements corresponding to the proportion of that stratum in the population as a whole or, we
draw an equal number of elements from each stratum and give weight to the results according to the
stratum’s proportion of the total population. With either approach, stratified sampling guarantees that
every element in the population has a chance of being selected.
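The first approach, drawing from each stratum in proportion to its share of the population, can be sketched in Python as follows. The strata, their sizes and the seed are hypothetical values chosen for illustration.

```python
import random

def proportional_stratified_sample(strata, sample_size, rng):
    """Draw from each stratum a number of elements proportional to its size."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounded share of the sample; with awkward sizes the shares may not
        # add up exactly to sample_size and would need a manual adjustment.
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100 bank employees in three strata
strata = {
    "clerical": [f"C{i}" for i in range(60)],
    "officers": [f"O{i}" for i in range(30)],
    "managers": [f"M{i}" for i in range(10)],
}

sample = proportional_stratified_sample(strata, 20, random.Random(1))
for name, chosen in sample.items():
    print(name, len(chosen))
```

With a sample of 20, the strata contribute 12, 6 and 2 elements, mirroring their 60, 30 and 10 per cent shares of the population.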
Cluster Sampling
A well-designed cluster sampling procedure can produce a more precise sample at considerably less cost
than simple random sampling. In cluster sampling, we divide the population into groups or clusters and
then select a random sample of these clusters. We assume that these individual clusters are representative
of the population as a whole. Suppose a market research team is attempting to determine by sampling
the average number of television sets per household in a large city. They could use a city map and divide
the territory into blocks and then choose a certain number of blocks (clusters) for interviewing. Every
household in each of these blocks would be interviewed.
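The television survey can be mimicked with a small simulation. Everything in the sketch is invented for illustration: the 40 blocks, the household counts, the numbers of television sets and the helper name `cluster_sample`.

```python
import random

def cluster_sample(clusters, n_clusters, rng=None):
    """Select whole clusters at random and return every unit inside the chosen clusters."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), n_clusters)   # random sample of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical city: 40 blocks, each with 10 to 20 households; every household
# is recorded as its number of television sets (all values are made up).
data_rng = random.Random(0)
city = {
    block: [data_rng.choice([0, 1, 1, 2, 3]) for _ in range(data_rng.randint(10, 20))]
    for block in range(1, 41)
}

surveyed = cluster_sample(city, n_clusters=5, rng=random.Random(3))
sets = [tv for households in surveyed.values() for tv in households]
estimate = sum(sets) / len(sets)
print(f"Estimated TV sets per household: {estimate:.2f}")
```

Only the blocks are chosen at random; once a block is chosen, every household in it is interviewed, which is what keeps the cost of fieldwork low.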
random sampling. Once you understand the basics of random sampling, the same can be extended to
other samples with some amendments which are best left to professional statisticians. It is important that
you get a grasp of the concepts concerned.
Boy          A    B    C    D    E
Height (cm)  160  162  164  170  156
Now, if we take samples of size 3 (that is, select 3 boys in each sample), we will get 10 different samples.
We list these samples, the corresponding data and their mean in Table 2.3.
Sample No.  1    2    3       4       5       6       7    8    9       10
Data        160  160  160     162     162     160     160  160  162     164
            162  162  162     164     164     164     164  170  170     170
            164  170  156     170     156     170     156  156  156     156
Mean        162  164  159.33  165.33  160.66  164.66  160  162  162.66  163.33
From Table 2.3, you can see that sample mean for each sample is different. This collection of different
values of the sample mean for samples of size 3, forms a distribution of sample means. This distribution
has a mean. If we add all sample means in Table 2.3, and divide the sum by the number of samples, i.e.,
10, we get 162.397 (say, 162.40), which is also the mean of the five heights in the population.
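The enumeration can be checked with a short Python sketch. It uses only the five heights given above; the variable names are our own.

```python
from itertools import combinations
from statistics import mean

heights = [160, 162, 164, 170, 156]          # heights (cm) of the five boys

samples = list(combinations(heights, 3))      # every possible sample of size 3
sample_means = [mean(s) for s in samples]     # one mean per sample

print(len(samples))                           # number of distinct samples
print(round(mean(sample_means), 2))           # mean of the sample means
print(mean(heights))                          # population mean
```

Each height appears in six of the ten samples, so without rounding the mean of the sample means is exactly 162.4, the population mean. The figure 162.397 arises only because the tabulated means are cut off at two decimal places.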
Normally, we will be dealing with large populations. Hence the number of samples of a particular size
is also very large.
Suppose we have to determine the proportion of sugar plants in a plantation affected by pest disease in
samples of 100 plants taken from a very large plantation. We have taken a large number of these 100 item
samples. If we plot a probability distribution of the proportions of infested plants in all these samples,
we would see a distribution of the sample proportion. (The term proportion here refers to the proportion
that is infected.) We could also have a sampling distribution of a proportion.
Sampling distribution is the distribution of all possible values of a statistic from all possible samples
of a particular size drawn from the population.
Each of the above sampling distributions can be partially described by its mean and standard deviation.
The areas also show the probabilities under the probability curve shown in Fig. 2.2.
Owner               Chetan (C)  Dinesh (D)  Eswar (E)  Feroz (F)  George (G)
Tyre life (months)  3           3           6          9          15

Sample  Sample data (tyre lives)  Sample mean
EFG     6 + 9 + 15                10
DFG     3 + 9 + 15                9
DEG     3 + 6 + 15                8
DEF     3 + 6 + 9                 6
CFG     3 + 9 + 15                9
CEG     3 + 6 + 15                8
CEF     3 + 6 + 9                 6
CDF     3 + 3 + 9                 5
CDE     3 + 3 + 6                 4
CDG     3 + 3 + 15                7
Total = 72 months
Mean = 7.2 months
Now, look at Fig. 2.3(a), which shows the population distribution of tyre lives for the five motorcycle
owners, a distribution that is anything but normal in shape. In Fig. 2.3(b), we show the sampling distribution
of the mean for a sample size of three, taking the information from Table 2.6. Notice the difference
between the probability distributions in Figs. 2.3(a) and 2.3(b). In Figure 2.3(b), the distribution looks a
little more like the bell shape of the normal distribution.
If we repeat this exercise and enlarge the population size to 40, we could take samples of different sizes.
We could then plot the sampling distributions of the mean that would occur for the different sizes. This will show
quite dramatically how quickly the sampling distribution of the mean approaches normality, regardless
of the shape of the population distribution.
Standard error of the mean
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$ = Rs. 365.16
Because we are dealing with a sampling distribution, we must now use the equation for the z value and the
Standard Normal Probability Distribution (App. Table):
$z = (\bar{x} - \mu)/\sigma_{\bar{x}}$
For $\bar{x}$ = Rs. 19,750:
$z = (19{,}750 - 19{,}000)/365.16 = 750/365.16 = 2.05$
Annexure Table gives us the probability of 0.4798 for a z value of 2.05. We show the corresponding area
in Fig. 2.4 as the area between the mean and Rs. 19,750. Since half or 0.5000 of the area under the curve
lies between the mean and the right-hand tail, the shaded area must be
0.5000 (Area between the mean and the right-hand tail)
- 0.4798 (Area between the mean and 19,750)
0.0202 (Area between the right-hand tail and 19,750)
Thus, we have determined that there is slightly more than a 2 per cent chance of average earnings being
more than Rs. 19,750 annually in a group of 30 tellers.
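The same probability can be computed without a table. In the sketch below, the population mean of Rs. 19,000 and the standard deviation of Rs. 2,000 are assumptions: they are not stated in this passage, but they are the values implied by the standard error of Rs. 365.16 and the z value of 2.05 for 30 tellers.

```python
from math import sqrt
from statistics import NormalDist

mu = 19_000      # assumed population mean (Rs.), implied by z = 2.05
sigma = 2_000    # assumed population standard deviation (Rs.), implied by 365.16 * sqrt(30)
n = 30           # tellers in the sample

standard_error = sigma / sqrt(n)              # standard error of the mean
z = (19_750 - mu) / standard_error            # distance of 19,750 from the mean in standard errors
p_above = 1 - NormalDist().cdf(z)             # area in the right-hand tail

print(f"standard error = {standard_error:.2f}, z = {z:.2f}, P = {p_above:.4f}")
```

The result agrees with the table-based answer to about three decimal places; the small difference comes from rounding z to 2.05 before looking it up.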
The central limit theorem is one of the most powerful concepts in statistics. It states that the distribution
of sample means tends to a normal distribution as the sample size increases. This is true regardless of the
shape of the population distribution from which the samples were taken.
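Result 1: If $X \sim \text{Bin}(n, p)$, then $Z = (X - np)/\sqrt{npq}$, where $q = 1 - p$, tends to the standard normal variate as $n \to \infty$.
Result 2: If $X \sim P(\lambda)$, then $Z = (X - \lambda)/\sqrt{\lambda}$ tends to the standard normal variate as the sample size $\to \infty$.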
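A small simulation makes the theorem visible. The sketch below is an illustration under assumptions of our own: a deliberately skewed, made-up population (exponential, with mean about 10) and 2,000 samples at each sample size. Skewness is used as a rough measure of departure from the symmetric normal shape.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)
# Hypothetical, strongly right-skewed population of 10,000 values.
population = [rng.expovariate(1 / 10) for _ in range(10_000)]

def sample_means(n, draws=2_000):
    """Means of `draws` random samples of size n, each drawn without replacement."""
    return [mean(rng.sample(population, n)) for _ in range(draws)]

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

for n in (2, 10, 30):
    means = sample_means(n)
    print(f"n = {n:>2}: mean = {mean(means):.2f}, "
          f"standard error = {pstdev(means):.2f}, skewness = {skewness(means):.2f}")
```

As n grows, the means stay centred on the population mean, their spread shrinks, and the skewness moves towards zero, even though the population itself is far from normal.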
1. A sample of 25 observations from a normal distribution has a mean of 98.6 and a standard deviation
of 17.2.
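(a) What is $P(92 < \bar{x} < 102)$?
(b) Find the corresponding probability given a sample of 36.
(a) $n = 25$, $\mu = 98.6$, $\sigma = 17.2$,
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 17.2/\sqrt{25} = 3.44$
$P(92 < \bar{x} < 102) = P[(92 - 98.6)/3.44 < (\bar{x} - \mu)/\sigma_{\bar{x}} < (102 - 98.6)/3.44]$
$= P(-1.92 < z < 0.99) = 0.4726 + 0.3389 = 0.8115$
(b) $n = 36$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 17.2/\sqrt{36} = 2.87$
$P(92 < \bar{x} < 102) = P[(92 - 98.6)/2.87 < (\bar{x} - \mu)/\sigma_{\bar{x}} < (102 - 98.6)/2.87]$
$= P(-2.30 < z < 1.18) = 0.4893 + 0.3810 = 0.8703$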
2. Kamala, an auditor for a large credit card company, knows that, on average, the monthly balance
of any given customer is Rs. 112, and the standard deviation is Rs. 56. If Kamala audits 50
randomly selected accounts, what is the probability that the sample average monthly balance is
(a) Below Rs. 100?
(b) Between Rs. 100 and Rs. 130?
Solution: The sample size of 50 is large enough to use the central limit theorem.
$\mu = 112$, $\sigma = 56$, $n = 50$, $\sigma_{\bar{x}} = 56/\sqrt{50} = 7.920$
(a) $P(\bar{x} < 100) = P[(\bar{x} - \mu)/\sigma_{\bar{x}} < (100 - 112)/7.920]$
$= P(z < -1.52) = 0.5 - 0.4357 = 0.0643$
(b) $P(100 < \bar{x} < 130) = P[(100 - 112)/7.920 < (\bar{x} - \mu)/\sigma_{\bar{x}} < (130 - 112)/7.920]$
$= P(-1.52 < z < 2.27) = 0.4357 + 0.4884 = 0.9241$
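Kamala's two probabilities can also be obtained by treating the sample mean itself as a normal variable with mean 112 and standard deviation 7.920. The sketch below does this with Python's standard library; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 56, 50
standard_error = sigma / sqrt(n)

# Sampling distribution of the mean, assumed normal by the central limit theorem.
xbar = NormalDist(mu, standard_error)

p_below_100 = xbar.cdf(100)                      # part (a)
p_100_to_130 = xbar.cdf(130) - xbar.cdf(100)     # part (b)

print(f"standard error = {standard_error:.3f}")
print(f"P(mean < 100) = {p_below_100:.4f}")
print(f"P(100 < mean < 130) = {p_100_to_130:.4f}")
```

The values differ from the table-based answers only in the third or fourth decimal place, because the table method rounds each z value to two decimals.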
3. It has been found that 2% of the tools produced by a certain machine are defective. What is the
probability that, in a shipment of 400 tools, 3% or more are defective?
The sample size of 400 is large enough to use the central limit theorem.
Solution: $X \sim P(\lambda = np = 0.02 \times 400 = 8)$, so $Z = (X - \lambda)/\sqrt{\lambda}$ tends to the standard normal variate.
3% of 400 = 12
$P(X \geq 12) = P[(X - 8)/\sqrt{8} \geq (12 - 8)/\sqrt{8}] = P(Z \geq 4/2.83) = P(Z \geq 1.41) = 0.5 - 0.4207 = 0.0793$
4. A coin is tossed 700 times. Using the normal approximation, find the probability of getting a number of
heads between 280 and 375.
The sample size of 700 is large enough to use the central limit theorem.
Solution: $X \sim \text{Bin}(n = 700, p = 0.5)$, so $Z = (X - np)/\sqrt{npq} = (X - 350)/13.23$ tends to the standard normal variate.
$P(280 < X < 375) = P[(280 - 350)/13.23 < (X - 350)/13.23 < (375 - 350)/13.23]$
$= P(-5.29 < Z < 1.89) = 0.5 + 0.4706 = 0.9706$
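For the coin problem, the normal approximation can be compared with the exact binomial probability, which a computer can sum directly. The sketch below is illustrative only; it counts 280 to 375 heads inclusive for the exact figure, which is one reasonable reading of "between".

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 700, 0.5
mean_heads = n * p                         # np = 350
sd_heads = sqrt(n * p * (1 - p))           # sqrt(npq), about 13.23

# Normal approximation, as in the worked solution (no continuity correction).
z = NormalDist()
approx = z.cdf((375 - mean_heads) / sd_heads) - z.cdf((280 - mean_heads) / sd_heads)

# Exact binomial probability of 280 to 375 heads inclusive.
# With p = 0.5 every sequence of tosses has probability 1 / 2**n.
exact = sum(comb(n, k) for k in range(280, 376)) / 2 ** n

print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

The two figures agree to about two decimal places. A continuity correction (using 279.5 and 375.5 as the limits) would bring the approximation closer still.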
and the cost of the additional precision they will obtain, from a larger sample, before they commit
resources to take it.
This equation is designed for situations in which the population is infinite, or in which we sample from
a finite population with replacement (that is, after each item is sampled, it is put back into the population
before the next item is chosen, so that the same item can possibly be chosen more than once).
In fact, many of the populations studied are finite; that is, of stated or limited size. Examples of these
include the employees in a given company, the clients of a city social-services agency, the students in a
specific class, and a day’s production in a given manufacturing plant. So, we need to modify the equation
to deal with finite populations. The formula designed to find the standard error of the mean, when the
population is finite, and we sample without replacement, is
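$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}$
where N is the size of the population and n is the size of the sample. With $\sigma = 75$, $n = 5$ and $N = 20$:
$\sigma_{\bar{x}} = \dfrac{75}{\sqrt{5}} \times \sqrt{\dfrac{20 - 5}{20 - 1}}$
= (33.54)(0.888)
= 29.8 (standard error of the mean of a finite population)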
In this example, a finite population multiplier of 0.888 reduced the standard error from 33.54 to 29.8.
In cases in which the population is very large in relation to the size of the sample, this finite population
multiplier is close to 1 and has little effect on the calculation of the standard error. Say, that we have a
population of 1,000 items and that we have taken a sample of 20 items. If we use the Equation to
calculate the finite population multiplier, the result would be
Finite population multiplier $= \sqrt{\dfrac{N - n}{N - 1}} = \sqrt{\dfrac{1000 - 20}{1000 - 1}} = \sqrt{\dfrac{980}{999}} = 0.99$
Using this multiplier of 0.99 would have little effect on the calculation of the standard error of the mean.
This last example shows that when we sample a small fraction of the entire population (that is, when the
population size N is very large relative to the sample size n), the finite population multiplier takes on a
value close to 1.0. Statisticians refer to the fraction n/N as the sampling fraction, because it is the fraction
of the population N that is contained in the sample.
When the sampling fraction is small, the standard error of the mean for finite populations is so close to
the standard error of the mean for infinite populations that we might as well use the same formula for
both, namely,
Equation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
The generally accepted rule is: When the sampling fraction is less than 0.05, the finite population
multiplier need not be used.
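The two formulas and the 0.05 rule can be combined in one small helper. This is a sketch rather than a standard routine; the function names are our own, and the figures used are the ones from the examples above.

```python
from math import sqrt

def finite_population_multiplier(N, n):
    """sqrt((N - n) / (N - 1)) for a population of size N and a sample of size n."""
    return sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    The finite population multiplier is applied only when N is given and
    the sampling fraction n/N is at least 0.05, following the rule above.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= finite_population_multiplier(N, n)
    return se

print(round(standard_error(75, 5), 2))                    # infinite-population formula
print(round(standard_error(75, 5, N=20), 1))              # finite population, n/N = 0.25
print(round(finite_population_multiplier(1000, 20), 2))   # n/N = 0.02, multiplier near 1
```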
When we use the above equation, $\sigma$ is constant, and so the measure of sampling precision $\sigma_{\bar{x}}$ depends only
on the sample size n and not on the proportion of the population sampled. That is, to make $\sigma_{\bar{x}}$ smaller, it is
necessary only to make n larger. Thus, it turns out that it is the absolute size of the sample that determines
sampling precision, not the fraction of the population sampled.
Although the law of diminishing return comes from economics, it has a definite place in statistics too.
It says that there is diminishing return in sampling. Although sampling more items will decrease the
standard error (the standard deviation of the distribution of sample means), the increased precision
may not be worth the cost. Because n is in the denominator, when we increase it (take larger samples)
the standard error $\sigma_{\bar{x}}$ decreases, but only in proportion to the square root of n. In our example, when
we increased the sample size from 10 to 100 (a tenfold increase) the standard error fell only from 31.63
to 10 (about a two-thirds decrease). Maybe it
wasn’t smart to spend so much money increasing the sample size to get this result. That’s exactly why
statisticians (and smart managers) focus on the concept of the “right” sample size. Some finite populations
are so large that they are treated as if they were infinite. An example of this would be the number of TV
households in our country.
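The diminishing return is easy to tabulate. The sketch below assumes a population standard deviation of 100, which is not stated in this passage but is the value implied by a standard error of about 31.63 at n = 10 and of 10 at n = 100.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean for an infinite population."""
    return sigma / sqrt(n)

sigma = 100   # assumed: implied by the standard errors quoted in the text

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: standard error = {standard_error(sigma, n):.2f}")
```

Each tenfold increase in the sample size reduces the standard error by a factor of only about 3.16, so every further gain in precision costs more than the one before it.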
Census: The measurement or examination of every element in the population.
Sample: A portion of the elements in a population chosen for direct examination or measurement.
Strata: Groups within a population formed in such a way that each group is relatively homogeneous, but
wider variability exists among the separate groups.
Clusters: Groups, in population, that are similar to each other, although the groups themselves have wide
internal variation.
Random or probability sampling: A method of selecting a sample from a population in which all the
items in the population have an equal chance of being chosen in the sample.
Stratified sampling: It is a method of random sampling. The population is divided into homogeneous
groups or strata. Elements within each stratum are selected randomly according to one of two rules.
1. A specified number of elements is drawn from each stratum corresponding to the proportion of that
stratum in the population.
2. Equal numbers of elements are drawn from each stratum, and the results are weighted according to
the stratum’s proportion of the total population.
Systematic sampling: A method of sampling in which elements to be sampled are selected from the
population at a uniform interval measured in time, order or space.
Cluster sampling: A method of random sampling. The population is divided into groups or clusters of
elements, and then a random sample of these clusters is selected.
Judgment sampling: It is a method of selecting a sample from a population in which personal knowledge
or expertise is used to identify the items from the population that are to be included in the sample.
Statistics: Measures describing the characteristics of a sample.
Parameters: Values that describe the characteristics of a population.
Sampling distribution of the mean: A probability distribution of the means of all the possible samples
of a given size, n, from a population.
Sampling distribution of a statistic: For a given population, a probability distribution of all the possible
values a statistic may take on for a given sample size.
Sampling error: Error or variation among sample statistics. Differences between each sample and the
population and among several samples, which are due to the elements we happen to choose for the sample.
Standard error: The standard deviation of the sampling distribution of a statistic.
Standard error of the mean: The standard deviation of the sampling distribution of the mean, a measure
of the extent to which we expect the means from different samples to vary from the population mean,
owing to the chance error in the sampling process.
Statistical inference: The process of making inferences about populations from information contained in
samples.
Central Limit Theorem: The theorem states that the sampling distribution of the mean approaches
normality as the sample size increases, regardless of the shape of the population distribution from which
the sample is selected.
Finite population: A population having a stated or limited size.
Finite population multiplier: A factor used to correct the standard error of the mean for studying a
population of a finite size that is small in relation to the size of the sample.
Infinite population: A population in which it is theoretically impossible to observe all the elements.
Sampling with replacement: A sampling procedure in which sampled items are returned to the population
after being picked so that some members of the population can appear in the sample more than once.
Sampling without replacement: A sampling procedure in which sampled items are not returned to the
population after being picked so that no member of the population can appear in the sample more than once.
Equation 1 $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Use this formula to derive the standard error of the mean when the population is infinite, that is, when
the elements of the population cannot be enumerated in a reasonable period, or when we sample with
replacement. This equation states that the sampling distribution has a standard deviation, which we also call
a standard error, equal to the population standard deviation divided by the square root of the sample size.
Equation 2 $z = (\bar{x} - \mu)/\sigma_{\bar{x}}$
A modified version of the equation allows us to determine the distance of the sample mean $\bar{x}$ from the
population mean $\mu$, when we divide the difference by the standard error of the mean $\sigma_{\bar{x}}$. Once we have
derived a z value, we can use the Standard Normal Probability Distribution Table and compute the
probability that the sample mean will be that distance from the population mean. Because of the central
limit theorem, we can use this formula for non-normal distributions if the sample size is at least 30.
Equation 3 $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}$
This is the formula for finding the standard error of the mean when the population is finite, that is, of
stated or limited size, and the sampling is done without replacement.
In Equation 3, the term $\sqrt{(N - n)/(N - 1)}$, which we multiply by the standard error from Equation (1), is called
the finite population multiplier. When the population is small in relation to the size of the sample, the finite
population multiplier reduces the size of the standard error. Any decrease in the standard error increases
the precision with which the sample mean can be used to estimate the population mean.
1. Explain Random Numbers.
2. What is a Sampling distribution?
3. What is sample mean?
4. Explain the Distribution of all sample means.
5. Define Standard Error.
6. We have a population of 10,000, and we wish to sample 20 randomly. Use a random number table
to choose the sample.
7. A population comprises groups that have wide variations within the group and less variation from
group to group. Which is the appropriate type of sampling method?
8. Explain: Sampling allows us to be cost-effective. We have to be careful in choosing representative samples.
9. Suppose you are sampling from a population with a mean of 5.3. What sample size will guarantee that
(a) The sample mean is 5.3?
(b) The standard error of the mean is zero?
10. In a sample of 16 observations from a normal distribution with a mean of 150 and a variance of 256,
what is
(a) $P(\bar{x} < 160)$?
(b) $P(\bar{x} > 142)$?
If, instead of 16 observations, 9 observations are taken, find
(c) $P(\bar{x} < 160)$?
(d) $P(\bar{x} > 142)$?
11. In a sample of 19 observations from a normal distribution with mean 18 and standard deviation 4.8
(a) What is $P(16 < \bar{x} < 20)$?
(b) What is $P(16 \leq \bar{x} \leq 20)$?
(c) Suppose the sample size is 48. What is the new probability in part(a)?
12. In a normal distribution with a mean of 56 and standard deviation of 21, how large a sample must
be taken so that there will be at least a 90 per cent chance that its mean is greater than 52?
13. In a normal distribution with a mean of 375 and a standard deviation of 48, how large a sample must be
taken so that there will be at least a 0.95 probability that the sample mean falls between 370 and 380?
14. The average cost of a flat at Powai lake is Rs. 62 lakh, and the standard deviation is Rs. 4.2 lakh.
What is the probability that a flat at this location will cost at least Rs. 65 lakh?
15. State whether the following statements are true or false.
(a) When the items included in a sample are based on the judgement of the individual conducting
the sample, the sample is said to be non-random. True/False
(b) A statistic is a characteristic of a population. True/False
(c) A sampling plan that selects members from a population at uniform intervals in time, order or
space is called stratified sampling. True/False
(d) As a general rule, it is not necessary to include a finite population multiplier in computation for
the standard error of the mean when the sample size is greater than 50. True/False
(e) The probability distribution of means of all the possible samples is known as the sample
distribution of the mean. True/False
(f) The principles of simple random sampling are the theoretical foundation for statistical inference. True/False
(g) The standard error of the mean is the standard deviation of the distribution of sample means. True/False
(h) A sampling plan that divides the population into well-defined groups from which random samples
are drawn is known as cluster sampling. True/False
(i) With increasing sample size, the sampling distribution of the mean approaches normality,
regardless of the distribution of the population. True/False
(j) The standard error of the mean decreases with an increase in sample size. True/False
(k) To perform a complete enumeration, one would need to examine every item in a population. True/False
(l) In everyday life, we see many examples of infinite populations of physical objects. True/False
(m) To obtain a theoretical sampling distribution, we consider all the samples of a given size. True/False
(n) Large samples are always a good idea because they decrease the standard error. True/False
(o) If the mean for a certain population were 15, most of the samples we could take from that
population would likely have a mean of 15. True/False
(p) The standard error of a sample statistic is the standard deviation of its sampling distribution. True/False
(q) Judgement sampling has the disadvantage that it may lose some representativeness of a sample. True/False
(r) The sampling fraction compares the size of a sample to the size of the population. True/False
(s) Any sampling distribution can be totally described by its mean and standard deviation. True/False
(t) The precision with which the sample mean can estimate the population mean decreases as the
standard error increases. True/False
UNIT Measures of Central Tendency
3.0 Objectives
3.1 Introduction to Measures of Central Tendency
3.2 Arithmetic Mean
3.3 Combined Arithmetic Mean
3.4 Geometric Mean
3.5 Harmonic Mean
3.6 Median and Quartiles
3.7 Mode
3.8 Introduction to Measures of Dispersion
3.9 Range and Coefficient of Range
3.10 Quartile Deviation and Coefficient of Quartile Deviation
3.11 Standard Deviation and Coefficient of Variation
3.12 Skewness and Kurtosis