Unit+8 (Block 2)

Processing and Presentation of Data
UNIT 8 STATISTICAL DERIVATIVES AND MEASURES OF CENTRAL TENDENCY

STRUCTURE
8.0 Objectives
8.1 Introduction
8.2 Statistical Derivatives
    8.2.1 Percentage
    8.2.2 Ratio
    8.2.3 Rate
8.3 Measures of Central Tendency
    8.3.1 Properties of an Ideal Measure of Central Tendency
    8.3.2 Mean and Weighted Mean
    8.3.3 Median
    8.3.4 Mode
    8.3.5 Choice of a Suitable Average
    8.3.6 Some Other Measures of Central Tendency
8.4 Let Us Sum Up
8.5 Key Words
8.6 Answers to Self Assessment Exercises
8.7 Terminal Questions/Exercises
8.8 Further Reading
8.0 OBJECTIVES
After studying this unit, you should be able to:
• explain the meaning and use of percentages, ratios and rates for data analysis,
• discuss the computational aspects involved in working out the statistical derivatives,
• describe the concept and significance of various measures of central tendency, and
• compute various measures of central tendency, such as arithmetic mean, weighted mean, median, mode, geometric mean, and harmonic mean.
8.1 INTRODUCTION
In Unit 6 we discussed the methods of classifying and tabulating data. Diagrammatic and graphic presentations were covered in the previous unit (Unit 7). They give some idea about the existing pattern of data. So far no major numerical computation was involved. Quantitative data has to be condensed in a meaningful manner, so that it can be easily understood and interpreted. One of the common methods for condensing quantitative data is to compute statistical derivatives, such as percentages, ratios and rates. These are simple derivatives. Further, it is necessary to summarise and analyse the data. The first step in that direction is the computation of a measure of Central Tendency or Average, which gives a bird's-eye view of the entire data. In this Unit, we will discuss the computation of statistical derivatives based on simple calculations. We then discuss numerical methods for summarising and describing data, namely the measures of Central Tendency. The purpose is to identify one value, obtained from the data, that represents the entire data set.
8.2 STATISTICAL DERIVATIVES

Statistical derivatives are quantities obtained by simple computation from the given data. Though very easy to compute, they often give meaningful insight into the data. Here we discuss three often-used measures: percentage, ratio and rate. These measures point out an existing relationship among factors and thereby help in better interpretation.
8.2.1 Percentage
As we have noted earlier, the frequency distribution may be regarded as simple counting and checking as to how many cases are in each group or class. The relative frequency distribution gives the proportion of cases in individual classes. On multiplication by 100, the percentage frequencies are obtained. Converting to percentages has some advantages - it is now more easily understood and comparison becomes simpler because it standardizes data. Percentages are quite useful in other tables also, and are particularly important in case of bivariate tables. We show one application of percentages below. Let us try to understand the following illustration.
Illustration 1
The following table gives the total number of workers and their categories for all India and major states. Compute meaningful percentages.
Table: Total Workers and Their Categories-India and Major States : 2001 (In thousands)
Sl. No.  State/India        Cultivators  Agricultural  Household  Other    Total
                                         Labourers     Industry   Workers  Workers
                                                       Workers
(1)      (2)                (3)          (4)           (5)        (6)      (7)
1.       Jammu & Kashmir      1600          249          230        1611     3689
2.       Himachal Pradesh     1961           93           50         887     2991
3.       Punjab               2099         1499          307        5236     9142
4.       Haryana              3046         1276          207        3854     8383
5.       Rajasthan           13167         2529          651        7434    23781
6.       Uttar Pradesh       22173        13605         2886       15517    54180
7.       Bihar                8192        13528         1087        5273    28080
8.       Assam                3742         1290          329        4197     9557
9.       West Bengal          5613         7351         2153       14386    29503
10.      Orissa               4238         5001          689        4344    14273
11.      Madhya Pradesh      11059         7381         1010        6307    25756
12.      Gujarat              5613         4988          382        9386    20369
13.      Maharashtra         12010        11291         1046       17706    42053
14.      Andhra Pradesh       7904        13819         1570       11573    34865
15.      Karnataka            6936         6209          936        9441    23522
16.      Kerala                740         1654          365        7532    10291
17.      Tamil Nadu           5114         8665         1459       12574    27812
         INDIA              127628       107448        16396      151040   402512
Solution: In the table above, the row total gives the total workers of a state/all India and the column total gives the aggregate values of the different categories of workers and of all workers. Thus, it is possible to compute meaningful percentages from both rows and columns. The row percentages are computed by dividing the figures in columns (3), (4), (5) and (6) by the figure in column (7) and multiplying by 100. The figures are presented in tabular form below. The percentage of cultivators in Jammu & Kashmir is obtained as (1600 ÷ 3689) × 100, which equals 43.37. The other figures are obtained similarly.

Table: Percentage of Total Workers and Their Categories-India and Major States : 2001

Sl. No.  State/India        Cultivators  Agricultural  Household  Other    Total
                                         Labourers     Industry   Workers  Workers
                                                       Workers
(1)      (2)                (3)          (4)           (5)        (6)      (7)
1.       Jammu & Kashmir     43.37         6.74         6.22      43.67    100.00
2.       Himachal Pradesh    65.55         3.10         1.68      29.67    100.00
3.       Punjab              22.96        16.40         3.36      57.28    100.00
4.       Haryana             36.34        15.22         2.47      45.97    100.00
5.       Rajasthan           55.36        10.64         2.74      31.26    100.00
6.       Uttar Pradesh       40.92        25.11         5.33      28.64    100.00
7.       Bihar               29.17        48.18         3.87      18.78    100.00
8.       Assam               39.15        13.50         3.44      43.91    100.00
9.       West Bengal         19.02        24.92         7.30      48.76    100.00
10.      Orissa              29.70        35.04         4.82      30.44    100.00
11.      Madhya Pradesh      42.93        28.66         3.92      24.49    100.00
12.      Gujarat             27.56        24.49         1.87      46.08    100.00
13.      Maharashtra         28.56        26.85         2.49      42.10    100.00
14.      Andhra Pradesh      22.67        39.63         4.51      33.19    100.00
15.      Karnataka           29.48        26.40         3.98      40.14    100.00
16.      Kerala               7.19        16.07         3.55      73.19    100.00
17.      Tamil Nadu          18.39        31.16         5.24      45.21    100.00
         INDIA               31.72        26.69         4.07      37.52    100.00
The figures above help in comparing the proportion of workers in different categories across the states and all India. One may read from the table that Kerala has the lowest percentage of cultivators and Bihar the highest percentage of agricultural labourers.
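The row-percentage computation described in the solution can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `row_percentages` is hypothetical, and only four rows of the table are included. Because the published figures are rounded to thousands, a recomputed percentage may differ from the printed one in the second decimal place.

```python
# Figures (in thousands) copied from the table in Illustration 1.
# Column order: cultivators, agricultural labourers,
# household industry workers, other workers, total workers.
workers = {
    "Jammu & Kashmir": (1600, 249, 230, 1611, 3689),
    "Bihar": (8192, 13528, 1087, 5273, 28080),
    "Kerala": (740, 1654, 365, 7532, 10291),
    "INDIA": (127628, 107448, 16396, 151040, 402512),
}


def row_percentages(row):
    """Divide each category by the row total (last entry) and multiply by 100."""
    *categories, total = row
    return [round(100 * value / total, 2) for value in categories]


# Hypothetical helper applied to every row of the small table above.
percentages = {state: row_percentages(row) for state, row in workers.items()}

for state, values in percentages.items():
    print(state, values)
```

Dividing by the column totals (the INDIA row) instead of the row totals gives the column percentages asked for in Self Assessment Exercise A.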
Self Assessment Exercise A

1) What is a Percentage?

2) From the data given in Illustration 1, compute column percentages and interpret them. Why are the totals of these percentages not adding to 100? The table below may be used for computation.

Table: State-wise Percentage Share of Total Workers and Categories of Workers in All India: 2001

Sl. No.  State/India        Cultivators  Agricultural  Household  Other    Total
                                         Labourers     Industry   Workers  Workers
                                                       Workers
(1)      (2)                (3)          (4)           (5)        (6)      (7)
1.       Jammu & Kashmir
2.       Himachal Pradesh
3.       Punjab
4.       Haryana
5.       Rajasthan
6.       Uttar Pradesh
7.       Bihar
8.       Assam
9.       West Bengal
10.      Orissa
11.      Madhya Pradesh
12.      Gujarat
13.      Maharashtra
14.      Andhra Pradesh
15.      Karnataka
16.      Kerala
17.      Tamil Nadu
         INDIA              100.00       100.00        100.00     100.00   100.00
8.2.2 Ratio
Another descriptive measure that is commonly used with frequency distributions (it may be used elsewhere also) is the ratio. It expresses the relative value of frequencies in the same way as proportions or percentages, but it does so by comparing any one group either to the total number of cases or to any other group. For instance, in Table 6.3 of Unit 6, the ratio of all labourers to those with daily wages between Rs. 30-35 is 70:14 or 5:1. Wherever possible, it is convenient to reduce a ratio to the form n1:n2, the most preferred value of n2 being 1. Representation in the form of a ratio thus also reduces the size of the numbers, which facilitates easy comparison and quick grasp. As the number of categories increases, the ratio is a better derivative for presentation as it is easier and less confusing. There are several types of ratios used in statistical work. Let us discuss them.

The Distribution Ratio: It is defined as the ratio of a part to a total which includes that part also. For example, in a University there are 600 girls out of 2,000 students. Then the distribution ratio of girls to the total number of students is 3:10. We can say that 30% of the total students in that University are girls.

The Interpart Ratio: It is a ratio of a part in a total to another part in the same total. For example, the sex ratio is usually expressed as the number of females per 1,000 males (not against the population).

Time Ratio: This ratio is a measure which expresses the changes in a series of values arranged in a time sequence and is typically shown as a percentage. Mainly, there are two types of time ratios:

i) Those employing a fixed base period: Under this method, for instance, if you are interested in studying the sales of a product in the current year, you would select a particular past year, say 1990, as the base year and compare the current year's sales with the sales of 1990.

ii) Those employing a moving base: For example, for computation of the current year's sales, last year's sales would be taken as the base (for 1991, 1990 is the base; for 1992, 1991 is the base, and so on).

Ratios are also widely used in financial analysis to indicate the financial status of an organization. Look at the following illustration:
Illustration 2
The following table gives the balance sheet of XYZ Company for the year 2002-03. Compute useful financial ratios.

Table: Balance Sheet of XYZ Company as on March 31, 2003

I           Sources of Funds                              Amount (Rs. 000)
1           Shareholders' funds                           520
1(a)          Share capital                               130
1(b)          Reserve and surplus                         390
2           Loan funds                                    280
2(a)          Secured loans                               170
2(a)(i)         Due after one year                        120
2(a)(ii)        Due within one year                        50
2(b)          Unsecured loans                             110
2(b)(i)         Due after one year                         50
2(b)(ii)        Due within one year                        60
            Total (520 + 280)                             800

II          Application of Funds                          Amount (Rs. 000)
1           Net fixed asset                               535
2           Investments                                    85
2(a)          Long term investments                        75
2(b)          Current investments                          10
3           Current assets, loans and advances            330
3(a)          Inventories                                 160
3(b)          Sundry debtors                               80
3(c)          Cash and bank balances                       40
3(d)          Loans and advances                           50
            Less: Current liabilities and provisions      150
            Net current assets                            180
            Total (535 + 85 + 180)                        800
Solution: Three common ratios may be computed from the above balance sheet: current ratio, cash ratio, and debt-equity ratio. However, these ratios are discussed in detail in MCO-05 : Accounting for Managerial Decisions, under Unit-5 : Techniques of Financial Analysis.

\[ \text{Current ratio} = \frac{\text{Current assets, loans, advances} + \text{Current investments}}{\text{Current liabilities and provisions} + \text{Short term debt}} = \frac{330 + 10}{150 + 50 + 60} = 1.31 \]

\[ \text{Cash ratio} = \frac{\text{Cash and bank balances} + \text{Current investments}}{\text{Current liabilities and provisions} + \text{Short term debt}} = \frac{40 + 10}{150 + 50 + 60} = 0.19 \]

\[ \text{Debt-equity ratio} = \frac{\text{Debt}}{\text{Equity}} = \frac{\text{Loan funds}}{\text{Shareholders' funds}} = \frac{280}{520} = 0.54 \]
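As a cross-check on the arithmetic in Illustration 2, the three ratios can be recomputed with a short Python sketch. The variable names are illustrative only; the figures are those of the balance sheet, with short term debt taken as the secured and unsecured loans due within one year, as in the solution.

```python
# Figures (Rs. 000) taken from the balance sheet in Illustration 2.
current_assets_loans_advances = 330
current_investments = 10
cash_and_bank_balances = 40
current_liabilities_provisions = 150
secured_due_within_one_year = 50
unsecured_due_within_one_year = 60
loan_funds = 280
shareholders_funds = 520

# Denominator shared by the current ratio and the cash ratio.
short_term_obligations = (
    current_liabilities_provisions
    + secured_due_within_one_year
    + unsecured_due_within_one_year
)

current_ratio = (current_assets_loans_advances + current_investments) / short_term_obligations
cash_ratio = (cash_and_bank_balances + current_investments) / short_term_obligations
debt_equity_ratio = loan_funds / shareholders_funds

print(f"Current ratio     = {current_ratio:.2f}")
print(f"Cash ratio        = {cash_ratio:.2f}")
print(f"Debt-equity ratio = {debt_equity_ratio:.2f}")
```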
8.2.3 Rate
The concept of ratio may be extended to the rate. The rate is also a comparison of two figures, but not of the same variable, and it is usually expressed as a percentage. It is a measure of the number of times a value occurs in relation to the number of times the value could occur, i.e. the number of actual occurrences divided by the number of possible occurrences. The unemployment rate in a country is given by the total number of unemployed persons divided by the total number of employable persons. It is clear now that a rate is different from a ratio. For example, we may say that in a town the ratio of the number of unemployed persons to that of all persons is 0.05:1. The same message would be conveyed if we say that the unemployment rate in the town is 0.05 or, more commonly, 5 per cent. Sometimes a rate is defined as the number of units of a variable corresponding to a single unit of another variable; the two variables could be in different units. For example, seed rate refers to the amount of seed required per unit area of land. The following table gives some examples of rates.

S.No.  Description                            Computation  Rate
(1)    (2)                                    (3)          (4)
1      100 kms with 8 litres of petrol        100/8        12.5 km per litre
2      Rs. 18 for 12 bananas                  18/12        Rs. 1.50 per banana
3      Rs. 6000 for 5 days of consultancy     6000/5       Rs. 1200 per day of consultancy
Self Assessment Exercise B

1) Name the different types of ratios used in statistical work.

2) What is a rate?
8.3 MEASURES OF CENTRAL TENDENCY

In Unit 6, we studied in detail how to classify raw data into a small number of classes or groups and present them in the form of tables. The next step is to identify a single value that may be considered the most representative value of the given data. This is the measure of central tendency, which represents an average character. A measure of central tendency helps us to represent a large set of data by a single value. To understand the economic condition of the people of a particular country, we talk of average or per capita income. It also enables us to compare two different places or situations. For example, one may compare per capita power availability in two states to understand which one is better in terms of industrial climate. To start with, we list the properties that an ideal measure of central tendency should possess. Some of the measures are discussed in detail later.
8.3.1 Properties of an Ideal Measure of Central Tendency

An ideal measure of central tendency should have the following properties:
• simple to compute and easy to interpret.
• based on all observations.
• should not be influenced much by a few observations.
• should be capable of further algebraic treatment.
• should be capable of being defined unambiguously.
Some of the important measures of central tendency which are most commonly used in business and industry are: Arithmetic Mean, Weighted Arithmetic Mean, Median, Mode, Geometric Mean and Harmonic Mean. Among them, Median and Mode are the positional averages and the rest are termed Mathematical Averages.
8.3.2 Mean and Weighted Mean

Most of the time, when we refer to the average of something, we are talking about the arithmetic mean. This is the most important measure of central tendency and is commonly called the mean.

Mean of ungrouped data: The mean or the arithmetic mean of a set of data is given by:

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]

This formula can be simplified as follows:

\[ \text{Arithmetic mean } (\bar{x}) = \frac{\sum x}{N} = \frac{\text{Sum of values of all observations}}{\text{Number of observations}} \]

The Greek letter sigma, \(\sum\), indicates "the sum of".
Illustration 3
Suppose that wages (in Rs) earned by a labourer for 7 days are 22, 25, 29, 58, 30, 24 and 23. The mean wage of the labourer is given by:

(22 + 25 + 29 + 58 + 30 + 24 + 23)/7 = 211/7 = Rs. 30.14

Mean of grouped data: We have seen how to obtain the mean from ungrouped data. In Unit-6, we learnt the preparation of a frequency distribution (grouped data). Let us consider what modifications are required for grouped data for calculation of the mean. When we have grouped data, in either discrete or continuous form, the expression for the mean would be:

\[ \bar{x} = \frac{\sum fx}{N} = \frac{\text{Sum of } (f \times x)}{\text{Sum of the frequencies } (f)} \]
Let us consider an illustration to understand the application of the formula.
Illustration 4
The following is the discrete frequency distribution of wage data of a labourer for 35 days:

Wage (in Rupees)           23  24  25  27  28  29  30  31  32  33  34
Frequency (No. of Days)     1   1   3   3   4   6   4   5   5   2   1

Now, to compute the mean wage, multiply each value of the variable by its corresponding frequency (f × x) and obtain the total (\(\sum fx\)). Divide this total by the number of observations (\(\sum f\) or N). Practically, we compute the mean as follows:

\[ \text{Mean} = \frac{(23 \times 1) + (24 \times 1) + (25 \times 3) + (27 \times 3) + (28 \times 4) + (29 \times 6) + (30 \times 4) + (31 \times 5) + (32 \times 5) + (33 \times 2) + (34 \times 1)}{1 + 1 + 3 + 3 + 4 + 6 + 4 + 5 + 5 + 2 + 1} \]

\[ \bar{x} = \frac{\sum fx}{\sum f \text{ or } N} = \frac{1024}{35} = 29.26 \]
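The two calculations above (Illustrations 3 and 4) can be reproduced with a minimal Python sketch. The variable names are illustrative; the data are exactly those given in the illustrations.

```python
# Illustration 3: ungrouped data, mean = (sum of x) / N.
daily_wages = [22, 25, 29, 58, 30, 24, 23]
mean_ungrouped = sum(daily_wages) / len(daily_wages)

# Illustration 4: discrete frequency distribution, mean = (sum of f*x) / (sum of f).
# Keys are wages (Rs.), values are the number of days (frequency).
wage_frequency = {23: 1, 24: 1, 25: 3, 27: 3, 28: 4, 29: 6,
                  30: 4, 31: 5, 32: 5, 33: 2, 34: 1}
total_fx = sum(wage * days for wage, days in wage_frequency.items())
total_f = sum(wage_frequency.values())
mean_grouped = total_fx / total_f

print(f"Ungrouped mean wage: Rs. {mean_ungrouped:.2f}")
print(f"Sum of fx = {total_fx}, N = {total_f}")
print(f"Grouped (discrete) mean wage: Rs. {mean_grouped:.2f}")
```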
When a frequency distribution consists of data that are grouped by classes, it is known as a continuous frequency distribution. In such a distribution each observation falls somewhere in one of the classes. Unlike raw (ungrouped) data or discrete data, we do not know the separate values of every observation. It is, therefore, to be noted that we can easily compute an estimate of the mean of a continuous distribution, but not its actual value. In other words, we give up some accuracy for ease of calculation. To find the mean of a continuous frequency distribution, we first calculate the mid-point of each class. Then we multiply each mid-point by the frequency of observations in that class, obtain the sum of these products, and divide the sum by the total number of observations. The formula looks like this:

\[ \bar{x} = \frac{\sum fx}{N} \]

where, \(\sum fx\) = sum of the values obtained by multiplying the mid-points by their respective frequencies, and N = number of observations (\(\sum f\)).

Let us consider the frequency distribution obtained in Unit-6 (Table 6.3) as an illustration.
Illustration-5
The following table gives the daily wages for 70 labourers on a particular day.

Daily Wages (Rs)   15-20  20-25  25-30  30-35  35-40  40-45  45-50
No. of labourers     2     23     19     14      5      4      3

Solution: For obtaining the estimated value of the mean we follow the procedure explained above. This is elaborated below.

Daily wages (Rs)   Mid-point (x)   No. of workers (f)   f.x
15-20              17.5              2                   35.0
20-25              22.5             23                  517.5
25-30              27.5             19                  522.5
30-35              32.5             14                  455.0
35-40              37.5              5                  187.5
40-45              42.5              4                  170.0
45-50              47.5              3                  142.5
                                   N or Σf = 70         Σfx = 2030.0

\[ \bar{X} = \frac{\sum fx}{N} = \frac{2030}{70} = \text{Rs. } 29 \]

Hence, the mean daily wage is Rs. 29.
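The mid-point procedure of Illustration 5 can be sketched in Python as follows. The tuple layout (lower limit, upper limit, frequency) is an assumption made for this example only.

```python
# Illustration 5: (lower limit, upper limit, number of labourers).
wage_classes = [
    (15, 20, 2), (20, 25, 23), (25, 30, 19), (30, 35, 14),
    (35, 40, 5), (40, 45, 4), (45, 50, 3),
]

# Step 1: mid-point of each class.
midpoints = [(lower + upper) / 2 for lower, upper, _ in wage_classes]

# Step 2: multiply each mid-point by its class frequency and add up.
sum_fx = sum(mid * f for mid, (_, _, f) in zip(midpoints, wage_classes))

# Step 3: divide by the total number of observations.
n = sum(f for _, _, f in wage_classes)
estimated_mean = sum_fx / n

print(f"Sum of fx = {sum_fx}, N = {n}")
print(f"Estimated mean daily wage: Rs. {estimated_mean:.2f}")
```

The result is an estimate, since every observation in a class is treated as if it were equal to the class mid-point.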
To simplify calculations, the following formula for the mean may be more convenient to use. It is to be noted that it can be applied only when the widths of the classes are equal.

\[ \bar{X} = A + \frac{\sum fd}{N} \times i \]

where, A is an assumed mean, \( d = \frac{x - A}{i} \), and i is the size of the equal class interval.

This formula makes the computations very simple and takes less time. It eliminates the problem of large and inconvenient mid-points. To apply this formula, let us consider the data of the previous Illustration 5. Try to understand the procedure for obtaining the value of the mean, shown below. Assume A as 32.5.

Class Interval   Mid-point (X)   d = (X - 32.5)/5   Frequency (f)   fd
15-20            17.5            -3                   2              -6
20-25            22.5            -2                  23             -46
25-30            27.5            -1                  19             -19
30-35            32.5             0                  14               0
35-40            37.5             1                   5               5
40-45            42.5             2                   4               8
45-50            47.5             3                   3               9
                                                     N = 70          Σfd = -49

\[ \bar{X} = A + \frac{\sum fd}{N} \times i = 32.5 + \frac{-49}{70} \times 5 = 32.5 - 3.5 = 29 \]

Hence the mean daily wage is Rs. 29, as obtained earlier.
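The assumed-mean (step-deviation) shortcut can be sketched in Python as below, using the data of Illustration 5 with A = 32.5 and i = 5. The variable names are illustrative, and the sketch assumes that all classes have the same width.

```python
# Mid-points and frequencies from Illustration 5.
midpoints = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
frequencies = [2, 23, 19, 14, 5, 4, 3]

assumed_mean = 32.5   # A, chosen as one of the mid-points
class_width = 5       # i, identical for every class (required by this method)

# d = (x - A) / i for each mid-point.
deviations = [(x - assumed_mean) / class_width for x in midpoints]

# Sum of f*d and total frequency N.
sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
n = sum(frequencies)

# Mean = A + (sum of fd / N) * i
mean = assumed_mean + (sum_fd / n) * class_width

print("d values:", deviations)
print(f"Sum of fd = {sum_fd}, N = {n}")
print(f"Mean daily wage: Rs. {mean:.2f}")
```

Any mid-point may be chosen as A; the final mean does not depend on that choice, only the size of the intermediate numbers does.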
An important property of the arithmetic mean is that the means of several sets of data may be combined into a single mean for the combined sets of data. The combined mean may be defined as:

\[ \bar{X}_{12 \ldots n} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + \cdots + N_n \bar{X}_n}{N_1 + N_2 + \cdots + N_n} \]

If we have to combine the means of four sets of data, the above formula becomes:

\[ \bar{X}_{1234} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3 + N_4 \bar{X}_4}{N_1 + N_2 + N_3 + N_4} \]
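The combined-mean property can be checked with a small Python sketch. As an assumption made purely for illustration, the seven daily wages of Illustration 3 are split into two groups (the first three days and the last four days); the function name `combined_mean` is hypothetical.

```python
def combined_mean(groups):
    """Combine group means; `groups` is a list of (N_k, mean_k) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# Illustration 3 wages, split into two groups only for this demonstration.
first_three_days = [22, 25, 29]
last_four_days = [58, 30, 24, 23]

groups = [
    (len(first_three_days), sum(first_three_days) / len(first_three_days)),
    (len(last_four_days), sum(last_four_days) / len(last_four_days)),
]

from_group_means = combined_mean(groups)
from_raw_data = sum(first_three_days + last_four_days) / 7

print(f"Combined mean from group means: {from_group_means:.2f}")
print(f"Mean computed from all seven wages: {from_raw_data:.2f}")
```

Both routes give the same value, which is why the combined mean needs only the group sizes and group means, not the raw observations.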
Advantages and disadvantages of mean

The concept of the mean is familiar to most people and easily understood, and the mean possesses almost all the properties of a good measure of central tendency. However, the mean has disadvantages of which we must be aware. First, the value of the mean may be distorted by the presence of extreme values in a given data set, and in the case of a U-shaped distribution this measure is not likely to serve a useful purpose. Second, we are unable to compute the mean for open-ended classes, since it is difficult to assign a mid-point to an open-ended class. Third, it cannot be used for qualitative variables.
Weighted Mean
The arithmetic mean, as discussed above, gives equal importance (weight) to all the observations. But in some cases, all observations do not have the same weightage. In such a case, we must compute weighted mean. The term weight, in statistical sense, stands for the relative importance of the different variables. It can be defined as:
\[ \bar{x}_w = \frac{\sum Wx}{\sum W} \]

where, \(\bar{x}_w\) is the weighted mean and W are the weights assigned to the variables (x). The weighted mean is extensively used in index numbers; it will be discussed in detail in Unit 12 : Index Numbers, of this course. For example, to compute the cost of living index, we need the price index of different items and their weightages (percentage of consumption). The important issue that arises is the selection of weightages. If actual weightages are not available then estimated or arbitrary weightages may be used. This is better than no weightages at all. However, keeping the phenomenon in mind, the weightages are to be assigned logically. To understand this concept, let us take an illustration.
Illustration 6
Given below are price index numbers and weightages for different groups of items of consumption for an average industrial worker. Compute the cost of living index.

Group Item        Group Price Index   Weight
Food              150                 55
Clothing          186                 15
House rent        125                 17
Fuel and Light    137                  8
Others            184                  5

Solution: The cost of living index is obtained by taking the weighted average as explained in the table below:

Group Item        Group Price Index (Pi)   Weight (Wi)   Wi.Pi
Food              150                      55            8250
Clothing          186                      15            2790
House rent        125                      17            2125
Fuel and Light    137                       8            1096
Others            184                       5             920
                                           ΣW = 100      ΣWx = 15181

Therefore, the cost of living index is

\[ \bar{X}_w = \frac{\sum Wx}{\sum W} = \frac{15181}{100} = 151.81 \]
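The weighted-mean computation of Illustration 6 can be sketched in Python as follows. The dictionary layout is an assumption made for this example; the price indices and weights are those given in the illustration.

```python
# Illustration 6: group item -> (group price index, weight).
consumption_groups = {
    "Food": (150, 55),
    "Clothing": (186, 15),
    "House rent": (125, 17),
    "Fuel and Light": (137, 8),
    "Others": (184, 5),
}

# Weighted mean = (sum of W*x) / (sum of W).
sum_wx = sum(index * weight for index, weight in consumption_groups.values())
sum_w = sum(weight for _, weight in consumption_groups.values())
cost_of_living_index = sum_wx / sum_w

print(f"Sum of Wx = {sum_wx}, sum of W = {sum_w}")
print(f"Cost of living index = {cost_of_living_index:.2f}")
```

With equal weights the same formula reduces to the simple arithmetic mean of the five price indices.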
Self Assessment Exercise C

1) A student's marks in a Computer course are 69, 75 and 80 respectively in the papers on Theory, Practical and Project Work. What are the mean marks if the weights are 1, 2 and 4 respectively? What would be the mean marks if all the papers have equal importance? Use the following table.

Table: Computation of Weighted Mean Marks

Paper           Marks Percentage (X)   Weight (W)   W.X
Theory
Practical
Project Work

So, weighted mean is =

2) The following table gives the frequency distribution of monthly sales (in Rupees thousands) of 125 firms.

Table: Monthly Sales of 125 Firms

Monthly Sales (in thousands)   Number of Firms
0-150                           15
150-300                         22
300-450                         64
450-600                         11
600-750                          9
750-900                          4
All                            125

Compute the mean monthly sales of the firms and interpret the data. Since the class width is 150 for all the classes, the method of assumed mean is useful. The following table may be helpful.

Table: Computation of Average Monthly Sales of 125 Firms

Monthly Sales (in thousands)   Midpoint (X)   (X - A)/150 (d)   Number of Firms (f)   f.d

So, the average monthly sales =

3) The mean wage of 200 male workers in a factory was Rs. 150 per day, and the mean wages of 100 female workers and 50 children in the same factory were Rs. 90 and Rs. 35 respectively. What would be the combined mean wage of the workers? Comment on the result.
8.3.3 Median
The median is another measure of central tendency. The median is the middle value of the data, i.e. it measures the central item in the data. Half of the items lie above the median, and the other half lie below it.

Median of Ungrouped Data: To find the median from ungrouped data, first array the data either in ascending order or in descending order. If the number of observations (N) is odd, then the median is the middle value. If it is even, then the median is the mean of the middle two values. In formal language, the median is the

\[ \left( \frac{N + 1}{2} \right)^{\text{th}} \]

item in a data array, where N is the number of items. Let us consider the earlier Illustration 3 to locate the median value in two different sets of data.
Illustration-7
On arranging the daily wage data of the labourer (as given in Illustration 3) in ascending order, we get Rs. 22, 23, 24, 25, 29, 30, 58. The number of observations is odd (seven). According to the expression \(\left(\frac{N + 1}{2}\right)^{\text{th}}\) item, the middle (i.e. the fourth) number is the median. Here the median wage of the labourer is Rs. 25. You may notice that, unlike the mean we calculated earlier, the median calculated above was not distorted by the presence of the last value, i.e. Rs. 58. Even if this value had been Rs. 99, the median would have been the same.

Had there been one more observation, say Rs. 6, the order would have been as below:

Rs. 6, 22, 23, 24, 25, 29, 30, 58

There are eight observations, and the median is the \(\left(\frac{8 + 1}{2}\right)^{\text{th}}\) item = 4.5th item, i.e. the mean of the fourth and the fifth observations. So, median wage = (24 + 25)/2 = Rs. 24.5.
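The rule for ungrouped data can be sketched as a small Python function. The function name `median` is illustrative (Python's standard library offers `statistics.median` for the same purpose); the data are those of Illustration 7.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th item, i.e. index n // 2 when counting from 0.
        return ordered[middle]
    # Even N: average the two central items.
    return (ordered[middle - 1] + ordered[middle]) / 2


wages = [22, 25, 29, 58, 30, 24, 23]

print("Median of seven wages:", median(wages))
print("Median with Rs. 58 replaced by Rs. 99:", median([22, 25, 29, 99, 30, 24, 23]))
print("Median after adding Rs. 6:", median(wages + [6]))
```

The second call illustrates the point made above: changing the extreme value leaves the median unchanged.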
Median of Grouped Data: Now, let us calculate the median from grouped data. When the data is in the form of discrete series, the median can be computed by examining the cumulative frequency distribution, as is shown below.
Illustration-8
To compute the median wage from the data given in Illustration 4, we add one more row of cumulative frequency (the formation of cumulative frequency was discussed in Unit 6 of this course : Processing of Data).

Wage (in Rupees)           23  24  25  27  28  29  30  31  32  33  34
Frequency (No. of Days)     1   1   3   3   4   6   4   5   5   2   1
Cumulative Frequency        1   2   5   8  12  18  22  27  32  34  35

According to the formula, the median is the \(\left(\frac{N + 1}{2}\right)^{\text{th}}\) item, and the number of observations is 35. Therefore, the \(\left(\frac{35 + 1}{2}\right)^{\text{th}}\) item is the 18th item. Hence the 18th observation will be the median. By inspection of the cumulative frequencies it is clear that the median wage is Rs. 29.

This procedure is to be slightly modified for computation of the median from class interval data. The median is taken as the value of the variable corresponding to the (N/2)th observation. The class or group containing the median should be identified first, and the median is computed under the assumption that all the observations in that class are equally spaced. Symbolically, the expression for the median is given by:

\[ \text{Median} = L + \frac{N/2 - cf}{f} \times i \]

where, L is the lower limit of the median class, N is the number of observations, f is the frequency of the median class, cf is the cumulative frequency of the class next lower to the median class and i is the width of the median class. Let us consider the data given in the earlier Illustration 5 to study the median.
Illustration 9: Compute median wage for the data given in Illustration 5.
Solution: The approach is quite similar to the previous example. As indicated, it will be implicitly assumed that the wages of the labourers in the group containing the median are equally spaced.

Class Interval (wages in Rs.)   Frequency (f)   Cumulative frequency (cf)
(1)                             (2)             (3)
15-20                            2               2
20-25                           23              25
25-30                           19              44
30-35                           14              58
35-40                            5              63
40-45                            4              67
45-50                            3              70
                                N = 70

Here, the number of observations is 70. So the median corresponds to the \(\left(\frac{70}{2}\right)^{\text{th}}\), i.e. the 35th, value of the variable. The 35th observation falls within the cumulative frequency of 44 in column (3). Hence, it is clear that the median lies in the third class interval, i.e. Rs. 25 to Rs. 30. So we have to locate the position of the 35th observation in the class 25-30. Here,

N/2 = 35, L = 25, cf = 25, f = 19 and i = 30 - 25 = 5.

Thus the median is:

\[ L + \frac{N/2 - cf}{f} \times i = 25 + \frac{35 - 25}{19} \times 5 = \text{Rs. } 27.63 \]
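The interpolation used in Illustration 9 can be sketched in Python as below. The function name `grouped_median` and the tuple layout (lower limit, upper limit, frequency) are assumptions made for this example; the sketch expects classes listed in ascending order and assumes, as the text does, that observations are equally spaced within the median class.

```python
def grouped_median(classes):
    """Median = L + ((N/2 - cf) / f) * i for a continuous frequency distribution."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequency distribution is empty")
    half = n / 2
    cumulative = 0  # cf: cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # This is the median class; interpolate within it.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


# Data of Illustration 5: (lower limit, upper limit, number of labourers).
wage_classes = [
    (15, 20, 2), (20, 25, 23), (25, 30, 19), (30, 35, 14),
    (35, 40, 5), (40, 45, 4), (45, 50, 3),
]

median_wage = grouped_median(wage_classes)
print(f"Median daily wage: Rs. {median_wage:.2f}")
```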
It is to be noted that the median value may also be located with the help of graph by drawing ogives or a less than cumulative frequency curve. This method was discussed in detail in Unit 7 : Diagrammatic and Graphic Presentation, of this block.
Advantages and disadvantages of median

The biggest advantage of median is that extreme observations do not affect it. For computation of the median, it is not necessary to know all the observations and this property comes in handy when there are open-ended classifications of data. This is also suitable for qualitative variables, which can be ordered or arranged in ascending or descending order (ordinal variables). However, it requires data to be arranged before computation. It is not amenable to arithmetic and algebraic manipulations. For example, if M1 and M2 are medians of two different sets of data, we cannot get the median of the combined data set from M1 and M2.
Some Additional Points: The median divides the distribution into two equal parts. When a distribution is divided into four equal parts, the dividing values are called quartiles. Similarly, there are deciles (dividing into ten equal parts), percentiles (dividing into hundred equal parts), etc. The general term for all of them is fractile. In Unit 9 of this block, we will learn more about quartiles.
Self Assessment Exercise D

1) Refer to the data in Self Assessment Exercise C, No. 2. Obtain the median monthly sales of the firms. The following table may be helpful.

Table: Computation of Median Sales of 125 Firms

Monthly Sales (in thousands)   Number of Firms (f)   Cumulative frequency
8.3.4 Mode
Mode is also a measure of central tendency. This measure is different from the arithmetic mean and, to some extent, like the median, because it is not really calculated by the normal process of arithmetic. The mode of the data is the value that appears the maximum number of times. In ungrouped data, for example, suppose the foot sizes (in inches) of ten persons are as follows: 5, 8, 6, 9, 11, 10, 9, 8, 10, 9. Here the number 9 appears thrice. Therefore, the modal foot size is 9 inches.

In grouped data the method of calculating the mode differs between discrete and continuous distributions. In discrete data, for example in the earlier Illustration 4, the modal wage is Rs. 29, as it is the wage for the maximum number of days, i.e. six days. For continuous data, we usually refer to the modal class or group as the class with the maximum frequency (as per the observation approach). The mode of a continuous distribution may then be computed using the expression:

\[ \text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i \]

where, L = lower limit of the modal class, i = width of the modal class, \(\Delta_1\) = excess of the frequency of the modal class (f1) over the frequency of the preceding class (f0), and \(\Delta_2\) = excess of the frequency of the modal class (f1) over the frequency of the succeeding class (f2). The letter \(\Delta\) is read as delta. Note that

\[ \Delta_1 = f_1 - f_0 \quad \text{and} \quad \Delta_2 = f_1 - f_2 \]

It is to be noted that while using the formula for the mode, the class intervals must be of uniform width throughout; otherwise you will get misleading results. To illustrate the computation of the mode, let us consider the following illustration.
Illustration 10
Compute the mode from the following data.

Daily wages (Rs.)   No. of workers (f)
15-20               12
20-25               23
25-30               19
30-35               14
35-40                5
40-45                4
45-50                3

Solution: The maximum frequency, 23, is in the class 20-25. Therefore, based on the observation method, the class 20-25 is the modal class. Applying the formula, we get

\[ \text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i, \qquad \Delta_1 = f_1 - f_0, \quad \Delta_2 = f_1 - f_2 \]

The related values are as follows: L = 20; f0 = 12; f1 = 23; f2 = 19; and i = 5, so that \(\Delta_1 = 11\) and \(\Delta_2 = 4\).

\[ \text{Mode} = 20 + \frac{11}{11 + 4} \times 5 = 20 + \frac{11}{15} \times 5 = 20 + 3.67 = \text{Rs. } 23.67 \]

Hence the modal daily wage is Rs. 23.67.

In a continuous frequency distribution, the value of the mode can also be located graphically. We have already discussed the procedure for locating the mode graphically in Unit-7 of this block.
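The mode formula for a continuous distribution can be sketched in Python as below. The function name `grouped_mode` is hypothetical. Two choices in the sketch are assumptions not stated in the text: if the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and if several classes share the maximum frequency, the first one is used. The sketch also assumes equal class widths, as the text requires.

```python
def grouped_mode(classes):
    """Mode = L + (d1 / (d1 + d2)) * i, with classes as (lower, upper, frequency)."""
    frequencies = [f for _, _, f in classes]
    k = frequencies.index(max(frequencies))   # position of the modal class
    lower, upper, f1 = classes[k]
    # Assumption: a missing neighbour (modal class at either end) counts as frequency 0.
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    delta1 = f1 - f0   # excess over the preceding class
    delta2 = f1 - f2   # excess over the succeeding class
    return lower + delta1 / (delta1 + delta2) * (upper - lower)


# Data of Illustration 10: (lower limit, upper limit, number of workers).
wage_classes = [
    (15, 20, 12), (20, 25, 23), (25, 30, 19), (30, 35, 14),
    (35, 40, 5), (40, 45, 4), (45, 50, 3),
]

modal_wage = grouped_mode(wage_classes)
print(f"Modal daily wage: Rs. {modal_wage:.2f}")
```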
Advantages and Disadvantages of Mode

Extreme values of observations do not affect the mode, and its value can be determined even in open-ended classes. This measure is also suitable for qualitative variables (both nominal and ordinal). However, the mode may not be unique: there may be more than one mode, or no mode at all (no value that occurs more than once). In such cases it is difficult to interpret and compare the distributions. The mode is also not amenable to arithmetic and algebraic manipulation. For example, we cannot get the mode of a combined data set from the modes of the constituent data sets.
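The possibility of one mode, several modes, or no mode in ungrouped data can be seen in a short Python sketch. The helper name `modes` is hypothetical; the first data set is the foot-size example given earlier in this section, while the other two lists are made-up values used only to show the other cases.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once: no mode
    return sorted(v for v, c in counts.items() if c == top)


foot_sizes = [5, 8, 6, 9, 11, 10, 9, 8, 10, 9]
print(modes(foot_sizes))       # [9]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (hypothetical bimodal data)
print(modes([4, 7, 9]))        # []      (hypothetical data with no mode)
```

Because the function only counts occurrences, it works equally well for qualitative data such as a list of blood groups.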
Self Assessment Exercise E

Refer to the data in Self Assessment Exercise C No. 1. Obtain the mode of monthly sales of the firms.
L = ........, f0 = ........, f1 = ........, f2 = ........ and i = ........
Hence, the mode is given by:
.......................................................................................................................
.......................................................................................................................
Comparing the Mean, Median, and Mode
For a moderately skewed distribution, it has been empirically observed that the difference between Mean and Mode is approximately three times the difference between Mean and Median. This is illustrated in Fig. 8.1 (b) and (c). The expression is:

$$\text{Mean} - \text{Mode} = 3\,(\text{Mean} - \text{Median})$$

Alternately,

$$\text{Mode} = 3\,(\text{Median}) - 2\,(\text{Mean})$$

Sometimes this expression is used to calculate the value of one measure when the values of the other two measures are known.
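As a rough sketch, the empirical relationship can be rearranged to estimate any one of the three measures from the other two. The function names and the numbers below are hypothetical and serve only to show the arithmetic; the relationship itself is an approximation for moderately skewed data, not an exact identity.

```python
def empirical_mode(mean, median):
    """Approximate mode from Mean - Mode = 3(Mean - Median)."""
    return 3 * median - 2 * mean


def empirical_median(mean, mode):
    """Approximate median from the same relationship, solved for the median."""
    return (2 * mean + mode) / 3


# Hypothetical values for a moderately skewed distribution.
mean, median = 30.0, 28.0
mode = empirical_mode(mean, median)
print(mode)                          # 24.0
print(empirical_median(mean, mode))  # 28.0
```

Here Mean - Mode = 6, which is three times Mean - Median = 2, as the relationship requires.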
8.3.5 Choice of a Suitable Average

We have already discussed the advantages and disadvantages of three different types of averages: Mean, Median, and Mode. Here, we discuss their appropriateness in terms of the following three factors: (1) the level of measurement of data, (2) the shape of the distribution, and (3) the stability of the measure of the average.
Levels of measurement: There are four levels of measurement of data: nominal, ordinal, interval and ratio. At the nominal level, the observations can only be distinguished or differentiated but cannot be arranged in any order. Examples are colours of cars, types of blood groups, brands of a consumer good, etc. At the ordinal level, the observations can be arranged in ascending or descending order, but no arithmetic operations are possible. While describing the existing business climate, the respondents may say very good, good, medium, bad or very bad. This is an example of ordinal data. At the interval level, it is assumed that a given interval on the scale measures the same amount of difference, irrespective of where the interval appears. There is a zero, but it is arbitrary and is not of much significance. For example, the temperature difference between 50°C and 60°C is the same as the temperature difference between 10°C and 20°C, but a temperature of 0°C does not mean absence of heat. Variables like height and weight are examples of the ratio level of measurement. Here, a value which is twice as large as another value corresponds to twice the amount of the variable, and the scale has an absolute zero. We say that a 10-metre tower is twice as tall as a 5-metre tower, but we never mean that a temperature of 40°C is twice as hot as a temperature of 20°C. From the above discussion, it is clear that for nominal data only the mode can be used, for ordinal data both mode and median can be used, whereas for interval and ratio levels of data all three measures can be calculated.
Shape of the distribution: If the distribution of data is symmetric with only one peak, mean, median, and mode are the same. Even in case of two modes, mean and median will be the same. For asymmetric distribution, all these are different. For positively skewed distribution, mode is the smallest and median lies between mode and mean whereas for negatively skewed distribution the pattern is just the opposite. (It is discussed elaborately in Unit 9, Section 9.5.) Thus, in either of the cases, median appears to be a better measure of central tendency. Figure 8.1 shows three different shapes of a distribution.
[Figure 8.1: Three shapes of a distribution. (a) Symmetric: Mo = Me = $\bar{X}$. (b) and (c) Skewed: mode, median and mean fall at different points.]
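The ordering of the three averages under skewness can be checked on a small sample. The sketch below uses Python's standard `statistics` module on made-up data (the list `sample` is hypothetical, chosen to have a long right tail); mirroring the values produces a negatively skewed sample with the opposite ordering.

```python
import statistics

# Hypothetical positively skewed sample: most values are small, one is large.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9]

mean = statistics.mean(sample)      # 31/9, about 3.44
median = statistics.median(sample)  # 3
mode = statistics.mode(sample)      # 2
print(mode < median < mean)         # True

# Mirroring the sample gives a negatively skewed one: the order reverses.
mirrored = [10 - x for x in sample]
m_mean = statistics.mean(mirrored)
m_median = statistics.median(mirrored)
m_mode = statistics.mode(mirrored)
print(m_mean < m_median < m_mode)   # True
```

In both cases the median lies between the mode and the mean, which is why it is often preferred for skewed data.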
Stability: Quite often a researcher studies a sample to infer about the entire population. Mean is generally more stable than median or mode. If we calculate means, medians and modes of different samples from a population, the means will generally be more in agreement than the medians or the modes. Thus, mean is a more reliable measure of central tendency. Normally, the choice of a suitable measure of central tendency depends on the common practice in a particular industry. According to its requirement, each case must be judged independently. For example, the mean sales of different products may be useful for many business decisions. The median price of a product is more useful to middle class families buying a new product. The mode may be a more useful measure for the garment industry, which needs to know the modal height of the population to decide the quantity of garments to be produced in different sizes. Hence, the choice of the measure of central tendency depends on (1) the type of data, (2) the shape of the distribution, and (3) the purpose of the study. Whenever possible, all the three measures can be computed. This will indicate the type of distribution.
8.3.6 Some Other Measures of Central Tendency

Sometimes two other measures of central tendency, the geometric mean and the harmonic mean, are also used. They are briefly discussed here.
Geometric Mean (GM): The geometric mean is defined as the Nth root of the product of all the N observations. It may be expressed as:

$$\text{GM} = \sqrt[N]{x_1 \times x_2 \times \cdots \times x_N}$$

Thus, the geometric mean of the four numbers 2, 5, 8 and 10 is given by $\sqrt[4]{2 \times 5 \times 8 \times 10} = \sqrt[4]{800} = 5.3183$. If one observation is zero, the geometric mean becomes zero and is hence inappropriate. If some values are negative, the geometric mean can sometimes be computed but may be meaningless. Geometric mean is appropriate for variables that reproduce themselves. Suppose the population of a country in the years 1990 and 2000 is 100 and 121 million respectively. The average population in the decade is $\sqrt{100 \times 121}$ million, or 110 million. Probably the most frequent use of the geometric mean is to find the average rate of change. This could be the average percent change of population, compound interest, growth rate, etc.
Harmonic Mean (HM): The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations. In other words, it is the ratio of the number of observations to the sum of the reciprocals of the values. It may be expressed as:

$$\text{HM} = \frac{N}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_N}} = \frac{N}{\sum \dfrac{1}{x}}$$

For example, the harmonic mean of 4 and 6 is 2 / (1/4 + 1/6) = 2 / (5/12) = 2 / 0.4167 = 4.8. Suppose a car moves half the distance at a speed of 60 km/hr and the other half at a speed of 80 km/hr. Then the average speed of the car is 68.57 km/hr, which is the harmonic mean of 60 and 80. Harmonic mean is useful in averaging rates.
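The examples above can be reproduced with a short Python sketch. The function names are our own, and the leg length of 120 km is an arbitrary assumption: any equal distance for the two halves gives the same average speed, which is what makes the harmonic mean the right average here.

```python
import math


def geometric_mean(values):
    """Nth root of the product of N positive values."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)


def average_speed(speeds, leg_distance):
    """Total distance over total time when each leg has the same length."""
    total_distance = leg_distance * len(speeds)
    total_time = sum(leg_distance / s for s in speeds)
    return total_distance / total_time


print(round(geometric_mean([2, 5, 8, 10]), 4))  # 5.3183
print(round(geometric_mean([100, 121]), 2))     # 110.0
print(round(harmonic_mean([4, 6]), 2))          # 4.8

speeds = [60, 80]  # km/hr on each half of the journey
print(round(average_speed(speeds, leg_distance=120), 2))  # 68.57
print(round(harmonic_mean(speeds), 2))                    # 68.57
print(sum(speeds) / len(speeds))                          # 70.0
```

The simple arithmetic mean of the two speeds, 70 km/hr, overstates the true average because the car spends more time on the slower half.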
For any set of data wherever computation is possible, the following inequality holds:

$$\bar{x} \geq \text{GM} \geq \text{HM}$$

The equality signs hold only when all the observations are equal.
Illustration-11
Compute the arithmetic, geometric and harmonic means of 4, 5, 10 and 11 and verify the above relationship.
Arithmetic Mean, A = (4 + 5 + 10 + 11)/4 = 30/4 = 7.50
Geometric Mean, G = $\sqrt[4]{4 \times 5 \times 10 \times 11} = \sqrt[4]{2200} = 6.85$
Harmonic Mean, H = 4/(1/4 + 1/5 + 1/10 + 1/11) = 4/0.6409 = 6.24
Since 7.50 > 6.85 > 6.24, the relationship discussed above is verified. It is also possible to compute weighted geometric and harmonic means.
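Illustration 11 can be cross-checked with Python's standard `statistics` module. This sketch assumes Python 3.8 or later, where `statistics.geometric_mean` is available. When the reciprocals are not rounded before dividing, the harmonic mean comes to 6.24 to two decimals.

```python
import statistics

values = [4, 5, 10, 11]

am = statistics.mean(values)            # 7.5
gm = statistics.geometric_mean(values)  # about 6.85
hm = statistics.harmonic_mean(values)   # about 6.24

print(round(am, 2), round(gm, 2), round(hm, 2))
print(am >= gm >= hm)  # True
```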
8.4 LET US SUM UP

In order to draw meaningful and useful conclusions, the collected data must be analysed with the help of statistical derivatives like percentages, ratios and rates. They give meaningful insight with very little computation. A ratio expresses the relationship between the magnitudes of two or more quantities. It is generally stated as A : B : C. Proportion is the ratio of any one category to the total of all the categories. It is a better derivative to use when the number of categories increases. A rate is usually expressed per 100, per 1,000, etc. A measure of central tendency gives one representative value, around which the data set is clustered. Three widely used measures are discussed in detail. Mode is the simplest of all, but at times it is not defined. Median divides the observations into two equal parts and is particularly suitable for open-ended data. Arithmetic mean is calculated based on all the observations but gets affected by extreme values. For qualitative data, however, the mean cannot be computed. Mean, median and mode together show the type of distribution of data. Measures of central tendency are also called measures of location.
8.5 KEY WORDS

Arithmetic Mean : This equals the sum of all the values divided by the number of observations.
Bimodal Distribution : In a distribution, when two values occur more frequently in equal number.
Geometric Mean : It refers to Nth root of the product of all the N observations.
Harmonic Mean : This is the reciprocal of the arithmetic mean of the reciprocals of the given values.
Mean : Usually refers to Arithmetic Mean or Average.
Median : The middlemost observation in a data set when arranged in order.
Mode : The most frequent value occurring in a data set. It is represented by the highest point in the distribution curve of a data set.
Percentage : It gives the magnitude of the numerator when denominator of a ratio becomes hundred.
Rate : Amount of one variable per unit amount of some other variable.
Ratio : Relative value of one value with respect to another value.
Weighted Mean : An average in which each observation value is weighted by some index of its importance.
8.6 ANSWERS TO SELF ASSESSMENT EXERCISES
A: 2. With reference to the original table, the all India figures are the totals of all the state figures. Thus a column percentage gives the share of a state from among all the states of India in respect of the category of workers. The column percentages are given below.

Table: State-wise Percentage Share of Total Workers and Categories of Workers in All India: 2001

Sl. No.  State/India        Cultivators  Agricultural Labourers  Household Industry Workers  Other Workers  Total
1        Jammu & Kashmir      1.25          0.23                    1.40                        1.07           0.92
2        Himachal Pradesh     1.54          0.09                    0.31                        0.59           0.74
3        Punjab               1.64          1.40                    1.87                        3.47           2.27
4        Haryana              2.39          1.19                    1.26                        2.55           2.08
5        Rajasthan           10.32          2.35                    3.97                        4.92           5.91
6        Uttar Pradesh       17.37         12.66                   17.60                       10.27          13.46
7        Bihar                6.42         12.59                    6.63                        3.49           6.98
8        Assam                2.93          1.20                    2.00                        2.78           2.37
9        West Bengal          4.40          6.84                   13.13                        9.52           7.33
10       Orissa               3.32          4.65                    4.20                        2.88           3.55
11       Madhya Pradesh       8.66          6.87                    6.16                        4.18           6.40
12       Gujarat              4.40          4.64                    2.33                        6.21           5.06
13       Maharashtra          9.41         10.51                    6.38                       11.72          10.45
14       Andhra Pradesh       6.19         12.86                    9.57                        7.66           8.66
15       Karnataka            5.43          5.78                    5.71                        6.25           5.84
16       Kerala               0.58          1.54                    2.22                        4.99           2.56
17       Tamil Nadu           4.01          8.06                    8.90                        8.32           6.91
         INDIA              100.00        100.00                  100.00                      100.00         100.00
The interpretation is straightforward. Out of all the workers in all India, 13.46 percent are in Uttar Pradesh and 10.45 percent in Maharashtra. Andhra Pradesh has the highest share of Agricultural Labourers (12.86%), followed by Uttar Pradesh (12.66%) and Bihar (12.59%). The lowest share of Household Industry workers is in Himachal Pradesh, etc.
C: 1) The weighted mean is = 539/7 = 77. If all the papers have equal importance, i.e. equal weightage, then the simple mean = 224/3 = 74.67.
2) Since the class width is 150 for all the classes, the method of assumed mean is useful. On observation, assumed mean is taken as 375.

$$\bar{x} = A + \frac{\sum fd}{N} \times i$$

Mean sales of 125 firms is Rs. 361.8 thousands.
3)

$$\bar{x}_{123} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2 + N_3 \bar{x}_3}{N_1 + N_2 + N_3} = 116.43$$

D:

$$\text{Median} = L + \frac{N/2 - c.f.}{f} \times i; \qquad M_e = 359.76$$

E:

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

Modal sales value is Rs. 363.64 thousands.
8.7 TERMINAL QUESTIONS/EXERCISES

1) Explain the concept of central tendency with the help of an example. What purpose does it serve?
2) A representative value of a data set is a number indicating the central value of that data. To what extent is it true for Mean, Median, and Mode? Explain with illustrations.
3) Discuss the merits and limitations of various measures of central tendency.
4) The following table gives workers of India (in thousands) as per 2001 census. Compute suitable percentages and interpret them.

Table: Total Workers and Their Categories-India : 2001 (In thousands)

S. No.  Total/Rural/Urban  Persons/Males/Females  Cultivators  Agricultural Labourers  Household Industry Workers  Other Workers  Total Workers
(1)     (2)                (3)                    (4)          (5)                     (6)                         (7)            (8)
1       Rural              Persons                124682       103122                  11710                        71142         310655
2                          Males                   84047        54749                   5642                        54762         199200
3                          Females                 40635        48373                   6067                        16380         111456
4       Urban              Persons                  2946         4326                   4686                        79899          91857
5                          Males                    2282         2605                   2670                        68707          76264
6                          Females                   664         1721                   2016                        11191          15593
7       Total              Persons                127628       107448                  16396                       151040         402512
8                          Males                   86328        57354                   8312                       123469         275464
9                          Females                 41300        50093                   8084                        27571         127048

Source: Census of India, 2001.

5) The monthly salaries (in Rupees) of 11 staff members of an office are: 2000, 2500, 2100, 2400, 10000, 2100, 2300, 2450, 2600, 2550 and 2700. Find the mean, median and mode of the monthly salaries. Which one among the above do you consider the best measure of central tendency for the above data set and why?
6) Consider the data set given in problem 5 above. Find the mean deviation of the data set from (i) median (ii) 2400 and (iii) 2500. Find the mean squared deviation of the data set from (i) mean (ii) 3000 and (iii) 3100.
7) Mean examination marks in Mathematics in three sections are 68, 75 and 72, the number of students being 32, 43 and 45 respectively in these sections. Find the mean examination marks in Mathematics for all the three sections taken together.
8) The following are the volumes of sales (in Rupees) achieved in a month by 25 marketing trainees of a firm:

1220  1450  1800  1280   300
 475  1700  1800  1200  1400
 200   600   400  1150  1200
 350  1225  1200  1300  1550
1300  1100   450  1400  1200

The firm has decided to give the trainees some performance bonus as per the following rule: Rs. 100 if the volume of sales is below Rs. 500; Rs. 250 if the volume of sales is between Rs. 500 and Rs. 1000; Rs. 400 if the volume of sales is between Rs. 1000 and Rs. 1500; and Rs. 600 if the volume of sales is above Rs. 1500. Find the average value of performance bonus of the trainees.
9) In an urban cooperative bank, the minimum deposit in a savings bank is Rs. 500. The deposit balance at the end of a working day is given in the table below:

Table: Average Deposit Balance in ABC Urban Cooperative Bank

S. No.  Deposit Balance       Number of Deposits
1       Less than Rs. 10000   982
2       Less than Rs. 9000    959
3       Less than Rs. 8000    874
4       Less than Rs. 7000    773
5       Less than Rs. 6000    621
6       Less than Rs. 5000    395
7       Less than Rs. 4000    295
8       Less than Rs. 3000    145
9       Less than Rs. 2000     25
10      Less than Rs. 1000     10

Calculate mean, median and mode from the above data.
10) Refer to the table given in the previous problem. Compute (a) median and (b) mode by graphical approach.
11) Refer to the problem 8. Compute the approximate value of mode using the relationship: Mean - Mode = 3 (Mean - Median), and compare with the computed value obtained earlier.
Note: These questions/exercises will help you to understand the unit better. Try to write answers for them. But do not submit your answers to the university for assessment. These are for your practice only.
8.8 FURTHER READING

The following text books may be used for more in-depth study of the topics dealt with in this unit.
Gupta, S P and M P Gupta, 1988. Business Statistics, S Chand, New Delhi.
Hooda, R P, 2001. Statistics for Business and Economics, Macmillan India Limited, New Delhi.
Levin, R I and D S Rubin, 1998. Statistics for Management, Prentice Hall India, New Delhi.
Spiegel, M R, 1992. Statistics, Schaum's Outline Series, McGraw Hill, Singapore.
