Statistics and Truth
Putting Chance to Work
Second Edition
Pennsylvania State University, USA
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
Printed in Singapore
For instilling in me
the quest for knowledge
I owe to
my mother
A. Laxmikanthamma
who, in my younger days, woke me up
every day at four in the morning and
lit the oil lamp for me to study in
the quiet hours of the morning
when the mind is fresh
***
All knowledge is, in final analysis, history.
All sciences are, in the abstract, mathematics.
All judgements are, in their rationale, statistics.
Foreword
Beginning this year, CSIR has instituted a distinguished
lectureship series. The objective here is to invite eminent scientists
from India and abroad for delivering a series of three lectures on
topics of their choice. The lectures, known as the CSIR Distinguished
Lectures, were to be delivered in different locations of the country.
The first of this series has been dedicated to the memory of the
mathematical genius Srinivasa Ramanujan.
It augurs well that this first set of lectures (CSIR Ramanujan
lectures) has begun with those of Prof. C. Radhakrishna Rao,
National Professor (and currently Eberly Professor of Statistics, Penn
State University), a distinguished scientist in the international statistics
scene.
The lectures were delivered at the National Physical
Laboratory in Delhi, at the Central Leather Research Institute in
Madras and at the Indian Statistical Institute in Calcutta, and were
widely appreciated by professional statisticians, by physicists,
chemists and biological scientists, by students of different age groups
and by professionals and administrators. The scope of these lectures
was wide and pervaded many areas of human activities, both scientific
and administrative.
By arranging to have the lectures published now, CSIR hopes
that a wider community of scientists the world over will be able to
derive the benefit of the expertise of renowned men like Prof. Rao.
I express my appreciation of the efforts of Dr. Y.R. Sarma for
having edited and brought out the volume quickly.
A.P. MITRA
Director-General
Council of Scientific &
Industrial Research
New Delhi,
December 31, 1987
Uncertain knowledge + Knowledge of the amount of uncertainty in it = Usable knowledge
Preface
I consider it a great honour to be called upon to deliver the
Ramanujan Memorial Lectures under the auspices of the CSIR
(Council of Scientific & Industrial Research). I would like to thank
Dr. A.P. Mitra, Director General of the CSIR, for giving me this
honour and opportunity to participate in the Ramanujan centenary
celebrations.
I gave three lectures, the first one in Delhi, the second in
Calcutta and the third in Madras as scheduled, which I have written
up in four chapters for publication. In the beginning of each lecture
I have said a few words about the life and work of Ramanujan, the
rare mathematical genius who was a legendary figure in my younger
days. This is to draw the attention of the younger generation to the
achievements of Ramanujan, and to emphasize the need to reform our
educational system and reorganize our research institutes to encourage
creativity and original thinking among the students.
When I was a student, statistics was in its infancy and I have
closely watched its evolution over the last 50 years as an independent
discipline of great importance and a powerful tool in acquiring
knowledge in any field of enquiry. The reason for such phenomenal
developments is not difficult to seek.
Statistics as a method of learning from experience and decision
making under uncertainty must have been practiced from the
beginning of mankind. But the inductive reasoning involved in these
processes had never been codified because of the uncertain nature of the
conclusions drawn from given data or information. The breakthrough
occurred only in the beginning of the present century with the
realization that inductive reasoning can be made precise by specifying
the amount of uncertainty involved in the conclusions drawn. This
paved the way for working out an optimum course of action involving
minimum risk, in any given uncertain situation, by a purely deductive
process. Once this mechanism was made available, the flood gates
...
years ago. Since the latter book appeared, there have been new
developments in our thinking and attitude towards chance. We have
become reconciled to the "Dice-Playing God" and learnt to plan our lives
in resonance with the uncertainties around us. We have begun to
understand and accept the beneficial role of chance in situations
beyond our control or too complex to deal with. To emphasize
this I have chosen the subtitle, Putting Chance to Work.
Dr. Joshi, the Director of the National Physical Laboratory,
reminded me what Thomas Huxley is reported to have said, that a
man of science past sixty does more harm than good. Statistically it
may be so. As we grow old we tend to stick to our past ideas and try
to propagate them. This may not be good for science. Science
advances by change, by the introduction of new ideas. These can
arise only from uninhibited young minds capable of conceiving what
may appear to be impossible but which may contain a nucleus for
revolutionary change. But I am trying to follow Lord Rayleigh, who
was an active scientist throughout his long life. At the age of sixty-seven (which is exactly my present age), when asked by his son, who
was also a famous physicist, to comment on Huxley's remark, Rayleigh
responded:
That may be if he undertakes to criticize the work of younger men, but I
do not see why it need be so if he sticks to things he is conversant with.
However, J.B.S. Haldane used to say that Indian scientists are polite
and they do not criticize the work of each other, which is not good
for the progress of science.
It gives me great pleasure to thank Dr. Y.R.K. Sarma of the
Indian Statistical Institute for the generous help he has given in
editing the Ramanujan Memorial Lectures I gave at various places in
the form of a book and looking after its publication.
Calcutta
December 12, 1987
C.R. Rao
Chapter 5 deals with the ubiquity of statistics as an inevitable tool in
search of truth in any investigation whether it is for unravelling the
mysteries of nature or for taking optimum decisions in daily life or
for settling disputes in courts of law.
We are all living in an information age, and much of the
information is transmitted in a quantitative form such as the
following. The crime rate this year has gone down by 10% compared
to last year. There is a 30% chance of rain tomorrow. The Dow Jones
index of stock market prices has gained 50 points. Every fourth child
born is Chinese. The percentage of people approving the President's
foreign policy is 57 with a margin of error of 4 percentage points.
You lose 8 years of your life if you remain unmarried. What do these
numbers mean to the general public? What information is there in
these numbers to help individuals make the right decisions to
improve the quality of their lives? An attempt is made in Chapter 6
of the new edition to emphasize the need for public understanding of
statistics, and what we can learn from numbers to be efficient
citizens, as emphasized by H.G. Wells:
Statistical thinking will one day be as necessary for efficient citizenship as
the ability to read and write.
University Park,
March 31, 1997
C.R. Rao
Contents
CHAPTER 1 Uncertainty, Randomness and Creation of New Knowledge
CHAPTER 4 Weighted Distributions - Data with Built-in Bias
1. Specification
2. Truncation
3. Weighted distributions
4. P.P.S. sampling
5. Weighted binomial distribution: empirical theorems
6. Alcoholism, family size and birth order
7. Waiting time paradox
8. Damage models
References
2.16 Machine building factories to increase food production
2.17 The missing decimal numbers
2.18 The Rhesus factor: a study in scientific research
2.19 Family size, birth order and I.Q.
References
CHAPTER 6 Public Understanding of Statistics: Learning from Numbers
APPENDIX
Srinivasa Ramanujan - a rare phenomenon
Index
Chapter 1
...
toss, you get a random sequence of what are called binary digits (0s
and 1s). Such a sequence can also be obtained by drawing beads
from a bag containing black and white beads in equal numbers,
writing, say 0 for black bead drawn and 1 for white. When I was
teaching the first year class at the Indian Statistical Institute, I used
to send my students to the Bon-Hooghly Hospital near the Institute in
Calcutta to get a record of successive male and female births
delivered. Writing M for a male birth and F for a female birth, we get
a binary sequence like the one obtained above by repeatedly tossing a
coin or drawing beads. One is a natural sequence arising from a biological
phenomenon and the other is an artificially generated one.
Table 1.1 gives a sequence of the colors of 1000 beads drawn
with replacement from a bag containing equal numbers of white (W)
and black (B) beads. Table 1.2 gives a sequence of 1000 children
delivered in a hospital according to the sex of the child, male (M) or
female (F). We can summarize the data of Tables 1.1 and 1.2 in the
form of what are called frequency distributions. The frequencies of
0, 1, 2, 3, 4, 5 males in sets of 5 consecutive births and of white
beads in sets of 5 consecutive draws of beads are given in Table 1.3.
The expected frequencies are theoretical values which are
realizable on the average if the experiment with 200 trials is repeated
a large number of times. The frequencies can be represented
graphically in the form of what are called histograms.
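The expected frequencies in Table 1.3 come from the binomial distribution. If each birth (or draw) independently gives a male (or a white bead) with probability 1/2, the chance of exactly k in a set of 5 is C(5, k)/2^5, and multiplying by the 200 sets gives the expected count. A minimal Python sketch of this arithmetic; the function name and its default arguments are our own illustrative choices:

```python
from math import comb


def expected_frequencies(n_sets: int = 200, set_size: int = 5, p: float = 0.5) -> list[float]:
    """Expected number of sets containing k successes, k = 0..set_size (binomial model)."""
    return [
        n_sets * comb(set_size, k) * p**k * (1 - p) ** (set_size - k)
        for k in range(set_size + 1)
    ]


print(expected_frequencies())  # [6.25, 31.25, 62.5, 62.5, 31.25, 6.25]
```

The values sum to 200, the number of sets, as they must.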
It is seen that the two histograms are similar, indicating that
the chance mechanism of sex determination of a child is the same as
that of drawing a black or a white bead from a bag containing equal
numbers of beads of the two colors or similar to that of coin tossing.
A simple exercise such as the above can provide the basis for
formulating a theory of sex determination. God is tossing a coin! In
fact statistical tests showed that the male-female births provide a more
faithful random binary sequence than the artificially generated one.
Perhaps God is throwing a more perfect coin. In India one child is
born every second, which provides a cheap and expeditious source for
generating binary random sequences.
[Table 1.1: Colors of 1000 beads drawn with replacement from a bag containing equal numbers of white (W) and black (B) beads.]
[Figure: Histograms (n = 200).]
[Table 1.2: Sex of 1000 children delivered in a hospital, male (M) or female (F).]
The survey was conducted by Srilekha Basu, a first year student. The
data refer to births in some months in 1956.
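Table 1.3 Frequencies of the number of male children in sets of 5
consecutive births and of white beads in sets of 5 consecutive draws

Number        Male children    White beads    Expected
0                   5               4            6.25
1                  27              34           31.25
2                  65              65           62.50
3                  64              70           62.50
4                  30              22           31.25
5                   9               5            6.25
Total             200             200          200.00
Chi-square        2.22            5.04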
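The statistical tests mentioned earlier in this section can be illustrated with Pearson's chi-square statistic, the sum of (observed - expected)^2 / expected over the six classes. The sketch below uses the counts as laid out in Table 1.3; the helper name `chi_square` is ours. With 6 classes and no estimated parameters there are 5 degrees of freedom, and both values lie well below 11.07, the 5% point of the chi-square distribution with 5 degrees of freedom, so neither sequence departs noticeably from the coin-tossing model.

```python
def chi_square(observed: list[int], expected: list[float]) -> float:
    """Pearson's goodness-of-fit statistic: the sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts for 0, 1, ..., 5 "successes" per set of 5, as laid out in Table 1.3.
expected = [6.25, 31.25, 62.5, 62.5, 31.25, 6.25]
male_children = [5, 27, 65, 64, 30, 9]
white_beads = [4, 34, 65, 70, 22, 5]

print(round(chi_square(male_children, expected), 2))  # 2.22
print(round(chi_square(white_beads, expected), 2))    # 5.04
```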
events such as the sequence of male and female births. There are a
number of ways of exploiting randomness to make inroads on baffling
questions, to solve problems that are too complex for an exact
solution, to generate new information and also perhaps to help in
evolving new ideas. I shall briefly describe some of them.
than the length of the side of the square, and plot the point with
coordinates (x, y) in the square. Repeat the process a number of times
and suppose that at some stage, $a_m$ is the number of points that have
fallen within the picture area and $m$ is the total number of points that
have fallen within the square. There is a theorem, called the law of
large numbers, established by the famous Russian probabilist A.N.
Kolmogorov, which says that the ratio $a_m/m$ tends to the true
proportion of the area of the picture to that of the square as $m$
becomes large, provided the pairs (x, y) chosen to locate the points are
truly random. The success (or precision) of this method then depends
on how faithful the random number generator is and how many random
numbers we can produce with the given resources.
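The procedure can be tried out in a few lines of Python. The "picture" here is a hypothetical stand-in, the quarter disc x^2 + y^2 <= 1 inside the unit square, chosen only because its true proportion, π/4 ≈ 0.785, is known in advance; Python's `random.Random` plays the role of the random number generator, and the function names are ours.

```python
import math
import random


def inside_picture(x: float, y: float) -> bool:
    """Hypothetical 'picture': the quarter disc x^2 + y^2 <= 1 in the unit square."""
    return x * x + y * y <= 1.0


def area_proportion(m: int, seed: int = 1) -> float:
    """Estimate area(picture) / area(square) by the ratio a_m / m."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    a_m = 0  # number of points falling within the picture
    for _ in range(m):
        x, y = rng.random(), rng.random()  # a random point in the unit square
        if inside_picture(x, y):
            a_m += 1
    return a_m / m


for m in (100, 10_000, 200_000):
    print(f"m = {m:>7}: a_m/m = {area_proportion(m):.4f}  (true value {math.pi / 4:.4f})")
```

As m grows, the printed ratio settles near 0.7854, which is what the law of large numbers leads one to expect.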
Under the leadership of Karl Pearson, the method was used by
some of his students to find the distribution of some very complicated
sample statistics, but it did not catch on immediately except perhaps
in India at the Indian Statistical Institute (ISI), where Professor P.C.
Mahalanobis exploited Monte Carlo techniques, which he called
random sampling experiments, to solve a variety of problems like the
choice of the optimum sampling plans in survey work and the optimum
size and shape of plots in experimental work. The delay in
recognizing the potentialities of this method may be attributed to the
lack of devices to produce truly random numbers, and in
the requisite quantity, both of which affect the precision of results.
Also, in the absence of standard devices to generate random numbers,
the editors of journals were reluctant to publish papers reporting
simulation results. Now the situation has completely changed with the
advent of reliable random number generators and easy access to them.
We are able to undertake investigations of complex problems and give
at least approximate solutions for practical use. The editors of
journals insist that every article submitted should report simulation
results even when exact solutions are available! As a matter of fact,
the whole character of research in statistics, perhaps in other fields
too, is gradually changing with greater emphasis on what are called
"number crunching methods," of which a well-known example is the
...
true proposition as $n \to \infty$
...
[Figure: Random digits; Sender's message; Transmitted message; Same random digits used by the Receiver.]
and ask the respondent to toss a coin and answer S correctly if head
turns up and T correctly if tail turns up. The investigator does not
know which question the respondent is answering and the secrecy of
information is maintained. From such responses, the true proportion
of individuals smoking marijuana can be estimated as shown below:
$\pi$ = unknown proportion smoking marijuana, which is the
parameter to be estimated.
... even digit.
$p$ = observed proportion of yes responses.
Then:
$$\hat{\pi} = 2p - \lambda$$
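To see why the estimate works: with a fair coin a respondent answers S with probability 1/2 and T with probability 1/2, so P(yes) = π/2 + λ/2, and solving for π gives 2P(yes) - λ. The Python simulation below is a sketch under stated assumptions: λ is taken to be the known proportion of people who truthfully answer yes to question T (the source's full definition of λ is not preserved here), and the names `simulate_survey` and `estimate_pi`, the true proportion 0.15 and λ = 0.5 are hypothetical choices for illustration.

```python
import random


def simulate_survey(n: int, pi_true: float, lam: float, seed: int = 7) -> float:
    """Simulate the coin-toss design and return p, the observed proportion of yes answers.

    pi_true: hypothetical true proportion answering yes to S.
    lam: assumed known proportion answering yes to T.
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        truthful_s = rng.random() < pi_true  # respondent's honest answer to S
        truthful_t = rng.random() < lam      # respondent's honest answer to T
        heads = rng.random() < 0.5           # private coin toss, unseen by the investigator
        if truthful_s if heads else truthful_t:
            yes += 1
    return yes / n


def estimate_pi(p: float, lam: float) -> float:
    """P(yes) = pi/2 + lam/2, so the estimate is pi_hat = 2p - lam."""
    return 2 * p - lam


p = simulate_survey(n=100_000, pi_true=0.15, lam=0.5)
print(f"observed p = {p:.4f}, estimated pi = {estimate_pi(p, lam=0.5):.4f}")
```

The investigator never learns which question any individual answered, yet the estimate lands close to the true 0.15; the price of the privacy is a larger sampling error than a direct question would give.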
Galileo Galilei, known by his first name, was an Italian astronomer, mathematician and
physicist who has been called the founder of modern experimental science. His name is associated
with discoveries of the laws of the pendulum, craters on the moon, sunspots and four bright satellites of
Jupiter, with the telescope, and so on. These discoveries convinced Galileo that Nicolaus Copernicus's
"Copernican Theory", that the earth rotates on its axis and revolves around the sun, was true. But this was
contrary to the Church's teaching, and Galileo was forced by the Inquisition to retract his views. It is
interesting to note that a few years ago, the present pope exonerated Galileo from the earlier charges
made by the Church on the basis of a report submitted by a committee appointed by him.
...
Some conjectures (formulae) in the Lost Notebook of Ramanujan.
Appendix: Discussion
A.1 Chance and Chaos
[Figure: plot against TIME (sec).]
$$a\,(\cos \lambda_1 t + \cos \lambda_2 t + \cdots + \cos \lambda_n t)$$
for sufficiently large n, choosing a sequence of numbers $\lambda_1, \ldots, \lambda_n$
and a scale factor $a$. Kac asks: So what is chance?
A.2 Creativity
anyone else can think of in the twentieth century. Reflecting on the
nature of this mysterious element in the act of creation, that is, in the
birth of new ideas or new discoveries, Professor Rao speculates
whether randomness is not an important part of creativity. In fact he
puts forward a new tentative paradigm for understanding creativity.
Let me quote from him. "It is then clear that a necessary condition
for creativity is to let the mind wander unfettered by the rigidities of
accepted knowledge or conventional rules. Perhaps the thinking that
precedes a discovery is a fuzzy type, a successful interplay of random
search for new frameworks to fit past experience and subconscious
reasoning to narrow down the range of possibilities." Perhaps even
the random search is at a subconscious level. That much creative
work gets done at a subconscious level has been authenticated many
times - a brilliant account was compiled by Hadamard [Hadamard, J.
(1954): An Essay on the Psychology of Invention in the Mathematical
Field, Princeton, Dover.] But the association with
randomness and uncertainty, the concepts that we quantify through
probability statements, is a brilliant additional hypothesis. In the form
of a vague reference to chance it occurs in Hadamard, but does not
receive much attention. It is probably the central thesis to which
Professor Rao leads us through a dizzying glimpse of Ramanujan's
almost magical powers and a masterly overview of randomness and
uncertainty. The following remarks are confined to this thesis.
It seems to me that there is always an element of creativity,
including its magical quality, when one makes an inductive leap or
even when one is involved in a non-trivial learning process. Two
consequences would seem to follow from this. First, at least part of
the mystery relating to creativity is related to a lack of proper
philosophical foundations for induction, in spite of many attempts,
especially by the Viennese school. Such attempts have been frivolously
described as attempts to pull a very big cat out of a very small bag.
Second, the mystery of creativity is also related to a lack of a
satisfactory model for learning in artificial intelligence. A third fact,
which is relevant in this context is worth pointing out. As far as I
...
which itself may be at two levels of sophistication: that made within
the framework of an existing paradigm and that, at a higher level,
involving a paradigm shift. The mechanism of creative processes of
both kinds may not be completely known, but a few aspects of it are
generally recognized: subconscious thinking when the mind is not
constrained by logical deductive processes, serendipity, transferring
experience gained in one area to a seemingly different area, and even
aesthetic feeling for beauty and patterns. The following is a sample
of quotations about creativity.
Pour inventer il faut penser à côté.
(To invent, you must think aside.)
- Souriau
One sometimes finds what one is not looking for.
- A. Fleming
I do not seek, I find.
- Picasso
My work always tried to unite the true with the beautiful; but when I had
to choose one or the other, I usually chose the beautiful.
- H. Weyl
I had my results for a long time, but I do not yet know how to arrive at
them.
- Johann Gauss
...
- Isaac Newton
...
spark", "flash of genius" and "sudden insight". As such, it is believed
that creativity results from information processing and is hence
programmable.
In a recent book, Scientific Discovery: Computational Explorations of
the Creative Processes (MIT Press, Cambridge), the
authors, Pat Langley, Herbert A. Simon, Gary L. Bradshaw and Jan
M. Zytkow, discuss the taxonomy of discovery and the possibility of
writing computer programs for information processing aimed at
"problem finding," "identification of relevant data" and "selective
search guided by heuristics," the major ingredients of creativity.
They have given examples to show that several major discoveries
made in the past could have been accomplished, perhaps more
effectively, through computer programs using only the information
and knowledge available at the times of these discoveries. The authors
hope that their theory of problem solving will provide
programs to search for solutions even involving paradigm shifts
initiating new lines of research. The authors conclude by saying:
We would like to imagine that the great discoverers, the scientists whose
behaviour we are trying to understand, would be pleased with this
interpretation of their activity as normal (albeit high-quality) human
thinking. Science is concerned with the way the world is, not with how
we would like it to be. So we must continue to try new experiments, to be
guided by new evidence, in a heuristic search that is never finished but is
always fascinating.
...
$$y = f(x_1, \ldots, x_p) + e$$
Ambiguity
Digit       0    1    2    3    4    5    6    7    8    9
Observed   93  116  103  102   93   97   94   95  101  106
Expected  100  100  100  100  100  100  100  100  100  100
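The same goodness-of-fit check applies to these digit frequencies: if the digits of π behave like a random sequence, each of the ten digits should appear about 100 times in 1000. A small Python sketch (the function name is ours). The statistic has 10 - 1 = 9 degrees of freedom, and its value, 4.74, is far below 16.92, the 5% point of the chi-square distribution with 9 degrees of freedom.

```python
def digit_chi_square(counts: list[int]) -> tuple[int, float]:
    """Return the number of digits and Pearson's chi-square against equal frequencies."""
    n = sum(counts)
    expected = n / len(counts)  # each digit equally likely under randomness
    chi2 = sum((o - expected) ** 2 / expected for o in counts)
    return n, chi2


# Frequencies of the digits 0, 1, ..., 9 in the table above.
pi_digit_counts = [93, 116, 103, 102, 93, 97, 94, 95, 101, 106]
n, chi2 = digit_chi_square(pi_digit_counts)
print(n, round(chi2, 2))  # 1000 4.74
```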
It is reported that a Chinese boy, Zhang Zuo, aged 12, could recall from memory the first 4000
digits of π in 25 minutes and 30 seconds.
The first 1000 decimal digits of π (π = 3.1415926535...):
1415926535 8979323846 2643383279 5028841971 6939937510
5820974944 5923078164 0628620899 8628034825 3421170679
8214808651 3282306647 0938446095 5058223172 5359408128
4811174502 8410270193 8521105559 6446229489 5493038196
4428810975 6659334461 2847564823 3786783165 2712019091
4564856692 3460348610 4543266482 1339360726 0249141273
7245870066 0631558817 4881520920 9628292540 9171536436
7892590360 0113305305 4882046652 1384146951 9415116094
3305727036 5759591953 0921861173 8193261179 3105118548
0744623799 6274956735 1885752724 8912279381 8301194912
9833673362 4406566430 8602139494 6395224737 1907021798
6094370277 0539217176 2931767523 8467481846 7669405132
0005681271 4526356082 7785771342 7577896091 7363717872
1468440901 2249534301 4654958537 1050792279 6892589235
4201995611 2129021960 8640344181 5981362977 4771309960
5187072113 4999999837 2978049951 0597317328 1609631859
5024459455 3469083026 4252230825 3344685035 2619311881
7101000313 7838752886 5875332083 8142061717 7669147303
5982534904 2875546873 1159562863 8823537875 9375195778
1857780532 1712268066 1300192787 6611195909 2164201989
Number    Frequency    Expected
0              7          6.25
1             31         31.25
2             54         62.50
3             61         62.50
4             41         31.25
5              6          6.25
Chapter 2
1. What is statistics?
Early Records
It is not clear why and how such masses of data were compiled, what
The term statistics has its roots in the Latin word status which
means the state, and it was coined by the German scholar Gottfried
Achenwall about the middle of the eighteenth century to mean the
collection, processing and use of data by the state.
and statistical education were formed within the administrative setup
of ISI.
2. Taming of uncertainty
As I have already said, statistics in the original etymological
2.1
2.1.1 Deduction
Deductive reasoning was introduced by the Greek philosophers
more than two thousand years ago and perfected over the last several
centuries through the study of mathematics. We have given premises
or axioms, say $A_1, A_2, A_3, \ldots$, each of which is accepted to be true
by itself. We can choose any subset of the axioms, say $A_1, A_2$, to
prove a proposition $P_1$. The truth of $P_1$ solely depends on the truth of
the axioms $A_1, A_2$; the fact that the other axioms are not explicitly
used in the argument has no relevance. Similarly, using $A_2, A_3, A_4$ we
may derive a proposition $P_2$ and so on.
By deductive reasoning no new knowledge is created beyond
the premises, since all the derived propositions are implicit in the
axioms. There is no claim that either the axioms or the derived
propositions have any relation to reality, as the following quotations
suggest.
Mathematics is a subject in which we do not know what we are talking
about, nor care whether what we say is true.
- Bertrand Russell
- Tobias Dantzig
It is interesting to note that deductive logic, which is the basis of
mathematics, considered to be the "highest truth", is not without logical
flaws. As observed earlier, in deductive logic it is permissible to
prove a proposition choosing any subset of the axioms, and the fact
that other axioms are not used has no relevance.
Then the following question arises. Is it possible that
one subset of axioms, say $A_1, A_2$, implies the proposition $P$ and
another subset, $A_3, A_4, A_5$, implies the proposition not $P$, leading to a
contradiction? Can it happen that postulates $A_1, A_2$ imply that the sum
of the three angles of a triangle is 180 degrees while postulates $A_3, A_4, A_5$ imply
some other number? Attempts to prove that no such contradiction
arises with the axioms of mathematics have resulted in some surprises.
Gödel, the famous mathematical logician, put forward an ingenious
proof, by an elaborate argument, to the effect that you could not,
basing your reasoning on a given set of axioms, disprove the
possibility that the system could lead to a contradiction.

[Figure: Deductive reasoning. Axioms lead to derived propositions.]
It was also established that if a system of axioms allows the
deduction of a particular proposition P as well as not P, then the
system of axioms enables us to derive any proposition we like. I
would like to recall an anecdote mentioned by Sir Ronald Fisher in
his lecture on "Nature of Probability" published in The Centennial
Review, Vol. II, 1958. G.H. Hardy, the famous British mathematician,
remarked on this remarkable fact at the dinner table one day in
Trinity College, Cambridge. A Fellow sitting across the table took
him up.
Fellow: Hardy, if I said that 2 + 2 = 5, could you prove any other
proposition you like?
Hardy: Yes, I think so.
Fellow: Then prove that McTaggart is the Pope.
Hardy: If 2 + 2 = 5, then 5 = 4. Subtracting 3 from each side, 5 - 3 = 4 - 3,
i.e., 2 = 1.
McTaggart and the Pope are two, but two is one. Therefore
McTaggart is the Pope.
2.1.2 Induction
The story is different with inductive reasoning. Here we are
confronted with the reverse problem of deciding on the premises
given some of its consequences. It is the reasoning by which decisions
are taken in the real world based on incomplete or shoddy
information. Some examples where induction is necessary are as
follows:
* Making decisions under uncertainty in a unique situation
* Prediction
* Testing of hypotheses
These are some of the situations in the real world where decisions
have to be taken under uncertainty. We have observed data which
could have resulted from any one of a set of possible hypotheses or
causes, i.e., the correspondence between data and hypothesis is not
one to one. Inductive reasoning is the logical process by which we
match a hypothesis to given data and thus generalize from the
particular. This way, we are creating new knowledge, but it is
uncertain knowledge because of lack of one to one correspondence
between data and hypothesis. This lack of precision in our inference
from given data, unlike in deductive inference from given axioms,
stood in the way of codifying inductive reasoning. To the human
mind accustomed to deductive logic, the concept of developing a
theory or introducing rules of reasoning which need not always give
correct results must have appeared unacceptable. So, inductive
reasoning remained more as an art with a degree of success depending
on an individual's skill, experience and intuition.
[Figure: Inductive reasoning. Observed data and possible hypotheses.]

Uncertain knowledge + Knowledge of the extent of uncertainty in it = Usable knowledge
Table 2.1 Weather Forecast (quantification of uncertainty)

                        Possibilities                 Chances
Today's atmospheric     It will rain tomorrow           30%
conditions              It will not rain tomorrow       70%
2.1.4 Abduction
2.2
consider it as given. This, together with a knowledge of the probability
distribution of data (d) given a hypothesis (h), denoted by $p(d \mid h)$,
enables us to obtain the total (marginal) probability distribution of
observed data, denoted by $p(d)$. We are now in a position to compute
the conditional probability distribution of hypotheses given data,
by Bayes' theorem:
$$p(h \mid d) = \frac{p(h)\,p(d \mid h)}{p(d)}$$
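A minimal Python sketch of the computation just described, for a finite set of hypotheses, where the marginal is obtained as p(d) = Σ p(h) p(d | h) over all hypotheses h. The example (deciding whether a coin is fair or biased towards heads after seeing 3 heads in 3 tosses, with equal prior probabilities) is hypothetical, as is the function name `posterior`.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' theorem over a finite set of hypotheses h.

    prior[h] is p(h); likelihood[h] is p(d | h) for the observed data d.
    """
    p_d = sum(prior[h] * likelihood[h] for h in prior)  # marginal p(d)
    return {h: prior[h] * likelihood[h] / p_d for h in prior}


# Hypothetical example: data d = 3 heads in 3 tosses.
prior = {"fair": 0.5, "biased": 0.5}             # p(h)
likelihood = {"fair": 0.5**3, "biased": 0.8**3}  # p(d | h); biased coin has P(head) = 0.8
post = posterior(prior, likelihood)
print({h: round(v, 3) for h, v in post.items()})  # {'fair': 0.196, 'biased': 0.804}
```

The data shift the probability of the biased-coin hypothesis from 0.5 to about 0.8, but they do not settle the question with certainty, which is exactly the kind of quantified uncertainty the text describes.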
Chance may be the antithesis of all law. But the way out is to
discover the laws of chance. We look for the alternatives and provide
the probabilities of their happening as measures of their uncertainties.
Knowing the consequences of each event and the probability of its
happening, decision making under uncertainty can be reduced to an
exercise in deductive logic. It is no longer a hit and miss affair.
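Once the probabilities and the consequences are fixed, the decision reduces to arithmetic: choose the action with the smallest expected loss. The Python sketch uses the 30% and 70% chances of Table 2.1. The two actions and their losses are hypothetical numbers chosen only for illustration, and the function names are ours.

```python
def expected_loss(action, probs, losses):
    """Expected loss of an action: sum over states s of P(s) * loss(action, s)."""
    return sum(p * losses[action][state] for state, p in probs.items())


def best_action(probs, losses):
    """Action with the smallest expected loss (ties go to the first listed)."""
    return min(losses, key=lambda a: expected_loss(a, probs, losses))


# Chances from the weather forecast in Table 2.1
probs = {"rain": 0.30, "no rain": 0.70}
# Hypothetical losses, in arbitrary units, for each action under each state
losses = {
    "carry umbrella": {"rain": 1, "no rain": 1},
    "leave umbrella": {"rain": 10, "no rain": 0},
}

if __name__ == "__main__":
    for action in losses:
        print(action, expected_loss(action, probs, losses))
    print("decision:", best_action(probs, losses))
```

With these losses, carrying the umbrella (expected loss 1.0) beats leaving it (expected loss 3.0); the decision would reverse only if the chance of rain fell below 10%.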
3. Future of Statistics
skill and experience of a statistician, which makes statistics an art, as
in the example of the Red Fort Story (Section 2.14, Chapter 5).
What is the future of statistics? Statistics is now evolving as
a metascience. Its object is the logic and the methodology of other
sciences - the logic of decision making and the logic of experimenting
in them. The future of statistics lies in the proper communication of
statistical ideas to research workers in other branches of learning; it
will depend on the way the principal problems are formulated in other
fields of knowledge.
On the logical side, the methodology of statistics is likely to
be broadened for using expert evidence in addition to information
supplied by data in assessment of uncertainty.
Having said that statistics is science, technology as well as an
art - the newly discovered logic for dealing with uncertainty and
making wise decisions - I must point out a possible danger to its
future development. I have said earlier that statistical predictions
could be wrong, but there is much to be gained by relying on
statistically predicted values rather than depending on hunches or
superstitious beliefs. Can the customer for whom you are making the
prediction sue you if you are wrong? There have been some recent
court cases. I quote from an editorial of The Pittsburgh Press, dated
Saturday, May 24, 1986 under the title, Forecasters Breathe Easier:
A federal appeals court has wisely corrected a gross miscalculation
of government liability in a case involving weather forecasting.
Last August, a U.S. District judge awarded $1.25 million to the
families of three lobster-men who were drowned during a storm that had
not been predicted. The judge said the government was liable because it
had failed to repair promptly a wind sensor on a buoy used to help forecast
weather conditions off Cape Cod.
The award was overturned the other day by the appeals court on
grounds that weather forecasting is a "discretionary function of government
and not a reliable one at that".
"Weather predictions fail on frequent occasions," the appeals court
said. "If in only a small proportion of cases, parties suffering in
consequence succeeded in producing an expert who could persuade a judge
that the government should have done better," the burden on the
government "would be both unlimited and intolerable."
Chapter 3
The top 20 discoveries considered are, in no particular order: Plastics, the IQ test, Einstein's
theory of relativity, blood types, pesticides, television, plant breeding, networks, antibiotics, the Taung
skull, atomic fission, the big-bang theory, birth control pills, drugs for mental illness, the vacuum tube,
the computer, the transistor, statistics (what is true and what is due to chance), DNA, and the laser.
Historical developments 65
eliciting information from randomly chosen individuals on a set of
questions. In such a situation, problems such as ensuring accuracy
(free from bias, recording and response errors) and comparability
(between investigators and methods of enquiry) of data assumed
paramount importance. Mahalanobis (1931, 1944) was perhaps the
first to recognize that such errors in survey work were inevitable and
could be more serious than sampling errors, and steps should be taken
to control and detect these errors in designing a survey and to develop
suitable scrutiny programs for detecting gross errors (outliers) and
inconsistent values in collected data.
We have briefly discussed what are commonly believed to be
two branches of statistics, viz., descriptive and inferential statistics,
and the need felt by practicing statisticians to clean up the data of
possible defects which may vitiate inferences drawn from statistical
analysis. What was perhaps needed is an integrated approach,
providing methods for a proper understanding of given data, its
defects and special features, and for selection of a suitable stochastic
model or a class of models for analysis of data to answer specific
questions and to raise new questions for further investigation. A great
step in this direction was made by Tukey (1962, 1977) and Mosteller
and Tukey (1968) in developing what is known as exploratory data
analysis (EDA). The basic philosophy of EDA is to understand the
special features of data and to use robust procedures that accommodate
a wide class of possible stochastic models for the data. Instead of
asking the Fisherian question as to what summary statistics are
appropriate for a specified stochastic model, Tukey proposed asking
for what class of stochastic models a given summary statistic is
appropriate. Reference may also be made to what Chatfield (1985)
describes as initial data analysis, which appears to be an extended
descriptive data analysis and inference based on common sense and
experience with minimal use of traditional statistical methodology.
The various steps in statistical data analysis are exhibited in
Chart 1, which is based on my own experience in analyzing large
data sets and which seems to combine K.P.'s descriptive, Fisher's
inferential and Tukey's exploratory data analyses, and Mahalanobis'
concern for non-sampling errors.
[Chart 1. DATA: data collection techniques (design of experiments, random sample surveys, historical (published) material); recorded measurements and how they were ascertained; concomitant variables; expert opinions; prior information. CROSS-EXAMINATION OF DATA (CED): modelling, i.e., specification or choice of a stochastic model (cross validation; how to use expert opinions and previous findings; Bayesian analysis?). INFERENTIAL DATA ANALYSIS (IDA): testing; display.]
In Chart 1, data is used to represent the entire set of recorded
measurements (or observations) and how they are obtained, by an
experiment, sample survey or from historical records, and the
operational procedures involved in recording the observations, and
any prior information (including expert opinions) on the nature of data
or the stochastic model underlying the data.
Cross-examination of data (CED) represents whatever
exploratory or initial study is done to understand the nature of the
data, to detect measurement errors, recording errors and outliers, to
test validity of prior information and to examine whether data are
genuine or faked. The initial study is also intended to test the validity
of a specified model or select a more appropriate stochastic model or
a class of stochastic models for further analysis of data.
Inferential data analysis (IDA) stands for the entire body of
statistical methods for estimation, prediction, testing of hypotheses
and decision making based on the chosen stochastic model for observed
data. The aim of data analysis should be to extract all available
information from data, and not merely to answer specific
questions. Data often contain valuable information to indicate new
lines of research and to make improvements in designing future
experiments or sample surveys for data collection. I would like to
enunciate the main principle of data analysis in the form of a
fundamental equation:
+
on New Lines of Research
2. Cross-examination of data
2.1 Editing of data
Age (years)   Population   Number attacked   Attack rate (percent)   Number of deaths   Case fatality (percent)
<1            198          154               77.8                    44                 28.6
1-9           1440         1117              77.7                    3                  0.3
10-19         1525         1183              77.6                    2                  0.2
20-29         1470         1140              77.6                    4                  0.3
30-39         842          653               77.6                    10                 1.5
40-59         1519         1178              77.6                    46                 3.9
60-79         752          583               77.5                    46                 7.9
80+           118          92                78.0                    15                 16.3
Total         7864         6100              77.6                    170                2.8
Source: Peter L. Panum. Observations Made During the Epidemic of Measles on the Faroe
Islands in the Year 1846. New York: Delta Omega Society, 1940, p.82.
in all age groups, the fatality varied significantly, being higher under
one year and then rising steadily for those over age thirty." Is this
conclusion valid?
What is striking in the table is the near-uniformity of the attack rates
of measles across all age groups (see the attack rate column), with very little
or no variation from the overall attack rate of 77.6. Could this occur
by chance even if the true attack rate is common to all the age
groups? There is a strong suspicion that the number attacked in each
age group was not observed but reconstructed from the known
population size in each age group by multiplying it by the common
overall attack rate of 6100/7864 = .776 and rounding off to the nearest
integer. Thus the figures 154 for age less than 1, and 92 for over 80,
could have been obtained as follows:
$$198 \times .776 = 153.6 \approx 154, \qquad 118 \times .776 = 91.6 \approx 92. \tag{2.1.1}$$
Recomputing the attack rates from these rounded numbers gives
$$\frac{154}{198} = .778, \qquad \frac{92}{118} = .7796\ldots \approx .780, \tag{2.1.2}$$
as reported by the authors, which also explains why the reported attack
rates differ slightly in the third decimal place. A reference to the
original report in German by Panum, the well-known Danish epidemiologist
who was sent to the Faroe Islands to combat the epidemic of measles,
revealed that the number attacked was not originally classified
by age groups; the number attacked in each group was
reconstructed in the manner explained in (2.1.1) by the
editor of the English translation, assuming a uniform attack rate. The
attack rates reported in the above table are not
found in the table on page 87 of the English translation; they were
probably computed by Fox, Hall and Elveback, the authors of the
book Epidemiology, Man and Disease, in the manner explained in
(2.1.2). In view of this, the age specific fatality rates computed from
the reconstructed values of the number attacked in each group, and the
consequent interpretation, may not be valid. A statistician is often
required to do detective type of work! (The second entry in the
attack rate column should be 77.6!)
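The detective work can be replayed in a few lines of Python. The sketch multiplies each age group's population by the overall attack rate, rounds to the nearest integer, compares the result with the reported numbers attacked, and recomputes the attack rates to one decimal place. The helper names are ours. One detail: the sketch keeps the ratio 6100/7864 unrounded, because with the rounded value .776 the 20-29 group would come out as 1141 rather than the reported 1140.

```python
# Figures from the measles table (Faroe Islands, 1846)
ages = ["<1", "1-9", "10-19", "20-29", "30-39", "40-59", "60-79", "80+"]
population = [198, 1440, 1525, 1470, 842, 1519, 752, 118]
reported_attacked = [154, 1117, 1183, 1140, 653, 1178, 583, 92]


def reconstruct_attacked(pop, total_attacked):
    """Numbers attacked implied by one common attack rate, rounded to integers."""
    rate = total_attacked / sum(pop)          # 6100 / 7864, kept unrounded
    return [round(p * rate) for p in pop]


def attack_rates(attacked, pop):
    """Attack rate in percent, to one decimal place."""
    return [round(100 * a / p, 1) for a, p in zip(attacked, pop)]


reconstructed = reconstruct_attacked(population, sum(reported_attacked))
rates = attack_rates(reported_attacked, population)

if __name__ == "__main__":
    for age, r, a, rate in zip(ages, reconstructed, reported_attacked, rates):
        print(f"{age:>6}  implied {r:5d}  reported {a:5d}  rate {rate}%")
    print("all groups match:", reconstructed == reported_attacked)
```

Every implied figure equals the reported one, and the recomputed rates agree with the printed column except for the second entry, which comes out as 77.6.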
2.2
[Table: values of Y1 and Y2 for the characters H.B., L.A.L., FA, Bg.B., H.L. and V.A.L. in the Kolam, Koya, Maria and Raj Gond samples.]
The values in the second line for each character are calculated after omitting extreme
observations.
Faking of data
[Table: frequencies of the values 0, 1, ..., 5 in real data ((A) hospital, (B) simulated) and in imaginary data, with the expectations under the binomial distribution (6.25, 31.25, 62.50, 62.50, 31.25, 6.25); each column totals 200, and chi-square values measure the departures from expectation.]
Table 3.4
Test hypothesis                     Degrees of freedom   χ²₀ (observed)   P(χ² > χ²₀)
3:1 ratios                           7                    2.1389          0.95
2:1 ratios                           8                    5.1733          0.74
Bifactorial                          8                    2.8110          0.94
Gametic ratios                      15                    3.6730          0.9987
Trifactorial                        26                   15.3224          0.95
Total                               64                   29.1186          0.99987
Illustrations of plant variation    20                   12.4870          0.90
Total                               84                   41.6056          0.99993
remarkable study, R.A. Fisher (Annals of Science, 1, 1936, pp. 115-137) examined the data by computing chi-square values
measuring the departure from Mendel's theory in groups of
experiments. The results are reported in Table 3.4.
We see from the last column of Table 3.4 that the probabilities
are extremely high in each case indicating that "data are probably
faked to show a remarkably close agreement with theory." The
overall probability of such good agreement is
$$1 - .99993 = .00007 = 7/100000.$$
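The last column of Table 3.4 can be checked numerically. The Python sketch below computes the upper tail probability P(χ² > χ²₀) for each row in two ways: exactly, from the regularized incomplete gamma function, and with Fisher's approximation that sqrt(2χ²) - sqrt(2ν - 1) is roughly standard normal. That the printed totals were obtained with this approximation is our assumption, not something the text states; it does reproduce the 0.99993 of the last row, while the exact tail for that row comes out nearer 0.99997, i.e., an even smaller chance (about 3 in 100,000) of so close an agreement. Function names are ours.

```python
import math

# Table 3.4: (test hypothesis, degrees of freedom, observed chi-square)
TABLE_3_4 = [
    ("3:1 ratios", 7, 2.1389),
    ("2:1 ratios", 8, 5.1733),
    ("bifactorial", 8, 2.8110),
    ("gametic ratios", 15, 3.6730),
    ("trifactorial", 26, 15.3224),
    ("total", 64, 29.1186),
    ("illustrations of plant variation", 20, 12.4870),
    ("total", 84, 41.6056),
]


def chi2_sf_exact(x, df):
    """P(chi-square with df d.f. > x) = 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function summed by its power series.
    The plain series is accurate here because x < df in every row."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = total = 1.0 / a      # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower


def chi2_sf_fisher(x, df):
    """Fisher's normal approximation: sqrt(2 chi2) - sqrt(2 df - 1) ~ N(0, 1)."""
    z = math.sqrt(2.0 * x) - math.sqrt(2.0 * df - 1.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))      # P(N(0, 1) > z)


if __name__ == "__main__":
    for name, df, x in TABLE_3_4:
        print(f"{name:33s} {df:3d} {x:8.4f}   exact {chi2_sf_exact(x, df):.5f}"
              f"   Fisher approx. {chi2_sf_fisher(x, df):.5f}")
```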
H T H H H T T H T H H H H T T T T T H T H H H T H H T T T H T H H H H H H T H T H H T H H
seem to be more uniform than what is expected by chance. The chi-square for these values is
15 ft. 1 inch 1 7/9 lines
and the latter as
15 ft. 1 inch 1 1/2 lines
respectively, where one line = 1/12 inch, giving a precision of 1 part
in 3000 for comparison. The velocity of sound was estimated to be
1142 ft. per second, which has a precision of 1 part in 1000. Newton
computed the precession of the equinoxes to be $50''\,01'''\,12^{\mathrm{iv}}$, which has
a precision of 1 part in 3000. Such a high degree of precision was
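$$\frac{R}{N} \to p = \frac{2l}{\pi a} \quad \text{almost surely, as } N \to \infty,$$
where l is the length of the needle and a the distance between the parallel lines, so that 2lN/(aR) estimates π. With l/a = 5/6, N = 3408 and R = 1808,
$$\hat{\pi} = \frac{2l}{a}\cdot\frac{N}{R} = \frac{5}{3}\cdot\frac{3408}{1808} = \frac{5}{3}\cdot\frac{16 \times 213}{16 \times 113} = \frac{355}{113} = 3.1415929,$$
which differs from the true value only in the 7th decimal place!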
Notice the strange numbers that appear in the above
computation and how the numbers factorize nicely, yielding the value
of π as the ratio 355/113, which is known to be the best rational
approximation to π involving small numbers (due to the 5th century
Chinese mathematician Tsu Chung-Chih). The next best rational
approximation is 52163/16604, involving rather large numbers. The
game played by Lazzarini is now clear, as revealed by independent
investigations due to N.T. Gridgeman (Scripta Mathematica, 1961) and
T.H. O'Beirne (The New Scientist, 1961, p.598). In order to get the
ratio 355/113 when l/a = 5/6, one has to get the ratio 113/213 for R/N,
i.e., 113 successes in 213 trials (at the minimum) or 113k successes
in 213k trials for any integer k. In Lazzarini's case k was 16. There
are two possibilities. Either he did not do the experiments he
described in great detail in his article and just reported the numbers
he wanted, or he did experiments in batches of 213 trials and
"watched his step" till he struck the right number of successes. With
16 repetitions, as done by Lazzarini, the chance of getting the right
number of successes, 113 × 16, is about 1/3.
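Buffon's needle and Lazzarini's numbers can be replayed in Python. In the sketch below, `buffon_pi_hat` evaluates 2(l/a)N/R exactly with fractions, `simulate_buffon` drops needles at random (it uses `math.pi` only to draw the random angle), and `next_better_approximation` searches denominators above 113 for the first fraction closer to π than 355/113. All three names are ours, and the simulation illustrates an honest experiment; it is not a reconstruction of Lazzarini's.

```python
import math
import random
from fractions import Fraction


def buffon_pi_hat(l_over_a, trials, crossings):
    """Estimate of pi from R/N -> 2l/(pi a):  pi_hat = 2 (l/a) N / R."""
    return 2 * Fraction(l_over_a) * Fraction(trials, crossings)


def simulate_buffon(l, a, trials, rng):
    """Count crossings when a needle of length l (l <= a) is dropped `trials`
    times on a floor ruled with parallel lines a apart."""
    crossings = 0
    for _ in range(trials):
        x = rng.uniform(0.0, a / 2.0)            # needle centre to nearest line
        theta = rng.uniform(0.0, math.pi / 2.0)  # angle between needle and lines
        if x <= (l / 2.0) * math.sin(theta):
            crossings += 1
    return crossings


def next_better_approximation(target, best):
    """Fraction with the smallest denominator above best.denominator that is
    strictly closer to `target` than `best` is."""
    err = abs(target - float(best))
    q = best.denominator + 1
    while True:
        p = round(target * q)
        if abs(target - p / q) < err:
            return Fraction(p, q)
        q += 1


# Lazzarini's figures: l/a = 5/6, N = 213 * 16 trials, R = 113 * 16 crossings
lazzarini = buffon_pi_hat(Fraction(5, 6), 213 * 16, 113 * 16)

if __name__ == "__main__":
    print(lazzarini, float(lazzarini))
    rng = random.Random(1)
    n = 100_000
    r = simulate_buffon(5.0, 6.0, n, rng)
    print("honest simulation:", float(buffon_pi_hat(Fraction(5, 6), n, r)))
    print("next better:", next_better_approximation(math.pi, Fraction(355, 113)))
```

An honest run of 100,000 drops typically lands within a few hundredths of π, nowhere near the seven correct digits that 355/113 delivers.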
Laplace, in his Théorie Analytique des Probabilités, wrote:
It is remarkable that a science which began with consideration of games of
chance should have become the most important object of human knowledge.
2.5
(ii)
(iii)
in the sample.
The population under study has a heavy-tailed
distribution, so that the occurrence of large values is
not rare.
Meta analysis 87
shall leave one thought to the reader.
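To omit or not to omit an outlier or a spurious observation is
a serious dilemma, as the following example shows. Suppose that we
have N observations from a population with mean μ and standard
deviation σ, and M contaminating observations from a population with
a mean value μ + δ and the same standard deviation. Let us ignore the fact that
the M observations arose from the contaminating population and estimate μ by
x̄, the mean of all N + M observations. Then
$$E(\bar{x} - \mu)^2 = \frac{\sigma^2}{N+M}\left[1 + \frac{M^2\delta^2}{(N+M)\sigma^2}\right] < V(\bar{x}_N) = \frac{\sigma^2}{N},$$
where x̄_N is the mean of the N uncontaminated observations, provided that δ² < σ²(N + M)/(NM).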
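A numerical look at the dilemma, under a model that is our own assumption: N observations with mean μ and standard deviation σ, plus M contaminating observations with mean μ + δ and the same σ. The mean of all N + M observations then has mean squared error σ²/(N + M) + (Mδ/(N + M))², which is below the variance σ²/N of the mean of the clean observations whenever δ² < σ²(N + M)/(NM). The Python sketch evaluates both quantities and checks the first by simulation; all function names are hypothetical.

```python
import random
import statistics


def mse_pooled_mean(N, M, sigma, delta):
    """MSE of the mean of all N + M observations as an estimate of mu, when M
    of them come from a population with mean mu + delta (same sigma assumed)."""
    bias = M * delta / (N + M)
    return sigma**2 / (N + M) + bias**2


def var_clean_mean(N, sigma):
    """Variance of the mean of the N uncontaminated observations."""
    return sigma**2 / N


def pooling_helps(N, M, sigma, delta):
    """True when keeping the contaminating observations lowers the MSE,
    i.e. when delta^2 < sigma^2 (N + M) / (N M)."""
    return mse_pooled_mean(N, M, sigma, delta) < var_clean_mean(N, sigma)


def simulated_mse(N, M, sigma, delta, reps, seed=0):
    """Monte Carlo check of mse_pooled_mean with normal observations, mu = 0."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(reps):
        clean = [rng.gauss(0.0, sigma) for _ in range(N)]
        contaminated = [rng.gauss(delta, sigma) for _ in range(M)]
        sq_errors.append(statistics.fmean(clean + contaminated) ** 2)
    return statistics.fmean(sq_errors)


if __name__ == "__main__":
    for delta in (0.5, 1.0):
        print(delta, mse_pooled_mean(10, 2, 1.0, delta), var_clean_mean(10, 1.0),
              pooling_helps(10, 2, 1.0, delta))
```

With N = 10 and M = 2, a shift of half a standard deviation is better kept than discarded, while a shift of one full standard deviation is not.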
Meta analysis
Teacher:
Student:
badly needed!
In making decisions, one has to take into account all the
available evidence which may be in the form of several pieces of
is $\bar{x} = (x_1 + x_2 + x_3)/3$. However, if after drawing the sample we find
that two of the trees chosen are next to each other, with
corresponding yields, say, $x_1$ and $x_2$, then we may be better off
using the alternative estimator $\tilde{x} = (y + x_3)/2$, where $y = (x_1 + x_2)/2$.
It may be seen that if the yields of consecutive trees are highly
correlated, then the variance of $\tilde{x}$ is less than that of $\bar{x}$ in
samples where at least two consecutive trees are chosen. Such
strategies as using different methods for different configurations of
the sample under the same stochastic model should be explored.
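Under a simple model of our own choosing (all yields with the same mean and unit variance, the two adjacent trees correlated with coefficient ρ, the third tree uncorrelated with both), Var(x̄) = (3 + 2ρ)/9 and Var(x̃) = (3 + ρ)/8, so x̃ is the better estimator exactly when ρ > 3/7. The Python sketch checks this with exact fractions; `tree_cov` and the other names are hypothetical.

```python
from fractions import Fraction


def var_linear(weights, cov):
    """Variance of sum_i w_i x_i for the covariance matrix cov."""
    k = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j] for i in range(k) for j in range(k))


def tree_cov(rho):
    """Hypothetical model (unit variances): trees 1 and 2 adjacent with
    correlation rho, tree 3 uncorrelated with both."""
    return [[1, rho, 0], [rho, 1, 0], [0, 0, 1]]


def var_xbar(rho):
    """Variance of the ordinary mean (x1 + x2 + x3)/3."""
    third = Fraction(1, 3)
    return var_linear([third, third, third], tree_cov(rho))


def var_xtilde(rho):
    """Variance of (y + x3)/2 with y = (x1 + x2)/2, i.e. weights 1/4, 1/4, 1/2."""
    return var_linear([Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)], tree_cov(rho))


if __name__ == "__main__":
    for rho in (Fraction(0), Fraction(3, 7), Fraction(9, 10)):
        print(rho, var_xbar(rho), var_xtilde(rho))
```

Both estimators are unbiased under this model, so comparing variances is a fair comparison; at ρ = 3/7 they tie at 3/7.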
Then, there is the problem of "Oh! Calcutta." Suppose that
someone who is not aware of the large differences in the populations
of Calcutta and the rest of the towns and cities (which we refer to as
units) in the state of West Bengal tries to estimate the total population
of the state by taking a simple random sample of the units without
replacement. The usual formula in such a case, which is proved to be
$$\frac{N-1}{n-1}\,(x_2 + \cdots + x_n).$$
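To see what goes wrong, the Python sketch below enumerates every simple random sample of size n = 2 from a hypothetical region of N = 6 units, one of them far larger than the rest, and applies the standard expansion estimator (N/n) × (sample total) of the population total. The enumeration confirms that the estimator is unbiased, yet no single sample gives an estimate anywhere near the true total. The unit sizes are invented.

```python
from fractions import Fraction
from itertools import combinations


def expansion_estimates(values, n):
    """N/n * (sample total) for every simple random sample of size n drawn
    without replacement; all such samples are equally likely."""
    N = len(values)
    return [Fraction(N, n) * sum(s) for s in combinations(values, n)]


# Hypothetical unit populations (in thousands): one very large city, five towns
units = [4000, 40, 30, 25, 20, 15]
estimates = expansion_estimates(units, 2)

if __name__ == "__main__":
    print("true total:", sum(units))
    print("average over all samples:", sum(estimates) / len(estimates))
    print("smallest and largest estimates:", min(estimates), max(estimates))
```

Samples without the large unit give at most 210, samples with it give at least 12,045, and the true total of 4,130 is only reached on average.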
References
Chatfield, C. (1985). The initial examination of data. J. Roy. Statist. Soc. A, 148,
214-253.
Cleveland, W.S. (1993). Visualizing Data, AT&T Bell Laboratories, Murray Hill,
New Jersey.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7,
1-26.
Fisher, R.A. (1922). On the mathematical foundations of theoretical statistics.
Philos. Trans. Roy. Soc. A, 222, 309-368.
Fisher, R.A. (1925). Statistical Methods for Research Workers, Oliver and Boyd.
Fisher, R.A. (1934). The effect of methods of ascertainment upon the estimation of
frequencies. Ann. Eugen. 6, 13-25.
Fisher, R.A. (1936). Has Mendel's work been rediscovered? Annals of Science
1, 115-137.
Fox, J.P., Hall, C.E. and Elveback, L.R. (1970). Epidemiology, Man and
Disease, Macmillan Co., London.
Pingle, Urmila (1982). Morphological and Genetic Composition of Gonds of Central
India: A Statistical Study, Ph.D. Thesis, submitted to the Indian Statistical
Institute.
Pitman, E.J.G. (1937). Significance tests which may be applied to samples from
any population. J. Roy. Statist. Soc. Ser. B, 4, 119-130.
Rao, C. Radhakrishna (1948). The utilization of multiple measurements in problems
of biological classification. J. Roy. Statist. Soc. B, 10, 159-203.
Rao, C. Radhakrishna (1971). Taxonomy in anthropology. In Mathematics in the
Archaeological and Historical Sciences, Edin. Univ. Press, 329-358.
Rao, C. Radhakrishna (1987). Prediction of future observations in growth curve
models. Statistical Science, 2, 434-471.
Shewhart, W.A. (1931). Economic Control of Quality of Manufactured Product,
D. Van Nostrand, New York.
Tukey, J. (1962). The future of data analysis. Ann. Math. Statist., 33, 1-67.
Tukey, J. (1977). Exploratory Data Analysis, Addison-Wesley.
Wald, A. (1950). Statistical Decision Functions, Wiley, New York.
Chapter 4
Specification
Weighted Distributions
Truncation
and having no albino children get confounded with normal families.
The actual frequency of the event zero albino children is thus not
ascertainable.
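In general, if p(x, θ) is the p.d.f. (probability density function
for a continuous variable or probability for a discrete variable), where
θ denotes an unknown parameter, and the random variable X is
truncated to a specified region T ⊂ Ω of the sample space, then the
p.d.f. of the truncated random variable X^T is
$$p^T(x, \theta) = \frac{p(x, \theta)}{P(T, \theta)}, \quad x \in T, \qquad P(T, \theta) = \int_T p(x, \theta)\,dx \ \left(\text{or } \sum_{x \in T} p(x, \theta)\right).$$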
section.
Suppose the event zero is not observable in sampling from a
binomial distribution with index n and probability of success π. Let
R^T denote the TB (truncated binomial) random variable. Then
$$P(R^T = r) = \frac{n!}{r!\,(n-r)!}\,\frac{\pi^r (1-\pi)^{n-r}}{1-(1-\pi)^n}, \qquad r = 1, \ldots, n, \tag{2.2}$$
$$E(R^T) = \frac{n\pi}{1-(1-\pi)^n}, \qquad E(R^T/n) = \frac{\pi}{1-(1-\pi)^n}, \tag{2.3}$$
which are somewhat larger than those for a complete binomial, for
which the above values are nπ and π respectively.
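A Python sketch of (2.2), (2.3) and the resulting expected number of girls. It checks the closed-form mean in (2.3) against a direct sum over the truncated probabilities. The family-size counts used below are hypothetical (the data (2.4) are not reproduced here), as are the function names.

```python
from math import comb


def tb_pmf(r, n, pi):
    """P(R^T = r), r = 1..n, for a binomial(n, pi) truncated at zero, as in (2.2)."""
    if not 1 <= r <= n:
        return 0.0
    return comb(n, r) * pi**r * (1 - pi) ** (n - r) / (1 - (1 - pi) ** n)


def tb_mean(n, pi):
    """E(R^T) = n pi / (1 - (1 - pi)^n), as in (2.3)."""
    return n * pi / (1 - (1 - pi) ** n)


def expected_girls(family_counts, pi=0.5):
    """Expected total number of sisters (respondent included), where
    family_counts maps family size n to the number of families f(n)."""
    return sum(f * tb_mean(n, pi) for n, f in family_counts.items())


if __name__ == "__main__":
    n, pi = 4, 0.5
    direct = sum(r * tb_pmf(r, n, pi) for r in range(1, n + 1))
    print(direct, tb_mean(n, pi))            # the two agree
    # Hypothetical family-size counts f(n), for illustration only
    print(expected_girls({1: 5, 2: 8, 3: 6, 4: 3}))
```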
The following data relate to the numbers of brothers and
sisters in families of the girls whose names were found in a private
telephone notebook of a European professor. (The first number within
the brackets gives the number of sisters including the respondent and
the second number, that of her brothers.)
Since at least one girl is present in the family, we may try to see
whether the data conform to a TB distribution with the observation on
zero sisters missing (i.e., binomial truncated at zero). The expected
number of girls under this hypothesis, assuming π = 0.5, is
$$\sum_n f(n)\,\frac{n/2}{1 - 2^{-n}}, \tag{2.5}$$
where f(n) is the observed number of families with size n (i.e., the
total number of brothers and sisters). Using the formulas (2.3) and
(2.5) and the data (2.4), we have:
[Table: observed and expected numbers of sisters and brothers.]
The observed figures seem to be in good agreement with those
expected under the hypothesis of truncated binomial. However, a
different story may emerge in a similar situation as in the following
data giving the numbers of sisters and brothers in the families of girl
acquaintances of a male student in Calcutta.
binomial is 14.6 (using the formulas (2.3) and (2.5)) whereas the
observed number is 17. The truncated binomial is not appropriate for
the data (2.6), and it appears that the mechanisms of encountering
girls seem to be different in the cases of the European professor and
the Calcutta student.
Note that if we sample a number of households in a city and
ascertain the numbers of brothers and sisters (i.e., sons and
daughters) in each household, then we expect the number of sisters
to follow a complete binomial distribution. If from such data we omit
the households which do not have girls, then the data would follow
a truncated binomial distribution. The professor seems to be sampling
from the general population of households with at least one girl. We
shall see in the next section that a different distribution holds when
data are ascertained about sisters and brothers from boys or girls one
encounters. The case of the student seems to fall in such a category.
3. Weighted distributions
weighted distribution
(3.2)
where f(x) is some monotonic function of x, is called a size biased
distribution. When X is univariate and nonnegative, the weighted
distribution
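$$P(X = r) = \frac{-\theta^r}{r \log(1-\theta)}, \qquad r = 1, 2, \ldots \tag{3.4}$$
$$P(X^w = r) = (1-\theta)\,\theta^{r-1}, \qquad r = 1, 2, \ldots \tag{3.5}$$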
original distribution. An exception is the logarithmic series
distribution.
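A quick numerical check, in Python, that size-biasing the logarithmic series distribution P(X = r) = -θ^r/(r log(1 - θ)) gives the geometric distribution (1 - θ)θ^(r-1), so the weighted version does not stay within the log-series family. The function names are ours.

```python
import math


def logseries_pmf(r, theta):
    """P(X = r) = -theta^r / (r log(1 - theta)), r = 1, 2, ..."""
    return -theta**r / (r * math.log(1 - theta))


def logseries_mean(theta):
    """E(X) = -theta / ((1 - theta) log(1 - theta))."""
    return -theta / ((1 - theta) * math.log(1 - theta))


def size_biased_pmf(r, theta):
    """Size-biased version r P(X = r) / E(X) of the log-series distribution."""
    return r * logseries_pmf(r, theta) / logseries_mean(theta)


def geometric_pmf(r, theta):
    """(1 - theta) theta^(r - 1), r = 1, 2, ..."""
    return (1 - theta) * theta ** (r - 1)


if __name__ == "__main__":
    theta = 0.6
    for r in range(1, 6):
        print(r, size_biased_pmf(r, theta), geometric_pmf(r, theta))
```

Algebraically, r · θ^r/(r L) divided by θ/((1 - θ)L), with L = -log(1 - θ), leaves (1 - θ)θ^(r-1), which is what the printed columns show.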
An extensive literature on weighted distributions has appeared
since the concept was formalized in Rao (1965); it is reviewed with
a large number of references in a paper by Patil (1984) with special
reference to the earlier contributions by Patil and Rao (1977, 1978)
and Patil and Ord (1976). Rao (1985) contains an updated review of
the previous work and some new results.
4. P.P.s. sampling
a*.,
(X",Yd)
(4.4)
from the distribution (4.1), then an estimate of E(X), the mean with
respect to the original p.d.f. p(x,y,8), which is the parameter of
interest, is
z
=
xi
i
Empirical theorems
(i)
(ii)
(iii)
(iv)
k
2(B+s)
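$$P(r) = \frac{n!}{r!\,(n-r)!}\,2^{-n}, \qquad r = 0, 1, 2, \ldots$$
$$p^T(r) = \frac{n!}{r!\,(n-r)!}\,\frac{1}{2^n - 1}, \qquad r = 1, 2, \ldots \tag{5.2}$$
$$p^w(r) = \frac{r}{n/2}\binom{n}{r}\left(\frac{1}{2}\right)^{n} = \binom{n-1}{r-1}\left(\frac{1}{2}\right)^{n-1}, \qquad r = 1, 2, \ldots \tag{5.3}$$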
In Rao (1977), it was argued that (5.3) is more appropriate for the
observed data than (5.2). Table 4.1 gives the observed frequency
distributions of the number of brothers in families of different sizes
based on the data collected separately from the male and female
$$E^w(r-1) = \frac{n-1}{2}. \tag{5.5}$$
T+k
...+r,,
(5.7)
Table 4.1
n = 2
No. of brothers   Observed   Expected TB   Expected WB
1                 24         28.7          21.5
2                 19         14.3          21.5
Total             43         43.0          43.0
n = 3
No. of brothers   Observed   Expected TB   Expected WB
1                 12         20.1          11.7
2                 24         20.2          23.6
3                 11          6.7          11.7
Total             47         47.0          47.0
[Table 4.1 continues for n = 4, 5 and 6 (totals 42, 40 and 20), with observed frequencies and expected frequencies under TB and WB.]
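The expected columns of Table 4.1 follow from (5.2) and (5.3). The Python sketch below recomputes them for n = 2 and n = 3 from the observed totals and adds a chi-square goodness-of-fit statistic for each model (our addition, not part of the table). The printed table rounds some entries so that each column adds up to its total, so a few values differ in the last digit.

```python
from math import comb


def tb_probs(n):
    """Truncated binomial (5.2): C(n, r) / (2^n - 1), r = 1..n."""
    return [comb(n, r) / (2**n - 1) for r in range(1, n + 1)]


def wb_probs(n):
    """Weighted binomial (5.3): C(n-1, r-1) / 2^(n-1), r = 1..n."""
    return [comb(n - 1, r - 1) / 2 ** (n - 1) for r in range(1, n + 1)]


def chi_square(observed, expected):
    """Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed numbers of families with r = 1..n brothers, from Table 4.1
observed = {2: [24, 19], 3: [12, 24, 11]}

results = {}
for n, obs in observed.items():
    total = sum(obs)
    tb = [total * p for p in tb_probs(n)]
    wb = [total * p for p in wb_probs(n)]
    results[n] = (tb, wb, chi_square(obs, tb), chi_square(obs, wb))

if __name__ == "__main__":
    for n, (tb, wb, x2_tb, x2_wb) in results.items():
        print(f"n={n}  TB {[round(e, 2) for e in tb]}  chi2 {x2_tb:.2f}")
        print(f"      WB {[round(e, 2) for e in wb]}  chi2 {x2_wb:.2f}")
```

For both family sizes the weighted binomial gives much the smaller chi-square (about 0.58 against 2.28 for n = 2, and 0.06 against 6.77 for n = 3).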
Place (country, year)      k      B      S    B/(B+S)   (B-k)/(B+S-k)    χ²
Bangalore (India, 75)     55    180    127    .586        .496          .02
Delhi (India, 75)         29     92     66    .582        .490          .07
Calcutta (India, 63)     104    414    312    .570        .498          .04
Waltair (India, 69)       39    123     88    .583        .491          .09
Ahmedabad (India, 75)     29     84     49    .632        .523          .35
Tirupati (India, 75)     592   1902   1274    .599        .484          .50
Poona (India, 75)         47    125     65    .658        .545         1.18
Hyderabad (India, 74)     25     72     53    .576        .470          .36
Tehran (Iran, 75)         21     65     40    .619        .500          .19
Isphahan (Iran, 75)       11     45     32    .584        .515          .06
Tokyo (Japan, 75)         50     90     34    .725        .540          .49
Lima (Peru, 82)           38    132     87    .603        .519          .27
Shanghai (China, 82)      74    193    132    .594        .474          .67
Columbus (USA, 75)        29     65     52    .556        .409         2.91
College St. (USA, 76)     63    152     90    .628        .497          .01
Total                   1206   3734   2501    .600        .503         0.14
[Actually, the chi-squares are too small, which calls for further study of
the mechanism underlying the observed data.]
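Reading the three count columns of the table as k (number of respondents), B (brothers, respondents included) and S (sisters) is our inference from how the ratio columns reproduce. Under that reading, the χ² column appears to be the one-degree-of-freedom statistic (B - k - S)²/(B + S - k) for a 1:1 ratio of brothers (respondents excluded) to sisters; it reproduces most printed entries, though a few (Calcutta's χ², for instance) do not match. Isphahan's B is taken as 45, the value implied by its printed ratios, its χ² and the column total. The Python sketch recomputes the two ratios and χ² row by row.

```python
# (k, B, S) by place, from the table above
rows = {
    "Bangalore (India, 75)": (55, 180, 127),
    "Delhi (India, 75)": (29, 92, 66),
    "Calcutta (India, 63)": (104, 414, 312),
    "Waltair (India, 69)": (39, 123, 88),
    "Ahmedabad (India, 75)": (29, 84, 49),
    "Tirupati (India, 75)": (592, 1902, 1274),
    "Poona (India, 75)": (47, 125, 65),
    "Hyderabad (India, 74)": (25, 72, 53),
    "Tehran (Iran, 75)": (21, 65, 40),
    "Isphahan (Iran, 75)": (11, 45, 32),
    "Tokyo (Japan, 75)": (50, 90, 34),
    "Lima (Peru, 82)": (38, 132, 87),
    "Shanghai (China, 82)": (74, 193, 132),
    "Columbus (USA, 75)": (29, 65, 52),
    "College St. (USA, 76)": (63, 152, 90),
}


def summarize(k, B, S):
    """B/(B+S), (B-k)/(B+S-k) and the 1-d.f. chi-square for a 1:1 ratio of
    brothers other than the k respondents to sisters."""
    others = B - k
    n = others + S
    return B / (B + S), others / n, (others - S) ** 2 / n


total = tuple(sum(col) for col in zip(*rows.values()))

if __name__ == "__main__":
    for place, (k, B, S) in list(rows.items()) + [("Total", total)]:
        r1, r2, x2 = summarize(k, B, S)
        print(f"{place:24s} {r1:.3f} {r2:.3f} {x2:.2f}")
```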
[Tables: B/(B+S), (B-k)/(B+S-k) and χ² for further samples from Lima (Peru, 82), Los Banos (Philippines, 83), Manila (Philippines, 83), Bilbao (Spain, 83), Shanghai (China, 82), State College (USA, 75), Warsaw (Poland, 75), Poznan (Poland, 75), Pittsburgh (USA, 81), Tirupati (India, 76), Maracaibo (Venezuela, 82) and Richmond (USA, 81), with totals.]
E[B/(B+S)] for n = 2, 3, 4, 5, 6: .75, .67, .625, .6, .58
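Under the weighted binomial (5.3), (5.5) gives E(r) = (n + 1)/2 for a family of size n, so E[B/(B+S)] = (n + 1)/(2n); this equals .75, .67, .625, .6 and .58 (to two or three decimals) for n = 2, ..., 6. The truncated binomial (5.2) would instead give 2^(n-1)/(2^n - 1). The Python sketch computes both by direct summation with exact fractions; the function names are ours.

```python
from fractions import Fraction
from math import comb


def wb_mean_proportion(n):
    """E[B/(B+S)] for families of size n under the weighted binomial (5.3)."""
    return sum(Fraction(comb(n - 1, r - 1), 2 ** (n - 1)) * Fraction(r, n)
               for r in range(1, n + 1))


def tb_mean_proportion(n):
    """The same expectation under the truncated binomial (5.2)."""
    return sum(Fraction(comb(n, r), 2**n - 1) * Fraction(r, n)
               for r in range(1, n + 1))


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, round(float(wb_mean_proportion(n)), 3),
              round(float(tb_mean_proportion(n)), 3))
```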
(ii)
(iii)
~~~
10
~~
E(b,)=p,+-p,+-p,+
4
E (b,) =
16
1 p3
-81 p4 ,..
+
(5.9)
(5.10)
N=n and the number of brothers B=b, and suppose that the
probability of selecting such a family is proportional to b. Then
(5.11)
(5.12)
$$p^w(n) = \frac{n\,p(n)}{E(N)}, \qquad E^w(1/N) = \frac{1}{E(N)} \tag{5.13}$$
so that the harmonic mean of observations $n_1, n_2, \ldots$
from the distribution (5.11) or (5.12)
(5.14)
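Why the harmonic mean: when a family of size n is selected with probability proportional to n, p^w(n) = np(n)/E(N), so E^w(1/N) = Σ p(n)/E(N) = 1/E(N). The reciprocal of the average of 1/n over a size-biased sample therefore estimates the population mean family size E(N), whereas the ordinary average estimates E(N²)/E(N). The Python sketch checks the identity exactly and by simulation for a hypothetical family-size distribution p(n).

```python
import random
from fractions import Fraction

# Hypothetical distribution p(n) of family size in the population
p = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(25, 100),
     4: Fraction(15, 100), 5: Fraction(1, 10)}

mean_n = sum(n * pn for n, pn in p.items())             # E(N)
p_w = {n: n * pn / mean_n for n, pn in p.items()}       # p^w(n) = n p(n) / E(N)
e_w_inv = sum(pw / n for n, pw in p_w.items())          # E^w(1/N)


def harmonic_mean_estimate(size_biased_sample):
    """Harmonic mean of family sizes observed under size-biased sampling."""
    return len(size_biased_sample) / sum(1 / n for n in size_biased_sample)


if __name__ == "__main__":
    print("E(N) =", mean_n, "  1/E^w(1/N) =", 1 / e_w_inv)
    rng = random.Random(2)
    sizes, weights = list(p_w), [float(w) for w in p_w.values()]
    sample = rng.choices(sizes, weights=weights, k=50_000)
    print("harmonic mean:", harmonic_mean_estimate(sample),
          "  arithmetic mean:", sum(sample) / len(sample))
```

The arithmetic mean of the size-biased sample settles near E(N²)/E(N) ≈ 3.23 rather than the true E(N) = 2.65, while the harmonic mean recovers 2.65.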
where φ = 1 - π. From (6.2), it follows that the distribution of family
size in the general population, given that a family has at least one
alcoholic, is
(6.3)
If we had chosen households at random and recorded the family sizes
in households containing at least one alcoholic, then the null
hypothesis on the excess of alcoholics in larger families could be
tested by comparing the observed frequencies with the expected
frequencies under the model (6.3). However, under the sampling
scheme adopted of ascertaining the values of n and r from an
alcoholic admitted to a clinic, the weighted distribution of (n,r),
(6.4)
is more appropriate. If we had information on the family size n as
well as on the number of alcoholics (r) in the family, we could have
compared the observed joint frequencies of (n,r) with those expected
under the model (6.4).
From (6.4), the marginal distribution of n alone is
$$\frac{n\,p(n)}{E(N)}, \qquad n = 1, 2, \ldots \tag{6.5}$$
..
T ~ - ~ @ - ~1,.
, s =,n;r= 1,.
(6.6)
(6.7)
[Tables: observed (O) and expected (E) frequencies for family sizes n = 1, 2, 3 and 4.]
O = observed, E = expected.
many are first born, second born, etc. There will be a preponderance
of the earlier born.)
7.
Damage models
8. Damage models
$$P(R = r \mid N = n) = s(r, n).$$
Then the marginal distribution of R truncated at zero is
PI = ( I -PV
where
...,
(8.1)
which is the same as (8.4). It is shown in Rao and Rubin (1964) that
the equality p: = p,' characterizes the Poisson distribution.
The damage models of the type described above were
introduced in Rao (1965). For theoretical developments on damage
models and characterization of probability distributions arising out of
their study, the reader is referred to Alzaid, Rao and Shanbhag
(1984).
References
Alzaid, A.H., Rao, C.R. and Shanbhag, D.N. (1984): Solutions of certain functional
equations and related results on probability distributions. Technical
Report, University of Sheffield, U.K.
Cox, D.R. (1962): Renewal Theory. Chapman and Hall, London.
Feller, W. (1966): An Introduction to Probability Theory and its Applications,
Vol. 2, John Wiley & Sons, New York.
Feller, W. (1968): An Introduction to Probability Theory and its Applications,
Vol. 1 (3rd edn.), John Wiley & Sons, New York.
Fisher, R.A. (1934): The effect of methods of ascertainment upon the estimation
of frequencies. Ann. Eugen., 6 , 13-25.
Patil, G.P. (1984): Studies in statistical ecology involving weighted distributions.
In Statistics: Applications and New Directions, 478-503. Indian Statistical
Institute, Calcutta.
Patil, G.P. and Ord, J.K. (1976): On size-biased sampling and related form-invariant weighted distributions. Sankhyā Ser. B 33, 49-61.
Patil, G.P. and Rao, C.R. (1977): The weighted distributions: A survey of their
applications. In Applications of Statistics (P.R. Krishnaiah, Ed.), 383-405,
North Holland Publishing Company, Amsterdam.
Patil, G.P. and Rao, C.R. (1978): Weighted distributions and size biased
sampling with applications to wildlife populations and human families.
Biometrics, 34, 170-180.
Rao, C.R. (1965): On discrete distributions arising out of methods of
ascertainment. In Classical and Contagious Discrete Distributions, (G.P.
Chapter 5
In Search of Truth
1.1 Scientific Laws
Scientific laws are not advanced by the principle of authority
or justified by faith or medieval philosophy; statistics is the
only court of appeal to new knowledge.
P.C. Mahalanobis
A beautiful theory, killed by a nasty, ugly little fact.
Thomas H. Huxley
[Figure: (a) deductive reasoning; (b) inductive reasoning.]
1.2 Decision Making
To guess is cheap, to guess wrongly is expensive.
An old Chinese proverb
Is the last born child more or less intelligent than the first born? What
will be the price of gold two months from now? Does the use of a
seat belt protect the driver of an automobile from serious injuries in
an accident? Do the planets control our movements, actions and
achievements? Are astrological predictions correct?
These are all situations which cannot be resolved by
philosophical discussions or by using existing (or established)
theories. No definite answers can be found from available information
or data, and any prescribed rule for selecting one out of possible
answers will be subject to error. The way to avoid mistakes is not to
refrain from taking decisions; there can be no progress that
way. The best we can do is to take decisions in an optimal way by
minimizing the risk involved. We discuss a number of examples
where statistics has helped to resolve the issues involved.
1.3
[Figure: uses of statistics. Layman: lifetime decisions, wise investments, daily chores, participation in the country's democratic processes. Government: policy decisions, long range planning, services (weather, pollution control, etc.), dissemination of information. Research: hard sciences, soft sciences, art, literature, archaeology, economic history. Also shown: statistical evidence (disputed paternity, disputed authorship); diagnosis, prognosis, clinical trials.]
Some examples
2.1
No. of times a word is used   No. of distinct words
1                             14,376
2                              4,343
3                              2,292
4                              1,463
5                              1,043
6                                837
7                                638
...
> 100                            846
TOTAL                         31,534
2.2
Table 5.2 Frequency distributions of distinct words, according to the
Shakespearean canon, in poems of similar length by different authors
[Table: for classes of the number of times a word is used in Shakespeare's works (up to 80-99), the numbers of distinct words in poems by Christopher Marlowe and by John Donne (The Ecstasy) and in the new poem, together with the numbers expected according to the Shakespearean canon. The last two rows give the number of distinct words and the total number of words in each poem.]
2.3 Dating of publications
2.6 Filiation of manuscripts
[Figure: The language tree.]
2.8
Table 5.3 Lyell's geological classification
Name given to geological strata   Percentage of surviving species   Examples
PLEISTOCENE (most recent)         96%                               Sicilian Group
PLIOCENE (majority recent)        40%                               Sub-Apennine Italian rocks, English Crag
MIOCENE (minority recent)         18%
EOCENE (dawn of the recent)       3% or 4%
2.11
Pollen parent   Seed parent   Progeny (left : right)
Right           Right         44 : 56
Right           Left          47 : 53
Left            Right         45 : 55
Left            Left          47 : 53
Dr. Robert Sperry, the Nobel prize winner, established that in
each individual either the left or the right brain dominates, the left
Circadian rhythm
If you are asked, what is your height, you will, no doubt, have
differ, the next question may be, which part of the body elongates
more when we are asleep? To examine this, separate determinations
were made of the lengths between certain points marked on the body,
both in the morning and in the evening. It was found that the entire
difference of about 1 cm occurred in that part of the body along
which the vertebral column is located. A plausible physiological
explanation is that during the day the vertebrae come closer by
shrinkage of the cartilages between them; they revert to the original
position when the body is relaxed.
Why do teachers prefer to lecture in the morning hours? It is
said that both teachers and students are fresh in the morning and there
is greater rapport between them. Is there any physiological
explanation of this phenomenon?
The change in the plasma cortisol level seems to explain our alertness
in the morning hours. In normal subjects, the cortisol level is about
16 µg/100 ml at 8 a.m. and it gradually drops to 6 µg/100 ml by 11
p.m. (a decrease of about 60 percent). The rise of cortisol in the morning
wakes you up and the trough in the evening puts you to sleep.
Consequently we are alert in the morning and gradually tend to be
sluggish as the night falls.
Several physiological characteristics of the human body, in
fact, vary during the day as was observed in the case of the height;
each has a particular circadian rhythm, that is, it follows a 24-hour
cycle. The importance of studying such variations, known as
Chronobiology, for optimum timing of administering medicines to
patients has been stressed by Halberg (1974). For instance, a dose of
a drug which is right at one time of the day may prove ineffective
at another time; the action may depend on the levels of
different biochemical substances in the blood at the time of
administering the drug. Chronobiology is becoming an active field of
research with extensive possibilities of application. Much progress in
these studies is due to statistical techniques developed to detect and
establish periodicities in measurements taken over time.
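One standard technique of this kind, the single-component cosinor, fits a 24-hour cosine curve by least squares and reports its mean level (mesor), amplitude and time of peak. The sketch below is only an illustration under stated assumptions: the period is taken as known (24 hours), the readings are equally spaced and cover whole cycles, and the data are made up, loosely shaped like the cortisol figures above. The function name `cosinor_fit` is hypothetical.

```python
import math

def cosinor_fit(times, values, period=24.0):
    """Fit y(t) = M + A*cos(2*pi*(t - t_peak)/period) by least squares.

    Assumes the samples are equally spaced and cover a whole number of
    periods; the cosine and sine regressors are then orthogonal, so the
    least-squares coefficients reduce to the simple averages below.
    Returns (mesor M, amplitude A, time of peak t_peak).
    """
    n = len(values)
    omega = 2 * math.pi / period
    mesor = sum(values) / n
    # Coefficients of cos(omega*t) and sin(omega*t)
    beta = 2 / n * sum(y * math.cos(omega * t) for t, y in zip(times, values))
    gamma = 2 / n * sum(y * math.sin(omega * t) for t, y in zip(times, values))
    amplitude = math.hypot(beta, gamma)
    # beta = A*cos(omega*t_peak) and gamma = A*sin(omega*t_peak)
    t_peak = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, t_peak


# Hypothetical readings every 2 hours over one day (not data from the text)
times = [2 * k for k in range(12)]
values = [11 + 5 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]
mesor, amplitude, t_peak = cosinor_fit(times, values)
print(f"mesor {mesor:.2f}, amplitude {amplitude:.2f}, peak at {t_peak:.1f} h")
```

With real measurements the sampling is rarely this tidy, and a general least-squares solve (plus a test of whether the amplitude differs from zero) would replace the shortcut averages.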
Disputed paternity
Salt in statistics
machinery for manufacturing fertilizers, and the cost for this may be
only 50 or 60 million dollars in foreign exchange once and for all. In this
way only 50 or 60 million dollars can serve the same purpose as 300
or 400 or 1400 million dollars. Would it not be still wiser to set up
machine building factories?
The argument sounds like the saying: For want of a nail, the
horseshoe was lost; for want of a shoe, the horse was lost; for want
of a horse, the rider was lost; for want of a rider, the kingdom was
lost.
Some of our economists have argued that the Mahalanobian
thinking is not in tune with the principles of economics; in retrospect,
we see that Mahalanobis' plan helped in industrializing India.
2.17
2.18
Table: known antibodies, predicted antibodies and suggested gene complexes (Rh complexes R1, R2, R0, R′, R″, Rz, Ry; gene complexes CDe, cDE, cDe, cde, Cde, cdE, CDE, CdE)
Table 5.6 Average I.Q. of children in England
classified according to number of sibs in the family

Number in family    Number of families in sample    I.Q.
1                   115                             106.2
2                   212                             105.4
3                   185                             102.3
4                   152                             101.5
5                   127                              99.6
6                   103                              96.5
7                    88                              93.8
7+                  102                              95.8

Birth order: 103.76, 106.21, 106.14, 105.59, 104.39, 104.44, 102.89,
103.05, 101.71, 102.71, 101.30, 99.37, 100.18, 97.69, 96.87
parents and the earlier born children. A case is made that the effect
can be reversed by increasing the age spacing between siblings, so
that the intellectual level, depending on age, will be higher for the
earlier born at the time of later births.
References
Boneva, L.L. (1971): A new approach to a problem of chronological seriation
Chapter 6
It is only half a century later that the importance of what Bernal said
has been recognized and serious efforts are being made to spread scientific
knowledge to the public. National Science Academies of advanced
countries have appointed task forces to examine the problem and
suggest ways of achieving this. Five years ago, the Royal Society of
the United Kingdom started a new journal called Science and Public
Affairs with the broad aim of fostering the understanding of scientific
issues by the public and of explaining the implications of discoveries
in science and technology in everyday life. The new slogan raised by
the Royal Society is
2.
a slight on the subject for it is now recognized that many hosts die but for
the parasites they entertain. Some animals could not digest their food. So
it is with many fields of human endeavors, they may not die but they
would certainly be a lot weaker without statistics.
by the army of the victorious king after a war with another kingdom.
How were these nicely rounded figures arrived at? Were they actual
counts made by the royal tally keepers or fictitious figures conceived
by the active imagination of the victorious king? Was the drastic
rounding of figures intended to highlight the large dimensions of the
booty? Samuel Johnson believed:
Round numbers are always false.
7,405,926
ghosts inhabited the earth! Most people believed that the figure must
have been the actual count as Weirus was a learned man.
I am reminded of the advice given in a Tax Guide I consulted while
filing my tax return in the U.S.A.:
Careful scrutiny of GAO reports confirms one important way to reduce the
odds of audit. Avoid rounding out dollars when reporting earnings or
expenses. Figures of $100, $250, $400, $600 arouse an examiner's
suspicion, whereas $171, $313, $496 are less likely to. If you must
estimate some expenses, estimate in odd amounts.
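The tax guide's advice rests on a simple statistical idea: figures that are genuinely recorded tend to have last digits spread fairly evenly over 0 to 9, whereas rounded figures pile up on 0 (and 5). A minimal sketch of how such heaping could be measured uses a chi-square comparison of last-digit counts with a uniform distribution. The uniform baseline, the function name and the two small lists of amounts are assumptions made for illustration; a real check would need far more figures (at least about 5 expected per digit) for the chi-square approximation to be reasonable.

```python
from collections import Counter

# Upper 5% point of the chi-square distribution with 9 degrees of freedom
CRITICAL_5_PERCENT_DF9 = 16.92

def last_digit_chi_square(amounts):
    """Pearson chi-square statistic comparing the last digits of whole-dollar
    amounts with a uniform distribution over the ten digits 0-9."""
    counts = Counter(abs(int(a)) % 10 for a in amounts)
    expected = len(amounts) / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


varied = [171, 313, 496, 208, 655, 342, 989, 127, 764, 530]  # hypothetical
rounded = [100, 250, 400, 600] * 10                          # hypothetical
for label, amounts in [("varied", varied), ("rounded", rounded)]:
    stat = last_digit_chi_square(amounts)
    print(label, round(stat, 2), stat > CRITICAL_5_PERCENT_DF9)
```

The rounded list, with every amount ending in 0, gives a statistic far above the 5 percent point, which is the kind of pattern an examiner's suspicion responds to.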
3.
Information revolution
that the middle class families have on the average 2.2 children,
commented:
The figure of 2.2 children per adult female is in some respects absurd. It
is suggested that the middle classes be paid money to increase the average
to a rounded and more convenient number.
4.
Mournful numbers
Tell me not, in mournful numbers,
Life is but an empty dream.
H. W. Longfellow
Days (first column, causes missing): 3500, 3285, 1600, 1300, 900, 2250, 800, 330, 220, 300, 74

Cause                          Days
Alcohol                         130
Firearms accidents               11
Natural radiation                 8
Medical x-rays                    6
Coffee                            6
Oral contraceptives               5
Diet drinks                       2
Pap test                         -4
Smoke alarm in house            -10
Airbags in cars                 -50
Mobile coronary care units     -125
Weather forecasting
A reliable forecaster is one whose microphone is close
enough to the window so that he can decide whether to use
the official forecast or make up one of his own.
$\text{Expected loss} = 0.6(r) + 0.4(0) = 6r/10$
6.
role in it. They gather information from the public on various social,
political and economic issues, and publish summary reports. Such
opinion polls serve a good purpose in a democratic political system.
They tell the political leaders and the bureaucracy what the
public's needs and preferences are. They also constitute news, informing
people of what the general thinking is. This may be of help in
crystallizing public opinion on certain key issues.
The results of public opinion polls are usually announced in a
particular style which needs an explanation. For instance, the news
broadcaster may say:
The percentage of people who approve the president's foreign policy is 42
with a margin of error of plus or minus 4 points.
Superstition
$\left(100(r/p) - e,\; 100(r/p) + e\right)$
with a high "chance" usually chosen as 95% (or 99%). What it means
is that the event that the interval does not cover the true value is as
rare as observing a white ball in a random draw from a bag
containing 5 (or 1) white balls and 95 (or 99) black balls.
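The text gives the form of the interval but not how the margin e is obtained. A common approximation, assumed here rather than taken from the text, treats the poll as a simple random sample of n people and uses the normal approximation to the binomial: e = 100·z·√(p̂(1 − p̂)/n), with z = 1.96 for a 95% chance. The sketch below computes e and, in reverse, the sample size needed for a given margin; the function names are illustrative.

```python
import math

def margin_of_error(r, n, z=1.96):
    """Approximate margin of error, in percentage points, for the
    percentage 100*r/n of n sampled people who approve.
    Uses the normal approximation and assumes simple random sampling."""
    p_hat = r / n
    return 100 * z * math.sqrt(p_hat * (1 - p_hat) / n)

def required_sample_size(percent, e, z=1.96):
    """Smallest n giving a margin of error of about e percentage points
    when the true percentage is near `percent`."""
    p = percent / 100
    return math.ceil(z ** 2 * p * (1 - p) / (e / 100) ** 2)


n = required_sample_size(42, 4)
print(n, round(margin_of_error(round(0.42 * n), n), 2))
```

For the broadcast example (42 percent, plus or minus 4 points) this suggests a sample of roughly 585 people under these assumptions; real polls often use more complex designs, for which the formula would need adjusting.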
The validity of the results of opinion polls depends on "how
representative" the choice of individuals is. It is quite clear that the
result will depend on the composition of the political affiliations
(Republican or Democrat) of the individuals chosen. Even supposing
that no bias is introduced in the choice of individuals with respect to
their political affiliations, the results can be vitiated if some
individuals do not respond and they happen to belong to a particular
political party. In any survey, there is bound to be some degree of
non-response, and the error due to this is difficult to assess unless
some further information is available.
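A small worked illustration of how differential non-response can vitiate a poll, using entirely hypothetical numbers: suppose two parties each make up half the electorate, 70% of one party and 20% of the other approve, but the first party's members respond less often. The population shares, response rates and approval rates below are assumptions for illustration only.

```python
def true_approval(groups):
    """Approval rate in the whole population."""
    return sum(share * approve for share, _, approve in groups)

def respondent_approval(groups):
    """Approval rate among those who actually respond."""
    responding = sum(share * respond for share, respond, _ in groups)
    approving = sum(share * respond * approve for share, respond, approve in groups)
    return approving / responding


# Hypothetical electorate: (population share, response rate, approval rate)
groups = [
    (0.5, 0.6, 0.7),  # party A: approves more, but responds less often
    (0.5, 0.9, 0.2),  # party B
]
print(f"true {true_approval(groups):.2f}, polled {respondent_approval(groups):.2f}")
# prints: true 0.45, polled 0.40
```

Even with a perfectly balanced choice of individuals, the poll understates approval by 5 points here, and nothing in the responses themselves reveals this; the correction needs outside information such as the response rates by party.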
7.
           Months before          Birth    Months after
           6  5  4  3  2  1       month    1  2  3  4  5     Total
Sample 1  24 31 20 23 34 16        26     36 37 41 26 34      348
Sample 2  66 69 67 73 67 70        93     82 84 73 87 72      903
Sample 3   0  2  1  9  2  2         3      2  0  1  3  2       18
Other entries: 375, 2144, .611
8.
Sex       Pass    Fail    Percentage passing
Female      6       3       .666
Male       34       3       .919
Total      40       6       .870
Sex       Pass    Fail    Percentage passing
Female     16       3       .842
Male       64       3       .955
Total      80       6       .930
Table: percentages given in rows (1) to (10) for the standards "Clear and convincing", "Clear, unequivocal and convincing" and "Beyond a reasonable doubt", with a further column of values at or just above 50 (50, 50+, 50.1, 51)
ESP
9.
Key technology
*
*
Srinivasa Ramanujan
suddenness at the age of 32. In the process, he put India on the map
of modern mathematics. Ramanujan's mathematical contributions in
many fields are profound and abiding, and he is ranked as one of the
world's greatest mathematicians. Ramanujan did not do mathematics
as mathematicians do. He discovered and created mathematics. This
makes him a phenomenon and an enigma, and his creative process a
myth and a mystery.
At the time of his death, he left a strange and rare legacy:
about 4000 formulae written on the pages of three notebooks and
some scraps of paper. Assuming that the bulk of his work was
produced during a period of 12 years, Ramanujan was discovering
nearly one new formula or theorem a day (4000 results in about
4400 days), a rate that would be remarkable even in a far less
creative activity. These are not ordinary theorems; each one of them
contains the nucleus of a whole new theory. Nor are they isolated,
magical-seeming formulae pulled out of thin air: they have a profound
influence on current mathematical research and on the development of
new concepts in theoretical physics, from the superstring theory of
cosmology to the statistical mechanics of complicated molecular
systems. The work of the last year of his life, when his health was
failing, recorded by hand on 130 unlabelled pages, was discovered
in 1976 in the library of Trinity College, Cambridge. The results
given in his "Lost Notebook" alone are considered to be "equivalent
of a lifetime work for a great mathematician." Commenting on the
originality, depth and permanence of Ramanujan's contributions,
Professor Askey of the University of Wisconsin said:
Little of his work seems predictable at first glance, and after we understand
it, there is still a large body of work about which it is safer to predict that
it would not be rediscovered by anyone who has lived in this century.
Then there are some of the formulae Ramanujan found that no one can
understand or prove. We will probably never understand how Ramanujan
found them.
The first term was the solution which I obtained. Each successive
term represented successive solutions for the same type of relation between
two numbers, as the number of houses in the street would increase
indefinitely. I was amazed. I asked: Did you get the solution in a flash?
Ramanujan: Immediately I heard the problem, it was clear that the
solution was obviously a continued fraction; I then thought, "which
continued fraction?" and the answer came to my mind. It was just as
simple as this.
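The anecdote does not restate the problem here. Assuming it is the well-known house-number puzzle (find house x on a street numbered 1 to N such that the numbers of the houses before x add up to the same total as those after it), a short derivation shows why there is an endless sequence of solutions as the street grows: the condition reduces to x² = N(N + 1)/2, that is, the Pell equation (2N + 1)² − 8x² = 1, whose solutions are classically generated from the continued fraction of √8. The sketch below steps from one solution to the next with that equation and checks each one by brute force; it is not necessarily the continued fraction Ramanujan wrote down, and the function names are hypothetical.

```python
def house_solutions(count):
    """Return (house, total) pairs with 1+...+(house-1) == (house+1)+...+total.

    Uses the Pell equation m*m - 8*house*house == 1 with m = 2*total + 1,
    moving from one solution to the next with the fundamental unit (3, 1).
    """
    m, x = 3, 1  # trivial solution: house 1 on a street of 1 house
    pairs = []
    for _ in range(count):
        m, x = 3 * m + 8 * x, m + 3 * x
        pairs.append((x, (m - 1) // 2))
    return pairs

def is_balanced(house, total):
    """Brute-force check of the defining relation."""
    return sum(range(1, house)) == sum(range(house + 1, total + 1))


for house, total in house_solutions(4):
    print(house, total, is_balanced(house, total))
```

The successive pairs are (6, 8), (35, 49), (204, 288) and (1189, 1681), each satisfying the same relation on a longer street.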
Table: Period (-1914, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921) and number of papers
Ramanujan died in 1920 at the age of 32. The last two to three years
of his life were the period of his declining health, during which he
continued to work and left behind numerous results recorded in a
notebook, which was discovered a few years ago. This "Lost
Notebook" has a number of new theorems which have opened up new
areas of research in number theory.
Of course, Ramanujan was a rare phenomenon, and he
blossomed in a more or less hostile environment: a routine educational
system geared to producing clerical staff for administrative work,
poverty which forced brilliant students to give up academic pursuits
and take up employment for a living, and a lack of institutional
support or other opportunities for research. Referring to
Ramanujan's achievements in mathematics, Jawaharlal Nehru wrote
in his Discovery of India:
Ramanujan's brief life and death are symbolic of conditions of India. Of
millions, how many get education at all? How many live on the verge of
starvation? If life opened its gates to them and offered them food and
healthy conditions of living and education and opportunities of growth, how
many among these millions would be eminent scientists, educationists,
technicians, industrialists, writers and artisans helping to build a new India
and a new world?
Index
Abduction, 56, 57
Achenwall, Gottfried, 45
Ain-i-Akbari, 44
Alcoholism, 110
Alzaid, A.H., 119
Ambiguity, 37
Amino acids (D&L), 141, 142
Andrews, D.F., 93
Andrews, G., 25
Anscombe, F.J., 94
Aristotle, 2, 59
Arthasasthra, 44, 133, 160
Artificial intelligence, 14
Broad, W., 75, 81
Buffon needle problem, 82
Burt, C., 76, 82
Butler, S., 157
Byron, Lord, 178
Chance, 3, 27
Chance and necessity, 34
Chandrasekar, S., 30
Chaos, 3, 27
Chatfield, C., 67, 92
Chesterton, G.K., 177
Chi-square test, 64
Chronobiology, 143
Cleveland, U.S., 75, 92
Comedy of Errors, 133
Conan Doyle, 63
Contaminated samples, 85
Cooking of data, 85
Cox, D.R., 59, 117, 119
Creativity, 21, 22
Cross examination of data, 69, 70
Cross validation, 92
Cryptology, 13
DNA, 116
Dalton, J., 81
Damage model, 117
Dantzig, T., 50
Dating of publications, 106
Davis, T.A., 139, 141
Descartes, 59
Feigenbaum, M.J., 27
Feller, W., 100, 116, 119
File drawer problem, 88
Filiation of manuscripts, 134
Fisher, R.A., 4, 12, 51, 58, 64,
65, 66, 70, 78, 79, 92, 96,
119, 124, 125, 129, 131, 136,
137, 154, 156
Fleming, A., 31
Forging of data, 85
Fox, Captain, 83
Fox, J.P., 71, 73, 93
Fourth R., 163, 164
France, A., 1
Fractal geometry, 27
Frost, R., 1
Future of statistics, 60
Fuzzy sets, 38
Galileo, G., 19, 81
Gallup polls, 169
Gambler's fallacy, 15
Gauss, J., 31
Gauss, K., 20, 31
Geological time scale, 136
Ghosh, J.K., 28, 30, 32, 81
Gleick, J., 18, 25
Glottochronology, 135
Gnanadesikan, R., 57, 65
Goethe, 1
Gödel, K., 30, 51
Graunt, J., 46
Gridgeman, N.T., 84
Grosvenor, G.C.H., 70
Hacking, I., 41, 64, 93
Hadamard, J., 29
Halberg, J., 144, 156
Haldane, J.B.S., 11, 76, 78, 93
Halifax, 173
Hall, C.E., 71, 73, 92
Hamilton, A., 132
Hardy, G.H., 51, 52, 185
Hickerson, D.R., 25
Hilbert, 30
Hofstadter, D.R., 23
Hotelling, H., 65
Hoyle, F., 16
Hull, T.E., 8, 25
Huxley, A., 122
IQ fraud, 76
Indian Statistical Institute, 6
Induction, 52, 53, 57
Inferential data analysis, 64, 69, 89
Initial data analysis, 69
International Statistical Institute, 48
Jack-knife, 90
Jay, J., 132
Jefferson, T., 173
Johannsen, W., 138, 139
Johnson, B., 132
Johnson, S., 160
Kac, M., 27, 28, 139, 156
Kappler, 28
Karma, 2
Kautilya, 44, 133, 160
Koeffler, R.,
Koestler, A., 23
Kolmogorov, A.N., 10
Kruskal, W., 38
Kruskal, J.B., 136, 158
Kammerer, P., 16, 25
Language tree, 135
Langlay, P.L., 33
Laplace, P.S., 17, 18, 25, 46, 85
Laplace, Demon, 17, 18
Law of large numbers, 10, 21
Law of series, 15
Lazzarini (Lazzerini), 82, 83, 84
Lee, 166, 180
Left handed, 139
Levant, O., 169
Levi, E., 38
Logarithmic series, 100,
Longfellow, H.W., 166
Lorenz, E., 19, 27
Lost Note Book (Ramanujan), 25
Love's Labour's Lost, 133
Lyell, C., 136
Macmurray, J., 122, 156
Madison, J., 132
Mahalanobis, P.C., 10, 26, 67,
69, 93, 122, 149, 150, 184
Malchus, C.A.V., 45
Mallows, C.L., 94
Mandelbrot, B.B., 14, 26, 27
Marbe, K., 15, 26
Marlowe, C., 132
Mathematical demon, 17, 18
Mazumdar, D.N., 74, 93
Measles, 72
Meta analysis, 87
Millikan, R., 81
Model building, 14
Monte Carlo, 9, 10
Mosteller, F., 67, 93, 132, 156,
178, 179, 180
Mourant, A.E., 154, 156
Mukherji, R.K., 74, 93
Penrose, R., 33
Phaedrus, 145
Phillips, D., 172, 173, 180
Pi, 83
Picasso, P., 31
Pingle, U., 74, 93
Pitman, E.J.G., 65, 93
Plato, 130, 145
Plautus, 2
Poisson distribution, 118
Polya, G., 15
Popper, K., 30, 32, 123
Posterior distribution, 58, 117
Post stratification, 91
Pps sampling, 101
Prior distribution, 58, 176
Ptolemy, C., 81
Publicistics, 45
Quetelet, A., 18, 26, 46, 47
Race, R.R., 153, 156
Ramanujan, S., 22, 23, 24, 29,
39, 181-187
Random numbers, 3, 4
Randomness, 3
Rao, C.R., 29, 30, 74, 93, 96,
100, 101, 103, 116, 118, 119,
120, 143, 156
Rastrigin, L., 37
Rhesus factor, 152
Roy, Rustum, 36, 162
Roy, S.N., 65
Rubin, H., 119
Ryle, M., 16
Savage, L.J., 158
Schmidt, J., 137
Scientific laws, 122
Sensitive questions, 16
Sengupta, J.M., 147
Sequential sampling, 66
Shakespeare, 15, 129, 130, 131,
133
Shanbhag, D.N., 119
Shaw, G.B., 32
Shannon, C., 161
Shewhart, W., 66, 93
Simon, H.A., 33
Sinclair, Sir John, 45
Size bias, 100
Smart, R.G., 110-120
Smullyan, R., 118
Souriau, 31
Southwell, R., 2
Specification error, 95
Sperry, R., 142
Sprott, D.A., 110-120
Stamp, J., 75
Statistical quality control, 128
Statistics
art, 61
evolution of, 41
fundamental equation of, 69
future of, 60-62
logical equation of, 54
science, 60
societies, 47-49
technology, 60
Statistics in
archaeology, 128
business, 128