Introducing Statistics

Introducing Statistics, 2nd Edition
Graham Upton
Oxford University Press

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York
Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi São Paulo Shanghai Taipei Tokyo Toronto

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Graham Upton and Ian Cook 2000

First published 1998
Reprinted (with corrections) 1999
Second edition 2001

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing from Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organisation. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer.

British Library Cataloguing in Publication Data
Data available

ISBN 0 19 914801 5

Typeset by Tech Set Ltd, Gateshead, Tyne and Wear
Printed and bound in Great Britain by Bell & Bain Ltd.

Contents

Preface to the Second Edition
Preface to the First Edition
Glossary of Notation

1 Summary diagrams and tables
1.1 The purpose of Statistics
1.2 Variables and observations
1.3 Types of data
1.4 Tally charts and frequency ...
1.5 Stem-and-leaf diagrams
1.6 Bar charts
1.7 Multiple bar charts
1.8 Compound bars for proportions
1.9 Pie charts
1.10 Grouped frequency tables
1.11 Difficulties with grouped frequencies
1.12 Histograms
1.13 Frequency polygons
1.14 Cumulative frequency diagrams
    Step diagrams
1.15 Cumulative proportion diagrams
1.16 Time series
1.17 Scatter diagrams
1.18 Choosing which display ...
1.19 Dirty data
Chapter summary

2 Summary statistics
2.1 The purpose of summary statistics
2.3 The median
2.4 The mean
2.5 Advantages and disadvantages of the mode, mean, and median
    Advantages
    Disadvantages
2.6 Sigma (Σ) notation
    Applications of sigma notation
2.7 The mean of a frequency distribution
2.8 The mean of grouped data
2.9 Using coded values to simplify calculations
2.10 The median of grouped data
2.11 Quartiles, deciles and percentiles
    Grouped data
    Ungrouped data
2.12 Range, inter-quartile range and midrange
2.13 Box-whisker diagrams
    Refined boxplots
2.14 Deviations from the mean
2.15 The variance
    Using the divisor n
    Using the divisor (n - 1)
2.16 Calculating the variance
2.17 The sample standard deviation
    Approximate properties of the standard deviation
2.18 Variance and standard deviation for frequency distributions
2.19 Variance calculations using coded values
2.20 Symmetric and skewed data
2.21 The weighted mean and index numbers
Chapter summary

3 Data collection
3.1 Data collection by observation
3.2 The purpose of sampling
3.3 Methods for sampling a population
    The simple random sample
    Cluster sampling
    Stratified sampling
    Systematic sampling
    Quota sampling
3.4 Random numbers
    Pseudo-random numbers
    Tables of random numbers
3.5 Methods of data collection by questionnaire (or survey)
    The 'postal' questionnaire
    The telephone interview
3.6 Questionnaire design
    Some poor questions
    Some good questions
    The order of questions
    Question order and bias
    Filtered questions
    Open and closed questions
    The order of answers for closed questions
    The pilot study
3.7 Primary and secondary data
Chapter summary

4 Probability
4.1 Relative frequency
4.2 Preliminary definitions
4.3 The probability scale
4.4 Probability with equally likely outcomes
4.5 The complementary event, E'
4.6 Venn diagrams
4.7 Unions and intersections of events
4.8 Mutually exclusive events
    The addition rule
4.9 Exhaustive events
4.10 Probability trees
4.11 ...
4.12 Unequally likely possibilities
4.13 Physical independence
    The multiplication rule
4.14 Orderings
    Orderings of similar objects
4.16 Sampling with replacement
4.17 Sampling without replacement
4.18 Conditional probability
    ... multiplication rule
4.20 The total probability theorem
4.21 Bayes' theorem
Chapter summary

5 Probability distributions and expectations
5.1 Notation
5.2 Probability distributions
    The probability function
    Illustrating probability distributions
    Estimating probability distributions
    The cumulative distribution function
5.3 Some special discrete probability distributions
5.4 The geometric distribution
    Notation
    Cumulative probabilities
    A paradox!
5.5 Expectations
    Expected value or expected number
    Expectation of X²
5.6 The variance
5.7 The standard deviation
5.8 Greek notation
Chapter summary

6 Expectation algebra
6.1 E(X + a) and Var(X + a)
6.2 E(aX) and Var(aX)
6.3 E(aX + b) and Var(aX + b)
6.4 Expectations involving more than one variable
    Var(X + Y)
    E(X₁ + X₂) and Var(X₁ + X₂)
    The difference between 2X and X₁ + X₂
6.5 The expectation and variance of the sample mean
6.6 The unbiased estimate of the population variance
Chapter summary

7 The binomial distribution
7.1 Derivation
7.2 Notation
7.3 ...
7.4 The shape of the distribution
7.5 ... binomial ...
7.6 The expectation and variance of a binomial random variable
Chapter summary

8 The Poisson distribution
8.1 The Poisson process
8.2 The form of the distribution
8.3 The shape of a Poisson distribution
8.4 Tables for Poisson distributions
8.5 The Poisson approximation to the binomial
8.6 Sums of independent Poisson random variables
Chapter summary

9 Continuous random variables
9.1 Histograms and sample size
9.2 The probability density function, f
    Properties of the pdf
9.3 The cumulative distribution function, F
    The median, m
9.4 Expectation and variance
9.5 Obtaining f from F
9.6 Distribution of a function of a random variable
9.7 The uniform (rectangular) distribution
Chapter summary

10 The normal distribution
10.1 The standard normal distribution
10.2 Tables of Φ(z)
10.3 Probabilities for other normal distributions
10.4 Finer detail in the tables of Φ(z)
10.5 Tables of percentage points
10.6 Using calculators
10.7 Applications of the normal distribution
10.8 General properties
10.9 Linear combinations of independent normal random variables
    Extension to more than two variables
    ... normal random variables
10.10 The Central Limit Theorem
    The distribution of the sample mean
10.11 The normal approximation to a binomial distribution
    Inequalities
    Choosing between the normal and Poisson approximations to a binomial distribution
10.12 The normal approximation to a Poisson distribution
Chapter summary

11 Point and interval estimation
11.1 Point estimates
11.2 Confidence intervals
11.3 Confidence interval for a population mean
    Normal distribution with known variance
    Unknown population distribution, known population variance, large sample
    Unknown population distribution, unknown population variance, large sample
    Poisson distribution, large mean
11.4 Confidence interval for a population proportion
11.5 The t-distribution
    Tables of the t-distribution
11.6 Confidence interval for a population mean using the t-distribution
Chapter summary

12 Hypothesis tests
12.1 The null and alternative hypotheses
12.2 Critical regions and significance levels
12.3 The general test procedure
12.4 Test for mean, known variance, normal distribution or large sample
12.5 Identifying the two hypotheses
    The null hypothesis
    The alternative hypothesis
12.6 Test for mean, large sample, variance unknown
12.7 Test for large Poisson mean
12.8 Test for proportion, large sample
12.9 Test for mean, small sample, variance unknown
12.10 The p-value approach
12.11 Hypothesis tests and confidence intervals
12.12 Type I and Type II errors
    The general procedure
12.13 Hypothesis tests for a proportion based on a small sample
12.14 Hypothesis tests for a Poisson mean based on a small sample
12.15 Comparison of two means
12.16 Comparison of two means - known population variances
    Confidence interval for the common mean
12.17 Comparison of two means - common unknown population variance
    Large sample sizes
    Small sample sizes
Chapter summary

13 Goodness of fit
13.1 The chi-squared distribution
    Properties of the chi-squared distribution
    Tables of the chi-squared distribution
13.2 Goodness of fit to prescribed probabilities
13.3 Small expected frequencies
13.4 Goodness of fit to prescribed distribution type
13.5 Contingency tables
    The Yates correction
Chapter summary

14 Regression and correlation
14.1 The equation of a straight line
    Determining the equation
14.2 The estimated regression line
14.3 The method of least squares
14.4 Dependent random variable Y
    Estimating a future value
14.5 Transformations, extrapolation and outliers
14.6 Confidence intervals and significance tests for the population regression coefficient
    Mean and variance of the estimator of β
    Significance test for the regression coefficient
14.7 Confidence intervals and significance tests for the intercept α and for the expected value of Y, with known σ²
14.8 Distinguishing x and Y
14.9 Deducing x from a Y-value
14.10 Two regression lines
14.11 Correlation
    Nonsense correlation
14.12 The product-moment correlation coefficient
    The population product-moment correlation coefficient, ρ
    Testing the significance of r
14.13 Spearman's rank correlation coefficient, r_s
    Testing the significance of r_s
    Alternative table formats
14.14 Using r_s for non-linear relationships
Chapter summary

Appendices
Cumulative probabilities for the binomial distribution
Cumulative probabilities for the Poisson distribution
The normal distribution function
Upper-tail percentage points for the standard normal distribution
Percentage points for the t-distribution
Percentage points for the χ² distribution
Critical values for the product-moment correlation coefficient, r
Critical values for Spearman's rank correlation coefficient, r_s

Preface to the Second Edition

Early Statistics syllabuses presented the subject as a branch of mathematics. The emphasis was on formulae, with, for example, questions on artificial continuous distributions being an excuse to practise integration. In reality, Statistics is about the interpretation of data and the modelling of the processes that have given rise to those data. Modern syllabuses (or 'specifications' as they are now sometimes called) increasingly reflect the need to do more than simply summarise data. The first edition of Introducing Statistics emphasised the need to draw inferences and in this second edition we have added further inferential material.

Amongst the additions to Chapter 1 are several examples of the graphical comparison of similar data sets. This chapter includes five new sections and ends with a discussion of the (largely unwanted) characteristics to be expected in real data. Chapter 2 has been augmented by sections on the use of coded values, Bayes' theorem is included in Chapter 4, and the method for determining the distribution of a simple function of a random variable is now included in Chapter 9. In Chapter 14 there is a new section dealing with properties of regression line estimators and, later, a subsection on nonsense correlation.
We have also taken the opportunity to introduce some questions on sampling for Chapter 3. These questions are somewhat open-ended, as were a number of existing questions for which we did not give answers in the first edition. In this edition we now provide possible answers to these questions, though no answers to such questions should be regarded as prescriptive.

This book covers all the material in the statistics modules of the EDEXCEL, AQA, OCR, WJEC and NICCEA syllabuses for A-level Mathematics and in the corresponding syllabuses set by Cambridge International Examinations. Although it is still possible for a student to study an A-level Mathematics syllabus that contains no Statistics, for some boards Statistics can amount to half the syllabus. Since there are substantial differences between the syllabuses, this book will also cover much of the material in A-level Further Mathematics, and AS-level Statistics. Students studying A-level Statistics, or requiring a book for a first course at university level, should use the companion volume Understanding Statistics.

GJGU
ITC
University of Essex
Colchester
June 2000

Preface to the First Edition

Long long ago, when the authors were at school, the subject Statistics was almost unheard of. Few universities had specialist Statistics teachers and there was little if any Statistics taught in schools; those were certainly not the 'good old days' so far as Statistics was concerned. Since that time there has been a continuing expansion in the teaching of the subject as its relevance to everyday life, the conduct of research, and government has become increasingly appreciated.

This book concentrates on the fundamentals of the subject. To determine its contents we took careful note of the statistics sections of the current single subject A-level syllabuses. This book covers the union of those syllabuses. Of course, syllabuses seem ever changing, but this is not a problem for us since all of them must begin in very much the same way in dealing with these fundamentals. This book will therefore be suitable for a wide audience.

The detailed list of contents is given in the next few pages. A summary is as follows:

Chapters 1 to 3 describe the basic mechanics of collecting, displaying and summarising data;
Chapters 4 to 12 develop the common probability based models (binomial, Poisson, normal) and use them to draw conclusions about the properties of large populations on the basis of information from small samples;
Chapter 13 introduces a simple method for checking model validity;
Chapter 14 provides an introduction to the study of relationships between variables.

A comparison of this book (IS, for short) with our other volume Understanding Statistics (US) will immediately reveal that IS is shorter (about two-thirds the size) and that there is nothing in IS that does not appear in US. The economy in size has not been accomplished by curtailing explanations, nor by reducing the large numbers of worked examples and set questions on the topics treated. Instead, the reduction has been effected by omitting the more advanced or esoteric sections to leave only the essentials. Our hope is that a student using IS will be enthused by the material and, like Oliver Twist, will ask for more! In this case 'more' will be readily to hand in the shape of US, with its corresponding format.

In Statistics, as in the rest of Mathematics, the best way to learn is by doing questions. For that reason we have included hundreds of questions, with answers, in the book. We are very grateful to the examining boards listed below for permission to reproduce their questions. The source of each question is indicated by the corresponding initials at the end of the question. The numerical answers given in the book are, of course, our responsibility and any errors (we hope none!) are due to us and not to the examining boards.
Where only part of a question has been used this is indicated by (P) after the attribution.

Associated Examining Board [AEB]
Northern Examination and Assessment Board [NEAB], formerly the Joint Matriculation Board [JMB]
Oxford and Cambridge Schools Examination Board [O&C], which also gave permission to reproduce questions from the examinations for the Mathematics in Education and Industry Project [MEI] and the School Mathematics Project [SMP]
University of Cambridge Local Examinations Syndicate [UCLES], which gave permission to reproduce questions from the University of Oxford Delegacy for Local Examinations [UODLE]
London Examinations, a division of Edexcel Foundation, formerly the University of London Examinations and Assessment Council [ULEAC] and the University of London School Examinations Board [ULSEB]
Welsh Joint Education Committee [WJEC]

All that remains is to wish you, the reader, pleasure in the use of this book. We hope that you too may be bitten by the Statistics bug.

GJGU
ITC
University of Essex
Colchester
October 1997
