Analyzing and Modeling Rank Data
JOHN I. MARDEN
Department of Statistics,
University of Illinois at Urbana-Champaign, USA
Contents
Preface
1 Introduction
1.1 Rank data
2 Looking at Data
2.1 Introduction: Permutation polytopes
2.2 Projections of polytopes
2.3 Marginals
2.4 Pairs
2.5 Center, spread, and distance
2.5.1 Some useful distances
2.5.2 Estimating the center
2.5.3 Estimating spread, location known
2.5.4 Estimating spread, location unknown
2.5.5 Clustering: £-centers
2.6 Linear subspaces
2.6.1 Spectral decomposition
2.6.2 Inversions
2.7 Exercises
5.2 Probability models - General
5.3 Thurstonian Order statistic models
5.4 Distance-based models
5.5 Paired comparison models - Babington Smith
5.5.1 Bradley-Terry/Mallows
5.5.2 Mallows' models
5.6 Multistage models
5.6.1 Plackett-Luce
5.6.2 Free and φ-component models
5.7 Sufficient statistic models
5.8 Loglinear models
5.9 ANOVA-like models
5.10 Nested orthogonal contrast models
5.10.1 The free model
5.10.2 The φ model
5.10.3 Contingency table models
5.11 Unfolding models
5.12 Generalizing the models
5.13 Some axiomatics
5.13.1 Luce's choice axiom
5.13.2 Unidimensionality, unimodality, and consensus
5.14 Likelihood methods and exponential families
5.14.1 The likelihood function and Fisher information
5.14.2 Maximum likelihood estimation
12 Appendix
12.1 Some linear algebra
12.1.1 Eigenvalues and eigenvectors
12.2 Means and covariances for vectors and matrices
12.2.1 Definitions
12.2.2 Kronecker products
12.3 Normality and chi-squares
12.4 Some asymptotics
12.4.1 Central limit theorem
12.4.2 Convergence in probability
Bibliography
Introduction
Coke     4     7-Up     1
Pepsi    3     Sprite   2
7-Up     1     Pepsi    3
Sprite   2     Coke     4
What does one do with such rank data? Chapter 2 takes a data-
analytic approach. Rank data are multivariate data, the objects
representing the variables; hence any multivariate method can be
applied to the rank data. Means, standard deviations, histograms,
box plots, cluster analyses, multidimensional scaling, factor anal-
ysis, etc., all have potential to provide insight into the data. For
example, simple statistics such as the number of judges who rank
a particular object as first, or the number who prefer one object
to another, arise naturally.
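
The simple counts just mentioned can be sketched in a few lines of Python. This is only an illustration: the four-judge sample is hypothetical, the object order (Coke, Pepsi, 7-Up, Sprite) is an assumption, and the function names `marginals`, `pairs`, and `mean_ranks` are chosen for this sketch rather than taken from the text. Each ranking is a tuple whose i-th entry is the rank given to object i, with 1 meaning most preferred.

```python
from typing import List, Sequence

# Hypothetical sample: each tuple gives the ranks assigned by one judge
# to (Coke, Pepsi, 7-Up, Sprite), where rank 1 is the most preferred.
sample = [(4, 3, 1, 2), (1, 2, 3, 4), (2, 1, 4, 3), (4, 3, 1, 2)]


def marginals(rankings: Sequence[Sequence[int]]) -> List[List[int]]:
    """counts[i][r] = number of judges who give object i the rank r + 1."""
    m = len(rankings[0])
    counts = [[0] * m for _ in range(m)]
    for y in rankings:
        for obj, rank in enumerate(y):
            counts[obj][rank - 1] += 1
    return counts


def pairs(rankings: Sequence[Sequence[int]]) -> List[List[int]]:
    """counts[i][j] = number of judges who prefer object i to object j."""
    m = len(rankings[0])
    counts = [[0] * m for _ in range(m)]
    for y in rankings:
        for i in range(m):
            for j in range(m):
                if y[i] < y[j]:  # a smaller rank means "preferred"
                    counts[i][j] += 1
    return counts


def mean_ranks(rankings: Sequence[Sequence[int]]) -> List[float]:
    """Average rank received by each object."""
    n = len(rankings)
    m = len(rankings[0])
    return [sum(y[i] for y in rankings) / n for i in range(m)]


first_place_counts = [row[0] for row in marginals(sample)]
```

In this hypothetical sample, `first_place_counts` records how many judges ranked each soda first, and `pairs(sample)[1][0]` counts the judges who prefer Pepsi to Coke.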
Since rank data are so highly structured, methods that respect
the peculiarities of rank data can be especially useful. Our funda-
mental structure is the "permutation polytope" created by plotting
the rank vectors in Euclidean space, then connecting the points.
Thompson (1993a,b) represents a set of ranking data on the poly-
tope by placing a ball at each ranking, where the radius of the ball
is indicative of the frequency of that ranking in the sample. Much
of Chapter 2 is involved with trying to visualize (or numericize)
these multidimensional plots. Distances defined on S_m or T_m allow
defining the center and spread of a data set, and finding clusters
of judges. In the final section of Chapter 2, we consider functions
on S_m as vectors in m!-dimensional space, and try to decompose
them by projecting onto interesting subspaces. The popular de-
scriptive statistics fit into this framework. Diaconis (1988, 1989)
develops a spectral analysis along these lines, decomposing frequen-
cies into components based on the number of rankings with each
object attaining each rank, or each pair of objects attaining each
pair of ranks, etc. McCullagh (1993a) introduces inversions, which
similarly decompose the data, but into factors based on paired
comparisons of objects, triples of objects, etc.
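
As a rough illustration of how a distance yields a center and a spread, the sketch below uses two standard choices, Kendall's distance (the number of discordant pairs) and Spearman's distance (the sum of squared rank differences). Whether these match the distances used in Chapter 2 is an assumption, and the brute-force search over all m! rankings, like the names `estimate_center` and `spread`, is chosen here for clarity only; it is practical only for small m.

```python
from itertools import combinations, permutations
from typing import Callable, Sequence, Tuple

Ranking = Sequence[int]


def kendall_distance(x: Ranking, y: Ranking) -> int:
    """Number of object pairs that x and y place in opposite order."""
    return sum(
        1
        for i, j in combinations(range(len(x)), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )


def spearman_distance(x: Ranking, y: Ranking) -> int:
    """Sum of squared differences between the two rank vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def estimate_center(
    rankings: Sequence[Ranking],
    distance: Callable[[Ranking, Ranking], float],
) -> Tuple[int, ...]:
    """Ranking that minimizes the total distance to the sample (brute force)."""
    m = len(rankings[0])
    return min(
        permutations(range(1, m + 1)),
        key=lambda c: sum(distance(c, y) for y in rankings),
    )


def spread(
    rankings: Sequence[Ranking],
    center: Ranking,
    distance: Callable[[Ranking, Ranking], float],
) -> float:
    """Average distance from the sample rankings to the given center."""
    return sum(distance(center, y) for y in rankings) / len(rankings)


# Hypothetical sample of three judges ranking three objects.
judges = [(1, 2, 3), (1, 2, 3), (2, 1, 3)]
center = estimate_center(judges, kendall_distance)
average_distance = spread(judges, center, kendall_distance)
```

Ties among candidate centers are broken here by the order in which `permutations` generates them, which is an arbitrary choice of this sketch.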
Along with descriptions, a good statistician wants to be careful
not to see structure in the data that is basically just random varia-
tion. Chapter 3 provides some tools, mainly for testing whether the
uniform distribution is tenable. (Under the uniform distribution,
all rankings have the same probability.) The statistics in Chapter
2 are turned into formal test statistics. In addition, concordances
based on distance measures prove to be very useful. Chapter 4
continues by comparing groups of judges.
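
One way to picture a test of uniformity is the sketch below, which turns the mean ranks into Friedman's statistic and approximates its null distribution by simulating samples of uniformly random rankings. Treating this as representative of Chapter 3 is an assumption, and the function names, the number of simulations, and the seed are all illustrative choices.

```python
import random
from typing import List, Sequence


def friedman_statistic(rankings: Sequence[Sequence[int]]) -> float:
    """12n / (m(m+1)) times the sum of squared deviations of the mean ranks
    from their expected value (m+1)/2 under the uniform distribution."""
    n = len(rankings)
    m = len(rankings[0])
    means = [sum(y[i] for y in rankings) / n for i in range(m)]
    expected = (m + 1) / 2
    return 12 * n / (m * (m + 1)) * sum((r - expected) ** 2 for r in means)


def monte_carlo_p_value(
    rankings: Sequence[Sequence[int]], n_sim: int = 2000, seed: int = 0
) -> float:
    """Approximate P(statistic >= observed) when all rankings are equally likely."""
    rng = random.Random(seed)
    n = len(rankings)
    m = len(rankings[0])
    observed = friedman_statistic(rankings)
    exceed = 0
    for _ in range(n_sim):
        simulated: List[List[int]] = []
        for _ in range(n):
            y = list(range(1, m + 1))
            rng.shuffle(y)  # a uniformly random ranking of the m objects
            simulated.append(y)
        if friedman_statistic(simulated) >= observed - 1e-12:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical sample in which ten judges all give the same ranking.
unanimous = [(1, 2, 3)] * 10
statistic = friedman_statistic(unanimous)
p_value = monte_carlo_p_value(unanimous, n_sim=500)
```

For large samples the statistic is commonly compared with a chi-square distribution on m - 1 degrees of freedom; the simulation avoids relying on that approximation when the number of judges is small.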
Modeling begins in Chapter 5. Some models arise from theoret-
ical constructs, some from experimental methods, and others from
attempts to find a simple description of the population of rankers.
The basic models are summarized, including those named after
Chung, L. (1989) The Use of Nonnull Models for Rank Data in
Nonparametric Statistics. Ph.D. Thesis, Statistics, University of
Illinois at Urbana-Champaign.
Chung, L. and Marden, J. I. (1990) Finding the marginal distribution of
ranks in some nonnull ranking models, with applications to
goodness-of-fit statistics. Department of Statistics, University of
Illinois at Urbana-Champaign.
Chung, L. and Marden, J. I. (1991) Use of nonnull models for rank
statistics in bivariate, two-sample, and analysis-of-variance problems.
Journal of the American Statistical Association, 88, 188-200.
Chung, L. and Marden, J. I. (1993) Extensions of Mallows' φ model.
Probability Models and Statistical Analyses for Ranking Data, 108-139.
Fligner, M. A. and Verducci, J. S., eds. Springer-Verlag: New York.
Cleveland, W. S. (1985) The Elements of Graphing Data. Wadsworth:
Monterey.
Cochran, W. G. and Cox, G. M. (1957) Experimental Designs. Wiley:
New York.
Cohen, A. (1982) Analysis of large sets of ranking data. Communications
in Statistics - Theory and Methods, 11, 235-256.
Cohen, A. and Mallows, C. L. (1980) Analysis of ranking data. Technical
Report, Bell Laboratories, Murray Hill, New Jersey.
Cohen, A. and Mallows, C. L. (1983) Assessing goodness of fit of ranking
models to data. The Statistician, 32, 361-373.
Coombs, C. H. (1964) A Theory of Data. Wiley: New York.
Critchlow, D. E. (1985) Metric Methods for Analyzing Partially Ranked
Data. Springer-Verlag: New York.
Critchlow, D. E. (1986) A unified approach to constructing nonparametric
rank tests. Technical Report #86-15, Department of Statistics, Purdue
University.
Critchlow, D. E. and Verducci, J. S. (1992) Detecting a trend in paired
rankings. Applied Statistics, 41, 17-29.
Critchlow, D. E., Fligner, M. A. and Verducci, J. S. (1991) Probability
models on rankings. Journal of Mathematical Psychology, 32, 294-373.
Critchlow, D. E. and Fligner, M. A. (1993) Ranking models with item
covariates. Probability Models and Statistical Analyses for Ranking
Data, 1-19. Fligner, M. A. and Verducci, J. S., eds. Springer-Verlag:
New York.
Croon, M. A. (1989a) The analysis of partial rankings by log-linear and
latent-class models. Multiway Data Analysis, 497-506.
North-Holland: Amsterdam.
Croon, M. A. (1989b) Latent class models for the analysis of rankings.
New Developments in Psychological Choice Modeling, 99-121. Feger,
Klauer and de Soete, eds. North-Holland: Amsterdam.