Abstract
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure; the matrices are generated from the corresponding partitions using various scoring rules. Derivable special cases include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples, which has the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between ±1.
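To make the pair-counting ideas in the abstract concrete, here is a minimal Python sketch. It assumes each partition is given as a list of class labels over the same n objects. The chance-corrected index uses the standard contingency-table form of the adjusted Rand index; the abstract itself states no formula, so treat this as an illustration rather than the paper's exact derivation. The function names and the ±1 scoring rule in `plus_minus` are hypothetical, chosen only as one example of a scoring rule feeding the cross-product measure.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(u, v):
    """Return (pairs together in both partitions, pairs apart in both)."""
    if len(u) != len(v):
        raise ValueError("labelings must cover the same objects")
    together_both = apart_both = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            together_both += 1
        elif not same_u and not same_v:
            apart_both += 1
    return together_both, apart_both


def rand_index(u, v):
    """Proportion of the C(n, 2) object pairs on which u and v agree."""
    a, d = pair_counts(u, v)
    return (a + d) / comb(len(u), 2)


def adjusted_rand_index(u, v):
    """Chance-corrected Rand index (assumed standard contingency-table form)."""
    n = len(u)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(u).values())
    sum_cols = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # expectation under the null
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # degenerate, e.g. both partitions all singletons
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def cross_product(u, v, score):
    """Gamma = sum over pairs i < j of score(u, i, j) * score(v, i, j)."""
    return sum(score(u, i, j) * score(v, i, j)
               for i, j in combinations(range(len(u)), 2))


def plus_minus(labels, i, j):
    """Hypothetical scoring rule: +1 if i and j share a class, else -1."""
    return 1 if labels[i] == labels[j] else -1


u = [0, 0, 0, 1, 1, 1]
v = [0, 0, 1, 1, 2, 2]
print(rand_index(u, v))                 # 2/3
print(adjusted_rand_index(u, v))        # 8/33, about 0.242
print(cross_product(u, v, plus_minus))  # 5
```

Here 10 of the 15 pairs agree and 5 disagree, so Γ = 10 - 5 = 5. Under the ±1 rule, agreements minus disagreements is exactly Γ, which gives Rand = (Γ + C(n,2)) / (2·C(n,2)) = 20/30 = 2/3: one small instance of a familiar statistic recovered from the cross-product measure.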
References
ARABIE, P., and BOORMAN, S.A., (1973), “Multidimensional Scaling of Measures of Distance Between Partitions,” Journal of Mathematical Psychology, 10, 148–203.
BERRY, K.J., and MIELKE, P.W., (1985), “Goodman and Kruskal's TAU-B Statistic,” Sociological Methods & Research, 13, 543–550.
BRENNAN, R.L., and LIGHT, R.J., (1974), “Measuring Agreement When Two Observers Classify People into Categories not Defined in Advance,” British Journal of Mathematical and Statistical Psychology, 27, 154–163.
BROOK, R.J., and STIRLING, W.D., (1984), “Agreement Between Observers When the Categories are not Specified in Advance,” British Journal of Mathematical and Statistical Psychology, 37, 271–282.
COSTANZO, C.M., HUBERT, L.J., and GOLLEDGE, R.G., (1983), “A Higher Moment for Spatial Statistics,” Geographical Analysis, 15, 347–351.
DUBIEN, J.L., and WARDE, W.D., (1981), Some Distributional Results Concerning a Comparative Statistic Used in Cluster Analysis, Unpublished manuscript, Department of Mathematics, Western Michigan University, Kalamazoo, Michigan.
FOWLKES, E.B., and MALLOWS, C.L., (1983), “A Method for Comparing Two Hierarchical Clusterings,” Journal of the American Statistical Association, 78, 553–569.
FRANK, O., (1976), “Comparing Classifications by the Use of the Symmetric Class Difference,” in Proceedings in Computational Statistics, eds. J. Gordesch and P. Maeze, Würzburg: Physica Verlag, 84–96.
GAREY, M.R., and JOHNSON, D.S., (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco: W.H. Freeman.
GOODMAN, L.A., and KRUSKAL, W.H., (1954), “Measures of Association for Cross-Classifications,” Journal of the American Statistical Association, 49, 732–764.
GREEN, P.E., and RAO, V.R., (1969), “A Note on Proximity Measures and Cluster Analysis,” Journal of Marketing Research, 6, 359–364.
HARTIGAN, J.A., (1975), Clustering Algorithms, New York: Wiley.
HUBERT, L.J., (1977), “Nominal Scale Response Agreement as a Generalized Correlation,” British Journal of Mathematical and Statistical Psychology, 30, 98–103.
HUBERT, L.J., (1979), “Matching Models in the Analysis of Cross-Classifications,” Psychometrika, 44, 21–41.
HUBERT, L.J., (1983), “Inference Procedures for the Evaluation and Comparison of Proximity Matrices,” in Numerical Taxonomy, ed. J. Felsenstein, New York: Springer-Verlag, 209–228.
HUBERT, L.J., GOLLEDGE, R.G., COSTANZO, C.M., and GALE, N., (1985), “Order-Dependent Measures of Correspondence for Comparing Proximity Matrices and Related Structures,” in Measuring the Unmeasurable, eds. P. Nijkamp and H. Leitner, The Hague: Martinus Nijhoff.
JOHNSON, S.C., (1968), “Metric Clustering,” Unpublished manuscript, AT&T Bell Laboratories, Murray Hill, New Jersey.
KENDALL, M.G., (1970), Rank Correlation Methods, 4th Edition, London: Griffin.
KLASTORIN, T.D., (1985), “The p-Median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach,” Management Science, 31, 84–95.
LERMAN, I.C., (1973), “Etude Distributionelle de Statistiques de Proximité entre Structures Finies de Même Type; Application à la Classification Automatique,” Cahier no. 19 du Bureau Universitaire de Recherche Opérationnelle, Institut de Statistique des Universités de Paris.
MIELKE, P.W., (1979), “On Asymptotic Nonnormality of Null Distributions of MRPP Statistics,” Communications in Statistics - Theory and Methods, A8, 1541–1550 (errata: A10, 1981, p. 1795 and A11, 1982, p. 847).
MIELKE, P.W., and BERRY, K.J., (1985), “Non-Asymptotic Inference Based on the Chi-Square Statistic for r by c Contingency Tables,” Journal of Statistical Planning and Inference, 12, 41–45.
MIELKE, P.W., BERRY, K.J., and BRIER, G.W., (1981), “Application of Multiresponse Permutation Procedures for Examining Seasonal Changes in Monthly Sea-Level Pressure Patterns,” Monthly Weather Review, 109, 120–126.
MILLIGAN, G.W., and COOPER, M.C., (1985), “A Study of the Comparability of External Criteria across Hierarchy Levels,” unpublished manuscript, Ohio State University, Columbus, Ohio.
MILLIGAN, G.W., SOON, S.C., and SOKOL, L.M., (1983), “The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5, 40–47.
MIRKIN, B.G., and CHERNYI, L.B., (1970), “Measurement of the Distance Between Distinct Partitions of a Finite Set of Objects,” Automation and Remote Control, 31, 786–792.
MOREY, L.C., and AGRESTI, A., (1984), “The Measurement of Classification Agreement: An Adjustment of the Rand Statistic for Chance Agreement,” Educational and Psychological Measurement, 44, 33–37.
PATEFIELD, W.M., (1981), “Algorithm AS159. An Efficient Method of Generating Random R × C Tables with Given Row and Column Totals,” Applied Statistics, 30, 91–97.
RAND, W.M., (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850.
REYNOLDS, H.T., (1977), The Analysis of Cross-Classifications, New York: Free Press.
ROHLF, F.J., (1974), “Methods of Comparing Classifications,” Annual Review of Ecology and Systematics, 5, 101–113.
ROHLF, F.J., (1982), “Consensus Indices for Comparing Classifications,” Mathematical Biosciences, 59, 131–144.
STAM, A.J., (1983), “Generation of a Random Partition of a Finite Set by an Urn Model,” Journal of Combinatorial Theory A, 35, 231–240.
WALLACE, D.L., (1983), “Comment,” Journal of the American Statistical Association, 78, 569–579.
Additional information
William H.E. Day was Acting Editor for the reviewing of this paper. We are grateful to him, Ove Frank, Charles Lewis, Glenn W. Milligan, Ivo Molenaar, Stanley S. Wasserman, and anonymous referees for helpful suggestions. Lynn Bilger and Tom Sharpe provided competent technical assistance. Partial support of Phipps Arabie's participation in this research was provided by NSF Grant SES 8310866 and ONR Contract N00014-83-K-0733.
About this article
Cite this article
Hubert, L., Arabie, P. Comparing partitions. Journal of Classification 2, 193–218 (1985). https://doi.org/10.1007/BF01908075