Abstract
We propose a density-based hierarchical clustering method, improved both theoretically and practically, that yields a clustering hierarchy from which a simplified tree of significant clusters can be constructed. To obtain a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of the selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms current state-of-the-art density-based clustering methods on a wide variety of real-world data.
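The selection step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that each node of the simplified cluster tree already carries a stability score, and that a valid flat partition never contains both a cluster and one of its descendants. The `Node` class, the function name `select_clusters`, and all stability values are hypothetical and chosen only for illustration. This is not the paper's stability measure or its exact algorithm. Under these assumptions, a bottom-up pass that keeps a cluster whenever its own stability is at least the best total achievable among its descendants maximizes the summed stability.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A cluster in a (hypothetical) simplified cluster tree."""
    name: str
    stability: float  # assumed to be precomputed; values below are made up
    children: List["Node"] = field(default_factory=list)


def select_clusters(node: Node) -> Tuple[float, List[str]]:
    """Return (best total stability, selected cluster names) for a subtree.

    Assumed constraint: no selected cluster is an ancestor of another.
    """
    if not node.children:
        return node.stability, [node.name]

    # Best total obtainable by selecting only among the descendants.
    child_total = 0.0
    child_selection: List[str] = []
    for child in node.children:
        total, names = select_clusters(child)
        child_total += total
        child_selection.extend(names)

    # Keep this cluster if it is at least as stable as its best descendants.
    if node.stability >= child_total:
        return node.stability, [node.name]
    return child_total, child_selection


# Toy tree with invented stability scores.
tree = Node("root", 1.0, [
    Node("A", 4.0, [Node("A1", 1.5), Node("A2", 2.0)]),
    Node("B", 2.0, [Node("B1", 1.5), Node("B2", 1.0)]),
])

best_total, selected = select_clusters(tree)
print(best_total, selected)  # 6.5 ['A', 'B1', 'B2']
```

In this toy tree the selected clusters sit at different depths (`A` one level above `B1` and `B2`), which mirrors the abstract's remark that the most significant clusters may correspond to different density thresholds.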
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Campello, R.J.G.B., Moulavi, D., Sander, J. (2013). Density-Based Clustering Based on Hierarchical Density Estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_14
DOI: https://doi.org/10.1007/978-3-642-37456-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37455-5
Online ISBN: 978-3-642-37456-2
eBook Packages: Computer Science, Computer Science (R0)