Abstract—Semi-supervised clustering has been widely explored in recent years. In this paper, we present HCAC-ML (Hierarchical Confidence-based Active Clustering with Metric Learning), a novel approach for this task that employs distance metric learning through cluster-level constraints. HCAC-ML is based on HCAC, a state-of-the-art algorithm for hierarchical semi-supervised clustering that uses an active learning approach for inserting cluster-level constraints. These constraints are presented to a variation of the ITML (Information-Theoretic Metric Learning) algorithm in order to learn a Mahalanobis-like distance function. We compared HCAC-ML with other semi-supervised clustering algorithms on 26 different datasets. Results indicate that HCAC-ML outperforms the other algorithms in most scenarios, especially when the number of constraints is small, which makes HCAC-ML useful in practical applications.
I. INTRODUCTION

Semi-supervised clustering has emerged as an interesting alternative in recent years. These algorithms improve clustering quality through external knowledge conveyed in the form of constraints. These constraints are used to guide the clustering process and can be directly derived from the original data (using partially labeled data) or provided by a user who wants to adapt the clustering results to his/her expectations [1].

Constraints employed in semi-supervised clustering algorithms may refer to instances, clusters or both. Instance-level constraints may indicate whether two instances must or must not belong to the same cluster [2], as in pairwise must-link and cannot-link constraints, or whether a given instance is nearer to a second one than to a third one [3]. Cluster-level constraints, on the other hand, introduce information about groups of objects, indicating, for example, that two clusters must or must not be merged [4], [5], split [6] or even removed [7]. When referring to both instances and clusters, constraints may indicate, for example, whether an instance must or must not belong to a given cluster [8]. While instance-level constraints are easier for a user to provide, cluster-level constraints may be more effective, since they refer to a group of instances and can therefore convey more information [5].

Semi-supervised clustering algorithms may also be classified according to the way they deal with the constraints [9]: constraint-based and distance-based. Constraint-based algorithms modify the objective function of the clustering algorithm in order to respect the provided constraints. Distance-based algorithms, on the other hand, employ the constraints to learn a new distance metric used to group the instances. Constraint-based methods generally have a lower computational cost. However, distance-based methods are better suited to contexts containing clusters with different shapes and densities [10].

Existing distance-based clustering algorithms employ instance-level constraints [11], [12], [10], [13]. These algorithms achieve good clustering quality, but they require relatively large amounts of user intervention to obtain good results. Given this context, one interesting alternative not yet explored in the literature is the combination of cluster-level constraints and distance metric learning in the same algorithm. To the best of our knowledge, no such algorithm exists in the literature. Since cluster-level constraints can carry more information than instance-level constraints, fewer user interventions may be needed to achieve good clustering performance. Moreover, metric learning approaches help to overcome problems such as clusters of different shapes and to fit different kinds of data.

In this sense, in this work we present HCAC-ML (Hierarchical Confidence-based Active Clustering with Metric Learning), a semi-supervised hierarchical clustering algorithm that uses cluster-level constraints and distance metric learning. The core of the algorithm is based on the HCAC algorithm [5], which outperforms other semi-supervised algorithms in various scenarios. HCAC uses cluster-level constraints posed by the user along an agglomerative clustering process. Reported results indicate that HCAC achieves good results even when the number of constraints is small, which is especially interesting in practical applications. Unlike other semi-supervised algorithms, HCAC performs hierarchical clustering, which has large practical importance. The data structure produced by hierarchical clustering algorithms provides a visualization of the data at different levels of abstraction [14], which facilitates the comprehension of, and the navigation over, the data collection. Moreover, hierarchical clustering algorithms do not require prior knowledge about the number of clusters, which is an important drawback of partitional clustering algorithms.
In HCAC-ML, we adapt the ITML (Information-Theoretic Metric Learning) algorithm [15] in order to learn a Mahalanobis distance [16] using the cluster-level constraints obtained by HCAC. Each user intervention generates one cluster-level constraint, which is further derived into several instance-level constraints that serve as input to the ITML algorithm. Thus, HCAC-ML requires fewer user interventions to achieve good clustering performance. Our experimental evaluation shows that HCAC-ML can efficiently use the information provided by the user and outperforms other algorithms with few user interventions in different scenarios.

This paper is organized as follows. In the next section, we present related work on semi-supervised clustering and distance metric learning. Then, in Section III, we present the HCAC-ML algorithm and describe its three basic steps. In Section IV, we present our experimental methodology and report the results. Finally, in Section V, we present our conclusions and point to future work.
II. RELATED WORK

Feature learning and distance metric learning have been extensively explored in recent years. Feature learning aims at obtaining a new set of features with smaller dimension that preserves some characteristics of the data [17], [18]. Distance metric learning, on the other hand, focuses on learning distance metrics that are specific to an application, given that the concept of similarity is subjective and may not be captured by conventional distance metrics [19].

In this work, we focus on distance metric learning approaches, which are more appropriate for the context of semi-supervised clustering. Existing algorithms for clustering applications learn distance metrics based on string-edit distance [20], Kullback-Leibler divergence [11], Euclidean distance [4] and Mahalanobis distance [21], [22]. The last is the simplest and most common approach and consists in modeling the similarity function as a Mahalanobis distance of the form d(x, y) = ||x − y||²_A = (x − y)^T A (x − y), where A is a parameter (covariance) matrix. The values of the matrix A are modified along the iterations in order to optimize the adequacy of the distance function to the constraints. Due to its simplicity and versatility, in this work we focus on learning Mahalanobis-like distance metrics rather than other approaches.
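As a concrete illustration of this parameterization (a minimal sketch of ours, not taken from the cited works), the function below computes the squared Mahalanobis distance for a given parameter matrix A; with A equal to the identity it reduces to the squared Euclidean distance.

import numpy as np

def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance d_A(x, y) = (x - y)^T A (x - y)."""
    diff = np.asarray(x) - np.asarray(y)
    return float(diff @ A @ diff)

# With A = I the measure reduces to the squared Euclidean distance.
x, y = np.array([1.0, 2.0]), np.array([2.0, 0.0])
A = np.eye(2)
print(mahalanobis_sq(x, y, A))  # 5.0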
In [22], the Mahalanobis distance is learned through must-link and cannot-link constraints, which indicate whether two elements are near each other or not. The authors address this task as a convex optimization problem, in order to obtain results without problems of local optima. In [21], pairwise constraints are also used, and a gradient descent approach optimizes the Mahalanobis weight matrix according to the given constraints.

The MPC-KMeans algorithm [10] also learns a Mahalanobis distance function. This algorithm employs pairwise constraints to learn a distance metric within KMeans [23]. The constraints are relaxed, allowing the violation of some constraints in cases where this would result in clusters with higher cohesion. For each broken constraint, a penalty is added to the objective function. In MPC-KMeans, a distance metric is learned for each cluster, obtaining more cohesive clusters and being more adaptive to different shapes. The algorithm follows the EM approach. In the E-step, each object is assigned to a cluster such that the sum of the distance of this object to the centroid of its cluster and the penalty for the possible violation of constraints under that assignment is minimized. Next, in the M-step, a distance learning process is performed for each cluster, in order to learn the distance that best fits that cluster's data.

An algorithm-independent approach is provided by the Information-Theoretic Metric Learning (ITML) algorithm [15]. ITML learns the covariance matrix A used in the Mahalanobis distance parameterization, guided by a set of pairwise must-link and cannot-link constraints and two thresholds, u and l. According to the ITML approach, pairs of objects involved in must-link constraints must have pairwise distance below the upper bound u, while objects involved in cannot-link constraints must have pairwise distance above the lower bound l. Due to this formulation, ITML does not rely on clustering results to adjust the covariance matrix, unlike other distance-based algorithms.

All of these related works achieve good results in clustering quality. Most of them, however, are algorithm-driven and are harder to adapt to cluster-level constraints. Due to its adaptability and algorithm independence, we adapted the ITML algorithm to work with the cluster-level constraints provided by the HCAC algorithm. The combination of HCAC and ITML generated the HCAC-ML algorithm, which is explained in detail in the next section.

III. HCAC-ML: HIERARCHICAL CONFIDENCE-BASED ACTIVE CLUSTERING WITH METRIC LEARNING

In this work, we propose the Hierarchical Confidence-based Active Clustering with Metric Learning (HCAC-ML) algorithm, an algorithm based on HCAC [5] that performs metric learning through an adaptation of the ITML algorithm [15]. The HCAC-ML algorithm can be described in three basic steps, as presented in Figure 1: (i) hierarchical confidence-based clustering; (ii) metric learning through ITML; and (iii) unsupervised clustering. Each of these steps is detailed in the next sections.

A. Hierarchical Confidence-based Clustering

In this first step, we cluster the dataset following the HCAC algorithm [5]. The choice of HCAC is due to its good performance in different scenarios: it outperforms other hierarchical clustering algorithms when the number of user interventions is small, which shows that the HCAC active learning approach and the posed cluster-level constraints are efficient. This step aims at obtaining the cluster-level constraints to be used in the metric learning procedure of the next step (ITML).

The pseudocode of the HCAC procedure can be observed in Algorithm 1. HCAC performs an agglomerative clustering procedure and asks for user intervention to pose a cluster-level constraint whenever a low confidence cluster merge is detected.
Fig. 1. Steps of the HCAC-ML algorithm.
Algorithm 1: Hierarchical Confidence-based Active Clustering procedure.
Input:
  X = {x1, x2, ..., xn}: dataset of size n
  δ: confidence threshold
  µ: number of pairs of clusters to be shown to the user
  ε: maximum number of user interventions
  dist(·, ·): distance function
Output:
  C: set of clusters generated by the algorithm
  Ψ: set of constraints posed by the user

begin
  for each xi ∈ X do
    ci ← xi
  end
  for each ci ∈ C do
    for each cj ∈ C, cj ≠ ci do
      DMi,j ← dist(ci, cj)
    end
  end
  int ← 0; new ← n + 1
  while |DM| ≠ 1 do
    minDist ← DMi,j such that DMi,j = min DMx,y, cx ≠ cy
    secMinDist ← DMr,s such that DMr,s = min DMk,l, cl ∈ {ci, cj}, ck ∉ {ci, cj}
    θ ← secMinDist − minDist
    if θ ≤ δ and int ≤ ε then
      sortedNeighbors ← nearestNeighbors(ci, cj, µ)
      (cnew, ψint) ← userChoice(sortedNeighbors, ci, cj)
      int ← int + 1
    else
      cnew ← merge(ci, cj)
    end
    updateMatrix(DM, cnew)
    for each p ∈ DM, p ≠ new do
      DMnew,p ← average(DMp,i, DMp,j)
    end
  end
end

The confidence of a merge is related to the distance between the elements of the proposed merge and the other elements near them. In essence, if a pair of elements is close but far from the other elements of the dataset, the merging confidence is high. On the other hand, if this pair of elements is also very near to other elements in the dataset, it might be advisable to ask for user intervention to validate the merge. Experimental evaluations show that this concept is useful in detecting cluster borders [5].

Let us consider a distance function dist(·, ·) between data instances in a dataset, which is used to produce the distance matrix DM. Along the clustering iterations, as instances and clusters are merged, the distances between the formed clusters and the remaining elements are updated in the DM matrix according to the UPGMA approach [24]. The natural merge (unsupervised merge) in each step of an agglomerative hierarchical clustering process involves the nearest pair of elements ci and cj, with distance DMi,j. The confidence θ of this merge is calculated as the difference between DMi,j and DMr,s, where DMr,s = min DMk,l, k ≠ l, (k, l) ≠ (i, j), k ∈ {i, j} ⊕ l ∈ {i, j}. In practical terms, low confidence merges are those whose confidence θ is below a threshold δ, which is defined a priori. To determine the value of δ, we perform a calibration procedure by running an unsupervised clustering process and storing the confidence value of each cluster merge in an ordered list Conf. The value of δ can then be found by picking the ε-th element of Conf, where ε is the desired number of interventions.
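To make the confidence computation concrete, the sketch below (variable and function names are ours, not the authors' implementation) extracts the closest pair and its confidence θ from a distance matrix; it assumes DM is symmetric with np.inf on the diagonal and on entries of already-merged clusters. The δ threshold can then be calibrated as described above: run one fully unsupervised pass, collect θ at every merge, sort the values in ascending order and take the ε-th smallest as δ.

import numpy as np

def merge_confidence(DM):
    """Return (i, j, theta): the closest pair and its merge confidence.

    theta is the gap between the best merge distance DM[i, j] and the best
    alternative merge that involves exactly one of i or j.
    """
    n = DM.shape[0]
    i, j = np.unravel_index(np.argmin(DM), DM.shape)
    best = DM[i, j]
    second = np.inf
    for k in range(n):
        if k not in (i, j):
            second = min(second, DM[i, k], DM[j, k])
    return i, j, second - best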
Once a low confidence merge is detected, user intervention is required to pose a cluster-level constraint. In this intervention, the user is asked to point out the best cluster merge at that given step. A pool of µ pairs of clusters, where µ is defined a priori, is presented to the user. This pool is assembled in the nearestNeighbors function and contains the nearest pair of clusters (ci, cj) and a set of µ − 1 pairs (cx, cy) corresponding to the best unsupervised cluster merges involving ci or cj (i.e., (cx, cy) ≠ (ci, cj) and x ∈ {i, j} ⊕ y ∈ {i, j}). Each of these pairs is presented to the user using some cluster summarization technique, such as word clouds in text clustering or parallel coordinates in other applications, in order to make it easier for the user to comprehend these clusters.
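The pool shown to the user can be assembled as in the following sketch, a simplified reading of the nearestNeighbors step under our own naming (the actual HCAC routine may differ): it returns the nearest pair (ci, cj) followed by the µ − 1 cheapest merges that involve exactly one of ci or cj.

import numpy as np

def candidate_pool(DM, i, j, mu):
    """Pool presented to the user: (i, j) plus the mu - 1 best merges
    that involve exactly one of i or j, ordered by distance."""
    n = DM.shape[0]
    neighbors = []
    for k in range(n):
        if k in (i, j):
            continue
        neighbors.append(((i, k), DM[i, k]))
        neighbors.append(((j, k), DM[j, k]))
    neighbors.sort(key=lambda pair: pair[1])
    return [((i, j), DM[i, j])] + neighbors[:mu - 1]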
The number of user interventions must not exceed a predefined value ε. Once this threshold is reached, the clustering process resumes in an unsupervised way. Each of the t = 1, ..., ε user interventions posed in the userChoice function generates a constraint ψt = (CSt, CDt), where CSt = (cl, cm, DMi,j) contains the pair (cl, cm) selected to be merged in detriment of the pairs of clusters in the set CDt, together with the distance between the nearest pair of clusters (ci, cj). The set CDt is composed of the pairs of clusters in the set sortedNeighbors whose distance is smaller than DMl,m.

An example of this procedure is illustrated in Figure 2. In this figure, we assume that there are two underlying clusters, represented by the two sets of points, and that each point corresponds to a subcluster. Assuming that this is a low confidence merge and that we show the set of pairs of clusters {(A, B), (B, D), (A, C), (A, E)} for the user to choose the next cluster merge, the user should select the pair (A, C), since it appears to be the best option (the nearest pair of subclusters belonging to the same underlying cluster).
Then, it is intuitive to assume that, in the new distance function to be learned, the distance between the pair (A, C) should be smaller than the distances of the non-selected pairs. Thus, the distance between the nearest pair of clusters is taken as the u (upper bound) parameter for this similarity constraint. Similarly, we assume that the distances between the non-selected pairs (A, B) and (B, D) should be greater than the distance between (A, C), which is taken as the l (lower bound) parameter for these dissimilarity constraints.

Fig. 2. Generating similarity and dissimilarity constraints with the constraints provided by HCAC. In the example, the cluster distances are d(A, B) = 1.0, d(B, D) = 1.3, d(A, C) = 1.5 and d(A, E) = 2.3; after the user intervention, the cluster similarity constraint is CS = {(A, C, 1.0)} and the cluster dissimilarity constraints are CD = {(A, B, 1.5), (B, D, 1.5)}.
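The construction of these cluster-level constraints from one user intervention can be sketched as follows. The snippet reproduces the Figure 2 example; the helper name and the data layout are our own illustration, not part of the original HCAC code.

def cluster_constraints(pool, chosen):
    """Turn one user intervention into cluster-level constraints.

    pool: list of ((cx, cy), distance) pairs shown to the user, with the
          nearest pair first; chosen: the pair selected by the user.
    Returns (CS, CD) as in the Figure 2 example.
    """
    distances = dict(pool)
    nearest_dist = pool[0][1]          # distance of the nearest pair -> upper bound u
    chosen_dist = distances[chosen]    # distance of the selected pair -> lower bound l
    # Similarity constraint: the chosen pair, upper-bounded by the
    # distance of the nearest (unsupervised) merge.
    CS = [(chosen, nearest_dist)]
    # Dissimilarity constraints: non-selected pairs closer than the chosen
    # pair, lower-bounded by the chosen pair's distance.
    CD = [(pair, chosen_dist) for pair, d in pool
          if pair != chosen and d < chosen_dist]
    return CS, CD

pool = [(("A", "B"), 1.0), (("B", "D"), 1.3), (("A", "C"), 1.5), (("A", "E"), 2.3)]
CS, CD = cluster_constraints(pool, ("A", "C"))
print(CS)  # [(('A', 'C'), 1.0)]
print(CD)  # [(('A', 'B'), 1.5), (('B', 'D'), 1.5)]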
Once we have the sets of cluster similarity (CS) and cluster dissimilarity (CD) constraints among pairs of clusters, we have to break each of these constraints into sets of instance-level constraints. Each CSt set generates a set ISt containing instance-level similarity constraints. Analogously, each CDt set generates an instance-level dissimilarity set IDt. Both ISt and IDt are used as input for the ITML algorithm in the next step of HCAC-ML.

The procedure of transforming cluster-level constraints into instance-level constraints is illustrated in Figure 3. In this figure, we assume that A = {a1, a2} and B = {b1, b2, b3} are clusters composed of sets of elements. As we have a dissimilarity constraint between A and B, we pose a dissimilarity constraint between every pair (ax, by), ax ∈ A and by ∈ B.

Fig. 3. Generating instance-level constraints from the cluster-level constraints provided by HCAC. In the example, the cluster-level constraint CD(A, B, 1.5) generates the instance-level constraints {ID(a1, b1, 1.5), ID(a1, b2, 1.5), ID(a1, b3, 1.5), ID(a2, b1, 1.5), ID(a2, b2, 1.5), ID(a2, b3, 1.5)}.

Finally, we assemble the sets IS and ID, such that IS = IS1 ∪ IS2 ∪ ... ∪ ISε and ID = ID1 ∪ ID2 ∪ ... ∪ IDε. The IS and ID sets are used as input in the metric learning step, presented in the next section.
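A sketch of this expansion is given below, reproducing the Figure 3 example (the helper name is ours). The same routine applies to a cluster similarity constraint, producing IS entries instead of ID entries.

from itertools import product

def expand_constraint(cluster_a, cluster_b, bound):
    """Derive instance-level constraints from one cluster-level constraint:
    one pairwise constraint, with the same bound, for every cross pair."""
    return [(a, b, bound) for a, b in product(cluster_a, cluster_b)]

# Figure 3 example: CD(A, B, 1.5) becomes six instance-level
# dissimilarity constraints ID(a_x, b_y, 1.5).
A = ["a1", "a2"]
B = ["b1", "b2", "b3"]
ID = expand_constraint(A, B, 1.5)
print(ID)
# [('a1', 'b1', 1.5), ('a1', 'b2', 1.5), ('a1', 'b3', 1.5),
#  ('a2', 'b1', 1.5), ('a2', 'b2', 1.5), ('a2', 'b3', 1.5)]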
B. Metric Learning

Once the HCAC procedure finishes, the instance-level constraints generated from the cluster-level constraints posed by the user are used to learn a Mahalanobis distance through an ITML approach [15].

The pseudocode of the modified ITML procedure adapted for HCAC-ML is presented in Algorithm 2. The procedure relies on a set IS of instance similarity constraints and a set ID of instance dissimilarity constraints; sets U and L of upper and lower distance bounds; a slack parameter γ; and an input covariance matrix A0. The sets IS, ID, U and L are generated in the previous step, derived from the HCAC cluster-level constraints.

Instead of using a single upper bound and a single lower bound value, as in the original ITML algorithm, in HCAC-ML we have sets of values. These values are obtained for each constraint and reflect the hierarchical nature of the clustering algorithm. Since the distances of cluster merges in the lower levels of the cluster hierarchy are smaller, the upper and lower bounds of constraints posed at these levels are also smaller. The usage of different values for these bounds leads to a faster convergence of ITML, since there may be more than one constraint for the same pair of instances.

The matrix A0 is the identity matrix, which makes the initial Mahalanobis distance equal to the Euclidean distance. The parameter γ controls the intensity of the adjustment of the covariance matrix A. This parameter regularizes the trade-off between satisfying the constraints and minimizing the distance between A and A0, and varies in the interval [0, 1].

Algorithm 2: Modified Information-Theoretic Metric Learning procedure.
Input:
  X: data matrix of size d × n, where d is the number of attributes and n is the number of examples
  IS: set of instance similarity constraints
  ID: set of instance dissimilarity constraints
  U, L: sets of upper and lower distance bounds, one for each constraint
  A0: input covariance matrix
  γ: slack parameter
  c(·, ·): constraint indexing function
Output:
  A: learned covariance matrix for the Mahalanobis distance

A ← A0
λij ← 0, ∀i, j
for each constraint (i, j): ξc(i,j) ← Ui,j if (i, j) ∈ IS, Li,j otherwise
for each constraint (i, j) ∈ IS ∪ ID do
  p ← (xi − xj)^T A (xi − xj)
  if (i, j) ∈ ID then δ ← Li,j else δ ← −Li,j
  α ← min(λij, (δ/2)(1/p − γ/ξc(i,j)))
  β ← δα / (1 − δαp)
  ξc(i,j) ← γ ξc(i,j) / (γ + δα ξc(i,j))
  λij ← λij − α
  A ← A + β A (xi − xj)(xi − xj)^T A
end
At each iteration of the ITML procedure, one must-link or cannot-link constraint is used to adjust the matrix A. Unlike the original ITML algorithm, which performs multiple iterations over the constraints, HCAC-ML performs a single iteration. This is possible due to the relatively large number of pairwise instance-level constraints derived from the cluster-level constraints. Performing multiple iterations could improve convergence, but it also implies an additional computational cost.
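A sketch of this single-pass variant is shown below. It follows the standard ITML projection update of Davis et al. [15], applied in one pass over the derived constraints and with per-constraint bounds; the names, the ±1 sign convention for δ and other details are our assumptions and may differ from the authors' Algorithm 2.

import numpy as np

def itml_single_pass(X, constraints, A0, gamma):
    """One pass of ITML-style Bregman updates over the derived constraints.

    X           : (n, d) data matrix, one instance per row
    constraints : list of (i, j, delta, bound); delta = +1 for a similarity
                  constraint (distance should stay below `bound`) and -1 for
                  a dissimilarity constraint (distance should stay above it)
    A0          : initial covariance matrix (identity -> Euclidean distance)
    gamma       : slack parameter trading constraint satisfaction against
                  staying close to A0
    """
    A = A0.copy()
    lam = np.zeros(len(constraints))  # one dual variable per constraint
    xi = np.array([b for (_, _, _, b) in constraints], dtype=float)
    for c, (i, j, delta, _) in enumerate(constraints):
        diff = X[i] - X[j]
        p = float(diff @ A @ diff)
        if p == 0.0:
            continue
        alpha = min(lam[c], (delta / 2.0) * (1.0 / p - gamma / xi[c]))
        beta = delta * alpha / (1.0 - delta * alpha * p)
        xi[c] = gamma * xi[c] / (gamma + delta * alpha * xi[c])
        lam[c] -= alpha
        A = A + beta * np.outer(A @ diff, diff @ A)
    return A

# Tiny usage example with one similarity and one dissimilarity constraint.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
constraints = [(0, 1, +1, 1.0), (0, 2, -1, 4.0)]
A = itml_single_pass(X, constraints, np.eye(2), gamma=1.0)
print(A)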
C. Unsupervised Clustering

Once the covariance matrix is adjusted, HCAC-ML uses this new covariance matrix to calculate a new distance matrix DM using the Mahalanobis distance. The matrix DM is then used as input for the unsupervised Average-Link (UPGMA) clustering algorithm [24]. The result of the Average-Link algorithm is the final hierarchical clustering produced by HCAC-ML.
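Using off-the-shelf tools, this final step could look like the sketch below. It is an illustration with SciPy, not the authors' implementation; the learned matrix A plays the role of the VI parameter of SciPy's Mahalanobis metric.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

def hcac_ml_final_clustering(X, A):
    """Step (iii): recompute distances with the learned Mahalanobis matrix A
    and run plain average-link (UPGMA) clustering on them."""
    # pdist's 'mahalanobis' metric takes the quadratic-form matrix as VI and
    # returns the (non-squared) Mahalanobis distance between all pairs.
    condensed = pdist(X, metric="mahalanobis", VI=A)
    return linkage(condensed, method="average")

# Example with the Euclidean special case (A = I).
X = np.random.rand(10, 3)
Z = hcac_ml_final_clustering(X, np.eye(3))
print(Z.shape)  # (9, 4): one row per merge in the dendrogram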
IV. EXPERIMENTAL EVALUATION

In this paper, we compared HCAC-ML with other state-of-the-art semi-supervised hierarchical clustering algorithms: HCAC [5]; HCAC-LC [25] (Hierarchical Confidence-based Active Clustering with Limited Constraints, a variation of HCAC that only poses constraints when the cluster size is greater than 2); Constrained Complete Link (CCL) [4]; and a hierarchical clustering algorithm that employs pairwise must-link and cannot-link constraints [2], [26]. We also compared these algorithms with an unsupervised clustering algorithm (UPGMA [24]), which is used as a baseline.

A. Evaluation Methodology

For the comparison of these algorithms, we used 26 datasets. Of these, 13 are real-world numerical datasets from the UCI repository (http://archive.ics.uci.edu/ml/datasets.html) and from the MULAN repository (http://mulan.sourceforge.net/datasets.html). A description of these datasets can be observed in Table I. The remaining 13 datasets are artificially generated bi-dimensional datasets, with the number of clusters per dataset varying from 2 to 30. All datasets are perfectly balanced, with 30 examples in each cluster. Each cluster is formed by the combination of two normal distributions (one for the x-axis and the other for the y-axis), the clusters are separated by a constant distance and, therefore, are well shaped.
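The paper does not give the exact generation parameters for the artificial datasets; the sketch below only illustrates the described recipe (balanced 2-D Gaussian clusters with a constant separation between neighboring centers), with the separation and standard deviation chosen arbitrarily.

import numpy as np

def synthetic_dataset(n_clusters, per_cluster=30, separation=5.0, std=1.0, seed=0):
    """Balanced bi-dimensional dataset: each cluster is a 2-D Gaussian blob
    (one normal distribution per axis), centers a constant distance apart."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for c in range(n_clusters):
        center = np.array([c * separation, 0.0])  # equally spaced centers
        X.append(rng.normal(loc=center, scale=std, size=(per_cluster, 2)))
        y += [c] * per_cluster
    return np.vstack(X), np.array(y)

X, y = synthetic_dataset(n_clusters=4)
print(X.shape, np.bincount(y))  # (120, 2) [30 30 30 30]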
TABLE I
DESCRIPTION OF THE REAL-WORLD NUMERICAL DATASETS USED IN THE EXPERIMENTS.

Dataset                    # Examples   # Classes
Breast Cancer Wisconsin        683           2
Ecoli                          336           8
Emotions                       593          27
Glass                          214           6
Haberman                       306           2
Image Segmentation             210           7
Ionosphere                     351           2
Iris                           150           3
Pima                           768           2
Vertebral Column               310           3
Vowel                          990          10
Wine                           178           3
Zoo                            101           7

We used objective measures to simulate the human interaction in the semi-supervised algorithms. These measures model the user's behaviour through the labels provided with the datasets, automatically answering the queries posed by the algorithms. In HCAC, HCAC-LC and HCAC-ML, we used the entropy measure [27] in order to select the best cluster merge. Entropy is an easily calculated and naive measure that reflects the "purity" of a cluster. We assumed that the user chooses the cluster merge that generates the cluster with the smallest possible entropy. For the algorithm using pairwise constraints, we inserted pairwise constraints between randomly chosen pairs of instances. As suggested by [26], if the elements belong to the same class, a must-link constraint is inserted; otherwise, a cannot-link constraint is inserted.

TABLE II
RESULTS OF THE STATISTICAL COMPARISON OF HCAC-ML AGAINST OTHER METHODS.
V. CONCLUSION

In this paper, we presented Hierarchical Confidence-based Active Clustering with Metric Learning (HCAC-ML), a hierarchical clustering algorithm that learns a Mahalanobis distance using cluster-level constraints. HCAC-ML is based on the HCAC algorithm, which poses cluster-level constraints through an active learning approach based on the confidence of a cluster merge. Each cluster-level constraint acquired with HCAC is derived into a set of instance-level pairwise similarity and dissimilarity constraints. These pairwise constraints are presented to a variation of the ITML algorithm, which learns the covariance matrix for a new Mahalanobis distance.

Experimental results using 26 datasets show that HCAC-ML outperforms other state-of-the-art semi-supervised hierarchical clustering algorithms. In particular, HCAC-ML has a strong performance when the number of constraints provided by the user is small. This makes HCAC-ML useful in practical applications.

HCAC-ML has the drawback of being applicable only to scenarios where it is possible to employ a Mahalanobis distance. In this sense, it cannot be directly applied in contexts where the Mahalanobis distance is not adequate, such as text clustering applications and other sparse contexts. In these scenarios, an extra preprocessing step is necessary; for example, in textual datasets, one must normalize the data using the data versors [31]. Also, compared to other semi-supervised clustering algorithms, HCAC-ML has the additional computational cost of an extra (unsupervised) clustering step, performed after learning the distance metric. However, this is an essential characteristic of distance metric learning algorithms. Given the good results achieved by HCAC-ML and considering that this additional step does not change the complexity magnitude of the method, this additional cost may be considered not relevant.
In future work, we intend to report the performance variation of the method when performing multiple iterations of the ITML step instead of using a single-pass approach. Moreover, we intend to apply HCAC-ML to larger datasets, including textual datasets. Finally, we intend to compare the HCAC-ML performance with other distance metric learning algorithms. Given that the existing algorithms perform partitional clustering, we must prune the dendrogram to be able to compare the performance of HCAC-ML with these methods.

ACKNOWLEDGMENT

The authors would like to acknowledge the financial support of the Coordination for the Improvement of Higher Education Personnel (CAPES), the Brazilian Council for Scientific and Technological Development (CNPq) and the Foundation for the Support and Development of Education, Science and Technology of Mato Grosso do Sul (FUNDECT - Siafem 25907, Process 147/2016).
REFERENCES

[1] S. Dasgupta and V. Ng, "Which clustering do you want? Inducing your ideal clustering with minimal feedback," Journal of Artificial Intelligence Research, vol. 39, pp. 581–632, 2010. [Online]. Available: http://dl.acm.org/citation.cfm?id=1946417.1946430
[2] K. Wagstaff and C. Cardie, "Clustering with instance-level constraints," in ICML '00: Proceedings of the 17th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 1103–1110.
[3] N. Kumar, K. Kummamuru, and D. Paranjpe, "Semi-supervised clustering with metric learning using relative comparisons," in ICDM '05: Proceedings of the 5th IEEE International Conference on Data Mining. Washington, DC, USA: IEEE, 2005, pp. 693–696.
[4] D. Klein, S. D. Kamvar, and C. D. Manning, "From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering," in ICML '02: Proceedings of the 19th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2002, pp. 307–314.
[5] B. M. Nogueira, A. M. Jorge, and S. O. Rezende, "HCAC: Semi-supervised hierarchical clustering using confidence-based active learning," in DS '12: Proceedings of the 15th International Conference on Discovery Science, ser. Lecture Notes in Computer Science, J.-G. Ganascia, P. Lenca, and J.-M. Petit, Eds. Springer Berlin Heidelberg, 2012, vol. 7569, pp. 139–153.
[6] M.-F. Balcan and A. Blum, "Clustering with interactive feedback," in ALT '08: Proceedings of the 19th International Conference on Algorithmic Learning Theory. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 316–328.
[7] Y. Huang and T. M. Mitchell, "Text clustering with extended user feedback," in SIGIR '06: Proceedings of the 29th ACM Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2006, pp. 413–420.
[8] A. Dubey, I. Bhattacharya, and S. Godbole, "A cluster-level semi-supervision model for interactive clustering," in ECML PKDD '10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part I. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 409–424. [Online]. Available: http://dl.acm.org/citation.cfm?id=1888258.1888292
[9] R. Huang and W. Lam, "An active learning framework for semi-supervised document clustering with language modeling," Data and Knowledge Engineering, vol. 68, no. 1, pp. 49–67, 2009.
[10] M. Bilenko, S. Basu, and R. J. Mooney, "Integrating constraints and metric learning in semi-supervised clustering," in ICML '04: Proceedings of the 21st International Conference on Machine Learning. New York, NY, USA: ACM, 2004, pp. 81–88.
[11] D. Cohn, R. Caruana, and A. McCallum, "Semi-supervised clustering with user feedback," Cornell University, Tech. Rep. TR2003-1892, 2003.
[12] S. Basu, M. Bilenko, and R. J. Mooney, "A probabilistic framework for semi-supervised clustering," in KDD '04: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2004, pp. 59–68. [Online]. Available: http://doi.acm.org/10.1145/1014052.1014062
[13] V.-V. Vu, N. Labroche, and B. Bouchon-Meunier, "Improving constrained clustering with active query selection," Pattern Recognition, vol. 45, no. 4, pp. 1749–1758, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320311004407
[14] R. Gil-García and A. Pons-Porrata, "Hierarchical star clustering algorithm for dynamic document collections," in CIARP '08: Proceedings of the 13th Iberoamerican Congress on Pattern Recognition. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 187–194.
[15] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, "Information-theoretic metric learning," in ICML '07: Proceedings of the 24th International Conference on Machine Learning. New York, NY, USA: ACM, 2007, pp. 209–216. [Online]. Available: http://doi.acm.org/10.1145/1273496.1273523
[16] P. C. Mahalanobis, "On the generalized distance in statistics," Proceedings of the National Institute of Sciences (India), vol. 2, pp. 49–55, 1936.
[17] Z. Zhang, M. Zhao, and T. W. S. Chow, "Binary- and multi-class group sparse canonical correlation analysis for feature extraction and classification," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2192–2205, Oct. 2013.
[18] Z. Zhang, S. Yan, and M. Zhao, "Pairwise sparsity preserving embedding for unsupervised subspace learning and classification," IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4640–4651, Dec. 2013.
[19] B. Kulis, "Metric learning: A survey," Foundations and Trends in Machine Learning, vol. 5, no. 4, pp. 287–364, 2012.
[20] M. Bilenko and R. J. Mooney, "Adaptive duplicate detection using learnable string similarity measures," in KDD '03: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2003, pp. 39–48. [Online]. Available: http://doi.acm.org/10.1145/956750.956759
[21] H.-J. Kim and S.-G. Lee, "An effective document clustering method using user-adaptable distance metrics," in SAC '02: Proceedings of the 9th ACM Symposium on Applied Computing. New York, NY, USA: ACM, 2002, pp. 16–20. [Online]. Available: http://doi.acm.org/10.1145/508791.508796
[22] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell, "Distance metric learning, with application to clustering with side-information," in Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, 2003, pp. 505–512.
[23] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam and J. Neyman, Eds., vol. 1. University of California Press, 1967, pp. 281–297.
[24] R. R. Sokal and C. D. Michener, "A statistical method for evaluating systematic relationships," University of Kansas Scientific Bulletin, vol. 28, pp. 1409–1438, 1958.
[25] B. M. Nogueira, "Hierarchical semi-supervised confidence-based active clustering and its application to the extraction of topic hierarchies from document collections," Ph.D. dissertation, Instituto de Ciências Matemáticas e de Computação, 2013.
[26] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: Theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.
[27] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[28] B. Larsen and C. Aone, "Fast and effective text mining using linear-time document clustering," in KDD '99: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 1999, pp. 16–22. [Online]. Available: http://doi.acm.org/10.1145/312129.312186
[29] R. M. Aliguliyev, "Performance evaluation of density-based clustering methods," Information Sciences, vol. 179, no. 20, pp. 3583–3602, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0020025509002564
[30] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945. [Online]. Available: http://dx.doi.org/10.2307/3001968
[31] C. D. Manning, P. Raghavan, and H. Schütze, "Language models for information retrieval," in An Introduction to Information Retrieval. Cambridge University Press, 2008, ch. 12.