
SCAN: Learning to Classify Images without Labels

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans and Luc Van Gool
Unsupervised Image Classification
Task: Group a set of unlabeled images into semantically
meaningful clusters.

[Figure: unlabeled data grouped into clusters such as Bird, Cat, Car and Deer]
Prior work – Two dominant paradigms

I. Representation Learning
Idea: Use a self-supervised pretext task + offline clustering (K-means).
Ex 1: Predict transformations
Ex 2: Instance discrimination
Problem: K-means leads to cluster degeneracy.

II. End-To-End Learning
Idea:
- Leverage the architecture of CNNs as a prior (e.g. DAC, DeepCluster, DEC, etc.)
- Maximize mutual information between an image and its augmentations (e.g. IMSAT, IIC)
Problems:
- Cluster learning depends on initialization, and is likely to latch onto low-level features.
- Special mechanisms are required (Sobel, PCA, cluster re-assignments, etc.).

[1] Unsupervised representation learning by predicting image rotations, Gidaris et al. (2018)
[2] Colorful Image Colorization, Zhang et al. (2016)
[3] Unsupervised feature learning via non-parametric instance discrimination, Wu et al. (2018)
SCAN: Semantic Clustering by Adopting Nearest Neighbors
Approach: a two-step method in which feature learning and
clustering are decoupled.
Step 1: Solve a pretext task + Mine k-NN
Step 2: Train a clustering model by imposing consistent predictions among neighbors
Step 1: Solve a pretext task + Mine k-NN
Question: How to select a pretext task appropriate for the
downstream task of semantic clustering?
Problem: Pretext tasks which try to predict image
transformations result in a feature representation that is
covariant to the applied transformation.
→ Undesired for the downstream task of semantic clustering.
→ Solution: The pretext model should minimize the distance
between an image and its augmentations.

[1] Unsupervised representation learning by predicting image rotations, Gidaris et al. (2018)
[2] Colorful Image Colorization, Zhang et al. (2016)
[3] AET vs AED, Zhang et al. (2019)
Step 1: Solve a pretext task + Mine k-NN
Question: How to select a pretext task appropriate for the
downstream task of semantic clustering?
Instance discrimination satisfies the invariance criterion
w.r.t. the augmentations applied during training.
[1] Unsupervised feature learning via non-parametric instance discrimination, Wu et al. (2018)
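To make the invariance criterion concrete, here is a minimal sketch of an instance-discrimination-style contrastive objective (in the spirit of SimCLR's NT-Xent), which pulls an image and its augmentation together in embedding space. The function name, temperature and batching are illustrative assumptions, not the exact pretext implementation used in the paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """Contrastive loss pulling an image and its augmentation together (sketch).

    z1, z2: [N, D] embeddings of two augmented views of the same N images.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # [2N, D]
    sim = z @ z.t() / temperature                       # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))               # remove self-similarity
    # The positive of sample i is its other view at index i + n (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```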
Step 1: Solve a pretext task + Mine k-NN
The nearest neighbors tend to belong to the same semantic
class.
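Below is a minimal sketch of the neighbor-mining step, assuming pretext-task embeddings have already been computed for the whole dataset; a plain dense similarity matrix stands in for the approximate nearest-neighbour search a full implementation would use on large datasets.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mine_nearest_neighbors(features, k=20):
    """Return the indices of the k nearest neighbours of every sample.

    features: [N, D] embeddings produced by the (frozen) pretext model.
    """
    features = F.normalize(features, dim=1)   # compare with cosine similarity
    sim = features @ features.t()             # [N, N] pairwise similarities
    sim.fill_diagonal_(-1.0)                  # exclude the sample itself
    _, indices = sim.topk(k, dim=1)           # [N, k] neighbour indices
    return indices
```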
Step 2: Train clustering model
- SCAN-Loss:
(1) Enforce consistent predictions among neighbors:
    maximize the dot product between the cluster predictions of an image
    and those of its mined neighbors.
    → The dot product forces predictions to be one-hot (confident).
(2) Maximize the entropy of the cluster assignments to avoid
    all samples being assigned to the same cluster.
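A sketch of how these two terms can be combined into one loss, assuming softmax cluster predictions for an image and one mined neighbor per sample; the entropy weight and function signature are illustrative assumptions, not necessarily the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def scan_loss(anchor_logits, neighbor_logits, entropy_weight=5.0):
    """SCAN-style clustering objective (sketch).

    anchor_logits, neighbor_logits: [B, C] cluster logits for an image and
    one of its mined nearest neighbours.
    """
    p_anchor = F.softmax(anchor_logits, dim=1)
    p_neighbor = F.softmax(neighbor_logits, dim=1)

    # (1) Consistency: maximise the dot product between the predictions of an
    # image and its neighbour; this pushes predictions towards one-hot vectors.
    consistency = -torch.log((p_anchor * p_neighbor).sum(dim=1) + 1e-8).mean()

    # (2) Entropy of the mean prediction: maximise it so clusters stay balanced
    # and all samples are not assigned to a single cluster.
    mean_p = p_anchor.mean(dim=0)
    entropy = -(mean_p * torch.log(mean_p + 1e-8)).sum()

    return consistency - entropy_weight * entropy
```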
Step 2b: Refinement through self-labeling
- Refine the model through self-labeling.
- Apply a cross-entropy loss on strongly augmented [1] versions of
  confident samples.
- Applying strong augmentations avoids overfitting.

[1] RandAugment, Cubuk et al. (2020)


[2] FixMatch, Sohn et al. (2020)
[3] Probability of error, Scudder H. (1965)
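A sketch of one self-labeling update under these assumptions: pseudo-labels come from confident predictions on weakly augmented images, and the cross-entropy is applied to strongly augmented views of the same samples; the 0.99 threshold and function signature are illustrative.

```python
import torch
import torch.nn.functional as F

def self_label_step(model, weak_images, strong_images, threshold=0.99):
    """One self-labelling update (sketch).

    weak_images / strong_images: weakly and strongly augmented views of the
    same batch. Only samples the model is already confident on contribute.
    """
    with torch.no_grad():
        probs = F.softmax(model(weak_images), dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
        mask = confidence > threshold                    # keep confident samples

    if mask.sum() == 0:
        return torch.tensor(0.0, requires_grad=True)     # nothing confident yet

    # Cross-entropy on strongly augmented views of the confident samples.
    logits = model(strong_images[mask])
    return F.cross_entropy(logits, pseudo_labels[mask])
```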
Experimental setup
- ResNet backbone + identical hyperparameters across datasets.
- SimCLR and MoCo implementations for the pretext task.
- Experiments on four datasets (CIFAR-10, CIFAR-100-20, STL-10 and ImageNet).
Ablation studies - SCAN
- Pretext task
- Number of NNs (K)

Pretext Task              ACC (Avg ± Std)
Rotation Prediction       74.3 ± 3.9
Instance Discrimination   87.6 ± 0.4
Ablation studies - Self-label

Self-labeling (CIFAR-10):

Step            ACC (Avg ± Std)
SCAN            81.8 ± 0.3
Self-labeling   87.6 ± 0.4

[Figure: accuracy as a function of the self-labeling confidence threshold]

Comparison with SOTA

[Bar chart: classification accuracy (%) on CIFAR10, CIFAR100-20 and STL10 for DEC (ICML16), DeepCluster (ECCV18), DAC (ICCV17), IIC (ICCV19) and SCAN (Ours)]

- Large performance gains w.r.t. prior work:
  +26.6% on CIFAR10, +25.0% on CIFAR100-20 and +21.3% on STL10
- SCAN outperforms SimCLR + K-means
- Close to supervised performance on CIFAR-10 and STL-10
ImageNet Results
 Scalable: First method  Semantic clusters: We observe  Confusion matrix shows
which scales to ImageNet that the clusters capture a large ImageNet hierarchy containing
(1000 classes) variety of different backgrounds, dogs, insects, primates,
viewpoints, etc. snakes, clothing, buildings,
birds etc.
Comparison with supervised methods
- Trained with 1% of the labels.
- SCAN: Top-1: 39.9%, Top-5: 60.0%, NMI: 72.0%, ARI: 27.5%
Prototypical behavior
Prototype: the sample closest to the mean embedding of the
highly confident samples of a certain class.

Prototypes:
- show what each cluster represents
- are often more pure

[Figure: prototype examples on ImageNet, STL10 and CIFAR10]
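A sketch of how such prototypes could be selected, assuming per-sample embeddings and soft cluster assignments are available; the confidence threshold and function name are illustrative assumptions.

```python
import torch

@torch.no_grad()
def find_prototypes(embeddings, probs, confidence=0.99):
    """Pick one prototype sample per cluster (sketch).

    embeddings: [N, D] features, probs: [N, C] soft cluster assignments.
    The prototype is the sample closest to the mean embedding of the
    highly confident members of its cluster.
    """
    conf, assignment = probs.max(dim=1)
    prototypes = {}
    for c in range(probs.size(1)):
        idx = ((assignment == c) & (conf > confidence)).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue                                           # no confident members
        mean = embeddings[idx].mean(dim=0, keepdim=True)       # cluster centre
        dists = torch.cdist(embeddings[idx], mean).squeeze(1)  # distance to centre
        prototypes[c] = idx[dists.argmin()].item()             # closest sample
    return prototypes
```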
Conclusion

- Two-step approach: decouple feature learning and clustering
- Nearest neighbors capture variance in viewpoints and backgrounds
- Promising results on large-scale datasets

Future directions
- Extension to other modalities, e.g. video, audio
- Other domains, e.g. segmentation, semi-supervised learning, etc.

Code is available on GitHub:

github.com/wvangansbeke/Unsupervised-Classification
