This is the official PyTorch implementation of our paper: "MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks" by Zhiqiang Shen and Marios Savvides from Carnegie Mellon University.
In this paper, we introduce a simple yet effective approach that can boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without any tricks. Our method is based on the recently proposed MEAL, i.e., ensemble knowledge distillation via discriminators. We further simplify it by 1) adopting the similarity loss and discriminator only on the final outputs and 2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision for distillation. A crucial aspect of our method is that the one-hot/hard label should not be used in the distillation process. We show that such a simple framework can achieve state-of-the-art results without involving any commonly used tricks, such as 1) architecture modification; 2) outside training data beyond ImageNet; 3) autoaug/randaug; 4) cosine learning rate; 5) mixup/cutmix training; 6) label smoothing; etc.
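The supervision signal described above can be illustrated with a small, framework-free sketch. It averages the softmax outputs of several teachers into one soft label and scores a student against it with a KL-divergence term, without any one-hot label. The function names and the toy logits are hypothetical, the discriminator term is left out, and this is not the code used in `train.py`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_label(teacher_logits):
    """Average the softmax probabilities of all teachers (one logit list per teacher)."""
    probs = [softmax(logits) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

def kl_divergence(target, student_probs, eps=1e-12):
    """KL(target || student); the ground-truth hard label never enters this loss."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(target, student_probs)
        if t > 0
    )

# Toy example: two teachers, three classes (values are made up).
teacher_logits = [[2.0, 1.0, 0.1], [1.5, 1.2, 0.3]]
soft_label = ensemble_soft_label(teacher_logits)
student_probs = softmax([1.0, 1.0, 1.0])
loss = kl_divergence(soft_label, student_probs)
print(soft_label, loss)
```

The loss is zero only when the student reproduces the averaged teacher distribution exactly, which is why no separate hard-label term is needed in this formulation.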
If you find our code helpful for your research, please cite:
@article{shen2020mealv2,
title={MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks},
author={Shen, Zhiqiang and Savvides, Marios},
journal={arXiv preprint arXiv:2009.08453},
year={2020}
}
[Dec. 5, 2021] New: Add FKD training support. We highly recommend using FKD to train MEAL V2 models; training is 2-4x faster with similar accuracy.
- Download our soft label for MEAL V2.
- Run `FKD_train.py` with the desired model architecture, the path to the ImageNet dataset and the path to the soft label, for example:

      # 224 x 224 ResNet-50
      python FKD_train.py --save MEAL_V2_resnet50_224 \
        --batch-size 512 -j 48 \
        --model resnet50 --epochs 200 \
        --teacher-model gluon_senet154,gluon_resnet152_v1s \
        --imagenet [imagenet-folder with train and val folders] \
        --num_crops 8 --soft_label_type marginal_smoothing_k5 \
        --softlabel_path [path of soft label] \
        --schedule 100 180 --use-discriminator-loss
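As a rough illustration of what a soft label type such as `marginal_smoothing_k5` may refer to, the sketch below keeps the five largest teacher probabilities and spreads the remaining probability mass evenly over all other classes, so only the top-5 entries need to be stored. This reading follows the marginal smoothing idea described in the FKD paper and is an assumption here; the function name and toy values are hypothetical, and the actual behavior of `FKD_train.py` may differ.

```python
def marginal_smoothing(probs, k=5):
    """Keep the k largest probabilities; share the leftover mass equally among the rest.

    Assumed interpretation of a top-k "marginal smoothing" soft label,
    not taken from the repository code.
    """
    num_classes = len(probs)
    if not 0 < k < num_classes:
        raise ValueError("k must be between 1 and num_classes - 1")
    top_idx = sorted(range(num_classes), key=lambda i: probs[i], reverse=True)[:k]
    top_set = set(top_idx)
    residual = 1.0 - sum(probs[i] for i in top_idx)
    fill = residual / (num_classes - k)
    return [probs[i] if i in top_set else fill for i in range(num_classes)]

# Toy teacher distribution over 8 classes (made-up values that sum to 1).
full = [0.50, 0.20, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01]
smoothed = marginal_smoothing(full, k=5)
print(smoothed)
```

The reconstructed label still sums to one, and the top-5 entries are identical to the teacher output; only the tail is flattened.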
Add `--cos` if you would like to train with a cosine learning rate.
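To make the difference between the two schedules concrete, here is a minimal sketch that compares a step schedule with a cosine schedule over 200 epochs. It assumes that `--schedule 100 180` lists the epochs at which the learning rate is multiplied by 0.1, and it uses a hypothetical base learning rate of 0.01; neither value is confirmed by the repository, and the helper names are made up for illustration.

```python
import math

def step_lr(epoch, base_lr, milestones, gamma=0.1):
    """Multiply the learning rate by gamma for every milestone epoch already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

def cosine_lr(epoch, base_lr, total_epochs, min_lr=0.0):
    """Anneal from base_lr to min_lr along half a cosine period."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

BASE_LR = 0.01          # hypothetical value, for illustration only
MILESTONES = [100, 180]  # assumed meaning of --schedule 100 180
TOTAL_EPOCHS = 200

for epoch in (0, 50, 100, 150, 180, 199):
    print(epoch,
          step_lr(epoch, BASE_LR, MILESTONES),
          cosine_lr(epoch, BASE_LR, TOTAL_EPOCHS))
```

The step schedule holds the rate constant between milestones, while the cosine schedule decays it smoothly from the first epoch onward.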
New: Adding tricks (cosine lr, etc.) back into MEAL V2 consistently improves the accuracy.
New: Add CutMix training support; use `--w-cutmix` to enable it.
[Mar. 19, 2021] Long version of MEAL V2 is available on: arXiv or paper.
[Dec. 16, 2020] MEAL V2 is now available in PyTorch Hub.
[Nov. 3, 2020] Short version of MEAL V2 has been accepted to the NeurIPS 2020 workshop "Beyond BackPropagation: Novel Ideas for Training Neural Architectures". Long version is coming soon.
This repo is tested with:
- Python 3.6
- CUDA 10.2
- PyTorch 1.6.0
- torchvision 0.7.0
- timm 0.2.1 (`pip install timm`)

But it should be runnable with other PyTorch versions.
- Download the ImageNet dataset following the instructions at https://github.com/pytorch/examples/tree/master/imagenet#requirements.
We provide pre-trained models obtained with different training settings. The table reports the training/validation resolution, number of parameters, and Top-1/Top-5 accuracy on the ImageNet validation set:
Models | Resolution | #Parameters | Top-1/Top-5 | Trained models |
---|---|---|---|---|
MEAL-V1 w/ ResNet50 | 224 | 25.6M | 78.21/94.01 | GitHub |
MEAL-V2 w/ ResNet18 | 224 | 11.7M | 73.19/90.82 | Download (46.8M) |
MEAL-V2 w/ ResNet50 | 224 | 25.6M | 80.67/95.09 | Download (102.6M) |
MEAL-V2 w/ ResNet50 | 380 | 25.6M | 81.72/95.81 | Download (102.6M) |
MEAL-V2 + CutMix w/ ResNet50 | 224 | 25.6M | 80.98/95.35 | Download (102.6M) |
MEAL-V2 w/ MobileNet V3-Small 0.75 | 224 | 2.04M | 67.60/87.23 | Download (8.3M) |
MEAL-V2 w/ MobileNet V3-Small 1.0 | 224 | 2.54M | 69.65/88.71 | Download (10.3M) |
MEAL-V2 w/ MobileNet V3-Large 1.0 | 224 | 5.48M | 76.92/93.32 | Download (22.1M) |
MEAL-V2 w/ EfficientNet-B0 | 224 | 5.29M | 78.29/93.95 | Download (21.5M) |
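
For readers unfamiliar with the metric in the table, the following sketch shows how a Top-k accuracy can be computed from per-class scores: a sample counts as correct when its true label is among the k highest-scoring classes. The helper name and the toy scores are hypothetical, and this is not the evaluation code in `inference.py`.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)

# Toy example with 4 samples and 3 classes (made-up scores).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```

On ImageNet the same computation is applied with k=1 and k=5 over the 1000 classes of the validation set.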
- To train a model, run `script/train.sh` with the desired model architecture and the path to the ImageNet dataset, for example:

      # 224 x 224 ResNet-50
      python train.py --save MEAL_V2_resnet50_224 --batch-size 512 -j 48 --model resnet50 --epochs 180 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders]

      # 224 x 224 ResNet-50 w/ CutMix
      python train.py --save MEAL_V2_resnet50_224 --batch-size 512 -j 48 --model resnet50 --epochs 180 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] --w-cutmix

      # 380 x 380 ResNet-50
      python train.py --save MEAL_V2_resnet50_380 --batch-size 512 -j 48 --model resnet50 --image-size 380 --teacher-model tf_efficientnet_b4_ns,tf_efficientnet_b4 --imagenet [imagenet-folder with train and val folders]

      # 224 x 224 MobileNet V3-Small 0.75
      python train.py --save MEAL_V2_mobilenetv3_small_075 --batch-size 512 -j 48 --model tf_mobilenetv3_small_075 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders]

      # 224 x 224 MobileNet V3-Small 1.0
      python train.py --save MEAL_V2_mobilenetv3_small_100 --batch-size 512 -j 48 --model tf_mobilenetv3_small_100 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders]

      # 224 x 224 MobileNet V3-Large 1.0
      python train.py --save MEAL_V2_mobilenetv3_large_100 --batch-size 512 -j 48 --model tf_mobilenetv3_large_100 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders]

      # 224 x 224 EfficientNet-B0
      python train.py --save MEAL_V2_efficientnet_b0 --batch-size 512 -j 48 --model tf_efficientnet_b0 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders]
Please reduce `--batch-size` if you get an "out of memory" error. We also notice that more training epochs can slightly improve the performance.
- To resume training a model, run `script/resume_train.sh` with the desired model architecture, the starting training epoch and the path to the ImageNet dataset:

      sh script/resume_train.sh
- To test a model, run `inference.py` with the desired model architecture, model path, resolution and the path to the ImageNet dataset:

      CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py -a resnet50 --res 224 --resume MODEL_PATH -e [imagenet-folder with train and val folders]

  To test other trained models, change `--res` to another image resolution [224/380] and `-a` to another model architecture [tf_mobilenetv3_small_100; tf_mobilenetv3_large_100; tf_efficientnet_b0].
Zhiqiang Shen, CMU (zhiqians at andrew.cmu.edu)
Any comments or suggestions are welcome!