Image Segmentation on AWS EC2 using Semantic Segmentation on MIT ADE20K dataset in PyTorch
This is a simple Flask API that serves the segmentation model described below. You can use the included Dockerfile to build an image and use it as you wish.
- Build the image
docker build -t image_semantic_segmentation .
- Run the container
docker run --name segmentation -p 80:80 image_semantic_segmentation
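Once the container is running, you can sanity-check the API from Python. This is only a sketch: the route name `/predict` and the form field name `image` are assumptions for illustration, so check the Flask app for the actual endpoint and payload.

```python
import requests

# Smoke test against the running container.
# ASSUMPTION: the route "/predict" and the field name "image" are placeholders;
# the real Flask route may differ.
with open("test_image.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:80/predict",
        files={"image": ("test_image.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()

# Save whatever the API returns (e.g. the segmented image) for inspection.
with open("segmentation_result.png", "wb") as out:
    out.write(resp.content)
```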
This project uses a PyTorch implementation of semantic segmentation models trained on the MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).
ADE20K is the largest open-source dataset for semantic segmentation and scene parsing, released by the MIT Computer Vision team. The dataset repository, including Caffe and Torch7 implementations, is available at: https://github.com/CSAILVision/sceneparsing
Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing
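To visualize a prediction, each class index in the output mask is mapped to the RGB color from that table. Below is a minimal NumPy sketch of the lookup; the three-entry palette is a stand-in for the full 150-class color table, so the actual colors and class order should be taken from the spreadsheet.

```python
import numpy as np

# Stand-in palette (class index -> RGB); the real table has 150 entries.
palette = np.array([
    [120, 120, 120],  # e.g. "wall"
    [180, 120, 120],  # e.g. "building"
    [  6, 230, 230],  # e.g. "sky"
], dtype=np.uint8)

# pred: HxW array of predicted class indices (toy example here).
pred = np.zeros((4, 4), dtype=np.int64)
pred[:, 2:] = 2

color_mask = palette[pred]   # HxWx3 RGB image ready to display or save
print(color_mask.shape)      # (4, 4, 3)
```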
Encoder:
- MobileNetV2dilated
- ResNet18/ResNet18dilated
- ResNet50/ResNet50dilated
- ResNet101/ResNet101dilated
- HRNetV2 (W48)
Decoder:
- C1 (one convolution module)
- C1_deepsup (C1 + deep supervision trick)
- PPM (Pyramid Pooling Module, see the PSPNet paper for details)
- PPM_deepsup (PPM + deep supervision trick)
- UPerNet (Pyramid Pooling + FPN head, see the UPerNet paper for details)
IMPORTANT: The base ResNet in this repository is a customized version (different from the one in torchvision). The base models will be downloaded automatically when needed.
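For reference, an encoder/decoder pair from the lists above is typically assembled along these lines. This is a sketch based on the CSAILVision repository's `ModelBuilder`/`SegmentationModule` interface; the import path and checkpoint filenames are assumptions and may differ in this project, so adjust them to your local setup.

```python
import torch
# ASSUMPTION: import path follows the pip-installable CSAILVision package;
# older checkouts expose the same classes from a local "models" module.
from mit_semseg.models import ModelBuilder, SegmentationModule

# Placeholder checkpoint paths: point these at the weights downloaded by init.py.
net_encoder = ModelBuilder.build_encoder(
    arch="resnet50dilated", fc_dim=2048,
    weights="ckpt/ade20k-resnet50dilated-ppm_deepsup/encoder_epoch_20.pth")
net_decoder = ModelBuilder.build_decoder(
    arch="ppm_deepsup", fc_dim=2048, num_class=150,
    weights="ckpt/ade20k-resnet50dilated-ppm_deepsup/decoder_epoch_20.pth",
    use_softmax=True)

crit = torch.nn.NLLLoss(ignore_index=-1)
segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)
segmentation_module.eval()
```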
Architecture | MultiScale Testing | Mean IoU | Pixel Accuracy (%) | Overall Score | Inference Speed (fps)
---|---|---|---|---|---
MobileNetV2dilated + C1_deepsup | No | 34.84 | 75.75 | 54.07 | 17.2
MobileNetV2dilated + C1_deepsup | Yes | 33.84 | 76.80 | 55.32 | 10.3
MobileNetV2dilated + PPM_deepsup | No | 35.76 | 77.77 | 56.27 | 14.9
MobileNetV2dilated + PPM_deepsup | Yes | 36.28 | 78.26 | 57.27 | 6.7
ResNet18dilated + C1_deepsup | No | 33.82 | 76.05 | 54.94 | 13.9
ResNet18dilated + C1_deepsup | Yes | 35.34 | 77.41 | 56.38 | 5.8
ResNet18dilated + PPM_deepsup | No | 38.00 | 78.64 | 58.32 | 11.7
ResNet18dilated + PPM_deepsup | Yes | 38.81 | 79.29 | 59.05 | 4.2
ResNet50dilated + PPM_deepsup | No | 41.26 | 79.73 | 60.50 | 8.3
ResNet50dilated + PPM_deepsup | Yes | 42.14 | 80.13 | 61.14 | 2.6
ResNet101dilated + PPM_deepsup | No | 42.19 | 80.59 | 61.39 | 6.8
ResNet101dilated + PPM_deepsup | Yes | 42.53 | 80.91 | 61.72 | 2.0
UperNet50 | No | 40.44 | 79.80 | 60.12 | 8.4
UperNet50 | Yes | 41.55 | 80.23 | 60.89 | 2.9
UperNet101 | No | 42.00 | 80.79 | 61.40 | 7.8
UperNet101 | Yes | 42.66 | 81.01 | 61.84 | 2.3
HRNetV2 | No | 42.03 | 80.77 | 61.40 | 5.8
HRNetV2 | Yes | 43.20 | 81.47 | 62.34 | 1.9
Training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory); inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.
- Install the Python dependencies
pip install -r requirements.txt
- Run init.py
# download model parameters and a test image
python init.py
- To test on an image or a folder of images ($PATH_IMG), simply run:
python test.py --imgs $PATH_IMG
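If you would rather script a single forward pass than call test.py, it looks roughly like this. The sketch reuses `segmentation_module` from the model-building example above; the normalization constants follow the repository's demo and the image path is a placeholder.

```python
import torch
from PIL import Image
from torchvision import transforms

# ImageNet-style normalization, as used in the repository's demo (assumption).
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("test_image.jpg").convert("RGB")   # placeholder path
img_data = to_tensor(img)
batch = {"img_data": img_data.unsqueeze(0)}

with torch.no_grad():
    # segSize asks the module for per-pixel class scores at the input resolution.
    scores = segmentation_module(batch, segSize=img_data.shape[1:])

pred = torch.argmax(scores, dim=1).squeeze(0).cpu().numpy()  # HxW class indices
print(pred.shape)
```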
If you find the code or pre-trained models useful, please cite the following papers:
Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal of Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)
@article{zhou2018semantic,
title={Semantic understanding of scenes through the ade20k dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
journal={International Journal of Computer Vision},
year={2018}
}
Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)
@inproceedings{zhou2017scene,
title={Scene Parsing through ADE20K Dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2017}
}