Kiol-Ice/semantic-segmentation-pytorch

 
 


Image Segmentation on AWS EC2 using Semantic Segmentation on MIT ADE20K dataset in PyTorch

Online segmentation application

This is a simple Flask API that serves the segmentation model described below. You can use the included Dockerfile to build an image and run it wherever you like.

  1. Create the image:
docker build -t image_semantic_segmentation .
  2. Create the container:
docker run --name segmentation -p 80:80 image_semantic_segmentation
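If you prefer to script the two steps above, a minimal sketch follows. It only reuses the image tag, container name, and port mapping from the commands above; the function names `docker_commands` and `build_and_run` are made up for this illustration and are not part of the repository.

```python
import shlex
import subprocess

IMAGE_TAG = "image_semantic_segmentation"  # tag used in step 1
CONTAINER_NAME = "segmentation"            # container name used in step 2


def docker_commands(context_dir=".", host_port=80, container_port=80):
    """Return the build and run commands as argument lists (hypothetical helper)."""
    build = ["docker", "build", "-t", IMAGE_TAG, context_dir]
    run = [
        "docker", "run",
        "--name", CONTAINER_NAME,
        "-p", f"{host_port}:{container_port}",
        IMAGE_TAG,
    ]
    return build, run


def build_and_run(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in docker_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if docker exits non-zero
            subprocess.run(cmd, check=True)
```

Calling `build_and_run()` prints the two commands without touching Docker; `build_and_run(dry_run=False)` executes them. Since `docker run` is started without `-d`, the call stays attached to the container until it stops.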

Semantic Segmentation on MIT ADE20K dataset in PyTorch

This project uses a PyTorch implementation of semantic segmentation models trained on the MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).

ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by the MIT Computer Vision team. Follow the link below to find the repository for the dataset and the implementations in Caffe and Torch7: https://github.com/CSAILVision/sceneparsing

Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing

Model description

Encoder:

  • MobileNetV2dilated
  • ResNet18/ResNet18dilated
  • ResNet50/ResNet50dilated
  • ResNet101/ResNet101dilated
  • HRNetV2 (W48)

Decoder:

  • C1 (one convolution module)
  • C1_deepsup (C1 + deep supervision trick)
  • PPM (Pyramid Pooling Module, see the PSPNet paper for details)
  • PPM_deepsup (PPM + deep supervision trick)
  • UPerNet (Pyramid Pooling + FPN head, see the UPerNet paper for details)
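To give a feel for what the pooling step of a Pyramid Pooling Module does, here is a small pure-Python sketch. It is an illustration only, not the repository's implementation: it pools a single 2D feature map at the grid sizes 1, 2, 3 and 6 used in the PSPNet paper, and leaves out the convolutions, upsampling and concatenation that follow in a real PPM. The function names are chosen for this example.

```python
def adaptive_avg_pool2d(feature, out_size):
    """Average-pool a 2D grid (list of rows) down to out_size x out_size."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(out_size):
        # Window bounds: floor for the start, ceiling division for the end.
        r0 = (i * height) // out_size
        r1 = -(-((i + 1) * height) // out_size)
        row = []
        for j in range(out_size):
            c0 = (j * width) // out_size
            c1 = -(-((j + 1) * width) // out_size)
            window = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def pyramid_pool(feature, bins=(1, 2, 3, 6)):
    """Pool one feature map at several grid sizes, from coarse to fine."""
    return {b: adaptive_avg_pool2d(feature, b) for b in bins}


if __name__ == "__main__":
    # 6x6 toy feature map with values 0..35
    toy = [[r * 6 + c for c in range(6)] for r in range(6)]
    for size, grid in pyramid_pool(toy).items():
        print(size, grid[0])  # first row of each pooled grid
```

The coarsest level (1x1) summarizes the whole map, while finer levels keep more spatial detail; combining them is what gives the decoder context at several scales.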

Model performance:

IMPORTANT: The base ResNet in our repository is a customized version (different from the one in torchvision). The base models will be automatically downloaded when needed.

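| Architecture | MultiScale Testing | Mean IoU | Pixel Accuracy (%) | Overall Score | Inference Speed (fps) |
|---|---|---|---|---|---|
| MobileNetV2dilated + C1_deepsup | No | 34.84 | 75.75 | 54.07 | 17.2 |
| MobileNetV2dilated + C1_deepsup | Yes | 33.84 | 76.80 | 55.32 | 10.3 |
| MobileNetV2dilated + PPM_deepsup | No | 35.76 | 77.77 | 56.27 | 14.9 |
| MobileNetV2dilated + PPM_deepsup | Yes | 36.28 | 78.26 | 57.27 | 6.7 |
| ResNet18dilated + C1_deepsup | No | 33.82 | 76.05 | 54.94 | 13.9 |
| ResNet18dilated + C1_deepsup | Yes | 35.34 | 77.41 | 56.38 | 5.8 |
| ResNet18dilated + PPM_deepsup | No | 38.00 | 78.64 | 58.32 | 11.7 |
| ResNet18dilated + PPM_deepsup | Yes | 38.81 | 79.29 | 59.05 | 4.2 |
| ResNet50dilated + PPM_deepsup | No | 41.26 | 79.73 | 60.50 | 8.3 |
| ResNet50dilated + PPM_deepsup | Yes | 42.14 | 80.13 | 61.14 | 2.6 |
| ResNet101dilated + PPM_deepsup | No | 42.19 | 80.59 | 61.39 | 6.8 |
| ResNet101dilated + PPM_deepsup | Yes | 42.53 | 80.91 | 61.72 | 2.0 |
| UperNet50 | No | 40.44 | 79.80 | 60.12 | 8.4 |
| UperNet50 | Yes | 41.55 | 80.23 | 60.89 | 2.9 |
| UperNet101 | No | 42.00 | 80.79 | 61.40 | 7.8 |
| UperNet101 | Yes | 42.66 | 81.01 | 61.84 | 2.3 |
| HRNetV2 | No | 42.03 | 80.77 | 61.40 | 5.8 |
| HRNetV2 | Yes | 43.20 | 81.47 | 62.34 | 1.9 |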
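For readers unfamiliar with the metrics, the sketch below shows the standard definitions of pixel accuracy and mean IoU computed from a confusion matrix, in pure Python. It is an assumption that the repository's evaluation follows these definitions exactly; details such as how ignored pixels are handled may differ. In most rows of the table, the Overall Score matches the average of Mean IoU and Pixel Accuracy, which is what `overall_score` computes.

```python
def pixel_accuracy(confusion):
    """Percentage of pixels whose predicted class equals the true class.

    confusion[t][p] counts pixels with true class t and predicted class p.
    """
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


def mean_iou(confusion):
    """Mean over classes of intersection / union, as a percentage."""
    n = len(confusion)
    ious = []
    for k in range(n):
        intersection = confusion[k][k]
        truth = sum(confusion[k])                           # pixels labeled k
        predicted = sum(confusion[r][k] for r in range(n))  # pixels predicted as k
        union = truth + predicted - intersection
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(intersection / union)
    return 100.0 * sum(ious) / len(ious)


def overall_score(miou, pix_acc):
    """Average of mean IoU and pixel accuracy (both in percent)."""
    return (miou + pix_acc) / 2.0


if __name__ == "__main__":
    toy = [[3, 1], [1, 5]]  # two classes, ten pixels
    print(pixel_accuracy(toy), mean_iou(toy))
```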

Training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory). Inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.

Quick start: Test on an image using our trained model

  1. Install the Python dependencies:
pip install -r requirements.txt
  2. Run init.py:
# download the model parameters and a test image
python init.py
  3. To test on an image or a folder of images ($PATH_IMG), run:
python test.py --imgs $PATH_IMG
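Since `$PATH_IMG` may be either a single image or a folder, it can help to check beforehand which files would be picked up. The helper below is a hypothetical sketch, not code from `test.py`: the name `collect_images`, the recursive search, and the list of extensions are all assumptions.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever test.py actually accepts.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_images(path_img):
    """Return a sorted list of image paths for a single file or a folder."""
    path = Path(path_img)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Walk the folder recursively and keep files with a known image suffix.
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    raise FileNotFoundError(f"No such file or directory: {path_img}")
```

For example, `collect_images("my_images")` lists every `.jpg`, `.jpeg` or `.png` file under `my_images`, and raises `FileNotFoundError` if the path does not exist.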

Reference

If you find the code or pre-trained models useful, please cite the following papers:

Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal of Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)

@article{zhou2018semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={International Journal of Computer Vision},
  year={2018}
}

Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}
