SAGAN (ICML'2019)

Self-attention generative adversarial networks

Task: Conditional GANs

Abstract

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

Results and models

Results from our SAGAN trained on CIFAR10

| Model | Dataset | Inplace ReLU | dist_step | Total Batchsize (BZ_PER_GPU * NGPU) | Total Iters* | Iter | IS | FID | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAGAN-32x32-woInplaceReLU Best IS | CIFAR10 | w/o | 5 | 64x1 | 500000 | 400000 | 9.3217 | 10.5030 | model / Log |
| SAGAN-32x32-woInplaceReLU Best FID | CIFAR10 | w/o | 5 | 64x1 | 500000 | 480000 | 9.3174 | 9.4252 | model / Log |
| SAGAN-32x32-wInplaceReLU Best IS | CIFAR10 | w | 5 | 64x1 | 500000 | 380000 | 9.2286 | 11.7760 | model / Log |
| SAGAN-32x32-wInplaceReLU Best FID | CIFAR10 | w | 5 | 64x1 | 500000 | 460000 | 9.2061 | 10.7781 | model / Log |
| SAGAN-128x128-woInplaceReLU Best IS | ImageNet | w/o | 1 | 64x4 | 1000000 | 980000 | 31.5938 | 36.7712 | model / Log |
| SAGAN-128x128-woInplaceReLU Best FID | ImageNet | w/o | 1 | 64x4 | 1000000 | 950000 | 28.4936 | 34.7838 | model / Log |
| SAGAN-128x128-BigGAN Schedule Best IS | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | model / Log |
| SAGAN-128x128-BigGAN Schedule Best FID | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | model / Log |

'*' The iteration counting rule in our implementation differs from that of other codebases. To align with them, use the following conversion formula:

total_iters (BigGAN / PyTorch-StudioGAN) = our_total_iters / dist_step
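
The conversion can be sketched in a few lines of Python. The function name `to_reference_iters` is hypothetical and not part of this codebase. It only applies the formula above, using the `dist_step` and `Total Iters` values from the results table.

```python
def to_reference_iters(our_total_iters: int, dist_step: int) -> int:
    """Convert an iteration count from this README's convention to the
    BigGAN / PyTorch-StudioGAN convention (hypothetical helper)."""
    if dist_step <= 0:
        raise ValueError("dist_step must be a positive integer")
    # Floor division keeps the result an integer; the counts listed in the
    # tables divide evenly by their dist_step.
    return our_total_iters // dist_step


# CIFAR10 configs: 500000 iterations here with dist_step = 5.
print(to_reference_iters(500000, 5))    # 100000
# ImageNet configs: dist_step = 1, so the count is unchanged.
print(to_reference_iters(1000000, 1))   # 1000000
```

The CIFAR10 result, 100000, matches the `Total Iters` reported for the converted StudioGAN CIFAR10 model.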

We also provide pre-trained models converted from PyTorch-StudioGAN. Note that PyTorch-StudioGAN uses inplace ReLU in both the generator and the discriminator.

| Model | Dataset | Inplace ReLU | n_disc | Total Iters | IS (Our Pipeline) | FID (Our Pipeline) | IS (StudioGAN) | FID (StudioGAN) | Download | Original Download link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAGAN-32x32 StudioGAN | CIFAR10 | w | 5 | 100000 | 9.116 | 10.2011 | 8.680 | 14.009 | model | model |
| SAGAN-128x128 StudioGAN | ImageNet | w | 1 | 1000000 | 27.367 | 40.1162 | 29.848 | 34.726 | model | model |

  • Our Pipeline denotes results evaluated with our pipeline.
  • StudioGAN denotes results released by PyTorch-StudioGAN.

For the IS metric, our implementation differs from PyTorch-StudioGAN in the following aspects:

  1. We use Tero's Inception for feature extraction.
  2. We use bicubic interpolation with the PIL backend to resize images before feeding them to Inception.

For FID evaluation, we follow the BigGAN pipeline, in which the whole training set is used to extract Inception statistics, whereas PyTorch-StudioGAN uses 50000 randomly selected samples. We also use Tero's Inception for feature extraction.
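
For reference, the Inception statistics are the feature mean and covariance that enter the usual FID definition. The symbols below ($\mu_r, \Sigma_r$ for real images and $\mu_g, \Sigma_g$ for generated images) are notation chosen here for illustration, not names taken from this codebase:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Under this definition, the set of real images used (the whole training set or 50000 random samples) changes $\mu_r$ and $\Sigma_r$. This may partly explain why FID values from the two pipelines are not directly comparable.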

You can download the preprocessed Inception statistics from the following URLs: CIFAR10 and ImageNet1k.

You can also extract these Inception statistics yourself with the following commands.

# For CIFAR10
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/cifar10_inception_stat.py --pklname cifar10.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train

# For ImageNet1k
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/imagenet_128x128_inception_stat.py --pklname imagenet.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train

Citation

@inproceedings{zhang2019self,
  title={Self-attention generative adversarial networks},
  author={Zhang, Han and Goodfellow, Ian and Metaxas, Dimitris and Odena, Augustus},
  booktitle={International conference on machine learning},
  pages={7354--7363},
  year={2019},
  organization={PMLR},
  url={https://proceedings.mlr.press/v97/zhang19d.html},
}