AIprogrammer/Visual-Transformer-Paper-Summary

Awesome-Transformer-CV

If you have any problems, suggestions, or improvements, please submit an issue or PR.

Contents

Attention

  • Recurrent Models of Visual Attention [NIPS 2014, DeepMind]
  • Neural Machine Translation by Jointly Learning to Align and Translate [ICLR 2015]

OverallSurvey

  • Efficient Transformers: A Survey [paper]
  • A Survey on Visual Transformer [paper]
  • Transformers in Vision: A Survey [paper]

NLP

Language

  • Sequence to Sequence Learning with Neural Networks [NIPS 2014] [paper] [code]
  • End-To-End Memory Networks [NIPS 2015] [paper] [code]
  • Attention Is All You Need [NIPS 2017] [paper] [code]
  • Bidirectional Encoder Representations from Transformers: BERT [paper] [code] [pretrained-models]
  • Reformer: The Efficient Transformer [ICLR2020] [paper] [code]
  • Linformer: Self-Attention with Linear Complexity [AAAI2020] [paper] [code]
  • GPT-3: Language Models are Few-Shot Learners [NIPS 2020] [paper] [code]

Speech

  • Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation [INTERSPEECH 2020] [paper] [code]

CV

Backbone_Classification

Papers and Codes

  • CoaT: Co-Scale Conv-Attentional Image Transformers [arxiv 2021] [paper] [code]
  • SiT: Self-supervised vIsion Transformer [arxiv 2021] [paper] [code]
  • ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ICLR 2021] [paper] [code]
    • Trained with extra private data; does not generalize well when trained on insufficient amounts of data
  • DeiT: Data-efficient Image Transformers [arxiv2021] [paper] [code]
    • Uses a token-based distillation strategy and builds upon ViT and convolutional models
  • Transformer in Transformer [arxiv 2021] [paper] [code1] [code-official]
  • OmniNet: Omnidirectional Representations from Transformers [arxiv2021] [paper]
  • Gaussian Context Transformer [CVPR 2021] [paper]
  • General Multi-Label Image Classification With Transformers [CVPR 2021] [paper] [code]
  • Scaling Local Self-Attention for Parameter Efficient Visual Backbones [CVPR 2021] [paper]
  • T2T-ViT: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [ICCV 2021] [paper] [code]
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [ICCV 2021] [paper] [code]
  • Bias Loss for Mobile Neural Networks [ICCV 2021] [paper] [code]
  • Vision Transformer with Progressive Sampling [ICCV 2021] [paper] [code](https://github.com/yuexy/PS-ViT)
  • Rethinking Spatial Dimensions of Vision Transformers [ICCV 2021] [paper] [code]
  • Rethinking and Improving Relative Position Encoding for Vision Transformer [ICCV 2021] [paper] [code]

Interesting Repos

Self-Supervised

  • Emerging Properties in Self-Supervised Vision Transformers [ICCV 2021] [paper] [code]
  • An Empirical Study of Training Self-Supervised Vision Transformers [ICCV 2021] [paper] [code]

Interpretability and Robustness

  • Transformer Interpretability Beyond Attention Visualization [CVPR 2021] [paper] [code]
  • On the Adversarial Robustness of Visual Transformers [arxiv 2021] [paper]
  • Robustness Verification for Transformers [ICLR 2020] [paper] [code]
  • Pretrained Transformers Improve Out-of-Distribution Robustness [ACL 2020] [paper] [code]

Detection

  • DETR: End-to-End Object Detection with Transformers [ECCV2020] [paper] [code]
  • Deformable DETR: Deformable Transformers for End-to-End Object Detection [ICLR2021] [paper] [code]
  • End-to-End Object Detection with Adaptive Clustering Transformer [arxiv2020] [paper]
  • UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [arxiv2020] [paper]
  • Rethinking Transformer-based Set Prediction for Object Detection [arxiv2020] [paper] [zhihu]
  • End-to-end Lane Shape Prediction with Transformers [WACV 2021] [paper] [code]
  • ViT-FRCNN: Toward Transformer-Based Object Detection [arxiv2020] [paper]
  • Line Segment Detection Using Transformers [CVPR 2021] [paper] [code]
  • Facial Action Unit Detection With Transformers [CVPR 2021] [paper] [code]
  • Adaptive Image Transformer for One-Shot Object Detection [CVPR 2021] [paper] [code]
  • Self-attention based Text Knowledge Mining for Text Detection [CVPR 2021] [paper] [code]
  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [ICCV 2021] [paper] [code]
  • Group-Free 3D Object Detection via Transformers [ICCV 2021] [paper] [code]
  • Fast Convergence of DETR with Spatially Modulated Co-Attention [ICCV 2021] [paper] [code]

HOI

  • End-to-End Human Object Interaction Detection with HOI Transformer [CVPR 2021] [paper] [code]
  • HOTR: End-to-End Human-Object Interaction Detection with Transformers [CVPR 2021] [paper] [code]

Tracking

  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking [CVPR 2021] [paper] [code]
  • TransTrack: Multiple-Object Tracking with Transformer [CVPR 2021] [paper] [code]
  • Transformer Tracking [CVPR 2021] [paper] [code]
  • Learning Spatio-Temporal Transformer for Visual Tracking [ICCV 2021] [paper] [code]

Segmentation

  • SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [CVPR 2021] [paper] [code]
  • Trans2Seg: Transparent Object Segmentation with Transformer [arxiv2021] [paper] [code]
  • End-to-End Video Instance Segmentation with Transformers [arxiv2020] [paper] [zhihu]
  • MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers [CVPR 2021] [paper] [official-code] [unofficial-code]
  • Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [arxiv 2020] [paper] [code]
  • SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [CVPR 2021] [paper] [code]

Reid

  • Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer [CVPR 2021] [paper] [code]

Localization

  • LoFTR: Detector-Free Local Feature Matching with Transformers [CVPR 2021] [paper] [code]
  • MIST: Multiple Instance Spatial Transformer [CVPR 2021] [paper] [code]

Generation

Inpainting

  • STTN: Learning Joint Spatial-Temporal Transformations for Video Inpainting [ECCV 2020] [paper] [code]

Image enhancement

  • Pre-Trained Image Processing Transformer [CVPR 2021] [paper]
  • TTSR: Learning Texture Transformer Network for Image Super-Resolution [CVPR2020] [paper] [code]

Pose Estimation

  • Pose Recognition with Cascade Transformers [CVPR 2021] [paper] [code]
  • TransPose: Towards Explainable Human Pose Estimation by Transformer [arxiv 2020] [paper] [code]
  • Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020] [paper]
  • HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACMMM 2020] [paper]
  • End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021] [paper] [code]
  • 3D Human Pose Estimation with Spatial and Temporal Transformers [arxiv 2020] [paper] [code]
  • End-to-End Trainable Multi-Instance Pose Estimation with Transformers [arxiv 2020] [paper]

Face

  • Robust Facial Expression Recognition with Convolutional Visual Transformers [arxiv 2020] [paper]
  • Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition [CVPR 2021] [paper] [code]

Video Understanding

  • Is Space-Time Attention All You Need for Video Understanding? [arxiv 2020] [paper] [code]
  • Temporal-Relational CrossTransformers for Few-Shot Action Recognition [CVPR 2021] [paper] [code]
  • Self-Supervised Video Hashing via Bidirectional Transformers [CVPR 2021] [paper]
  • SSAN: Separable Self-Attention Network for Video Representation Learning [CVPR 2021] [paper]

Depth-Estimation

  • AdaBins: Depth Estimation Using Adaptive Bins [CVPR 2021] [paper] [code]

Prediction

  • Multimodal Motion Prediction with Stacked Transformers [CVPR 2021] [paper] [code]
  • Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case [paper]
  • Transformer networks for trajectory forecasting [ICPR 2020] [paper] [code]
  • Spatial-Channel Transformer Network for Trajectory Prediction on the Traffic Scenes [arxiv 2021] [paper] [code]
  • Pedestrian Trajectory Prediction using Context-Augmented Transformer Networks [ICRA 2020] [paper] [code]
  • Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction [ECCV 2020] [paper] [code]
  • Hierarchical Multi-Scale Gaussian Transformer for Stock Movement Prediction [paper]
  • Single-Shot Motion Completion with Transformer [arxiv2021] [paper] [code]

NAS

PointCloud

  • Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [CVPR 2021] [paper] [code]
  • Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos [CVPR 2021] [paper]

Fashion

  • Kaleido-BERT: Vision-Language Pre-training on Fashion Domain [CVPR 2021] [paper] [code]

Medical

  • Lesion-Aware Transformers for Diabetic Retinopathy Grading [CVPR 2021] [paper]

Cross-Modal

  • Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [CVPR 2021] [paper]
  • Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [CVPR2021] [paper] [code]
  • Topological Planning With Transformers for Vision-and-Language Navigation [CVPR 2021] [paper]
  • Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos [CVPR 2021] [paper]
  • VLN BERT: A Recurrent Vision-and-Language BERT for Navigation [CVPR 2021] [paper] [code]
  • Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [CVPR 2021] [paper] [code]

Reference

Acknowledgement

Thanks to the authors of the awesome survey papers on Transformers.
