Recent Advances in Vision-and-Language Research
Zhe Gan, Licheng Yu, Yu Cheng, Luowei Zhou,
Linjie Li, Yen-Chun Chen, Jingjing Liu, Xiaodong He
Visual Captioning
• Popular Topics: Advanced attentions, RL/GAN-based model training, Style diversity, Language richness, Evaluation
• Popular Tasks: Image/video captioning, Dense captioning, Storytelling
Visual QA/Grounding/Reasoning
• Popular Topics: Multimodal fusion, Advanced attentions, Use of relations, Neural modules, Language bias reduction
• Popular Tasks: VQA, GQA, VisDial, Ref-COCO, CLEVR, VCR, NLVR2
Text-to-image Generation
SOTA Models:
• StackGAN
• AttnGAN
• ObjGAN
• …
Self-supervised Learning
SOTA Models:
• Image+Text: ViLBERT, LXMERT, Unicoder-VL, UNITER, etc.
• Video+Text: Video-BERT, CBT, UniViLM, etc.
Tutorial Agenda
• 1:15 – 1:25 Opening Remarks
• 1:25 – 2:15 Visual QA/Reasoning
• 2:15 – 2:30 Coffee Break
• 2:30 – 3:10 Visual Captioning
• 3:10 – 3:40 Text-to-image Generation
• 3:40 – 4:00 Coffee Break
• 4:00 – 5:00 Self-supervised Learning
Time:
1:25 – 2:15 PM (50 mins)
Presenter:
Zhe Gan (Microsoft)
Zhe Gan is a Senior Researcher at Microsoft Dynamics 365 AI Research. His current
research interests include Vision-and-Language Pre-training and Self-supervised
Learning. Zhe obtained his Ph.D. degree from Duke University in 2018, and his Master’s
and Bachelor’s degrees from Peking University in 2013 and 2010, respectively. He is
an Area Chair for NeurIPS 2020 and 2019, and received the AAAI-2020 Outstanding
Senior Program Committee Award.
Visual QA/Reasoning/Grounding
Presenter:
Luowei Zhou (Microsoft)
Presenter:
Yu Cheng (Microsoft)
Text-to-Video Synthesis (GAN-based, VAE-based)
Dialogue-based Image Synthesis (ChatPainter, CoDraw, SeqAttnGAN)
Session 4: Self-supervised Learning
Time:
4:00 – 5:00 PM (60 mins)
Presenters:
Licheng Yu (Facebook), Yen-Chun Chen (Microsoft), Linjie Li (Microsoft)
Dr. Licheng Yu is a Research Scientist at Facebook AI. Before that, he was at Microsoft Dynamics 365 AI
Research. Licheng completed his PhD at the University of North Carolina at Chapel Hill in 2019, and received his B.S. degree
from Shanghai Jiaotong University (SJTU) and M.S. degrees from both SJTU and Georgia Tech. During his PhD study,
he did summer internships at eBay Research, Adobe Research and Facebook AI Research.
Linjie Li is a Research SDE at Microsoft Dynamics 365 AI Research. Her current research interests include
Vision-and-Language pre-training and self-supervised learning. Linjie obtained her Master's degree in computer science from
Purdue University in 2018. She also holds a Master's degree in Electrical Engineering from UC San Diego.
Yen-Chun Chen is a Research SDE at Microsoft. He received his M.S. in computer science from UNC Chapel Hill in
2017, where he focused on NLP and text summarization. He received his bachelor's degree in electrical engineering
from NTU in 2014. His current research focus is large-scale self-supervised pre-training and its applications.
Self-supervised Learning for Vision-and-Language
…
Thailand. They both seemed
interested in what we were doing