
fnzhan/Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

This project accompanies our survey paper, which comprehensively contextualizes advances in Multimodal Image Synthesis & Editing (MISE) and visual AIGC by formulating taxonomies according to data modality and model architecture.

Multimodal Image Synthesis and Editing: The Generative AI Era [Paper] [Project]

Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski,
Christian Theobalt, Eric Xing

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023


PR's Welcome
You are welcome to contribute papers via pull request.

The process to submit a pull request:

  • a. Fork the project into your own repository.
  • b. Add the Title, Author, Conference, Paper link, Project link, and Code link to README.md in the format below:
**Title**<br>
*Author*<br>
Conference
[[Paper](Paper link)]
[[Project](Project link)]
[[Code](Code link)]
  • c. Submit the pull request to this branch.
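For example, the DreamFusion entry listed further below would be written in this format as follows (the URLs here are placeholders for illustration, not the actual links):

```markdown
**DreamFusion: Text-to-3D using 2D Diffusion**<br>
*Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall*<br>
arxiv 2022
[[Paper](https://arxiv.org/abs/xxxx.xxxxx)]
[[Project](https://example.com/dreamfusion)]
```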

Adversarial Text-to-Image Synthesis: A Review

Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel

Neural Networks 2021
[Paper]

GAN Inversion: A Survey

Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang

TPAMI 2022
[Paper]
[Project]

Deep Image Synthesis from Intuitive User Input: A Review and Perspectives

Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang

Computational Visual Media 2022
[Paper]

Awesome-Text-to-Image


Table of Contents (Work in Progress)

Methods:

Modalities & Datasets:

Neural-Rendering-Methods

ATT3D: Amortized Text-to-3D Object Synthesis

Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

arxiv 2023
[Paper]

TADA! Text to Animatable Digital Avatars

Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black

arxiv 2023
[Paper]

MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR

Xudong Xu, Zhaoyang Lyu, Xingang Pan, Bo Dai

arxiv 2023
[Paper]

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

arxiv 2023
[Paper]

AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose

Huichao Zhang, Bowen Chen, Hao Yang, Liao Qu, Xu Wang, Li Chen, Chao Long, Feida Zhu, Kang Du, Min Zheng

arxiv 2023
[Paper]
[Project]

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

ICCV 2023
[Paper]
[Project]
[Code]

FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields

Sungwon Hwang, Junha Hyung, Daejin Kim, Min-Jung Kim, Jaegul Choo

ICCV 2023
[Paper]

Local 3D Editing via 3D Distillation of CLIP Knowledge

Junha Hyung, Sungwon Hwang, Daejin Kim, Hyunji Lee, Jaegul Choo

CVPR 2023
[Paper]

RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models

Xingchen Zhou, Ying He, F. Richard Yu, Jianqiang Li, You Li

IJCAI 2023
[Paper]

DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

Yukun Huang, Jianan Wang, Yukai Shi, Xianbiao Qi, Zheng-Jun Zha, Lei Zhang

arxiv 2023
[Paper]
[Project]

AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

arxiv 2023
[Paper]
[Project]

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields

Ori Gordon, Omri Avrahami, Dani Lischinski

arxiv 2023
[Paper]
[Project]

OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields

Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin

arxiv 2023
[Paper]
[Project]
[Code]

HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance

Junzhe Zhu, Peiye Zhuang

arxiv 2023
[Paper]
[Project]

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu

arxiv 2023
[Paper]
[Project]

Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao

arxiv 2023
[Paper]
[Project]

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

arxiv 2023
[Paper]
[Project]

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun

arxiv 2023
[Paper]
[Project]
[Code]

CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout

Yiqi Lin, Haotian Bai, Sijia Li, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang

arxiv 2023
[Paper]

Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes

Dana Cohen-Bar, Elad Richardson, Gal Metzer, Raja Giryes, Daniel Cohen-Or

arxiv 2023
[Paper]
[Project]
[Code]

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Jaehoon Ko, Hyeonsu Kim, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim

arxiv 2023
[Paper]
[Project]
[Code]

Text-To-4D Dynamic Scene Generation

Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

arxiv 2023
[Paper]
[Project]

Magic3D: High-Resolution Text-to-3D Content Creation

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

CVPR 2023
[Paper]
[Project]

DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model

Gwanghyun Kim, Se Young Chun

CVPR 2023
[Paper]
[Code]
[Project]

Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

Gang Li, Heliang Zheng, Chaoyue Wang, Chang Li, Changwen Zheng, Dacheng Tao

arxiv 2022
[Paper]
[Project]

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall

arxiv 2022
[Paper]
[Project]

Zero-Shot Text-Guided Object Generation with Dream Fields

Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole

CVPR 2022
[Paper]
[Code]
[Project]

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis

Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu

SIGGRAPH Asia 2022
[Paper]
[Code]
[Project]

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

arxiv 2022
[Paper]
[Code]
[Project]

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

CVPR 2022
[Paper]
[Code]
[Project]

CG-NeRF: Conditional Generative Neural Radiance Fields

Kyungmin Jo, Gyumin Shim, Sanghun Jung, Soyoung Yang, Jaegul Choo

arxiv 2021
[Paper]

Zero-Shot Text-Guided Object Generation with Dream Fields

Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole

arxiv 2021
[Paper]
[Project]

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, Juyong Zhang

ICCV 2021
[Paper]
[Code]
[Project]
[Video]


Diffusion-based-Methods

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

Dongxu Li, Junnan Li, Steven C.H. Hoi

arxiv 2023
[Paper]
[Project]
[Code]

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

arxiv 2023
[Paper]
[Project]
[Code]

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

CVPR 2023
[Paper]
[Project]
[Code]

Multi-Concept Customization of Text-to-Image Diffusion

Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

CVPR 2023
[Paper]
[Project]
[Code]

Collaborative Diffusion for Multi-Modal Face Generation and Editing

Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu

CVPR 2023
[Paper]
[Project]
[Code]

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel

CVPR 2023
[Paper]
[Project]
[Code]

SINE: SINgle Image Editing with Text-to-Image Diffusion Models

Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren

CVPR 2023
[Paper]
[Project]
[Code]

NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

CVPR 2023
[Paper]
[Project]
[Code]

Paint by Example: Exemplar-Based Image Editing With Diffusion Models

Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen

CVPR 2023
[Paper]
[Demo]
[Code]

SpaText: Spatio-Textual Representation for Controllable Image Generation

Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin

CVPR 2023
[Paper]
[Project]

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

CVPR 2023
[Paper]
[Project]

InstructPix2Pix: Learning to Follow Image Editing Instructions

Tim Brooks, Aleksander Holynski, Alexei A. Efros

CVPR 2023
[Paper]
[Project]
[Code]

Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Nithin Gopalakrishnan Nair, Chaminda Bandara, Vishal M Patel

CVPR 2023
[Paper]
[Project]
[Code]

DiffEdit: Diffusion-based semantic image editing with mask guidance

Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

CVPR 2023
[Paper]

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

arxiv 2022
[Paper]
[Project]

Prompt-to-Prompt Image Editing with Cross-Attention Control

Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

arxiv 2022
[Paper]
[Project]
[Code]

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

arxiv 2022
[Paper]
[Project]
[Code]

Text2Human: Text-Driven Controllable Human Image Generation

Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu

SIGGRAPH 2022
[Paper]
[Project]
[Code]

[DALL-E 2] Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen

arxiv 2022
[Paper]
[Code]

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

CVPR 2022
[Paper]
[Code]

v-objective diffusion

Katherine Crowson

[Code]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen

arxiv 2021
[Paper]
[Code]

Vector Quantized Diffusion Model for Text-to-Image Synthesis

Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

arxiv 2021
[Paper]
[Code]

DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

Gwanghyun Kim, Jong Chul Ye

arxiv 2021
[Paper]

Blended Diffusion for Text-driven Editing of Natural Images

Omri Avrahami, Dani Lischinski, Ohad Fried

CVPR 2022
[Paper]
[Project]
[Code]


Autoregressive-Methods

MaskGIT: Masked Generative Image Transformer

Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman

arxiv 2022
[Paper]

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

arxiv 2021
[Paper]
[Project]

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan

arxiv 2021
[Paper]
[Code]
[Video]

L-Verse: Bidirectional Generation Between Image and Text

Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae

arxiv 2021
[Paper]
[Code]

M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis

Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang

NeurIPS 2021
[Paper]

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer

NeurIPS 2021
[Paper]
[Code]
[Project]

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

ACM MM 2021
[Paper]
[Code]

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

ACM MM 2021
[Paper]
[Code]

Taming Transformers for High-Resolution Image Synthesis

Patrick Esser, Robin Rombach, Björn Ommer

CVPR 2021
[Paper]
[Code]
[Project]

RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

Alex Shonenkov and Michael Konstantinov

arxiv 2022
[Code]

Generate Images from Texts in Russian (ruDALL-E)

[Code]
[Project]

Zero-Shot Text-to-Image Generation

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

arxiv 2021
[Paper]
[Code]
[Project]

Compositional Transformers for Scene Generation

Drew A. Hudson, C. Lawrence Zitnick

NeurIPS 2021
[Paper]
[Code]

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi

EMNLP 2020
[Paper]
[Code]

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning

Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu

AAAI 2022
[Paper]


Image-Quantizer

[TE-VQGAN] Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi

arxiv 2021
[Paper]
[Code]

[ViT-VQGAN] Vector-quantized Image Modeling with Improved VQGAN

Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

arxiv 2021
[Paper]

[PeCo] PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

arxiv 2021
[Paper]

[VQ-GAN] Taming Transformers for High-Resolution Image Synthesis

Patrick Esser, Robin Rombach, Björn Ommer

CVPR 2021
[Paper]
[Code]

[Gumbel-VQ] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Alexei Baevski, Steffen Schneider, Michael Auli

ICLR 2020
[Paper]
[Code]

[EM VQ-VAE] Theory and Experiments on Vector Quantized Autoencoders

Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar

arxiv 2018
[Paper]
[Code]

[VQ-VAE] Neural Discrete Representation Learning

Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu

NIPS 2017
[Paper]
[Code]

[VQ-VAE2 or EMA-VQ] Generating Diverse High-Fidelity Images with VQ-VAE-2

Ali Razavi, Aaron van den Oord, Oriol Vinyals

NeurIPS 2019
[Paper]
[Code]

[Discrete VAE] Discrete Variational Autoencoders

Jason Tyler Rolfe

ICLR 2017
[Paper]
[Code]

[DVAE++] DVAE++: Discrete Variational Autoencoders with Overlapping Transformations

Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash

ICML 2018
[Paper]
[Code]

[DVAE#] DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors

Arash Vahdat, Evgeny Andriyash, William G. Macready

NeurIPS 2018
[Paper]
[Code]


GAN-based-Methods

GauGAN2

NVIDIA

[Project]
[Video]

Multimodal Conditional Image Synthesis with Product-of-Experts GANs

Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

arxiv 2021
[Paper]

RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis from Constrained Prior Knowledge

Jun Cheng, Fuxiang Wu, Yanling Tian, Lei Wang, Dapeng Tao

TCSVT 2021
[Paper]

TRGAN: Text to Image Generation Through Optimizing Initial Image

Liang Zhao, Xinwei Li, Pingda Huang, Zhikui Chen, Yanqi Dai, Tianyu Li

ICONIP 2021
[Paper]

Audio-Driven Emotional Video Portraits [Audio2Image]

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

CVPR 2021
[Paper]
[Code]
[Project]

SketchyCOCO: Image Generation from Freehand Scene Sketches

Chengying Gao, Qi Liu, Qi Xu, Limin Wang, Jianzhuang Liu, Changqing Zou

CVPR 2020
[Paper]
[Code]
[Project]

Direct Speech-to-Image Translation [Audio2Image]

Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

JSTSP 2020
[Paper]
[Code]
[Project]

MirrorGAN: Learning Text-to-image Generation by Redescription [Text2Image]

Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao

CVPR 2019
[Paper]
[Code]

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [Text2Image]

Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He

CVPR 2018
[Paper]
[Code]

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

CVPR 2017
[Paper]
[Code]

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]

Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

TPAMI 2018
[Paper]
[Code]

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]

Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

ICCV 2017
[Paper]
[Code]


GAN-Inversion-Methods

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt

SIGGRAPH 2023
[Paper]
[Code]

HairCLIP: Design Your Hair by Text and Reference Image

Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu

arxiv 2021
[Paper]
[Code]

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu

arxiv 2021
[Paper]
[Code]

StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation

Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag

WACV 2022
[Paper]
[Code]
[Project]

Cycle-Consistent Inverse GAN for Text-to-Image Synthesis

Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao

ACM MM 2021
[Paper]

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski

ICCV 2021
[Paper]
[Code]
[Video]

Talk-to-Edit: Fine-Grained Facial Editing via Dialog

Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu

ICCV 2021
[Paper]
[Code]
[Project]

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu

CVPR 2021
[Paper]
[Code]
[Video]

Paint by Word

David Bau, Alex Andonian, Audrey Cui, YeonHwan Park, Ali Jahanian, Aude Oliva, Antonio Torralba

arxiv 2021
[Paper]


Other-Methods

Language-Driven Image Style Transfer

Tsu-Jui Fu, Xin Eric Wang, William Yang Wang

arxiv 2021
[Paper]

CLIPstyler: Image Style Transfer with a Single Text Condition

Gihyun Kwon, Jong Chul Ye

arxiv 2021
[Paper]
[Code]

Wakey-Wakey: Animate Text by Mimicking Characters in a GIF

Liwenhan Xie, Zhaoyu Zhou, Kerun Yu, Yun Wang, Huamin Qu, Siming Chen

UIST 2023
[Paper]
[Code]
[Project]



Text-Encoding

FLAVA: A Foundational Language And Vision Alignment Model

Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela

arxiv 2021
[Paper]

Learning Transferable Visual Models From Natural Language Supervision (CLIP)

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

arxiv 2021
[Paper]
[Code]


Audio-Encoding

Wav2CLIP: Learning Robust Audio Representations From CLIP (Wav2CLIP)

Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

ICASSP 2022
[Paper]
[Code]

Datasets

Multimodal CelebA-HQ (https://github.com/IIGROUP/MM-CelebA-HQ-Dataset)

DeepFashion MultiModal (https://github.com/yumingj/DeepFashion-MultiModal)

Citation

If you use this project for your research, please cite our paper.

@article{zhan2023mise,
  title={Multimodal Image Synthesis and Editing: The Generative AI Era},
  author={Zhan, Fangneng and Yu, Yingchen and Wu, Rongliang and Zhang, Jiahui and Lu, Shijian and Liu, Lingjie and Kortylewski, Adam and Theobalt, Christian and Xing, Eric},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023},
  publisher={IEEE}
}
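In a LaTeX document, the BibTeX entry above can then be cited with its key; a minimal sketch, assuming the entry is saved in a file named references.bib (the filename and bibliography style are illustrative):

```latex
% Minimal citation sketch; assumes the BibTeX entry above is in references.bib.
\documentclass{article}
\begin{document}
Multimodal image synthesis and editing are surveyed in~\cite{zhan2023mise}.
\bibliographystyle{ieeetr}
\bibliography{references}
\end{document}
```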
