This survey reviews the progress of diffusion models in generating images from text, \textit{i.e.}, text-to-image diffusion models. As a self-contained work, it begins with a brief introduction to how diffusion models perform image synthesis, followed by the background of text-conditioned image synthesis. On this basis, we present an organized review of pioneering methods and their improvements in text-to-image generation. We further summarize applications beyond image generation, such as text-guided generation for other modalities like video, and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions.