Bumblebee: Text-to-Image Generation with Transformers

2019 · 3 Citations
Nathan A. Fotedar, Julia H. Wang
journal unavailable

A variational autoencoder model with transformers is explored; its shortfall relative to GAN baselines suggests that the use of GANs might be key to the text-to-image generation task.

Abstract

While image captioning and segmentation have been widely explored, the reverse problem of generating images from text captions remains a difficult task. The most successful current approaches employ GANs; in this paper, however, we explore a variational autoencoder model with transformers. The motivation for applying transformers to text-to-image generation comes from recent success in applying attention to the task (e.g., the AttnGAN model). Many researchers have achieved improvements over attention-based models by applying transformers, so it seemed that transformers could aid this task as well. The VAE transformer model was ultimately unable to match the performance of the more traditional GAN models, and our results suggest that the use of GANs might be key to the text-to-image generation task.
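The abstract does not detail the model, but the core of any VAE, including a transformer-conditioned one, is the reparameterization trick and the KL term of the ELBO objective. A minimal NumPy sketch of those two pieces (shapes and names are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # Sample z = mu + sigma * eps so the sample stays differentiable in mu, sigma.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Toy "encoder outputs" for a batch of 2 captions with a 4-dim latent space
# (in the paper's setting these would come from a transformer text encoder).
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))

z = reparameterize(mu, log_var)           # latent codes fed to an image decoder
kl = kl_to_standard_normal(mu, log_var)   # KL term is 0 when mu=0, log_var=0
```

The full training loss would add a reconstruction term (e.g., pixel-wise likelihood of the generated image) to this KL term.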