Delve into the exciting world of Text to Image Generation with our collection of top research papers. These papers cover innovative techniques and advancements in the field, providing essential insights for researchers and enthusiasts alike. Stay ahead of the curve by exploring the latest developments and trends shaping this intriguing area of study.
Over the last three years, the denoising diffusion approach with a score-based loss function has become notable as well: many studies tackling the problem of image generation report SOTA results with it (a minimal sketch of the training objective follows).
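For readers new to this family of methods, here is a minimal sketch of the denoising objective in PyTorch; the `model` (which predicts the added noise) and the noise-schedule handling are illustrative stand-ins, not any specific paper's implementation:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, alphas_cumprod):
    """Simplified epsilon-prediction loss used by DDPM-style models."""
    alphas_cumprod = alphas_cumprod.to(x0.device)
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)            # cumulative schedule at t
    noise = torch.randn_like(x0)                          # target epsilon
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward (noising) step
    pred = model(x_t, t)                                  # model predicts the noise
    return F.mse_loss(pred, noise)                        # score-matching-style MSE
```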
Guorong Xiao
2013 5th International Conference on Computational Intelligence and Communication Networks
A novel method for generating text images is presented, and results are shown to demonstrate text-image generation with the proposed method.
Zhiqiu Lin, Deepak Pathak, Baiqi Li + 5 more
journal unavailable
The VQAScore is introduced, which uses a visual-question-answering (VQA) model to produce an alignment score by computing the probability of a "Yes" answer to a simple "Does this figure show '{text}'?" question; though simpler than prior art, VQAScore computed with off-the-shelf models produces state-of-the-art results across many image-text alignment benchmarks.
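The scoring recipe is simple enough to sketch. Below is a hedged illustration of the idea, assuming a hypothetical `vqa_model`/`tokenizer` pair that exposes next-token logits; the actual VQAScore uses specific off-the-shelf VQA models not reproduced here:

```python
import torch

def vqa_score(vqa_model, tokenizer, image, text):
    """Alignment score = P("Yes" | image, question), per the VQAScore idea."""
    question = f"Does this figure show '{text}'? Answer yes or no."
    inputs = tokenizer(question, return_tensors="pt")
    # Hypothetical interface: logits over the next answer token.
    logits = vqa_model(image=image, **inputs).logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1)
    yes_id = tokenizer.convert_tokens_to_ids("Yes")
    return probs[0, yes_id].item()
```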
Junyi Li, Wayne Xin Zhao, J. Nie + 1 more
ArXiv
This paper proposes RenderDiffusion, a novel diffusion approach for text generation via text-guided image generation that can achieve comparable or even better results than several baselines, including pretrained language models.
Junyi Li, Wayne Xin Zhao, J. Nie + 1 more
journal unavailable
GlyphDiffusion is proposed, a novel diffusion approach for text generation via text-guided image generation that utilizes a cascaded architecture (i.e., a base and a super-resolution diffusion model) to generate high-fidelity glyph images, conditioned on the input text.
Zhaorui Tan, Zihan Ye, Xi Yang + 3 more
ArXiv
A novel CLIP-based metric termed Semantic Similarity Distance (SSD) is developed, both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets; the proposed PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap.
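SSD itself adds a distributional treatment that is not reproduced here, but the CLIP text-image similarity it builds on is easy to demonstrate with the Hugging Face `transformers` API (the checkpoint shown is just a common default):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image: Image.Image, text: str) -> float:
    """Scaled cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_image.item()
```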
Yankun Wu, Yuta Nakashima, Noa García
ArXiv
This paper focuses on the evaluation of recent popular models such as Stable Diffusion, a diffusion model operating in the latent space and using CLIP text embedding, and DALL-E 2, a diffusion model leveraging Seq2Seq architectures like BART, involving bias evaluation setup, bias evaluation metrics, and findings and trends.
Colin Conwell, T. Ullman
ArXiv
It is suggested that current image generation models do not yet have a grasp of even basic relations involving simple objects and agents, based on a quantitative examination of people's judgments.
Songwei Ge, Taesung Park, Jun-Yan Zhu + 1 more
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
This work extracts each word’s attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis, and demonstrates that the method outperforms strong baselines with quantitative evaluations.
J. Oppenlaender
Proceedings of the 25th International Academic Mindtrek Conference
The paper argues that the current product-centered view of creativity falls short in the context of text-to-image generation, and provides a high-level summary of this online ecosystem drawing on Rhodes’ conceptual four P model of creativity.
Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz + 1 more
ArXiv
A novel controllable text-to-image generative adversarial network (ControlGAN) is proposed, which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Zhengcong Fei, Mingyuan Fan, Li Zhu + 1 more
journal unavailable
This paper presents a progressive model for high-fidelity text-to-image generation that produces significantly better results compared with the previous VQ-AR method in FID score across a wide variety of categories and aspects.
This work creates an end-to-end solution that can generate artistic images from text descriptions, addressing the lack of datasets with paired text descriptions and artistic images.
This work develops Surgical Imagen, a diffusion-based generative model that generates photorealistic, activity-aligned surgical images from triplet-based textual prompts, and designs an instrument-based class balancing technique to counteract data imbalance and skewness, improving training convergence.
Pranjali Avhad
International Journal of Scientific Research in Engineering and Management
This project investigates the novel use of stable diffusion techniques to generate high-quality images from detailed text descriptions using cutting-edge stable diffusion models, which includes tokenization, pre-processing, specialized architecture design, and post-processing techniques.
Rishabh Chandila, Deepak Kumar
2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT)
With the exponential growth of data and advancements in Artificial Intelligence (AI), the task of converting textual descriptions into visual images has gained significant traction. This research addresses the challenge by leveraging AI techniques to develop a robust Text-to-Image Generation system that translates text prompts into accurate, high-quality images. This study aims to explore and analyze cutting-edge AI techniques for text-to-image generation, with a focus on enhancing the accuracy and visual quality of generated images. Our strategy involves employing state-of-the-art neural netw...
Dr. P. Srinivasu, P. Harish, K. P. Chandra + 2 more
International Journal for Research in Applied Science and Engineering Technology
This work introduces a controlled chaos system, where a stalling generator network and a disoriented discriminator network co-exist, and explores the potential of GANs for jumbled up art creation, data manipulation, and image confusion, opening a new frontier in GAN applications.
Avdhi Pagariya, Riddhi Jain
International Journal of Advanced Research in Science, Communication and Technology
A deep learning model is presented to describe images and generate captions using computer vision and machine translation to detect different objects found in an image, recognize the relationships between those objects and generate captions.
S. Vinothkumar, S. Varadhaganapathy, R. Shanthakumari + 3 more
2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT)
With the rapid evolution of generative artificial intelligence (AI), this research project delves into the realm of Text to Image generation. Leveraging advanced neural network architectures, the study explores the synthesis of visual content from textual descriptions. The project not only investigates the technical intricacies of the generative models involved but also delves into the potential applications across various domains such as creative content creation, design, and multimedia enhancement. The methodology encompasses a comprehensive examination of state-of-the-art techniques in Text...
Reshma S
International Journal of Scientific Research in Engineering and Management
One of the approaches identified in this study is Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), which is used to increase semantic consistency between text descriptions and synthesised pictures for fine-grained text-to-image creation.
Yu Zhao, Hao Fei, Xiangtai Li + 6 more
journal unavailable
This work considers modeling the SI2T and ST2I together under a dual learning framework, and proposes the Spatial Dual Discrete Diffusion (SD$^3$) framework, which utilizes the intermediate features of the 3D$\to$X processes to guide the hard X$\to$3D processes, such that the overall ST2I and SI2T will benefit each other.
Nikita Srivatsan, Sofía Samaniego, Omar Florez + 1 more
ArXiv
This work presents an approach for generating alternative text (alt-text) descriptions for images shared on social media, specifically Twitter, and is, to the authors' knowledge, the first to incorporate textual information from the associated social media post into the prefix.
Zhaorui Tan, Zihan Ye, Qiufeng Wang + 4 more
SSRN Electronic Journal
This paper makes a further step forward to develop a novel CLIP-based metric termed as Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets.
A. Ramesh, Mikhail Pavlov, Gabriel Goh + 5 more
ArXiv
This work describes a simple approach based on a transformer that autoregressively models the text and image tokens as a single stream of data that is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
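The core trick, modeling text and image tokens as one sequence, can be sketched in a few lines; the decoder-only `transformer` and the pre-tokenized inputs below are hypothetical stand-ins for the paper's actual architecture:

```python
import torch
import torch.nn.functional as F

def single_stream_loss(transformer, text_tokens, image_tokens):
    """Next-token prediction over concatenated text and VQ image tokens."""
    stream = torch.cat([text_tokens, image_tokens], dim=1)  # one token stream
    logits = transformer(stream[:, :-1])                    # (b, len-1, vocab)
    targets = stream[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```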
Guojun Yin, Bin Liu, Lu Sheng + 3 more
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A novel photo-realistic text-to-image generation model that implicitly disentangles semantics to both fulfill the high-level semantic consistency and low-level semantic diversity, and a visual-semantic embedding strategy by semantic-conditioned batch normalization to find diverse low-level semantics.
Y. Hao, Zewen Chi, Li Dong + 1 more
ArXiv
This work proposes prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts, and defines a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions.
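A reward of this shape is straightforward to express. The sketch below is only illustrative: `aesthetic_model` and `clip_similarity` are hypothetical scoring functions, and the linear trade-off is an assumption rather than the paper's exact formulation:

```python
def prompt_adaptation_reward(image, original_prompt,
                             aesthetic_model, clip_similarity,
                             weight: float = 0.5):
    """Trade off image aesthetics against fidelity to the user's intent."""
    aesthetic = aesthetic_model(image)                   # how pleasing the image is
    relevance = clip_similarity(image, original_prompt)  # intent preservation
    return weight * aesthetic + (1 - weight) * relevance
```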
Shagan Sah, D. Peri, Ameya Shringi + 4 more
2018 25th IEEE International Conference on Image Processing (ICIP)
Qualitative and quantitative evaluations demonstrate that MMVR improves upon existing text conditioned image generation results by over 20%, while integrating visual and text modalities.
Nathan A. Fotedar, Julia H. Wang
journal unavailable
A variational autoencoder model with transformers is explored; the results suggest that the use of GANs might be key to the text-to-image generation task.
Yufan Zhou, Bingchen Liu, Yizhe Zhu + 3 more
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Corgi is based on the proposed shifted diffusion model, which achieves better image embedding generation from input text, and achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks.
Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen + 4 more
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper proposes a word-level attention-based weight modulation operator that controls the generation process of INR-GAN based on hypernetworks and shows that HyperCGAN achieves competitive performance to existing pixel-based methods and retains the properties of continuous generative models.
I intend to discuss a few problems from the art historical point of view. Questions about the competence of the subject for both the large fields of word and image must inevitably be raised. Linguistics, literary criticism and the philosophy of language are responsible for the domain of language; they are also competent for a part of the field of the image. Art history needs language as a descriptive and representational medium, but language is not the object of its discipline. Further: even for the large area of the image only a small part is art history's field of research. According to Mitc...
Leigang Qu, Haochuan Li, Tan Wang + 4 more
ArXiv
This work rethinks the relationship between text-to-image generation and retrieval, proposes a unified framework in the context of Multimodal Large Language Models (MLLMs), and introduces a generative retrieval method that performs retrieval in a training-free manner.
J. Oppenlaender, Aku Visuri, Ville Paananen + 2 more
ArXiv
The study found that participants were aware of the risks and dangers associated with the technology, but only a few participants considered the technology to be a risk to themselves, and those who had tried the technology rated its future importance lower than those who had not.
Bowen Li, Philip H. S. Torr, Thomas Lukasiewicz
journal unavailable
Experimental results demonstrate that the proposed memory-driven semi-parametric approach to text-to-image generation produces more realistic images than purely parametric approaches, in terms of both visual fidelity and text-image semantic consistency.
Thibault Castells, Hyoung-Kyu Song, Tairen Piao + 6 more
ArXiv
Through the thorough exploration of quantization, profiling, and on-device deployment, this work achieves rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
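Few-step sampling of this kind can be tried with the `diffusers` library; the snippet below is a generic illustration with a placeholder model ID, not the paper's own quantized deployment stack:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint: substitute any step-distilled text-to-image model.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/distilled-model", torch_dtype=torch.float16)
image = pipe("a photo of a red bicycle",
             num_inference_steps=2,      # two denoising steps only
             guidance_scale=0.0).images[0]
```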
Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu + 1 more
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A customization assistant based on pre-trained large language model and diffusion model is built, which can not only perform customized generation in a tuning-free manner, but also enable more user-friendly interactions: users can chat with the assistant and input ambiguous text or clear instruction.
Sharma Tripti, Neetu Anand, K. Gaurav + 1 more
International Journal of Next-Generation Computing
A model is created for blind people that can guide and support them while traveling on highways with just the help of a smartphone application. This is accomplished by first converting the scene in front of the user into text and then converting the text into voice output, using a method for generating image captions based on deep neural networks. With an image as input, the method can produce an English sentence describing the contents of the image. The user first provides a voice command, then a quick snapshot is captured by the camera or webcam. This image is then fed as input...
J. Oppenlaender, Johanna M. Silvennoinen, Ville Paananen + 1 more
Proceedings of the 26th International Academic Mindtrek Conference
It is found that while participants were aware of the risks and dangers associated with the technology, only a few participants considered the technology to be a personal risk, which shows that many people are still oblivious to the potential personal risks of generative artificial intelligence and the impending societal changes associated with this technology.
Scott E. Reed, Zeynep Akata, Xinchen Yan + 3 more
journal unavailable
A novel deep architecture and GAN formulation is developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.
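The basic formulation in this line of work (a generator conditioned on a text embedding) can be sketched as follows; the layer sizes and embedding dimension are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Minimal text-conditional GAN generator: noise + text embedding -> pixels."""
    def __init__(self, noise_dim=100, text_dim=256, img_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, img_pixels),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z, text_embedding):
        return self.net(torch.cat([z, text_embedding], dim=1))
```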
Syed Sha Alam A., J. N., Mohamed Faiz Ali B. + 1 more
International Journal of Scientific Research in Engineering and Management
This paper presents Perceptual Image Compression, a text-to-image technology that will enable billions of individuals to produce beautiful works of art in a few seconds, the open ecosystem that will grow up around it, and new models to really probe the limits of latent space.
Huixian Zhang, Shuhui Jiang, Y. Fu
2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)
A novel spatial-style loss is investigated that enables the generator to learn style distribution from text-related real images, even if shapes and postures are misaligned, and outperforms state-of-the-art methods quantitatively and qualitatively.
Ronit Sawant, Asadullah Shaikh, Sunil Sabat + 1 more
SSRN Electronic Journal
This work presents a new retrieval system based on the generative adversarial network that improves performance through a simple process for criminal face generation: extracting facial traits from text descriptions and creating a realistic human face.
An overview is given of the techniques that make text-to-image generators possible, and of how advances in machine learning, particularly the technique of diffusion, are critical building blocks for artificial intelligence.
Anish J. Jain, Diti Modi, Rudra Jikadra + 1 more
2019 6th International Conference on Computing for Sustainable Global Development (INDIACom)
This paper proposes a framework that accepts text input from the user about a fashion pattern; the model generates images of fashion clothing based on that input, which can help people be their own designers, creating a range of fashion clothing for themselves using the power of Deep Learning and Generative Adversarial Networks.
Vadik Amar, Sonu, Hatesh Shyan
2023 International Conference on Integrated Intelligence and Communication Systems (ICIICS)
The stable training dynamics offered by this framework empower developers to explore various text-conditional generation tasks beyond mere realism, opening the door to generating images that evoke specific emotional responses, aligning AI-generated visuals with human intentions and artistic vision.
A. Bogatenkova, O. Belyaeva, A. Perminov
Proceedings of the Institute for System Programming of the RAS
The impact of various methods of generating additional training datasets on the quality of recognition models is explored: a method based on handwritten fonts, the StackMix method of gluing words from symbols, and the use of a generative adversarial network.
Federico Bianchi, Pratyusha Kalluri, Esin Durmus + 7 more
journal unavailable
This research underscores the urgent need for policymakers to address the harms resulting from the mass dissemination of stereotypes through major text-to-image AI models.
By comparing five different methods based on Generative Adversarial Networks for generating images from text, the best model for this problem is found by evaluating these approaches on essential metrics.
Masato Osugi, Danilo Vasconcellos Vargas
2022 Tenth International Symposium on Computing and Networking Workshops (CANDARW)
A new image generation method is proposed and validated that uses segmentation and text as input and is capable of handling complex layouts and maintaining natural object structure even with a large number of objects.
Polina Kuznetsova, Vicente Ordonez, A. Berg + 2 more
journal unavailable
This work introduces the new task of image caption generalization, formulated as visually-guided sentence compression, and presents an efficient algorithm based on dynamic beam search with dependency-based constraints and releases a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text.