
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

DOI: 10.1109/cvpr52733.2024.00816

104 Citations • 2024

Jing Shi, Wei Xiong, Zhe Lin


InstantBooth is an innovative approach leveraging existing text-to-image models for instantaneous text-guided image personalization, eliminating the need for test-time finetuning and boasting a 100-fold increase in generation speed.

Abstract

Recent advances in personalized image generation have enabled pre-trained text-to-image models to learn new concepts from specific image sets. However, these methods often necessitate extensive test-time finetuning for each new concept, leading to inefficiencies in both time and scalability. To address this challenge, we introduce InstantBooth, an innovative approach leveraging existing text-to-image models for instantaneous text-guided image personalization, eliminating the need for test-time finetuning. This efficiency is achieved through two primary innovations. Firstly, we utilize an image encoder that transforms input images into a global embedding to grasp the general concept. Secondly, we integrate new adapter layers into the pre-trained model, enhancing its ability to capture intricate identity details while maintaining language coherence. Significantly, our model is trained exclusively on text-image pairs, without reliance on concept-specific paired images. When benchmarked against existing finetuning-based personalization techniques like DreamBooth and Textual Inversion, InstantBooth not only shows comparable proficiency in aligning language with image, maintaining image quality, and preserving the identity but also boasts a 100-fold increase in generation speed. Project Page: https://jshi31.github.io/InstantBooth/
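
The two components named in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. The abstract does not specify the pooling, the placeholder token, or the adapter design, so the mean pooling, the `<v>` placeholder, the single-head attention, and the tanh gate below are all assumptions chosen for illustration, and every function name is hypothetical. The sketch shows (1) a global concept embedding substituted into the prompt embeddings and (2) a residual adapter that attends to image patch features.

```python
import math

# Toy illustration only: names, pooling, placeholder token, and gating are assumptions.

def mean_pool(patch_features):
    """Collapse per-patch feature vectors into one global vector (assumed pooling)."""
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(p[d] for p in patch_features) / n for d in range(dim)]

def inject_concept(tokens, token_embeddings, placeholder, concept_vec):
    """Swap the embedding of the placeholder token for the concept embedding."""
    return [concept_vec if tok == placeholder else emb
            for tok, emb in zip(tokens, token_embeddings)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, patch_features):
    """Single-head dot-product attention of one query over the patch features."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, p)) / scale for p in patch_features]
    weights = softmax(scores)
    dim = len(patch_features[0])
    return [sum(w * p[d] for w, p in zip(weights, patch_features)) for d in range(dim)]

def gated_adapter(hidden_states, patch_features, gate):
    """Residual adapter (assumed form): hidden + tanh(gate) * attention(hidden, patches)."""
    g = math.tanh(gate)
    out = []
    for h in hidden_states:
        ctx = attend(h, patch_features)
        out.append([hi + g * ci for hi, ci in zip(h, ctx)])
    return out

# Toy inputs: three 2-d patch features from a hypothetical image encoder.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
concept = mean_pool(patches)

# Prompt "a photo of <v> dog" with dummy zero embeddings; "<v>" is a hypothetical placeholder.
tokens = ["a", "photo", "of", "<v>", "dog"]
embeddings = [[0.0, 0.0] for _ in tokens]
prompt = inject_concept(tokens, embeddings, "<v>", concept)

# Two hidden states from a hypothetical pre-trained layer.
hidden = [[0.5, -0.5], [1.0, 2.0]]
unchanged = gated_adapter(hidden, patches, gate=0.0)  # gate closed: output equals input
adapted = gated_adapter(hidden, patches, gate=1.0)    # gate open: patch details mixed in
```

With the gate at zero the adapter returns its input unchanged, which is one common way to add new layers without disturbing a pre-trained model at the start of training. Whether InstantBooth initializes its adapters this way is not stated in the abstract.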