Realistic Image Generation from Text by Using BERT-Based Embedding (open access)
- Authors
- Na, Sanghyuck; Do, Mirae; Yu, Kyeonah; Kim, Juntae
- Issue Date
- Mar-2022
- Publisher
- MDPI
- Keywords
- text to image generation; multimodal data; BERT; GAN
- Citation
- Electronics, v.11, no.5, pp 1 - 11
- Pages
- 11
- Indexed
- SCIE
- SCOPUS
- Journal Title
- Electronics
- Volume
- 11
- Number
- 5
- Start Page
- 1
- End Page
- 11
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/3543
- DOI
- 10.3390/electronics11050764
- ISSN
- 2079-9292
- Abstract
- Multimodal learning has recently received considerable attention in artificial intelligence because of its expected performance gains and potential applications. Text-to-image generation, one such multimodal task, is a challenging topic in computer vision and natural language processing. Text-to-image generation models based on generative adversarial networks (GANs) use a text encoder pre-trained on image-text pairs. However, such encoders cannot obtain rich information about text not seen during pre-training, so it is hard to generate an image that semantically matches a given text description. In this paper, we propose a new text-to-image generation model that uses pre-trained BERT, which is widely used in natural language processing. The pre-trained BERT is fine-tuned on a large amount of text and used as the text encoder, so that it captures rich information about the text and is suitable for the image generation task. Through experiments on a multimodal benchmark dataset, we show that the proposed method improves on the baseline model both quantitatively and qualitatively.
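The abstract does not say how the BERT output is turned into a conditioning signal for the generator. As a rough illustration only, the sketch below assumes two common choices that are not confirmed by this record: masked mean pooling of per-token vectors into one sentence embedding, and concatenation of that embedding with a noise vector to form the generator input. The function names `masked_mean_pool` and `generator_input` are hypothetical, and the token vectors are toy numbers standing in for real encoder output.

```python
import random


def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


def generator_input(sentence_embedding, noise_dim, rng):
    """Concatenate Gaussian noise with the sentence embedding (assumed scheme)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + list(sentence_embedding)


# Toy stand-in for text-encoder output: 3 real tokens and 1 padding token,
# each a 4-dimensional vector. A real BERT encoder would produce much
# larger vectors; these values are made up for illustration.
token_vectors = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 1.0, 2.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, excluded by the mask
]
attention_mask = [1, 1, 1, 0]

sentence_embedding = masked_mean_pool(token_vectors, attention_mask)
z = generator_input(sentence_embedding, noise_dim=8, rng=random.Random(0))
```

Here `z` has 12 entries: 8 noise values followed by the 4-dimensional sentence embedding. In a conditional GAN this vector would be fed to the generator, so the same description can yield different images as the noise changes.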
- Appears in Collections
- College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles