Multimodal Food Image Classification with Large Language Models

Kim, Jun-Hwa; Kim, Nam-Ho; Jo, Donghyeok; Won, Chee Sun

Cited 1 time in Web of Science

Cited 2 times in Scopus

Access: Open access

Issue Date: Nov-2024

Publisher: MDPI

Keywords: food image classification; fine-grained visual classification; multimodal image feature; large language model; deep learning

Citation: Electronics, v.13, no.22, pp 1 - 10

Pages: 10

Indexed: SCIE; SCOPUS

Journal Title: Electronics

Volume: 13

Number: 22

Start Page: 1

End Page: 10

URI: https://scholarworks.dongguk.edu/handle/sw.dongguk/56353

DOI: 10.3390/electronics13224552

ISSN: 2079-9292

Abstract: In this study, we apply recent advances in large language models (LLMs) to fine-grained food image classification. To do so, we integrate textual features, which an LLM extracts from the images, into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture. Our approach employs a cross-attention mechanism to fuse the visual and textual modalities, enabling the model to extract more discriminative features than visual features alone provide.
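The cross-attention fusion described in the abstract can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention in plain Python. It assumes that image tokens act as queries and the encoded LLM descriptions act as keys and values, omits learned query/key/value projections, and adds a residual connection followed by mean pooling. These choices, the toy vectors, and the function names `cross_attention` and `fuse_tokens` are illustrative assumptions, not the authors' architecture.

```python
import math
from typing import List

# Row-major matrix: one inner list per token embedding.
Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: each query attends over all keys."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out_dim = len(values[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)])
    return output


def fuse_tokens(image_tokens: Matrix, text_tokens: Matrix) -> List[float]:
    """Hypothetical fusion: image tokens query text tokens, add a residual, then mean-pool.

    Assumes text and image embeddings share the same dimension so the residual add is valid.
    """
    attended = cross_attention(image_tokens, text_tokens, text_tokens)
    fused = [[x + a for x, a in zip(img, att)] for img, att in zip(image_tokens, attended)]
    n = len(fused)
    # Mean over tokens yields one vector that a classifier head could consume.
    return [sum(tok[j] for tok in fused) / n for j in range(len(fused[0]))]


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-ins for transformer image features
    text_tokens = [[1.0, 0.0]]               # toy stand-in for an encoded LLM description
    print(fuse_tokens(image_tokens, text_tokens))  # [1.5, 0.5]
```

Swapping the roles (text tokens as queries, image tokens as keys and values), adding learned projections, or using multiple heads are equally plausible variants; the abstract does not state which configuration the authors use.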

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Engineering > Department of Electronics and Electrical Engineering > 1. Journal Articles
