Detailed Information


Multimodal Food Image Classification with Large Language Models (open access)

Authors
Kim, Jun-Hwa; Kim, Nam-Ho; Jo, Donghyeok; Won, Chee Sun
Issue Date
Nov-2024
Publisher
MDPI
Keywords
food image classification; fine-grained visual classification; multimodal image feature; large language model; deep learning
Citation
Electronics, v.13, no.22, pp 1 - 10
Pages
10
Indexed
SCIE
SCOPUS
Journal Title
Electronics
Volume
13
Number
22
Start Page
1
End Page
10
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/56353
DOI
10.3390/electronics13224552
ISSN
2079-9292
Abstract
In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model's ability to extract discriminative features beyond what can be achieved with visual features alone.
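The fusion step described in the abstract can be illustrated with a minimal sketch of single-head cross-attention in plain Python. This is not the authors' implementation. The abstract does not say which modality supplies the queries, how many heads are used, or how the fused features are pooled, so the choices below (image tokens as queries, text tokens as keys and values, a residual sum, random untrained weights, and every function name) are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or embeddings (hypothetical, untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention in which image tokens query text tokens.

    Assumption: queries come from the image, keys and values from the text.
    Returns the fused tokens and the attention weights (one row per image token).
    """
    q = matmul(image_tokens, w_q)            # (n_img, d)
    k = matmul(text_tokens, w_k)             # (n_txt, d)
    v = matmul(text_tokens, w_v)             # (n_txt, d)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]     # (d, n_txt)
    scores = matmul(q, k_t)                  # (n_img, n_txt)
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)             # text context per image token
    # Residual sum keeps the visual features and adds the attended text context.
    fused = [
        [x + c for x, c in zip(img_row, ctx_row)]
        for img_row, ctx_row in zip(image_tokens, context)
    ]
    return fused, weights


rng = random.Random(0)
dim = 4
image_tokens = random_matrix(3, dim, rng)    # e.g. 3 image patch embeddings
text_tokens = random_matrix(5, dim, rng)     # e.g. 5 description token embeddings
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

fused, weights = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
# Mean-pool the fused tokens into one vector that a classifier head could consume.
pooled = [sum(col) / len(fused) for col in zip(*fused)]
```

Each row of `weights` is a probability distribution over the text tokens, so every image token receives a weighted mixture of the textual description. A real system would use trained projections and typically several heads.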
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Engineering > Department of Electronics and Electrical Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
