Multimodal Food Image Classification with Large Language Models

Kim, Jun-Hwa; Kim, Nam-Ho; Jo, Donghyeok; Won, Chee Sun

Cited 1 time in Web of Science

Cited 2 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Jun-Hwa	-
dc.contributor.author	Kim, Nam-Ho	-
dc.contributor.author	Jo, Donghyeok	-
dc.contributor.author	Won, Chee Sun	-
dc.date.accessioned	2024-12-10T00:00:14Z	-
dc.date.available	2024-12-10T00:00:14Z	-
dc.date.issued	2024-11	-
dc.identifier.issn	2079-9292	-
dc.identifier.uri	https://scholarworks.dongguk.edu/handle/sw.dongguk/56353	-
dc.description.abstract	In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model's ability to extract discriminative features beyond what can be achieved with visual features alone.	-
dc.format.extent	10	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	MDPI	-
dc.title	Multimodal Food Image Classification with Large Language Models	-
dc.type	Article	-
dc.publisher.location	Switzerland	-
dc.identifier.doi	10.3390/electronics13224552	-
dc.identifier.scopusid	2-s2.0-85210254120	-
dc.identifier.wosid	001364377100001	-
dc.identifier.bibliographicCitation	Electronics, v.13, no.22, pp 1 - 10	-
dc.citation.title	Electronics	-
dc.citation.volume	13	-
dc.citation.number	22	-
dc.citation.startPage	1	-
dc.citation.endPage	10	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Physics	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Physics, Applied	-
dc.subject.keywordAuthor	food image classification	-
dc.subject.keywordAuthor	fine-grained visual classification	-
dc.subject.keywordAuthor	multimodal image feature	-
dc.subject.keywordAuthor	large language model	-
dc.subject.keywordAuthor	deep learning	-
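
The abstract describes fusing visual and textual features with a cross-attention mechanism. The sketch below illustrates the general idea only, not the authors' architecture: image tokens act as queries and attend over text tokens, and the resulting context is added back to the image tokens. The function names (`cross_attention`, `fuse`), the toy vectors, the residual addition, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Each image token (query) attends over all text tokens (keys and values).

    Assumption: learned query/key/value projections are replaced by the
    identity to keep the sketch short; a trained model would learn them.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in image_tokens:
        # Scaled dot-product similarity between this image token and each text token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in text_tokens]
        weights = softmax(scores)
        # Weighted sum of the text tokens gives the text-derived context vector.
        context = [
            sum(w * value[d] for w, value in zip(weights, text_tokens))
            for d in range(dim)
        ]
        attended.append(context)
    return attended


def fuse(image_tokens, text_tokens):
    """Residual fusion (an assumption): add the text context to each image token."""
    attended = cross_attention(image_tokens, text_tokens)
    return [[i + a for i, a in zip(img, ctx)] for img, ctx in zip(image_tokens, attended)]


# Hypothetical toy features: 2 image tokens and 3 text tokens, each of dimension 4.
image_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
fused = fuse(image_tokens, text_tokens)
```

In this toy setup each image token places the most attention weight on the text token it is most similar to, so the fused vector is reinforced along the shared dimension. A classifier head would then operate on the fused tokens instead of the visual features alone.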

Appears in Collections: College of Engineering > Department of Electronics and Electrical Engineering > 1. Journal Articles
