Multimodal Food Image Classification with Large Language Models
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Jun-Hwa | - |
| dc.contributor.author | Kim, Nam-Ho | - |
| dc.contributor.author | Jo, Donghyeok | - |
| dc.contributor.author | Won, Chee Sun | - |
| dc.date.accessioned | 2024-12-10T00:00:14Z | - |
| dc.date.available | 2024-12-10T00:00:14Z | - |
| dc.date.issued | 2024-11 | - |
| dc.identifier.issn | 2079-9292 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/56353 | - |
| dc.description.abstract | In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model's ability to extract discriminative features beyond what can be achieved with visual features alone. | - |
| dc.format.extent | 10 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | MDPI | - |
| dc.title | Multimodal Food Image Classification with Large Language Models | - |
| dc.type | Article | - |
| dc.publisher.location | Switzerland | - |
| dc.identifier.doi | 10.3390/electronics13224552 | - |
| dc.identifier.scopusid | 2-s2.0-85210254120 | - |
| dc.identifier.wosid | 001364377100001 | - |
| dc.identifier.bibliographicCitation | Electronics, v.13, no.22, pp 1 - 10 | - |
| dc.citation.title | Electronics | - |
| dc.citation.volume | 13 | - |
| dc.citation.number | 22 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 10 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Physics | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.relation.journalWebOfScienceCategory | Physics, Applied | - |
| dc.subject.keywordAuthor | food image classification | - |
| dc.subject.keywordAuthor | fine-grained visual classification | - |
| dc.subject.keywordAuthor | multimodal image feature | - |
| dc.subject.keywordAuthor | large language model | - |
| dc.subject.keywordAuthor | deep learning | - |
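
The abstract says that LLM-generated text features are fused with transformer image features through cross-attention. The sketch below shows one generic form of that fusion, under assumptions the record does not state: image patch tokens serve as queries, encoded text tokens serve as keys and values, a single attention head is used, and the attended text context is added residually to each image token before mean-pooling. The function names (`cross_attention`, `fuse_and_pool`), the toy 2-d features, and the identity projection matrices are hypothetical stand-ins. This is not the authors' implementation. It uses plain Python so that it runs without extra dependencies.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(image_tokens: Matrix, text_tokens: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical single-head scaled dot-product cross-attention.

    Queries come from image tokens (N_img x d); keys and values come from
    text tokens (N_txt x d). Returns (attended values, attention weights).
    """
    q = matmul(image_tokens, w_q)   # N_img x d_k
    k = matmul(text_tokens, w_k)    # N_txt x d_k
    v = matmul(text_tokens, w_v)    # N_txt x d_v
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)  # N_img x N_txt, each row sums to 1
    return matmul(weights, v), weights


def fuse_and_pool(image_tokens: Matrix, text_tokens: Matrix,
                  w_q: Matrix, w_k: Matrix, w_v: Matrix) -> List[float]:
    """Add the attended text context to each image token (residual), then
    mean-pool over tokens to get one vector for a classification head.
    Assumes w_v maps back to the image feature width d."""
    attended, _ = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
    fused = [[x + c for x, c in zip(img, ctx)]
             for img, ctx in zip(image_tokens, attended)]
    n = len(fused)
    return [sum(col) / n for col in zip(*fused)]


if __name__ == "__main__":
    # Toy 2-d features; identity projections stand in for learned weights.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    image = [[1.0, 0.0], [0.0, 1.0]]             # two image patch tokens
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three text tokens
    print(fuse_and_pool(image, text, identity, identity, identity))
```

Using image tokens as queries keeps one output per image token, so the fused representation can feed the same pooling and classification head a visual-only model would use. Text-as-query or bidirectional cross-attention are equally plausible variants, and this record does not say which direction the authors chose.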
