Detailed Information

Cited 8 times in Web of Science · Cited 12 times in Scopus

Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning

Full metadata record
DC Field: Value
dc.contributor.author: Lee, Hojun
dc.contributor.author: Cho, Hyunjun
dc.contributor.author: Park, Jieun
dc.contributor.author: Chae, Jinyeong
dc.contributor.author: Kim, Jihie
dc.date.accessioned: 2023-04-27T13:40:40Z
dc.date.available: 2023-04-27T13:40:40Z
dc.date.issued: 2022-02
dc.identifier.issn: 1424-8220
dc.identifier.uri: https://scholarworks.dongguk.edu/handle/sw.dongguk/3672
dc.description.abstract: Transformer-based approaches have shown good results in image captioning tasks. However, current approaches are limited in generating text from the global features of an entire image. We therefore propose two methods for generating better image captions: (1) the Global-Local Visual Extractor (GLVE), which captures both global and local features, and (2) the Cross Encoder-Decoder Transformer (CEDT), which injects multiple levels of encoder features into the decoding process. GLVE extracts not only global visual features obtainable from the entire image, such as organ size or bone structure, but also local visual features from a local region, such as a lesion area. Given an image, CEDT can create a detailed description of the overall features by injecting both low-level and high-level encoder outputs into the decoder. Each method contributes to the performance improvement and generates descriptions of features such as organ size and bone structure. The proposed model was evaluated on the IU X-ray dataset and outperformed the transformer-based baseline by 5.6% in BLEU score, 0.56% in METEOR, and 1.98% in ROUGE-L.
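The core idea in the abstract — a decoder that cross-attends to both low-level and high-level encoder outputs rather than only the final encoder layer — can be illustrated with a minimal sketch. This is an illustrative assumption of the mechanism, not the authors' implementation; all shapes, names, and the single-head attention are simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # feature dimension (illustrative)
n_patches = 49  # visual tokens from the visual extractor (assumed)
n_words = 5     # decoder (caption) positions

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

# Stand-ins for a shallow and a deep encoder layer's outputs.
low_level = rng.standard_normal((n_patches, d))
high_level = rng.standard_normal((n_patches, d))

# Cross Encoder-Decoder idea: the decoder's cross-attention memory
# combines multiple encoder levels instead of only the last one.
memory = np.concatenate([low_level, high_level], axis=0)  # (2*n_patches, d)

queries = rng.standard_normal((n_words, d))  # decoder hidden states
context = attention(queries, memory, memory)
print(context.shape)  # (5, 16)
```

Each caption position thus attends over both shallow features (local detail such as a lesion area) and deep features (global structure such as organ size), matching the multi-level injection the abstract describes.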
dc.format.extent: 13
dc.language: English
dc.language.iso: ENG
dc.publisher: MDPI
dc.title: Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning
dc.type: Article
dc.publisher.location: Switzerland
dc.identifier.doi: 10.3390/s22041429
dc.identifier.scopusid: 2-s2.0-85124348434
dc.identifier.wosid: 000765175400001
dc.identifier.bibliographicCitation: Sensors, v.22, no.4, pp. 1-13
dc.citation.title: Sensors
dc.citation.volume: 22
dc.citation.number: 4
dc.citation.startPage: 1
dc.citation.endPage: 13
dc.type.docType: Article
dc.description.isOpenAccess: Y
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Chemistry
dc.relation.journalResearchArea: Engineering
dc.relation.journalResearchArea: Instruments & Instrumentation
dc.relation.journalWebOfScienceCategory: Chemistry, Analytical
dc.relation.journalWebOfScienceCategory: Engineering, Electrical & Electronic
dc.relation.journalWebOfScienceCategory: Instruments & Instrumentation
dc.subject.keywordAuthor: medical image captioning
dc.subject.keywordAuthor: deep learning
dc.subject.keywordAuthor: transformer
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Ji Hie
College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)
