Detailed Information


Pretrained patient trajectories for adverse drug event prediction using common data model-based electronic health records
Open access

Authors
Kim, Junmo; Kim, Joo Seong; Lee, Ji-Hyang; Kim, Min-Gyu; Kim, Taehyun; Cho, Chaeeun; Park, Rae Woong; Kim, Kwangsoo
Issue Date
Jun-2025
Publisher
Springer Nature
Citation
Communications Medicine, v.5, no.1
Indexed
SCOPUS
ESCI
Journal Title
Communications Medicine
Volume
5
Number
1
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/58584
DOI
10.1038/s43856-025-00914-7
ISSN
2730-664X
Abstract
Background: Pretraining electronic health record (EHR) data using language models has enhanced performance across various medical tasks. Despite the potential of EHR pretraining models, predicting adverse drug events (ADEs) using EHR pretraining models has not been explored.

Methods: We used Observational Medical Outcomes Partnership (OMOP) common data model (CDM)-based EHR data from Seoul National University Hospital (SNUH) between January 2001 and December 2023 and Ajou University Medical Center (AUMC) between January 2004 and December 2023. In total, 510,879 adult inpatients from SNUH and 419,505 from AUMC were included in the internal and external datasets, respectively. For pretraining, the model was trained to infer randomly masked tokens using the preceding and following history. In this process, we introduced domain embedding (DE) to provide information about the domain of masked tokens, preventing the model from selecting codes from irrelevant domains. For qualitative analysis, we identified important features using the attention matrix from each finetuned model.

Results: Here we show that EHR pretraining models with DE outperform the models without pretraining and DE in predicting various ADEs, with an average area under the receiver operating characteristic curve (AUROC) of 0.958 and 0.964 in internal and external validations, respectively. For feature importance analysis, we demonstrate that the results are consistent with previously reported clinical knowledge. In addition to cohort-level interpretation, patient-level interpretation is also available.

Conclusions: The CDM-based EHR pretraining model with DE can improve prediction performance for various ADEs and can provide appropriate explanations at the cohort and patient levels. Our model has the potential to serve as a foundation model due to its strong prediction performance, interpretability, and compatibility.
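
The masked-token pretraining step with domain information described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_tokens`, the `[MASK]` placeholder, the 0.15 default masking probability, and the example codes and domain names are all assumptions made for illustration. The only idea taken from the abstract is that a masked position hides its code but still exposes its domain, so a model can restrict its guess to that domain.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_tokens(codes, domains, mask_prob=0.15, seed=0):
    """Randomly mask codes in a patient trajectory while keeping domains visible.

    codes:   list of medical codes in chronological order (illustrative strings)
    domains: list of domain names aligned with `codes` (e.g. "drug", "condition")
    Returns (masked_codes, domains, labels), where labels[i] holds the original
    code at masked positions and None elsewhere.
    """
    if len(codes) != len(domains):
        raise ValueError("codes and domains must have the same length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    masked, labels = [], []
    for code in codes:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the code itself
            labels.append(code)   # the model would be trained to recover this
        else:
            masked.append(code)
            labels.append(None)   # no prediction target at unmasked positions
    # Domains are returned unchanged: even at masked positions the model
    # still sees which domain the hidden code belongs to.
    return masked, list(domains), labels


if __name__ == "__main__":
    # Illustrative trajectory; the codes are made up for this example.
    codes = ["cond_A", "drug_B", "meas_C", "drug_D"]
    domains = ["condition", "drug", "measurement", "drug"]
    for row in zip(*mask_tokens(codes, domains, mask_prob=0.5)):
        print(row)
```

In a full model, the domain at each position would typically be mapped to a learned embedding and combined with the token embedding before the encoder; how the paper combines them is not stated in the abstract, so that step is left out here.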
Appears in
Collections
Graduate School > Department of Medicine > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
