Detailed Information


Type-based mixture of experts and semi-supervised multi-task pre-training for symbolic music (open access)

Authors
Li, Shuyu; Sung, Yunsick
Issue Date
Nov-2025
Publisher
Elsevier Ltd.
Keywords
Symbolic music; Semi-supervised learning; Multi-task; Mixture of experts; Pre-training; Fine-tuning
Citation
Expert Systems with Applications, v.292, pp. 1-14
Pages
14
Indexed
SCIE
SCOPUS
Journal Title
Expert Systems with Applications
Volume
292
Start Page
1
End Page
14
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/58628
DOI
10.1016/j.eswa.2025.128613
ISSN
0957-4174 (print)
1873-6793 (online)
Abstract
In the rapidly evolving field of AI-driven music applications, there is growing interest in the understanding and generation of symbolic music (e.g., MIDI). Unlike audio waveforms, symbolic music consists of discrete representations of musical elements, making it a detailed yet challenging domain for AI models to process. While pre-training techniques from natural language processing have been adapted for music-related tasks, the resulting pre-trained models often struggle with the hierarchical and polyphonic characteristics of symbolic music. To overcome these problems, a method is proposed comprising two components: a foundational model named the type-based mixture of experts (TypeMoE) and a semi-supervised multi-task pre-training (SS-MTP) strategy. TypeMoE captures fine-grained musical features more effectively by dynamically activating specialized experts for different event types, while SS-MTP covers tasks including key-signature recognition, time-signature recognition, and causal language modeling. Unlike purely self-supervised approaches, SS-MTP utilizes a small amount of labeled data alongside extensive unlabeled data, enabling structural representation learning and promoting efficient knowledge sharing across tasks. Experimental results showed that TypeMoE, when pre-trained with the SS-MTP strategy, outperformed baseline models in both music understanding and generation tasks. Specifically, it achieved 71.80% accuracy in genre classification and 76.79% in emotion classification. For music generation, it outperformed baselines with 54.24% Hits@1 and 0.7521 BLEU-2 in continuation generation, and 75.79% Hits@1 and 0.8757 BLEU-2 in conditional generation. Additionally, it obtained a CLAP-based semantic alignment score of 0.24.
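The abstract's two ideas lend themselves to a compact illustration. Below is a minimal PyTorch sketch of (a) type-based expert routing, where each token is dispatched to the expert matching its event type, and (b) a semi-supervised multi-task loss combining causal language modeling on all sequences with key- and time-signature classification on the labeled subset only. The event-type inventory, layer sizes, routing rule, loss weighting, and every identifier here (EVENT_TYPES, TypeMoELayer, ss_mtp_loss, aux_weight) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two ideas summarized in the abstract above.
# All names and sizes are hypothetical; this record does not specify
# the paper's actual architecture or training configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

EVENT_TYPES = 4  # assumed inventory, e.g., note-on, note-off, time-shift, velocity


class TypeMoELayer(nn.Module):
    """One expert feed-forward network per event type; each token is
    routed to the expert matching its event type."""

    def __init__(self, d_model: int, n_types: int = EVENT_TYPES):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_types)
        )

    def forward(self, x: torch.Tensor, type_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); type_ids: (batch, seq), values in [0, n_types)
        out = torch.zeros_like(x)
        for t, expert in enumerate(self.experts):
            mask = type_ids == t             # tokens of this event type
            if mask.any():
                out[mask] = expert(x[mask])  # only the matching expert is active
        return out


def ss_mtp_loss(lm_logits, lm_targets,
                key_logits=None, key_labels=None,
                ts_logits=None, ts_labels=None,
                aux_weight: float = 0.5):
    """Semi-supervised multi-task objective: causal LM loss on every sequence,
    plus key-/time-signature classification only where labels are available."""
    loss = F.cross_entropy(lm_logits.flatten(0, 1), lm_targets.flatten())
    if key_labels is not None:  # labeled subset only
        loss = loss + aux_weight * F.cross_entropy(key_logits, key_labels)
    if ts_labels is not None:
        loss = loss + aux_weight * F.cross_entropy(ts_logits, ts_labels)
    return loss
```

Note that routing in this sketch is fixed by each token's event type rather than learned by a gating network; the record does not specify the paper's exact router, expert count, or task weights.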
Files in This Item
There are no files associated with this item.
Appears in Collections
ETC > 1. Journal Articles

Related Researcher

Sung, Yunsick
College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)