Deformer: Denoising Transformer for Improved Audio Music Genre Classification
Citations

WEB OF SCIENCE

8
Citations

SCOPUS

10

초록

Audio music genre classification is performed to categorize audio music into various genres. Traditional approaches based on convolutional recurrent neural networks do not consider long temporal information, and their sequential structures result in longer training times and convergence difficulties. To overcome these problems, a traditional transformer-based approach was introduced. However, this approach employs pre-training based on momentum contrast (MoCo), a technique that increases computational costs owing to its reliance on extracting many negative samples and its use of highly sensitive hyperparameters. Consequently, this complicates the training process and increases the risk of learning imbalances between positive and negative sample sets. In this paper, a method for audio music genre classification called Deformer is proposed. The Deformer learns deep representations of audio music data through a denoising process, eliminating the need for MoCo and additional hyperparameters, thus reducing computational costs. In the denoising process, it employs a prior decoder to reconstruct the audio patches, thereby enhancing the interpretability of the representations. By calculating the mean squared error loss between the reconstructed and real patches, Deformer can learn a more refined representation of the audio data. The performance of the proposed method was experimentally compared with that of two distinct baseline models: one based on S3T and one employing a residual neural network-bidirectional gated recurrent unit (ResNet-BiGRU). The Deformer achieved an 84.5% accuracy, surpassing both the ResNet-BiGRU-based (81%) and S3T-based (81.1%) models, highlighting its superior performance in audio classification.

키워드

music information retrievalgenre classificationpre-trainingtransformeraudio music
제목
Deformer: Denoising Transformer for Improved Audio Music Genre Classification
저자
Wang, JigangLi, ShuyuSung, Yunsick
DOI
10.3390/app132312673
발행일
2023-12
유형
Article
저널명
Applied Sciences
13
23
페이지
1 ~ 15