3D-DCDAE: Unsupervised Music Latent Representations Learning Method Based on a Deep 3D Convolutional Denoising Autoencoder for Music Genre Classification

Qiu, Lvyang; Li, Shuyu; Sung, Yunsick

Detailed Information

Cited 12 time in webofscience

Cited 16 time in scopus

Metadata Downloads

3D-DCDAE: Unsupervised Music Latent Representations Learning Method Based on a Deep 3D Convolutional Denoising Autoencoder for Music Genre Classificationopen access

Authors: Qiu, Lvyang; Li, Shuyu; Sung, Yunsick

Issue Date: Sep-2021

Publisher: MDPI

Keywords: music genre classification; MIDI; autoencoder model; 3D CNN; unsupervised learning

Citation: MATHEMATICS, v.9, no.18

Indexed: SCIE
SCOPUS

Journal Title: MATHEMATICS

Volume: 9

Number: 18

URI: https://scholarworks.dongguk.edu/handle/sw.dongguk/4542

DOI: 10.3390/math9182274

ISSN: 2227-7390
2227-7390

Abstract: With unlabeled music data widely available, it is necessary to build an unsupervised latent music representation extractor to improve the performance of classification models. This paper proposes an unsupervised latent music representation learning method based on a deep 3D convolutional denoising autoencoder (3D-DCDAE) for music genre classification, which aims to learn common representations from a large amount of unlabeled data to improve the performance of music genre classification. Specifically, unlabeled MIDI files are applied to 3D-DCDAE to extract latent representations by denoising and reconstructing input data. Next, a decoder is utilized to assist the 3D-DCDAE in training. After 3D-DCDAE training, the decoder is replaced by a multilayer perceptron (MLP) classifier for music genre classification. Through the unsupervised latent representations learning method, unlabeled data can be applied to classification tasks so that the problem of limiting classification performance due to insufficient labeled data can be solved. In addition, the unsupervised 3D-DCDAE can consider the musicological structure to expand the understanding of the music field and improve performance in music genre classification. In the experiments, which utilized the Lakh MIDI dataset, a large amount of unlabeled data was utilized to train the 3D-DCDAE, obtaining a denoising and reconstruction accuracy of approximately 98%. A small amount of labeled data was utilized for training a classification model consisting of the trained 3D-DCDAE and the MLP classifier, which achieved a classification accuracy of approximately 88%. The experimental results show that the model achieves state-of-the-art performance and significantly outperforms other methods for music genre classification with only a small amount of labeled data.

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles

Show full item record

qrcode

Related Researcher

Researcher Sung, Yunsick photo

Sung, Yunsick: College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)

Read more

Altmetrics

Total Views & Downloads

RSS_1.0 RSS_2.0 ATOM_1.0

30, Pildong-ro 1-gil, Jung-gu, Seoul, 04620, Republic of Korea+82-2-2260-3114

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.

Detailed Information

Related Researcher

Altmetrics

Total Views & Downloads

BROWSE