Singing Voice Detection Based on a Deeper Convolutional Neural Network

Abstract

Singing voice detection is a fundamental task in music information retrieval that benefits other tasks such as singing voice separation. We propose a new algorithm based on a deeper convolutional neural network, fed with a logarithmic mel-scaled spectrogram, that extracts and integrates features from the different layers of the network and finally discriminates the singing voice. We demonstrate that this deeper network can achieve good performance and, to some extent, be designed efficiently. The experiments use the public datasets Jamendo, Mir1k, and RWC pop, as well as their combination. We also study which network depth is suitable for this task. The experiments show that the optimal depth on the four public datasets is 152.
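
As an illustration of the input representation named above, the following is a minimal sketch of how a logarithmic mel-scaled spectrogram can be computed with NumPy. The abstract does not give the sampling rate, frame size, hop size, or number of mel bands, so the values below (16 kHz, 1024, 512, 80) and all function names are assumptions chosen for illustration, not the authors' settings.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style conversion from Hz to the mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=1024, hop=512, n_mels=80):
    # Assumed parameters; the abstract does not specify them.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return np.log(mel + 1e-10)                            # small offset avoids log(0)

# One second of a 440 Hz tone as a stand-in for real audio.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t))
```

The resulting array has one row per time frame and one column per mel band. In a setup like the one described, excerpts of such frames would be passed to the convolutional network for frame-level vocal/non-vocal classification.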

Publication
Proceedings of the 3rd International Symposium on Automation, Information and Computing (ISAIC)