Audio-only Bird Species Automated Identification Method with Limited Training Data Based on Multi-Channel Deep Convolutional Neural Networks
Authors
XIE Jiang-jian; DING Chang-qing; LI Wen-bin; CAI Cheng-hao (1 School of Technology, Beijing Forestry University, Beijing, 100083, P. R. China. 2 School of Nature Conservation, Beijing Forestry University, Beijing, 100083, P. R. China.)
Abstract
1. Deep convolutional neural networks (DCNN) have achieved breakthrough performance on bird species identification tasks based on spectrogram features, but training a strong DCNN model requires a large number of labeled samples. For certain bird species, however, it is difficult to collect enough samples. For practical bird species identification, it is therefore important to study a method that requires only a small sample set.
2. Transfer learning is one way to train deep learning models with limited samples. Based on the parameter transfer mode, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, followed by a classifier consisting of two fully-connected hidden layers and a Softmax layer. Taking the vocalization signals of eighteen bird species recorded in Beijing Song-Shan National Nature Reserve as an example, we compare the performance of the transfer learning model with that of the original VGG16 model. The results show that the former has higher training efficiency but lower mean average precision (MAP).
3. To improve the MAP of the transfer learning model, we investigate two fusion modes to form multi-channel identification models, and evaluate the models on our own sample sets. We find that the result fusion mode outperforms the feature fusion mode, and the best MAP reaches 0.9998. The number of model parameters is 13110, which is only 0.0082% of the VGG16 model. The required sample size is also reduced.
4. The type and duration of the spectrogram may affect identification performance.
We choose three time-frequency transformation methods, Short Time Fourier Transform, Mel-frequency Cepstrum Transform and Chirplet Transform, to calculate the spectrogram. The spectrogram is then segmented into durations of 100ms, 300ms and 500ms. The Chirplet spectrogram improves both training efficiency and the MAP, which makes it more suitable for the feature representation of bird vocalization. Among the three durations compared, 300ms performs best. The duration should be determined based on the time-domain characteristics of bird vocalization.
Tweetable Abstract: Based on transfer learning, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, followed by a classifier consisting of two fully-connected hidden layers and a Softmax layer. We compare the performance of the proposed model with that of the original VGG16 model. The results show that the former has higher training efficiency but lower mean average precision (MAP). To improve the MAP of the proposed model, we use the result fusion mode to form a multi-channel identification model; the best MAP reaches 0.9998. The number of model parameters is 13110, which is only 0.0082% of the VGG16 model. The required sample size is also reduced.
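The difference between the two fusion modes named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the fusion is computed, so the code below assumes the common interpretations: feature fusion concatenates per-channel feature vectors before a single classifier, and result fusion averages the class probabilities produced by separate per-channel classifiers. The function names, the three-channel setup, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_fusion(channel_features):
    """Assumed feature fusion: concatenate the feature vectors of all
    channels into one vector that a single classifier would consume."""
    fused = []
    for features in channel_features:
        fused.extend(features)
    return fused


def result_fusion(channel_probs):
    """Assumed result fusion: average the class probabilities predicted
    independently by each channel's classifier."""
    n_channels = len(channel_probs)
    n_classes = len(channel_probs[0])
    return [
        sum(probs[c] for probs in channel_probs) / n_channels
        for c in range(n_classes)
    ]


# Hypothetical logits for 3 classes from 3 channels (for example, one
# channel per spectrogram type); the values are made up for illustration.
channel_logits = [
    [2.0, 0.5, 0.1],
    [0.3, 1.8, 0.2],
    [1.5, 1.4, 0.0],
]
channel_probs = [softmax(logits) for logits in channel_logits]
fused_probs = result_fusion(channel_probs)

# The predicted class is the index with the highest fused probability.
predicted = max(range(len(fused_probs)), key=fused_probs.__getitem__)
```

In this sketch the second channel on its own would pick class 1, while the averaged probabilities pick class 0, which shows how result fusion lets the channels outvote a single disagreeing one. Other fusion rules, such as weighted averaging or majority voting, would fit the same interface.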
Similar resources
Recognizing Bird Species in Audio Recordings using Deep Convolutional Neural Networks
This paper summarizes a method for purely audio-based bird species recognition through the application of convolutional neural networks. The approach is evaluated in the context of the LifeCLEF 2016 bird identification task, an open challenge conducted on a dataset containing 34 128 audio recordings representing 999 bird species from South America. Three different network architectures and a sim...
Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...
Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment
This paper describes a convolutional neural network based deep learning approach for bird song classification that was used in an audio record-based bird identification challenge, called BirdCLEF 2016. The training and test set contained about 24k and 8.5k recordings, belonging to 999 bird species. The recorded waveforms were very diverse in terms of length and content. We converted the wavefor...
Large-Scale Bird Sound Classification using Convolutional Neural Networks
Identifying bird species in audio recordings is a challenging field of research. In this paper, we summarize a method for large-scale bird sound classification in the context of the LifeCLEF 2017 bird identification task. We used a variety of convolutional neural networks to generate features extracted from visual representations of field recordings. The BirdCLEF 2017 training dataset consist o...
A New Method to Improve Automated Classification of Heart Sound Signals: Filter Bank Learning in Convolutional Neural Networks
Introduction: Recent studies have acknowledged the potential of convolutional neural networks (CNNs) in distinguishing healthy and morbid samples by using heart sound analyses. Unfortunately the performance of CNNs is highly dependent on the filtering procedure which is applied to signal in their convolutional layer. The present study aimed to address this problem by a...
Publication date: 2018