Augmentations to improve rare bird call classification for a highly imbalanced multi-label soundscape environment

Published:

Automated bioacoustic monitoring with machine learning (ML) can give ecologists and wildlife conservationists the means to better understand ecological patterns and species-specific vocal behavioral responses to natural and anthropogenic events. Deep learning has driven several innovations in automated recognition, with applications ranging from regionally rare birds to livestock, amphibians, aquatic mammals and bats. However, there remains considerable scope to improve the automated classification of multiple avian species in soundscape recordings whose annotations do not distinguish foreground from background species. In this study, we present a deep learning solution for classifying multiple bird vocalizations in such a multi-label, multi-species soundscape environment. Specifically, we test the effectiveness of various data augmentation methods for improving the classification of rare bird calls in the face of key challenges typical of soundscape datasets: multiple overlapping bird calls, high environmental noise and high class imbalance. Our training data, collected from the Western Ghats and central India, comprise over 80 hours of annotated 10-second audio recordings spanning 139 bird species. We train our model by fine-tuning a deep neural network, which allows us to maximize performance while reducing the training data and time required. We employ raw-audio and spectrogram-based data augmentation methods such as pitch and time shifting, frequency and time masking, additive white noise, time stretching and audio mixing. Further, we modify an existing augmentation technique for mixing bird sounds to meet the requirements of our multi-label classification objective. We monitor results separately for all 139 avian classes, for 40 custom-defined rare classes, and for the top 10 and bottom 10 individual bird species in each landscape.
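To make the augmentation families named above concrete, the following is a minimal NumPy sketch of four of them: additive white noise, time shifting, frequency and time masking, and audio mixing. It is an illustration under stated assumptions, not the implementation used in this study. All function names, parameter ranges and the choice to take the union of the two label vectors when mixing are hypothetical; the abstract does not specify how the mixing technique was modified for multi-label targets.

```python
import numpy as np


def add_white_noise(wave, snr_db, rng):
    """Add Gaussian white noise at a target signal-to-noise ratio in dB."""
    signal_power = np.mean(wave ** 2) + 1e-12
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def time_shift(wave, max_fraction, rng):
    """Circularly shift the clip by a random offset of up to max_fraction of its length."""
    limit = int(max_fraction * len(wave))
    shift = int(rng.integers(-limit, limit + 1))
    return np.roll(wave, shift)


def mask_spectrogram(spec, max_freq_bins, max_time_frames, rng):
    """Zero one random frequency band and one random time span of a (freq, time) array."""
    out = spec.copy()
    n_freq, n_time = out.shape
    f = int(rng.integers(0, max_freq_bins + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_time_frames + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


def mix_clips(wave_a, labels_a, wave_b, labels_b, rng):
    """Mix two equal-length clips with a random weight.

    Assumption: the multi-label target is the element-wise union of both
    multi-hot label vectors, since every species in either clip is audible.
    """
    weight = rng.uniform(0.3, 0.7)  # hypothetical mixing range
    mixed = weight * wave_a + (1.0 - weight) * wave_b
    labels = np.maximum(labels_a, labels_b)
    return mixed, labels
```

Passing an explicit `np.random.default_rng(seed)` generator to each function keeps the augmentations reproducible across training runs.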
Using the F1 score to evaluate model performance, we find that the chosen data augmentations yield improvements of 8% for all target labels and 25% for rare labels over the baseline results. We also observe notable gains for many rare species with fewer than 100 training samples, such as Great Hornbill (15%), Nilgiri Flycatcher (35%) and Flame-throated Bulbul (27%), among several others. Our multi-label soundscape dataset and augmentation methods serve as a benchmark for future research and can be readily adopted across acoustic domains.
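As an illustration of how F1 can be reported separately for all labels and for a rare-label subset, here is a minimal sketch for multi-hot prediction matrices. The averaging scheme is an assumption: the abstract does not say whether scores are macro-averaged, micro-averaged or weighted, so this sketch uses an unweighted (macro) mean over classes. The function names and the `class_ids` argument are hypothetical.

```python
import numpy as np


def per_class_f1(y_true, y_pred):
    """Per-class F1 for binary indicator matrices of shape (n_clips, n_classes)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)
    denom = 2 * tp + fp + fn
    # Classes with no positives and no predictions are scored 0 here (a convention).
    return np.divide(2 * tp, denom, out=np.zeros(len(denom)), where=denom > 0)


def macro_f1(y_true, y_pred, class_ids=None):
    """Unweighted mean F1 over all classes, or over a subset such as rare classes."""
    scores = per_class_f1(y_true, y_pred)
    if class_ids is not None:
        scores = scores[np.asarray(class_ids)]
    return float(np.mean(scores))


# Toy example: 4 clips, 3 classes; classes 0 and 2 play the role of rare labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
all_f1 = macro_f1(y_true, y_pred)
rare_f1 = macro_f1(y_true, y_pred, class_ids=[0, 2])
```

Computing the rare-label score from the same per-class vector as the overall score keeps the two figures directly comparable.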

This study is still in progress, and the code and dataset have not yet been made publicly available.