Portfolio
ChirpNet - A Neural Bird Sound Synthesis Model
Evanston, 2022
Automated species recognition through Artificial Intelligence (AI) can give ecologists and wildlife conservationists the means to detect and monitor vocalizing avian species of interest. However, AI systems rely on large amounts of data to learn inductive biases and generalize to the vocalizations of bird species. This is problematic for rare bird species, which have limited presence in naturally occurring audio data. In this project, we address this problem by generating synthetic avian audio samples as a form of augmentation that future work can use to improve bird sound classification accuracy.
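As a rough sketch of the augmentation idea only: the snippet below overlays a synthesized call onto a real background recording at a chosen signal-to-noise ratio. The file names are placeholders and the mixing routine is an illustration, not this project's actual pipeline (mono audio is assumed).

```python
import numpy as np
import soundfile as sf  # assumed available for reading/writing WAV files

def mix_at_snr(clean: np.ndarray, background: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay a synthetic call onto a background clip at the requested SNR."""
    # Trim or tile the background so both signals have equal length.
    if len(background) < len(clean):
        background = np.tile(background, int(np.ceil(len(clean) / len(background))))
    background = background[: len(clean)]

    # Scale the background so that 10*log10(P_call / P_background) == snr_db.
    p_clean = np.mean(clean ** 2) + 1e-12
    p_bg = np.mean(background ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_bg * 10 ** (snr_db / 10)))
    mixed = clean + scale * background

    # Normalize to avoid clipping when written back to disk.
    return mixed / max(1.0, np.max(np.abs(mixed)))

# Hypothetical usage: 'synthetic_call.wav' stands in for output of the synthesis model.
call, sr = sf.read("synthetic_call.wav")
bg, _ = sf.read("soundscape_background.wav")
sf.write("augmented_sample.wav", mix_at_snr(call, bg, snr_db=10.0), sr)
```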
Click here to read more about this work and listen to the generated samples.
Bird Sound Denoiser
Evanston, 2022
In this project, I applied Facebook's Demucs, a state-of-the-art audio denoiser, to remove noisy environmental background sounds from a bird sound dataset. These background sounds (insects such as crickets, as well as rain, wind, machinery, and vehicles) had been synthetically added to a clean set, and removing them was framed as a source separation task.
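For reference, this is roughly how a pretrained model from the facebookresearch/denoiser package (the Demucs-based enhancer) can be applied to a single clip. Treat it as a sketch of the workflow rather than the exact pipeline used here; the file names are placeholders.

```python
import torch
import torchaudio
from denoiser import pretrained            # facebookresearch/denoiser package
from denoiser.dsp import convert_audio

# Load a pretrained Demucs-based denoiser (dns64 weights).
model = pretrained.dns64().eval()

# Read a noisy bird recording; the path is a placeholder.
wav, sr = torchaudio.load("noisy_bird_clip.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.chin)

# Run the model and drop the batch dimension.
with torch.no_grad():
    denoised = model(wav.unsqueeze(0)).squeeze(0)

torchaudio.save("denoised_bird_clip.wav", denoised.cpu(), model.sample_rate)
```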
Click here to read more about this work and listen to the denoised samples.
Audioneme - Detecting Speech Disorder in Children
Evanston, 2022
There has been growing interest in automated methods that quantify speech patterns for diagnosing speech disorders in children. In this project, I fine-tuned Facebook's Wav2Vec2 on child speech data, in conjunction with the utterance transcriptions, to automate the screening and assessment of speech disorders and speech intelligibility in children. The dataset consisted of roughly 15,000 weakly labeled utterances from children with and without speech disorders.
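A simplified sketch of the fine-tuning setup, using the Hugging Face transformers sequence-classification head on Wav2Vec2. The checkpoint name, binary label encoding, and dummy waveform are assumptions for illustration, and the transcription branch is omitted.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Pretrained Wav2Vec2 encoder with a clip-level classification head.
checkpoint = "facebook/wav2vec2-base"       # assumed base checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2ForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# One weakly labeled utterance: a raw 16 kHz waveform plus a clip-level label.
waveform = torch.randn(16000 * 3)           # placeholder 3-second clip
label = torch.tensor([1])                   # 1 = disordered speech (assumed encoding)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
outputs = model(**inputs, labels=label)

# The loss backpropagates through the classification head and the encoder.
outputs.loss.backward()
```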
Click here to read more about this work.
Efficient Scaling and Pre-Training of Language Models with Electra and Reformers
Evanston, 2021
In this paper, we repurpose a highly efficient Reformer encoder architecture to serve as the foundational block for the Electra pre-training methodology, allowing the network to scale to 8 times the size of its Transformer counterpart while maintaining the same memory requirements. The downstream performance of this scaled-up architecture is on par with the Transformer-based Electra benchmark, while being pre-trained on only a third of the data.
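The core of the Electra recipe, independent of the encoder, is replaced token detection: a small generator proposes substitutes for masked tokens, and the discriminator (here built from Reformer blocks) is trained to flag which positions were replaced. Below is a compressed sketch of that objective with the generator and discriminator treated as abstract modules; names and hyperparameter values are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def electra_step(generator, discriminator, tokens, mask_token_id,
                 mask_prob=0.15, lam=50.0):
    """One replaced-token-detection step over a (batch, seq) tensor of token ids."""
    # 1. Mask a random subset of positions for the generator's MLM task.
    masked_positions = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    masked_input = tokens.clone()
    masked_input[masked_positions] = mask_token_id

    # 2. Small generator fills the masked positions; sample its predictions.
    gen_logits = generator(masked_input)                  # (batch, seq, vocab)
    mlm_loss = F.cross_entropy(gen_logits[masked_positions], tokens[masked_positions])
    sampled = torch.distributions.Categorical(logits=gen_logits.detach()).sample()
    corrupted = tokens.clone()
    corrupted[masked_positions] = sampled[masked_positions]

    # 3. Discriminator (the scaled Reformer encoder) predicts which tokens were replaced.
    is_replaced = (corrupted != tokens).float()
    disc_logits = discriminator(corrupted)                # (batch, seq)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # 4. Combined Electra objective: MLM loss plus weighted discriminator loss.
    return mlm_loss + lam * disc_loss
```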
Click here to read more about this work.
Augmentations to improve rare bird call classification for a highly imbalanced multi-label soundscape environment
Evanston, 2021
In this study, we present a deep learning solution for classifying multiple bird vocalizations in a multi-label, multi-species soundscape environment without a clear distinction between foreground and background species. Specifically, we test the effectiveness of various data augmentation methods for improving the classification of rare bird calls against the key challenges typical of a soundscape dataset: multiple overlapping bird calls, high environmental noise, and severe class imbalance.
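As an illustration of the kind of augmentations evaluated (not an exhaustive or exact list), the sketch below applies SpecAugment-style masking to a mel spectrogram and blends two clips with a union of their multi-hot labels; the parameter values are placeholders.

```python
import torch
import torchaudio

# Mel spectrogram front end followed by SpecAugment-style frequency/time masking.
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=32000, n_mels=128)
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=24)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=64)

def spec_augment(waveform: torch.Tensor) -> torch.Tensor:
    """Convert a waveform to a mel spectrogram and mask random bands."""
    return time_mask(freq_mask(melspec(waveform)))

def mix_clips(wav_a, labels_a, wav_b, labels_b, alpha=0.4):
    """Mixup-style blend of two soundscape clips with a union of their multi-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_wav = lam * wav_a + (1 - lam) * wav_b
    mixed_labels = torch.clamp(labels_a + labels_b, max=1.0)  # species present in either clip
    return mixed_wav, mixed_labels
```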
Click here to read more about this work.