
November 9, 2022

Cornell Researcher Builds Groundbreaking Machine Learning Toolkit For Bioacoustics


Researchers at Cornell's K. Lisa Yang Center for Conservation Bioacoustics have made a breakthrough in bioacoustic deep learning, a method for the automated detection of animal sounds. Dr. Shyam Madhusudhana, a postdoctoral researcher at the Lab of Ornithology, built a toolkit that enables bioacousticians to create complex audio recognition models with just a few lines of code.

The toolkit, Koogu, was used in a recent study that bested marine analysts in the detection of blue whale D-calls.

Blue whale D-calls are calls of varying frequency that are produced by male and female whales, unlike the well-known whale song which is produced only by males. While whale songs are often predictable and easily recognizable, D-calls are erratic and produced less repetitively. 

However, while blue whale D-calls are more difficult to identify, monitoring their presence allows for a much better understanding of their migration patterns and acoustic behaviors.

Acoustic monitoring has long been pursued as a viable method of recording rare species for which visual data are scarce. In recent years, machine learning algorithms have demonstrated promising results in analyzing acoustic monitoring data. In the marine biome, where visual surveys are rarely feasible, the method is all the more relevant to efforts to track the movements and habits of different aquatic species.

This is where Koogu comes in. 

“As long as someone has their own annotated data set [of acoustic monitoring], they could take Koogu and build a model of their own,” Madhusudhana said.

This methodology was adopted by a team of researchers at the Australian Antarctic Division, led by Brian Miller. The researchers used Koogu to build an automated detection model for their study of blue whale calls.

Their study, co-authored by Madhusudhana, is titled "Deep learning algorithm outperforms experienced human observer at detection of blue whale calls: a double-observer analysis." It found that human experts accurately detected 70 percent of the D-calls, whereas the model detected 90 percent. The model also worked considerably faster than the marine analysts, as it does not suffer from the fatigue associated with human analysis.

The study is only the first effective application of Koogu. According to Madhusudhana, however, the toolkit is far from limited to detecting marine audio data.

“Koogu isn’t a toolkit just for whale calls – [it is] just a convenient way to build machine learning solutions – anything from whales to birds as well as insects,” Madhusudhana said.

Koogu has the potential to be an impactful tool in the bioacoustics field. While there has been significant development in the machine learning domain, most of the development in the acoustics domain relates to human speech. Madhusudhana said Koogu bridges the gap between the two.

“If you’re looking at a visual representation of audio – like a spectrogram – you could treat it as an image and apply image classification techniques on it,” Madhusudhana said.

Koogu transforms acoustic data into a form that visual classification machine learning models can use. Madhusudhana ensured that most of the pipeline is configurable: any bioacoustic expert can adjust the parameters that govern how the audio is transformed into images. The images are then classified using an image classification model.
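The audio-to-image step Madhusudhana describes can be sketched with standard tools. The snippet below uses SciPy (an illustrative stand-in, not Koogu's own API) to turn a synthetic one-second clip into a spectrogram, the 2-D time-frequency array that an image classifier can then consume.

```python
import numpy as np
from scipy import signal

# Synthesize a 1-second audio clip: a 40 Hz tone buried in noise
# (an illustrative stand-in for a low-frequency whale call).
sample_rate = 1000  # samples per second
t = np.arange(0, 1.0, 1 / sample_rate)
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 40 * t) + 0.5 * rng.standard_normal(t.size)

# Transform the waveform into a spectrogram: a 2-D array of signal power
# over frequency (rows) and time (columns) that can be treated as an image.
frequencies, times, spectrogram = signal.spectrogram(
    audio, fs=sample_rate, nperseg=128, noverlap=64
)

print(spectrogram.shape)  # (frequency bins, time frames)
```

The `nperseg` and `noverlap` parameters here play the role of the user-tunable settings mentioned above: they trade off frequency resolution against time resolution in the resulting image.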

“If you try to develop a neural network-based solution for bioacoustics, there are probably a few hundred lines of code needed. What I’ve done is [enabled you to] call three or four functions and you’re done,” Madhusudhana said.
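The few-call workflow Madhusudhana describes can be approximated with off-the-shelf Python libraries. The sketch below is not Koogu's API; it simply mirrors the same idea with hypothetical helper names: synthesize labeled clips, convert each to a spectrogram "image," and train a classifier on the result (a simple logistic regression stands in for the neural network).

```python
import numpy as np
from scipy import signal
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
sample_rate = 1000  # samples per second

def make_clip(has_call):
    """Synthesize a 1-second noisy clip, optionally containing a 40 Hz 'call'."""
    t = np.arange(0, 1.0, 1 / sample_rate)
    clip = 0.5 * rng.standard_normal(t.size)
    if has_call:
        clip += np.sin(2 * np.pi * 40 * t)
    return clip

def to_image(clip):
    """Turn audio into a log-scaled spectrogram 'image', flattened for the model."""
    _, _, spec = signal.spectrogram(clip, fs=sample_rate, nperseg=128)
    return np.log1p(spec).ravel()

# Build a small labeled training set (half with calls, half without).
labels = np.array([0, 1] * 50)
features = np.array([to_image(make_clip(bool(y))) for y in labels])

# Train a classifier, then check it on freshly generated clips.
model = LogisticRegression(max_iter=1000).fit(features, labels)
test_labels = np.array([0, 1] * 20)
test_features = np.array([to_image(make_clip(bool(y))) for y in test_labels])
accuracy = model.score(test_features, test_labels)
print(f"accuracy: {accuracy:.2f}")
```

The point of the sketch is the shape of the workflow, not the model: the user-facing work reduces to a handful of function calls once data preparation and classification are packaged behind a toolkit.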

The goal was for bioacousticians and other researchers to combine their own data and domain knowledge with Koogu's functionality to analyze sounds efficiently. Koogu's unique relevance lies in its audio-to-image conversion process.

As Madhusudhana explains, every sound is turned into a colored map so that one audio signal can easily be distinguished from another. Compressing such a map into a standard image for classification usually incurs significant data loss. Koogu avoids this loss, greatly increasing accuracy.

This advantage is especially apparent in audio recordings with low or moderate intensity. Such recordings make blue whale calls harder to detect – especially in the case of human experts. 

The open-source toolkit for universal audio recognition has significantly streamlined the process of automated acoustic recognition.

But whale acoustic monitoring is just one part of the equation, according to Madhusudhana.  “Our goal is to conserve biodiversity across species – that was Koogu’s goal – [to] have something very generic that everyone across the world can use.”