Creating a database
In this tutorial we will use Ketos to build a database of North Atlantic Right Whale upcalls. The audio will be represented as spectrograms and can later be used to train a deep learning based classfier and build a right whale detector.
Note
You can download an executable version (Jupyter Notebook) of this tutorial, the data needed to follow along
here
.
Creating a training database¶
In this tutorial, we will use Ketos to create a database that can be used to train a deep learning classifier.
We will use a subset of the data described in Kirsebom et al. 2020. These data consist of 3-s long clips, some containing right whale upcalls and others containing only background noise. The clips are wave files extracted from recordings produced by bottom-mounted hydrophones in the Gulf of Saint Lawrence, Canada.
Our starting point will be a collection of .wav files accompanied by annotations. You can find them in the data
folder within the .zip file linked at the top of this page. In the train
folder, there are 2,000 files, half of them containing upcalls and the other half containing background noise (which, for our purpose, is any sound that is not an upcall. This includes sounds produced by other animals and the overall ambient noise). The annotations_train.csv
file contains the label attributed to each file: 1 for upcall, 0 for background. Similarly, the val
(validation) folder contains 200 .wav files (50% with upcalls) and is accompanied by the annotations_val.csv
file.
We will use Ketos to produce a database with spectrogram representations of the training and validation clips, so that we later can train a deep learning classifier to distinguish the upcalls from the other sounds. Eventually, we will use that classifier to build a detector.
A different scenario would be where you have audio recordings and annotations indicating where in these recording the signals of interest are, but you don't have clips of uniform length with examples of the target signal(s) and background. That case is covered in this tutorial.
We also encourage you to explore the documentation, since Ketos has a variety of tools that might help you to build training databases in different scenarios.
Contents:¶
1. Importing the packages
2. Loading the annotations
3. Putting the annotations in the Ketos format
4. Choosing the spectrogram settings
5. Creating the database
1. Importing the packages¶
For this tutorial we will use several modules within ketos. We will also the pandas to read our annotations files.
import pandas as pd
from ketos.data_handling import selection_table as sl
import ketos.data_handling.database_interface as dbi
from ketos.data_handling.parsing import load_audio_representation
from ketos.audio.spectrogram import MagSpectrogram
from ketos.data_handling.parsing import load_audio_representation
2. Loading the annotations¶
Our annotations are saved in two .csv
files: annotations_train.csv
and annotations_val.csv
, which we will use to create the training and validation datasets respectively.
annot_train = pd.read_csv("annotations_train.csv")
annot_val = pd.read_csv("annotations_val.csv")
Let's inspect our annotations
annot_train
sound_file | label | |
---|---|---|
0 | Old_Harry_ete_2018-06-13_052139_0.wav | 0 |
1 | Old_Harry_ete_2018-06-14_122355_0.wav | 0 |
2 | Old_Harry_ete_2018-06-15_030637_0.wav | 0 |
3 | Old_Harry_ete_2018-06-15_051054_0.wav | 0 |
4 | Old_Harry_ete_2018-06-15_071621_0.wav | 0 |
... | ... | ... |
1995 | Shediac_ete_2018-07-11_202226_1.wav | 1 |
1996 | Shediac_ete_2018-07-11_234110_0.wav | 0 |
1997 | Shediac_ete_2018-07-12_213604_0.wav | 0 |
1998 | Shediac_ete_2018-07-13_121216_1.wav | 1 |
1999 | Shediac_ete_2018-07-13_135427_0.wav | 0 |
2000 rows × 2 columns
annot_val
sound_file | label | |
---|---|---|
0 | Perce_ete_2018-06-10_093641_0.wav | 0 |
1 | Perce_ete_2018-06-10_150035_0.wav | 0 |
2 | Perce_ete_2018-06-11_144117_1.wav | 1 |
3 | Perce_ete_2018-06-11_211502_0.wav | 0 |
4 | Perce_ete_2018-06-12_091317_0.wav | 0 |
... | ... | ... |
195 | Shediac_ete_2018-07-10_070046_1.wav | 1 |
196 | Shediac_ete_2018-07-10_123759_0.wav | 0 |
197 | Shediac_ete_2018-07-10_131555_0.wav | 0 |
198 | Shediac_ete_2018-07-11_100911_0.wav | 0 |
199 | Shediac_ete_2018-07-13_140736_1.wav | 1 |
200 rows × 2 columns
The annot_train dataframe contains 2000 rows and the annot_val 200. The columns indicate:
sound_file: name of the audio file
label: label for the annotation (1 for upcall, 0 for background))
3. Putting the annotations in the Ketos format¶
Let's check if our annotations follow the Ketos standard.
If that's the case, the function sl.is_standardized
will return True
.
sl.is_standardized(annot_train)
Your table is not in the Ketos format. It should have two levels of indices: filename and annot_id. It should also contain at least the 'label' column. If your annotations have time information, these should appear in the 'start' and 'end' columns extra columns are allowed. Here is a minimum example: label filename annot_id file1.wav 0 2 1 1 2 2 file2.wav 0 2 1 2 2 1 And here is a table with time information and a few extra columns ('min_freq', 'max_freq' and 'file_time_stamp') start end label min_freq max_freq file_time_stamp filename annot_id file1.wav 0 7.0 8.1 2 180.6 294.3 2019-02-24 13:15:00 1 8.5 12.5 1 174.2 258.7 2019-02-24 13:15:00 2 13.1 14.0 2 183.4 292.3 2019-02-24 13:15:00 file2.wav 0 2.2 3.1 2 148.8 286.6 2019-02-24 13:30:00 1 5.8 6.8 2 156.6 278.3 2019-02-24 13:30:00 2 9.0 13.0 1 178.2 304.5 2019-02-24 13:30:00
False
sl.is_standardized(annot_val, verbose=False)
False
Neither of our annotations are in the format ketos expects. But we can use the sl.standardize
function to convert to the specified format.
The annot_id column is created automatically by the sl.standardize
function. From the remaining required columns indicated in the example above, we already have start, end and label. Our sound_file column needs to be renamed to filename, so we will need to provide a dictionary to specify that.
We have one extra column, datetime, that we don't really need to keep, so we'll set trim_table=True
, which will exclude any columns that are not required by the standardized tables.
If we wanted to keep the datetime (or any other columns), we would just set trim_table=False
. One situation in which you might to do that is if you need this information to split a dataset in train/test or train/validation/test, because then you can sort all your annotations by time and make sure the training set does not overlap with the validation/test. But in our case, the annotations are already split.
map_to_ketos_annot_std ={'filename': 'sound_file'}
std_annot_train = sl.standardize(table=annot_train, mapper=map_to_ketos_annot_std, trim_table=True)
std_annot_val = sl.standardize(table=annot_val, mapper=map_to_ketos_annot_std, trim_table=True)
Let's have a look at our standardized tables
std_annot_train
label | ||
---|---|---|
filename | annot_id | |
Old_Harry_ete_2018-06-13_052139_0.wav | 0 | 0 |
Old_Harry_ete_2018-06-14_122355_0.wav | 0 | 0 |
Old_Harry_ete_2018-06-15_030637_0.wav | 0 | 0 |
Old_Harry_ete_2018-06-15_051054_0.wav | 0 | 0 |
Old_Harry_ete_2018-06-15_071621_0.wav | 0 | 0 |
... | ... | ... |
Shediac_ete_2018-07-11_202226_1.wav | 0 | 1 |
Shediac_ete_2018-07-11_234110_0.wav | 0 | 0 |
Shediac_ete_2018-07-12_213604_0.wav | 0 | 0 |
Shediac_ete_2018-07-13_121216_1.wav | 0 | 1 |
Shediac_ete_2018-07-13_135427_0.wav | 0 | 0 |
2000 rows × 1 columns
std_annot_val
label | ||
---|---|---|
filename | annot_id | |
Perce_ete_2018-06-10_093641_0.wav | 0 | 0 |
Perce_ete_2018-06-10_150035_0.wav | 0 | 0 |
Perce_ete_2018-06-11_144117_1.wav | 0 | 1 |
Perce_ete_2018-06-11_211502_0.wav | 0 | 0 |
Perce_ete_2018-06-12_091317_0.wav | 0 | 0 |
... | ... | ... |
Shediac_ete_2018-07-10_070046_1.wav | 0 | 1 |
Shediac_ete_2018-07-10_123759_0.wav | 0 | 0 |
Shediac_ete_2018-07-10_131555_0.wav | 0 | 0 |
Shediac_ete_2018-07-11_100911_0.wav | 0 | 0 |
Shediac_ete_2018-07-13_140736_1.wav | 0 | 1 |
200 rows × 1 columns
4. Choosing the spectrogram settings¶
As mentioned earlier, we'll represent the segments as spectrograms.
In the .zip file where you found the data, there's also a spectrogram configuration file (spec_config.json
) which contains the settings we want to use.
This configuration file is simply a text file in the .json
format, so you could make a copy of it, change a few parameters and save several settings to use later or to share the with someone else.
spec_cfg = load_audio_representation('spec_config.json', name="spectrogram")
spec_cfg
{'rate': 1000, 'window': 0.256, 'step': 0.032, 'freq_min': 0, 'freq_max': 500, 'window_func': 'hamming', 'type': 'MagSpectrogram', 'duration': 3.0}
The result is a python dictionary. We could change some value, like the step size:
#spec_cfg['step'] = 0.064
But we will stick to the original here.
5. Creating the database¶
Now, we have to compute the spectrograms following the settings above for each selection in our selection tables (i.e.: each 3s clip) and then save them in a database.
All of this can be done with the dbi.create_database
function in Ketos.
We will start with the training dataset. We need to indicate the name for the database we want to create, where the audio files are, a name for the dataset, the selections table and the audio representation. As specified in our spec_cfg
, this is a Magnitude spectrogram, but ketos can also create databases with Power, Mel and CQT spectrograms, as well as time-domain data (waveforms).
dbi.create_database(output_file='database.h5', data_dir='data/train',
dataset_name='train',selections=std_annot_train,
audio_repres=spec_cfg)
100%|██████████████████████████████████████| 2000/2000 [00:13<00:00, 148.48it/s]
2000 items saved to database.h5
And we do the same thing for the validation set. Note that, by specifying the same database name, we are telling ketos that we want to add the validation set to the existing database.
dbi.create_database(output_file='database.h5', data_dir='data/val',
dataset_name='val',selections=std_annot_val,
audio_repres=spec_cfg)
100%|████████████████████████████████████████| 200/200 [00:01<00:00, 142.79it/s]
200 items saved to database.h5
Now we have our database with spectrograms representing audio segments with and without the North Atlantic Right Whale upcall. The data is divided into 'train' and 'validation'.
db = dbi.open_file("database.h5", 'r')
db
File(filename=database.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)) / (RootGroup) '' /train (Group) '' /train/data (Table(4000,)fletcher32, shuffle, zlib(1)) '' description := { "data": Float32Col(shape=(94, 129), dflt=0.0, pos=0), "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1), "id": UInt32Col(shape=(), dflt=0, pos=2), "label": UInt8Col(shape=(), dflt=0, pos=3), "offset": Float64Col(shape=(), dflt=0.0, pos=4)} byteorder := 'little' chunkshape := (5,) /val (Group) '' /val/data (Table(200,)fletcher32, shuffle, zlib(1)) '' description := { "data": Float32Col(shape=(94, 129), dflt=0.0, pos=0), "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1), "id": UInt32Col(shape=(), dflt=0, pos=2), "label": UInt8Col(shape=(), dflt=0, pos=3), "offset": Float64Col(shape=(), dflt=0.0, pos=4)} byteorder := 'little' chunkshape := (5,)
db.close() #Close the database connection
Here we can see the data divided into 'train' and 'validation'. These are called 'groups' in HDF5 terms. Within each of them there is a dataset called 'data', which contains the spectrograms and respective labels.
You will likely not need to directly interact with the database. In a following tutorial, we will use Ketos to build a deep neural network and train it to recognize upcalls. Ketos handles the database interactions, so we won't really have to go into the details of it, but if you would like to learn more about how to get data from this database, take a look at the database_interface module in ketos and the pyTables documentation.