Creating a database

In this tutorial we will use Ketos to build a database of North Atlantic Right Whale upcalls. The audio will be represented as spectrograms and can later be used to train a deep learning based classfier and build a right whale detector.

Note

You can download an executable version (Jupyter Notebook) of this tutorial, the data needed to follow along here.

create_database_simpler

Creating a training database¶

In this tutorial, we will use Ketos to create a database that can be used to train a deep learning classifier.

We will use a subset of the data described in Kirsebom et al. 2020. These data consist of 3-s long clips, some containing right whale upcalls and others containing only background noise. The clips are wave files extracted from recordings produced by bottom-mounted hydrophones in the Gulf of Saint Lawrence, Canada.

Our starting point will be a collection of .wav files accompanied by annotations. You can find them in the data folder within the .zip file linked at the top of this page. In the train folder, there are 2,000 files, half of them containing upcalls and the other half containing background noise (which, for our purpose, is any sound that is not an upcall. This includes sounds produced by other animals and the overall ambient noise). The annotations_train.csv file contains the label attributed to each file: 1 for upcall, 0 for background. Similarly, the val (validation) folder contains 200 .wav files (50% with upcalls) and is accompanied by the annotations_val.csv file.

We will use Ketos to produce a database with spectrogram representations of the training and validation clips, so that we later can train a deep learning classifier to distinguish the upcalls from the other sounds. Eventually, we will use that classifier to build a detector.

A different scenario would be where you have audio recordings and annotations indicating where in these recording the signals of interest are, but you don't have clips of uniform length with examples of the target signal(s) and background. That case is covered in this tutorial.

We also encourage you to explore the documentation, since Ketos has a variety of tools that might help you to build training databases in different scenarios.

Contents:¶

1. Importing the packages
2. Loading the annotations
3. Putting the annotations in the Ketos format
4. Choosing the spectrogram settings
5. Creating the database

1. Importing the packages¶

For this tutorial we will use several modules within ketos. We will also the pandas to read our annotations files.

In [1]:

import pandas as pd
from ketos.data_handling import selection_table as sl
import ketos.data_handling.database_interface as dbi
from ketos.data_handling.parsing import load_audio_representation
from ketos.audio.spectrogram import MagSpectrogram
from ketos.data_handling.parsing import load_audio_representation

2. Loading the annotations¶

Our annotations are saved in two .csv files: annotations_train.csv and annotations_val.csv, which we will use to create the training and validation datasets respectively.

In [2]:

annot_train = pd.read_csv("annotations_train.csv")
annot_val = pd.read_csv("annotations_val.csv")

Let's inspect our annotations

In [3]:

annot_train

Out[3]:

	sound_file	label
0	Old_Harry_ete_2018-06-13_052139_0.wav	0
1	Old_Harry_ete_2018-06-14_122355_0.wav	0
2	Old_Harry_ete_2018-06-15_030637_0.wav	0
3	Old_Harry_ete_2018-06-15_051054_0.wav	0
4	Old_Harry_ete_2018-06-15_071621_0.wav	0
...	...	...
1995	Shediac_ete_2018-07-11_202226_1.wav	1
1996	Shediac_ete_2018-07-11_234110_0.wav	0
1997	Shediac_ete_2018-07-12_213604_0.wav	0
1998	Shediac_ete_2018-07-13_121216_1.wav	1
1999	Shediac_ete_2018-07-13_135427_0.wav	0

2000 rows × 2 columns

In [4]:

annot_val

Out[4]:

	sound_file	label
0	Perce_ete_2018-06-10_093641_0.wav	0
1	Perce_ete_2018-06-10_150035_0.wav	0
2	Perce_ete_2018-06-11_144117_1.wav	1
3	Perce_ete_2018-06-11_211502_0.wav	0
4	Perce_ete_2018-06-12_091317_0.wav	0
...	...	...
195	Shediac_ete_2018-07-10_070046_1.wav	1
196	Shediac_ete_2018-07-10_123759_0.wav	0
197	Shediac_ete_2018-07-10_131555_0.wav	0
198	Shediac_ete_2018-07-11_100911_0.wav	0
199	Shediac_ete_2018-07-13_140736_1.wav	1

200 rows × 2 columns

The annot_train dataframe contains 2000 rows and the annot_val 200. The columns indicate:

sound_file: name of the audio file
label: label for the annotation (1 for upcall, 0 for background))

3. Putting the annotations in the Ketos format¶

Let's check if our annotations follow the Ketos standard.

If that's the case, the function sl.is_standardized will return True.

In [5]:

sl.is_standardized(annot_train)

 Your table is not in the Ketos format.

            It should have two levels of indices: filename and annot_id.
            It should also contain at least the 'label' column.
            If your annotations have time information, these should appear in the 'start' and 'end' columns

            extra columns are allowed.

            Here is a minimum example:

                                 label
            filename  annot_id                    
            file1.wav 0          2
                      1          1
                      2          2
            file2.wav 0          2
                      1          2
                      2          1


            And here is a table with time information and a few extra columns ('min_freq', 'max_freq' and 'file_time_stamp')

                                 start   end  label  min_freq  max_freq  file_time_stamp
            filename  annot_id                    
            file1.wav 0           7.0   8.1      2    180.6     294.3    2019-02-24 13:15:00
                      1           8.5  12.5      1    174.2     258.7    2019-02-24 13:15:00
                      2          13.1  14.0      2    183.4     292.3    2019-02-24 13:15:00
            file2.wav 0           2.2   3.1      2    148.8     286.6    2019-02-24 13:30:00
                      1           5.8   6.8      2    156.6     278.3    2019-02-24 13:30:00
                      2           9.0  13.0      1    178.2     304.5    2019-02-24 13:30:00

Out[5]:

False

In [6]:

sl.is_standardized(annot_val, verbose=False) 

Out[6]:

False

Neither of our annotations are in the format ketos expects. But we can use the sl.standardize function to convert to the specified format.

The annot_id column is created automatically by the sl.standardize function. From the remaining required columns indicated in the example above, we already have start, end and label. Our sound_file column needs to be renamed to filename, so we will need to provide a dictionary to specify that.

We have one extra column, datetime, that we don't really need to keep, so we'll set trim_table=True, which will exclude any columns that are not required by the standardized tables.

If we wanted to keep the datetime (or any other columns), we would just set trim_table=False. One situation in which you might to do that is if you need this information to split a dataset in train/test or train/validation/test, because then you can sort all your annotations by time and make sure the training set does not overlap with the validation/test. But in our case, the annotations are already split.

In [7]:

map_to_ketos_annot_std ={'filename': 'sound_file'} 
std_annot_train = sl.standardize(table=annot_train, mapper=map_to_ketos_annot_std, trim_table=True)
std_annot_val = sl.standardize(table=annot_val, mapper=map_to_ketos_annot_std, trim_table=True)

Let's have a look at our standardized tables

In [8]:

std_annot_train

Out[8]:

		label
filename	annot_id
Old_Harry_ete_2018-06-13_052139_0.wav	0	0
Old_Harry_ete_2018-06-14_122355_0.wav	0	0
Old_Harry_ete_2018-06-15_030637_0.wav	0	0
Old_Harry_ete_2018-06-15_051054_0.wav	0	0
Old_Harry_ete_2018-06-15_071621_0.wav	0	0
...	...	...
Shediac_ete_2018-07-11_202226_1.wav	0	1
Shediac_ete_2018-07-11_234110_0.wav	0	0
Shediac_ete_2018-07-12_213604_0.wav	0	0
Shediac_ete_2018-07-13_121216_1.wav	0	1
Shediac_ete_2018-07-13_135427_0.wav	0	0

2000 rows × 1 columns

In [9]:

std_annot_val

Out[9]:

		label
filename	annot_id
Perce_ete_2018-06-10_093641_0.wav	0	0
Perce_ete_2018-06-10_150035_0.wav	0	0
Perce_ete_2018-06-11_144117_1.wav	0	1
Perce_ete_2018-06-11_211502_0.wav	0	0
Perce_ete_2018-06-12_091317_0.wav	0	0
...	...	...
Shediac_ete_2018-07-10_070046_1.wav	0	1
Shediac_ete_2018-07-10_123759_0.wav	0	0
Shediac_ete_2018-07-10_131555_0.wav	0	0
Shediac_ete_2018-07-11_100911_0.wav	0	0
Shediac_ete_2018-07-13_140736_1.wav	0	1

200 rows × 1 columns

4. Choosing the spectrogram settings¶

As mentioned earlier, we'll represent the segments as spectrograms. In the .zip file where you found the data, there's also a spectrogram configuration file (spec_config.json) which contains the settings we want to use.

This configuration file is simply a text file in the .json format, so you could make a copy of it, change a few parameters and save several settings to use later or to share the with someone else.

In [10]:

spec_cfg = load_audio_representation('spec_config.json', name="spectrogram")

In [11]:

spec_cfg

Out[11]:

{'rate': 1000,
 'window': 0.256,
 'step': 0.032,
 'freq_min': 0,
 'freq_max': 500,
 'window_func': 'hamming',
 'type': 'MagSpectrogram',
 'duration': 3.0}

The result is a python dictionary. We could change some value, like the step size:

In [12]:

#spec_cfg['step'] = 0.064

But we will stick to the original here.

5. Creating the database¶

Now, we have to compute the spectrograms following the settings above for each selection in our selection tables (i.e.: each 3s clip) and then save them in a database.

All of this can be done with the dbi.create_database function in Ketos.

We will start with the training dataset. We need to indicate the name for the database we want to create, where the audio files are, a name for the dataset, the selections table and the audio representation. As specified in our spec_cfg, this is a Magnitude spectrogram, but ketos can also create databases with Power, Mel and CQT spectrograms, as well as time-domain data (waveforms).

In [13]:

dbi.create_database(output_file='database.h5', data_dir='data/train',
                               dataset_name='train',selections=std_annot_train,
                               audio_repres=spec_cfg)
                              

100%|██████████████████████████████████████| 2000/2000 [00:13<00:00, 148.48it/s]

2000 items saved to database.h5

And we do the same thing for the validation set. Note that, by specifying the same database name, we are telling ketos that we want to add the validation set to the existing database.

In [14]:

dbi.create_database(output_file='database.h5', data_dir='data/val',
                               dataset_name='val',selections=std_annot_val,
                               audio_repres=spec_cfg)
                              

100%|████████████████████████████████████████| 200/200 [00:01<00:00, 142.79it/s]

200 items saved to database.h5

Now we have our database with spectrograms representing audio segments with and without the North Atlantic Right Whale upcall. The data is divided into 'train' and 'validation'.

In [15]:

db = dbi.open_file("database.h5", 'r')

In [16]:

db

Out[16]:

File(filename=database.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/train (Group) ''
/train/data (Table(4000,)fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(94, 129), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "label": UInt8Col(shape=(), dflt=0, pos=3),
  "offset": Float64Col(shape=(), dflt=0.0, pos=4)}
  byteorder := 'little'
  chunkshape := (5,)
/val (Group) ''
/val/data (Table(200,)fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(94, 129), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "label": UInt8Col(shape=(), dflt=0, pos=3),
  "offset": Float64Col(shape=(), dflt=0.0, pos=4)}
  byteorder := 'little'
  chunkshape := (5,)

In [17]:

db.close()  #Close the database connection

Here we can see the data divided into 'train' and 'validation'. These are called 'groups' in HDF5 terms. Within each of them there is a dataset called 'data', which contains the spectrograms and respective labels.

You will likely not need to directly interact with the database. In a following tutorial, we will use Ketos to build a deep neural network and train it to recognize upcalls. Ketos handles the database interactions, so we won't really have to go into the details of it, but if you would like to learn more about how to get data from this database, take a look at the database_interface module in ketos and the pyTables documentation.