Creating a database

In this tutorial we will use Ketos to build a database of North Atlantic Right Whale upcalls. The audio will be represented as spectrograms and can later be used to train a deep learning based classfier and build a right whale detector.

Note

You can download an executable version (Jupyter Notebook) of this tutorial, the data needed to follow along here.

database_creation

Creating a training database

In this tutorial, we will use Ketos to create a training database that can be used to train a deep learning-based classifier.

The data is a subset of the recordings used in the 2013 DCLDE challenge, in which participants had to detect calls from the North Atlantic Right Whale. To keep the tutorial simple, we will only use recording from a couple of days containing the characteristic upcall. The size of the database will also be modest. In practice you'll probably want to have more data available to train your classifiers/detectors. But the steps included here should give you a good understanding of the data preparation steps. We will even use a simple data augmentation techinique to increase the size of the training dataset.

Starting with the raw wavfiles and annotations, we will build a database of spectrograms that will be used to train a deep neural network capable of distinguishing upcalls from the background sounds. For our purposes, 'background' includes all sounds that are not upcalls, from other animal vocalizations to ambient noises produced by waves.

You can find the audio files and the annotations in the data folder within .zip file at the top of this page.

This is a common scenario: you have audio recordings and annotations indicating where in these recording the signals of interest are. However, if your data is not in the exact same format, we encourage you to explore the documentation, since Ketos has a variety of tools that won't be necessary in this tutorial.

Contents:

1. Importing the packages
2. Loading the annotations
3. Putting the annotations in the Ketos format
4. Creating segments of uniform length
5. Augmenting the data
6. Including background noise
7. Choosing the spectrogram settings
8. Creating the database

1. Importing the packages

For this tutorial we will use several modules within ketos and also the pandas package

In [2]:
import pandas as pd
from ketos.data_handling import selection_table as sl
import ketos.data_handling.database_interface as dbi
from ketos.data_handling.parsing import load_audio_representation
from ketos.audio.spectrogram import MagSpectrogram
from ketos.data_handling.parsing import load_audio_representation

2. Loading the annotations

Our annotations are saved in two .csv files (separated by ';'): "annotations_train.csv" and "annotations_test.csv", which we will use to create the training and test datasets respectively. These files can also be found within the .zip file at the top of the page.

In [3]:
annot_train = pd.read_csv("annotations_train.csv")
annot_test = pd.read_csv("annotations_test.csv")

Let's inspect our annotations

In [4]:
annot_train
Out[4]:
Unnamed: 0 start end label sound_file datetime
0 2957 188.8115 190.5858 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
1 2958 235.7556 237.1603 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
2 2959 398.6924 400.1710 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
3 2960 438.9091 440.3138 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
4 2961 451.0518 452.2716 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
5 2962 565.3811 566.6748 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
6 2963 567.6359 568.8926 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
7 2964 547.3427 548.5625 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
8 2965 549.0430 550.4477 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
9 2966 632.1194 633.3761 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
10 2967 637.2204 638.4402 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
11 2968 642.5617 643.9293 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
12 2969 645.8884 647.5148 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
13 2970 654.0205 655.6838 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
14 2971 672.8166 674.0734 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
15 2972 712.7930 713.6432 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
16 2973 735.1747 736.2466 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
17 2974 711.7211 712.9778 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
18 2975 749.8308 751.0506 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
19 2976 784.5954 785.9261 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
20 2977 858.6525 860.3159 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
21 2978 890.2936 892.2896 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
22 2979 895.0989 896.4666 upcall NOPP6_EST_20090328_000000.wav 2009-03-28 00:00:00
23 2980 38.3282 39.8437 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
24 2981 114.5847 115.8414 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
25 2982 161.8984 163.2661 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
26 2983 262.7729 263.9557 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
27 2984 271.0897 272.6792 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
28 2985 324.5765 325.5745 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
29 2986 335.3514 336.5712 upcall NOPP6_EST_20090328_001500.wav 2009-03-28 00:15:00
... ... ... ... ... ... ...
970 3927 584.7077 585.8721 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
971 3928 589.4761 590.7698 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
972 3929 598.0425 599.0405 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
973 3930 612.8927 614.2789 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
974 3931 618.7607 619.4261 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
975 3932 636.3648 637.2149 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
976 3933 671.3418 672.3953 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
977 3934 685.1201 686.1736 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
978 3935 679.9822 681.1927 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
979 3936 806.9993 808.2855 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
980 3937 845.4787 846.8464 upcall NOPP6_EST_20090329_024500.wav 2009-03-29 02:45:00
981 3938 19.2032 20.5708 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
982 3939 83.5573 84.9619 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
983 3940 131.6473 133.9391 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
984 3941 187.2040 188.3499 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
985 3942 196.8701 197.9420 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
986 3943 293.9003 294.9352 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
987 3944 470.2734 471.4932 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
988 3945 473.7480 475.3375 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
989 3946 514.0756 516.1456 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
990 3947 705.3083 706.3063 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
991 3948 867.8939 869.2616 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
992 3949 872.9949 874.5474 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
993 3950 895.3766 896.5964 upcall NOPP6_EST_20090329_030000.wav 2009-03-29 03:00:00
994 3951 1.0690 2.2518 upcall NOPP6_EST_20090329_031500.wav 2009-03-29 03:15:00
995 3952 52.0791 53.6686 upcall NOPP6_EST_20090329_031500.wav 2009-03-29 03:15:00
996 3953 76.1057 77.2146 upcall NOPP6_EST_20090329_031500.wav 2009-03-29 03:15:00
997 3954 99.9104 101.3520 upcall NOPP6_EST_20090329_031500.wav 2009-03-29 03:15:00
998 3955 120.9983 121.9224 upcall NOPP6_EST_20090329_031500.wav 2009-03-29 03:15:00
999 3956 104.6603 105.8431 upcall NOPP6_EST_20090329_031500.wav 2009-03-29 03:15:00

1000 rows × 6 columns

In [5]:
annot_test
Out[5]:
Unnamed: 0 start end label sound_file datetime file_duration
0 4157 891.4625 892.5714 upcall NOPP6_EST_20090329_084500.wav 2009-03-29 08:45:00 900
1 4158 52.7486 53.8945 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
2 4159 42.1030 43.5076 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
3 4160 98.0663 98.9165 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
4 4161 116.4928 117.8605 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
5 4162 288.6890 290.2415 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
6 4163 293.5683 295.3425 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
7 4164 298.0409 299.0020 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
8 4165 323.7862 325.1539 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
9 4166 358.0887 360.0848 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
10 4167 402.3899 404.0164 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
11 4168 391.3008 392.7793 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
12 4169 435.1030 436.6185 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
13 4170 440.3888 441.9043 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
14 4171 493.8755 494.9105 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
15 4172 513.4849 514.9634 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
16 4173 551.4468 552.5927 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
17 4174 569.1895 570.4093 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
18 4175 604.6378 606.2273 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
19 4176 734.0299 735.4345 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
20 4177 856.9163 858.1361 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
21 4178 863.8375 864.8615 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
22 4179 897.5020 898.5260 upcall NOPP6_EST_20090329_090000.wav 2009-03-29 09:00:00 900
23 4180 4.5425 5.5665 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
24 4181 14.6909 16.0955 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
25 4182 22.8414 24.3939 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
26 4183 33.4870 34.6698 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
27 4184 41.6560 43.5042 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
28 4185 72.0773 73.4080 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
29 4186 120.9436 122.4221 upcall NOPP6_EST_20090329_091500.wav 2009-03-29 09:15:00 900
... ... ... ... ... ... ... ...
470 4627 700.2496 701.4694 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
471 4628 704.8700 705.7941 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
472 4629 712.7434 714.0001 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
473 4630 729.0629 730.4675 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
474 4631 731.0590 731.9091 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
475 4632 744.6247 745.6597 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
476 4633 753.4406 754.6234 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
477 4634 788.0573 788.9814 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
478 4635 791.6797 792.4929 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
479 4636 799.5530 800.3293 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
480 4637 801.4567 802.6026 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
481 4638 804.3768 805.7814 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
482 4639 820.1974 821.7868 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
483 4640 863.9256 865.2563 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
484 4641 872.9079 873.9798 upcall NOPP6_EST_20090329_124500.wav 2009-03-29 12:45:00 900
485 4642 9.1694 9.8348 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
486 4643 10.7958 12.2005 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
487 4644 39.8310 40.6072 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
488 4645 68.7552 70.3816 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
489 4646 93.1698 94.3527 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
490 4647 77.2015 78.3473 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
491 4648 122.2604 123.7020 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
492 4649 127.9898 129.1357 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
493 4650 129.8380 130.9839 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
494 4651 193.0463 194.2661 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
495 4652 201.8252 203.4886 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
496 4653 235.1851 236.1831 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
497 4654 236.7006 237.7726 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
498 4655 246.4406 247.6974 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900
499 4656 264.1833 265.8097 upcall NOPP6_EST_20090329_130000.wav 2009-03-29 13:00:00 900

500 rows × 7 columns

The annot_train dataframe contains 1000 rows and the annot_test 500. The columns indicate:

start: start time for the annotation, in seconds from the beginning of the file
end: end time for the annotation, in seconds from the beginning of the file
label: label for the annotation (in our case, all annotated signals are 'upcalls', but the origincal DCLDE2013 dataset also had 'gunshots')
sound_file: name of the audio file
datetime: a timestamp for the beginning of the file (UTC)

3. Putting the annotations in the Ketos format

Let's check if our annotations follow the Ketos standard.

If that's the case, the function sl.is_standardized will return True.

In [6]:
sl.is_standardized(annot_train)
 Your table is not in the Ketos format.

            It should have two levels of indices: filename and annot_id.
            It should also contain at least the 'label' column.
            If your annotations have time information, these should appear in the 'start' and 'end' columns

            extra columns are allowed.

            Here is a minimum example:

                                 label
            filename  annot_id                    
            file1.wav 0          2
                      1          1
                      2          2
            file2.wav 0          2
                      1          2
                      2          1


            And here is a table with time information and a few extra columns ('min_freq', 'max_freq' and 'file_time_stamp')

                                 start   end  label  min_freq  max_freq  file_time_stamp
            filename  annot_id                    
            file1.wav 0           7.0   8.1      2    180.6     294.3    2019-02-24 13:15:00
                      1           8.5  12.5      1    174.2     258.7    2019-02-24 13:15:00
                      2          13.1  14.0      2    183.4     292.3    2019-02-24 13:15:00
            file2.wav 0           2.2   3.1      2    148.8     286.6    2019-02-24 13:30:00
                      1           5.8   6.8      2    156.6     278.3    2019-02-24 13:30:00
                      2           9.0  13.0      1    178.2     304.5    2019-02-24 13:30:00

    
    
Out[6]:
False

Setting the verbose argument to False will not show the example above:

In [7]:
sl.is_standardized(annot_test, verbose=False)
Out[7]:
False

Neither of our annotations are in the format ketos expects. But we can use the sl.standardize function to convert to the specified format.

The annot_id column is created automatically by the sl.standardize function. From the remaining required columns indicated in the example above, we already have start, end and label. Our sound_file column needs to be renamed to filename, so we will need to provide a dictionary to specify that.

We have one extra column, datetime, that we don't really need to keep, so we'll set trim_table=True, which will discard any columns that are not required by the standardized.

If we wanted to keep the datetime (or any other columns), we would just set trim_table=False. One situation in which you might want to do that is if you need this information to split a dataset into train/test or train/validation/test, because then you can sort all your annotations by time and make sure the training set does not overlap with the validation/test. But in our case, the annotations are already split.

In [8]:
map_to_ketos_annot_std ={'sound_file': 'filename'} 
std_annot_train = sl.standardize(table=annot_train, signal_labels=["upcall"], mapper=map_to_ketos_annot_std, trim_table=True)
std_annot_test = sl.standardize(table=annot_test, signal_labels=["upcall"], mapper=map_to_ketos_annot_std, trim_table=True)

Let's have a look at our standardized tables

In [9]:
std_annot_train
Out[9]:
start end label
filename annot_id
NOPP6_EST_20090328_000000.wav 0 188.8115 190.5858 1
1 235.7556 237.1603 1
2 398.6924 400.1710 1
3 438.9091 440.3138 1
4 451.0518 452.2716 1
5 565.3811 566.6748 1
6 567.6359 568.8926 1
7 547.3427 548.5625 1
8 549.0430 550.4477 1
9 632.1194 633.3761 1
10 637.2204 638.4402 1
11 642.5617 643.9293 1
12 645.8884 647.5148 1
13 654.0205 655.6838 1
14 672.8166 674.0734 1
15 712.7930 713.6432 1
16 735.1747 736.2466 1
17 711.7211 712.9778 1
18 749.8308 751.0506 1
19 784.5954 785.9261 1
20 858.6525 860.3159 1
21 890.2936 892.2896 1
22 895.0989 896.4666 1
NOPP6_EST_20090328_001500.wav 0 38.3282 39.8437 1
1 114.5847 115.8414 1
2 161.8984 163.2661 1
3 262.7729 263.9557 1
4 271.0897 272.6792 1
5 324.5765 325.5745 1
6 335.3514 336.5712 1
... ... ... ... ...
NOPP6_EST_20090329_024500.wav 19 584.7077 585.8721 1
20 589.4761 590.7698 1
21 598.0425 599.0405 1
22 612.8927 614.2789 1
23 618.7607 619.4261 1
24 636.3648 637.2149 1
25 671.3418 672.3953 1
26 685.1201 686.1736 1
27 679.9822 681.1927 1
28 806.9993 808.2855 1
29 845.4787 846.8464 1
NOPP6_EST_20090329_030000.wav 0 19.2032 20.5708 1
1 83.5573 84.9619 1
2 131.6473 133.9391 1
3 187.2040 188.3499 1
4 196.8701 197.9420 1
5 293.9003 294.9352 1
6 470.2734 471.4932 1
7 473.7480 475.3375 1
8 514.0756 516.1456 1
9 705.3083 706.3063 1
10 867.8939 869.2616 1
11 872.9949 874.5474 1
12 895.3766 896.5964 1
NOPP6_EST_20090329_031500.wav 0 1.0690 2.2518 1
1 52.0791 53.6686 1
2 76.1057 77.2146 1
3 99.9104 101.3520 1
4 120.9983 121.9224 1
5 104.6603 105.8431 1

1000 rows × 3 columns

In [10]:
std_annot_test
Out[10]:
start end label
filename annot_id
NOPP6_EST_20090329_084500.wav 0 891.4625 892.5714 1
NOPP6_EST_20090329_090000.wav 0 52.7486 53.8945 1
1 42.1030 43.5076 1
2 98.0663 98.9165 1
3 116.4928 117.8605 1
4 288.6890 290.2415 1
5 293.5683 295.3425 1
6 298.0409 299.0020 1
7 323.7862 325.1539 1
8 358.0887 360.0848 1
9 402.3899 404.0164 1
10 391.3008 392.7793 1
11 435.1030 436.6185 1
12 440.3888 441.9043 1
13 493.8755 494.9105 1
14 513.4849 514.9634 1
15 551.4468 552.5927 1
16 569.1895 570.4093 1
17 604.6378 606.2273 1
18 734.0299 735.4345 1
19 856.9163 858.1361 1
20 863.8375 864.8615 1
21 897.5020 898.5260 1
NOPP6_EST_20090329_091500.wav 0 4.5425 5.5665 1
1 14.6909 16.0955 1
2 22.8414 24.3939 1
3 33.4870 34.6698 1
4 41.6560 43.5042 1
5 72.0773 73.4080 1
6 120.9436 122.4221 1
... ... ... ... ...
NOPP6_EST_20090329_124500.wav 61 700.2496 701.4694 1
62 704.8700 705.7941 1
63 712.7434 714.0001 1
64 729.0629 730.4675 1
65 731.0590 731.9091 1
66 744.6247 745.6597 1
67 753.4406 754.6234 1
68 788.0573 788.9814 1
69 791.6797 792.4929 1
70 799.5530 800.3293 1
71 801.4567 802.6026 1
72 804.3768 805.7814 1
73 820.1974 821.7868 1
74 863.9256 865.2563 1
75 872.9079 873.9798 1
NOPP6_EST_20090329_130000.wav 0 9.1694 9.8348 1
1 10.7958 12.2005 1
2 39.8310 40.6072 1
3 68.7552 70.3816 1
4 93.1698 94.3527 1
5 77.2015 78.3473 1
6 122.2604 123.7020 1
7 127.9898 129.1357 1
8 129.8380 130.9839 1
9 193.0463 194.2661 1
10 201.8252 203.4886 1
11 235.1851 236.1831 1
12 236.7006 237.7726 1
13 246.4406 247.6974 1
14 264.1833 265.8097 1

500 rows × 3 columns

Notice that the 'label' column now encodes 'upcall' as 1s, as the ketos format uses integers to represent labels.

4. Creating segments of uniform length

If you look back at our std_annot_train and std_annot_test you'll notice that annotations have a variety of lengths, since they mark the beginning and end of an upcall and these have variable durations. For our purposes, we want each signal in the database to be represented as spectrograms, all of same length. Each spectrogram will be labelled as containing an upcall or not.

The sl.select function in ketos can help us to do just that: for each annotated upcall, it'll select a portion of the recording surrounding it. It takes a standard selections table as input and lets you specify the length of the output segments. We'll use 3 seconds, as it is enough to encompass most upcalls.

Our standardized tables only contain annotated upcalls. Later we will also want some examples of segments that only contain background noise, but for now we'll just create the uniform upcall segments, which we'll call 'positives'

In [11]:
positives_train = sl.select(annotations=std_annot_train, length=3.0)
positives_test = sl.select(annotations=std_annot_test, length=3.0, step=0.0, center=False)

Have a look at the results and notice how each entry is now 3.0 seconds long.

In [12]:
positives_train
Out[12]:
label start end
filename sel_id
NOPP6_EST_20090328_000000.wav 0 1 188.505821 191.505821
1 1 235.144067 238.144067
2 1 397.974660 400.974660
3 1 438.595215 441.595215
4 1 450.735524 453.735524
5 1 563.750671 566.750671
6 1 566.312310 569.312310
7 1 545.640917 548.640917
8 1 548.141737 551.141737
9 1 631.700747 634.700747
10 1 636.133811 639.133811
11 1 641.762301 644.762301
12 1 645.843897 648.843897
13 1 652.702845 655.702845
14 1 671.911848 674.911848
15 1 710.777639 713.777639
16 1 733.780307 736.780307
17 1 710.129000 713.129000
18 1 749.550831 752.550831
19 1 784.049950 787.049950
20 1 857.816413 860.816413
21 1 889.790064 892.790064
22 1 894.392944 897.392944
NOPP6_EST_20090328_001500.wav 0 1 37.238110 40.238110
1 1 113.564393 116.564393
2 1 161.681734 164.681734
3 1 261.538927 264.538927
4 1 270.663886 273.663886
5 1 324.158069 327.158069
6 1 333.854207 336.854207
... ... ... ... ...
NOPP6_EST_20090329_024500.wav 19 1 583.597954 586.597954
20 1 587.881577 590.881577
21 1 597.919677 600.919677
22 1 612.647644 615.647644
23 1 618.666776 621.666776
24 1 636.216577 639.216577
25 1 670.453591 673.453591
26 1 684.232552 687.232552
27 1 678.577175 681.577175
28 1 805.907640 808.907640
29 1 844.114076 847.114076
NOPP6_EST_20090329_030000.wav 0 1 17.847298 20.847298
1 1 82.647307 85.647307
2 1 131.010253 134.010253
3 1 186.276704 189.276704
4 1 196.577775 199.577775
5 1 293.785304 296.785304
6 1 469.976772 472.976772
7 1 473.330541 476.330541
8 1 513.429442 516.429442
9 1 703.958674 706.958674
10 1 867.646022 870.646022
11 1 872.422265 875.422265
12 1 895.127993 898.127993
NOPP6_EST_20090329_031500.wav 0 1 0.517951 3.517951
1 1 50.802804 53.802804
2 1 75.652178 78.652178
3 1 99.899106 102.899106
4 1 120.222738 123.222738
5 1 104.623971 107.623971

1000 rows × 3 columns

In [13]:
positives_test
Out[13]:
label start end
filename sel_id
NOPP6_EST_20090329_084500.wav 0 1 890.100436 893.100436
NOPP6_EST_20090329_090000.wav 0 1 51.413506 54.413506
1 1 41.592974 44.592974
2 1 97.386199 100.386199
3 1 115.234384 118.234384
4 1 288.680821 291.680821
5 1 292.441137 295.441137
6 1 296.548931 299.548931
7 1 323.259055 326.259055
8 1 357.530377 360.530377
9 1 401.227509 404.227509
10 1 390.314226 393.314226
11 1 434.550023 437.550023
12 1 439.979963 442.979963
13 1 492.413814 495.413814
14 1 513.106820 516.106820
15 1 551.306710 554.306710
16 1 567.635057 570.635057
17 1 604.062372 607.062372
18 1 733.756561 736.756561
19 1 855.693299 858.693299
20 1 861.955671 864.955671
21 1 897.331731 900.331731
NOPP6_EST_20090329_091500.wav 0 1 3.420198 6.420198
1 1 14.066753 17.066753
2 1 21.761687 24.761687
3 1 32.583240 35.583240
4 1 41.612624 44.612624
5 1 71.793040 74.793040
6 1 120.361574 123.361574
... ... ... ... ...
NOPP6_EST_20090329_124500.wav 61 1 699.890299 702.890299
62 1 703.414265 706.414265
63 1 711.872941 714.872941
64 1 727.622338 730.622338
65 1 729.370055 732.370055
66 1 743.195702 746.195702
67 1 753.168759 756.168759
68 1 787.630758 790.630758
69 1 790.492793 793.492793
70 1 798.820875 801.820875
71 1 800.362594 803.362594
72 1 804.248848 807.248848
73 1 819.142024 822.142024
74 1 863.724132 866.724132
75 1 872.336333 875.336333
NOPP6_EST_20090329_130000.wav 0 1 7.668247 10.668247
1 1 9.603355 12.603355
2 1 39.049304 42.049304
3 1 68.746681 71.746681
4 1 91.885280 94.885280
5 1 76.969680 79.969680
6 1 121.012622 124.012622
7 1 127.647454 130.647454
8 1 129.100619 132.100619
9 1 192.518624 195.518624
10 1 201.033773 204.033773
11 1 233.905817 236.905817
12 1 235.614636 238.614636
13 1 245.646977 248.646977
14 1 263.731547 266.731547

500 rows × 3 columns

5. Augmenting the data

Data augmentation is a set of tecnhiques used in machine learning to increase the data available to train models. There are many different techniques that can be used. The sl.select function we just used offers a simple way to augment the data while you are creating the uniform selections. It creates segments that are longer than the annotated signals and then shifts the start and end of those segments, resulting in multiple segments with the same annotated signal (our upcalls) positioned at different times. This is a very safe technique, as it is not altering the original signal, but it can already help to increase the amount of data available. It also helps to present a larger variety of contexts in which the upcall can appear.

We'll augment the training portion of our annotations by using two additional arguments. The step specifies how much the signal will be shifted (in seconds). Smaller values will produce more augmented selections, but they will be more similar to the previous selection. The min_overlap argument specifies the fraction of the augmented signal that needs to overlap the original annotation in order for it to be included in the augmented selections table. A value of 1.0 means 100%, this is, the new annotation will only be included if the entire upcall falls within the stablished interval. Lower values will result in segments that only contain part of the original upcall. We'll set this value to 0.5, meaning that some of our augmented segments might have as little as half of the original call.

In [14]:
positives_train = sl.select(annotations=std_annot_train, length=3.0, step=0.5, min_overlap=0.5, center=False)
In [15]:
positives_train
Out[15]:
label start end
filename sel_id
NOPP6_EST_20090328_000000.wav 0 1 187.639360 190.639360
1 1 188.139360 191.139360
2 1 188.639360 191.639360
3 1 234.513519 237.513519
4 1 235.013519 238.013519
5 1 235.513519 238.513519
6 1 397.672005 400.672005
7 1 398.172005 401.172005
8 1 437.537038 440.537038
9 1 438.037038 441.037038
10 1 438.537038 441.537038
11 1 450.039691 453.039691
12 1 450.539691 453.539691
13 1 546.077120 549.077120
14 1 546.577120 549.577120
15 1 547.077120 550.077120
16 1 547.547645 550.547645
17 1 548.047645 551.047645
18 1 548.547645 551.547645
19 1 564.229033 567.229033
20 1 564.729033 567.729033
21 1 566.418305 569.418305
22 1 566.918305 569.918305
23 1 630.406477 633.406477
24 1 630.906477 633.906477
25 1 631.406477 634.406477
26 1 635.691668 638.691668
27 1 636.191668 639.191668
28 1 636.691668 639.691668
29 1 641.312003 644.312003
... ... ... ... ...
NOPP6_EST_20090329_030000.wav 25 1 513.964139 516.964139
26 1 514.464139 517.464139
27 1 704.176359 707.176359
28 1 704.676359 707.676359
29 1 866.581307 869.581307
30 1 867.081307 870.081307
31 1 867.581307 870.581307
32 1 871.940831 874.940831
33 1 872.440831 875.440831
34 1 872.940831 875.940831
35 1 894.355879 897.355879
36 1 894.855879 897.855879
37 1 895.355879 898.355879
NOPP6_EST_20090329_031500.wav 0 1 -0.042952 2.957048
1 1 0.457048 3.457048
2 1 50.791279 53.791279
3 1 51.291279 54.291279
4 1 51.791279 54.791279
5 1 74.824385 77.824385
6 1 75.324385 78.324385
7 1 75.824385 78.824385
8 1 98.676047 101.676047
9 1 99.176047 102.176047
10 1 99.676047 102.676047
11 1 103.222448 106.222448
12 1 103.722448 106.722448
13 1 104.222448 107.222448
14 1 119.948518 122.948518
15 1 120.448518 123.448518
16 1 120.948518 123.948518

2763 rows × 3 columns

Notice that now our positives_train tables has almost 3x more rows than before.

6. Including background noise

Now that we have the positive instances that we need to create our database, we need to include some examples of the negative class, or instances without upcalls.

The sl.create_rndm_backgr_selections is ideal for our situation. It takes a standardized ketos table describing all sections of the recordings that contain annotations and takes samples from the non-annotaded portions of the files, assuming everything that is not annotated can be used as a 'background' category.

Note: You might find yourself in a different scenario. For example, your annotations might already include a 'background' class or you might have annoted different classes of sounds and you only want to use a few of them. In any case, ketos provides a variety of other functions that are helpful in different scenarios. Have a look at the documentation for more details. Specially the selection_table module.

The sl.create_rndm_backgr_selections also needs the duration of each file, which we can generate using the sl.file_duration function.

In [16]:
file_durations_train = sl.file_duration_table('data/train')
file_durations_test = sl.file_duration_table('data/test') 
In [17]:
file_durations_train
Out[17]:
filename duration
0 NOPP6_EST_20090328_000000.wav 900.0
1 NOPP6_EST_20090328_001500.wav 900.0
2 NOPP6_EST_20090328_003000.wav 900.0
3 NOPP6_EST_20090328_004500.wav 900.0
4 NOPP6_EST_20090328_010000.wav 900.0
5 NOPP6_EST_20090328_011500.wav 900.0
6 NOPP6_EST_20090328_013000.wav 900.0
7 NOPP6_EST_20090328_014500.wav 900.0
8 NOPP6_EST_20090328_020000.wav 900.0
9 NOPP6_EST_20090328_021500.wav 900.0
10 NOPP6_EST_20090328_023000.wav 900.0
11 NOPP6_EST_20090328_024500.wav 900.0
12 NOPP6_EST_20090328_030000.wav 900.0
13 NOPP6_EST_20090328_031500.wav 900.0
14 NOPP6_EST_20090328_033000.wav 900.0
15 NOPP6_EST_20090328_034500.wav 900.0
16 NOPP6_EST_20090328_041500.wav 900.0
17 NOPP6_EST_20090328_043000.wav 900.0
18 NOPP6_EST_20090328_044500.wav 900.0
19 NOPP6_EST_20090328_053000.wav 900.0
20 NOPP6_EST_20090328_054500.wav 900.0
21 NOPP6_EST_20090328_060000.wav 900.0
22 NOPP6_EST_20090328_061500.wav 900.0
23 NOPP6_EST_20090328_063000.wav 900.0
24 NOPP6_EST_20090328_064500.wav 900.0
25 NOPP6_EST_20090328_070000.wav 900.0
26 NOPP6_EST_20090328_074500.wav 900.0
27 NOPP6_EST_20090328_091500.wav 900.0
28 NOPP6_EST_20090328_093000.wav 900.0
29 NOPP6_EST_20090328_094500.wav 900.0
... ... ...
54 NOPP6_EST_20090328_193000.wav 900.0
55 NOPP6_EST_20090328_194500.wav 900.0
56 NOPP6_EST_20090328_200000.wav 900.0
57 NOPP6_EST_20090328_201500.wav 900.0
58 NOPP6_EST_20090328_203000.wav 900.0
59 NOPP6_EST_20090328_204500.wav 900.0
60 NOPP6_EST_20090328_210000.wav 900.0
61 NOPP6_EST_20090328_211500.wav 900.0
62 NOPP6_EST_20090328_213000.wav 900.0
63 NOPP6_EST_20090328_220000.wav 900.0
64 NOPP6_EST_20090328_221500.wav 900.0
65 NOPP6_EST_20090328_223000.wav 900.0
66 NOPP6_EST_20090328_224500.wav 900.0
67 NOPP6_EST_20090328_230000.wav 900.0
68 NOPP6_EST_20090328_231500.wav 900.0
69 NOPP6_EST_20090328_233000.wav 900.0
70 NOPP6_EST_20090328_234500.wav 900.0
71 NOPP6_EST_20090329_000000.wav 900.0
72 NOPP6_EST_20090329_003000.wav 900.0
73 NOPP6_EST_20090329_004500.wav 900.0
74 NOPP6_EST_20090329_010000.wav 900.0
75 NOPP6_EST_20090329_011500.wav 900.0
76 NOPP6_EST_20090329_013000.wav 900.0
77 NOPP6_EST_20090329_014500.wav 900.0
78 NOPP6_EST_20090329_020000.wav 900.0
79 NOPP6_EST_20090329_021500.wav 900.0
80 NOPP6_EST_20090329_023000.wav 900.0
81 NOPP6_EST_20090329_024500.wav 900.0
82 NOPP6_EST_20090329_030000.wav 900.0
83 NOPP6_EST_20090329_031500.wav 900.0

84 rows × 2 columns

Now that we have the file durations, we can generate our table of negative segments. We'll specify the same length (3.0 seconds). The num argument specifies the number of background segments we would like to generate. Let's make this number equal to the number of positive examples in each dataset (len(positive_train) and len(positive_test))

In [19]:
negatives_train=sl.create_rndm_backgr_selections(annotations=std_annot_train, files=file_durations_train, length=3.0, num=len(positives_train), trim_table=True)
negatives_train
Out[19]:
start end label
filename sel_id
NOPP6_EST_20090328_000000.wav 0 3.903217 6.903217 0
1 5.537173 8.537173 0
2 27.155842 30.155842 0
3 48.589104 51.589104 0
4 90.068517 93.068517 0
5 97.106125 100.106125 0
6 130.141053 133.141053 0
7 134.551215 137.551215 0
8 148.674811 151.674811 0
9 167.232753 170.232753 0
10 193.140155 196.140155 0
11 204.552811 207.552811 0
12 206.504786 209.504786 0
13 226.809061 229.809061 0
14 244.763926 247.763926 0
15 324.325925 327.325925 0
16 330.936250 333.936250 0
17 343.416685 346.416685 0
18 358.911778 361.911778 0
19 383.734205 386.734205 0
20 401.643721 404.643721 0
21 408.724698 411.724698 0
22 516.672974 519.672974 0
23 537.422873 540.422873 0
24 603.514072 606.514072 0
25 650.628613 653.628613 0
26 773.214563 776.214563 0
27 788.702702 791.702702 0
28 795.625846 798.625846 0
29 805.448154 808.448154 0
... ... ... ... ...
NOPP6_EST_20090329_031500.wav 24 467.088028 470.088028 0
25 484.465908 487.465908 0
26 490.495482 493.495482 0
27 534.085320 537.085320 0
28 543.392140 546.392140 0
29 558.707660 561.707660 0
30 567.383308 570.383308 0
31 585.873192 588.873192 0
32 592.527538 595.527538 0
33 604.180489 607.180489 0
34 605.733439 608.733439 0
35 627.867235 630.867235 0
36 630.587216 633.587216 0
37 639.222546 642.222546 0
38 642.769442 645.769442 0
39 644.821419 647.821419 0
40 715.585410 718.585410 0
41 717.894039 720.894039 0
42 718.313190 721.313190 0
43 728.203924 731.203924 0
44 778.048312 781.048312 0
45 779.882461 782.882461 0
46 795.866640 798.866640 0
47 796.353947 799.353947 0
48 814.847563 817.847563 0
49 821.248096 824.248096 0
50 837.484186 840.484186 0
51 854.602601 857.602601 0
52 871.130977 874.130977 0
53 880.082869 883.082869 0

2763 rows × 3 columns

In [21]:
negatives_test=sl.create_rndm_backgr_selections(annotations=std_annot_train, files=file_durations_test, length=3.0, num=len(positives_test), trim_table=True)
negatives_test
Out[21]:
start end label
filename sel_id
NOPP6_EST_20090329_084500.wav 0 8.603795 11.603795 0
1 14.883873 17.883873 0
2 89.923328 92.923328 0
3 131.676077 134.676077 0
4 145.002652 148.002652 0
5 169.341068 172.341068 0
6 189.527045 192.527045 0
7 226.650309 229.650309 0
8 230.808391 233.808391 0
9 307.966641 310.966641 0
10 351.378814 354.378814 0
11 356.830521 359.830521 0
12 388.342629 391.342629 0
13 441.310250 444.310250 0
14 448.546981 451.546981 0
15 501.585845 504.585845 0
16 513.518457 516.518457 0
17 628.035535 631.035535 0
18 634.467216 637.467216 0
19 700.879630 703.879630 0
20 713.383087 716.383087 0
21 719.025952 722.025952 0
22 736.362923 739.362923 0
23 741.364835 744.364835 0
24 768.035056 771.035056 0
25 795.283044 798.283044 0
26 818.071985 821.071985 0
27 880.901327 883.901327 0
28 884.248457 887.248457 0
NOPP6_EST_20090329_090000.wav 0 28.011847 31.011847 0
... ... ... ... ...
NOPP6_EST_20090329_124500.wav 25 792.271875 795.271875 0
26 847.367740 850.367740 0
NOPP6_EST_20090329_130000.wav 0 47.371578 50.371578 0
1 61.823787 64.823787 0
2 89.843122 92.843122 0
3 136.861575 139.861575 0
4 226.259463 229.259463 0
5 234.413781 237.413781 0
6 235.375820 238.375820 0
7 385.608384 388.608384 0
8 420.164919 423.164919 0
9 423.130044 426.130044 0
10 434.862795 437.862795 0
11 581.915976 584.915976 0
12 592.743423 595.743423 0
13 632.189442 635.189442 0
14 668.677738 671.677738 0
15 675.430884 678.430884 0
16 675.867658 678.867658 0
17 684.971202 687.971202 0
18 737.840801 740.840801 0
19 751.533666 754.533666 0
20 783.827134 786.827134 0
21 788.032465 791.032465 0
22 815.817842 818.817842 0
23 816.357790 819.357790 0
24 857.441977 860.441977 0
25 860.217478 863.217478 0
26 873.911652 876.911652 0
27 876.154200 879.154200 0

500 rows × 3 columns

There we have it! Now we'll just put the positives_train and negatives_train together and do the same to the test tables.

In [22]:
selections_train = positives_train.append(negatives_train, sort=False)
selections_test = positives_test.append(negatives_test, sort=False)
In [23]:
selections_train
Out[23]:
label start end
filename sel_id
NOPP6_EST_20090328_000000.wav 0 1 187.639360 190.639360
1 1 188.139360 191.139360
2 1 188.639360 191.639360
3 1 234.513519 237.513519
4 1 235.013519 238.013519
5 1 235.513519 238.513519
6 1 397.672005 400.672005
7 1 398.172005 401.172005
8 1 437.537038 440.537038
9 1 438.037038 441.037038
10 1 438.537038 441.537038
11 1 450.039691 453.039691
12 1 450.539691 453.539691
13 1 546.077120 549.077120
14 1 546.577120 549.577120
15 1 547.077120 550.077120
16 1 547.547645 550.547645
17 1 548.047645 551.047645
18 1 548.547645 551.547645
19 1 564.229033 567.229033
20 1 564.729033 567.729033
21 1 566.418305 569.418305
22 1 566.918305 569.918305
23 1 630.406477 633.406477
24 1 630.906477 633.906477
25 1 631.406477 634.406477
26 1 635.691668 638.691668
27 1 636.191668 639.191668
28 1 636.691668 639.691668
29 1 641.312003 644.312003
... ... ... ... ...
NOPP6_EST_20090329_031500.wav 24 0 467.088028 470.088028
25 0 484.465908 487.465908
26 0 490.495482 493.495482
27 0 534.085320 537.085320
28 0 543.392140 546.392140
29 0 558.707660 561.707660
30 0 567.383308 570.383308
31 0 585.873192 588.873192
32 0 592.527538 595.527538
33 0 604.180489 607.180489
34 0 605.733439 608.733439
35 0 627.867235 630.867235
36 0 630.587216 633.587216
37 0 639.222546 642.222546
38 0 642.769442 645.769442
39 0 644.821419 647.821419
40 0 715.585410 718.585410
41 0 717.894039 720.894039
42 0 718.313190 721.313190
43 0 728.203924 731.203924
44 0 778.048312 781.048312
45 0 779.882461 782.882461
46 0 795.866640 798.866640
47 0 796.353947 799.353947
48 0 814.847563 817.847563
49 0 821.248096 824.248096
50 0 837.484186 840.484186
51 0 854.602601 857.602601
52 0 871.130977 874.130977
53 0 880.082869 883.082869

5526 rows × 3 columns

In [24]:
selections_test
Out[24]:
label start end
filename sel_id
NOPP6_EST_20090329_084500.wav 0 1 890.038230 893.038230
NOPP6_EST_20090329_090000.wav 0 1 52.295923 55.295923
1 1 41.521524 44.521524
2 1 97.109988 100.109988
3 1 115.174586 118.174586
4 1 287.769407 290.769407
5 1 292.730881 295.730881
6 1 297.404447 300.404447
7 1 323.224849 326.224849
8 1 357.546480 360.546480
9 1 401.238546 404.238546
10 1 389.953811 392.953811
11 1 434.636543 437.636543
12 1 439.656984 442.656984
13 1 492.309440 495.309440
14 1 513.252469 516.252469
15 1 550.535524 553.535524
16 1 567.700942 570.700942
17 1 603.275188 606.275188
18 1 732.721823 735.721823
19 1 855.597277 858.597277
20 1 862.823188 865.823188
21 1 896.999647 899.999647
NOPP6_EST_20090329_091500.wav 0 1 3.078656 6.078656
1 1 13.938433 16.938433
2 1 22.542530 25.542530
3 1 33.216794 36.216794
4 1 41.032084 44.032084
5 1 70.538442 73.538442
6 1 120.568689 123.568689
... ... ... ... ...
NOPP6_EST_20090329_124500.wav 21 0 665.364745 668.364745
22 0 749.495614 752.495614
23 0 750.234964 753.234964
24 0 755.385768 758.385768
25 0 772.542144 775.542144
26 0 781.664112 784.664112
27 0 794.467950 797.467950
28 0 801.581981 804.581981
29 0 861.975521 864.975521
30 0 876.015338 879.015338
31 0 892.538924 895.538924
NOPP6_EST_20090329_130000.wav 0 0 24.219547 27.219547
1 0 66.293456 69.293456
2 0 84.418910 87.418910
3 0 85.629556 88.629556
4 0 118.369322 121.369322
5 0 256.307407 259.307407
6 0 278.779690 281.779690
7 0 359.376189 362.376189
8 0 430.358637 433.358637
9 0 435.174061 438.174061
10 0 462.063592 465.063592
11 0 672.391735 675.391735
12 0 673.524707 676.524707
13 0 780.381262 783.381262
14 0 786.747187 789.747187
15 0 826.441672 829.441672
16 0 847.694462 850.694462
17 0 863.730120 866.730120
18 0 896.926679 899.926679

1000 rows × 3 columns

At this point, we have defined which audio segments we want in our database: a little over 5500 in the training dataset, 50% containing upcalls and the remaing not, and 1000 for the test set, maintaining the same ratio.

Now we need to decide how these segments will be represented.

7. Choosing the spectrogram settings

As mentioned earlier, we'll represent the segments as spectrograms. In the .zip file where you found the data, there's also a spectrogram configuration file (spec_config.json) which contains the settings we want to use.

This configuration file is simply a text file in the .json format, so you could make a copy of it, change a few parameters and save several settings to use later or to share the with someone else.

In [24]:
spec_cfg = load_audio_representation('spec_config.json', name="spectrogram")
In [25]:
spec_cfg
Out[25]:
{'type': 'MagSpectrogram',
 'rate': 1000,
 'window': 0.256,
 'step': 0.032,
 'freq_min': 0,
 'freq_max': 500,
 'window_func': 'hamming'}

The result is a python dictionary. We could change some value, like the step size:

In [26]:
#spec_cfg['step'] = 0.064

But we will stick to the original here.

8. Creating the database

Now we have to compute the spectrograms following the settings above for each selection in our selection tables and then save them in a database.

All of this can be done with the dbi.create_database function in Ketos.

We will start with the training dataset. We need to indicate the name for the database we want to create, where the audio files are, a name for the dataset, the selections table and, finally the audio representation. As specified in our spec_cfg, this is a Magnitude spectrogram, but ketos can also create databases with Power, Mel and CQT spectrograms, as well as time-domain data (waveforms).

In [27]:
dbi.create_database(output_file='database.h5', data_dir='data/train',
                               dataset_name='train',selections=selections_train,
                               audio_repres=spec_cfg)
                              
100%|██████████| 5526/5526 [00:45<00:00, 122.50it/s]
5526 items saved to database.h5

And we do the same thing for the test set. Note that, by specifying the same database name, we are telling ketos that we want to add the test set to the existing database.

In [28]:
dbi.create_database(output_file='database.h5', data_dir='data/test',
                               dataset_name='test',selections=selections_test,
                               audio_repres=spec_cfg)
                              
100%|██████████| 1000/1000 [00:08<00:00, 120.57it/s]
1000 items saved to database.h5

Now we have our database with spectrograms representing audio segments with and without the North Atlantic Right Whale upcall. The data is divided into 'train' and 'test'.

In [29]:
db = dbi.open_file("database.h5", 'r')
In [30]:
db
Out[30]:
File(filename=database.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/test (Group) ''
/test/data (Table(1000,), fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(94, 129), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "label": UInt8Col(shape=(), dflt=0, pos=3),
  "offset": Float64Col(shape=(), dflt=0.0, pos=4)}
  byteorder := 'little'
  chunkshape := (5,)
/train (Group) ''
/train/data (Table(5526,), fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(94, 129), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "label": UInt8Col(shape=(), dflt=0, pos=3),
  "offset": Float64Col(shape=(), dflt=0.0, pos=4)}
  byteorder := 'little'
  chunkshape := (5,)

Here we can see the data divided into 'train' and 'test' These are called 'groups' in HDF5 terms. Within each of them there's a dataset called 'data', which contains the spectrograms and respective labels.

In [31]:
db.close() #close the database connection

You will likely not need to directly interact with the database. In a following tutorial, we will use Ketos to build a deep neural network and train it to recognize upcalls. Ketos handles the database interactions, so we won't really have to go into the details of it, but if you would like to learn more about how to get data from this database, take a look at the database_interface module in ketos and the pyTables documentation.