Overview

The data handling modules provide high-level interfaces for storing audio samples in databases along with relevant metadata and annotations, and for retrieving stored data for efficient ingestion into neural networks. Ketos uses the HDF5 database format, a file format designed to store and organize large amounts of data which is widely used in scientific computing. The data handling modules also provide high-level functionalities for working with annotation data and selection tables.

Annotation and Selection Tables

The Selection Table module provides functions for manipulating annotation tables and creating selection tables. The tables are saved in .csv format and loaded into memory as pandas DataFrames.

A Ketos annotation table always has the column ‘label’. For call-level annotations, the table also contains the columns ‘start’ and ‘end’, giving the start and end time of the call measured in seconds since the beginning of the file. The table may also contain the columns ‘freq_min’ and ‘freq_max’, giving the minimum and maximum frequencies of the call in Hz, but this is not required. The user may add any number of additional columns. Note that the table uses two levels of indices, the first index being the filename and the second index an annotation identifier.

Here is a minimal example:

                    label
filename  annot_id
file1.wav 0             2
          1             1
          2             2
file2.wav 0             2
          1             2
          2             1

And here is a table with call-level annotations and a few extra columns:

                     start   end  label  freq_min  freq_max        file_time_stamp
filename  annot_id
file1.wav 0           7.0    8.1      2     180.6     294.3    2019-02-24 13:15:00
          1           8.5   12.5      1     174.2     258.7    2019-02-24 13:15:00
          2          13.1   14.0      2     183.4     292.3    2019-02-24 13:15:00
file2.wav 0           2.2    3.1      2     148.8     286.6    2019-02-24 13:30:00
          1           5.8    6.8      2     156.6     278.3    2019-02-24 13:30:00
          2           9.0   13.0      1     178.2     304.5    2019-02-24 13:30:00
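The two-level index described above can be reproduced with plain pandas. The following is a minimal sketch that uses only pandas, not Ketos itself; the variable names are illustrative, and the CSV round trip goes through an in-memory buffer rather than a real file.

```python
import io
import pandas as pd

# Flat annotation records (filename, start, end, label); values taken from the table above.
records = [
    ("file1.wav", 7.0, 8.1, 2),
    ("file1.wav", 8.5, 12.5, 1),
    ("file1.wav", 13.1, 14.0, 2),
    ("file2.wav", 2.2, 3.1, 2),
    ("file2.wav", 5.8, 6.8, 2),
    ("file2.wav", 9.0, 13.0, 1),
]
flat = pd.DataFrame(records, columns=["filename", "start", "end", "label"])

# Number the annotations within each file, then index by (filename, annot_id).
flat["annot_id"] = flat.groupby("filename").cumcount()
annot_table = flat.set_index(["filename", "annot_id"]).sort_index()

# Select all annotations belonging to one file via the first index level.
file2_annots = annot_table.loc["file2.wav"]

# Round trip through CSV, restoring both index levels on load.
buffer = io.StringIO()
annot_table.to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col=[0, 1])
```

Passing index_col=[0, 1] to read_csv is what restores the filename and annot_id levels; without it, both would be loaded as ordinary columns.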

Selection tables look similar to annotation tables, except that they are not required to have a ‘label’ column. Instead, they typically only have the columns ‘start’ and ‘end’, supplemented by a selection index and a filename index.

When working with annotation tables, the first step is typically to standardize the table format to match the format expected by Ketos. For example, given the annotation table:

>>> import pandas as pd
>>> annot = pd.read_csv('annotations.csv')
>>> annot
      source  start_time  stop_time       species           time_stamp
0  file1.wav         7.0        8.1      humpback  2019-02-24 13:15:00
1  file1.wav         8.5       12.5  killer whale  2019-02-24 13:15:00
2  file2.wav         2.2        3.1  killer whale  2019-02-24 13:30:00
3  file2.wav         5.8        6.8          boat  2019-02-24 13:30:00
4  file2.wav         9.0       13.0      humpback  2019-02-24 13:30:00

we apply the standardize() method to obtain:

>>> from ketos.data_handling.selection_table import standardize
>>> annot_std, label_dict = standardize(annot, mapper={'source':'filename', 'start_time':'start', 'stop_time':'end', 'species':'label'}, return_label_dict=True)
>>> label_dict
{'boat': 1, 'humpback': 2, 'killer whale': 3}
>>> annot_std
                    start   end  label           time_stamp
filename  annot_id
file1.wav 0           7.0   8.1      2  2019-02-24 13:15:00
          1           8.5  12.5      3  2019-02-24 13:15:00
file2.wav 0           2.2   3.1      3  2019-02-24 13:30:00
          1           5.8   6.8      1  2019-02-24 13:30:00
          2           9.0  13.0      2  2019-02-24 13:30:00
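To make the transformation concrete, here is a rough pandas-only sketch of the three steps visible in the output above: renaming columns, mapping labels to integers, and building the two-level index. It is not the Ketos implementation. In particular, the rule of numbering labels alphabetically from 1 is an assumption inferred from the label_dict shown above.

```python
import pandas as pd

raw = pd.DataFrame({
    "source": ["file1.wav", "file1.wav", "file2.wav", "file2.wav", "file2.wav"],
    "start_time": [7.0, 8.5, 2.2, 5.8, 9.0],
    "stop_time": [8.1, 12.5, 3.1, 6.8, 13.0],
    "species": ["humpback", "killer whale", "killer whale", "boat", "humpback"],
})

# Step 1: rename the columns to the standard names.
mapper = {"source": "filename", "start_time": "start",
          "stop_time": "end", "species": "label"}
df = raw.rename(columns=mapper)

# Step 2: map string labels to integers.
# Assumption: labels are sorted alphabetically and numbered from 1.
label_map = {name: i for i, name in enumerate(sorted(df["label"].unique()), start=1)}
df["label"] = df["label"].map(label_map)

# Step 3: number annotations within each file and build the two-level index.
df["annot_id"] = df.groupby("filename").cumcount()
df = df.set_index(["filename", "annot_id"]).sort_index()
```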

Having transformed the annotation table to the standard Ketos format, we can now use it to create a selection table. The Selection Table module provides a few methods for this task such as select(), select_by_segmenting(), and create_rndm_backgr_selections(). Here, we will demonstrate a simple use case of the select() method:

>>> from ketos.data_handling.selection_table import select
>>> st = select(annot_std, length=6.0, center=True) #create 6-s wide selection windows, centered on each annotation
>>> st
                  label           time_stamp  start    end
filename  sel_id
file1.wav 0           2  2019-02-24 13:15:00   4.55  10.55
          1           3  2019-02-24 13:15:00   7.50  13.50
file2.wav 0           3  2019-02-24 13:30:00  -0.35   5.65
          1           1  2019-02-24 13:30:00   3.30   9.30
          2           2  2019-02-24 13:30:00   8.00  14.00
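The window boundaries in this table follow from simple arithmetic: each window is centered on the midpoint of its annotation and extends half the window length to either side. The sketch below reproduces that arithmetic with pandas only. It illustrates the centered case shown above and is not how select() is implemented.

```python
import pandas as pd

# Annotation start and end times (in seconds) from the standardized table.
annot = pd.DataFrame({
    "start": [7.0, 8.5, 2.2, 5.8, 9.0],
    "end": [8.1, 12.5, 3.1, 6.8, 13.0],
})

length = 6.0  # window length in seconds

# Midpoint of each annotation.
center = (annot["start"] + annot["end"]) / 2

# Window of the requested length, centered on the midpoint.
windows = pd.DataFrame({
    "start": (center - length / 2).round(2),
    "end": (center + length / 2).round(2),
})
```

For the first annotation, the midpoint is (7.0 + 8.1) / 2 = 7.55 s, so the window runs from 4.55 s to 10.55 s. For the third annotation, the midpoint is 2.65 s, so the window starts at -0.35 s, that is, before the beginning of the file, as in the table above.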

Based on this selection table, one can create a database of sound clips using create_database(), as discussed below.

The Selection Table module provides several other useful methods, e.g., for querying annotation tables. See the documentation of the Selection Table module for more information.

Database Interface

The Database Interface module provides high-level functions for managing audio data stored in HDF5 databases. For the implementation of these functionalities, we rely extensively on the PyTables package.

The AudioWriter class provides a convenient interface for saving Ketos audio objects such as Waveform or Spectrogram to a database:

>>> from ketos.data_handling.database_interface import AudioWriter
>>> aw = AudioWriter('db.h5') #create an audio writer instance
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> spec = MagSpectrogram.from_wav('sound.wav', window=0.2, step=0.01) #load a spectrogram
>>> aw.write(spec) #save the spectrogram to the database (by default, the spectrogram is stored under /audio)
>>> aw.close() #close the database file

The spectrogram is saved along with relevant metadata such as the filename, the window and step sizes used, etc. Any annotations associated with the spectrogram are also saved.

The spectrogram can be loaded back into memory as follows:

>>> import ketos.data_handling.database_interface as dbi
>>> fil = dbi.open_file('db.h5', 'r')
>>> tbl = dbi.open_table(fil, '/audio')
>>> spec = dbi.load_audio(tbl)[0] #load the first audio object stored in the table
>>> fil.close() #close the database file

The Database Interface module provides several other useful methods, including create_database() for creating a database of audio samples directly from a set of .wav files.

See the documentation of the Database Interface module for more information.

Data Feeding

The ketos.data_handling.data_feeding.BatchGenerator class provides a high-level interface for loading waveform and spectrogram objects stored in the Ketos HDF5 database format and feeding them in batches to a machine learning model. See the class documentation for more information.
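The general idea of batch feeding can be illustrated without Ketos. The sketch below is a generic NumPy generator that splits in-memory arrays into batches. It does not reproduce the BatchGenerator API, and the function name iter_batches and its parameters are hypothetical.

```python
import numpy as np

def iter_batches(data, labels, batch_size, shuffle=False, seed=None):
    """Yield (data, labels) batches; the last batch may be smaller.

    Hypothetical helper for illustration only, not part of Ketos.
    """
    n = len(data)
    order = np.arange(n)
    if shuffle:
        # Shuffle the sample order once before batching.
        np.random.default_rng(seed).shuffle(order)
    for first in range(0, n, batch_size):
        idx = order[first:first + batch_size]
        yield data[idx], labels[idx]

# Stand-in data: 10 dummy "spectrograms" of shape 4x3 with binary labels.
specs = np.zeros((10, 4, 3))
labels = np.arange(10) % 2

batches = list(iter_batches(specs, labels, batch_size=4))
batch_shapes = [x.shape for x, _ in batches]
```

With 10 samples and a batch size of 4, this yields two full batches and a final batch of 2 samples.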