Database Interface

‘data_handling.database_interface’ module within the ketos library

This module provides functions to create and use HDF5 databases as storage for acoustic data including metadata and annotations.

An audio segment or spectrogram is said to be ‘weakly annotated’ if it is assigned a single (integer) label, and ‘strongly annotated’ if it is assigned one or several labels, each accompanied by a start and end time, and potentially also a minimum and maximum frequency.
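To make the distinction concrete, the two annotation styles can be pictured as plain-Python records. This is only an illustrative sketch: the class and field names below are hypothetical and are not part of ketos.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StrongAnnotation:
    """One labelled event with a time span and an optional frequency span."""
    label: int
    start: float                      # seconds
    end: float                        # seconds
    freq_min: Optional[float] = None  # Hz, optional
    freq_max: Optional[float] = None  # Hz, optional


@dataclass
class WeaklyAnnotatedSegment:
    """A segment that carries a single integer label."""
    label: int


@dataclass
class StronglyAnnotatedSegment:
    """A segment that carries one or several strong annotations."""
    annotations: List[StrongAnnotation] = field(default_factory=list)


weak = WeaklyAnnotatedSegment(label=1)
strong = StronglyAnnotatedSegment(annotations=[
    StrongAnnotation(label=1, start=0.0, end=2.0),
    StrongAnnotation(label=2, start=3.5, end=4.0, freq_min=100.0, freq_max=800.0),
])
```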

class ketos.data_handling.database_interface.AudioWriter(output_file, max_size=1000000000.0, verbose=False, mode='w', discard_wrong_shape=False, allow_resizing=1, include_source=True, include_label=True, max_filename_len=100, data_name=None)[source]

Bases: object

Saves waveform or spectrogram objects to a database file (*.h5).

If the combined size of the saved data exceeds max_size (1 GB by default), the output database file will be split into several files, with _000, _001, etc, appended to the filename.
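The naming of the split files can be sketched as follows. The helper name is hypothetical, and the exact layout (a three-digit, zero-padded counter placed between the filename base and the extension) is an assumption inferred from the description above and from the base and ext attributes, not taken from the ketos source.

```python
import os


def split_filename(output_file, file_counter):
    """Return the assumed name of the file_counter-th part of a split database."""
    base, ext = os.path.splitext(output_file)
    # Assumed layout: <base>_<counter, 3 digits><ext>
    return "{0}_{1:03d}{2}".format(base, file_counter, ext)


names = [split_filename("db.h5", i) for i in range(3)]
# names == ['db_000.h5', 'db_001.h5', 'db_002.h5']
```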

Args:
output_file: str

Full path to output database file (*.h5)

max_size: int

Maximum size of output database file in bytes. If the file exceeds this size, it will be split up into several files with _000, _001, etc., appended to the filename. The default value is max_size=1E9 (1 Gbyte). If None, no restriction is imposed on the file size (i.e. the file is never split).

verbose: bool

Print relevant information during execution such as no. of files written to disk

discard_wrong_shape: bool

Discard objects that do not have the same shape as previously saved objects. Default is False.

allow_resizing: int

If the object shape differs from previously saved objects, the object will be resized using the resize method of the scikit-image package, provided the mismatch is no greater than allow_resizing in either dimension.

include_source: bool

If True, the name of the wav file from which the waveform or spectrogram was generated and the offset within that file, is saved to the table. Default is True.

max_filename_len: int

Maximum allowed length of filename. Only used if include_source is True.

data_name: str or list(str)

Name(s) of the data columns. If None is specified, the data column is named ‘data’, or ‘data0’, ‘data1’, … if the table contains multiple data columns.

Attributes:
base: str

Output filename base

ext: str

Output filename extension (*.h5)

file: tables.File

Database file

file_counter: int

Keeps track of how many files have been written to disk

item_counter: int

Keeps track of how many audio objects have been written to files

path: str

Path to table within database filesystem

name: str

Name of table

max_size: int

Maximum size of output database file in bytes. If the file exceeds this size, it will be split up into several files with _000, _001, etc., appended to the filename. The default value is max_size=1E9 (1 Gbyte). Disabled if writing in ‘append’ mode.

verbose: bool

Print relevant information during execution such as files written to disk

mode: str
The mode to open the file. It can be one of the following:

  • ’r’: Read-only; no data can be modified.

  • ’w’: Write; a new file is created (an existing file with the same name would be deleted).

  • ’a’: Append; an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ’r+’: It is similar to ‘a’, but the file must already exist.

discard_wrong_shape: bool

Discard objects that do not have the same shape as previously saved objects. Default is False.

allow_resizing: int

If the object shape differs from previously saved objects, the object will be resized using the resize method of the scikit-image package, provided the mismatch is no greater than allow_resizing in either dimension.

num_ignore: int

Number of ignored objects

data_shape: tuple

Data shape

include_source: bool

If True, the name of the wav file from which the waveform or spectrogram was generated and the offset within that file, is saved to the table. Default is True.

include_label: bool

Include integer label column in data table. Only relevant for weakly annotated samples. Default is True.

filename_len: int

Maximum allowed length of filename. Only used if include_source is True.

data_name: str or list(str)

Name(s) of the data columns. If None is specified, the data column is named ‘data’, or ‘data0’, ‘data1’, … if the table contains multiple data columns.

close(final=True)[source]

Close the currently open database file, if any

Args:
final: bool

If True, this instance of AudioWriter will not be able to save more spectrograms to file

set_table(path, name)[source]

Change the current table

Args:
path: str

Path to the group containing the table

name: str

Name of the table

write(x, path=None, name=None)[source]

Write waveform or spectrogram object to a table in the database file

If path and name are not specified, the object will be saved to the current table (as set with the set_table() method).

Args:
x: instance of BaseAudio or list

Object(s) to be saved

path: str

Path to the group containing the table

name: str

Name of the table

ketos.data_handling.database_interface.create_database(output_file, data_dir, selections, channel=0, audio_repres={'type': 'Waveform'}, annotations=None, dataset_name=None, max_size=None, verbose=True, progress_bar=True, discard_wrong_shape=False, allow_resizing=1, include_source=True, include_label=True, data_name=None)[source]

Create a database from a selection table.

Note that all selections must have the same duration. This is necessary to ensure that all the objects stored in the database have the same dimension.
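A quick pre-check of this requirement might look like the sketch below. It is not part of ketos: the helper name is hypothetical, and the selection table is represented as a list of dictionaries with assumed 'start' and 'end' keys (in seconds) rather than as a pandas DataFrame.

```python
import math


def durations_are_equal(selections, rel_tol=1e-9):
    """Return True if every selection has the same duration (end - start)."""
    durations = [row["end"] - row["start"] for row in selections]
    return all(math.isclose(d, durations[0], rel_tol=rel_tol) for d in durations)


uniform = [{"start": 0.5, "end": 3.5}, {"start": 10.0, "end": 13.0}]
mixed = [{"start": 0.0, "end": 3.0}, {"start": 0.0, "end": 2.0}]
```

Running such a check before building the database gives an early, readable failure instead of a shape mismatch later on.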

If each entry in the selection table can have multiple annotations, these can be specified with the ‘annotations’ argument. On the other hand, if each entry in the selection table is characterized by a single, integer label, the labels should be included as a column named ‘label’ in the selection table.

If ‘dataset_name’ is not specified, the name of the folder containing the audio files (‘data_dir’) will be used.
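One plausible way to derive such a default name from the folder path is sketched below. The helper is hypothetical, and the handling of trailing path separators is an assumption rather than documented ketos behaviour.

```python
import os


def default_dataset_name(data_dir):
    """Return the name of the last folder in data_dir."""
    # normpath drops any trailing separator so that basename is not empty
    return os.path.basename(os.path.normpath(data_dir))


name = default_dataset_name("/home/user/data/train/")
# name == 'train'
```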

Args:
output_file:str

The name of the HDF5 file in which the data will be stored. Can include the path (e.g., ‘/home/user/data/database_abc.h5’). If the file does not exist, it will be created. If the file already exists, new data will be appended to it.

data_dir:str

Path to folder containing *.wav files.

selections: pandas DataFrame

Selection table

channel: int

For stereo recordings, this can be used to select which channel to read from

audio_repres: dict or list(dict)

A dictionary containing the parameters used to generate the spectrogram or waveform segments. See ketos.audio.audio_loader.AudioLoader for details on the required and optional fields for each type of signal. It is also possible to specify multiple audio representations as a list.

annotations: pandas DataFrame

Annotation table. Optional.

dataset_name:str

Name of the node (HDF5 group) within the database (e.g., ‘train’). Under this node, two datasets will be created: ‘data’ and ‘data_annot’, containing the data (spectrograms or waveforms) and the annotations for each entry in the selection table.

max_size: int

Maximum size of output database file in bytes. If the file exceeds this size, it will be split up into several files with _000, _001, etc., appended to the filename. If None (the default), no restriction is imposed on the file size (i.e. the file is never split).

verbose: bool

Print relevant information during execution such as no. of files written to disk

progress_bar: bool

Show progress bar.

discard_wrong_shape: bool

Discard objects that do not have the same shape as previously saved objects. Default is False.

allow_resizing: int

If the object shape differs from previously saved objects, the object will be resized using the resize method of the scikit-image package, provided the mismatch is no greater than allow_resizing in either dimension.

include_source: bool

If True, the name of the wav file from which the waveform or spectrogram was generated and the offset within that file, is saved to the table. Default is True.

include_label: bool

Include integer label column in data table. Only relevant for weakly annotated samples. Default is True.

data_name: str or list(str)

Name(s) of the data columns. If None is specified, the data column is named ‘data’, or ‘data0’, ‘data1’, … if the table contains multiple data columns.
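The interplay between the allow_resizing and discard_wrong_shape arguments described above can be sketched as a small decision function. This is an illustration only: the function name and the returned strings are hypothetical, no actual resizing is performed, and the source does not say what happens when the mismatch is too large and discard_wrong_shape is False, so that case is simply reported as 'mismatch'.

```python
def shape_action(shape, ref_shape, allow_resizing=1, discard_wrong_shape=False):
    """Decide what to do with an object whose shape may differ from ref_shape."""
    if tuple(shape) == tuple(ref_shape):
        return "keep"
    same_rank = len(shape) == len(ref_shape)
    # Resizing is considered only if every dimension is off by at most allow_resizing
    if same_rank and all(abs(a - b) <= allow_resizing
                         for a, b in zip(shape, ref_shape)):
        return "resize"
    # Behaviour for a large mismatch without discarding is not documented here
    return "discard" if discard_wrong_shape else "mismatch"
```

For example, with the default allow_resizing=1, a (32, 65) object would be resized to match (32, 64) objects, whereas a (32, 70) object would not.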

ketos.data_handling.database_interface.create_table(h5file, path, name, description, data_name='data', chunkshape=None, verbose=False)[source]

Create a new table.

If the table already exists, open it.

Args:
h5file: tables.file.File object

HDF5 file handler.

path: str

The group where the table will be located. Ex: ‘/features/spectrograms’

name: str

The name of the table.

description: class (tables.IsDescription)

The class describing the table structure.

data_name: str or list(str)

Name(s) of the table column(s) used to store the data array(s).

chunkshape: tuple

The chunk shape to be used for compression

Returns:
table: tables.Table object

The created or opened table.

Examples:
>>> import tables
>>> from ketos.data_handling.database_interface import open_file, table_description, create_table
>>> # Open a connection to the database
>>> h5file = open_file("ketos/tests/assets/tmp/database1.h5", 'w')
>>> # Create table descriptions for weakly labeled spectrograms with shape (32,64)
>>> descr = table_description((32,64), include_label=False)
>>> # Create 'table_data' within 'group1'
>>> my_table = create_table(h5file, "/group1/", "table_data", descr) 
>>> # Show the table description, with the field names (columns)
>>> # and information about types and shapes
>>> my_table
/group1/table_data (Table(0,), fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(32, 64), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "offset": Float64Col(shape=(), dflt=0.0, pos=3)}
  byteorder := 'little'
  chunkshape := (15,)
>>> # Close the HDF5 database file
>>> h5file.close()            
ketos.data_handling.database_interface.filter_by_label(table, label)[source]

Find all audio objects in the table with the specified label.

Args:
table: tables.Table

The table containing the annotations

label: int or list of ints

The labels to be searched

Raises:

TypeError: if label is not an int or list of ints.

Returns:
indices: list(int)

Indices of the audio objects with the specified label(s). If there are no objects that match the label, returns an empty list.
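The behaviour described above can be mimicked in pure Python, which may help when reasoning about the expected output. This is a sketch and not the ketos implementation: the annotation table is replaced by a list of dictionaries with 'data_index' and 'label' keys, and the function name is hypothetical.

```python
def filter_by_label_sketch(rows, label):
    """Return the sorted, unique data indices whose label matches."""
    if isinstance(label, int):
        labels = [label]
    elif isinstance(label, list) and all(isinstance(v, int) for v in label):
        labels = label
    else:
        raise TypeError("label must be an int or a list of ints")
    wanted = set(labels)
    return sorted({row["data_index"] for row in rows if row["label"] in wanted})


rows = [
    {"data_index": 0, "label": 1},
    {"data_index": 0, "label": 3},
    {"data_index": 1, "label": 1},
    {"data_index": 2, "label": 2},
]
```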

Examples:
>>> from ketos.data_handling.database_interface import open_file, open_table, filter_by_label
>>>
>>> # Open a database and an existing table
>>> h5file = open_file("ketos/tests/assets/11x_same_spec.h5", 'r')
>>> table = open_table(h5file, "/group_1/table_annot")
>>>
>>> # Retrieve the indices for all spectrograms that contain the label 1
>>> # (all spectrograms in this table)
>>> filter_by_label(table, 1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
>>> # Since none of the spectrograms in the table include the label 4, 
>>> # an empty list is returned
>>> filter_by_label(table, 4)
[]
>>> h5file.close()
ketos.data_handling.database_interface.load_audio(table, indices=None, table_annot=None, stack=False)[source]

Retrieve all the audio objects in a table, or a subset specified by the indices argument.

Warnings: Loading all objects in a table might cause memory problems.
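One way to limit memory use is to load the table in batches of indices instead of all at once. The generator below is a minimal sketch with a hypothetical name; it only produces the index lists and does not call ketos itself.

```python
def index_batches(num_rows, batch_size):
    """Yield consecutive lists of row indices, each at most batch_size long."""
    for first in range(0, num_rows, batch_size):
        yield list(range(first, min(first + batch_size, num_rows)))


batches = list(index_batches(11, 4))
# batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
```

Each batch could then be passed as the indices argument of load_audio, with num_rows taken from the nrows attribute of the table.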

Args:
table: tables.Table

The table containing the audio objects

indices: list of ints or None

A list with the indices of the audio objects that will be retrieved. If set to None, loads all objects in the table.

table_annot: tables.Table

The table containing the annotations. If no such table is provided, the audio objects are still loaded, but without annotations.

stack: bool

Stack the audio objects into a single object

Returns:
audio_objs: list or instance of Waveform, MagSpectrogram, PowerSpectrogram, MelSpectrogram, CQTSpectrogram

Audio objects, or numpy array

Examples:
>>> from ketos.data_handling.database_interface import open_file, open_table, load_audio
>>> # Open a connection to the database.
>>> h5file = open_file("ketos/tests/assets/11x_same_spec.h5", 'r')
>>> # Open the tables in group_1
>>> tbl_data = open_table(h5file,"/group_1/table_data")
>>> tbl_annot = open_table(h5file,"/group_1/table_annot")    
>>> # Load the spectrograms stored on rows 0, 3 and 10, including their annotations
>>> selected_specs = load_audio(table=tbl_data, table_annot=tbl_annot, indices=[0,3,10])
>>> # The resulting list has the 3 spectrogram objects
>>> len(selected_specs)
3
>>> type(selected_specs[0])
<class 'ketos.audio.spectrogram.MagSpectrogram'>
>>>
>>> h5file.close()
ketos.data_handling.database_interface.open_file(path, mode)[source]

Open an HDF5 database file.

Wrapper function around tables.open_file: https://www.pytables.org/usersguide/libref/top_level.html

Args:
path: str

The file’s full path.

mode: str
The mode to open the file. It can be one of the following:
  • ’r’: Read-only; no data can be modified.

  • ’w’: Write; a new file is created (an existing file with the same name would be deleted).

  • ’a’: Append; an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ’r+’: It is similar to ‘a’, but the file must already exist.
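The effect of each mode on an existing or missing file can be summarised with the sketch below. It does not open any HDF5 file: the function name and the returned strings are hypothetical, and FileNotFoundError is used as a stand-in because the exception actually raised by the underlying library for a missing file is not stated here.

```python
import os
import tempfile


def check_mode(path, mode):
    """Describe what opening path with mode would do, following the list above."""
    exists = os.path.exists(path)
    if mode in ("r", "r+"):
        if not exists:
            raise FileNotFoundError(path)  # stand-in exception type
        return "read existing" if mode == "r" else "update existing"
    if mode == "w":
        return "overwrite existing" if exists else "create new"
    if mode == "a":
        return "update existing" if exists else "create new"
    raise ValueError("unknown mode: %r" % mode)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "db.h5")
    outcome_w = check_mode(db_path, "w")   # the file does not exist yet
    open(db_path, "wb").close()            # create an empty placeholder file
    outcome_a = check_mode(db_path, "a")   # the file now exists
```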

Returns:
h5file: tables.File object

The opened HDF5 file.

ketos.data_handling.database_interface.open_table(h5file, table_path)[source]

Open a table from an HDF5 file.

Args:
h5file: tables.file.File object

HDF5 file handler.

table_path: str

The table’s full path.

Raises:

NoSuchNodeError if table does not exist.

Returns:
table: tables.Table object

The table, if it exists. Otherwise, a NoSuchNodeError exception is raised.

Examples:
>>> from ketos.data_handling.database_interface import open_file, open_table
>>> h5file = open_file("ketos/tests/assets/15x_same_spec.h5", 'r')
>>> data = open_table(h5file, "/train/species1")
>>> #data is a pytables 'Table' object
>>> type(data)
<class 'tables.table.Table'>
>>> # with 15 items (rows)
>>> data.nrows
15
>>> h5file.close()       
ketos.data_handling.database_interface.table_description(data_shape, data_name=None, include_label=True, include_source=True, filename_len=100, return_data_name=False)[source]

Description of table structure for storing audio signals or spectrograms.

Args:
data_shape: tuple (ints) or numpy array or instance of audio.base_audio.BaseAudio or list

The shape of the waveform or spectrogram to be stored in the table. If a numpy array is provided, the shape is deduced from this array. If an instance of BaseAudio is provided, the shape is deduced from the data attribute. It is also possible to specify a list of data shapes, in which case the table will have multiple data columns.

data_name: str or list(str)

Name(s) of the data columns. If None is specified, the data column is named ‘data’, or ‘data0’, ‘data1’, … if the table contains multiple data columns.

include_label: bool

Include integer label column. Default is True.

include_source: bool

If True, the name of the wav file from which the audio signal or spectrogram was generated and the placement within that file, is saved to the table. Default is True.

filename_len: int

Maximum allowed length of filename. Only used if include_source is True.

return_data_name: bool

Return the names of the columns used to store the data arrays.

Returns:
TableDescription: class (tables.IsDescription)

The class describing the table structure.

data_name: list(str)

The names of the columns used to store the data arrays. Only returned if return_data_name=True.
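The default naming rule for the data columns, as described for the data_name argument above, can be sketched as follows. The helper name is hypothetical and the code is an illustration of the stated rule, not the ketos implementation.

```python
def default_data_names(num_columns, data_name=None):
    """Return the data column names for a table with num_columns data columns."""
    if data_name is not None:
        # Accept either a single name or a list of names
        return [data_name] if isinstance(data_name, str) else list(data_name)
    if num_columns == 1:
        return ["data"]
    return ["data%d" % i for i in range(num_columns)]
```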

Examples:
>>> import numpy as np
>>> from ketos.data_handling.database_interface import table_description, table_description_annot
>>> 
>>> #Create a 64 x 20 image
>>> spec = np.random.random_sample((64,20))
>>>
>>> #Create a table description for weakly labeled spectrograms of this shape
>>> descr = table_description(spec)
>>>
>>> #Inspect the table structure
>>> cols = descr.columns
>>> for key in sorted(cols.keys()):
...     print("%s: %s" % (key, cols[key]))
data: Float32Col(shape=(64, 20), dflt=0.0, pos=None)
filename: StringCol(itemsize=100, shape=(), dflt=b'', pos=None)
id: UInt32Col(shape=(), dflt=0, pos=None)
label: UInt8Col(shape=(), dflt=0, pos=None)
offset: Float64Col(shape=(), dflt=0.0, pos=None)
>>>
>>> #Create a table description for strong annotations
>>> descr_annot =  table_description_annot()
>>>
>>> #Inspect the annotation table structure
>>> cols = descr_annot.columns
>>> for key in sorted(cols.keys()):
...     print("%s: %s" % (key, cols[key]))
data_index: UInt32Col(shape=(), dflt=0, pos=None)
end: Float64Col(shape=(), dflt=0.0, pos=None)
label: UInt8Col(shape=(), dflt=0, pos=None)
start: Float64Col(shape=(), dflt=0.0, pos=None)
ketos.data_handling.database_interface.table_description_annot(freq_range=False)[source]

Table descriptions for strong annotations.

Args:
freq_range: bool

Set to True, if your annotations include frequency range. Otherwise, set to False (default). Only used for strong annotations.

Returns:
TableDescription: class (tables.IsDescription)

The class describing the table structure.
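The effect of the freq_range argument on the table columns can be pictured with the sketch below. The first four column names match the annotation table shown in the table_description example; the names of the two frequency columns ('freq_min' and 'freq_max') are an assumption, as is the helper name.

```python
def annot_columns(freq_range=False):
    """Return the column names of a strong-annotation table."""
    columns = ["data_index", "label", "start", "end"]
    if freq_range:
        # Assumed names for the optional frequency-range columns
        columns += ["freq_min", "freq_max"]
    return columns
```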

ketos.data_handling.database_interface.write(x, table, table_annot=None, id=None)[source]

Write waveform or spectrogram and annotations to HDF5 tables.

Note: If the id argument is not specified, the row number will be used as a unique identifier for the spectrogram.

When multiple audio objects are provided, only the filename, offset, label, and annotations of the first object are written to the table.
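The two rules above can be illustrated with a short sketch. It is not the ketos implementation: audio objects are represented by plain dictionaries, the helper name is hypothetical, and the row number is assumed to be the number of rows already present in the table.

```python
def row_metadata(objects, num_rows, id=None):
    """Build the metadata of a new row from a list of audio objects."""
    first = objects[0]  # only the first object contributes metadata
    return {
        "id": num_rows if id is None else id,  # default: row number (assumed)
        "filename": first["filename"],
        "offset": first["offset"],
        "label": first["label"],
    }


objs = [
    {"filename": "a.wav", "offset": 0.0, "label": 1},
    {"filename": "b.wav", "offset": 5.0, "label": 2},
]
meta = row_metadata(objs, num_rows=7)
```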

Args:
x: instance of audio.waveform.Waveform, audio.spectrogram.MagSpectrogram, audio.spectrogram.PowerSpectrogram, audio.spectrogram.MelSpectrogram, audio.spectrogram.CQTSpectrogram, or numpy.ndarray

The audio object to be stored in the table. It is also possible to specify a list of audio objects. The number of objects must match the number of data columns in the table.

table: tables.Table

Table in which the audio data will be stored. (described by table_description()).

table_annot: tables.Table

Table in which the annotations will be stored (described by table_description_annot()).

id: int

Audio object unique identifier. Optional.

Returns:

None.

Examples:
>>> import tables
>>> from ketos.data_handling.database_interface import open_file, create_table, table_description, table_description_annot, write
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> from ketos.audio.waveform import Waveform
>>>
>>> # Create a Waveform object from a .wav file
>>> audio = Waveform.from_wav('ketos/tests/assets/2min.wav')
>>> # Use that signal to create a spectrogram
>>> spec = MagSpectrogram.from_waveform(audio, window=0.2, step=0.05)
>>> # Add a single annotation
>>> spec.annotate(label=1, start=0., end=2.)
>>>
>>> # Open a connection to a new HDF5 database file
>>> h5file = open_file("ketos/tests/assets/tmp/database2.h5", 'w')
>>> # Create table descriptions for storing the spectrogram data
>>> descr_data = table_description(spec)
>>> descr_annot = table_description_annot()
>>> # Create tables
>>> tbl_data = create_table(h5file, "/group1/", "table_data", descr_data) 
>>> tbl_annot = create_table(h5file, "/group1/", "table_annot", descr_annot) 
>>> # Write spectrogram and its annotation to the tables
>>> write(spec, tbl_data, tbl_annot)
>>> # flush memory to ensure data is put in the tables
>>> tbl_data.flush()
>>> tbl_annot.flush()
>>>
>>> # Check that the spectrogram data have been saved 
>>> tbl_data.nrows
1
>>> tbl_annot.nrows
1
>>> # Check annotation data
>>> tbl_annot[0]['label']
1
>>> tbl_annot[0]['start']
0.0
>>> tbl_annot[0]['end']
2.0
>>> # Check audio source data
>>> tbl_data[0]['filename'].decode()
'2min.wav'
>>> h5file.close()
ketos.data_handling.database_interface.write_annot(table, data_index, annots)[source]

Write annotations to a HDF5 table.

Args:
table: tables.Table

Table in which the annotations will be stored (described by table_description_annot()).

data_index: int

Audio object unique identifier.

annots: pandas DataFrame

Annotations

Returns:

None.

ketos.data_handling.database_interface.write_attrs(table, x)[source]

Writes the spectrogram attributes into the HDF5 table.

The attributes include:

  • Time resolution in seconds (time_res)

  • Minimum frequency in Hz (freq_min)

  • Spectrogram type (type)

  • Frequency resolution in Hz (freq_res) or, in the case of CQT spectrograms, the number of bins per octave (bins_per_octave).
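The attribute set listed above can be sketched as a dictionary builder. This is only an illustration: the function name is hypothetical, and encoding the spectrogram type as the class name (for example 'CQTSpectrogram') is an assumption about how the type is stored.

```python
def spectrogram_attrs(spec_type, time_res, freq_min,
                      freq_res=None, bins_per_octave=None):
    """Collect the attributes that describe a spectrogram."""
    attrs = {"type": spec_type, "time_res": time_res, "freq_min": freq_min}
    if spec_type == "CQTSpectrogram":
        # CQT spectrograms are described by bins per octave instead of freq_res
        attrs["bins_per_octave"] = bins_per_octave
    else:
        attrs["freq_res"] = freq_res
    return attrs


mag = spectrogram_attrs("MagSpectrogram", time_res=0.05, freq_min=0.0, freq_res=5.0)
cqt = spectrogram_attrs("CQTSpectrogram", time_res=0.01, freq_min=20.0, bins_per_octave=32)
```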

Args:
table: tables.Table

Table in which the spectrogram will be stored (described by table_description()).

x: instance of spectrogram.MagSpectrogram, spectrogram.PowerSpectrogram, spectrogram.MelSpectrogram, spectrogram.CQTSpectrogram, or numpy.array

The audio object to be stored in the table.

Returns:

None.

ketos.data_handling.database_interface.write_audio(table, data, filename=None, offset=0, label=None, id=None)[source]

Write an audio object, typically a waveform or spectrogram, to a HDF5 table.

Args:
table: tables.Table

Table in which the audio data will be stored. (described by table_description()).

data: numpy.array or list(numpy.array)

Audio data array(s). The number of data arrays must match the number of data columns in the table.

filename: str

Filename

offset: float

Offset with respect to beginning of file in seconds.

label: int

Integer valued label. Optional

id: int

Unique identifier. Optional

Returns:
index: int

Index of row that the audio object was saved to.
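The contract described above (one data array per data column, and the row index as return value) can be illustrated with an in-memory stand-in. Nothing here touches HDF5 or ketos: the class and function names are hypothetical, arrays are represented by tuples, raising ValueError on a column-count mismatch is an assumption, and the default id follows the row-number rule noted under write().

```python
class InMemoryTable:
    """Minimal stand-in for a table with named data columns."""

    def __init__(self, data_names):
        self.data_names = list(data_names)
        self.rows = []


def write_audio_sketch(table, data, filename=None, offset=0, label=None, id=None):
    """Append one row to the stand-in table and return its index."""
    arrays = data if isinstance(data, list) else [data]
    if len(arrays) != len(table.data_names):
        raise ValueError("number of data arrays must match number of data columns")
    index = len(table.rows)
    row = dict(zip(table.data_names, arrays))
    row.update(filename=filename, offset=offset, label=label,
               id=index if id is None else id)
    table.rows.append(row)
    return index


table = InMemoryTable(["data0", "data1"])
first = write_audio_sketch(table, [(1.0, 2.0), (3.0,)], filename="a.wav")
second = write_audio_sketch(table, [(4.0, 5.0), (6.0,)], filename="a.wav", offset=2.0)
```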