AudioLoader

class ketos.audio.audio_loader.AudioLoader(selection_gen, channel=0, annotations=None, representation=<class 'ketos.audio.waveform.Waveform'>, representation_params=None, batch_size=1, stop=True, **kwargs)

Class for loading segments of audio data from .wav files.

Several representations of the audio data are possible, including waveform, magnitude spectrogram, power spectrogram, mel spectrogram, and CQT spectrogram.

Args:
selection_gen: SelectionGenerator

Selection generator

channel: int

For stereo recordings, selects which channel to read from.

annotations: pandas DataFrame

Annotation table

representation: class or list of classes

Audio data representation. This is a class that receives the raw audio data and transforms it into the specified audio representation object.

Classes available in ketos, each followed by the arguments it accepts (optional arguments in parentheses):

  • Waveform:

    (rate), (resample_method)

  • MagSpectrogram, PowerSpectrogram, MelSpectrogram:

    audio, window, step, (window_func), (rate), (resample_method)

  • CQTSpectrogram:

    audio, step, bins_per_oct, (freq_min), (freq_max), (window_func), (rate), (resample_method)

It is also possible to specify multiple audio representations as a list.

representation_params: dict or list of dict

Dictionary containing any required and optional arguments for the representation class. If more than one representation is given, representation_params must be a list of the same length and in the same order.

batch_size: int

Load segments in batches rather than one at a time.

stop: bool

Raise StopIteration when all selections have been loaded. Default is True.
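
The interplay between batch_size and stop can be illustrated with a small, self-contained sketch. The helper iterate_batches below is hypothetical and is not part of ketos. It assumes that the final batch may be shorter than batch_size when stop is True, and that with stop set to False iteration wraps around to the first selection; neither detail is stated above, so treat both as assumptions.

```python
from itertools import islice
from typing import Iterator, List, Sequence


def iterate_batches(selections: Sequence[str], batch_size: int = 1,
                    stop: bool = True) -> Iterator[List[str]]:
    """Yield selections in batches of `batch_size` (hypothetical helper).

    stop=True : iteration ends once every selection has been yielded.
    stop=False: assumed to wrap around to the first selection indefinitely.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not selections:
        return
    pos = 0
    while True:
        batch = []
        for _ in range(batch_size):
            if pos >= len(selections):
                if stop:
                    break
                pos = 0  # assumed wrap-around behaviour
            batch.append(selections[pos])
            pos += 1
        if not batch:
            return
        yield batch
        if stop and pos >= len(selections):
            return


# Finite iteration: the last batch holds the single remaining selection.
finite = list(iterate_batches(["a", "b", "c"], batch_size=2))
print(finite)   # [['a', 'b'], ['c']]

# Endless iteration: take only the first three batches.
endless = list(islice(iterate_batches(["a", "b", "c"], batch_size=2, stop=False), 3))
print(endless)  # [['a', 'b'], ['c', 'a'], ['b', 'c']]
```

With stop set to False the caller has to bound the loop itself, for example with itertools.islice as shown.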

Examples:

Creating an AudioLoader to load selections:

>>> from ketos.audio.audio_loader import AudioLoader, SelectionTableIterator
>>> from ketos.data_handling.selection_table import use_multi_indexing
>>> import pandas as pd
>>> # Load the audio representation you want to pass
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> # specify the parameters of the audio representation
>>> rep = {'window':0.2, 'step':0.02, 'window_func':'hamming'}
>>> # Load selections
>>> sel = pd.DataFrame({'filename':["2min.wav", "2min.wav"],'start':[0.10,0.12],'end':[0.46,0.42]})
>>> sel = use_multi_indexing(sel, 'sel_id')
>>> # create a generator for iterating over all the selections 
>>> generator = SelectionTableIterator(data_dir="ketos/tests/assets/", selection_table=sel)
>>> # Create a loader by passing the generator and the representation to the AudioLoader
>>> loader = AudioLoader(selection_gen=generator, representation=MagSpectrogram, representation_params=rep)
>>> # print number of segments
>>> print(loader.num())
2
>>> # load and plot the first selection
>>> spec = next(loader)
>>>
>>> import matplotlib.pyplot as plt
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/spec_loader_2min_0.png")
>>> plt.close(fig)
[Figure: spec_loader_2min_0.png, spectrogram of the first selection]

Creating an AudioLoader to load selections made from annotations:

>>> from ketos.audio.audio_loader import AudioLoader, SelectionTableIterator
>>> from ketos.data_handling.selection_table import standardize
>>> import pandas as pd
>>> # Load the audio representation you want to pass
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> # specify the parameters of the audio representation
>>> rep = {'window':0.2, 'step':0.02, 'window_func':'hamming'}
>>> # Create the annotation table
>>> annot = pd.DataFrame([{"filename":"2min.wav", "start":2.0, "end":3.0, "label":0},
...         {"filename":"2min.wav", "start":5.0, "end":6.0, "label":0},
...         {"filename":"2min.wav", "start":21.0, "end":22.0, "label":0},
...         {"filename":"2min.wav", "start":25.0, "end":27.0, "label":0}])
>>> annot_std = standardize(table=annot)
>>> # create a generator for iterating over all the selections 
>>> generator = SelectionTableIterator(data_dir="ketos/tests/assets/", selection_table=annot_std)
>>> # Create a loader by passing the generator and the representation to the AudioLoader
>>> loader = AudioLoader(selection_gen=generator, representation=MagSpectrogram, representation_params=rep)
>>> # print number of segments
>>> print(loader.num())
4
>>> # load and plot the first selection
>>> spec = next(loader)
>>>
>>> import matplotlib.pyplot as plt
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/spec_loader_2min_1.png")
>>> plt.close(fig)
[Figure: spec_loader_2min_1.png, spectrogram of the first selection]

For more examples, see the child class ketos.audio.audio_loader.AudioFrameLoader.
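
The rule for combining several representations (a list of classes paired with a list of parameter dictionaries of the same length and in the same order) can be sketched without ketos installed. The function pair_representations and the empty stand-in classes below are hypothetical and only illustrate the pairing rule; they do not show how ketos validates its arguments internally.

```python
from typing import Any, Dict, List, Tuple, Union

ParamDict = Dict[str, Any]


class Waveform:
    """Empty stand-in for ketos.audio.waveform.Waveform."""


class MagSpectrogram:
    """Empty stand-in for ketos.audio.spectrogram.MagSpectrogram."""


def pair_representations(
    representation: Union[type, List[type]],
    representation_params: Union[None, ParamDict, List[ParamDict]] = None,
) -> List[Tuple[type, ParamDict]]:
    """Return (class, params) pairs, matched by position (hypothetical helper)."""
    reps = representation if isinstance(representation, list) else [representation]
    if representation_params is None:
        params = [{} for _ in reps]          # no parameters given
    elif isinstance(representation_params, dict):
        params = [representation_params]     # single representation
    else:
        params = list(representation_params)
    if len(params) != len(reps):
        raise ValueError(
            f"got {len(reps)} representation(s) but {len(params)} parameter dict(s)"
        )
    return list(zip(reps, params))


pairs = pair_representations(
    [Waveform, MagSpectrogram],
    [{"rate": 1000}, {"window": 0.2, "step": 0.02, "window_func": "hamming"}],
)
for cls, params in pairs:
    print(cls.__name__, params)
```

Passing two classes with a single dictionary raises ValueError in this sketch, which mirrors the requirement that both lists have the same length.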

Methods

load(data_dir, filename[, offset, duration, ...])

Load audio segment for specified file and time.

num()

Returns total number of segments.

reset()

Resets the audio loader to the beginning.

skip()

Skip to the next audio segment or batch of audio segments without loading the current one.

load(data_dir, filename, offset=0, duration=None, label=None, **kwargs)

Load audio segment for specified file and time.

Args:
data_dir: str

Data directory

filename: str

Filename or relative path

offset: float

Start time of the segment in seconds, measured from the beginning of the file.

duration: float

Duration of segment in seconds.

label: int

Integer label

Returns:
seg: BaseAudio or list(BaseAudio)

Audio segment
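
As an illustration of how offset and duration, both in seconds, relate to positions in the file, the sketch below converts them to a half-open sample range. The helper segment_sample_range is hypothetical and not part of ketos. It assumes that duration=None means "read to the end of the file" and it clips the range to the file boundaries; how ketos itself handles segments that extend past the file is not covered here.

```python
from typing import Optional, Tuple


def segment_sample_range(offset: float, duration: Optional[float],
                         rate: int, num_samples: int) -> Tuple[int, int]:
    """Convert (offset, duration) in seconds to a [start, stop) sample range.

    Hypothetical helper. duration=None is assumed to mean "until end of file".
    The returned range is clipped so that 0 <= start <= stop <= num_samples.
    """
    start = int(round(offset * rate))
    if duration is None:
        stop = num_samples
    else:
        stop = start + int(round(duration * rate))
    start = max(0, min(start, num_samples))
    stop = max(start, min(stop, num_samples))
    return start, stop


# A 2-minute file sampled at 1000 Hz contains 120000 samples.
print(segment_sample_range(2.0, 1.0, rate=1000, num_samples=120_000))    # (2000, 3000)
print(segment_sample_range(5.0, None, rate=1000, num_samples=120_000))   # (5000, 120000)
print(segment_sample_range(119.5, 1.0, rate=1000, num_samples=120_000))  # (119500, 120000)
```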

num()

Returns total number of segments.

Returns:
int

Total number of segments.

reset()

Resets the audio loader to the beginning.

skip()

Skip to the next audio segment or batch of audio segments without loading the current one.
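
The way num(), reset() and skip() interact with iteration can be shown with a toy cursor over a fixed list of segment names. ToyLoader below is a hypothetical stand-in written for this illustration; it loads no audio and makes no claim about how AudioLoader is implemented.

```python
from typing import List, Sequence


class ToyLoader:
    """Cursor over a fixed list of segment names (hypothetical, not ketos)."""

    def __init__(self, segments: Sequence[str], batch_size: int = 1) -> None:
        self._segments = list(segments)
        self._batch_size = batch_size
        self._pos = 0

    def num(self) -> int:
        """Total number of segments, independent of the cursor position."""
        return len(self._segments)

    def reset(self) -> None:
        """Move the cursor back to the first segment."""
        self._pos = 0

    def skip(self) -> None:
        """Advance past the current segment (or batch) without returning it."""
        self._pos = min(self._pos + self._batch_size, self.num())

    def __iter__(self) -> "ToyLoader":
        return self

    def __next__(self) -> List[str]:
        if self._pos >= self.num():
            raise StopIteration
        batch = self._segments[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch


loader = ToyLoader(["seg0", "seg1", "seg2", "seg3"])
loader.skip()                 # seg0 is never returned
after_skip = next(loader)     # ['seg1']
loader.reset()
after_reset = next(loader)    # ['seg0']
print(loader.num(), after_skip, after_reset)  # 4 ['seg1'] ['seg0']
```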