AudioLoader

class ketos.audio.audio_loader.AudioLoader(selection_gen, channel=0, annotations=None, representation=<class 'ketos.audio.waveform.Waveform'>, representation_params=None, batch_size=1, stop=True, **kwargs)[source]

Class for loading segments of audio data from .wav files.

Several representations of the audio data are possible, including waveform, magnitude spectrogram, power spectrogram, mel spectrogram, and CQT spectrogram.

Args:

selection_gen: SelectionGenerator

Selection generator

channel: int

For stereo recordings, selects which channel to read from. Default is 0.

annotations: pandas DataFrame

Annotation table

representation: class or list of classes

Audio data representation. This is a class that receives the raw audio data and transforms it into the corresponding audio representation object.

Classes available in ketos, with their parameters (optional parameters in parentheses):

Waveform:
(rate), (resample_method)

MagSpectrogram, PowerSpectrogram, MelSpectrogram:
audio, window, step, (window_func), (rate), (resample_method)

CQTSpectrogram:
audio, step, bins_per_oct, (freq_min), (freq_max), (window_func), (rate), (resample_method)

It is also possible to specify multiple audio representations as a list.

representation_params: dict or list of dict

Dictionary containing any required and optional arguments for the representation class. If more than one representation is given, representation_params must be a list of the same length and in the same order.

batch_size: int

Load segments in batches rather than one at a time.

stop: bool

Raise StopIteration when all selections have been loaded. Default is True.
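The pairing rule for representation and representation_params can be sketched as below. This is an illustrative helper, not ketos code: pair_representations and the two empty stand-in classes are hypothetical names introduced here so the sketch runs without ketos installed, and ketos may validate its inputs differently.

```python
def pair_representations(representation, representation_params=None):
    """Return a list of (class, params) pairs, one per representation.

    Hypothetical helper: it only mirrors the documented rule that a list of
    representations needs a list of parameter dicts of the same length and
    in the same order.
    """
    if not isinstance(representation, list):
        # Single representation: wrap both arguments so they are handled uniformly.
        representation = [representation]
        representation_params = [representation_params or {}]
    elif representation_params is None:
        # No parameters given: use an empty dict for every representation.
        representation_params = [{} for _ in representation]
    if (not isinstance(representation_params, list)
            or len(representation_params) != len(representation)):
        raise ValueError(
            "representation_params must be a list with one dict per representation")
    return list(zip(representation, representation_params))


# Stand-in classes (assumption: they only borrow the names of the ketos classes).
class MagSpectrogram:
    pass


class MelSpectrogram:
    pass


pairs = pair_representations(
    [MagSpectrogram, MelSpectrogram],
    [{"window": 0.2, "step": 0.02},
     {"window": 0.2, "step": 0.02, "window_func": "hamming"}],
)
for cls, params in pairs:
    print(cls.__name__, params)
```

The position in each list is what ties a parameter dict to its class, so reordering one list without the other would silently apply the wrong settings.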

Examples:

Creating an AudioLoader to load selections:

>>> from ketos.audio.audio_loader import AudioLoader, SelectionTableIterator
>>> from ketos.data_handling.selection_table import use_multi_indexing
>>> import pandas as pd
>>> # Load the audio representation you want to pass
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> # specify the audio representation
>>> rep = {'window':0.2, 'step':0.02, 'window_func':'hamming'}
>>> # Load selections
>>> sel = pd.DataFrame({'filename':["2min.wav", "2min.wav"],'start':[0.10,0.12],'end':[0.46,0.42]})
>>> sel = use_multi_indexing(sel, 'sel_id')
>>> # create a generator for iterating over all the selections 
>>> generator = SelectionTableIterator(data_dir="ketos/tests/assets/", selection_table=sel)
>>> # Create a loader by passing the generator and the representation to the AudioLoader
>>> loader = AudioLoader(selection_gen=generator, representation=MagSpectrogram, representation_params=rep)
>>> # print number of segments
>>> print(loader.num())
2
>>> # load and plot the first selection
>>> spec = next(loader)
>>>
>>> import matplotlib.pyplot as plt
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/spec_loader_2min_0.png")
>>> plt.close(fig)

Creating an AudioLoader to load selections made from annotations:

>>> from ketos.audio.audio_loader import AudioLoader, SelectionTableIterator
>>> from ketos.data_handling.selection_table import standardize
>>> import pandas as pd
>>> # Load the audio representation you want to pass
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> # specify the audio representation
>>> rep = {'window':0.2, 'step':0.02, 'window_func':'hamming'}
>>> # Define the annotations
>>> annot = pd.DataFrame([{"filename":"2min.wav", "start":2.0, "end":3.0, "label":0},
...         {"filename":"2min.wav", "start":5.0, "end":6.0, "label":0},
...         {"filename":"2min.wav", "start":21.0, "end":22.0, "label":0},
...         {"filename":"2min.wav", "start":25.0, "end":27.0, "label":0}])
>>> annot_std = standardize(table=annot)
>>> # create a generator for iterating over all the selections 
>>> generator = SelectionTableIterator(data_dir="ketos/tests/assets/", selection_table=annot_std)
>>> # Create a loader by passing the generator and the representation to the AudioLoader
>>> loader = AudioLoader(selection_gen=generator, representation=MagSpectrogram, representation_params=rep)
>>> # print number of segments
>>> print(loader.num())
4
>>> # load and plot the first selection
>>> spec = next(loader)
>>>
>>> import matplotlib.pyplot as plt
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/spec_loader_2min_1.png")
>>> plt.close(fig)

For more examples, see the child class audio.audio_loader.AudioFrameLoader.
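Each row of a selection or annotation table identifies a segment by filename, start and end, while the load method documented on this page takes an offset and a duration. A minimal sketch of that conversion follows, using plain dicts in place of a pandas table. The helper name selection_to_load_args is hypothetical, and the mapping itself (offset equals start, duration equals end minus start) is an assumption that this page does not state explicitly.

```python
def selection_to_load_args(row, data_dir):
    """Turn one table row into keyword arguments shaped like those of load().

    Assumption: offset = start and duration = end - start, both in seconds.
    """
    return {
        "data_dir": data_dir,
        "filename": row["filename"],
        "offset": row["start"],
        "duration": row["end"] - row["start"],
        "label": row.get("label"),  # None when the row carries no label
    }


# Same rows as the annotation example above, written as plain dicts.
annot = [
    {"filename": "2min.wav", "start": 2.0, "end": 3.0, "label": 0},
    {"filename": "2min.wav", "start": 5.0, "end": 6.0, "label": 0},
    {"filename": "2min.wav", "start": 21.0, "end": 22.0, "label": 0},
    {"filename": "2min.wav", "start": 25.0, "end": 27.0, "label": 0},
]

load_args = [selection_to_load_args(row, "ketos/tests/assets/") for row in annot]
for args in load_args:
    print(args)
```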

Methods

`load`(data_dir, filename[, offset, duration, ...])	Load audio segment for specified file and time.
`num`()	Returns total number of segments.
`reset`()	Resets the audio loader to the beginning.
`skip`()	Skip to the next audio segment or batch of audio segments without loading the current one.

load(data_dir, filename, offset=0, duration=None, label=None, **kwargs)[source]

Load audio segment for specified file and time.

Args:

data_dir: str: Data directory
filename: str: Filename or relative path
offset: float: Start time of the segment in seconds, measured from the beginning of the file.
duration: float: Duration of segment in seconds.
label: int: Integer label

Returns:

seg: BaseAudio or list(BaseAudio): Audio segment
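To make the meaning of offset, duration and channel concrete, here is a minimal sketch that reads a segment with Python's standard wave module. It is not ketos' implementation: read_segment is a hypothetical stand-in that handles only 16-bit PCM files and returns raw integer samples rather than a BaseAudio object.

```python
import os
import struct
import tempfile
import wave


def read_segment(data_dir, filename, offset=0.0, duration=None, channel=0):
    """Read one channel of a 16-bit PCM .wav segment as a tuple of ints.

    Stand-in for illustration only; it is not ketos' load().
    """
    path = os.path.join(data_dir, filename)
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        n_total = wf.getnframes()
        # Convert seconds to frame indices, clamped to the file length.
        start = min(int(round(offset * rate)), n_total)
        if duration is None:
            n_frames = n_total - start  # read to the end of the file
        else:
            n_frames = min(int(round(duration * rate)), n_total - start)
        wf.setpos(start)
        raw = wf.readframes(n_frames)
    # Frames are interleaved (ch0, ch1, ch0, ch1, ...), so stride by channel count.
    samples = struct.unpack("<%dh" % (n_frames * n_channels), raw)
    return rate, samples[channel::n_channels]


# Build a 1-second stereo test file: channel 0 counts up, channel 1 counts down.
with tempfile.TemporaryDirectory() as tmp:
    rate, n = 1000, 1000
    frames = []
    for i in range(n):
        frames += [i, -i]
    with wave.open(os.path.join(tmp, "ramp.wav"), "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(frames), *frames))

    sr, left = read_segment(tmp, "ramp.wav", offset=0.25, duration=0.5, channel=0)
    _, right = read_segment(tmp, "ramp.wav", offset=0.25, duration=0.5, channel=1)

print(sr, len(left), left[0], right[0])  # prints: 1000 500 250 -250
```

With a sampling rate of 1000 Hz, an offset of 0.25 s lands on frame 250 and a duration of 0.5 s spans 500 frames, which is what the printed values confirm.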

num()[source]

Returns total number of segments.

Returns:

int: Total number of segments.

reset()[source]: Resets the audio loader to the beginning.

skip()[source]: Skip to the next audio segment or batch of audio segments without loading the current one.
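How num, reset, skip and batch_size interact can be sketched with a toy iterator. ToyLoader is a hypothetical class that only borrows the method names documented here; it models the stop=True behaviour, and two of its choices are assumptions rather than documented ketos behaviour: a final batch may be shorter than batch_size, and a batch size of 1 returns a single item instead of a list.

```python
class ToyLoader:
    """Toy iterator with the same method names as AudioLoader (stop=True only)."""

    def __init__(self, segments, batch_size=1):
        self.segments = list(segments)
        self.batch_size = batch_size
        self.pos = 0  # index of the next segment to load

    def num(self):
        # Total number of segments, not batches.
        return len(self.segments)

    def reset(self):
        # Go back to the first segment.
        self.pos = 0

    def skip(self):
        # Advance past the next segment or batch without returning it.
        self.pos = min(self.pos + self.batch_size, self.num())

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= self.num():
            raise StopIteration
        batch = self.segments[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        # Assumption: a batch of size 1 is returned as a single item.
        return batch[0] if self.batch_size == 1 else batch


loader = ToyLoader(["a", "b", "c", "d", "e"], batch_size=2)
first = next(loader)   # loads "a" and "b"
loader.skip()          # skips "c" and "d" without loading them
rest = list(loader)    # loads what is left, then stops
loader.reset()
again = next(loader)   # back at the beginning

print(loader.num(), first, rest, again)
```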