Audio Loader

‘audio.audio_loader’ module within the ketos library

This module contains the utilities for loading waveforms and computing spectrograms.

Contents:

AudioLoader class, AudioSelectionLoader class, AudioFrameLoader class

class ketos.audio.audio_loader.AudioFrameLoader(frame=None, step=None, path=None, filename=None, channel=0, annotations=None, repres={'type': 'Waveform'}, batch_size=1)

Bases: ketos.audio.audio_loader.AudioLoader

Load segments of audio data from .wav files.

Loads segments of uniform duration ‘frame’, with successive segments displaced by an amount ‘step’. (If ‘step’ is not specified, it is set equal to ‘frame’.)
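
As a rough illustration of how ‘frame’ and ‘step’ determine the segments, the sketch below computes segment start times for a single file. The helper name segment_offsets is hypothetical, and the rule of counting a final, partially filled segment is an assumption chosen to agree with the 8 segments reported in the example on this page (30-s frames, 15-s steps, 120.832-s file); it is not taken from the ketos source.

```python
import math

def segment_offsets(file_duration, frame, step=None):
    """Return start times (s) of segments of length `frame` spaced by `step`.

    Hypothetical helper, not part of ketos. Assumes the last segment may
    extend past the end of the file.
    """
    if step is None:
        step = frame  # consecutive segments neither overlap nor leave gaps
    num = max(1, math.ceil((file_duration - frame) / step) + 1)
    return [i * step for i in range(num)]

offsets = segment_offsets(120.832, frame=30.0, step=15.0)
print(len(offsets))   # 8
print(offsets[:3])    # [0.0, 15.0, 30.0]
```

With step smaller than frame, consecutive segments overlap; with step equal to frame they tile the file back to back.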

Args:
frame: float

Segment duration in seconds. Can also be specified via the ‘duration’ item of the ‘repres’ dictionary.

step: float

Separation between consecutive segments in seconds. If None, the step size equals the segment duration.

path: str

Path to folder containing .wav files. If None is specified, the current directory will be used.

filename: str or list(str)

Relative path to a single .wav file or a list of .wav files. Optional.

channel: int

For stereo recordings, this can be used to select which channel to read from.

annotations: pandas DataFrame

Annotation table

repres: dict

Audio data representation. Must contain the key ‘type’ as well as any arguments required to initialize the class using the from_wav method. It is also possible to specify multiple audio representations as a list. These representations must have the same duration.

batch_size: int or str

Load segments in batches rather than one at a time. Increasing the batch size can help reduce computation time. The default batch size is 1. You can also specify batch_size='file' to load one .wav file at a time.
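
To make the two batching modes concrete, here is a minimal sketch that groups (filename, offset) selections either into fixed-size batches or into one batch per file. The function make_batches and the tuple layout are hypothetical and only mimic the behaviour described above; the loader's internals may well differ.

```python
from itertools import groupby, islice

def make_batches(selections, batch_size=1):
    """Group (filename, offset) tuples into batches.

    Hypothetical helper: batch_size is an int, or 'file' for one batch per
    file. Assumes the selections are already ordered by filename.
    """
    if batch_size == 'file':
        return [list(group) for _, group in groupby(selections, key=lambda s: s[0])]
    it = iter(selections)
    batches = []
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        batches.append(batch)
    return batches

sels = [('a.wav', 0.0), ('a.wav', 15.0), ('a.wav', 30.0), ('b.wav', 0.0)]
print(make_batches(sels, batch_size=3))
print(make_batches(sels, batch_size='file'))
```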

Examples:
>>> import librosa
>>> from ketos.audio.audio_loader import AudioFrameLoader
>>> # specify path to wav file
>>> filename = 'ketos/tests/assets/2min.wav'
>>> # check the duration of the audio file
>>> print(librosa.get_duration(filename=filename))
120.832
>>> # specify the audio representation
>>> rep = {'type':'MagSpectrogram', 'window':0.2, 'step':0.02, 'window_func':'hamming', 'freq_max':1000.}
>>> # create an object for loading 30-s long spectrogram segments, using a step size of 15 s (50% overlap) 
>>> loader = AudioFrameLoader(frame=30., step=15., filename=filename, repres=rep)
>>> # print number of segments
>>> print(loader.num())
8
>>> # load and plot the first segment
>>> spec = next(loader)
>>>
>>> import matplotlib.pyplot as plt
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/spec_2min_0.png")
>>> plt.close(fig)
[Figure: spectrogram of the first 30-s segment, saved as spec_2min_0.png]

load_next_batch()

Load the next batch of waveforms or spectrograms.

next_in_batch()

Load the next waveform or spectrogram in the batch.

Returns:
a: Waveform or Spectrogram

Next segment

class ketos.audio.audio_loader.AudioLoader(selection_gen, channel=0, annotations=None, repres={'type': 'Waveform'})

Bases: object

Class for loading segments of audio data from .wav files.

Several representations of the audio data are possible, including waveform, magnitude spectrogram, power spectrogram, mel spectrogram, and CQT spectrogram.

Args:
selection_gen: SelectionGenerator

Selection generator

channel: int

For stereo recordings, this can be used to select which channel to read from.

annotations: pandas DataFrame

Annotation table

repres: dict

Audio data representation. Must contain the key ‘type’ as well as any arguments required to initialize the class using the from_wav method.

  • Waveform:

    (rate), (resample_method)

  • MagSpectrogram, PowerSpectrogram, MelSpectrogram:

    window, step, (window_func), (rate), (resample_method)

  • CQTSpectrogram:

    step, bins_per_oct, (freq_min), (freq_max), (window_func), (rate), (resample_method)
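
The arguments listed above can be summarised in a small lookup table. The sketch below is an illustrative check and not part of ketos: REQUIRED_KEYS and missing_keys are hypothetical names, and the table simply restates the list above, treating arguments in parentheses as optional.

```python
# Required (non-parenthesised) arguments per representation type,
# transcribed from the list above.
REQUIRED_KEYS = {
    'Waveform': set(),
    'MagSpectrogram': {'window', 'step'},
    'PowerSpectrogram': {'window', 'step'},
    'MelSpectrogram': {'window', 'step'},
    'CQTSpectrogram': {'step', 'bins_per_oct'},
}

def missing_keys(repres):
    """Return the sorted list of required keys absent from a repres dict."""
    if 'type' not in repres:
        return ['type']
    required = REQUIRED_KEYS[repres['type']]
    return sorted(required - set(repres))

print(missing_keys({'type': 'MagSpectrogram', 'window': 0.2, 'step': 0.02}))  # []
print(missing_keys({'type': 'CQTSpectrogram', 'step': 0.01}))  # ['bins_per_oct']
```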

Optionally, may also contain the key ‘normalize_wav’, which can have the value True or False. If True, the waveform is normalized to have zero mean (mean=0) and unit standard deviation (std=1). It is also possible to specify multiple audio representations as a list.
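
For reference, normalizing to zero mean and unit standard deviation amounts to subtracting the mean and dividing by the standard deviation. The following is a minimal sketch using only the standard library; normalize_waveform is a hypothetical name, and the use of the population standard deviation is an assumption rather than something stated on this page.

```python
from statistics import fmean, pstdev

def normalize_waveform(samples):
    """Rescale samples to zero mean and unit standard deviation.

    Illustrative only; assumes the population standard deviation and a
    non-constant signal (otherwise the division by std would fail).
    """
    mean = fmean(samples)
    std = pstdev(samples)
    return [(x - mean) / std for x in samples]

normalized = normalize_waveform([0.1, 0.4, -0.2, 0.3])
# Mean and standard deviation of the result, up to rounding error
print(round(fmean(normalized), 6), round(pstdev(normalized), 6))
```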

Examples:

See child classes audio.audio_loader.AudioFrameLoader and audio.audio_loader.AudioSelectionLoader.

load(offset, duration, data_dir, filename, label)

Load audio segment for specified file and time.

Args:
offset: float

Start time of the segment in seconds, measured from the beginning of the file.

duration: float

Duration of segment in seconds.

data_dir: str

Data directory

filename: str

Filename or relative path

label: int

Integer label

Returns:
seg: BaseAudio

Audio segment
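
To illustrate what loading a segment by ‘offset’ and ‘duration’ involves, the sketch below reads part of a .wav file with Python's standard wave module. It is not how ketos implements load; read_segment is a hypothetical helper, and it assumes a one-channel, 16-bit PCM file. A short test tone is written first so the example is self-contained.

```python
import math
import os
import struct
import tempfile
import wave

def read_segment(path, offset, duration):
    """Read `duration` seconds of 1-channel, 16-bit audio starting at `offset` s.

    Hypothetical stand-in for a loader's load step; returns (samples, rate).
    """
    with wave.open(path, 'rb') as wf:
        rate = wf.getframerate()
        wf.setpos(int(round(offset * rate)))              # seek to the start frame
        raw = wf.readframes(int(round(duration * rate)))  # read the requested frames
    samples = struct.unpack('<%dh' % (len(raw) // 2), raw)
    return list(samples), rate

# Write a 2-s test tone sampled at 1 kHz.
rate = 1000
tone = [int(10000 * math.sin(2 * math.pi * 50 * n / rate)) for n in range(2 * rate)]
path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
with wave.open(path, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(rate)
    wf.writeframes(struct.pack('<%dh' % len(tone), *tone))

segment, sr = read_segment(path, offset=0.5, duration=1.0)
print(len(segment), sr)  # 1000 1000
```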

num()

Returns total number of segments.

Returns:
: int

Total number of segments.

reset()

Resets the audio loader to the beginning.

class ketos.audio.audio_loader.AudioSelectionLoader(path, selections, channel=0, annotations=None, repres={'type': 'Waveform'})

Bases: ketos.audio.audio_loader.AudioLoader

Load segments of audio data from .wav files.

The segments to be loaded are specified via a selection table.

Args:
selections: pandas DataFrame

Selection table

path: str

Path to folder containing .wav files

filename: str or list(str)

Relative path to a single .wav file or a list of .wav files. Optional.

annotations: pandas DataFrame

Annotation table

repres: dict

Audio data representation. Must contain the key ‘type’ as well as any arguments required to initialize the class using the from_wav method. It is also possible to specify multiple audio representations as a list.

class ketos.audio.audio_loader.FrameStepper(frame, step=None, path=None, filename=None)

Bases: ketos.audio.audio_loader.SelectionGenerator

Generates selections of uniform duration ‘frame’, with successive selections displaced by a fixed amount ‘step’. (If ‘step’ is not specified, it is set equal to ‘frame’.)

Args:
frame: float

Frame length in seconds.

step: float

Separation between consecutive frames in seconds. If None, the step size equals the frame length.

path: str

Path to folder containing .wav files. If None is specified, the current directory will be used.

filename: str or list(str)

Relative path to a single .wav file or a list of .wav files. Optional.

num()

Returns total number of selections.

Returns:
: int

Total number of selections.

reset()

Resets the selection generator to the beginning of the first file.

class ketos.audio.audio_loader.SelectionGenerator

Bases: object

Template class for selection generators.

num()

Returns total number of selections.

Must be implemented in child class.

Returns:
: int

Total number of selections.

reset()

Resets the selection generator to the beginning.
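
A template of this kind can be pictured as an abstract base class whose child classes supply num() and reset(). The sketch below is a hypothetical outline, not the ketos class: the names SelectionGeneratorSketch and ListSelections, the dict layout of a selection, the iterator protocol, and the wrap-around at the end of the list are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod

class SelectionGeneratorSketch(ABC):
    """Hypothetical outline of a selection generator (not the ketos class)."""

    def __iter__(self):
        return self

    @abstractmethod
    def __next__(self):
        """Return the next selection as a dict."""

    @abstractmethod
    def num(self):
        """Return the total number of selections."""

    @abstractmethod
    def reset(self):
        """Rewind to the first selection."""

class ListSelections(SelectionGeneratorSketch):
    """Toy child class that cycles through a fixed list of selections."""

    def __init__(self, selections):
        self.selections = selections
        self.reset()

    def __next__(self):
        sel = self.selections[self.index]
        self.index = (self.index + 1) % self.num()  # wrap around at the end
        return sel

    def num(self):
        return len(self.selections)

    def reset(self):
        self.index = 0

gen = ListSelections([{'filename': 'a.wav', 'offset': 0.0},
                      {'filename': 'a.wav', 'offset': 30.0}])
print(gen.num())   # 2
print(next(gen))   # {'filename': 'a.wav', 'offset': 0.0}
```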

class ketos.audio.audio_loader.SelectionTableIterator(data_dir, selection_table, duration=None)

Bases: ketos.audio.audio_loader.SelectionGenerator

Iterates over entries in a selection table.

Args:
data_dir: str

Path to top folder containing audio files.

selection_table: pandas DataFrame

Selection table

duration: float

Use this argument to enforce uniform duration of all selections. Any selection longer than the specified duration will be shortened.
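
One way to picture the shortening is to trim each over-long selection symmetrically about its midpoint. The sketch below does this for a list of dicts; shorten_selections, the ‘start’/‘end’ keys, and the centred trimming are assumptions for illustration, since this page does not say which part of a long selection is kept. Shorter selections are left untouched here because the description above only mentions shortening.

```python
def shorten_selections(selections, duration):
    """Trim selections longer than `duration` (s), keeping their midpoint.

    Hypothetical helper. Assumes each selection is a dict with 'start' and
    'end' times in seconds; selections that are not too long are copied as-is.
    """
    out = []
    for sel in selections:
        if sel['end'] - sel['start'] > duration:
            mid = 0.5 * (sel['start'] + sel['end'])
            start = mid - 0.5 * duration
            sel = {**sel, 'start': start, 'end': start + duration}
        out.append(dict(sel))
    return out

table = [{'filename': 'a.wav', 'start': 10.0, 'end': 16.0},
         {'filename': 'b.wav', 'start': 2.0, 'end': 5.0}]
for sel in shorten_selections(table, duration=3.0):
    print(sel)
```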

num()

Returns total number of selections.

Returns:
: int

Total number of selections.

reset()

Resets the selection generator to the beginning of the selection table.