AudioFrameLoader

class ketos.audio.audio_loader.AudioFrameLoader(duration, step=None, path=None, filename=None, channel=0, annotations=None, representation=<class 'ketos.audio.waveform.Waveform'>, representation_params=None, batch_size=1, stop=True, pad=True)

Load audio segments by sliding a fixed-size frame across the recording.

The frame size is set with the ‘duration’ argument and the step size with the ‘step’ argument. If ‘step’ is not specified, it defaults to ‘duration’, so consecutive frames do not overlap.
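As a worked illustration of the segment count, take a file of duration T, frame duration d and step s, with T ≥ d. Assume that pad=True keeps one final frame whenever the preceding frames do not reach the end of the file. This rule is an assumption, not documented ketos behaviour, but it reproduces the 8 segments reported for the 120.832-s file in the example:

```latex
N = \left\lceil \frac{T - d}{s} \right\rceil + 1
\qquad\text{e.g.}\qquad
N = \left\lceil \frac{120.832 - 30}{15} \right\rceil + 1 = \lceil 6.055\ldots \rceil + 1 = 8
```

Under the same assumption, pad=False replaces the ceiling with a floor, giving 7 segments for this file. The last of these starts at 90 s and ends at 120 s.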

Args:
duration: float

Segment duration in seconds.

step: float

Separation between consecutive segments in seconds. If None, the step size equals the segment duration.

path: str

Path to the folder containing the .wav files. If None, the current directory is used.

filename: str or list(str)

Relative path to a single .wav file, or a list of relative paths to .wav files. Optional.

channel: int

For stereo recordings, selects the channel to read from (0 is the first channel). Default is 0.

annotations: pandas DataFrame

Annotation table.

representation: class or list of classes

Audio data representation. This is a class that receives the raw audio data and transforms it into the specified audio representation object. Multiple audio representations can also be specified as a list; all of them must have the same duration.

representation_params: dict or list of dict

Dictionary containing any required and optional arguments for the representation class. If more than one representation is given, representation_params must be a list of the same length and in the same order.

batch_size: int

Load segments in batches rather than one at a time.

stop: bool

Raise StopIteration if the iteration exceeds the number of available segments. Default is True.

pad: bool

If True (default), the last segment is allowed to extend beyond the endpoint of the audio file.
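In a stereo .wav file, the samples of the two channels are stored interleaved, so selecting a channel amounts to taking every second sample, starting at the channel index. The sketch below illustrates this with Python's standard `wave` and `struct` modules on 16-bit PCM data. `make_stereo_wav` and `read_channel` are hypothetical helpers, not ketos functions, and the loader may read audio differently.

```python
import io
import struct
import wave


def make_stereo_wav(left, right, rate=8000):
    """Build an in-memory 16-bit stereo WAV file (hypothetical test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        interleaved = [s for pair in zip(left, right) for s in pair]
        w.writeframes(struct.pack(f"<{len(interleaved)}h", *interleaved))
    return buf.getvalue()


def read_channel(wav_bytes, channel=0):
    """Return the samples of one channel from 16-bit PCM WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        n_channels = w.getnchannels()
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if not 0 <= channel < n_channels:
            raise ValueError(f"channel must be between 0 and {n_channels - 1}")
        frames = w.readframes(w.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    # Stereo samples are interleaved: L0, R0, L1, R1, ...
    return list(samples[channel::n_channels])


wav = make_stereo_wav([1, 2, 3], [10, 20, 30])
print(read_channel(wav, channel=0))  # [1, 2, 3]
print(read_channel(wav, channel=1))  # [10, 20, 30]
```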
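The rule that pairs representation with representation_params can be sketched as follows. `ToyWaveform`, `ToyEnvelope` and `build_representations` are hypothetical. Constructing each class from the raw samples and sampling rate as keyword arguments is an assumption made for illustration only; ketos' own representation classes have their own constructors.

```python
class ToyWaveform:
    """Hypothetical stand-in for a representation class (not ketos)."""

    def __init__(self, data, rate):
        self.data, self.rate = data, rate


class ToyEnvelope:
    """Hypothetical representation with an extra, optional parameter."""

    def __init__(self, data, rate, scale=1.0):
        self.data = [abs(x) * scale for x in data]
        self.rate = rate


def build_representations(data, rate, representation, representation_params=None):
    """Pair each representation class with its parameter dict and instantiate it."""
    classes = list(representation) if isinstance(representation, (list, tuple)) else [representation]
    if representation_params is None:
        params = [{} for _ in classes]
    elif isinstance(representation_params, dict):
        params = [representation_params]
    else:
        params = list(representation_params)
    if len(params) != len(classes):
        raise ValueError("representation_params needs one dict per representation, in the same order")
    # Assumption: every class accepts the raw samples and sampling rate as keywords
    return [cls(data=data, rate=rate, **p) for cls, p in zip(classes, params)]


reps = build_representations([1, -2, 3], 1000, [ToyWaveform, ToyEnvelope], [{}, {"scale": 2.0}])
print(type(reps[1]).__name__, reps[1].data)  # ToyEnvelope [2.0, 4.0, 6.0]
```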
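The sketch below illustrates how batch_size, stop and pad could interact when a frame slides over an in-memory list of samples. `SlidingFrameIterator` is hypothetical, and several behaviours are assumptions: the last frame is zero-filled past the end of the file, stop=False wraps around to the first segment, and a single segment (rather than a list) is returned when batch_size is 1, as the `next(loader)` call in the example suggests.

```python
class SlidingFrameIterator:
    """Hypothetical loader illustrating batch_size, stop and pad (not ketos code)."""

    def __init__(self, samples, rate, duration, step=None,
                 batch_size=1, stop=True, pad=True):
        self.samples = list(samples)
        self.length = int(round(duration * rate))  # frame size in samples
        hop_s = duration if step is None else step
        self.hop = int(round(hop_s * rate))  # step size in samples
        if self.length < 1 or self.hop < 1:
            raise ValueError("duration and step must span at least one sample")
        self.batch_size = batch_size
        self.stop = stop
        self.starts = self._frame_starts(pad)
        self.counter = 0

    def _frame_starts(self, pad):
        n = len(self.samples)
        starts, start = [], 0
        while start + self.length <= n:  # frames that fit entirely in the file
            starts.append(start)
            start += self.hop
        covered = starts[-1] + self.length if starts else 0
        # Assumption: pad=True keeps one extra frame that runs past the end
        if pad and covered < n:
            starts.append(start)
        return starts

    def _segment(self, start):
        seg = self.samples[start:start + self.length]
        return seg + [0] * (self.length - len(seg))  # zero-fill past the end

    def __iter__(self):
        return self

    def __next__(self):
        batch = []
        while len(batch) < self.batch_size:
            if self.counter >= len(self.starts):
                if self.stop or not self.starts:
                    break
                self.counter = 0  # Assumption: stop=False wraps around
            batch.append(self._segment(self.starts[self.counter]))
            self.counter += 1
        if not batch:
            raise StopIteration
        # Assumption: a lone segment is returned as-is when batch_size == 1
        return batch[0] if self.batch_size == 1 else batch


batches = list(SlidingFrameIterator(range(11), rate=1, duration=4, step=3, batch_size=2))
print(batches)  # [[[0, 1, 2, 3], [3, 4, 5, 6]], [[6, 7, 8, 9], [9, 10, 0, 0]]]

looping = SlidingFrameIterator(range(11), rate=1, duration=4, step=3, stop=False)
print([next(looping)[0] for _ in range(6)])  # [0, 3, 6, 9, 0, 3]
```

With stop=False the iterator never ends on its own, so it should be consumed with `next` or a bounded loop rather than `list`.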

Examples:
>>> from ketos.audio.audio_loader import AudioFrameLoader
>>> # Import the audio representation class to use
>>> from ketos.audio.spectrogram import MagSpectrogram
>>> # specify path to wav file
>>> filename = 'ketos/tests/assets/2min.wav'
>>> # check the duration of the audio file
>>> from ketos.audio.waveform import get_duration
>>> print(get_duration(filename)[0])
120.832
>>> # specify the audio representation parameters
>>> rep = {'window':0.2, 'step':0.02, 'window_func':'hamming', 'freq_max':1000.}
>>> # create an object for loading 30-s long spectrogram segments, using a step size of 15 s (50% overlap) 
>>> loader = AudioFrameLoader(duration=30., step=15., filename=filename, representation=MagSpectrogram, representation_params=rep)
>>> # print number of segments
>>> print(loader.num())
8
>>> # load and plot the first segment
>>> spec = next(loader)
>>>
>>> import matplotlib.pyplot as plt
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/spec_2min_0.png")
>>> plt.close(fig)
[Figure: spectrogram of the first 30-s segment of 2min.wav (spec_2min_0.png)]

Methods

get_file_durations()

Get the durations of the audio files associated with this instance.

get_file_paths([fullpath])

Get the paths to the audio files associated with this instance.

get_file_durations()

Get the durations of the audio files associated with this instance.

Returns:
ans: list

List of file durations in seconds.
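For uncompressed .wav files, the duration in seconds is the number of sample frames divided by the sampling rate, both of which are stored in the file header. The standard-library sketch below shows that calculation; `wav_duration` is a hypothetical helper and not necessarily how ketos computes durations.

```python
import os
import tempfile
import wave


def wav_duration(path):
    """Duration of a .wav file in seconds: number of frames / sampling rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


# Write a 2-s silent mono file at 8 kHz, then measure it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "silence.wav")
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 16000)  # 16000 frames of 16-bit silence
    measured = wav_duration(path)

print(measured)  # 2.0
```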

get_file_paths(fullpath=True)

Get the paths to the audio files associated with this instance.

Args:
fullpath: bool

Whether to return the full path (default) or only the filename.

Returns:
ans: list

List of file paths.
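The difference between fullpath=True and fullpath=False can be sketched with `os.path`. The helper `file_paths` is hypothetical. It assumes that file names are given relative to `path` (the current directory when `path` is None) and that "only the filename" means the base name without any directory part.

```python
import os


def file_paths(path, filename, fullpath=True):
    """Hypothetical helper mimicking the fullpath switch of get_file_paths."""
    if path is None:
        path = os.getcwd()  # as documented for the path argument
    names = [filename] if isinstance(filename, str) else list(filename)
    if fullpath:
        return [os.path.join(path, f) for f in names]
    return [os.path.basename(f) for f in names]


print(file_paths("data", ["a.wav", "sub/b.wav"], fullpath=False))  # ['a.wav', 'b.wav']
```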