MelSpectrogram

class ketos.audio.spectrogram.MelSpectrogram(data, num_filters, time_res, freq_max, start_bin=0, bins=None, window_func=None, filename=None, offset=0, label=None, annot=None, transforms=None, transform_log=None, waveform_transform_log=None, **kwargs)

Mel Spectrogram.

Args:

data: 2d numpy array: Mel spectrogram pixel values.
num_filters: int: The number of filters in the filter bank.
time_res: float: Time resolution in seconds (corresponds to the bin size used on the time axis)
freq_max: float: Maximum frequency in Hz
window_func: str: Window function used for computing the spectrogram
filename: str or list(str): Name of the source audio file, if available.
offset: float or array-like: Position in seconds of the left edge of the spectrogram within the source audio file, if available.
label: int: Spectrogram label. Optional
annot: AnnotationHandler: AnnotationHandler object. Optional
transforms: list(dict): List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name":"normalize", "mean":0.5, "std":1.0}
transform_log: list(dict): List of transforms that have been applied to this spectrogram
waveform_transform_log: list(dict): List of transforms that have been applied to the waveform before generating this spectrogram
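
To illustrate the list(dict) format, the sketch below applies such a list to a flat list of values and records each applied entry, loosely mirroring the roles of transforms and transform_log. The names REGISTRY and apply_transforms, and the stand-in normalize function, are hypothetical and are not part of ketos; the library's own transforms operate on the spectrogram array and may behave differently.

```python
def normalize(values, mean=0.0, std=1.0):
    """Stand-in transform: rescale values to the requested mean and std."""
    n = len(values)
    cur_mean = sum(values) / n
    cur_std = (sum((v - cur_mean) ** 2 for v in values) / n) ** 0.5
    if cur_std == 0:
        # Constant input: every value maps to the target mean
        return [mean for _ in values]
    return [mean + std * (v - cur_mean) / cur_std for v in values]


# Hypothetical registry mapping transform names to functions
REGISTRY = {"normalize": normalize}


def apply_transforms(values, transforms):
    """Apply each transform spec in order and return (values, log)."""
    log = []
    for spec in transforms:
        params = dict(spec)        # copy, so the caller's dict is untouched
        name = params.pop("name")  # the remaining keys are the arguments
        values = REGISTRY[name](values, **params)
        log.append(spec)           # record what has been applied
    return values, log


specs = [{"name": "normalize", "mean": 0.5, "std": 1.0}]
values, log = apply_transforms([1.0, 2.0, 3.0], specs)
```

Each dictionary carries the transform name under "name"; all other keys are passed on as keyword arguments.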

Attrs:

window_func: str: Window function.

Methods

`empty`()	Creates an empty MelSpectrogram object
`from_wav`(path, window, step[, channel, ...])	Create Mel spectrogram directly from wav file.
`from_waveform`(audio[, window, step, ...])	Creates a Mel Spectrogram from an `audio_signal.Waveform`.
`get_kwargs`()	Get keyword arguments required to create a copy of this instance.
`get_repres_attrs`()	Get audio representation attributes
`plot`([show_annot, figsize, cmap, ...])	Plot the spectrogram with proper axes ranges and labels.

classmethod empty(): Creates an empty MelSpectrogram object

classmethod from_wav(path, window, step, channel=0, rate=None, window_func='hamming', num_filters=40, offset=0, duration=None, resample_method='scipy', id=None, normalize_wav=False, transforms=None, waveform_transforms=None, smooth=0.01, **kwargs)

Create Mel spectrogram directly from wav file.

The arguments offset and duration can be used to select a portion of the wav file.

Note that values specified for the arguments window, step, offset, and duration may all be subject to slight adjustments to ensure that the selected portion corresponds to an integer number of window frames, and that the window and step sizes correspond to an integer number of samples.
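
The kind of adjustment described in the note can be sketched as follows. This is a minimal illustration, assuming that window and step are rounded to the nearest whole sample; the helper names snap_to_samples and num_frames are hypothetical, and the rounding rule ketos applies internally may differ.

```python
def snap_to_samples(window, step, rate):
    """Round window and step (in seconds) to whole numbers of samples.

    Assumption: nearest-sample rounding, with a minimum of one sample.
    Returns the adjusted values in seconds and in samples.
    """
    win_len = max(1, round(window * rate))   # window length in samples
    step_len = max(1, round(step * rate))    # step length in samples
    return win_len / rate, step_len / rate, win_len, step_len


def num_frames(duration, win_len, step_len, rate):
    """Number of complete window frames that fit in `duration` seconds."""
    num_samples = round(duration * rate)
    if num_samples < win_len:
        return 0
    return 1 + (num_samples - win_len) // step_len


# A 0.0333 s window at 44.1 kHz is not a whole number of samples,
# so it gets adjusted slightly.
window, step, win_len, step_len = snap_to_samples(0.0333, 0.01, 44100)
```

Here the requested window of 0.0333 s corresponds to 1468.53 samples and is therefore adjusted to 1469 samples, while the 0.01 s step already equals 441 samples and is left unchanged.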

Args:

path: str

Path to wav file

window: float

Window size in seconds

step: float

Step size in seconds

channel: int

Channel to read from. Only relevant for stereo recordings

rate: float

Desired sampling rate in Hz. If None, the original sampling rate will be used

window_func: str

Window function (optional). Select between

bartlett
blackman
hamming (default)
hanning

num_filters: int

The number of filters in the filter bank. Default is 40.

offset: float

Start time of the spectrogram in seconds, relative to the start of the wav file.

duration: float

Length of the spectrogram in seconds.

resample_method: str

Resampling method. Only relevant if rate is specified. Options are

kaiser_best
kaiser_fast
scipy (default)
polyphase

See https://librosa.github.io/librosa/generated/librosa.core.resample.html for details on the individual methods.

id: str

Unique identifier (optional). If None, the filename will be used.

normalize_wav: bool

Normalize the waveform to have a mean of zero (mean=0) and a standard deviation of unity (std=1) before computing the spectrogram. Default is False.

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name":"normalize", "mean":0.5, "std":1.0}

waveform_transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the waveform before generating the spectrogram. For example, {"name":"add_gaussian_noise", "sigma":0.5}

smooth: float

Width in seconds of the smoothing region used for stitching together audio files.

Returns:

spec: MelSpectrogram: Mel spectrogram

Example:

>>> # load spectrogram from wav file
>>> import matplotlib.pyplot as plt
>>> from ketos.audio.spectrogram import MelSpectrogram
>>> spec = MelSpectrogram.from_wav('ketos/tests/assets/grunt1.wav', window=0.2, step=0.01)
>>> # crop frequency
>>> spec = spec.crop(freq_min=50, freq_max=800)
>>> # show
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/mel_grunt1.png")
>>> plt.close(fig)

classmethod from_waveform(audio, window=None, step=None, seg_args=None, window_func='hamming', num_filters=40, transforms=None, **kwargs)

Creates a Mel Spectrogram from an audio_signal.Waveform.

Args:

audio: Waveform

Audio signal

window: float

Window length in seconds

step: float

Step size in seconds

seg_args: dict

Input arguments used for evaluating audio.audio.segment_args(). Optional. If specified, the arguments window and step are ignored.

window_func: str

Window function (optional). Select between

bartlett
blackman
hamming (default)
hanning

num_filters: int

The number of filters in the filter bank. Default is 40.

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name":"normalize", "mean":0.5, "std":1.0}

Returns:

MelSpectrogram: Mel spectrogram
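
To give a feel for what num_filters controls, the sketch below computes filter center frequencies spaced evenly on the mel scale, using the widely used conversion m = 2595 log10(1 + f/700). The helper names are hypothetical, and this is a common textbook construction rather than a description of the filter bank ketos builds, which may use a different mel formula or frequency range.

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to mels (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centers(num_filters, freq_max, freq_min=0.0):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(freq_min), hz_to_mel(freq_max)
    step = (hi - lo) / (num_filters + 1)
    # Skip the two end points, which only serve as the outer filter edges
    return [mel_to_hz(lo + i * step) for i in range(1, num_filters + 1)]


# 40 filters (the default for num_filters) up to 500 Hz
centers = filter_centers(40, 500.0)
```

Because the spacing is uniform in mels, the centers lie closer together at low frequencies and spread out toward freq_max.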

get_kwargs()

Get keyword arguments required to create a copy of this instance.

Does not include the data array and annotation handler.
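
The copy pattern this method supports can be sketched with a toy class. Everything here is hypothetical (the Representation class, its attributes, and copy_with); it only illustrates how keyword arguments that exclude the data can be combined with a new data array to build a copy, and is not the ketos implementation.

```python
class Representation:
    """Toy stand-in for an audio representation (not a ketos class)."""

    def __init__(self, data, time_res, label=None):
        self.data = data
        self.time_res = time_res
        self.label = label

    def get_kwargs(self):
        # Everything needed to rebuild the object, except the data itself
        return {"time_res": self.time_res, "label": self.label}

    def copy_with(self, data):
        # Supply the data explicitly; take all other settings from get_kwargs
        return type(self)(data=data, **self.get_kwargs())


original = Representation(data=[1, 2], time_res=0.01, label=3)
clone = original.copy_with([9])
```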

get_repres_attrs(): Get audio representation attributes

plot(show_annot=False, figsize=(5, 4), cmap='viridis', label_in_title=True, vmin=None, vmax=None, num_labels=5)

Plot the spectrogram with proper axes ranges and labels.

The colormaps available can be seen here: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html

Note: The resulting figure can be shown (fig.show()) or saved (fig.savefig(file_name))

TODO: Check implementation for filter_bank=True

Args:

show_annot: bool: Display annotations
figsize: tuple: Figure size
cmap: string: The colormap to be used
label_in_title: bool: Include label (if available) in figure title
num_labels: int: Number of labels

Returns:

fig: matplotlib.figure.Figure: A figure object.