CQTSpectrogram

class ketos.audio.spectrogram.CQTSpectrogram(data, time_res, bins_per_oct, freq_min, window_func=None, filename=None, offset=0, label=None, annot=None, transforms=None, transform_log=None, waveform_transform_log=None, **kwargs)

Magnitude Spectrogram computed from Constant Q Transform (CQT).

Args:
data: 2d or 3d numpy array

Spectrogram pixel values.

time_res: float

Time resolution in seconds (corresponds to the bin size used on the time axis).

freq_min: float

Lower value of the frequency axis in Hz.

bins_per_oct: int

Number of bins per octave.

window_func: str

Window function used for computing the spectrogram.

filename: str or list(str)

Name of the source audio file, if available.

offset: float or array-like

Position in seconds of the left edge of the spectrogram within the source audio file, if available.

label: int

Spectrogram label (optional).

annot: AnnotationHandler

AnnotationHandler object (optional).

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name": "normalize", "mean": 0.5, "std": 1.0}

transform_log: list(dict)

List of transforms that have been applied to this spectrogram.

waveform_transform_log: list(dict)

List of transforms that have been applied to the waveform before generating this spectrogram.
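The constructor describes the time axis only through `offset` (left edge) and `time_res` (bin size). The sketch below shows how these two values could be turned into bin edges and a bin lookup, assuming time bin i spans offset + i*time_res to offset + (i+1)*time_res. The helpers `time_bin_edges` and `time_to_bin` are hypothetical and not part of the ketos API.

```python
from typing import List


def time_bin_edges(offset: float, time_res: float, num_bins: int) -> List[float]:
    """Hypothetical helper: the num_bins + 1 time-bin edges, in seconds.

    Assumes bin i covers [offset + i*time_res, offset + (i+1)*time_res).
    """
    return [offset + i * time_res for i in range(num_bins + 1)]


def time_to_bin(t: float, offset: float, time_res: float) -> int:
    """Hypothetical helper: index of the time bin that contains time t (seconds)."""
    return int((t - offset) // time_res)
```

For a spectrogram starting 2.0 s into the file with 0.5 s bins, a time of 3.2 s falls in bin 2.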

Attrs:
window_func: str

Window function.

Methods

bins_per_octave()

Get the number of bins per octave.

empty()

Creates an empty CQTSpectrogram object.

from_wav(path, step, bins_per_oct[, ...])

Create CQT spectrogram directly from wav file.

from_waveform(audio, step, bins_per_oct[, ...])

Magnitude spectrogram computed from the Constant Q Transform (CQT) using the librosa implementation.

get_kwargs()

Get keyword arguments required to create a copy of this instance.

get_repres_attrs()

Get audio representation attributes.

plot([show_annot, figsize, cmap, ...])

Plot the spectrogram with proper axes ranges and labels.

bins_per_octave()

Get the number of bins per octave.

Returns:
int

Number of bins per octave.

classmethod empty()

Creates an empty CQTSpectrogram object.

classmethod from_wav(path, step, bins_per_oct, freq_min=1, freq_max=None, channel=0, rate=None, window_func='hann', offset=0, duration=None, resample_method='scipy', id=None, normalize_wav=False, transforms=None, waveform_transforms=None, smooth=0.01, **kwargs)

Create CQT spectrogram directly from wav file.

The arguments offset and duration can be used to select a segment of the audio file.

Note that the values specified for the arguments step, offset, and duration may all be subject to slight adjustments to ensure that the selected portion corresponds to an integer number of frames, and that the step size corresponds to an integer number of samples.
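To make the adjustment concrete, here is a minimal sketch assuming round-to-nearest: the step is rounded to a whole number of samples, the offset to a whole sample, and the duration to a whole number of steps. ketos's actual rounding rule may differ (for example, floor instead of round), and `snap_to_samples` is a hypothetical helper, not a ketos function.

```python
from typing import Tuple


def snap_to_samples(step: float, offset: float, duration: float,
                    rate: float) -> Tuple[float, float, float, int]:
    """Hypothetical helper: round step, offset and duration to whole samples.

    Assumes round-to-nearest and that the duration must span a whole
    number of steps (frames). Returns (step, offset, duration, num_frames)
    with step, offset and duration in seconds.
    """
    hop = max(1, round(step * rate))                   # step size in samples
    start = round(offset * rate)                       # first sample of the segment
    num_frames = max(1, round(duration * rate / hop))  # whole number of frames
    return hop / rate, start / rate, num_frames * hop / rate, num_frames
```

For example, with step=0.0123 s at 1000 Hz the step becomes 12 samples (0.012 s), and a 1 s duration becomes 83 frames (0.996 s).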

Args:
path: str

Complete path to the wav file.

step: float

Step size in seconds

bins_per_oct: int

Number of bins per octave

freq_min: float

Minimum frequency in Hz. Default is 1 Hz.

freq_max: float

Maximum frequency in Hz. If None, it is set to half the sampling rate.

channel: int

Channel to read from. Only relevant for stereo recordings.

rate: float

Desired sampling rate in Hz. If None, the original sampling rate will be used.

window_func: str
Window function (optional). Select between
  • bartlett

  • blackman

  • hamming

  • hann (default)

offset: float

Start time of the spectrogram in seconds, relative to the start of the wav file.

duration: float

Length of the spectrogram in seconds.

resample_method: str
Resampling method. Only relevant if rate is specified. Options are
  • kaiser_best

  • kaiser_fast

  • scipy (default)

  • polyphase

See https://librosa.github.io/librosa/generated/librosa.core.resample.html for details on the individual methods.

id: str

Unique identifier (optional). If None, the filename will be used.

normalize_wav: bool

Normalize the waveform to have a mean of zero (mean=0) and a standard deviation of unity (std=1) before computing the spectrogram. Default is False.

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name": "normalize", "mean": 0.5, "std": 1.0}

waveform_transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the waveform before generating the spectrogram. For example, {"name": "add_gaussian_noise", "sigma": 0.5}

smooth: float

Width in seconds of the smoothing region used for stitching together audio files.
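The `normalize_wav` option amounts to an affine rescaling of the samples. A minimal sketch, assuming the population standard deviation (as in NumPy's default `np.std`) and a plain list of floats in place of the ketos Waveform object; `normalize_waveform` is a hypothetical name:

```python
import statistics
from typing import List


def normalize_waveform(samples: List[float]) -> List[float]:
    """Hypothetical helper: shift and scale samples to mean 0 and std 1."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # population std (ddof=0), an assumption
    if std == 0:
        # A constant signal cannot be scaled to unit std; only remove the mean.
        return [x - mean for x in samples]
    return [(x - mean) / std for x in samples]
```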

Returns:
CQTSpectrogram

CQT spectrogram

Example:
>>> # load spectrogram from wav file
>>> import matplotlib.pyplot as plt
>>> from ketos.audio.spectrogram import CQTSpectrogram
>>> spec = CQTSpectrogram.from_wav('ketos/tests/assets/grunt1.wav', step=0.01, freq_min=10, freq_max=800, bins_per_oct=16)
>>> # plot the spectrogram and save the figure
>>> fig = spec.plot()
>>> fig.savefig("ketos/tests/assets/tmp/cqt_grunt1.png")
>>> plt.close(fig)
[Figure: CQT spectrogram of grunt1.wav (cqt_grunt1.png)]
classmethod from_waveform(audio, step, bins_per_oct, freq_min=1, freq_max=None, window_func='hann', transforms=None, **kwargs)

Magnitude Spectrogram computed from Constant Q Transform (CQT) using the librosa implementation:

https://librosa.github.io/librosa/generated/librosa.core.cqt.html

The frequency axis of a CQT spectrogram is essentially a logarithmic axis with base 2. It is characterized by an integer number of bins per octave (an octave being a doubling of the frequency).
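The logarithmic axis can be written as f_k = freq_min * 2^(k / bins_per_oct), so the frequency doubles every `bins_per_oct` bins. The sketch below lists the bin frequencies, assuming bins are kept only while f_k does not exceed `freq_max` (defaulting to half the sampling rate). The exact bin count and rounding used by ketos and librosa may differ, and `cqt_bin_frequencies` is a hypothetical helper.

```python
import math
from typing import List, Optional


def cqt_bin_frequencies(freq_min: float, bins_per_oct: int, rate: float,
                        freq_max: Optional[float] = None) -> List[float]:
    """Center frequencies f_k = freq_min * 2**(k / bins_per_oct), k = 0, 1, ...

    Bins are kept while f_k <= freq_max (an assumption, see above).
    """
    if freq_max is None:
        freq_max = rate / 2  # default: half the sampling rate
    # Highest index k with freq_min * 2**(k / bins_per_oct) <= freq_max.
    k_max = math.floor(bins_per_oct * math.log2(freq_max / freq_min))
    return [freq_min * 2 ** (k / bins_per_oct) for k in range(k_max + 1)]
```

With freq_min=10, bins_per_oct=16 and freq_max=800 (the values used in the `from_wav` example) this yields 102 bins, from 10 Hz up to roughly 795 Hz.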

For further details, see audio.audio.cqt().

Args:
audio: Waveform

Audio signal.

step: float

Step size in seconds

bins_per_oct: int

Number of bins per octave

freq_min: float

Minimum frequency in Hz. Default is 1 Hz.

freq_max: float

Maximum frequency in Hz. If None, it is set to half the sampling rate.

window_func: str
Window function (optional). Select between
  • bartlett

  • blackman

  • hamming

  • hann (default)

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name": "normalize", "mean": 0.5, "std": 1.0}
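The `transforms` argument is a list of plain dictionaries: the `name` key selects the transformation and the remaining keys are its parameters. The sketch below shows one way such a list could be dispatched and logged (in the spirit of the `transform_log` attribute), assuming that `normalize` rescales all pixel values to the requested mean and standard deviation. The registry `TRANSFORMS` and the functions `apply_transforms` and `_normalize` are hypothetical and do not reflect ketos internals.

```python
import statistics
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def _normalize(data: Matrix, mean: float = 0.0, std: float = 1.0) -> Matrix:
    """Rescale all pixel values to the requested mean and (population) std."""
    flat = [x for row in data for x in row]
    m = statistics.fmean(flat)
    s = statistics.pstdev(flat)
    scale = std / s if s > 0 else 0.0  # constant input: every pixel becomes `mean`
    return [[(x - m) * scale + mean for x in row] for row in data]


# Hypothetical registry mapping transform names to functions.
TRANSFORMS: Dict[str, Callable[..., Matrix]] = {"normalize": _normalize}


def apply_transforms(data: Matrix, transforms: List[dict]) -> Tuple[Matrix, List[dict]]:
    """Apply each {"name": ..., **params} dict in order.

    Returns the transformed data and a log of the applied transforms.
    """
    log: List[dict] = []
    for t in transforms:
        params = dict(t)            # copy so the caller's dict is not modified
        name = params.pop("name")   # remaining keys become keyword arguments
        data = TRANSFORMS[name](data, **params)
        log.append(dict(t))         # record the full specification, name included
    return data, log
```

Keeping each transform as data rather than as a function call is what makes it possible to store the applied steps in a log and replay them later.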

Returns:
spec: CQTSpectrogram

CQT spectrogram

get_kwargs()

Get keyword arguments required to create a copy of this instance.

Does not include the data array and annotation handler.

get_repres_attrs()

Get audio representation attributes.

plot(show_annot=False, figsize=(5, 4), cmap='viridis', label_in_title=True, vmin=None, vmax=None)

Plot the spectrogram with proper axes ranges and labels.

Optionally, also display annotations as boxes superimposed on the spectrogram.

The colormaps available can be seen here: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html

Note: The resulting figure can be shown (fig.show()) or saved (fig.savefig(file_name)).

Args:
show_annot: bool

Display annotations.

figsize: tuple

Figure size.

cmap: string

The colormap to be used.

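label_in_title: bool

Include label (if available) in figure title.

vmin, vmax: float

Lower and upper limits of the data range covered by the colormap. If None, the limits are determined from the data.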

Returns:
fig: matplotlib.figure.Figure

A figure object.