cqt

ketos.audio.utils.misc.cqt(x, rate, step, bins_per_oct, freq_min, freq_max=None, window_func='hamming')

Compute the CQT spectrogram of an audio signal.

Uses the librosa implementation:

https://librosa.github.io/librosa/generated/librosa.core.cqt.html

To compute the CQT spectrogram, the user must specify the step size, the minimum and maximum frequencies, f_{min} and f_{max}, and the number of bins per octave, m. While f_{min} and m are fixed to the input values, the step size and f_{max} are adjusted as detailed below, attempting to match the input values as closely as possible.

The total number of bins is given by n = k \cdot m where k denotes the number of octaves, computed as

k = ceil(log_{2}[f_{max}/f_{min}])

For example, with f_{min}=10, f_{max}=16000, and m = 32 the number of octaves is k = 11 and the total number of bins is n = 352. The frequency of a given bin, i, is given by

f_{i} = 2^{i / m} \cdot f_{min}

This implies that the maximum frequency is given by f_{max} = f_{n} = 2^{n/m} \cdot f_{min}. For the above example, we find f_{max} = 20480 Hz, i.e., somewhat larger than the requested maximum value.
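The arithmetic above can be reproduced with a short sketch. The helper names `cqt_bin_layout` and `bin_frequency` are hypothetical and not part of ketos; they only restate the formulas for k, n, and f_{i}, and do not apply the Nyquist adjustment.

```python
import math

def cqt_bin_layout(freq_min, freq_max, bins_per_oct):
    """Hypothetical helper (not part of ketos).

    Returns the number of octaves k, the total number of bins n,
    and the effective maximum frequency f_n implied by the formulas.
    """
    # k = ceil(log2(f_max / f_min))
    k = math.ceil(math.log2(freq_max / freq_min))
    # n = k * m
    n = k * bins_per_oct
    # f_n = 2^(n/m) * f_min = 2^k * f_min
    eff_freq_max = 2 ** k * freq_min
    return k, n, eff_freq_max

def bin_frequency(i, freq_min, bins_per_oct):
    """Frequency of bin i: f_i = 2^(i/m) * f_min."""
    return 2 ** (i / bins_per_oct) * freq_min

k, n, eff_freq_max = cqt_bin_layout(freq_min=10, freq_max=16000, bins_per_oct=32)
print(k, n, eff_freq_max)  # 11 352 20480
```

Because k is rounded up, the effective maximum frequency equals the requested one only when f_{max}/f_{min} is an exact power of two.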

Note that if f_{max} exceeds the Nyquist frequency, f_{nyquist} = 0.5 \cdot s, where s is the sampling rate, the number of octaves, k, is reduced to ensure that f_{max} \leq f_{nyquist}.
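One way to picture the Nyquist adjustment is the sketch below. The text does not say how ketos carries out the reduction, so the decrementing loop and the helper name `cap_octaves` are assumptions made for illustration only.

```python
def cap_octaves(k, rate, freq_min):
    """Hypothetical helper (not part of ketos).

    Reduces the number of octaves k until the effective maximum
    frequency, 2^k * f_min, no longer exceeds the Nyquist frequency.
    """
    f_nyquist = 0.5 * rate
    while k > 0 and 2 ** k * freq_min > f_nyquist:
        k -= 1
    return k

# f_min = 10 Hz, k = 11 octaves requested, 16 kHz sampling rate (Nyquist = 8000 Hz)
k = cap_octaves(k=11, rate=16000, freq_min=10)
print(k, 2 ** k * 10)  # 9 5120
```

In this illustration the effective maximum frequency drops from 20480 Hz to 5120 Hz, the largest value of the form 2^k \cdot f_{min} that does not exceed 8000 Hz.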

The CQT algorithm requires the step size, expressed in samples, to be an integer multiple of 2^k. To ensure that this is the case, the step size is computed as

h = ceil(s \cdot x / 2^k ) \cdot 2^k

where s is the sampling rate in Hz, and x is the step size in seconds as specified via the argument step. For example, assuming a sampling rate of 32 kHz (s = 32000) and a step size of 0.02 seconds (x = 0.02), and adopting the same frequency limits as above (f_{min}=10 and f_{max}=16000), the actual step size is determined to be h = 2^{11} = 2048 samples, corresponding to a physical bin size of t_{res} = 2048 / (32000 Hz) = 0.064 s, i.e., about three times as large as the requested step size.
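The step-size formula can be checked numerically with the sketch below. The helper name `adjusted_step` is hypothetical and not part of ketos; the number of octaves k is passed in directly, using k = 11 as in the example above.

```python
import math

def adjusted_step(rate, step, k):
    """Hypothetical helper (not part of ketos).

    Rounds the requested step up to the nearest integer multiple of 2^k
    samples: h = ceil(s * x / 2^k) * 2^k.
    Returns the step in samples and in seconds.
    """
    hop_unit = 2 ** k
    h = math.ceil(rate * step / hop_unit) * hop_unit
    return h, h / rate

h, t_res = adjusted_step(rate=32000, step=0.02, k=11)
print(h, t_res)  # 2048 0.064
```

The requested step corresponds to 640 samples, which is smaller than 2^{11} = 2048, so it is rounded up to a single multiple of 2048.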

TODO: If possible, remove librosa dependency

Args:

x: numpy.array

Audio signal

rate: float

Sampling rate in Hz

step: float

Step size in seconds

bins_per_oct: int

Number of bins per octave

freq_min: float

Minimum frequency in Hz

freq_max: float

Maximum frequency in Hz. If None, it is set equal to half the sampling rate.

window_func: str

Window function (optional). Select between

bartlett
blackman
hamming (default)
hanning

Returns:

img: numpy.array: Resulting CQT spectrogram image.
step: float: Adjusted step size in seconds.