cqt
- ketos.audio.utils.misc.cqt(x, rate, step, bins_per_oct, freq_min, freq_max=None, window_func='hamming')
Compute the CQT spectrogram of an audio signal.
Uses the librosa implementation.
To compute the CQT spectrogram, the user specifies the step size, the minimum and maximum frequencies, f_{min} and f_{max}, and the number of bins per octave, m. While f_{min} and m are used exactly as given, the step size and f_{max} are adjusted as detailed below to match the requested values as closely as possible.
The total number of bins is n = k \cdot m, where k denotes the number of octaves, computed as
k = \lceil \log_{2}(f_{max}/f_{min}) \rceil
For example, with f_{min} = 10 Hz, f_{max} = 16000 Hz, and m = 32, the number of octaves is k = 11 (since \log_{2}(1600) \approx 10.64) and the total number of bins is n = 352. The frequency of a given bin, i, is given by
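A minimal sketch of the octave and bin counts in plain Python. The helper `cqt_num_bins` is a hypothetical name used only for illustration (it is not part of the ketos API), and it does not apply the Nyquist adjustment discussed further down.

```python
import math
from typing import Tuple


def cqt_num_bins(freq_min: float, freq_max: float, bins_per_oct: int) -> Tuple[int, int]:
    """Hypothetical helper: return (k, n), the number of octaves and total CQT bins."""
    # k = ceil(log2(f_max / f_min)): round the octave span up to a whole octave
    num_oct = math.ceil(math.log2(freq_max / freq_min))
    # n = k * m: every octave contributes bins_per_oct bins
    num_bins = num_oct * bins_per_oct
    return num_oct, num_bins


print(cqt_num_bins(10, 16000, 32))  # (11, 352)
```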
f_{i} = 2^{i / m} \cdot f_{min}
This implies that the maximum frequency is given by f_{max} = f_{n} = 2^{n/m} \cdot f_{min}. For the above example, we find f_{max} = 2^{352/32} \cdot 10 = 2^{11} \cdot 10 = 20480 Hz, i.e., somewhat larger than the requested 16000 Hz.
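To see how the bin frequencies grow geometrically, the sketch below evaluates f_{i} for i = 0 to n with NumPy. `cqt_bin_freqs` is a hypothetical helper, not a ketos function; because the array includes i = n, its last entry reproduces the upper limit f_{n} = 20480 Hz from the example.

```python
import numpy as np


def cqt_bin_freqs(freq_min, bins_per_oct, num_bins):
    """Hypothetical helper: f_i = 2**(i / m) * f_min for i = 0, 1, ..., n."""
    i = np.arange(num_bins + 1)  # include i = n so the last value is f_n
    return freq_min * 2.0 ** (i / bins_per_oct)


f = cqt_bin_freqs(10, 32, 352)
print(f[-1])  # 20480.0
```

Each block of m consecutive bins doubles the frequency, which is what makes the bin spacing constant on a logarithmic (octave) axis.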
Note that if f_{max} exceeds the Nyquist frequency, f_{nyquist} = 0.5 \cdot s (where s is the sampling rate), the number of octaves k is reduced to ensure that f_{max} \leq f_{nyquist}.
The CQT algorithm requires the step size (in samples) to be an integer multiple of 2^{k}. To ensure that this is the case, the step size is computed as
h = \lceil s \cdot x / 2^{k} \rceil \cdot 2^{k}
where s is the sampling rate in Hz and x is the step size in seconds, as specified via the argument step. For example, assuming a sampling rate of 32 kHz (s = 32000), a step size of 0.02 s (x = 0.02), and the same frequency limits as above (f_{min} = 10 and f_{max} = 16000, i.e., k = 11), the actual step size is h = \lceil 640 / 2048 \rceil \cdot 2048 = 2^{11} = 2048 samples, corresponding to a time resolution of t_{res} = 2048 / 32000 Hz = 0.064 s, i.e., about three times the requested step size.
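The step-size rounding can be sketched as follows, assuming k has already been determined (here it is passed in directly as `num_oct`). `cqt_step` is a hypothetical helper written for illustration, not the ketos implementation; it returns both h in samples and the adjusted step in seconds, mirroring the second return value documented below.

```python
import math


def cqt_step(rate, step, num_oct):
    """Hypothetical helper: round the requested step up to a multiple of 2**k samples."""
    block = 2 ** num_oct  # 2^k samples
    # h = ceil(s * x / 2^k) * 2^k, expressed in samples
    hop = math.ceil(rate * step / block) * block
    # convert the adjusted hop back to seconds
    return hop, hop / rate


hop, step_sec = cqt_step(32000, 0.02, 11)
print(hop, step_sec)  # 2048 0.064
```

Since `rate * step` is a floating-point product, a step that is in theory an exact multiple of 2^{k} may occasionally land a hair above it and be rounded up by one extra block; rounding the product to a few decimals before `math.ceil` is one way to guard against that.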
TODO: If possible, remove librosa dependency
- Args:
- x: numpy.array
Audio signal
- rate: float
Sampling rate in Hz
- step: float
Step size in seconds
- bins_per_oct: int
Number of bins per octave
- freq_min: float
Minimum frequency in Hz
- freq_max: float
Maximum frequency in Hz. If None, it is set equal to half the sampling rate.
- window_func: str
Window function (optional). Select between:
bartlett
blackman
hamming (default)
hanning
- Returns:
- img: numpy.array
Resulting CQT spectrogram image.
- step: float
Adjusted step size in seconds.