Miscellaneous¶
‘audio.utils.misc’ module within the ketos library
This module provides utilities to perform various types of operations on audio data, acting either in the time domain (waveform) or in the frequency domain (spectrogram), or both.
- ketos.audio.utils.misc.cqt(x, rate, step, bins_per_oct, freq_min, freq_max=None, window_func='hamming')[source]¶
Compute the CQT spectrogram of an audio signal.
Uses the librosa implementation.
To compute the CQT spectrogram, the user must specify the step size, the minimum and maximum frequencies, $f_{\min}$ and $f_{\max}$, and the number of bins per octave, $m$. While $f_{\min}$ and $m$ are fixed to the input values, the step size and $f_{\max}$ are adjusted as detailed below, attempting to match the input values as closely as possible.
The total number of bins is given by $n = k \cdot m$, where $k$ denotes the number of octaves, computed as
$$k = \lceil \log_2 (f_{\max} / f_{\min}) \rceil$$
For example, with $f_{\min} = 10$, $f_{\max} = 16000$, and $m = 32$, the number of octaves is $k = 11$ and the total number of bins is $n = 352$. The frequency of a given bin, $i$, is given by
$$f_i = 2^{i/m} \cdot f_{\min}$$
This implies that the maximum frequency is given by $f_n = 2^{n/m} \cdot f_{\min}$. For the above example, we find $f_n = 20480$ Hz, i.e., somewhat larger than the requested maximum value.
Note that if $f_{\max}$ exceeds the Nyquist frequency, $0.5 \cdot s$, where $s$ is the sampling rate, the number of octaves, $k$, is reduced to ensure that $f_{\max} \leq 0.5 \cdot s$.
The CQT algorithm requires the step size to be an integer multiple of $2^k$. To ensure that this is the case, the step size (in samples) is computed as
$$h = \lceil s \cdot x / 2^k \rceil \cdot 2^k$$
where $s$ is the sampling rate in Hz and $x$ is the step size in seconds as specified via the step argument. For example, assuming a sampling rate of 32 kHz ($s = 32000$), a step size of 0.02 seconds ($x = 0.02$), and the same frequency limits as above ($f_{\min} = 10$ and $f_{\max} = 16000$), the actual step size is determined to be $h = 2048$ samples, corresponding to a physical bin size of $2048 / 32000\ \mathrm{s} = 0.064$ s, i.e., about three times as large as the requested step size.
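The bookkeeping in the worked example above can be reproduced with a few lines of Python (this sketch only evaluates the formulas; it does not call librosa):
>>> import numpy as np
>>> rate, step, bins_per_oct, freq_min, freq_max = 32000, 0.02, 32, 10, 16000
>>> k = int(np.ceil(np.log2(freq_max / freq_min)))       # number of octaves
>>> n = k * bins_per_oct                                  # total number of bins
>>> f_max_actual = freq_min * 2 ** (n // bins_per_oct)    # actual maximum frequency (n is a multiple of bins_per_oct)
>>> h = int(np.ceil(rate * step / 2 ** k) * 2 ** k)       # adjusted step size in samples
>>> print(k, n, f_max_actual, h, h / rate)
11 352 20480 2048 0.064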
- Args:
- x: numpy.array
Audio signal
- rate: float
Sampling rate in Hz
- step: float
Step size in seconds
- bins_per_oct: int
Number of bins per octave
- freq_min: float
Minimum frequency in Hz
- freq_max: float
Maximum frequency in Hz. If None, it is set equal to half the sampling rate.
- window_func: str
- Window function (optional). Select between
bartlett
blackman
hamming (default)
hanning
- Returns:
- img: numpy.array
Resulting CQT spectrogram image.
- step: float
Adjusted step size in seconds.
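A minimal usage sketch based on the signature documented above; the synthetic signal and parameter values are illustrative assumptions rather than values taken from the ketos documentation:
>>> import numpy as np
>>> from ketos.audio.utils.misc import cqt
>>> rate = 1000
>>> t = np.arange(3 * rate) / rate  # 3 seconds of audio sampled at 1 kHz
>>> x = np.sin(2 * np.pi * 40 * t)  # 40 Hz sine wave
>>> img, step = cqt(x=x, rate=rate, step=0.05, bins_per_oct=16, freq_min=10, freq_max=400)
>>> # img is the CQT spectrogram image; step is the adjusted step size in seconds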
- ketos.audio.utils.misc.from_decibel(y)[source]¶
Convert any data array, $y$, typically a spectrogram, from decibel scale to linear scale by applying the operation $x = 10^{y/20}$.
- Args:
- y: numpy.array
Input array
- Returns:
- x: numpy.array
Converted array
- Example:
>>> import numpy as np
>>> from ketos.audio.utils.misc import from_decibel
>>> img = np.array([[10., 20.],[30., 40.]])
>>> img_db = from_decibel(img)
>>> img_db = np.around(img_db, decimals=2) # only keep up to two decimals
>>> print(img_db)
[[  3.16  10.  ]
 [ 31.62 100.  ]]
- ketos.audio.utils.misc.num_samples(time, rate, even=False)[source]¶
Convert time interval to number of samples.
If the time corresponds to a non-integer number of samples, round to the nearest larger integer value.
- Args:
- time: float
Time interval in seconds
- rate: float
Sampling rate in Hz
- even: bool
If True, round up to the nearest even number of samples.
- Returns:
- n: int
Number of samples
- Example:
>>> from ketos.audio.utils.misc import num_samples
>>> print(num_samples(rate=1000., time=0.0))
0
>>> print(num_samples(rate=1000., time=2.0))
2000
>>> print(num_samples(rate=1000., time=2.001))
2001
>>> print(num_samples(rate=1000., time=2.001, even=True))
2002
- ketos.audio.utils.misc.pad_reflect(x, pad_left=0, pad_right=0)[source]¶
Pad array with its own (inverted) reflection along the first axis (0).
- Args:
- x: numpy.array
The data to be padded.
- pad_left: int
Amount of padding on the left
- pad_right: int
Amount of padding on the right
- Returns:
- x_padded: numpy.array
Padded array
- Example:
>>> import numpy as np
>>> from ketos.audio.utils.misc import pad_reflect
>>> arr = np.arange(9) #create a simple array
>>> print(arr)
[0 1 2 3 4 5 6 7 8]
>>> arr = pad_reflect(arr, pad_right=3) #pad on the right
>>> print(arr)
[ 0  1  2  3  4  5  6  7  8  9 10 11]
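Judging from the example above, the "inverted" reflection mirrors the samples about the edge value, so the padded values continue the trend of the array rather than repeat it. A small numpy check of that reading (the slice below is illustrative, not part of ketos):
>>> import numpy as np
>>> x = np.arange(9)
>>> pad_right = 2 * x[-1] - x[-2:-5:-1]  # reflect about the last sample, then invert
>>> print(pad_right)
[ 9 10 11]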
- ketos.audio.utils.misc.pad_zero(x, pad_left=0, pad_right=0)[source]¶
Pad array with zeros along the first axis (0).
- Args:
- x: numpy.array
The data to be padded.
- pad_left: int
Amount of padding on the left
- pad_right: int
Amount of padding on the right
- Returns:
- x_padded: numpy.array
Padded array
- Example:
>>> import numpy as np
>>> from ketos.audio.utils.misc import pad_zero
>>> arr = np.arange(9) #create a simple array
>>> print(arr)
[0 1 2 3 4 5 6 7 8]
>>> arr = pad_zero(arr, pad_right=3) #pad on the right
>>> print(arr)
[0 1 2 3 4 5 6 7 8 0 0 0]
- ketos.audio.utils.misc.segment(x, win_len, step_len, num_segs=None, offset_len=0, pad_mode='reflect', mem_warning=True)[source]¶
Divide an array into segments of equal length along its first axis (0), each segment being shifted by a fixed amount with respect to the previous segment.
If offset_len is negative, the input array will be padded with its own inverted reflection on the left.
If the combined length of the segments exceeds the length of the input array (minus any positive offset), the array will be padded with its own inverted reflection on the right.
- Args:
- x: numpy.array
The data to be segmented
- win_len: int
Window length in no. of samples
- step_len: float
Step size in no. of samples
- num_segs: int
Number of segments. Optional.
- offset_len: int
Position of the first frame in no. of samples. Defaults to 0, if not specified.
- pad_mode: str
Padding mode. Select between ‘reflect’ (default) and ‘zero’.
- mem_warning: bool
Print warning if the size of the array exceeds 10% of the available memory.
- Returns:
- segs: numpy.array
Segmented data, has shape (num_segs, win_len, x.shape[1:])
- Example:
>>> import numpy as np
>>> from ketos.audio.utils.misc import segment
>>> x = np.arange(10)
>>> print(x)
[0 1 2 3 4 5 6 7 8 9]
>>> y = segment(x, win_len=4, step_len=2, num_segs=3, offset_len=0)
>>> print(y)
[[0 1 2 3]
 [2 3 4 5]
 [4 5 6 7]]
>>> y = segment(x, win_len=4, step_len=2, num_segs=3, offset_len=-3)
>>> print(y)
[[-3 -2 -1  0]
 [-1  0  1  2]
 [ 1  2  3  4]]
- ketos.audio.utils.misc.segment_args(rate, duration, offset, window, step)[source]¶
Computes input arguments for audio.utils.misc.segment() to produce a centered spectrogram with properties as close as possible to those specified.
- Args:
- rate: float
Sampling rate in Hz
- duration: float
Duration in seconds
- offset: float
Offset in seconds
- window: float
Window size in seconds
- step: float
Step size in seconds
- Returns:
- : dict
- Dictionary with the following keys and values:
  - win_len: Window size in number of samples (int)
  - step_len: Step size in number of samples (int)
  - num_segs: Number of steps (int)
  - offset_len: Offset in number of samples (int)
- Example:
>>> from ketos.audio.utils.misc import segment_args
>>> args = segment_args(rate=1000., duration=3., offset=0., window=0.1, step=0.02)
>>> for key,value in sorted(args.items()):
...     print(key,':',value)
num_segs : 150
offset_len : -40
step_len : 20
win_len : 100
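The returned values can be traced back to the inputs: the window and step sizes are converted to samples, the number of steps follows from the duration, and the negative offset shifts the first window left by half of (window minus step) so that the segments are centered. A small sketch of this bookkeeping (an interpretation consistent with the example above, not the exact ketos implementation):
>>> rate, duration, offset, window, step = 1000., 3., 0., 0.1, 0.02
>>> win_len = round(window * rate)     # window size in samples
>>> step_len = round(step * rate)      # step size in samples
>>> num_segs = round(duration / step)  # number of steps
>>> offset_len = round(offset * rate) - (win_len - step_len) // 2  # centering shift
>>> print(win_len, step_len, num_segs, offset_len)
100 20 150 -40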
- ketos.audio.utils.misc.spec2wave(image, phase_angle, num_fft, step_len, num_iters, window_func)[source]¶
Estimate audio signal from magnitude spectrogram.
Implements the algorithm described in D. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol. 32, no. 2, pp. 236–243, Apr. 1984.
Follows closely the implementation of https://github.com/tensorflow/magenta/blob/master/magenta/models/nsynth/utils.py
- Args:
- image: 2d numpy array
Magnitude spectrogram, linear scale
- phase_angle:
Initial condition for phase in degrees
- num_fft: int
Number of points used for the Fast-Fourier Transform. Same as window size.
- step_len: int
Step size.
- num_iters:
Number of iterations to perform.
- window_func: string, tuple, number, function, np.ndarray [shape=(num_fft,)]
a window specification (string, tuple, or number); see scipy.signal.get_window
a window function, such as scipy.signal.hamming
a user-specified window vector of length num_fft
- Returns:
- audio: 1d numpy array
Audio signal
- Example:
>>> #Create a simple sinusoidal audio signal with frequency of 10 Hz
>>> import numpy as np
>>> x = np.arange(1000)
>>> audio = 32600 * np.sin(2 * np.pi * 10 * x / 1000)
>>> #Compute the Short Time Fourier Transform of the audio signal
>>> #using a window size of 200, step size of 40, and a Hamming window
>>> from ketos.audio.utils.misc import stft
>>> win_fun = 'hamming'
>>> mag, freq_max, num_fft, _ = stft(x=audio, rate=1000, seg_args={'win_len':200, 'step_len':40}, window_func=win_fun)
>>> #Estimate the original audio signal
>>> from ketos.audio.utils.misc import spec2wave
>>> audio_est = spec2wave(image=mag, phase_angle=0, num_fft=num_fft, step_len=40, num_iters=25, window_func=win_fun)
>>> #plot the original and the estimated audio signal
>>> import matplotlib.pyplot as plt
>>> plt.clf()
>>> _ = plt.plot(audio)
>>> plt.savefig("ketos/tests/assets/tmp/sig_orig.png")
>>> _ = plt.plot(audio_est)
>>> plt.savefig("ketos/tests/assets/tmp/sig_est.png")
- ketos.audio.utils.misc.stft(x, rate, window=None, step=None, seg_args=None, window_func='hamming', decibel=True)[source]¶
Compute Short Time Fourier Transform (STFT).
Uses audio.utils.misc.segment_args() to convert the window size and step size into an even integer number of samples. The number of points used for the Fourier Transform is equal to the number of samples in the window.
- Args:
- x: numpy.array
Audio signal
- rate: float
Sampling rate in Hz
- window: float
Window length in seconds
- step: float
Step size in seconds
- seg_args: dict
Input arguments for audio.utils.misc.segment_args(). Optional. If specified, the arguments window and step are ignored.
- window_func: str
- Window function (optional). Select between
bartlett
blackman
hamming (default)
hanning
- decibel: bool
Convert to dB scale
- Returns:
- img: numpy.array
Short Time Fourier Transform of the input signal.
- freq_max: float
Maximum frequency in Hz
- num_fft: int
Number of points used for the Fourier Transform.
- seg_args: dict
Input arguments used for evaluating audio.utils.misc.segment_args().
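A minimal usage sketch following the signature above and the stft call already shown in the spec2wave example; the sine-wave signal and the chosen window and step values are illustrative assumptions:
>>> import numpy as np
>>> from ketos.audio.utils.misc import stft
>>> rate = 1000
>>> t = np.arange(2 * rate) / rate      # 2 seconds of audio sampled at 1 kHz
>>> audio = np.sin(2 * np.pi * 10 * t)  # 10 Hz sine wave
>>> img, freq_max, num_fft, seg_args = stft(x=audio, rate=rate, window=0.2, step=0.04)
>>> # img: spectrogram in dB (decibel=True by default); freq_max: maximum frequency in Hz;
>>> # num_fft: number of FFT points (equal to the window size in samples)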
- ketos.audio.utils.misc.to_decibel(x)[source]¶
Convert any data array, $x$, typically a spectrogram, from linear scale to decibel scale by applying the operation $y = 20 \log_{10}(x)$.
- Args:
- x: numpy.array
Input array
- Returns:
- y: numpy.array
Converted array
- Example:
>>> import numpy as np
>>> from ketos.audio.utils.misc import to_decibel
>>> img = np.array([[10., 20.],[30., 40.]])
>>> img_db = to_decibel(img)
>>> img_db = np.around(img_db, decimals=2) # only keep up to two decimals
>>> print(img_db)
[[20.   26.02]
 [29.54 32.04]]