MelSpectrogram
- class ketos.audio.spectrogram.MelSpectrogram(data, num_filters, time_res, freq_max, start_bin=0, bins=None, window_func=None, filename=None, offset=0, label=None, annot=None, transforms=None, transform_log=None, waveform_transform_log=None, **kwargs)
 Mel Spectrogram.
- Args:
 - data: 2d numpy array
 Mel spectrogram pixel values.
- num_filters: int
 The number of filters in the filter bank.
- time_res: float
 Time resolution in seconds (corresponds to the bin size used on the time axis)
- freq_max: float
 Maximum frequency in Hz
- window_func: str
 Window function used for computing the spectrogram
- filename: str or list(str)
 Name of the source audio file, if available.
- offset: float or array-like
 Position in seconds of the left edge of the spectrogram within the source audio file, if available.
- label: int
 Spectrogram label. Optional
- annot: AnnotationHandler
 AnnotationHandler object. Optional
- transforms: list(dict)
 List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name":"normalize", "mean":0.5, "std":1.0}
- transform_log: list(dict)
 List of transforms that have been applied to this spectrogram
- waveform_transform_log: list(dict)
 List of transforms that have been applied to the waveform before generating this spectrogram
- Attrs:
 - window_func: str
 Window function.
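To make the roles of num_filters and freq_max more concrete, the sketch below computes the centre frequencies of a mel filter bank. It assumes the common HTK-style mel formula, mel = 2595 * log10(1 + f / 700), and a lower edge of 0 Hz. ketos may use a different mel definition or edge handling internally, and mel_center_frequencies is a hypothetical helper that is not part of the ketos API.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using the HTK-style formula (an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, freq_max, freq_min=0.0):
    """Hypothetical helper: centre frequencies (Hz) of num_filters filters
    spaced evenly on the mel scale between freq_min and freq_max."""
    mel_lo, mel_hi = hz_to_mel(freq_min), hz_to_mel(freq_max)
    # num_filters triangular filters need num_filters + 2 edge points;
    # the interior points are the filter centres.
    step = (mel_hi - mel_lo) / (num_filters + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(num_filters)]


centers = mel_center_frequencies(num_filters=40, freq_max=500.0)
```

The centres are evenly spaced in mel but not in Hz: the gap between neighbouring filters widens towards freq_max, which is why a Mel spectrogram resolves low frequencies more finely than high ones.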
Methods
empty(): Creates an empty MelSpectrogram object.
from_wav(path, window, step[, channel, ...]): Create Mel spectrogram directly from wav file.
from_waveform(audio[, window, step, ...]): Creates a Mel Spectrogram from an audio_signal.Waveform.
get_kwargs(): Get keyword arguments required to create a copy of this instance.
Get audio representation attributes
plot([show_annot, figsize, cmap, ...]): Plot the spectrogram with proper axes ranges and labels.
- classmethod from_wav(path, window, step, channel=0, rate=None, window_func='hamming', num_filters=40, offset=0, duration=None, resample_method='scipy', id=None, normalize_wav=False, transforms=None, waveform_transforms=None, smooth=0.01, **kwargs)
 Create Mel spectrogram directly from wav file.
The arguments offset and duration can be used to select a portion of the wav file.
Note that values specified for the arguments window, step, offset, and duration may all be subject to slight adjustments to ensure that the selected portion corresponds to an integer number of window frames, and that the window and step sizes correspond to an integer number of samples.
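The kind of adjustment described above can be illustrated with a small sketch. It assumes the simplest possible rule: the window and step are rounded to the nearest whole number of samples, and the duration to a whole number of steps. The rule ketos actually applies may differ, and snap_to_samples is a hypothetical helper.

```python
def snap_to_samples(window, step, duration, rate):
    """Hypothetical helper: round window and step (seconds) to whole
    samples, and duration to a whole number of steps."""
    win_len = max(1, round(window * rate))   # window length in samples
    step_len = max(1, round(step * rate))    # step length in samples
    num_frames = max(1, round(duration * rate / step_len))
    return {
        "window": win_len / rate,
        "step": step_len / rate,
        "duration": num_frames * step_len / rate,
        "num_frames": num_frames,
    }


# At 1000 Hz, a 0.2004 s window is not a whole number of samples,
# so it is snapped to 200 samples (0.2 s).
adjusted = snap_to_samples(window=0.2004, step=0.0104, duration=3.003, rate=1000)
```

With these inputs the requested values are nudged to a 0.2 s window, a 0.01 s step, and a 3.0 s duration covering 300 frames.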
- Args:
 - path: str
 Path to wav file
- window: float
 Window size in seconds
- step: float
 Step size in seconds
- channel: int
 Channel to read from. Only relevant for stereo recordings
- rate: float
 Desired sampling rate in Hz. If None, the original sampling rate will be used
- window_func: str
 Window function (optional). Select between bartlett, blackman, hamming (default), and hanning.
- num_filters: int
 The number of filters in the filter bank. Default is 40.
- offset: float
 Start time of spectrogram in seconds, relative to the start of the wav file.
- duration: float
 Length of spectrogram in seconds.
- resample_method: str
 Resampling method. Only relevant if rate is specified. Options are kaiser_best, kaiser_fast, scipy (default), and polyphase.
See https://librosa.github.io/librosa/generated/librosa.core.resample.html for details on the individual methods.
- id: str
 Unique identifier (optional). If None, the filename will be used.
- normalize_wav: bool
 Normalize the waveform to have a mean of zero (mean=0) and a standard deviation of unity (std=1) before computing the spectrogram. Default is False.
- transforms: list(dict)
 List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name":"normalize", "mean":0.5, "std":1.0}
- waveform_transforms: list(dict)
 List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the waveform before generating the spectrogram. For example, {"name":"add_gaussian_noise", "sigma":0.5}
- smooth: float
 Width in seconds of the smoothing region used for stitching together audio files.
- Returns:
 - spec: MelSpectrogram
 Mel spectrogram
- Example:
 >>> import matplotlib.pyplot as plt
 >>> # load spectrogram from wav file
 >>> from ketos.audio.spectrogram import MelSpectrogram
 >>> spec = MelSpectrogram.from_wav('ketos/tests/assets/grunt1.wav', window=0.2, step=0.01)
 >>> # crop frequency
 >>> spec = spec.crop(freq_min=50, freq_max=800)
 >>> # show
 >>> fig = spec.plot()
 >>> fig.savefig("ketos/tests/assets/tmp/mel_grunt1.png")
 >>> plt.close(fig)
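As an illustration of what normalize_wav=True means, the sketch below rescales a list of samples to zero mean and unit standard deviation. It uses the population standard deviation from Python's statistics module, which is an assumption about the convention, and normalize_samples is a hypothetical helper rather than a ketos function.

```python
import statistics


def normalize_samples(samples):
    """Hypothetical helper: shift and scale samples to mean 0, std 1."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # population std (an assumption)
    if std == 0:
        # A constant signal has no spread to rescale; only remove the offset.
        return [s - mean for s in samples]
    return [(s - mean) / std for s in samples]


normalized = normalize_samples([0.5, 1.5, 2.5, 3.5])
```

Normalizing the waveform in this way removes any DC offset and makes recordings with different gain settings comparable before the spectrogram is computed.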
- classmethod from_waveform(audio, window=None, step=None, seg_args=None, window_func='hamming', num_filters=40, transforms=None, **kwargs)
 Creates a Mel Spectrogram from an audio_signal.Waveform.
- Args:
 - audio: Waveform
 Audio signal
- window: float
 Window length in seconds
- step: float
 Step size in seconds
- seg_args: dict
 Input arguments used for evaluating audio.audio.segment_args(). Optional. If specified, the arguments window and step are ignored.
- window_func: str
 Window function (optional). Select between bartlett, blackman, hamming (default), and hanning.
- num_filters: int
 The number of filters in the filter bank. Default is 40.
- transforms: list(dict)
 List of dictionaries, where each dictionary specifies the name of a transformation to be applied to the spectrogram. For example, {"name":"normalize", "mean":0.5, "std":1.0}
- Returns:
 - spec: MelSpectrogram
 Mel spectrogram
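The transforms argument follows a simple convention: each dictionary carries a "name" key plus any parameters for that transformation. The sketch below only shows how such a dictionary can be separated into a name and its parameters. How ketos consumes these dictionaries internally is not shown here, and split_transform is a hypothetical helper.

```python
def split_transform(spec):
    """Hypothetical helper: separate a transform dict into its name
    and the remaining keyword arguments, without modifying the input."""
    params = dict(spec)          # copy so the caller's dict is untouched
    name = params.pop("name")    # raises KeyError if "name" is missing
    return name, params


transforms = [
    {"name": "normalize", "mean": 0.5, "std": 1.0},
]
parsed = [split_transform(t) for t in transforms]
```

Keeping the name and parameters together in one dictionary makes a list of transforms easy to store in a configuration file and replay in order.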
- get_kwargs()
 Get keyword arguments required to create a copy of this instance.
Does not include the data array and annotation handler.
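The pattern behind get_kwargs() can be sketched with a toy class. ToySpectrogram is a stand-in written for this illustration and is not the ketos implementation; it only shows how a dictionary that omits the data array and annotation handler lets a new instance be built with fresh data but identical settings.

```python
class ToySpectrogram:
    """Toy stand-in for a spectrogram class; not the ketos implementation."""

    def __init__(self, data, time_res, freq_max, annot=None):
        self.data = data
        self.time_res = time_res
        self.freq_max = freq_max
        self.annot = annot

    def get_kwargs(self):
        # Everything needed to rebuild the object except data and annot.
        return {"time_res": self.time_res, "freq_max": self.freq_max}


original = ToySpectrogram(data=[[1, 2], [3, 4]], time_res=0.01, freq_max=500.0)
# Build a new instance with different data but identical settings.
clone = ToySpectrogram(data=[[0, 0], [0, 0]], **original.get_kwargs())
```

Because the data and annotations are left out, the caller decides explicitly what the copy should contain.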
- plot(show_annot=False, figsize=(5, 4), cmap='viridis', label_in_title=True, vmin=None, vmax=None, num_labels=5)
 Plot the spectrogram with proper axes ranges and labels.
The colormaps available can be seen here: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
Note: The resulting figure can be shown (fig.show()) or saved (fig.savefig(file_name))
TODO: Check implementation for filter_bank=True
- Args:
 - show_annot: bool
 Display annotations
- figsize: tuple
 Figure size
- cmap: string
 The colormap to be used
- label_in_title: bool
 Include label (if available) in figure title
- num_labels: int
 Number of labels
- Returns:
 - fig: matplotlib.figure.Figure
 A figure object.