Base Audio

‘audio.base_audio’ module within the ketos library

This module contains the base class for the Waveform and Spectrogram classes.

Contents:

BaseAudio class

class ketos.audio.base_audio.BaseAudio(data, time_res, ndim, filename='', offset=0, label=None, annot=None, transforms=None, transform_log=None, **kwargs)[source]

Bases: object

Parent class for time-series data classes such as audio.waveform.Waveform and audio.spectrogram.Spectrogram.

Args:
data: numpy array

Data

time_res: float

Time resolution in seconds

ndim: int

Dimensionality of data.

filename: str

Filename of the original data file, if available (optional)

offset: float

Position within the original data file, in seconds measured from the start of the file. Defaults to 0 if not specified.

label: int

Data label. Optional

annot: AnnotationHandler

AnnotationHandler object. Optional

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation and its arguments, if any. For example, {"name":"normalize", "mean":0.5, "std":1.0}

Attributes:
data: numpy array

Data

ndim: int

Dimensionality of data.

time_ax: LinearAxis

Axis object for the time dimension

filename: str

Filename of the original data file, if available (optional)

offset: float

Position within the original data file, in seconds measured from the start of the file. Defaults to 0 if not specified.

label: int

Data label.

annot: AnnotationHandler or pandas DataFrame

AnnotationHandler object.

allowed_transforms: dict

Transforms that can be applied via the apply_transform method

transform_log: list

List of transforms that have been applied to this object
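
To illustrate how a list of transform dictionaries, a table of allowed transforms, and a transform log can fit together, here is a minimal, dependency-free sketch. It is not the ketos implementation: the `TinySignal` class and its `scale` and `shift` transforms are hypothetical stand-ins, and only the `{"name": ..., <arguments>}` dictionary layout is taken from the description above.

```python
import copy


class TinySignal:
    """Hypothetical stand-in for an audio object; not part of ketos."""

    def __init__(self, data):
        self.data = list(data)
        self.transform_log = []
        # Map transform names to bound methods, similar in spirit to
        # the allowed_transforms attribute described above.
        self.allowed_transforms = {"scale": self.scale, "shift": self.shift}

    def scale(self, factor=1.0):
        self.data = [factor * v for v in self.data]

    def shift(self, amount=0.0):
        self.data = [v + amount for v in self.data]

    def apply_transforms(self, transforms):
        for spec in transforms:
            kwargs = copy.deepcopy(spec)  # leave the caller's dict untouched
            name = kwargs.pop("name")
            if name not in self.allowed_transforms:
                continue  # assumption: this sketch skips unknown names
            self.allowed_transforms[name](**kwargs)
            # Record what was applied, in the order it was applied.
            self.transform_log.append({"name": name, **kwargs})


sig = TinySignal([1.0, 2.0, 3.0])
sig.apply_transforms([{"name": "scale", "factor": 2.0},
                      {"name": "shift", "amount": -1.0}])
print(sig.data)           # [1.0, 3.0, 5.0]
print(sig.transform_log)
```

Keeping the log as a list of the same dictionaries that were passed in means a logged sequence can be replayed on another object by feeding it back to `apply_transforms`.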

adjust_range(range=(0, 1))[source]

Applies a linear transformation to the data array that puts the values within the specified range.

Args:
range: tuple(float,float)

Minimum and maximum value of the desired range. Default is (0,1)
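
The linear transformation described above can be sketched in plain Python as follows. This is an illustration of the arithmetic only, not the ketos source; the helper name `rescale` and the handling of constant input are assumptions.

```python
def rescale(values, new_range=(0.0, 1.0)):
    """Linearly map values so that min -> new_range[0] and max -> new_range[1]."""
    lo, hi = min(values), max(values)
    new_lo, new_hi = new_range
    if hi == lo:
        # Constant input: the linear map is undefined. As an assumption,
        # this sketch returns the lower bound for every element.
        return [float(new_lo) for _ in values]
    scale = (new_hi - new_lo) / (hi - lo)
    return [new_lo + (v - lo) * scale for v in values]


print(rescale([2.0, 4.0, 6.0]))               # [0.0, 0.5, 1.0]
print(rescale([2.0, 4.0, 6.0], (-1.0, 1.0)))  # [-1.0, 0.0, 1.0]
```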

annotate(**kwargs)[source]

Add an annotation or a collection of annotations.

Input arguments are described in ketos.audio.annotation.AnnotationHandler.add()

apply_transforms(transforms)[source]

Apply specified transforms to the audio object.

Args:
transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation and its arguments, if any. For example, {"name":"normalize", "mean":0.5, "std":1.0}

Returns:

None

Example:
>>> from ketos.audio.waveform import Waveform
>>> # read audio signal from wav file
>>> wf = Waveform.from_wav('ketos/tests/assets/grunt1.wav')
>>> # print allowed transforms
>>> wf.view_allowed_transforms()
['normalize', 'adjust_range', 'crop', 'add_gaussian_noise']
>>> # apply gaussian normalization followed by cropping
>>> transforms = [{'name':'normalize','mean':0.5,'std':1.0},{'name':'crop','start':0.2,'end':0.7}]
>>> wf.apply_transforms(transforms)
>>> # inspect record of applied transforms 
>>> wf.transform_log
[{'name': 'normalize', 'mean': 0.5, 'std': 1.0}, {'name': 'crop', 'start': 0.2, 'end': 0.7, 'length': None}]

average()[source]

Average value along time axis

Returns:
: array-like

Average value of the data array

crop(start=None, end=None, length=None, make_copy=False)[source]

Crop audio signal.

Args:
start: float

Start time in seconds, measured from the start of the data array.

end: float

End time in seconds, measured from the start of the data array.

length: int

Length of the cropped data along the time axis, in number of time bins (pixels, in the case of a spectrogram). If provided, the end argument is ignored.

make_copy: bool

Return a cropped copy of the object, leaving the present instance unaffected. Default is False.

Returns:
a: BaseAudio

Cropped data array
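
As a rough illustration of how the start, end, and length arguments interact, the sketch below converts them into a pair of bin indices. The helper `crop_indices` is hypothetical, and the rounding convention (floor for the start, ceiling for the end) is an assumption; ketos may round differently.

```python
import math


def crop_indices(num_bins, time_res, start=None, end=None, length=None):
    """Return (first, last) bin indices, with last exclusive."""
    first = 0 if start is None else max(0, math.floor(start / time_res))
    if length is not None:
        last = first + length            # length takes precedence over end
    elif end is not None:
        last = math.ceil(end / time_res)
    else:
        last = num_bins                  # no end given: keep everything after start
    return first, min(last, num_bins)


data = list(range(10))                   # 10 bins at 0.5 s each, i.e. 5 s in total
first, last = crop_indices(len(data), 0.5, start=1.0, end=2.5)
print(data[first:last])                  # [2, 3, 4]

first, last = crop_indices(len(data), 0.5, start=1.0, end=2.5, length=4)
print(data[first:last])                  # [2, 3, 4, 5] (end is ignored)
```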

deepcopy()[source]

Make a deep copy of the present instance

See https://docs.python.org/2/library/copy.html

Returns:
: BaseAudio

Deep copy.

duration()[source]

Data array duration in seconds

Returns:
: float

Duration in seconds

get(id)[source]

Get a given data object stored in this instance

get_annotations(id=None)[source]

Get annotations.

Args:
id: int

Data array ID. Only relevant if the object contains multiple, stacked arrays.

Returns:
: pandas DataFrame

Annotations

get_attrs()[source]

Get scalar attributes

get_data(id=None)[source]

Get underlying data.

Args:
id: int

Data array ID. Only relevant if the object contains multiple, stacked arrays.

Returns:
: numpy array

Data array

get_filename(id=None)[source]

Get filename.

Args:
id: int

Data array ID. Only relevant if the object contains multiple, stacked arrays.

Returns:
: array-like

Filename

get_label(id=None)[source]

Get label.

Args:
id: int

Data array ID. Only relevant if the object contains multiple, stacked arrays.

Returns:
: array-like

Label

get_offset(id=None)[source]

Get offset.

Args:
id: int

Data array ID. Only relevant if the object contains multiple, stacked arrays.

Returns:
: array-like

Offset

label_array(label)[source]

Get an array indicating presence/absence (1/0) of the specified annotation label for each time bin.

Args:
label: int

Label of interest.

Returns:
y: numpy.array

Label array
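
The idea of a presence/absence array can be sketched without ketos as follows. Annotations are represented here as hypothetical `(label, start, end)` tuples in seconds, and a bin is marked 1 whenever it overlaps an annotation with the requested label; the overlap criterion actually used by ketos is not stated here, so treat it as an assumption.

```python
def label_array(annotations, label, num_bins, time_res):
    """Return a list with 1 for bins overlapping an annotation of `label`, else 0."""
    y = [0] * num_bins
    for ann_label, start, end in annotations:
        if ann_label != label:
            continue
        for i in range(num_bins):
            bin_start, bin_end = i * time_res, (i + 1) * time_res
            # Half-open overlap test: bins that only touch an edge do not count.
            if start < bin_end and end > bin_start:
                y[i] = 1
    return y


annots = [(1, 1.0, 2.0), (2, 0.0, 0.5), (1, 3.5, 4.0)]
y = label_array(annots, label=1, num_bins=8, time_res=0.5)
print(y)  # [0, 0, 1, 1, 0, 0, 0, 1]
```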

max()[source]

Maximum data value along time axis

Returns:
: array-like

Maximum value of the data array

median()[source]

Median value along time axis

Returns:
: array-like

Median value of the data array

min()[source]

Minimum data value along time axis

Returns:
: array-like

Minimum value of the data array

normalize(mean=0, std=1)[source]

Normalize the data array to specified mean and standard deviation.

For the data array to be normalizable, it must have non-zero standard deviation. If this is not the case, the array is unchanged by calling this method.

Args:
mean: float

Mean value of the normalized array. The default is 0.

std: float

Standard deviation of the normalized array. The default is 1.
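
A minimal sketch of this normalization, including the zero-standard-deviation case, is given below. It uses only the standard library and is not the ketos source; the use of the population standard deviation is an assumption.

```python
import statistics


def normalize(values, mean=0.0, std=1.0):
    """Shift and scale values to the requested mean and standard deviation."""
    current_mean = statistics.fmean(values)
    current_std = statistics.pstdev(values)  # population std (assumption)
    if current_std == 0:
        return list(values)                  # not normalizable: leave unchanged
    return [mean + std * (v - current_mean) / current_std for v in values]


out = normalize([1.0, 2.0, 3.0, 4.0], mean=0.5, std=2.0)
print(statistics.fmean(out), statistics.pstdev(out))  # approximately 0.5 and 2.0
print(normalize([3.0, 3.0]))                          # [3.0, 3.0]
```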

num_objects()[source]

Get number of data objects stored in this instance

plot(id=0, figsize=(5, 4), label_in_title=True)[source]

Plot the data with proper axes ranges and labels.

Optionally, also display annotations as boxes superimposed on the data.

Note: The resulting figure can be shown (fig.show()) or saved (fig.savefig(file_name))

Args:
id: int

ID of data array to be plotted. Only relevant if the object contains multiple, stacked data arrays.

figsize: tuple

Figure size

label_in_title: bool

Include label (if available) in figure title

Returns:
fig: matplotlib.figure.Figure

A figure object.

ax: matplotlib.axes.Axes

Axes object

segment(window, step=None)[source]

Divide the time axis into segments of uniform length, which may or may not be overlapping.

Window length and step size are converted to the nearest integer number of time steps.

If necessary, the data array will be padded with zeros at the end to ensure that all segments have an equal number of samples.

Args:
window: float

Length of each segment in seconds.

step: float

Step size in seconds.

Returns:
d: BaseAudio

Stacked data segments
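
The segmentation logic described above (conversion of window and step to a whole number of time steps, followed by zero-padding at the end) can be sketched on a plain list. The function below is illustrative only and not the ketos implementation; in particular, treating a missing step as equal to the window, giving non-overlapping segments, is an assumption.

```python
import math


def segment(values, time_res, window, step=None):
    """Split values into equal-length segments, zero-padding the end if needed."""
    win = max(1, round(window / time_res))               # window in time steps
    stp = win if step is None else max(1, round(step / time_res))
    n = len(values)
    # Number of segments needed so that every sample falls in some segment.
    num_segs = 1 if n <= win else 1 + math.ceil((n - win) / stp)
    padded_len = win + (num_segs - 1) * stp
    padded = list(values) + [0] * (padded_len - n)
    return [padded[i * stp: i * stp + win] for i in range(num_segs)]


samples = list(range(1, 11))  # 10 samples at 0.5 s each

# Overlapping segments: 2.0 s window (4 samples), 1.5 s step (3 samples)
print(segment(samples, 0.5, window=2.0, step=1.5))
# [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]

# Non-overlapping segments; the last one is padded with zeros
print(segment(samples, 0.5, window=2.0))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 0, 0]]
```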

classmethod stack(objects)[source]

Stack objects

Args:
objects: list(BaseAudio)

List of objects to be stacked.

Returns:
: BaseAudio

Stacked objects

std()[source]

Standard deviation along time axis

Returns:
: array-like

Standard deviation of the data array

time_res()[source]

Get the time resolution.

Returns:
: float

Time resolution in seconds

view_allowed_transforms()[source]

View allowed transformations for this audio object.

Returns:
: list

List of allowed transformations

ketos.audio.base_audio.get_slice(arr, axis=0, indices=None)[source]

Get a slice of an array.

Args:
arr: array-like

Input array

axis: int

The axis over which to select values.

indices: int or tuple

The indices of the values to extract.

Returns:
ans: array-like

Sliced array

ketos.audio.base_audio.segment_data(x, window, step=None)[source]

Divide the time axis into segments of uniform length, which may or may not be overlapping.

Window length and step size are converted to the nearest integer number of time steps.

If necessary, the data array will be padded with zeros at the end to ensure that all segments have an equal number of samples.

Args:
x: BaseAudio

Data to be segmented

window: float

Length of each segment in seconds.

step: float

Step size in seconds.

Returns:
segs: BaseAudio

Stacked data segments

filename: array-like

Filenames

offset: array-like

Offsets in seconds

label: array-like

Labels

annot: AnnotationHandler

Stacked annotation handlers, if any

ketos.audio.base_audio.stack_attr(value, shape, dtype)[source]

Ensure that data attribute has the requested shape.

Args:
value: array-like

Attribute values.

shape: tuple

Requested shape.

dtype: str

Type

Returns:
value_stacked: numpy array

Array containing the stacked attribute values