BaseAudio

class ketos.audio.base_audio.BaseAudio(data, filename='', offset=0, duration=None, label=None, annot=None, transforms=None, transform_log=None, **kwargs)

Parent class for all audio classes.

While the underlying data array can be accessed via the data attribute, it is recommended to always use the get_data() method to access the data array, e.g.,

>>> import numpy as np
>>> from ketos.audio.base_audio import BaseAudio
>>> x = np.ones(6)
>>> audio_sample = BaseAudio(data=x)
>>> audio_sample.get_data()
array([1., 1., 1., 1., 1., 1.])
Args:
data: numpy array

Data

filename: str

Filename of the original data file, if available (optional)

offset: float

Position within the original data file, in seconds measured from the start of the file. Defaults to 0 if not specified.

duration: float

Duration in seconds.

label: int

Data label. Optional

annot: AnnotationHandler

AnnotationHandler object. Optional

transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation and its arguments, if any. For example, {"name":"normalize", "mean":0.5, "std":1.0}

Attributes:
data: numpy array

Data

ndim: int

Dimensionality of data.

filename: str

Filename of the original data file, if available (optional)

offset: float

Position within the original data file, in seconds measured from the start of the file. Defaults to 0 if not specified.

label: int

Data label.

annot: AnnotationHandler or pandas DataFrame

AnnotationHandler object.

allowed_transforms: dict

Transforms that can be applied via the apply_transform method

transform_log: list

List of transforms that have been applied to this object

Methods

adjust_range([range])

Applies a linear transformation to the data array that puts the values within the specified range.

annotate(**kwargs)

Add an annotation or a collection of annotations.

apply_transforms(transforms)

Apply specified transforms to the audio object.

average([axis])

Average value along selected axis

deepcopy()

Make a deep copy of the present instance

duration()

Data array duration in seconds

get()

Get a copy of this instance

get_annotations()

Get annotations.

get_data()

Get underlying data.

get_filename()

Get filename.

get_instance_attrs()

Get instance attributes

get_kwargs()

Get keyword arguments required to create a copy of this instance.

get_label([id])

Get label.

get_offset()

Get offset.

get_repres_attrs()

Get audio representation attributes

infer_shape(**kwargs)

Infers the data shape that would result if the class were instantiated with a specific set of parameter values.

max([axis])

Maximum data value along selected axis

median([axis])

Median value along selected axis

min([axis])

Minimum data value along selected axis

normalize([mean, std])

Normalize the data array to specified mean and standard deviation.

std([axis])

Standard deviation along selected axis

view_allowed_transforms()

View allowed transformations for this audio object.

adjust_range(range=(0, 1))

Applies a linear transformation to the data array that puts the values within the specified range.

Args:
range: tuple(float,float)

Minimum and maximum value of the desired range. Default is (0,1)
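
The linear transformation described above can be illustrated in plain NumPy. This is a minimal sketch, not the ketos implementation: it assumes that the smallest data value is mapped to the lower bound and the largest to the upper bound, and the function name adjust_range_sketch, the parameter name value_range, and the handling of constant arrays are all hypothetical.

```python
import numpy as np


def adjust_range_sketch(x, value_range=(0, 1)):
    """Linearly rescale x so min(x) maps to value_range[0] and max(x) to value_range[1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = value_range
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption of this sketch: a constant array is returned unchanged,
        # since no linear map onto the range is defined for it.
        return x.copy()
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)


scaled = adjust_range_sketch([2.0, 4.0, 6.0], value_range=(0, 1))
print(scaled.tolist())  # [0.0, 0.5, 1.0]
```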

annotate(**kwargs)

Add an annotation or a collection of annotations.

Input arguments are described in ketos.audio.annotation.AnnotationHandler.add()

apply_transforms(transforms)

Apply specified transforms to the audio object.

Args:
transforms: list(dict)

List of dictionaries, where each dictionary specifies the name of a transformation and its arguments, if any. For example, {"name":"normalize", "mean":0.5, "std":1.0}

Returns:

None

Example:
>>> from ketos.audio.waveform import Waveform
>>> # read audio signal from wav file
>>> wf = Waveform.from_wav('ketos/tests/assets/grunt1.wav')
>>> # print allowed transforms
>>> wf.view_allowed_transforms()
['normalize', 'adjust_range', 'crop', 'add_gaussian_noise', 'bandpass_filter']
>>> # apply gaussian normalization followed by cropping
>>> transforms = [{'name':'normalize','mean':0.5,'std':1.0},{'name':'crop','start':0.2,'end':0.7}]
>>> wf.apply_transforms(transforms)
>>> # inspect record of applied transforms 
>>> wf.transform_log
[{'name': 'normalize', 'mean': 0.5, 'std': 1.0}, {'name': 'crop', 'start': 0.2, 'end': 0.7, 'length': None}]
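
The interplay between the list of transform dictionaries, allowed_transforms, and transform_log might be pictured as a simple name-to-method dispatch. The sketch below is only an illustration of that idea under stated assumptions: the class TransformableSketch is hypothetical, it supports a single transform, and raising an error for unknown names is a choice made here, not documented ketos behaviour.

```python
import numpy as np


class TransformableSketch:
    """Hypothetical stand-in for an audio object; not the ketos class."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        # Map each allowed transform name to the method that implements it.
        self.allowed_transforms = {"normalize": self.normalize}
        self.transform_log = []

    def normalize(self, mean=0, std=1):
        # Shift and scale the data to the requested mean and standard deviation.
        current_std = self.data.std()
        if current_std > 0:
            self.data = mean + std * (self.data - self.data.mean()) / current_std

    def apply_transforms(self, transforms):
        for spec in transforms:
            kwargs = dict(spec)        # copy so the caller's dictionary is not modified
            name = kwargs.pop("name")  # everything except "name" is passed as arguments
            if name not in self.allowed_transforms:
                # Assumption of this sketch: reject unknown names outright.
                raise ValueError(f"transform '{name}' is not allowed")
            self.allowed_transforms[name](**kwargs)
            self.transform_log.append({"name": name, **kwargs})


obj = TransformableSketch([1.0, 2.0, 3.0])
obj.apply_transforms([{"name": "normalize", "mean": 0.5, "std": 1.0}])
print(obj.transform_log)  # [{'name': 'normalize', 'mean': 0.5, 'std': 1.0}]
```
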
average(axis=0)

Average value along selected axis

Args:
axis: int

Axis along which metric is computed

Returns:
: array-like

Average value of the data array
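
To make the role of the axis argument concrete, here is a small NumPy illustration. It assumes that the method follows the usual NumPy axis convention (the source does not state this explicitly), in which case the same reading would apply to max, median, min, and std.

```python
import numpy as np

# A 2D array with shape (2, 3): 2 rows along axis 0, 3 columns along axis 1.
x = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])

avg_axis0 = np.average(x, axis=0)  # collapses axis 0, result has shape (3,)
avg_axis1 = np.average(x, axis=1)  # collapses axis 1, result has shape (2,)

print(avg_axis0.tolist())  # [2.0, 3.0, 4.0]
print(avg_axis1.tolist())  # [2.0, 4.0]
```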

deepcopy()

Make a deep copy of the present instance

See https://docs.python.org/2/library/copy.html

Returns:
: BaseAudio

Deep copy.
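
The practical difference between a deep and a shallow copy can be shown with the standard library copy module that the link above refers to. The Holder class is a hypothetical stand-in for an object that wraps a data array; the sketch assumes the method behaves like copy.deepcopy.

```python
import copy

import numpy as np


class Holder:
    """Hypothetical object wrapping a data array; not a ketos class."""

    def __init__(self, data):
        self.data = data


original = Holder(np.ones(3))
shallow = copy.copy(original)   # new object, but shares the same array
deep = copy.deepcopy(original)  # new object with its own copy of the array

deep.data[0] = 5.0     # leaves original.data untouched
shallow.data[1] = 7.0  # also changes original.data, since the array is shared

print(original.data.tolist())  # [1.0, 7.0, 1.0]
```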

duration()

Data array duration in seconds

TODO: rename to get_duration()

Returns:
: float

Duration in seconds

get()

Get a copy of this instance

get_annotations()

Get annotations.

Returns:
: pandas DataFrame

Annotations

get_data()

Get underlying data.

Returns:
: numpy array

Data array

get_filename()

Get filename.

Returns:
: string

Filename

get_instance_attrs()

Get instance attributes

get_kwargs()

Get keyword arguments required to create a copy of this instance.

Does not include the data array and annotation handler.

get_label(id=None)

Get label.

Returns:
: int

Label

get_offset()

Get offset.

Returns:
: float

Offset

get_repres_attrs()

Get audio representation attributes

static infer_shape(**kwargs)

Infers the data shape that would result if the class were instantiated with a specific set of parameter values.

Returns None if either duration or rate is not specified.

Args:
duration: float

Duration in seconds

rate: float

Sampling rate in Hz

Returns:
: tuple

Inferred shape. If the parameter values do not allow the shape to be inferred, None is returned.
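
The source does not spell out how the shape is computed from duration and rate. One plausible reading, sketched below purely as an assumption, is a one-dimensional array holding duration times rate samples. The function name infer_shape_sketch and the rounding rule are hypothetical.

```python
def infer_shape_sketch(**kwargs):
    """Return the assumed 1D shape for the given duration and rate, or None."""
    duration = kwargs.get("duration")
    rate = kwargs.get("rate")
    if duration is None or rate is None:
        # Mirrors the documented behaviour: the shape cannot be inferred.
        return None
    # Assumption: one sample every 1/rate seconds, rounded to the nearest integer.
    return (int(round(duration * rate)),)


print(infer_shape_sketch(duration=2.0, rate=1000))  # (2000,)
print(infer_shape_sketch(duration=2.0))             # None
```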

max(axis=0)

Maximum data value along selected axis

Args:
axis: int

Axis along which metric is computed

Returns:
: array-like

Maximum value of the data array

median(axis=0)

Median value along selected axis

Args:
axis: int

Axis along which metric is computed

Returns:
: array-like

Median value of the data array

min(axis=0)

Minimum data value along selected axis

Args:
axis: int

Axis along which metric is computed

Returns:
: array-like

Minimum value of the data array

normalize(mean=0, std=1)

Normalize the data array to specified mean and standard deviation.

For the data array to be normalizable, it must have non-zero standard deviation. If this is not the case, the array is unchanged by calling this method.

Args:
mean: float

Mean value of the normalized array. The default is 0.

std: float

Standard deviation of the normalized array. The default is 1.
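
The normalization described above, including the zero-standard-deviation case, can be sketched in plain NumPy. This is an illustration under the assumption that the statistics are computed over the whole array; the function name normalize_sketch is hypothetical and the code is not taken from ketos.

```python
import numpy as np


def normalize_sketch(x, mean=0, std=1):
    """Shift and scale x so that it has the requested mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    current_std = x.std()
    if current_std == 0:
        # Not normalizable: return the array unchanged, as the text above describes.
        return x.copy()
    return mean + std * (x - x.mean()) / current_std


y = normalize_sketch([1.0, 2.0, 3.0, 4.0], mean=0.5, std=2.0)
print(np.isclose(y.mean(), 0.5), np.isclose(y.std(), 2.0))  # True True

flat = normalize_sketch([3.0, 3.0, 3.0])
print(flat.tolist())  # [3.0, 3.0, 3.0]
```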

std(axis=0)

Standard deviation along selected axis

Args:
axis: int

Axis along which metric is computed

Returns:
: array-like

Standard deviation of the data array

view_allowed_transforms()

View allowed transformations for this audio object.

Returns:
: list

List of allowed transformations