BaseAudio

class ketos.audio.base_audio.BaseAudio(data, filename='', offset=0, duration=None, label=None, annot=None, transforms=None, transform_log=None, **kwargs)[source]

Parent class for all audio classes.

While the underlying data array can be accessed via the data attribute, it is recommended to always use the get_data() function to access the data array, i.e.,

>>> import numpy as np
>>> from ketos.audio.base_audio import BaseAudio
>>> x = np.ones(6)
>>> audio_sample = BaseAudio(data=x)
>>> audio_sample.get_data()
array([1., 1., 1., 1., 1., 1.])

Args:

data: numpy array: Data
filename: str: Filename of the original data file, if available (optional)
offset: float: Position within the original data file, in seconds measured from the start of the file. Defaults to 0 if not specified.
duration: float: Duration in seconds.
label: int: Data label. Optional
annot: AnnotationHandler: AnnotationHandler object. Optional
transforms: list(dict): List of dictionaries, where each dictionary specifies the name of a transformation and its arguments, if any. For example, {"name": "normalize", "mean": 0.5, "std": 1.0}

Attributes:

data: numpy array: Data
ndim: int: Dimensionality of data.
filename: str: Filename of the original data file, if available (optional)
offset: float: Position within the original data file, in seconds measured from the start of the file. Defaults to 0 if not specified.
label: int: Data label.
annot: AnnotationHandler or pandas DataFrame: AnnotationHandler object.
allowed_transforms: dict: Transforms that can be applied via the apply_transform method
transform_log: list: List of transforms that have been applied to this object

Methods

`adjust_range`([range])	Applies a linear transformation to the data array that puts the values within the specified range.
`annotate`(**kwargs)	Add an annotation or a collection of annotations.
`apply_transforms`(transforms)	Apply specified transforms to the audio object.
`average`([axis])	Average value along selected axis
`deepcopy`()	Make a deep copy of the present instance
`duration`()	Data array duration in seconds
`get`()	Get a copy of this instance
`get_annotations`()	Get annotations.
`get_data`()	Get underlying data.
`get_filename`()	Get filename.
`get_instance_attrs`()	Get instance attributes
`get_kwargs`()	Get keyword arguments required to create a copy of this instance.
`get_label`([id])	Get label.
`get_offset`()	Get offset.
`get_repres_attrs`()	Get audio representation attributes
`infer_shape`(**kwargs)	Infers the data shape that would result if the class were instantiated with a specific set of parameter values.
`max`([axis])	Maximum data value along selected axis
`median`([axis])	Median value along selected axis
`min`([axis])	Minimum data value along selected axis
`normalize`([mean, std])	Normalize the data array to specified mean and standard deviation.
`std`([axis])	Standard deviation along selected axis
`view_allowed_transforms`()	View allowed transformations for this audio object.

adjust_range(range=(0, 1))[source]

Applies a linear transformation to the data array that puts the values within the specified range.

Args:

range: tuple(float,float): Minimum and maximum value of the desired range. Default is (0,1)
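
The linear transformation described above can be sketched with plain NumPy. This is a minimal illustration, not the ketos implementation: the function name `rescale_linear`, the parameter name `value_range`, and the handling of constant arrays are assumptions made for this example.

```python
import numpy as np

def rescale_linear(x, value_range=(0, 1)):
    """Map min(x) to value_range[0] and max(x) to value_range[1], linearly."""
    x = np.asarray(x, dtype=float)
    lo, hi = value_range
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption for this sketch: a constant array maps to the lower bound.
        return np.full_like(x, lo)
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)

scaled = rescale_linear(np.array([2.0, 4.0, 6.0]), value_range=(0, 1))
print(scaled.tolist())  # [0.0, 0.5, 1.0]
```

The smallest value lands on the lower bound and the largest on the upper bound; everything else keeps its relative position.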

annotate(**kwargs)[source]

Add an annotation or a collection of annotations.

Input arguments are described in ketos.audio.annotation.AnnotationHandler.add()

apply_transforms(transforms)[source]

Apply specified transforms to the audio object.

Args:

transforms: list(dict): List of dictionaries, where each dictionary specifies the name of a transformation and its arguments, if any. For example, {"name": "normalize", "mean": 0.5, "std": 1.0}

Returns:

None

Example:

>>> from ketos.audio.waveform import Waveform
>>> # read audio signal from wav file
>>> wf = Waveform.from_wav('ketos/tests/assets/grunt1.wav')
>>> # print allowed transforms
>>> wf.view_allowed_transforms()
['normalize', 'adjust_range', 'crop', 'add_gaussian_noise', 'bandpass_filter']
>>> # apply gaussian normalization followed by cropping
>>> transforms = [{'name':'normalize','mean':0.5,'std':1.0},{'name':'crop','start':0.2,'end':0.7}]
>>> wf.apply_transforms(transforms)
>>> # inspect record of applied transforms 
>>> wf.transform_log
[{'name': 'normalize', 'mean': 0.5, 'std': 1.0}, {'name': 'crop', 'start': 0.2, 'end': 0.7, 'length': None}]
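
One way to picture the name-based dispatch and the transform log is the toy class below. It is a sketch under stated assumptions, not the ketos source: `TinyAudio`, its `scale` and `shift` transforms, and the choice to raise `ValueError` for an unknown name are all hypothetical.

```python
class TinyAudio:
    """Toy stand-in for an audio object (hypothetical, not part of ketos)."""

    def __init__(self, data):
        self.data = list(data)
        self.transform_log = []
        # Map each allowed transform name to the method that implements it.
        self.allowed_transforms = {"scale": self.scale, "shift": self.shift}

    def scale(self, factor=1.0):
        self.data = [v * factor for v in self.data]

    def shift(self, amount=0.0):
        self.data = [v + amount for v in self.data]

    def apply_transforms(self, transforms):
        for spec in transforms:
            kwargs = dict(spec)        # copy so the caller's dict is not modified
            name = kwargs.pop("name")  # the remaining items are the arguments
            if name not in self.allowed_transforms:
                # Assumption: unknown names are rejected rather than skipped.
                raise ValueError(f"transform '{name}' is not allowed")
            self.allowed_transforms[name](**kwargs)
            self.transform_log.append({"name": name, **kwargs})

audio = TinyAudio([1.0, 2.0])
audio.apply_transforms([{"name": "scale", "factor": 2.0},
                        {"name": "shift", "amount": 1.0}])
print(audio.data)           # [3.0, 5.0]
print(audio.transform_log)
```

Transforms run in list order, so scaling happens before shifting here; swapping the two dictionaries would give a different result.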

average(axis=0)[source]

Average value along selected axis

Args:

axis: int: Axis along which metric is computed

Returns:

: array-like: Average value of the data array
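
The meaning of the axis argument can be illustrated with NumPy directly. This sketch assumes the method behaves like a standard NumPy reduction over the data array; it does not use ketos itself.

```python
import numpy as np

# Two rows, three columns.
x = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])

col_means = np.mean(x, axis=0)  # collapse axis 0: one value per column
row_means = np.mean(x, axis=1)  # collapse axis 1: one value per row

print(col_means.tolist())  # [2.0, 3.0, 4.0]
print(row_means.tolist())  # [2.0, 4.0]
```

The same axis convention applies to reductions such as maximum, minimum, median, and standard deviation.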

deepcopy()[source]

Make a deep copy of the present instance

See https://docs.python.org/2/library/copy.html

Returns:

: BaseAudio: Deep copy.
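
The difference between a deep and a shallow copy can be seen with the standard library alone. The dictionary below is only a stand-in for an object that holds a mutable data container; it is not a BaseAudio instance.

```python
import copy

original = {"data": [1, 2, 3], "label": 1}

shallow = copy.copy(original)   # new dict, but shares the inner list
deep = copy.deepcopy(original)  # new dict and a new inner list

original["data"].append(4)

print(shallow["data"])  # [1, 2, 3, 4]
print(deep["data"])     # [1, 2, 3]
```

Only the deep copy is unaffected by later changes to the original's data.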

duration()[source]

Data array duration in seconds

TODO: rename to get_duration()

Returns:

: float: Duration in seconds

get()[source]

Get a copy of this instance

get_annotations()[source]

Get annotations.

Returns:

: pandas DataFrame: Annotations

get_data()[source]

Get underlying data.

Returns:

: numpy array: Data array

get_filename()[source]

Get filename.

Returns:

: string: Filename

get_instance_attrs()[source]

Get instance attributes

get_kwargs()[source]

Get keyword arguments required to create a copy of this instance.

Does not include the data array and annotation handler.

get_label(id=None)[source]

Get label.

Returns:

: int: Label

get_offset()[source]

Get offset.

Returns:

: float: Offset

get_repres_attrs()[source]

Get audio representation attributes

static infer_shape(**kwargs)[source]

Infers the data shape that would result if the class were instantiated with a specific set of parameter values.

Returns a None value if duration or rate are not specified.

Args:

duration: float: Duration in seconds
rate: float: Sampling rate in Hz

Returns:

: tuple: Inferred shape. If the parameter values do not allow the shape to be inferred, None is returned.
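
A rough sketch of shape inference for a one-dimensional, waveform-like signal follows. The function name `infer_shape_sketch` and the rule "number of samples = duration × rate, rounded to the nearest integer" are assumptions for illustration; the actual method may compute the shape differently, and subclasses with other representations would return other shapes.

```python
def infer_shape_sketch(**kwargs):
    """Return the expected 1D shape, or None if it cannot be inferred."""
    duration = kwargs.get("duration")
    rate = kwargs.get("rate")
    if duration is None or rate is None:
        return None
    # Assumption: one sample every 1/rate seconds, rounded to the nearest integer.
    return (int(round(duration * rate)),)

print(infer_shape_sketch(duration=2.0, rate=1000))  # (2000,)
print(infer_shape_sketch(duration=2.0))             # None
```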

max(axis=0)[source]

Maximum data value along selected axis

Args:

axis: int: Axis along which metric is computed

Returns:

: array-like: Maximum value of the data array

median(axis=0)[source]

Median value along selected axis

Args:

axis: int: Axis along which metric is computed

Returns:

: array-like: Median value of the data array

min(axis=0)[source]

Minimum data value along selected axis

Args:

axis: int: Axis along which metric is computed

Returns:

: array-like: Minimum value of the data array

normalize(mean=0, std=1)[source]

Normalize the data array to specified mean and standard deviation.

For the data array to be normalizable, it must have non-zero standard deviation. If this is not the case, the array is unchanged by calling this method.

Args:

mean: float: Mean value of the normalized array. The default is 0.
std: float: Standard deviation of the normalized array. The default is 1.
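
The normalization, including the zero-standard-deviation case mentioned above, can be sketched as follows. This is an illustration in plain NumPy, not the ketos source; the name `normalize_sketch` is hypothetical.

```python
import numpy as np

def normalize_sketch(x, mean=0.0, std=1.0):
    """Shift and scale x so that it has the requested mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    current_std = x.std()
    if current_std == 0:
        # A constant array cannot be normalized; return it unchanged.
        return x
    return mean + std * (x - x.mean()) / current_std

y = normalize_sketch([1.0, 2.0, 3.0, 4.0], mean=0.5, std=2.0)
print(round(float(y.mean()), 6), round(float(y.std()), 6))  # 0.5 2.0

z = normalize_sketch([7.0, 7.0, 7.0])
print(z.tolist())  # [7.0, 7.0, 7.0]
```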

std(axis=0)[source]

Standard deviation along selected axis

Args:

axis: int: Axis along which metric is computed

Returns:

: array-like: Standard deviation of the data array

view_allowed_transforms()[source]

View allowed transformations for this audio object.

Returns:

: list: List of allowed transformations