select_by_segmenting
- ketos.data_handling.selection_table.select_by_segmenting(files, length, annotations=None, step=None, pad=True, discard_empty=False, keep_only_empty=False, label_empty=0, avoid_label=None)
Generate a selection table by stepping across the audio files, using a fixed step size (step) and fixed selection window size (length).
Unlike the data_handling.selection_table.select() method, selections created by this method are not characterized by a single, integer-valued label, but rather by a list of annotations (which can have any length, including zero). Therefore, the method returns not one, but two tables: a selection table indexed by filename and segment id, and an annotation table indexed by filename, segment id, and annotation id.
However, if keep_only_empty=True, only a selection table is returned. This table has a column named label in which every entry has the same value, as specified via the label_empty argument.
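The stepping scheme can be illustrated with a minimal sketch in plain Python. The helper segment_windows is hypothetical and is not part of ketos. Its window count for pad=True was chosen to agree with the selection table printed in the example on this page. The behaviour shown for pad=False (keeping only windows that fit entirely inside the file) is an assumption.

```python
import math

def segment_windows(duration, length, step=None, pad=True):
    """Hypothetical helper: (start, end) pairs for fixed-length windows."""
    if step is None:
        step = length  # step defaults to the selection length
    if pad:
        # The last window may extend beyond the end of the file.
        n = math.ceil(max(duration - length, 0.0) / step) + 1
    elif duration >= length:
        # Assumption: without padding, keep only windows that fit entirely.
        n = math.floor((duration - length) / step) + 1
    else:
        n = 0
    return [(i * step, i * step + length) for i in range(n)]

# Durations taken from the file table in the example on this page.
for name, dur in [("file1.wav", 11.0), ("file2.wav", 19.2), ("file3.wav", 15.1)]:
    print(name, segment_windows(dur, length=10.0, step=5.0))
```

With length=10.0 and step=5.0 this gives two windows for file1.wav and three each for file2.wav and file3.wav.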
- Args:
- files: pandas DataFrame
Table with file durations in seconds. Should contain columns named ‘filename’ and ‘duration’.
- length: float
Selection length in seconds.
- annotations: pandas DataFrame
Annotation table.
- step: float
Selection step size in seconds. If None, the step size is set equal to the selection length.
- pad: bool
If True (default), the last selection window is allowed to extend beyond the endpoint of the audio file.
- discard_empty: bool
If True, only selections that contain annotations will be used. If False (default), all selections are used.
- keep_only_empty: bool
If True, only selections without any annotations are used, and only the selections table is returned. Default is False.
- label_empty: int
Only relevant if keep_only_empty is True. Value to be assigned to selections without annotations. Default is 0.
- avoid_label: int or list(int)
If specified, selections that contain annotations with any of these labels are excluded.
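How the three filtering arguments interact can be sketched as a single predicate on the labels found in a selection. The function keep_selection below is hypothetical and not part of ketos. The order in which the flags are checked is an assumption made for illustration, and the library may resolve conflicting flags differently.

```python
def keep_selection(labels, discard_empty=False, keep_only_empty=False, avoid_label=None):
    """Hypothetical predicate: should a selection with these annotation labels be kept?"""
    if avoid_label is not None:
        # Accept either a single int or a list of ints, as in the Args above.
        avoid = {avoid_label} if isinstance(avoid_label, int) else set(avoid_label)
        if avoid & set(labels):
            return False  # selection contains a label that must be avoided
    if keep_only_empty:
        return len(labels) == 0  # keep only selections without annotations
    if discard_empty:
        return len(labels) > 0   # keep only selections with annotations
    return True

print(keep_selection([]))                        # no filtering requested
print(keep_selection([], discard_empty=True))    # empty selection is dropped
print(keep_selection([1, 2], avoid_label=2))     # contains an avoided label
```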
- Returns:
- sel: pandas DataFrame
Selection table
- annot: pandas DataFrame
Annotations table. Only returned if annotations is specified and keep_only_empty is False.
- Example:
>>> import pandas as pd
>>> from ketos.data_handling.selection_table import select_by_segmenting, standardize
>>>
>>> #Load and inspect the annotations.
>>> annot = pd.read_csv("ketos/tests/assets/annot_001.csv")
>>>
>>> #Standardize annotation table format
>>> annot = standardize(annot, labels={0:1, 1:2})
>>> print(annot)
                    start   end  label
filename  annot_id
file1.wav 0           7.0   8.1      2
          1           8.5  12.5      1
          2          13.1  14.0      2
file2.wav 0           2.2   3.1      2
          1           5.8   6.8      2
          2           9.0  13.0      1
>>>
>>> #Create file table
>>> files = pd.DataFrame({'filename':['file1.wav', 'file2.wav', 'file3.wav'], 'duration':[11.0, 19.2, 15.1]})
>>> print(files)
    filename  duration
0  file1.wav      11.0
1  file2.wav      19.2
2  file3.wav      15.1
>>>
>>> #Create a selection table by splitting the audio data into segments of
>>> #uniform length. The length is set to 10.0 sec and the step size to 5.0 sec.
>>> sel = select_by_segmenting(files=files, length=10.0, annotations=annot, step=5.0)
>>> #Inspect the selection table
>>> print(sel[0].round(2))
                  start   end
filename  sel_id
file1.wav 0         0.0  10.0
          1         5.0  15.0
file2.wav 0         0.0  10.0
          1         5.0  15.0
          2        10.0  20.0
file3.wav 0         0.0  10.0
          1         5.0  15.0
          2        10.0  20.0
>>> #Inspect the annotations
>>> print(sel[1].round(2))
                           start   end  label
filename  sel_id annot_id
file1.wav 0      0           7.0   8.1      2
                 1           8.5  12.5      1
          1      0           2.0   3.1      2
                 1           3.5   7.5      1
                 2           8.1   9.0      2
          2      1          -1.5   2.5      1
                 2           3.1   4.0      2
file2.wav 0      0           2.2   3.1      2
                 1           5.8   6.8      2
                 2           9.0  13.0      1
          1      1           0.8   1.8      2
                 2           4.0   8.0      1
          2      2          -1.0   3.0      1
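The start and end times in the annotation table above are measured relative to the start of each selection window, which is why negative start times can appear. The following plain-Python sketch reproduces the file2.wav rows under the assumption that an annotation is attached to every window it overlaps and is shifted, not clipped. The helper annotations_in_window is hypothetical and not part of ketos, and the library's actual overlap criterion may differ.

```python
def annotations_in_window(annots, win_start, win_end):
    """Hypothetical helper: {annot_id: (start, end, label)} relative to win_start."""
    out = {}
    for annot_id, (start, end, label) in enumerate(annots):
        # Assumption: any overlap with the window is enough to include the annotation.
        if start < win_end and end > win_start:
            out[annot_id] = (round(start - win_start, 2),
                             round(end - win_start, 2),
                             label)
    return out

# Annotations for file2.wav as (start, end, label), taken from the example above.
file2 = [(2.2, 3.1, 2), (5.8, 6.8, 2), (9.0, 13.0, 1)]
for sel_id, (w0, w1) in enumerate([(0.0, 10.0), (5.0, 15.0), (10.0, 20.0)]):
    print(sel_id, annotations_in_window(file2, w0, w1))
```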