select

ketos.data_handling.selection_table.select(annotations, length, step=0, min_overlap=0, center=False, discard_long=False, keep_id=False, keep_freq=False, label=None, avoid_label=None, discard_outside=False, files=None)

Generate a selection table by defining intervals of fixed length around annotated sections of the audio data. Each selection created in this way is characterized by a single, integer-valued label.

This approach to generating selections lends itself well to cases in which the annotated sections are well separated and rarely overlap. If this is not the case, you may find the related function data_handling.selection_table.select_by_segmenting() more useful.

By default, all annotated sections are used for generating selections, except those with label -1, which are ignored. Use the label argument to generate selections only for specific labels.

Conversely, the avoid_label argument can be used to ensure that the generated selections do not overlap with annotated sections that have specific labels. For example, if label=[1,2] and avoid_label=[4], selections will be generated for every annotated section with label 1 or 2, but any selection that happens to overlap with an annotated section with label 4 will be discarded.
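The discard rule described above can be illustrated with a small pandas sketch. This is not the Ketos implementation: the overlaps helper, the two tables and their values are hypothetical, and the assumption that intervals which merely touch do not count as overlapping is ours.

```python
import pandas as pd

def overlaps(start_a, end_a, start_b, end_b):
    """Hypothetical helper: True if the two intervals share any time.
    Intervals that only touch at an endpoint are treated as non-overlapping
    (an assumption, not confirmed Ketos behaviour)."""
    return start_a < end_b and start_b < end_a

# Made-up selections and annotations for a single file.
selections = pd.DataFrame(
    {"start": [0.0, 4.0, 9.0], "end": [3.0, 7.0, 12.0], "label": [1, 2, 1]}
)
annotations = pd.DataFrame({"start": [5.0], "end": [6.0], "label": [4]})

# Annotated sections that the selections must not overlap with.
avoid = annotations[annotations["label"].isin([4])]

# Keep a selection only if it overlaps none of the sections to avoid.
keep = [
    not any(overlaps(s.start, s.end, a.start, a.end) for a in avoid.itertuples())
    for s in selections.itertuples()
]
filtered = selections[keep]
print(filtered)
```

Here the selection spanning 4.0 to 7.0 s is dropped because it overlaps the section labelled 4, while the other two are kept.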

The input table must have the standardized Ketos format and contain call-level annotations; see data_handling.selection_table.standardize().

The output table uses two levels of indexing, the first level being the filename and the second level being a selection id.
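A table with this two-level index can be queried with ordinary pandas indexing. The sketch below builds a small stand-in table by hand (the values are illustrative, and the level names filename and sel_id are taken from the example output further down) rather than calling select.

```python
import pandas as pd

# Hand-built stand-in for a selection table with (filename, sel_id) index.
index = pd.MultiIndex.from_tuples(
    [("file1.wav", 0), ("file1.wav", 1), ("file2.wav", 0)],
    names=["filename", "sel_id"],
)
table = pd.DataFrame(
    {"label": [2, 1, 2], "start": [5.05, 6.0, 0.15], "end": [8.05, 9.0, 3.15]},
    index=index,
)

# All selections belonging to one file (first index level).
file1 = table.loc["file1.wav"]

# A single selection, addressed by (filename, selection id).
row = table.loc[("file1.wav", 1)]

# Number of selections per file.
counts = table.groupby(level="filename").size()
print(counts)
```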

The generated selections have uniform length given by the length argument. Annotated sections longer than the specified length will be cropped (unless discard_long=True) whereas shorter sections will be extended to achieve the specified length.
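As a rough illustration of the cropping and extending, the sketch below places a window of fixed length symmetrically around the midpoint of an annotated section, which corresponds to the center=True case. The function centered_window is a hypothetical helper, not part of Ketos.

```python
def centered_window(start, end, length):
    """Hypothetical helper: window of the given length centered on the
    midpoint of the annotated section [start, end]."""
    mid = 0.5 * (start + end)
    return mid - 0.5 * length, mid + 0.5 * length

# A 1.1 s annotation is extended to 3.0 s ...
short = centered_window(7.0, 8.1, length=3.0)
# ... whereas a 4.0 s annotation is cropped to 3.0 s.
long_ = centered_window(8.5, 12.5, length=3.0)

print([round(t, 2) for t in short])
print([round(t, 2) for t in long_])
```

The two windows, 6.05 to 9.05 s and 9.00 to 12.00 s, agree with the unshifted selections for the first two annotations of file1.wav in the example at the end of this page.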

The step and min_overlap arguments may be used to generate multiple, time-shifted selections for every annotated section.
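The interplay between step and min_overlap can be sketched as follows. This is a simplified model under stated assumptions, not the Ketos source: the helper shifted_windows is hypothetical, it starts from a centered window, and it interprets min_overlap as a fraction of the shorter of the selection and the annotated section, as described under Args.

```python
def shifted_windows(start, end, length, step, min_overlap=0.0):
    """Hypothetical helper: centered window plus time-shifted copies that
    still overlap sufficiently with the annotated section [start, end]."""
    base = 0.5 * (start + end) - 0.5 * length
    if step <= 0:
        return [(base, base + length)]  # no shifting requested

    # Required overlap in seconds: a fraction of whichever is shorter.
    required = min_overlap * min(length, end - start)

    def overlap(win_start):
        return min(win_start + length, end) - max(win_start, start)

    shifts = [0]
    k = 1
    while overlap(base - k * step) >= required:  # shift backward in time
        shifts.append(-k)
        k += 1
    k = 1
    while overlap(base + k * step) >= required:  # shift forward in time
        shifts.append(k)
        k += 1
    return [(base + k * step, base + k * step + length) for k in sorted(shifts)]

windows = shifted_windows(7.0, 8.1, length=3.0, step=1.0, min_overlap=0.16)
print([(round(a, 2), round(b, 2)) for a, b in windows])
```

For the annotation from 7.0 to 8.1 s this yields windows starting at 5.05, 6.05 and 7.05 s. A further shift in either direction would leave only 0.05 s of overlap, which is below 16% of the 1.1 s annotation.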

Note that the selections may have negative start times and/or end times that exceed the file duration, unless discard_outside=True in which case only selections with start times and end times within the file duration are returned.
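The effect of discard_outside=True can be mimicked with a merge against a file duration table. The sketch below only illustrates the filtering rule: the selections and the durations are made up, and the code does not call Ketos.

```python
import pandas as pd

# Made-up selections, one starting before 0 s and one ending after the file.
selections = pd.DataFrame(
    {
        "filename": ["file1.wav", "file1.wav", "file2.wav"],
        "start": [-0.5, 6.0, 12.5],
        "end": [2.5, 9.0, 15.5],
    }
)

# File duration table with the columns 'filename' and 'duration'
# (the durations are hypothetical).
files = pd.DataFrame(
    {"filename": ["file1.wav", "file2.wav"], "duration": [20.0, 15.0]}
)

# Attach each file's duration, then keep selections fully inside the file.
merged = selections.merge(files, on="filename", how="left")
inside = (merged["start"] >= 0) & (merged["end"] <= merged["duration"])
kept = merged.loc[inside, ["filename", "start", "end"]]
print(kept)
```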

Args:
annotations: pandas DataFrame

Input table with call-level annotations.

length: float

Selection length in seconds.

step: float

Produce multiple selections for each annotated section by shifting the selection window in steps of length step (in seconds) both forward and backward in time. The default value is 0.

min_overlap: float

Minimum required overlap between the selection and the annotated section, expressed as a fraction of whichever of the two is shorter. Only used if step > 0.

center: bool

Center the selection window on the annotated section. Default is False.

discard_long: bool

Discard all annotations longer than the output length. Default is False.

keep_id: bool

For each generated selection, include the id of the annotation from which the selection was generated.

keep_freq: bool

For each generated selection, include the min and max frequency, if known.

label: int or list(int)

Only create selections for annotated sections with these labels.

avoid_label: int, list(int) or str

Avoid overlap with annotated sections with these labels. To avoid overlap with all labels other than those specified by the label argument, set avoid_label="ALL".

discard_outside: bool

Discard selections that extend beyond file duration. Requires that a file duration table is specified via the files argument.

files: pandas DataFrame

Table with file durations in seconds. Must contain columns named ‘filename’ and ‘duration’. Only required if discard_outside=True.

Results:
df: pandas DataFrame

Output selection table.

Example:
>>> import pandas as pd
>>> from ketos.data_handling.selection_table import select, standardize
>>> 
>>> #Load and inspect the annotations.
>>> df = pd.read_csv("ketos/tests/assets/annot_001.csv")
>>>
>>> #Standardize annotation table format
>>> df = standardize(df, start_labels_at_1=True)
>>> print(df)
                    start   end  label
filename  annot_id                    
file1.wav 0           7.0   8.1      2
          1           8.5  12.5      1
          2          13.1  14.0      2
file2.wav 0           2.2   3.1      2
          1           5.8   6.8      2
          2           9.0  13.0      1
>>> 
>>> #Create a selection table by defining intervals of fixed 
>>> #length around every annotation.
>>> #Set the length to 3.0 sec and require a minimum overlap of 16%
>>> #between selection and annotations.
>>> #Also, create multiple time-shifted versions of the same selection
>>> #using a step size of 1.0 sec.     
>>> df_sel = select(df, length=3.0, step=1.0, min_overlap=0.16, center=True, keep_id=True) 
>>> print(df_sel.round(2))
                  label  start    end  annot_id
filename  sel_id                               
file1.wav 0           2   5.05   8.05         0
          1           1   6.00   9.00         1
          2           2   6.05   9.05         0
          3           1   7.00  10.00         1
          4           2   7.05  10.05         0
          5           1   8.00  11.00         1
          6           1   9.00  12.00         1
          7           1  10.00  13.00         1
          8           1  11.00  14.00         1
          9           2  11.05  14.05         2
          10          1  12.00  15.00         1
          11          2  12.05  15.05         2
          12          2  13.05  16.05         2
file2.wav 0           2   0.15   3.15         0
          1           2   1.15   4.15         0
          2           2   2.15   5.15         0
          3           2   3.80   6.80         1
          4           2   4.80   7.80         1
          5           2   5.80   8.80         1
          6           1   6.50   9.50         2
          7           1   7.50  10.50         2
          8           1   8.50  11.50         2
          9           1   9.50  12.50         2
          10          1  10.50  13.50         2
          11          1  11.50  14.50         2
          12          1  12.50  15.50         2