create_rndm_selections

ketos.data_handling.selection_table.create_rndm_selections(files, length, num, label=0, annotations=None, no_overlap=False, trim_table=False, buffer=0)

Create selections of uniform length, randomly distributed across the data set and not overlapping with any annotations. The created selections will have a label value defined by the ‘label’ parameter.

The random sampling is performed without regard to already created selections. It is therefore possible in principle for some of the created selections to overlap with each other. In practice this is unlikely, unless the number of requested selections (num) is very large and/or the annotation-free part of the data set is small.

To avoid any overlap between selections, set ‘no_overlap’ to True, but note that this can lead to longer execution times.

Use the ‘buffer’ argument to ensure a minimum separation between selections and the annotated segments. This can be useful if the annotation start and end times are not always fully accurate.
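
The effect of ‘buffer’ can be pictured as widening every annotated segment by the buffer on both sides before checking a candidate selection against it. The sketch below is a plain-Python illustration of that idea, not the Ketos implementation: the helper name is_clear is hypothetical, and the treatment of the buffer as a symmetric margin is an assumption based on the description above. The annotation times are those of file1.wav in the example further down, and the candidate start time 3.89 is one of the rounded values from its output.

```python
def is_clear(start, length, annotations, buffer=0.0):
    """Return True if [start, start + length] stays at least `buffer`
    seconds away from every annotated (start, end) interval.

    Hypothetical helper for illustration only; not part of Ketos.
    """
    end = start + length
    for a_start, a_end in annotations:
        # Widen the annotation by `buffer` on both sides, then test
        # for interval intersection with the candidate selection.
        if start < a_end + buffer and end > a_start - buffer:
            return False
    return True


# Annotated segments of file1.wav, taken from the example table.
file1_annotations = [(7.0, 8.1), (8.5, 12.5), (13.1, 14.0)]

# A 3.0-s candidate starting at 3.89 s ends at 6.89 s, just before
# the first annotation at 7.0 s.
clear_without_buffer = is_clear(3.89, 3.0, file1_annotations)
clear_with_buffer = is_clear(3.89, 3.0, file1_annotations, buffer=0.5)

print(clear_without_buffer, clear_with_buffer)
```

Under these assumptions the candidate is accepted with the default buffer of zero but rejected with a 0.5-s buffer, because it ends only 0.11 s before the annotation that starts at 7.0 s.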

Args:

files: pandas DataFrame: Table with file durations in seconds. Should contain columns named ‘filename’ and ‘duration’.
length: float: Selection length in seconds.
num: int: Number of selections to be created.
label: int: Value to be assigned to the created selections.
annotations: pandas DataFrame: Annotation table. Optional.
no_overlap: bool: If True, the randomly selected segments will not overlap with each other.
trim_table: bool: Keep only the columns prescribed by the Ketos annotation format.
buffer: float: Minimum separation in seconds between the background selections and the annotated segments. The default value is zero.

Returns:

table_backgr: pandas DataFrame: Output selection table.

Example:

>>> import pandas as pd
>>> import numpy as np
>>> from ketos.data_handling.selection_table import standardize, create_rndm_selections
>>> 
>>> #Ensure reproducible results by fixing the random number generator seed.
>>> np.random.seed(3)
>>> 
>>> #Load and inspect the annotations.
>>> df = pd.read_csv("ketos/tests/assets/annot_001.csv")
>>> print(df)
    filename  start   end  label
0  file1.wav    7.0   8.1      1
1  file1.wav    8.5  12.5      0
2  file1.wav   13.1  14.0      1
3  file2.wav    2.2   3.1      1
4  file2.wav    5.8   6.8      1
5  file2.wav    9.0  13.0      0
>>>
>>> #Standardize the annotation table format. The background selections will be
>>> #created with label 0, so map the existing labels 0 and 1 to 1 and 2.
>>> df = standardize(annotations=df, labels={0:1, 1:2})
>>> print(df)
                    start   end  label
filename  annot_id                    
file1.wav 0           7.0   8.1      2
          1           8.5  12.5      1
          2          13.1  14.0      2
file2.wav 0           2.2   3.1      2
          1           5.8   6.8      2
          2           9.0  13.0      1
>>>
>>> #Enter file durations into a pandas DataFrame
>>> file_dur = pd.DataFrame({'filename':['file1.wav','file2.wav','file3.wav',], 'duration':[18.,20.,15.]})
>>> 
>>> #Create randomly sampled background selections with a fixed length of 3.0 s.
>>> df_bgr = create_rndm_selections(annotations=df, files=file_dur, length=3.0, num=12, trim_table=True)
>>> print(df_bgr.round(2))
                  start    end  label
filename  sel_id                     
file1.wav 0        3.38   6.38      0
          1        3.89   6.89      0
file2.wav 0       16.52  19.52      0
file3.wav 0        0.29   3.29      0
          1        2.77   5.77      0
          2        3.23   6.23      0
          3        5.49   8.49      0
          4        5.63   8.63      0
          5        6.69   9.69      0
          6        6.71   9.71      0
          7        8.18  11.18      0
          8       10.33  13.33      0
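
Because ‘no_overlap’ is left at its default of False in the call above, several of the returned selections overlap with each other, for example sel_id 0 and 1 in file1.wav. The following sketch shows one way such overlaps could be detected. It is written in plain Python rather than with Ketos, the helper name find_overlaps is hypothetical, and the input is a hand-copied subset of the rounded output above.

```python
from collections import defaultdict


def find_overlaps(selections):
    """Return neighbouring selections within the same file that overlap.

    `selections` is an iterable of (filename, start, end) tuples.
    Hypothetical helper for illustration only; not part of Ketos.
    """
    by_file = defaultdict(list)
    for filename, start, end in selections:
        by_file[filename].append((start, end))

    overlaps = []
    for filename, intervals in by_file.items():
        intervals.sort()
        # After sorting by start time, it is enough to compare each
        # selection with the next one to detect whether any overlap exists.
        for (s1, e1), (s2, e2) in zip(intervals, intervals[1:]):
            if s2 < e1:
                overlaps.append((filename, (s1, e1), (s2, e2)))
    return overlaps


# First six rows of the example output (rounded to two decimals).
selections = [
    ("file1.wav", 3.38, 6.38),
    ("file1.wav", 3.89, 6.89),
    ("file2.wav", 16.52, 19.52),
    ("file3.wav", 0.29, 3.29),
    ("file3.wav", 2.77, 5.77),
    ("file3.wav", 3.23, 6.23),
]

overlaps = find_overlaps(selections)
for filename, first, second in overlaps:
    print(filename, first, second)
```

For this subset the check reports one overlapping pair in file1.wav and two in file3.wav. A result like this is a sign that calling create_rndm_selections with no_overlap=True may be worth the longer execution time.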