create_rndm_selections

ketos.data_handling.selection_table.create_rndm_selections(files, length, num, label=0, annotations=None, no_overlap=False, trim_table=False, buffer=0)

Create selections of uniform length, randomly distributed across the data set and not overlapping with any annotations. The created selections will have a label value defined by the ‘label’ parameter.
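The idea of sampling only from annotation-free time can be sketched for a single file as shown below. This is a minimal illustration under stated assumptions, not the Ketos implementation: annotations are assumed to be plain (start, end) tuples in seconds, start times are assumed to be uniform over all admissible positions, and `free_intervals` and `sample_selection` are hypothetical helpers that are not part of the Ketos API.

```python
import numpy as np


def free_intervals(duration, annotations):
    """Return the annotation-free parts of [0, duration] as (start, end) tuples.

    Hypothetical helper; `annotations` is a list of (start, end) tuples in seconds.
    """
    free, t = [], 0.0
    for start, end in sorted(annotations):
        if start > t:
            free.append((t, start))
        t = max(t, end)  # also handles overlapping annotations
    if t < duration:
        free.append((t, duration))
    return free


def sample_selection(duration, annotations, length, rng):
    """Draw one selection of the given length that avoids all annotations (sketch)."""
    # A selection starting at s fits in the free interval (a, b) iff a <= s <= b - length.
    # Only intervals strictly longer than `length` are kept, so every range has positive width.
    ranges = [(a, b - length) for a, b in free_intervals(duration, annotations) if b - a > length]
    if not ranges:
        raise ValueError("no annotation-free interval is longer than the selection length")
    widths = np.array([hi - lo for lo, hi in ranges])
    # Pick a range with probability proportional to its width, then a uniform start
    # inside it, so the start time is uniform over all admissible positions.
    lo, hi = ranges[rng.choice(len(ranges), p=widths / widths.sum())]
    start = rng.uniform(lo, hi)
    return start, start + length


rng = np.random.default_rng(3)
annot = [(7.0, 8.1), (8.5, 12.5), (13.1, 14.0)]  # file1.wav annotations from the example below
print(sample_selection(18.0, annot, 3.0, rng))
```

For file1.wav, only the free intervals 0.0-7.0 s and 14.0-18.0 s can hold a 3.0-s selection, so every sampled start lies in 0.0-4.0 s or 14.0-15.0 s.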

The random sampling does not take already created selections into account. It is therefore possible, in principle, for some of the created selections to overlap one another. In practice this happens only with very small probability, unless the number of requested selections (num) is very large and/or the annotation-free part of the data set is small.
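How small that probability is can be estimated with a simplified model, assumed here purely for illustration: the annotation-free data form one contiguous span of T seconds, and each selection of length L has a start time drawn uniformly and independently from [0, T − L]. Two selections then overlap iff their start times differ by less than L, which gives a closed-form probability per pair. The function names below are hypothetical, and the Monte Carlo function only serves to check the formula.

```python
import numpy as np


def overlap_probability(total, length):
    """Probability that two independent selections of `length` s overlap.

    Simplified model: one contiguous annotation-free span of `total` s, with each
    selection's start time drawn uniformly from [0, total - length].
    """
    if total < length:
        raise ValueError("span is shorter than the selection length")
    d = total - length  # width of the admissible start-time range
    if d <= length:
        return 1.0  # any two starts are (almost surely) less than `length` apart
    # Overlap iff |s1 - s2| < length; the complement has probability ((d - length) / d) ** 2.
    return 1.0 - ((d - length) / d) ** 2


def expected_overlapping_pairs(num, total, length):
    """Expected number of overlapping pairs among `num` independent selections."""
    return num * (num - 1) / 2 * overlap_probability(total, length)


def simulated_overlapping_pairs(num, total, length, trials, rng):
    """Monte Carlo estimate of the same quantity, for checking the formula."""
    starts = rng.uniform(0.0, total - length, size=(trials, num))
    diff = np.abs(starts[:, :, None] - starts[:, None, :])
    # Count each unordered pair once (strict upper triangle of the last two axes).
    return np.triu(diff < length, k=1).sum(axis=(1, 2)).mean()


print(expected_overlapping_pairs(12, 3600.0, 3.0))   # one hour of annotation-free data
print(expected_overlapping_pairs(12, 36000.0, 3.0))  # ten hours
```

Under this model, 12 selections of 3 s drawn from one hour of annotation-free data give about 0.11 overlapping pairs on average, and ten times as much data reduces this tenfold. Real data sets with many short free intervals will deviate from the estimate.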

To guarantee that the created selections do not overlap one another, set ‘no_overlap’ to True, but note that this can lead to longer execution times.
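One common way to enforce non-overlapping selections, and a plausible reason for the longer run time, is rejection sampling: draw a candidate, discard it if it hits an annotation or an already accepted selection, and try again. The sketch below assumes this approach for a single file; Ketos may implement ‘no_overlap’ differently, and `overlaps`, `sample_without_overlap` and `max_tries` are hypothetical names.

```python
import numpy as np


def overlaps(start, end, intervals):
    """True if [start, end) overlaps any (a, b) in `intervals`."""
    return any(start < b and a < end for a, b in intervals)


def sample_without_overlap(duration, annotations, length, num, rng, max_tries=10_000):
    """Rejection-sampling sketch: `num` selections that overlap neither annotations nor each other."""
    selections = []
    tries = 0
    while len(selections) < num:
        if tries >= max_tries:
            raise RuntimeError(f"only {len(selections)} of {num} selections found")
        tries += 1
        start = rng.uniform(0.0, duration - length)
        end = start + length
        # Reject candidates that hit an annotation or an already accepted selection.
        if overlaps(start, end, annotations) or overlaps(start, end, selections):
            continue
        selections.append((start, end))
    return sorted(selections)


rng = np.random.default_rng(0)
annot = [(7.0, 8.1), (8.5, 12.5), (13.1, 14.0)]
print(sample_without_overlap(120.0, annot, 3.0, 8, rng))
```

Each rejected candidate costs an extra draw, and the acceptance rate falls as selections and annotations cover more of the data, so execution time grows with num and with the density of annotations.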

Use the ‘buffer’ argument to ensure a minimum separation between selections and the annotated segments. This can be useful if the annotation start and end times are not always fully accurate.
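The effect of a buffer can be pictured as widening every annotation by ‘buffer’ seconds on both sides before testing for overlap. The sketch below assumes exactly that interpretation; `apply_buffer` and `is_valid_background` are hypothetical helpers, not Ketos functions.

```python
def apply_buffer(annotations, buffer, duration):
    """Widen each (start, end) annotation by `buffer` seconds on both sides, clipped to the file."""
    return [(max(0.0, a - buffer), min(duration, b + buffer)) for a, b in annotations]


def is_valid_background(start, length, annotations, buffer, duration):
    """True if [start, start + length] lies in the file and keeps `buffer` s away from all annotations."""
    end = start + length
    if start < 0.0 or end > duration:
        return False
    padded = apply_buffer(annotations, buffer, duration)
    # Valid if the selection ends before, or starts after, every padded annotation.
    return all(end <= a or start >= b for a, b in padded)


annot = [(7.0, 8.1)]
print(is_valid_background(3.8, 3.0, annot, 0.0, 18.0))  # ends at 6.8 s, before 7.0 s
print(is_valid_background(3.8, 3.0, annot, 0.5, 18.0))  # 6.8 s is inside the 0.5-s buffer
```

With ‘buffer’ set to 0.5 s, the selection from 3.8 to 6.8 s is rejected even though it does not touch the annotation starting at 7.0 s.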

Args:
files: pandas DataFrame

Table with file durations in seconds. Should contain columns named ‘filename’ and ‘duration’.

length: float

Selection length in seconds.

num: int

Number of selections to be created.

label: int

Value to be assigned to the created selections.

annotations: pandas DataFrame

Annotation table. Optional.

no_overlap: bool

If True, randomly selected segments will have no overlap.

trim_table: bool

If True, keep only the columns prescribed by the Ketos annotation format.

buffer: float

Minimum separation in seconds between the background selections and the annotated segments. The default value is zero.

Returns:
table_backgr: pandas DataFrame

Output selection table.

Example:
>>> import pandas as pd
>>> import numpy as np
>>> from ketos.data_handling.selection_table import standardize, create_rndm_selections
>>> 
>>> #Ensure reproducible results by fixing the random number generator seed.
>>> np.random.seed(3)
>>> 
>>> #Load and inspect the annotations.
>>> df = pd.read_csv("ketos/tests/assets/annot_001.csv")
>>> print(df)
    filename  start   end  label
0  file1.wav    7.0   8.1      1
1  file1.wav    8.5  12.5      0
2  file1.wav   13.1  14.0      1
3  file2.wav    2.2   3.1      1
4  file2.wav    5.8   6.8      1
5  file2.wav    9.0  13.0      0
>>>
>>> #Standardize annotation table format
>>> df = standardize(df, start_labels_at_1=True)
>>> print(df)
                    start   end  label
filename  annot_id                    
file1.wav 0           7.0   8.1      2
          1           8.5  12.5      1
          2          13.1  14.0      2
file2.wav 0           2.2   3.1      2
          1           5.8   6.8      2
          2           9.0  13.0      1
>>>
>>> #Enter file durations into a pandas DataFrame
>>> file_dur = pd.DataFrame({'filename':['file1.wav','file2.wav','file3.wav',], 'duration':[18.,20.,15.]})
>>> 
>>> #Create 12 randomly sampled background selections with a fixed length of 3.0 s.
>>> df_bgr = create_rndm_selections(annotations=df, files=file_dur, length=3.0, num=12, trim_table=True) 
>>> print(df_bgr.round(2))
                  start    end  label
filename  sel_id                     
file1.wav 0        3.38   6.38      0
          1        3.89   6.89      0
file2.wav 0       16.52  19.52      0
file3.wav 0        0.29   3.29      0
          1        2.77   5.77      0
          2        3.23   6.23      0
          3        5.49   8.49      0
          4        5.63   8.63      0
          5        6.69   9.69      0
          6        6.71   9.71      0
          7        8.18  11.18      0
          8       10.33  13.33      0