create_rndm_selections
- ketos.data_handling.selection_table.create_rndm_selections(files, length, num, label=0, annotations=None, no_overlap=False, trim_table=False, buffer=0)
Create selections of uniform length, randomly distributed across the data set and not overlapping with any annotations. The created selections will have a label value defined by the ‘label’ parameter.
The random sampling is performed without regard to already created selections. It is therefore possible for some of the created selections to overlap one another. In practice the probability is very small, unless the number of requested selections (num) is very large and/or the annotation-free part of the data set is small.
To avoid any overlap between selections, set ‘no_overlap’ to True, but note that this can lead to longer execution times.
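The cost of enforcing non-overlap can be illustrated with a small rejection-sampling sketch. This is not the Ketos implementation; the helper name `sample_non_overlapping` and its `max_tries` and `seed` parameters are hypothetical, and a single file of known duration is assumed.

```python
import random

def sample_non_overlapping(duration, length, num, max_tries=10000, seed=0):
    """Draw up to `num` segment start times in [0, duration - length] such
    that no two segments of the given length overlap.

    Hypothetical helper for illustration only (not part of Ketos).
    Returns the sorted start times and the number of draws that were needed.
    """
    rng = random.Random(seed)
    starts = []
    tries = 0
    while len(starts) < num and tries < max_tries:
        tries += 1
        candidate = rng.uniform(0.0, duration - length)
        # Two segments of equal length overlap when their start times
        # are closer together than `length`; reject such candidates.
        if all(abs(candidate - s) >= length for s in starts):
            starts.append(candidate)
    return sorted(starts), tries

starts, tries = sample_non_overlapping(duration=60.0, length=3.0, num=8)
print(len(starts), "selections accepted after", tries, "draws")
```

Each rejected candidate costs an extra draw, so the number of draws grows as the requested selections fill up more of the available time. This is one plausible reason a non-overlapping mode takes longer to run.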
Use the ‘buffer’ argument to ensure a minimum separation between selections and the annotated segments. This can be useful if the annotation start and end times are not always fully accurate.
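The effect of a buffer can be sketched as padding each annotated interval on both sides before testing a candidate selection against it. The function `is_clear_of_annotations` below is a hypothetical illustration, not a Ketos function, and the interval values are illustrative.

```python
def is_clear_of_annotations(start, end, annotations, buffer=0.0):
    """Return True if the segment [start, end] keeps at least `buffer`
    seconds of distance from every annotated (a_start, a_end) interval.

    Hypothetical helper for illustration only (not part of Ketos).
    """
    for a_start, a_end in annotations:
        # Widen the annotation by `buffer` on each side, then apply the
        # usual interval-overlap test.
        if start < a_end + buffer and end > a_start - buffer:
            return False
    return True

# Annotated intervals in seconds (illustrative values).
annotations = [(7.0, 8.1), (8.5, 12.5), (13.1, 14.0)]

print(is_clear_of_annotations(3.5, 6.5, annotations))              # True
print(is_clear_of_annotations(3.5, 6.5, annotations, buffer=1.0))  # False
```

The candidate ending at 6.5 s sits 0.5 s before the first annotation, so it is accepted with no buffer but rejected once a 1.0 s buffer is required.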
- Args:
- files: pandas DataFrame
Table with file durations in seconds. Should contain columns named ‘filename’ and ‘duration’.
- length: float
Selection length in seconds.
- num: int
Number of selections to be created.
- label: int
Value to be assigned to the created selections.
- annotations: pandas DataFrame
Annotation table. Optional.
- no_overlap: bool
If True, the randomly selected segments will not overlap one another.
- trim_table: bool
Keep only the columns prescribed by the Ketos annotation format.
- buffer: float
Minimum separation in seconds between the background selections and the annotated segments. The default value is zero.
- Returns:
- table_backgr: pandas DataFrame
Output selection table.
- Example:
>>> import pandas as pd
>>> import numpy as np
>>> from ketos.data_handling.selection_table import standardize, create_rndm_selections
>>>
>>> #Ensure reproducible results by fixing the random number generator seed.
>>> np.random.seed(3)
>>>
>>> #Load and inspect the annotations.
>>> df = pd.read_csv("ketos/tests/assets/annot_001.csv")
>>> print(df)
    filename  start   end  label
0  file1.wav    7.0   8.1      1
1  file1.wav    8.5  12.5      0
2  file1.wav   13.1  14.0      1
3  file2.wav    2.2   3.1      1
4  file2.wav    5.8   6.8      1
5  file2.wav    9.0  13.0      0
>>>
>>> #Standardize annotation table format. The background selections will
>>> #get label 0, so map the existing labels 0 and 1 to 1 and 2.
>>> df = standardize(annotations=df, labels={0:1, 1:2})
>>> print(df)
                    start   end  label
filename  annot_id
file1.wav 0           7.0   8.1      2
          1           8.5  12.5      1
          2          13.1  14.0      2
file2.wav 0           2.2   3.1      2
          1           5.8   6.8      2
          2           9.0  13.0      1
>>>
>>> #Enter file durations into a pandas DataFrame
>>> file_dur = pd.DataFrame({'filename':['file1.wav','file2.wav','file3.wav'], 'duration':[18.,20.,15.]})
>>>
>>> #Create randomly sampled background selections with fixed 3.0-s length.
>>> df_bgr = create_rndm_selections(annotations=df, files=file_dur, length=3.0, num=12, trim_table=True)
>>> print(df_bgr.round(2))
                  start    end  label
filename  sel_id
file1.wav 0        3.38   6.38      0
          1        3.89   6.89      0
file2.wav 0       16.52  19.52      0
file3.wav 0        0.29   3.29      0
          1        2.77   5.77      0
          2        3.23   6.23      0
          3        5.49   8.49      0
          4        5.63   8.63      0
          5        6.69   9.69      0
          6        6.71   9.71      0
          7        8.18  11.18      0
          8       10.33  13.33      0