standardize

ketos.data_handling.selection_table.standardize(annotations=None, sep=',', labels='auto', unfold_labels=False, label_sep=',', trim_table=False, datetime_format=None, table=None, path=None)

Standardize the annotation table format.

The input table can be passed as a pandas DataFrame or as the filename of a csv file. The table may have either a single label per row, in which case unfold_labels should be set to False, or multiple labels per row (e.g. as a comma-separated list of values), in which case unfold_labels should be set to True and label_sep should be specified.
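To make the multi-label case concrete, the following is a minimal pure-Python sketch of what unfolding means: each row holding several labels becomes several rows with one label each. The helper name `unfold_labels` and the sample rows are hypothetical illustrations, not the ketos implementation.

```python
def unfold_labels(rows, label_sep=","):
    """Return one row per label, copying all other fields unchanged.

    Illustrative helper only; not part of the ketos API.
    """
    unfolded = []
    for row in rows:
        for label in str(row["label"]).split(label_sep):
            new_row = dict(row)
            new_row["label"] = label.strip()
            unfolded.append(new_row)
    return unfolded


# Hypothetical annotation rows: the first one carries two labels.
rows = [
    {"filename": "file1.wav", "start": 1.0, "end": 2.5, "label": "whale,boat"},
    {"filename": "file2.wav", "start": 0.0, "end": 4.0, "label": "whale"},
]

unfolded = unfold_labels(rows)
for row in unfolded:
    print(row["filename"], row["start"], row["end"], row["label"])
```

The row with two labels yields two output rows that share the same filename, start and end values.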

The table headings must already conform with the ketos standard naming convention; rename columns as needed before calling this function.

Note that the standardized output table has two levels of indices, the first index being the filename and the second index the annotation identifier.
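The two-level index can be pictured as a (filename, annotation identifier) pair for every row. The sketch below mimics that layout with a plain dictionary; it assumes the identifier is a per-file counter starting at 0, which is an assumption for illustration rather than something stated above, and `index_annotations` is a hypothetical helper.

```python
from collections import defaultdict


def index_annotations(rows):
    """Key each row by (filename, per-file counter).

    Illustrative only: the counter scheme is an assumption about how
    annotation identifiers are assigned.
    """
    counters = defaultdict(int)
    indexed = {}
    for row in rows:
        fname = row["filename"]
        key = (fname, counters[fname])
        counters[fname] += 1
        # Everything except the filename becomes the row payload.
        indexed[key] = {k: v for k, v in row.items() if k != "filename"}
    return indexed


# Hypothetical rows: two annotations in the first file, one in the second.
rows = [
    {"filename": "file1.wav", "label": 0},
    {"filename": "file1.wav", "label": 1},
    {"filename": "file2.wav", "label": 0},
]

indexed = index_annotations(rows)
for key, payload in indexed.items():
    print(key, payload)
```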

The label mapping is stored in the output table's attrs dictionary under the key ‘label_dict’ and may be retrieved with df.attrs['label_dict'].
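The sketch below illustrates the kind of mapping that ends up in ‘label_dict’ for the different values of the labels argument. It is a pure-Python approximation of the documented behaviour, not the ketos code: `build_label_dict` is a hypothetical helper, and assigning integers in sorted label order for ‘auto’ is an assumption.

```python
def build_label_dict(observed_labels, labels="auto"):
    """Approximate the documented label-mapping modes.

    Illustrative only; the ordering used for 'auto' is an assumption.
    """
    unique = sorted(set(observed_labels))
    if labels is None:
        # No mapping is applied; labels are left as-is.
        return None
    if isinstance(labels, str) and labels == "auto":
        return {lab: i for i, lab in enumerate(unique)}
    if isinstance(labels, dict):
        mapping = dict(labels)
    else:
        # A list: map the listed labels to 0, 1, 2, ...
        mapping = {lab: i for i, lab in enumerate(labels)}
    # Any label not covered by the user's specification maps to -1.
    for lab in unique:
        mapping.setdefault(lab, -1)
    return mapping


observed = ["whale", "boat", "whale", "seal"]

auto_map = build_label_dict(observed)
list_map = build_label_dict(observed, labels=["whale", "boat"])
dict_map = build_label_dict(observed, labels={"boat": 0, "whale": 1})

print(auto_map)
print(list_map)
print(dict_map)
```

In the list and dict cases, ‘seal’ is not specified and therefore receives -1.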

Required Columns:

‘filename’: The name or path of the file associated with each annotation.
‘label’: The label or category associated with each annotation. If unfold_labels is True, this column may contain multiple labels separated by label_sep.

Optional Columns (depending on usage):

‘start’: The start time or position of the annotation.
‘end’: The end time or position of the annotation.

Args:

annotations: str, pandas DataFrame

If a string, it is assumed to be the path to a CSV file containing the annotation table. If a pandas DataFrame, it is used directly as the annotation table.

sep: str

Separator used in the CSV file. Only relevant if annotations is given as a path to a CSV file. Default is “,”.

labels: ‘auto’, None, dict, or list

‘auto’ (default): All unique labels in the table are automatically mapped to integers starting from 0.
None: No label mapping is applied, labels are left as-is.
dict: A user-specified mapping of labels to integers (Note that ketos expects labels to be incremental integers starting with 0).
list: A subset of labels to map to integers starting from 0.

Any unspecified label is mapped to -1.

unfold_labels: bool

Should be set to True if any of the rows have multiple labels and False otherwise (default).

label_sep: str

Character used to separate multiple labels. Only relevant if unfold_labels is set to True. Default is “,”.

trim_table: bool

If True, keep only the columns prescribed by the Ketos annotation format. Default is False.

datetime_format: str

String defining the date-time format. Example: %d_%m_%Y* would capture “14_3_1999.txt”. See https://pypi.org/project/datetime-glob/ for a list of valid directives. If specified, the method will look for a column named ‘datetime’ and, if found, attempt to parse the values in this column. If your datetime column has a different name, rename it to ‘datetime’ before calling this function. If the method does not find a column named ‘datetime’, it will attempt to parse the datetime information from the filename column. Default is None.
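The lookup order described above (use the ‘datetime’ column if present, otherwise fall back to the filename) can be sketched as follows. This sketch substitutes the standard library's `datetime.strptime` for datetime-glob, so the trailing `*` wildcard is replaced by stripping the file extension; `resolve_datetime` is a hypothetical helper, and returning None on a failed parse is an assumption.

```python
from datetime import datetime
from pathlib import Path


def resolve_datetime(row, fmt="%d_%m_%Y"):
    """Parse a date from the 'datetime' field, else from the filename.

    Illustrative only: uses strptime instead of datetime-glob.
    """
    if "datetime" in row:
        source = row["datetime"]
    else:
        # Drop the extension, standing in for the '*' wildcard.
        source = Path(row["filename"]).stem
    try:
        return datetime.strptime(source, fmt)
    except ValueError:
        return None


from_filename = resolve_datetime({"filename": "14_3_1999.txt"})
from_column = resolve_datetime({"filename": "a.wav", "datetime": "01_02_2020"})
unparseable = resolve_datetime({"filename": "recording.wav"})

print(from_filename, from_column, unparseable)
```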

table: pandas DataFrame (deprecated)

Deprecated. Use ‘annotations’ instead.

path: str (deprecated)

Deprecated. Use ‘annotations’ instead.

Returns:

df: pandas DataFrame

Standardized annotation table.

Note:

The function assumes that the necessary preprocessing (e.g., renaming columns to match the expected names) has been done prior to calling this function. It is the responsibility of the user to ensure that the input table conforms to the expected format.