standardize

ketos.data_handling.selection_table.standardize(table=None, path=None, sep=',', mapper=None, labels=None, start_labels_at_1=False, unfold_labels=False, label_sep=',', trim_table=False, datetime_format=None)

Standardize the annotation table format.

The input table can be passed either as a pandas DataFrame (table argument) or as the path to a csv file (path argument). The table may have either a single label per row, in which case unfold_labels should be set to False, or multiple labels per row (e.g. a comma-separated list of values), in which case unfold_labels should be set to True and label_sep should be specified.
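
The effect of unfolding multi-label rows can be illustrated with plain pandas. The sketch below is not the ketos implementation; the helper name unfold_labels_sketch is hypothetical, and it assumes the labels live in a column already named label. Each row whose label cell holds several separated values is expanded into one row per label, with the other columns copied.

```python
import pandas as pd


def unfold_labels_sketch(df: pd.DataFrame, label_sep: str = ",") -> pd.DataFrame:
    """Hypothetical helper: emit one row per label, copying the other columns."""
    out = df.copy()
    # Split "A,B" into ["A", "B"], stripping whitespace around each label
    out["label"] = out["label"].astype(str).str.split(label_sep).apply(
        lambda parts: [p.strip() for p in parts]
    )
    # explode() turns each list element into its own row
    return out.explode("label", ignore_index=True)


table = pd.DataFrame({
    "filename": ["a.wav", "b.wav"],
    "start": [1.0, 5.0],
    "label": ["A,B", "C"],
})
print(unfold_labels_sketch(table))
```

Here the first row ("A,B") becomes two rows sharing the same filename and start time, while the single-label row is left unchanged.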

The table headings are renamed to conform to the ketos standard naming convention, following the name mapping specified by the user via the mapper argument.
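
A minimal sketch of how such a mapping could be applied is shown below, assuming the mapper semantics described under the mapper argument: each key is a standard heading and each value is either an existing column name or an expression such as "x['Start'] + x['Duration']". The helper apply_mapper_sketch is hypothetical, and binding x to the whole input DataFrame is an assumption made for illustration; ketos may evaluate these expressions differently.

```python
import pandas as pd


def apply_mapper_sketch(df: pd.DataFrame, mapper: dict) -> pd.DataFrame:
    """Hypothetical helper: keys are standard headings; values are either an
    existing column name or an expression in which x is the input table."""
    out = df.copy()
    renames = {}
    for target, source in mapper.items():
        if source in df.columns:
            renames[source] = target
        else:
            # Evaluate against the ORIGINAL headings, before any renaming.
            # eval() runs arbitrary code, so only use trusted mapper strings.
            out[target] = eval(source, {"__builtins__": {}}, {"x": df})
    return out.rename(columns=renames)


raw = pd.DataFrame({"File": ["a.wav"], "Start": [1.5], "Duration": [2.0], "Class": ["A"]})
mapper = {"filename": "File", "start": "Start", "label": "Class",
          "end": "x['Start'] + x['Duration']"}
print(apply_mapper_sketch(raw, mapper))
```

Expressions are evaluated before the renaming step so that they can still refer to the original headings (here Start and Duration); columns not mentioned in the mapper, such as Duration, are kept under their original names.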

Labels specified by the labels argument are mapped to integers 0,1,2,… and any remaining labels are mapped to -1.

Alternatively, the labels can be mapped to 1,2,3,… by setting start_labels_at_1 to True. This can be useful if you want to reserve the label 0 for background/negative samples.
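
The label mapping rules (consecutive integers for the labels of interest, nested lists sharing one integer, an optional offset of 1, and -1 for everything else) can be sketched as follows. The helpers build_label_dict_sketch and map_labels_sketch are hypothetical and only mirror the documented behaviour; they are not the ketos code.

```python
import pandas as pd


def build_label_dict_sketch(labels, start_labels_at_1=False):
    """Hypothetical helper: map labels of interest to consecutive integers.
    All labels inside a nested list share the same integer."""
    offset = 1 if start_labels_at_1 else 0
    label_dict = {}
    for i, entry in enumerate(labels):
        group = entry if isinstance(entry, list) else [entry]
        for lab in group:
            label_dict[lab] = i + offset
    return label_dict


def map_labels_sketch(series: pd.Series, label_dict: dict) -> pd.Series:
    # Labels not listed in label_dict fall back to -1
    return series.map(lambda lab: label_dict.get(lab, -1))


label_dict = build_label_dict_sketch(["A", ["B", "C"]])
print(label_dict)  # {'A': 0, 'B': 1, 'C': 1}
print(map_labels_sketch(pd.Series(["A", "C", "D"]), label_dict).tolist())  # [0, 1, -1]
```

With start_labels_at_1=True the same input would give {'A': 1, 'B': 2, 'C': 2}, leaving 0 free for background samples.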

Note that the standardized output table has a two-level index: the first level is the filename and the second level is the annotation identifier.
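
The sketch below shows what working with such a two-level index looks like in pandas. The level names filename and annot_id, and numbering annotations 0, 1, 2, … within each file, are assumptions made for illustration and may not match the exact identifiers ketos produces.

```python
import pandas as pd

table = pd.DataFrame({
    "filename": ["a.wav", "a.wav", "b.wav"],
    "start": [1.0, 4.0, 2.0],
    "label": [0, 1, 0],
})
# Assumption: annotations are numbered 0, 1, 2, ... within each file
table["annot_id"] = table.groupby("filename").cumcount()
indexed = table.set_index(["filename", "annot_id"])

# Look up one annotation by (filename, annotation id)
print(indexed.loc[("a.wav", 1), "start"])  # 4.0
# All annotations belonging to one file
print(indexed.loc["a.wav"])
```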

The label mapping is stored in the output table's attrs dictionary under the key 'label_dict' and may be retrieved with df.attrs['label_dict'].
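
Because DataFrame.attrs is an ordinary dictionary (documented by pandas as experimental), the stored mapping can be read back and, for example, inverted to recover the original label names from the integer labels. The mapping used below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"label": [0, 1, 1]})
# DataFrame.attrs is a plain dict for metadata attached to a frame
df.attrs["label_dict"] = {"A": 0, "B": 1, "C": 1}  # hypothetical mapping

label_dict = df.attrs["label_dict"]
# Invert the mapping: integer label -> list of original label names
inverse = {}
for name, idx in label_dict.items():
    inverse.setdefault(idx, []).append(name)
print(inverse)  # {0: ['A'], 1: ['B', 'C']}
```

A list is used on the inverted side because several original labels may share one integer when nested lists are passed to labels.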

Args:
table: pandas DataFrame

Annotation table.

path: str

Full path to csv file containing the annotation table.

sep: str

Column separator used in the csv file. Only relevant if path is specified. Default is “,”.

mapper: dict

Dictionary mapping the standard ketos headings to the headings of the input table. It is also possible to specify mappings that involve mathematical/logical operations on the headings of the input table. For example, {"end": "x['Start'] + x['Duration']"}.

labels: list, or list of lists

Labels of interest. These are mapped to 0,1,2,… Several labels can be mapped to the same integer by using nested lists. For example, labels=['A', ['B', 'C']] would result in A being mapped to 0 and B and C both being mapped to 1. Any remaining labels not specified by the labels argument are mapped to -1.

start_labels_at_1: bool

Map labels to 1,2,3,… instead of 0,1,2,… Default is False. Useful if you want to reserve the label 0 for background/negative samples.

unfold_labels: bool

Should be set to True if any of the rows have multiple labels and False otherwise (default).

label_sep: str

Character used to separate multiple labels. Only relevant if unfold_labels is set to True. Default is “,”.

trim_table: bool

Keep only the columns prescribed by the ketos annotation format and any additional columns specified in the mapper dictionary. Default is False.

datetime_format: str

String defining the date-time format. Example: “%d_%m_%Y*” would match “14_3_1999.txt”. See https://pypi.org/project/datetime-glob/ for a list of valid directives. If specified, the function will look for a column named ‘datetime’ and, if found, attempt to parse the values in this column. If your datetime column has a different name, use the mapper argument to rename it to ‘datetime’. If the function does not find a column named ‘datetime’, it will attempt to parse the datetime information from the filename column.
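
The documented fallback order (parse a datetime column if present, otherwise parse the filename) can be sketched as below. This is a simplified stand-in, not the ketos implementation: the helper parse_datetime_sketch is hypothetical, and it swaps datetime-glob for the standard library's datetime.strptime applied to the file name without its extension, so glob wildcards such as * are not supported and the pattern is written as "%d_%m_%Y".

```python
import os
from datetime import datetime

import pandas as pd


def parse_datetime_sketch(df: pd.DataFrame, datetime_format: str) -> pd.DataFrame:
    """Hypothetical helper mimicking the documented fallback order.
    Uses strptime instead of datetime-glob, so no wildcard support."""
    out = df.copy()
    if "datetime" in out.columns:
        # A 'datetime' column exists: parse its values directly
        out["datetime"] = out["datetime"].apply(
            lambda s: datetime.strptime(str(s), datetime_format)
        )
    else:
        # Otherwise parse the file name, dropping directories and extension
        stems = out["filename"].apply(lambda f: os.path.splitext(os.path.basename(f))[0])
        out["datetime"] = stems.apply(lambda s: datetime.strptime(s, datetime_format))
    return out


res = parse_datetime_sketch(pd.DataFrame({"filename": ["14_3_1999.txt"]}), "%d_%m_%Y")
print(res["datetime"].iloc[0])
```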

Returns:
df: pandas DataFrame

Standardized annotation table.