standardize
- ketos.data_handling.selection_table.standardize(table=None, path=None, sep=',', mapper=None, labels=None, start_labels_at_1=False, unfold_labels=False, label_sep=',', trim_table=False, datetime_format=None)
Standardize the annotation table format.
The input table can be passed as a pandas DataFrame or as the filename of a csv file. The table may have either a single label per row, in which case unfold_labels should be set to False, or multiple labels per row (e.g. as a comma-separated list of values), in which case unfold_labels should be set to True and label_sep should be specified.
The table headings are renamed to conform with the ketos standard naming convention, following the name mapping specified by the user.
Labels specified by the labels argument are mapped to integers 0,1,2,… and any remaining labels are mapped to -1.
Note that the labels can be mapped to 1,2,3,… instead by setting the start_labels_at_1 argument to True. This can be useful if you want to reserve the label 0 for background/negative samples.
Note that the standardized output table has two levels of indices, the first index being the filename and the second index the annotation identifier.
The label mapping is stored in the attrs dictionary of the output table under the key 'label_dict' and may be retrieved with df.attrs['label_dict'].
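The label mapping rules described above can be illustrated with a minimal, pure-Python sketch. This is not the ketos implementation; the helper names build_label_dict and map_label are hypothetical and exist only for this example.

```python
def build_label_dict(labels, start_labels_at_1=False):
    """Map each label of interest to an integer.

    Labels grouped in a nested list share the same integer.
    (Hypothetical helper for illustration, not part of ketos.)
    """
    offset = 1 if start_labels_at_1 else 0
    label_dict = {}
    for i, item in enumerate(labels):
        group = item if isinstance(item, list) else [item]
        for label in group:
            label_dict[label] = i + offset
    return label_dict


def map_label(label, label_dict):
    """Return the integer for a label, or -1 if it is not a label of interest."""
    return label_dict.get(label, -1)


# A is mapped to 0, while B and C share the integer 1.
label_dict = build_label_dict(["A", ["B", "C"]])

# D is not a label of interest, so it is mapped to -1.
mapped = [map_label(label, label_dict) for label in ["A", "B", "C", "D"]]

# Reserve 0 for background/negative samples by starting at 1.
label_dict_from_1 = build_label_dict(["A", ["B", "C"]], start_labels_at_1=True)
```

With start_labels_at_1=True the same input yields A mapped to 1 and B and C mapped to 2, leaving 0 free for background samples.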
- Args:
- table: pandas DataFrame
Annotation table.
- path: str
Full path to csv file containing the annotation table.
- sep: str
Separator. Only relevant if path is specified. Default is ",".
- mapper: dict
Dictionary mapping the standard ketos headings to the headings of the input table. It is also possible to specify mappings that involve mathematical/logical operations on the headings of the input table. For example, {"end": "x['Start'] + x['Duration']"}.
- labels: list, or list of lists
Labels of interest. Will be mapped to 0,1,2,… Several labels can be mapped to the same integer by using nested lists. For example, labels=[A,[B,C]] would result in A being mapped to 0 and B and C both being mapped to 1. Any remaining labels not specified by the labels argument are mapped to -1.
- start_labels_at_1: bool
Map labels to 1,2,3,… instead of 0,1,2,… Default is False. Useful if you want to reserve the label 0 for background/negative samples.
- unfold_labels: bool
Should be set to True if any of the rows have multiple labels and False otherwise (default).
- label_sep: str
Character used to separate multiple labels. Only relevant if unfold_labels is set to True. Default is “,”.
- trim_table: bool
Keep only the columns prescribed by the Ketos annotation format and any additional columns specified in the mapper dictionary.
- datetime_format: str
String defining the date-time format. Example: %d_%m_%Y* would capture “14_3_1999.txt”. See https://pypi.org/project/datetime-glob/ for a list of valid directives. If specified, the method will look for a column named ‘datetime’ and, if found, attempt to parse the values in this column. If your datetime column has a different name, use the mapper argument to change its name to ‘datetime’. If the method does not find a column named ‘datetime’ it will attempt to parse the datetime information from the filename column.
- Returns:
- df: pandas DataFrame
Standardized annotation table
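To make the effect of unfold_labels and label_sep more concrete, the following pure-Python sketch splits rows carrying several labels into one row per label. It only illustrates the idea and is not how ketos implements it; the helper unfold_rows, the file names, and the example rows are hypothetical.

```python
def unfold_rows(rows, label_sep=","):
    """Split rows with several labels into one row per label.

    (Hypothetical helper for illustration, not part of ketos.)
    """
    unfolded = []
    for row in rows:
        for label in str(row["label"]).split(label_sep):
            new_row = dict(row)               # copy the other columns unchanged
            new_row["label"] = label.strip()  # keep one label per row
            unfolded.append(new_row)
    return unfolded


# Example annotation rows; the first one carries two labels.
rows = [
    {"filename": "f0.wav", "start": 0.0, "end": 3.0, "label": "A,B"},
    {"filename": "f1.wav", "start": 1.0, "end": 2.5, "label": "C"},
]

unfolded = unfold_rows(rows, label_sep=",")
unfolded_labels = [row["label"] for row in unfolded]
```

Here the first row is duplicated, once for label A and once for label B, so the two input rows become three output rows that each hold a single label.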