check_data_sanity

ketos.data_handling.data_handling.check_data_sanity(images, labels)
Check that all images have the same size, that all labels have values, and that the number of images matches the number of labels.

Args:
images: numpy array or pandas series

Images

labels: numpy array or pandas series

Labels

Raises:
ValueError:

If no images or labels are passed; if the number of images differs from the number of labels; if images have different shapes; or if any labels are NaN.

Returns:

True if all checks pass.

Examples:
>>> import pandas as pd
>>> from ketos.data_handling.data_handling import check_data_sanity
>>> # Load a database with images and integer labels
>>> data = pd.read_pickle("ketos/tests/assets/pd_img_db.pickle")
>>> images = data['image']
>>> labels = data['label']
>>> # When the images and labels pass all the quality checks,
>>> # the function returns True
>>> check_data_sanity(images, labels)
True
>>> # If something is wrong, for example if the number of labels
>>> # differs from the number of images, an exception is raised
>>> labels = data['label'][:10]
>>> check_data_sanity(images, labels=labels)
Traceback (most recent call last):
    File "/usr/lib/python3.6/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
    File "<doctest data_handling.check_data_sanity[5]>", line 1, in <module>
        check_data_sanity(images, labels=labels)
    File "ketos/data_handling/data_handling.py", line 599, in check_data_sanity
        raise ValueError("Image and label columns have different lengths")
ValueError: Image and label columns have different lengths
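
The four conditions listed under Raises can be sketched in a few lines of NumPy and pandas. This is a minimal illustration of the documented behaviour, not the ketos source: the function name `sketch_check_data_sanity` is hypothetical, and every error message except "Image and label columns have different lengths" (taken from the traceback above) is an assumption.

```python
import numpy as np
import pandas as pd


def sketch_check_data_sanity(images, labels):
    """Illustrative version of the documented checks (hypothetical, not ketos code)."""
    # Reject empty inputs first (message text is an assumption).
    if len(images) == 0 or len(labels) == 0:
        raise ValueError("No images or labels were passed")

    # Images and labels must pair up one-to-one
    # (message copied from the traceback in the Examples section).
    if len(images) != len(labels):
        raise ValueError("Image and label columns have different lengths")

    # Collect the distinct shapes; more than one means the sizes differ.
    shapes = {np.shape(img) for img in images}
    if len(shapes) > 1:
        raise ValueError("Images do not all have the same shape")

    # pd.isnull works element-wise on numpy arrays and pandas Series.
    if np.any(pd.isnull(labels)):
        raise ValueError("Some labels are NaN")

    return True


# Three 4x4 images with one integer label each pass every check.
images = np.zeros((3, 4, 4))
labels = np.array([0, 1, 2])
print(sketch_check_data_sanity(images, labels))  # True
```

Collecting shapes into a set keeps the size check independent of the container type, so the same sketch accepts a 3-D array, a list of arrays, or a pandas series of arrays.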