check_data_sanity
- ketos.data_handling.data_handling.check_data_sanity(images, labels)
- Check that all images have the same size, that no labels are missing,
and that the number of images matches the number of labels.
- Args:
- images: numpy array or pandas series
Images
- labels: numpy array or pandas series
Labels
- Raises:
- ValueError:
If no images or labels are passed, if the number of images and labels differs, if the images have different shapes, or if any label is NaN.
- Returns:
True if all checks pass.
- Examples:
>>> import pandas as pd
>>> from ketos.data_handling.data_handling import check_data_sanity
>>> # Load a database with images and integer labels
>>> data = pd.read_pickle("ketos/tests/assets/pd_img_db.pickle")
>>> images = data['image']
>>> labels = data['label']
>>> # When the images and labels pass all the quality checks,
>>> # the function returns True
>>> check_data_sanity(images, labels)
True
>>> # If something is wrong, for example the number of labels
>>> # differs from the number of images, an exception is raised
>>> labels = data['label'][:10]
>>> check_data_sanity(images, labels=labels)
Traceback (most recent call last):
  File "/usr/lib/python3.6/doctest.py", line 1330, in __run
    compileflags, 1), test.globs)
  File "<doctest data_handling.check_data_sanity[5]>", line 1, in <module>
    check_data_sanity(images, labels=labels)
  File "ketos/data_handling/data_handling.py", line 599, in check_data_sanity
    raise ValueError("Image and label columns have different lengths")
ValueError: Image and label columns have different lengths
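The four documented checks can be illustrated with a minimal sketch. This is not the ketos source: the function name `check_data_sanity_sketch` is hypothetical, and every error message except the length-mismatch one (which appears in the traceback above) is an assumption made for illustration.

```python
import numpy as np
import pandas as pd


def check_data_sanity_sketch(images, labels):
    """Hypothetical re-implementation of the documented checks (not the ketos code)."""
    # Check 1: something must be passed for both arguments.
    if images is None or labels is None or len(images) == 0 or len(labels) == 0:
        raise ValueError("No images or labels were passed")  # assumed message

    # Check 2: one label per image. This message is the one shown in the traceback.
    if len(images) != len(labels):
        raise ValueError("Image and label columns have different lengths")

    # Check 3: all images share a single shape.
    shapes = {np.asarray(img).shape for img in images}
    if len(shapes) > 1:
        raise ValueError("Images do not all have the same shape")  # assumed message

    # Check 4: no label is NaN (pd.isnull also flags None).
    if np.any(pd.isnull(labels)):
        raise ValueError("Some labels are NaN")  # assumed message

    return True


# Small synthetic inputs stand in for the pickled test database.
demo_images = pd.Series([np.zeros((2, 3)), np.ones((2, 3))])
demo_labels = pd.Series([0, 1])
demo_result = check_data_sanity_sketch(demo_images, demo_labels)
```

Raising `ValueError` on the first failed check, rather than collecting every problem, matches the documented behaviour in which the function either returns True or raises.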