Tutorial: Ketos Run

The ‘ketos-run’ module in the Ketos suite runs a pretrained neural network over audio files and produces detection outputs.

Quick Start

This quick start guide will walk you through the process of using our pretrained neural network model to classify NARW (North Atlantic Right Whales) upcalls within three audio files. At the end of this process, you will obtain a .csv file containing the detection outputs.

You can find the model and the audio files at the following locations:

  • Download the pre-trained NARW model: narw.kt

  • Download the sample audio data: data

If you unzip data.zip, you will find an audio folder containing three .wav files, each 30 minutes long.

To execute the trained model as a detector, use the ‘ketos-run’ command. After extracting the downloaded data, issue the following command in your CLI:

For Windows:

ketos-run models\narw_resnet.kt sample_audio\data\ --output detections\detections.csv

For Linux / Mac:

ketos-run models/narw_resnet.kt sample_audio/data/ --output detections/detections.csv

The command will save your detections in a new file named ‘detections.csv’. Please note that some of these detections will be true positives (accurate detections), while others will be false positives (incorrect detections). You can compare these detections with the correct annotations available in the ‘sample_audio/’ folder.
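
If you find that too many of the detections are false positives, one option is to re-run the detector with a stricter threshold using the ‘--threshold’ option described in the parameter list below. For example (Linux / Mac paths shown; the value 0.7 is only an illustration):

ketos-run models/narw_resnet.kt sample_audio/data/ --output detections/detections.csv --threshold 0.7

Only segments scoring above 0.7 would then be written to the output file.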

Note

For a more detailed walkthrough on how to use each module, please refer to the Examples section.

The ketos-run command has a few parameters that can be used to adjust its behavior.

ketos-run -h

ketos_run.py [-h] [--file_list FILE_LIST] [--table_name TABLE_NAME]
                [--output_folder OUTPUT_FOLDER] [--overwrite OVERWRITE]
                [--step_size STEP_SIZE] [--threshold THRESHOLD]
                [--merge_detections MERGE_DETECTIONS] [--buffer BUFFER]
                [--running_avg RUNNING_AVG] [--batch_size BATCH_SIZE]
                [--output_function_arguments [OUTPUT_FUNCTION_ARGUMENTS ...]]
                model_file audio_data

positional arguments:

model_file      Path to the ketos model file (*.kt)
audio_data      Path to either a folder with audio files or an HDF5
                database file with the data to process.

options:

-h, --help      show this help message and exit
--file_list FILE_LIST
                A .csv or .txt file where each row (or line) is the
                name of a file to detect within the audio folder. By
                default, all files will be processed. Not relevant if
                audio_data is an HDF5 file.
--table_name TABLE_NAME
                Table name within the HDF5 database where the data is
                stored. Must start with a forward slash. For instance
                '/test'. If not given, the root '/' path will be used.
                Not relevant if audio_data is a folder with audio
                files.
--output_folder OUTPUT_FOLDER
                Location to output the detections. For instance:
                detections/
--overwrite OVERWRITE
                If set, overwrite any existing detections file;
                otherwise, new detections are appended to it.
--step_size STEP_SIZE
                Step size in seconds. If not specified, the step size
                is set equal to the duration of the audio
                representation.
--threshold THRESHOLD
                The threshold value used to determine the cut-off
                point for detections. This is a floating-point value
                between 0 and 1. A detection is considered positive if
                its score is above this threshold. The default value
                is 0.5.
--merge_detections MERGE_DETECTIONS
                A flag indicating whether to merge overlapping
                detections into a single detection. If set to True,
                overlapping detections are merged. The default value
                is False, meaning detections are kept separate.
--buffer BUFFER
                The buffer duration to be added to each detection in
                seconds. This helps to extend the start and end times
                of each detection to include some context around the
                detected event. The default value is 0.0, which means
                no buffer is added.
--running_avg RUNNING_AVG
                Compute a running average of the scores over a
                specified window size in frames. Must be an odd
                integer.
--batch_size BATCH_SIZE
                How many samples will be loaded into memory. Lower
                this number if you are running into out of memory
                problems.
--output_function_arguments [OUTPUT_FUNCTION_ARGUMENTS ...]
                Output function arguments. If you created a custom
                output transform function, you can use this option to
                pass any arguments to it. Usage:
                --output_function_arguments arg1=value1 arg2=value2
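
To illustrate how several of these options can be combined, here is a hypothetical invocation (the file names and option values below are placeholders, not part of the sample data). It processes only the audio files listed in ‘my_files.txt’, applies a stricter threshold, merges overlapping detections, and pads each detection with half a second of context:

ketos-run trained_models/my_model.kt sample_audio/data/ --output detections/detections.csv --file_list my_files.txt --threshold 0.6 --merge_detections True --buffer 0.5

Here, ‘my_files.txt’ would simply contain one audio file name per line, for example:

recording_01.wav
recording_02.wav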

Examples

While a detector’s implementation can vary based on the workflow, this tutorial focuses on constructing a detector that processes .wav files using a trained network. This process will output a list of North Atlantic Right Whale (NARW) upcall detections in a .csv file.

To accomplish this, we will use the trained network and a folder containing the audio files to process. The program will sequentially segment the audio data into 3-second intervals, run the trained model on each segment, and record all NARW detections in a .csv file.

Now, let’s execute our trained model on the data we’ve just downloaded:

For Windows:

ketos-run trained_models\my_model.kt sample_audio\data\ --output detections\detections.csv

For Mac / Linux:

ketos-run trained_models/my_model.kt sample_audio/data/ --output detections/detections.csv

The results from running our model on the audio files are stored in ‘detections.csv’. This file lists the 3-second segments that received a score higher than the chosen threshold (the default of 0.5 was used here), i.e., the segments that the model identified as containing a NARW upcall.

Command breakdown:

  • ‘trained_models/my_model.kt’: This is the path to the previously trained model.

  • ‘sample_audio/data/’: This is the path to the folder containing the continuous audio files.

  • ‘--output detections/detections.csv’: This specifies the file where the detections will be saved.

Note

The program runs through non-overlapping segments of the audio files. However, you can make it use overlapping windows by setting the ‘--step_size’ parameter to a value smaller than the window duration, as in the example below.
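
For example, to evaluate a new 3-second window every 1.5 seconds, so that consecutive windows overlap by 50%, you could add ‘--step_size’ to the command above (1.5 is only an illustrative value):

ketos-run trained_models/my_model.kt sample_audio/data/ --output detections/detections.csv --step_size 1.5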