Tutorial: Ketos Run
===================

The ‘ketos-run’ module in the Ketos suite runs pretrained neural networks over audio files to produce detection outputs.

Quick Start
-----------

This quick start guide will walk you through the process of using our pretrained neural network model to detect NARW (North Atlantic Right Whales) upcalls within three audio files. At the end of this process, you will obtain a .csv file containing the detection outputs.

You can find the model and the audio files at the following locations:

  • Download the pre-trained NARW model: narw_resnet.kt

  • Download the sample audio data: data

Unzip data.zip to find the audio folder, which contains three 30-minute .wav files.

To execute the trained model as a detector, use the ‘ketos-run’ command. After extracting the downloaded data, issue the following command in your CLI:

For Windows:

.. code-block:: shell

   ketos-run narw_resnet.kt data\audio\ --output_folder detections --labels 1

For Linux / Mac:

.. code-block:: shell

   ketos-run narw_resnet.kt data/audio/ --output_folder detections --labels 1

The command will save your detections in a new file named ‘detections.csv’ inside a folder called detections. Please note that some of these detections will be true positives (accurate detections), while others will be false positives (incorrect detections). You can compare these detections with the correct annotations available in the ‘data/’ folder.
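For a quick look at the results without opening a spreadsheet, the CSV can be read with Python's standard library. Note that the column names used below (filename, start, end, label, score) are an assumption based on the fields described later in this tutorial; check the header of your own detections.csv.

.. code-block:: python

   import csv
   import io

   # Stand-in for detections/detections.csv; the column names are an
   # assumption based on the fields described in this tutorial.
   sample = """filename,start,end,label,score
   sample_1.wav,2.0,5.0,1,0.92
   sample_1.wav,130.5,133.5,1,0.67
   sample_2.wav,14.0,17.0,1,0.81
   """

   # With a real file you would use: open("detections/detections.csv")
   detections = list(csv.DictReader(io.StringIO(sample)))
   print(f"{len(detections)} detections loaded")
   for det in detections:
       print(det["filename"], det["start"], "-", det["end"], "score:", det["score"])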

.. note::

   For a more detailed walkthrough on how to use each module, please refer to the Examples section.

The ketos-run command has a few parameters that can be used to adjust its behavior.

.. code-block:: shell

   ketos-run -h

.. code-block:: text

   ketos_run.py [-h] [--file_list FILE_LIST] [--table_name TABLE_NAME]
               [--output_folder OUTPUT_FOLDER] [--overwrite OVERWRITE]
               [--log_file LOG_FILE] [--step_size STEP_SIZE] [--threshold THRESHOLD]
               [--labels LABELS] [--merge_detections MERGE_DETECTIONS]
               [--buffer BUFFER] [--running_avg RUNNING_AVG]
               [--highest_score_only HIGHEST_SCORE_ONLY]
               [--batch_size BATCH_SIZE]
               [--output_function_arguments [OUTPUT_FUNCTION_ARGUMENTS ...]]
               model_file audio_data

   positional arguments:

   model_file      Path to the ketos model file (*.kt)
   audio_data      Path to the audio data to be processed. This can be a
                   path to a single audio file, a directory of audio
                   files, or an HDF5 database.

   options:

   -h, --help      show this help message and exit
   --file_list FILE_LIST
                   A .csv or .txt file where each row (or line) is the
                   name of a file to detect within the audio folder. By
                   default, all files will be processed. Not relevant if
                   audio_data is an HDF5 file.
   --table_name TABLE_NAME
                   Table name within the HDF5 database where the data is
                   stored. Must start with a forward slash. For instance
                   '/test'. If not given, the root '/' path will be used.
                   Not relevant if audio_data is a folder with audio
                   files.
   --output_folder OUTPUT_FOLDER
                   Location to output the detections. For instance:
                   detections/
   --overwrite OVERWRITE
                   Overwrites the detections, otherwise appends to them.
   --log_file LOG_FILE
                   Name of the log file to be created/used during
                   the process. Defaults to "ketos-run.log".
   --step_size STEP_SIZE
                   Step size in seconds. If not specified, the step size
                   is set equal to the duration of the audio
                   representation.
   --threshold THRESHOLD
                   The threshold value used to determine the cut-off
                   point for detections. This is a floating-point value
                   between 0 and 1. A detection is considered positive if
                   its score is above this threshold. The default value
                   is 0.5.
   --labels LABELS
                   List or integer of labels to filter by. Example
                   usage: --labels 1 or --labels [1,2,3].
                   Defaults to None.
   --merge_detections MERGE_DETECTIONS
                   A flag indicating whether to merge overlapping
                   detections into a single detection. If set to True,
                   overlapping detections are merged. The default value
                   is False, meaning detections are kept separate.
   --buffer BUFFER
                   The buffer duration to be added to each detection in
                   seconds. This helps to extend the start and end times
                   of each detection to include some context around the
                   detected event. The default value is 0.0, which means
                   no buffer is added.
   --running_avg RUNNING_AVG
                   Compute a running average of the scores over a
                   specified window size in frames. Must be an odd
                   integer.
   --highest_score_only HIGHEST_SCORE_ONLY
                   If True, will only return the label associated with
                   the highest score even if more than one passes the
                   threshold. Defaults to False.
   --batch_size BATCH_SIZE
                   How many samples will be loaded into memory. Lower
                   this number if you are running into out-of-memory
                   problems.
   --output_function_arguments [OUTPUT_FUNCTION_ARGUMENTS ...]
                   Output function arguments. If you created a custom
                   output transform function, you can use this option to
                   pass any arguments to it. Usage:
                   --output_function_arguments arg1=value1 arg2=value2
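To make the post-processing options above more concrete, here is a rough, self-contained sketch of how thresholding, label filtering, merging, buffering, and score smoothing can interact. This is illustrative Python, not Ketos source code; the detection dictionaries and function names are invented for the example.

.. code-block:: python

   # Illustrative post-processing over raw detections; this is NOT Ketos
   # source code -- the data layout and function names are invented here.

   def smooth_scores(scores, window):
       """Running average over an odd-sized window (cf. --running_avg)."""
       assert window % 2 == 1, "window must be an odd integer"
       half = window // 2
       out = []
       for i in range(len(scores)):
           lo, hi = max(0, i - half), min(len(scores), i + half + 1)
           out.append(sum(scores[lo:hi]) / (hi - lo))
       return out

   def filter_detections(dets, threshold=0.5, labels=None):
       """Keep detections above the threshold, optionally restricted to
       certain labels (cf. --threshold and --labels)."""
       keep = [d for d in dets if d["score"] > threshold]
       if labels is not None:
           keep = [d for d in keep if d["label"] in labels]
       return keep

   def merge_and_buffer(dets, merge=False, buffer=0.0):
       """Pad each detection by `buffer` seconds and optionally merge
       overlaps (cf. --buffer and --merge_detections). A real
       implementation would also clamp times to the file boundaries."""
       padded = [dict(d, start=d["start"] - buffer, end=d["end"] + buffer)
                 for d in dets]
       if not merge or not padded:
           return padded
       padded.sort(key=lambda d: d["start"])
       merged = [padded[0]]
       for d in padded[1:]:
           if d["start"] <= merged[-1]["end"]:  # overlapping: extend in place
               merged[-1]["end"] = max(merged[-1]["end"], d["end"])
               merged[-1]["score"] = max(merged[-1]["score"], d["score"])
           else:
               merged.append(d)
       return merged

   dets = [
       {"start": 0.0, "end": 3.0, "label": 1, "score": 0.7},
       {"start": 3.0, "end": 6.0, "label": 1, "score": 0.9},
       {"start": 60.0, "end": 63.0, "label": 0, "score": 0.8},
   ]
   kept = filter_detections(dets, threshold=0.5, labels=[1])
   result = merge_and_buffer(kept, merge=True, buffer=1.0)
   print(result)  # the two label-1 detections merge into one padded event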

Examples
--------

Usually, there are two primary ways to run trained models. The first involves processing long, continuous audio files, where only small segments might contain the detection events of interest. The second is to run the model over clips in a database, typically to compute performance metrics. The examples in this section focus on the first approach; for examples of the second, refer to the Clips section.

While a detector’s implementation can vary based on the workflow, this tutorial focuses on constructing a detector that processes .wav files using a trained network. This process will output a list of North Atlantic Right Whale (NARW) upcall detections in a .csv file.

To accomplish this, we will use the trained network and a folder containing the audio files for processing. The program will sequentially segment the audio data into 3-second intervals and execute the trained model. All NARW detections will subsequently be recorded in a .csv file.
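The segmentation step described above can be pictured with a few lines of Python. The 3-second window matches the model's input length; the code itself is an illustrative sketch, not Ketos internals.

.. code-block:: python

   def segment_bounds(file_duration, window=3.0, step=None):
       """Return (start, end) times of consecutive analysis windows.

       With step=None the windows are back-to-back (step == window),
       mirroring ketos-run's default non-overlapping behavior.
       Illustrative only -- not Ketos internals.
       """
       step = window if step is None else step
       bounds = []
       t = 0.0
       while t < file_duration:
           bounds.append((t, t + window))
           t += step
       return bounds

   # A 30-minute recording split into non-overlapping 3-second windows:
   windows = segment_bounds(30 * 60, window=3.0)
   print(len(windows))  # 600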

Example 1: Detecting North Atlantic Right Whales (NARW) over Audio Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's resume from the example given in the Quick Start section.

For Windows:

.. code-block:: shell

   ketos-run narw_resnet.kt data\audio\ --output_folder detections --labels 1

For Mac / Linux:

.. code-block:: shell

   ketos-run narw_resnet.kt data/audio/ --output_folder detections --labels 1

The command will save your detections in a new file named ‘detections.csv’ inside a folder called detections. Please note that some of these detections will be true positives (accurate detections), while others will be false positives (incorrect detections). You can compare these detections with the correct annotations available in the ‘data/’ folder.

This approach is most commonly used in real-world applications where the goal is to detect a specific class and generate annotations. However, it may be less useful for computing performance metrics.

Let's break down the command:

  • narw_resnet.kt is the pre-trained ketos model.

  • data/audio/ is the path to the audio folder containing the recordings.

  • --output_folder detections is the path to the folder where the output will be stored.

  • --labels 1 instructs our model to only save the detections for label 1 (which corresponds to our NARW class).

The output folder generated by running the ketos-run command contains three different types of files, each serving a different purpose. Here’s a breakdown:

  1. Detection CSV

This file is a CSV document that contains the processed output of the neural network. It includes:

  • Start Time: when the detected event starts in the audio file.

  • End Time: when the detected event ends in the audio file.

  • Associated Class: which class the detection belongs to.

  • Score: a confidence score that the neural network assigns to each detection.

This file is likely to be the most immediately useful if you are looking to review or analyze the detection results quickly.

  2. ketos-run.log

This log file contains various kinds of metadata and diagnostic information regarding the run, such as:

  • Time the program started and ended

  • Any warnings or errors that occurred

  • Details about the input and output

  3. raw_output.pkl

The raw_output.pkl is a pickle file, which is a way to serialize Python objects for later use. This file contains the raw, unprocessed output directly from the neural network, organized into a dictionary with the following keys:

  • filename: the name of the audio file that was processed.

  • start: start time of the detection within that audio file.

  • end: end time of the detection within that audio file.

  • score: the raw score from the neural network for each detection.

The data in this file has not been filtered or thresholded, meaning that it contains all of the detections made by the network, regardless of confidence level. This is useful for more in-depth analysis or for applying different post-processing steps without rerunning the entire detection process.
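As a sketch of that workflow, the snippet below builds a stand-in for raw_output.pkl and then re-thresholds it without rerunning the detector. The dictionary keys follow the fields listed above, but the exact structure of the real file may differ, so treat this as illustrative.

.. code-block:: python

   import pickle

   # Build a stand-in for raw_output.pkl. The keys mirror the fields
   # listed above, but the real file's structure may differ.
   raw = {
       "filename": ["a.wav", "a.wav", "b.wav"],
       "start": [0.0, 3.0, 12.0],
       "end": [3.0, 6.0, 15.0],
       "score": [0.12, 0.87, 0.55],
   }
   blob = pickle.dumps(raw)

   # Later -- possibly in a different script -- reload and apply a
   # stricter threshold without rerunning the detector:
   raw = pickle.loads(blob)
   threshold = 0.8
   rows = [
       (f, s, e, sc)
       for f, s, e, sc in zip(raw["filename"], raw["start"], raw["end"], raw["score"])
       if sc >= threshold
   ]
   print(rows)  # [('a.wav', 3.0, 6.0, 0.87)]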

In summary, these output files provide a multi-tiered way to understand, diagnose, and further manipulate the results of your audio analysis.

.. note::

   The program runs through non-overlapping segments of the audio files. However, you can modify this behavior to use overlapping windows by setting the --step_size parameter.
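For instance, with a model that consumes 3-second windows, a --step_size of 1.5 seconds yields 50% overlap between consecutive windows. The arithmetic can be sketched as:

.. code-block:: python

   # Illustrative window arithmetic for --step_size; not Ketos code.
   window, step = 3.0, 1.5
   starts = [i * step for i in range(5)]
   windows = [(s, s + window) for s in starts]
   print(windows)  # five half-overlapping 3-second windows

   overlap = (window - step) / window
   print(f"overlap between consecutive windows: {overlap:.0%}")  # 50%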

Example 2: Creating Detections for Performance Evaluation over Audio Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this second example, we will generate a series of detections ready to be fed into ketos-metrics for performance evaluation. We will use the same audio files as in the previous example.

For Windows:

.. code-block:: shell

   ketos-run narw_resnet.kt data\audio\ --output_folder detections --labels 1 --threshold 0

For Mac / Linux:

.. code-block:: shell

   ketos-run narw_resnet.kt data/audio/ --output_folder detections --labels 1 --threshold 0

The main difference here is that we set the detection threshold to 0 instead of using the default value of 0.5. Effectively, this allows every classification made by the model to be considered a detection, regardless of its score. While this approach may not be practical for generating annotations in a real-world application, it is useful for evaluating how the model’s performance varies with changing thresholds. This evaluation is automatically carried out by ketos-metrics.
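To see why keeping every score is useful, consider this toy sweep over thresholds. The scores and ground-truth flags below are fabricated for illustration, and this is not ketos-metrics itself:

.. code-block:: python

   # Toy threshold sweep; scores and ground truth are made up, and this
   # is not ketos-metrics itself.
   scores = [0.95, 0.80, 0.60, 0.40, 0.20]
   is_true = [True, True, False, True, False]  # hypothetical ground truth

   def precision_recall(threshold):
       kept = [(s, t) for s, t in zip(scores, is_true) if s >= threshold]
       tp = sum(1 for _, t in kept if t)
       precision = tp / len(kept) if kept else 0.0
       recall = tp / sum(is_true)
       return precision, recall

   for thr in (0.3, 0.5, 0.7):
       p, r = precision_recall(thr)
       print(f"threshold={thr:.1f}  precision={p:.2f}  recall={r:.2f}")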

Example 3: Resuming a stopped process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ketos-run processes audio files in batches and writes the results to the output folder. This feature allows you to interrupt and resume the process at any point. To test this, execute the original command and then stop the process at your convenience:

.. code-block:: shell

   ketos-run narw_resnet.kt data/audio/ --output_folder detections --labels 1

To resume the operation, ketos-run uses the information in the log file to determine where to pick up from. Make sure to set the --overwrite option to False:

.. code-block:: shell

   ketos-run narw_resnet.kt data/audio/ --output_folder detections --labels 1 --overwrite False

Clips
-----

Example 1: Detecting North Atlantic Right Whales (NARW) over clips in an HDF5 database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Usually, when working with clips that contain specific audio segments, each clip will either have an associated label, or you may want your model to output a classification for each clip—regardless of whether it contains a vocalization or is simply background noise. In the following sections, we will explore various examples that showcase the flexibility of the ketos-run command to address these scenarios.

We will use the same data from the Creating a training database tutorial for these examples. This dataset comprises 3-second long clips, some of which contain right whale upcalls, while others feature only background noise. For this tutorial, the data has already been processed into an HDF5 dataset. You can find this dataset, along with the corresponding annotations, at the following location: Ketos Commands Data. Make sure to download and extract the contents of the zip file.

Let’s use the same model from the previous examples. This time, instead of passing an audio folder to the command, we will use the database.h5 file, which already contains our clips processed into spectrograms.

.. code-block:: shell

   ketos-run narw_resnet.kt database.h5 --table_name /val --highest_score_only True --output_folder detections

Command Breakdown:

Firstly, note that we are not passing a --labels argument this time. This is because each clip is associated with a class (1 for NARW and 0 for background), and we are working with a balanced dataset. In this case, we want the model to output labels for both classes.

In addition, we set the --highest_score_only parameter to True. This filters the results to include only the class with the highest score, even if more than one class has passed the threshold.

Let’s break down the command further:

  • narw_resnet.kt is the pre-trained Ketos model.

  • database.h5 is the HDF5 database file containing our processed audio clips.

  • --table_name /val is the location of the data in the HDF5 database.

  • --highest_score_only True instructs the model to only include the highest-scoring class for each detection, even if more than one class passes the threshold.

  • --output_folder detections specifies the path to the folder where the output will be stored.
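The effect of --highest_score_only can be sketched in a few lines; the function below is illustrative, not the Ketos implementation:

.. code-block:: python

   # Sketch of --highest_score_only; illustrative, not Ketos internals.
   def pick_labels(class_scores, threshold=0.5, highest_score_only=True):
       """Return the labels whose scores pass the threshold; if
       highest_score_only is set, keep only the top-scoring one."""
       passing = {label: s for label, s in class_scores.items() if s > threshold}
       if not passing:
           return {}
       if highest_score_only:
           best = max(passing, key=passing.get)
           return {best: passing[best]}
       return passing

   scores = {0: 0.55, 1: 0.85}  # both classes pass a 0.5 threshold
   print(pick_labels(scores))                            # {1: 0.85}
   print(pick_labels(scores, highest_score_only=False))  # {0: 0.55, 1: 0.85}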

Example 2: Detecting North Atlantic Right Whales (NARW) Using Audio Clips in a Folder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the previous example, we worked with clips that were stored in an HDF5 database. You can achieve similar functionality using clips stored as audio files in a directory. To do so, execute the following command:

.. code-block:: shell

   ketos-run narw_resnet.kt data/val/ --highest_score_only True --output_folder detections

  • narw_resnet.kt: This specifies the pre-trained Ketos model file. In this case, narw_resnet.kt is the model trained to detect NARW vocalizations.

  • data/val/: This is the path to the directory containing your audio clips. The command expects this directory to have audio files that the model will process.

  • --highest_score_only True: This argument ensures that, for each processed audio segment, only the class with the highest score is kept if it surpasses the threshold. This is particularly useful for focusing on the most likely predictions.

  • --output_folder detections: This specifies the directory where the detection results will be stored. In this case, all output files will be saved in a folder named detections.

When you run this command, the Ketos suite processes each audio file in the specified directory (data/val/), applies the NARW detection model (narw_resnet.kt), and saves the results in the detections folder, following the criteria set by --highest_score_only.