batch_load_audio_file_data
- ketos.neural_networks.dev_utils.detection.batch_load_audio_file_data(loader, batch_size, start_idx=0, logger=None)
This function generates batches of audio data from a given AudioFrameLoader.
Each batch consists of the spectrogram data, filename, start time, and end time of each audio segment. The loader loads the audio files, splits them into smaller segments, and converts the segments into spectrograms.
- Args:
- loader: ketos.audio.audio_loader.AudioFrameLoader
An AudioFrameLoader object that computes spectrograms from the audio files as requested.
- batch_size: int
The number of samples to include in each batch.
- start_idx: int
The batch index to start from. This allows the generator to skip the initial batches up to the specified index.
- logger: logging.Logger or KetosLogger
A Logger instance to log errors encountered while loading audio file data.
- Returns:
- dict: A dictionary containing the batch data. The dictionary keys are ‘data’, ‘filename’, ‘start’, and ‘end’. The ‘data’ field is a numpy array of shape (batch_size, time_bins, freq_bins), where time_bins and freq_bins correspond to the shape of the spectrogram data. ‘filename’ contains the list of filenames for the audio segments in the batch. ‘start’ and ‘end’ are lists containing the start and end times (in seconds) of each audio segment.
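The batch layout and the effect of start_idx can be illustrated with a small sketch that runs without ketos or audio files. Everything in it is an assumption made for illustration: fake_batches is a hypothetical stand-in for batch_load_audio_file_data, segment_index is a hypothetical helper, ‘data’ is held as nested lists rather than a numpy array, and the final batch is assumed to be allowed to hold fewer than batch_size segments. Only the dictionary keys and the meaning of the arguments come from the description above.

```python
def fake_batches(segments, batch_size, start_idx=0):
    """Hypothetical stand-in for batch_load_audio_file_data.

    `segments` is a list of (data, filename, start, end) tuples. The real
    function pulls segments from an AudioFrameLoader; this stand-in only
    reproduces the documented batch keys and the skipping of initial batches.
    """
    # Assumption: start_idx counts whole batches, so skip start_idx * batch_size segments.
    for i in range(start_idx * batch_size, len(segments), batch_size):
        chunk = segments[i:i + batch_size]
        yield {
            "data": [seg[0] for seg in chunk],      # real function: numpy array
            "filename": [seg[1] for seg in chunk],
            "start": [seg[2] for seg in chunk],     # seconds
            "end": [seg[3] for seg in chunk],       # seconds
        }


def segment_index(batches):
    """Flatten batches into one (filename, start, end) record per segment."""
    records = []
    for batch in batches:
        records.extend(zip(batch["filename"], batch["start"], batch["end"]))
    return records


# Five consecutive 3-second segments of a made-up file; data is a 2x2 placeholder.
placeholder = [[0.0, 0.0], [0.0, 0.0]]
segments = [(placeholder, "a.wav", 3.0 * k, 3.0 * (k + 1)) for k in range(5)]

# Process everything, then resume from batch index 1 (skipping the first batch).
all_records = segment_index(fake_batches(segments, batch_size=2))
resumed = segment_index(fake_batches(segments, batch_size=2, start_idx=1))
```

With ketos installed, the same consuming loop should apply to the generator returned by batch_load_audio_file_data(loader, batch_size, start_idx, logger), since it relies only on the ‘filename’, ‘start’, and ‘end’ keys described above.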