JointBatchGen

class ketos.data_handling.data_feeding.JointBatchGen(batch_generators, n_batches='min', shuffle_batch=False, reset_generators=False, return_batch_ids=False, output_transform_func=None)

Join two or more batch generators.

A joint batch generator is composed of multiple BatchGenerator objects. It offers a flexible way of composing custom batches for training neural networks. Each batch is formed by joining the batches of all generators in the ‘batch_generators’ list.

To be combined in this manner, the batch generators must yield data batches (X, Y) with the same format. Furthermore, the first dimension must be the batch size. For multimodal generators, the second dimension must be the number of modes.

For example, if each generator returns a waveform and a spectrogram and the batch size is set to 32, the JointBatchGen expects X to have length 32 and every element of X to have length 2 (one entry per modality: waveform and spectrogram).

At initialization, an assertion checks that all batch generators yield data with consistent formats. If the assertion fails, an error is raised.
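The joining step described above can be sketched with NumPy. This is a minimal illustration, not the ketos implementation: the helper `join_batches` is hypothetical, and it assumes single-mode batches in which X and Y are plain ndarrays (multimodal batches, where each element of X holds several modalities, would need the same treatment per modality). It checks that all dimensions after the batch dimension agree, concatenates along the batch axis, and optionally shuffles the joint batch, roughly as `shuffle_batch=True` is described as doing.

```python
import numpy as np


def join_batches(batches, shuffle_batch=False, rng=None):
    """Join (X, Y) batches from several generators along the batch axis.

    Hypothetical helper illustrating the behaviour described above;
    it is not part of ketos.
    """
    xs, ys = zip(*batches)
    # Every generator must yield the same per-instance format, i.e. all
    # dimensions after the first (batch-size) dimension must agree.
    assert all(x.shape[1:] == xs[0].shape[1:] for x in xs), "inconsistent X format"
    assert all(y.shape[1:] == ys[0].shape[1:] for y in ys), "inconsistent Y format"
    # The joint batch size is the sum of the individual batch sizes.
    X = np.concatenate(xs, axis=0)
    Y = np.concatenate(ys, axis=0)
    if shuffle_batch:
        # Apply one permutation to both X and Y so instances keep their labels.
        rng = np.random.default_rng() if rng is None else rng
        perm = rng.permutation(len(X))
        X, Y = X[perm], Y[perm]
    return X, Y


# Batch sizes 2 (positives) and 3 (negatives); each instance is a 94x129 spectrogram.
batch_pos = (np.ones((2, 94, 129)), np.ones(2, dtype=int))
batch_neg = (np.zeros((3, 94, 129)), np.zeros(3, dtype=int))
X, Y = join_batches([batch_pos, batch_neg])
print(X.shape, Y.shape)  # (5, 94, 129) (5,)
```

Because the same permutation indexes X and Y, shuffling the joint batch mixes instances from different generators without separating any instance from its label.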

Args:
batch_generators: list of BatchGenerator objects

A list of 2 or more BatchGenerator instances.

n_batches: str or int (default:’min’)

The number of batches for the joint generator. It can be an integer, ‘min’, which uses the lowest n_batches among the batch generators, or ‘max’, which uses the highest value.

shuffle_batch: bool (default:False)

If True, shuffle the joint batch before returning it. Note that this only concerns the joint batches and is independent of whether the joined generators shuffle or not.

reset_generators: bool (default:False)

If True, reset the current batch counter of each generator whenever the joint generator reaches the n_batches value. This triggers the end-of-epoch behaviour of each batch generator (for example, if a batch generator was created with ‘shuffle_on_epoch_end=True’, it will shuffle at this point, even if that generator’s batch counter has not yet reached its maximum).

return_batch_ids: bool (default:False)

If False, each batch consists of X and Y. If True, the generator index and the instance indices (as they appear in the hdf5_table) are also included, and each batch has the form (ids, X, Y).

output_transform_func: function (default:None)

A function applied to the joint batch to transform its instances. It must accept ‘X’ and ‘Y’ and, after processing, return ‘X’ and ‘Y’ as a tuple.
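As an illustration of the expected signature, here is a hypothetical `output_transform_func` that adds a channel axis to X and one-hot encodes Y. It is a sketch assuming single-mode batches (X an ndarray of shape (batch, 94, 129)) and binary integer labels; the name `transform_output` and the chosen transformations are illustrative only.

```python
import numpy as np


def transform_output(X, Y):
    """Hypothetical output_transform_func: accepts X and Y, returns (X, Y).

    Assumes single-mode batches: X is an ndarray of shape (batch, 94, 129)
    and Y holds integer class labels 0/1.
    """
    # Add a trailing channel axis: (batch, 94, 129) -> (batch, 94, 129, 1)
    X = X[..., np.newaxis]
    # One-hot encode the labels: 0 -> [1, 0], 1 -> [0, 1]
    Y = np.eye(2)[Y.astype(int)]
    return (X, Y)


X, Y = transform_output(np.zeros((5, 94, 129)), np.array([1, 1, 0, 0, 0]))
print(X.shape, Y.shape)  # (5, 94, 129, 1) (5, 2)
```

With single-mode generators, such a function would be passed as `JointBatchGen(batch_generators, output_transform_func=transform_output)`.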

Example:
>>> from tables import open_file
>>> from ketos.data_handling.database_interface import open_table
>>> from ketos.data_handling.data_feeding import BatchGenerator, JointBatchGen
>>> h5 = open_file("ketos/tests/assets/multimodal.h5", 'r') # create the database handle
>>> tbl_pos = open_table(h5, "/train/pos/data") # table with positive samples
>>> tbl_neg = open_table(h5, "/train/neg/data") # table with negative samples
>>> # Create batch generators for multimodal data (waveform, spectrogram)
>>> generator_pos = BatchGenerator(data_table=tbl_pos, batch_size=2, x_field=['waveform','spectrogram'])
>>> generator_neg = BatchGenerator(data_table=tbl_neg, batch_size=3, x_field=['waveform','spectrogram'])
>>> # Join the generators
>>> generator = JointBatchGen([generator_pos, generator_neg])
>>> # Loading the first batch, we note that the joint generator has a batch size of 2+3=5
>>> # and the waveforms and spectrograms have shapes (3000,) and (94,129), respectively.
>>> X, Y = next(generator)
>>> print(len(X), len(X[0]), X[0][0].shape, X[0][1].shape)
5 2 (3000,) (94, 129)
>>> h5.close() # close the database handle

Methods

get_indices()            Get the index sequence used for sampling the data tables.
reset([indices])         Reset the individual batch generators.
set_return_batch_ids(v)  Switch between returning (X, Y) and (ids, X, Y).
set_shuffle(v)           Switch between shuffling and not shuffling the indices.

get_indices()

Get the index sequence used for sampling the data tables.

Returns:
array

The index sequence.

reset(indices=None)

Resets the individual batch generators.

Args:
indices: array

Manually specify the sequence of indices that should be used after reset.
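The interplay between `n_batches` and `reset_generators` can be simulated with toy generators. In the sketch below, `ToyGenerator`, `resolve_n_batches` and `run_joint_epoch` are hypothetical stand-ins, not ketos classes, and it is an assumption that each generator simply wraps around to its first batch when it runs out. The simulation records which batch index each generator contributes during a joint epoch, and shows how `reset_generators=True` restarts every generator at the end of the joint epoch even when its own counter has not reached its maximum.

```python
class ToyGenerator:
    """Hypothetical stand-in for a BatchGenerator: yields its batch index."""

    def __init__(self, n_batches):
        self.n_batches = n_batches
        self.batch_count = 0

    def __next__(self):
        i = self.batch_count
        self.batch_count += 1
        if self.batch_count == self.n_batches:
            self.on_epoch_end()  # assumed: wrap around at the end of its own epoch
        return i

    def on_epoch_end(self):
        # A real generator might also reshuffle its indices here.
        self.batch_count = 0


def resolve_n_batches(n_batches, generators):
    """Number of joint batches per epoch for 'min', 'max' or an int."""
    counts = [g.n_batches for g in generators]
    if n_batches == "min":
        return min(counts)
    if n_batches == "max":
        return max(counts)
    if isinstance(n_batches, int):
        return n_batches
    raise ValueError("n_batches must be an int, 'min' or 'max'")


def run_joint_epoch(generators, n_batches="min", reset_generators=False):
    """Return the batch index drawn from each generator for one joint epoch."""
    n = resolve_n_batches(n_batches, generators)
    drawn = [[next(g) for g in generators] for _ in range(n)]
    if reset_generators:
        # End-of-epoch behaviour for every generator, even if its counter
        # has not reached its own maximum yet.
        for g in generators:
            g.on_epoch_end()
    return drawn


gens = [ToyGenerator(4), ToyGenerator(10)]
first = run_joint_epoch(gens, "min")
second = run_joint_epoch(gens, "min")
print(first)   # [[0, 0], [1, 1], [2, 2], [3, 3]]
print(second)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Without a reset, the larger generator carries its counter into the next joint epoch (batches 4 to 7); with `reset_generators=True`, both generators would start again from batch 0.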

set_return_batch_ids(v)

Switch the generator between returning (X, Y) and (ids, X, Y).

Args:
v: bool

Whether to return the batch ids in addition to X and Y.
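To picture what the ids add, the sketch below tags each instance of a joint batch with the index of the generator it came from and its row index in the data table. The helper `join_with_ids` and the (generator_index, instance_index) pair layout are assumptions for illustration; the exact structure of the ids returned by ketos may differ.

```python
import numpy as np


def join_with_ids(batches):
    """Join (indices, X, Y) batches and tag each instance with its generator.

    Hypothetical sketch: the (generator_index, instance_index) layout is an
    assumption for illustration, not necessarily what ketos returns.
    """
    ids = [(gen_idx, int(i)) for gen_idx, (indices, _, _) in enumerate(batches) for i in indices]
    X = np.concatenate([x for _, x, _ in batches], axis=0)
    Y = np.concatenate([y for _, _, y in batches], axis=0)
    return ids, X, Y


# Generator 0 drew table rows 7 and 12; generator 1 drew rows 3, 5 and 9.
batch_pos = (np.array([7, 12]), np.ones((2, 94, 129)), np.ones(2))
batch_neg = (np.array([3, 5, 9]), np.zeros((3, 94, 129)), np.zeros(3))
ids, X, Y = join_with_ids([batch_pos, batch_neg])
print(ids)  # [(0, 7), (0, 12), (1, 3), (1, 5), (1, 9)]
```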

set_shuffle(v)

Switch the generator between shuffling and not shuffling the indices.

Args:
v: bool

Whether to shuffle the indices.