DP Data Loader

class opacus.data_loader.DPDataLoader(dataset, *, sample_rate, collate_fn=None, drop_last=False, generator=None, distributed=False, **kwargs)[source]

DataLoader subclass that always does Poisson sampling and supports empty batches by default.

Typically instantiated via the DPDataLoader.from_data_loader() method, based on another DataLoader. DPDataLoader preserves the behaviour of the original data loader, except for two aspects.

First, it switches batch_sampler to UniformWithReplacementSampler, thus enabling Poisson sampling (i.e. each element in the dataset is selected to be in the next batch with a certain probability defined by the sample_rate parameter). NB: this typically leads to batches of variable size. NB2: by default, sample_rate is calculated based on the batch_size of the original data loader, so that the average batch size stays the same.
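
The sampling rule described above can be illustrated with a minimal, standard-library-only sketch. It is not the UniformWithReplacementSampler implementation; the helper name poisson_batches and its arguments are hypothetical, and the sketch assumes sample_rate is derived as batch_size divided by the dataset size, which keeps the expected batch size equal to batch_size.

```python
import random


def poisson_batches(num_samples, sample_rate, num_batches, seed=0):
    """Yield lists of indices; each index joins a batch independently
    with probability sample_rate (hypothetical helper, not Opacus code)."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [i for i in range(num_samples) if rng.random() < sample_rate]


# Assumption: sample_rate = batch_size / dataset size, so the expected
# batch size is num_samples * sample_rate = batch_size.
num_samples, batch_size = 64, 4
sample_rate = batch_size / num_samples
batches = list(poisson_batches(num_samples, sample_rate, num_batches=16))

# Batch sizes vary from batch to batch, and a batch may be empty.
print([len(b) for b in batches])
```

Because every element is drawn independently, nothing prevents a batch from containing no elements at all, which is why empty batches have to be supported.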

Second, it wraps the collate function with support for empty batches. Most PyTorch modules will happily process tensors of shape (0, N, ...), but many collate functions will fail to produce such a batch. Since Poisson sampling makes empty batches possible, we need a DataLoader that can handle them.
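
Both halves of that claim can be checked with plain PyTorch. The sketch below is only an illustration and does not use Opacus: it passes a zero-sample batch through a linear layer, then shows that PyTorch's default collate function cannot build a batch from an empty list of samples.

```python
import torch
from torch.utils.data.dataloader import default_collate

# A module processes a batch with zero samples without complaint.
layer = torch.nn.Linear(5, 2)
empty_batch = torch.zeros(0, 5)   # zero samples, five features each
output = layer(empty_batch)
print(output.shape)               # torch.Size([0, 2])

# The default collate function, by contrast, needs at least one sample
# to inspect, so an empty list of samples raises an exception.
collate_failed = False
try:
    default_collate([])
except Exception as err:
    collate_failed = True
    print(f"default_collate failed with {type(err).__name__}")
```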

classmethod from_data_loader(data_loader, *, distributed=False, generator=None)[source]

Creates a new DPDataLoader based on the passed data_loader argument.

Parameters:
  • data_loader (DataLoader) – Any DataLoader instance. Must not wrap an IterableDataset.

  • distributed (bool) – Set to True if you’ll be using DPDataLoader in a DDP environment.

  • generator – Random number generator used to sample elements. Defaults to the generator of the original data loader.

Returns:

New DPDataLoader instance, with all attributes and parameters inherited from the original data loader, except for the sampling mechanism.

Examples

>>> import torch
>>> from torch.utils.data import DataLoader, TensorDataset
>>> from opacus.data_loader import DPDataLoader
>>> x, y = torch.randn(64, 5), torch.randint(0, 2, (64,))
>>> dataset = TensorDataset(x, y)
>>> data_loader = DataLoader(dataset, batch_size=4)
>>> dp_data_loader = DPDataLoader.from_data_loader(data_loader)

opacus.data_loader.collate(batch, *, collate_fn, sample_empty_shapes, dtypes)[source]

Wraps collate_fn to handle empty batches.

Default collate_fn implementations typically can’t handle batches of length zero. Since Poisson sampling can produce such batches, we need to wrap the collate method so that it produces tensors with the correct shape and size, albeit with a zero-size batch dimension.

Returns:

Batch tensor(s)

opacus.data_loader.dtype_safe(x)[source]

Exception-safe getter for the dtype attribute.

Parameters:

x (Any) – any object

Return type:

Union[dtype, Type]

Returns:

x.dtype if attribute exists, type of x otherwise

opacus.data_loader.shape_safe(x)[source]

Exception-safe getter for the shape attribute.

Parameters:

x (Any) – any object

Return type:

Tuple

Returns:

x.shape if attribute exists, empty tuple otherwise
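
The documented behaviour of the two getters, dtype_safe and shape_safe, can be sketched in a few lines. This is an illustration of the contract stated above, not the library source; the names dtype_or_type, shape_or_empty and FakeTensor are hypothetical stand-ins.

```python
def dtype_or_type(x):
    """Return x.dtype if the attribute exists, otherwise the type of x."""
    return getattr(x, "dtype", type(x))


def shape_or_empty(x):
    """Return x.shape if the attribute exists, otherwise an empty tuple."""
    return getattr(x, "shape", ())


class FakeTensor:
    """Hypothetical stand-in for a tensor-like object."""
    shape = (5,)
    dtype = "float32"


print(dtype_or_type(FakeTensor()), shape_or_empty(FakeTensor()))
print(dtype_or_type(3), shape_or_empty(3))  # plain ints have neither attribute
```

Getters of this kind let one code path describe both tensor fields and plain Python values (such as integer labels) in a dataset sample.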

opacus.data_loader.switch_generator(*, data_loader, generator)[source]

Creates a new instance of a DataLoader with exactly the same behaviour as the provided data loader, except for the source of randomness.

Typically used to enhance a user-provided data loader object with a cryptographically secure random number generator.

Parameters:
  • data_loader (DataLoader) – Any DataLoader object

  • generator – Random number generator object

Returns:

New DataLoader object with the exact same behaviour as the input data loader, except for the source of randomness.
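
To illustrate what "source of randomness" means for a data loader, the sketch below uses only plain PyTorch and does not call switch_generator. It assumes a shuffling DataLoader built with an explicit torch.Generator; the helper epoch_order is hypothetical. Two loaders whose generators start from the same seed visit the dataset in the same order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(16))


def epoch_order(seed):
    """Return the order in which one epoch visits the dataset
    (hypothetical helper for illustration)."""
    generator = torch.Generator().manual_seed(seed)
    loader = DataLoader(dataset, batch_size=4, shuffle=True, generator=generator)
    return [int(i) for (batch,) in loader for i in batch]


first = epoch_order(seed=0)
second = epoch_order(seed=0)
print(first == second)  # the generator alone determines the shuffle order
```

Swapping the generator therefore changes how elements are drawn while leaving the dataset, batch size and collate behaviour untouched.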

opacus.data_loader.wrap_collate_with_empty(*, collate_fn, sample_empty_shapes, dtypes)[source]

Wraps given collate function to handle empty batches.

Parameters:
  • collate_fn (Optional[Callable[[List[TypeVar(T)]], Any]]) – collate function to wrap

  • sample_empty_shapes (Sequence[Tuple]) – expected shapes for a batch of size 0. Input is a sequence, with one shape for each tensor in the dataset.

Returns:

New collate function, which is equivalent to the input collate_fn for non-empty batches and outputs empty tensors with shapes from sample_empty_shapes if the input batch is of size 0.
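
A minimal sketch of such a wrapper, under stated assumptions, might look as follows. It is not the Opacus implementation: the function name collate_or_empty is hypothetical, and the sketch assumes that the entries of sample_empty_shapes already include the leading zero-size batch dimension.

```python
from functools import partial

import torch
from torch.utils.data.dataloader import default_collate


def collate_or_empty(batch, *, collate_fn, sample_empty_shapes, dtypes):
    """Delegate to collate_fn for non-empty batches; otherwise build
    zero-length tensors (hypothetical sketch, not Opacus code)."""
    if len(batch) > 0:
        return collate_fn(batch)
    return [
        torch.zeros(shape, dtype=dtype)
        for shape, dtype in zip(sample_empty_shapes, dtypes)
    ]


# Each sample is (features of shape (5,), integer label), so an empty
# batch should yield tensors of shape (0, 5) and (0,).
wrapped = partial(
    collate_or_empty,
    collate_fn=default_collate,
    sample_empty_shapes=[(0, 5), (0,)],
    dtypes=[torch.float32, torch.int64],
)

samples = [(torch.randn(5), torch.tensor(1)) for _ in range(3)]
features, labels = wrapped(samples)         # regular path
empty_features, empty_labels = wrapped([])  # empty-batch path
print(features.shape, labels.shape)
print(empty_features.shape, empty_labels.shape)
```

Binding the shapes and dtypes with partial yields a callable that takes only the batch, which is the signature a DataLoader expects from its collate_fn argument.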