Privacy Engine

class opacus.privacy_engine.PrivacyEngine(*, accountant='rdp', secure_mode=False)[source]

Main entry point to the Opacus API - use PrivacyEngine to enable differential privacy for your model training.

The PrivacyEngine object encapsulates the current privacy state (the privacy budget and the method by which it has been calculated) and exposes the make_private method to wrap your PyTorch training objects with their private counterparts.

Example

>>> dataloader = demo_dataloader
>>> model = MyCustomModel()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
>>> privacy_engine = PrivacyEngine()
>>>
>>> model, optimizer, dataloader = privacy_engine.make_private(
...    module=model,
...    optimizer=optimizer,
...    data_loader=dataloader,
...    noise_multiplier=1.0,
...    max_grad_norm=1.0,
... )
>>> # continue training as normal
Parameters
  • accountant (str) – Accounting mechanism. Currently supported: rdp (RDPAccountant) and gdp (GaussianAccountant)

  • secure_mode (bool) – Set to True if a cryptographically strong DP guarantee is required. secure_mode=True uses a secure random number generator for noise and shuffling (as opposed to the pseudo-RNG in vanilla PyTorch) and prevents certain floating-point arithmetic-based attacks. See _generate_noise() for details. When set to True, requires torchcsprng to be installed
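
For example, a minimal sketch of both constructor configurations (the gdp + secure-mode combination is illustrative; secure_mode=True requires torchcsprng):

>>> privacy_engine = PrivacyEngine()  # default: RDP accounting, pseudo-RNG noise
>>> privacy_engine = PrivacyEngine(accountant="gdp", secure_mode=True)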

classmethod get_compatible_module(module)[source]

Return a privacy engine compatible module. Also validates the module after running registered fixes.

Parameters

module (Module) – module to be modified

Return type

Module

Returns

Module with some submodules replaced by their deep copies or close equivalents. See ModuleValidator for more details
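
For illustration, a sketch of making an off-the-shelf torchvision model DP-compatible (the model choice and the BatchNorm-to-GroupNorm substitution are illustrative; see ModuleValidator for the actual fixes applied):

>>> from torchvision import models
>>> model = models.resnet18()  # BatchNorm layers are incompatible with DP
>>> model = PrivacyEngine.get_compatible_module(model)  # e.g. swaps BatchNorm for GroupNorm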

get_epsilon(delta)[source]

Computes the (epsilon, delta) privacy budget spent so far.

Parameters

delta – The target delta.

Returns

Privacy budget (epsilon) expended so far.
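
For example, querying the budget spent after some training steps (the delta value is illustrative):

>>> epsilon = privacy_engine.get_epsilon(delta=1e-5)
>>> print(f"Privacy spent so far: epsilon = {epsilon:.2f} at delta = 1e-05")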

is_compatible(*, module, optimizer, data_loader)[source]

Check if task components are compatible with DP.

Parameters
  • module (Module) – module to be checked

  • optimizer (Optional[Optimizer]) – optimizer to be checked

  • data_loader (Optional[DataLoader]) – data_loader to be checked

Return type

bool

Returns

True if compatible, False otherwise
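
For example, a sketch of a pre-flight check before calling make_private():

>>> if not privacy_engine.is_compatible(
...    module=model, optimizer=optimizer, data_loader=dataloader
... ):
...    model = PrivacyEngine.get_compatible_module(model)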

make_private(*, module, optimizer, data_loader, noise_multiplier, max_grad_norm, batch_first=True, loss_reduction='mean', poisson_sampling=True, clipping='flat', noise_generator=None)[source]

Add privacy-related responsibilities to the main PyTorch training objects: model, optimizer, and the data loader.

All of the returned objects act just like their non-private counterparts passed as arguments, but with added DP tasks.

  • Model is wrapped to also compute per sample gradients.

  • Optimizer is now responsible for gradient clipping and adding noise to the gradients.

  • DataLoader is updated to perform Poisson sampling.

Notes

Using any other models, optimizers, or data sources during training will invalidate the stated privacy guarantees.

Parameters
  • module (Module) – PyTorch module to be used for training

  • optimizer (Optimizer) – Optimizer to be used for training

  • data_loader (DataLoader) – DataLoader to be used for training

  • noise_multiplier (float) – The ratio of the standard deviation of the Gaussian noise to the L2-sensitivity of the function to which the noise is added (i.e., how much noise to add)

  • max_grad_norm (Union[float, List[float]]) – The maximum norm of the per-sample gradients. Any gradient with norm higher than this will be clipped to this value.

  • batch_first (bool) – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch. If set to True, dimensions of the input tensor are expected to be [batch_size, ...], otherwise [K, batch_size, ...]

  • loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”

  • poisson_sampling (bool) – True if you want to use standard sampling required for DP guarantees. Setting it to False will leave the provided data_loader unchanged. Technically, this violates the assumptions made by the privacy accounting mechanism, but it can be a good approximation when Poisson sampling is infeasible.

  • clipping (str) – Per sample gradient clipping mechanism (“flat” or “per_layer”). Flat clipping calculates the norm of the entire gradient over all parameters, while per layer clipping sets individual norms for every parameter tensor. Flat clipping is usually preferred, but using per layer clipping in combination with distributed training can provide notable performance gains.

  • noise_generator – torch.Generator() object used as a source of randomness for the noise
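
For instance, a sketch of per-layer clipping, where max_grad_norm supplies one threshold per parameter tensor (the uniform thresholds here are illustrative):

>>> n_tensors = len([p for p in model.parameters() if p.requires_grad])
>>> model, optimizer, dataloader = privacy_engine.make_private(
...    module=model,
...    optimizer=optimizer,
...    data_loader=dataloader,
...    noise_multiplier=1.0,
...    max_grad_norm=[1.0] * n_tensors,  # one bound per parameter tensor
...    clipping="per_layer",
... )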

Return type

Tuple[GradSampleModule, DPOptimizer, DataLoader]

Returns

Tuple of (model, optimizer, data_loader).

  • Model is a wrapper around the original model that also computes per-sample gradients.

  • Optimizer is a wrapper around the original optimizer that also performs gradient clipping and noise addition.

  • DataLoader is a brand-new DataLoader object, constructed to behave equivalently to the original data loader, possibly with an updated sampling mechanism. Points to the same dataset object.
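
Continuing the example above, training proceeds exactly as it would without DP (the loss function and batch structure are illustrative):

>>> criterion = torch.nn.CrossEntropyLoss()
>>> for data, target in dataloader:
...    optimizer.zero_grad()
...    loss = criterion(model(data), target)
...    loss.backward()   # per-sample gradients computed by the wrapped model
...    optimizer.step()  # gradients clipped and noised by the wrapped optimizer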

make_private_with_epsilon(*, module, optimizer, data_loader, target_epsilon, target_delta, epochs, max_grad_norm, batch_first=True, loss_reduction='mean', noise_generator=None, **kwargs)[source]

Version of make_private() that calculates privacy parameters based on a given privacy budget.

For the full documentation see make_private()

Parameters
  • module (Module) – PyTorch module to be used for training

  • optimizer (Optimizer) – Optimizer to be used for training

  • data_loader (DataLoader) – DataLoader to be used for training

  • target_epsilon (float) – Target epsilon to be achieved, a metric of privacy loss at differential changes in data

  • target_delta (float) – Target delta to be achieved. Probability of information being leaked

  • epochs (int) – Number of training epochs you intend to perform; the noise multiplier is calculated from this value to reach the target budget

  • max_grad_norm (float) – The maximum norm of the per-sample gradients. Any gradient with norm higher than this will be clipped to this value.

  • batch_first (bool) – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch. If set to True, dimensions on input tensor are expected be [batch_size, ...], otherwise [K, batch_size, ...]

  • loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”

  • noise_generator – torch.Generator() object used as a source of randomness for the noise

  • poisson_sampling – True if you want to use standard sampling required for DP guarantees. Setting it to False will leave the provided data_loader unchanged. Technically, this violates the assumptions made by the privacy accounting mechanism, but it can be a good approximation when Poisson sampling is infeasible.

  • clipping – Per sample gradient clipping mechanism (“flat” or “per_layer”). Flat clipping calculates the norm of the entire gradient over all parameters, while per layer clipping sets individual norms for every parameter tensor. Flat clipping is usually preferred, but using per layer clipping in combination with distributed training can provide notable performance gains.

Returns

Tuple of (model, optimizer, data_loader).

  • Model is a wrapper around the original model that also computes per-sample gradients.

  • Optimizer is a wrapper around the original optimizer that also performs gradient clipping and noise addition.

  • DataLoader is a brand-new DataLoader object, constructed to behave equivalently to the original data loader, possibly with an updated sampling mechanism. Points to the same dataset object.
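
For example, a sketch that lets the engine derive the noise multiplier from a target (epsilon, delta) budget (the budget values and epoch count are illustrative):

>>> model, optimizer, dataloader = privacy_engine.make_private_with_epsilon(
...    module=model,
...    optimizer=optimizer,
...    data_loader=dataloader,
...    target_epsilon=8.0,
...    target_delta=1e-5,
...    epochs=10,
...    max_grad_norm=1.0,
... )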

validate(*, module, optimizer, data_loader)[source]

Validate that task components are compatible with DP. Same as is_compatible(), but raises an error instead of returning a bool.

Parameters
  • module (Module) – module to be checked

  • optimizer (Optional[Optimizer]) – optimizer to be checked

  • data_loader (Optional[DataLoader]) – data_loader to be checked

Raises

UnsupportedModuleError – If one or more modules are found to be incompatible
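
For example, a sketch of failing fast before training (assuming UnsupportedModuleError is importable from opacus.validators.errors):

>>> from opacus.validators.errors import UnsupportedModuleError
>>> try:
...    privacy_engine.validate(module=model, optimizer=optimizer, data_loader=dataloader)
... except UnsupportedModuleError:
...    model = PrivacyEngine.get_compatible_module(model)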