DPOptimizerFastGradientClipping

class opacus.optimizers.optimizer_fast_gradient_clipping.DPOptimizerFastGradientClipping(optimizer, *, noise_multiplier, max_grad_norm, expected_batch_size, loss_reduction='mean', generator=None, secure_mode=False, **kwargs)[source]

torch.optim.Optimizer wrapper to implement Fast Gradient and Ghost Clipping – modifies DPOptimizer to only add noise to the average gradient, without clipping it here; clipping is performed earlier, during the backward pass.

Can be used with any torch.optim.Optimizer subclass as an underlying optimizer. DPOptimizerFastGradientClipping assumes that parameters over which it performs optimization belong to GradSampleModuleFastGradientClipping and therefore have the grad_sample attribute.

On a high level, DPOptimizerFastGradientClipping’s step looks like this:

  1. Add Gaussian noise to p.grad, calibrated to the given noise multiplier and max grad norm limit (std = noise_multiplier * max_grad_norm).

  2. Call the underlying optimizer to perform the optimization step.
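
The following is a minimal conceptual sketch, in plain PyTorch, of what these two steps amount to when loss_reduction="mean". It is an illustration only, not the Opacus implementation; the helper name noisy_step and the direct attribute access are purely for exposition.

import torch

def noisy_step(params, underlying_optimizer, *, noise_multiplier,
               max_grad_norm, expected_batch_size):
    std = noise_multiplier * max_grad_norm  # 1) noise scale from the two limits
    for p in params:
        # p.summed_grad is assumed to already hold the sum of clipped
        # per-sample gradients (clipping happened in the backward pass)
        noise = std * torch.randn_like(p.summed_grad)
        p.grad = (p.summed_grad + noise) / expected_batch_size  # "mean" reduction
    underlying_optimizer.step()  # 2) delegate the actual update to the wrapped optimizer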

Examples

>>> module = MyCustomModel()
>>> optimizer = torch.optim.SGD(module.parameters(), lr=0.1)
>>> dp_optimizer = DPOptimizerFastGradientClipping(
...     optimizer=optimizer,
...     noise_multiplier=1.0,
...     max_grad_norm=1.0,
...     expected_batch_size=4,
... )
Parameters:
  • optimizer (Optimizer) – wrapped optimizer.

  • noise_multiplier (float) – noise multiplier

  • max_grad_norm (float) – max grad norm used for calculating the standard deviation of the noise added

  • expected_batch_size (Optional[int]) – batch_size used for averaging gradients. When using Poisson sampling, the averaging denominator can’t be inferred from the actual batch size. Required if loss_reduction="mean", ignored if loss_reduction="sum"

  • loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”

  • generator – torch.Generator() object used as a source of randomness for the noise (see the snippet after this parameter list)

  • secure_mode (bool) – if True, uses a noise generation approach robust to floating point arithmetic attacks. See _generate_noise() for details
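
For example, a seeded torch.Generator can be passed to make the added noise reproducible while debugging. This snippet is illustrative only and reuses the optimizer from the example above; secure_mode (described above) is the option intended for hardening noise generation and is left at its default here.

>>> gen = torch.Generator().manual_seed(42)
>>> dp_optimizer = DPOptimizerFastGradientClipping(
...     optimizer=optimizer,
...     noise_multiplier=1.0,
...     max_grad_norm=1.0,
...     expected_batch_size=4,
...     generator=gen,
... )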

accumulate()[source]

Performs gradient accumulation. Stores aggregated gradients into p.summed_grad
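
Conceptually, the accumulation amounts to the sketch below. It is a simplified illustration (assuming each backward pass leaves a clipped, summed gradient in p.grad), not the actual implementation.

def accumulate_sketch(params):
    # Fold the gradient of the current pass into a running sum kept on
    # each parameter as summed_grad
    for p in params:
        if getattr(p, "summed_grad", None) is None:
            p.summed_grad = p.grad.detach().clone()
        else:
            p.summed_grad += p.grad.detach()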

property accumulated_iterations: int

Returns number of batches currently accumulated and not yet processed.

In other words, accumulated_iterations tracks the number of forward/backward passes done in between two optimizer steps. The value would typically be 1, but there are possible exceptions.

Used by privacy accountants to calculate real sampling rate.

clip_and_accumulate()[source]

Overrides the parent class’ method to be a no-op, since clipping is performed during the backward pass rather than in the optimizer

pre_step(closure=None)[source]

Perform actions specific to DPOptimizer before calling the underlying optimizer.step()

Parameters:

closure (Optional[Callable[[], float]]) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Return type:

Optional[float]
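
To show where pre_step fits, here is a rough sketch of how a DPOptimizer-style step() typically wraps it. This is an approximation of the Opacus pattern, not a verbatim copy; in particular, the attribute name original_optimizer (the wrapped optimizer) should be treated as an assumption.

import torch

def step_sketch(dp_optimizer, closure=None):
    # Evaluate the closure first (if any), as a regular optimizer would
    if closure is not None:
        with torch.enable_grad():
            closure()
    # pre_step() performs the DP-specific work (noise addition, scaling);
    # the wrapped optimizer is only stepped if pre_step() signals to proceed
    if dp_optimizer.pre_step():
        return dp_optimizer.original_optimizer.step()
    return None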

zero_grad(set_to_none=False)[source]

Clear gradients.

Clears p.grad, p.grad_sample and p.summed_grad for all of its parameters

Notes

The set_to_none argument only affects p.grad. p.grad_sample and p.summed_grad are never zeroed out; they are always set to None. Zeroing in place works for normal grads because their shape never changes, but grad samples do not behave like this, as gradients from different batches are accumulated in a list

Parameters:
  • set_to_none (bool) – instead of setting to zero, set the grads to None (only affects regular gradients; per-sample gradients are always set to None)
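
A short doctest-style illustration of the note above, reusing dp_optimizer from the example at the top of this page:

>>> dp_optimizer.zero_grad(set_to_none=True)   # regular p.grad set to None
>>> dp_optimizer.zero_grad(set_to_none=False)  # regular p.grad zeroed in place
>>> # In both cases p.grad_sample and p.summed_grad are reset to None.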