DPOptimizerFastGradientClipping

class opacus.optimizers.optimizer_fast_gradient_clipping.DPOptimizerFastGradientClipping(optimizer, *, noise_multiplier, max_grad_norm, expected_batch_size, loss_reduction='mean', generator=None, secure_mode=False, **kwargs)[source]

torch.optim.Optimizer wrapper to implement Fast Gradient and Ghost Clipping – modifies DPOptimizer to only add noise to the average gradient, without clipping it here; clipping is performed earlier, during the backward pass.

Can be used with any torch.optim.Optimizer subclass as an underlying optimizer. DPOptimizerFastGradientClipping assumes that parameters over which it performs optimization belong to GradSampleModuleFastGradientClipping and therefore have the grad_sample attribute.

On a high level, DPOptimizerFastGradientClipping’s step looks like this:

  1. Add Gaussian noise to p.grad, calibrated to the given noise multiplier and max grad norm limit (std = noise_multiplier * max_grad_norm).

  2. Call the underlying optimizer to perform the optimization step.
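
The following is a minimal conceptual sketch, in plain PyTorch, of what these two steps amount to when loss_reduction="mean". It is an illustration only, not the Opacus implementation; the helper name noisy_step and the direct attribute access are purely for exposition.

import torch

def noisy_step(params, underlying_optimizer, *, noise_multiplier,
               max_grad_norm, expected_batch_size):
    std = noise_multiplier * max_grad_norm  # 1) noise scale from the two limits
    for p in params:
        # p.summed_grad is assumed to already hold the sum of clipped
        # per-sample gradients (clipping happened in the backward pass)
        noise = std * torch.randn_like(p.summed_grad)
        p.grad = (p.summed_grad + noise) / expected_batch_size  # "mean" reduction
    underlying_optimizer.step()  # 2) delegate the actual update to the wrapped optimizer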

Examples

>>> module = MyCustomModel()
>>> optimizer = torch.optim.SGD(module.parameters(), lr=0.1)
>>> dp_optimizer = DPOptimizerFastGradientClipping(
...     optimizer=optimizer,
...     noise_multiplier=1.0,
...     max_grad_norm=1.0,
...     expected_batch_size=4,
... )
Parameters:
  • optimizer (Optimizer) – wrapped optimizer.

  • noise_multiplier (float) – noise multiplier

  • max_grad_norm (float) – max grad norm used for calculating the standard deviation of the noise added

  • expected_batch_size (Optional[int]) – batch_size used for averaging gradients. When using Poisson sampling, the averaging denominator can’t be inferred from the actual batch size. Required if loss_reduction="mean", ignored if loss_reduction="sum"

  • loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”

  • generator – torch.Generator() object used as a source of randomness for the noise (see the snippet after this parameter list)

  • secure_mode (bool) – if True, uses a noise generation approach robust to floating point arithmetic attacks. See _generate_noise() for details
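
For example, a seeded torch.Generator can be passed to make the added noise reproducible while debugging. This snippet is illustrative only and reuses the optimizer from the example above; secure_mode (described above) is the option intended for hardening noise generation and is left at its default here.

>>> gen = torch.Generator().manual_seed(42)
>>> dp_optimizer = DPOptimizerFastGradientClipping(
...     optimizer=optimizer,
...     noise_multiplier=1.0,
...     max_grad_norm=1.0,
...     expected_batch_size=4,
...     generator=gen,
... )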

accumulate()[source]

Performs gradient accumulation. Stores aggregated gradients into p.summed_grad
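
Conceptually, the accumulation amounts to the sketch below. It is a simplified illustration (assuming each backward pass leaves a clipped, summed gradient in p.grad), not the actual implementation.

def accumulate_sketch(params):
    # Fold the gradient of the current pass into a running sum kept on
    # each parameter as summed_grad
    for p in params:
        if getattr(p, "summed_grad", None) is None:
            p.summed_grad = p.grad.detach().clone()
        else:
            p.summed_grad += p.grad.detach()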

property accumulated_iterations: int

Returns number of batches currently accumulated and not yet processed.

In other words, accumulated_iterations tracks the number of forward/backward passes done in between two optimizer steps. The value would typically be 1, but there are possible exceptions.

Used by privacy accountants to calculate real sampling rate.

clip_and_accumulate()[source]

Overrides the parent class’ method to be a no-op, since clipping is performed during the backward pass rather than in the optimizer

pre_step(closure=None)[source]

Perform actions specific to DPOptimizer before calling the underlying optimizer.step()

Parameters:

closure (Optional[Callable[[], float]]) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Return type:

Optional[float]
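
To show where pre_step fits, here is a rough sketch of how a DPOptimizer-style step() typically wraps it. This is an approximation of the Opacus pattern, not a verbatim copy; in particular, the attribute name original_optimizer (the wrapped optimizer) should be treated as an assumption.

import torch

def step_sketch(dp_optimizer, closure=None):
    # Evaluate the closure first (if any), as a regular optimizer would
    if closure is not None:
        with torch.enable_grad():
            closure()
    # pre_step() performs the DP-specific work (noise addition, scaling);
    # the wrapped optimizer is only stepped if pre_step() signals to proceed
    if dp_optimizer.pre_step():
        return dp_optimizer.original_optimizer.step()
    return None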

zero_grad(set_to_none=False)[source]

Clear gradients.

Clears p.grad, p.grad_sample and p.summed_grad for all of its parameters

Notes

The set_to_none argument only affects p.grad. p.grad_sample and p.summed_grad are never zeroed out; they are always set to None. Zeroing in place works for normal grads because their shape never changes, but grad samples do not behave like this, as gradients from different batches are accumulated in a list

Parameters:
  • set_to_none (bool) – instead of setting to zero, set the grads to None (only affects regular gradients; per-sample gradients are always set to None)
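
A short doctest-style illustration of the note above, reusing dp_optimizer from the example at the top of this page:

>>> dp_optimizer.zero_grad(set_to_none=True)   # regular p.grad set to None
>>> dp_optimizer.zero_grad(set_to_none=False)  # regular p.grad zeroed in place
>>> # In both cases p.grad_sample and p.summed_grad are reset to None.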