DPOptimizerFastGradientClipping
- class opacus.optimizers.optimizer_fast_gradient_clipping.DPOptimizerFastGradientClipping(optimizer, *, noise_multiplier, max_grad_norm, expected_batch_size, loss_reduction='mean', generator=None, secure_mode=False, **kwargs)[source]
torch.optim.Optimizer wrapper to implement Fast Gradient and Ghost Clipping – modifies DPOptimizer to only add noise to the average gradient, without clipping.

Can be used with any torch.optim.Optimizer subclass as an underlying optimizer. DPOptimizerFastGradientClipping assumes that the parameters over which it performs optimization belong to a GradSampleModuleFastGradientClipping and therefore have the grad_sample attribute.

On a high level, DPOptimizerFastGradientClipping's step looks like this:

1) Add Gaussian noise to p.grad, calibrated to a given noise multiplier and max grad norm limit (std = noise_multiplier * max_grad_norm).

2) Call the underlying optimizer to perform the optimization step.

Examples
>>> module = MyCustomModel()
>>> optimizer = torch.optim.SGD(module.parameters(), lr=0.1)
>>> dp_optimizer = DPOptimizerFastGradientClipping(
...     optimizer=optimizer,
...     noise_multiplier=1.0,
...     max_grad_norm=1.0,
...     expected_batch_size=4,
... )
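A minimal training-loop sketch of how the wrapped optimizer is typically driven; train_loader and criterion are placeholders, and it is assumed that module has been wrapped (e.g. in GradSampleModuleFastGradientClipping) so that per-sample clipping happens during the backward pass rather than in the optimizer:

>>> for x, y in train_loader:
...     dp_optimizer.zero_grad()
...     output = module(x)
...     loss = criterion(output, y)
...     loss.backward()        # clipping is handled during backward by the wrapped module
...     dp_optimizer.step()    # adds calibrated noise to p.grad, then calls optimizer.step()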
- Parameters:
  - optimizer (Optimizer) – wrapped optimizer.
  - noise_multiplier (float) – noise multiplier
  - max_grad_norm (float) – max grad norm used for calculating the standard deviation of the noise added (see the sketch after this list)
  - expected_batch_size (Optional[int]) – batch_size used for averaging gradients. When using Poisson sampling the averaging denominator can't be inferred from the actual batch size. Required if loss_reduction="mean", ignored if loss_reduction="sum"
  - loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values "sum" or "mean"
  - generator – torch.Generator() object used as a source of randomness for the noise
  - secure_mode (bool) – if True, uses a noise generation approach robust to floating point arithmetic attacks. See _generate_noise() for details
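As a rough illustration of how these parameters interact (a conceptual sketch, not the library's internal code): the noise has standard deviation noise_multiplier * max_grad_norm, and the result is averaged over expected_batch_size when loss_reduction="mean":

>>> import torch
>>> noise_multiplier, max_grad_norm, expected_batch_size = 1.0, 1.0, 4
>>> std = noise_multiplier * max_grad_norm            # std of the Gaussian noise
>>> summed_grad = torch.zeros(10)                     # stands in for p.summed_grad
>>> noisy_grad = summed_grad + torch.normal(0.0, std, summed_grad.shape)
>>> averaged_grad = noisy_grad / expected_batch_size  # averaging denominator used for loss_reduction="mean"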
- accumulate()[source]
Performs gradient accumulation. Stores aggregated gradients into p.summed_grad.
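A conceptual sketch of what accumulation over the optimizer's parameters looks like (params is a placeholder; the actual implementation may differ in details):

>>> for p in params:
...     if getattr(p, "summed_grad", None) is None:
...         p.summed_grad = p.grad.clone()   # first backward pass since the last step
...     else:
...         p.summed_grad += p.grad          # subsequent passes are summed in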
- property accumulated_iterations: int
Returns the number of batches currently accumulated and not yet processed.

In other words, accumulated_iterations tracks the number of forward/backward passes done in between two optimizer steps. The value would typically be 1, but there are possible exceptions.

Used by privacy accountants to calculate the real sampling rate.
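For illustration, a hypothetical micro-batching pattern in which the value would exceed 1; the step-every-4-batches schedule and the surrounding names are assumptions, and the exact accumulation mechanics (e.g. whether a utility such as BatchMemoryManager is used) are elided:

>>> for i, (x, y) in enumerate(train_loader):
...     loss = criterion(module(x), y)
...     loss.backward()
...     if (i + 1) % 4 == 0:
...         # four forward/backward passes since the last step,
...         # so accumulated_iterations would report 4 here
...         dp_optimizer.step()
...         dp_optimizer.zero_grad()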
- pre_step(closure=None)[source]
Perform actions specific to DPOptimizer before calling the underlying optimizer.step()
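A hedged sketch of how pre_step() typically relates to step(); attribute names such as original_optimizer are assumptions about the internals rather than documented API:

>>> def step(self, closure=None):
...     if self.pre_step():                        # e.g. add noise and rescale gradients
...         return self.original_optimizer.step()  # then delegate to the wrapped optimizer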
- zero_grad(set_to_none=False)[source]
Clear gradients.

Clears p.grad, p.grad_sample and p.summed_grad for all of its parameters.

Notes

The set_to_none argument only affects p.grad. p.grad_sample and p.summed_grad are never zeroed out and are always set to None. Normal grads can do this, because their shape is always the same. Grad samples do not behave like this, as we accumulate gradients from different batches in a list (see the sketch below).

- Parameters:
  - set_to_none (bool) – instead of setting to zero, set the grads to None (only affects regular gradients; per-sample gradients are always set to None)
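To make the clearing behaviour concrete (using the dp_optimizer from the Examples above):

>>> dp_optimizer.zero_grad()                   # p.grad is zeroed; p.grad_sample and p.summed_grad are set to None
>>> dp_optimizer.zero_grad(set_to_none=True)   # p.grad is set to None as well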