GradSampleModuleFastGradientClipping

class opacus.grad_sample.grad_sample_module_fast_gradient_clipping.GradSampleModuleFastGradientClipping(m, *, batch_first=True, loss_reduction='mean', strict=True, force_functorch=False, max_grad_norm=1, use_ghost_clipping=True)[source]

Hooks-based implementation of GradSampleModule that supports Fast Gradient Clipping and Ghost Clipping.

Computes per-sample gradient norms without materializing the per-sample gradients themselves.

Parameters:
  • m (Module) – nn.Module to be wrapped

  • batch_first – Flag to indicate whether the first dimension of the input tensor to the corresponding module represents the batch. If set to True, input tensor dimensions are expected to be [batch_size, ...]; otherwise, [K, batch_size, ...]

  • loss_reduction – Indicates whether the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”

  • max_grad_norm – The maximum norm of the per-sample gradients; each example’s gradient is clipped so that its norm does not exceed this value.

  • strict (bool) – If set to True, the input module will be validated to make sure that none of its submodules include buffers.

  • force_functorch – If set to True, functorch will be used to compute all per-sample gradients. Otherwise, functorch will be used only for layers without registered grad sampler methods.

  • use_ghost_clipping – If set to True, Ghost Clipping will be used for clipping gradients of supported layers. If False, Fast Gradient Clipping will be used for all layers.

Raises:

NotImplementedError – If strict is set to True and module m (or any of its submodules) includes a buffer.
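Example (a minimal sketch of the two-pass, double-backward workflow this module enables; the toy model, data, and the explicit loss rescaling are illustrative, and in practice Opacus provides wrappers that handle this bookkeeping):

    import torch
    import torch.nn as nn
    from opacus.grad_sample.grad_sample_module_fast_gradient_clipping import (
        GradSampleModuleFastGradientClipping,
    )

    # Toy model and batch; all names here are illustrative.
    net = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
    x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))

    model = GradSampleModuleFastGradientClipping(
        net,
        batch_first=True,
        loss_reduction="mean",
        max_grad_norm=1.0,
        use_ghost_clipping=True,
    )
    criterion = nn.CrossEntropyLoss(reduction="none")

    # Pass 1: backward with hooks enabled populates per-sample gradient
    # norms (no per-sample gradients are materialized).
    per_sample_loss = criterion(model(x), y)
    torch.mean(per_sample_loss).backward(retain_graph=True)

    # Pass 2: rescale each example's loss by its clipping coefficient and
    # backpropagate again to obtain the clipped, aggregated gradients.
    coef = model.get_clipping_coef()  # shape: [batch_size]
    model.disable_hooks()             # norms are already computed
    model.zero_grad()                 # discard the unclipped gradients
    torch.mean(coef * per_sample_loss).backward()
    model.enable_hooks()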

capture_backprops_hook(module, _forward_input, forward_output, loss_reduction, batch_first)[source]

Computes per-sample gradient norms given the current backprops and the activations stored by the associated forward hook. The computed per-sample gradient norms are stored in the _norm_sample field of each parameter (see the inspection snippet after the parameter list).

Parameters:
  • module (Module) – the module the hook is attached to

  • _forward_input (Tensor) – input gradients received by the backward hook

  • forward_output (Tensor) – output gradients (the backprops) received by the backward hook

  • loss_reduction (str) – “sum” or “mean”

  • batch_first (bool) – whether the first dimension of the input is the batch dimension
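Once the hook has run (i.e. after a backward pass with hooks enabled), each trainable parameter carries these norms. A quick way to inspect them, relying on the private _norm_sample attribute and hence subject to change:

    # model is a GradSampleModuleFastGradientClipping instance that has
    # just completed a backward pass with hooks enabled.
    for name, p in model.named_parameters():
        if p.requires_grad and hasattr(p, "_norm_sample"):
            print(name, tuple(p._norm_sample.shape))  # (batch_size,)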

get_clipping_coef()[source]

Get the per-example gradient scaling factors used for clipping.

Return type:

Tensor

get_norm_sample()[source]

Get per-example gradient norms.

Return type:

Tensor
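How the two getters relate can be sketched as follows. The aggregation across parameters and the epsilon guard below are assumptions about the library's behavior, not a verbatim copy of its source:

    import torch

    def norm_sample(per_param_norms):
        # One [batch_size] tensor per trainable parameter; the overall
        # per-example norm is the L2 norm taken across parameters.
        return torch.stack(per_param_norms, dim=0).norm(2, dim=0)

    def clipping_coef(norms, max_grad_norm):
        # Clamped ratio with the clipping threshold; the small additive
        # guard against division by zero is an assumption.
        return (max_grad_norm / (norms + 1e-6)).clamp(max=1.0)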

opacus.grad_sample.grad_sample_module_fast_gradient_clipping.create_norm_sample(*, param, grad_sample, max_batch_len)[source]

Creates a _norm_sample attribute on the given parameter

Parameters:
  • param (Tensor) – Parameter to which _norm_sample will be added

  • grad_sample (Tensor) – Per-sample gradients tensor. Must have the same shape as param, with an extra leading batch dimension

  • max_batch_len (int) – Maximum number of examples in the batch; used to size the per-sample norm buffer

Return type:

None
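Functionally, the helper flattens each example's gradient and takes an L2 norm over it. A rough sketch of the observable effect (the buffer pre-allocation that uses max_batch_len in the actual source is elided here):

    import torch

    def create_norm_sample_sketch(*, param, grad_sample, max_batch_len):
        # Flatten all non-batch dimensions and compute one L2 norm per example.
        if param.requires_grad:
            param._norm_sample = grad_sample.reshape(len(grad_sample), -1).norm(2, dim=-1)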