GradSampleModuleFastGradientClipping

class opacus.grad_sample.grad_sample_module_fast_gradient_clipping.GradSampleModuleFastGradientClipping(m, *, batch_first=True, loss_reduction='mean', strict=True, force_functorch=False, max_grad_norm=1, use_ghost_clipping=True)[source]

Hooks-based implementation of GradSampleModule with Fast Gradient and Ghost Clipping

Computes per-sample gradient norms without instantiating the per-sample gradients themselves

Parameters:
  • m (Module) – nn.Module to be wrapped

  • batch_first – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch. If set to True, dimensions of the input tensor are expected to be [batch_size, ...], otherwise [K, batch_size, ...]

  • loss_reduction – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”

  • max_grad_norm – The maximum norm to which per-sample gradients are clipped.

  • strict (bool) – If set to True, the input module will be validated to make sure that neither it nor any of its submodules has buffers.

  • force_functorch – If set to True, will use functorch to compute all per sample gradients. Otherwise, functorch will be used only for layers without registered grad sampler methods.

  • use_ghost_clipping – If set to True, Ghost Clipping will be used for clipping gradients of supported layers. If False, Fast Gradient Clipping will be used for all layers.

Raises:

NotImplementedError – If strict is set to True and module m (or any of its submodules) includes a buffer.
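
A minimal usage sketch (illustrative model, data, and variable names; not part of the library). It drives the module directly through the two-pass ("double backward") workflow that Opacus's fast gradient clipping loss wrapper performs internally: the first backward pass lets the hooks record per-sample gradient norms, and the second backward pass reweights each example's loss by its clipping coefficient.

```python
import torch
import torch.nn as nn
from opacus.grad_sample.grad_sample_module_fast_gradient_clipping import (
    GradSampleModuleFastGradientClipping,
)

# Illustrative model and batch; a module without buffers satisfies strict=True.
model = nn.Linear(16, 2)
gs_model = GradSampleModuleFastGradientClipping(
    model,
    batch_first=True,
    loss_reduction="mean",
    max_grad_norm=1.0,
    use_ghost_clipping=True,
)
criterion = nn.CrossEntropyLoss(reduction="none")  # keep per-sample losses
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))

# Pass 1: backprop the reduced loss so the backward hooks can compute
# per-sample gradient norms without materializing per-sample gradients.
per_sample_loss = criterion(gs_model(x), y)
per_sample_loss.mean().backward(retain_graph=True)

# Pass 2: rescale each example's loss by its clipping coefficient and redo the
# backward pass; the .grad fields then hold the sum of clipped per-sample gradients.
coeff = gs_model.get_clipping_coef()  # shape: (batch_size,)
gs_model.zero_grad()
gs_model.disable_hooks()
(coeff * per_sample_loss).sum().backward()
gs_model.enable_hooks()
```

In practice the higher-level Opacus APIs (e.g. the fast gradient clipping loss wrapper used together with a DP optimizer) run this double backward for you; the sketch only exposes the mechanics provided by this class.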

capture_backprops_hook(module, _forward_input, forward_output, loss_reduction, batch_first)[source]

Computes per-sample gradient norms given the current backprops and the activations stored by the associated forward hook. The computed per-sample gradient norms are stored in the _norm_sample field of each parameter.

Parameters:
  • module (Module) – the module for which per-sample gradient norms are computed

  • _forward_input (Tensor) – unused

  • forward_output (Tensor) – backprops with respect to the module output

  • loss_reduction (str) – “sum” or “mean”

  • batch_first (bool) – whether the batch dimension comes first in the input tensor

get_clipping_coef()[source]

Get per-example gradient scaling factor for clipping.

Return type:

Tensor
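
As a rough illustration (not the library's exact code, which may add a small epsilon for numerical stability): each example's coefficient is min(1, max_grad_norm / norm), so examples whose gradient norm is already within the bound are left unscaled.

```python
import torch

max_grad_norm = 1.0
per_sample_norms = torch.tensor([0.5, 2.0, 4.0])  # hypothetical per-example norms

# Scale each example so its gradient norm does not exceed max_grad_norm.
coeff = (max_grad_norm / per_sample_norms).clamp(max=1.0)
print(coeff)  # tensor([1.0000, 0.5000, 0.2500])
```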

get_norm_sample()[source]

Get per-example gradient norms.

Return type:

Tensor
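
Conceptually, the backward hooks store one norm per parameter per example, and this method combines them into a single per-example norm by taking the L2 norm across parameters. A small sketch with hypothetical values (not the library's exact code):

```python
import torch

# Hypothetical per-example norms for two parameters, batch of 4.
norms_weight = torch.tensor([1.0, 2.0, 0.5, 3.0])
norms_bias = torch.tensor([0.1, 0.2, 0.3, 0.4])

# Overall per-example gradient norm: L2 norm across parameters.
total = torch.stack([norms_weight, norms_bias], dim=0).norm(2, dim=0)
print(total)  # total[0] == sqrt(1.0**2 + 0.1**2)
```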

log_module_gradient_sample_mode(module, *, force_functorch=False, use_ghost_clipping=True)[source]

Adds logs to track the gradient sample mode of each part of the module: 1) Ghost Clipping, 2) Fast Gradient Clipping (hook mode), or 3) Fast Gradient Clipping (functorch mode).

Parameters:
  • module (Module) – nn.Module to be checked

  • force_functorch – If set to True, will use functorch to compute all per sample gradients. Otherwise, functorch will be used only for layers without registered grad sampler methods.

  • use_ghost_clipping – If set to True, Ghost Clipping will be used for clipping gradients of supported layers. If False, Fast Gradient Clipping will be used for all layers.

property per_sample_gradient_norms: Tensor

Returns per-sample gradient norms. Note that these are not privatized and should only be used for debugging or in non-private settings.

opacus.grad_sample.grad_sample_module_fast_gradient_clipping.create_norm_sample(*, param, grad_sample, max_batch_len)[source]

Creates a _norm_sample attribute on the given parameter.

Parameters:
  • param (Tensor) – Parameter to which _norm_sample will be added

  • grad_sample (Tensor) – Per-sample gradients tensor. Must have the same shape as param, with an extra batch dimension

Return type:

None
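
A sketch of the idea with hypothetical shapes (not the library's exact code): the per-example norm is the L2 norm of each example's flattened gradient, attached to the parameter.

```python
import torch

# Hypothetical per-sample gradients for a (4, 3)-shaped parameter, batch of 8.
param = torch.nn.Parameter(torch.randn(4, 3))
grad_sample = torch.randn(8, 4, 3)

# Flatten each example's gradient and take its L2 norm, then store it on the
# parameter, which is roughly what create_norm_sample does.
param._norm_sample = grad_sample.reshape(grad_sample.shape[0], -1).norm(2, dim=-1)
print(param._norm_sample.shape)  # torch.Size([8])
```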