GradSampleModuleFastGradientClipping¶
- class opacus.grad_sample.grad_sample_module_fast_gradient_clipping.GradSampleModuleFastGradientClipping(m, *, batch_first=True, loss_reduction='mean', strict=True, force_functorch=False, max_grad_norm=1, use_ghost_clipping=True)[source]¶
Hooks-based implementation of GradSampleModule with Fast Gradient and Ghost Clipping. Computes per-sample gradient norms without instantiating the per-sample gradients themselves. A usage sketch follows the parameter list below.
- Parameters:
  - m (Module) – nn.Module to be wrapped.
  - batch_first – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch. If set to True, dimensions on the input tensor are expected to be [batch_size, ...], otherwise [K, batch_size, ...].
  - loss_reduction – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values "sum" or "mean".
  - max_grad_norm – The value at which per-sample gradients are to be clipped.
  - strict (bool) – If set to True, the input module will be validated to make sure that none of its submodules includes a buffer.
  - force_functorch – If set to True, functorch will be used to compute all per-sample gradients. Otherwise, functorch will be used only for layers without registered grad sampler methods.
  - use_ghost_clipping – If set to True, Ghost Clipping will be used for clipping gradients of supported layers. If False, Fast Gradient Clipping will be used for all layers.
- Raises:
  NotImplementedError – If strict is set to True and module m (or any of its submodules) includes a buffer.
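A minimal usage sketch, assuming a toy two-layer model and dummy data (the model, shapes, and clipping threshold below are illustrative, not part of the API). The wrapper is constructed with the parameters above, and a single forward/backward pass lets the backward hooks compute per-sample gradient norms without materializing per-sample gradients. In a complete DP-SGD loop these norms are then used to rescale the loss for a second backward pass; that machinery is omitted here.

```python
import torch
import torch.nn as nn
from opacus.grad_sample.grad_sample_module_fast_gradient_clipping import (
    GradSampleModuleFastGradientClipping,
)

# Illustrative model and data; any nn.Module with supported layers works similarly.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
wrapped = GradSampleModuleFastGradientClipping(
    model,
    batch_first=True,         # inputs are [batch_size, ...]
    loss_reduction="mean",    # must match the reduction used by the loss
    max_grad_norm=1.0,        # per-sample gradient norm clipping threshold
    use_ghost_clipping=True,  # set False to use Fast Gradient Clipping for all layers
)

x = torch.randn(8, 16)            # batch of 8 samples
y = torch.randint(0, 2, (8,))
criterion = nn.CrossEntropyLoss(reduction="mean")

loss = criterion(wrapped(x), y)
loss.backward()  # backward hooks compute per-sample gradient norms
                 # without instantiating per-sample gradients
```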
- capture_backprops_hook(module, _forward_input, forward_output, loss_reduction, batch_first)[source]¶
Computes per-sample gradient norms given the current backprops and the activations stored by the associated forward hook. The computed per-sample gradient norms are stored in the norm_sample field of each parameter.
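As a hedged illustration of how these stored norms can be combined, the sketch below aggregates the per-parameter norm_sample tensors into per-example clipping coefficients for flat clipping. The helper name, the attribute access pattern, and the epsilon constant are assumptions for illustration, not the module's exact internals.

```python
import torch

def per_sample_clip_factor(wrapped_module, max_grad_norm: float) -> torch.Tensor:
    # Assumes each trainable parameter exposes its per-sample norms via the
    # norm_sample field described above, with shape [batch_size].
    per_param_norms = torch.stack(
        [p.norm_sample for p in wrapped_module.parameters() if p.requires_grad],
        dim=0,
    )
    # L2 norm across parameters gives the full per-example gradient norm.
    per_example_norm = per_param_norms.norm(2, dim=0)
    # Scale factor <= 1 that clips each example's gradient to max_grad_norm;
    # the 1e-6 epsilon guards against division by zero.
    return (max_grad_norm / (per_example_norm + 1e-6)).clamp(max=1.0)
```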