Per Sample Gradients¶
Autograd Grad Sample¶
Based on https://github.com/cybertronai/autograd-hacks
This module provides functions to capture per-sample gradients by using hooks.
Notes
The register_backward_hook()
function has a known issue being tracked at
https://github.com/pytorch/pytorch/issues/598. However, it is the only known
way of implementing this as of now (your suggestions and contributions are
very welcome). The behaviour has been verified to be correct for the layers
currently supported by opacus.

opacus.autograd_grad_sample.add_hooks(model, loss_reduction='mean', batch_first=True)[source]¶
Adds hooks to model to save activations and backprop values. The hooks will:

save activations into param.activations during the forward pass.
compute per-sample gradients and save them in param.grad_sample during the backward pass.

Parameters
model (Module) – Model to which hooks are added.
loss_reduction (str) – Indicates whether the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values "sum" or "mean".
batch_first (bool) – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch, for example of shape [batch_size, ..., ...]. Set to True if the batch appears in the first dimension, else set to False (batch_first=False implies that the batch is always in the second dimension).

opacus.autograd_grad_sample.disable_hooks()[source]¶
Globally disables all hooks installed by this library.

opacus.autograd_grad_sample.enable_hooks()[source]¶
Globally enables all hooks installed by this library.

opacus.autograd_grad_sample.is_supported(layer)[source]¶
Checks if the layer is supported by this library.

Parameters
layer (Module) – Layer for which we need to determine if support for capturing per-sample gradients is available.

Return type
bool

Returns
Whether the layer is supported by this library.
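Per-sample gradient capture needs a dedicated sampler per layer type, so a support check like this typically reduces to a lookup in a registry mapping layer classes to sampler functions. The sketch below assumes that design; the class names and registry contents are illustrative, not opacus's actual list of supported layers.

```python
# Stand-in layer classes for the sketch.
class Linear: ...
class Conv2d: ...
class CustomAttention: ...

# Hypothetical registry: layer type -> grad sampler (names are placeholders).
_GRAD_SAMPLERS = {
    Linear: "compute_linear_grad_sample",
    Conv2d: "compute_conv2d_grad_sample",
}

def is_supported(layer) -> bool:
    """Return True if a grad sampler is registered for this layer's type."""
    return type(layer) in _GRAD_SAMPLERS

print(is_supported(Linear()))           # True
print(is_supported(CustomAttention()))  # False
```

Checking support up front lets callers fail fast on models containing layers for which no per-sample gradient formula is implemented.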
Layers Grad Samplers¶
This module is a collection of grad samplers, i.e. methods to calculate per-sample gradients for a layer given two tensors: activations (module inputs) and backpropagations (gradient values propagated from downstream layers).
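As a concrete instance of the two-tensor recipe, consider a linear layer y = W @ x: the per-sample gradient of the loss with respect to W for sample i is the outer product of that sample's backprops and activations (the batched form of einsum('ni,nj->nij')). A pure-Python sketch, using nested lists where a real sampler would use tensors:

```python
def linear_grad_sample(activations, backprops):
    # grad_sample[i][o][k] = backprops[i][o] * activations[i][k]:
    # one outer product per sample, with no summation over the batch.
    return [
        [[b * a for a in act] for b in back]
        for act, back in zip(activations, backprops)
    ]

activations = [[1.0, 2.0], [3.0, 4.0]]   # batch of 2, in_features = 2
backprops = [[1.0], [0.5]]               # out_features = 1
per_sample = linear_grad_sample(activations, backprops)
print(per_sample)   # [[[1.0, 2.0]], [[1.5, 2.0]]]

# Summing over the batch recovers the usual aggregated weight gradient.
agg = [[sum(g[o][k] for g in per_sample) for k in range(2)] for o in range(1)]
print(agg)          # [[2.5, 4.0]]
```

Each supported layer type needs its own formula of this kind, which is why the samplers are collected per layer in this module.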