DPMultiheadAttention

class opacus.layers.dp_multihead_attention.SequenceBias(embed_dim, batch_first=False)[source]

Adds one bias element to the end of the sequence. If the input has shape (L, N, E) (batch_first=False), where L is the sequence length, N is the batch size, and E is the embedding dimension, the output will have shape (L+1, N, E). When batch_first=True, the input has shape (N, L, E) and the output has shape (N, L+1, E).

bias

the learnable bias of the module of shape (E), where E is the embedding dimension.

Type:

torch.nn.parameter.Parameter

Example

>>> m = SequenceBias(16, batch_first=False)
>>> input = torch.randn(20, 4, 16)
>>> output = m(input)
>>> output.size()
torch.Size([21, 4, 16])
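
A complementary sketch with batch_first=True (the tensor sizes below are illustrative); the bias element is appended along the sequence dimension, so the sequence length grows from 20 to 21:

>>> m = SequenceBias(16, batch_first=True)
>>> input = torch.randn(4, 20, 16)
>>> output = m(input)
>>> output.size()
torch.Size([4, 21, 16])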
Parameters:

  • embed_dim (int) – Embedding dimension

  • batch_first (bool, optional) – If True, the input and output tensors are provided as (N, L, E); otherwise as (L, N, E). Default: False.

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

class opacus.layers.dp_multihead_attention.DPMultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None, batch_first=False, device=None, dtype=None)[source]

This is a DP-friendly implementation of nn.MultiheadAttention. For a full reference, see the original module, torch.nn.MultiheadAttention.

The current implementation leverages standard PyTorch modules as building blocks so that the DP engine can calculate per-sample gradients. This is in contrast with the original implementation, which is based on nn.functional.

Initialize internal Module state, shared by both nn.Module and ScriptModule.
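
A minimal usage sketch (embedding size, number of heads, and tensor shapes below are illustrative, not defaults). The layer is constructed with the same arguments as torch.nn.MultiheadAttention and is expected to return the attention output together with the attention weights, mirroring the original module:

>>> import torch
>>> attn = DPMultiheadAttention(embed_dim=16, num_heads=4)
>>> q = torch.randn(20, 4, 16)  # (L, N, E), since batch_first defaults to False
>>> out, weights = attn(q, q, q)
>>> out.size()
torch.Size([20, 4, 16])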

forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None, is_causal=False)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
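
A sketch of a forward call with a key padding mask, assuming the mask semantics mirror torch.nn.MultiheadAttention (a True entry marks a key position to be ignored); all shapes and hyperparameters below are illustrative:

>>> attn = DPMultiheadAttention(16, 4, batch_first=True)
>>> x = torch.randn(2, 5, 16)  # (N, L, E)
>>> pad_mask = torch.zeros(2, 5, dtype=torch.bool)
>>> pad_mask[:, -1] = True  # ignore the last position in each sequence
>>> out, weights = attn(x, x, x, key_padding_mask=pad_mask)
>>> out.size()
torch.Size([2, 5, 16])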

load_state_dict(state_dict)[source]

Loads the module from a previously saved state.

Supports loading from both torch.nn.MultiheadAttention and opacus.layers.dp_multihead_attention.DPMultiheadAttention.

Parameters:

state_dict – Please refer to https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html.
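
A sketch of migrating weights from a standard torch.nn.MultiheadAttention, assuming both modules are constructed with matching arguments:

>>> import torch.nn as nn
>>> orig = nn.MultiheadAttention(embed_dim=16, num_heads=4)
>>> dp_attn = DPMultiheadAttention(embed_dim=16, num_heads=4)
>>> dp_attn.load_state_dict(orig.state_dict())

After loading, dp_attn can be used as a drop-in replacement for the original attention layer inside a model trained with Opacus.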

state_dict(destination=None, prefix='', keep_vars=False)[source]

Return a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module’s parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination as it is not designed for end-users.

Parameters:
  • destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.

  • prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.

  • keep_vars (bool, optional) – by default the Tensor s returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing the whole state of the module

Return type:

dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']