DPMultiheadAttention

class opacus.layers.dp_multihead_attention.SequenceBias(embed_dim)[source]

Adds one bias element to the end of the sequence. If the input has shape (L, N, E), where L is the sequence length, N is the batch size, and E is the embedding dimension, the output will have shape (L+1, N, E).

bias

The learnable bias of the module, of shape (E), where E is the embedding dimension.

Type

torch.nn.parameter.Parameter

Example

>>> m = SequenceBias(16)
>>> input = torch.randn(20, 4, 16)
>>> output = m(input)
>>> print(output.size())
torch.Size([21, 4, 16])
Parameters

embed_dim (int) – Embedding dimension

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
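Conceptually, the forward pass broadcasts the learnable bias across the batch and concatenates it along the sequence dimension. A minimal illustrative sketch of that computation (not the library source; append_sequence_bias is a hypothetical helper):

>>> import torch
>>> def append_sequence_bias(x, bias):
...     # x: (L, N, E), bias: (E,) -> output: (L + 1, N, E)
...     L, N, E = x.shape
...     extra = bias.expand(1, N, E)  # broadcast the bias to one extra time step
...     return torch.cat([x, extra], dim=0)
>>> out = append_sequence_bias(torch.randn(20, 4, 16), torch.zeros(16))
>>> print(out.size())
torch.Size([21, 4, 16])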

class opacus.layers.dp_multihead_attention.DPMultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None)[source]

This is a DP-friendly implementation of nn.MultiheadAttention. For a full reference, see the original module, torch.nn.MultiheadAttention.

The current implementation uses PyTorch modules as building blocks so that the DP engine can compute per-sample gradients. This is in contrast with the original implementation, which is based on nn.functional.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
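A minimal usage sketch (dimensions here are arbitrary), assuming the module mirrors the call convention of torch.nn.MultiheadAttention and returns (attn_output, attn_output_weights):

>>> import torch
>>> from opacus.layers.dp_multihead_attention import DPMultiheadAttention
>>> attn = DPMultiheadAttention(embed_dim=16, num_heads=4)
>>> q = torch.randn(10, 2, 16)  # (L, N, E): sequence length, batch size, embedding dim
>>> k = torch.randn(10, 2, 16)
>>> v = torch.randn(10, 2, 16)
>>> out, weights = attn(q, k, v)
>>> print(out.size())
torch.Size([10, 2, 16])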

forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
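The masking arguments are expected to follow the same conventions as torch.nn.MultiheadAttention; the sketch below assumes a boolean key_padding_mask of shape (N, L) and an additive float attn_mask of shape (L, L), as in the standard module:

>>> import torch
>>> from opacus.layers.dp_multihead_attention import DPMultiheadAttention
>>> L, N, E = 10, 2, 16
>>> attn = DPMultiheadAttention(embed_dim=E, num_heads=4)
>>> q = k = v = torch.randn(L, N, E)
>>> # key_padding_mask: (N, L); True marks key positions to ignore
>>> key_padding_mask = torch.zeros(N, L, dtype=torch.bool)
>>> # attn_mask: (L, L); additive mask, -inf blocks attention (causal mask here)
>>> attn_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
>>> out, weights = attn(q, k, v, key_padding_mask=key_padding_mask, attn_mask=attn_mask)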

load_state_dict(state_dict)[source]

Loads the module from a previously saved state.

Supports loading from both torch.nn.MultiheadAttention and opacus.layers.dp_multihead_attention.DPMultiheadAttention.

Parameters

state_dict – The state dictionary to load. For a description of the state_dict format, please refer to https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html.
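For example, weights from a standard attention module can be transferred into the DP-friendly module (a hedged sketch; the two modules must be constructed with matching hyperparameters):

>>> import torch.nn as nn
>>> from opacus.layers.dp_multihead_attention import DPMultiheadAttention
>>> orig = nn.MultiheadAttention(embed_dim=16, num_heads=4)
>>> dp_attn = DPMultiheadAttention(embed_dim=16, num_heads=4)
>>> dp_attn.load_state_dict(orig.state_dict())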