PrivacyEngine(module, *, sample_rate=None, batch_size=None, sample_size=None, max_grad_norm, noise_multiplier=None, alphas=[1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], secure_rng=False, batch_first=True, target_delta=1e-06, target_epsilon=None, epochs=None, loss_reduction='mean', **misc_settings)
The main component of Opacus is the PrivacyEngine.
To train a model with differential privacy, all you need to do is define a PrivacyEngine and later attach it to your optimizer before running.
This example shows how to define a PrivacyEngine and attach it to your optimizer.
>>> import torch
>>> from opacus import PrivacyEngine
>>> model = torch.nn.Linear(16, 32)  # An example model
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
>>> privacy_engine = PrivacyEngine(model, sample_rate=0.01, noise_multiplier=1.3, max_grad_norm=1.0)
>>> privacy_engine.attach(optimizer)  # That's it! Now it's business as usual.
module (Module) – The PyTorch module to which we are attaching the privacy engine.
secure_rng (bool) – If on, it will use torchcsprng for secure random number generation. This comes with a significant performance cost, so it is recommended that you turn it off when just experimenting.
batch_first (bool) – Flag to indicate whether the first dimension of the input tensor to the corresponding module represents the batch. If set to True, the dimensions of the input tensor will be [batch_size, ..., ...].
target_delta (float) – The target delta. If unset, we will set it for you.
loss_reduction (str) – Indicates whether the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values “sum” or “mean”.
**misc_settings – Other arguments to the init
Attaches the privacy engine to the optimizer.
Attaches an optimizer object to the PrivacyEngine and injects the engine into the optimizer’s step. To do that, it:
1. Validates that the model does not have unsupported layers.
2. Adds a pointer to this object (the PrivacyEngine) inside the optimizer.
3. Moves the optimizer’s original step() function to original_step().
4. Monkeypatches the optimizer’s step() function to call step() on the privacy engine automatically whenever it would call step() for itself.
optimizer (Optimizer) – The optimizer to which the privacy engine will attach.
Detaches the privacy engine from the optimizer.
To detach the PrivacyEngine from the optimizer, this method returns the model and the optimizer to their original states (i.e. all added attributes/methods will be removed).
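The attach and detach mechanics described above can be illustrated with a small, library-free sketch. ToyOptimizer, ToyEngine, and dp_step are hypothetical names invented for this illustration; this is not the Opacus implementation, only the general monkeypatching pattern it describes.

```python
import types


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; records every call."""

    def __init__(self):
        self.calls = []

    def step(self):
        self.calls.append("optimizer.step")


class ToyEngine:
    """Hypothetical stand-in for a privacy engine."""

    def __init__(self):
        self.optimizer = None

    def step(self):
        # A real engine would clip gradients and add noise here.
        self.optimizer.calls.append("engine.step")

    def attach(self, optimizer):
        self.optimizer = optimizer
        optimizer.privacy_engine = self           # pointer to the engine
        optimizer.original_step = optimizer.step  # keep the original step()

        def dp_step(opt):
            # Run the engine first, then the optimizer's own update.
            opt.privacy_engine.step()
            opt.original_step()

        # Shadow the class-level step() with a bound, patched version.
        optimizer.step = types.MethodType(dp_step, optimizer)

    def detach(self):
        opt = self.optimizer
        # Removing the instance attributes restores the original behaviour.
        del opt.step
        del opt.original_step
        del opt.privacy_engine
        self.optimizer = None
```

After attach(), a single call to the optimizer's step() triggers the engine's step() first; after detach(), the optimizer carries no trace of the engine.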
Computes the (epsilon, delta) privacy budget spent so far.
This method converts the (alpha, epsilon) Rényi DP guarantees, for all alphas that the PrivacyEngine was initialized with, into an (epsilon, delta)-DP guarantee. It returns the optimal alpha together with the best epsilon.
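As a rough illustration of this conversion, the sketch below uses the classic bound that an (alpha, rdp)-Rényi DP guarantee implies (rdp + log(1/delta) / (alpha - 1), delta)-DP, and picks the alpha that minimizes epsilon. The helper names are hypothetical, and the RDP curve is that of a plain (un-subsampled) Gaussian mechanism, chosen only to keep the example short; the engine's own accounting for sampled batches may differ and may use a tighter conversion.

```python
import math


def rdp_gaussian(alpha, noise_multiplier, steps):
    """RDP at order alpha for `steps` compositions of a Gaussian mechanism.

    Assumption: no subsampling, so each step costs alpha / (2 * sigma^2).
    """
    return steps * alpha / (2.0 * noise_multiplier ** 2)


def privacy_spent(alphas, rdp_values, delta):
    """Return (best_epsilon, optimal_alpha) over the given orders."""
    candidates = [
        (rdp + math.log(1.0 / delta) / (alpha - 1.0), alpha)
        for alpha, rdp in zip(alphas, rdp_values)
    ]
    # Tuples compare by epsilon first, so min() picks the smallest epsilon.
    return min(candidates)


alphas = [2.0, 5.0, 10.0]
rdp_values = [rdp_gaussian(a, noise_multiplier=1.0, steps=1) for a in alphas]
epsilon, best_alpha = privacy_spent(alphas, rdp_values, delta=1e-5)
```

Small alphas pay a large log(1/delta) / (alpha - 1) penalty while large alphas pay a large RDP cost, which is why a wide grid of alphas is searched.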
Takes a step for the privacy engine.
You should not call this method directly. Rather, once you attach your PrivacyEngine to the optimizer, the PrivacyEngine will have the optimizer call this method for you.
ValueError – If the last batch of a training epoch is larger than the others. This check ensures that the clipper consumed the right number of gradients. The last batch of a training epoch may be smaller than the others, but it should never be larger.
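For orientation, here is a minimal, library-free sketch of what a differentially private step conceptually does with per-sample gradients. It assumes the standard DP-SGD recipe: clip each per-sample gradient to max_grad_norm, sum, add Gaussian noise with standard deviation noise_multiplier * max_grad_norm, then average. The function names are hypothetical, and gradients are plain Python lists rather than tensors; this is not the engine's actual code.

```python
import math
import random


def clip(grad, max_grad_norm):
    """Scale grad so that its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / (norm + 1e-6))
    return [g * scale for g in grad]


def dp_step(per_sample_grads, max_grad_norm, noise_multiplier, rng=random):
    """Return the noisy average of the clipped per-sample gradients."""
    batch_size = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        for j, g in enumerate(clip(grad, max_grad_norm)):
            summed[j] += g
    # Assumed noise scale: noise_multiplier times the clipping threshold.
    std = noise_multiplier * max_grad_norm
    return [(s + rng.gauss(0.0, std)) / batch_size for s in summed]


grads = [[3.0, 4.0], [0.3, 0.4]]
noiseless = dp_step(grads, max_grad_norm=1.0, noise_multiplier=0.0)
```

With noise_multiplier set to 0.0 the result is simply the average of the clipped gradients, which makes the clipping effect easy to inspect in isolation.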
Moves the privacy engine to the target device.
device (Union[str, device]) – The device on which PyTorch Tensors are allocated. See: https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.device
This example shows how to use this method to move the model after instantiating the PrivacyEngine.
>>> model = torch.nn.Linear(16, 32)  # An example model. Default device is CPU
>>> privacy_engine = PrivacyEngine(model, sample_rate=0.01, noise_multiplier=0.8, max_grad_norm=0.5)
>>> device = "cuda:3"  # GPU
>>> model.to(device)  # If we move the model to GPU, we should call the to() method of the privacy engine (next line)
>>> privacy_engine.to(device)
Takes a virtual step.
Virtual batches enable training with arbitrarily large batch sizes while keeping memory consumption constant. This is beneficial when a model needs a larger batch size than fits in memory at once.
Imagine you want to train a model with batch size of 2048, but you can only fit batch size of 128 in your GPU. Then, you can do the following:
>>> for i, (X, y) in enumerate(dataloader):
...     logits = model(X)
...     loss = criterion(logits, y)
...     loss.backward()
...     if i % 16 == 15:
...         optimizer.step()  # this will call privacy engine's step()
...         optimizer.zero_grad()
...     else:
...         optimizer.virtual_step()  # this will call privacy engine's virtual_step()
The rough idea of virtual step is as follows:
1. Calling loss.backward() repeatedly stores the per-sample gradients for all mini-batches. If we call loss.backward() N times on mini-batches of size B, then each weight’s .grad_sample field will contain NxB gradients. Then, when calling step(), the privacy engine clips all NxB gradients and computes the average gradient for an effective batch of size NxB. A call to optimizer.zero_grad() erases the per-sample gradients.
2. By calling virtual_step() after loss.backward(), the B per-sample gradients for this mini-batch are clipped and summed up into a gradient accumulator. The per-sample gradients can then be discarded. After N iterations (alternating calls to loss.backward() and virtual_step()), a call to step() will compute the average gradient for an effective batch of size NxB.
The advantage here is that this is memory-efficient: it discards the per-sample gradients after every mini-batch. We can thus handle batches of arbitrary size.
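The two strategies above can be compared in a small, library-free sketch. Gradients are plain Python lists, noise is left out so the results can be compared directly, and all function names are hypothetical; this only illustrates why clipping and accumulating per mini-batch gives the same average as clipping all NxB gradients at once.

```python
import math


def clip(grad, max_grad_norm):
    """Scale grad so that its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / (norm + 1e-6))
    return [g * scale for g in grad]


def clip_all_at_once(mini_batches, max_grad_norm):
    """Strategy 1: keep all N*B per-sample gradients, clip them at the end."""
    stored = [grad for batch in mini_batches for grad in batch]  # grows with N
    total = [0.0] * len(stored[0])
    for grad in stored:
        total = [t + c for t, c in zip(total, clip(grad, max_grad_norm))]
    return [t / len(stored) for t in total]


def clip_with_virtual_steps(mini_batches, max_grad_norm):
    """Strategy 2: clip and accumulate after each mini-batch, then discard."""
    accumulator = None
    count = 0
    for batch in mini_batches:
        for grad in batch:
            clipped = clip(grad, max_grad_norm)
            if accumulator is None:
                accumulator = [0.0] * len(clipped)
            accumulator = [a + c for a, c in zip(accumulator, clipped)]
        count += len(batch)
        # The per-sample gradients of this mini-batch are no longer needed.
    return [a / count for a in accumulator]


# N = 2 mini-batches of size B = 2, each gradient has 2 coordinates.
mini_batches = [[[3.0, 4.0], [0.3, 0.4]], [[0.0, 2.0], [1.0, 0.0]]]
full = clip_all_at_once(mini_batches, max_grad_norm=1.0)
virtual = clip_with_virtual_steps(mini_batches, max_grad_norm=1.0)
```

Only the accumulator survives between mini-batches in the second strategy, so memory use depends on the model size and B, not on N.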
Resets the clipper's status.
The clipper keeps the per-sample gradients of the batch internally in each forward call of the module; they need to be cleaned before the next round.
If these variables are not cleaned, the per-sample gradients keep being concatenated across batches. If accumulating gradients is the intended behaviour, e.g. simulating a large batch, prefer using virtual_step().