Privacy Engine¶
- class opacus.privacy_engine.PrivacyEngine(*, accountant='rdp', secure_mode=False)[source]¶
Main entry point to the Opacus API - use
PrivacyEngine
to enable differential privacy for your model training.
PrivacyEngine
encapsulates the current privacy state (the privacy budget spent and the method used to compute it) and exposes the
make_private
method to wrap your PyTorch training objects with their private counterparts.
Example
>>> dataloader = demo_dataloader
>>> model = MyCustomModel()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
>>> privacy_engine = PrivacyEngine()
>>>
>>> model, optimizer, dataloader = privacy_engine.make_private(
...     module=model,
...     optimizer=optimizer,
...     data_loader=dataloader,
...     noise_multiplier=1.0,
...     max_grad_norm=1.0,
... )
>>> # continue training as normal
- Parameters
accountant (str) – Accounting mechanism. Currently supported:
- rdp (RDPAccountant)
- gdp (GaussianAccountant)
secure_mode (bool) – Set to True if a cryptographically strong DP guarantee is required. secure_mode=True uses a secure random number generator for noise and shuffling (as opposed to the pseudo-RNG in vanilla PyTorch) and prevents certain floating-point-arithmetic-based attacks. See _generate_noise() for details. When set to True, requires torchcsprng to be installed.
- classmethod get_compatible_module(module)[source]¶
Return a privacy engine compatible module. Also validates the module after running registered fixes.
- Parameters
module (Module) – module to be modified
- Return type
Module
- Returns
Module with some submodules replaced by their deep copies or close equivalents. See
ModuleValidator
for more details.
- get_epsilon(delta)[source]¶
Computes the (epsilon, delta) privacy budget spent so far.
- Parameters
delta – The target delta.
- Returns
Privacy budget (epsilon) expended so far.
- is_compatible(*, module, optimizer, data_loader)[source]¶
Check if task components are compatible with DP.
- make_private(*, module, optimizer, data_loader, noise_multiplier, max_grad_norm, batch_first=True, loss_reduction='mean', poisson_sampling=True, clipping='flat', noise_generator=None)[source]¶
Add privacy-related responsibilities to the main PyTorch training objects: model, optimizer, and data loader.
All of the returned objects act just like their non-private counterparts passed as arguments, but with added DP tasks:
- Model is wrapped to also compute per-sample gradients.
- Optimizer is now responsible for gradient clipping and adding noise to the gradients.
- DataLoader is updated to perform Poisson sampling.
Notes
Using any other models, optimizers, or data sources during training will invalidate stated privacy guarantees.
- Parameters
module (Module) – PyTorch module to be used for training
optimizer (Optimizer) – Optimizer to be used for training
data_loader (DataLoader) – DataLoader to be used for training
noise_multiplier (float) – The ratio of the standard deviation of the Gaussian noise to the L2-sensitivity of the function to which the noise is added (i.e. how much noise to add)
max_grad_norm (Union[float, List[float]]) – The maximum norm of the per-sample gradients. Any gradient with a norm higher than this will be clipped to this value.
batch_first (bool) – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch. If set to True, dimensions of the input tensor are expected to be [batch_size, ...]; otherwise [K, batch_size, ...]
loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values "sum" or "mean"
poisson_sampling (bool) – True if you want to use the standard sampling required for DP guarantees. Setting it to False will leave the provided data_loader unchanged. Technically this doesn’t fit the assumptions made by the privacy accounting mechanism, but it can be a good approximation when using Poisson sampling is unfeasible.
clipping (str) – Per-sample gradient clipping mechanism ("flat" or "per_layer"). Flat clipping calculates the norm of the entire gradient over all parameters, while per-layer clipping sets individual norms for every parameter tensor. Flat clipping is usually preferred, but using per-layer clipping in combination with distributed training can provide notable performance gains.
noise_generator – torch.Generator() object used as a source of randomness for the noise
- Return type
Tuple[GradSampleModule, DPOptimizer, DataLoader]
- Returns
Tuple of (model, optimizer, data_loader).
- Model is a wrapper around the original model that also computes per-sample gradients
- Optimizer is a wrapper around the original optimizer that also does gradient clipping and noise addition to the gradients
- DataLoader is a brand-new DataLoader object, constructed to behave as an equivalent of the original data loader, possibly with an updated sampling mechanism. Points to the same dataset object.
- make_private_with_epsilon(*, module, optimizer, data_loader, target_epsilon, target_delta, epochs, max_grad_norm, batch_first=True, loss_reduction='mean', noise_generator=None, **kwargs)[source]¶
Version of
make_private()
that calculates privacy parameters based on a given privacy budget.
For the full documentation see
make_private()
- Parameters
module (Module) – PyTorch module to be used for training
optimizer (Optimizer) – Optimizer to be used for training
data_loader (DataLoader) – DataLoader to be used for training
target_epsilon (float) – Target epsilon to be achieved by the end of training
target_delta (float) – Target delta to be achieved by the end of training
epochs (int) – Number of training epochs you intend to perform. Used together with target_epsilon and target_delta to compute a suitable noise multiplier.
max_grad_norm (float) – The maximum norm of the per-sample gradients. Any gradient with a norm higher than this will be clipped to this value.
batch_first (bool) – Flag to indicate if the input tensor to the corresponding module has the first dimension representing the batch. If set to True, dimensions of the input tensor are expected to be [batch_size, ...]; otherwise [K, batch_size, ...]
loss_reduction (str) – Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. Can take values "sum" or "mean"
noise_generator – torch.Generator() object used as a source of randomness for the noise
poisson_sampling – True if you want to use the standard sampling required for DP guarantees. Setting it to False will leave the provided data_loader unchanged. Technically this doesn’t fit the assumptions made by the privacy accounting mechanism, but it can be a good approximation when using Poisson sampling is unfeasible.
clipping – Per-sample gradient clipping mechanism ("flat" or "per_layer"). Flat clipping calculates the norm of the entire gradient over all parameters, while per-layer clipping sets individual norms for every parameter tensor. Flat clipping is usually preferred, but using per-layer clipping in combination with distributed training can provide notable performance gains.
- Returns
Tuple of (model, optimizer, data_loader).
- Model is a wrapper around the original model that also computes per-sample gradients
- Optimizer is a wrapper around the original optimizer that also does gradient clipping and noise addition to the gradients
- DataLoader is a brand-new DataLoader object, constructed to behave as an equivalent of the original data loader, possibly with an updated sampling mechanism. Points to the same dataset object.