A New Analysis of Differential Privacy's Generalization Guarantees
Christopher Jung
University of Pennsylvania, Philadelphia, PA, USA
chrjung@seas.upenn.edu
Katrina Ligett
The Hebrew University, Jerusalem, Israel
katrina@cs.huji.ac.il
Seth Neel
University of Pennsylvania, Philadelphia, PA, USA
sethneel@wharton.upenn.edu
Aaron Roth
University of Pennsylvania, Philadelphia, PA, USA
aaroth@cis.upenn.edu
Saeed Sharifi-Malvajerdi
University of Pennsylvania, Philadelphia, PA, USA
saeedsh@wharton.upenn.edu
Moshe Shenfeld
The Hebrew University, Jerusalem, Israel
moshe.shenfeld@mail.huji.ac.il
Abstract
We give a new proof of the “transfer theorem” underlying adaptive data analysis: that any mechanism
for answering adaptively chosen statistical queries that is differentially private and sample-accurate
is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we
expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of
any query on the conditional distribution on datasets induced by the transcript of the interaction is
close to its expectation on the data distribution, and 2) sample accuracy on its own ensures that any
query answer produced by the mechanism is close to the expectation of the query on the conditional
distribution. This second claim follows from a thought experiment in which we imagine that the
dataset is resampled from the conditional distribution after the mechanism has committed to its
answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids
the “monitor argument” used to derive high probability bounds in prior work.
An upshot of our new proof technique is that the concrete bounds we obtain are substantially
better than the best previously known bounds, even though the improvements are in the constants,
rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform
the naive “sample-splitting” baseline at dramatically smaller dataset sizes compared to the previous
state of the art, bringing techniques from this literature closer to practicality.
2012 ACM Subject Classification Theory of computation → Sample complexity and generalization
bounds
Keywords and phrases Differential Privacy, Adaptive Data Analysis, Transfer Theorem
Acknowledgements We thank Adam Smith for helpful conversations at an early stage of this work,
and Daniel Roy for helpful feedback on the presentation of the result.
1 Introduction
Many data analysis pipelines are adaptive: the choice of which analysis to run next depends on
the outcome of previous analyses. Common examples include variable selection for regression
problems and hyper-parameter optimization in large-scale machine learning problems: in
both cases, common practice involves repeatedly evaluating a series of models on the same
dataset. Unfortunately, this kind of adaptive re-use of data invalidates many traditional
methods of avoiding over-fitting and false discovery, and has been blamed in part for the
recent flood of non-reproducible findings in the empirical sciences [14].
There is a simple way around this problem: don’t re-use data. This idea suggests a
baseline called data splitting: to perform k analyses on a dataset, randomly partition the
dataset into k disjoint parts, and perform each analysis on a fresh part. The standard
“holdout method” is the special case of k = 2. Unfortunately, this natural baseline makes
poor use of data: in particular, the data requirements of this method grow linearly with the
number of analyses k to be performed.
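For concreteness, here is a minimal sketch of this baseline (our illustration, not code from the paper), which answers each adaptively chosen linear query on its own disjoint chunk of the data:

import numpy as np

def sample_splitting(S, next_query, k):
    # Answer k adaptively chosen linear queries, each on a fresh disjoint chunk of S.
    # next_query: callable taking the list of answers so far and returning the next
    # query q, a function mapping a single record to [0, 1].
    chunks = np.array_split(np.asarray(S), k)       # roughly n/k records per query
    answers = []
    for chunk in chunks:
        q = next_query(answers)                     # the analyst may adapt to past answers
        answers.append(float(np.mean([q(x) for x in chunk])))
    return answers

By a standard Hoeffding and union bound argument, each chunk of size n/k yields an answer within roughly sqrt(k * ln(2k/beta) / (2n)) of its expectation with probability 1 - beta over all k queries, so holding the target width fixed forces n to grow linearly in k.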
A recent literature starting with Dwork et al. [6] shows how to give a significant asymptotic
improvement over this baseline via a connection to differential privacy: rather than computing
and reporting exact sample quantities, perturb these quantities with noise. This line of
work established a powerful transfer theorem, that informally says that any analysis that
is simultaneously differentially private and accurate in-sample will also be accurate out-of-
sample. The best analysis of this technique shows that for a broad class of analyses and a
target accuracy goal, the data requirements grow only with $\sqrt{k}$ – a quadratic improvement
over the baseline [1]. Moreover, it is known that in the worst case, this cannot be improved
asymptotically [15, 23]. Unfortunately, thus far this literature has had little impact on
practice. One major reason for this is that although the more sophisticated techniques from
this literature give asymptotic improvements over the sample-splitting baseline, the concrete
bounds do not actually improve on the baseline until the dataset is enormous. This remains
true even after optimizing the constants that arise from the arguments of Dwork et al. [6] or
Bassily et al. [1], and appears to be a fundamental limitation of their proof techniques [20].
In this paper, we give a new proof of the transfer theorem connecting differential privacy
and in-sample accuracy to out-of-sample accuracy. Our proof is based on a simple insight
that arises from imagining a “resampling” experiment, and in particular yields an improved
concrete bound that beats the sample-splitting baseline at dramatically smaller data set
sizes n compared to prior work. In fact, at reasonable dataset sizes, the magnitude of the
improvement arising from our new theorem is significantly larger than the improvement
between the bounds of Bassily et al. [1] and Dwork et al. [6]: see Figure 1.
[Figure 1: the number of queries k that can be answered (y-axis) as a function of the dataset size n (x-axis), for the Sample Splitting Baseline, Optimized BNSSSU, DFHPRR, and Our Bound, at width 0.1 and uniform coverage 95%.]
Figure 1 A comparison of the number of adaptive linear queries that can be answered using the
Gaussian mechanism as analyzed by our transfer theorem (Theorem 9), the numerically optimized
variant of the bound from Bassily et al. [1] (Optimized BNSSSU) as derived in [20], and the original
transfer theorem from Dwork et al. [6] (DFHPRR). We plot for each dataset size n, the number
of queries k that can be answered while guaranteeing confidence intervals around the answer that
have width α = 0.1 and uniform coverage probability 1 − β = 0.95. We compare with the naive
sample splitting baseline that simply splits the dataset into k pieces and answers each query with
the empirical answer on a fresh piece.
Prior Work
In-expectation guarantees of this sort do not by themselves yield confidence intervals, and so prior work has focused on strengthening the above observation
into a high probability bound. For small $\epsilon$, the optimal bound has the asymptotic form
$$\Pr_{q\sim M(S)}\left[\Big|\,\mathbb{E}_{x\sim\mathcal{P}}[q(x)] - \tfrac{1}{n}\textstyle\sum_{x\in S} q(x)\Big| \ge \epsilon\right] \le e^{-O(\epsilon^2 n)}$$
[1]. Note that this bound does not
refer to the estimated answers supplied to the data analyst: it says only that a differentially
private data analyst is unlikely to be able to find a query whose average value on the dataset
differs substantially from its expectation. Pairing this with a simultaneous high probability
bound on the in-sample accuracy of a mechanism – that it supplies answers $a$ such that with
high probability the empirical error is small: $\Pr_{a\sim M(S)}\left[\big|a - \tfrac{1}{n}\sum_{x\in S} q(x)\big| \ge \alpha\right] \le \beta$ – yields a high probability out-of-sample accuracy guarantee.
Our Approach
We take a fundamentally different approach by directly providing high probability bounds
on the out-of-sample accuracy |a − Ex∼P [q(x)]| of mechanisms that are both differentially
private and accurate in-sample. Our elementary approach is motivated by the following
thought experiment: in actuality, the dataset S is fixed before any interaction with M begins.
However, imagine that after the entire interaction with M is complete, the dataset S is
resampled from the conditional distribution Q on datasets conditioned on the output of M .
This thought experiment doesn’t alter the joint distribution on datasets and outputs, and
so any in-sample accuracy guarantees that M has continue to hold under this hypothetical
re-sampling experiment. But because the empirical value of the queries on the re-sampled
dataset are likely to be close to their expected value over the conditional distribution Q, the
only way the mechanism can promise to be sample-accurate with high probability is if it
provides answers that are close to their expected value over the conditional distribution with
high probability.
This focuses attention on the conditional distribution on datasets induced by differentially
private transcripts. But it is not hard to show that a consequence of differential privacy
is that the conditional expectation of any query must be close to its expectation over the
data distribution with high probability. In contrast to prior work, this argument directly
leverages high-probability in-sample accuracy guarantees of a private mechanism to derive
high-probability out-of-sample guarantees, without the need for additional machinery like
the monitor argument of Bassily et al. [1].
Feldman et al. [10] show that the risk of overfitting from test set reuse is smaller for multiclass prediction
problems, compared to binary prediction problems. Rogers et al. [20] give a method for
certifying the correctness of heuristically guessed confidence intervals, which they show often
out-perform the theoretical guarantees by orders of magnitude.
Finally, Elder [9, 8] proposes a Bayesian reformulation of the adaptive data analysis
problem. In the model of [9], the data distribution P is assumed to itself be drawn from a
prior that is commonly known to the data analyst and mechanism. In contrast, we work
in the standard adversarial setting originally introduced by Dwork et al. [6] in which the
mechanism must offer guarantees for worst case data distributions and analysts, and focus
our attention on conditional distributions purely as a proof technique.
2 Preliminaries
Let X be an abstract data domain, and let P be an arbitrary distribution over X . A dataset
of size n is a collection of n data records: $S = \{S_i\}_{i=1}^n \in \mathcal{X}^n$. We study datasets sampled
i.i.d. from $\mathcal{P}$: $S \sim \mathcal{P}^n$. We will write S to denote the random variable and x for realizations
of this random variable. A linear query is a function $q : \mathcal{X}^* \to [0, 1]$ that takes the following
empirical average form when acting on a dataset $S \in \mathcal{X}^n$:
$$q(S) = \frac{1}{n}\sum_{i=1}^{n} q(S_i).$$
For a distribution $\mathcal{D}$ over datasets, we write $q(\mathcal{D}) = \mathbb{E}_{S\sim\mathcal{D}}[q(S)]$ for the expected value of the
query over the dataset distribution. We note that for linear queries, when the dataset distribution is $\mathcal{D} = \mathcal{P}^n$, we have $q(\mathcal{P}^n) =
\mathbb{E}_{x\sim\mathcal{P}}[q(x)]$, which we write as $q(\mathcal{P})$ when the notation is clear from context. However, the
more general definition will be useful because we will need to evaluate the expectation of q
over other (non-product) distributions over datasets in our arguments, and we will generalize
beyond linear queries in Appendices A.1 and A.2.
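As a toy illustration of this notation (ours, with an arbitrarily chosen query), the empirical value q(S) of a linear query concentrates around its distributional value q(P):

import numpy as np

rng = np.random.default_rng(0)

# Toy domain X = [0, 1], with P uniform on [0, 1], and the linear query q(x) = 1[x > 0.5].
def q(x):
    return float(x > 0.5)

n = 10_000
S = rng.uniform(0.0, 1.0, size=n)                  # S ~ P^n

q_S = np.mean([q(x) for x in S])                   # q(S) = (1/n) * sum_i q(S_i)
q_P = 0.5                                          # q(P) = E_{x~P}[q(x)] for this toy query
print(f"q(S) = {q_S:.4f}, q(P) = {q_P:.4f}")       # close for large n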
Given a family of queries Q, a statistical estimator is a (possibly stateful) randomized
algorithm M : X n ×Q∗ → R∗ parameterized by a dataset S that interactively takes as input a
stream of queries qi ∈ Q, and provides answers ai ∈ R. An analyst is an arbitrary randomized
algorithm A : R∗ → Q∗ that generates a stream of queries and receives a stream of answers
(which can inform the next queries it generates). When an analyst interacts with a statistical
estimator, they generate a transcript of their interaction π ∈ Π where Π = (Q × R)∗ is the
space of all transcripts. Throughout we write Π to denote the transcript’s random variable
and π for its realizations.
The interaction is summarized in Algorithm 1, and we write Interact(M, A; S) to refer
to it. When M and A are clear from context, we will abbreviate this notation and write
simply I(S). When we refer to an indexed query qj , this is implicitly a function of the
transcript π. Given a transcript π ∈ Π, write Qπ to denote the conditional distribution
on datasets conditional on Π = π: Qπ = (P n )|Interact(M, A; S) = π. Note that Qπ will
no longer generally be a product distribution. We will be interested in evaluating uniform
accuracy bounds, which control the worst-case error over all queries:
▶ Definition 1 (Accuracy). M satisfies (α, β)-sample accuracy if for every data analyst A
and every data distribution P,
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |q_j(S) - a_j| \ge \alpha\right] \le \beta.$$
We say M satisfies (α, β)-distributional accuracy if for every data analyst A and every data
distribution P,
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |q_j(\mathcal{P}^n) - a_j| \ge \alpha\right] \le \beta.$$
If Interact(M, · ; ·) satisfies (ε, δ)-differential privacy, we will also say that M satisfies (ε, δ)-
differential privacy.
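The following sketch (our own harness; the callable names are ours, and Algorithm 1 itself is specified in the paper) runs the interaction and records the two error notions that Definition 1 bounds:

import numpy as np

def run_interaction(S, mechanism_answer, next_query, k, true_mean):
    # mechanism_answer(q, S) -> a_j : the statistical estimator M
    # next_query(answers)   -> q   : the (possibly adaptive) analyst A
    # true_mean(q)          -> q(P): oracle for the population value, used only for evaluation
    answers, sample_errs, dist_errs = [], [], []
    for _ in range(k):
        q = next_query(answers)
        a = mechanism_answer(q, S)
        answers.append(a)
        q_S = np.mean([q(x) for x in S])          # q(S)
        sample_errs.append(abs(q_S - a))          # in-sample error
        dist_errs.append(abs(true_mean(q) - a))   # out-of-sample error
    return max(sample_errs), max(dist_errs)       # compare these maxima with alpha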
We introduce a novel quantity that will be crucial to our argument: it captures the effect
of the transcript on the change in the expectation of a query contained in the transcript.
▶ Definition 3. An interaction Interact(M, A; ·) is called (ε, δ)-posterior stable if for every
data distribution P:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |q_j(\mathcal{P}^n) - q_j(\mathcal{Q}_\Pi)| \ge \epsilon\right] \le \delta.$$
The theorem follows easily from a change in perspective driven by an elementary observa-
tion. Imagine that after the interaction is run and results in a transcript π, the dataset S is
resampled from its conditional distribution Qπ . This does not change the joint distribution
on datasets and transcripts. This simple claim is formalized below: its elementary proof
appears in Appendix B.
▶ Lemma 5 (Resampling Lemma). For every event $E \subseteq \mathcal{X}^n \times \Pi$:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[(S, \Pi) \in E\right] = \Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S),\,S'\sim\mathcal{Q}_\Pi}\left[(S', \Pi) \in E\right]$$
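To see the lemma in action, here is a small numerical check (our illustration on a toy mechanism, not part of the paper) in which the joint distribution over datasets and transcripts is enumerated exactly and the two probabilities are compared. Note that the lemma requires no differential privacy, so any mechanism works here.

import itertools
import numpy as np

# Toy setting: records in {0, 1}, datasets of size n = 2, P uniform on {0, 1}.
datasets = list(itertools.product([0, 1], repeat=2))
p_S = {x: 0.25 for x in datasets}                       # P^n, uniform here

def p_pi_given_S(pi, x):
    # A toy mechanism with a binary transcript pi in {0, 1}.
    p1 = 0.25 + 0.5 * np.mean(x)                        # Pr[pi = 1 | S = x]
    return p1 if pi == 1 else 1.0 - p1

transcripts = [0, 1]
joint = {(x, pi): p_S[x] * p_pi_given_S(pi, x)
         for x in datasets for pi in transcripts}
p_pi = {pi: sum(joint[(x, pi)] for x in datasets) for pi in transcripts}
Q = {(x, pi): joint[(x, pi)] / p_pi[pi]                 # conditional Q_pi(x)
     for x in datasets for pi in transcripts}

# Any event E over (dataset, transcript) pairs, e.g. "q(S) >= 1/2 and pi = 1".
E = {(x, pi) for x in datasets for pi in transcripts
     if np.mean(x) >= 0.5 and pi == 1}

lhs = sum(joint[(x, pi)] for (x, pi) in E)              # Pr[(S, Pi) in E]
rhs = sum(joint[(x, pi)] * Q[(x2, pi)]                  # Pr[(S', Pi) in E], S' ~ Q_Pi
          for x in datasets for pi in transcripts for x2 in datasets
          if (x2, pi) in E)
print(abs(lhs - rhs) < 1e-12)                           # True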
The change in perspective suggested by the resampling lemma makes it easy to see why
the following must be true: any sample-accurate mechanism must in fact be accurate with
respect to the conditional distribution it induces. If the mechanism can first commit
to its answers and still guarantee that they are sample-accurate after the dataset is resampled
from the conditional distribution, then the answers it committed to must be close to the conditional
means, because the empirical answers on the resampled dataset are likely to be close to those means. This
argument is generic and does not use differential privacy.
▶ Lemma 6. Suppose that M is (α, β)-sample accurate. Then for every c > 0 it also satisfies:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |a_j - q_j(\mathcal{Q}_\Pi)| > \alpha + c\right] \le \frac{\beta}{c}$$
Proof. Denote by $j^*(\pi) = \arg\max_j |a_j - q_j(\mathcal{Q}_\pi)|$. Given $\alpha \ge 0$ and $c > 0$, and expanding the
definition of $q_{j^*(\Pi)}(\mathcal{Q}_\Pi)$ we get:
\begin{align*}
\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[a_{j^*(\Pi)} - q_{j^*(\Pi)}(\mathcal{Q}_\Pi) > \alpha + c\right]
&= \Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\mathbb{E}_{S'\sim\mathcal{Q}_\Pi}\left[a_{j^*(\Pi)} - q_{j^*(\Pi)}(S')\right] - \alpha > c\right] \\
&\le \Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\mathbb{E}_{S'\sim\mathcal{Q}_\Pi}\left[\max\left(a_{j^*(\Pi)} - q_{j^*(\Pi)}(S') - \alpha,\, 0\right)\right] > c\right] \\
&\overset{(1)}{\le} \frac{1}{c}\,\mathbb{E}_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\mathbb{E}_{S'\sim\mathcal{Q}_\Pi}\left[\max\left(a_{j^*(\Pi)} - q_{j^*(\Pi)}(S') - \alpha,\, 0\right)\right]\right] \\
&\overset{(2)}{\le} \frac{1}{c}\,\mathbb{E}_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\Pr_{S'\sim\mathcal{Q}_\Pi}\left[a_{j^*(\Pi)} - q_{j^*(\Pi)}(S') - \alpha > 0\right]\right] \\
&= \frac{1}{c}\,\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S),\,S'\sim\mathcal{Q}_\Pi}\left[a_{j^*(\Pi)} - q_{j^*(\Pi)}(S') > \alpha\right] \\
&\overset{(3)}{=} \frac{1}{c}\,\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[a_{j^*(\Pi)} - q_{j^*(\Pi)}(S) > \alpha\right]
\end{align*}
Here, inequality (1) follows from Markov's inequality, inequality (2) follows from the fact that
$a_{j^*(\Pi)} - q_{j^*(\Pi)}(S') - \alpha \le 1$, and equality (3) follows from the Resampling Lemma (Lemma 5).
Repeating this argument for $q_{j^*(\Pi)}(\mathcal{Q}_\Pi) - a_{j^*(\Pi)}$ yields a symmetric bound, so by combining
the two with the guarantee of (α, β)-sample accuracy we get,
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\left|a_{j^*(\Pi)} - q_{j^*(\Pi)}(\mathcal{Q}_\Pi)\right| > \alpha + c\right] \le \frac{1}{c}\,\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\left|a_{j^*(\Pi)} - q_{j^*(\Pi)}(S)\right| > \alpha\right] \le \frac{\beta}{c} \qquad\blacktriangleleft$$
Because sample accuracy implies accuracy with respect to the conditional distribution,
together with a bound on posterior stability, the transfer theorem follows immediately from the triangle inequality:
$$\max_j |a_j - q_j(\mathcal{P})| \le \max_i |a_i - q_i(\mathcal{Q}_\Pi)| + \max_\ell |q_\ell(\mathcal{Q}_\Pi) - q_\ell(\mathcal{P})|.$$
Lemma 6 bounds the first term by $\alpha + c$ with probability $1 - \frac{\beta}{c}$ over Π, and the definition
of posterior stability bounds the second term by $\epsilon$ with probability $1 - \delta$ over Π, which
concludes the proof. ◀
▶ Lemma 7. If M is (ε, δ)-differentially private, then for any data distribution P, any
analyst A, and any constant c > 0:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |q_j(\mathcal{Q}_\Pi) - q_j(\mathcal{P})| > (e^\epsilon - 1) + 2c\right] \le \frac{\delta}{c}$$
i.e. it is (ε′, δ′)-posterior stable for every data analyst A, where $\epsilon' = e^\epsilon - 1 + 2c$ and $\delta' = \frac{\delta}{c}$.
Proof. Given a transcript $\pi \in \Pi$, let $j^*(\pi) \in \arg\max_j |q_j(\mathcal{Q}_\pi) - q_j(\mathcal{P})|$. Define for an $\alpha > 0$:
\begin{align*}
\Pi_\alpha &= \left\{\pi \in \Pi \;\middle|\; q_{j^*(\pi)}(\mathcal{Q}_\pi) - q_{j^*(\pi)}(\mathcal{P}) > \alpha\right\} \\
X^+(\pi) &= \left\{x \in \mathcal{X} \;\middle|\; \Pr_{S\sim\mathcal{Q}_\pi,\,S_i\sim S}[S_i = x] > \Pr_{S_i\sim\mathcal{P}}[S_i = x]\right\} \\
B^+_\alpha &= \bigcup_{\pi\in\Pi_\alpha} X^+(\pi) \times \{\pi\} \\
\Pi^+_\alpha(x) &= \left\{\pi \in \Pi \;\middle|\; (x, \pi) \in B^+_\alpha\right\}
\end{align*}
Fix any $\alpha$. Suppose that $\Pr\left[\left|q_{j^*(\Pi)}(\mathcal{Q}_\Pi) - q_{j^*(\Pi)}(\mathcal{P})\right| > \alpha\right] > \frac{\delta}{c}$. We must have that either
$\Pr\left[q_{j^*(\Pi)}(\mathcal{Q}_\Pi) - q_{j^*(\Pi)}(\mathcal{P}) > \alpha\right] > \frac{\delta}{2c}$ or $\Pr\left[q_{j^*(\Pi)}(\mathcal{P}) - q_{j^*(\Pi)}(\mathcal{Q}_\Pi) > \alpha\right] > \frac{\delta}{2c}$. Without
loss of generality, assume
$$\Pr\left[q_{j^*(\Pi)}(\mathcal{Q}_\Pi) - q_{j^*(\Pi)}(\mathcal{P}) > \alpha\right] = \Pr\left[\Pi \in \Pi_\alpha\right] > \frac{\delta}{2c} \tag{1}$$
Let $S_i$ be the random variable obtained by first sampling $S \sim \mathcal{P}^n$ and then sampling $S_i \in S$
uniformly at random. We compare the probability measure of $B^+_\alpha$ under the joint distribution
on $S_i$ and Π with its corresponding measure under the product distribution of $S_i$ and Π:
$$\cdots > \alpha \cdot \Pr\left[\Pi \in \Pi_\alpha\right]$$
On the other hand, using the definition of (ε, δ)-differential privacy (see Lemma 21 for the
elementary derivation of the first inequality):
▶ Remark 8. Note:
1. Since differential privacy is closed under post-processing, this claim can be generalized
beyond queries contained in the transcript to any query generated as a function of the
transcript.
2. In the case of (ε, 0)-differential privacy, choosing c = 0, the claim holds for every query
with probability 1.
Combined with our general transfer theorem (Theorem 4), this directly yields a transfer
theorem for differential privacy:
▶ Theorem 9 (Transfer Theorem for (ε, δ)-Differential Privacy). Suppose that M is (ε, δ)-
differentially private and (α, β)-sample accurate for linear queries. Then for every analyst A
and c, d > 0 it also satisfies:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |a_j - q_j(\mathcal{P})| > \alpha + (e^\epsilon - 1) + c + 2d\right] \le \frac{\beta}{c} + \frac{\delta}{d}$$
i.e. it is (α′, β′)-distributionally accurate for $\alpha' = \alpha + (e^\epsilon - 1) + c + 2d$ and $\beta' = \frac{\beta}{c} + \frac{\delta}{d}$.
▶ Remark 10. As we will see in Section 4, the Gaussian mechanism (and many other
differentially private mechanisms) has a sample accuracy bound that depends only on the
square root of the log of both 1/β and 1/δ. Thus, despite the Markov-like term $\beta' = \frac{\beta}{c} + \frac{\delta}{d}$
in the above transfer theorem, together with the sample accuracy bounds of the Gaussian
mechanism, it yields Chernoff-like concentration.
Our technique extends easily to reason about arbitrary low sensitivity queries and
minimization queries. See Appendix A.1 and A.2 for more details.
We now apply our new transfer theorem to derive the concrete bounds that we plotted in
Figure 1. The Gaussian mechanism is extremely simple and has only a single parameter σ:
for each query qi that arrives, the Gaussian mechanism returns the answer ai ∼ N (qi (S), σ 2 )
where N (qi (S), σ 2 ) denotes the Gaussian distribution with mean qi (S) and standard deviation
σ. First, we recall the differential privacy properties of the Gaussian mechanism.
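A minimal implementation sketch of this mechanism (ours; the class and method names are not from the paper) is:

import numpy as np

class GaussianMechanism:
    # Answer adaptively chosen linear queries with Gaussian noise.
    def __init__(self, S, sigma, seed=0):
        self.S = np.asarray(S)          # the fixed dataset of n records
        self.sigma = sigma              # noise scale (std. dev. of each answer)
        self.rng = np.random.default_rng(seed)

    def answer(self, q):
        # q maps a single record to [0, 1]; returns a_i ~ N(q(S), sigma^2).
        empirical = np.mean([q(x) for x in self.S])     # q(S)
        return empirical + self.rng.normal(0.0, self.sigma)

# Example usage: mech = GaussianMechanism(S, sigma=0.01); a = mech.answer(lambda x: float(x > 0.5))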
▶ Theorem 11 ([2]). When used to answer k linear queries, for every 0 < δ < 1, the
Gaussian mechanism with parameter σ satisfies (ε, δ)-differential privacy for:
$$\epsilon = \frac{k}{2n^2\sigma^2} + \sqrt{2\cdot\frac{k}{n^2\sigma^2}\cdot\log\!\left(\sqrt{\pi\cdot\frac{k}{2n^2\sigma^2}}\,\Big/\,\delta\right)}$$
It is also easy to see that the sample-accuracy of the Gaussian mechanism is characterized
by the CDF of the Gaussian distribution:
▶ Lemma 12. For any 0 < β < 1, the Gaussian mechanism with parameter σ is $(\alpha_G, \beta)$-
sample accurate for:
$$\alpha_G = \sqrt{2}\,\sigma\cdot\mathrm{erfc}^{-1}\!\left(2 - 2\left(1-\frac{\beta}{2}\right)^{1/k}\right) < \sqrt{2}\,\sigma\cdot\mathrm{erfc}^{-1}\!\left(\frac{\beta}{k}\right) < \sqrt{2}\,\sigma\sqrt{\log\frac{2k}{\sqrt{\pi}\,\beta}}\,.$$
Proof. For a query $q_j$, write $a_j = q_j(S) + Z_j$ where $Z_j \sim \mathcal{N}(0, \sigma^2)$. The sample error
is $\max_j |a_j - q_j(S)| = \max_j |Z_j|$. We have that $\Pr[\max_j |Z_j| \ge \alpha] \le \Pr[\max_j Z_j \ge \alpha] +
\Pr[\min_j Z_j \le -\alpha]$, and $\alpha_G$ is the value that solves the equation $\Pr[\max_j Z_j \ge \alpha] = \Pr[\min_j Z_j \le -\alpha] = \beta/2$. ◀
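Numerically, the exact width and its relaxation are easy to evaluate with scipy's erfcinv (our sketch, not code from the paper):

import numpy as np
from scipy.special import erfcinv

def sample_width(sigma, k, beta):
    # Exact alpha_G from Lemma 12, and the relaxation sqrt(2)*sigma*erfcinv(beta/k).
    exact = np.sqrt(2) * sigma * erfcinv(2 - 2 * (1 - beta / 2) ** (1 / k))
    relaxed = np.sqrt(2) * sigma * erfcinv(beta / k)
    return exact, relaxed

print(sample_width(sigma=0.01, k=1000, beta=0.05))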
With these quantities in hand, we can now apply Theorem 9 to derive distributional
accuracy bounds for the Gaussian mechanism:
▶ Theorem 13. Fix a desired confidence parameter 0 < β < 1. When σ is set optimally,
the Gaussian mechanism can be used to answer k linear queries while satisfying (α, β)-
distributional accuracy, where α is the solution to the following unconstrained minimization
problem:
$$\alpha = \min_{\sigma,\,\delta > 0}\left[\sqrt{2}\,\sigma\cdot\mathrm{erfc}^{-1}\!\left(\frac{\delta}{k}\right) + \exp\!\left(\frac{k}{2n^2\sigma^2} + \sqrt{2\cdot\frac{k}{n^2\sigma^2}\cdot\log\!\left(\sqrt{\pi\cdot\frac{k}{2n^2\sigma^2}}\,\Big/\,\delta\right)}\right) - 1 + 6\,\frac{\delta}{\beta}\right]$$
Proof. Using Theorem 9 and fixing $\beta' = \delta$ and $c = d$, we have that an (α′, β′)-sample
accurate, (ε, δ)-differentially private mechanism is (α, β)-distributionally accurate for $\alpha =
\alpha' + (e^\epsilon - 1) + 3c$ and $\beta = \frac{2\delta}{c}$, where c can be an arbitrary parameter. For any fixed value
of β, we can take $c = \frac{2\delta}{\beta}$, and see that we obtain (α, β)-distributional accuracy where
$\alpha = \alpha' + (e^\epsilon - 1) + 6(\delta/\beta)$. The theorem then follows from plugging in the privacy bound
from Theorem 11, the sample accuracy bound from Lemma 12, and optimizing over the
free variables σ and δ. ◀
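The minimization in Theorem 13 has no closed form, but it is straightforward to evaluate numerically. The following sketch (ours; the grid ranges and function names are our own choices) searches over σ and δ for given n, k, and β:

import numpy as np
from scipy.special import erfcinv

def eps_gaussian(sigma, n, k, delta):
    # Privacy of the Gaussian mechanism for k linear queries (Theorem 11).
    rho = k / (2 * n**2 * sigma**2)
    return rho + np.sqrt(4 * rho * np.log(np.sqrt(np.pi * rho) / delta))

def alpha_theorem13(n, k, beta, sigmas, deltas):
    # Distributional accuracy width from Theorem 13, by grid search over sigma and delta.
    best = np.inf
    for sigma in sigmas:
        for delta in deltas:
            eps = eps_gaussian(sigma, n, k, delta)
            alpha_g = np.sqrt(2) * sigma * erfcinv(delta / k)   # Lemma 12 with beta' = delta
            alpha = alpha_g + np.expm1(eps) + 6 * delta / beta
            if np.isfinite(alpha) and alpha < best:
                best = alpha
    return best

sigmas = np.logspace(-4, -1, 100)
deltas = np.logspace(-12, -2, 100)
print(alpha_theorem13(n=500_000, k=10_000, beta=0.05, sigmas=sigmas, deltas=deltas))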
5 Discussion
We have given a new proof of the transfer theorem for differential privacy that has several
appealing properties. Besides being simpler than previous arguments, it achieves substantially
better concrete bounds than previous transfer theorems, and uncovers new structural insights
about the role of differential privacy and sample accuracy. In particular, sample accuracy
serves to guarantee that the reported answers are close to their conditional means, and
differential privacy serves to guarantee that the conditional means are close to their true
answers. This focuses attention on the conditional data distribution as a key quantity of
interest, which we expect will be fruitful in future work. In particular, it may shed light on
what makes certain data analysts overfit less than worst-case bounds would suggest: because
they choose queries whose conditional means are closer to the prior than the worst-case query.
There seems to be one remaining place to look for improvement in our transfer theorem:
Lemmas 6 and 7 both exhibit a Markov-like tradeoff between a parameter c and β and
δ respectively. Although the dependence on β and δ in our ultimate bounds is only root-
logarithmic, it would still yield an improvement if this Markov-like dependence could be
replaced with a Chernoff-like dependence. It is possible to do this for the β parameter: we
give an alternative (and even simpler) proof of the transfer theorem for (ε, 0)-differential
privacy which shows that conditional distributions induced by private mechanisms exhibit
Chernoff-like concentration, in Appendix D. But the only way we know to extend this
argument to (ε, δ)-differential privacy requires dividing δ by a factor of n, which yields a
final theorem that is inferior to Theorem 9.
References
1 Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan
Ullman. Algorithmic stability for adaptive data analysis. In Proceedings of the forty-eighth
annual ACM symposium on Theory of Computing, pages 1046–1059. ACM, 2016.
2 Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions,
and lower bounds. In Theory of Cryptography Conference, pages 635–658. Springer, 2016.
3 Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, and Zhiwei Steven Wu. Adaptive
learning with robust generalization guarantees. In Conference on Learning Theory, pages
772–814, 2016.
4 Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toni Pitassi, Omer Reingold, and Aaron
Roth. Generalization in adaptive data analysis and holdout reuse. In Advances in Neural
Information Processing Systems, pages 2350–2358, 2015.
5 Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and
Aaron Roth. The reusable holdout: Preserving validity in adaptive data analysis. Science,
349(6248):636–638, 2015.
6 Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and
Aaron Leon Roth. Preserving statistical validity in adaptive data analysis. In Proceed-
ings of the forty-seventh annual ACM symposium on Theory of computing, pages 117–126.
ACM, 2015.
7 Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to
sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284.
Springer, 2006.
8 Sam Elder. Bayesian adaptive data analysis guarantees from subgaussianity. arXiv preprint,
2016. arXiv:1611.00065.
9 Sam Elder. Challenges in Bayesian adaptive data analysis. arXiv preprint, 2016. arXiv:1604.02492.
10 Vitaly Feldman, Roy Frostig, and Moritz Hardt. The advantages of multiple classes for
reducing overfitting from test set reuse. In International Conference on Machine Learning,
pages 1892–1900, 2019.
11 Vitaly Feldman and Thomas Steinke. Generalization for Adaptively-chosen Estimators via
Stable Median. In Conference on Learning Theory, pages 728–757, 2017.
12 Vitaly Feldman and Thomas Steinke. Calibrating Noise to Variance in Adaptive Data Analysis.
In Conference On Learning Theory, pages 535–544, 2018.
13 Vitaly Feldman and Jan Vondrak. Generalization bounds for uniformly stable algorithms. In
Advances in Neural Information Processing Systems, pages 9747–9757, 2018.
14 Andrew Gelman and Eric Loken. The Statistical Crisis in Science. American Scientist,
102(6):460, 2014.
15 Moritz Hardt and Jonathan Ullman. Preventing false discovery in interactive data analysis
is hard. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages
454–463. IEEE, 2014.
16 Katrina Ligett and Moshe Shenfeld. A necessary and sufficient stability notion for adaptive
generalization. arXiv preprint, 2019. arXiv:1906.00930.
17 Seth Neel and Aaron Roth. Mitigating Bias in Adaptive Data Gathering via Differential
Privacy. In International Conference on Machine Learning (ICML), 2018.
18 Xinkun Nie, Xiaoying Tian, Jonathan Taylor, and James Zou. Why Adaptively Collected
Data Have Negative Bias and How to Correct for It. In International Conference on Artificial
Intelligence and Statistics, pages 1261–1269, 2018.
19 Kobbi Nissim and Uri Stemmer. Concentration Bounds for High Sensitivity Functions Through
Differential Privacy. Journal of Privacy and Confidentiality, 9(1), 2019.
20 Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, and Blake Woodworth.
Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis. arXiv preprint,
2019. arXiv:1906.09231.
21 Ryan Rogers, Aaron Roth, Adam Smith, and Om Thakkar. Max-information, differential
privacy, and post-selection hypothesis testing. In 2016 IEEE 57th Annual Symposium on
Foundations of Computer Science (FOCS), pages 487–494. IEEE, 2016.
22 Daniel Russo and James Zou. Controlling bias in adaptive data analysis using information
theory. In Artificial Intelligence and Statistics, pages 1232–1240, 2016.
23 Thomas Steinke and Jonathan Ullman. Interactive fingerprinting codes and the hardness of
preventing false discovery. In Conference on Learning Theory, pages 1588–1628, 2015.
24 Thomas Steinke and Jonathan Ullman. Subgaussian tail bounds via stability arguments. arXiv
preprint, 2017. arXiv:1701.03493.
25 Aolin Xu and Maxim Raginsky. Information-theoretic analysis of generalization capability of
learning algorithms. In Advances in Neural Information Processing Systems, pages 2524–2533,
2017.
26 Tijana Zrnic and Moritz Hardt. Natural Analysts in Adaptive Data Analysis. In International
Conference on Machine Learning, pages 7703–7711, 2019.
A Extensions
A.1 Low Sensitivity Queries
Our technique extends easily to reason about arbitrary low sensitivity queries. We only need
to generalize our lemma about posterior stability.
▶ Definition 14. A query $q : \mathcal{X}^n \to \mathbb{R}$ is called ∆-sensitive if for all pairs of neighbouring
datasets $S, S' \in \mathcal{X}^n$: $|q(S) - q(S')| \le \Delta$. Note that linear queries are (1/n)-sensitive.
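As a sanity check (our illustration, not code from the paper), the sensitivity of a small query can be verified by brute force on a tiny domain:

import itertools
import numpy as np

def sensitivity(q, domain, n):
    # Brute-force Delta: max over neighbouring datasets (differing in one record) of |q(S) - q(S')|.
    worst = 0.0
    for S in itertools.product(domain, repeat=n):
        for i in range(n):
            for y in domain:
                S_prime = list(S)
                S_prime[i] = y
                worst = max(worst, abs(q(list(S)) - q(S_prime)))
    return worst

def mean_query(S):
    return float(np.mean(S))          # a linear query: the empirical mean

print(sensitivity(mean_query, domain=[0.0, 1.0], n=3))   # prints 1/n = 0.333...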
▶ Lemma 15. If M is an (ε, δ)-differentially private mechanism for answering ∆-sensitive
queries, then for any data distribution P, analyst A, and any constant c > 0:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j |q_j(\mathcal{Q}_\Pi) - q_j(\mathcal{P}^n)| > (e^\epsilon - 1 + 4c)\,n\Delta\right] \le \frac{\delta}{c}$$
where $S^{i\leftarrow Y} = (S_1, \ldots, S_{i-1}, Y, S_{i+1}, \ldots, S_n)$, and the last equality follows from the obser-
vation that $(S, Y)$ and $(S^{i\leftarrow Y}, S_i)$ are identically distributed. Since $Y \sim \mathcal{P}$, independently
from Π, we get that $\mathbb{E}_{Y\sim\mathcal{P}}\left[\bar{q}_{j^*(\pi)}\big(S^{i\leftarrow Y}_{\le i}\big)\right] = \bar{q}_{j^*(\pi)}(S_{\le i-1})$, so
\begin{align*}
\mathbb{E}_{S\sim\mathcal{P}^n}\left[\sum_{\pi\in\Pi_\alpha}\Pr_{\Pi\sim I(S)}[\Pi = \pi]\left(\bar{q}_{j^*(\pi)}(S_{\le i}) - \bar{q}_{j^*(\pi)}(S_{\le i-1}) + \Delta\right)\right]
&\le \mathbb{E}_{S\sim\mathcal{P}^n}\left[\left(e^\epsilon \Pr_{\Pi\sim I(S)}[\Pi \in \Pi_\alpha] + 2\delta\right)\Delta\right] \\
&= \left(e^\epsilon \Pr[\Pi \in \Pi_\alpha] + 2\delta\right)\Delta
\end{align*}
We say that M satisfies (α, β)-distributional accuracy for minimization queries if for every
data analyst A and every data distribution P:
$$\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim\mathrm{Interact}(M,A;S)}\left[\max_j\left(\mathop{\mathbb{E}}_{S'\sim\mathcal{P}^n}\left[L_j(S', \theta_j)\right] - \min_{\theta\in\Theta}\mathop{\mathbb{E}}_{S'\sim\mathcal{P}^n}\left[L_j(S', \theta)\right]\right) \ge \alpha\right] \le \beta$$
▶ Theorem 20 (Transfer Theorem for Minimization Queries). Suppose that M is (ε, δ)-
differentially private and (α, β)-sample accurate for ∆-sensitive minimization queries. Then
for every analyst A and c, d > 0 it also satisfies:
$$\Pr\left[\max_j\left(\mathop{\mathbb{E}}_{S'\sim\mathcal{P}^n}\left[L_j(S', \theta_j)\right] - \min_{\theta\in\Theta}\mathop{\mathbb{E}}_{S'\sim\mathcal{P}^n}\left[L_j(S', \theta)\right]\right) > \alpha + c + 2(e^\epsilon - 1 + 4d)\,n\Delta\right] \le \frac{\beta}{c} + \frac{\delta}{d}$$
i.e. it is (α′, β′)-distributionally accurate for $\alpha' = \alpha + c + 2(e^\epsilon - 1 + 4d)n\Delta$ and $\beta' = \frac{\beta}{c} + \frac{\delta}{d}$.
\begin{align*}
\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S),\,S'\sim\mathcal{Q}_\Pi}\left[(S', \Pi) \in E\right]
&= \sum_x\sum_\pi\sum_{x'} \Pr[S = x]\,\Pr[\Pi = \pi \mid S = x]\,\Pr_{S'\sim\mathcal{Q}_\pi}[S' = x']\,\mathbb{1}[(x', \pi) \in E] \\
&= \sum_\pi\sum_{x'} \Pr[\Pi = \pi]\,\Pr_{S'\sim\mathcal{Q}_\pi}[S' = x']\,\mathbb{1}[(x', \pi) \in E] \\
&= \sum_\pi\sum_{x'} \Pr[\Pi = \pi]\,\Pr[S = x' \mid \Pi = \pi]\,\mathbb{1}[(x', \pi) \in E] \\
&= \sum_\pi\sum_{x'} \Pr[\Pi = \pi]\,\frac{\Pr[\Pi = \pi \mid S = x'] \cdot \Pr[S = x']}{\Pr[\Pi = \pi]}\,\mathbb{1}[(x', \pi) \in E] \\
&= \Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[(S, \Pi) \in E\right] \qquad\blacktriangleleft
\end{align*}
$$\Pr_{S\sim\mathcal{P}^n,\,S_i\sim S,\,\Pi\sim I(S)}\left[\Pi \in E \mid S_i = x\right] \le e^\epsilon \Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\Pi \in E\right] + \delta$$
Now fix any realization x, and consider each term $\mathbb{E}_{S\sim\mathcal{Q}_\pi}[q(S_i) \mid S_{<i} = x_{<i}]$. We have:
\begin{align*}
\mathbb{E}_{S\sim\mathcal{Q}_\pi}[q(S_i) \mid S_{<i} = x_{<i}] &= \sum_x q(x)\cdot\Pr_{S\sim\mathcal{P}^n}[S_i = x \mid \Pi = \pi,\, S_{<i} = x_{<i}] \\
&= \sum_x q(x)\cdot\frac{\Pr_{S\sim\mathcal{P}^n}[\Pi = \pi \mid S_i = x,\, S_{<i} = x_{<i}] \cdot \Pr_{S\sim\mathcal{P}^n}[S_i = x]}{\Pr[\Pi = \pi \mid S_{<i} = x_{<i}]} \\
&\le e^\epsilon \sum_x q(x)\cdot\Pr_{S\sim\mathcal{P}^n}[S_i = x] \\
&= e^\epsilon q(\mathcal{P})
\end{align*}
where the inequality follows from the definition of (ε, 0)-differential privacy. Symmetrically,
we can show that $\mathbb{E}_{S\sim\mathcal{Q}_\pi}[q(S_i) \mid S_{<i} = x_{<i}] \ge e^{-\epsilon} q(\mathcal{P})$. Therefore we have that:
$$e^{-\epsilon} q(\mathcal{P}) \le \frac{1}{n}\sum_{i=1}^n \mathbb{E}[q(S_i) \mid S_{<i}] \le e^\epsilon q(\mathcal{P}).$$
Combining this with Equation 4 gives us that for any η > 0, with probability 1 − η when
$S \sim \mathcal{Q}_\pi$:
$$q(S) \le e^\epsilon q(\mathcal{P}) + \sqrt{\frac{2\ln(2/\eta)}{n}} \qquad\text{and}\qquad q(S) \ge e^{-\epsilon} q(\mathcal{P}) - \sqrt{\frac{2\ln(2/\eta)}{n}}. \qquad\blacktriangleleft$$
▶ Theorem 23. Suppose that M is (ε, 0)-differentially private and (α, β)-sample accurate.
Then for any η > 0 it is (α′, β′)-distributionally accurate for $\alpha' = \alpha + (e^\epsilon - 1) + \sqrt{\frac{2\ln(2/\eta)}{n}}$
and $\beta' = \beta + \eta$.
Proof. For a given π, let $j^*(\pi) = \arg\max_j |a_j - q_j(\mathcal{P})|$. By the triangle inequality we have:
\begin{align*}
|a_{j^*(\Pi)} - q_{j^*(\Pi)}(\mathcal{P})| &\le |a_{j^*(\Pi)} - q_{j^*(\Pi)}(S)| + |q_{j^*(\Pi)}(S) - q_{j^*(\Pi)}(\mathcal{P})| \\
&\le \max_j |a_j - q_j(S)| + |q_{j^*(\Pi)}(S) - q_{j^*(\Pi)}(\mathcal{P})|
\end{align*}
By the definition of (α, β)-sample accuracy, we have that with probability 1 − β, $\max_j |a_j - q_j(S)| \le \alpha$. The Resampling Lemma (Lemma 5) gives us that:
\begin{align*}
&\Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[|q_{j^*(\Pi)}(S) - q_{j^*(\Pi)}(\mathcal{P})| \ge (e^\epsilon - 1) + \sqrt{\tfrac{2\ln(2/\eta)}{n}}\right] \\
&\qquad= \Pr_{S\sim\mathcal{P}^n,\,\Pi\sim I(S),\,S'\sim\mathcal{Q}_\Pi}\left[|q_{j^*(\Pi)}(S') - q_{j^*(\Pi)}(\mathcal{P})| \ge (e^\epsilon - 1) + \sqrt{\tfrac{2\ln(2/\eta)}{n}}\right] \\
&\qquad= \mathbb{E}_{S\sim\mathcal{P}^n,\,\Pi\sim I(S)}\left[\Pr_{S'\sim\mathcal{Q}_\Pi}\left[|q_{j^*(\Pi)}(S') - q_{j^*(\Pi)}(\mathcal{P})| \ge (e^\epsilon - 1) + \sqrt{\tfrac{2\ln(2/\eta)}{n}}\right]\right] \\
&\qquad\le \eta
\end{align*}
A union bound over the two failure events completes the proof. ◀