Toward a Theory of Steganography
Nicholas J. Hopper
CMU-CS-04-157
July 2004
Thesis Committee:
Manuel Blum, Chair
Avrim Blum
Michael Reiter
Steven Rudich
David Wagner, U.C. Berkeley
Copyright © 2004 Nicholas J. Hopper
This material is based upon work partially supported by the National Science Foundation under
Grants CCR-0122581 and CCR-0058982 (The Aladdin Center) and an NSF Graduate Fellowship;
the Army Research Office (ARO) and the Cylab center at Carnegie Mellon University; and a Siebel
Scholarship.
The views and conclusions contained in this document are those of the author and should not be
interpreted as representing the official policies, either expressed or implied, of the NSF, the U.S.
Government or any other entity.
Keywords: Steganography, Cryptography, Provable Security
Abstract
1 Introduction 1
1.1 Cryptography and Provable Security . . . . . . . . . . . . . . . . . . 2
1.2 Previous work on theory of steganography . . . . . . . . . . . . . . . 4
1.3 Contributions of the thesis . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Roadmap of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Symmetric-key Steganography 27
3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.2 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 A Stateful Construction . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 An Alternative Construction . . . . . . . . . . . . . . . . . . . 39
3.3 Necessary Conditions for Steganography . . . . . . . . . . . . . . . . 41
3.3.1 Steganography implies one-way functions . . . . . . . . . . . . 42
3.3.2 Sampleable Channels are necessary . . . . . . . . . . . . . . . 44
4 Public-Key Steganography 47
4.1 Public key cryptography . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.1 Pseudorandom Public-Key Encryption . . . . . . . . . . . . . 49
4.1.2 Efficient Probabilistic Encryption . . . . . . . . . . . . . . . . 51
4.2 Public key steganography . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Public-key stegosystems . . . . . . . . . . . . . . . . . . . . . 55
4.2.2 Steganographic Secrecy against Chosen Hiddentext Attack . . 56
4.2.3 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.4 Chosen Hiddentext security . . . . . . . . . . . . . . . . . . . 58
4.3 Steganographic Key Exchange . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3.1 With errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.2 Negligible error rate . . . . . . . . . . . . . . . . . . . . . . . 121
6.3.3 Converging to optimal . . . . . . . . . . . . . . . . . . . . . . 123
6.3.4 Unknown length . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4 Robust Steganography . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4.1 Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4.2 Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Bibliography 165
Chapter 1
Introduction
This dissertation focuses on the problem of steganography: how can two communicat-
ing entities send secret messages over a public channel so that a third party cannot
detect the presence of the secret messages? Notice how the goal of steganography
is different from classical encryption, which seeks to conceal the content of secret
messages: steganography is about hiding the very existence of the secret messages.
Steganographic “protocols” have a long and intriguing history that goes back to
antiquity. There are stories of secret messages written in invisible ink or hidden in love
letters (the first character of each sentence can be used to spell a secret, for instance).
More recently, steganography was used by prisoners, spies and soldiers during World
War II because mail was carefully inspected by both the Allied and Axis governments
at the time [38]. Postal censors crossed out anything that looked like sensitive in-
formation (e.g. long strings of digits), and they prosecuted individuals whose mail
seemed suspicious. In many cases, censors even randomly deleted innocent-looking
sentences or entire paragraphs in order to prevent secret messages from being deliv-
ered. More recently there has been a great deal of interest in digital steganography,
that is, in hiding secret messages in communications between computers.
Because of this, there have been numerous proposals for protocols to hide data in channels
containing pictures [37, 40], video [40, 43, 61], audio [32, 49], and even typeset text
[12]. Many of these protocols are extremely clever and rely heavily on domain-specific
properties of these channels. On the other hand, the literature on steganography also
contains many clever attacks which detect the use of such protocols. In addition, there
is no clear consensus in the literature about what it should mean for a stegosystem
to be secure; this ambiguity makes it unclear whether it is even possible to have a
secure protocol for steganography.
The main goal of this thesis is to rigorously investigate the open question: “under
what conditions do secure protocols for steganography exist?” We will give rigor-
ous cryptographic definitions of steganographic security in multiple settings against
several different types of adversary, and we will demonstrate necessary and sufficient
conditions for security in each setting, by exhibiting protocols which are secure under
these conditions.
1.1 Cryptography and Provable Security

The rigorous study of provably secure cryptography was initiated by Shannon [58], who introduced an information-theoretic definition of security: a cryptosystem is secure if an adversary who sees the ciphertext (the scrambled message sent by the cryptosystem) gains no additional information about the plaintext (the unscrambled content). Unfortunately, Shannon also proved that any perfectly secure cryptosystem requires that if a sender wishes to transmit N bits of plaintext data, the sender and the receiver must share at least N bits of random, secret data: the key. This limitation means that only parties who already possess secure channels (for the exchange of secret keys) can have secure communications.
Modern cryptography circumvents this limitation by settling for computational rather than information-theoretic security, allowing parties who initially share a very small number of secret bits (in the case of public-key cryptography, zero) to subsequently transmit an essentially unbounded number of message bits securely.
1.2 Previous work on theory of steganography
The scientific study of steganography in the open literature began in 1983 when
Simmons [59] stated the problem in terms of communication in a prison. In his
formulation, two inmates, Alice and Bob, are trying to hatch an escape plan. The
only way they can communicate with each other is through a public channel, which is
carefully monitored by the warden of the prison, Ward. If Ward detects any encrypted
messages or codes, he will throw both Alice and Bob into solitary confinement. The
problem of steganography is, then: how can Alice and Bob cook up an escape plan by communicating over the public channel in such a way that Ward doesn't suspect anything "unusual" is going on?
Anderson and Petitcolas [6] posed many of the open problems resolved in this
thesis. In particular, they pointed out that it was unclear how to prove the security
of a steganographic protocol, and gave an example which is similar to the protocol
we present in Chapter 3. They also asked whether it would be possible to have
steganography without a secret key, which we address in Chapter 4. Finally, they
point out that while it is easy to give a loose upper bound on the rate at which
hidden bits can be embedded in innocent objects, there was no known lower bound.
Since the paper of Anderson and Petitcolas, several works [16, 44, 57, 66] have
addressed information-theoretic definitions of steganography. Cachin’s work [16, 17]
formulates the problem as that of designing an encoding function so that the rela-
tive entropy between stegotexts, which encode hidden information, and independent,
identically distributed samples from some innocent-looking covertext probability dis-
tribution, is small. He gives a construction similar to one we describe in Chapter 3 but
concludes that it is computationally intractable; and another construction which is
provably secure but relies critically on the assumption that all orderings of covertexts
are equally likely. Cachin also points out several flaws in other published information-
theoretic formulations of steganography.
One limitation of these information-theoretic formulations is that N bits of secret key can encode at most N hidden bits. In addition, techniques such
as public-key steganography and robust steganography are information-theoretically
impossible.
1.3 Contributions of the thesis

Symmetric-Key Steganography

A symmetric-key stegosystem allows two parties with a shared secret to send hidden
messages undetectably over a public channel. We give cryptographic definitions for
symmetric-key stegosystems and steganographic secrecy against a passive adversary
in terms of indistinguishability from a probabilistic channel process. By giving a
construction which provably satisfies these definitions, we show that the existence
of a one-way function is sufficient for the existence of secure steganography relative
to any channel. We also show that this condition is necessary by demonstrating a
construction of a one-way function from any secure stegosystem.
Public-Key Steganography
Informally, a public-key steganography protocol allows two parties, who have never
met or exchanged a secret, to send hidden messages over a public channel so that
an adversary cannot even detect that these hidden messages are being sent. Un-
like previous settings in which provable security has been applied to steganography,
public-key steganography is information-theoretically impossible. We introduce com-
putational security conditions for public-key steganography similar to those for the
symmetric-key setting, and give the first protocols for public-key steganography and
steganographic key exchange that are provably secure under standard cryptographic
assumptions.
Covert Computation
At a higher level, the technical contributions of this thesis suggest a powerful design
methodology for steganographic security goals. This methodology stems from the
observation that the uniform channel is universal for steganography: we give a trans-
formation from an arbitrary protocol which produces messages indistinguishable from
uniform random bits (given an adversary’s view) into a protocol which produces mes-
sages indistinguishable from an arbitrary channel distribution (given the adversary’s
view). Thus, in order to hide information from an adversary in a given channel, it is
sufficient to design a protocol which hides the information among pseudorandom bits
and apply our transformation. Examples of this methodology appear in Chapters 3, 4, 5, and 7; the explicit transformation for a general task, along with a proof of its security, is given in Chapter 7, Theorem 7.5.
1.4 Roadmap of the thesis
Chapter 2 establishes the results and notation we will use from cryptography, and describes our model of innocent communication. Chapter 3 discusses our results on symmetric-key steganography and relies heavily on the material in Chapter 2. Chapter 4 discusses our results on public-key steganography, and can be read independently of Chapter 3. Chapter 5 considers active attacks against stegosystems; Section 5.1 depends on material in Chapters 2 and 3, while the remaining sections also require some familiarity with the material in Chapter 4. Chapter 6 discusses the rate of a stegosystem, and depends on material in Chapter 3, while the final section also requires material from Section 5.1. In Chapter 7 we extend steganography from the concept of hidden communication to hidden computation; Chapter 7 depends only on the material in Chapter 2. Finally, in Chapter 8 we suggest directions for future research.
Chapter 2
In this chapter we will introduce the notation and concepts from cryptography and
information theory that our results will use. The reader interested in a more general
treatment of the relationships between the various notions presented here is referred
to the works of Goldreich [25] and Goldwasser and Bellare [30].
2.1 Notation
We will often make use of Oracle PTMs (OPTMs). An OPTM is a probabilistic Turing machine (PTM) with two additional tapes, a "query" tape and a "response" tape, and two corresponding states Q_query and Q_response. An OPTM runs with respect to some oracle O, and when it enters state Q_query with value y on its query tape, it goes in one step to state Q_response, with x ← O(y) written to its "response" tape. If O is a probabilistic oracle, then A^O(y) is a probability distribution on outputs, taken over both the random tape of A and the probability distribution on O's responses.
We denote the length of a string or sequence s by |s|. We denote the empty string or sequence by ε. The concatenation of string s1 and string s2 will be denoted by s1‖s2, and when we write "Parse s as s1‖s2‖···‖sl, where |si| = ti" we mean to separate s into strings s1, ..., sl where each |si| = ti and s = s1‖s2‖···‖sl. We will assume the use of efficient and unambiguous pairing and unpairing operators on strings, so that (s1, s2) may be uniquely interpreted as the pairing of s1 with s2, and is not the same as s1‖s2. One example of such an operation is to encode (s1, s2) by a prefix-free encoding of |s1|, followed by s1, followed by a prefix-free encoding of |s2| and then s2. Unpairing then reads |s1|, reads that many bits from the input into s1, and repeats the process for s2.
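The pairing scheme just described can be sketched as follows. This is one simple instantiation, not the thesis's prescribed encoding: fixed 4-byte length headers serve as the prefix-free code for lengths, and the code operates on bytes rather than bits.

```python
def pair(s1: bytes, s2: bytes) -> bytes:
    # Encode (s1, s2) as: prefix-free encoding of |s1|, then s1, then a
    # prefix-free encoding of |s2|, then s2. A fixed 4-byte big-endian
    # length header is a prefix-free code for lengths below 2**32.
    return b"".join(len(s).to_bytes(4, "big") + s for s in (s1, s2))

def unpair(s: bytes) -> tuple:
    # Read |s1|, then that many bytes into s1; repeat the process for s2.
    parts, i = [], 0
    for _ in range(2):
        n = int.from_bytes(s[i:i + 4], "big")
        parts.append(s[i + 4:i + 4 + n])
        i += 4 + n
    return parts[0], parts[1]
```

Note that, as required, pair(b"a", b"b") and pair(b"ab", b"") produce distinct encodings, while naive concatenation would confuse them.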
We will let U_k denote the uniform distribution on {0,1}^k. If X is a finite set, we will denote by x ← X the action of uniformly choosing x from X. We denote by U(L, l) the uniform distribution on functions f : {0,1}^L → {0,1}^l. For a probability distribution D, we denote the support of D by [D]. For an integer n, we let [n] denote the set {1, 2, ..., n}.
Modern cryptography makes use of reductions to prove the security of protocols; that
is, to show that a protocol P is secure, we show how an attacker violating the security
of P can be used to solve a problem Q which is believed to be intractable. Since
solving Q is believed to be intractable, it then follows that violating the security of P
is also intractable. In this section, we will give examples from the theory of symmetric
cryptography to illustrate this approach, and introduce the notation to be used in
the rest of the dissertation.
Let X = {X_k}_{k∈N} and Y = {Y_k}_{k∈N} denote two sequences of probability distributions such that [X_k] = [Y_k] for all k. Many cryptographic questions address the issue of
distinguishing between samples from X and samples from Y. For example, the dis-
tribution X could denote the possible encryptions of the message “Attack at Dawn”
while Y denotes the possible encryptions of “Retreat at Dawn;” a cryptanalyst would
like to distinguish between these distributions as accurately as possible, while a cryp-
tographer would like to show that they are hard to tell apart. To address this concept,
cryptographers have developed several notions of indistinguishability. The simplest
is the statistical distance:
Definition 2.1. (Statistical Distance) Define the statistical distance between X and Y by

    ∆_k(X, Y) = (1/2) Σ_{x ∈ [X_k]} |Pr[X_k = x] − Pr[Y_k = x]| .
On the other hand, it could be the case that ∆(X, Y) is large but X and Y are still difficult to distinguish by some methods. For example, if X_k is the distribution on k-bit even-parity strings starting with 0 and Y_k is the distribution on k-bit even-parity strings starting with 1, then an algorithm which attempts to distinguish X and Y based on the parity of its input will fail, even though ∆(X, Y) = 1. To address this situation, we define the advantage of a program:
    Adv^{X,Y}_A(k) = |Pr[A(X_k) = 1] − Pr[A(Y_k) = 1]| .

Thus in the previous example, for any program A that considers only Σ_i s_i mod 2, it will be the case that Adv^{X,Y}_A(k) = 0.
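The parity example can be made concrete with a small sketch (illustrative, with k = 4): the two even-parity distributions have disjoint supports, so their statistical distance is 1, yet a distinguisher that looks only at parity has advantage 0.

```python
from itertools import product

def statistical_distance(X: dict, Y: dict) -> float:
    # Delta(X, Y) = (1/2) * sum over x of |Pr[X = x] - Pr[Y = x]|
    support = set(X) | set(Y)
    return 0.5 * sum(abs(X.get(x, 0.0) - Y.get(x, 0.0)) for x in support)

def even_parity(k: int, first_bit: str) -> dict:
    # Uniform distribution on k-bit even-parity strings starting with first_bit
    strs = ["".join(s) for s in product("01", repeat=k)
            if s[0] == first_bit and s.count("1") % 2 == 0]
    return {s: 1.0 / len(strs) for s in strs}

X, Y = even_parity(4, "0"), even_parity(4, "1")

def parity_adversary(s: str) -> int:
    # A program that considers only sum_i s_i mod 2
    return s.count("1") % 2

# Advantage of the parity-only adversary: |Pr[A(X) = 1] - Pr[A(Y) = 1]|
adv = abs(sum(p for s, p in X.items() if parity_adversary(s) == 1)
          - sum(p for s, p in Y.items() if parity_adversary(s) == 1))
```

Here statistical_distance(X, Y) evaluates to 1.0 while adv evaluates to 0.0, matching the discussion above.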
While the class of adversaries who consider only the parity of a string is not very
interesting, we may consider more interesting classes: for example, the class of all
adversaries with running time bounded by t(k).
We define the insecurity of distinguishing X and Y as InSec^{X,Y}(t, k) = max_{A ∈ TIME(t(k))} Adv^{X,Y}_A(k), and we say that X_k and Y_k are (t, ε)-indistinguishable if InSec^{X,Y}(t, k) ≤ ε.
If we are interested in the case that t(k) is bounded by some polynomial in k, then we say that X and Y are computationally indistinguishable, written X ≈ Y, if for every A ∈ TIME(poly(k)), there is a negligible function ν such that Adv^{X,Y}_A(k) ≤ ν(k). (A function ν : N → (0, 1) is said to be negligible if for every c > 0, for all sufficiently large n, ν(n) < 1/n^c.)
We will make use, several times, of the following (well-known) facts about statistical and computational distance:

Proposition 2.2. For any randomized procedure A, ∆(A(X), A(Y)) ≤ ∆(X, Y).
Proof.

    ∆(A(X), A(Y)) = (1/2) Σ_x |Pr[A(X) = x] − Pr[A(Y) = x]|
                  = (1/2) Σ_x |Σ_r 2^{−|r|} (Pr[A_r(X) = x] − Pr[A_r(Y) = x])|
                  ≤ (1/2) Σ_r 2^{−|r|} Σ_x |Pr[A_r(X) = x] − Pr[A_r(Y) = x]|
                  ≤ (1/2) max_r Σ_x |Pr[A_r(X) = x] − Pr[A_r(Y) = x]|
                  ≤ (1/2) max_r Σ_x Σ_{y ∈ A_r^{−1}(x)} |Pr[X = y] − Pr[Y = y]|
                  ≤ ∆(X, Y) .

(Here A_r denotes A run with its random tape fixed to r.)
Proposition 2.3. For any t, InSec^{X,Y}(t, k) ≤ ∆_k(X, Y).

Proof. Let A ∈ TIME(t) be any program with range {0, 1}. Then we have that

    Adv^{X,Y}_A(k) = |Pr[A(X) = 1] − Pr[A(Y) = 1]| = ∆(A(X), A(Y)) ≤ ∆(X, Y) ,

where the final inequality follows from the previous proposition.
Proposition 2.4. Let T bound the time required to draw a sample from X or Y. Then for any m,

    InSec^{X^m,Y^m}(t, k) ≤ m · InSec^{X,Y}(t + (m − 1)T, k) .

Proof. The proof uses a "hybrid" argument. Consider any A ∈ TIME(t); we wish to bound Adv^{X^m,Y^m}_A(k). To do so, we define a sequence of hybrid distributions Z_0, ..., Z_m, where Z_0 = X^m, Z_m = Y^m, and Z_i = (Y^i, X^{m−i}). We will consider the "experiment" of using A to distinguish Z_i from Z_{i+1}.

Now notice that for each i, there is a program B_i which distinguishes X from Y with the same advantage as A has in distinguishing Z_{i−1} from Z_i: on input S, B_i draws i − 1 samples from Y, m − i samples from X, and runs A with input (Y^{i−1}, S, X^{m−i}). If S ← X, then Pr[B_i(S) = 1] = Pr[A(Z_{i−1}) = 1], because the first i − 1 samples in A's input will be from Y, and the remaining samples will be from X. On the other hand, if S ← Y, then Pr[B_i(S) = 1] = Pr[A(Z_i) = 1], because the first i samples in A's input will be from Y. So we have:

    Adv^{X,Y}_{B_i}(k) = |Pr[B_i(X) = 1] − Pr[B_i(Y) = 1]| = |Pr[A(Z_{i−1}) = 1] − Pr[A(Z_i) = 1]| .

Now since B_i takes as long as A to run (plus time at most (m − 1)T to draw the additional samples from X and Y), it follows that

    Adv^{X,Y}_{B_i}(k) ≤ InSec^{X,Y}(t + (m − 1)T, k) .

Summing over the hybrids, Adv^{X^m,Y^m}_A(k) ≤ Σ_{i=1}^m Adv^{X,Y}_{B_i}(k) ≤ m · InSec^{X,Y}(t + (m − 1)T, k), as claimed.
The style of proof we have used for this proposition, in which we attempt to state
as tightly as possible the relationship between the “security” of two related problems
without reference to asymptotic analysis, is referred to in the literature as concrete
security analysis. In this dissertation, we will give concrete security results except in
Chapter 8, in which the concrete analysis would be too cumbersome.
2.2.2 Universal Hash Functions
A universal hash family is a family of functions H : {0,1}^l × {0,1}^m → {0,1}^n, where m ≥ n, such that for any x_1 ≠ x_2 ∈ {0,1}^m and y_1, y_2 ∈ {0,1}^n,

    Pr_{h←U_l}[H(h, x_1) = y_1 ∧ H(h, x_2) = y_2] = 2^{−2n} .

Universal hash families are easy to construct for any m, n with l = 2m, by considering functions of the form h_{a,b}(x) = ax + b over the field GF(2^m), with truncation to the least significant n bits. It is easy to see that such a family is universal, because truncation is regular, and the full-rank system ax_1 + b = y_1, ax_2 + b = y_2 has exactly one solution (a, b) over GF(2^m), which is selected with probability 2^{−2m}. We will make use of universal hash functions to convert distributions with large minimum entropy into distributions which are indistinguishable from uniform.
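A minimal sketch of this construction, using a toy field GF(2^3) with modulus x^3 + x + 1 (an illustrative choice of irreducible polynomial) and truncation to n = 2 bits. Exhaustively enumerating all 64 keys (a, b) confirms the universality condition: each output pair occurs for exactly 2^{2m−2n} = 4 keys.

```python
from collections import Counter

M, POLY = 3, 0b1011  # GF(2^3) with irreducible polynomial x^3 + x + 1

def gf_mul(a: int, b: int) -> int:
    # Carry-less multiplication of a and b, reduced modulo POLY
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def h(a: int, b: int, x: int, n: int = 2) -> int:
    # h_{a,b}(x) = a*x + b over GF(2^m), truncated to the n low-order bits
    return (gf_mul(a, x) ^ b) & ((1 << n) - 1)

# For fixed x1 != x2, count how many keys (a, b) map to each (y1, y2) pair:
counts = Counter((h(a, b, 1), h(a, b, 5)) for a in range(8) for b in range(8))
```

The counter shows all 16 possible output pairs, each hit exactly 4 times out of 64 keys, i.e. with probability 2^{−2n} = 1/16, as the definition requires.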
Definition 2.7. (Entropy) Let D be a distribution with finite support X. Define the minimum entropy of D, H∞(D), as

    H∞(D) = min_{x ∈ X} log_2 (1 / Pr_D[x]) .

Define the Shannon entropy of D, H_S(D), by

    H_S(D) = E_{x←D} [ −log_2 Pr_D[x] ] .
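Both quantities are directly computable for a finite distribution; a small sketch:

```python
import math

def min_entropy(dist: dict) -> float:
    # H_inf(D) = min over x of log2(1 / Pr_D[x])
    return min(math.log2(1.0 / p) for p in dist.values() if p > 0)

def shannon_entropy(dist: dict) -> float:
    # H_S(D) = E_{x <- D}[-log2 Pr_D[x]]
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)
```

For example, for D with Pr[a] = 1/2 and Pr[b] = Pr[c] = 1/4, the minimum entropy is 1 bit while the Shannon entropy is 1.5 bits; minimum entropy never exceeds Shannon entropy.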
Lemma 2.8. (Leftover Hash Lemma, [33]) Let H : {0,1}^l × {0,1}^m → {0,1}^n be a universal hash family, and let X be a distribution on {0,1}^m satisfying H∞(X) ≥ k. Then
For G : {0,1}^k → {0,1}^{l(k)}, define the PRG-advantage of A against G by

    Adv^{prg}_{A,G}(k) = |Pr[A(G(U_k)) = 1] − Pr[A(U_{l(k)}) = 1]| ,

and the PRG-insecurity of G as

    InSec^{prg}_G(t, k) = max_{A ∈ TIME(t(k))} Adv^{prg}_{A,G}(k) .
Let F : {0,1}^k × {0,1}^L → {0,1}^l denote a family of functions. Informally, F is a pseudorandom function family (PRF) if F and U(L, l) are indistinguishable by oracle queries. Formally, let A be an oracle probabilistic adversary. Define the prf-advantage of A over F as

    Adv^{prf}_{A,F}(k) = |Pr_{K←U_k}[A^{F_K(·)}(1^k) = 1] − Pr_{f←U(L,l)}[A^{f}(1^k) = 1]| ,

and the prf-insecurity of F as

    InSec^{prf}_F(t, q, k) = max_{A ∈ A(t,q)} Adv^{prf}_{A,F}(k) ,

where A(t, q) denotes the set of adversaries taking at most t steps and making at most q oracle queries. Then F_k is a (t, q, ε)-pseudorandom function if InSec^{prf}_F(t, q, k) ≤ ε. Suppose that l(k) and L(k) are polynomials. A sequence {F_k}_{k∈N} of families F_k : {0,1}^k × {0,1}^{L(k)} → {0,1}^{l(k)} is called pseudorandom if for all polynomially bounded adversaries A, Adv^{prf}_{A,F}(k) is negligible in k. We will sometimes write F_k(K, ·) as F_K(·).
We will make use of the following results relating PRFs and PRGs.
Proposition 2.9. Let F_k : {0,1}^k × {0,1}^{L(k)} → {0,1}^{l(k)} be a PRF. Let q = ⌈(k+1)/l(k)⌉. Define G_k : {0,1}^k → {0,1}^{k+1} by G(X) = F_X(0)‖F_X(1)‖···‖F_X(q − 1), truncated to k + 1 bits. Then

    InSec^{prg}_G(t, k) ≤ InSec^{prf}_F(t + q, q, k) .
Proof. Given a PRG adversary A against G, define the oracle adversary B, which on input 1^k queries its oracle f on 0, 1, ..., q − 1, sets s to the first k + 1 bits of f(0)‖f(1)‖···‖f(q − 1), and outputs A(s). If f is drawn from F, then the string s is chosen exactly from G(U_k). In this case, we have

    Pr[B^{F_K}(1^k) = 1] = Pr[A(G(U_k)) = 1] .

If instead f is a uniformly chosen function, then s is uniformly distributed, so Pr[B^f(1^k) = 1] = Pr[A(U_{k+1}) = 1]. Thus

    Adv^{prf}_{B,F}(k) = |Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]|
                       = |Pr[A(G(U_k)) = 1] − Pr[A(U_{k+1}) = 1]|
                       = Adv^{prg}_{A,G}(k) .

Since B runs in the same time as A plus the time to make q oracle queries, we have by definition of insecurity that

    Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t + q, q, k) ,

and therefore

    Adv^{prg}_{A,G}(k) ≤ InSec^{prf}_F(t + q, q, k) ,

which establishes the proposition.
Intuitively, this proposition states that a pseudorandom function can be used to construct a pseudorandom generator. This is because if we believe that F is pseudorandom, we must believe that InSec^{prf}_F(t, q, k) is small, and therefore that the insecurity of the generator G must also be small.
Proposition 2.10. ([27], Theorem 3) There exists a function family F^G : {0,1}^k × {0,1}^k → {0,1}^k such that

    InSec^{prf}_{F^G}(t, q, k) ≤ qk · InSec^{prg}_G(t + qk · TIME(G), k) .
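The PRF-to-PRG direction of Proposition 2.9 can be sketched as follows, using HMAC-SHA256 as a stand-in for the assumed PRF (an illustrative choice, not part of the thesis, with lengths measured in bytes rather than bits):

```python
import hashlib
import hmac

def prf(key: bytes, x: int) -> bytes:
    # Stand-in PRF F_K(x): HMAC-SHA256 applied to the counter value x
    return hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()

def prg(key: bytes, out_bytes: int) -> bytes:
    # G(K) = F_K(0) || F_K(1) || ... || F_K(q - 1), truncated to the
    # desired output length, mirroring the construction in Proposition 2.9.
    blocks, i = [], 0
    while 32 * i < out_bytes:  # each HMAC-SHA256 block supplies 32 bytes
        blocks.append(prf(key, i))
        i += 1
    return b"".join(blocks)[:out_bytes]
```

The expansion is deterministic in the seed, so the same key always yields the same output string, while distinct seeds yield (with overwhelming probability) unrelated-looking outputs.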
2.2.5 Encryption

A symmetric encryption scheme E consists of three (randomized) algorithms:

• E.Generate : 1^k → {0,1}^k generates shared keys in {0,1}^k. We will abbreviate E.Generate(1^k) by G(1^k) when it is clear which encryption scheme is meant.

• E.Encrypt : {0,1}^k × {0,1}^* → {0,1}^* uses a key to transform a plaintext into a ciphertext. We will abbreviate E.Encrypt(K, ·) by E_K(·).

• E.Decrypt : {0,1}^k × {0,1}^* → {0,1}^* uses a key to transform a ciphertext into the corresponding plaintext. We will abbreviate E.Decrypt(K, ·) by D_K(·).

These must satisfy, for all keys K, E.Decrypt(K, E.Encrypt(K, m)) = m. Informally, we will say that a cryptosystem is secure if, after viewing encryptions of plaintexts of its choosing, an adversary cannot distinguish ciphertexts from uniform random strings. This is slightly different from the more standard notion in which it is assumed that encryptions of distinct plaintexts are indistinguishable.
Formally, let A be an oracle adversary given access to one of two oracles: E_K(·), where K ← G(1^k); or $(·), an oracle which on query m ignores its input and returns a uniformly selected string of length |E_K(m)|.
Let A(t, q, l) be the set of adversaries A which make q(k) queries to the oracle of
at most l(k) bits and run for t(k) time steps. Define the CPA advantage of A against
E as
    Adv^{cpa}_{A,E}(k) = |Pr[A^{E_K}(1^k) = 1] − Pr[A^{$}(1^k) = 1]| ,

where the probabilities are taken over the oracle draws and the randomness of A. Define the insecurity of E as

    InSec^{cpa}_E(t, q, l, k) = max_{A ∈ A(t,q,l)} Adv^{cpa}_{A,E}(k) .

Then E is (t, q, l, k, ε)-indistinguishable from random bits under chosen plaintext attack if InSec^{cpa}_E(t, q, l, k) ≤ ε. E is called (computationally) indistinguishable from random bits under chosen plaintext attack (IND$-CPA) if for every PPTM A, Adv^{cpa}_{A,E}(k) is negligible in k.
Proposition 2.12. Let F : {0,1}^k × {0,1}^k → {0,1} be a function family. Define the cryptosystem E^F as follows:

• G(1^k) ← U_k.

• E_K(m_1 · · · m_l) = c‖F_K(c + 1) ⊕ m_1‖ · · · ‖F_K(c + l) ⊕ m_l, where c ← U_k.

• D_K(c‖x_1 · · · x_l) = F_K(c + 1) ⊕ x_1‖ · · · ‖F_K(c + l) ⊕ x_l.

Then

    InSec^{cpa}_{E^F}(t, q, l, k) ≤ InSec^{prf}_F(t + 2l, l, k) + ql/2^{k−1} .
Proof. From any A ∈ A(t, q, l) we will construct a PRF adversary B satisfying

    Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t + 2l, l, k) ,

which will yield the result.
B’s strategy is to play the part of the encryption oracle in A’s chosen-plaintext
attack game. Thus, B will run A, and whenever A makes an encryption query, B
will produce a response using its function oracle, which it will pass back to A. At the
conclusion of the chosen-plaintext game, A produces an output bit, which B will use
for its output. It remains to describe how B will respond to A’s encryption queries. B
will do so by executing the encryption program E_K from above, but using its function oracle in place of F_K. Thus, on a query m_1 · · · m_l, B^f will choose c ← U_k, and give A the response c‖f(c + 1) ⊕ m_1‖ · · · ‖f(c + l) ⊕ m_l.
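The cryptosystem E^F can be sketched as follows, with a one-bit stand-in PRF built from HMAC-SHA256 (an illustrative assumption; any one-bit-output PRF fits the proposition, and counter arithmetic is taken modulo 2^k):

```python
import hashlib
import hmac
import secrets

K = 128  # key / counter length k, in bits

def f(key: bytes, x: int) -> int:
    # One-bit stand-in PRF F_K(x): low bit of HMAC-SHA256 of the counter
    msg = (x % 2 ** K).to_bytes(K // 8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[-1] & 1

def encrypt(key: bytes, bits: list) -> tuple:
    # E_K(m_1...m_l) = c || F_K(c+1) xor m_1 || ... || F_K(c+l) xor m_l
    c = secrets.randbits(K)  # c <- U_k
    return c, [f(key, c + i + 1) ^ m for i, m in enumerate(bits)]

def decrypt(key: bytes, ciphertext: tuple) -> list:
    # D_K(c || x_1...x_l) = F_K(c+1) xor x_1 || ... || F_K(c+l) xor x_l
    c, xs = ciphertext
    return [f(key, c + i + 1) ^ x for i, x in enumerate(xs)]
```

Since each PRF output bit is XORed into exactly one message bit, decryption simply regenerates the same pad from c and the key.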
Let us bound the advantage of B. In case B's oracle is chosen from F_K, B will perfectly simulate an encryption oracle to A. Thus Pr[B^{F_K}(1^k) = 1] = Pr[A^{E_K}(1^k) = 1].
Now suppose that B’s oracle is a uniformly chosen function, and let NC denote the
event that B does not query its oracle more than once on any input, and let C denote
the complement of NC - that is, the event that B queries its oracle at least twice on
at least one input. Conditioned on NC, every bit that B returns to A is uniformly
chosen, for a uniform choice of f , subject to the condition that none of the leading
values overlap, an event we will denote by N$, and which has identical probability to
NC. In this case B perfectly simulates a random-bit oracle to A, giving us Pr[B^f(1^k) = 1 | NC] = Pr[A^{$}(1^k) = 1 | N$]. Therefore,

    Adv^{prf}_{B,F}(k) = Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]
                       = Pr[A^{E_K}(1^k) = 1] − Pr[B^f(1^k) = 1 | NC] Pr[NC] − Pr[B^f(1^k) = 1 | C] Pr[C]
                       ≥ Pr[A^{E_K}(1^k) = 1] − Pr[A^{$}(1^k) = 1] − Pr[C] ,

where we assume without loss of generality that Pr[A^{E_K}(1^k) = 1] ≥ Pr[A^{$}(1^k) = 1]. It follows that Adv^{cpa}_{A,E^F}(k) ≤ Adv^{prf}_{B,F}(k) + Pr[C]. To finish the proof, we need only to bound Pr[C].
To bound the probability of the event C, let us further subdivide this event. During the attack game, A will make q queries that B must answer, so that B chooses q k-bit values c_1, ..., c_q to encrypt messages of length l_1, ..., l_q. Let us denote by NC_i the event that after the ith encryption query made by A, B has not made any duplicate queries to its function oracle f; and let C_i denote the complement of NC_i. We will show that

    Pr[C_i | NC_{i−1}] ≤ (i·l_i + Σ_{j<i} l_j) / 2^k ,

and therefore we will have

    Pr[C] = Pr[C_q]
          ≤ Pr[C_q | NC_{q−1}] + Pr[C_{q−1}]
          ≤ Σ_{i=1}^{q} Pr[C_i | NC_{i−1}]
          ≤ (1/2^k) Σ_{i=1}^{q} (i·l_i + Σ_{j<i} l_j)
          ≤ (1/2^k) (Σ_{i=1}^{q} i·l_i + q·l)
          ≤ (1/2^k) (q Σ_{i=1}^{q} l_i + q·l)
          = 2ql/2^k ,

which establishes the desired bound, given the bound on Pr[C_i | NC_{i−1}]. To establish this conditional bound, fix any choice of the values c_1, ..., c_{i−1}. The value c_i will cause a duplicate input to f if there is some c_j such that c_j − l_i ≤ c_i ≤ c_j + l_j, which happens with probability (l_i + l_j)/2^k, since c_i is chosen uniformly. Thus by the union bound, we have that

    Pr[C_i | NC_{i−1}] ≤ 2^{−k} Σ_{j<i} (l_i + l_j) .
2.3 Modeling Communication: Channels

To model steganography, we must first specify what the innocent communication between Alice and Bob looks like. As an example, if Alice and Bob are communicating over a computer network, they
might run the TCP protocol, in which case they communicate by sending “packets”
according to a format which specifies fields like a source and destination address,
packet length, and sequence number.
Once we have specified what kinds of strings Alice and Bob send to each other,
we also need to specify the probability that Ward will assign to each document. The
simplest notion might be to model the innocent communications between Alice and
Bob by a stationary distribution: each time Alice communicates with Bob, she makes
an independent draw from a probability distribution C and sends it to Bob. Notice
that in this model, all orderings of the messages output by Alice are equally likely.
This does not match well with our intuition about real-world communications; if we
continue the TCP analogy, we notice, for example, that in an ordered list of packets
sent from Alice to Bob, each packet should have a sequence number which is one
greater than the previous; Ward would become very suspicious if Alice sent all of the
odd-numbered packets first, and then all of the even.
Thus, we will use a notion of a channel which models a prior distribution on the entire sequence of communication from one party to another.
Any particular sequence in the support of a channel describes one possible outcome
of all communications from Alice to Bob - the list of all packets that Alice’s computer
sends to Bob’s. The process of drawing from the channel, which results in a sequence
of documents, is equivalent to a process that repeatedly draws a single “next” docu-
ment from a distribution consistent with the history of already drawn documents - for
example, drawing only packets which have a sequence number that is one greater than
the sequence number of the previous packet. Therefore, we can think of communica-
tion as a series of these partial draws from the channel distribution, conditioned on
what has been drawn so far. Notice that this notion of a channel is more general than
the typical setting in which every symbol is drawn independently according to some
fixed distribution: our channel explicitly models the dependence between symbols
common in typical real-world communications.
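As a toy illustration of such history-dependent draws (hypothetical, continuing the TCP analogy; the real definition places no particular structure on the channel), consider a channel whose next document must carry the next sequence number:

```python
import random

class PacketChannel:
    # Toy channel: a document is a (sequence number, payload) pair, and the
    # marginal distribution conditioned on history h forces the next sequence
    # number to be one greater than that of the last document in h.
    def draw(self, history: list) -> tuple:
        next_seq = history[-1][0] + 1 if history else 0
        payload = random.randrange(256)  # innocent-looking payload byte
        return (next_seq, payload)

# Repeatedly drawing the "next" document, conditioned on what has been drawn
# so far, yields a transcript whose ordering Ward would find unremarkable.
channel = PacketChannel()
history = []
for _ in range(3):
    history.append(channel.draw(history))
```

A transcript with out-of-order sequence numbers has probability zero under this channel, which is exactly the kind of dependence a stationary per-document distribution cannot express.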
Informativeness
We will require that a channel satisfy a minimum entropy constraint for all histories. Specifically, we require that there exist constants L > 0, β > 0, α > 0 such that for all h ∈ D^L, either Pr_C[h] = 0 or H∞(C_h^β) ≥ α. If a channel does not satisfy this property, then it is possible for Alice to drive the information content of her communications to 0, so this is a reasonable requirement. We say that a channel satisfying this condition is (L, α, β)-informative, and if a channel is (L, α, β)-informative for all L > 0, we say it is (α, β)-always informative, or simply always informative. Note that this definition implies an additive-like property of minimum entropy for marginal distributions, specifically H∞(C_h^{lβ}) ≥ lα. For ease of exposition, we will assume channels are always informative in the remainder of this dissertation; however, our theorems easily extend to situations in which a channel is only (L, α, β)-informative. The only complication in this situation is that there will be a bound in terms of (L, α, β) on the number of bits of secret message which can be hidden before the channel runs out of information.
Channel Access
In a multiparty setting, each ordered pair of parties (P, Q) will have their own channel
distribution CP →Q . To demonstrate that it is feasible to construct secure protocols
for steganography, we will assume that party A has oracle access to marginal channel
distributions CA→B,h for every other party B and history h. This is reasonable, because
if Alice can communicate innocently with Bob at all, she must be able to draw from
this distribution; thus we are only requiring that when using steganography, Alice
can “pretend” she is communicating innocently.
On the other hand, we will assume that the adversary, Ward, knows as much as
possible about the distribution on innocent communications. Thus he will be allowed
oracle access to marginal channel distributions C_{P→Q,h} for every pair P, Q and every history h. In addition, the adversary may be allowed access to an oracle which, on input (d, h, l) with d ∈ D, returns an l-bit representation of Pr_{C_h}[d].
These assumptions allow the adversary to learn as much as possible about any
channel distribution but do not require any legitimate participant to know the dis-
tribution on communications from any other participant. We will, however, assume
that each party knows (a summary of) the history of communications it has sent and
received from every other participant; thus Bob must remember some details about
the entire sequence of packets Alice sends to him.
Etc. . .
We will also assume that cryptographic primitives remain secure with respect to
oracles which draw from the marginal channel distributions CA→B,h . Thus channels
which can be used to solve the hard problems that standard primitives are based on
must be ruled out. In practice this is of little concern, since the existence of such
channels would have previously led to the conclusion that the primitive in question
was insecure.
Notice that the set of documents need not be literally interpreted as a set of
bitstrings to be sent over a network. In general, documents could encode any kind of
information, including things like actions – such as accessing a hard drive, or changing
the color of a pixel – and times – such as pausing an extra 1/2 second between words
of a speech. In the single-party case, our theory is general enough to deal with these
situations without any special treatment.
Messages are still drawn from a set D of documents. For simplicity we assume
that time proceeds in discrete timesteps. Each party P ∈ {P0 , P1 } maintains a history
hP , which represents a timestep-ordered list of all documents sent and received by P .
We call the set of well-formed histories H. We associate to each party P a family of
probability distributions C^P = {C^P_h}_{h∈H} on D.
We assume that party P can draw from ChP for any history h, and that the adver-
sary can draw from ChP for every party P and history h. We assume that the ability to
draw from these distributions does not contradict the cryptographic assumptions that
our results are based on. In the rest of the dissertation, all interactive communica-
tions will be assumed to conform to the bidirectional channel structure: parties only
communicate by sending documents from D to each other and parties not running a
protocol communicate according to the distributions specified by B. Parties running
a protocol strive to communicate using sequences of documents that appear to come
from B. As a convention, when B is compared to another random variable, we mean
a random variable which draws from the process B the same number of documents
as the variable we are comparing it to.
Chapter 3
Symmetric-key Steganography
Symmetric-key steganography is the most basic setting for steganography: Alice and
Bob possess a shared secret key and would like to use it to exchange hidden messages
over a public channel so that Ward cannot detect the presence of these messages.
Despite the apparent simplicity of this scenario, there has been little work on giving
a precise formulation of steganographic security. Our goal is to give such a formal
description.
In Section 3.1, we give definitions dealing with the correctness and security of
symmetric-key steganography. Then we show in Section 3.2 that these notions are
feasible by giving constructions which satisfy them, under the assumption that pseudorandom function families exist. Finally, in Section 3.3, we explore the necessary conditions for the existence of secure symmetric-key steganography.
3.1 Definitions
We will first define a stegosystem in terms of syntax and correctness, and then proceed
to a security definition.
A symmetric-key stegosystem consists of a pair of algorithms SE and SD, where SE takes as input a key K ∈ {0,1}^k, a message m ∈ {0,1}^* (the hiddentext), and a message history h.
3.1.1 Correctness
where the randomization is over the key K and any coin tosses of SE, SD, and the
oracles accessed by SE and SD.
3.1.2 Security
Intuitively, what we would like to require is that no efficient warden can distinguish
between stegotexts output by SE and covertexts drawn from the channel distribution
Ch . As we stated in Section 2.3, we will assume that W knows the distribution Ch ;
we will also allow W to know the algorithms involved in S as well as the history h of
Alice’s communications to Bob. In addition, we will allow W to pick the hiddentexts
that Alice will hide, if she is in fact producing stegotexts. Thus, W ’s only uncertainty
is about the key K and the single bit denoting whether Alice’s outputs are stegotexts
or covertexts.
1. ST: The oracle ST has a uniformly chosen key K ← U_k and responds to queries (m, h) with a stegotext drawn from SE(K, m, h).

2. CT: The oracle CT has a uniformly chosen key K as well, and responds to queries (m, h) with a covertext of length ℓ = |SE(K, m, h)| drawn from C_h^ℓ.

W^M(1^k) outputs a bit which represents its guess about the type of M. Define the advantage of W against S by

Adv^ss_{S,C,W}(k) = |Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]| ,

where the probability is taken over the randomness of ST, CT, and W, and define the insecurity of S by

InSec^ss_{S,C}(t, q, l, k) = max_{W ∈ W(t,q,l)} Adv^ss_{S,C,W}(k) ,

where W(t, q, l) denotes the set of all adversaries which make at most q(k) queries totaling at most l(k) bits (of hiddentext) and run in time at most t(k).
Definition 3.4. (Steganographic secrecy) A stegosystem S_k is called (t, q, l, ε)-steganographically secret against chosen hiddentext attack for the channel C ((t, q, l, ε)-SS-CHA-C) if InSec^ss_{S,C}(t, q, l, k) ≤ ε.
3.2 Constructions
For our feasibility results, we have taken the approach of assuming a channel which can
be drawn from freely by the stegosystem; most current proposals for stegosystems act
on a single sample from the channel (one exception is [16]). While it may be possible
to define a stegosystem which is steganographically secret or robust and works in this
style, this is equivalent to a system in our model which merely makes a single draw on
the channel distribution. Further, we believe that the lack of reference to the channel
distribution may be one of the reasons for the failure of many such proposals in the
literature.
It is also worth noting that we assume that a stegosystem has very little knowledge
of the channel distribution — SE may only sample from an oracle according to the
distribution. This is because in many cases the full distribution of the channel has
never been characterized; for example, the oracle may be a human being, or a video
camera focused on some complex scene. However, our definitions do not rule out
encoding procedures which have more detailed knowledge of the channel distribution.
Sampling from Ch might not be trivial. In some cases the oracle for Ch might be a
human, and in others a simple randomized program. We stress that it is important to
minimize the use of such an oracle, because oracle queries can be extremely expensive.
In practice, this oracle is also the weakest point of all our constructions. We assume
the existence of a perfect oracle: one that can perform independent draws, one that
can be rewound, etc. This assumption can be justified in some cases, but not in
others. If the oracle is a human, the human may not be able to perform independent
draws from the channel as is required by our constructions. A real world Warden
would use this to his advantage. We therefore stress the following cautionary remark:
our protocols will be shown to be secure under the assumption that the channel oracle
is perfect.
Setup: We assume Alice and Bob share a channel and let C denote the channel
distribution. We write d ← Ch to denote the action of sampling d from the marginal
distribution Ch (via oracle access). We let FK (·, ·) denote a pseudorandom function
family indexed by k = |K| key bits which maps documents to bits, i.e. F : {0, 1}k ×
{0, 1}∗ → {0, 1}. We let Alice and Bob share a secret key K ∈ {0, 1}k and also a
synchronized d bit counter N (which need not be secret). The following procedures
allow Alice and Bob to encode and decode a single bit and to send it via their shared
channel.
The idea behind this construction is simple. The encoding algorithm makes ℓ copies of the bit m. For the ith copy, the encoder attempts to find a document d such that F_K(i, d) = m, by drawing d ← C_h. If the encoder fails, it draws a second d′ ← C_h and sends d′ instead of d. The decoder recovers ℓ bits by applying F_K to each document, and outputs 1 if the majority of the bits are 1, and 0 otherwise. Intuitively, this works because we expect each s_i to map to the bit m with probability about 3/4, and so the probability of a decoding failure should be negligible in ℓ.
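The rejection-sampling step just described can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis's formal construction: HMAC-SHA256 stands in for the PRF family F_K, a fixed word list stands in for the channel oracle C_h, and all names are ours.

```python
import hashlib
import hmac
import random

WORDS = [f"word{j}" for j in range(1000)]   # toy stand-in for the channel

def prf_bit(key, counter, doc):
    # F_K(counter, doc) -> one bit; HMAC-SHA256 is our stand-in PRF.
    tag = hmac.new(key, f"{counter}|{doc}".encode(), hashlib.sha256).digest()
    return tag[0] & 1

def sample_channel(rng, history):
    # Draw d <- C_h; the toy distribution here ignores the history.
    return rng.choice(WORDS)

def onebit_encode(key, m, history, N, ell, rng):
    """Encode one bit m as ell documents: keep the first draw d if
    F_K(N+i, d) = m, otherwise send a second draw instead (at most
    two channel queries per document, as in the construction)."""
    out = []
    for i in range(ell):
        d = sample_channel(rng, history + out)
        if prf_bit(key, N + i, d) != m:
            d = sample_channel(rng, history + out)
        out.append(d)
    return out

def onebit_decode(key, stego, N):
    # Majority vote over F_K(N+i, s_i); each vote is correct w.p. ~3/4.
    votes = sum(prf_bit(key, N + i, s) for i, s in enumerate(stego))
    return 1 if 2 * votes > len(stego) else 0

rng = random.Random(7)
key = b"shared secret key"
stego = onebit_encode(key, 1, [], N=0, ell=101, rng=rng)
```

With ℓ = 101 repetitions the majority vote fails only with negligible probability, which is exactly the redundancy/reliability trade-off quantified by the lemmas below.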
Pr[s_i = d] = Pr_{C_{h_i}}[d] .

Proof. Consider the two documents d_i, d′_i that SE draws in iteration i. It will be the case that F_K(N + i, s_i) = m exactly when either F_K(N + i, d_i) = m, which happens with probability 1/2, or when F_K(N + i, d_i) = 1 − m and F_K(N + i, d′_i) = m, which happens with probability 1/4 when d_i ≠ d′_i, and with probability 0 otherwise. The theorem applies for any i because the function F_K(N + i, ·) is independent of F_K(N + j, ·) for i ≠ j when F_K is uniformly chosen.
Lemma 3.9. Suppose C is (α, β)-always informative and F is a uniformly chosen function. Then we have

Pr_i[F_K(N + i, s_i) = m] ≥ 1/2 + (1/4β)(1 − 2^{−α/β})

Proof. Because C is (α, β)-informative, for any h and any sequence d_1, . . . , d_β ← C_h^β, there must be a j between 0 and β − 1 such that H∞(C_{(h,d_1,...,d_j)}) ≥ α/β. If this were not the case, then we would have h such that H∞(C_h^β) < α. Thus for a string of length ℓ drawn from C_h^ℓ, there must be ℓ/β positions i which have H∞(C_{h_i}) ≥ α/β. In these positions, the collision probability is at most 2^{−α/β}. In the other positions, the collision probability is at most 1. Applying the previous lemma yields the result.
where γ = 2((1/4β)(1 − 2^{−α/β}))^2 and T_SE is the time required to execute the inner loop of OneBit.Encode.
A uses its function oracle f to emulate the action of SE encoding a uniformly chosen bit m under history h, counting the number of documents with f(N + i, s_i) = m. If fewer than half of the s_i satisfy f(N + i, s_i) = m, A outputs 1; otherwise A outputs 0. Lemma 3.9 shows that Pr[A^f(1^k) = 1] ≤ e^{−γℓ}, whereas when A's oracle is F_K we have Pr[A^{F_K}(1^k) = 1] = Pr[SD(K, SE(K, m, h), h) ≠ m]. So by definition of advantage,

Adv^prf_{A,F}(k) ≥ Pr[SD(K, SE(K, m, h), h) ≠ m] − e^{−γℓ} .

But A runs in time ℓT_SE and makes 2ℓ function-oracle queries, which proves the theorem.
Extending to multiple-bit messages
For completeness, we now state the obvious extension of the stegosystem OneBit to
multiple-bit hiddentexts. We assume the same setup as previously.
The MultiBit stegosystem works by simply repeatedly invoking OneBit on the indi-
vidual bits of the message m.
where γ = 2((1/4β)(1 − 2^{−α/β}))^2 and T_SE is the time required to execute the inner loop of OneBit.Encode.
Proof. Because each s_i is generated using a different value of the counter N, each execution of the inner loop of OneBit.Encode is independent when called with a uniformly chosen function. Thus when a uniformly chosen function is used, executing OneBit.Encode |m| times with different bits is the same as using |m| independent keys, each with failure probability at most e^{−γℓ}; a union bound shows that for a random function f, Pr[SD^f(SE^f(m, h, N), N) ≠ m] ≤ |m|e^{−γℓ}. To complete the proof, we apply the same technique as in the proof of Theorem 3.10.
We would like to make a security claim about the stegosystem MultiBit, but
because the stegosystem does not fit our syntactic definition, we need a slightly mod-
ified version of the chosen-hiddentext attack game. We will modify the definition of
the oracle distribution ST so that the oracle’s private state will include the value N ,
initialized to 0 and properly incremented between queries. With this modified game
in mind, we can state our theorem about the security of MultiBit:
Theorem 3.13. Let k = |K|. For any l ≤ 2^d:

InSec^ss_{MultiBit,C}(t, q, µ, k) ≤ InSec^prf_F(t + ℓµT_SE, 2ℓµ, k)
Proof. For any warden W running in time t and making q queries totaling µ bits, we construct a corresponding PRF adversary A, where

Adv^ss_{MultiBit,C,W}(k) = Adv^prf_{F,A}(k)
The running time of A is the running time of warden W plus the time to make `µ
passes through the inner loop of OneBit.Encode, or `µTSE . The number of samples
taken from C is at most 2`µ.
Af simply runs W , playing the role of the oracle M in the chosen-hiddentext attack
game. To respond to W ’s queries, A emulates the encoding procedure MultiBit.Encode
using the function oracle f in place of FK (·); A outputs the same bit as W . We con-
sider the two cases for the oracle f :
• When f is chosen from F_K(·, ·), the documents submitted to W are distributed identically to the output of MultiBit, by the definition of the construction. Thus

Pr[A^{F_K}(1^k) = 1] = Pr[W^{ST}(1^k) = 1] .

• When f is a uniformly chosen function, the documents submitted to W are distributed identically to samples from the channel distribution, so that

Pr[A^f(1^k) = 1] = Pr[W^{CT}(1^k) = 1] .

Combining the two cases,

Adv^prf_{F,A}(k) = |Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]|
= |Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]|
= Adv^ss_{MultiBit,C,W}(k)
Corollary 3.14. If FK (·, ·) is pseudorandom then MultiBit is universally stegano-
graphically secret against chosen-hiddentext attacks.
Proof. If F is pseudorandom, then its PRF insecurity is negligible in k. The definition of insecurity and Theorem 3.13 imply that for any cover channel C, the advantage of a warden will be negligible in k. This, in turn, implies the corollary.
Having extended our construction to use multiple-bit messages, we can now re-
move the requirement for Alice and Bob to share a synchronized counter N . This
construction will utilize the same setup as the previous constructions, except that
Alice and Bob now share a second key κ ∈ {0,1}^k to a pseudorandom function
G : {0,1}^k × D^k → {0,1}^{d/2}.
The NoState stegosystem works by choosing a long sequence from C_h (long enough that it is unlikely to repeat in the chosen-hiddentext attack game) and using it to derive a value N, which is then used as the state for the MultiBit stegosystem. This value is always a multiple of 2^{d/2}, so that if the value derived from the long sequence never repeats, then any message of length at most 2^{d/2} will never use a value of N used by another message.
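A minimal sketch of NoState's counter derivation, under stated assumptions (HMAC-SHA256 stands in for the PRF G, d = 32 is chosen only so the numbers stay small, and all names are ours):

```python
import hashlib
import hmac
import random

D_BITS = 32          # counter width d (assumption: small, for illustration)
HALF = D_BITS // 2   # d/2

def prf_G(kappa, docs):
    """G_kappa: maps a k-document prefix to a (d/2)-bit value; instantiated
    (as an assumption) with HMAC-SHA256 truncated to d/2 bits."""
    tag = hmac.new(kappa, "|".join(docs).encode(), hashlib.sha256).digest()
    return int.from_bytes(tag[:4], "big") % (1 << HALF)

def derive_counter(kappa, prefix_docs):
    """NoState's trick: N = G_kappa(prefix) * 2^(d/2). Distinct derived
    values give counter ranges 2^(d/2) apart, so messages of up to 2^(d/2)
    bits never reuse a counter as long as the derived values never collide."""
    return prf_G(kappa, prefix_docs) << HALF

rng = random.Random(1)
kappa = b"second shared key for G"
prefix = [f"doc{rng.randrange(10**6)}" for _ in range(8)]  # long channel draw
N = derive_counter(kappa, prefix)
```

Bob recomputes the same N from the transmitted prefix, so no synchronized state is needed; the birthday bound on colliding prefixes is exactly the q(q−1)/2 term in the security bound below.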
where γ = 2((1/4β)(1 − 2^{−α/β}))^2 and T_SE is the time required to execute the inner loop of OneBit.Encode.
Proof. The theorem follows directly from Theorem 3.12.

The security of NoState is then bounded as follows:

InSec^ss_{NoState,C}(t, q, µ, k) ≤ InSec^prf_F(t + qT_G + ℓµT_SE, 2ℓµ, k)
+ InSec^prf_G(t + ℓµ, q, k)
+ (q(q − 1)/2)(2^{−d/2} + 2^{−αk/β})
Proof. We reformulate the CT oracle in the chosen-hiddentext attack game so that the
oracle has a key κ ← Uk and evaluates Gκ on the first k documents of its reply (S, T )
to every query. Let NC denote the event that the values Gκ (S1 ), . . . , Gκ (Sq ) are all
distinct during the chosen-hiddentext attack game and let C denote the complement
of NC.
Let W be any adversary in W(t, q, µ), and assume without loss of generality that Pr[W^{ST}(1^k) = 1] > Pr[W^{CT}(1^k) = 1]. We wish to bound W's advantage against the stegosystem NoState:

Adv^ss_{NoState,C,W}(k) = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
= Pr[W^{ST}(1^k) = 1|NC] Pr[NC] + Pr[W^{ST}(1^k) = 1|C] Pr[C]
− Pr[W^{CT}(1^k) = 1|NC] Pr[NC] − Pr[W^{CT}(1^k) = 1|C] Pr[C]
≤ Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] + Pr[C]
We will show that for any W we can define an adversary X such that

Adv^ss_{MultiBit,C,X}(k) ≥ Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] ,

and likewise that

Adv^ss_{MultiBit,C,X}(k) = Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] ,

and since X makes as many queries (of the same length) as W and runs in time t + qT_G, we have that

Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] ≤ InSec^ss_{MultiBit,C}(t + qT_G, q, µ, k)
≤ InSec^prf_F(t + qT_G + ℓµT_SE, 2ℓµ, k)
Consider a game played with the warden W in which a random function f is used in place of the function G_κ, and let C_f denote the same event as C in the previous game. Let S_1, . . . , S_q denote the k-document prefixes of the sequences returned by the oracle in the chosen-hiddentext attack game and let N_i = f(S_i). Then the event C_f happens when there exist i ≠ j such that N_i = N_j, or equivalently f(S_i) = f(S_j); and this event happens when S_i = S_j or S_i ≠ S_j ∧ f(S_i) = f(S_j). Thus for a random f,

Pr[C_f] = Pr[∨_{i<j≤q} ((S_i = S_j) ∨ (S_i ≠ S_j ∧ f(S_i) = f(S_j)))]
≤ Σ_{i<j≤q} (Pr[S_i = S_j] + Pr[f(S_i) = f(S_j) ∧ (S_i ≠ S_j)])
≤ Σ_{i<j≤q} (Pr[S_i = S_j] + 2^{−d/2})
≤ (q(q − 1)/2)(2^{−αk/β} + 2^{−d/2})

A standard argument constructs a PRF adversary A against G with

Adv^prf_{G,A}(k) ≥ |Pr[C] − Pr[C_f]| .
Pr[A^f(1^k) = 1] = Pr[C_f], which satisfies the claim. To complete the proof, we combine the bounds above.
The following protocol also satisfies our definition of universal steganographic secrecy. This protocol (up to small differences) is not new and can be found in [6]; an information-theoretic version of the protocol can also be found in [16].

Let E_K(·, ·) and D_K(·) denote the encryption and decryption functions for a cryptosystem which is indistinguishable from random bits under chosen plaintext attack (i.e., IND$-CPA) [54]. Suppose Alice and Bob share a key K ∈ {0,1}^k, and a function f such that ∆(f(C_h), U_1) ≤ ε for any h. One example of such a function would be a uniformly chosen element of a universal hash family mapping D → {0,1}; then when C is (α, β)-informative, we would have ε ≤ 2^{1−Ω(α/2β)} except with negligible probability. The following procedures allow encoding and decoding of messages in a manner which is steganographically secret under chosen hiddentext attack for the channel distribution C.
Construction 3.18. UHash Stegosystem
The idea behind this construction is also simple. First, the encoding algorithm encrypts the hiddentext m to get a ciphertext c = E_K(m). Next, it splits the ciphertext c into single bits. Then it draws documents from the channel until it finds s_1 such that f(s_1) = c_1. After that it finds s_2 such that f(s_2) = c_2, and s_3, and so on. This continues until all bits of c have been encoded. The decoding algorithm simply applies f to all the s_i's to get c and then decrypts with D_K to recover the plaintext. Note that the execution time for SD is essentially linear in l and the execution time for SE is expected O(l). Also note that our assumption of an "unbiased" function implies that an error-correcting code as in Construction 3.15 is unnecessary: the probability of failure here is at most (1/2 + ε)^{|K|}. In the following, we will let ℓ(q, l) denote the total number of bits of ciphertext returned by encrypting q plaintexts of total length l.
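The encode/decode loop can be sketched as follows. This is a toy illustration: an HMAC-derived pad stands in for the IND$-CPA encryption E_K, a hash parity bit stands in for the unbiased function f, the word list stands in for the channel, and all names are ours.

```python
import hashlib
import hmac
import random

def f_bit(doc):
    """Stand-in for the unbiased function f: parity bit of SHA-256(doc).
    The thesis assumes Delta(f(C_h), U_1) <= eps; a hash parity is only a
    heuristic approximation of that guarantee."""
    return hashlib.sha256(doc.encode()).digest()[0] & 1

def stream_bits(key, n):
    # Toy IND$-CPA stand-in: pseudorandom pad from HMAC in counter mode.
    out, i = [], 0
    while len(out) < n:
        block = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        out.extend((byte >> j) & 1 for byte in block for j in range(8))
        i += 1
    return out[:n]

def uhash_encode(key, msg_bits, rng):
    pad = stream_bits(key, len(msg_bits))
    cipher = [m ^ p for m, p in zip(msg_bits, pad)]
    stego = []
    for c in cipher:                  # one rejection-sampled document per bit
        while True:
            d = f"word{rng.randrange(10**6)}"   # draw d <- C_h (toy channel)
            if f_bit(d) == c:
                stego.append(d)
                break
    return stego

def uhash_decode(key, stego):
    cipher = [f_bit(d) for d in stego]
    pad = stream_bits(key, len(cipher))
    return [c ^ p for c, p in zip(cipher, pad)]

rng = random.Random(3)
key = b"shared key"
msg = [1, 0, 1, 1, 0, 0, 1, 0]
stego = uhash_encode(key, msg, rng)
```

Because f is (nearly) unbiased, each inner loop terminates after an expected two draws, matching the expected O(l) encoding time noted above.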
Lemma 3.19. For any warden W ∈ W(t, q, l), we can construct a corresponding adversary A where

Adv^ss_{UHash,C,W}(k) ≤ Adv^cpa_{E,A}(k) + εℓ(q, l) .

The running time of A is at most t + O(kl), and A makes q encryption queries for a total of l bits of plaintext.
Proof. We are given a program W for distinguishing between the output of construc-
tion 3.18 and the ordinary traffic of the channel, and an oracle for sampling blocks
from that channel. We construct a program A which plays the CPA game — distin-
guishing an EK oracle from a uniform $ oracle — with nearly the same advantage as
W . A simply runs W , responding to his queries by executing the encoding procedure
UHash.Encode using its oracle in place of EK . Consider the following two cases:
• O(m) = EK (m). Then the stegotexts output by the encoding procedure will be
identically distributed to stegotexts resulting from the normal use of construc-
tion 3.18.
• O(m) = $(m) is chosen uniformly from strings of appropriate length. Then the stegotexts output by the encoding procedure will be εℓ(q, l)-statistically close to samples from C_h. To see that this is so, imagine instead that the ith bit of the ciphertext, c_i, was chosen so that Pr[c_i = 0] = Pr[f(C_{h_i}) = 0]. In this case the ith stegotext would come from a distribution identical to C_{h_i}. But since ∆(c_i, U_1) ≤ ε, it must be the case that ∆(s_i, C_{h_i}) ≤ ε as well, by Proposition 2.4.
Thus A can simply use the decision of W to gain advantage close to that of W .
More formally,

Adv^cpa_{E,A}(k) = Pr[A^{E_K}(1^k) = 1] − Pr[A^$(1^k) = 1]
= Pr[W^{ST}(1^k) = 1] − Pr[A^$(1^k) = 1]
≥ Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1] − εℓ(q, l)
= Adv^ss_{UHash,C,W}(k) − εℓ(q, l)
Theorem 3.20. InSec^ss_{UHash,C}(t, q, l, k) ≤ InSec^cpa_E(t + O(kl), q, l, k) + εℓ(q, l).
Proof. The theorem follows from Lemma 3.19 and the definition of insecurity.
3.3 Necessary Conditions for Steganography
The previous section demonstrates that, relative to an oracle for C, the existence of one-way functions is sufficient for the existence of secure steganography. In this section we will explore weaker definitions of steganographic secrecy, and establish two results. First, one-way functions are necessary for steganography; thus, relative to a channel oracle, the existence of one-way functions and secure steganography are equivalent. Second, we will show that in the "standard model," without access to a channel oracle, the existence of a secure stegosystem implies the existence of a program which samples from C_h; and thus in the standard model, secure steganography for C exists if and only if C_h is efficiently sampleable.
3.3.1 Steganography implies one-way functions
To strengthen our result, we develop the weaker notion of security against known-hiddentext attacks (KHA). In an (l, µ)-KHA attack against distribution D, the adversary is given a history h of length l, a hiddentext m drawn from D^µ, and a sequence of documents s ∈ D^{|SE(K,m,h)|}. The adversary's task is to decide whether s ← C_h or s ← SE(K, m, h). We define the KHA advantage of W by

Adv^{kha-D}_{S,C,W}(k, l, µ) = |Pr[W(h, m, SE(K, m, h)) = 1] − Pr[W(h, m, C_h^{|SE(K,m,h)|}) = 1]|

and say that S is secure against known-hiddentext attack with respect to D and C (SS-KHA-D-C) if for every PPT W and all polynomially-bounded l, µ, Adv^{kha-D}_{S,C,W}(k, l(k), µ(k)) is negligible in k.
2. f(U_k) ≈ g(U_{k′})

Theorem 3.22. ([33], Lemma 4.16) If there exists a false entropy generator, then there exists a pseudorandom generator.
42
Proof. We will show how to construct a false entropy generator from S.Encode, which when combined with Theorem 3.22 will imply the result.

Consider the function f which draws a hiddentext m of length |K|^2 from D and outputs (SE(K, m, ε), m). Likewise, consider the function g which draws a hiddentext m of length |K|^2 from D and has the output distribution (C_ε^{|SE(K,m,ε)|}, m). Because S is SS-KHA-D-C secure, it must be the case that f(U_k) ≈ g(U_{k′}). Thus f and g satisfy condition (1) from Definition 3.21.
Now, consider H_S(C_ε^{|SE(K,m,ε)|}) versus H_S(SE(K, m, ε)). We must have one of three cases:

1. H_S(C_ε^{|SE(K,m,ε)|}) > H_S(SE(K, m, ε)); in this case the program that samples from C_ε is a false entropy generator and we are done.

2. H_S(C_ε^{|SE(K,m,ε)|}) < H_S(SE(K, m, ε)); in this case SE is a false entropy generator, and again we are done.

3. H_S(C_ε^{|SE(K,m,ε)|}) = H_S(SE(K, m, ε)); in this case, we have that

H_S(m | C_ε^{|SE(K,m,ε)|}) = H_S(m) ,

since m is drawn independently of the channel, whereas

H_S(m | SE(K, m, ε)) ≤ (1 + ν)|K|

for a negligible function ν. To see that this is the case, notice that m = SD(K, SE(K, m, ε)) and so is determined (up to a negligible probability) by K, and H_S(K) = |K|. Thus asymptotically, we have that H_S(g(U_{k′})) > H_S(f(U_k)), and f is a false entropy generator relative to an oracle for C.
Corollary 3.24. Relative to an oracle for C, secure steganography for C exists if and
only if one-way functions exist.
Proof. The corollary follows from Theorem 3.23 and the results of Section 3.2 and [33].
3.3.2 Sampleable Channels are necessary
A channel C is efficiently sampleable if there is a probabilistic polynomial-time algorithm M such that, for every history h, the statistical distance between M(1^k, h) and C_h is negligible in k. Notice that for any efficiently sampleable channel C, the results of the previous sections prove that secure steganography with respect to C exists if and only if one-way functions exist in the standard model, i.e., without assuming oracle access to the channel C. Here we will introduce a very weak notion of security with respect to C and show that if secure steganography exists for C in the standard model, then C is efficiently sampleable.
A weaker attack yet than the KHA attack is the known-distribution attack (KDA) game: in an l-KDA attack against distribution D, the adversary is given a history h of length l, and a sequence of documents s ∈ D^{|SE(K,D,h)|}. The adversary's task is to decide whether s ← C_h or s ← SE(K, D, h). We define the KDA advantage of W by

Adv^{kda-D}_{S,C,W}(k, l) = |Pr_{h←C_ε^l, m←D}[W(SE(K, m, h)) = 1] − Pr_{h←C_ε^l}[W(C_h^ℓ) = 1]|

and say that S is secure against known-distribution attack with respect to D and C (SS-KDA-D-C) if for every PPT W and all polynomially-bounded l, Adv^{kda-D}_{S,C,W}(k, l(k)) is negligible in k. This attack is weaker yet than a KHA attack in that the length of the hiddentext is shorter and the hiddentext is unknown to W.
Theorem 3.25. If there exists an efficiently sampleable D such that there is a SS-
KDA-D-C secure stegosystem S in the standard model, then C is efficiently sampleable.
Proof. Consider the program C_S with the following behavior: on input (1^k, h), C_S picks K ← {0,1}^k, picks m ← D, and returns the first document of S.Encode(K, m, h). Consider any PPT distinguisher A. We will show that the KDA adversary W which passes the first document of its input to A and outputs A's decision has at least the advantage of A. This is because when W's input is drawn from SE, the input it passes to A is distributed exactly according to C_S(1^k, h); and when W's input is drawn from C_h,
the input it passes to A is distributed exactly according to C_h:

Adv^{kda-D}_{S,C,W}(k, |h|) = |Pr[W(SE(K, m, h)) = 1] − Pr[W(C_h) = 1]|
= |Pr[A(1^k, C_S(1^k, h)) = 1] − Pr[A(1^k, C_h) = 1]| .
Chapter 4
Public-Key Steganography
The results of the previous chapter assume that the sender and receiver share a secret,
randomly chosen key. In the case that some exchange of key material was possible
before the use of steganography was necessary, this may be a reasonable assumption.
In the more general case, two parties may wish to communicate steganographically,
without prior agreement on a secret key. We call such communication public key
steganography. Whereas previous work has shown that symmetric-key steganography
is possible – though inefficient – in an information-theoretic model, public-key steganography is information-theoretically impossible. Thus our complexity-theoretic formu-
lation of steganographic secrecy is crucial to the security of the constructions in this
chapter.
In Section 4.1 we will introduce some required basic primitives from the theory
of public-key cryptography. In Section 4.2 we will give definitions for public-key
steganography and show how to use the primitives to construct a public-key stegosys-
tem. Finally, in Section 4.3 we introduce the notion of steganographic key exchange
and give a construction which is secure under the Integer Decisional Diffie-Hellman
assumption.
4.1 Public key cryptography
Our results build on several well-established cryptographic assumptions from the the-
ory of public-key cryptography. We will briefly review them here, for completeness.
Let P and Q be primes such that Q divides P − 1, let Z*_P be the multiplicative group of integers modulo P, and let g ∈ Z*_P have order Q. Let A be an adversary that takes as input three elements of Z*_P and outputs a single bit. Define the DDH advantage of A over (g, P, Q) as

Adv^ddh_A(g, P, Q) = |Pr_{a,b}[A(g^a, g^b, g^{ab}, g, P, Q) = 1] − Pr_{a,b,c}[A(g^a, g^b, g^c, g, P, Q) = 1]| ,

where a, b, c are chosen uniformly at random from Z_Q.
Adv^ow_{Π,A}(k) = Pr_{(π,τ)←G(1^k), x←U_k}[A(π(x)) = x] .

Define the insecurity of Π by InSec^ow_Π(t, k) = max_{A∈A(t)} Adv^ow_{Π,A}(k), where A(t)
denotes the set of all adversaries running in time t(k). We say that Π is a trap-
door one-way permutation family if for every probabilistic polynomial-time (PPT) A,
Advow
Π,A (k) is negligible in k.
Trapdoor one-way predicates
Adv^tp_{P,A}(k) = Pr_{(p,S_p)←G(1^k), x←D_p}[A(x, S_p) = p(x)] .

InSec^tp_P(t, k) = max_{A∈A(t)} Adv^tp_{P,A}(k) ,
where A(t) denotes the set of all adversaries running in time t(k). We say that P is a
trapdoor one-way predicate family if for every probabilistic polynomial-time (PPT)
A, Advtp
P,A (k) is negligible in k.
Notice that one way to construct a trapdoor one-way predicate is to utilize the Goldreich–Levin hard-core bit [28] of a trapdoor one-way permutation. That is, for a permutation family Π, the associated trapdoor predicate family P_Π works as follows: the predicate p_π has domain Dom(π) × {0,1}^k, and is defined by p(x, r) = π^{-1}(x) · r, where · denotes the vector inner product on GF(2)^k. [28] prove that there exist polynomials such that InSec^tp_{P_Π}(t, k) ≤ poly(InSec^ow_Π(poly(t), k)).
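The Goldreich–Levin predicate can be illustrated with a toy (emphatically not one-way) permutation; only the formula p(x, r) = π^{-1}(x) · r is from the text, and the rest is our scaffolding:

```python
import random

K = 8                               # toy domain {0,1}^8
rng = random.Random(0)
perm = list(range(1 << K))
rng.shuffle(perm)                   # toy permutation pi -- NOT one-way!
inv = [0] * (1 << K)
for i, y in enumerate(perm):
    inv[y] = i                      # the "trapdoor": an inverse table

def dot_gf2(a, b):
    # Inner product over GF(2): parity of popcount(a AND b).
    return bin(a & b).count("1") & 1

def gl_predicate(x, r):
    """p(x, r) = pi^{-1}(x) . r. With a genuine trapdoor permutation this
    is hard to predict from (x, r) alone, by the Goldreich-Levin theorem;
    with the trapdoor (the inverse) it is trivial to evaluate."""
    return dot_gf2(inv[x], r)

# Sanity check: the predicate agrees with the preimage's inner product.
assert gl_predicate(perm[5], 0b1011) == dot_gf2(5, 0b1011)
```

A real instantiation would replace the shuffled table by, e.g., the RSA permutation of the next subsection, keeping only the inner-product step unchanged.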
We will require public-key encryption schemes that are secure in a slightly non-
standard model, which we will denote by IND$-CPA in contrast to the more standard
IND-CPA. The main difference is that security against IND$-CPA requires the output
of the encryption algorithm to be indistinguishable from uniformly chosen random
bits, whereas IND-CPA only requires the output of the encryption algorithm to be
indistinguishable from encryptions of other messages.
• E.Generate : 1k → PKk × SKk generates (public, secret) key pairs (P K, SK).
We will abbreviate E.Generate(1k ) by G(1k ), when it is clear which encryption
scheme is meant.
• E.Encrypt : PK × {0, 1}∗ → {0, 1}∗ uses a public key to transform a plaintext
into a ciphertext. We will abbreviate E.Encrypt(P K, ·) by EP K (·).
• E.Decrypt : SK × {0, 1}∗ → {0, 1}∗ uses a secret key to transform a cipher-
text into the corresponding plaintext. We will abbreviate E.Decrypt(SK, ·) by
DSK (·).
such that for all key pairs (PK, SK) ∈ G(1^k), Decrypt(SK, Encrypt(PK, m)) = m.
To formally define the security condition for a public-key encryption scheme, con-
sider a game in which an adversary A is given a public key drawn from G(1k ) and
chooses a message mA . Then A is given either EP K (mA ) or a uniformly chosen string
of the same length. Let A(t, l) be the set of adversaries A which produce a message
of length at most l(k) bits and run for at most t(k) time steps. Define the IND$-CPA
advantage of A against E as
Adv^cpa_{E,A}(k) = Pr_{PK}[A(PK, E_{PK}(m_A)) = 1] − Pr_{PK}[A(PK, U_{|E_{PK}(m_A)|}) = 1] ,

and define InSec^cpa_E(t, l, k) = max_{A∈A(t,l)} Adv^cpa_{E,A}(k). E is called indistinguishable from random bits under chosen plaintext attack (IND$-CPA) if for every probabilistic polynomial-time (PPT) A, Adv^cpa_{E,A}(k) is negligible in k.

Trapdoor one-way predicates exist if there exist trapdoor one-way permutations on {0,1}^k, for example.
IND$-CPA security follows from the pseudorandomness of the bit sequence b_1, . . . , b_l generated by the scheme and the fact that x_l is uniformly distributed in {0,1}^k.
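The scheme alluded to in this remark — mask the message with the hardcore-bit sequence b_1, …, b_l obtained by iterating the permutation, and transmit the final value x_l — can be illustrated with toy RSA parameters. This is our hedged reconstruction, not the thesis's exact construction; it relies on the fact that the least-significant bit is a hardcore bit for RSA, and it ignores the Z_N-versus-bitstring bias that the following constructions address.

```python
import random

# Toy RSA permutation over Z_N; real use needs large random primes.
P_, Q_ = 61, 53
N = P_ * Q_                            # 3233
E = 17                                 # public exponent
D = pow(E, -1, (P_ - 1) * (Q_ - 1))    # trapdoor (private) exponent

def encrypt(pub_n, pub_e, msg_bits, rng):
    """Iterate the permutation from a random seed, XOR its hardcore bits
    (lsb, hardcore for RSA) into the message; also output the last x."""
    x = rng.randrange(1, pub_n)
    cipher = []
    for m in msg_bits:
        cipher.append(m ^ (x & 1))     # mask with b_i = lsb(x_i)
        x = pow(x, pub_e, pub_n)       # x_{i+1} = x_i^e mod N
    return cipher, x                   # (masked bits, x_l)

def decrypt(priv_d, n, cipher, x_last):
    """Walk the permutation backwards with the trapdoor, regenerating
    the same hardcore-bit pad."""
    x, pad = x_last, []
    for _ in cipher:
        x = pow(x, priv_d, n)          # step back: x_i = x_{i+1}^d mod N
        pad.append(x & 1)
    pad.reverse()                      # pad was recovered last-to-first
    return [c ^ b for c, b in zip(cipher, pad)]

rng = random.Random(11)
msg = [0, 1, 1, 0, 1]
cipher, x_l = encrypt(N, E, msg, rng)
```

Since x_l is a uniformly distributed element of Z_N and the b_i are pseudorandom, the ciphertext looks like random bits once the Z_N bias is removed, which is exactly the job of PBRM below.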
RSA-based construction
The RSA function EN,e (x) = xe mod N is believed to be a trapdoor one-way permu-
tation family when N is selected as the product of two large, random primes. The
following construction uses Young and Yung’s Probabilistic Bias Removal Method
(PBRM) [65] to remove the bias incurred by selecting an element from Z∗N rather
than Uk .
The IND$-CPA security of the scheme follows from the correctness of PBRM and the
fact that the least-significant bit is a hardcore bit for RSA. Notice that the expected
number of repeats in the encryption routine is at most 2.
DDH-based construction
Let E(·) (·), D(·) (·) denote the encryption and decryption functions of a private-key
encryption scheme satisfying IND$-CPA, keyed by κ-bit keys, and let κ ≤ k/3. (We
give an example of such a scheme in Chapter 2.) Let H_k be a family of pairwise-independent hash functions H : {0,1}^k → {0,1}^κ. We let P be a k-bit prime (so 2^{k−1} < P < 2^k), and let P = rQ + 1 where (r, Q) = 1 and Q is also a prime. Let g generate Z*_P and ĝ = g^r mod P generate the unique subgroup of order Q.
of the following scheme follows from the Decisional Diffie-Hellman assumption, the
leftover-hash lemma, and the security of (E, D):
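The construction itself is not reproduced in this copy, but the proof below pins down its central trick: a subgroup element ĝ^a is re-randomized into a (near-)uniform element of Z*_P via s = (ĝ^a)^{r̂} g^{βQ}, and recovered as s^r. A toy Python sketch of just this encode/decode step (parameters and names are ours):

```python
import random

# Toy parameters; a real instantiation needs a large prime P = R*Q + 1.
P, Q, R = 23, 11, 2          # P = R*Q + 1, Q prime, gcd(R, Q) = 1
g = 5                        # generator of Z_P^* (order 22)
g_hat = pow(g, R, P)         # generator of the order-Q subgroup
R_HAT = pow(R, -1, Q)        # r-hat with R * R_HAT = 1 mod Q

rng = random.Random(2)

def encode(u, rng):
    """Embed a subgroup element u = g_hat^a into Z_P^*:
    s = u^R_HAT * g^(beta*Q) for beta <- Z_R. Over random a and beta,
    s is uniform in Z_P^*; the construction additionally rejects large s
    to obtain uniform-looking k-bit strings."""
    beta = rng.randrange(R)
    return (pow(u, R_HAT, P) * pow(g, beta * Q, P)) % P

def decode(s):
    # s^R = u^(R*R_HAT) * g^(beta*Q*R) = u * g^(beta*(P-1)) = u  (mod P)
    return pow(s, R, P)

for a in range(Q):
    u = pow(g_hat, a, P)
    assert decode(encode(u, rng)) == u
```

The exponent arithmetic in `decode`'s comment is exactly the cancellation the proof exploits when it computes s = (ĝ^y)^{r̂} g^{βQ}.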
The security proof considers two hybrid encryption schemes: H_1 replaces the value (ĝ^a)^b by a random element ĝ^c of the subgroup of order Q, and H_2 replaces K by a random draw from {0,1}^κ. Clearly, distinguishing H_2 from random bits requires distinguishing some E_K(m) from random bits. The Leftover Hash Lemma gives that the statistical distance between H_2 and H_1 is at most 2^{−κ}. Thus

Adv^{H_1,$}_A(k) ≤ InSec^cpa_E(t, κ) + 2^{−κ} .
Finally, we show that any distinguisher A for H_1 from the output of Encrypt with advantage ε can be used to construct a distinguisher B that solves the DDH problem with advantage at least ε/2. B takes as input a triple (ĝ^x, ĝ^y, ĝ^z) and attempts to decide whether z = xy, as follows. First, B computes r̂ as the least integer such that rr̂ = 1 mod Q, and then picks β ← Z_r. Then B computes s = (ĝ^y)^{r̂} g^{βQ}. If s > 2^{k−1}, B outputs 0. Otherwise, B submits ĝ^x to A to get the message m_A, draws H ← H_k, and outputs the decision of A(ĝ^x, H‖s‖E_{H(ĝ^z)}(m_A)). We claim that:
• There is exactly one exponent in Z_{P−1} that satisfies these conditions, for every y and β. Thus s is uniformly chosen.
• B halts and outputs 0 with probability at most 1/2 over input and random choices; and conditioned on not halting, the value s is uniformly distributed in {0,1}^{k−1}. This is true because 2^{k−1} < P < 2^k, by assumption.
• When z = xy, the input H‖s‖E_{H(ĝ^z)}(m_A) is selected exactly according to the output of Encrypt(ĝ^x, m_A). This is because s encodes a uniform element whose exponent is congruent to y mod Q, so the derived key H(ĝ^z) = H(ĝ^{xy}) is distributed exactly as the key computed by Encrypt.
• When z ≠ xy, the input H‖s‖E_{H(ĝ^z)}(m_A) is selected exactly according to the output of H_1, by construction.
Thus,

Pr[B(ĝ^x, ĝ^y, ĝ^{xy}) = 1] = (2^{k−1}/P) · Pr[A(ĝ^x, Encrypt(ĝ^x, m_A)) = 1] ,

and

Pr[B(ĝ^x, ĝ^y, ĝ^z) = 1] = (2^{k−1}/P) · Pr[A(ĝ^x, H_1(m_A)) = 1] .

And thus Adv^{ddh}_B(ĝ, P, Q) ≥ ε/2. Thus, we have that overall, the stated bound on the insecurity of the scheme follows.
We will first give definitions of public-key stegosystems and security against chosen-
hiddentext attack, and then give a construction of a public-key stegosystem to demon-
strate the feasibility of these notions. The construction is secure assuming the exis-
tence of a public-key IND$-CPA-secure cryptosystem.
4.2.1 Public-key stegosystems
As with the symmetric case, we will first define a stegosystem in terms of syntax and
correctness, and then proceed to a security definition.
As in the private key case, we will also require that a public-key stegosystem is
correct:
where the randomization is also over any coin tosses of SE, SD, SG and the oracle
to Ch .
Remark. While in general Alice will need to remember the history of documents
transmitted to Bob, it is most desirable if Bob is not required to store the history of
documents he has received from Alice. Some of our protocols require Bob to store
this history, but it is straightforward to rewrite them so that Bob need only remember
a collision-intractable digest of the history. In this paper we are not concerned with
attacks in which Ward attempts to disrupt the communication between Alice and
Bob, so the dependence of decoding on accurate history is chiefly a storage concern.
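The collision-intractable digest mentioned in the remark can be maintained as a simple hash chain; this sketch (with SHA-256 as an assumed collision-intractable hash) shows how Bob can replace the stored history with a constant-size state:

```python
import hashlib

EMPTY = b"\x00" * 32  # digest of the empty history

def update_digest(digest: bytes, document: bytes) -> bytes:
    # Hash-chain update: the new digest commits to both the running digest
    # and the newly received document, so 32 bytes replace the full history.
    return hashlib.sha256(digest + document).digest()

def digest_history(documents) -> bytes:
    d = EMPTY
    for doc in documents:
        d = update_digest(d, doc)
    return d
```

Bob updates the digest once per received document; any party holding the same document sequence recomputes the same value, while finding a different sequence with the same digest would require a hash collision.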
where the probabilities are taken over (PK, SK) ← SG(1^k), (m_W, h_W) ← W(PK), s ← SE(PK, m_W, h_W), the random bits of W and the responses of the oracle C. Define the insecurity of S over channel C by

InSec^{cha}_{S,C}(t, l, k) = max_{W∈W(t,l)} Adv^{cha}_{S,C,W}(k) ,
where W(t, l) denotes the set of all adversaries which choose a message of length at
most l(k) bits, and run in time at most t(k).
4.2.3 Construction
Note that for infinitely many Ch there is no perfectly unbiased function f . As with
construction 3.18, this can be rectified by using a (global) universal hash function.
Lemma 4.7. Any channel C which is always informative can be compiled into a
channel C (k) which admits an efficiently computable function f such that for any
(k) 1
polynomial-length sequence h1 , . . . , hn satisfying PrC [hi ] 6= 0, Pr[f (Chi ) = 1] − 2 is
Proof. Let l(k) = ω(log k). Then the channel C^(k) is simply a distribution on sequences of documents which are elements of D^{l(k)}, and the marginal distributions C^(k)_h are simply C^{l(k)}_h. Because C is always informative, we have that for any h which has non-zero probability, H_∞(C^(k)_h) = ω(log k).

Let h_1, h_2, ..., h_m be any sequence of histories which all have non-zero probability under C^(k), and let f : {0,1}^{m(k)} × D → {0,1} be a universal hash function. Let Y ← U_{m(k)}, B ← U_1, and D_i ← C^(k)_{h_i}. Let L(k) = min_i H_∞(D_i), and note that L(k) = ω(log k). Then the Leftover Hash Lemma ([33]) implies that ∆((Y, f_Y(D_i)), (Y, B)) ≤ 2^{−Ω(L(k))}, from which it is immediate that if we choose Y ← U_{m(k)} once and publicly, then for all 1 ≤ i ≤ m, f_Y will have negligible bias for C_{h_i} except with negligible probability.
Proof. To see that this is so, imagine that the ith bit of the input to Basic Encode, c_i, was chosen so that Pr[c_i = 0] = Pr[f(C_{h_i}) = 0]. In this case the ith document output by Basic Encode will come from a distribution identical to C_{h_i}. But since ∆(c_i, U_1) ≤ ε, it must be the case that ∆(s_i, C_{h_i}) ≤ ε as well, by Proposition 2.4. The statistical distance between the entire sequences must then be at most lε, by the triangle inequality.
Using these lemmata, we will show that public-key steganography is possible in any
channel that is always informative. We note that procedure Basic Encode has a small
probability of failure: Basic Decode(Basic Encode(c, h, k)) might not equal c. This
probability of failure, however, is negligible in k.
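A minimal sketch of the rejection-sampling idea behind Basic Encode and Basic Decode, under toy assumptions (documents are single bytes, f is the parity bit, and the channel oracle is a history-oblivious uniform sampler):

```python
import random

def f(doc: int) -> int:
    # public bias function on documents; parity is an assumed stand-in
    return doc & 1

def channel_sample(history, rng):
    # stand-in for the channel oracle C_h; a real channel depends on history
    return rng.randrange(0, 256)

def basic_encode(bits, history, k, rng):
    # For each ciphertext bit c_i, draw documents from C_h until f matches
    # c_i, giving up after k tries (the small failure probability noted above).
    out = []
    for c in bits:
        for _ in range(k):
            doc = channel_sample(history, rng)
            if f(doc) == c:
                break
        out.append(doc)
        history = history + [doc]
    return out

def basic_decode(docs):
    # the receiver simply re-applies the public function f
    return [f(d) for d in docs]
```

Each emitted document is a genuine channel draw, which is why the output stays close to the covertext distribution when the input bits are close to uniform.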
Let E_{PK}(·) and D_{SK}(·) denote the encryption and decryption algorithms for a public-key cryptosystem E which is indistinguishable from random bits under chosen plaintext attack (IND$-CPA). Let ℓ be the expansion function of E, i.e., |E_{PK}(m)| = ℓ(|m|). The following procedures allow encoding and decoding of messages in a manner which is steganographically secret under chosen hiddentext attack for the channel distribution C:
Lemma 4.10. Assume f is ε-biased on C_h for all h. For any warden W ∈ W(t, l), we can construct an IND$-CPA adversary A where

Adv^{cha}_{CHA,C,W}(k) ≤ Adv^{cpa}_{E,A}(k) + ℓ(l)ε .
Proof. We are given a program W for distinguishing between the output of Construction 4.9 and the ordinary traffic of the channel, and an oracle for sampling blocks from that channel. We construct a program A which plays the IND$-CPA game: distinguishing the E_{PK}(m_W) oracle from U_{ℓ(l)}. A(PK) simply runs W(PK) to get (m_W, h_W) and returns m_W for m_A. Then A(PK, c) uses the oracle C_h to compute s = Basic Encode(c, h_W, k), and returns the output of W(PK, s). Consider the cases for A's input.
by Lemma 4.8.

Combining the cases, we have

Adv^{cha}_{CHA,C,W}(k) = |Pr[W(PK, SE(PK, m_W, h_W)) = 1] − Pr[W(PK, C^ℓ_{h_W}) = 1]| ,

as claimed.

InSec^{cha}_{CHA,C}(t, l, k) ≤ InSec^{cpa}_E(t + O(kl), l, k) + ℓ(l)ε .
In many cases in which steganography might be desirable, it may not be possible for
either Alice or Bob to publish a public key without raising suspicion. In these cases, a
natural alternative to public-key steganography is steganographic key exchange: Alice
and Bob exchange a sequence of messages, indistinguishable from normal communi-
cation traffic, and at the end of this sequence they are able to compute a shared key.
So long as this key is indistinguishable from a random key to the warden, Alice and
Bob can proceed to use their shared key in a symmetric-key stegosystem. In this
section, we will formalize this notion.
We say that S is correct if these algorithms satisfy the property that there exists a
negligible function µ(k) satisfying:
We call the output of SD(1^k, r_a, SE(1^k, r_b)) the result of the protocol, and denote this result by SKE(r_a, r_b). We denote by S(1^k, r_a, r_b) the triple (SE(1^k, r_a), SE(1^k, r_b), SKE(r_a, r_b)).
Alice and Bob perform a key exchange using S by sampling private randomness
ra , rb , asynchronously sending SE(1k , ra ) and SE(1k , rb ) to each other, and using the
result of the protocol as a key. Notice that in this definition a SKEP must be an
asynchronous single-round scheme, ruling out multi-round key exchange protocols.
This is for ease of exposition only.
We remark that, as in our other definitions, W also has access to bidirectional channel
oracles C a , C b .
Let W(t) denote the set of all wardens running in time t. The SKE insecurity of S on bidirectional channel B with security parameter k is given by

InSec^{ske}_{S,B}(t, k) = max_{W∈W(t)} Adv^{ske}_{S,B,W}(k) .

S is (t, ε)-secure for bidirectional channel B if InSec^{ske}_{S,B}(t, k) ≤ ε(k). S is said to be secure if it is (t, ε)-secure for every polynomial t and some negligible ε.
4.3.1 Construction
The idea behind the construction for steganographic key exchange is simple: let g generate Z*_P, let Q be a large prime with P = rQ + 1 and r coprime to Q, and let ĝ = g^r generate the subgroup of order Q. Alice picks values a ∈ Z_{P−1} uniformly at random until she finds one such that g^a mod P has its most significant bit (MSB) set to 0 (so that g^a mod P is uniformly distributed in the set of bit strings of length |P| − 1). She then uses Basic Encode to send all the bits of g^a mod P except for the MSB (which is zero anyway). Bob does the same and sends all the bits of g^b mod P except the most significant one using Basic Encode. Bob and Alice then perform Basic Decode and agree on the key value ĝ^{ab}:
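The arithmetic of the exchange (omitting the Basic Encode/Basic Decode layer) can be checked with toy parameters; here P = 23, Q = 11, r = 2 are assumed stand-ins, and the MSB-0 rejection sampling is as described above:

```python
import random

P, Q, r = 23, 11, 2          # toy: P = r*Q + 1, gcd(r, Q) = 1
g = 5                        # generator of Z_P^*
K_BITS = 5                   # |P| in bits

def sample_exchange_value(rng):
    # pick a until g^a mod P has its most significant bit clear, so the
    # remaining K_BITS-1 bits are (nearly) uniform and safe to Basic_Encode
    while True:
        a = rng.randrange(1, P - 1)
        v = pow(g, a, P)
        if v < 2 ** (K_BITS - 1):
            return a, v

rng = random.Random(7)
a, va = sample_exchange_value(rng)   # Alice's secret / transmitted value
b, vb = sample_exchange_value(rng)   # Bob's secret / transmitted value

# Each party raises the other's value to the r-th power (landing in the
# order-Q subgroup generated by ghat = g^r), then to its own secret exponent.
key_alice = pow(pow(vb, r, P), a, P)
key_bob = pow(pow(va, r, P), b, P)
```

Both computations equal ĝ^{ab}: raising to the r-th power projects onto the order-Q subgroup, after which the usual Diffie-Hellman symmetry applies.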
Lemma 4.15. Let f be ε-biased on B. Then for any warden W ∈ W(t), we can construct a DDH adversary A where

Adv^{ddh}_A(ĝ, P, Q) ≥ (1/4)Adv^{ske}_{SKE,B,W}(k) − 2kε .
Proof. A takes as input a triple (ĝ^a, ĝ^b, ĝ^c) and attempts to decide whether c = ab, as follows. First, A computes r̂ as the least integer such that rr̂ = 1 mod Q, and then picks α, β ← Z_r. Then A computes c_a = (ĝ^a)^{r̂} g^{αQ} and c_b = (ĝ^b)^{r̂} g^{βQ}. If c_a > 2^{k−1} or c_b > 2^{k−1}, A outputs 0. Otherwise, A computes s_a = Basic Encode(c_a) and s_b = Basic Encode(c_b); A then outputs the result of computing W(s_a, s_b, ĝ^c). We claim that:
• The elements c_a, c_b are uniformly chosen elements of Z*_P, when a, b ← Z_Q. To see that this is true, observe that the exponent of s_a, ξ_a = rr̂a + αQ, is congruent to a mod Q and to αQ mod r; and that for uniform α, αQ is also a uniform residue mod r. By the Chinese remainder theorem, there is exactly one element of Z_{rQ} = Z_{P−1} that satisfies these conditions, for every a and α. Thus c_a is uniformly chosen. The same argument holds for c_b.
• A halts and outputs 0 with probability at most 3/4 over input and random choices; and conditioned on not halting, the values c_a, c_b are uniformly distributed in {0,1}^{k−1}. This is true because 2^{k−1}/P > 1/2, by assumption.
• When c = ab, the shared key computed in the protocol is exactly ĝ^c, since (writing rr̂ = γQ + 1 for some integer γ)

c_a^{rb} = (g^{rr̂a+αQ})^{rb} = g^{(γQ+1)rab + rQ(αb)} = g^{rab} = ĝ^c .

• When c ≠ ab, the element ĝ^c is a uniformly chosen element of the order-Q subgroup, independent of c_a and c_b, by construction.
Thus,

Pr[A(ĝ^a, ĝ^b, ĝ^{ab}) = 1] = (2^{k−1}/P)^2 · Pr[W(S(a, b)) = 1] ,

and

|Pr[A(ĝ^a, ĝ^b, ĝ^c) = 1] − Pr_K[W(B, K) = 1]| ≤ 2kε .

Combining these,

InSec^{ske}_{SKE,B}(t, k) ≤ 4 InSec^{ddh}_{ĝ,P,Q}(t + O(k^2)) + 8kε .
Chapter 5
The results of the previous two chapters show that a passive adversary (one who
simply eavesdrops on the communications between Alice and Bob) cannot hope to
subvert the operation of a stegosystem. In this chapter, we consider the notion of an
active adversary who is allowed to introduce new messages into the communications
channel between Alice and Bob. In such a situation, an adversary could have two
different goals: disruption or detection.
5.1 Robust Steganography
Robust steganography can be thought of as a game between Alice and Ward in which
Ward is allowed to make some alterations to Alice’s messages. Ward wins if he can
sometimes prevent Alice’s hidden messages from being read; while Alice wins if she
can pass a hidden message with high probability, even when Ward alters her public
messages. For example, if Alice passes a single bit per document and Ward is unable to change the bit with probability at least 1/2, Alice may be able to use error-correcting codes to reliably transmit her message. It will be important to state the limitations we
impose on Ward, since otherwise he can replace all messages with a new (independent)
draw from the channel distribution, effectively destroying any hidden information. In
this section we give a formal definition of robust steganography with respect to a
limited adversary.
1. W is given oracle access to the channel distribution C and to SE(K, ·, ·). W
may access these oracles at any time throughout the game.
Succ^R_{S,W}(k) = Pr[SD(K, s′_W, h_W) ≠ m_W] ,

where the probability is taken over the choice of K and the random choices of S and W. Define the failure rate of S by

Fail^R_S(t, q, l, µ, k) = max_{W∈W(R,t,q,l,µ)} Succ^R_{S,W}(k) ,

where W(R, t, q, l, µ) denotes the set of all R-bounded active wardens that submit at most q(k) encoding queries of total length at most l(k), produce a plaintext of length at most µ(k), and run in time at most t(k).
Definition 5.1. A sequence of stegosystems {S_k}_{k∈N} is called substitution robust for C against R if it is steganographically secret for C and there is a negligible function ν(k) such that for every PPT W, for all sufficiently large k, Succ^R_{S,W}(k) < ν(k).
Consider the question of what conditions on the relation R are necessary to allow
communication to take place between Alice and Bob. Surely it should not be the case
that R = D×D, since in this case Ward’s “substitutions” can be chosen independently
of Alice’s transmissions, and Bob will get no information about what Alice has said.
Suppose, for example, that there exist a history h with non-zero probability and a document d_0 such that every document Alice might draw from C_h is related to d_0. Then when h has transpired, Ward can effectively prevent the transfer of information from Alice to Bob by sending the document d_0 regardless of the document transmitted by Alice, because the probability Alice picks a document related to d_0 is 1. That is, after history h, regardless of Alice's transmission d, Ward can replace it by d_0, so seeing d_0 will give Bob no information about what Alice said.
Since we model the attacker as controlling the history h, then, a necessary condition on R and C for robust communication is that

∀h. Pr_C[h] = 0 or max_y Σ_{(x,y)∈R} Pr_{C_h}[x] < 1 .
We denote by I(R, D) the function max_y Σ_{(x,y)∈R} Pr_D[x]. We say that the pair (R, D) is δ-admissible if I(R, D) ≤ δ, and a pair (R, C) is δ-admissible if ∀h. Pr_C[h] = 0 or I(R, C_h) ≤ δ. Our necessary condition states that (R, C) must be δ-admissible for some δ < 1.
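For a finite document set with explicit probabilities, I(R, D) can be computed directly; a small sketch with a hypothetical four-document channel:

```python
def I(R, D):
    # I(R, D) = max over y of the probability mass of documents related to y
    ys = {y for (_, y) in R}
    return max(sum(p for (x, p) in D.items() if (x, y) in R) for y in ys)

# hypothetical toy channel: four documents, uniform distribution
D = {"d1": 0.25, "d2": 0.25, "d3": 0.25, "d4": 0.25}
# a toy relation: d1 is substitutable by d1, d2 by d1 or d2, etc.
R = {("d1", "d1"), ("d2", "d1"), ("d2", "d2"), ("d3", "d3"), ("d4", "d4")}
```

Here the worst substitution target is d1, which is related to half the channel mass, so (R, D) is δ-admissible for δ = 1/2.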
It turns out that this condition (on R) will be sufficient, for an efficiently sam-
pleable channel, for the existence of a stegosystem which is substitution-robust against
R.
In this section we give a stegosystem which is substitution robust against any admissible bounding relation R, under a slightly modified assumption on the channel, and assuming that Alice and Bob know some efficiently computable, δ-admissible relation R′ such that R′ is a superset of R. As with most of our constructions, this stegosystem is not really practical, but it serves as a proof that robust steganography is possible for any admissible relation.
We assume that Alice and Bob share a key K to a pseudorandom function F and a synchronized counter N. We will let n(k) = ω(log k) be a "robustness parameter." We begin with a stegosystem which robustly encodes a single bit.
The idea behind this construction is this: suppose that instead of sharing a key to a pseudorandom function F, Alice and Bob shared two secret documents d_0, d_1 drawn independently from C_h. Then Alice could send Bob the message bit m by sending document d_m, and Bob could recover m by checking to see if the document he received was related (by R′) to d_0 or d_1. Since the adversary is R-bounded and (D, R′) is δ-admissible, the probability of a decoding error (caused either by the adversary or by an accidental draw of d_0, d_1) would be at most δ. Intuitively, ROneBit reduces the probability of decoding error to δ^n by encoding each hiddentext bit n times.
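A toy sketch of the ROneBit idea, with assumed stand-ins: SHA-256 for the PRF F, one-byte documents from a history-oblivious channel, and equality as the admissible relation R′:

```python
import hashlib

def F(key: bytes, counter: int, bit: int) -> int:
    # PRF stand-in: SHA-256 of (key, counter, bit), used as channel randomness
    h = hashlib.sha256(key + counter.to_bytes(8, "big") + bytes([bit]))
    return int.from_bytes(h.digest(), "big")

def channel(history, r: int) -> int:
    # stand-in for the sampleable channel C(1^k, h, r); here history-oblivious
    return r % 256

def related(d1: int, d2: int) -> bool:
    # toy relation R': documents related iff equal
    return d1 == d2

def encode_bit(key: bytes, m: int, history, N: int, n: int):
    # send n documents, each determined by the PRF at the current counter
    docs, h = [], list(history)
    for i in range(n):
        d = channel(h, F(key, N + i, m))
        docs.append(d)
        h.append(d)
    return docs

def decode_bit(key: bytes, docs, history, N: int, n: int):
    # decode to the bit whose n reference documents are all R'-related to
    # the received ones; a warden must break all n, probability <= delta^n
    for b in (0, 1):
        h = list(history)
        if all(related(channel(h, F(key, N + i, b)), docs[i]) for i in range(n)):
            return b
    return None
```

With an R-bounded warden, each substituted document stays R′-related to Alice's, while matching all n references for the wrong bit happens with probability at most δ^n.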
InSec^{ss}_{ROneBit,C}(t, q, q) ≤ InSec^{prf}_F(t + O(nq), nq) .
Proof. Let W be a passive warden which runs in time t, and makes at most q queries
of total length at most q (each query can be only one bit, because of the type). We
construct a PRF adversary A which runs in time t + O(nq) and makes at most nq
queries to F , such that
Adv^{prf}_{F,A}(k) = Adv^{ss}_{S,C,W}(k) .
The PRF adversary takes a function oracle f, and emulates W(1^k), responding to the queries W makes to the encoder SE by using f in place of F_K(·,·). More formally, we define the subroutine SSE^f : {0,1}* × {0,1}* → {0,1}* as follows:
Procedure SSE f :
Input: bit m, history h
for i = 1 . . . n do
set σi = C(1k , h, f (N, m))
increment N
set h = (h, σi )
Output: σ1 , . . . , σn
Then we define A^f(1^k) = W^{SSE^f}(1^k); A's advantage over F is then:

Adv^{prf}_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Pr[W^{ST}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
= Adv^{ss}_{ROneBit,C,W}(k) .

• f is chosen from F_K(·,·). Then the output of SSE^f is distributed identically to the encoding function of ROneBit. That is,
oracle f in place of F_K(·,·) to emulate ROneBit.Encode in responding to the queries of W. Let m_W, s_W be the hiddentext and the stegotext sequence returned by W, respectively. Then A^f returns 1 iff SD^f(s_W, h_W) ≠ m_W. Consider the following two cases for f:
• f is chosen uniformly from all appropriate functions. Then, for each i, the stegotexts σ_i = C(1^k, h_i, f(N + i, m_W)) are distributed independently according to C_{h_i}. Consider the sequence of "alternative stegotexts" d_i = C(1^k, h′_i, f(N + i, 1 − m_W)); each of these is also distributed independently according to C_{h′_i}; and since W is never given access to the d_i, the s_i are independent of the d_i. Now SD will fail (causing A^f(1^k) to output 1) only if the event ∀i. (d_i, s_i) ∈ R′ occurs. Because the d_i are independent of the actions of W, and because (D, R′) is δ-admissible, each event (d_i, s_i) ∈ R′ happens independently with probability at most δ. So the probability of failure is at most δ^n:

Pr[A^f(1^k) = 1] ≤ δ^n .
Adv^{prf}_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Succ^R_{ROneBit,W}(k) − Pr[A^f(1^k) = 1]
≥ Succ^R_{ROneBit,W}(k) − δ^n .
We now show how to extend ROneBit to handle multiple-bit messages. We assume the same setup as previously, i.e., Alice and Bob share a synchronized counter N and a key K to a PRF F : {0,1}^k × {0,1}* → {0,1}^k, and know an efficiently computable, admissible relation R′ ⊇ R. We assume that the "state updates" made by calls to ROneBit are maintained across invocations.
Proof. Suppose we are given a warden W ∈ W(t, q, l) against the stegosystem Robust. Then we can construct a warden X ∈ W(t, l, l) against ROneBit. X^M works by simulating W, responding to each oracle query (m, h) by computing h_0 = h, then σ_{i,1...n} = M(m_i, h_{i−1}) and h_i = (h_{i−1}, σ_{i,1...n}) for 1 ≤ i ≤ |m|, and returning σ_1, ..., σ_{|m|}. Consider the cases for X's oracle M:
Consider the cases for X’s oracle M :
Advss
ST k CT k
ROneBit,C,X (k) = Pr[X (1 ) = 1] − Pr[X (1 ) = 1]
= Pr[W ST (1k ) = 1] − Pr[W CT (1k ) = 1]
= Advss
Robust,C,W (k)
Combining the fact that X makes l queries to ROneBit.Encode and runs in time t + O(l) with the result of Lemma 5.3, we get

Adv^{ss}_{Robust,C,W}(k) ≤ InSec^{prf}_F(t + O(nl), nl, k) .

Lemma 5.8. Fail^R_{Robust}(t, q, l, µ, k) ≤ InSec^{prf}_F(t + O(nl), nl, k) + µδ^n .
• f is chosen uniformly from all appropriate functions. Then, for each i, the stegotexts σ_{i,j} = C(1^k, h_{i,j}, f(N + (i−1)n + j, m_{W,i})) are distributed independently according to C_{h_{i,j}}. Consider the sequence of "alternative stegotexts" d_{i,j} = C(1^k, h′_{i,j}, f(N + (i−1)n + j, 1 − m_{W,i})); each of these is also distributed independently according to C_{h′_{i,j}}; and since W is never given access to the d_{i,j}, the s_{i,j} are independent of the d_{i,j}. Now SD will fail (causing A^f(1^k) to output 1) only if the event ∀j. (d_{i,j}, s_{i,j}) ∈ R′ occurs for some i. Because the d_{i,j} are independent of the actions of W, and because (D, R′) is δ-admissible, each event (d_{i,j}, s_{i,j}) ∈ R′ happens independently with probability at most δ. So the probability of failure for any i is at most δ^n. A union bound then gives us:

Pr[A^f(1^k) = 1] ≤ µδ^n .
Taking the difference of these probabilities, we get:

Adv^{prf}_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Succ^R_{Robust,W}(k) − Pr[A^f(1^k) = 1]
≥ Succ^R_{Robust,W}(k) − µδ^n .
For this construction we will require a symmetric encryption scheme that is indistinguishable from random bits under chosen-ciphertext attack.
IND$-CCA Security
Adv^{cca}_{E,A}(k) = Pr[A^{E_K,D_K}(1^k) = 1] − Pr[A^{$,D_K}(1^k) = 1] ,

InSec^{cca}_E(t, q_e, q_d, µ_e, µ_d, k) = max_{A∈A(t,q_e,q_d,µ_e,µ_d)} Adv^{cca}_{E,A}(k) ,

where A(t, q_e, q_d, µ_e, µ_d) denotes the set of adversaries running in time t, that make q_e queries of µ_e bits to E_K, and q_d queries of µ_d bits to D_K.
E is called indistinguishable from random bits under chosen ciphertext attack (IND$-CCA) if for every PPT A, Adv^{cca}_{A,E}(k) is negligible in k.
Construction. We let E be any IND$-CPA-secure symmetric encryption scheme and
let F : {0, 1}k × {0, 1}∗ → {0, 1}k be a pseudorandom function. We let K, κ ← Uk .
We construct a cryptosystem E as follows:
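The construction itself is not reproduced in this excerpt. One standard way to combine these ingredients, sketched here with assumed hash-based stand-ins (a random-IV XOR stream for the IND$-CPA scheme and HMAC-SHA256 truncated to 16 bytes for the PRF F), is to append a PRF tag to an IND$-CPA ciphertext and reject any ciphertext whose tag does not verify:

```python
import hashlib
import hmac
import os

def prf(kappa: bytes, data: bytes) -> bytes:
    # F_kappa: PRF output looks random, so the tag keeps the IND$ property
    return hmac.new(kappa, data, hashlib.sha256).digest()[:16]

def cpa_encrypt(K: bytes, m: bytes) -> bytes:
    # IND$-CPA stand-in: random IV plus a hash-derived one-time keystream
    iv = os.urandom(16)
    stream = hashlib.sha256(K + iv).digest()[: len(m)]
    return iv + bytes(a ^ b for a, b in zip(m, stream))

def cpa_decrypt(K: bytes, c: bytes) -> bytes:
    iv, body = c[:16], c[16:]
    stream = hashlib.sha256(K + iv).digest()[: len(body)]
    return bytes(a ^ b for a, b in zip(body, stream))

def cca_encrypt(K: bytes, kappa: bytes, m: bytes) -> bytes:
    c = cpa_encrypt(K, m)
    return c + prf(kappa, c)             # ciphertext || tag

def cca_decrypt(K: bytes, kappa: bytes, c: bytes):
    body, tag = c[:-16], c[-16:]
    if not hmac.compare_digest(prf(kappa, body), tag):
        return None                      # reject modified ciphertexts
    return cpa_decrypt(K, body)
```

Rejecting on tag failure is what defeats chosen-ciphertext queries: a decryption oracle answers ⊥ on everything the adversary did not obtain legitimately, except with the probability of a PRF forgery.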
Theorem 5.9.
• E5 , D5 : choose K, κ ← Uk .
If we define the function

Adv^i_A(k) = Pr[A^{E_i,D_i}(1^k) = 1] − Pr[A^{E_{i+1},D_{i+1}}(1^k) = 1] ,

then

Adv^{cca}_{A,E}(k) = Pr[A^{E.E_K,E.D_K}(1^k) = 1] − Pr[A^{$,E.D_K}(1^k) = 1]
= Pr[A^{E_1,D_1}(1^k) = 1] − Pr[A^{E_5,D_5}(1^k) = 1]
≤ Σ^4_{i=1} |Pr[A^{E_i,D_i}(1^k) = 1] − Pr[A^{E_{i+1},D_{i+1}}(1^k) = 1]|
= Σ^4_{i=1} Adv^i_A(k)
B picks K ← Uk and runs A. B uses its function oracle f to respond to A’s queries
as follows:
which gives us

Adv^1_A(k) = Pr[A^{E_1,D_1}(1^k) = 1] − Pr[A^{E_2,D_2}(1^k) = 1]
= Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]
= Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t′, q_e + q_d, k) ,

as claimed.
Adv^{cpa}_{B,E}(k) ≥ Adv^2_A(k) − q_d 2^{−k} .
Let V denote the event that A submits a decryption query that would cause B to halt. Then, conditioned on ¬V, when B's oracle is $, B perfectly simulates E_3, D_3 to A:

Pr[B^$(1^k) = 1] = Pr[A^{E_3,D_3}(1^k) = 1 | ¬V] .

Also, conditioned on ¬V, when B's oracle is E.E_K, B perfectly simulates E_2, D_2 to A:

≤ q_d 2^{−k} + InSec^{cpa}_E(t′, q_e, µ_e, k)
where the last line follows because each decryption query causes B to halt with probability at most 2^{−k}; the union bound gives the result.
Lemma 5.12. Adv^3_A(k) ≤ q_e^2 / 2^k .

Proof. Notice that unless E_3 chooses the same values of (r, c) at least twice, E_3 and E_4 are identical. Denote this event by C. Then we have:

Adv^3_A(k) = Pr[A^{E_3,D_3}(1^k) = 1] − Pr[A^{E_4,D_4}(1^k) = 1]
= Pr[A^{E_3,D_3}(1^k) = 1 | C]Pr[C] + Pr[A^{E_3,D_3}(1^k) = 1 | ¬C]Pr[¬C] − Pr[A^{E_4,D_4}(1^k) = 1 | C]Pr[C] − Pr[A^{E_4,D_4}(1^k) = 1 | ¬C]Pr[¬C]
≤ Pr[C] |Pr[A^{E_3,D_3}(1^k) = 1 | C] − Pr[A^{E_4,D_4}(1^k) = 1 | C]| + Pr[¬C] |Pr[A^{E_3,D_3}(1^k) = 1 | ¬C] − Pr[A^{E_4,D_4}(1^k) = 1 | ¬C]|
= Pr[C] |Pr[A^{E_3,D_3}(1^k) = 1 | C] − Pr[A^{E_4,D_4}(1^k) = 1 | C]|
≤ Pr[C]
≤ \binom{q_e}{2} 2^{−k} ≤ q_e^2 / 2^k
Adv^{prf}_{B,F}(k) = Adv^4_A(k) ,

so by definition of advantage, we get:

Adv^4_A(k) = Pr[A^{E_5,D_5}(1^k) = 1] − Pr[A^{E_4,D_4}(1^k) = 1]
= Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]
= Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t′, q_d, k)
InSec^{scca}_{S,C}(t, q_e, q_d, µ_e, µ_d, k) = max_{W∈W(t,q⃗,µ⃗)} Adv^{scca}_{S,C,W}(k) ,

where W(t, q⃗, µ⃗) denotes the class of all W running in time t which make at most q_e encoding queries of µ_e bits and at most q_d decoding queries of µ_d bits.

We say that S is (t, q⃗, µ⃗, k, ε)-secure against symmetric chosen-covertext attack with respect to C if

InSec^{scca}_{S,C}(t, q⃗, µ⃗, k) ≤ ε ,
Construction
Procedure DEncode:
Input: bits c_1, ..., c_l, history h, bound k, randomness r_1, ..., r_{lk} ∈ {0,1}^k
Let ι = 0
for i = 1 ... l do
  Let j = 0
  repeat:
    compute s_i = C(1^k, h, r_ι); increment j, ι
  until f(s_i) = c_i OR (j > k)
  set h = (h, s_i)
Output: s_1, s_2, ..., s_l
For a given sequence of lk random inputs, this routine has exactly one possible encoding for any message. Building on this routine, we will assume that Alice and Bob share a key K for a symmetric IND$-CCA-secure encryption scheme E = (G, E, D) such that for any key K, for any l ≥ k, Pr[U_{ℓ(l)} ∈ [E_K({0,1}^l)]] ≤ 1/2^k (an example is the scheme of the previous section). We will also assume that G : {0,1}^k → {0,1}^{k×lk} is a pseudorandom generator. Then the following scheme is secure against symmetric chosen-covertext attack:
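The determinism of DEncode, which the chosen-covertext security argument relies on, can be seen in a direct transcription (with assumed toy stand-ins: parity for f and a simple deterministic sampler for the channel):

```python
import random

def f(doc: int) -> int:
    # the public bias function on documents (parity, an assumption)
    return doc & 1

def channel(history, r: int) -> int:
    # stand-in for the sampleable channel C(1^k, h, r): deterministic in (h, r)
    return (r * 2654435761 + len(history)) % 256

def dencode(bits, history, k, randomness):
    # Deterministic rejection sampling: with the same randomness r_1..r_{lk},
    # there is exactly one possible encoding, which Decode can re-check.
    out, iota, h = [], 0, list(history)
    for c in bits:
        j = 0
        while True:
            s = channel(h, randomness[iota])
            j += 1
            iota += 1
            if f(s) == c or j > k:
                break
        h.append(s)
        out.append(s)
    return out
```

Because every draw is a deterministic function of the shared randomness and the history so far, a decoder holding the same randomness can verify that a received stegotext is the unique valid encoding of its ciphertext.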
where t0 ≤ t + O((µe + µd )k).
Clearly Pr[W^{O_1,SD_K}(1^k) = 1] = Pr[W^{O_C,SD_K}(1^k) = 1] and Pr[W^{O_4,SD_K}(1^k) = 1] = Pr[W^{SE_K,SD_K}(1^k) = 1]. Thus

Adv^{scca}_{S,C,W}(k) = Pr[W^{O_4,SD_K}(1^k) = 1] − Pr[W^{O_1,SD_K}(1^k) = 1]

Adv^{prg}_{G,A}(k) ≥ (1/q_e) Adv^2_W(k) .
A works as follows: first, A picks a key K ← Uk to use in responding to the queries
W makes to SDK . Suppose A is given as input qe strings r1 , . . . , rqe of length k × lk
and asked to decide whether they are all samples from Uk×lk or samples from G(Uk ).
Then A can achieve advantage precisely Adv2W (k) by emulating W , responding to
its decoding queries using K, and responding to the ith encoding query (m, h) by
drawing c ← U`(|m|+k) and giving the response DEncode(c, h, k, ri ). If all of the ri are
drawn from U_{k×lk}, then A perfectly simulates O_1 to W, and if all are drawn from G(U_k), A perfectly simulates O_2. Thus A's advantage in distinguishing G(U_k)^{q_e} and U^{q_e}_{k×lk} is exactly Adv^2_W(k). The lemma follows from this fact and Proposition 2.6 (a straightforward hybrid argument).
Proof. We will construct an adversary A that plays the chosen-ciphertext attack game
against E with advantage
Adv^{cca}_{A,E}(k) ≥ Adv^3_W(k) .
• on decoding query (s, h), A computes c = Basic Decode(s, h); if c was previously generated by an encoding query, A returns ⊥; otherwise A uses its decryption oracle to compute r‖m = D_K(c). If c ≠ ⊥ and s = DEncode(c, h, k, G(r)), A returns m; otherwise A returns ⊥.
In other words, A simulates running the routines sCCA.Encode and sCCA.Decode with
its oracles; with the exception that because A is playing the IND$-CCA game, he is
not allowed to query DK on the result of an encryption query: thus a decoding query
that has the same underlying ciphertext c must be dealt with specially.
This is because when c = E_K(r‖m), the test s = DEncode(c, h, k, G(r)) would fail anyway.
Pr[A^{$,D_K}(1^k) = 1] = Pr[A^{$,D_K}(1^k) = 1 | ¬V]Pr[¬V] + Pr[A^{$,D_K}(1^k) = 1 | V]Pr[V]
≤ Pr[W^{O_3,SD_K}(1^k) = 1 | ¬U]Pr[¬U] + Pr[V]
≤ Pr[W^{O_3,SD_K}(1^k) = 1] + Pr[V]
≤ Pr[W^{O_3,SD_K}(1^k) = 1] + q_e 2^{−k} ,

Adv^{cca}_{A,E}(k) = Pr[A^{E_K,D_K}(1^k) = 1] − Pr[A^{$,D_K}(1^k) = 1]
= Pr[W^{O_4,SD_K}(1^k) = 1] − Pr[A^{$,D_K}(1^k) = 1]
≥ Pr[W^{O_4,SD_K}(1^k) = 1] − Pr[W^{O_3,SD_K}(1^k) = 1] − q_e 2^{−k}
= Adv^3_W(k) − q_e 2^{−k}
IND$-CCA
3. A continues to query D_SK subject to the restriction that A may not query D_SK(c*). A outputs a bit.

where m* ← A^{D_SK}(PK) and (PK, SK) ← G(1^k). Define the CCA insecurity of E by

InSec^{cca}_E(t, q, µ, l*, k) = max_{A∈A(t,q,µ,l*)} Adv^{cca}_{E,A}(k) ,

where A(t, q, µ, l*) denotes the set of adversaries running in time t, that make q queries of total length µ, and issue a challenge message m* of length l*. Then E is (t, q, µ, l*, k, ε)-indistinguishable from random bits under chosen ciphertext attack if InSec^{cca}_E(t, q, µ, l*, k) ≤ ε. E is called indistinguishable from random bits under chosen ciphertext attack if Adv^{cca}_{E,A}(k) is negligible in k for every PPT A.
• Generate(1k ): draws (π, π −1 ) ← Πk ; the public key is π and the private key is
π −1 .
• Encrypt(π, m): draws a random x ← U_k, computes K = H(x), c = E_K(m), y = π(x), and returns y‖c.
Theorem 5.20.

InSec^{cca}_E(t, q, µ, l, k) ≤ InSec^{ow}_Π(t, k) + InSec^{cca}_{SE}(t′, 1, q, l, µ, k) ,

where t′ ≤ t + O(q_H).
Proof. We will show how to use any adversary A ∈ A(t, q, µ, l) against E to create an
adversary B which plays both the IND$-CCA game against SE and the OWP game
against Π so that B succeeds in at least one game with success close to that of A.
B receives as input an element π ∈ Π and a y* ∈ {0,1}^k, and also has access to encryption and decryption oracles O, D_K for SE. B keeps a list L of (y, z) pairs, where y ∈ {0,1}^k and z ∈ {0,1}^{k′}; initially, L is empty. B runs A with input π and answers the decryption and random oracle queries of A as follows:
• When A queries H(x), B first computes y = π(x), and checks to see whether y* = y; if it does, B "decides" to play the OWP game and outputs x, the inverse of y*. Otherwise, B checks to see if there is an entry in L of the form (y, z); if there is, B returns z to A. If there is no such entry, B picks z ← U_{k′}, adds (y, z) to L, and returns z to A.
• When A queries D_SK(y‖c), B first checks whether y = y*; if so, B returns D_K(c). Otherwise, B checks whether there is an entry in L of the form (y, z); if not, B chooses z ← U_{k′} and adds one. B returns SE.D_z(c).
Adv^{ow}_{B,Π}(k) = Pr[P] .
Now, conditioned on ¬P, when B's oracle O is a random string oracle, c* ← U_ℓ and B perfectly simulates the random-string world to A. And (still conditioned on ¬P) when B's oracle O is E_K, B perfectly simulates the ciphertext world to A. Thus, we have that:

Adv^{cca}_{B,SE}(k) = Pr[B^{$,SE.D_K}(π, y) = 1] − Pr[B^{SE.E_K,SE.D_K}(π, y) = 1]
= Pr[A^{E.D_SK}(U_ℓ) = 1 | ¬P] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1 | ¬P]

Adv^{cca}_{A,E}(k) = Pr[A^{E.D_SK}(U_ℓ) = 1] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1]
= (Pr[A^{E.D_SK}(U_ℓ) = 1 | ¬P] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1 | ¬P]) Pr[¬P] + (Pr[A^{E.D_SK}(U_ℓ) = 1 | P] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1 | P]) Pr[P]
SS-CCA Game
where (m*, h*) ← W^{SD_SK}(PK) and (PK, SK) ← SG(1^k). We define the sCCA insecurity of S with respect to C by

InSec^{scca}_{S,C}(t, q, µ, l*, k) = max_{W∈W(t,q,µ,l*)} Adv^{scca}_{S,C,W}(k) ,

where W(t, q, µ, l*) denotes the class of all W running in time t which make at most q oracle queries of µ bits and submit a challenge hiddentext of length at most l*.
Construction
InSec^{scca}_{pCCA,C}(t, q, µ, l, k) ≤ InSec^{cca}_E(t′, q, µ, l, k) + 2^{−t} + ℓ(l+k)ε + InSec^{prg}_G(t′, k) ,

where t′ ≤ t + O(lk).
Proof. Choose an arbitrary W ∈ W(t, q, µ, l); let (PK, SK) ← G(1^k) and let (m*, h*) ← W^{SD_SK}(PK). Consider the following sequence of distributions:

• D_1: C^{ℓ(l+k)}_{h*}
• D_2: DEncode(U_{ℓ(l+k)}, h*, k, U_{k×lk})
• D_3: DEncode(U_{ℓ(l+k)}, h*, k, G(U_k))
• D_4: DEncode(E_{PK}(r*‖m*), h*, k, G(r*)), where r* ← U_k
Adv^{scca}_{W,pCCA,C}(k) = Pr[W^{SD}(D_4) = 1] − Pr[W^{SD}(D_1) = 1]
≤ |Pr[W^{SD}(D_2) = 1] − Pr[W^{SD}(D_1) = 1]| + |Pr[W^{SD}(D_3) = 1] − Pr[W^{SD}(D_2) = 1]| + |Pr[W^{SD}(D_4) = 1] − Pr[W^{SD}(D_3) = 1]|

Adv^{prg}_{G,A}(k) = Adv^2_W(k) .
A works as follows: first, A picks a key pair (PK, SK) ← G(1^k) to use in responding to the queries W makes to SD. A is given as input a string r ∈ {0,1}^{k×lk} and asked to decide whether r ← U_{k×lk} or r ← G(U_k). Then A can achieve advantage precisely Adv^2_W(k) by emulating W, responding to its decoding queries using SK, and responding to the challenge hiddentext (m*, h*) by drawing c ← U_{ℓ(l+k)} and giving the response s = DEncode(c, h*, k, r). If r ← U_{k×lk}, then s ← D_2, and if r ← G(U_k), then s ← D_3. Thus A's advantage in distinguishing G(U_k) and U_{k×lk} is exactly:

Adv^{prg}_{A,G}(k) = |Pr[A(G(U_k)) = 1] − Pr[A(U_{k×lk}) = 1]|
= |Pr[W^{SD}(D_3) = 1] − Pr[W^{SD}(D_2) = 1]|
= Adv^2_W(k)
Proof. We will construct an adversary A that plays the chosen-ciphertext attack game
against E with advantage
Adv^{cca}_{A,E}(k) ≥ Adv^3_W(k) .
s* = DEncode(c*, h*, k, G(r*))

to W.
In other words, A simulates running sCCA.Decode with its D_SK oracle, except that because A is playing the IND$-CCA game, it is not allowed to query D_SK on the challenge value c*: thus a decoding query that has the same underlying ciphertext c* must be dealt with specially. This is because when c* = E_{PK}(r*‖m*), the test s = DEncode(c, h, k, G(r)) would fail anyway.
Adv^{cca}_{A,E}(k) = Pr[A^{D_SK}(PK, E_{PK}(r*‖m*)) = 1] − Pr[A^{D_SK}(PK, U_ℓ) = 1]
= Pr[W^{SD}(PK, D_4) = 1] − Pr[A^{D_SK}(PK, U_ℓ) = 1]
≥ Pr[W^{SD}(PK, D_4) = 1] − Pr[W^{SD}(PK, D_3) = 1] − 2^{−t}
= Adv^3_W(k) − 2^{−t}
5.2.2 Authenticated Stegosystems
Before we can do so, however, it is necessary to extend the syntax and correctness
definitions of a public-key stegosystem to include steganographic signatures.
• S.SigGen (abbreviated SSG when S is clear from the context) takes as input a security parameter 1^k and generates a key pair (SVK, SSK) ∈ SVK × SSK.
• S.Encode (abbreviated SE) takes as input a public key PK, a signing key SSK, a message m, and a history h; SE(PK, SSK, m, h) returns a sequence of documents s_1, s_2, ..., s_l (the stegotext) from the support of C^l_h.
where the probability is over the choices (P K, SK) ← SG(1k ), (SV K, SSK) ←
SSG(1k ), and any coin tosses of SE, SD, SG, SSG and the oracle to Ch .
2. W is given access to a second oracle which is either STatk or CTatk . The oracle
ST (for StegoText) will model the case in which the pair Alice and Bob are com-
municating steganographically, while the oracle CT (for CoverText) will model
the case in which they are not. The exact distributions over STatk , CTatk vary
depending on the attack model, atk. Below we will specify these distributions
for atk ∈ {cxo, csa}. Both oracles respond to the null query with the public and
verification keys generated by SG and SSG.
We define the advantage of W against stegosystem S over channel C in attack model atk by

Adv^{atk}_{S,C,W}(k) = Pr[W^{ST_atk}(1^k) = 1] − Pr[W^{CT_atk}(1^k) = 1] ,

where the warden uses random bits r and the probabilities are also taken over the oracles. Define the insecurity of S over channel C by

InSec^{atk}_{S,C}(t, q, l, k) = max_{W∈W(t,q,l)} Adv^{atk}_{S,C,W}(k) ,

where W(t, q, l) denotes the set of all adversaries which make at most q oracle queries totaling at most l bits, and run in time at most t.
For the next construction, we will require the notion of a digital signature scheme
that is existentially unforgeable under chosen message attack:
Consider the following game that an adversary A plays against SG: the adversary A is given VK and oracle access to S_SK, where (SK, VK) ← G(1^k). A makes q oracle queries of at most l bits to get back {S_SK(M_1), ..., S_SK(M_q)}. A then outputs a pair (M, σ_M). A wins if M ∉ {M_1, ..., M_q} and V(VK, M, σ_M) = 1.
Denote the event of A winning the game by win_{A,SG}(k). Let A(t, q, l) be the set of adversaries A which make q queries to the oracle of at most l bits and run for t time steps. Define the EUF-CMA advantage of A against SG as
Adv^{cma}_{A,SG}(k) = Pr[win_{A,SG}(k)] .

Define the EUF-CMA insecurity of SG by

InSec^{cma}_{SG}(t, q, l, k) = max_{A ∈ A(t,q,l)} Adv^{cma}_{A,SG}(k) .
EUF-CMA signature schemes exist if and only if one-way functions exist [47, 55].
In a chosen exactly-one attack, Ward may both submit hiddentexts to Alice for en-
coding, and submit stegotexts to Bob for decoding, but he is prohibited from accessing
both at the same time (i.e., with the same history). This models a scenario in which
Ward can inject messages onto a network, but cannot replace them; for example, this
may be the situation if he is connected to the same shared access network as Bob,
but does not control the medium. As in all of our attack models, Ward’s goal is to
determine whether the specific pair of Alice and Bob are communicating stegano-
graphically rather than whether Alice or Bob are using steganography at all. In the
case that Ward asks Bob to decode stegotexts, his goal is to detect whether Bob is re-
ceiving steganography from Alice by watching Bob’s reaction to stegotexts (allegedly
from Alice) generated by Ward himself. If Bob is not receiving steganography from
Alice, he will always output the distinguished message ⊥, meaning “no reaction.” If
Bob is receiving steganography from Alice, he will decode the stegotexts.
We formally define the oracle distributions STcxo , CTcxo as follows: First, draw
(P K, SK) ← SG(1k ), and set φ = {}. Respond to queries using these programs:
STcxo(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← SE(PK, m, h)
    Set φ = φ ∪ {h}
    return s
  else:
    If h ∈ φ return ε
    else return SD(SK, m, h)

CTcxo(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← C_h^{|SE(PK, m, h)|}
    Set φ = φ ∪ {h}
    return s
  else:
    If h ∈ φ return ε
    else return ⊥
SS-CXO is the strongest notion of security that we are able to achieve in the stan-
dard model. Since any stegotext encoded by Alice can be thought of as a “challenge
stegotext,” (Ward’s goal is only to detect that it is, in fact, a stegotext rather than
a covertext) this condition is somewhat analogous to non-adaptive chosen ciphertext
security (IND-CCA1) for public-key encryption.
InSec^{cxo}_{CXO,C}(t, q, l, k) ≤ InSec^{cma}_{SG}(t + O(kl), q, l, k) + InSec^{cpa}_{E}(t + O(kl), q, l + qℓ_σ, k) + ε(l + qℓ_σ) .
Proof. Informally, we will consider the hybrid oracle H which answers encoding
queries using CXO Encode and answers all decoding queries with ⊥. Distinguish-
ing this hybrid from STcxo equates to distinguishing CXO Decode from the constant
oracle ⊥ on some history h for which no query of the form (enc, ∗, h) has been made.
This can only happen if a decoding query contains a signature on a (m, h) pair which
was never signed by CXO Encode (because no encoding queries were ever made with
the history h). So, intuitively, distinguishing between H and STcxo requires forging
a signature. Similarly, since both H and CTcxo answer all dec queries by ⊥, distin-
guishing between them amounts to a chosen-hiddentext attack, which by Lemma 4.10
would give an IND$-CPA attacker for E. The result follows by the triangle inequality.
More formally, let W ∈ W(t, q, l). We will show that W must either forge a signature or distinguish the output of E from random bits. We will abuse notation slightly and denote W^{ST_cxo} by W^{SE,SD}, and W^{CT_cxo} by W^{C,⊥}. Consider the "hybrid" distribution which results from answering encoding queries using CXO_Encode but answering all decoding queries with ⊥. (We denote this oracle by (SE, ⊥).)
Then we have the following inequalities:
where the last line follows because (letting D denote the event W^{SE,SD} = 1, and noticing that D|F̄ ≡ (W^{SE,⊥} = 1)):

Pr[D] − Pr[W^{SE,⊥} = 1] = Pr[D|F] Pr[F] + Pr[D|F̄] Pr[F̄] − Pr[W^{SE,⊥} = 1]
 = Pr[D|F] Pr[F] + Pr[W^{SE,⊥} = 1](1 − Pr[F]) − Pr[W^{SE,⊥} = 1]
 = Pr[F](Pr[D|F] − Pr[D|F̄])
 ≤ Pr[F]
 = Adv^{cma}_{A_f,SG}(k)
The theorem follows by the definition of insecurity, since both Ad and Af run in time
at most t + O(kl).
Chosen-Stegotext Attacks
We formally define the oracle distributions STcsa , CTcsa as follows: First, draw
(P K, SK) ← SG(1k ), and set φ = {}. Respond to queries using these programs:
STcsa(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← SE(PK, m, h)
    Set φ = φ ∪ {(s, h)}
    return s
  else:
    If (m, h) ∈ φ return ε
    else return SD(SK, m, h)

CTcsa(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← C_h^{|SE(PK, m, h)|}
    Set φ = φ ∪ {(s, h)}
    return s
  else:
    If (m, h) ∈ φ return ε
    else return ⊥
Thus, in a chosen-stegotext attack, Ward may ask Alice to encode any (message,
history) pair of his choosing, as often as he likes, and may query Bob on any pair
(s, h) where s was not a result of an encoding query for history h. Notice that if
Ward can ask Bob to decode any message output by Alice for the same history it
was encoded with, he can detect the use of steganography between Alice and Bob;
this is why we do not allow Ward to query Bob on such stegotexts. This restriction
is roughly analogous to the standard restriction that an adaptive chosen-ciphertext
attacker may not query his decryption oracle on the challenge ciphertext. Advantage
and insecurity for SS-CSA are defined analogously to SS-CXO, except that we count
encoding and decoding queries separately (as qe and qd ) as well as counting the number
of queries made to random oracles.
Construction.
Procedure CSA_Encode^{F,G,H}:
Input: m = m_1 ⋯ m_ℓ, h, π_A^{−1}, π_B
  Choose r ← U_k
  Let σ = π_A^{−1}(F(r, m, h))
  Let e = H_1(r) ⊕ (m, σ)
  Let τ = H_2(r, m, h)
  Let y = π_B(r)
  Let c = y‖e‖τ
Output: UEncode^G(c, r, h)

Procedure CSA_Decode^{F,G,H}:
Input: s_1, …, s_l, h, π_A, π_B^{−1}
  Let c = Basic_Decode(s_1, …, s_l)
  Parse c as y‖e‖τ
  Set r = π_B^{−1}(y)
  If s ≠ UEncode^G(c, r, h) return ⊥
  Let (m, σ) = e ⊕ H_1(r)
  If τ ≠ H_2(r, m, h) return ⊥
  If π_A(σ) ≠ F(r, m, h) return ⊥
Output: m

Procedure UEncode^G:
Input: c ∈ {0, 1}^l, r ∈ {0, 1}^k, h
  for i = 1 … l do
    Let j = 0
    repeat:
      set s_i = C(h; G(h, r, c, j))
      increment j
    until f(s_i) = c_i OR (j > k)
    set h = (h, s_i)
Output: s_1, s_2, …, s_l
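The rejection-sampling subroutine UEncode^G is the heart of the construction. The following is a minimal Python sketch of it; the hash-based f and G and the toy channel sampler are illustrative stand-ins for the construction's bit function, random oracle, and channel, not definitions from the thesis.

```python
import hashlib

def f(doc: bytes) -> int:
    # Public bit function on documents (here: a hash parity, assumed unbiased).
    return hashlib.sha256(doc).digest()[0] & 1

def G(h, r, c, j) -> bytes:
    # Random oracle G(h, r, c, j), instantiated with SHA-256 for the sketch.
    return hashlib.sha256(repr((h, r, c, j)).encode()).digest()

def channel(h, coins: bytes) -> bytes:
    # Stand-in for C(h; coins): a channel draw made deterministic by
    # passing its random coins explicitly.
    return b"doc-" + coins[:8]

def uencode(c: str, r: bytes, h: tuple, k: int = 128):
    # For each bit c_i, redraw from the channel (with coins from G) until a
    # document with f(s_i) = c_i appears, or until k+1 draws have been made.
    stego = []
    for bit in c:
        j = 0
        while True:
            s_i = channel(h, G(h, r, c, j))
            j += 1
            if f(s_i) == int(bit) or j > k:
                break
        h = h + (s_i,)   # the history grows, so later positions get fresh draws
        stego.append(s_i)
    return stego
```

Because every draw's coins come from G, the whole stegotext is a deterministic function of (c, r, h); this is what lets CSA_Decode recompute UEncode^G(c, r, h) and reject any tampered stegotext.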
InSec^{csa}_{CSA,C}(t, q⃗, l, k) ≤ (2q_e + q_F) InSec^{ow}_π(t′, k) + ε(l + 3q_e k) + (q_e² + 2q_d)/2^k ,

where t′ ≤ t + (q_G + q_F + q_{H_1} + q_{H_2})(q_e + q_d)T_π + k(l + 3q_e k)T_C, T_π is the time to evaluate members of π, and T_C is the running time of C.
Proof. Intuitively, this stegosystem is secure because the encryption scheme employed
is non-malleable, the signature scheme is strongly unforgeable, and each triple of
hiddentext, history, and random-bits has a unique valid stegotext, which contains a
signature on (m, h, r). Thus any adversary making a valid decoding query which was
not the result of an encoding query can be used to forge a signature for Alice — that
is, invert the one-way permutation πA .
2. P1(b, m, h) responds to dec queries as in P0, and responds to enc queries using
CSA EncodeF,G,H but with calls to UEncodeG replaced by calls to Basic Encode.
3. P2(b, m, h) responds to dec queries as in P1, and responds to enc queries using
CSA EncodeF,G,H .
We are given a CSA attacker W ∈ W(t, q_e, q_d, q_F, q_G, q_{H_1}, q_{H_2}, l) and wish to bound his advantage. Notice that

Adv^{csa}_{CSA,C,W}(k) ≤ |Pr[W^{P0}(1^k) = 1] − Pr[W^{P1}(1^k) = 1]|
 + |Pr[W^{P1}(1^k) = 1] − Pr[W^{P2}(1^k) = 1]|
 + |Pr[W^{P2}(1^k) = 1] − Pr[W^{P3}(1^k) = 1]| .

Hence, we can bound the advantage of W by the sum of its advantages in distinguishing the successive hybrids. For hybrids P, Q we will denote this advantage by Adv^{P,Q}_W(k) = |Pr[W^P(1^k) = 1] − Pr[W^Q(1^k) = 1]|.
Proof. Assume WLOG that Pr[W^{P1}(1^k) = 1] > Pr[W^{P0}(1^k) = 1]. Let E_r denote the event that, when W queries P1, the random value r never repeats, and let E_q denote the event that W never makes random oracle queries of the form H_1(r) or H_2(r, ∗, ∗) for an r used by CSA_Encode^{F,G,H}, and let E ≡ E_r ∧ E_q. Then:

Adv^{P0,P1}_W(k) = Pr[W^{P1}(1^k) = 1] − Pr[W^{P0}(1^k) = 1]
 = Pr[W^{P1}(1^k) = 1|Ē](1 − Pr[E]) + Pr[W^{P1}(1^k) = 1|E] Pr[E] − Pr[W^{P0}(1^k) = 1]
 = Pr[Ē](Pr[W^{P1}(1^k) = 1|Ē] − Pr[W^{P0}(1^k) = 1|Ē])
because if r never repeats and W never queries H1 (r) or H2 (r, ∗, ∗) for some r used
by CSA EncodeF,G,H , then W cannot distinguish between the ciphertexts passed to
Basic Encode and random bit strings.
• enc queries are answered as follows: on query j 6= i, respond using the program
for CSA EncodeF,G,H with calls to UEncodeG replaced by calls to Basic Encode.
On the i-th query respond with s = Basic Encode(πB (x)||e1 ||τ1 , h) where e1 =
h1 ⊕ (m, σ1 ) and h1 , σ1 , τ1 are chosen uniformly at random from the set of all
strings of the appropriate length (|e1 | = |m| + k and |τ1 | = k), and set φ =
φ ∪ {(s, h)}.
It should be clear that Pr[A(π_B(x)) = x] ≥ (1/q_e) Pr[Ē_q].
Proof. Assume WLOG that Pr[W^{P2}(1^k) = 1] > Pr[W^{P1}(1^k) = 1]. Denote by E_r the event that, when answering queries for W, the random value r of CSA_Encode^{F,G,H} never repeats, and by E_q the event that W never queries G(∗, r, π_B(r)‖∗, ∗) for some r used by CSA_Encode^{F,G,H}, and let E ≡ E_r ∧ E_q. Then:

Adv^{P1,P2}_W(k) = Pr[W^{P2}(1^k) = 1] − Pr[W^{P1}(1^k) = 1]
 = Pr[W^{P2}(1^k) = 1|E] Pr[E] + Pr[W^{P2}(1^k) = 1|Ē] Pr[Ē] − Pr[W^{P1}(1^k) = 1]
 ≤ Pr[Ē]
 ≤ 2^{−k} q_e(q_e − 1)/2 + Pr[Ē_q]
Given W ∈ W(t, q_e, q_d, q_F, q_G, q_{H_1}, q_{H_2}, l) we construct a one-way permutation adversary A against π_B which is given a value π_B(x) and uses W in an attempt to find x. A picks (π_A, π_A^{−1}) from Π_k and i uniformly from {1, …, q_e}, and then runs W answering all its oracle queries as follows:

It should be clear that Pr[A(π_B(x)) = x] ≥ (1/q_e) Pr[Ē_q].
Proof. Given W ∈ W(t, qe , qd , qF , qG , qH1 , qH2 , l) we construct a one-way permutation
adversary A against πA which is given a value πA (x) and uses W in an attempt to
find x. A chooses (πB , πB−1 ) from Πk and i uniformly from {1, . . . , qF }, and then runs
W answering all its oracle queries as follows:
• enc queries are answered using CSA EncodeF,G,H except that σ is chosen at
random and F (r, m, h) is set to be πA (σ). If F (r, m, h) was already set, fail the
simulation.
• dec queries are answered using CSA DecodeF,G,H , with the additional constraint
that we reject any stegotext for which there hasn’t been an oracle query of the
form H2 (r, m, h) or F (r, m, h).
• Queries to G, F, H1 and H2 are answered in the standard manner (if the query
has been made before, answer with the same answer, and if the query has not
been made before, answer with a uniformly chosen string of the appropriate
length) except that the i-th query to F is answered using πA (x).
A then searches all the queries that W made to the decryption oracle for a value σ
such that πA (σ) = πA (x). This completes the description of A.
Notice that the simulation has a small chance of failure: at most qe /2k . For the
rest of the proof, we assume that the simulation doesn’t fail. Let E be the event that
W makes a decryption query that is rejected in the simulation, but would not have
been rejected by the standard CSA DecodeF,G,H . It is easy to see that Pr[E] ≤ qd /2k−1 .
Since the only way to differentiate P3 from P2 is by making a decryption query that P3 accepts but P2 rejects, and, conditioned on Ē, this can only happen by inverting π_A on some F(r, m, h), we have that:
Adv^{P2,P3}_W(k) ≤ q_F InSec^{ow}_Π(t′, k) + q_d/2^{k−1} + q_e/2^k
The theorem follows, because:

InSec^{csa}_{CSA,C}(t, q⃗, l, k) ≤ Adv^{csa}_{CSA,C,W_max}(k)
 ≤ Adv^{P0,P1}_W(k) + Adv^{P1,P2}_W(k) + Adv^{P2,P3}_W(k)
 ≤ q_e InSec^{ow}_Π(t′, k) + (q_e² − q_e)/2^{k+1} + ε(l + 3q_e k) + Adv^{P1,P3}_W(k)
 ≤ 2q_e InSec^{ow}_Π(t′, k) + 2^{−k}(q_e² − q_e) + ε(l + 3q_e k) + Adv^{P2,P3}_W(k)
 ≤ (2q_e + q_F) InSec^{ow}_Π(t′, k) + 2^{−k}(q_e² + 2q_d) + ε(l + 3q_e k)
In this section, we define the notion of a nontrivial relation R and show that if a
stegosystem is substitution robust with respect to any nontrivial R then it is inse-
cure against both chosen-covertext and chosen-stegotext attacks. This result implies
that no stegosystem can be simultaneously (nontrivially) secure against disrupting and
distinguishing active adversaries.
is negligible. Then any steganographically secret stegosystem is trivially robust against R, because no efficient adversary can produce a related stegotext s_W ≠ σ (except with negligible probability) in the substitution attack game; and thus the decoding of s_W will be s, except with negligible probability. Thus in order for robustness of a stegosystem to be "interesting" we will require that this is not the case.
We say that R is non-trivial for C if it is ρ(k)-nontrivial for some ρ(k) > 1/poly(k).
Intuitively, whether this attack is against a CSA or sCCA oracle, it has a significant
advantage because when the sequence σ1 , . . . , σ` is a stegotext, then the response to
the decoding query will be m (because S is robust); but when it is a covertext, the
probability of decoding to m should be low (again because S is robust). We will now
formalize this intuition.
Theorem 5.37.

Adv^{scca}_{S,C,W_A}(k) ≥ ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k) − Fail^R_S(t_A, 0, 0, l, k) − 2^{−l}

Adv^{scca}_{S,C,W_A}(k) = Pr[W^{SD}(SE(m_W)) = 1] − Pr[W^{SD}(C_h^ℓ) = 1] .
Let us first bound Pr[W^{SD}(C_h^ℓ) = 1]. Recall that W^{SD}(σ) = 1 when SD(s_1, σ_2, …, σ_ℓ) = m_W. Let SR denote the event that in the sCCA game played against stegotext, s_1 ≠ σ_1 ∧ (σ_1, s_1) ∈ R. Now notice that
Also, notice that we can exhibit an efficient SS-CHA adversary W_ρ against S such that

Adv^{ss}_{S,C,W_ρ}(k) ≥ ρ(k) − Pr[SR] ,

whereas

Pr[W_ρ(SE(m∗)) = 1] = Pr[SR] ,

by construction. Since W_ρ runs in the time it takes to run A and makes 1 encoding query of k bits, we have that

InSec^{ss}_{S,C}(t_A, 1, l, k) ≥ Adv^{ss}_{S,C,W_ρ}(k)
which, by rearranging terms, gives us:

 ≥ (1 − Fail^R_S(t_A, 0, 0, l, k))(ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k))
 ≥ ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k) − Fail^R_S(t_A, 0, 0, l, k)
Theorem 5.38.

Adv^{csa}_{S,C,W_A}(k) ≥ (1 − Fail^R_S(t_A, 0, 0, l, k))(ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k))

Adv^{csa}_{S,C,W_A}(k) = Pr[W_A^{ST_csa}(1^k) = 1] − Pr[W_A^{CT_csa}(1^k) = 1] .

It is easy to see that Pr[W_A^{CT_csa}(1^k) = 1] = 0, since querying CT_csa(dec, s, h_A) will always result in ⊥ or ε, and never m_W. The lower bound for Pr[W_A^{ST_csa}(1^k) = 1] is proven identically to the stegotext case in the previous proof.
Chapter 6
A trivial upper bound on the rate of a stegosystem is log |D|. Prior to our work, there were no provably secure stegosystems, and so there was no known lower bound. The rate of the stegosystems defined in the previous chapters is o(1); that is, as the security parameter k goes to infinity, the rate goes to 0. In this chapter, we will address the question of what the optimal rate is for a (universal) stegosystem. We first formalize the definition of the rate of a universal stegosystem. We will then tighten the trivial upper bound by giving a rate MAX such that any universal stegosystem with rate exceeding MAX is insecure. We will then give a matching lower bound by exhibiting a provably secure stegosystem with rate (1 − o(1))MAX. Finally we will address the question of what rate a robust stegosystem may achieve.
6.1 Definitions
A universal stegosystem S accepts an oracle for the channel C and is secure against
chosen-hiddentext attack with respect to C as long as C does not violate the hardness
assumptions S is based on. Universality is important because typically there is no
good description of the marginal distributions on a channel.
• A block encoding function BE that encodes a block of input bits into a block of l documents.
• A block decoding function BD that inverts BE, that is, that transforms a stegotext block into a block of bits.
A (h, l, λ)-blockwise stegosystem has single-block lookahead if BE(K, c, h) draws samples only from C_h^l and C_{h,d}^l, where d ∈ D^l. Any stegosystem with multi-block lookahead can be transformed into one with single-block lookahead with a larger blocksize.
We consider the class S(h, l, t) of stegosystems which draw at most t samples from C_h^l; we will show two upper bounds on the rate R_C(S) for any S ∈ S(h, l, t). The first, MAX_t(S), is in terms of the number of samples, t. The second, MAX_C(S), is in terms of the min-entropy H∞(C_h^l) of the channel C. We call the combined upper bound MAX_C(h, l, t) and define it by MAX_C(h, l, t) = min{log t, H∞(C_h^l)}.

For any stegosystem S ∈ S(h, l, t), we will show that there exists a channel C such that S is insecure relative to C if R_C(S) − log t is any positive constant. Thus it follows that MAX_t(S) ≤ log t.
Theorem 6.1.

InSec^{ss}_{S,C}(O(t + k), 1, k) ≥ 1 − 2^{−c(t,k)} − 2^{−k} − ρ(k) ,

where ρ(k) = Pr_{m←U,K,h}[SD(K, SE(K, m, h), h) ≠ m] and R_C(S) ≥ log t + c(t, k).
To draw from C_h, we draw x ← U_k, compute y = p_h(x) and output x‖y. Notice that if S.SE(K, m, h) draws at most t samples (x_1, y_1), …, (x_t, y_t) ← C_h, and outputs a pair (x′, y′) such that x′ ∉ {x_1, …, x_t}, then Pr[y′ = p_h(x′)] ≤ 2^{−k}. On the other hand, an adversary can draw t + 1 samples from C_h, and compute p_h by interpolation. Thus when SE(K, m, h) outputs a pair (x′, y′) ∉ {(x_1, y_1), …, (x_t, y_t)}, an adversary can distinguish between SE(m, h) and C_h by checking whether y′ = p_h(x′).
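The attack described above is ordinary polynomial interpolation. The sketch below works over a prime field rather than GF(2^k) (an illustrative simplification); the helper names make_channel, interpolate_eval, and warden are assumptions of this sketch, not from the thesis.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, standing in for the field GF(2^k)

def make_channel(t, seed):
    # C_h: documents are pairs (x, p_h(x)) for a fixed random degree-t
    # polynomial p_h; a stegosystem may see at most t such samples.
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(t + 1)]
    def draw():
        x = rng.randrange(P)
        y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return (x, y)
    return draw

def interpolate_eval(points, x0):
    # Lagrange interpolation: evaluate at x0 the unique degree-t polynomial
    # through the t+1 given points (all arithmetic mod the prime P).
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x0 - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def warden(draw, candidate, t):
    # Draw t+1 fresh samples, reconstruct p_h, and test the candidate
    # document: output 1 ('stegotext') iff it lies off the polynomial.
    pts = [draw() for _ in range(t + 1)]
    x0, y0 = candidate
    return 0 if interpolate_eval(pts, x0) == y0 else 1
```

Any encoder restricted to t samples must, to exceed rate log t, sometimes emit a pair it never drew, and such a pair fails the y′ = p_h(x′) check except with probability 2^{−k}.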
Clearly, A runs in time O(t + k). We will now compute the advantage of A. First,
notice that given a covertext oracle, A will always output 0:
Pr[ACT (1k ) = 1] = 0 .
Now, let NS denote the event that SE(K, m, h) draws samples (x′_1, y′_1), …, (x′_t, y′_t) ← C_h and outputs a stegotext (x∗, y∗) ∉ {(x′_1, y′_1), …, (x′_t, y′_t)}. Since in this case Pr[y∗ = p_h(x∗)] ≤ 2^{−k}, we have that

Thus we only need to give a lower bound on Pr[NS] to complete the proof.

Fix a tuple (K, m, h) and consider the set SD^{−1}_{K,h}(m) = {s ∈ D : SD(K, s, h) = m}. Since R_C(S, h, k) ≥ log t + c(t, k), SD partitions D into t × 2^{c(t,k)} such sets. Then for any fixed set of samples (x′_i, y′_i), the probability over m that SE(K, m, h) has a sample (x′_i, y′_i) ∈ SD^{−1}_{K,h}(m) is at most t/(2^{c(t,k)} t) = 2^{−c(t,k)}. Let E denote the event that SE(K, m, h) outputs an s∗ such that SD(K, s∗, h) ≠ m. Then

Pr[NS] ≥ Pr[∀j, (x′_j, y′_j) ∉ SD^{−1}_{K,h}(m)] − Pr[E]
 ≥ 1 − 2^{−c(t,k)} − ρ(k) ,
6.2.2 MAX_C(S)
by the unique decoding property. Even if SE is randomized, then for any fixed random bits r, we have

But then by an averaging argument, there must be some m∗ ∈ {0, 1}^{ℓ+1} such that Pr[SE(m∗) = s∗] < 2^{−(ℓ+1)}. In contrast, a covertext oracle CT will have Pr[CT(m) = s∗] = 2^{−ℓ}, for any m ∈ {0, 1}∗. This gap is wide enough to detect with high confidence, given poly(2^ℓ) chosen-hiddentext samples. And since we are restricted to ℓ = O(log t) by MAX_t(S), this yields a polynomial-time distinguisher between a covertext oracle and a stegotext oracle.
Proof. We define an adversary W with the stated advantage. W^O executes the following steps:

• W takes n² samples from C_h^l. Let ŝ be the most commonly occurring l-document sequence in the sample set and let p̂ be the fraction of times ŝ occurs in the sample.

• For each message m ∈ {0, 1}^{ℓ+1}, W draws n² samples from O(m). Let p̂_m be the fraction of occurrences of ŝ in the samples from O(m).
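The counting attack W above can be sketched as follows; the function name and the toy oracles in the test are hypothetical, and the 3/4 threshold is the one used in the analysis below.

```python
from collections import Counter

def frequency_test(channel_samples, oracle, messages, n2):
    # Estimate the channel probability of the most common covertext block...
    counts = Counter(channel_samples(n2))
    s_hat, cnt = counts.most_common(1)[0]
    p_hat = cnt / n2
    # ...then check whether any hiddentext m 'under-produces' s_hat:
    # an encoder whose rate is too high must map some m* away from s_hat.
    for m in messages:
        p_m = sum(1 for s in oracle(m, n2) if s == s_hat) / n2
        if p_m < 0.75 * p_hat:
            return 1   # looks like a stegotext oracle
    return 0           # looks like a covertext oracle
```

A covertext oracle reproduces ŝ at roughly its channel frequency for every m, so the test returns 0; a too-high-rate stegotext oracle must starve ŝ for some m∗, so the test returns 1.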
We will bound the probability that W outputs 1 given a stegotext oracle, and a
covertext oracle, respectively.
From the preceding paragraph, we know that when W has a stegotext oracle, there exists an m∗ such that E[p̂_{m∗}] ≤ 2^{−(ℓ+1)}, and we know that E[p̂] = 2^{−ℓ}. So W will only output 0 if p̂ is much smaller than expected, or if p̂_{m∗} is much larger than expected. Specifically, we have:

Pr[W^{ST}(1^k) = 0] = Pr[p̂ < (3/4)2^{−ℓ} ∧ p̂_{m∗} ≥ (3/4)p̂] + Pr[p̂ ≥ (3/4)2^{−ℓ} ∧ p̂_{m∗} ≥ (3/4)p̂]
 ≤ Pr[p̂ < (3/4)2^{−ℓ}] + Pr[p̂_{m∗} ≥ (3/4)p̂ | p̂ ≥ (3/4)2^{−ℓ}] Pr[p̂ ≥ (3/4)2^{−ℓ}]
 ≤ Pr[p̂ < (3/4)2^{−ℓ}] + Pr[p̂_{m∗} ≥ (3/4)p̂ | p̂ ≥ (3/4)2^{−ℓ}]
 ≤ Pr[p̂ < (3/4)2^{−ℓ}] + Pr[p̂_{m∗} ≥ (9/8)2^{−(ℓ+1)}]
 ≤ e^{−n/32} + e^{−n/96} ,

where the last line follows by multiplicative Chernoff bounds. Thus we have Pr[W^{ST}(1^k) = 1] ≥ 1 − e^{−n/32} − e^{−n/96}.
We know that when W has a covertext oracle, it should be the case that for every m ∈ {0, 1}∗, E[p̂_m] = 2^{−ℓ}. Thus W should only output 1 when p̂ is much larger than expected, or some p̂_m is much smaller than its expectation. Specifically, we have that

Pr[W^{CT}(1^k) = 1] = Pr[p̂ > (7/6)2^{−ℓ} ∧ ∃m. p̂_m < (3/4)p̂] + Pr[p̂ ≤ (7/6)2^{−ℓ} ∧ ∃m. p̂_m < (3/4)p̂]
 ≤ Pr[p̂ > (7/6)2^{−ℓ}] + Pr[∃m. p̂_m < (3/4)p̂ | p̂ ≤ (7/6)2^{−ℓ}]
 ≤ Pr[p̂ > (7/6)2^{−ℓ}] + 2n Pr[p̂_m < (7/8)2^{−ℓ}]
 ≤ e^{−n/108} + 2n e^{−n/128} ,

where the last two lines follow by the union bound and multiplicative Chernoff bounds.
Combining these bounds, we have

Adv^{ss}_{W,S,C}(k) = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
 ≥ 1 − e^{−n/32} − e^{−n/96} − e^{−n/108} − 2n e^{−n/128}
We note that for any distribution D which covers {0, 1}^{ℓ+1}, W can easily be adapted to be a KHA-D attack against S; and that W can even be extended to a KDA-U_{(ℓ+1)n²} attack against S by counting the occurrences of s∗ for the various blocks.
Suppose Alice and Bob wish to communicate steganographically at a high rate, using
a bidirectional channel – that is, Bob is allowed to respond to Alice’s messages with
messages drawn from his own channel distribution. A natural question is, “can Alice
and Bob conspire to increase the rate at which Alice may securely transmit information to Bob?” We will show that an interactive stegosystem can increase the rate at which information is transmitted by at most one bit. The proof is straightforward: we show that any stegosystem S for a bidirectional channel B can be converted into a stegosystem S′ with the same rate for a unidirectional channel C on pairs from B.D × B.D. S′ runs in the same time as S and draws as many samples from C as Alice and Bob (combined) draw from B in each time step. Thus the upper bound MAX_t(S′) implies that at most one bit more can be sent due to sampling, and the MAX_C(S′) bound implies that minimum entropy constraints still apply.
In order to address this issue, we will first need to formalize the concept of a
bidirectional stegosystem.
SE(K, I, m, h, T ) returns a document from D, or the special symbol done
We denote the transcript of (K, m, h) by the result of setting s0 (A) = SE(K, m, hA , ε),
s0 (B) = SF (K, hB , ε), and
• STbcha (m, h): has uniformly chosen key K ← Uk and responds to queries (m, h)
by computing a transcript of (K, m, h).
• CTbcha (m, h): responds to queries by drawing a sequence from B of the same
length as a (K, m, h) transcript.
Theorem 6.4. Any (h, l, t)-sampling bidirectional stegosystem S for the bidirectional channel B can be converted to a (h, l, 2t)-sampling stegosystem S′ for a channel C such that R_C(S′) = R_B(S), and InSec^{ss}_{S′,C}(t′, q, μ, k) = InSec^{bcha}_{S,B}(t′, q, μ, k).
Proof. We define the channel C as a distribution on sequences of pairs from B.D × B.D. The marginal distribution C_h is defined by (B.C_{h_A}, B.C_{h_B}).

The stegosystem S′ has the same keyspace as S. The new encoding algorithm S′.SE(K, m, h) works by computing a transcript s_0, …, s_l from (K, m, h) as follows: For each i, S′.SE simulates S.SE by making draws from C_{h,s_0,…,s_{i−1}} and extracting the first element of the pair; likewise S′.SE simulates S.SF by making draws from C_{h,s_0,…,s_{i−1}} and extracting the second element of each pair. S′.SE(K, m, h) then outputs this transcript as the stegotext. Notice that since S.SE and S.SF each draw at most t samples from B.C_{h_A} and B.C_{h_B}, respectively, S′.SE draws at most 2t samples from C_h. The decoding algorithm S′.SD(K, s, h) exactly runs the decoding algorithm S.SD(K, s, h).
Notice that the chosen-hiddentext oracles ST, CT for S 0 are identically distributed
to the bidirectional chosen-hiddentext oracles STbcha , CTbcha for S. Thus any warden
for S 0 is a warden for S and vice versa, with the advantage preserved. Thus S 0 is as
secure for C as S is for B, as claimed. It is also clear that the rate of S 0 is the same
as that of S, which completes the proof.
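The pairing trick in this proof is simple enough to sketch directly; the function names below are illustrative, not from the thesis.

```python
def paired_channel(draw_alice, draw_bob):
    # C: a unidirectional channel on pairs from B.D x B.D. One draw from C
    # yields one sample from Alice's marginal AND one from Bob's, which is
    # why a t-sample simulation of each party costs at most 2t draws total.
    def draw(h):
        return (draw_alice(h), draw_bob(h))
    return draw

def simulate_bidirectional_step(draw, h, party):
    # S'.SE serves S.SE's and S.SF's sample requests from paired draws:
    # 'party' selects which component (marginal) is being sampled.
    pair = draw(h)
    return pair[0] if party == "alice" else pair[1]
```

Each request for a sample by the simulated S.SE or S.SF consumes one paired draw, so the (h, l, t)-sampling S becomes an (h, l, 2t)-sampling S′, as the theorem states.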
We will prove a lower bound on the achievable rate which matches the stated upper bound, up to a multiplicative factor which converges to 1. The lower bound is proven by giving a stegosystem which achieves this rate. We will first give one that achieves rate (1 − ε)MAX for any ε > 0, but which has polynomial rather than negligible error rate. We will then use error-correcting codes to improve the error rate, and finally give a construction which does not require Alice or Bob to know the minimum entropy of C.
We will assume for the moment that both Alice and Bob know a value l so that λ < (1 − ε)MAX(C_h^l, t). We let F : {0, 1}^k × {0, 1}∗ → {0, 1}^λ be a pseudorandom function, and assume Alice and Bob share a key K ∈ {0, 1}^k. The following construction allows Alice to send Bob a λ-bit message, with error probability at most λ2^{−λ}.
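A minimal sketch of the OneBlock encoder, reconstructed from the sampling procedure analyzed in the proof of Theorem 6.6: the encoder keeps drawing covertexts until one "hashes" to the message under the shared PRF. The helper names and the SHA-256 instantiation of F_K are assumptions of this sketch, not the thesis's definitions.

```python
import hashlib
import random

LAMBDA = 8  # hiddentext block size λ (tiny, for illustration)

def prf(key, N, h, c, s):
    # Shared PRF F_K(N, h, c, s) with λ-bit output, instantiated by SHA-256.
    d = hashlib.sha256(repr((key, N, h, c, s)).encode()).digest()
    return int.from_bytes(d[:2], "big") % (1 << LAMBDA)

def one_block_encode(key, m, h, N, sample, max_tries):
    # Draw covertexts until one maps to the message m under the PRF.
    # c[s] counts repeats, so a repeated sample is hashed differently
    # (the event that a repeat is output is the bad event C in the proof).
    counts = {}
    s = None
    for _ in range(max_tries):
        s = sample(h)
        counts[s] = counts.get(s, 0) + 1
        if prf(key, N, h, counts[s], s) == m:
            return s
    return s  # gave up after max_tries draws (the event D in the proof)

def one_block_decode(key, s, h, N):
    # Bob recomputes the PRF, assuming the winning sample was fresh (count 1).
    return prf(key, N, h, 1, s)
```

The output is an ordinary channel draw, so a warden without K sees nothing unusual; decoding errs only when the encoder gives up or outputs a repeated sample, which is exactly what Theorem 6.6 bounds.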
Theorem 6.6.

Pr[SD(K, SE(K, m, h, N, l), h, N) ≠ m] ≤ e^{−λ} + λ2^{λ−H∞(C_h^l)} + InSec^{prf}_F(O(λ2^λ), λ2^λ, k)
Let C denote the event that OneBlock.Encode^f(m) outputs an s_i with c[s_i] > 1. This happens when there is at least one j < i such that s_j = s_i. Thus by the union bound, we have

Pr[C] ≤ Σ_{j<i} Pr[s_j = s_i] .

Since for each j, Pr[s_j = s_i] ≤ 2^{−H∞(C_h^l)}, and since i < λ2^λ, we get the bound

Pr[C] ≤ λ2^{λ−H∞(C_h^l)} .
Let D denote the event that OneBlock.Encode^f(m) outputs s_{λ2^λ}. This happens when each of the previous λ2^λ tests f(N, h, c[s_i], s_i) = m fails. Since each test involves a distinct point of f, each of these happens independently with probability 1 − 1/2^λ. Since the events are independent, we can bound Pr[D] by

Pr[D] = (1 − 1/2^λ)^{λ2^λ} ≤ e^{−λ} .

Since the only other condition under which OneBlock.Encode^f(m) outputs s_i is if f(N, h, 1, s_i) = m, we have that

Pr[SD^f(SE^f(m)) ≠ m] = Pr[C ∨ D] ≤ e^{−λ} + λ2^{λ−H∞(C_h^l)} .
We now describe a PRF adversary A for F. A^f picks m ∈ {0, 1}^λ and runs OneBlock.Encode^f(m, ε, 0, l) to get a sequence s ∈ D^l. A^f then outputs 1 if SD^f(s) ≠ m. Clearly, when A's oracle is f ← F_K, we have

Pr[A^{F_K}(1^k) = 1] = Pr[SD(K, SE(K, m, h), h) ≠ m] ,

and when f is a randomly chosen function from {0, 1}∗ → {0, 1}^λ, we have shown that

Pr[A^f(1^k) = 1] ≤ e^{−λ} + λ2^{λ−H∞(C_h^l)} .

It follows that

Adv^{prf}_{A,F}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
 ≥ Pr[SD(K, SE(K, m, h), h) ≠ m] − e^{−λ} − λ2^{λ−H∞(C_h^l)}
Theorem 6.7.

InSec^{ss}_{OneBlock,C}(t, q, qλ, k) ≤ InSec^{prf}_F(t′, qλ2^λ, k) ,

where t′ ≤ t + O(qλ2^λ).
Proof. Fix any nonce-respecting W ∈ W(t, q, qλ). We will show how to construct a PRF adversary A for F such that

Adv^{prf}_{A,F}(k) = Adv^{ss}_{W,OneBlock,C}(k) .

Adv^{prf}_{A,F}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
 = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
 = Adv^{ss}_{W,OneBlock,C}(k)
Proof. Suppose that MAX_C(h, l, t) = H∞(C_h^l). In this case, by choice of l and λ, OneBlock sends λ = (1 − ε)H∞(C_h^l) = (1 − ε)MAX bits in l documents. On the other hand, if MAX_C(h, l, λ2^λ) = log(λ2^λ) = λ + log λ, then since OneBlock sends λ bits in l documents, we have that

R_C(OneBlock)/MAX = λ/(λ + log λ) ≥ (1 − ε) ,

where the last inequality holds for sufficiently large λ.
6.3.2 Negligible error rate
Let K = GF (2λ ). This next construction utilizes the following well-known fact:
Theorem 6.11.

InSec^{ss}_{MultiBlock,C}(t, q, qηλ, k) ≤ InSec^{prf}_F(t + O(qηλ2^λ), qηλ2^λ, k)
Proof. We will show how to use an arbitrary W ∈ W(t, q, qηλ) against MultiBlock
to create an adversary X ∈ W(t, qη, qηλ) for OneBlock such that
Advss ss
W,MultiBlock,C (k) = AdvX,OneBlock,C (k) .
The stated bound follows from the definition of insecurity and Theorem 6.7.
produce the result s_1, …, s_n, which is returned to W. Now when O ← ST_OneBlock, it is clear that X is perfectly simulating MultiBlock to W, so

Adv^{ss}_{X,OneBlock,C}(k) = Pr[X^{ST_OneBlock}(1^k) = 1] − Pr[X^{CT_OneBlock}(1^k) = 1]
 = Pr[W^{ST_MultiBlock}(1^k) = 1] − Pr[W^{CT_MultiBlock}(1^k) = 1]
 = Adv^{ss}_{W,MultiBlock,C}(k)
which is negligible in n = 2^λ.
Proof. As long as there are at most ρn errors, Proposition 6.9 ensures us that Correct
can recover the message m0 , . . . , mη−1 . Thus the probability of a decoding error is at
most the probability of ρn blocks having decoding error in OneBlock.Decode. But
Theorem 6.6 states that the probability of decoding error in OneBlock.Decode is
at most ρ when F is pseudorandom; applying a Chernoff bound yields the stated
result.
Proof. The rate of MultiBlock is the rate of OneBlock multiplied by the rate of the error-correcting code used in encoding. Since this rate is (1 − 2ρ) = 1 − λ2^{−λ+3}, we have that the rate converges to 1 as λ → ∞; that is, the rate of the code is (1 − o(1)).
6.3.3 Converging to optimal
We notice that if ε(k) = 1/λ, the MultiBlock construction has error rate at most e^{−λ2^λ/3}, and has rate (1 − o(1))MAX_C(h, t, l). Thus under appropriate parameter settings, the rate of the construction converges to the optimal rate in the limit.
Suppose Alice and Bob agree at the time of key exchange to use the MultiBlock stegosystem with hiddentext block size λ. Since neither Alice nor Bob necessarily knows the values (α, β) such that C is (α, β)-informative, there is no way to calculate or exchange beforehand the stegotext block size l so that λ ≤ (1 − ε)H∞(C_h^l).
The idea behind this construction is simple: Alice tries using MultiBlock with block lengths l = 1, 2, … until she finds one such that the decoding of the encoding of her message is correct. With high probability, if H∞(C_h^{ln}) ≤ λn decoding will fail (the block error rate will be at least 1 − 1/λ), and as we have seen, when H∞(C_h^{ln}) ≥ (λ + 1/λ)n decoding fails with only negligible probability. Since C is (α, β)-informative, Alice will need to try at most ⌈αλ/β⌉ values of l. Alice also encodes kl bits of "check" information with her message, so that when Bob decodes with the wrong block size, he will be fooled with probability only 2^{−lk}. The rate penalty for this check data is k/(n + k) = o(1) when n = ω(k). Thus for sufficiently large λ the rate of this construction will still converge to the optimal rate for C_h.
Recall that a stegosystem is said to be substitution robust with respect to the relation R if an adversary, by making substitutions permitted by R, is unable to change the decoding of a stegotext, except with negligible probability. Since an adversary is allowed to make changes to stegotexts, increasing the rate of a robust stegosystem is a more challenging task. Here we will show that if a stegosystem is robust against any δ-admissible relation R (given access to R), then it can encode at most log 1/δ bits per document. We will also demonstrate an efficient, robust stegosystem which encodes (1 − ε − o(1)) log 1/δ bits per document, for any constant ε > 0, showing that this upper bound is tight.
Theorem 6.15. Let S be a universal stegosystem. For every 0 < δ < 1, there exist a channel C and relation R such that

Fail^R_{S,C}(t, 0, 0, (1 + ε)ℓ, k) ≥ 1 − 2^{−c√ℓ} ,
Proof. We let C be the uniform distribution on n-bit strings, and R(x, y) = 1 iff the Hamming distance of x and y is at most d, where d and n are constants chosen to make I(R, C) ≤ δ. We will give an attacker W which achieves the stated success probability. For notational convenience, we define l = −ℓ log δ.

Succ^R_{W,S,C}(k) = Pr[SD(K, s′) ≠ m∗] ,
where this probability is taken over K, m∗, s∗, and s′. Notice that the adversary W is identical to a noisy discrete memoryless channel, with p(s′|s∗) defined as the uniform distribution on {s ∈ {0, 1}^n : |s − s∗| ≤ d}. This channel has Shannon capacity exactly −log I(R, C) = −log δ. Furthermore, any robust stegosystem is a candidate code for the channel. The strong converse to Shannon's coding theorem [62] tells us that any code with rate (1 + ε) log 1/δ will have average error probability at least 1 − 2^{−c√ℓ}, where c = 2^{−4n+2} log(1/δ) (which is a constant depending on δ).
Since the event that the adversary W is successful is identical to the event that a decoding error occurs in the code induced by SE(K, ·), SD(K, ·), we have that

Succ^R_{W,S,C}(k) ≥ 1 − 2^{−c√ℓ} ,
125
An inefficient construction

We give a stegosystem with stegotext block size ℓ and hiddentext block size l =
(1 − ε)ℓ log(1/δ). Suppose that the channel distribution C is efficiently sampleable.
(Recall that C is efficiently sampleable if there is an efficient algorithm C such that,
given a uniformly chosen string s ∈ {0, 1}^k, a security parameter 1^k and history h,
C(h, 1^k, s) is indistinguishable from C_h.) We will assume that Alice, Bob, and Ward
all have access to this algorithm. Furthermore, we assume Alice and Bob share a key
K to a pseudorandom function family F : {0, 1}^k × {0, 1}* → {0, 1}^k, and have a
synchronized counter N.

The idea behind this construction is this: suppose that instead of sharing a key to
a pseudorandom function F, Alice and Bob shared 2^l secret documents d_m drawn
independently from C_h. Then Alice could send Bob the message m by sending
document d_m, and Bob could recover m by checking, for each m in turn, to see if the
document he received was related (by R′) to d_m. Since the adversary is R-bounded
and (C, R′) is δ-admissible, the probability of a decoding error, caused either by
the adversary or by an accidental draw of the d_m, would be at most 2^l δ^ℓ = δ^{εℓ}.
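The shared-document idea translates directly into a short sketch. In the sketch below, HMAC-SHA256 stands in for the PRF F, the channel is the uniform distribution on n-bit strings, and R′ is Hamming distance at most d; the parameter values and all names are illustrative assumptions for the demo, not the thesis's notation. As in the actual construction, the 2^l documents are derived from the PRF on demand rather than stored.

```python
import hmac
import hashlib

n, d, ell, l = 32, 2, 8, 4   # document bits, noise radius, block length, message bits

def prf(key: bytes, counter: int, m: int) -> int:
    """Derive an n-bit 'document' from the PRF, keyed on (counter, message)."""
    data = counter.to_bytes(8, 'big') + m.to_bytes(4, 'big')
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest, 'big') >> (256 - n)

def related(x: int, y: int) -> bool:
    """R': documents are related iff their Hamming distance is at most d."""
    return bin(x ^ y).count('1') <= d

def encode(key: bytes, N: int, m: int) -> list:
    """Send ell documents, each drawn as the PRF sample indexed by m."""
    return [prf(key, N + i, m) for i in range(ell)]

def decode(key: bytes, N: int, stego: list):
    """Try each candidate message; accept if every block is R'-related."""
    for m in range(2 ** l):
        if all(related(prf(key, N + i, m), s) for i, s in enumerate(stego)):
            return m
    return None

key = b'shared secret key'
s = encode(key, 0, 5)
s[0] ^= 0b11                  # adversary flips up to d bits per document
assert decode(key, 0, s) == 5
```

Note that the decoder enumerates all 2^l candidate messages, which is exactly why this construction is inefficient for large l.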
Proof. Let W be a passive warden which runs in time t, and makes at most q queries
of total length at most ql (each query can be only l bits, because of the input type).
We construct a PRF adversary A which runs in time t + O(qℓ) and makes at most qℓ
queries to F, such that

    Adv^prf_{A,F}(k) = Adv^ss_{W,RLBit,C}(k) .
The PRF adversary takes a function oracle f, and emulates W(1^k), responding to the
queries W makes to the encoder SE by using f in place of F_K(·, ·). More formally,
we define the subroutine SSE^f : {0, 1}* × {0, 1}* → {0, 1}* as follows:

Procedure SSE^f:
Input: m ∈ {0, 1}^l, history h
for i = 1 . . . ℓ do
    set σ_i = C(1^k, h, f(N, m))
    increment N
    set h = (h, σ_i)
Output: σ_1, . . . , σ_ℓ
Then we define A^f(1^k) = W^{SSE^f}(1^k); A's advantage over F is then:

    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
                     = Adv^ss_{W,RLBit,C}(k) .

• f is chosen from F_K(·, ·). Then the output of SSE^f is distributed identically
  to the encoding function of RLBit. That is, Pr[A^{F_K}(1^k) = 1] = Pr[W^{ST}(1^k) = 1].
Lemma 6.18. Fail^R_{RLBit}(t, q, ql, l, k) ≤ InSec^prf_F(t + O(qℓ) + 2^l ℓ, qℓ + 2^l ℓ, k) + δ^{εℓ} .

Proof. Let W be an active R-bounded (t, q, ql, l) warden. We construct a PRF ad-
versary A which runs in time t + O(qℓ), makes at most qℓ PRF queries, and satisfies
Adv^prf_{F,A}(k) ≥ Succ^R_{W,RLBit,C}(k) − δ^{εℓ}. A works by emulating W, using its function
oracle f in place of F_K(·, ·). Consider the following cases for f:
• f is chosen uniformly from all appropriate functions. Then, for each i, the stego-
  texts σ_i = C(1^k, h_i, f(N + i, m_W)) are distributed independently according to C_{h_i}.
  Consider the sequence of "alternative stegotexts" d_i[m′] = C(1^k, h′_i, f(N + i, m′))
  for each m′ ≠ m_W ∈ {0, 1}^l; each of these is also distributed independently ac-
  cording to C_{h′_i}; and since W is never given access to the d_i[m′], the s_i are
  independent of the d_i[m′]. Now SD will fail (causing A^f(1^k) to output 1) only if
  the event ∃m′.∀i.(d_i[m′], s_i) ∈ R′ occurs. Because the d_i[m′] are independent of
  the actions of W, and because (C, R′) is δ-admissible, each event (d_i[m′], s_i) ∈ R′
  happens independently with probability at most δ. So for each m′, the proba-
  bility of failure is at most δ^ℓ, and thus by a union bound, we have that

      Pr[A^f(1^k) = 1] ≤ Σ_{m′∈{0,1}^l} δ^ℓ = 2^l δ^ℓ = δ^{εℓ} .
    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Succ^R_{W,RLBit}(k) − Pr[A^f(1^k) = 1]
                     ≥ Succ^R_{W,RLBit}(k) − δ^{εℓ} .
Improving the run-time

Notice that because the running time of the decoding procedure for RLBit is expo-
nential in ℓ, the proof of robustness is not very strong: the information-theoretic
bound on the success of W is essentially polynomial in the running time of the PRF
adversary we construct from W. Still, if we set ℓ = poly(log k), and assume subexpo-
nential hardness for F, we obtain a negligible bound on the success probability, but
a quasi-polynomial time decoding routine. We will now give a construction with a
polynomial-time decoding algorithm, at the expense of an o(1) factor in the rate.
The idea behind this procedure is to break a message up into l-bit blocks, and
encode each one using the RLBit.Encode procedure; and then append the encoding
of L documents of message-dependent redundancy. To decode, we iteratively attempt
to match each stegotext block against each of the 2^l = k possible hiddentext blocks;
there will always be one matching block, and with some small probability k^{−ε} there
will be an additional match. We perform a depth-first search on these matches to
find a list of candidate messages, and then test each message to see whether the
redundant encoding matches. Any candidate match from the depth-first search will
also have matching redundancy with probability 2^{−k}, and a union bound will thus
bound the probability of a decoding failure by (1 + 1/ε)2^{−k}. Furthermore, the total
expected number of nodes explored by Decode is at most (1 + 1/ε)n; thus our reduction
will be efficient.
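The depth-first candidate search just described can be sketched as follows. The per-block matching predicate (one RLBit decode attempt against a stegotext block) and the redundancy check are abstracted as caller-supplied functions, `matches` and `redundant_ok`; both are hypothetical stand-ins introduced here purely for illustration.

```python
from typing import Callable, Optional, Tuple

def dfs_decode(n_blocks: int, l: int,
               matches: Callable[[int, int], bool],
               redundant_ok: Callable[[Tuple[int, ...]], bool]) -> Optional[Tuple[int, ...]]:
    """Depth-first search over per-block matches, then verify redundancy.

    matches(j, m) says whether l-bit block value m is R'-related to the
    j-th stegotext block; usually exactly one m matches, occasionally two.
    """
    def extend(prefix):
        if len(prefix) == n_blocks:
            yield prefix
            return
        j = len(prefix)
        for m in range(2 ** l):            # 2^l = k candidates per block
            if matches(j, m):
                yield from extend(prefix + (m,))

    for candidate in extend(()):
        if redundant_ok(candidate):        # a false match survives w.p. ~2^-k
            return candidate
    return None

# Toy run: block j of the true message is j, and block 1 has a spurious match.
true_msg = (0, 1, 2, 3)
found = dfs_decode(4, 2,
                   matches=lambda j, m: m == true_msg[j] or (j, m) == (1, 3),
                   redundant_ok=lambda c: c == true_msg)
assert found == (0, 1, 2, 3)
```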
Proof. Let W be a passive warden which runs in time t, and makes at most q queries
of total length at most lµ (each query must be a multiple of l bits, because of the
input type). We construct a PRF adversary A which runs in time t + O(µℓ) and
makes at most µℓ queries to F, such that

    Adv^prf_{A,F}(k) = Adv^ss_{W,RMBit,C}(k) .
The PRF adversary takes a function oracle f, and emulates W(1^k), responding to
the queries W makes to its oracle O by running RMBit.Encode, using f in place of
F_K(·, ·). Consider the following cases for f:

• f is chosen from F_K(·, ·). Then the responses to W's queries are distributed
  identically to the encoding function of RMBit. That is, Pr[A^{F_K}(1^k) = 1] = Pr[W^{ST}(1^k) = 1].
A's advantage over F is then:

    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
                     = Adv^ss_{W,RMBit,C}(k) .
    Fail^R_{RMBit}(t, q, lµ, ln, k) ≤ InSec^prf_F(t′, 2n(1 + 1/ε) + l(µ + n), k) + (1 + 1/ε)2^{−k} + (e/4)^n ,
Proof. Let W be an active R-bounded (t, q, lµ, ln) warden. We construct a PRF
adversary A which runs in time t′, makes at most 2n(1 + 1/ε) + l(µ + n) PRF queries,
and satisfies Adv^prf_{A,F}(k) ≥ Succ^R_{W,RMBit,C}(k) − (1 + 1/ε)2^{−k} − (e/4)^n. A^f works by
emulating W, using its function oracle f in place of F_K(·, ·) to emulate RMBit.Encode
in responding to the queries of W. Let m*, s* be the hiddentext and the stegotext
sequence returned by W, respectively. Then A^f returns 1 iff SD^f(s*, h*) ≠ m*. To
ensure that the running time and number of queries are at most t′ and 2n(1 + 1/ε) +
l(µ + n), we halt whenever SD^f makes more than 2n(1 + 1/ε) queries to f, an event
we will denote by TB. We will show that Pr[TB] ≤ (e/4)^n when f is a randomly
chosen function. Thus we can neglect this case in our analyses of the cases for f.
Since there are 2^{l(n−j)} messages matching m* in the first j blocks, we have that

    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Succ^R_{W,RMBit}(k) − Pr[A^f(1^k) = 1]
                     ≥ Succ^R_{W,RMBit}(k) − (1 + 1/ε)2^{−k} − Pr[TB] .

It remains to show that Pr[TB] ≤ (e/4)^n. Notice that the expected number of
queries to f by A is just the number of messages that match a jℓ-document prefix of
s*, for 1 ≤ j ≤ n, times k. Let X_m = 1 if m ∈ {0, 1}^{jl} matches a j-block prefix of s*.
Let X = Σ_{j=1}^{n} Σ_{m∈{0,1}^{jl}} X_m denote the number of matching prefix messages.
Theorem 6.22. R_C(RMBit) = (1 − ε) log(1/δ) − o(1)

Proof.

    R_C(RMBit) = (1 − ε) log(1/δ)ℓn / (ℓn + k/log(1/δ))
               = (1 − ε) log(1/δ) − O(k)/(ℓn + O(k))
               ≥ (1 − ε) log(1/δ) − k/n

For any choice of n = ω(k), the second term is o(1), as claimed.
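The rate computation can be sanity-checked numerically. The function below assumes the rate expression exactly as displayed, hiddentext bits ln over ℓn block documents plus k/log(1/δ) redundancy documents; the parameter values are arbitrary illustrations.

```python
import math

def rmbit_rate(eps: float, delta: float, ell: int, n: int, k: int) -> float:
    """Rate of RMBit: hiddentext bits over stegotext documents.

    Each of n blocks carries l = (1 - eps) * ell * log2(1/delta) bits in
    ell documents, plus ~k/log2(1/delta) redundancy documents overall.
    """
    log1d = math.log2(1 / delta)
    l = (1 - eps) * ell * log1d
    return (l * n) / (ell * n + k / log1d)

eps, delta, ell, k = 0.1, 0.25, 16, 128
ideal = (1 - eps) * math.log2(1 / delta)     # the (1 - eps) log(1/delta) target
# As n grows (n = omega(k)), the rate approaches the target from below.
gap_small_n = ideal - rmbit_rate(eps, delta, ell, n=10, k=k)
gap_large_n = ideal - rmbit_rate(eps, delta, ell, n=10_000, k=k)
assert gap_large_n < gap_small_n and gap_large_n < 0.01
```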
Chapter 7
Covert Computation
7.1 Introduction
Secure two-party computation allows Alice and Bob to evaluate a function of their
secret inputs so that neither learns anything other than the output of the function.
A real-world example that is often used to illustrate the applications of this primitive
is when Alice and Bob wish to determine if they are romantically interested in each
other. Secure two-party computation allows them to do so without revealing their
true feelings unless they are both attracted. By securely evaluating the AND of the
bits representing whether each is attracted to the other, both parties can learn if
there is a match without risking embarrassment: if Bob is not interested in Alice, for
instance, the protocol does not reveal whether Alice is interested in him. So goes the
example.
However, though often used to illustrate the concept, this example is not entirely
logical. The very use of two-party computation already reveals possible interest from
one party: “would you like to determine if we are both attracted to each other?”
A similar limitation occurs in a variety of other applications where the very use
of the primitive raises enough suspicion to defeat its purpose. To overcome this lim-
itation we introduce covert two-party computation, which guarantees the following
(in addition to leaking no additional knowledge about the individual inputs): (A) no
outside eavesdropper can determine whether the two parties are performing the com-
putation or simply communicating as they normally do; (B) before learning f (xA , xB ),
neither party can tell whether the other is running the protocol; (C) at any point prior
to or after the conclusion of the protocol, each party can only determine if the other
ran the protocol insofar as they can distinguish f (xA , xB ) from uniformly chosen
random bits. By defining a functionality g(xA , xB ) such that g(xA , xB ) = f (xA , xB )
whenever f (xA , xB ) ∈ Y and g(xA , xB ) is pseudorandom otherwise, covert two-party
computation allows the construction of protocols that return f (xA , xB ) only when it
is in a certain set of interesting values Y but for which neither party can determine
whether the other even ran the protocol whenever f (xA , xB ) ∈
/ Y . Among the many
important potential applications of covert two-party computation we mention the
following:
• Cheating in card games. Suppose two parties playing a card game want
to determine whether they should cheat. Each of them is self-interested, so
cheating should not occur unless both players can benefit from it. Using covert
two-party computation with both players’ hands as input allows them to com-
pute if they have an opportunity to benefit from cheating while guaranteeing
that: (1) neither player finds out whether the other attempted to cheat unless
they can both benefit from it; (2) none of the other players can determine if the
two are secretly planning to collude.
• Covert Authentication. Imagine that Alex works for the CIA and Bob works
for Mossad. Both have infiltrated a single terrorist cell. If they can discover
their “mutual interest” they could pool their efforts; thus both should be look-
ing for potential collaborators. On the other hand, suggesting something out
of the ordinary is happening to a normal member of the cell would likely be
fatal. Running a covert computation in which both parties’ inputs are their
(unforgeable) credentials and the result is 1k if they are allies and uniform bits
otherwise will allow Alex and Bob to authenticate each other such that if Bob
is NOT an ally, he will not know that Alex was even asking for authentica-
tion, and vice-versa. (Similar situations occur in, e.g., planning a coup d’etat or
constructing a zombie network)
• Cooperation between competitors. Imagine that Alice and Bob are com-
peting online retailers and both are being compromised by a sophisticated
cracker. Because of the volume of their logs, neither Alice nor Bob can draw a
reliable inference about the location of the hacker; statistical analysis indicates
about twice as many attack events are required to isolate the cracker. Thus if
Alice and Bob were to compare their logs, they could solve their problem. But
if Alice admits she is being hacked and Bob is not, he will certainly use this
information to take her customers; and vice-versa. Using covert computation to
perform the log analysis online can break this impasse. If Alice is concerned that
Bob might fabricate data to try and learn something from her logs, the com-
putation could be modified so that when an attacker is identified, the output is
both an attacker and a signed contract stating that Alice is due a prohibitively
large fine (for instance, $1 Billion US) if she can determine that Bob falsified
his log, and vice-versa. Similar situations occur whenever cooperation might
benefit mutually distrustful competitors.
Our protocols make use of provably secure steganography [4, 7, 34, 53] to hide the
computation in innocent-looking communications. Steganography alone, however, is
not enough. Combining steganography with two-party computation in the obvious
black-box manner (i.e., forcing all the parties participating in an ordinary two-party
protocol to communicate steganographically) yields protocols that are undetectable to
an outside observer but does not guarantee that the participants will fail to determine
if the computation took place. Depending on the output of the function, we wish to
hide that the computation took place even from the participants themselves.
Hiding Computation vs. Hiding Inputs
Notice that covert computation is not about hiding which function Alice and Bob are
interested in computing, which could be accomplished via standard SFE techniques:
Covert Computation hides the fact that Alice and Bob are interested in computing a
function at all. This point is vital in the case of, e.g., covert authentication, where
expressing a desire to do anything out of the ordinary could result in the death of
one of the parties. In fact, we assume that the specific function to be computed (if
any) is known to all parties. This is analogous to the difference in security goals
between steganography – where the adversary is assumed to know which message, if
any, is hidden – and encryption, where the adversary is trying to decide which of two
messages is hidden.
Roadmap.
The high-level view of our presentation is as follows. First, we will define the secu-
rity properties of covert two-party computation. Then we will present two protocols.
The first protocol we present will be a modification of Yao’s “garbled circuit” two-
party protocol in which, except for the oblivious transfer, all messages generated are
indistinguishable from uniform random bits. We construct a protocol for oblivious
transfer that generates messages that are indistinguishable from uniform random bits
(under the Decisional Diffie-Hellman assumption) to yield a complete protocol for
two-party secure function evaluation that generates messages indistinguishable from
random bits. We then use steganography to transform this into a protocol that gener-
ates messages indistinguishable from “ordinary” communications. The protocol thus
constructed, however, is not secure against malicious adversaries nor is it fair (since
neither is Yao’s protocol by itself). We therefore construct another protocol, which
uses our modification of Yao’s protocol as a subroutine, that satisfies fairness and is
secure against malicious adversaries, in the Random Oracle Model. The major diffi-
culty in doing so is that the standard zero-knowledge-based techniques for converting
a protocol in the honest-but-curious model into a protocol secure against malicious
adversaries cannot be applied in our case, since they reveal that the other party
is running the protocol.
Related Work.
Secure two-party computation was introduced by Yao [63]. Since then, there have
been several papers on the topic and we refer the reader to a survey by Goldreich [26]
for further references. Constructions that yield fairness for two-party computation
were introduced by Yao [64], Galil et al. [24], Brickell et al. [15], and many others
(see [51] for a more complete list of such references). The notion of covert two-party
computation, however, appears to be completely new.
Notation.

We say a function µ : N → [0, 1] is negligible if for every c > 0, for all sufficiently
large k, µ(k) < 1/k^c. We denote the length (in bits) of a string or integer s by |s|
and the concatenation of string s_1 and string s_2 by s_1‖s_2. We let U_k denote the
uniform distribution on k-bit strings. If D is a distribution with finite support X,
we define the minimum entropy of D as H∞(D) = min_{x∈X} {log_2(1/ Pr_D[x])}. The
statistical distance between two distributions C and D with joint support X is defined
by ∆(C, D) = (1/2) Σ_{x∈X} |Pr_D[x] − Pr_C[x]|. Two sequences of distributions, {C_k}_k
and {D_k}_k, are called computationally indistinguishable, written C ≈ D, if for any
probabilistic polynomial-time A, Adv^{C,D}_A(k) = |Pr[A(C_k) = 1] − Pr[A(D_k) = 1]| is
negligible in k.
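The minimum-entropy and statistical-distance definitions above translate directly into code; the two distributions at the bottom are toy examples, represented as dictionaries from outcomes to probabilities.

```python
import math

def min_entropy(D: dict) -> float:
    """H_inf(D) = min over the support of log2(1 / Pr_D[x])."""
    return min(math.log2(1 / p) for p in D.values() if p > 0)

def statistical_distance(C: dict, D: dict) -> float:
    """Delta(C, D) = (1/2) * sum over the joint support of |Pr_D[x] - Pr_C[x]|."""
    support = set(C) | set(D)
    return 0.5 * sum(abs(D.get(x, 0.0) - C.get(x, 0.0)) for x in support)

C = {'a': 0.5, 'b': 0.5}
D = {'a': 0.75, 'b': 0.25}
assert abs(min_entropy(D) - math.log2(4 / 3)) < 1e-12   # log2(1/0.75)
assert statistical_distance(C, C) == 0.0
assert abs(statistical_distance(C, D) - 0.25) < 1e-12
```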
We now present a protocol for covert two-party computation that is secure against
semi-honest adversaries in the standard model (without Random Oracles) and as-
sumes that the decisional Diffie-Hellman problem is hard. The protocol is based on
Yao’s well-known function evaluation protocol [63].
necessary modifications to turn it into a covert computation protocol. The definition
presented in this section is only against honest-but-curious adversaries and is unfair in
that only one of the parties obtains the result. In Section 4 we will define covert two-
party computation against malicious adversaries and present a protocol that is fair:
either both parties obtain the result or neither of them does. The protocol in Section
4 uses the honest-but-curious protocol presented in this section as a subroutine.
7.2.1 Definitions
For σ ∈ {0, 1}, we denote by V^{P_σ}_Π(x_0, x_1) the view of party P_σ on input x_σ when
interacting with P_{1−σ} on input x_{1−σ}. The view includes P_σ's input x_σ, private random
bits, and all messages sent by P_0 and P_1. We say Π securely realizes the functionality
f if Π correctly realizes f and, for any P′_σ and x_{1−σ}, there is a simulator P″_σ and an
x_σ such that P″_σ(f(x_0, x_1)) ≈ V^{P′_σ}_Π(x_0, x_1). Notice that given f(x_0, x_1), P′_σ could just
use P″_σ to simulate his interaction with P_{1−σ} without actually running Π. Thus if Π
securely implements f, neither party learns more from the interaction than could be
learned from just f(x_0, x_1).
this "protocol" by Π : B_σ.

2. (Internal covertness): For any input x̄, V^{P_0}_{Π,n}(x̄) ≈ V^{P_0}_{Π:B_1,n}(x̄) and
   V^{P_1}_{Π,n−1}(x̄) ≈ V^{P_1}_{Π:B_0,n−1}(x̄).

3. (Final Covertness): For every PPT D there exists a PPT D′ and a negligible
   ν such that for any x_1 and any distribution X_0,

       Adv_D^{V^{P_1}_Π(X_0,x_1), V^{P_1}_{Π:B_0}(X_0,x_1)}(k) ≤ Adv_{D′}^{f(X_0,x_1), U_l}(k) + ν(k).
In other words, until the final round, neither party can distinguish between the
case that the other is running the protocol or just drawing from B; and after the final
message, P0 still cannot tell, while P1 can only distinguish the cases if f (x0 , x1 ) and
Um are distinguishable. Note that property 2 implies property 1, since P0 could apply
the distinguisher to his view (less the random bits).
We will slightly abuse notation and say that a protocol which has messages indis-
tinguishable from random bits (even given one party’s view) is covert for the uniform
channel U.
Yao’s protocol [63] securely (not covertly) realizes any functionality f that is expressed
as a combinatorial circuit. Our description is based on [46]. The protocol is run
between two parties, the Input Owner A and the Program Owner B. The input of
A is a value x, and the input of B is a description of a function f . At the end of
the protocol, B learns f (x) (and nothing else about x), and A learns nothing about
f . The protocol requires two cryptographic primitives, pseudorandom functions and
oblivious transfer, which we describe here for completeness.
Pseudorandom Functions.

Let {F : {0, 1}^k × {0, 1}^{L(k)} → {0, 1}^{l(k)}}_k denote a sequence of function families.
Let A be an oracle probabilistic adversary. We define the prf-advantage of A over
F as Adv^prf_{A,F}(k) = |Pr_K[A^{F_K(·)}(1^k) = 1] − Pr_g[A^g(1^k) = 1]|, where K ← U_k and g
is a uniformly chosen function from L(k) bits to l(k) bits. Then F is pseudorandom
if Adv^prf_{A,F}(k) is negligible in k for all polynomial-time A. We will write F_K(·) as
shorthand for F(K, ·).
Oblivious Transfer.
1-out-of-2 oblivious transfer (OT^2_1) allows two parties, the sender who knows the
values m_0 and m_1, and the chooser whose input is σ ∈ {0, 1}, to communicate in such
a way that at the end of the protocol the chooser learns m_σ, while learning nothing
about m_{1−σ}, and the sender learns nothing about σ. Formally, let O = (S, C) be a pair
of interactive PPT programs. We say that O is correct if Pr[O_C((m_0, m_1), σ) = m_σ] ≥
1 − ε(k) for negligible ε. We say that O has chooser privacy if for any PPT S′ and
any m_0, m_1, Pr[S′(⟨S′(m_0, m_1), C(σ)⟩) = σ] − 1/2 ≤ ε(k), and O has sender privacy if
for any PPT C′ there exists a σ and a PPT C″ such that C″(m_σ) ≈ V^{C′}_Π((m_0, m_1), σ).
We say that O securely realizes the functionality OT^2_1 if O is correct and has chooser
and sender privacy.
Yao’s Protocol.
Given the garbled inputs to g, Tg does not disclose any information about the garbled
output of g for any other inputs, nor does it reveal the actual values of the input bits
or the output bit.
Assume g has two input wires (i, j) and one output wire out (gates with higher
fan in or fan out can be accommodated with straightforward modifications). The
construction of Tg uses a pseudorandom function F whose output length is k + 1.
The table Tg is as follows:
To compute f(x), B computes garbled tables T_g for each gate g, and sends the tables
to A. Then, for each circuit input wire i, A and B perform an oblivious transfer,
where A plays the role of the chooser (with σ = b_i) and B plays the role of the
sender, with m_0 = W^0_i‖π_i(0) and m_1 = W^1_i‖π_i(1). A computes π_j(b_j) for each output
wire j of the circuit (by trickling down the garbled inputs using the garbled tables)
and sends these values to B, who applies π_j^{−1} to learn b_j. Alternatively, B can send
the values π_j (for each circuit output wire j) to A, who then learns the result. Notice
that the first two columns of T_g can be implicitly represented, leaving a "table" which
is indistinguishable from uniformly chosen bits.
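The following is a minimal sketch of a single garbled AND gate and its evaluation in the spirit of the table T_g above, using the point-and-permute idea (the permutations π are one-bit masks). SHA-256 stands in for the PRF F, and the row-indexing and masking conventions here are illustrative choices, not necessarily the exact table layout of the thesis; as noted above, the row index (the "first two columns") is implicit, so the stored entries alone look like uniform bits.

```python
import hashlib
import secrets

K = 16  # wire-label length in bytes; each entry also carries the masked output bit

def F(key: bytes, tweak: bytes) -> bytes:
    # PRF stand-in with (K + 1)-byte output, mirroring the k+1-bit output of F
    return hashlib.sha256(key + tweak).digest()[:K + 1]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and_gate():
    # two labels per wire (for bit 0 and bit 1) and a random permutation bit per wire
    W = {w: (secrets.token_bytes(K), secrets.token_bytes(K)) for w in ('i', 'j', 'out')}
    pi = {w: secrets.randbits(1) for w in ('i', 'j', 'out')}
    table = {}
    for bi in (0, 1):
        for bj in (0, 1):
            out_bit = bi & bj
            plain = W['out'][out_bit] + bytes([out_bit ^ pi['out']])
            row = (bi ^ pi['i'], bj ^ pi['j'])       # permuted (implicit) row index
            tweak = bytes(row)
            table[row] = xor(xor(plain, F(W['i'][bi], tweak)), F(W['j'][bj], tweak))
    return W, pi, table

def eval_gate(table, label_i, vis_i, label_j, vis_j):
    # the evaluator sees only one garbled label per wire and the masked bits
    row = (vis_i, vis_j)
    tweak = bytes(row)
    plain = xor(xor(table[row], F(label_i, tweak)), F(label_j, tweak))
    return plain[:K], plain[K]       # garbled output label, masked output bit

W, pi, table = garble_and_gate()
bi, bj = 1, 1
label, vis = eval_gate(table, W['i'][bi], bi ^ pi['i'], W['j'][bj], bj ^ pi['j'])
assert label == W['out'][bi & bj] and vis == (bi & bj) ^ pi['out']
```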
probability distribution on D satisfying H∞(D) = ℓ(k), where k is the security pa-
rameter. The following constructions hide and recover m uniformly-chosen bits in a
distribution indistinguishable from D when ℓ(k) − m = ω(log k) and m = O(log k).
The result follows from the Leftover Hash Lemma ([33], Lemma 4.8). Intuitively,
it guarantees that Basic Encode(c) will be (statistically) indistinguishable from the
messages exchanged in a bidirectional channel whenever c is a uniformly chosen bit
string. (When we refer to Basic Encode with only a single argument, we implicitly
assume that an appropriate h has been chosen and is publicly accessible to all parties.)
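This excerpt does not spell out Basic Encode and Basic Decode, so the sketch below shows one standard rejection-sampling realization consistent with the stated guarantee: decoding applies the public hash h, and encoding redraws from the channel until the draw hashes to the hidden bits c. The affine hash family and every parameter here are assumptions chosen for illustration.

```python
import random

def make_hash(m: int, a: int, b: int, p: int = (1 << 61) - 1):
    """Toy pairwise-independent hash: h(s) = ((a*s + b) mod p) mod 2^m."""
    return lambda s: ((a * s + b) % p) % (1 << m)

def basic_encode(c: int, sample, h, max_tries: int = 10_000) -> int:
    """Draw documents from the channel until one hashes to the hidden bits c."""
    for _ in range(max_tries):
        s = sample()
        if h(s) == c:
            return s
    raise RuntimeError("channel min-entropy too low for this m")

def basic_decode(s: int, h) -> int:
    return h(s)

rng = random.Random(7)
m = 3                                    # hidden bits per document
sample = lambda: rng.getrandbits(24)     # stand-in channel with H_inf = 24 bits
h = make_hash(m, a=1234577, b=891011)
for c in range(2 ** m):
    assert basic_decode(basic_encode(c, sample, h), h) == c
```

The condition ℓ(k) − m = ω(log k) is what keeps the expected number of redraws small and the output distribution statistically close to the channel.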
Thus, to guarantee covertness for channel B, we will ensure that all our protocols
generate messages that are indistinguishable from uniformly chosen random bits and
then encode these messages with Basic Encode. Formally, suppose Π = (P_0, P_1)
is an arbitrary two-party protocol which securely realizes the functionality f. We
will construct a protocol Σ_Π = (S_0^{P_0}, S_1^{P_1}) which has the property that if V^{P_b}_Π(x̄) is
indistinguishable from uniformly chosen bits (that is, Π covertly realizes f for the
uniform channel), then Σ_Π covertly realizes the functionality f for channel B. We
assume that P_0, P_1 have the property that, given a partial input, they return the
string ε, indicating that more bits of input are needed. Then S_b^{P_b} has the following
round function (which simply uses Basic Encode and Basic Decode to encode and
decode all messages exchanged by P_0 and P_1):
Construction 7.4. (Transformation to a covert protocol)
Procedure S_b^{P_b}:
Input: history h ∈ H, state, document s ∈ D
draw d ← B_h^{P_b}
if (state.status = "receiving") then
    set state.msg = state.msg‖Basic Decode(s)
    set c = P_b(state.msg)
    if (c ≠ ε) set state.status = "sending"; set state.msg = c
if (state.status = "sending") then
    if (d ≠ ⊥) then
        set c = first m bits of state.msg
        set state.msg = state.msg without the first m bits
        set d = Basic Encode(c)
    if state.msg = "" set state.status = "receiving"
Output: message d, state
Theorem 7.5. If Π covertly realizes the functionality f for the uniform channel, then
Σ_Π covertly realizes f for the bidirectional channel B.

Proof. Let k^c be an upper bound on the number of bits in ⟨P_0(x_0), P_1(x_1)⟩. Then Σ_Π
transmits at most 2k^c/m (non-empty) documents. Suppose there is a distinguisher
D for V^{S_b}_Σ(x̄) from V^{S_b}_{Σ:B_{1−b}}(x̄) with significant advantage ε. Then D can be used to
distinguish V^{P_b}_Π(x̄) from V^{P_b}_{Π:U_{1−b}}(x̄), by simulating each round as in Σ to produce a
transcript T; if the input is uniform, then ∆(T, B) ≤ (k^c/m)2^{2−(ℓ(k)−m)/2} = ν(k),
and if the input is from Π, then T is identical to V^{S_b}_Σ(x̄). Thus D's advantage in
distinguishing Π from Π : U_{1−b} is at least ε − ν(k).
IMPORTANT: For the remainder of the paper we will present protocols Π that
covertly realize f for U. It is to be understood that the final protocol is meant to
be ΣΠ , and that when we state that “Π covertly realizes the functionality f ” we are
referring to ΣΠ .
7.2.4 Covert Oblivious Transfer
As mentioned above, we guarantee the security of our protocols by ensuring that all
the messages exchanged are indistinguishable from uniformly chosen random bits. To
this effect, we present a modification of the Naor-Pinkas [45] protocol for oblivious
transfer that ensures that all messages exchanged are indistinguishable from uniform
when the input messages m0 and m1 are uniformly chosen. Our protocol relies on the
well-known integer decisional Diffie-Hellman assumption:
Let P and Q be primes such that Q divides P − 1, let Z*_P be the multiplicative
group of integers modulo P, and let g ∈ Z*_P have order Q. Let A be an adversary
that takes as input three elements of Z*_P and outputs a single bit. Define the DDH
advantage of A over (g, P, Q) as:

    Adv^ddh_A(g, P, Q) = |Pr_{a,b,r}[A_r(g^a, g^b, g^{ab}, g, P, Q) = 1] − Pr_{a,b,c,r}[A_r(g^a, g^b, g^c, g, P, Q) = 1]| ,

where a, b, c are uniform in Z_Q and r is A's random tape.
Setup.

Let p = rq + 1 where 2^k < p < 2^{k+1}, q is a large prime, and gcd(r, q) = 1; let g
generate Z*_p, so that γ = g^r generates the unique multiplicative subgroup of order
q; let r̂ be the least integer such that rr̂ ≡ 1 mod q. Assume |m_0| = |m_1| < k/2.
Let H : {0, 1}^{2k} × Z_p → {0, 1}^{k/2} be a pairwise-independent family of hash functions.
Define the randomized mapping φ : ⟨γ⟩ → Z*_p by φ(h) = h^{r̂} g^{βq}, for a uniformly chosen
β ∈ Z_r; notice that φ(h)^r = h and that for a uniformly chosen h ∈ ⟨γ⟩, φ(h) is a
uniformly chosen element of Z*_p. The following protocol is a simple modification of
the Naor-Pinkas 2-round oblivious transfer protocol [45]:
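The two claimed properties of φ, that φ(h)^r = h and that φ maps ⟨γ⟩ × Z_r onto Z*_p, can be checked on toy parameters; p = 11, q = 5, r = 2 below are purely illustrative, whereas the protocol uses a k-bit prime p.

```python
# Toy parameters: p = r*q + 1 with q = 5, r = 2, so p = 11; g = 2 generates Z*_11
p, q, r, g = 11, 5, 2, 2
gamma = pow(g, r, p)                  # gamma = g^r generates the order-q subgroup
r_hat = pow(r, -1, q)                 # r * r_hat = 1 mod q

def phi(h: int, beta: int) -> int:
    """phi(h) = h^r_hat * g^(beta*q) mod p, for beta in Z_r."""
    return (pow(h, r_hat, p) * pow(g, beta * q, p)) % p

# phi(h)^r = h for every h in <gamma> and every choice of the randomizer beta
for t in range(q):
    h = pow(gamma, t, p)
    for beta in range(r):
        assert pow(phi(h, beta), r, p) == h

# Over uniform h in <gamma> and uniform beta, phi covers all of Z*_p
images = {phi(pow(gamma, t, p), beta) for t in range(q) for beta in range(r)}
assert images == set(range(1, p))
```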
1. On input σ ∈ {0, 1}, C chooses uniform a, b ∈ Z_q, sets c_σ = ab mod q and
   uniformly chooses c_{1−σ} ∈ Z_q. C sets x = γ^a, y = γ^b, z_0 = γ^{c_0}, z_1 = γ^{c_1} and sets
   x′ = φ(x), y′ = φ(y), z′_0 = φ(z_0), z′_1 = φ(z_1). If the most significant bits of all of
   x′, y′, z′_0, z′_1 are 0, C sends the least significant k bits of each to S; otherwise C
   picks new a, b, c_{1−σ} and starts over.
Lemma 7.7. S cannot distinguish between the case that C is following the COT
protocol and the case that C is drawing from U_k; that is,

    V^S_{COT}(m_0, m_1, σ) ≈ V^S_{COT:U_C}(m_0, m_1, σ).
Proof. Suppose that there exists a distinguisher D with advantage ε. Then there
exists a DDH adversary A with advantage at least ε/8 − ν(k) for a negligible ν. A
takes as input a triple (γ^a, γ^b, γ^c), picks a random bit σ, sets z_σ = γ^c and picks a
uniform z′_{1−σ} ∈ {0, 1}^k, and computes x′ = φ(γ^a), y′ = φ(γ^b), z′_σ = φ(z_σ); if all three
are at most 2^k, then A outputs D(x′, y′, z′_0, z′_1), otherwise A outputs 0.

since the elements passed by A to D are uniformly chosen and A invokes D with proba-
bility at least 1/8 (since each of x′, y′, z′_σ is greater than 2^k with probability at most
1/2). But when c = ab, then
since the elements passed by A to D are chosen exactly according to the distribution
on C's output specified by COT; and since the probability that D is invoked by A
is at least 1/8 when c ≠ ab, it can be at most ν(k) less when c = ab, by the Integer
DDH assumption. Thus the DDH advantage of A is at least ε/8 − ν(k). Since ε/8
must be negligible by the DDH assumption, we have that D's advantage must also
be negligible.
Lemma 7.8. When m_0, m_1 ← U_{k/2}, C cannot distinguish between the case that S is
following the COT protocol and the case that S is sending uniformly chosen strings.
That is, V^C_{COT}(U_{k/2}, U_{k/2}, σ) ≈ V^C_{COT:U_S}(U_{k/2}, U_{k/2}, σ).

Proof. The group elements w_0, w_1 are uniformly chosen by S; thus when m_0, m_1 are
uniformly chosen, the message sent by S must also be uniformly distributed.
Lemma 7.9. The COT protocol securely realizes the OT^2_1 functionality.

Proof. The protocol described by Naor and Pinkas is identical to the COT protocol,
with the exception that φ is not applied to the group elements x, y, z_0, z_1, w_0, w_1 and
these elements are not rejected if they are greater than 2^k. Suppose an adversarial
sender can predict σ with advantage ε in COT; then he can be used to predict σ
with advantage ε/16 − ν(k) in the Naor-Pinkas protocol, by applying the map φ
to the elements x, y, z_0, z_1 and predicting a coin flip if not all are less than 2^k, and
otherwise using the sender's prediction against the message that COT would send.
Likewise, any bit a chooser can predict about (m_0, m_1) with advantage ε in COT
can be predicted with advantage ε/4 in the Naor-Pinkas protocol: the Chooser's
message can be transformed into elements of ⟨γ⟩ by taking the components to the
power r, and the resulting message of the Naor-Pinkas sender can be transformed by
sampling w′_0 = φ(w_0), w′_1 = φ(w_1) and predicting a coin flip if either is greater
than 2^k, but otherwise giving the prediction of the COT chooser on w′_0‖f_0‖f_0(K_0) ⊕
m_0‖w′_1‖f_1‖f_1(K_1) ⊕ m_1.
7.2.5 Combining The Pieces
Proof. That (Alice, Bob) securely realize the functionality f follows from the security
of Yao’s protocol. Now consider the distribution of each message sent from Alice to
Bob:
• Final values: these are masked by the uniformly chosen bits that Bob chose in
garbling the output gates. To an observer, they are uniformly distributed.
Thus Bob’s view, until the last round, is in fact identically distributed when Alice
is running the protocol and when she is drawing from U. Likewise, consider the
messages sent by Bob:
• In each execution of COT: because the W^b_i from Yao's protocol are uniformly
  distributed, Theorem 7.10 implies that Bob's messages are indistinguishable
  from uniform strings.

• When sending the garbled circuit, the pseudorandomness of F and the uniform
  choice of the W^b_i imply that each garbled gate, even given one garbled input
  pair, is indistinguishable from a random string.
Thus Alice’s view after all rounds of the protocol is indistinguishable from her view
when Bob draws from U.
If Bob can distinguish between Alice running the protocol and drawing from B
after the final round, then he can also be used to distinguish between f (XA , xB ) and
Ul . The approach is straightforward: given a candidate y, use the simulator from
Yao’s protocol to generate a view of the “data layer.” If y ← f (XA , xB ), then, by
the security of Yao’s protocol, this view is indistinguishable from Bob’s view when
Alice is running the covert protocol. If y ← Ul , then the simulated view of the final
step is distributed identically to Alice drawing from U. Thus Bob’s advantage will be
preserved, up to a negligible additive term.
result in time T , the other party can compute the result in time at most O(T ). Our
protocol is secure in the random oracle model, under the Decisional Diffie-Hellman
assumption. We show at the end of this section, however, that our protocol can be
made to satisfy a slightly weaker security condition without the use of a random
oracle. (We note that the technique used in this section has some similarities to one
that appears in [1].)
7.3.1 Definitions
Let f denote the functionality we wish to compute. We say that f is fair if for every distinguisher Dσ distinguishing f(X0, X1) from U given Xσ with advantage at least ε, there is a distinguisher D1−σ with advantage at least ε − ν(k), for a negligible function ν. (That is, if P0 can distinguish f(X0, X1) from uniform, so can P1.) We say f is strongly fair if (f(X0, X1), X0) ≈ (f(X0, X1), X1).
• (Strong Internal Covertness): There exists a PPT E (an extractor) such that if a PPT D distinguishes between V^{Pσ}_{Π,i}(x̄) and V^{Pσ}_{Π:B1−σ,i}(x̄) with advantage ε, then E^D(V^{Pσ}_Π(x̄)) computes f(x̄) with probability at least ε/poly(k).
• (Strong Fairness): If the functionality f is fair, then for any Cσ running in time T such that Pr[Cσ(V^σ_{Π,i}(x̄)) = f(x̄)] ≥ ε, there exists a C1−σ running in time O(T) such that Pr[C1−σ(V^{1−σ}_{Π,i}(x̄)) = f(x̄)] = Ω(ε).
• (Final Covertness): For every PPT D there exists a PPT D′ and a negligible ν such that for any xσ and distribution X1−σ, Adv_D^{V^{Pσ}_Π(X1−σ,xσ), V^{Pσ}_{Π:B1−σ}(X1−σ,xσ)}(k) ≤ Adv_{D′}^{f(X1−σ,xσ), U_l}(k) + ν(k).
Intuitively, the Internal Covertness requirement states that “Alice can’t tell if Bob is
running the protocol until she gets the answer,” while Strong Fairness requires that
“Alice can’t get the answer unless Bob can.” Combined, these requirements imply
that neither party has an advantage over the other in predicting whether the other is
running the protocol.
7.3.2 Construction
As before, we have two parties, P0 (Alice) and P1 (Bob), with inputs x0 and x1 ,
respectively, and the function Alice and Bob wish to compute is f : {0, 1}^{l0} × {0, 1}^{l1} → {0, 1}^l, presented by the circuit Cf. The protocol proceeds in three stages: COMMIT,
COMPUTE, and REVEAL. In the COMMIT stage, Alice picks k + 2 strings, r0 , and
s0 [0], . . . , s0 [k], each k bits in length. Alice computes commitments to these values,
using a bitwise commitment scheme which is indistinguishable from random bits, and
sends the commitments to Bob. Bob does likewise (picking strings r1 , s1 [0], . . . , s1 [k]).
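The bitwise commitment only needs to produce commitments that are indistinguishable from random bits; in the random-oracle model, hashing the bit together with fresh randomness is one way to obtain this property. A toy Python sketch (the use of SHA-256 and all names here are our illustration, not the thesis's scheme):

```python
import hashlib
import os

def commit(bit: int, rho: bytes) -> bytes:
    """Commit to a bit: modeling the hash as a random oracle, the
    digest of (bit || rho) is indistinguishable from 32 uniform bytes."""
    return hashlib.sha256(bytes([bit]) + rho).digest()

def open_commitment(com: bytes, bit: int, rho: bytes) -> bool:
    """Check an opening (bit, rho) against a commitment."""
    return commit(bit, rho) == com

rho = os.urandom(16)   # fresh randomness, kept secret until opened
c = commit(1, rho)     # sent to the other party
assert open_commitment(c, 1, rho)
assert not open_commitment(c, 0, rho)
```

In the protocol, Alice would commit bit by bit to r0 and s0[0], . . . , s0[k] this way, and the openings (bit, ρ) are what the garbled circuits later verify.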
The next two stages involve the use of a pseudorandom generator G : {0, 1}^k → {0, 1}^l which we will model as a random oracle for the security argument only: G
itself must have an efficiently computable circuit. In the COMPUTE stage, Alice and
Bob compute two serial runs (“rounds”) of the covert Yao protocol described in the
previous section. If neither party cheats, then at the conclusion of the COMPUTE
stage, Alice knows f (x0 , x1 )⊕G(r1 ) and Bob’s value s1 [0]; while Bob knows f (x0 , x1 )⊕
G(r0 ) and Alice’s value s0 [0]. The REVEAL stage consists of k rounds of two runs
each of the covert Yao protocol. At the end of each round i, if nobody cheats, Alice
learns the ith bit of Bob’s string r1 , labeled r1 [i], and also Bob’s value s1 [i], and Bob
learns r0 [i], s0 [i]. After k rounds in which neither party cheats, Alice thus knows r1
and can compute f (x0 , x1 ) by computing the exclusive-or of G(r1 ) with the value she
learned in the COMPUTE stage, and Bob can likewise compute the result.
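The arithmetic behind this mask-and-reveal structure is easy to simulate. The following toy Python sketch (sizes and names are ours; in the real protocol every value below is computed inside a garbled circuit, never in the clear) shows that holding f(x0, x1) ⊕ G(r1) is useless until all k bits of r1 arrive, after which unmasking is a single exclusive-or:

```python
import hashlib
import secrets

K = 16  # toy stand-in for the security parameter k

def G(r: int) -> int:
    """Toy pseudorandom generator G : {0,1}^K -> {0,1}^16, built from
    a hash for illustration only."""
    return int.from_bytes(hashlib.sha256(r.to_bytes(2, "big")).digest()[:2], "big")

f_result = 0xBEEF               # stand-in for f(x0, x1)
r1 = secrets.randbits(K)        # Bob's secret seed

# COMPUTE stage: Alice learns only the masked result.
F0 = G(r1) ^ f_result

# REVEAL stage: Bob releases one bit of r1 per round.
revealed = [(r1 >> i) & 1 for i in range(K)]

# After round k, Alice reassembles r1 and unmasks F0.
r1_recovered = sum(b << i for i, b in enumerate(revealed))
assert r1_recovered == r1
assert G(r1_recovered) ^ F0 == f_result
```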
Each circuit sent by Alice must check that Bob has obeyed the protocol; thus at
every round of every stage, the circuit that Alice sends to Bob takes as input the
opening of all of Bob’s commitments, and checks to see that all of the bits Alice has
learned so far are consistent with Bob’s input. The difficulty to overcome with this
approach is that the result of the check cannot be returned to Alice without giving
away that Bob is running the protocol. To solve this problem, Alice’s circuits also take
as input the last value s0 [i − 1] that Bob learned. If Alice’s circuit ever finds that the
bits she has learned are inconsistent with Bob’s input, or that Bob’s input for s0 [i − 1]
is not consistent with the actual value of s0 [i − 1], the output is a uniformly chosen
string of the appropriate length. Once this happens, all future outputs to Bob will
also be independently and uniformly chosen, because he will have the wrong value for
s0 [i], which will give him the wrong value for s0 [i+1], etc. Thus the values s0 [1, . . . , k]
serve as “state” bits that Bob maintains for Alice. The analogous statements hold
for Bob’s circuits and Alice’s inputs.
COMPUTE stage. The COMPUTE stage consists of two serial runs of the covert-
yao protocol.
1. Bob garbles the circuit compute1 shown in figure 7.1, which takes x0, r0, s0[0], . . . , s0[k], and ρ0 as input and outputs G(r1) ⊕ f(x0, x1) ‖ s1[0] if K1 is a commitment to X0. If this check fails, compute1 outputs a uniformly chosen string, which carries no information about f(x0, x1) or s1[0]. Bob and Alice perform the covert-yao protocol; Alice labels her result F0 ‖ S0[0].
2. Alice garbles the circuit compute0 shown in figure 7.1, which takes x1 , r1 ,
Figure 7.1: The circuits computeσ and reveal^i_σ.

computeσ(x1−σ, r, s[0 . . . k], ρ) =
    if (Kσ = CMT(x1−σ, r, s; ρ))
        then set F = G(rσ) ⊕ f(x0, x1), set S = sσ[0]
        else draw F ← Ul, draw S ← Uk
    output F ‖ S

reveal^i_σ(x1−σ, S1−σ[i − 1], r, s1−σ[0 . . . k], ρ) =
    let F = G(r) ⊕ f(x0, x1)
    if (Kσ = CMT(x1−σ, r, s1−σ; ρ) and F = Fσ and
        Rσ[i − 1] = r[i − 1] and
        S1−σ[i − 1] = sσ[i − 1] and
        Sσ[i − 1] = s1−σ[i − 1])
        then set R = rσ[i], S = sσ[i]
        else draw R ← {0, 1}, S ← Uk
    output R ‖ S
s1[0], . . . , s1[k], and ρ1 as input and outputs G(r0) ⊕ f(x0, x1) ‖ s0[0] if K0 is a commitment to X1. If this check fails, compute0 outputs a uniformly chosen string, which carries no information about f(x0, x1) or s0[0]. Bob and Alice perform the covert-yao protocol; Bob labels his result F1 ‖ S1[0].
REVEAL stage. The REVEAL stage consists of k rounds, each of which consists
of 2 runs of the covert-yao protocol:
1. in round i, Bob garbles the circuit reveali1 shown in figure 7.1, which takes
input x0, S0[i − 1], r0, s0[0 . . . k], ρ0 and checks that these values are consistent with Alice's commitments and with the values Bob has learned in previous rounds (the conditions listed in figure 7.1). If all of these checks succeed, Bob's circuit outputs bit i of r1 and state s1[i];
otherwise the circuit outputs a uniformly chosen k + 1-bit string. Alice and Bob
perform covert-yao and Alice labels the result R0 [i], S0 [i].
2. Alice garbles the circuit reveali0 depicted in figure 7.1 which performs the
analogous computations to reveali1 , and performs the covert-yao protocol
with Bob. Bob labels the result R1 [i], S1 [i].
After k such rounds, if Alice and Bob have been following the protocol, we have
R1 = r0 and R0 = r1 and both parties can compute the result. The “states” s are
what allow Alice and Bob to check that all previous outputs and key bits (bits of r0
and r1 ) sent by the other party have been correct, without ever receiving the results
of the checks or revealing that the checks fail or succeed.
Theorem 7.13. Construction 7.12 is a strongly fair covert protocol realizing the functionality f.
Proof. The correctness of the protocol follows by inspection. The two-party security
follows by the security of Yao's protocol. Now suppose that some party, without loss of generality Alice, cheats (by sending a circuit which computes an incorrect result) in round j. Then, the
key bit R0 [j + 1] and state S0 [j + 1] Alice computes in round j + 1 will be randomized;
and with overwhelming probability every subsequent result that Alice computes will
be useless. Assuming Alice can distinguish f(x0, X1) from uniform, she can still compute the result in time at most 2^{k−j} by exhaustive search over the remaining key bits. By successively guessing the round at which Alice began to cheat, Bob can compute the result in time at most 2^{k−j+2}. If Alice aborts at round j, Bob again can compute the result in time at most 2^{k−j+1}. If Bob cheats in round j by giving
inconsistent inputs, with high probability all of his remaining outputs are randomized;
thus cheating in this way gives him no advantage over aborting in round j − 1. Thus,
the fairness property is satisfied.
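The 2^{k−j} bound in this argument is a plain exhaustive search over the unrevealed seed bits. A toy Python sketch of the recovery (parameters are ours; the thesis only assumes the searcher can distinguish f(x0, X1) from uniform, whereas for concreteness this toy tests each candidate against the known answer):

```python
import hashlib
import secrets

K = 16   # toy seed length
j = 10   # rounds completed honestly before the abort

def G(r: int) -> int:
    """Toy pseudorandom generator, as elsewhere a hash stand-in."""
    return int.from_bytes(hashlib.sha256(r.to_bytes(2, "big")).digest()[:2], "big")

r1 = secrets.randbits(K)
f_result = 0x1234            # stand-in for f(x0, x1)
F0 = G(r1) ^ f_result        # masked result Alice holds
low = r1 & ((1 << j) - 1)    # bits revealed in rounds 1..j

# Enumerate the 2^(K-j) possibilities for the unrevealed high bits.
recovered = None
for high in range(1 << (K - j)):
    candidate = (high << j) | low
    if G(candidate) ^ F0 == f_result:   # stand-in for the distinguisher
        recovered = candidate
        break

assert recovered is not None
assert G(recovered) ^ F0 == f_result    # the masked result now opens
```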
If G is a random oracle, neither Alice nor Bob can distinguish anything in their
view from uniformly chosen bits without querying G at the random string chosen by
the other. So given a distinguisher D running in time p(k) for V^{P0}_{Π,i}(x̄) with advantage ε, it is simple to write an extractor which runs D, recording its queries to G, picks
one such query (say, q) uniformly, and outputs G(q) ⊕ F0 . Since D can only have an
advantage when it queries r1 , E will pick q = r1 with probability at least 1/p(k) and
in this case correctly outputs f (x0 , x1 ). Thus the Strong Internal Covertness property
is satisfied.
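The extractor in this argument can be written down almost verbatim: wrap G so that queries are recorded, run the distinguisher, pick one recorded query at random, and output G(q) ⊕ F0. A toy Python sketch (the lazily-sampled table standing in for the random oracle, and all names, are ours):

```python
import random
import secrets

class RecordingOracle:
    """Lazily sampled random oracle G that records every query."""
    def __init__(self, out_bits: int = 16):
        self.table, self.queries, self.out_bits = {}, [], out_bits
    def __call__(self, q: int) -> int:
        self.queries.append(q)
        if q not in self.table:
            self.table[q] = secrets.randbits(self.out_bits)
        return self.table[q]

G = RecordingOracle()
r1 = secrets.randbits(16)
f_result = 0x00FF
F0 = G(r1) ^ f_result        # the masked value in Alice's view

def distinguisher(masked: int) -> bool:
    """A distinguisher with advantage must query G at r1; this toy one
    makes a few junk queries and then queries r1 itself."""
    for junk in (1, 2, 3):
        G(junk)
    return G(r1) ^ masked == f_result

G.queries.clear()            # observe only the distinguisher's queries
distinguisher(F0)

# Extractor: choose one recorded query q uniformly; with probability at
# least 1/|queries| it is r1, and then G(q) ^ F0 equals f(x0, x1).
q = random.choice(G.queries)
guess = G(q) ^ F0
assert r1 in G.queries
assert G(r1) ^ F0 == f_result
```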
We can achieve a slightly weaker version of covertness without using random oracles.
Π is said to be a weakly fair covert protocol for the channel B if Π is externally covert,
and has the property that if f is strongly fair, then for every distinguisher Dσ for V^{Pσ}_{Π,i}(x̄) with significant advantage ε, there is a distinguisher D1−σ for V^{P1−σ}_{Π,i}(x̄) with advantage Ω(ε). Thus in a weakly fair covert protocol, we do not guarantee that both
parties get the result, only that if at some point in the protocol, one party can tell
that the other is running the protocol with significant advantage, the same is true for
the other party.
Chapter 8
While this thesis has resolved several of the open questions pertaining to univer-
sal steganography, there are still many interesting open questions about theoretical
steganography. In this section we highlight those that seem most important.
We have shown that for a universal blockwise stegosystem with bounded sample access
to a channel, the optimal rate is bounded above by both the minimum entropy of the
channel and the logarithm of the sample bound. Three general research directions
arise from this result. First, a natural question is what happens to this bound if
we remove the universality and blockwise constraints. A second natural direction to
pursue is the question of efficiently detecting the use of a stegosystem that exceeds
the maximum secure rate. A third interesting question to explore is the relationship
between extractors and stegosystems.
is it secure in a blockwise model, but the rate approaches the Shannon entropy for
any efficiently sampleable channel with entropy bounded by the logarithm of the
security parameter k. Thus it is natural to wonder whether there is a reasonable
security model and a reasonable class of nonuniversally accessible stegosystems which
are provably secure under this model, yet have rate which substantially exceeds that
of the construction in Chapter 6.
We show that any blockwise stegosystem which exceeds the minimum entropy
can be detected by giving a detection algorithm which draws many samples from the
channel. It is an interesting question whether the number of samples required can be
reduced significantly for some channels. It is not hard to see that artificial channels
can be designed for which this is the case using, for instance, a trapdoor permutation
for which the warden knows the trapdoor. However, a more natural example would
be of interest.
The necessary and sufficient conditions for the existence of a public-key stegosystem
constitute an open question. Certainly for a universal stegosystem the necessary and
sufficient condition is the existence of a trapdoor predicate family with domains that
are computationally indistinguishable from a polynomially dense set: as we showed in
Chapter 4, such primitives are sufficient for IND$-CPA public-key encryption; while
on the other hand, the existence of a universal public-key stegosystem implies the
existence of a public-key stegosystem for the uniform channel, which is by itself a
trapdoor predicate family with domains that are computationally indistinguishable
from a set of density 1. Unlike the case with symmetric steganography, however,
we are not aware of a reduction from a stegosystem for an arbitrary channel to a
dense-domain trapdoor predicate family.
problem, Backes and Cachin [7] have introduced the notion of Replayable Chosen
Covertext (RCCA) security, which is identical to sCCA security, with the exception
that the adversary is forbidden to submit covertexts which decode to the challenge
hiddentext. The problem with this approach is that the replay attack seems to be a
viable attack in the real world. Thus it is an interesting question to investigate the
possibility of notions “in-between” sCCA and RCCA.
The most important open question concerning robust steganography is the mis-
match between substitution robustness and the types of attacks perpetrated against
typical proposals for robust steganography. Such attacks include strategies such as
splitting a single document into a series of smaller documents with the same mean-
ing, merging two or more documents into a single document with the same meaning,
and reordering documents in a list. Especially if there is no bound on the length of
sequences to which these operations can be applied, it seems difficult to even write a
general description of the rules such a warden must follow; and although it is reason-
ably straightforward to counteract any single attack in the previous list, composing
several of them with relation-bounded substitutions as well seems to lead to attacks
which are difficult to defend against.
8.4 Covert Computation
In the area of covert computation, this thesis leaves room for improvement and open
problems. For example, can (strongly) fair covert two-party computation secure
against malicious adversaries be satisfied without random oracles? It seems at least
plausible that constructions based on concrete assumptions such as the “knowledge-
of-exponent” assumption or the “generalized BBS” assumption may allow construc-
tion of such protocols, yet the obvious applications always destroy the final covertness
property. A related question is whether covert two-party computation can be based on
general cryptographic assumptions rather than the specific Decisional Diffie-Hellman
assumption used here.
Another open question is that of improving the efficiency of the protocols presented
here, either by designing protocols for specific goals or through adapting efficient
two-party protocols to provide covertness. A possible direction to pursue would be
“optimistic” fairness involving a trusted third party. In this case, though, there is the
question of how the third party could “complete” the computation without revealing
participation.
8.5 Other models
The results of Chapter 3 show that the ability to sample from a channel in our model is
necessary for steganographic communication using that channel. Since in many cases
we do not understand the channel well enough to sample from it, a natural question
is whether there exist models where less knowledge of the distribution is necessary;
such a model will necessarily restrict the adversary’s knowledge of the channel as well.
One intuition is that typical steganographic adversaries are not monitoring the traffic
between a specific pair of individuals in an effort to confirm suspicious behavior, but
are monitoring a high-volume stream of traffic between many points looking for the
“most suspicious” behavior; so stegosystems which could be detected by analyzing
a long sequence of communications might go undetected if only single messages are
analyzed. This type of model is tantalizing because there are unconditionally secure
cryptosystems under various assumptions about adversaries with bounded storage
[18, 50], but it remains an interesting challenge to give a satisfying formal model and
provably secure construction for this scenario.
Bibliography
[2] Luis von Ahn, Manuel Blum and John Langford. Telling Humans and Computers
Apart (Automatically) or How Lazy Cryptographers do AI.
[3] Luis von Ahn and Nicholas J. Hopper. Public-Key Steganography. Submitted to CRYPTO 2003.
[6] Ross J. Anderson and Fabien A. P. Petitcolas. Stretching the Limits of Steganog-
raphy. In: Proceedings of the first International Information Hiding Workshop.
1996.
[7] M. Backes and C. Cachin. Public-Key Steganography with Active Attacks. IACR
e-print archive report 2003/231, 2003.
[9] M. Bellare and P. Rogaway. Random Oracles are Practical. Computer and Com-
munications Security: Proceedings of ACM CCS 93, pages 62–73, 1993.
[12] J. Brassil, S. Low, N. F. Maxemchuk, and L. O'Gorman. Hiding Information in Document Images. In: Conference on Information Sciences and Systems, 1995.
[21] R. Cramer and V. Shoup. Universal Hash Proofs and a Paradigm for Adap-
tive Chosen Ciphertext Secure Public-Key Encryption. Advances in Cryptology:
EUROCRYPT 2002, Springer LNCS 2332, pages 45-64. 2002.
[25] O. Goldreich. Foundations of Cryptography: Basic Tools. Cambridge University
Press, 2001.
[28] O. Goldreich and L.A. Levin. A Hardcore predicate for all one-way functions.
In: Proceedings of 21st STOC, pages 25–32, 1989.
[29] O. Goldreich, S. Micali and A. Wigderson. How to Play any Mental Game.
Nineteenth Annual ACM Symposium on Theory of Computing, pages 218-229.
[31] S. Goldwasser and S. Micali. Probabilistic Encryption & how to play mental
poker keeping secret all partial information. In: Proceedings of the 14th STOC,
pages 365–377, 1982.
[32] D. Gruhl, W. Bender, and A. Lu. Echo Hiding. In: Information Hiding: First
International Workshop, pages 295–315, 1996.
[34] N. Hopper, J. Langford and L. Von Ahn. Provably Secure Steganography. Ad-
vances in Cryptology – Proceedings of CRYPTO ’02, pages 77-92, 2002.
[35] Nicholas J. Hopper, John Langford, and Luis von Ahn. Provably Secure Steganog-
raphy. CMU Tech Report CMU-CS-TR-02-149, 2002.
[36] Russell Impagliazzo and Michael Luby. One-way Functions are Essential for
Complexity Based Cryptography. In: 30th FOCS, November 1989.
[39] J. Katz and M. Yung. Complete characterization of security notions for probabilistic private-key encryption. In: Proceedings of the 32nd STOC, pages 245–254, 2000.
[40] Stefan Katzenbeisser and Fabien A. P. Petitcolas. Information hiding techniques
for steganography and digital watermarking. Artech House Books, 1999.
[41] T. Van Le. Efficient Provably Secure Public Key Steganography. IACR e-print archive report 2003/156, 2003.
[45] M. Naor and B. Pinkas. Efficient Oblivious Transfer Protocols. In: Proceedings of
the 12th Annual ACM/SIAM Symposium on Discrete Algorithms (SODA 2001),
pages 448–457. 2001.
[46] M. Naor, B. Pinkas and R. Sumner. Privacy Preserving Auctions and Mechanism
Design. In: Proceedings, 1999 ACM Conference on Electronic Commerce.
[47] M. Naor and M. Yung. Universal One-Way Hash Functions and their Crypto-
graphic Applications. 21st Symposium on Theory of Computing (STOC 89), pages
33-43. 1989.
[48] M. Naor and M. Yung. Public-key cryptosystems provably secure against chosen
ciphertext attacks. 22nd Symposium on Theory of Computing (STOC 90), pages
427-437. 1990.
[53] L. Reyzin and S. Russell. Simple Stateless Steganography. IACR e-print archive
report 2003/093, 2003.
[54] Phillip Rogaway, Mihir Bellare, John Black and Ted Krovetz. OCB: A Block-Cipher Mode of Operation for Efficient Authenticated Encryption. In: Proceedings of the Eighth ACM Conference on Computer and Communications Security (CCS-8). November 2001.
[55] J. Rompel. One-way functions are necessary and sufficient for secure signatures.
22nd Symposium on Theory of Computing (STOC 90), pages 387-394. 1990.
[58] C.E. Shannon. Communication theory of secrecy systems. In: Bell System Tech-
nical Journal, 28 (1949), pages 656-715.
[59] G.J. Simmons. The Prisoner’s Problem and the Subliminal Channel. In: Pro-
ceedings of CRYPTO ’83. 1984.
[60] L. Welch and E.R. Berlekamp. Error correction of algebraic block codes. US
Patent Number 4,663,470, December 1986.
[63] A. C. Yao. Protocols for Secure Computation. Proceedings of the 23rd IEEE
Symposium on Foundations of Computer Science, 1982, pages 160–164.
[64] A. C. Yao. How to Generate and Exchange Secrets. Proceedings of the 27th IEEE
Symposium on Foundations of Computer Science, 1986, pages 162–167.