Toward a Theory of Steganography
Nicholas J. Hopper
CMU-CS-04-157
July 2004
Thesis Committee:
Manuel Blum, Chair
Avrim Blum
Michael Reiter
Steven Rudich
David Wagner, U.C. Berkeley
Copyright © 2004 Nicholas J. Hopper
This material is based upon work partially supported by the National Science Foundation under
Grants CCR-0122581 and CCR-0058982 (The Aladdin Center) and an NSF Graduate Fellowship;
the Army Research Office (ARO) and the Cylab center at Carnegie Mellon University; and a Siebel
Scholarship.
The views and conclusions contained in this document are those of the author and should not be
interpreted as representing the official policies, either expressed or implied, of the NSF, the U.S.
Government or any other entity.
Keywords: Steganography, Cryptography, Provable Security
Abstract
1 Introduction 1
1.1 Cryptography and Provable Security . . . . . . . . . . . . . . . . . . 2
1.2 Previous work on theory of steganography . . . . . . . . . . . . . . . 4
1.3 Contributions of the thesis . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Roadmap of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Symmetric-key Steganography 27
3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.2 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 A Stateful Construction . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 An Alternative Construction . . . . . . . . . . . . . . . . . . . 39
3.3 Necessary Conditions for Steganography . . . . . . . . . . . . . . . . 41
3.3.1 Steganography implies one-way functions . . . . . . . . . . . . 42
3.3.2 Sampleable Channels are necessary . . . . . . . . . . . . . . . 44
4 Public-Key Steganography 47
4.1 Public key cryptography . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.1 Pseudorandom Public-Key Encryption . . . . . . . . . . . . . 49
4.1.2 Efficient Probabilistic Encryption . . . . . . . . . . . . . . . . 51
4.2 Public key steganography . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Public-key stegosystems . . . . . . . . . . . . . . . . . . . . . 55
4.2.2 Steganographic Secrecy against Chosen Hiddentext Attack . . 56
4.2.3 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.4 Chosen Hiddentext security . . . . . . . . . . . . . . . . . . . 58
4.3 Steganographic Key Exchange . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3.1 With errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.2 Negligible error rate . . . . . . . . . . . . . . . . . . . . . . . 121
6.3.3 Converging to optimal . . . . . . . . . . . . . . . . . . . . . . 123
6.3.4 Unknown length . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4 Robust Steganography . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4.1 Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4.2 Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Bibliography 165
Chapter 1
Introduction
This dissertation focuses on the problem of steganography: how can two communicat-
ing entities send secret messages over a public channel so that a third party cannot
detect the presence of the secret messages? Notice how the goal of steganography
is different from classical encryption, which seeks to conceal the content of secret
messages: steganography is about hiding the very existence of the secret messages.
Steganographic “protocols” have a long and intriguing history that goes back to
antiquity. There are stories of secret messages written in invisible ink or hidden in love
letters (the first character of each sentence can be used to spell a secret, for instance).
More recently, steganography was used by prisoners, spies and soldiers during World
War II because mail was carefully inspected by both the Allied and Axis governments
at the time [38]. Postal censors crossed out anything that looked like sensitive in-
formation (e.g. long strings of digits), and they prosecuted individuals whose mail
seemed suspicious. In many cases, censors even randomly deleted innocent-looking
sentences or entire paragraphs in order to prevent secret messages from being deliv-
ered. More recently there has been a great deal of interest in digital steganography,
that is, in hiding secret messages in communications between computers.
Because of this, there have been numerous proposals for protocols to hide data in channels
containing pictures [37, 40], video [40, 43, 61], audio [32, 49], and even typeset text
[12]. Many of these protocols are extremely clever and rely heavily on domain-specific
properties of these channels. On the other hand, the literature on steganography also
contains many clever attacks which detect the use of such protocols. In addition, there
is no clear consensus in the literature about what it should mean for a stegosystem
to be secure; this ambiguity makes it unclear whether it is even possible to have a
secure protocol for steganography.
The main goal of this thesis is to rigorously investigate the open question: “under
what conditions do secure protocols for steganography exist?” We will give rigor-
ous cryptographic definitions of steganographic security in multiple settings against
several different types of adversary, and we will demonstrate necessary and sufficient
conditions for security in each setting, by exhibiting protocols which are secure under
these conditions.
1.1 Cryptography and Provable Security

The rigorous study of provably secure cryptography was initiated by Shannon [58], who introduced an information-theoretic definition of security: a cryptosystem is secure if an adversary who sees the ciphertext (the scrambled message sent by the cryptosystem) gains no additional information about the plaintext (the unscrambled content). Unfortunately, Shannon also proved that any perfectly secure cryptosystem requires that if a sender wishes to transmit N bits of plaintext data, the sender and the receiver must share at least N bits of random, secret data: the key. This limitation means that only parties who already possess secure channels (for the exchange of secret keys) can have secure communications.
Modern cryptography circumvents this limitation by settling for computational rather than information-theoretic security, allowing parties who initially share a very small number of secret bits (in the case of public-key cryptography, zero) to subsequently transmit an essentially unbounded number of message bits securely.
1.2 Previous work on theory of steganography
The scientific study of steganography in the open literature began in 1983 when
Simmons [59] stated the problem in terms of communication in a prison. In his
formulation, two inmates, Alice and Bob, are trying to hatch an escape plan. The
only way they can communicate with each other is through a public channel, which is
carefully monitored by the warden of the prison, Ward. If Ward detects any encrypted
messages or codes, he will throw both Alice and Bob into solitary confinement. The
problem of steganography is, then: how can Alice and Bob cook up an escape plan by communicating over the public channel in such a way that Ward doesn't suspect anything "unusual" is going on?
Anderson and Petitcolas [6] posed many of the open problems resolved in this
thesis. In particular, they pointed out that it was unclear how to prove the security
of a steganographic protocol, and gave an example which is similar to the protocol
we present in Chapter 3. They also asked whether it would be possible to have
steganography without a secret key, which we address in Chapter 4. Finally, they
point out that while it is easy to give a loose upper bound on the rate at which
hidden bits can be embedded in innocent objects, there was no known lower bound.
Since the paper of Anderson and Petitcolas, several works [16, 44, 57, 66] have
addressed information-theoretic definitions of steganography. Cachin’s work [16, 17]
formulates the problem as that of designing an encoding function so that the rela-
tive entropy between stegotexts, which encode hidden information, and independent,
identically distributed samples from some innocent-looking covertext probability dis-
tribution, is small. He gives a construction similar to one we describe in Chapter 3 but
concludes that it is computationally intractable; and another construction which is
provably secure but relies critically on the assumption that all orderings of covertexts
are equally likely. Cachin also points out several flaws in other published information-
theoretic formulations of steganography.
One limitation of these information-theoretic formulations is that N bits of secret key can encode at most N hidden bits. In addition, techniques such
as public-key steganography and robust steganography are information-theoretically
impossible.
1.3 Contributions of the thesis

Symmetric-Key Steganography

A symmetric-key stegosystem allows two parties with a shared secret to send hidden
messages undetectably over a public channel. We give cryptographic definitions for
symmetric-key stegosystems and steganographic secrecy against a passive adversary
in terms of indistinguishability from a probabilistic channel process. By giving a
construction which provably satisfies these definitions, we show that the existence
of a one-way function is sufficient for the existence of secure steganography relative
to any channel. We also show that this condition is necessary by demonstrating a
construction of a one-way function from any secure stegosystem.
Public-Key Steganography
Informally, a public-key steganography protocol allows two parties, who have never
met or exchanged a secret, to send hidden messages over a public channel so that
an adversary cannot even detect that these hidden messages are being sent. Un-
like previous settings in which provable security has been applied to steganography,
public-key steganography is information-theoretically impossible. We introduce com-
putational security conditions for public-key steganography similar to those for the
symmetric-key setting, and give the first protocols for public-key steganography and
steganographic key exchange that are provably secure under standard cryptographic
assumptions.
Covert Computation
At a higher level, the technical contributions of this thesis suggest a powerful design
methodology for steganographic security goals. This methodology stems from the
observation that the uniform channel is universal for steganography: we give a trans-
formation from an arbitrary protocol which produces messages indistinguishable from
uniform random bits (given an adversary’s view) into a protocol which produces mes-
sages indistinguishable from an arbitrary channel distribution (given the adversary’s
view). Thus, in order to hide information from an adversary in a given channel, it is
sufficient to design a protocol which hides the information among pseudorandom bits
and apply our transformation. Examples of this methodology appear in Chapters 3, 4, 5, and 7; the explicit transformation for a general task, along with a proof of its security, is given in Chapter 7, Theorem 7.5.
1.4 Roadmap of the thesis
Chapter 2 establishes the results and notation we will use from cryptography, and describes our model of innocent communication. Chapter 3 discusses our results on symmetric-key steganography and relies heavily on the material in Chapter 2. Chapter 4 discusses our results on public-key steganography, and can be read independently of Chapter 3. Chapter 5 considers active attacks against stegosystems; Section 5.1 depends on material in Chapters 2 and 3, while the remaining sections also require some familiarity with the material in Chapter 4. Chapter 6 discusses the rate of a stegosystem, and depends on material in Chapter 3, while the final section also requires material from Section 5.1. In Chapter 7 we extend steganography from the concept of hidden communication to hidden computation; Chapter 7 depends only on the material in Chapter 2. Finally, in Chapter 8 we suggest directions for future research.
Chapter 2
In this chapter we will introduce the notation and concepts from cryptography and
information theory that our results will use. The reader interested in a more general
treatment of the relationships between the various notions presented here is referred
to the works of Goldreich [25] and Goldwasser and Bellare [30].
2.1 Notation
We will often make use of Oracle PTMs (OPTMs). An OPTM is a probabilistic Turing machine (PTM) with two additional tapes, a "query" tape and a "response" tape, and two corresponding states Q_query and Q_response. An OPTM runs with respect to some oracle O, and when it enters state Q_query with value y on its query tape, it goes in one step to state Q_response, with x ← O(y) written to its "response" tape. If O is a probabilistic oracle, then A^O(y) is a probability distribution on outputs, taken over both the random tape of A and the probability distribution on O's responses.
We denote the length of a string or sequence s by |s|. We denote the empty string or sequence by ε. The concatenation of string s1 and string s2 will be denoted by s1‖s2, and when we write "Parse s as s1‖s2‖···‖sl, where |si| = ti" we mean to separate s into strings s1, ..., sl where each |si| = ti and s = s1‖s2‖···‖sl. We will assume the use of efficient and unambiguous pairing and unpairing operators on strings, so that (s1, s2) may be uniquely interpreted as the pairing of s1 with s2, and is not the same as s1‖s2. One example of such an operation is to encode (s1, s2) by a prefix-free encoding of |s1|, followed by s1, followed by a prefix-free encoding of |s2| and then s2. Unpairing then reads |s1|, reads that many bits from the input into s1, and repeats the process for s2.
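The pairing scheme just described can be sketched as follows. This is one simple instantiation, not the thesis's prescribed encoding: fixed 4-byte length headers serve as the prefix-free code for lengths, and the code operates on bytes rather than bits.

```python
def pair(s1: bytes, s2: bytes) -> bytes:
    # Encode (s1, s2) as: prefix-free encoding of |s1|, then s1, then a
    # prefix-free encoding of |s2|, then s2. A fixed 4-byte big-endian
    # length header is a prefix-free code for lengths below 2**32.
    return b"".join(len(s).to_bytes(4, "big") + s for s in (s1, s2))

def unpair(s: bytes) -> tuple:
    # Read |s1|, then that many bytes into s1; repeat the process for s2.
    parts, i = [], 0
    for _ in range(2):
        n = int.from_bytes(s[i:i + 4], "big")
        parts.append(s[i + 4:i + 4 + n])
        i += 4 + n
    return parts[0], parts[1]
```

Note that, as required, pair(b"a", b"b") and pair(b"ab", b"") produce distinct encodings, while naive concatenation would confuse them.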
We will let U_k denote the uniform distribution on {0,1}^k. If X is a finite set, we will denote by x ← X the action of uniformly choosing x from X. We denote by U(L, l) the uniform distribution on functions f : {0,1}^L → {0,1}^l. For a probability distribution D, we denote the support of D by [D]. For an integer n, we let [n] denote the set {1, 2, ..., n}.
Modern cryptography makes use of reductions to prove the security of protocols; that
is, to show that a protocol P is secure, we show how an attacker violating the security
of P can be used to solve a problem Q which is believed to be intractable. Since
solving Q is believed to be intractable, it then follows that violating the security of P
is also intractable. In this section, we will give examples from the theory of symmetric
cryptography to illustrate this approach, and introduce the notation to be used in
the rest of the dissertation.
Let X = {X_k}_{k∈N} and Y = {Y_k}_{k∈N} denote two sequences of probability distributions such that [X_k] = [Y_k] for all k. Many cryptographic questions address the issue of
distinguishing between samples from X and samples from Y. For example, the dis-
tribution X could denote the possible encryptions of the message “Attack at Dawn”
while Y denotes the possible encryptions of “Retreat at Dawn;” a cryptanalyst would
like to distinguish between these distributions as accurately as possible, while a cryp-
tographer would like to show that they are hard to tell apart. To address this concept,
cryptographers have developed several notions of indistinguishability. The simplest
is the statistical distance:
Definition 2.1. (Statistical Distance) Define the statistical distance between X and Y by

    ∆_k(X, Y) = (1/2) Σ_{x ∈ [X_k]} |Pr[X_k = x] − Pr[Y_k = x]| .
On the other hand, it could be the case that ∆(X, Y) is large but X and Y are still difficult to distinguish by some methods. For example, if X_k is the distribution on k-bit even-parity strings starting with 0 and Y_k is the distribution on k-bit even-parity strings starting with 1, then an algorithm which attempts to distinguish X and Y based on the parity of its input will fail, even though ∆(X, Y) = 1. To address this situation, we define the advantage of a program:
    Adv^{X,Y}_A(k) = |Pr[A(X_k) = 1] − Pr[A(Y_k) = 1]| .

Thus in the previous example, for any program A that considers only Σ_i s_i mod 2, it will be the case that Adv^{X,Y}_A(k) = 0.
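The parity example can be made concrete with a small sketch (illustrative, with k = 4): the two even-parity distributions have disjoint supports, so their statistical distance is 1, yet a distinguisher that looks only at parity has advantage 0.

```python
from itertools import product

def statistical_distance(X: dict, Y: dict) -> float:
    # Delta(X, Y) = (1/2) * sum over x of |Pr[X = x] - Pr[Y = x]|
    support = set(X) | set(Y)
    return 0.5 * sum(abs(X.get(x, 0.0) - Y.get(x, 0.0)) for x in support)

def even_parity(k: int, first_bit: str) -> dict:
    # Uniform distribution on k-bit even-parity strings starting with first_bit
    strs = ["".join(s) for s in product("01", repeat=k)
            if s[0] == first_bit and s.count("1") % 2 == 0]
    return {s: 1.0 / len(strs) for s in strs}

X, Y = even_parity(4, "0"), even_parity(4, "1")

def parity_adversary(s: str) -> int:
    # A program that considers only sum_i s_i mod 2
    return s.count("1") % 2

# Advantage of the parity-only adversary: |Pr[A(X) = 1] - Pr[A(Y) = 1]|
adv = abs(sum(p for s, p in X.items() if parity_adversary(s) == 1)
          - sum(p for s, p in Y.items() if parity_adversary(s) == 1))
```

Here statistical_distance(X, Y) evaluates to 1.0 while adv evaluates to 0.0, matching the discussion above.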
While the class of adversaries who consider only the parity of a string is not very
interesting, we may consider more interesting classes: for example, the class of all
adversaries with running time bounded by t(k).
We define the insecurity of distinguishing X and Y as InSec^{X,Y}(t, k) = max_{A ∈ TIME(t(k))} Adv^{X,Y}_A(k), and we say that X_k and Y_k are (t, ε)-indistinguishable if InSec^{X,Y}(t, k) ≤ ε.
If we are interested in the case that t(k) is bounded by some polynomial in k, then we say that X and Y are computationally indistinguishable, written X ≈ Y, if for every A ∈ TIME(poly(k)), there is a negligible function ν such that Adv^{X,Y}_A(k) ≤ ν(k). (A function ν : N → (0, 1) is said to be negligible if for every c > 0, for all sufficiently large n, ν(n) < 1/n^c.)
We will make use, several times, of the following (well-known) facts about statistical and computational distance:

Proposition 2.2. For any randomized procedure A, ∆(A(X), A(Y)) ≤ ∆(X, Y).
Proof.

    ∆(A(X), A(Y)) = (1/2) Σ_x |Pr[A(X) = x] − Pr[A(Y) = x]|
                  = (1/2) Σ_x |Σ_r 2^{−|r|} (Pr[A_r(X) = x] − Pr[A_r(Y) = x])|
                  ≤ (1/2) Σ_r 2^{−|r|} Σ_x |Pr[A_r(X) = x] − Pr[A_r(Y) = x]|
                  ≤ (1/2) max_r Σ_x |Pr[A_r(X) = x] − Pr[A_r(Y) = x]|
                  ≤ (1/2) max_r Σ_x Σ_{y ∈ A_r^{−1}(x)} |Pr[X = y] − Pr[Y = y]|
                  ≤ ∆(X, Y) .

(Here A_r denotes A run with its random tape fixed to r.)
Proposition 2.3. For any t, InSec^{X,Y}(t, k) ≤ ∆_k(X, Y).

Proof. Let A ∈ TIME(t) be any program with range {0, 1}. Then we have that

    Adv^{X,Y}_A(k) = |Pr[A(X) = 1] − Pr[A(Y) = 1]| = ∆(A(X), A(Y)) ≤ ∆(X, Y) ,

where the final inequality follows from the previous proposition.
Proposition 2.4. Let T bound the time required to draw a sample from X or Y. Then for any m,

    InSec^{X^m,Y^m}(t, k) ≤ m · InSec^{X,Y}(t + (m − 1)T, k) .

Proof. The proof uses a "hybrid" argument. Consider any A ∈ TIME(t); we wish to bound Adv^{X^m,Y^m}_A(k). To do so, we define a sequence of hybrid distributions Z_0, ..., Z_m, where Z_0 = X^m, Z_m = Y^m, and Z_i = (Y^i, X^{m−i}). We will consider the "experiment" of using A to distinguish Z_i from Z_{i+1}.

Now notice that for each i, there is a program B_i which distinguishes X from Y with the same advantage as A has in distinguishing Z_{i−1} from Z_i: on input S, B_i draws i − 1 samples from Y, m − i samples from X, and runs A with input (Y^{i−1}, S, X^{m−i}). If S ← X, then Pr[B_i(S) = 1] = Pr[A(Z_{i−1}) = 1], because the first i − 1 samples in A's input will be from Y, and the remaining samples will be from X. On the other hand, if S ← Y, then Pr[B_i(S) = 1] = Pr[A(Z_i) = 1], because the first i samples in A's input will be from Y. So we have:

    Adv^{X,Y}_{B_i}(k) = |Pr[B_i(X) = 1] − Pr[B_i(Y) = 1]| = |Pr[A(Z_{i−1}) = 1] − Pr[A(Z_i) = 1]| .

Now since B_i takes as long as A to run (plus time at most (m − 1)T to draw the additional samples from X and Y), it follows that

    Adv^{X,Y}_{B_i}(k) ≤ InSec^{X,Y}(t + (m − 1)T, k) .

Summing over the hybrids, Adv^{X^m,Y^m}_A(k) ≤ Σ_{i=1}^m Adv^{X,Y}_{B_i}(k) ≤ m · InSec^{X,Y}(t + (m − 1)T, k), as claimed.
The style of proof we have used for this proposition, in which we attempt to state
as tightly as possible the relationship between the “security” of two related problems
without reference to asymptotic analysis, is referred to in the literature as concrete
security analysis. In this dissertation, we will give concrete security results except in
Chapter 8, in which the concrete analysis would be too cumbersome.
2.2.2 Universal Hash Functions
A universal hash family is a family of functions H : {0,1}^l × {0,1}^m → {0,1}^n, where m ≥ n, such that for any x_1 ≠ x_2 ∈ {0,1}^m and y_1, y_2 ∈ {0,1}^n,

    Pr_{h←U_l}[H(h, x_1) = y_1 ∧ H(h, x_2) = y_2] = 2^{−2n} .

Universal hash families are easy to construct for any m, n with l = 2m, by considering functions of the form h_{a,b}(x) = ax + b over the field GF(2^m), with truncation to the least significant n bits. It is easy to see that such a family is universal, because truncation is regular, and the full-rank system ax_1 + b = y_1, ax_2 + b = y_2 has exactly one solution (a, b) over GF(2^m), which is selected with probability 2^{−2m}. We will make use of universal hash functions to convert distributions with large minimum entropy into distributions which are indistinguishable from uniform.
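A minimal sketch of this construction, using a toy field GF(2^3) with modulus x^3 + x + 1 (an illustrative choice of irreducible polynomial) and truncation to n = 2 bits. Exhaustively enumerating all 64 keys (a, b) confirms the universality condition: each output pair occurs for exactly 2^{2m−2n} = 4 keys.

```python
from collections import Counter

M, POLY = 3, 0b1011  # GF(2^3) with irreducible polynomial x^3 + x + 1

def gf_mul(a: int, b: int) -> int:
    # Carry-less multiplication of a and b, reduced modulo POLY
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def h(a: int, b: int, x: int, n: int = 2) -> int:
    # h_{a,b}(x) = a*x + b over GF(2^m), truncated to the n low-order bits
    return (gf_mul(a, x) ^ b) & ((1 << n) - 1)

# For fixed x1 != x2, count how many keys (a, b) map to each (y1, y2) pair:
counts = Counter((h(a, b, 1), h(a, b, 5)) for a in range(8) for b in range(8))
```

The counter shows all 16 possible output pairs, each hit exactly 4 times out of 64 keys, i.e. with probability 2^{−2n} = 1/16, as the definition requires.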
Definition 2.7. (Entropy) Let D be a distribution with finite support X. Define the minimum entropy of D, H∞(D), as

    H∞(D) = min_{x ∈ X} log_2 (1 / Pr_D[x]) .

Define the Shannon entropy of D, H_S(D), by

    H_S(D) = E_{x←D} [ −log_2 Pr_D[x] ] .
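Both quantities are directly computable for a finite distribution; a small sketch:

```python
import math

def min_entropy(dist: dict) -> float:
    # H_inf(D) = min over x of log2(1 / Pr_D[x])
    return min(math.log2(1.0 / p) for p in dist.values() if p > 0)

def shannon_entropy(dist: dict) -> float:
    # H_S(D) = E_{x <- D}[-log2 Pr_D[x]]
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)
```

For example, for D with Pr[a] = 1/2 and Pr[b] = Pr[c] = 1/4, the minimum entropy is 1 bit while the Shannon entropy is 1.5 bits; minimum entropy never exceeds Shannon entropy.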
Lemma 2.8. (Leftover Hash Lemma, [33]) Let H : {0,1}^l × {0,1}^m → {0,1}^n be a universal hash family, and let X be a distribution on {0,1}^m satisfying H∞(X) ≥ k. Then
For G : {0,1}^k → {0,1}^{l(k)}, define the PRG-advantage of A against G by

    Adv^{prg}_{A,G}(k) = |Pr[A(G(U_k)) = 1] − Pr[A(U_{l(k)}) = 1]| ,

and the PRG-insecurity of G as

    InSec^{prg}_G(t, k) = max_{A ∈ TIME(t(k))} Adv^{prg}_{A,G}(k) .
Let F : {0,1}^k × {0,1}^L → {0,1}^l denote a family of functions. Informally, F is a pseudorandom function family (PRF) if F and U(L, l) are indistinguishable by oracle queries. Formally, let A be an oracle probabilistic adversary. Define the prf-advantage of A over F as

    Adv^{prf}_{A,F}(k) = |Pr_{K←U_k}[A^{F_K(·)}(1^k) = 1] − Pr_{f←U(L,l)}[A^{f}(1^k) = 1]| ,

and the prf-insecurity of F as

    InSec^{prf}_F(t, q, k) = max_{A ∈ A(t,q)} Adv^{prf}_{A,F}(k) ,

where A(t, q) denotes the set of adversaries taking at most t steps and making at most q oracle queries. Then F_k is a (t, q, ε)-pseudorandom function if InSec^{prf}_F(t, q, k) ≤ ε. Suppose that l(k) and L(k) are polynomials. A sequence {F_k}_{k∈N} of families F_k : {0,1}^k × {0,1}^{L(k)} → {0,1}^{l(k)} is called pseudorandom if for all polynomially bounded adversaries A, Adv^{prf}_{A,F}(k) is negligible in k. We will sometimes write F_k(K, ·) as F_K(·).
We will make use of the following results relating PRFs and PRGs.
Proposition 2.9. Let F_k : {0,1}^k × {0,1}^{L(k)} → {0,1}^{l(k)} be a PRF. Let q = ⌈(k+1)/l(k)⌉. Define G_k : {0,1}^k → {0,1}^{k+1} by G(X) = F_X(0)‖F_X(1)‖···‖F_X(q − 1), truncated to k + 1 bits. Then

    InSec^{prg}_G(t, k) ≤ InSec^{prf}_F(t + q, q, k) .
Proof. Given a PRG adversary A against G, define the oracle adversary B, which on input 1^k queries its oracle f on 0, 1, ..., q − 1, sets s to the first k + 1 bits of f(0)‖f(1)‖···‖f(q − 1), and outputs A(s). If f is drawn from F, then the string s is chosen exactly from G(U_k). In this case, we have

    Pr[B^{F_K}(1^k) = 1] = Pr[A(G(U_k)) = 1] .

If instead f is a uniformly chosen function, then s is uniformly distributed, so Pr[B^f(1^k) = 1] = Pr[A(U_{k+1}) = 1]. Thus

    Adv^{prf}_{B,F}(k) = |Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]|
                       = |Pr[A(G(U_k)) = 1] − Pr[A(U_{k+1}) = 1]|
                       = Adv^{prg}_{A,G}(k) .

Since B runs in the same time as A plus the time to make q oracle queries, we have by definition of insecurity that

    Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t + q, q, k) ,

and therefore

    Adv^{prg}_{A,G}(k) ≤ InSec^{prf}_F(t + q, q, k) ,

which establishes the proposition.
Intuitively, this proposition states that a pseudorandom function can be used to construct a pseudorandom generator. This is because if we believe that F is pseudorandom, we must believe that InSec^{prf}_F(t, q, k) is small, and therefore that the insecurity of the generator G must also be small.
Proposition 2.10. ([27], Theorem 3) There exists a function family F^G : {0,1}^k × {0,1}^k → {0,1}^k such that

    InSec^{prf}_{F^G}(t, q, k) ≤ qk · InSec^{prg}_G(t + qk · TIME(G), k) .
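The PRF-to-PRG direction of Proposition 2.9 can be sketched as follows, using HMAC-SHA256 as a stand-in for the assumed PRF (an illustrative choice, not part of the thesis, with lengths measured in bytes rather than bits):

```python
import hashlib
import hmac

def prf(key: bytes, x: int) -> bytes:
    # Stand-in PRF F_K(x): HMAC-SHA256 applied to the counter value x
    return hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()

def prg(key: bytes, out_bytes: int) -> bytes:
    # G(K) = F_K(0) || F_K(1) || ... || F_K(q - 1), truncated to the
    # desired output length, mirroring the construction in Proposition 2.9.
    blocks, i = [], 0
    while 32 * i < out_bytes:  # each HMAC-SHA256 block supplies 32 bytes
        blocks.append(prf(key, i))
        i += 1
    return b"".join(blocks)[:out_bytes]
```

The expansion is deterministic in the seed, so the same key always yields the same output string, while distinct seeds yield (with overwhelming probability) unrelated-looking outputs.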
2.2.5 Encryption

A symmetric encryption scheme E consists of three (randomized) algorithms:

• E.Generate : 1^k → {0,1}^k generates shared keys in {0,1}^k. We will abbreviate E.Generate(1^k) by G(1^k) when it is clear which encryption scheme is meant.

• E.Encrypt : {0,1}^k × {0,1}^* → {0,1}^* uses a key to transform a plaintext into a ciphertext. We will abbreviate E.Encrypt(K, ·) by E_K(·).

• E.Decrypt : {0,1}^k × {0,1}^* → {0,1}^* uses a key to transform a ciphertext into the corresponding plaintext. We will abbreviate E.Decrypt(K, ·) by D_K(·).

These must satisfy, for all keys K, E.Decrypt(K, E.Encrypt(K, m)) = m. Informally, we will say that a cryptosystem is secure if, after viewing encryptions of plaintexts of its choosing, an adversary cannot distinguish ciphertexts from uniform random strings. This is slightly different from the more standard notion in which it is assumed that encryptions of distinct plaintexts are indistinguishable.
Formally, let A be an oracle adversary given access to one of two oracles: E_K(·), where K ← G(1^k); or $(·), an oracle which on query m ignores its input and returns a uniformly selected string of length |E_K(m)|.
Let A(t, q, l) be the set of adversaries A which make q(k) queries to the oracle of
at most l(k) bits and run for t(k) time steps. Define the CPA advantage of A against
E as
    Adv^{cpa}_{A,E}(k) = |Pr[A^{E_K}(1^k) = 1] − Pr[A^{$}(1^k) = 1]| ,

where the probabilities are taken over the oracle draws and the randomness of A. Define the insecurity of E as

    InSec^{cpa}_E(t, q, l, k) = max_{A ∈ A(t,q,l)} Adv^{cpa}_{A,E}(k) .

Then E is (t, q, l, k, ε)-indistinguishable from random bits under chosen plaintext attack if InSec^{cpa}_E(t, q, l, k) ≤ ε. E is called (computationally) indistinguishable from random bits under chosen plaintext attack (IND$-CPA) if for every PPTM A, Adv^{cpa}_{A,E}(k) is negligible in k.
Proposition 2.12. Let F : {0,1}^k × {0,1}^k → {0,1} be a function family. Define the cryptosystem E^F as follows:

• G(1^k) ← U_k.

• E_K(m_1 · · · m_l) = c‖F_K(c + 1) ⊕ m_1‖ · · · ‖F_K(c + l) ⊕ m_l, where c ← U_k.

• D_K(c‖x_1 · · · x_l) = F_K(c + 1) ⊕ x_1‖ · · · ‖F_K(c + l) ⊕ x_l.

Then

    InSec^{cpa}_{E^F}(t, q, l, k) ≤ InSec^{prf}_F(t + 2l, l, k) + ql/2^{k−1} .
Proof. From any A ∈ A(t, q, l) we will construct a PRF adversary B satisfying

    Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t + 2l, l, k) ,

which will yield the result.
B’s strategy is to play the part of the encryption oracle in A’s chosen-plaintext
attack game. Thus, B will run A, and whenever A makes an encryption query, B
will produce a response using its function oracle, which it will pass back to A. At the
conclusion of the chosen-plaintext game, A produces an output bit, which B will use
for its output. It remains to describe how B will respond to A’s encryption queries. B
will do so by executing the encryption program E_K from above, but using its function oracle in place of F_K. Thus, on a query m_1 · · · m_l, B^f will choose c ← U_k, and give A the response c‖f(c + 1) ⊕ m_1‖ · · · ‖f(c + l) ⊕ m_l.
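The cryptosystem E^F can be sketched as follows, with a one-bit stand-in PRF built from HMAC-SHA256 (an illustrative assumption; any one-bit-output PRF fits the proposition, and counter arithmetic is taken modulo 2^k):

```python
import hashlib
import hmac
import secrets

K = 128  # key / counter length k, in bits

def f(key: bytes, x: int) -> int:
    # One-bit stand-in PRF F_K(x): low bit of HMAC-SHA256 of the counter
    msg = (x % 2 ** K).to_bytes(K // 8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[-1] & 1

def encrypt(key: bytes, bits: list) -> tuple:
    # E_K(m_1...m_l) = c || F_K(c+1) xor m_1 || ... || F_K(c+l) xor m_l
    c = secrets.randbits(K)  # c <- U_k
    return c, [f(key, c + i + 1) ^ m for i, m in enumerate(bits)]

def decrypt(key: bytes, ciphertext: tuple) -> list:
    # D_K(c || x_1...x_l) = F_K(c+1) xor x_1 || ... || F_K(c+l) xor x_l
    c, xs = ciphertext
    return [f(key, c + i + 1) ^ x for i, x in enumerate(xs)]
```

Since each PRF output bit is XORed into exactly one message bit, decryption simply regenerates the same pad from c and the key.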
Let us bound the advantage of B. In case B's oracle is chosen from F_K, B will perfectly simulate an encryption oracle to A. Thus Pr[B^{F_K}(1^k) = 1] = Pr[A^{E_K}(1^k) = 1].
Now suppose that B’s oracle is a uniformly chosen function, and let NC denote the
event that B does not query its oracle more than once on any input, and let C denote
the complement of NC - that is, the event that B queries its oracle at least twice on
at least one input. Conditioned on NC, every bit that B returns to A is uniformly
chosen, for a uniform choice of f , subject to the condition that none of the leading
values overlap, an event we will denote by N$, and which has identical probability to
NC. In this case B perfectly simulates a random-bit oracle to A, giving us Pr[B^f(1^k) = 1 | NC] = Pr[A^{$}(1^k) = 1 | N$]. Therefore,

    Adv^{prf}_{B,F}(k) = Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]
                       = Pr[A^{E_K}(1^k) = 1] − Pr[B^f(1^k) = 1 | NC] Pr[NC] − Pr[B^f(1^k) = 1 | C] Pr[C]
                       ≥ Pr[A^{E_K}(1^k) = 1] − Pr[A^{$}(1^k) = 1] − Pr[C] ,

where we assume without loss of generality that Pr[A^{E_K}(1^k) = 1] ≥ Pr[A^{$}(1^k) = 1]. It follows that Adv^{cpa}_{A,E^F}(k) ≤ Adv^{prf}_{B,F}(k) + Pr[C]. To finish the proof, we need only to bound Pr[C].
To bound the probability of the event C, let us further subdivide this event. During the attack game, A will make q queries that B must answer, so that B chooses q k-bit values c_1, ..., c_q to encrypt messages of length l_1, ..., l_q. Let us denote by NC_i the event that after the ith encryption query made by A, B has not made any duplicate queries to its function oracle f; and let C_i denote the complement of NC_i. We will show that

    Pr[C_i | NC_{i−1}] ≤ (i·l_i + Σ_{j<i} l_j) / 2^k ,

and therefore we will have

    Pr[C] = Pr[C_q]
          ≤ Pr[C_q | NC_{q−1}] + Pr[C_{q−1}]
          ≤ Σ_{i=1}^{q} Pr[C_i | NC_{i−1}]
          ≤ (1/2^k) Σ_{i=1}^{q} (i·l_i + Σ_{j<i} l_j)
          ≤ (1/2^k) (Σ_{i=1}^{q} i·l_i + q·l)
          ≤ (1/2^k) (q Σ_{i=1}^{q} l_i + q·l)
          = 2ql/2^k ,

which establishes the desired bound, given the bound on Pr[C_i | NC_{i−1}]. To establish this conditional bound, fix any choice of the values c_1, ..., c_{i−1}. The value c_i will cause a duplicate input to f if there is some c_j such that c_j − l_i ≤ c_i ≤ c_j + l_j, which happens with probability (l_i + l_j)/2^k, since c_i is chosen uniformly. Thus by the union bound, we have that

    Pr[C_i | NC_{i−1}] ≤ 2^{−k} Σ_{j<i} (l_i + l_j) .
2.3 Modeling Communication: Channels

To model steganography, we must first specify what the innocent communication between Alice and Bob looks like. As an example, if Alice and Bob are communicating over a computer network, they
might run the TCP protocol, in which case they communicate by sending “packets”
according to a format which specifies fields like a source and destination address,
packet length, and sequence number.
Once we have specified what kinds of strings Alice and Bob send to each other,
we also need to specify the probability that Ward will assign to each document. The
simplest notion might be to model the innocent communications between Alice and
Bob by a stationary distribution: each time Alice communicates with Bob, she makes
an independent draw from a probability distribution C and sends it to Bob. Notice
that in this model, all orderings of the messages output by Alice are equally likely.
This does not match well with our intuition about real-world communications; if we
continue the TCP analogy, we notice, for example, that in an ordered list of packets
sent from Alice to Bob, each packet should have a sequence number which is one
greater than the previous; Ward would become very suspicious if Alice sent all of the
odd-numbered packets first, and then all of the even.
Thus, we will use a notion of a channel which models a prior distribution on the entire sequence of communication from one party to another.
Any particular sequence in the support of a channel describes one possible outcome
of all communications from Alice to Bob - the list of all packets that Alice’s computer
sends to Bob’s. The process of drawing from the channel, which results in a sequence
of documents, is equivalent to a process that repeatedly draws a single “next” docu-
ment from a distribution consistent with the history of already drawn documents - for
example, drawing only packets which have a sequence number that is one greater than
the sequence number of the previous packet. Therefore, we can think of communica-
tion as a series of these partial draws from the channel distribution, conditioned on
what has been drawn so far. Notice that this notion of a channel is more general than
the typical setting in which every symbol is drawn independently according to some
fixed distribution: our channel explicitly models the dependence between symbols
common in typical real-world communications.
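As a toy illustration of such history-dependent draws (hypothetical, continuing the TCP analogy; the real definition places no particular structure on the channel), consider a channel whose next document must carry the next sequence number:

```python
import random

class PacketChannel:
    # Toy channel: a document is a (sequence number, payload) pair, and the
    # marginal distribution conditioned on history h forces the next sequence
    # number to be one greater than that of the last document in h.
    def draw(self, history: list) -> tuple:
        next_seq = history[-1][0] + 1 if history else 0
        payload = random.randrange(256)  # innocent-looking payload byte
        return (next_seq, payload)

# Repeatedly drawing the "next" document, conditioned on what has been drawn
# so far, yields a transcript whose ordering Ward would find unremarkable.
channel = PacketChannel()
history = []
for _ in range(3):
    history.append(channel.draw(history))
```

A transcript with out-of-order sequence numbers has probability zero under this channel, which is exactly the kind of dependence a stationary per-document distribution cannot express.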
Informativeness
We will require that a channel satisfy a minimum entropy constraint for all histories. Specifically, we require that there exist constants L > 0, β > 0, α > 0 such that for all h ∈ D^L, either Pr_C[h] = 0 or H∞(C_h^β) ≥ α. If a channel does not satisfy this property, then it is possible for Alice to drive the information content of her communications to 0, so this is a reasonable requirement. We say that a channel satisfying this condition is (L, α, β)-informative, and if a channel is (L, α, β)-informative for all L > 0, we say it is (α, β)-always informative, or simply always informative. Note that this definition implies an additive-like property of minimum entropy for marginal distributions, specifically H∞(C_h^{lβ}) ≥ lα. For ease of exposition, we will assume channels are always informative in the remainder of this dissertation; however, our theorems easily extend to situations in which a channel is only (L, α, β)-informative. The only complication in this situation is that there will be a bound in terms of (L, α, β) on the number of bits of secret message which can be hidden before the channel runs out of information.
Channel Access
In a multiparty setting, each ordered pair of parties (P, Q) will have their own channel
distribution CP →Q . To demonstrate that it is feasible to construct secure protocols
for steganography, we will assume that party A has oracle access to marginal channel
distributions CA→B,h for every other party B and history h. This is reasonable, because
if Alice can communicate innocently with Bob at all, she must be able to draw from
this distribution; thus we are only requiring that when using steganography, Alice
can “pretend” she is communicating innocently.
On the other hand, we will assume that the adversary, Ward, knows as much as
possible about the distribution on innocent communications. Thus he will be allowed
oracle access to marginal channel distributions C_{P→Q,h} for every pair P, Q and every history h. In addition, the adversary may be allowed access to an oracle which, on input (d, h, l) with d ∈ D, returns an l-bit representation of Pr_{C_h}[d].
These assumptions allow the adversary to learn as much as possible about any
channel distribution but do not require any legitimate participant to know the dis-
tribution on communications from any other participant. We will, however, assume
that each party knows (a summary of) the history of communications it has sent and
received from every other participant; thus Bob must remember some details about
the entire sequence of packets Alice sends to him.
Etc. . .
We will also assume that cryptographic primitives remain secure with respect to
oracles which draw from the marginal channel distributions CA→B,h . Thus channels
which can be used to solve the hard problems that standard primitives are based on
must be ruled out. In practice this is of little concern, since the existence of such
channels would have previously led to the conclusion that the primitive in question
was insecure.
Notice that the set of documents need not be literally interpreted as a set of
bitstrings to be sent over a network. In general, documents could encode any kind of
information, including things like actions – such as accessing a hard drive, or changing
the color of a pixel – and times – such as pausing an extra 1/2 second between words
of a speech. In the single-party case, our theory is general enough to deal with these
situations without any special treatment.
Messages are still drawn from a set D of documents. For simplicity we assume
that time proceeds in discrete timesteps. Each party P ∈ {P0 , P1 } maintains a history
hP , which represents a timestep-ordered list of all documents sent and received by P .
We call the set of well-formed histories H. We associate to each party P a family of
probability distributions C^P = {C^P_h}_{h∈H} on D.
We assume that party P can draw from ChP for any history h, and that the adver-
sary can draw from ChP for every party P and history h. We assume that the ability to
draw from these distributions does not contradict the cryptographic assumptions that
our results are based on. In the rest of the dissertation, all interactive communica-
tions will be assumed to conform to the bidirectional channel structure: parties only
communicate by sending documents from D to each other and parties not running a
protocol communicate according to the distributions specified by B. Parties running
a protocol strive to communicate using sequences of documents that appear to come
from B. As a convention, when B is compared to another random variable, we mean
a random variable which draws from the process B the same number of documents
as the variable we are comparing it to.
Chapter 3
Symmetric-key Steganography
Symmetric-key steganography is the most basic setting for steganography: Alice and
Bob possess a shared secret key and would like to use it to exchange hidden messages
over a public channel so that Ward cannot detect the presence of these messages.
Despite the apparent simplicity of this scenario, there has been little work on giving
a precise formulation of steganographic security. Our goal is to give such a formal
description.
In Section 3.1, we give definitions dealing with the correctness and security of
symmetric-key steganography. Then we show in Section 3.2 that these notions are
feasible by giving constructions which satisfy them, under the assumption that pseudorandom function families exist. Finally, in Section 3.3, we explore the necessary conditions for the existence of secure symmetric-key steganography.
3.1 Definitions
We will first define a stegosystem in terms of syntax and correctness, and then proceed
to a security definition.
A symmetric-key stegosystem consists of a pair of algorithms SE and SD, where SE takes as input a key K ∈ {0,1}^k, a message m ∈ {0,1}^* (the hiddentext), and a message history h.
3.1.1 Correctness
where the randomization is over the key K and any coin tosses of SE, SD, and the
oracles accessed by SE and SD.
3.1.2 Security
Intuitively, what we would like to require is that no efficient warden can distinguish
between stegotexts output by SE and covertexts drawn from the channel distribution
Ch . As we stated in Section 2.3, we will assume that W knows the distribution Ch ;
we will also allow W to know the algorithms involved in S as well as the history h of
Alice’s communications to Bob. In addition, we will allow W to pick the hiddentexts
that Alice will hide, if she is in fact producing stegotexts. Thus, W ’s only uncertainty
is about the key K and the single bit denoting whether Alice’s outputs are stegotexts
or covertexts.
1. ST: The oracle ST has a uniformly chosen key K ← U_k and responds to queries (m, h) with a stegotext drawn from SE(K, m, h).

2. CT: The oracle CT has a uniformly chosen key K as well, and responds to queries (m, h) with a covertext of length ℓ = |SE(K, m, h)| drawn from C_h^ℓ.

W^M(1^k) outputs a bit which represents its guess about the type of M. Define the advantage of W against S by

Adv^ss_{S,C,W}(k) = |Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]| ,

where the probability is taken over the randomness of ST, CT, and W, and define the insecurity of S by

InSec^ss_{S,C}(t, q, l, k) = max_{W ∈ W(t,q,l)} Adv^ss_{S,C,W}(k) ,

where W(t, q, l) denotes the set of all adversaries which make at most q(k) queries totaling at most l(k) bits (of hiddentext) and run in time at most t(k).
Definition 3.4. (Steganographic secrecy) A stegosystem S_k is called (t, q, l, ε)-steganographically secret against chosen hiddentext attack for the channel C ((t, q, l, ε)-SS-CHA-C) if InSec^ss_{S,C}(t, q, l, k) ≤ ε.
3.2 Constructions
For our feasibility results, we have taken the approach of assuming a channel which can
be drawn from freely by the stegosystem; most current proposals for stegosystems act
on a single sample from the channel (one exception is [16]). While it may be possible
to define a stegosystem which is steganographically secret or robust and works in this
style, this is equivalent to a system in our model which merely makes a single draw on
the channel distribution. Further, we believe that the lack of reference to the channel
distribution may be one of the reasons for the failure of many such proposals in the
literature.
It is also worth noting that we assume that a stegosystem has very little knowledge
of the channel distribution — SE may only sample from an oracle according to the
distribution. This is because in many cases the full distribution of the channel has
never been characterized; for example, the oracle may be a human being, or a video
camera focused on some complex scene. However, our definitions do not rule out
encoding procedures which have more detailed knowledge of the channel distribution.
Sampling from Ch might not be trivial. In some cases the oracle for Ch might be a
human, and in others a simple randomized program. We stress that it is important to
minimize the use of such an oracle, because oracle queries can be extremely expensive.
In practice, this oracle is also the weakest point of all our constructions. We assume
the existence of a perfect oracle: one that can perform independent draws, one that
can be rewound, etc. This assumption can be justified in some cases, but not in
others. If the oracle is a human, the human may not be able to perform independent
draws from the channel as is required by our constructions. A real world Warden
would use this to his advantage. We therefore stress the following cautionary remark:
our protocols will be shown to be secure under the assumption that the channel oracle
is perfect.
Setup: We assume Alice and Bob share a channel and let C denote the channel
distribution. We write d ← Ch to denote the action of sampling d from the marginal
distribution Ch (via oracle access). We let FK (·, ·) denote a pseudorandom function
family indexed by k = |K| key bits which maps documents to bits, i.e. F : {0, 1}k ×
{0, 1}∗ → {0, 1}. We let Alice and Bob share a secret key K ∈ {0, 1}k and also a
synchronized d bit counter N (which need not be secret). The following procedures
allow Alice and Bob to encode and decode a single bit and to send it via their shared
channel.
The idea behind this construction is simple. The encoding algorithm makes ℓ copies of the bit m. For the ith copy, the encoder attempts to find a document d such that F_K(i, d) = m, by drawing d ← C_h. If the encoder fails, it draws a second d′ ← C_h and sends d′ instead of d. The decoder recovers ℓ bits by applying F_K to each document, and outputs 1 if the majority of the bits are 1, and 0 otherwise. Intuitively, this works because we expect each s_i to map to the bit m with probability about 3/4, and so the probability of a decoding failure should be negligible in ℓ.
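The rejection-sampling step just described can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis's formal construction: HMAC-SHA256 stands in for the PRF family F_K, a fixed word list stands in for the channel oracle C_h, and all names are ours.

```python
import hashlib
import hmac
import random

WORDS = [f"word{j}" for j in range(1000)]   # toy stand-in for the channel

def prf_bit(key, counter, doc):
    # F_K(counter, doc) -> one bit; HMAC-SHA256 is our stand-in PRF.
    tag = hmac.new(key, f"{counter}|{doc}".encode(), hashlib.sha256).digest()
    return tag[0] & 1

def sample_channel(rng, history):
    # Draw d <- C_h; the toy distribution here ignores the history.
    return rng.choice(WORDS)

def onebit_encode(key, m, history, N, ell, rng):
    """Encode one bit m as ell documents: keep the first draw d if
    F_K(N+i, d) = m, otherwise send a second draw instead (at most
    two channel queries per document, as in the construction)."""
    out = []
    for i in range(ell):
        d = sample_channel(rng, history + out)
        if prf_bit(key, N + i, d) != m:
            d = sample_channel(rng, history + out)
        out.append(d)
    return out

def onebit_decode(key, stego, N):
    # Majority vote over F_K(N+i, s_i); each vote is correct w.p. ~3/4.
    votes = sum(prf_bit(key, N + i, s) for i, s in enumerate(stego))
    return 1 if 2 * votes > len(stego) else 0

rng = random.Random(7)
key = b"shared secret key"
stego = onebit_encode(key, 1, [], N=0, ell=101, rng=rng)
```

With ℓ = 101 repetitions the majority vote fails only with negligible probability, which is exactly the redundancy/reliability trade-off quantified by the lemmas below.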
Pr[s_i = d] = Pr_{C_{h_i}}[d] .

Proof. Consider the two documents d_i, d′_i that SE draws in iteration i. It will be the case that F_K(N + i, s_i) = m exactly when either F_K(N + i, d_i) = m, which happens with probability 1/2, or when F_K(N + i, d_i) = 1 − m and F_K(N + i, d′_i) = m, which happens with probability 1/4 when d_i ≠ d′_i, and with probability 0 otherwise. The theorem applies for any i because the function F_K(N + i, ·) is independent of F_K(N + j, ·) for i ≠ j when F_K is uniformly chosen.
Lemma 3.9. Suppose C is (α, β)-always informative and F is a uniformly chosen function. Then we have

Pr_i[F_K(N + i, s_i) = m] ≥ 1/2 + (1/4β)(1 − 2^{−α/β})

Proof. Because C is (α, β)-informative, for any h and any sequence d_1, . . . , d_β ← C_h^β, there must be a j between 0 and β − 1 such that H∞(C_{(h,d_1,...,d_j)}) ≥ α/β. If this were not the case, then we would have h such that H∞(C_h^β) < α. Thus for a string of length ℓ drawn from C_h^ℓ, there must be ℓ/β positions i which have H∞(C_{h_i}) ≥ α/β. In these positions, the collision probability is at most 2^{−α/β}. In the other positions, the collision probability is at most 1. Applying the previous lemma yields the result.
where γ = 2((1/4β)(1 − 2^{−α/β}))^2 and T_SE is the time required to execute the inner loop of OneBit.Encode.
A uses its function oracle f to emulate the action of SE encoding a uniformly chosen bit m under history h, counting the number of documents with f(N + i, s_i) = m. If fewer than half of the s_i satisfy f(N + i, s_i) = m, A outputs 1; otherwise A outputs 0. Lemma 3.9 shows that Pr[A^f(1^k) = 1] ≤ e^{−γℓ}, whereas when A's oracle is F_K we have Pr[A^{F_K}(1^k) = 1] = Pr[SD(K, SE(K, m, h), h) ≠ m]. So by definition of advantage,

Adv^prf_{A,F}(k) ≥ Pr[SD(K, SE(K, m, h), h) ≠ m] − e^{−γℓ} .

But A runs in time ℓT_SE and makes 2ℓ function-oracle queries, which proves the theorem.
Extending to multiple-bit messages
For completeness, we now state the obvious extension of the stegosystem OneBit to
multiple-bit hiddentexts. We assume the same setup as previously.
The MultiBit stegosystem works by simply repeatedly invoking OneBit on the indi-
vidual bits of the message m.
where γ = 2((1/4β)(1 − 2^{−α/β}))^2 and T_SE is the time required to execute the inner loop of OneBit.Encode.
Proof. Because each s_i is generated using a different value of the counter N, each execution of the inner loop of OneBit.Encode is independent when called with a uniformly chosen function. Thus when a uniformly chosen function is used, executing OneBit.Encode |m| times with different bits is the same as using |m| independent keys, each with failure probability at most e^{−γℓ}; a union bound shows that for a random function f, Pr[SD^f(SE^f(m, h, N), N) ≠ m] ≤ |m|e^{−γℓ}. To complete the proof, we apply the same technique as in the proof of Theorem 3.10.
We would like to make a security claim about the stegosystem MultiBit, but
because the stegosystem does not fit our syntactic definition, we need a slightly mod-
ified version of the chosen-hiddentext attack game. We will modify the definition of
the oracle distribution ST so that the oracle’s private state will include the value N ,
initialized to 0 and properly incremented between queries. With this modified game
in mind, we can state our theorem about the security of MultiBit:
Theorem 3.13. Let k = |K|. For any l ≤ 2^d:

InSec^ss_{MultiBit,C}(t, q, µ, k) ≤ InSec^prf_F(t + ℓµT_SE, 2ℓµ, k)
Proof. For any warden W running in time t and making q queries totaling µ bits, we construct a corresponding PRF adversary A, where

Adv^ss_{MultiBit,C,W}(k) = Adv^prf_{F,A}(k)
The running time of A is the running time of warden W plus the time to make `µ
passes through the inner loop of OneBit.Encode, or `µTSE . The number of samples
taken from C is at most 2`µ.
Af simply runs W , playing the role of the oracle M in the chosen-hiddentext attack
game. To respond to W ’s queries, A emulates the encoding procedure MultiBit.Encode
using the function oracle f in place of FK (·); A outputs the same bit as W . We con-
sider the two cases for the oracle f :
• When f is chosen from F_K(·, ·), the documents submitted to W are distributed identically to the output of MultiBit, by the definition of the construction. Thus

Pr[A^{F_K}(1^k) = 1] = Pr[W^{ST}(1^k) = 1] .

• When f is a uniformly chosen function, the documents submitted to W are distributed identically to samples from the channel distribution, so that

Pr[A^f(1^k) = 1] = Pr[W^{CT}(1^k) = 1] .

Combining the two cases,

Adv^prf_{F,A}(k) = |Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]|
= |Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]|
= Adv^ss_{MultiBit,C,W}(k)
Corollary 3.14. If FK (·, ·) is pseudorandom then MultiBit is universally stegano-
graphically secret against chosen-hiddentext attacks.
Proof. If F is pseudorandom, then its PRF insecurity is negligible in k. The definition of insecurity and Theorem 3.13 imply that for any cover channel C, the advantage of a warden will be negligible in k. This, in turn, implies the corollary.
Having extended our construction to use multiple-bit messages, we can now re-
move the requirement for Alice and Bob to share a synchronized counter N . This
construction will utilize the same setup as the previous constructions, except that
Alice and Bob now share a second key κ ∈ {0,1}^k to a pseudorandom function
G : {0,1}^k × D^k → {0,1}^{d/2}.
The NoState stegosystem works by choosing a long sequence from C_h (long enough that it is unlikely to repeat in the chosen-hiddentext attack game) and using it to derive a value N, which is then used as the state for the MultiBit stegosystem. This value is always a multiple of 2^{d/2}, so that if the value derived from the long sequence never repeats, then any message of length at most 2^{d/2} will never use a value of N used by another message.
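A minimal sketch of NoState's counter derivation, under stated assumptions (HMAC-SHA256 stands in for the PRF G, d = 32 is chosen only so the numbers stay small, and all names are ours):

```python
import hashlib
import hmac
import random

D_BITS = 32          # counter width d (assumption: small, for illustration)
HALF = D_BITS // 2   # d/2

def prf_G(kappa, docs):
    """G_kappa: maps a k-document prefix to a (d/2)-bit value; instantiated
    (as an assumption) with HMAC-SHA256 truncated to d/2 bits."""
    tag = hmac.new(kappa, "|".join(docs).encode(), hashlib.sha256).digest()
    return int.from_bytes(tag[:4], "big") % (1 << HALF)

def derive_counter(kappa, prefix_docs):
    """NoState's trick: N = G_kappa(prefix) * 2^(d/2). Distinct derived
    values give counter ranges 2^(d/2) apart, so messages of up to 2^(d/2)
    bits never reuse a counter as long as the derived values never collide."""
    return prf_G(kappa, prefix_docs) << HALF

rng = random.Random(1)
kappa = b"second shared key for G"
prefix = [f"doc{rng.randrange(10**6)}" for _ in range(8)]  # long channel draw
N = derive_counter(kappa, prefix)
```

Bob recomputes the same N from the transmitted prefix, so no synchronized state is needed; the birthday bound on colliding prefixes is exactly the q(q−1)/2 term in the security bound below.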
where γ = 2((1/4β)(1 − 2^{−α/β}))^2 and T_SE is the time required to execute the inner loop of OneBit.Encode.
Proof. The theorem follows directly from Theorem 3.12.

The security of NoState is then bounded as follows:

InSec^ss_{NoState,C}(t, q, µ, k) ≤ InSec^prf_F(t + qT_G + ℓµT_SE, 2ℓµ, k)
+ InSec^prf_G(t + ℓµ, q, k)
+ (q(q − 1)/2)(2^{−d/2} + 2^{−αk/β})
Proof. We reformulate the CT oracle in the chosen-hiddentext attack game so that the
oracle has a key κ ← Uk and evaluates Gκ on the first k documents of its reply (S, T )
to every query. Let NC denote the event that the values Gκ (S1 ), . . . , Gκ (Sq ) are all
distinct during the chosen-hiddentext attack game and let C denote the complement
of NC.
Let W be any adversary in W(t, q, µ), and assume without loss of generality that Pr[W^{ST}(1^k) = 1] > Pr[W^{CT}(1^k) = 1]. We wish to bound W's advantage against the stegosystem NoState:

Adv^ss_{NoState,C,W}(k) = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
= Pr[W^{ST}(1^k) = 1|NC] Pr[NC] + Pr[W^{ST}(1^k) = 1|C] Pr[C]
− Pr[W^{CT}(1^k) = 1|NC] Pr[NC] − Pr[W^{CT}(1^k) = 1|C] Pr[C]
≤ Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] + Pr[C]
We will show that for any W we can define an adversary X such that

Adv^ss_{MultiBit,C,X}(k) ≥ Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] ,

and likewise that

Adv^ss_{MultiBit,C,X}(k) = Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] ,

and since X makes as many queries (of the same length) as W and runs in time t + qT_G, we have that

Pr[W^{ST}(1^k) = 1|NC] − Pr[W^{CT}(1^k) = 1|NC] ≤ InSec^ss_{MultiBit,C}(t + qT_G, q, µ, k)
≤ InSec^prf_F(t + qT_G + ℓµT_SE, 2ℓµ, k)
Consider a game played with the warden W in which a random function f is used in place of the function G_κ, and let C_f denote the same event as C in the previous game. Let S_1, . . . , S_q denote the k-document prefixes of the sequences returned by the oracle in the chosen-hiddentext attack game and let N_i = f(S_i). Then the event C_f happens when there exist i ≠ j such that N_i = N_j, or equivalently f(S_i) = f(S_j); and this event happens when S_i = S_j or S_i ≠ S_j ∧ f(S_i) = f(S_j). Thus for a random f,

Pr[C_f] = Pr[∨_{i<j≤q} ((S_i = S_j) ∨ (S_i ≠ S_j ∧ f(S_i) = f(S_j)))]
≤ Σ_{i<j≤q} (Pr[S_i = S_j] + Pr[f(S_i) = f(S_j) ∧ (S_i ≠ S_j)])
≤ Σ_{i<j≤q} (Pr[S_i = S_j] + 2^{−d/2})
≤ (q(q − 1)/2)(2^{−αk/β} + 2^{−d/2})

A standard argument constructs a PRF adversary A against G with

Adv^prf_{G,A}(k) ≥ |Pr[C] − Pr[C_f]| .
Pr[A^f(1^k) = 1] = Pr[C_f], which satisfies the claim. To complete the proof, we combine the bounds above.
The following protocol also satisfies our definition of universal steganographic secrecy. This protocol (up to small differences) is not new and can be found in [6]; an information-theoretic version of the protocol can also be found in [16].

Let E_K(·, ·) and D_K(·) denote the encryption and decryption functions for a cryptosystem which is indistinguishable from random bits under chosen plaintext attack (i.e., IND$-CPA) [54]. Suppose Alice and Bob share a key K ∈ {0,1}^k, and a function f such that ∆(f(C_h), U_1) ≤ ε for any h. One example of such a function would be a uniformly chosen element of a universal hash family mapping D → {0,1}; then when C is (α, β)-informative, we would have ε ≤ 2^{1−Ω(α/2β)} except with negligible probability. The following procedures allow encoding and decoding of messages in a manner which is steganographically secret under chosen hiddentext attack for the channel distribution C.
Construction 3.18. UHash Stegosystem
The idea behind this construction is also simple. First, the encoding algorithm encrypts the hiddentext m to get a ciphertext c = E_K(m). Next, it splits the ciphertext c into single bits. Then it draws documents from the channel until it finds s_1 such that f(s_1) = c_1. After that it finds s_2 such that f(s_2) = c_2, and s_3, and so on. This continues until all bits of c have been encoded. The decoding algorithm simply applies f to all the s_i's to get c and then decrypts with D_K to recover the plaintext. Note that the execution time for SD is essentially linear in l and the execution time for SE is expected O(l). Also note that our assumption of an "unbiased" function implies that an error-correcting code as in Construction 3.15 is unnecessary: the probability of failure here is at most (1/2 + ε)^{|K|}. In the following, we will let ℓ(q, l) denote the total number of bits of ciphertext returned by encrypting q plaintexts of total length l.
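The encode/decode loop can be sketched as follows. This is a toy illustration: an HMAC-derived pad stands in for the IND$-CPA encryption E_K, a hash parity bit stands in for the unbiased function f, the word list stands in for the channel, and all names are ours.

```python
import hashlib
import hmac
import random

def f_bit(doc):
    """Stand-in for the unbiased function f: parity bit of SHA-256(doc).
    The thesis assumes Delta(f(C_h), U_1) <= eps; a hash parity is only a
    heuristic approximation of that guarantee."""
    return hashlib.sha256(doc.encode()).digest()[0] & 1

def stream_bits(key, n):
    # Toy IND$-CPA stand-in: pseudorandom pad from HMAC in counter mode.
    out, i = [], 0
    while len(out) < n:
        block = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        out.extend((byte >> j) & 1 for byte in block for j in range(8))
        i += 1
    return out[:n]

def uhash_encode(key, msg_bits, rng):
    pad = stream_bits(key, len(msg_bits))
    cipher = [m ^ p for m, p in zip(msg_bits, pad)]
    stego = []
    for c in cipher:                  # one rejection-sampled document per bit
        while True:
            d = f"word{rng.randrange(10**6)}"   # draw d <- C_h (toy channel)
            if f_bit(d) == c:
                stego.append(d)
                break
    return stego

def uhash_decode(key, stego):
    cipher = [f_bit(d) for d in stego]
    pad = stream_bits(key, len(cipher))
    return [c ^ p for c, p in zip(cipher, pad)]

rng = random.Random(3)
key = b"shared key"
msg = [1, 0, 1, 1, 0, 0, 1, 0]
stego = uhash_encode(key, msg, rng)
```

Because f is (nearly) unbiased, each inner loop terminates after an expected two draws, matching the expected O(l) encoding time noted above.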
Lemma 3.19. For any warden W ∈ W(t, q, l), we can construct a corresponding adversary A where

Adv^ss_{UHash,C,W}(k) ≤ Adv^cpa_{E,A}(k) + εℓ(q, l) .

The running time of A is at most t + O(kl), and A makes q encryption queries for a total of l bits of plaintext.
Proof. We are given a program W for distinguishing between the output of construc-
tion 3.18 and the ordinary traffic of the channel, and an oracle for sampling blocks
from that channel. We construct a program A which plays the CPA game — distin-
guishing an EK oracle from a uniform $ oracle — with nearly the same advantage as
W . A simply runs W , responding to his queries by executing the encoding procedure
UHash.Encode using its oracle in place of EK . Consider the following two cases:
• O(m) = EK (m). Then the stegotexts output by the encoding procedure will be
identically distributed to stegotexts resulting from the normal use of construc-
tion 3.18.
• O(m) = $(m) is chosen uniformly from strings of appropriate length. Then the stegotexts output by the encoding procedure will be εℓ(q, l)-statistically close to samples from C_h. To see that this is so, imagine instead that the ith bit of the ciphertext, c_i, was chosen so that Pr[c_i = 0] = Pr[f(C_{h_i}) = 0]. In this case the ith stegotext would come from a distribution identical to C_{h_i}. But since ∆(c_i, U_1) ≤ ε, it must be the case that ∆(s_i, C_{h_i}) ≤ ε as well, by Proposition 2.4.
Thus A can simply use the decision of W to gain advantage close to that of W .
More formally,

Adv^cpa_{E,A}(k) = Pr[A^{E_K}(1^k) = 1] − Pr[A^$(1^k) = 1]
= Pr[W^{ST}(1^k) = 1] − Pr[A^$(1^k) = 1]
≥ Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1] − εℓ(q, l)
= Adv^ss_{UHash,C,W}(k) − εℓ(q, l)
Theorem 3.20. InSec^ss_{UHash,C}(t, q, l, k) ≤ InSec^cpa_E(t + O(kl), q, l, k) + εℓ(q, l).
Proof. The theorem follows from Lemma 3.19 and the definition of insecurity.
3.3 Necessary Conditions for Steganography
The previous section demonstrates that, relative to an oracle for C, the existence of one-way functions is sufficient for the existence of secure steganography. In this section we will explore weaker definitions of steganographic secrecy, and establish two results. First, one-way functions are necessary for steganography; thus, relative to a channel oracle, the existence of one-way functions and secure steganography are equivalent. Second, we will show that in the "standard model," without access to a channel oracle, the existence of a secure stegosystem implies the existence of a program which samples from C_h; and thus in the standard model, secure steganography for C exists if and only if C_h is efficiently sampleable.
3.3.1 Steganography implies one-way functions
To strengthen our result, we develop the weaker notion of security against known-hiddentext attacks (KHA). In an (l, µ)-KHA attack against distribution D, the adversary is given a history h of length l, a hiddentext m drawn from D^µ, and a sequence of documents s ∈ D^{|SE(K,m,h)|}. The adversary's task is to decide whether s ← C_h or s ← SE(K, m, h). We define the KHA advantage of W by

Adv^{kha-D}_{S,C,W}(k, l, µ) = |Pr[W(h, m, SE(K, m, h)) = 1] − Pr[W(h, m, C_h^{|SE(K,m,h)|}) = 1]|

and say that S is secure against known-hiddentext attack with respect to D and C (SS-KHA-D-C) if for every PPT W and all polynomially-bounded l, µ, Adv^{kha-D}_{S,C,W}(k, l(k), µ(k)) is negligible in k.
2. f(U_k) ≈ g(U_{k′})

Theorem 3.22. ([33], Lemma 4.16) If there exists a false entropy generator, then there exists a pseudorandom generator.
42
Proof. We will show how to construct a false entropy generator from S.Encode, which when combined with Theorem 3.22 will imply the result.

Consider the function f which draws a hiddentext m of length |K|^2 from D and outputs (SE(K, m, ε), m). Likewise, consider the function g which draws a hiddentext m of length |K|^2 from D and has the output distribution (C_ε^{|SE(K,m,ε)|}, m). Because S is SS-KHA-D-C secure, it must be the case that f(U_k) ≈ g(U_{k′}). Thus f and g satisfy condition (1) from Definition 3.21.
Now, consider H_S(C_ε^{|SE(K,m,ε)|}) versus H_S(SE(K, m, ε)). We must have one of three cases:

1. H_S(C_ε^{|SE(K,m,ε)|}) > H_S(SE(K, m, ε)); in this case the program that samples from C_ε is a false entropy generator and we are done.

2. H_S(C_ε^{|SE(K,m,ε)|}) < H_S(SE(K, m, ε)); in this case SE is a false entropy generator, and again we are done.

3. H_S(C_ε^{|SE(K,m,ε)|}) = H_S(SE(K, m, ε)); in this case, we have that

H_S(m | C_ε^{|SE(K,m,ε)|}) = H_S(m) ,

since m is drawn independently of the channel, whereas

H_S(m | SE(K, m, ε)) ≤ (1 + ν)|K|

for a negligible function ν. To see that this is the case, notice that m = SD(K, SE(K, m, ε)) and so is determined (up to a negligible probability) by K, and H_S(K) = |K|. Thus asymptotically, we have that H_S(g(U_{k′})) > H_S(f(U_k)), and f is a false entropy generator relative to an oracle for C.
Corollary 3.24. Relative to an oracle for C, secure steganography for C exists if and
only if one-way functions exist.
Proof. The corollary follows from Theorem 3.23 and the results of Section 3.2 and [33].
3.3.2 Sampleable Channels are necessary
A channel C is efficiently sampleable if there is a probabilistic polynomial-time algorithm M such that, for every history h, the statistical distance between M(1^k, h) and C_h is negligible in k. Notice that for any efficiently sampleable channel C, the results of the previous sections prove that secure steganography with respect to C exists if and only if one-way functions exist in the standard model, i.e., without assuming oracle access to the channel C. Here we will introduce a very weak notion of security with respect to C and show that if secure steganography exists for C in the standard model, then C is efficiently sampleable.
A weaker attack yet than the KHA attack is the known-distribution attack (KDA) game: in an l-KDA attack against distribution D, the adversary is given a history h of length l, and a sequence of documents s ∈ D^{|SE(K,D,h)|}. The adversary's task is to decide whether s ← C_h or s ← SE(K, D, h). We define the KDA advantage of W by

Adv^{kda-D}_{S,C,W}(k, l) = |Pr_{h←C_ε^l, m←D}[W(SE(K, m, h)) = 1] − Pr_{h←C_ε^l}[W(C_h^ℓ) = 1]|

and say that S is secure against known-distribution attack with respect to D and C (SS-KDA-D-C) if for every PPT W and all polynomially-bounded l, Adv^{kda-D}_{S,C,W}(k, l(k)) is negligible in k. This attack is weaker yet than a KHA attack in that the length of the hiddentext is shorter and the hiddentext is unknown to W.
Theorem 3.25. If there exists an efficiently sampleable D such that there is a SS-
KDA-D-C secure stegosystem S in the standard model, then C is efficiently sampleable.
Proof. Consider the program C_S with the following behavior: on input (1^k, h), C_S picks K ← {0,1}^k, picks m ← D, and returns the first document of S.Encode(K, m, h). Consider any PPT distinguisher A. We will show that the KDA adversary W which passes the first document of its input to A and outputs A's decision has at least the advantage of A. This is because when W's input is drawn from SE, the input it passes to A is distributed exactly according to C_S(1^k, h); and when W's input is drawn from C_h,
the input it passes to A is distributed exactly according to C_h:

Adv^{kda-D}_{S,C,W}(k, |h|) = |Pr[W(SE(K, m, h)) = 1] − Pr[W(C_h) = 1]|
= |Pr[A(1^k, C_S(1^k, h)) = 1] − Pr[A(1^k, C_h) = 1]| .
Chapter 4
Public-Key Steganography
The results of the previous chapter assume that the sender and receiver share a secret,
randomly chosen key. In the case that some exchange of key material was possible
before the use of steganography was necessary, this may be a reasonable assumption.
In the more general case, two parties may wish to communicate steganographically,
without prior agreement on a secret key. We call such communication public key
steganography. Whereas previous work has shown that symmetric-key steganography
is possible – though inefficient – in an information-theoretic model, public-key steganography is information-theoretically impossible. Thus our complexity-theoretic formu-
lation of steganographic secrecy is crucial to the security of the constructions in this
chapter.
In Section 4.1 we will introduce some required basic primitives from the theory
of public-key cryptography. In Section 4.2 we will give definitions for public-key
steganography and show how to use the primitives to construct a public-key stegosys-
tem. Finally, in Section 4.3 we introduce the notion of steganographic key exchange
and give a construction which is secure under the Integer Decisional Diffie-Hellman
assumption.
4.1 Public key cryptography
Our results build on several well-established cryptographic assumptions from the the-
ory of public-key cryptography. We will briefly review them here, for completeness.
Let P and Q be primes such that Q divides P − 1, let Z*_P be the multiplicative group of integers modulo P, and let g ∈ Z*_P have order Q. Let A be an adversary that takes as input three elements of Z*_P and outputs a single bit. Define the DDH advantage of A over (g, P, Q) as

Adv^ddh_A(g, P, Q) = |Pr_{a,b}[A(g^a, g^b, g^{ab}, g, P, Q) = 1] − Pr_{a,b,c}[A(g^a, g^b, g^c, g, P, Q) = 1]| ,

where a, b, c are chosen uniformly at random from Z_Q.
Adv^ow_{Π,A}(k) = Pr_{(π,τ)←G(1^k), x←U_k}[A(π(x)) = x] .

Define the insecurity of Π by InSec^ow_Π(t, k) = max_{A∈A(t)} Adv^ow_{Π,A}(k), where A(t)
denotes the set of all adversaries running in time t(k). We say that Π is a trap-
door one-way permutation family if for every probabilistic polynomial-time (PPT) A,
Advow
Π,A (k) is negligible in k.
Trapdoor one-way predicates
Adv^tp_{P,A}(k) = Pr_{(p,S_p)←G(1^k), x←D_p}[A(x, S_p) = p(x)] .

InSec^tp_P(t, k) = max_{A∈A(t)} Adv^tp_{P,A}(k) ,
where A(t) denotes the set of all adversaries running in time t(k). We say that P is a
trapdoor one-way predicate family if for every probabilistic polynomial-time (PPT)
A, Advtp
P,A (k) is negligible in k.
Notice that one way to construct a trapdoor one-way predicate is to utilize the Goldreich–Levin hard-core bit [28] of a trapdoor one-way permutation. That is, for a permutation family Π, the associated trapdoor predicate family P_Π works as follows: the predicate p_π has domain Dom(π) × {0,1}^k, and is defined by p(x, r) = π^{-1}(x) · r, where · denotes the vector inner product on GF(2)^k. [28] prove that there exist polynomials such that InSec^tp_{P_Π}(t, k) ≤ poly(InSec^ow_Π(poly(t), k)).
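The Goldreich–Levin predicate can be illustrated with a toy (emphatically not one-way) permutation; only the formula p(x, r) = π^{-1}(x) · r is from the text, and the rest is our scaffolding:

```python
import random

K = 8                               # toy domain {0,1}^8
rng = random.Random(0)
perm = list(range(1 << K))
rng.shuffle(perm)                   # toy permutation pi -- NOT one-way!
inv = [0] * (1 << K)
for i, y in enumerate(perm):
    inv[y] = i                      # the "trapdoor": an inverse table

def dot_gf2(a, b):
    # Inner product over GF(2): parity of popcount(a AND b).
    return bin(a & b).count("1") & 1

def gl_predicate(x, r):
    """p(x, r) = pi^{-1}(x) . r. With a genuine trapdoor permutation this
    is hard to predict from (x, r) alone, by the Goldreich-Levin theorem;
    with the trapdoor (the inverse) it is trivial to evaluate."""
    return dot_gf2(inv[x], r)

# Sanity check: the predicate agrees with the preimage's inner product.
assert gl_predicate(perm[5], 0b1011) == dot_gf2(5, 0b1011)
```

A real instantiation would replace the shuffled table by, e.g., the RSA permutation of the next subsection, keeping only the inner-product step unchanged.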
We will require public-key encryption schemes that are secure in a slightly non-
standard model, which we will denote by IND$-CPA in contrast to the more standard
IND-CPA. The main difference is that security against IND$-CPA requires the output
of the encryption algorithm to be indistinguishable from uniformly chosen random
bits, whereas IND-CPA only requires the output of the encryption algorithm to be
indistinguishable from encryptions of other messages.
• E.Generate : 1k → PKk × SKk generates (public, secret) key pairs (P K, SK).
We will abbreviate E.Generate(1k ) by G(1k ), when it is clear which encryption
scheme is meant.
• E.Encrypt : PK × {0, 1}∗ → {0, 1}∗ uses a public key to transform a plaintext
into a ciphertext. We will abbreviate E.Encrypt(P K, ·) by EP K (·).
• E.Decrypt : SK × {0, 1}∗ → {0, 1}∗ uses a secret key to transform a cipher-
text into the corresponding plaintext. We will abbreviate E.Decrypt(SK, ·) by
DSK (·).
such that for all key pairs (PK, SK) ∈ G(1^k), Decrypt(SK, Encrypt(PK, m)) = m.
To formally define the security condition for a public-key encryption scheme, con-
sider a game in which an adversary A is given a public key drawn from G(1k ) and
chooses a message mA . Then A is given either EP K (mA ) or a uniformly chosen string
of the same length. Let A(t, l) be the set of adversaries A which produce a message
of length at most l(k) bits and run for at most t(k) time steps. Define the IND$-CPA
advantage of A against E as
Adv^cpa_{E,A}(k) = Pr_{PK}[A(PK, E_{PK}(m_A)) = 1] − Pr_{PK}[A(PK, U_{|E_{PK}(m_A)|}) = 1] ,

and define InSec^cpa_E(t, l, k) = max_{A∈A(t,l)} Adv^cpa_{E,A}(k). E is called indistinguishable from random bits under chosen plaintext attack (IND$-CPA) if for every probabilistic polynomial-time (PPT) A, Adv^cpa_{E,A}(k) is negligible in k.

Trapdoor one-way predicates exist if there exist trapdoor one-way permutations on {0,1}^k, for example.
IND$-CPA security follows from the pseudorandomness of the bit sequence b_1, . . . , b_l generated by the scheme and the fact that x_l is uniformly distributed in {0,1}^k.
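The scheme alluded to in this remark — mask the message with the hardcore-bit sequence b_1, …, b_l obtained by iterating the permutation, and transmit the final value x_l — can be illustrated with toy RSA parameters. This is our hedged reconstruction, not the thesis's exact construction; it relies on the fact that the least-significant bit is a hardcore bit for RSA, and it ignores the Z_N-versus-bitstring bias that the following constructions address.

```python
import random

# Toy RSA permutation over Z_N; real use needs large random primes.
P_, Q_ = 61, 53
N = P_ * Q_                            # 3233
E = 17                                 # public exponent
D = pow(E, -1, (P_ - 1) * (Q_ - 1))    # trapdoor (private) exponent

def encrypt(pub_n, pub_e, msg_bits, rng):
    """Iterate the permutation from a random seed, XOR its hardcore bits
    (lsb, hardcore for RSA) into the message; also output the last x."""
    x = rng.randrange(1, pub_n)
    cipher = []
    for m in msg_bits:
        cipher.append(m ^ (x & 1))     # mask with b_i = lsb(x_i)
        x = pow(x, pub_e, pub_n)       # x_{i+1} = x_i^e mod N
    return cipher, x                   # (masked bits, x_l)

def decrypt(priv_d, n, cipher, x_last):
    """Walk the permutation backwards with the trapdoor, regenerating
    the same hardcore-bit pad."""
    x, pad = x_last, []
    for _ in cipher:
        x = pow(x, priv_d, n)          # step back: x_i = x_{i+1}^d mod N
        pad.append(x & 1)
    pad.reverse()                      # pad was recovered last-to-first
    return [c ^ b for c, b in zip(cipher, pad)]

rng = random.Random(11)
msg = [0, 1, 1, 0, 1]
cipher, x_l = encrypt(N, E, msg, rng)
```

Since x_l is a uniformly distributed element of Z_N and the b_i are pseudorandom, the ciphertext looks like random bits once the Z_N bias is removed, which is exactly the job of PBRM below.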
RSA-based construction
The RSA function EN,e (x) = xe mod N is believed to be a trapdoor one-way permu-
tation family when N is selected as the product of two large, random primes. The
following construction uses Young and Yung’s Probabilistic Bias Removal Method
(PBRM) [65] to remove the bias incurred by selecting an element from Z∗N rather
than Uk .
The IND$-CPA security of the scheme follows from the correctness of PBRM and the
fact that the least-significant bit is a hardcore bit for RSA. Notice that the expected
number of repeats in the encryption routine is at most 2.
DDH-based construction
Let E(·) (·), D(·) (·) denote the encryption and decryption functions of a private-key
encryption scheme satisfying IND$-CPA, keyed by κ-bit keys, and let κ ≤ k/3. (We
give an example of such a scheme in Chapter 2.) Let H_k be a family of pairwise-independent hash functions H : {0,1}^k → {0,1}^κ. We let P be a k-bit prime (so 2^{k−1} < P < 2^k), and let P = rQ + 1 where (r, Q) = 1 and Q is also a prime. Let g generate Z*_P and ĝ = g^r mod P generate the unique subgroup of order Q.
of the following scheme follows from the Decisional Diffie-Hellman assumption, the
leftover-hash lemma, and the security of (E, D):
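The construction itself is not reproduced in this copy, but the proof below pins down its central trick: a subgroup element ĝ^a is re-randomized into a (near-)uniform element of Z*_P via s = (ĝ^a)^{r̂} g^{βQ}, and recovered as s^r. A toy Python sketch of just this encode/decode step (parameters and names are ours):

```python
import random

# Toy parameters; a real instantiation needs a large prime P = R*Q + 1.
P, Q, R = 23, 11, 2          # P = R*Q + 1, Q prime, gcd(R, Q) = 1
g = 5                        # generator of Z_P^* (order 22)
g_hat = pow(g, R, P)         # generator of the order-Q subgroup
R_HAT = pow(R, -1, Q)        # r-hat with R * R_HAT = 1 mod Q

rng = random.Random(2)

def encode(u, rng):
    """Embed a subgroup element u = g_hat^a into Z_P^*:
    s = u^R_HAT * g^(beta*Q) for beta <- Z_R. Over random a and beta,
    s is uniform in Z_P^*; the construction additionally rejects large s
    to obtain uniform-looking k-bit strings."""
    beta = rng.randrange(R)
    return (pow(u, R_HAT, P) * pow(g, beta * Q, P)) % P

def decode(s):
    # s^R = u^(R*R_HAT) * g^(beta*Q*R) = u * g^(beta*(P-1)) = u  (mod P)
    return pow(s, R, P)

for a in range(Q):
    u = pow(g_hat, a, P)
    assert decode(encode(u, rng)) == u
```

The exponent arithmetic in `decode`'s comment is exactly the cancellation the proof exploits when it computes s = (ĝ^y)^{r̂} g^{βQ}.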
The security proof considers two hybrid encryption schemes: H_1 replaces the value (ĝ^a)^b by a random element ĝ^c of the subgroup of order Q, and H_2 replaces K by a random draw from {0,1}^κ. Clearly, distinguishing H_2 from random bits requires distinguishing some E_K(m) from random bits. The Leftover Hash Lemma gives that the statistical distance between H_2 and H_1 is at most 2^{−κ}. Thus

Adv^{H_1,$}_A(k) ≤ InSec^cpa_E(t, κ) + 2^{−κ} .
Finally, we show that any distinguisher A for H_1 from the output of Encrypt with advantage ε can be used to construct a distinguisher B that solves the DDH problem with advantage at least ε/2. B takes as input a triple (ĝ^x, ĝ^y, ĝ^z) and attempts to decide whether z = xy, as follows. First, B computes r̂ as the least integer such that rr̂ = 1 mod Q, and then picks β ← Z_r. Then B computes s = (ĝ^y)^{r̂} g^{βQ}. If s > 2^{k−1}, B outputs 0. Otherwise, B submits ĝ^x to A to get the message m_A, draws H ← H_k, and outputs the decision of A(ĝ^x, H‖s‖E_{H(ĝ^z)}(m_A)). We claim that:
• There is exactly one exponent in Z_{P−1} that satisfies these conditions, for every y and β. Thus s is uniformly chosen.
• B halts and outputs 0 with probability at most 1/2 over input and random choices; and conditioned on not halting, the value s is uniformly distributed in {0,1}^{k−1}. This is true because 2^{k−1} < P < 2^k, by assumption.
• When z = xy, the input H‖s‖E_{H(ĝ^z)}(m_A) is selected exactly according to the output of Encrypt(ĝ^x, m_A). This is because s encodes a uniform element whose exponent is congruent to y mod Q, so the derived key H(ĝ^z) = H(ĝ^{xy}) is distributed exactly as the key computed by Encrypt.
• When z ≠ xy, the input H‖s‖E_{H(ĝ^z)}(m_A) is selected exactly according to the output of H_1, by construction.
Thus,

Pr[B(ĝ^x, ĝ^y, ĝ^{xy}) = 1] = (2^{k−1}/P) · Pr[A(ĝ^x, Encrypt(ĝ^x, m_A)) = 1] ,

and

Pr[B(ĝ^x, ĝ^y, ĝ^z) = 1] = (2^{k−1}/P) · Pr[A(ĝ^x, H_1(m_A)) = 1] .

And thus Adv^{ddh}_B(ĝ, P, Q) ≥ ε/2. Thus, we have that overall, the stated bound on the insecurity of the scheme follows.
We will first give definitions of public-key stegosystems and security against chosen-
hiddentext attack, and then give a construction of a public-key stegosystem to demon-
strate the feasibility of these notions. The construction is secure assuming the exis-
tence of a public-key IND$-CPA-secure cryptosystem.
4.2.1 Public-key stegosystems
As with the symmetric case, we will first define a stegosystem in terms of syntax and
correctness, and then proceed to a security definition.
As in the private key case, we will also require that a public-key stegosystem is
correct:
where the randomization is also over any coin tosses of SE, SD, SG and the oracle
to Ch .
Remark. While in general Alice will need to remember the history of documents
transmitted to Bob, it is most desirable if Bob is not required to store the history of
documents he has received from Alice. Some of our protocols require Bob to store
this history, but it is straightforward to rewrite them so that Bob need only remember
a collision-intractable digest of the history. In this paper we are not concerned with
attacks in which Ward attempts to disrupt the communication between Alice and
Bob, so the dependence of decoding on accurate history is chiefly a storage concern.
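The collision-intractable digest mentioned in the remark can be maintained as a simple hash chain; this sketch (with SHA-256 as an assumed collision-intractable hash) shows how Bob can replace the stored history with a constant-size state:

```python
import hashlib

EMPTY = b"\x00" * 32  # digest of the empty history

def update_digest(digest: bytes, document: bytes) -> bytes:
    # Hash-chain update: the new digest commits to both the running digest
    # and the newly received document, so 32 bytes replace the full history.
    return hashlib.sha256(digest + document).digest()

def digest_history(documents) -> bytes:
    d = EMPTY
    for doc in documents:
        d = update_digest(d, doc)
    return d
```

Bob updates the digest once per received document; any party holding the same document sequence recomputes the same value, while finding a different sequence with the same digest would require a hash collision.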
where the probabilities are taken over (PK, SK) ← SG(1^k), (m_W, h_W) ← W(PK), s ← SE(PK, m_W, h_W), the random bits of W and the responses of the oracle C. Define the insecurity of S over channel C by

InSec^{cha}_{S,C}(t, l, k) = max_{W∈W(t,l)} Adv^{cha}_{S,C,W}(k) ,
where W(t, l) denotes the set of all adversaries which choose a message of length at
most l(k) bits, and run in time at most t(k).
4.2.3 Construction
Note that for infinitely many Ch there is no perfectly unbiased function f . As with
construction 3.18, this can be rectified by using a (global) universal hash function.
Lemma 4.7. Any channel C which is always informative can be compiled into a
channel C (k) which admits an efficiently computable function f such that for any
(k) 1
polynomial-length sequence h1 , . . . , hn satisfying PrC [hi ] 6= 0, Pr[f (Chi ) = 1] − 2 is
Proof. Let l(k) = ω(log k). Then the channel C^(k) is simply a distribution on sequences of documents which are elements of D^{l(k)}, and the marginal distributions C^(k)_h are simply C^{l(k)}_h. Because C is always informative, we have that for any h which has non-zero probability, H_∞(C^(k)_h) = ω(log k).

Let h_1, h_2, ..., h_m be any sequence of histories which all have non-zero probability under C^(k), and let f : {0,1}^{m(k)} × D → {0,1} be a universal hash function. Let Y ← U_{m(k)}, B ← U_1, and D_i ← C^(k)_{h_i}. Let L(k) = min_i H_∞(D_i), and note that L(k) = ω(log k). Then the Leftover Hash Lemma ([33]) implies that ∆((Y, f_Y(D_i)), (Y, B)) ≤ 2^{−Ω(L(k))}, from which it is immediate that if we choose Y ← U_{m(k)} once and publicly, then for all 1 ≤ i ≤ m, f_Y will have negligible bias for C_{h_i} except with negligible probability.
Proof. To see that this is so, imagine that the ith bit of the input to Basic Encode, c_i, was chosen so that Pr[c_i = 0] = Pr[f(C_{h_i}) = 0]. In this case the ith document output by Basic Encode will come from a distribution identical to C_{h_i}. But since ∆(c_i, U_1) ≤ ε, it must be the case that ∆(s_i, C_{h_i}) ≤ ε as well, by Proposition 2.4. The statistical distance between the entire sequences must then be at most lε, by the triangle inequality.
Using these lemmata, we will show that public-key steganography is possible in any
channel that is always informative. We note that procedure Basic Encode has a small
probability of failure: Basic Decode(Basic Encode(c, h, k)) might not equal c. This
probability of failure, however, is negligible in k.
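A minimal sketch of the rejection-sampling idea behind Basic Encode and Basic Decode, under toy assumptions (documents are single bytes, f is the parity bit, and the channel oracle is a history-oblivious uniform sampler):

```python
import random

def f(doc: int) -> int:
    # public bias function on documents; parity is an assumed stand-in
    return doc & 1

def channel_sample(history, rng):
    # stand-in for the channel oracle C_h; a real channel depends on history
    return rng.randrange(0, 256)

def basic_encode(bits, history, k, rng):
    # For each ciphertext bit c_i, draw documents from C_h until f matches
    # c_i, giving up after k tries (the small failure probability noted above).
    out = []
    for c in bits:
        for _ in range(k):
            doc = channel_sample(history, rng)
            if f(doc) == c:
                break
        out.append(doc)
        history = history + [doc]
    return out

def basic_decode(docs):
    # the receiver simply re-applies the public function f
    return [f(d) for d in docs]
```

Each emitted document is a genuine channel draw, which is why the output stays close to the covertext distribution when the input bits are close to uniform.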
Let E_{PK}(·) and D_{SK}(·) denote the encryption and decryption algorithms for a public-key cryptosystem E which is indistinguishable from random bits under chosen plaintext attack (IND$-CPA). Let ℓ be the expansion function of E, i.e., |E_{PK}(m)| = ℓ(|m|). The following procedures allow encoding and decoding of messages in a manner which is steganographically secret under chosen hiddentext attack for the channel distribution C:
Lemma 4.10. Assume f is ε-biased on C_h for all h. For any warden W ∈ W(t, l), we can construct an IND$-CPA adversary A where

Adv^{cha}_{CHA,C,W}(k) ≤ Adv^{cpa}_{E,A}(k) + ℓ(l)ε .
Proof. We are given a program W for distinguishing between the output of Construction 4.9 and the ordinary traffic of the channel, and an oracle for sampling blocks from that channel. We construct a program A which plays the IND$-CPA game: distinguishing the E_{PK}(m_W) oracle from U_{ℓ(l)}. A(PK) simply runs W(PK) to get (m_W, h_W) and returns m_W for m_A. Then A(PK, c) uses the oracle C_h to compute s = Basic Encode(c, h_W, k), and returns the output of W(PK, s). Consider the cases for A's input.
by Lemma 4.8.

Combining the cases, we have

Adv^{cha}_{CHA,C,W}(k) = |Pr[W(PK, SE(PK, m_W, h_W)) = 1] − Pr[W(PK, C^ℓ_{h_W}) = 1]| ,

as claimed.

InSec^{cha}_{CHA,C}(t, l, k) ≤ InSec^{cpa}_E(t + O(kl), l, k) + ℓ(l)ε .
In many cases in which steganography might be desirable, it may not be possible for
either Alice or Bob to publish a public key without raising suspicion. In these cases, a
natural alternative to public-key steganography is steganographic key exchange: Alice
and Bob exchange a sequence of messages, indistinguishable from normal communi-
cation traffic, and at the end of this sequence they are able to compute a shared key.
So long as this key is indistinguishable from a random key to the warden, Alice and
Bob can proceed to use their shared key in a symmetric-key stegosystem. In this
section, we will formalize this notion.
We say that S is correct if these algorithms satisfy the property that there exists a
negligible function µ(k) satisfying:
We call the output of SD(1^k, r_a, SE(1^k, r_b)) the result of the protocol, and denote this result by SKE(r_a, r_b). We denote by S(1^k, r_a, r_b) the triple (SE(1^k, r_a), SE(1^k, r_b), SKE(r_a, r_b)).
Alice and Bob perform a key exchange using S by sampling private randomness
ra , rb , asynchronously sending SE(1k , ra ) and SE(1k , rb ) to each other, and using the
result of the protocol as a key. Notice that in this definition a SKEP must be an
asynchronous single-round scheme, ruling out multi-round key exchange protocols.
This is for ease of exposition only.
We remark that, as in our other definitions, W also has access to bidirectional channel
oracles C a , C b .
Let W(t) denote the set of all wardens running in time t. The SKE insecurity of S on bidirectional channel B with security parameter k is given by

InSec^{ske}_{S,B}(t, k) = max_{W∈W(t)} Adv^{ske}_{S,B,W}(k) .

S is (t, ε)-secure for bidirectional channel B if InSec^{ske}_{S,B}(t, k) ≤ ε(k). S is said to be secure if it is (t, ε)-secure for every polynomial t and some negligible ε.
4.3.1 Construction
The idea behind the construction for steganographic key exchange is simple: let g generate Z*_P, let Q be a large prime with P = rQ + 1 and r coprime to Q, and let ĝ = g^r generate the subgroup of order Q. Alice picks values a ∈ Z_{P−1} uniformly at random until she finds one such that g^a mod P has its most significant bit (MSB) set to 0 (so that g^a mod P is uniformly distributed in the set of bit strings of length |P| − 1). She then uses Basic Encode to send all the bits of g^a mod P except for the MSB (which is zero anyway). Bob does the same and sends all the bits of g^b mod P except the most significant one using Basic Encode. Bob and Alice then perform Basic Decode and agree on the key value ĝ^{ab}:
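The arithmetic of the exchange (omitting the Basic Encode/Basic Decode layer) can be checked with toy parameters; here P = 23, Q = 11, r = 2 are assumed stand-ins, and the MSB-0 rejection sampling is as described above:

```python
import random

P, Q, r = 23, 11, 2          # toy: P = r*Q + 1, gcd(r, Q) = 1
g = 5                        # generator of Z_P^*
K_BITS = 5                   # |P| in bits

def sample_exchange_value(rng):
    # pick a until g^a mod P has its most significant bit clear, so the
    # remaining K_BITS-1 bits are (nearly) uniform and safe to Basic_Encode
    while True:
        a = rng.randrange(1, P - 1)
        v = pow(g, a, P)
        if v < 2 ** (K_BITS - 1):
            return a, v

rng = random.Random(7)
a, va = sample_exchange_value(rng)   # Alice's secret / transmitted value
b, vb = sample_exchange_value(rng)   # Bob's secret / transmitted value

# Each party raises the other's value to the r-th power (landing in the
# order-Q subgroup generated by ghat = g^r), then to its own secret exponent.
key_alice = pow(pow(vb, r, P), a, P)
key_bob = pow(pow(va, r, P), b, P)
```

Both computations equal ĝ^{ab}: raising to the r-th power projects onto the order-Q subgroup, after which the usual Diffie-Hellman symmetry applies.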
Lemma 4.15. Let f be ε-biased on B. Then for any warden W ∈ W(t), we can construct a DDH adversary A where

Adv^{ddh}_A(ĝ, P, Q) ≥ (1/4)Adv^{ske}_{SKE,B,W}(k) − 2kε .
Proof. A takes as input a triple (ĝ^a, ĝ^b, ĝ^c) and attempts to decide whether c = ab, as follows. First, A computes r̂ as the least integer such that rr̂ = 1 mod Q, and then picks α, β ← Z_r. Then A computes c_a = (ĝ^a)^{r̂} g^{αQ} and c_b = (ĝ^b)^{r̂} g^{βQ}. If c_a > 2^{k−1} or c_b > 2^{k−1}, A outputs 0. Otherwise, A computes s_a = Basic Encode(c_a) and s_b = Basic Encode(c_b); A then outputs the result of computing W(s_a, s_b, ĝ^c). We claim that:
• The elements c_a, c_b are uniformly chosen elements of Z*_P, when a, b ← Z_Q. To see that this is true, observe that the exponent of s_a, ξ_a = rr̂a + αQ, is congruent to a mod Q and to αQ mod r; and that for uniform α, αQ is also a uniform residue mod r. By the Chinese remainder theorem, there is exactly one element of Z_{rQ} = Z_{P−1} that satisfies these conditions, for every a and α. Thus c_a is uniformly chosen. The same argument holds for c_b.
• A halts and outputs 0 with probability at most 3/4 over input and random choices; and conditioned on not halting, the values c_a, c_b are uniformly distributed in {0,1}^{k−1}. This is true because 2^{k−1}/P > 1/2, by assumption.
• When c = ab, the shared key computed in the protocol is exactly ĝ^c, since (writing rr̂ = γQ + 1 for some integer γ)

c_a^{rb} = (g^{rr̂a+αQ})^{rb} = g^{(γQ+1)rab + rQ(αb)} = g^{rab} = ĝ^c .

• When c ≠ ab, the element ĝ^c is a uniformly chosen element of the order-Q subgroup, independent of c_a and c_b, by construction.
Thus,

Pr[A(ĝ^a, ĝ^b, ĝ^{ab}) = 1] = (2^{k−1}/P)^2 · Pr[W(S(a, b)) = 1] ,

and

|Pr[A(ĝ^a, ĝ^b, ĝ^c) = 1] − Pr_K[W(B, K) = 1]| ≤ 2kε .

Combining these,

InSec^{ske}_{SKE,B}(t, k) ≤ 4 InSec^{ddh}_{ĝ,P,Q}(t + O(k^2)) + 8kε .
Chapter 5
The results of the previous two chapters show that a passive adversary (one who
simply eavesdrops on the communications between Alice and Bob) cannot hope to
subvert the operation of a stegosystem. In this chapter, we consider the notion of an
active adversary who is allowed to introduce new messages into the communications
channel between Alice and Bob. In such a situation, an adversary could have two
different goals: disruption or detection.
5.1 Robust Steganography
Robust steganography can be thought of as a game between Alice and Ward in which
Ward is allowed to make some alterations to Alice’s messages. Ward wins if he can
sometimes prevent Alice’s hidden messages from being read; while Alice wins if she
can pass a hidden message with high probability, even when Ward alters her public
messages. For example, if Alice passes a single bit per document and Ward is unable to change the bit with probability at least 1/2, Alice may be able to use error-correcting codes to reliably transmit her message. It will be important to state the limitations we
impose on Ward, since otherwise he can replace all messages with a new (independent)
draw from the channel distribution, effectively destroying any hidden information. In
this section we give a formal definition of robust steganography with respect to a
limited adversary.
1. W is given oracle access to the channel distribution C and to SE(K, ·, ·). W
may access these oracles at any time throughout the game.
Succ^R_{S,W}(k) = Pr[SD(K, s′_W, h_W) ≠ m_W] ,

where the probability is taken over the choice of K and the random choices of S and W. Define the failure rate of S by

Fail^R_S(t, q, l, µ, k) = max_{W∈W(R,t,q,l,µ)} Succ^R_{S,W}(k) ,

where W(R, t, q, l, µ) denotes the set of all R-bounded active wardens that submit at most q(k) encoding queries of total length at most l(k), produce a plaintext of length at most µ(k), and run in time at most t(k).
Definition 5.1. A sequence of stegosystems {S_k}_{k∈N} is called substitution robust for C against R if it is steganographically secret for C and there is a negligible function ν(k) such that for every PPT W, for all sufficiently large k, Succ^R_{S,W}(k) < ν(k).
Consider the question of what conditions on the relation R are necessary to allow
communication to take place between Alice and Bob. Surely it should not be the case
that R = D×D, since in this case Ward’s “substitutions” can be chosen independently
of Alice’s transmissions, and Bob will get no information about what Alice has said.
Suppose, for example, that there exist a history h with non-zero probability and a document d_0 such that every document Alice might draw from C_h is related to d_0. Then when h has transpired, Ward can effectively prevent the transfer of information from Alice to Bob by sending the document d_0 regardless of the document transmitted by Alice, because the probability Alice picks a document related to d_0 is 1. That is, after history h, regardless of Alice's transmission d, Ward can replace it by d_0, so seeing d_0 will give Bob no information about what Alice said.
Since we model the attacker as controlling the history h, then, a necessary condition on R and C for robust communication is that

∀h. Pr_C[h] = 0 or max_y Σ_{(x,y)∈R} Pr_{C_h}[x] < 1 .
We denote by I(R, D) the function max_y Σ_{(x,y)∈R} Pr_D[x]. We say that the pair (R, D) is δ-admissible if I(R, D) ≤ δ, and a pair (R, C) is δ-admissible if ∀h. Pr_C[h] = 0 or I(R, C_h) ≤ δ. Our necessary condition states that (R, C) must be δ-admissible for some δ < 1.
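For a finite document set with explicit probabilities, I(R, D) can be computed directly; a small sketch with a hypothetical four-document channel:

```python
def I(R, D):
    # I(R, D) = max over y of the probability mass of documents related to y
    ys = {y for (_, y) in R}
    return max(sum(p for (x, p) in D.items() if (x, y) in R) for y in ys)

# hypothetical toy channel: four documents, uniform distribution
D = {"d1": 0.25, "d2": 0.25, "d3": 0.25, "d4": 0.25}
# a toy relation: d1 is substitutable by d1, d2 by d1 or d2, etc.
R = {("d1", "d1"), ("d2", "d1"), ("d2", "d2"), ("d3", "d3"), ("d4", "d4")}
```

Here the worst substitution target is d1, which is related to half the channel mass, so (R, D) is δ-admissible for δ = 1/2.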
It turns out that this condition (on R) will be sufficient, for an efficiently sam-
pleable channel, for the existence of a stegosystem which is substitution-robust against
R.
In this section we give a stegosystem which is substitution robust against any admissible bounding relation R, under a slightly modified assumption on the channel, and assuming that Alice and Bob know some efficiently computable, δ-admissible relation R′ such that R′ is a superset of R. As with most of our constructions, this stegosystem is not really practical, but it serves as a proof that robust steganography is possible for any admissible relation.
We assume that Alice and Bob share a key K to a pseudorandom function F and a synchronized counter N. We will let n(k) = ω(log k) be a "robustness parameter." We begin with a stegosystem which robustly encodes a single bit.
The idea behind this construction is this: suppose that instead of sharing a key to a pseudorandom function F, Alice and Bob shared two secret documents d_0, d_1 drawn independently from C_h. Then Alice could send Bob the message bit m by sending document d_m, and Bob could recover m by checking to see if the document he received was related (by R′) to d_0 or d_1. Since the adversary is R-bounded and (D, R′) is δ-admissible, the probability of a decoding error (caused either by the adversary or by an accidental draw of d_0, d_1) would be at most δ. Intuitively, ROneBit reduces the probability of decoding error to δ^n by encoding each hiddentext bit n times.
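A toy sketch of the ROneBit idea, with assumed stand-ins: SHA-256 for the PRF F, one-byte documents from a history-oblivious channel, and equality as the admissible relation R′:

```python
import hashlib

def F(key: bytes, counter: int, bit: int) -> int:
    # PRF stand-in: SHA-256 of (key, counter, bit), used as channel randomness
    h = hashlib.sha256(key + counter.to_bytes(8, "big") + bytes([bit]))
    return int.from_bytes(h.digest(), "big")

def channel(history, r: int) -> int:
    # stand-in for the sampleable channel C(1^k, h, r); here history-oblivious
    return r % 256

def related(d1: int, d2: int) -> bool:
    # toy relation R': documents related iff equal
    return d1 == d2

def encode_bit(key: bytes, m: int, history, N: int, n: int):
    # send n documents, each determined by the PRF at the current counter
    docs, h = [], list(history)
    for i in range(n):
        d = channel(h, F(key, N + i, m))
        docs.append(d)
        h.append(d)
    return docs

def decode_bit(key: bytes, docs, history, N: int, n: int):
    # decode to the bit whose n reference documents are all R'-related to
    # the received ones; a warden must break all n, probability <= delta^n
    for b in (0, 1):
        h = list(history)
        if all(related(channel(h, F(key, N + i, b)), docs[i]) for i in range(n)):
            return b
    return None
```

With an R-bounded warden, each substituted document stays R′-related to Alice's, while matching all n references for the wrong bit happens with probability at most δ^n.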
InSec^{ss}_{ROneBit,C}(t, q, q) ≤ InSec^{prf}_F(t + O(nq), nq) .
Proof. Let W be a passive warden which runs in time t, and makes at most q queries
of total length at most q (each query can be only one bit, because of the type). We
construct a PRF adversary A which runs in time t + O(nq) and makes at most nq
queries to F , such that
Adv^{prf}_{F,A}(k) = Adv^{ss}_{S,C,W}(k) .
The PRF adversary takes a function oracle f, and emulates W(1^k), responding to the queries W makes to the encoder SE by using f in place of F_K(·,·). More formally, we define the subroutine SSE^f : {0,1}* × {0,1}* → {0,1}* as follows:
Procedure SSE f :
Input: bit m, history h
for i = 1 . . . n do
set σi = C(1k , h, f (N, m))
increment N
set h = (h, σi )
Output: σ1 , . . . , σn
Then we define A^f(1^k) = W^{SSE^f}(1^k); A's advantage over F is then:

Adv^{prf}_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Pr[W^{ST}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
= Adv^{ss}_{ROneBit,C,W}(k) .

• f is chosen from F_K(·,·). Then the output of SSE^f is distributed identically to the encoding function of ROneBit. That is,
oracle f in place of F_K(·,·) to emulate ROneBit.Encode in responding to the queries of W. Let m_W, s_W be the hiddentext and the stegotext sequence returned by W, respectively. Then A^f returns 1 iff SD^f(s_W, h_W) ≠ m_W. Consider the following two cases for f:
• f is chosen uniformly from all appropriate functions. Then, for each i, the stegotexts σ_i = C(1^k, h_i, f(N + i, m_W)) are distributed independently according to C_{h_i}. Consider the sequence of "alternative stegotexts" d_i = C(1^k, h′_i, f(N + i, 1 − m_W)); each of these is also distributed independently according to C_{h′_i}; and since W is never given access to the d_i, the s_i are independent of the d_i. Now SD will fail (causing A^f(1^k) to output 1) only if the event ∀i. (d_i, s_i) ∈ R′ occurs. Because the d_i are independent of the actions of W, and because (D, R′) is δ-admissible, each event (d_i, s_i) ∈ R′ happens independently with probability at most δ. So the probability of failure is at most δ^n:

Pr[A^f(1^k) = 1] ≤ δ^n .
Adv^{prf}_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Succ^R_{ROneBit,W}(k) − Pr[A^f(1^k) = 1]
≥ Succ^R_{ROneBit,W}(k) − δ^n .
We now show how to extend ROneBit to handle multiple-bit messages. We assume the same setup as previously, i.e., Alice and Bob share a synchronized counter N and a key K to a PRF F : {0,1}^k × {0,1}* → {0,1}^k, and know an efficiently computable, admissible relation R′ ⊇ R. We assume that the "state updates" made by calls to ROneBit are maintained across invocations.
Proof. Suppose we are given a warden W ∈ W(t, q, l) against the stegosystem Robust. Then we can construct a warden X ∈ W(t, l, l) against ROneBit. X^M works by simulating W, responding to each oracle query (m, h) by computing h_0 = h, then σ_{i,1...n} = M(m_i, h_{i−1}) and h_i = (h_{i−1}, σ_{i,1...n}) for 1 ≤ i ≤ |m|, and returning σ_1, ..., σ_{|m|}. Consider the cases for X's oracle M:
Consider the cases for X’s oracle M :
Advss
ST k CT k
ROneBit,C,X (k) = Pr[X (1 ) = 1] − Pr[X (1 ) = 1]
= Pr[W ST (1k ) = 1] − Pr[W CT (1k ) = 1]
= Advss
Robust,C,W (k)
Combining the fact that X makes l queries to ROneBit.Encode and runs in time t + O(l) with the result of Lemma 5.3, we get

Adv^{ss}_{Robust,C,W}(k) ≤ InSec^{prf}_F(t + O(nl), nl, k) .

Lemma 5.8. Fail^R_{Robust}(t, q, l, µ, k) ≤ InSec^{prf}_F(t + O(nl), nl, k) + µδ^n .
• f is chosen uniformly from all appropriate functions. Then, for each i, the stegotexts σ_{i,j} = C(1^k, h_{i,j}, f(N + (i−1)n + j, m_{W,i})) are distributed independently according to C_{h_{i,j}}. Consider the sequence of "alternative stegotexts" d_{i,j} = C(1^k, h′_{i,j}, f(N + (i−1)n + j, 1 − m_{W,i})); each of these is also distributed independently according to C_{h′_{i,j}}; and since W is never given access to the d_{i,j}, the s_{i,j} are independent of the d_{i,j}. Now SD will fail (causing A^f(1^k) to output 1) only if the event ∀j. (d_{i,j}, s_{i,j}) ∈ R′ occurs for some i. Because the d_{i,j} are independent of the actions of W, and because (D, R′) is δ-admissible, each event (d_{i,j}, s_{i,j}) ∈ R′ happens independently with probability at most δ. So the probability of failure for any i is at most δ^n. A union bound then gives us:

Pr[A^f(1^k) = 1] ≤ µδ^n .
Taking the difference of these probabilities, we get:

Adv^{prf}_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
= Succ^R_{Robust,W}(k) − Pr[A^f(1^k) = 1]
≥ Succ^R_{Robust,W}(k) − µδ^n .
For this construction we will require a symmetric encryption scheme that is indistinguishable from random bits under chosen-ciphertext attack.
IND$-CCA Security
Adv^{cca}_{E,A}(k) = Pr[A^{E_K,D_K}(1^k) = 1] − Pr[A^{$,D_K}(1^k) = 1] ,

InSec^{cca}_E(t, q_e, q_d, µ_e, µ_d, k) = max_{A∈A(t,q_e,q_d,µ_e,µ_d)} Adv^{cca}_{E,A}(k) ,

where A(t, q_e, q_d, µ_e, µ_d) denotes the set of adversaries running in time t, that make q_e queries of µ_e bits to E_K, and q_d queries of µ_d bits to D_K.
E is called indistinguishable from random bits under chosen ciphertext attack (IND$-CCA) if for every PPT A, Adv^{cca}_{A,E}(k) is negligible in k.
Construction. We let E be any IND$-CPA-secure symmetric encryption scheme and
let F : {0, 1}k × {0, 1}∗ → {0, 1}k be a pseudorandom function. We let K, κ ← Uk .
We construct a cryptosystem E as follows:
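The construction itself is not reproduced in this excerpt. One standard way to combine these ingredients, sketched here with assumed hash-based stand-ins (a random-IV XOR stream for the IND$-CPA scheme and HMAC-SHA256 truncated to 16 bytes for the PRF F), is to append a PRF tag to an IND$-CPA ciphertext and reject any ciphertext whose tag does not verify:

```python
import hashlib
import hmac
import os

def prf(kappa: bytes, data: bytes) -> bytes:
    # F_kappa: PRF output looks random, so the tag keeps the IND$ property
    return hmac.new(kappa, data, hashlib.sha256).digest()[:16]

def cpa_encrypt(K: bytes, m: bytes) -> bytes:
    # IND$-CPA stand-in: random IV plus a hash-derived one-time keystream
    iv = os.urandom(16)
    stream = hashlib.sha256(K + iv).digest()[: len(m)]
    return iv + bytes(a ^ b for a, b in zip(m, stream))

def cpa_decrypt(K: bytes, c: bytes) -> bytes:
    iv, body = c[:16], c[16:]
    stream = hashlib.sha256(K + iv).digest()[: len(body)]
    return bytes(a ^ b for a, b in zip(body, stream))

def cca_encrypt(K: bytes, kappa: bytes, m: bytes) -> bytes:
    c = cpa_encrypt(K, m)
    return c + prf(kappa, c)             # ciphertext || tag

def cca_decrypt(K: bytes, kappa: bytes, c: bytes):
    body, tag = c[:-16], c[-16:]
    if not hmac.compare_digest(prf(kappa, body), tag):
        return None                      # reject modified ciphertexts
    return cpa_decrypt(K, body)
```

Rejecting on tag failure is what defeats chosen-ciphertext queries: a decryption oracle answers ⊥ on everything the adversary did not obtain legitimately, except with the probability of a PRF forgery.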
Theorem 5.9.
• E5 , D5 : choose K, κ ← Uk .
If we define the function

Adv^i_A(k) = Pr[A^{E_i,D_i}(1^k) = 1] − Pr[A^{E_{i+1},D_{i+1}}(1^k) = 1] ,

then

Adv^{cca}_{A,E}(k) = Pr[A^{E.E_K,E.D_K}(1^k) = 1] − Pr[A^{$,E.D_K}(1^k) = 1]
= Pr[A^{E_1,D_1}(1^k) = 1] − Pr[A^{E_5,D_5}(1^k) = 1]
≤ Σ^4_{i=1} |Pr[A^{E_i,D_i}(1^k) = 1] − Pr[A^{E_{i+1},D_{i+1}}(1^k) = 1]|
= Σ^4_{i=1} Adv^i_A(k)
B picks K ← Uk and runs A. B uses its function oracle f to respond to A’s queries
as follows:
which gives us

Adv^1_A(k) = Pr[A^{E_1,D_1}(1^k) = 1] − Pr[A^{E_2,D_2}(1^k) = 1]
= Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]
= Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t′, q_e + q_d, k) ,

as claimed.
Adv^{cpa}_{B,E}(k) ≥ Adv^2_A(k) − q_d 2^{−k} .
Let V denote the event that A submits a decryption query that would cause B to halt. Then, conditioned on ¬V, when B's oracle is $, B perfectly simulates E_3, D_3 to A:

Pr[B^$(1^k) = 1] = Pr[A^{E_3,D_3}(1^k) = 1 | ¬V] .

Also, conditioned on ¬V, when B's oracle is E.E_K, B perfectly simulates E_2, D_2 to A:

≤ q_d 2^{−k} + InSec^{cpa}_E(t′, q_e, µ_e, k)
where the last line follows because each decryption query causes B to halt with probability at most 2^{−k}; the union bound gives the result.
Lemma 5.12. Adv^3_A(k) ≤ q_e^2 / 2^k .

Proof. Notice that unless E_3 chooses the same values of (r, c) at least twice, E_3 and E_4 are identical. Denote this event by C. Then we have:

Adv^3_A(k) = Pr[A^{E_3,D_3}(1^k) = 1] − Pr[A^{E_4,D_4}(1^k) = 1]
= Pr[A^{E_3,D_3}(1^k) = 1 | C]Pr[C] + Pr[A^{E_3,D_3}(1^k) = 1 | ¬C]Pr[¬C] − Pr[A^{E_4,D_4}(1^k) = 1 | C]Pr[C] − Pr[A^{E_4,D_4}(1^k) = 1 | ¬C]Pr[¬C]
≤ Pr[C] |Pr[A^{E_3,D_3}(1^k) = 1 | C] − Pr[A^{E_4,D_4}(1^k) = 1 | C]| + Pr[¬C] |Pr[A^{E_3,D_3}(1^k) = 1 | ¬C] − Pr[A^{E_4,D_4}(1^k) = 1 | ¬C]|
= Pr[C] |Pr[A^{E_3,D_3}(1^k) = 1 | C] − Pr[A^{E_4,D_4}(1^k) = 1 | C]|
≤ Pr[C]
≤ \binom{q_e}{2} 2^{−k} ≤ q_e^2 / 2^k
Adv^{prf}_{B,F}(k) = Adv^4_A(k) ,

so by definition of advantage, we get:

Adv^4_A(k) = Pr[A^{E_5,D_5}(1^k) = 1] − Pr[A^{E_4,D_4}(1^k) = 1]
= Pr[B^{F_K}(1^k) = 1] − Pr[B^f(1^k) = 1]
= Adv^{prf}_{B,F}(k) ≤ InSec^{prf}_F(t′, q_d, k)
InSec^{scca}_{S,C}(t, q_e, q_d, µ_e, µ_d, k) = max_{W∈W(t,q⃗,µ⃗)} Adv^{scca}_{S,C,W}(k) ,

where W(t, q⃗, µ⃗) denotes the class of all W running in time t which make at most q_e encoding queries of µ_e bits and at most q_d decoding queries of µ_d bits.

We say that S is (t, q⃗, µ⃗, k, ε)-secure against symmetric chosen-covertext attack with respect to C if

InSec^{scca}_{S,C}(t, q⃗, µ⃗, k) ≤ ε ,
Construction
Procedure DEncode:
Input: bits c_1, ..., c_l, history h, bound k, randomness r_1, ..., r_{lk} ∈ {0,1}^k
Let ι = 0
for i = 1 ... l do
  Let j = 0
  repeat:
    compute s_i = C(1^k, h, r_ι); increment j, ι
  until f(s_i) = c_i OR (j > k)
  set h = (h, s_i)
Output: s_1, s_2, ..., s_l
For a given sequence of lk random inputs, this routine has exactly one possible encoding for any message. Building on this routine, we will assume that Alice and Bob share a key K for a symmetric IND$-CCA-secure encryption scheme E = (G, E, D) such that for any key K, for any l ≥ k, Pr[U_{ℓ(l)} ∈ [E_K({0,1}^l)]] ≤ 1/2^k (an example is the scheme of the previous section). We will also assume that G : {0,1}^k → {0,1}^{k×lk} is a pseudorandom generator. Then the following scheme is secure against symmetric chosen-covertext attack:
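The determinism of DEncode, which the chosen-covertext security argument relies on, can be seen in a direct transcription (with assumed toy stand-ins: parity for f and a simple deterministic sampler for the channel):

```python
import random

def f(doc: int) -> int:
    # the public bias function on documents (parity, an assumption)
    return doc & 1

def channel(history, r: int) -> int:
    # stand-in for the sampleable channel C(1^k, h, r): deterministic in (h, r)
    return (r * 2654435761 + len(history)) % 256

def dencode(bits, history, k, randomness):
    # Deterministic rejection sampling: with the same randomness r_1..r_{lk},
    # there is exactly one possible encoding, which Decode can re-check.
    out, iota, h = [], 0, list(history)
    for c in bits:
        j = 0
        while True:
            s = channel(h, randomness[iota])
            j += 1
            iota += 1
            if f(s) == c or j > k:
                break
        h.append(s)
        out.append(s)
    return out
```

Because every draw is a deterministic function of the shared randomness and the history so far, a decoder holding the same randomness can verify that a received stegotext is the unique valid encoding of its ciphertext.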
where t0 ≤ t + O((µe + µd )k).
Clearly Pr[W^{O_1,SD_K}(1^k) = 1] = Pr[W^{O_C,SD_K}(1^k) = 1] and Pr[W^{O_4,SD_K}(1^k) = 1] = Pr[W^{SE_K,SD_K}(1^k) = 1]. Thus

Adv^{scca}_{S,C,W}(k) = Pr[W^{O_4,SD_K}(1^k) = 1] − Pr[W^{O_1,SD_K}(1^k) = 1]

Adv^{prg}_{G,A}(k) ≥ (1/q_e) Adv^2_W(k) .
A works as follows: first, A picks a key K ← Uk to use in responding to the queries
W makes to SDK . Suppose A is given as input qe strings r1 , . . . , rqe of length k × lk
and asked to decide whether they are all samples from Uk×lk or samples from G(Uk ).
Then A can achieve advantage precisely Adv2W (k) by emulating W , responding to
its decoding queries using K, and responding to the ith encoding query (m, h) by
drawing c ← U`(|m|+k) and giving the response DEncode(c, h, k, ri ). If all of the ri are
drawn from U_{k×lk}, then A perfectly simulates O_1 to W, and if all are drawn from G(U_k), A perfectly simulates O_2. Thus A's advantage in distinguishing G(U_k)^{q_e} and U^{q_e}_{k×lk} is exactly Adv^2_W(k). The lemma follows from this fact and Proposition 2.6 (a straightforward hybrid argument).
Proof. We will construct an adversary A that plays the chosen-ciphertext attack game
against E with advantage
Adv^{cca}_{A,E}(k) ≥ Adv^3_W(k) .
• on decoding query (s, h), A computes c = Basic Decode(s, h); if c was previously generated by an encoding query, A returns ⊥; otherwise A uses its decryption oracle to compute r‖m = D_K(c). If c ≠ ⊥ and s = DEncode(c, h, k, G(r)), A returns m; otherwise A returns ⊥.
In other words, A simulates running the routines sCCA.Encode and sCCA.Decode with
its oracles; with the exception that because A is playing the IND$-CCA game, he is
not allowed to query DK on the result of an encryption query: thus a decoding query
that has the same underlying ciphertext c must be dealt with specially.
This is because when c = E_K(r‖m), the test s = DEncode(c, h, k, G(r)) would fail anyway.
Pr[A^{$,D_K}(1^k) = 1] = Pr[A^{$,D_K}(1^k) = 1 | ¬V]Pr[¬V] + Pr[A^{$,D_K}(1^k) = 1 | V]Pr[V]
≤ Pr[W^{O_3,SD_K}(1^k) = 1 | ¬U]Pr[¬U] + Pr[V]
≤ Pr[W^{O_3,SD_K}(1^k) = 1] + Pr[V]
≤ Pr[W^{O_3,SD_K}(1^k) = 1] + q_e 2^{−k} ,

Adv^{cca}_{A,E}(k) = Pr[A^{E_K,D_K}(1^k) = 1] − Pr[A^{$,D_K}(1^k) = 1]
= Pr[W^{O_4,SD_K}(1^k) = 1] − Pr[A^{$,D_K}(1^k) = 1]
≥ Pr[W^{O_4,SD_K}(1^k) = 1] − Pr[W^{O_3,SD_K}(1^k) = 1] − q_e 2^{−k}
= Adv^3_W(k) − q_e 2^{−k}
IND$-CCA
3. A continues to query D_SK subject to the restriction that A may not query D_SK(c*). A outputs a bit.

where m* ← A^{D_SK}(PK) and (PK, SK) ← G(1^k). Define the CCA insecurity of E by

InSec^{cca}_E(t, q, µ, l*, k) = max_{A∈A(t,q,µ,l*)} Adv^{cca}_{E,A}(k) ,

where A(t, q, µ, l*) denotes the set of adversaries running in time t, that make q queries of total length µ, and issue a challenge message m* of length l*. Then E is (t, q, µ, l*, k, ε)-indistinguishable from random bits under chosen ciphertext attack if InSec^{cca}_E(t, q, µ, l*, k) ≤ ε. E is called indistinguishable from random bits under chosen ciphertext attack if Adv^{cca}_{E,A}(k) is negligible in k for every PPT A.
• Generate(1k ): draws (π, π −1 ) ← Πk ; the public key is π and the private key is
π −1 .
• Encrypt(π, m): draws a random x ← U_k, computes K = H(x), c = E_K(m), y = π(x), and returns y‖c.
Theorem 5.20.

InSec^{cca}_E(t, q, µ, l, k) ≤ InSec^{ow}_Π(t, k) + InSec^{cca}_{SE}(t′, 1, q, l, µ, k) ,

where t′ ≤ t + O(q_H).
Proof. We will show how to use any adversary A ∈ A(t, q, µ, l) against E to create an
adversary B which plays both the IND$-CCA game against SE and the OWP game
against Π so that B succeeds in at least one game with success close to that of A.
B receives as input an element π ∈ Π and a y* ∈ {0,1}^k, and also has access to encryption and decryption oracles O, D_K for SE. B keeps a list L of (y, z) pairs, where y ∈ {0,1}^k and z ∈ {0,1}^{k′}; initially, L is empty. B runs A with input π and answers the decryption and random oracle queries of A as follows:
• When A queries H(x), B first computes y = π(x), and checks to see whether y* = y; if it does, B "decides" to play the OWP game and outputs x, the inverse of y*. Otherwise, B checks to see if there is an entry in L of the form (y, z); if there is, B returns z to A. If there is no such entry, B picks z ← U_{k′}, adds (y, z) to L, and returns z to A.
• When A queries D_SK(y‖c), B first checks whether y = y*; if so, B returns D_K(c). Otherwise, B checks whether there is an entry in L of the form (y, z); if not, B chooses z ← U_{k′} and adds one. B returns SE.D_z(c).
Adv^{ow}_{B,Π}(k) = Pr[P] .
Now, conditioned on ¬P, when B's oracle O is a random string oracle, c* ← U_ℓ and B perfectly simulates the random-string world to A. And (still conditioned on ¬P) when B's oracle O is E_K, B perfectly simulates the ciphertext world to A. Thus, we have that:

Adv^{cca}_{B,SE}(k) = Pr[B^{$,SE.D_K}(π, y) = 1] − Pr[B^{SE.E_K,SE.D_K}(π, y) = 1]
= Pr[A^{E.D_SK}(U_ℓ) = 1 | ¬P] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1 | ¬P]

Adv^{cca}_{A,E}(k) = Pr[A^{E.D_SK}(U_ℓ) = 1] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1]
= (Pr[A^{E.D_SK}(U_ℓ) = 1 | ¬P] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1 | ¬P]) Pr[¬P] + (Pr[A^{E.D_SK}(U_ℓ) = 1 | P] − Pr[A^{E.D_SK}(E.E(π, m*)) = 1 | P]) Pr[P]
SS-CCA Game
where (m*, h*) ← W^{SD_SK}(PK) and (PK, SK) ← SG(1^k). We define the sCCA insecurity of S with respect to C by

InSec^{scca}_{S,C}(t, q, µ, l*, k) = max_{W∈W(t,q,µ,l*)} Adv^{scca}_{S,C,W}(k) ,

where W(t, q, µ, l*) denotes the class of all W running in time t which make at most q oracle queries of µ bits and submit a challenge hiddentext of length at most l*.
Construction
InSec^{scca}_{pCCA,C}(t, q, µ, l, k) ≤ InSec^{cca}_E(t′, q, µ, l, k) + 2^{−t} + ℓ(l+k)ε + InSec^{prg}_G(t′, k) ,

where t′ ≤ t + O(lk).
Proof. Choose an arbitrary W ∈ W(t, q, µ, l); let (PK, SK) ← G(1^k) and let (m*, h*) ← W^{SD_SK}(PK). Consider the following sequence of distributions:

• D_1: C^{ℓ(l+k)}_{h*}
• D_2: DEncode(U_{ℓ(l+k)}, h*, k, U_{k×lk})
• D_3: DEncode(U_{ℓ(l+k)}, h*, k, G(U_k))
• D_4: DEncode(E_{PK}(r*‖m*), h*, k, G(r*)), where r* ← U_k
Adv^{scca}_{W,pCCA,C}(k) = Pr[W^{SD}(D_4) = 1] − Pr[W^{SD}(D_1) = 1]
≤ |Pr[W^{SD}(D_2) = 1] − Pr[W^{SD}(D_1) = 1]| + |Pr[W^{SD}(D_3) = 1] − Pr[W^{SD}(D_2) = 1]| + |Pr[W^{SD}(D_4) = 1] − Pr[W^{SD}(D_3) = 1]|

Adv^{prg}_{G,A}(k) = Adv^2_W(k) .
A works as follows: first, A picks a key pair (PK, SK) ← G(1^k) to use in responding to the queries W makes to SD. A is given as input a string r ∈ {0,1}^{k×lk} and asked to decide whether r ← U_{k×lk} or r ← G(U_k). Then A can achieve advantage precisely Adv^2_W(k) by emulating W, responding to its decoding queries using SK, and responding to the challenge hiddentext (m*, h*) by drawing c ← U_{ℓ(l+k)} and giving the response s = DEncode(c, h*, k, r). If r ← U_{k×lk}, then s ← D_2, and if r ← G(U_k), then s ← D_3. Thus A's advantage in distinguishing G(U_k) and U_{k×lk} is exactly:

Adv^{prg}_{A,G}(k) = |Pr[A(G(U_k)) = 1] − Pr[A(U_{k×lk}) = 1]|
= |Pr[W^{SD}(D_3) = 1] − Pr[W^{SD}(D_2) = 1]|
= Adv^2_W(k)
Proof. We will construct an adversary A that plays the chosen-ciphertext attack game
against E with advantage
Adv^{cca}_{A,E}(k) ≥ Adv^3_W(k) .
s* = DEncode(c*, h*, k, G(r*))

to W.
In other words, A simulates running sCCA.Decode with its D_SK oracle, except that because A is playing the IND$-CCA game, it is not allowed to query D_SK on the challenge value c*: thus a decoding query that has the same underlying ciphertext c* must be dealt with specially. This is because when c* = E_{PK}(r*‖m*), the test s = DEncode(c, h, k, G(r)) would fail anyway.
Adv^{cca}_{A,E}(k) = Pr[A^{D_SK}(PK, E_{PK}(r*‖m*)) = 1] − Pr[A^{D_SK}(PK, U_ℓ) = 1]
= Pr[W^{SD}(PK, D_4) = 1] − Pr[A^{D_SK}(PK, U_ℓ) = 1]
≥ Pr[W^{SD}(PK, D_4) = 1] − Pr[W^{SD}(PK, D_3) = 1] − 2^{−t}
= Adv^3_W(k) − 2^{−t}
5.2.2 Authenticated Stegosystems
Before we can do so, however, it is necessary to extend the syntax and correctness
definitions of a public-key stegosystem to include steganographic signatures.
• S.SigGen (abbreviated SSG when S is clear from the context) takes as input a security parameter 1^k and generates a key pair (SVK, SSK) ∈ SVK × SSK.
• S.Encode (abbreviated SE) takes as input a public key PK, a signing key SSK, a message m, and a history h; SE(PK, SSK, m, h) returns a sequence of documents s_1, s_2, ..., s_l (the stegotext) from the support of C^l_h.
where the probability is over the choices (P K, SK) ← SG(1k ), (SV K, SSK) ←
SSG(1k ), and any coin tosses of SE, SD, SG, SSG and the oracle to Ch .
2. W is given access to a second oracle which is either STatk or CTatk . The oracle
ST (for StegoText) will model the case in which the pair Alice and Bob are com-
municating steganographically, while the oracle CT (for CoverText) will model
the case in which they are not. The exact distributions over STatk , CTatk vary
depending on the attack model, atk. Below we will specify these distributions
for atk ∈ {cxo, csa}. Both oracles respond to the null query with the public and
verification keys generated by SG and SSG.
We define the advantage of W against stegosystem S over channel C in attack model atk by

Adv^{atk}_{S,C,W}(k) = Pr[W^{ST_atk}(1^k) = 1] − Pr[W^{CT_atk}(1^k) = 1] ,

where the warden uses random bits r and the probabilities are also taken over the oracles. Define the insecurity of S over channel C by

InSec^{atk}_{S,C}(t, q, l, k) = max_{W∈W(t,q,l)} Adv^{atk}_{S,C,W}(k) ,

where W(t, q, l) denotes the set of all adversaries which make at most q oracle queries totaling at most l bits, and run in time at most t.
For the next construction, we will require the notion of a digital signature scheme
that is existentially unforgeable under chosen message attack:
Consider the following game that an adversary A plays against SG: the adversary A is given VK and oracle access to S_SK, where (SK, VK) ← G(1^k). A makes q oracle queries of at most l bits to get back {S_SK(M_1), ..., S_SK(M_q)}. A then outputs a pair (M, σ_M). A wins if M ∉ {M_1, ..., M_q} and V(VK, M, σ_M) = 1.
Denote the event of A winning the game by win_{A,SG}(k). Let A(t, q, l) be the set of adversaries A which make q queries to the oracle of at most l bits and run for t time steps. Define the EUF-CMA advantage of A against SG as
Adv^{cma}_{A,SG}(k) = Pr[win_{A,SG}(k)] .

Define the EUF-CMA insecurity of SG by

InSec^{cma}_{SG}(t, q, l, k) = max_{A ∈ A(t,q,l)} Adv^{cma}_{A,SG}(k) .
EUF-CMA signature schemes exist if and only if one-way functions exist [47, 55].
In a chosen exactly-one attack, Ward may both submit hiddentexts to Alice for en-
coding, and submit stegotexts to Bob for decoding, but he is prohibited from accessing
both at the same time (i.e., with the same history). This models a scenario in which
Ward can inject messages onto a network, but cannot replace them; for example, this
may be the situation if he is connected to the same shared access network as Bob,
but does not control the medium. As in all of our attack models, Ward’s goal is to
determine whether the specific pair of Alice and Bob are communicating stegano-
graphically rather than whether Alice or Bob are using steganography at all. In the
case that Ward asks Bob to decode stegotexts, his goal is to detect whether Bob is re-
ceiving steganography from Alice by watching Bob’s reaction to stegotexts (allegedly
from Alice) generated by Ward himself. If Bob is not receiving steganography from
Alice, he will always output the distinguished message ⊥, meaning “no reaction.” If
Bob is receiving steganography from Alice, he will decode the stegotexts.
We formally define the oracle distributions STcxo , CTcxo as follows: First, draw
(P K, SK) ← SG(1k ), and set φ = {}. Respond to queries using these programs:
STcxo(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← SE(PK, m, h)
    Set φ = φ ∪ {h}
    return s
  else:
    If h ∈ φ return ε
    else return SD(SK, m, h)

CTcxo(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← C_h^{|SE(PK, m, h)|}
    Set φ = φ ∪ {h}
    return s
  else:
    If h ∈ φ return ε
    else return ⊥
SS-CXO is the strongest notion of security that we are able to achieve in the stan-
dard model. Since any stegotext encoded by Alice can be thought of as a “challenge
stegotext,” (Ward’s goal is only to detect that it is, in fact, a stegotext rather than
a covertext) this condition is somewhat analogous to non-adaptive chosen ciphertext
security (IND-CCA1) for public-key encryption.
InSec^{cxo}_{CXO,C}(t, q, l, k) ≤ InSec^{cma}_{SG}(t + O(kl), q, l, k) + InSec^{cpa}_{E}(t + O(kl), q, l + qℓ_σ, k) + ε(l + qℓ_σ) .
Proof. Informally, we will consider the hybrid oracle H which answers encoding
queries using CXO Encode and answers all decoding queries with ⊥. Distinguish-
ing this hybrid from STcxo equates to distinguishing CXO Decode from the constant
oracle ⊥ on some history h for which no query of the form (enc, ∗, h) has been made.
This can only happen if a decoding query contains a signature on a (m, h) pair which
was never signed by CXO Encode (because no encoding queries were ever made with
the history h). So, intuitively, distinguishing between H and STcxo requires forging
a signature. Similarly, since both H and CTcxo answer all dec queries by ⊥, distin-
guishing between them amounts to a chosen-hiddentext attack, which by Lemma 4.10
would give an IND$-CPA attacker for E. The result follows by the triangle inequality.
More formally, let W ∈ W(t, q, l). We will show that W must either forge a signature or distinguish the output of E from random bits. We will abuse notation slightly and denote W^{ST_cxo} by W^{SE,SD}, and W^{CT_cxo} by W^{C,⊥}. Consider the "hybrid" distribution which results from answering encoding queries using CXO_Encode but answering all decoding queries with ⊥. (We denote this oracle by (SE, ⊥).)
Then we have the following inequalities:
where the last line follows because (letting D denote the event W^{SE,SD} = 1, and noticing that D|F̄ ≡ (W^{SE,⊥} = 1)):

Pr[D] − Pr[W^{SE,⊥} = 1] = Pr[D|F] Pr[F] + Pr[D|F̄] Pr[F̄] − Pr[W^{SE,⊥} = 1]
 = Pr[D|F] Pr[F] + Pr[W^{SE,⊥} = 1](1 − Pr[F]) − Pr[W^{SE,⊥} = 1]
 = Pr[F](Pr[D|F] − Pr[D|F̄])
 ≤ Pr[F]
 = Adv^{cma}_{A_f,SG}(k)
The theorem follows by the definition of insecurity, since both Ad and Af run in time
at most t + O(kl).
Chosen-Stegotext Attacks
We formally define the oracle distributions STcsa , CTcsa as follows: First, draw
(P K, SK) ← SG(1k ), and set φ = {}. Respond to queries using these programs:
STcsa(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← SE(PK, m, h)
    Set φ = φ ∪ {(s, h)}
    return s
  else:
    If (m, h) ∈ φ return ε
    else return SD(SK, m, h)

CTcsa(b ∈ {enc, dec}, m, h):
  if (b = enc) then:
    Sample s ← C_h^{|SE(PK, m, h)|}
    Set φ = φ ∪ {(s, h)}
    return s
  else:
    If (m, h) ∈ φ return ε
    else return ⊥
Thus, in a chosen-stegotext attack, Ward may ask Alice to encode any (message,
history) pair of his choosing, as often as he likes, and may query Bob on any pair
(s, h) where s was not a result of an encoding query for history h. Notice that if
Ward can ask Bob to decode any message output by Alice for the same history it
was encoded with, he can detect the use of steganography between Alice and Bob;
this is why we do not allow Ward to query Bob on such stegotexts. This restriction
is roughly analogous to the standard restriction that an adaptive chosen-ciphertext
attacker may not query his decryption oracle on the challenge ciphertext. Advantage
and insecurity for SS-CSA are defined analogously to SS-CXO, except that we count
encoding and decoding queries separately (as qe and qd ) as well as counting the number
of queries made to random oracles.
Construction.
Procedure CSA_Encode^{F,G,H}:
Input: m = m_1 ⋯ m_ℓ, h, π_A^{−1}, π_B
  Choose r ← U_k
  Let σ = π_A^{−1}(F(r, m, h))
  Let e = H_1(r) ⊕ (m, σ)
  Let τ = H_2(r, m, h)
  Let y = π_B(r)
  Let c = y‖e‖τ
Output: UEncode^G(c, r, h)

Procedure CSA_Decode^{F,G,H}:
Input: s_1, …, s_l, h, π_A, π_B^{−1}
  Let c = Basic_Decode(s_1, …, s_l)
  Parse c as y‖e‖τ
  Set r = π_B^{−1}(y)
  If s ≠ UEncode^G(c, r, h) return ⊥
  Let (m, σ) = e ⊕ H_1(r)
  If τ ≠ H_2(r, m, h) return ⊥
  If π_A(σ) ≠ F(r, m, h) return ⊥
Output: m

Procedure UEncode^G:
Input: c ∈ {0, 1}^l, r ∈ {0, 1}^k, h
  for i = 1 … l do
    Let j = 0
    repeat:
      set s_i = C(h; G(h, r, c, j))
      increment j
    until f(s_i) = c_i OR (j > k)
    set h = (h, s_i)
Output: s_1, s_2, …, s_l
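The rejection-sampling subroutine UEncode^G is the heart of the construction. The following is a minimal Python sketch of it; the hash-based f and G and the toy channel sampler are illustrative stand-ins for the construction's bit function, random oracle, and channel, not definitions from the thesis.

```python
import hashlib

def f(doc: bytes) -> int:
    # Public bit function on documents (here: a hash parity, assumed unbiased).
    return hashlib.sha256(doc).digest()[0] & 1

def G(h, r, c, j) -> bytes:
    # Random oracle G(h, r, c, j), instantiated with SHA-256 for the sketch.
    return hashlib.sha256(repr((h, r, c, j)).encode()).digest()

def channel(h, coins: bytes) -> bytes:
    # Stand-in for C(h; coins): a channel draw made deterministic by
    # passing its random coins explicitly.
    return b"doc-" + coins[:8]

def uencode(c: str, r: bytes, h: tuple, k: int = 128):
    # For each bit c_i, redraw from the channel (with coins from G) until a
    # document with f(s_i) = c_i appears, or until k+1 draws have been made.
    stego = []
    for bit in c:
        j = 0
        while True:
            s_i = channel(h, G(h, r, c, j))
            j += 1
            if f(s_i) == int(bit) or j > k:
                break
        h = h + (s_i,)   # the history grows, so later positions get fresh draws
        stego.append(s_i)
    return stego
```

Because every draw's coins come from G, the whole stegotext is a deterministic function of (c, r, h); this is what lets CSA_Decode recompute UEncode^G(c, r, h) and reject any tampered stegotext.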
InSec^{csa}_{CSA,C}(t, q⃗, l, k) ≤ (2q_e + q_F) InSec^{ow}_π(t′, k) + ε(l + 3q_e k) + (q_e² + 2q_d)/2^k ,

where t′ ≤ t + (q_G + q_F + q_{H_1} + q_{H_2})(q_e + q_d)T_π + k(l + 3q_e k)T_C, T_π is the time to evaluate members of π, and T_C is the running time of C.
Proof. Intuitively, this stegosystem is secure because the encryption scheme employed
is non-malleable, the signature scheme is strongly unforgeable, and each triple of
hiddentext, history, and random-bits has a unique valid stegotext, which contains a
signature on (m, h, r). Thus any adversary making a valid decoding query which was
not the result of an encoding query can be used to forge a signature for Alice — that
is, invert the one-way permutation πA .
2. P1(b, m, h) responds to dec queries as in P0, and responds to enc queries using
CSA EncodeF,G,H but with calls to UEncodeG replaced by calls to Basic Encode.
3. P2(b, m, h) responds to dec queries as in P1, and responds to enc queries using
CSA EncodeF,G,H .
We are given a CSA attacker W ∈ W(t, q_e, q_d, q_F, q_G, q_{H_1}, q_{H_2}, l) and wish to bound his advantage. Notice that

Adv^{csa}_{CSA,C,W}(k) ≤ |Pr[W^{P0}(1^k) = 1] − Pr[W^{P1}(1^k) = 1]|
 + |Pr[W^{P1}(1^k) = 1] − Pr[W^{P2}(1^k) = 1]|
 + |Pr[W^{P2}(1^k) = 1] − Pr[W^{P3}(1^k) = 1]| .

Hence, we can bound the advantage of W by the sum of its advantages in distinguishing the successive hybrids. For hybrids P, Q we will denote this advantage by Adv^{P,Q}_W(k) = |Pr[W^P(1^k) = 1] − Pr[W^Q(1^k) = 1]|.
Proof. Assume WLOG that Pr[W^{P1}(1^k) = 1] > Pr[W^{P0}(1^k) = 1]. Let E_r denote the event that, when W queries P1, the random value r never repeats, and let E_q denote the event that W never makes random oracle queries of the form H_1(r) or H_2(r, ∗, ∗) for an r used by CSA_Encode^{F,G,H}, and let E ≡ E_r ∧ E_q. Then:

Adv^{P0,P1}_W(k) = Pr[W^{P1}(1^k) = 1] − Pr[W^{P0}(1^k) = 1]
 = Pr[W^{P1}(1^k) = 1|Ē](1 − Pr[E]) + Pr[W^{P1}(1^k) = 1|E] Pr[E] − Pr[W^{P0}(1^k) = 1]
 = Pr[Ē](Pr[W^{P1}(1^k) = 1|Ē] − Pr[W^{P0}(1^k) = 1|Ē])
because if r never repeats and W never queries H1 (r) or H2 (r, ∗, ∗) for some r used
by CSA EncodeF,G,H , then W cannot distinguish between the ciphertexts passed to
Basic Encode and random bit strings.
• enc queries are answered as follows: on query j 6= i, respond using the program
for CSA EncodeF,G,H with calls to UEncodeG replaced by calls to Basic Encode.
On the i-th query respond with s = Basic Encode(πB (x)||e1 ||τ1 , h) where e1 =
h1 ⊕ (m, σ1 ) and h1 , σ1 , τ1 are chosen uniformly at random from the set of all
strings of the appropriate length (|e1 | = |m| + k and |τ1 | = k), and set φ =
φ ∪ {(s, h)}.
It should be clear that Pr[A(π_B(x)) = x] ≥ (1/q_e) Pr[Ē_q].
Proof. Assume WLOG that Pr[W^{P2}(1^k) = 1] > Pr[W^{P1}(1^k) = 1]. Denote by E_r the event that, when answering queries for W, the random value r of CSA_Encode^{F,G,H} never repeats, and by E_q the event that W never queries G(∗, r, π_B(r)‖∗, ∗) for some r used by CSA_Encode^{F,G,H}, and let E ≡ E_r ∧ E_q. Then:

Adv^{P1,P2}_W(k) = Pr[W^{P2}(1^k) = 1] − Pr[W^{P1}(1^k) = 1]
 = Pr[W^{P2}(1^k) = 1|E] Pr[E] + Pr[W^{P2}(1^k) = 1|Ē] Pr[Ē] − Pr[W^{P1}(1^k) = 1]
 ≤ Pr[Ē]
 ≤ 2^{−k} q_e(q_e − 1)/2 + Pr[Ē_q]
Given W ∈ W(t, q_e, q_d, q_F, q_G, q_{H_1}, q_{H_2}, l) we construct a one-way permutation adversary A against π_B which is given a value π_B(x) and uses W in an attempt to find x. A picks (π_A, π_A^{−1}) from Π_k and i uniformly from {1, …, q_e}, and then runs W answering all its oracle queries as follows:

It should be clear that Pr[A(π_B(x)) = x] ≥ (1/q_e) Pr[Ē_q].
Proof. Given W ∈ W(t, qe , qd , qF , qG , qH1 , qH2 , l) we construct a one-way permutation
adversary A against πA which is given a value πA (x) and uses W in an attempt to
find x. A chooses (πB , πB−1 ) from Πk and i uniformly from {1, . . . , qF }, and then runs
W answering all its oracle queries as follows:
• enc queries are answered using CSA EncodeF,G,H except that σ is chosen at
random and F (r, m, h) is set to be πA (σ). If F (r, m, h) was already set, fail the
simulation.
• dec queries are answered using CSA DecodeF,G,H , with the additional constraint
that we reject any stegotext for which there hasn’t been an oracle query of the
form H2 (r, m, h) or F (r, m, h).
• Queries to G, F, H1 and H2 are answered in the standard manner (if the query
has been made before, answer with the same answer, and if the query has not
been made before, answer with a uniformly chosen string of the appropriate
length) except that the i-th query to F is answered using πA (x).
A then searches all the queries that W made to the decryption oracle for a value σ
such that πA (σ) = πA (x). This completes the description of A.
Notice that the simulation has a small chance of failure: at most qe /2k . For the
rest of the proof, we assume that the simulation doesn’t fail. Let E be the event that
W makes a decryption query that is rejected in the simulation, but would not have
been rejected by the standard CSA DecodeF,G,H . It is easy to see that Pr[E] ≤ qd /2k−1 .
Since the only way to differentiate P3 from P2 is by making a decryption query that P3 accepts but P2 rejects, and, conditioned on Ē, this can only happen by inverting π_A on some F(r, m, h), we have that:
Adv^{P2,P3}_W(k) ≤ q_F InSec^{ow}_Π(t′, k) + q_d/2^{k−1} + q_e/2^k
The theorem follows, because:

InSec^{csa}_{CSA,C}(t, q⃗, l, k) ≤ Adv^{csa}_{CSA,C,W_max}(k)
 ≤ Adv^{P0,P1}_W(k) + Adv^{P1,P2}_W(k) + Adv^{P2,P3}_W(k)
 ≤ q_e InSec^{ow}_Π(t′, k) + (q_e² − q_e)/2^{k+1} + ε(l + 3q_e k) + Adv^{P1,P3}_W(k)
 ≤ 2q_e InSec^{ow}_Π(t′, k) + 2^{−k}(q_e² − q_e) + ε(l + 3q_e k) + Adv^{P2,P3}_W(k)
 ≤ (2q_e + q_F) InSec^{ow}_Π(t′, k) + 2^{−k}(q_e² + 2q_d) + ε(l + 3q_e k)
In this section, we define the notion of a nontrivial relation R and show that if a
stegosystem is substitution robust with respect to any nontrivial R then it is inse-
cure against both chosen-covertext and chosen-stegotext attacks. This result implies
that no stegosystem can be simultaneously (nontrivially) secure against disrupting and
distinguishing active adversaries.
is negligible. Then any steganographically secret stegosystem is trivially robust against R, because no efficient adversary can produce a related stegotext s_W ≠ σ (except with negligible probability) in the substitution attack game; and thus the decoding of s_W will be s, except with negligible probability. Thus in order for robustness of a stegosystem to be "interesting" we will require that this is not the case.
We say that R is non-trivial for C if it is ρ(k)-nontrivial for some ρ(k) > 1/poly(k).
Intuitively, whether this attack is against a CSA or sCCA oracle, it has a significant
advantage because when the sequence σ1 , . . . , σ` is a stegotext, then the response to
the decoding query will be m (because S is robust); but when it is a covertext, the
probability of decoding to m should be low (again because S is robust). We will now
formalize this intuition.
Theorem 5.37.

Adv^{scca}_{S,C,W_A}(k) ≥ ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k) − Fail^R_S(t_A, 0, 0, l, k) − 2^{−l}

Adv^{scca}_{S,C,W_A}(k) = Pr[W^{SD}(SE(m_W)) = 1] − Pr[W^{SD}(C_h^ℓ) = 1] .
Let us first bound Pr[W^{SD}(C_h^ℓ) = 1]. Recall that W^{SD}(σ) = 1 when SD(s_1, σ_2, …, σ_ℓ) = m_W. Let SR denote the event that in the sCCA game played against stegotext, s_1 ≠ σ_1 ∧ (σ_1, s_1) ∈ R. Now notice that
Also, notice that we can exhibit an efficient SS-CHA adversary W_ρ against S such that

Adv^{ss}_{S,C,W_ρ}(k) ≥ ρ(k) − Pr[SR] ,

whereas

Pr[W_ρ(SE(m∗)) = 1] = Pr[SR] ,

by construction. Since W_ρ runs in the time it takes to run A and makes 1 encoding query of k bits, we have that

InSec^{ss}_{S,C}(t_A, 1, l, k) ≥ Adv^{ss}_{S,C,W_ρ}(k)
which, by rearranging terms, gives us:

 ≥ (1 − Fail^R_S(t_A, 0, 0, l, k))(ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k))
 ≥ ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k) − Fail^R_S(t_A, 0, 0, l, k)
Theorem 5.38.

Adv^{csa}_{S,C,W_A}(k) ≥ (1 − Fail^R_S(t_A, 0, 0, l, k))(ρ(k) − InSec^{ss}_{S,C}(t_A, 1, l, k))

Adv^{csa}_{S,C,W_A}(k) = Pr[W_A^{ST_csa}(1^k) = 1] − Pr[W_A^{CT_csa}(1^k) = 1] .

It is easy to see that Pr[W_A^{CT_csa}(1^k) = 1] = 0, since querying CT_csa(dec, s, h_A) will always result in ⊥ or ε, and never m_W. The lower bound for Pr[W_A^{ST_csa}(1^k) = 1] is proven identically to the stegotext case in the previous proof.
Chapter 6
A trivial upper bound on the rate of a stegosystem is log |D|. Prior to our work, there were no provably secure stegosystems, and so there was no known lower bound. The rate of the stegosystems defined in the previous chapters is o(1); that is, as the security parameter k goes to infinity, the rate goes to 0. In this chapter, we will address the question of what the optimal rate is for a (universal) stegosystem. We first formalize the definition of the rate of a universal stegosystem. We will then tighten the trivial upper bound by giving a rate MAX such that any universal stegosystem with rate exceeding MAX is insecure. We will then give a matching lower bound by exhibiting a provably secure stegosystem with rate (1 − o(1))MAX. Finally we will address the question of what rate a robust stegosystem may achieve.
6.1 Definitions
A universal stegosystem S accepts an oracle for the channel C and is secure against
chosen-hiddentext attack with respect to C as long as C does not violate the hardness
assumptions S is based on. Universality is important because typically there is no
good description of the marginal distributions on a channel.
• A block encoding function BE that encodes a block of input bits into a block of l documents.
• A block decoding function BD that inverts BE, that is, that transforms a stegotext block into a block of bits.
A (h, l, λ)-blockwise stegosystem has single-block lookahead if BE(K, c, h) draws samples only from C_h^l and C_{h,d}^l, where d ∈ D^l. Any stegosystem with multi-block lookahead can be transformed into one with single-block lookahead with a larger blocksize.
We consider the class S(h, l, t) of stegosystems which draw at most t samples from C_h^l; we will show two upper bounds on the rate R_C(S) for any S ∈ S(h, l, t). The first, MAX_t(S), is in terms of the number of samples, t. The second, MAX_C(S), is in terms of the min-entropy H∞(C_h^l) of the channel C. We call the combined upper bound MAX_C(h, l, t) and define it by MAX_C(h, l, t) = min{log t, H∞(C_h^l)}.

For any stegosystem S ∈ S(h, l, t), we will show that there exists a channel C such that S is insecure relative to C if R_C(S) − log t is any positive constant. Thus it follows that MAX_t(S) ≤ log t.
Theorem 6.1.

InSec^{ss}_{S,C}(O(t + k), 1, k) ≥ 1 − 2^{−c(t,k)} − 2^{−k} − ρ(k) ,

where ρ(k) = Pr_{m←U,K,h}[SD(K, SE(K, m, h), h) ≠ m] and R_C(S) ≥ log t + c(t, k).
To draw from C_h, we draw x ← U_k, compute y = p_h(x) and output x‖y. Notice that if S.SE(K, m, h) draws at most t samples (x_1, y_1), …, (x_t, y_t) ← C_h, and outputs a pair (x′, y′) such that x′ ∉ {x_1, …, x_t}, then Pr[y′ = p_h(x′)] ≤ 2^{−k}. On the other hand, an adversary can draw t + 1 samples from C_h, and compute p_h by interpolation. Thus when SE(K, m, h) outputs a pair (x′, y′) ∉ {(x_1, y_1), …, (x_t, y_t)}, an adversary can distinguish between SE(m, h) and C_h by checking whether y′ = p_h(x′).
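The attack described above is ordinary polynomial interpolation. The sketch below works over a prime field rather than GF(2^k) (an illustrative simplification); the helper names make_channel, interpolate_eval, and warden are assumptions of this sketch, not from the thesis.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, standing in for the field GF(2^k)

def make_channel(t, seed):
    # C_h: documents are pairs (x, p_h(x)) for a fixed random degree-t
    # polynomial p_h; a stegosystem may see at most t such samples.
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(t + 1)]
    def draw():
        x = rng.randrange(P)
        y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return (x, y)
    return draw

def interpolate_eval(points, x0):
    # Lagrange interpolation: evaluate at x0 the unique degree-t polynomial
    # through the t+1 given points (all arithmetic mod the prime P).
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x0 - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def warden(draw, candidate, t):
    # Draw t+1 fresh samples, reconstruct p_h, and test the candidate
    # document: output 1 ('stegotext') iff it lies off the polynomial.
    pts = [draw() for _ in range(t + 1)]
    x0, y0 = candidate
    return 0 if interpolate_eval(pts, x0) == y0 else 1
```

Any encoder restricted to t samples must, to exceed rate log t, sometimes emit a pair it never drew, and such a pair fails the y′ = p_h(x′) check except with probability 2^{−k}.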
Clearly, A runs in time O(t + k). We will now compute the advantage of A. First,
notice that given a covertext oracle, A will always output 0:
Pr[ACT (1k ) = 1] = 0 .
Now, let NS denote the event that SE(K, m, h) draws samples (x′_1, y′_1), …, (x′_t, y′_t) ← C_h and outputs a stegotext (x∗, y∗) ∉ {(x′_1, y′_1), …, (x′_t, y′_t)}. Since in this case Pr[y∗ = p_h(x∗)] ≤ 2^{−k}, we have that

Thus we only need to give a lower bound on Pr[NS] to complete the proof.

Fix a tuple (K, m, h) and consider the set SD^{−1}_{K,h}(m) = {s ∈ D : SD(K, s, h) = m}. Since R_C(S, h, k) ≥ log t + c(t, k), SD partitions D into t × 2^{c(t,k)} such sets. Then for any fixed set of samples (x′_i, y′_i), the probability over m that SE(K, m, h) has a sample (x′_i, y′_i) ∈ SD^{−1}_{K,h}(m) is at most t/(2^{c(t,k)} t) = 2^{−c(t,k)}. Let E denote the event that SE(K, m, h) outputs an s∗ such that SD(K, s∗, h) ≠ m. Then

Pr[NS] ≥ Pr[∀j, (x′_j, y′_j) ∉ SD^{−1}_{K,h}(m)] − Pr[E]
 ≥ 1 − 2^{−c(t,k)} − ρ(k) ,
6.2.2 MAX_C(S)
by the unique decoding property. Even if SE is randomized, then for any fixed random bits r, we have

But then by an averaging argument, there must be some m∗ ∈ {0, 1}^{ℓ+1} such that Pr[SE(m∗) = s∗] < 2^{−(ℓ+1)}. In contrast, a covertext oracle CT will have Pr[CT(m) = s∗] = 2^{−ℓ}, for any m ∈ {0, 1}∗. This gap is wide enough to detect with high confidence, given poly(2^ℓ) chosen-hiddentext samples. And since we are restricted to ℓ = O(log t) by MAX_t(S), this yields a polynomial-time distinguisher between a covertext oracle and a stegotext oracle.
Proof. We define an adversary W with the stated advantage. W^O executes the following steps:

• W takes n² samples from C_h^l. Let ŝ be the most commonly occurring l-document sequence in the sample set and let p̂ be the fraction of times ŝ occurs in the sample.

• For each message m ∈ {0, 1}^{ℓ+1}, W draws n² samples from O(m). Let p̂_m be the fraction of occurrences of ŝ in the samples from O(m).
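The counting attack W above can be sketched as follows; the function name and the toy oracles in the test are hypothetical, and the 3/4 threshold is the one used in the analysis below.

```python
from collections import Counter

def frequency_test(channel_samples, oracle, messages, n2):
    # Estimate the channel probability of the most common covertext block...
    counts = Counter(channel_samples(n2))
    s_hat, cnt = counts.most_common(1)[0]
    p_hat = cnt / n2
    # ...then check whether any hiddentext m 'under-produces' s_hat:
    # an encoder whose rate is too high must map some m* away from s_hat.
    for m in messages:
        p_m = sum(1 for s in oracle(m, n2) if s == s_hat) / n2
        if p_m < 0.75 * p_hat:
            return 1   # looks like a stegotext oracle
    return 0           # looks like a covertext oracle
```

A covertext oracle reproduces ŝ at roughly its channel frequency for every m, so the test returns 0; a too-high-rate stegotext oracle must starve ŝ for some m∗, so the test returns 1.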
We will bound the probability that W outputs 1 given a stegotext oracle, and a
covertext oracle, respectively.
From the preceding paragraph, we know that when W has a stegotext oracle, there exists an m∗ such that E[p̂_{m∗}] ≤ 2^{−(ℓ+1)}, and we know that E[p̂] = 2^{−ℓ}. So W will only output 0 if p̂ is much smaller than expected, or if p̂_{m∗} is much larger than expected. Specifically, we have:

Pr[W^{ST}(1^k) = 0] = Pr[p̂ < (3/4)2^{−ℓ} ∧ p̂_{m∗} ≥ (3/4)p̂] + Pr[p̂ ≥ (3/4)2^{−ℓ} ∧ p̂_{m∗} ≥ (3/4)p̂]
 ≤ Pr[p̂ < (3/4)2^{−ℓ}] + Pr[p̂_{m∗} ≥ (3/4)p̂ | p̂ ≥ (3/4)2^{−ℓ}] Pr[p̂ ≥ (3/4)2^{−ℓ}]
 ≤ Pr[p̂ < (3/4)2^{−ℓ}] + Pr[p̂_{m∗} ≥ (3/4)p̂ | p̂ ≥ (3/4)2^{−ℓ}]
 ≤ Pr[p̂ < (3/4)2^{−ℓ}] + Pr[p̂_{m∗} ≥ (9/8)2^{−(ℓ+1)}]
 ≤ e^{−n/32} + e^{−n/96} ,

where the last line follows by multiplicative Chernoff bounds. Thus we have Pr[W^{ST}(1^k) = 1] ≥ 1 − e^{−n/32} − e^{−n/96}.
We know that when W has a covertext oracle, it should be the case that for every m ∈ {0, 1}∗, E[p̂_m] = 2^{−ℓ}. Thus W should only output 1 when p̂ is much larger than expected, or some p̂_m is much smaller than its expectation. Specifically, we have that

Pr[W^{CT}(1^k) = 1] = Pr[p̂ > (7/6)2^{−ℓ} ∧ ∃m. p̂_m < (3/4)p̂] + Pr[p̂ ≤ (7/6)2^{−ℓ} ∧ ∃m. p̂_m < (3/4)p̂]
 ≤ Pr[p̂ > (7/6)2^{−ℓ}] + Pr[∃m. p̂_m < (3/4)p̂ | p̂ ≤ (7/6)2^{−ℓ}]
 ≤ Pr[p̂ > (7/6)2^{−ℓ}] + 2n Pr[p̂_m < (7/8)2^{−ℓ}]
 ≤ e^{−n/108} + 2n e^{−n/128} ,

where the last two lines follow by the union bound and multiplicative Chernoff bounds.
Combining these bounds, we have

Adv^{ss}_{W,S,C}(k) = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
 ≥ 1 − e^{−n/32} − e^{−n/96} − e^{−n/108} − 2n e^{−n/128}
We note that for any distribution D which covers {0, 1}^{ℓ+1}, W can easily be adapted to be a KHA-D attack against S; and that W can even be extended to a KDA-U_{(ℓ+1)n²} attack against S by counting the occurrences of s∗ for the various blocks.
Suppose Alice and Bob wish to communicate steganographically at a high rate, using
a bidirectional channel – that is, Bob is allowed to respond to Alice’s messages with
messages drawn from his own channel distribution. A natural question is, “can Alice
and Bob conspire to increase the rate at which Alice may securely transmit information to Bob?” We will show that an interactive stegosystem can increase the rate at which information is transmitted by at most one bit. The proof is straightforward: we show that any stegosystem S for a bidirectional channel B can be converted into a stegosystem S′ with the same rate for a unidirectional channel C on pairs from B.D × B.D. S′ runs in the same time as S and draws as many samples from C as Alice and Bob (combined) draw from B in each time step. Thus the upper bound MAX_t(S′) implies that at most one bit more can be sent due to sampling, and the MAX_C(S′) bound implies that minimum entropy constraints still apply.
In order to address this issue, we will first need to formalize the concept of a
bidirectional stegosystem.
SE(K, I, m, h, T ) returns a document from D, or the special symbol done
We denote the transcript of (K, m, h) by the result of setting s0 (A) = SE(K, m, hA , ε),
s0 (B) = SF (K, hB , ε), and
• STbcha (m, h): has uniformly chosen key K ← Uk and responds to queries (m, h)
by computing a transcript of (K, m, h).
• CTbcha (m, h): responds to queries by drawing a sequence from B of the same
length as a (K, m, h) transcript.
Theorem 6.4. Any (h, l, t)-sampling bidirectional stegosystem S for the bidirectional channel B can be converted to a (h, l, 2t)-sampling stegosystem S′ for a channel C such that R_C(S′) = R_B(S), and InSec^{ss}_{S′,C}(t′, q, μ, k) = InSec^{bcha}_{S,B}(t′, q, μ, k).
Proof. We define the channel C as a distribution on sequences of pairs from B.D × B.D. The marginal distribution C_h is defined by (B.C_{h_A}, B.C_{h_B}).

The stegosystem S′ has the same keyspace as S. The new encoding algorithm S′.SE(K, m, h) works by computing a transcript s_0, …, s_l from (K, m, h) as follows: For each i, S′.SE simulates S.SE by making draws from C_{h,s_0,…,s_{i−1}} and extracting the first element of the pair; likewise S′.SE simulates S.SF by making draws from C_{h,s_0,…,s_{i−1}} and extracting the second element of each pair. S′.SE(K, m, h) then outputs this transcript as the stegotext. Notice that since S.SE and S.SF each draw at most t samples from B.C_{h_A} and B.C_{h_B}, respectively, S′.SE draws at most 2t samples from C_h. The decoding algorithm S′.SD(K, s, h) exactly runs the decoding algorithm S.SD(K, s, h).
Notice that the chosen-hiddentext oracles ST, CT for S 0 are identically distributed
to the bidirectional chosen-hiddentext oracles STbcha , CTbcha for S. Thus any warden
for S 0 is a warden for S and vice versa, with the advantage preserved. Thus S 0 is as
secure for C as S is for B, as claimed. It is also clear that the rate of S 0 is the same
as that of S, which completes the proof.
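The pairing trick in this proof is simple enough to sketch directly; the function names below are illustrative, not from the thesis.

```python
def paired_channel(draw_alice, draw_bob):
    # C: a unidirectional channel on pairs from B.D x B.D. One draw from C
    # yields one sample from Alice's marginal AND one from Bob's, which is
    # why a t-sample simulation of each party costs at most 2t draws total.
    def draw(h):
        return (draw_alice(h), draw_bob(h))
    return draw

def simulate_bidirectional_step(draw, h, party):
    # S'.SE serves S.SE's and S.SF's sample requests from paired draws:
    # 'party' selects which component (marginal) is being sampled.
    pair = draw(h)
    return pair[0] if party == "alice" else pair[1]
```

Each request for a sample by the simulated S.SE or S.SF consumes one paired draw, so the (h, l, t)-sampling S becomes an (h, l, 2t)-sampling S′, as the theorem states.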
We will prove a lower bound on the achievable rate which matches the stated upper bound, up to a multiplicative factor which converges to 1. The lower bound is proven by giving a stegosystem which achieves this rate. We will first give one that achieves rate (1 − ε)MAX for any ε > 0, but which has polynomial rather than negligible error rate. We will then use error-correcting codes to improve the error rate, and finally give a construction which does not require Alice or Bob to know the minimum entropy of C.
We will assume for the moment that both Alice and Bob know a value l so that λ < (1 − ε)MAX(C_h^l, t). We let F : {0, 1}^k × {0, 1}∗ → {0, 1}^λ be a pseudorandom function, and assume Alice and Bob share a key K ∈ {0, 1}^k. The following construction allows Alice to send Bob a λ-bit message, with error probability at most λ2^{−λ}.
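A minimal sketch of the OneBlock encoder, reconstructed from the sampling procedure analyzed in the proof of Theorem 6.6: the encoder keeps drawing covertexts until one "hashes" to the message under the shared PRF. The helper names and the SHA-256 instantiation of F_K are assumptions of this sketch, not the thesis's definitions.

```python
import hashlib
import random

LAMBDA = 8  # hiddentext block size λ (tiny, for illustration)

def prf(key, N, h, c, s):
    # Shared PRF F_K(N, h, c, s) with λ-bit output, instantiated by SHA-256.
    d = hashlib.sha256(repr((key, N, h, c, s)).encode()).digest()
    return int.from_bytes(d[:2], "big") % (1 << LAMBDA)

def one_block_encode(key, m, h, N, sample, max_tries):
    # Draw covertexts until one maps to the message m under the PRF.
    # c[s] counts repeats, so a repeated sample is hashed differently
    # (the event that a repeat is output is the bad event C in the proof).
    counts = {}
    s = None
    for _ in range(max_tries):
        s = sample(h)
        counts[s] = counts.get(s, 0) + 1
        if prf(key, N, h, counts[s], s) == m:
            return s
    return s  # gave up after max_tries draws (the event D in the proof)

def one_block_decode(key, s, h, N):
    # Bob recomputes the PRF, assuming the winning sample was fresh (count 1).
    return prf(key, N, h, 1, s)
```

The output is an ordinary channel draw, so a warden without K sees nothing unusual; decoding errs only when the encoder gives up or outputs a repeated sample, which is exactly what Theorem 6.6 bounds.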
Theorem 6.6.

Pr[SD(K, SE(K, m, h, N, l), h, N) ≠ m] ≤ e^{−λ} + λ2^{λ−H∞(C_h^l)} + InSec^{prf}_F(O(λ2^λ), λ2^λ, k)
Let C denote the event that OneBlock.Encode^f(m) outputs an s_i with c[s_i] > 1. This happens when there is at least one j < i such that s_j = s_i. Thus by the union bound, we have

Pr[C] ≤ Σ_{j<i} Pr[s_j = s_i] .

Since for each j, Pr[s_j = s_i] ≤ 2^{−H∞(C_h^l)}, and since i < λ2^λ, we get the bound

Pr[C] ≤ λ2^{λ−H∞(C_h^l)} .
Let D denote the event that OneBlock.Encode^f(m) outputs s_{λ2^λ}. This happens when each of the previous λ2^λ tests f(N, h, c[s_i], s_i) = m fails. Since each test involves a distinct point of f, each of these happens independently with probability 1 − 1/2^λ. Since the events are independent, we can bound Pr[D] by

Pr[D] = (1 − 1/2^λ)^{λ2^λ} ≤ e^{−λ} .

Since the only other condition under which OneBlock.Encode^f(m) outputs s_i is if f(N, h, 1, s_i) = m, we have that

Pr[SD^f(SE^f(m)) ≠ m] = Pr[C ∨ D] ≤ e^{−λ} + λ2^{λ−H∞(C_h^l)} .
We now describe a PRF adversary A for F. A^f picks m ∈ {0, 1}^λ and runs OneBlock.Encode^f(m, ε, 0, l) to get a sequence s ∈ D^l. A^f then outputs 1 if SD^f(s) ≠ m. Clearly, when A's oracle is f ← F_K, we have

Pr[A^{F_K}(1^k) = 1] = Pr[SD(K, SE(K, m, h), h) ≠ m] ,

and when f is a randomly chosen function from {0, 1}∗ → {0, 1}^λ, we have shown that

Pr[A^f(1^k) = 1] ≤ e^{−λ} + λ2^{λ−H∞(C_h^l)} .

It follows that

Adv^{prf}_{A,F}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
 ≥ Pr[SD(K, SE(K, m, h), h) ≠ m] − e^{−λ} − λ2^{λ−H∞(C_h^l)}
Theorem 6.7.

InSec^{ss}_{OneBlock,C}(t, q, qλ, k) ≤ InSec^{prf}_F(t′, qλ2^λ, k) ,

where t′ ≤ t + O(qλ2^λ).
Proof. Fix any nonce-respecting W ∈ W(t, q, qλ). We will show how to construct a PRF adversary A for F such that

Adv^{prf}_{A,F}(k) = Adv^{ss}_{W,OneBlock,C}(k) .

Adv^{prf}_{A,F}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
 = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
 = Adv^{ss}_{W,OneBlock,C}(k)
Proof. Suppose that MAX_C(h, l, t) = H∞(C_h^l). In this case, by choice of l and λ, OneBlock sends λ = (1 − ε)H∞(C_h^l) = (1 − ε)MAX bits in l documents. On the other hand, if MAX_C(h, l, λ2^λ) = log(λ2^λ) = λ + log λ, then since OneBlock sends λ bits in l documents, we have that

R_C(OneBlock)/MAX = λ/(λ + log λ) ≥ (1 − ε) ,

where the last inequality holds for sufficiently large λ.
6.3.2 Negligible error rate
Let K = GF (2λ ). This next construction utilizes the following well-known fact:
Theorem 6.11.

InSec^{ss}_{MultiBlock,C}(t, q, qηλ, k) ≤ InSec^{prf}_F(t + O(qηλ2^λ), qηλ2^λ, k)
Proof. We will show how to use an arbitrary W ∈ W(t, q, qηλ) against MultiBlock
to create an adversary X ∈ W(t, qη, qηλ) for OneBlock such that
Advss ss
W,MultiBlock,C (k) = AdvX,OneBlock,C (k) .
The stated bound follows from the definition of insecurity and Theorem 6.7.
produce the result s_1, …, s_n, which is returned to W. Now when O ← ST_OneBlock, it is clear that X is perfectly simulating MultiBlock to W, so

Adv^{ss}_{X,OneBlock,C}(k) = Pr[X^{ST_OneBlock}(1^k) = 1] − Pr[X^{CT_OneBlock}(1^k) = 1]
 = Pr[W^{ST_MultiBlock}(1^k) = 1] − Pr[W^{CT_MultiBlock}(1^k) = 1]
 = Adv^{ss}_{W,MultiBlock,C}(k)
which is negligible in n = 2^λ.
Proof. As long as there are at most ρn errors, Proposition 6.9 ensures us that Correct
can recover the message m0 , . . . , mη−1 . Thus the probability of a decoding error is at
most the probability of ρn blocks having decoding error in OneBlock.Decode. But
Theorem 6.6 states that the probability of decoding error in OneBlock.Decode is
at most ρ when F is pseudorandom; applying a Chernoff bound yields the stated
result.
Proof. The rate of MultiBlock is the rate of OneBlock multiplied by the rate of the error-correcting code used in encoding. Since this rate is (1 − 2ρ) = 1 − λ2^{−λ+3}, we have that the rate converges to 1 as λ → ∞; that is, the rate of the code is (1 − o(1)).
6.3.3 Converging to optimal
We notice that if ε(k) = 1/λ, the MultiBlock construction has error rate at most e^{−λ2^λ/3}, and has rate (1 − o(1))MAX_C(h, t, l). Thus under appropriate parameter settings, the rate of the construction converges to the optimal rate in the limit.
Suppose Alice and Bob agree at the time of key exchange to use the MultiBlock stegosystem with hiddentext block size λ. Since neither Alice nor Bob necessarily knows the values (α, β) such that C is (α, β)-informative, there is no way to calculate or exchange beforehand the stegotext block size l so that λ ≤ (1 − ε)H∞(C_h^l).
The idea behind this construction is simple: Alice tries using MultiBlock with block lengths l = 1, 2, … until she finds one such that the decoding of the encoding of her message is correct. With high probability, if H∞(C_h^{ln}) ≤ λn decoding will fail (the block error rate will be at least 1 − 1/λ), and as we have seen, when H∞(C_h^{ln}) ≥ (λ + 1/λ)n decoding fails with only negligible probability. Since C is (α, β)-informative, Alice will need to try at most ⌈αλ/β⌉ values of l. Alice also encodes kl bits of "check" information with her message, so that when Bob decodes with the wrong block size, he will be fooled with probability only 2^{−lk}. The rate penalty for this check data is k/(n + k) = o(1) when n = ω(k). Thus for sufficiently large λ the rate of this construction will still converge to the optimal rate for C_h.
Recall that a stegosystem is said to be substitution robust with respect to the relation R if an adversary, by making substitutions permitted by R, is unable to change the decoding of a stegotext, except with negligible probability. Since an adversary is allowed to make changes to stegotexts, increasing the rate of a robust stegosystem is a more challenging task. Here we will show that if a stegosystem is robust against any δ-admissible relation R (given access to R), then it can encode at most log 1/δ bits per document. We will also demonstrate an efficient, robust stegosystem which encodes (1 − ε − o(1)) log 1/δ bits per document, for any constant ε > 0, showing that this upper bound is tight.
Theorem 6.15. Let S be a universal stegosystem. For every 0 < δ < 1, there exist a channel C and relation R such that

Fail^R_{S,C}(t, 0, 0, (1 + ε)ℓ, k) ≥ 1 − 2^{−c√ℓ} ,
Proof. We let C be the uniform distribution on n-bit strings, and R(x, y) = 1 iff the Hamming distance of x and y is at most d, where d and n are constants chosen to make I(R, C) ≤ δ. We will give an attacker W which achieves the stated success probability. For notational convenience, we define l = −ℓ log δ.

Succ^R_{W,S,C}(k) = Pr[SD(K, s′) ≠ m∗] ,
where this probability is taken over K, m∗, s∗, and s′. Notice that the adversary W is identical to a noisy discrete memoryless channel, with p(s′|s∗) defined as the uniform distribution on {s ∈ {0, 1}^n : |s − s∗| ≤ d}. This channel has Shannon capacity exactly −log I(R, C) = −log δ. Furthermore, any robust stegosystem is a candidate code for the channel. The strong converse to Shannon's coding theorem [62] tells us that any code with rate (1 + ε) log 1/δ will have average error probability at least 1 − 2^{−c√ℓ}, where c = 2^{−4n+2} log(1/δ) (which is a constant depending on δ).
Since the event that the adversary W is successful is identical to the event that a decoding error occurs in the code induced by SE(K, ·), SD(K, ·), we have that

Succ^R_{W,S,C}(k) ≥ 1 − 2^{−c√ℓ} ,
125
An inefficient construction

We give a stegosystem with stegotext block size ℓ and hiddentext block size l =
(1 − ε)ℓ log(1/δ). Suppose that the channel distribution C is efficiently sampleable.
(Recall that C is efficiently sampleable if there is an efficient algorithm C such that,
given a uniformly chosen string s ∈ {0, 1}^k, a security parameter 1^k and history h,
C(h, 1^k, s) is indistinguishable from C_h.) We will assume that Alice, Bob, and Ward
all have access to this algorithm. Furthermore, we assume Alice and Bob share a key
K to a pseudorandom function family F : {0, 1}^k × {0, 1}* → {0, 1}^k, and have a
synchronized counter N.

The idea behind this construction is this: suppose that instead of sharing a key to
a pseudorandom function F, Alice and Bob shared 2^l secret documents d_m drawn
independently from C_h. Then Alice could send Bob the message m by sending
document d_m, and Bob could recover m by checking, for each m in turn, to see if the
document he received was related (by R′) to d_m. Since the adversary is R-bounded
and (C, R′) is δ-admissible, the probability of a decoding error, caused either by
the adversary or by an accidental draw of the d_m, would be at most 2^l δ^ℓ = δ^{εℓ}.
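The shared-document idea translates directly into a short sketch. In the sketch below, HMAC-SHA256 stands in for the PRF F, the channel is the uniform distribution on n-bit strings, and R′ is Hamming distance at most d; the parameter values and all names are illustrative assumptions for the demo, not the thesis's notation. As in the actual construction, the 2^l documents are derived from the PRF on demand rather than stored.

```python
import hmac
import hashlib

n, d, ell, l = 32, 2, 8, 4   # document bits, noise radius, block length, message bits

def prf(key: bytes, counter: int, m: int) -> int:
    """Derive an n-bit 'document' from the PRF, keyed on (counter, message)."""
    data = counter.to_bytes(8, 'big') + m.to_bytes(4, 'big')
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest, 'big') >> (256 - n)

def related(x: int, y: int) -> bool:
    """R': documents are related iff their Hamming distance is at most d."""
    return bin(x ^ y).count('1') <= d

def encode(key: bytes, N: int, m: int) -> list:
    """Send ell documents, each drawn as the PRF sample indexed by m."""
    return [prf(key, N + i, m) for i in range(ell)]

def decode(key: bytes, N: int, stego: list):
    """Try each candidate message; accept if every block is R'-related."""
    for m in range(2 ** l):
        if all(related(prf(key, N + i, m), s) for i, s in enumerate(stego)):
            return m
    return None

key = b'shared secret key'
s = encode(key, 0, 5)
s[0] ^= 0b11                  # adversary flips up to d bits per document
assert decode(key, 0, s) == 5
```

Note that the decoder enumerates all 2^l candidate messages, which is exactly why this construction is inefficient for large l.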
Proof. Let W be a passive warden which runs in time t, and makes at most q queries
of total length at most ql (each query can be only l bits, because of the input type).
We construct a PRF adversary A which runs in time t + O(qℓ) and makes at most qℓ
queries to F, such that

    Adv^prf_{A,F}(k) = Adv^ss_{W,RLBit,C}(k) .
The PRF adversary takes a function oracle f, and emulates W(1^k), responding to the
queries W makes to the encoder SE by using f in place of F_K(·, ·). More formally,
we define the subroutine SSE^f : {0, 1}* × {0, 1}* → {0, 1}* as follows:

Procedure SSE^f:
Input: m ∈ {0, 1}^l, history h
for i = 1 . . . ℓ do
    set σ_i = C(1^k, h, f(N, m))
    increment N
    set h = (h, σ_i)
Output: σ_1, . . . , σ_ℓ
Then we define A^f(1^k) = W^{SSE^f}(1^k); A's advantage over F is then:

    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
                     = Adv^ss_{W,RLBit,C}(k) .

• f is chosen from F_K(·, ·). Then the output of SSE^f is distributed identically
  to the encoding function of RLBit. That is, Pr[A^{F_K}(1^k) = 1] = Pr[W^{ST}(1^k) = 1].
Lemma 6.18. Fail^R_{RLBit}(t, q, ql, l, k) ≤ InSec^prf_F(t + O(qℓ) + 2^l ℓ, qℓ + 2^l ℓ, k) + δ^{εℓ} .

Proof. Let W be an active R-bounded (t, q, ql, l) warden. We construct a PRF ad-
versary A which runs in time t + O(qℓ), makes at most qℓ PRF queries, and satisfies
Adv^prf_{F,A}(k) ≥ Succ^R_{W,RLBit,C}(k) − δ^{εℓ}. A works by emulating W, using its function
oracle f in place of F_K(·, ·). Consider the following cases for f:
• f is chosen uniformly from all appropriate functions. Then, for each i, the stego-
  texts σ_i = C(1^k, h_i, f(N + i, m_W)) are distributed independently according to C_{h_i}.
  Consider the sequence of "alternative stegotexts" d_i[m′] = C(1^k, h′_i, f(N + i, m′))
  for each m′ ≠ m_W ∈ {0, 1}^l; each of these is also distributed independently ac-
  cording to C_{h′_i}; and since W is never given access to the d_i[m′], the s_i are
  independent of the d_i[m′]. Now SD will fail (causing A^f(1^k) to output 1) only if
  the event ∃m′.∀i.(d_i[m′], s_i) ∈ R′ occurs. Because the d_i[m′] are independent of
  the actions of W, and because (C, R′) is δ-admissible, each event (d_i[m′], s_i) ∈ R′
  happens independently with probability at most δ. So for each m′, the proba-
  bility of failure is at most δ^ℓ, and thus by a union bound, we have that

      Pr[A^f(1^k) = 1] ≤ Σ_{m′∈{0,1}^l} δ^ℓ = 2^l δ^ℓ = δ^{εℓ} .
    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Succ^R_{W,RLBit}(k) − Pr[A^f(1^k) = 1]
                     ≥ Succ^R_{W,RLBit}(k) − δ^{εℓ} .
Improving the run-time

Notice that because the running time of the decoding procedure for RLBit is expo-
nential in ℓ, the proof of robustness is not very strong: the information-theoretic
bound on the success of W is essentially polynomial in the running time of the PRF
adversary we construct from W. Still, if we set ℓ = poly(log k), and assume subexpo-
nential hardness for F, we obtain a negligible bound on the success probability, but
a quasi-polynomial time decoding routine. We will now give a construction with a
polynomial-time decoding algorithm, at the expense of an o(1) factor in the rate.
The idea behind this procedure is to break a message up into l-bit blocks, and
encode each one using the RLBit.Encode procedure; and then append the encoding
of L documents of message-dependent redundancy. To decode, we iteratively attempt
to match each stegotext block against each of the 2^l = k possible hiddentext blocks;
there will always be one matching block, and with some small probability k^{−ε} there
will be an additional match. We perform a depth-first search on these matches to
find a list of candidate messages, and then test each message to see whether the
redundant encoding matches. Any candidate match from the depth-first search will
also have matching redundancy with probability 2^{−k}, and a union bound will thus
bound the probability of a decoding failure by (1 + 1/ε)2^{−k}. Furthermore, the total
expected number of nodes explored by Decode is at most (1 + 1/ε)n; thus our reduction
will be efficient.
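The depth-first candidate search just described can be sketched as follows. The per-block matching predicate (one RLBit decode attempt against a stegotext block) and the redundancy check are abstracted as caller-supplied functions, `matches` and `redundant_ok`; both are hypothetical stand-ins introduced here purely for illustration.

```python
from typing import Callable, Optional, Tuple

def dfs_decode(n_blocks: int, l: int,
               matches: Callable[[int, int], bool],
               redundant_ok: Callable[[Tuple[int, ...]], bool]) -> Optional[Tuple[int, ...]]:
    """Depth-first search over per-block matches, then verify redundancy.

    matches(j, m) says whether l-bit block value m is R'-related to the
    j-th stegotext block; usually exactly one m matches, occasionally two.
    """
    def extend(prefix):
        if len(prefix) == n_blocks:
            yield prefix
            return
        j = len(prefix)
        for m in range(2 ** l):            # 2^l = k candidates per block
            if matches(j, m):
                yield from extend(prefix + (m,))

    for candidate in extend(()):
        if redundant_ok(candidate):        # a false match survives w.p. ~2^-k
            return candidate
    return None

# Toy run: block j of the true message is j, and block 1 has a spurious match.
true_msg = (0, 1, 2, 3)
found = dfs_decode(4, 2,
                   matches=lambda j, m: m == true_msg[j] or (j, m) == (1, 3),
                   redundant_ok=lambda c: c == true_msg)
assert found == (0, 1, 2, 3)
```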
Proof. Let W be a passive warden which runs in time t, and makes at most q queries
of total length at most lµ (each query must be a multiple of l bits, because of the
input type). We construct a PRF adversary A which runs in time t + O(µℓ) and
makes at most µℓ queries to F, such that

    Adv^prf_{A,F}(k) = Adv^ss_{W,RMBit,C}(k) .
The PRF adversary takes a function oracle f, and emulates W(1^k), responding to
the queries W makes to its oracle O by running RMBit.Encode, using f in place of
F_K(·, ·). Consider the following cases for f:

• f is chosen from F_K(·, ·). Then the responses to W's queries are distributed
  identically to the encoding function of RMBit. That is, Pr[A^{F_K}(1^k) = 1] = Pr[W^{ST}(1^k) = 1].
A's advantage over F is then:

    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Pr[W^{ST}(1^k) = 1] − Pr[W^{CT}(1^k) = 1]
                     = Adv^ss_{W,RMBit,C}(k) .
    Fail^R_{RMBit}(t, q, lµ, ln, k) ≤ InSec^prf_F(t′, 2n(1 + 1/ε) + l(µ + n), k) + (1 + 1/ε)2^{−k} + (e/4)^n ,
Proof. Let W be an active R-bounded (t, q, lµ, ln) warden. We construct a PRF
adversary A which runs in time t′, makes at most 2n(1 + 1/ε) + l(µ + n) PRF queries,
and satisfies Adv^prf_{A,F}(k) ≥ Succ^R_{W,RMBit,C}(k) − (1 + 1/ε)2^{−k} − (e/4)^n. A^f works by
emulating W, using its function oracle f in place of F_K(·, ·) to emulate RMBit.Encode
in responding to the queries of W. Let m*, s* be the hiddentext and the stegotext
sequence returned by W, respectively. Then A^f returns 1 iff SD^f(s*, h*) ≠ m*. To
ensure that the running time and number of queries are at most t′ and 2n(1 + 1/ε) +
l(µ + n), we halt whenever SD^f makes more than 2n(1 + 1/ε) queries to f, an event
we will denote by TB. We will show that Pr[TB] ≤ (e/4)^n when f is a randomly
chosen function. Thus we can neglect this case in our analyses of the cases for f.
Since there are 2^{l(n−j)} messages matching m* in the first j blocks, we have that

    Adv^prf_{F,A}(k) = Pr[A^{F_K}(1^k) = 1] − Pr[A^f(1^k) = 1]
                     = Succ^R_{W,RMBit}(k) − Pr[A^f(1^k) = 1]
                     ≥ Succ^R_{W,RMBit}(k) − (1 + 1/ε)2^{−k} − Pr[TB] .

It remains to show that Pr[TB] ≤ (e/4)^n. Notice that the expected number of
queries to f by A is just the number of messages that match a jℓ-document prefix of
s*, for 1 ≤ j ≤ n, times k. Let X_m = 1 if m ∈ {0, 1}^{jl} matches a j-block prefix of s*.
Let X = Σ_{j=1}^{n} Σ_{m∈{0,1}^{jl}} X_m denote the number of matching prefix messages.
Theorem 6.22. R_C(RMBit) = (1 − ε) log(1/δ) − o(1)

Proof.

    R_C(RMBit) = (1 − ε) log(1/δ)ℓn / (ℓn + k/log(1/δ))
               = (1 − ε) log(1/δ) − O(k)/(ℓn + O(k))
               ≥ (1 − ε) log(1/δ) − k/n

For any choice of n = ω(k), the second term is o(1), as claimed.
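The rate computation can be sanity-checked numerically. The function below assumes the rate expression exactly as displayed, hiddentext bits ln over ℓn block documents plus k/log(1/δ) redundancy documents; the parameter values are arbitrary illustrations.

```python
import math

def rmbit_rate(eps: float, delta: float, ell: int, n: int, k: int) -> float:
    """Rate of RMBit: hiddentext bits over stegotext documents.

    Each of n blocks carries l = (1 - eps) * ell * log2(1/delta) bits in
    ell documents, plus ~k/log2(1/delta) redundancy documents overall.
    """
    log1d = math.log2(1 / delta)
    l = (1 - eps) * ell * log1d
    return (l * n) / (ell * n + k / log1d)

eps, delta, ell, k = 0.1, 0.25, 16, 128
ideal = (1 - eps) * math.log2(1 / delta)     # the (1 - eps) log(1/delta) target
# As n grows (n = omega(k)), the rate approaches the target from below.
gap_small_n = ideal - rmbit_rate(eps, delta, ell, n=10, k=k)
gap_large_n = ideal - rmbit_rate(eps, delta, ell, n=10_000, k=k)
assert gap_large_n < gap_small_n and gap_large_n < 0.01
```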
Chapter 7
Covert Computation
7.1 Introduction
Secure two-party computation allows Alice and Bob to evaluate a function of their
secret inputs so that neither learns anything other than the output of the function.
A real-world example that is often used to illustrate the applications of this primitive
is when Alice and Bob wish to determine if they are romantically interested in each
other. Secure two-party computation allows them to do so without revealing their
true feelings unless they are both attracted. By securely evaluating the AND of the
bits representing whether each is attracted to the other, both parties can learn if
there is a match without risking embarrassment: if Bob is not interested in Alice, for
instance, the protocol does not reveal whether Alice is interested in him. So goes the
example.
However, though often used to illustrate the concept, this example is not entirely
logical. The very use of two-party computation already reveals possible interest from
one party: “would you like to determine if we are both attracted to each other?”
A similar limitation occurs in a variety of other applications where the very use
of the primitive raises enough suspicion to defeat its purpose. To overcome this lim-
itation we introduce covert two-party computation, which guarantees the following
(in addition to leaking no additional knowledge about the individual inputs): (A) no
outside eavesdropper can determine whether the two parties are performing the com-
putation or simply communicating as they normally do; (B) before learning f (xA , xB ),
neither party can tell whether the other is running the protocol; (C) at any point prior
to or after the conclusion of the protocol, each party can only determine if the other
ran the protocol insofar as they can distinguish f (xA , xB ) from uniformly chosen
random bits. By defining a functionality g(xA , xB ) such that g(xA , xB ) = f (xA , xB )
whenever f (xA , xB ) ∈ Y and g(xA , xB ) is pseudorandom otherwise, covert two-party
computation allows the construction of protocols that return f (xA , xB ) only when it
is in a certain set of interesting values Y but for which neither party can determine
whether the other even ran the protocol whenever f (xA , xB ) ∈
/ Y . Among the many
important potential applications of covert two-party computation we mention the
following:
• Cheating in card games. Suppose two parties playing a card game want
to determine whether they should cheat. Each of them is self-interested, so
cheating should not occur unless both players can benefit from it. Using covert
two-party computation with both players’ hands as input allows them to com-
pute if they have an opportunity to benefit from cheating while guaranteeing
that: (1) neither player finds out whether the other attempted to cheat unless
they can both benefit from it; (2) none of the other players can determine if the
two are secretly planning to collude.
• Covert Authentication. Imagine that Alex works for the CIA and Bob works
for Mossad. Both have infiltrated a single terrorist cell. If they can discover
their “mutual interest” they could pool their efforts; thus both should be look-
ing for potential collaborators. On the other hand, suggesting something out
of the ordinary is happening to a normal member of the cell would likely be
fatal. Running a covert computation in which both parties’ inputs are their
(unforgeable) credentials and the result is 1k if they are allies and uniform bits
otherwise will allow Alex and Bob to authenticate each other such that if Bob
is NOT an ally, he will not know that Alex was even asking for authentica-
tion, and vice-versa. (Similar situations occur in, e.g., planning a coup d’etat or
constructing a zombie network)
• Cooperation between competitors. Imagine that Alice and Bob are com-
peting online retailers and both are being compromised by a sophisticated
cracker. Because of the volume of their logs, neither Alice nor Bob can draw a
reliable inference about the location of the hacker; statistical analysis indicates
about twice as many attack events are required to isolate the cracker. Thus if
Alice and Bob were to compare their logs, they could solve their problem. But
if Alice admits she is being hacked and Bob is not, he will certainly use this
information to take her customers; and vice-versa. Using covert computation to
perform the log analysis online can break this impasse. If Alice is concerned that
Bob might fabricate data to try and learn something from her logs, the com-
putation could be modified so that when an attacker is identified, the output is
both an attacker and a signed contract stating that Alice is due a prohibitively
large fine (for instance, $1 Billion US) if she can determine that Bob falsified
his log, and vice-versa. Similar situations occur whenever cooperation might
benefit mutually distrustful competitors.
Our protocols make use of provably secure steganography [4, 7, 34, 53] to hide the
computation in innocent-looking communications. Steganography alone, however, is
not enough. Combining steganography with two-party computation in the obvious
black-box manner (i.e., forcing all the parties participating in an ordinary two-party
protocol to communicate steganographically) yields protocols that are undetectable to
an outside observer but does not guarantee that the participants will fail to determine
if the computation took place. Depending on the output of the function, we wish to
hide that the computation took place even from the participants themselves.
Hiding Computation vs. Hiding Inputs
Notice that covert computation is not about hiding which function Alice and Bob are
interested in computing, which could be accomplished via standard SFE techniques:
Covert Computation hides the fact that Alice and Bob are interested in computing a
function at all. This point is vital in the case of, e.g., covert authentication, where
expressing a desire to do anything out of the ordinary could result in the death of
one of the parties. In fact, we assume that the specific function to be computed (if
any) is known to all parties. This is analogous to the difference in security goals
between steganography – where the adversary is assumed to know which message, if
any, is hidden – and encryption, where the adversary is trying to decide which of two
messages is hidden.
Roadmap.
The high-level view of our presentation is as follows. First, we will define the secu-
rity properties of covert two-party computation. Then we will present two protocols.
The first protocol we present will be a modification of Yao’s “garbled circuit” two-
party protocol in which, except for the oblivious transfer, all messages generated are
indistinguishable from uniform random bits. We construct a protocol for oblivious
transfer that generates messages that are indistinguishable from uniform random bits
(under the Decisional Diffie-Hellman assumption) to yield a complete protocol for
two-party secure function evaluation that generates messages indistinguishable from
random bits. We then use steganography to transform this into a protocol that gener-
ates messages indistinguishable from “ordinary” communications. The protocol thus
constructed, however, is not secure against malicious adversaries nor is it fair (since
neither is Yao’s protocol by itself). We therefore construct another protocol, which
uses our modification of Yao’s protocol as a subroutine, that satisfies fairness and is
secure against malicious adversaries, in the Random Oracle Model. The major diffi-
culty in doing so is that the standard zero-knowledge-based techniques for converting
a protocol in the honest-but-curious model into a protocol secure against malicious
adversaries cannot be applied in our case, since they reveal that the other party
is running the protocol.
Related Work.
Secure two-party computation was introduced by Yao [63]. Since then, there have
been several papers on the topic and we refer the reader to a survey by Goldreich [26]
for further references. Constructions that yield fairness for two-party computation
were introduced by Yao [64], Galil et al. [24], Brickell et al. [15], and many others
(see [51] for a more complete list of such references). The notion of covert two-party
computation, however, appears to be completely new.
Notation.

We say a function µ : N → [0, 1] is negligible if for every c > 0, for all sufficiently
large k, µ(k) < 1/k^c. We denote the length (in bits) of a string or integer s by |s|
and the concatenation of string s_1 and string s_2 by s_1‖s_2. We let U_k denote the
uniform distribution on k-bit strings. If D is a distribution with finite support X,
we define the minimum entropy of D as H∞(D) = min_{x∈X} {log_2(1/ Pr_D[x])}. The
statistical distance between two distributions C and D with joint support X is defined
by ∆(C, D) = (1/2) Σ_{x∈X} |Pr_D[x] − Pr_C[x]|. Two sequences of distributions, {C_k}_k
and {D_k}_k, are called computationally indistinguishable, written C ≈ D, if for any
probabilistic polynomial-time A, Adv^{C,D}_A(k) = |Pr[A(C_k) = 1] − Pr[A(D_k) = 1]| is
negligible in k.
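The minimum-entropy and statistical-distance definitions above translate directly into code; the two distributions at the bottom are toy examples, represented as dictionaries from outcomes to probabilities.

```python
import math

def min_entropy(D: dict) -> float:
    """H_inf(D) = min over the support of log2(1 / Pr_D[x])."""
    return min(math.log2(1 / p) for p in D.values() if p > 0)

def statistical_distance(C: dict, D: dict) -> float:
    """Delta(C, D) = (1/2) * sum over the joint support of |Pr_D[x] - Pr_C[x]|."""
    support = set(C) | set(D)
    return 0.5 * sum(abs(D.get(x, 0.0) - C.get(x, 0.0)) for x in support)

C = {'a': 0.5, 'b': 0.5}
D = {'a': 0.75, 'b': 0.25}
assert abs(min_entropy(D) - math.log2(4 / 3)) < 1e-12   # log2(1/0.75)
assert statistical_distance(C, C) == 0.0
assert abs(statistical_distance(C, D) - 0.25) < 1e-12
```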
We now present a protocol for covert two-party computation that is secure against
semi-honest adversaries in the standard model (without Random Oracles) and as-
sumes that the decisional Diffie-Hellman problem is hard. The protocol is based on
Yao’s well-known function evaluation protocol [63].
necessary modifications to turn it into a covert computation protocol. The definition
presented in this section is only against honest-but-curious adversaries and is unfair in
that only one of the parties obtains the result. In Section 4 we will define covert two-
party computation against malicious adversaries and present a protocol that is fair:
either both parties obtain the result or neither of them does. The protocol in Section
4 uses the honest-but-curious protocol presented in this section as a subroutine.
7.2.1 Definitions
For σ ∈ {0, 1}, we denote by V^{P_σ}_Π(x_0, x_1) the view of party P_σ on input x_σ when
interacting with P_{1−σ} on input x_{1−σ}. The view includes P_σ's input x_σ, private random
bits, and all messages sent by P_0 and P_1. We say Π securely realizes the functionality
f if Π correctly realizes f and, for any P′_σ and x_{1−σ}, there is a simulator P″_σ and an
x_σ such that P″_σ(f(x_0, x_1)) ≈ V^{P′_σ}_Π(x_0, x_1). Notice that given f(x_0, x_1), P′_σ could just
use P″_σ to simulate his interaction with P_{1−σ} without actually running Π. Thus if Π
securely implements f, neither party learns more from the interaction than could be
learned from just f(x_0, x_1).
this "protocol" by Π : B_σ.

2. (Internal covertness): For any input x̄, V^{P_0}_{Π,n}(x̄) ≈ V^{P_0}_{Π:B_1,n}(x̄) and
   V^{P_1}_{Π,n−1}(x̄) ≈ V^{P_1}_{Π:B_0,n−1}(x̄).

3. (Final Covertness): For every PPT D there exists a PPT D′ and a negligible
   ν such that for any x_1 and any distribution X_0,

       Adv_D^{V^{P_1}_Π(X_0,x_1), V^{P_1}_{Π:B_0}(X_0,x_1)}(k) ≤ Adv_{D′}^{f(X_0,x_1), U_l}(k) + ν(k).
In other words, until the final round, neither party can distinguish between the
case that the other is running the protocol or just drawing from B; and after the final
message, P0 still cannot tell, while P1 can only distinguish the cases if f (x0 , x1 ) and
Um are distinguishable. Note that property 2 implies property 1, since P0 could apply
the distinguisher to his view (less the random bits).
We will slightly abuse notation and say that a protocol which has messages indis-
tinguishable from random bits (even given one party’s view) is covert for the uniform
channel U.
Yao’s protocol [63] securely (not covertly) realizes any functionality f that is expressed
as a combinatorial circuit. Our description is based on [46]. The protocol is run
between two parties, the Input Owner A and the Program Owner B. The input of
A is a value x, and the input of B is a description of a function f . At the end of
the protocol, B learns f (x) (and nothing else about x), and A learns nothing about
f . The protocol requires two cryptographic primitives, pseudorandom functions and
oblivious transfer, which we describe here for completeness.
Pseudorandom Functions.

Let {F : {0, 1}^k × {0, 1}^{L(k)} → {0, 1}^{l(k)}}_k denote a sequence of function families.
Let A be an oracle probabilistic adversary. We define the prf-advantage of A over
F as Adv^prf_{A,F}(k) = |Pr_K[A^{F_K(·)}(1^k) = 1] − Pr_g[A^g(1^k) = 1]|, where K ← U_k and g
is a uniformly chosen function from L(k) bits to l(k) bits. Then F is pseudorandom
if Adv^prf_{A,F}(k) is negligible in k for all polynomial-time A. We will write F_K(·) as
shorthand for F(K, ·).
Oblivious Transfer.
1-out-of-2 oblivious transfer (OT^2_1) allows two parties, the sender who knows the
values m_0 and m_1, and the chooser whose input is σ ∈ {0, 1}, to communicate in such
a way that at the end of the protocol the chooser learns m_σ, while learning nothing
about m_{1−σ}, and the sender learns nothing about σ. Formally, let O = (S, C) be a pair
of interactive PPT programs. We say that O is correct if Pr[O_C((m_0, m_1), σ) = m_σ] ≥
1 − ε(k) for negligible ε. We say that O has chooser privacy if for any PPT S′ and
any m_0, m_1, Pr[S′(⟨S′(m_0, m_1), C(σ)⟩) = σ] − 1/2 ≤ ε(k), and O has sender privacy if
for any PPT C′ there exists a σ and a PPT C″ such that C″(m_σ) ≈ V^{C′}_Π((m_0, m_1), σ).
We say that O securely realizes the functionality OT^2_1 if O is correct and has chooser
and sender privacy.
Yao’s Protocol.
Given the garbled inputs to g, Tg does not disclose any information about the garbled
output of g for any other inputs, nor does it reveal the actual values of the input bits
or the output bit.
Assume g has two input wires (i, j) and one output wire out (gates with higher
fan in or fan out can be accommodated with straightforward modifications). The
construction of Tg uses a pseudorandom function F whose output length is k + 1.
The table Tg is as follows:
To compute f(x), B computes garbled tables T_g for each gate g, and sends the tables
to A. Then, for each circuit input wire i, A and B perform an oblivious transfer,
where A plays the role of the chooser (with σ = b_i) and B plays the role of the
sender, with m_0 = W^0_i‖π_i(0) and m_1 = W^1_i‖π_i(1). A computes π_j(b_j) for each output
wire j of the circuit (by trickling down the garbled inputs using the garbled tables)
and sends these values to B, who applies π_j^{−1} to learn b_j. Alternatively, B can send
the values π_j (for each circuit output wire j) to A, who then learns the result. Notice
that the first two columns of T_g can be implicitly represented, leaving a "table" which
is indistinguishable from uniformly chosen bits.
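The following is a minimal sketch of a single garbled AND gate and its evaluation in the spirit of the table T_g above, using the point-and-permute idea (the permutations π are one-bit masks). SHA-256 stands in for the PRF F, and the row-indexing and masking conventions here are illustrative choices, not necessarily the exact table layout of the thesis; as noted above, the row index (the "first two columns") is implicit, so the stored entries alone look like uniform bits.

```python
import hashlib
import secrets

K = 16  # wire-label length in bytes; each entry also carries the masked output bit

def F(key: bytes, tweak: bytes) -> bytes:
    # PRF stand-in with (K + 1)-byte output, mirroring the k+1-bit output of F
    return hashlib.sha256(key + tweak).digest()[:K + 1]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and_gate():
    # two labels per wire (for bit 0 and bit 1) and a random permutation bit per wire
    W = {w: (secrets.token_bytes(K), secrets.token_bytes(K)) for w in ('i', 'j', 'out')}
    pi = {w: secrets.randbits(1) for w in ('i', 'j', 'out')}
    table = {}
    for bi in (0, 1):
        for bj in (0, 1):
            out_bit = bi & bj
            plain = W['out'][out_bit] + bytes([out_bit ^ pi['out']])
            row = (bi ^ pi['i'], bj ^ pi['j'])       # permuted (implicit) row index
            tweak = bytes(row)
            table[row] = xor(xor(plain, F(W['i'][bi], tweak)), F(W['j'][bj], tweak))
    return W, pi, table

def eval_gate(table, label_i, vis_i, label_j, vis_j):
    # the evaluator sees only one garbled label per wire and the masked bits
    row = (vis_i, vis_j)
    tweak = bytes(row)
    plain = xor(xor(table[row], F(label_i, tweak)), F(label_j, tweak))
    return plain[:K], plain[K]       # garbled output label, masked output bit

W, pi, table = garble_and_gate()
bi, bj = 1, 1
label, vis = eval_gate(table, W['i'][bi], bi ^ pi['i'], W['j'][bj], bj ^ pi['j'])
assert label == W['out'][bi & bj] and vis == (bi & bj) ^ pi['out']
```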
probability distribution on D satisfying H∞(D) = ℓ(k), where k is the security pa-
rameter. The following constructions hide and recover m uniformly-chosen bits in a
distribution indistinguishable from D when ℓ(k) − m = ω(log k) and m = O(log k).
The result follows from the Leftover Hash Lemma ([33], Lemma 4.8). Intuitively,
it guarantees that Basic Encode(c) will be (statistically) indistinguishable from the
messages exchanged in a bidirectional channel whenever c is a uniformly chosen bit
string. (When we refer to Basic Encode with only a single argument, we implicitly
assume that an appropriate h has been chosen and is publicly accessible to all parties.)
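This excerpt does not spell out Basic Encode and Basic Decode, so the sketch below shows one standard rejection-sampling realization consistent with the stated guarantee: decoding applies the public hash h, and encoding redraws from the channel until the draw hashes to the hidden bits c. The affine hash family and every parameter here are assumptions chosen for illustration.

```python
import random

def make_hash(m: int, a: int, b: int, p: int = (1 << 61) - 1):
    """Toy pairwise-independent hash: h(s) = ((a*s + b) mod p) mod 2^m."""
    return lambda s: ((a * s + b) % p) % (1 << m)

def basic_encode(c: int, sample, h, max_tries: int = 10_000) -> int:
    """Draw documents from the channel until one hashes to the hidden bits c."""
    for _ in range(max_tries):
        s = sample()
        if h(s) == c:
            return s
    raise RuntimeError("channel min-entropy too low for this m")

def basic_decode(s: int, h) -> int:
    return h(s)

rng = random.Random(7)
m = 3                                    # hidden bits per document
sample = lambda: rng.getrandbits(24)     # stand-in channel with H_inf = 24 bits
h = make_hash(m, a=1234577, b=891011)
for c in range(2 ** m):
    assert basic_decode(basic_encode(c, sample, h), h) == c
```

The condition ℓ(k) − m = ω(log k) is what keeps the expected number of redraws small and the output distribution statistically close to the channel.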
Thus, to guarantee covertness for channel B, we will ensure that all our protocols
generate messages that are indistinguishable from uniformly chosen random bits and
then encode these messages with Basic Encode. Formally, suppose Π = (P_0, P_1)
is an arbitrary two-party protocol which securely realizes the functionality f. We
will construct a protocol Σ_Π = (S_0^{P_0}, S_1^{P_1}) which has the property that if V^{P_b}_Π(x̄) is
indistinguishable from uniformly chosen bits (that is, Π covertly realizes f for the
uniform channel), then Σ_Π covertly realizes the functionality f for channel B. We
assume that P_0, P_1 have the property that, given a partial input, they return the
string ε, indicating that more bits of input are needed. Then S_b^{P_b} has the following
round function (which simply uses Basic Encode and Basic Decode to encode and
decode all messages exchanged by P_0 and P_1):
Construction 7.4. (Transformation to a covert protocol)
Procedure S_b^{P_b}:
Input: history h ∈ H, state, document s ∈ D
draw d ← B_h^{P_b}
if (state.status = "receiving") then
    set state.msg = state.msg‖Basic Decode(s)
    set c = P_b(state.msg)
    if (c ≠ ε) set state.status = "sending"; set state.msg = c
if (state.status = "sending") then
    if (d ≠ ⊥) then
        set c = first m bits of state.msg
        set state.msg = state.msg without the first m bits
        set d = Basic Encode(c)
    if state.msg = "" set state.status = "receiving"
Output: message d, state
Theorem 7.5. If Π covertly realizes the functionality f for the uniform channel, then
Σ_Π covertly realizes f for the bidirectional channel B.

Proof. Let k^c be an upper bound on the number of bits in ⟨P_0(x_0), P_1(x_1)⟩. Then Σ_Π
transmits at most 2k^c/m (non-empty) documents. Suppose there is a distinguisher
D for V^{S_b}_Σ(x̄) from V^{S_b}_{Σ:B_{1−b}}(x̄) with significant advantage ε. Then D can be used to
distinguish V^{P_b}_Π(x̄) from V^{P_b}_{Π:U_{1−b}}(x̄), by simulating each round as in Σ to produce a
transcript T; if the input is uniform, then ∆(T, B) ≤ (k^c/m)2^{2−(ℓ(k)−m)/2} = ν(k),
and if the input is from Π, then T is identical to V^{S_b}_Σ(x̄). Thus D's advantage in
distinguishing Π from Π : U_{1−b} is at least ε − ν(k).
IMPORTANT: For the remainder of the paper we will present protocols Π that
covertly realize f for U. It is to be understood that the final protocol is meant to
be ΣΠ , and that when we state that “Π covertly realizes the functionality f ” we are
referring to ΣΠ .
7.2.4 Covert Oblivious Transfer
As mentioned above, we guarantee the security of our protocols by ensuring that all
the messages exchanged are indistinguishable from uniformly chosen random bits. To
this effect, we present a modification of the Naor-Pinkas [45] protocol for oblivious
transfer that ensures that all messages exchanged are indistinguishable from uniform
when the input messages m0 and m1 are uniformly chosen. Our protocol relies on the
well-known integer decisional Diffie-Hellman assumption:
Let P and Q be primes such that Q divides P − 1, let Z*_P be the multiplicative
group of integers modulo P, and let g ∈ Z*_P have order Q. Let A be an adversary
that takes as input three elements of Z*_P and outputs a single bit. Define the DDH
advantage of A over (g, P, Q) as:

    Adv^ddh_A(g, P, Q) = |Pr_{a,b,r}[A_r(g^a, g^b, g^{ab}, g, P, Q) = 1] − Pr_{a,b,c,r}[A_r(g^a, g^b, g^c, g, P, Q) = 1]| ,

where a, b, c are uniform in Z_Q and r is A's random tape.
Setup.

Let p = rq + 1 where 2^k < p < 2^{k+1}, q is a large prime, and gcd(r, q) = 1; let g
generate Z*_p, so that γ = g^r generates the unique multiplicative subgroup of order
q; let r̂ be the least integer such that rr̂ ≡ 1 mod q. Assume |m_0| = |m_1| < k/2.
Let H : {0, 1}^{2k} × Z_p → {0, 1}^{k/2} be a pairwise-independent family of hash functions.
Define the randomized mapping φ : ⟨γ⟩ → Z*_p by φ(h) = h^{r̂} g^{βq}, for a uniformly chosen
β ∈ Z_r; notice that φ(h)^r = h and that for a uniformly chosen h ∈ ⟨γ⟩, φ(h) is a
uniformly chosen element of Z*_p. The following protocol is a simple modification of
the Naor-Pinkas 2-round oblivious transfer protocol [45]:
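The two claimed properties of φ, that φ(h)^r = h and that φ maps ⟨γ⟩ × Z_r onto Z*_p, can be checked on toy parameters; p = 11, q = 5, r = 2 below are purely illustrative, whereas the protocol uses a k-bit prime p.

```python
# Toy parameters: p = r*q + 1 with q = 5, r = 2, so p = 11; g = 2 generates Z*_11
p, q, r, g = 11, 5, 2, 2
gamma = pow(g, r, p)                  # gamma = g^r generates the order-q subgroup
r_hat = pow(r, -1, q)                 # r * r_hat = 1 mod q

def phi(h: int, beta: int) -> int:
    """phi(h) = h^r_hat * g^(beta*q) mod p, for beta in Z_r."""
    return (pow(h, r_hat, p) * pow(g, beta * q, p)) % p

# phi(h)^r = h for every h in <gamma> and every choice of the randomizer beta
for t in range(q):
    h = pow(gamma, t, p)
    for beta in range(r):
        assert pow(phi(h, beta), r, p) == h

# Over uniform h in <gamma> and uniform beta, phi covers all of Z*_p
images = {phi(pow(gamma, t, p), beta) for t in range(q) for beta in range(r)}
assert images == set(range(1, p))
```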
1. On input σ ∈ {0, 1}, C chooses uniform a, b ∈ Z_q, sets c_σ = ab mod q and
   uniformly chooses c_{1−σ} ∈ Z_q. C sets x = γ^a, y = γ^b, z_0 = γ^{c_0}, z_1 = γ^{c_1} and sets
   x′ = φ(x), y′ = φ(y), z′_0 = φ(z_0), z′_1 = φ(z_1). If the most significant bits of all of
   x′, y′, z′_0, z′_1 are 0, C sends the least significant k bits of each to S; otherwise C
   picks new a, b, c_{1−σ} and starts over.
Lemma 7.7. S cannot distinguish between the case that C is following the COT
protocol and the case that C is drawing from U_k; that is,

    V^S_{COT}(m_0, m_1, σ) ≈ V^S_{COT:U_C}(m_0, m_1, σ).
Proof. Suppose that there exists a distinguisher D with advantage ε. Then there
exists a DDH adversary A with advantage at least ε/8 − ν(k) for a negligible ν. A
takes as input a triple (γ^a, γ^b, γ^c), picks a random bit σ, sets z_σ = γ^c and picks a
uniform z′_{1−σ} ∈ {0, 1}^k, and computes x′ = φ(γ^a), y′ = φ(γ^b), z′_σ = φ(z_σ); if all three
are at most 2^k, then A outputs D(x′, y′, z′_0, z′_1), otherwise A outputs 0.

since the elements passed by A to D are uniformly chosen and A invokes D with proba-
bility at least 1/8 (since each of x′, y′, z′_σ is greater than 2^k with probability at most
1/2). But when c = ab, then
since the elements passed by A to D are chosen exactly according to the distribution
on C's output specified by COT; and since the probability that D is invoked by A
is at least 1/8 when c ≠ ab, it can be at most ν(k) less when c = ab, by the Integer
DDH assumption. Thus the DDH advantage of A is at least ε/8 − ν(k). Since ε/8
must be negligible by the DDH assumption, we have that D's advantage must also
be negligible.
Lemma 7.8. When m_0, m_1 ← U_{k/2}, C cannot distinguish between the case that S is
following the COT protocol and the case that S is sending uniformly chosen strings.
That is, V^C_{COT}(U_{k/2}, U_{k/2}, σ) ≈ V^C_{COT:U_S}(U_{k/2}, U_{k/2}, σ).

Proof. The group elements w_0, w_1 are uniformly chosen by S; thus when m_0, m_1 are
uniformly chosen, the message sent by S must also be uniformly distributed.
Lemma 7.9. The COT protocol securely realizes the OT^2_1 functionality.

Proof. The protocol described by Naor and Pinkas is identical to the COT protocol,
with the exception that φ is not applied to the group elements x, y, z_0, z_1, w_0, w_1 and
these elements are not rejected if they are greater than 2^k. Suppose an adversarial
sender can predict σ with advantage ε in COT; then he can be used to predict σ
with advantage ε/16 − ν(k) in the Naor-Pinkas protocol, by applying the map φ
to the elements x, y, z_0, z_1 and predicting a coin flip if not all are less than 2^k, and
otherwise using the sender's prediction against the message that COT would send.
Likewise, any bit a chooser can predict about (m_0, m_1) with advantage ε in COT
can be predicted with advantage ε/4 in the Naor-Pinkas protocol: the Chooser's
message can be transformed into elements of ⟨γ⟩ by taking the components to the
power r, and the resulting message of the Naor-Pinkas sender can be transformed by
sampling w′_0 = φ(w_0), w′_1 = φ(w_1) and predicting a coin flip if either is greater
than 2^k, but otherwise giving the prediction of the COT chooser on w′_0‖f_0‖f_0(K_0) ⊕
m_0‖w′_1‖f_1‖f_1(K_1) ⊕ m_1.
7.2.5 Combining The Pieces
Proof. That (Alice, Bob) securely realize the functionality f follows from the security
of Yao’s protocol. Now consider the distribution of each message sent from Alice to
Bob:
• Final values: these are masked by the uniformly chosen bits that Bob chose in
garbling the output gates. To an observer, they are uniformly distributed.
Thus Bob’s view, until the last round, is in fact identically distributed when Alice
is running the protocol and when she is drawing from U. Likewise, consider the
messages sent by Bob:
• In each execution of COT: because the W^b_i from Yao's protocol are uniformly
  distributed, Theorem 7.10 implies that Bob's messages are indistinguishable
  from uniform strings.

• When sending the garbled circuit, the pseudorandomness of F and the uniform
  choice of the W^b_i imply that each garbled gate, even given one garbled input
  pair, is indistinguishable from a random string.
Thus Alice’s view after all rounds of the protocol is indistinguishable from her view
when Bob draws from U.
If Bob can distinguish between Alice running the protocol and drawing from B
after the final round, then he can also be used to distinguish between f (XA , xB ) and
Ul . The approach is straightforward: given a candidate y, use the simulator from
Yao’s protocol to generate a view of the “data layer.” If y ← f (XA , xB ), then, by
the security of Yao’s protocol, this view is indistinguishable from Bob’s view when
Alice is running the covert protocol. If y ← Ul , then the simulated view of the final
step is distributed identically to Alice drawing from U. Thus Bob’s advantage will be
preserved, up to a negligible additive term.
result in time T , the other party can compute the result in time at most O(T ). Our
protocol is secure in the random oracle model, under the Decisional Diffie-Hellman
assumption. We show at the end of this section, however, that our protocol can be
made to satisfy a slightly weaker security condition without the use of a random
oracle. (We note that the technique used in this section has some similarities to one
that appears in [1].)
7.3.1 Definitions
Let f denote the functionality we wish to compute. We say that f is fair if for every distinguisher Dσ distinguishing f(X0, X1) from U given Xσ with advantage at least ε, there is a distinguisher D1−σ with advantage at least ε − ν(k), for a negligible function ν. (That is, if P0 can distinguish f(X0, X1) from uniform, so can P1.) We say f is strongly fair if (f(X0, X1), X0) ≈ (f(X0, X1), X1).
• (Strong Internal Covertness): There exists a PPT E (an extractor) such that if a PPT D distinguishes between V^{Pσ}_{Π,i}(x̄) and V^{Pσ}_{Π:B1−σ,i}(x̄) with advantage ε, then E^D(V^{Pσ}_Π(x̄)) computes f(x̄) with probability at least ε/poly(k).
• (Strong Fairness): If the functionality f is fair, then for any Cσ running in time T such that Pr[Cσ(V^σ_{Π,i}(x̄)) = f(x̄)] ≥ ε, there exists a C1−σ running in time O(T) such that Pr[C1−σ(V^{1−σ}_{Π,i}(x̄)) = f(x̄)] = Ω(ε).
• (Final Covertness): For every PPT D there exists a PPT D′ and a negligible ν such that for any xσ and distribution X1−σ, Adv_D^{V^{Pσ}_Π(X1−σ,xσ), V^{Pσ}_{Π:B1−σ}(X1−σ,xσ)}(k) ≤ Adv_{D′}^{f(X1−σ,xσ), U_l}(k) + ν(k).
Intuitively, the Internal Covertness requirement states that “Alice can’t tell if Bob is
running the protocol until she gets the answer,” while Strong Fairness requires that
“Alice can’t get the answer unless Bob can.” Combined, these requirements imply
that neither party has an advantage over the other in predicting whether the other is
running the protocol.
7.3.2 Construction
As before, we have two parties, P0 (Alice) and P1 (Bob), with inputs x0 and x1 ,
respectively, and the function Alice and Bob wish to compute is f : {0, 1}^{l0} × {0, 1}^{l1} → {0, 1}^l, presented by the circuit Cf. The protocol proceeds in three stages: COMMIT,
COMPUTE, and REVEAL. In the COMMIT stage, Alice picks k + 2 strings, r0 , and
s0 [0], . . . , s0 [k], each k bits in length. Alice computes commitments to these values,
using a bitwise commitment scheme which is indistinguishable from random bits, and
sends the commitments to Bob. Bob does likewise (picking strings r1 , s1 [0], . . . , s1 [k]).
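The bitwise commitment only needs to produce commitments that are indistinguishable from random bits; in the random-oracle model, hashing the bit together with fresh randomness is one way to obtain this property. A toy Python sketch (the use of SHA-256 and all names here are our illustration, not the thesis's scheme):

```python
import hashlib
import os

def commit(bit: int, rho: bytes) -> bytes:
    """Commit to a bit: modeling the hash as a random oracle, the
    digest of (bit || rho) is indistinguishable from 32 uniform bytes."""
    return hashlib.sha256(bytes([bit]) + rho).digest()

def open_commitment(com: bytes, bit: int, rho: bytes) -> bool:
    """Check an opening (bit, rho) against a commitment."""
    return commit(bit, rho) == com

rho = os.urandom(16)   # fresh randomness, kept secret until opened
c = commit(1, rho)     # sent to the other party
assert open_commitment(c, 1, rho)
assert not open_commitment(c, 0, rho)
```

In the protocol, Alice would commit bit by bit to r0 and s0[0], . . . , s0[k] this way, and the openings (bit, ρ) are what the garbled circuits later verify.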
The next two stages involve the use of a pseudorandom generator G : {0, 1}^k → {0, 1}^l which we will model as a random oracle for the security argument only: G
itself must have an efficiently computable circuit. In the COMPUTE stage, Alice and
Bob compute two serial runs (“rounds”) of the covert Yao protocol described in the
previous section. If neither party cheats, then at the conclusion of the COMPUTE
stage, Alice knows f (x0 , x1 )⊕G(r1 ) and Bob’s value s1 [0]; while Bob knows f (x0 , x1 )⊕
G(r0 ) and Alice’s value s0 [0]. The REVEAL stage consists of k rounds of two runs
each of the covert Yao protocol. At the end of each round i, if nobody cheats, Alice
learns the ith bit of Bob’s string r1 , labeled r1 [i], and also Bob’s value s1 [i], and Bob
learns r0 [i], s0 [i]. After k rounds in which neither party cheats, Alice thus knows r1
and can compute f (x0 , x1 ) by computing the exclusive-or of G(r1 ) with the value she
learned in the COMPUTE stage, and Bob can likewise compute the result.
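The arithmetic behind this mask-and-reveal structure is easy to simulate. The following toy Python sketch (sizes and names are ours; in the real protocol every value below is computed inside a garbled circuit, never in the clear) shows that holding f(x0, x1) ⊕ G(r1) is useless until all k bits of r1 arrive, after which unmasking is a single exclusive-or:

```python
import hashlib
import secrets

K = 16  # toy stand-in for the security parameter k

def G(r: int) -> int:
    """Toy pseudorandom generator G : {0,1}^K -> {0,1}^16, built from
    a hash for illustration only."""
    return int.from_bytes(hashlib.sha256(r.to_bytes(2, "big")).digest()[:2], "big")

f_result = 0xBEEF               # stand-in for f(x0, x1)
r1 = secrets.randbits(K)        # Bob's secret seed

# COMPUTE stage: Alice learns only the masked result.
F0 = G(r1) ^ f_result

# REVEAL stage: Bob releases one bit of r1 per round.
revealed = [(r1 >> i) & 1 for i in range(K)]

# After round k, Alice reassembles r1 and unmasks F0.
r1_recovered = sum(b << i for i, b in enumerate(revealed))
assert r1_recovered == r1
assert G(r1_recovered) ^ F0 == f_result
```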
Each circuit sent by Alice must check that Bob has obeyed the protocol; thus at
every round of every stage, the circuit that Alice sends to Bob takes as input the
opening of all of Bob’s commitments, and checks to see that all of the bits Alice has
learned so far are consistent with Bob’s input. The difficulty to overcome with this
approach is that the result of the check cannot be returned to Alice without giving
away that Bob is running the protocol. To solve this problem, Alice’s circuits also take
as input the last value s0 [i − 1] that Bob learned. If Alice’s circuit ever finds that the
bits she has learned are inconsistent with Bob’s input, or that Bob’s input for s0 [i − 1]
is not consistent with the actual value of s0 [i − 1], the output is a uniformly chosen
string of the appropriate length. Once this happens, all future outputs to Bob will
also be independently and uniformly chosen, because he will have the wrong value for
s0 [i], which will give him the wrong value for s0 [i+1], etc. Thus the values s0 [1, . . . , k]
serve as “state” bits that Bob maintains for Alice. The analogous statements hold
for Bob’s circuits and Alice’s inputs.
COMPUTE stage. The COMPUTE stage consists of two serial runs of the covert-
yao protocol.
1. Bob garbles the circuit compute1 shown in figure 7.1, which takes x0, r0, s0[0], . . . , s0[k], and ρ0 as input and outputs G(r1) ⊕ f(x0, x1) ‖ s1[0] if K1 is a commitment to X0. If this check fails, compute1 outputs a uniformly chosen string, which carries no information about f(x0, x1) or s1[0]. Bob and Alice perform the covert-yao protocol; Alice labels her result F0 ‖ S0[0].
2. Alice garbles the circuit compute0 shown in figure 7.1, which takes x1 , r1 ,
Figure 7.1: The circuits computeσ and reveal^i_σ.

computeσ(x1−σ, r, s[0 . . . k], ρ) =
    if (Kσ = CMT(x1−σ, r, s; ρ))
        then set F = G(rσ) ⊕ f(x0, x1), set S = sσ[0]
        else draw F ← Ul, draw S ← Uk
    output F ‖ S

reveal^i_σ(x1−σ, S1−σ[i − 1], r, s1−σ[0 . . . k], ρ) =
    let F = G(r) ⊕ f(x0, x1)
    if (Kσ = CMT(x1−σ, r, s1−σ; ρ) and F = Fσ and
        Rσ[i − 1] = r[i − 1] and
        S1−σ[i − 1] = sσ[i − 1] and
        Sσ[i − 1] = s1−σ[i − 1])
        then set R = rσ[i], S = sσ[i]
        else draw R ← {0, 1}, S ← Uk
    output R ‖ S
s1[0], . . . , s1[k], and ρ1 as input and outputs G(r0) ⊕ f(x0, x1) ‖ s0[0] if K0 is a commitment to X1. If this check fails, compute0 outputs a uniformly chosen string, which carries no information about f(x0, x1) or s0[0]. Bob and Alice perform the covert-yao protocol; Bob labels his result F1 ‖ S1[0].
REVEAL stage. The REVEAL stage consists of k rounds, each of which consists
of 2 runs of the covert-yao protocol:
1. in round i, Bob garbles the circuit reveali1 shown in figure 7.1, which takes
input x0, S0[i − 1], r0, s0[0 . . . k], ρ0 and checks that these values are consistent with Alice's commitments and with the values Bob has learned in previous rounds (the conditions listed in figure 7.1). If all of these checks succeed, Bob's circuit outputs bit i of r1 and state s1[i];
otherwise the circuit outputs a uniformly chosen k + 1-bit string. Alice and Bob
perform covert-yao and Alice labels the result R0 [i], S0 [i].
2. Alice garbles the circuit reveali0 depicted in figure 7.1 which performs the
analogous computations to reveali1 , and performs the covert-yao protocol
with Bob. Bob labels the result R1 [i], S1 [i].
After k such rounds, if Alice and Bob have been following the protocol, we have
R1 = r0 and R0 = r1 and both parties can compute the result. The “states” s are
what allow Alice and Bob to check that all previous outputs and key bits (bits of r0
and r1 ) sent by the other party have been correct, without ever receiving the results
of the checks or revealing that the checks fail or succeed.
Theorem 7.13. Construction 7.12 is a strongly fair covert protocol realizing the functionality f.
Proof. The correctness of the protocol follows by inspection. The two-party security
follows by the security of Yao's protocol. Now suppose that some party, without loss of generality Alice, cheats (by sending a circuit which computes an incorrect result) in round j. Then, the
key bit R0 [j + 1] and state S0 [j + 1] Alice computes in round j + 1 will be randomized;
and with overwhelming probability every subsequent result that Alice computes will
be useless. Assuming Alice can distinguish f(x0, X1) from uniform, she can still compute the result in time at most 2^{k−j} by exhaustive search over the remaining key bits. By successively guessing the round at which Alice began to cheat, Bob can compute the result in time at most 2^{k−j+2}. If Alice aborts at round j, Bob again can compute the result in time at most 2^{k−j+1}. If Bob cheats in round j by giving
inconsistent inputs, with high probability all of his remaining outputs are randomized;
thus cheating in this way gives him no advantage over aborting in round j − 1. Thus,
the fairness property is satisfied.
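The 2^{k−j} bound in this argument is a plain exhaustive search over the unrevealed seed bits. A toy Python sketch of the recovery (parameters are ours; the thesis only assumes the searcher can distinguish f(x0, X1) from uniform, whereas for concreteness this toy tests each candidate against the known answer):

```python
import hashlib
import secrets

K = 16   # toy seed length
j = 10   # rounds completed honestly before the abort

def G(r: int) -> int:
    """Toy pseudorandom generator, as elsewhere a hash stand-in."""
    return int.from_bytes(hashlib.sha256(r.to_bytes(2, "big")).digest()[:2], "big")

r1 = secrets.randbits(K)
f_result = 0x1234            # stand-in for f(x0, x1)
F0 = G(r1) ^ f_result        # masked result Alice holds
low = r1 & ((1 << j) - 1)    # bits revealed in rounds 1..j

# Enumerate the 2^(K-j) possibilities for the unrevealed high bits.
recovered = None
for high in range(1 << (K - j)):
    candidate = (high << j) | low
    if G(candidate) ^ F0 == f_result:   # stand-in for the distinguisher
        recovered = candidate
        break

assert recovered is not None
assert G(recovered) ^ F0 == f_result    # the masked result now opens
```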
If G is a random oracle, neither Alice nor Bob can distinguish anything in their
view from uniformly chosen bits without querying G at the random string chosen by
the other. So given a distinguisher D running in time p(k) for V^{P0}_{Π,i}(x̄) with advantage ε, it is simple to write an extractor which runs D, recording its queries to G, picks
one such query (say, q) uniformly, and outputs G(q) ⊕ F0 . Since D can only have an
advantage when it queries r1 , E will pick q = r1 with probability at least 1/p(k) and
in this case correctly outputs f (x0 , x1 ). Thus the Strong Internal Covertness property
is satisfied.
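The extractor in this argument can be written down almost verbatim: wrap G so that queries are recorded, run the distinguisher, pick one recorded query at random, and output G(q) ⊕ F0. A toy Python sketch (the lazily-sampled table standing in for the random oracle, and all names, are ours):

```python
import random
import secrets

class RecordingOracle:
    """Lazily sampled random oracle G that records every query."""
    def __init__(self, out_bits: int = 16):
        self.table, self.queries, self.out_bits = {}, [], out_bits
    def __call__(self, q: int) -> int:
        self.queries.append(q)
        if q not in self.table:
            self.table[q] = secrets.randbits(self.out_bits)
        return self.table[q]

G = RecordingOracle()
r1 = secrets.randbits(16)
f_result = 0x00FF
F0 = G(r1) ^ f_result        # the masked value in Alice's view

def distinguisher(masked: int) -> bool:
    """A distinguisher with advantage must query G at r1; this toy one
    makes a few junk queries and then queries r1 itself."""
    for junk in (1, 2, 3):
        G(junk)
    return G(r1) ^ masked == f_result

G.queries.clear()            # observe only the distinguisher's queries
distinguisher(F0)

# Extractor: choose one recorded query q uniformly; with probability at
# least 1/|queries| it is r1, and then G(q) ^ F0 equals f(x0, x1).
q = random.choice(G.queries)
guess = G(q) ^ F0
assert r1 in G.queries
assert G(r1) ^ F0 == f_result
```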
We can achieve a slightly weaker version of covertness without using random oracles.
Π is said to be a weakly fair covert protocol for the channel B if Π is externally covert,
and has the property that if f is strongly fair, then for every distinguisher Dσ for V^{Pσ}_{Π,i}(x̄) with significant advantage ε, there is a distinguisher D1−σ for V^{P1−σ}_{Π,i}(x̄) with advantage Ω(ε). Thus in a weakly fair covert protocol, we do not guarantee that both
parties get the result, only that if at some point in the protocol, one party can tell
that the other is running the protocol with significant advantage, the same is true for
the other party.
Chapter 8
While this thesis has resolved several of the open questions pertaining to univer-
sal steganography, there are still many interesting open questions about theoretical
steganography. In this section we highlight those that seem most important.
We have shown that for a universal blockwise stegosystem with bounded sample access
to a channel, the optimal rate is bounded above by both the minimum entropy of the
channel and the logarithm of the sample bound. Three general research directions
arise from this result. First, a natural question is what happens to this bound if
we remove the universality and blockwise constraints. A second natural direction to
pursue is the question of efficiently detecting the use of a stegosystem that exceeds
the maximum secure rate. A third interesting question to explore is the relationship
between extractors and stegosystems.
is it secure in a blockwise model, but the rate approaches the Shannon entropy for
any efficiently sampleable channel with entropy bounded by the logarithm of the
security parameter k. Thus it is natural to wonder whether there is a reasonable
security model and a reasonable class of nonuniversally accessible stegosystems which
are provably secure under this model, yet have rate which substantially exceeds that
of the construction in Chapter 6.
We show that any blockwise stegosystem which exceeds the minimum entropy
can be detected by giving a detection algorithm which draws many samples from the
channel. It is an interesting question whether the number of samples required can be
reduced significantly for some channels. It is not hard to see that artificial channels
can be designed for which this is the case using, for instance, a trapdoor permutation
for which the warden knows the trapdoor. However, a more natural example would
be of interest.
The necessary and sufficient conditions for the existence of a public-key stegosystem
constitute an open question. Certainly for a universal stegosystem the necessary and
sufficient condition is the existence of a trapdoor predicate family with domains that
are computationally indistinguishable from a polynomially dense set: as we showed in
Chapter 4, such primitives are sufficient for IND$-CPA public-key encryption; while
on the other hand, the existence of a universal public-key stegosystem implies the
existence of a public-key stegosystem for the uniform channel, which is by itself a
trapdoor predicate family with domains that are computationally indistinguishable
from a set of density 1. Unlike the case with symmetric steganography, however,
we are not aware of a reduction from a stegosystem for an arbitrary channel to a
dense-domain trapdoor predicate family.
problem, Backes and Cachin [7] have introduced the notion of Replayable Chosen
Covertext (RCCA) security, which is identical to sCCA security, with the exception
that the adversary is forbidden to submit covertexts which decode to the challenge
hiddentext. The problem with this approach is that the replay attack seems to be a
viable attack in the real world. Thus it is an interesting question to investigate the
possibility of notions “in-between” sCCA and RCCA.
The most important open question concerning robust steganography is the mis-
match between substitution robustness and the types of attacks perpetrated against
typical proposals for robust steganography. Such attacks include strategies such as
splitting a single document into a series of smaller documents with the same mean-
ing, merging two or more documents into a single document with the same meaning,
and reordering documents in a list. Especially if there is no bound on the length of
sequences to which these operations can be applied, it seems difficult to even write a
general description of the rules such a warden must follow; and although it is reason-
ably straightforward to counteract any single attack in the previous list, composing
several of them with relation-bounded substitutions as well seems to lead to attacks
which are difficult to defend against.
8.4 Covert Computation
In the area of covert computation, this thesis leaves room for improvement and open
problems. For example, can (strongly) fair covert two-party computation secure
against malicious adversaries be satisfied without random oracles? It seems at least
plausible that constructions based on concrete assumptions such as the “knowledge-
of-exponent” assumption or the “generalized BBS” assumption may allow construc-
tion of such protocols, yet the obvious applications always destroy the final covertness
property. A related question is whether covert two-party computation can be based on
general cryptographic assumptions rather than the specific Decisional Diffie-Hellman
assumption used here.
Another open question is that of improving the efficiency of the protocols presented
here, either by designing protocols for specific goals or through adapting efficient
two-party protocols to provide covertness. A possible direction to pursue would be
“optimistic” fairness involving a trusted third party. In this case, though, there is the
question of how the third party could “complete” the computation without revealing
participation.
8.5 Other models
The results of Chapter 3 show that the ability to sample from a channel in our model is
necessary for steganographic communication using that channel. Since in many cases
we do not understand the channel well enough to sample from it, a natural question
is whether there exist models where less knowledge of the distribution is necessary;
such a model will necessarily restrict the adversary’s knowledge of the channel as well.
One intuition is that typical steganographic adversaries are not monitoring the traffic
between a specific pair of individuals in an effort to confirm suspicious behavior, but
are monitoring a high-volume stream of traffic between many points looking for the
“most suspicious” behavior; so stegosystems which could be detected by analyzing
a long sequence of communications might go undetected if only single messages are
analyzed. This type of model is tantalizing because there are unconditionally secure
cryptosystems under various assumptions about adversaries with bounded storage
[18, 50], but it remains an interesting challenge to give a satisfying formal model and
provably secure construction for this scenario.
Bibliography
[2] Luis von Ahn, Manuel Blum and John Langford. Telling Humans and Computers
Apart (Automatically) or How Lazy Cryptographers do AI.
[3] Luis von Ahn and Nicholas J. Hopper. Public-Key Steganography. Submitted to CRYPTO 2003.
[6] Ross J. Anderson and Fabien A. P. Petitcolas. Stretching the Limits of Steganog-
raphy. In: Proceedings of the first International Information Hiding Workshop.
1996.
[7] M. Backes and C. Cachin. Public-Key Steganography with Active Attacks. IACR
e-print archive report 2003/231, 2003.
[9] M. Bellare and P. Rogaway. Random Oracles are Practical. Computer and Com-
munications Security: Proceedings of ACM CCS 93, pages 62–73, 1993.
[12] J. Brassil, S. Low, N. F. Maxemchuk, and L. O'Gorman. Hiding Information in Document Images. In: Conference on Information Sciences and Systems, 1995.
[21] R. Cramer and V. Shoup. Universal Hash Proofs and a Paradigm for Adap-
tive Chosen Ciphertext Secure Public-Key Encryption. Advances in Cryptology:
EUROCRYPT 2002, Springer LNCS 2332, pages 45-64. 2002.
[25] O. Goldreich. Foundations of Cryptography: Basic Tools. Cambridge University
Press, 2001.
[28] O. Goldreich and L.A. Levin. A Hardcore predicate for all one-way functions.
In: Proceedings of 21st STOC, pages 25–32, 1989.
[29] O. Goldreich, S. Micali and A. Wigderson. How to Play any Mental Game.
Nineteenth Annual ACM Symposium on Theory of Computing, pages 218-229.
[31] S. Goldwasser and S. Micali. Probabilistic Encryption & how to play mental
poker keeping secret all partial information. In: Proceedings of the 14th STOC,
pages 365–377, 1982.
[32] D. Gruhl, W. Bender, and A. Lu. Echo Hiding. In: Information Hiding: First
International Workshop, pages 295–315, 1996.
[34] N. Hopper, J. Langford and L. Von Ahn. Provably Secure Steganography. Ad-
vances in Cryptology – Proceedings of CRYPTO ’02, pages 77-92, 2002.
[35] Nicholas J. Hopper, John Langford, and Luis von Ahn. Provably Secure Steganog-
raphy. CMU Tech Report CMU-CS-TR-02-149, 2002.
[36] Russell Impagliazzo and Michael Luby. One-way Functions are Essential for
Complexity Based Cryptography. In: 30th FOCS, November 1989.
[39] J. Katz and M. Yung. Complete characterization of security notions for probabilistic private-key encryption. In: Proceedings of the 32nd STOC, pages 245–254, 2000.
[40] Stefan Katzenbeisser and Fabien A. P. Petitcolas. Information hiding techniques
for steganography and digital watermarking. Artech House Books, 1999.
[41] T. Van Le. Efficient Provably Secure Public Key Steganography. IACR e-print archive report 2003/156, 2003.
[45] M. Naor and B. Pinkas. Efficient Oblivious Transfer Protocols. In: Proceedings of
the 12th Annual ACM/SIAM Symposium on Discrete Algorithms (SODA 2001),
pages 448–457. 2001.
[46] M. Naor, B. Pinkas and R. Sumner. Privacy Preserving Auctions and Mechanism
Design. In: Proceedings, 1999 ACM Conference on Electronic Commerce.
[47] M. Naor and M. Yung. Universal One-Way Hash Functions and their Crypto-
graphic Applications. 21st Symposium on Theory of Computing (STOC 89), pages
33-43. 1989.
[48] M. Naor and M. Yung. Public-key cryptosystems provably secure against chosen
ciphertext attacks. 22nd Symposium on Theory of Computing (STOC 90), pages
427-437. 1990.
[53] L. Reyzin and S. Russell. Simple Stateless Steganography. IACR e-print archive
report 2003/093, 2003.
[54] Phillip Rogaway, Mihir Bellare, John Black and Ted Krovetz. OCB: A Block-Cipher Mode of Operation for Efficient Authenticated Encryption. In: Proceedings of the Eighth ACM Conference on Computer and Communications Security (CCS-8). November 2001.
[55] J. Rompel. One-way functions are necessary and sufficient for secure signatures.
22nd Symposium on Theory of Computing (STOC 90), pages 387-394. 1990.
[58] C.E. Shannon. Communication theory of secrecy systems. In: Bell System Tech-
nical Journal, 28 (1949), pages 656-715.
[59] G.J. Simmons. The Prisoner’s Problem and the Subliminal Channel. In: Pro-
ceedings of CRYPTO ’83. 1984.
[60] L. Welch and E.R. Berlekamp. Error correction of algebraic block codes. US
Patent Number 4,663,470, December 1986.
[63] A. C. Yao. Protocols for Secure Computation. Proceedings of the 23rd IEEE
Symposium on Foundations of Computer Science, 1982, pages 160–164.
[64] A. C. Yao. How to Generate and Exchange Secrets. Proceedings of the 27th IEEE
Symposium on Foundations of Computer Science, 1986, pages 162–167.