Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

Abstract

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to enforce F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.

1. Introduction

What did Claude Monet see as he placed his easel by the bank of the Seine near Argenteuil on a lovely spring day in 1873 (Figure 1, top-left)? A color photograph, had it been invented, may have documented a crisp blue sky and a glassy river reflecting it. Monet conveyed his impression of this same scene through wispy brush strokes and a bright palette.

What if Monet had happened upon the little harbor in Cassis on a cool summer evening (Figure 1, bottom-left)? A brief stroll through a gallery of Monet paintings makes it possible to imagine how he would have rendered the scene: perhaps in pastel shades, with abrupt dabs of paint, and a somewhat flattened dynamic range.

We can imagine all this despite never having seen a side by side example of a Monet painting next to a photo of the scene he painted. Instead we have knowledge of the set of Monet paintings and of the set of landscape photographs. We can reason about the stylistic differences between these
two sets, and thereby imagine what a scene might look like if we were to "translate" it from one set into the other.

In this paper, we present a method that can learn to do the same: capturing special characteristics of one image collection and figuring out how these characteristics could be translated into the other image collection, all in the absence of any paired training examples.

This problem can be more broadly described as image-to-image translation [21], converting an image from one representation of a given scene, x, to another, y, e.g., grayscale to color, image to semantic labels, edge-map to photograph. Years of research in computer vision, image processing, computational photography, and graphics have produced powerful translation systems in the supervised setting, where example image pairs {x, y} are available (Figure 2, left), e.g., [10, 18, 21, 22, 26, 31, 44, 55, 57, 61]. However, obtaining paired training data can be difficult and expensive. For example, only a couple of datasets exist for tasks like semantic segmentation (e.g., [4]), and they are relatively small. Obtaining input-output pairs for graphics tasks like artistic stylization can be even more difficult since the desired output is highly complex, typically requiring artistic authoring. For many tasks, like object transfiguration (e.g., zebra↔horse, Figure 1, top-middle), the desired output is not even well-defined.

We therefore seek an algorithm that can learn to translate between domains without paired input-output examples (Figure 2, right). We assume there is some underlying relationship between the domains – for example, that they are two different renderings of the same underlying scene – and seek to learn that relationship. Although we lack supervision in the form of paired examples, we can exploit supervision at the level of sets: we are given one set of images in domain X and a different set in domain Y. We may train a mapping G : X → Y such that the output ŷ = G(x), x ∈ X, is indistinguishable from images y ∈ Y by an adversary trained to classify ŷ apart from y. In theory, this objective can induce an output distribution over ŷ that matches the empirical distribution p_data(y) (in general, this requires G to be stochastic) [15]. The optimal G thereby translates the domain X to a domain Ŷ distributed identically to Y. However, such a translation does not guarantee that an individual input x and output y are paired up in a meaningful way – there are infinitely many mappings G that will induce the same distribution over ŷ. Moreover, in practice, we have found it difficult to optimize the adversarial objective in isolation: standard procedures often lead to the well-known problem of mode collapse, where all input images map to the same output image and the optimization fails to make progress [14].

These issues call for adding more structure to our objective. Therefore, we exploit the property that translation should be "cycle consistent", in the sense that if we translate, e.g., a sentence from English to French, and then translate it back from French to English, we should arrive back at the original sentence [3]. Mathematically, if we have a translator G : X → Y and another translator F : Y → X, then G and F should be inverses of each other, and both mappings should be bijections. We apply this structural assumption by training both the mappings G and F simultaneously, and adding a cycle consistency loss [64] that encourages F(G(x)) ≈ x and G(F(y)) ≈ y. Combining this loss with adversarial losses on domains X and Y yields our full objective for unpaired image-to-image translation.

We apply our method to a wide range of applications, including collection style transfer, object transfiguration, season transfer and photo enhancement. We also compare against previous approaches that rely either on hand-defined factorizations of style and content, or on shared embedding functions, and show that our method outperforms these baselines. Our code is available at https://github.com/junyanz/CycleGAN. Check out more results at https://junyanz.github.io/CycleGAN/.

Figure 2: Paired training data (left) consists of training examples {x_i, y_i}_{i=1}^N, where the correspondence between x_i and y_i exists [21]. We instead consider unpaired training data (right), consisting of a source set {x_i}_{i=1}^N (x_i ∈ X) and a target set {y_j}_{j=1}^M (y_j ∈ Y), with no information provided as to which x_i matches which y_j.

2. Related work

Generative Adversarial Networks (GANs) [15, 62] have achieved impressive results in image generation [5, 37], image editing [65], and representation learning [37, 42, 35]. Recent methods adopt the same idea for conditional image generation applications, such as text2image [39], image inpainting [36], and future prediction [34], as well as to other domains like videos [53] and 3D data [56]. The key to GANs' success is the idea of an adversarial loss that forces the generated images to be, in principle, indistinguishable from real images. This is particularly powerful for image generation tasks, as this is exactly the objective that much of computer graphics aims to optimize. We adopt an adversarial loss to learn the mapping such that the translated image cannot be distinguished from images in the target domain.
Figure 3: (a) Our model contains two mapping functions G : X → Y and F : Y → X, and associated adversarial discriminators DY and DX. DY encourages G to translate X into outputs indistinguishable from domain Y, and vice versa for DX and F. To further regularize the mappings, we introduce two cycle consistency losses that capture the intuition that if we translate from one domain to the other and back again we should arrive at where we started: (b) forward cycle-consistency loss: x → G(x) → F(G(x)) ≈ x, and (c) backward cycle-consistency loss: y → F(y) → G(F(y)) ≈ y.
Image-to-Image Translation The idea of image-to-image translation goes back at least to Hertzmann et al.'s Image Analogies [18], who employ a non-parametric texture model [9] on a single input-output training image pair. More recent approaches use a dataset of input-output examples to learn a parametric translation function using CNNs, e.g., [31]. Our approach builds on the "pix2pix" framework of Isola et al. [21], which uses a conditional generative adversarial network [15] to learn a mapping from input to output images. Similar ideas have been applied to various tasks such as generating photographs from sketches [43] or from attribute and semantic layouts [23]. However, unlike these prior works, we learn the mapping without paired training examples.

Unpaired Image-to-Image Translation Several other methods also tackle the unpaired setting, where the goal is to relate two data domains, X and Y. Rosales et al. [40] propose a Bayesian framework that includes a prior based on a patch-based Markov random field computed from a source image, and a likelihood term obtained from multiple style images. More recently, CoGAN [30] and cross-modal scene networks [1] use a weight-sharing strategy to learn a common representation across domains. Concurrent to our method, Liu et al. [29] extend this framework with a combination of variational autoencoders [25] and generative adversarial networks. Another line of concurrent work [45, 48, 2] encourages the input and output to share certain "content" features even though they may differ in "style". These methods also use adversarial networks, with additional terms to enforce the output to be close to the input in a predefined metric space, such as class label space [2], image pixel space [45], and image feature space [48].

Unlike the above approaches, our formulation does not rely on any task-specific, predefined similarity function between the input and output, nor do we assume that the input and output have to lie in the same low-dimensional embedding space. This makes our method a general-purpose solution for many vision and graphics tasks. We directly compare against several prior and contemporary approaches in Section 5.1.

Cycle Consistency The idea of using transitivity as a way to regularize structured data has a long history. In visual tracking, enforcing simple forward-backward consistency has been a standard trick for decades [47]. In the language domain, verifying and improving translations via "back translation and reconciliation" is a technique used by human translators [3] (including, humorously, by Mark Twain [50]), as well as by machines [16]. More recently, higher-order cycle consistency has been used in structure from motion [60], 3D shape matching [20], co-segmentation [54], dense semantic alignment [63, 64], and depth estimation [13]. Of these, Zhou et al. [64] and Godard et al. [13] are most similar to our work, as they use a cycle consistency loss as a way of using transitivity to supervise CNN training. In this work, we introduce a similar loss to push G and F to be consistent with each other. Concurrent with our work, in these same proceedings, Yi et al. [58] independently use a similar objective for unpaired image-to-image translation, inspired by dual learning in machine translation [16].

Neural Style Transfer [12, 22, 51, 11] is another way to perform image-to-image translation, which synthesizes a novel image by combining the content of one image with the style of another image (typically a painting) based on matching the Gram matrix statistics of pre-trained deep features. Our main focus, on the other hand, is learning the mapping between two image collections, rather than between two specific images, by trying to capture correspondences between higher-level appearance structures. Therefore, our method can be applied to other tasks, such as painting→photo, object transfiguration, etc., where single-sample transfer methods do not perform well. We compare these two methods in Section 5.2.

Figure 4 (panel titles): input x, generated image G(x), and reconstruction F(G(x)).
3. Formulation
Our goal is to learn mapping functions between two
domains X and Y given training samples {x_i}_{i=1}^N where x_i ∈ X and {y_j}_{j=1}^M where y_j ∈ Y. We denote the data distribution as x ∼ p_data(x) and y ∼ p_data(y). As illus-
trated in Figure 3 (a), our model includes two mappings
G : X → Y and F : Y → X. In addition, we in-
troduce two adversarial discriminators DX and DY , where
DX aims to distinguish between images {x} and translated
images {F (y)}; in the same way, DY aims to discriminate
between {y} and {G(x)}. Our objective contains two types
of terms: adversarial losses [15] for matching the distribu-
tion of generated images to the data distribution in the target
domain; and cycle consistency losses to prevent the learned
mappings G and F from contradicting each other.
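To make these two kinds of terms concrete, the sketch below shows one way the combined objective could be wired up in PyTorch. It is a minimal illustration under stated assumptions, not our released implementation: the generator and discriminator modules (G, F, D_X, D_Y) are placeholders, the cycle weight lambda_cyc is a free parameter, and the adversarial term uses the least-squares form discussed in Section 4.

```python
import torch
import torch.nn as nn

# Placeholder generator/discriminator modules; any image-to-image
# network and patch classifier with matching shapes would do here.
G = nn.Identity()    # G : X -> Y (placeholder)
F = nn.Identity()    # F : Y -> X (placeholder)
D_X = nn.Identity()  # discriminator on domain X (placeholder)
D_Y = nn.Identity()  # discriminator on domain Y (placeholder)

mse = nn.MSELoss()   # least-squares GAN criterion (see Section 4)
l1 = nn.L1Loss()     # cycle consistency uses an L1 penalty
lambda_cyc = 10.0    # weight on the cycle term (see the appendix)

def generator_loss(real_x, real_y):
    """Adversarial + cycle-consistency terms for one batch."""
    fake_y = G(real_x)   # G(x), should fool D_Y
    fake_x = F(real_y)   # F(y), should fool D_X
    rec_x = F(fake_y)    # F(G(x)) ~ x  (forward cycle)
    rec_y = G(fake_x)    # G(F(y)) ~ y  (backward cycle)

    # Adversarial losses: the generators push discriminator outputs toward 1.
    adv = mse(D_Y(fake_y), torch.ones_like(D_Y(fake_y))) + \
          mse(D_X(fake_x), torch.ones_like(D_X(fake_x)))
    # Cycle consistency: reconstructions should match the original inputs.
    cyc = l1(rec_x, real_x) + l1(rec_y, real_y)
    return adv + lambda_cyc * cyc

def discriminator_loss(D, real, fake):
    """Real patches are pushed toward 1, generated patches toward 0."""
    return 0.5 * (mse(D(real), torch.ones_like(D(real))) +
                  mse(D(fake.detach()), torch.zeros_like(D(fake.detach()))))
```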
4. Implementation

Network Architecture We adapt the architecture for our generative networks from Johnson et al. [22], who have shown impressive results for neural style transfer and super-resolution. This network contains two stride-2 convolutions, several residual blocks [17], and two fractionally-strided convolutions with stride 1/2. We use 6 blocks for 128 × 128 images, and 9 blocks for 256 × 256 and higher-resolution training images. Similar to Johnson et al. [22], we use instance normalization [52]. For the discriminator networks we use 70 × 70 PatchGANs [21, 28, 27], which aim to classify whether 70 × 70 overlapping image patches are real or fake. Such a patch-level discriminator architecture has fewer parameters than a full-image discriminator, and can be applied to arbitrarily-sized images in a fully convolutional fashion [21].

Training details We apply two techniques from recent works to stabilize our model training procedure. First, for L_GAN (Equation 1), we replace the negative log likelihood objective with a least-squares loss [33]. This loss is more stable during training and generates higher-quality results. Second, to reduce model oscillation [14], we follow Shrivastava et al. [45] and update the discriminators using a history of generated images rather than the ones produced by the latest generators, keeping an image buffer that stores the 50 previously generated images.
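As a hedged illustration of these two tricks, the snippet below sketches the least-squares criterion and a simple image history buffer. The 50-image pool size follows the text above; the class and function names are placeholders rather than our released code.

```python
import random
import torch

def lsgan_d_loss(d_real, d_fake):
    # Least-squares objective for the discriminator:
    # push scores on real patches toward 1 and on fakes toward 0.
    return 0.5 * (((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean())

def lsgan_g_loss(d_fake):
    # The generator instead pushes its fakes' scores toward 1.
    return ((d_fake - 1.0) ** 2).mean()

class ImageHistoryBuffer:
    """Pool of previously generated images used to update the discriminators."""
    def __init__(self, capacity=50):
        self.capacity = capacity
        self.images = []

    def query(self, image):
        # Until the pool is full, store a copy and return the new image.
        if len(self.images) < self.capacity:
            self.images.append(image.detach().clone())
            return image
        # Otherwise, half the time swap the new image for a stored one.
        if random.random() < 0.5:
            idx = random.randrange(self.capacity)
            old = self.images[idx]
            self.images[idx] = image.detach().clone()
            return old
        return image
```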
5.1. Evaluation

Using the same evaluation datasets and metrics as "pix2pix" [21], we compare our method against several baselines both qualitatively and quantitatively. The tasks include semantic labels↔photo on the Cityscapes dataset [4], and map↔aerial photo on data scraped from Google Maps. We also perform an ablation study on the full loss function.

Figure 5: Different methods for mapping labels↔photos trained on Cityscapes images. From left to right: input, BiGAN/ALI [6, 8], CoGAN [30], feature loss + GAN, SimGAN [45], CycleGAN (ours), pix2pix [21] trained on paired data, and ground truth.

Figure 6: Different methods for mapping aerial photos↔maps on Google Maps. From left to right: input, BiGAN/ALI [6, 8], CoGAN [30], feature loss + GAN, SimGAN [45], CycleGAN (ours), pix2pix [21] trained on paired data, and ground truth.

5.1.1 Metrics

AMT perceptual studies On the map↔aerial photo task, we run "real vs fake" perceptual studies on Amazon Mechanical Turk (AMT) to assess the realism of our outputs. We follow the same perceptual study protocol from Isola et al. [21], except we only gather data from 25 participants per algorithm tested. Participants were shown a sequence of pairs of images, one a real photo or map and one fake (generated by our algorithm or a baseline), and asked to click on the image they thought was real. The first 10 trials of each session were practice, and feedback was given as to whether the participant's response was correct or incorrect. The remaining 40 trials were used to assess the rate at which each algorithm fooled participants. Each session only tested a single algorithm, and participants were only allowed to complete a single session. Note that the numbers we report here are not directly comparable to those in [21], as our ground truth images were processed slightly differently² and the participant pool we tested may be distributed differently from those tested in [21] (due to running the experiment at a different date and time). Therefore, our numbers should only be used to compare our current method against the baselines (which were run under identical conditions), rather than against [21].

² We train all the models on 256 × 256 images, while in pix2pix [21] the model was trained on 256 × 256 patches of 512 × 512 images and run convolutionally on the 512 × 512 images at test time. We choose 256 × 256 in our experiments because many baselines cannot scale up to high-resolution images, and CoGAN cannot be tested fully convolutionally.

FCN score Although perceptual studies may be the gold standard for assessing graphical realism, we also seek an automatic quantitative measure that does not require human experiments. For this we adopt the "FCN score" from [21], and use it to evaluate the Cityscapes labels→photo task. The FCN metric evaluates how interpretable the generated photos are according to an off-the-shelf semantic segmentation algorithm (the fully-convolutional network, FCN, from [31]). The FCN predicts a label map for a generated photo. This label map can then be compared against the input ground truth labels using the standard semantic segmentation metrics described below. The intuition is that if we generate a photo from a label map of "car on road", then we have succeeded if the FCN applied to the generated photo detects "car on road".

Semantic segmentation metrics To evaluate the performance of photo→labels, we use the standard metrics from the Cityscapes benchmark, including per-pixel accuracy, per-class accuracy, and mean class Intersection-Over-Union (Class IOU) [4].
Table 1: AMT "real vs fake" test on maps↔aerial photos at 256 × 256 resolution.

Loss                  Map → Photo (% Turkers labeled real)   Photo → Map (% Turkers labeled real)
CoGAN [30]            0.6% ± 0.5%                             0.9% ± 0.5%
BiGAN/ALI [8, 6]      2.1% ± 1.0%                             1.9% ± 0.9%
SimGAN [45]           0.7% ± 0.5%                             2.6% ± 1.1%
Feature loss + GAN    1.2% ± 0.6%                             0.3% ± 0.2%
CycleGAN (ours)       26.8% ± 2.8%                            23.2% ± 3.4%

Table 2: FCN-scores for different methods, evaluated on Cityscapes labels→photo.

Loss                  Per-pixel acc.   Per-class acc.   Class IOU
CoGAN [30]            0.40             0.10             0.06
BiGAN/ALI [8, 6]      0.19             0.06             0.02
SimGAN [45]           0.20             0.10             0.04
Feature loss + GAN    0.06             0.04             0.01
CycleGAN (ours)       0.52             0.17             0.11
pix2pix [21]          0.71             0.25             0.18

Table 3: Classification performance of photo→labels for different methods on Cityscapes.

Loss                  Per-pixel acc.   Per-class acc.   Class IOU
CoGAN [30]            0.45             0.11             0.08
BiGAN/ALI [8, 6]      0.41             0.13             0.07
SimGAN [45]           0.47             0.11             0.07
Feature loss + GAN    0.50             0.10             0.06
CycleGAN (ours)       0.58             0.22             0.16
pix2pix [21]          0.85             0.40             0.32

Table 4: Ablation study: FCN-scores for different variants of our method, evaluated on Cityscapes labels→photo.

Loss                  Per-pixel acc.   Per-class acc.   Class IOU
Cycle alone           0.22             0.07             0.02
GAN alone             0.51             0.11             0.08
GAN + forward cycle   0.55             0.18             0.12
GAN + backward cycle  0.39             0.14             0.06
CycleGAN (ours)       0.52             0.17             0.11

Table 5: Ablation study: classification performance of photo→labels for different losses, evaluated on Cityscapes.

Loss                  Per-pixel acc.   Per-class acc.   Class IOU
Cycle alone           0.10             0.05             0.02
GAN alone             0.53             0.11             0.07
GAN + forward cycle   0.49             0.11             0.07
GAN + backward cycle  0.01             0.06             0.01
CycleGAN (ours)       0.58             0.22             0.16

5.1.2 Baselines

CoGAN [30] This method learns one GAN generator for domain X and one for domain Y, with tied weights on the first few layers for the shared latent representation. Translation from X to Y can be achieved by finding a latent representation that generates image X and then rendering this latent representation into style Y.
SimGAN [45] Like our method, Shrivastava et al. [45] use an adversarial loss to train a translation from X to Y. The regularization term ||X − G(X)||_1 is used to penalize making large changes at the pixel level.

Feature loss + GAN We also test a variant of SimGAN [45] where the L1 loss is computed over deep image features using a pretrained network (VGG-16 relu4_2 [46]), rather than over RGB pixel values. Computing distances in deep feature space, like this, is also sometimes referred to as using a "perceptual loss" [7, 22].

BiGAN/ALI [8, 6] Unconditional GANs [15] learn a generator G : Z → X that maps random noise Z to images X. BiGAN [8] and ALI [6] propose to also learn the inverse mapping function F : X → Z. Though they were originally designed for mapping a latent vector z to an image x, we implement the same objective for mapping a source image x to a target image y.

pix2pix [21] We also compare against pix2pix [21], which is trained on paired data, to see how close we can get to this "upper bound" without using any paired training data.

For a fair comparison, we implement all the baselines using the same architecture and details as our method, except for CoGAN [30]. CoGAN builds on generators that produce images from a shared latent representation, which is incompatible with our image-to-image network. We use the public implementation of CoGAN instead³.

³ https://github.com/mingyuliutw/CoGAN

5.1.3 Comparison against baselines

As can be seen in Figure 5 and Figure 6, we were unable to achieve compelling results with any of the baselines. Our method, on the other hand, is able to produce translations that are often of similar quality to the fully supervised pix2pix.

Table 1 reports performance on the AMT perceptual realism task. Here, we see that our method can fool participants on around a quarter of trials, in both the maps→aerial photos direction and the aerial photos→maps direction, at 256 × 256 resolution⁴. All baselines almost never fooled participants.

Table 2 assesses the performance of the labels→photo task on Cityscapes, and Table 3 assesses the opposite mapping (photos→labels). In both cases, our method again outperforms the baselines.

⁴ We also train CycleGAN and pix2pix at 512 × 512 resolution, and observe comparable performance: maps→aerial photos: CycleGAN 37.5% ± 3.6% and pix2pix 33.9% ± 3.1%; aerial photos→maps: CycleGAN 16.5% ± 4.1% and pix2pix 8.5% ± 2.6%.

5.1.4 Analysis of the loss function

In Table 4 and Table 5, we compare against ablations of our full loss. Removing the GAN loss substantially degrades results, as does removing the cycle-consistency loss. We therefore conclude that both terms are critical to our results. We also evaluate our method with the cycle loss in only one direction: GAN + forward cycle loss E_x∼p_data(x)[||F(G(x)) − x||_1], or GAN + backward cycle loss E_y∼p_data(y)[||G(F(y)) − y||_1] (Equation 2), and find that it often incurs training instability and causes mode collapse, especially for the direction of the mapping that was removed. Figure 7 shows several qualitative examples.

Figure 7: Different variants of our method for mapping labels↔photos trained on Cityscapes. From left to right: input, cycle-consistency loss alone, adversarial loss alone, GAN + forward cycle-consistency loss (F(G(x)) ≈ x), GAN + backward cycle-consistency loss (G(F(y)) ≈ y), CycleGAN (our full method), and ground truth. Both Cycle alone and GAN + backward fail to produce images similar to the target domain. GAN alone and GAN + forward suffer from mode collapse, producing identical label maps regardless of the input photo.
5.1.5 Image reconstruction quality

In Figure 4, we show a few random samples of the reconstructed images F(G(x)). We observed that the reconstructed images were very close to the original inputs x, at both training and testing time, even in cases where one domain represents significantly more diverse information, such as map↔aerial photos.

5.1.6 Additional results on paired datasets

Figure 8: Example results of CycleGAN on paired datasets used in "pix2pix" [21], such as architectural labels↔photos and edges↔shoes.

5.2. Applications

We demonstrate our method on several applications where paired training data does not exist. Please refer to the appendix (Section 7) for more details about the datasets. We observe that translations on training data are often more appealing than those on test data, and full results of all applications on both training and test data can be viewed on our project website.

Collection style transfer (Figure 10 and Figure 11) We train the model on landscape photographs downloaded from Flickr and WikiArt. Note that unlike recent work on "neural style transfer" [12], our method learns to mimic the style of an entire collection of artworks, rather than transferring the style of a single selected piece of art. Therefore, we can learn to generate photos in the style of, e.g., Van Gogh, rather than just in the style of Starry Night. The size of the dataset for each artist/style was 526, 1073, 400, and 563 for Cezanne, Monet, Van Gogh, and Ukiyo-e, respectively.

Object transfiguration (Figure 13) The model is trained to translate one object class from ImageNet [41] to another (each class contains around 1000 training images). Turmukhambetov et al. [49] propose a subspace model to translate one object into another object of the same category, while our method focuses on object transfiguration between two visually similar categories.
Season transfer (Figure 13) The model is trained on 854 winter photos and 1273 summer photos of Yosemite downloaded from Flickr.

Photo generation from paintings (Figure 12) For painting→photo, we find that it is helpful to introduce an additional loss to encourage the mapping to preserve color composition between the input and output. In particular, we adopt the technique of Taigman et al. [48] and regularize the generator to be near an identity mapping when real samples of the target domain are provided as the input to the generator, i.e., L_identity(G, F) = E_y∼p_data(y)[||G(y) − y||_1] + E_x∼p_data(x)[||F(x) − x||_1].
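A minimal sketch of how this identity term could be added on top of the losses in Section 3 is shown below; G, F, and the 0.5λ weighting (see the appendix) stand in for whatever generators and loss weights are in use.

```python
import torch.nn as nn

l1 = nn.L1Loss()

def identity_loss(G, F, real_x, real_y, lambda_cyc=10.0, lambda_idt=0.5):
    # When fed a real sample of its *target* domain, each generator
    # should behave like an identity map: G(y) ~ y and F(x) ~ x.
    loss = l1(G(real_y), real_y) + l1(F(real_x), real_x)
    # In our experiments this term is weighted by 0.5 * lambda,
    # where lambda is the cycle-consistency weight.
    return lambda_idt * lambda_cyc * loss
```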
Without L_identity, the generators G and F are free to change the tint of input images when there is no need to. For example, when learning the mapping between Monet's paintings and Flickr photographs, the generator often maps paintings of daytime to photographs taken during sunset, because such a mapping may be equally valid under the adversarial loss and cycle consistency loss. The effect of this identity mapping loss is shown in Figure 9.

Figure 9: The effect of the identity mapping loss on Monet's paintings→photos. From left to right: input paintings, CycleGAN without the identity mapping loss, CycleGAN with the identity mapping loss. The identity mapping loss helps preserve the color of the input paintings.

In Figure 12, we show additional results translating Monet's paintings to photographs. This figure and Figure 9 show results on paintings that were included in the training set, whereas for all other experiments in the paper, we only evaluate and show test set results. Because the training set does not include paired data, coming up with a plausible translation for a training set painting is a nontrivial task. Indeed, since Monet is no longer able to create new paintings, generalization to unseen, "test set" paintings is not a pressing problem.

Photo enhancement (Figure 14) We show that our method can be used to generate photos with a shallower depth of field. We train the model on flower photos downloaded from Flickr. The source domain consists of photos of flowers taken by smartphones, which usually have a deep depth of field due to the small aperture. The target domain contains photos captured by DSLRs with a larger aperture. Our model successfully generates photos with a shallower depth of field from the photos taken by smartphones.

Comparison with Gatys et al. [12] In Figure 15, we compare our results with neural style transfer [12] on photo stylization. For each row, we first use two representative artworks as the style images for [12]. Our method, on the other hand, is able to produce photos in the style of the entire collection. To compare against neural style transfer on an entire collection, we compute the average Gram matrix across the target domain and use this matrix to transfer the "average style" with [12]. Figure 16 demonstrates similar comparisons for other translation tasks. We observe that Gatys et al. [12] requires finding target style images that closely match the desired output, but still often fails to produce photo-realistic results, while our method succeeds in generating natural-looking results, similar to the target domain.

6. Limitations and Discussion

Although our method can achieve compelling results in many cases, the results are far from uniformly positive. Several typical failure cases are shown in Figure 17. On translation tasks that involve color and texture changes, like many of those reported above, the method often succeeds. We have also explored tasks that require geometric changes, with little success. For example, on the task of dog→cat transfiguration, the learned translation degenerates to making minimal changes to the input (Figure 17). This might be caused by our generator architecture choices, which are tailored for good performance on appearance changes. Handling more varied and extreme transformations, especially geometric changes, is an important problem for future work.

Some failure cases are caused by the distribution characteristics of the training datasets. For example, the horse→zebra example (Figure 17, right) became confused because our model was trained on the wild horse and zebra synsets of ImageNet, which do not contain images of a person riding a horse or zebra.

We also observe a lingering gap between the results achievable with paired training data and those achieved by our unpaired method. In some cases, this gap may be very hard – or even impossible – to close: for example, our method sometimes permutes the labels for tree and building in the output of the photos→labels task. Resolving this ambiguity may require some form of weak semantic supervision. Integrating weak or semi-supervised data may lead to substantially more powerful translators, still at a fraction of the annotation cost of fully-supervised systems.

Nonetheless, in many cases completely unpaired data is plentifully available and should be made use of. This paper pushes the boundaries of what is possible in this "unsupervised" setting.
Acknowledgments: We thank Aaron Hertzmann, Shiry
Ginosar, Deepak Pathak, Bryan Russell, Eli Shechtman,
Richard Zhang, and Tinghui Zhou for many helpful com-
ments. This work was supported in part by NSF SMA-
1514512, NSF IIS-1633310, a Google Research Award, In-
tel Corp, and hardware donations from NVIDIA. JYZ is
supported by the Facebook Graduate Fellowship and TP is
supported by the Samsung Scholarship. The photographs
used for style transfer were taken by AE, mostly in France.
Figure 10: Collection style transfer I: we transfer input images into the artistic styles of Monet, Van Gogh, Cezanne, and
Ukiyo-e. Please see our website for additional examples.
Figure 11: Collection style transfer II: we transfer input images into the artistic styles of Monet, Van Gogh, Cezanne, Ukiyo-e.
Please see our website for additional examples.
Figure 12: Relatively successful results on mapping Monet’s paintings to photographs. Please see our website for additional
examples.
Figure 13: Our method applied to several translation problems. These images are selected as relatively successful results
– please see our website for more comprehensive and random results. In the top two rows, we show results on object
transfiguration between horses and zebras, trained on 939 images from the wild horse class and 1177 images from the zebra
class in ImageNet [41]. Also check out the horse→zebra demo video at https://youtu.be/9reHvktowLY. The
middle two rows show results on season transfer, trained on winter and summer photos of Yosemite from Flickr. In the
bottom two rows, we train our method on 996 apple images and 1020 navel orange images from ImageNet.
Figure 14: Photo enhancement: when mapping from a set of iPhone snaps to professional DSLR photographs, the system often
learns to produce shallow focus. Here we show some of the most successful results in our test set – average performance is
considerably worse. Please see our website for more comprehensive and random examples.
Figure 15: We compare our method with neural style transfer [12] on photo stylization (photo→Ukiyo-e and photo→Cezanne). Left to right: input image, results from [12] using two different representative artworks as style images, results from [12] using the entire collection of the artist, and CycleGAN (ours).
Figure 16: We compare our method with neural style transfer [12] on various applications. From top to bottom:
apple→orange, horse→zebra, and Monet→photo. Left to right: input image, results from [12] using two different images as
style images, results from [12] using all the images from the target domain, and CycleGAN (ours).
Figure 17: Typical failure cases of our method. The panels shown include photo→Ukiyo-e, photo→Van Gogh, iPhone photo→DSLR photo, and ImageNet "wild horse" training images. Please see our website for more comprehensive results.
References

[1] Y. Aytar, L. Castrejon, C. Vondrick, H. Pirsiavash, and A. Torralba. Cross-modal scene networks. arXiv preprint arXiv:1610.09003, 2016.
[2] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. arXiv preprint arXiv:1612.05424, 2016.
[3] R. W. Brislin. Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3):185–216, 1970.
[4] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, 2016.
[5] E. L. Denton, S. Chintala, R. Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, pages 1486–1494, 2015.
[6] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
[7] A. Dosovitskiy and T. Brox. Generating images with perceptual similarity metrics based on deep networks. In NIPS, pages 658–666, 2016.
[8] V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
[9] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In ICCV, volume 2, pages 1033–1038. IEEE, 1999.
[10] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In ICCV, pages 2650–2658, 2015.
[11] L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman. Preserving color in neural artistic style transfer. arXiv preprint arXiv:1606.05897, 2016.
[12] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In CVPR, 2016.
[13] C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In CVPR, 2017.
[14] I. Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
[16] D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T. Liu, and W.-Y. Ma. Dual learning for machine translation. In NIPS, pages 820–828, 2016.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
[18] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In SIGGRAPH, pages 327–340. ACM, 2001.
[19] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[20] Q.-X. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Computer Graphics Forum, volume 32, pages 177–186. Wiley Online Library, 2013.
[21] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
[22] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, pages 694–711. Springer, 2016.
[23] L. Karacan, Z. Akata, A. Erdem, and E. Erdem. Learning to generate images of outdoor scenes from attributes and semantic layouts. arXiv preprint arXiv:1612.00215, 2016.
[24] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[25] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In ICLR, 2014.
[26] P.-Y. Laffont, Z. Ren, X. Tao, C. Qian, and J. Hays. Transient attributes for high-level understanding and editing of outdoor scenes. ACM Transactions on Graphics (TOG), 33(4):149, 2014.
[27] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.
[28] C. Li and M. Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. In ECCV, 2016.
[29] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv preprint arXiv:1703.00848, 2017.
[30] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In NIPS, pages 469–477, 2016.
[31] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, pages 3431–3440, 2015.
[32] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
[33] X. Mao, Q. Li, H. Xie, R. Y. Lau, and Z. Wang. Multi-class generative adversarial networks with the L2 loss function. arXiv preprint arXiv:1611.04076, 2016.
[34] M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction beyond mean square error. In ICLR, 2016.
[35] M. F. Mathieu, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In NIPS, pages 5040–5048, 2016.
[36] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
[37] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[38] R. Tyleček and R. Šára. Spatial pattern templates for recognition of objects with regular structure. In Proc. GCPR, Saarbrücken, Germany, 2013.
[39] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
[40] R. Rosales, K. Achan, and B. J. Frey. Unsupervised image translation. In ICCV, pages 472–478, 2003.
[41] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
[42] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498, 2016.
[43] P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays. Scribbler: Controlling deep image synthesis with sketch and color. In CVPR, 2017.
[44] Y. Shih, S. Paris, F. Durand, and W. T. Freeman. Data-driven hallucination of different times of day from a single outdoor photo. ACM Transactions on Graphics (TOG), 32(6):200, 2013.
[45] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. arXiv preprint arXiv:1612.07828, 2016.
[46] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[47] N. Sundaram, T. Brox, and K. Keutzer. Dense point trajectories by GPU-accelerated large displacement optical flow. In ECCV, pages 438–451. Springer, 2010.
[48] Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200, 2016.
[49] D. Turmukhambetov, N. D. Campbell, S. J. Prince, and J. Kautz. Modeling object appearance using context-conditioned component analysis. In CVPR, pages 4156–4164, 2015.
[50] M. Twain. The Jumping Frog: in English, then in French, and then Clawed Back into a Civilized Language Once More by Patient, Unremunerated Toil. 1903.
[51] D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. In ICML, 2016.
[52] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
[53] C. Vondrick, H. Pirsiavash, and A. Torralba. Generating videos with scene dynamics. In NIPS, pages 613–621, 2016.
[54] F. Wang, Q. Huang, and L. J. Guibas. Image co-segmentation via consistent functional maps. In ICCV, pages 849–856, 2013.
[55] X. Wang and A. Gupta. Generative image modeling using style and structure adversarial networks. In ECCV, 2016.
[56] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS, pages 82–90, 2016.
[57] S. Xie and Z. Tu. Holistically-nested edge detection. In ICCV, 2015.
[58] Z. Yi, H. Zhang, P. Tan, and M. Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In ICCV, 2017.
[59] A. Yu and K. Grauman. Fine-grained visual comparisons with local learning. In CVPR, pages 192–199, 2014.
[60] C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop constraints. In CVPR, pages 1426–1433. IEEE, 2010.
[61] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, 2016.
[62] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.
[63] T. Zhou, Y. Jae Lee, S. X. Yu, and A. A. Efros. FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200, 2015.
[64] T. Zhou, P. Krahenbuhl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3D-guided cycle consistency. In CVPR, pages 117–126, 2016.
[65] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros. Generative visual manipulation on the natural image manifold. In ECCV, 2016.
7. Appendix

7.1. Training details

All the networks (except edges↔shoes) were trained from scratch, with a learning rate of 0.0002, for 200 epochs. In practice, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G. We keep the same learning rate for the first 100 epochs and linearly decay the rate to zero over the next 100 epochs. Weights were initialized from a Gaussian distribution with mean 0 and standard deviation 0.02.
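A rough sketch of this schedule and initialization is shown below; the 100 + 100 epoch split and the Gaussian initialization follow the text above, while the choice of optimizer and its hyperparameters are illustrative assumptions, as they are not specified in this excerpt.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Gaussian initialization with mean 0 and std 0.02 for conv layers.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

def make_lr_scheduler(optimizer, n_epochs=100, n_epochs_decay=100):
    # Constant learning rate for the first n_epochs, then linear decay
    # to zero over the following n_epochs_decay epochs.
    def lr_lambda(epoch):
        return 1.0 - max(0, epoch - n_epochs) / float(n_epochs_decay)
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

# Example usage with a placeholder network and an assumed optimizer:
net = nn.Conv2d(3, 3, 3)
net.apply(init_weights)
optimizer = torch.optim.Adam(net.parameters(), lr=0.0002)
scheduler = make_lr_scheduler(optimizer)
```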
Cityscapes label↔photo 2975 training images from the Cityscapes training set [4] with image size 128 × 128. We used the Cityscapes val set for testing.

Maps↔aerial photograph 1096 training images scraped from Google Maps [21] with image size 256 × 256. Images were sampled from in and around New York City. Data was then split into train and test about the median latitude of the sampling region (with a buffer region added to ensure that no training pixel appeared in the test set).

Architectural facades labels↔photo 400 training images from [38].

Edges→shoes Around 50,000 training images from the UT Zappos50K dataset [59]. The model was trained for 5 epochs with a learning rate of 0.0002.

Horse↔Zebra and Apple↔Orange The images for each class were downloaded from ImageNet using the keywords wild horse, zebra, apple, and navel orange. The images were scaled to 256 × 256 pixels. The training set size of each class was horse: 939, zebra: 1177, apple: 996, orange: 1020.

Summer↔Winter Yosemite The images were downloaded using the Flickr API with the tag yosemite and the datetaken field. Black-and-white photos were pruned. The images were scaled to 256 × 256 pixels. The training set size of each class was summer: 1273, winter: 854.

Photo↔Art for style transfer The art images were downloaded from Wikiart.org by crawling. Some artworks that were sketches or too obscene were pruned by hand. The photos were downloaded from Flickr using the combination of tags landscape and landscapephotography. Black-and-white photos were pruned. The images were scaled to 256 × 256 pixels. The training set size of each class was Monet: 1074, Cezanne: 584, Van Gogh: 401, Ukiyo-e: 1433, Photographs: 6853. The Monet dataset was particularly pruned to include only landscape paintings, and the Van Gogh dataset included only his later works that represent his most recognizable artistic style.

Monet's paintings→photos In order to achieve high resolution while conserving memory, random square crops of the rectangular images were used for training. To generate results, images of width 512 pixels with the correct aspect ratio were passed to the generator. The weight for the identity mapping loss was 0.5λ, where λ was the weight for the cycle consistency loss, and we set λ = 10.

Flower photo enhancement Flower images taken on iPhones were downloaded from Flickr by searching for photos taken by an Apple iPhone 5, 5s, or 6, with the search text flower. DSLR images with shallow depth of field were also downloaded from Flickr with the search tag flower, dof. The images were scaled to width 360 pixels. The identity mapping loss of weight 0.5λ was used. The training set sizes of the smartphone and DSLR datasets were 1813 and 3326, respectively.

7.2. Network architectures

Our code and models are available at https://github.com/junyanz/CycleGAN. We also provide a PyTorch implementation at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.

Generator architectures We adapt our architectures from Johnson et al. [22]. We use 6 residual blocks for 128 × 128 training images, and 9 residual blocks for 256 × 256 or higher-resolution training images. Below, we follow the naming convention used in Johnson et al.'s GitHub repository⁵.

⁵ https://github.com/jcjohnson/

Let c7s1-k denote a 7 × 7 Convolution-InstanceNorm-ReLU layer with k filters and stride 1. dk denotes a 3 × 3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Reflection padding was used to reduce artifacts. Rk denotes a residual block that contains two 3 × 3 convolutional layers with the same number of filters on both layers. uk denotes a 3 × 3 fractional-strided Convolution-InstanceNorm-ReLU layer with k filters and stride 1/2.

The network with 6 residual blocks consists of:
c7s1-32,d64,d128,R128,R128,R128,R128,R128,R128,u64,u32,c7s1-3

The network with 9 residual blocks consists of:
c7s1-32,d64,d128,R128,R128,R128,R128,R128,R128,R128,R128,R128,u64,u32,c7s1-3
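A hedged PyTorch sketch of this generator specification follows. It uses the c7s1-k/dk/Rk/uk blocks as described above; details not stated in the text, such as the final tanh and the exact padding of the resampling layers, are assumptions.

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Rk: two 3x3 conv layers with k filters and a skip connection."""
    def __init__(self, k):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(k, k, 3), nn.InstanceNorm2d(k), nn.ReLU(True),
            nn.ReflectionPad2d(1), nn.Conv2d(k, k, 3), nn.InstanceNorm2d(k),
        )

    def forward(self, x):
        return x + self.block(x)

def resnet_generator(n_blocks=9, in_ch=3, out_ch=3):
    """c7s1-32, d64, d128, n_blocks x R128, u64, u32, c7s1-3."""
    layers = [  # c7s1-32
        nn.ReflectionPad2d(3), nn.Conv2d(in_ch, 32, 7), nn.InstanceNorm2d(32), nn.ReLU(True),
        # d64, d128: stride-2 downsampling convolutions
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.InstanceNorm2d(64), nn.ReLU(True),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.InstanceNorm2d(128), nn.ReLU(True),
    ]
    layers += [ResnetBlock(128) for _ in range(n_blocks)]  # R128 x n_blocks
    layers += [  # u64, u32: stride-1/2 (fractionally-strided) convolutions
        nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
        nn.InstanceNorm2d(64), nn.ReLU(True),
        nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
        nn.InstanceNorm2d(32), nn.ReLU(True),
        # c7s1-3 followed by tanh (assumed) to produce an image in [-1, 1]
        nn.ReflectionPad2d(3), nn.Conv2d(32, out_ch, 7), nn.Tanh(),
    ]
    return nn.Sequential(*layers)
```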
Discriminator architectures For the discriminator networks, we use 70 × 70 PatchGANs [21]. Let Ck denote a 4 × 4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, we apply a convolution to produce a 1-dimensional output. We do not use InstanceNorm for the first C64 layer. We use leaky ReLUs with a slope of 0.2. The discriminator architecture is:
C64-C128-C256-C512
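And a matching sketch of the PatchGAN discriminator; the stride and padding of the final scoring layer are not fully specified above, so treat them as assumptions.

```python
import torch.nn as nn

def patchgan_discriminator(in_ch=3):
    """C64-C128-C256-C512, then a 1-channel convolution of patch scores."""
    def C(k_in, k_out, norm=True):
        layers = [nn.Conv2d(k_in, k_out, 4, stride=2, padding=1)]
        if norm:  # no InstanceNorm on the first C64 layer
            layers.append(nn.InstanceNorm2d(k_out))
        layers.append(nn.LeakyReLU(0.2, True))
        return layers

    return nn.Sequential(
        *C(in_ch, 64, norm=False),   # C64
        *C(64, 128),                 # C128
        *C(128, 256),                # C256
        *C(256, 512),                # C512
        # Final convolution maps features to a single-channel map of
        # real/fake scores, one per overlapping image patch.
        nn.Conv2d(512, 1, 4, stride=1, padding=1),
    )
```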