companies that make camera processing chips. This secrecy makes it hard to insert open source components into the value chain because these chip makers won't reveal their hardware interfaces.

The tangle of underlying patents, some of them quite broad, also constitutes a barrier to entry for new camera companies. When a patent war erupts, as it has between Nokia, Google, and Apple in the mobile phone space, it typically ends with negotiations and cross-licensing of patents, further raising the entry barrier. Starting a new camera company isn't impossible, but profit margins are razor-thin, partly due to licensing fees. So, unless you introduce technology that lets you charge significantly more for your camera, making money is difficult.

Hardware versus Software
Second, traditional camera manufacturers are primarily hardware, not software, companies. Of course, digital cameras contain a tremendous amount of software, but these companies treat software as a necessity, not a point of departure. So, computational photography represents a threat because its value comes largely from algorithms. Frédo Durand likes to say that "software is the next optics"—meaning that future optical systems will be hybrids of glass and image processing. Besides placing more emphasis on software, computational photography results are being published in journals and either placed in the public domain or patented by universities, which often issue nonexclusive licenses. These trends run counter to the camera industry's hardware-intensive, secrecy-based structure.

Traditional camera companies are also uncomfortable with Internet ecosystems, which typically involve multiple vendors, permit (or encourage) user-generated content, and thrive or falter on the quality of their user interfaces. It's worth noting that, as of this writing, not a single point-and-shoot or single-lens reflex (SLR) camera offers a 3G connection (although a few offer Wi-Fi). This will change soon, driven by market pressure from the increasingly powerful and connected cameras on mobile devices. Also, no traditional camera manufacturer runs a photo-sharing site with significant penetration into Euro-American markets, although several are trying (for example, Nikon's Picturetown or Kodak's Gallery).

Branding and Conservatism
Third, traditional camera companies are inherently conservative. For example, every component in a Nikon or Canon SLR is treated by the company as reflecting on the brand's quality as a whole. So, new technologies are refined for years before they're introduced in a commercial product. This strategy yields reliable products but slow innovation.

The open source software community advocates exactly the opposite strategy—"release early, release often."2 This strategy yields fast innovation and (eventually) high quality, as has been proven by Linux, the Apache Web server, the Thunderbird mail client, and most iPhone applications, even though the latter are not open source. When I proposed to a prominent camera manufacturer recently that they open their platform to user-generated plug-ins, they worried that if a photographer took a bad picture using a plug-in, he or she might return the camera to the store for warranty repair. Although this attitude was understandable 20 years ago, it's now antiquated; iPhone users know that if a third-party application crashes, it isn't Apple's fault.

As an example of this conservatism, although algorithms for HDR imaging have existed since the mid-1990s, no camera manufacturer offers an HDR camera, except for a few two-exposure models from Fuji and Sony. Instead, these manufacturers have been locked in a "megapixel war," leading to cameras with more pixels than most consumers need. This war is finally winding down, so companies are casting about for a new feature on which to compete. Once one of them offers such a feature (which might be HDR imaging), the rest will join in. They'll compete fiercely on that feature, to the exclusion of others that might be ready for commercialization and could be useful to consumers.

Research Gaps
Finally, while published computational photography techniques might appear ready for commercialization, key steps are sometimes missing. Although HDR imaging has a long history, the research community has never addressed the question of automatically deciding which exposures to capture—that is, metering for HDR. This omission undoubtedly arises from the lack of a programmable camera with access to a light meter.
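To make the missing step concrete: on a programmable camera, a first-cut HDR metering routine might start from the light meter's suggested exposure and extend the bracket outward until the viewfinder histogram shows few clipped pixels at either end. The sketch below is my own illustration, not a published algorithm; the inputs and the two-stops-per-step heuristic are assumptions.

#include <algorithm>
#include <vector>

// Hypothetical sketch: choose an exposure bracket for HDR capture.
// meteredUs is the meter's suggested exposure in microseconds;
// clippedHigh/clippedLow are the fractions of viewfinder pixels
// blown out or crushed to black at that exposure.
std::vector<int> meterForHdr(int meteredUs, double clippedHigh, double clippedLow) {
    std::vector<int> exposures = { meteredUs };
    int e = meteredUs;
    // Add shorter exposures until highlights fit. Crude heuristic:
    // assume the clipped fraction falls ~4x per two-stop step.
    for (double h = clippedHigh; h > 0.01 && e > 125; h /= 4) {
        e /= 4;
        exposures.push_back(e);
    }
    e = meteredUs;
    // Add longer exposures until shadows rise above the noise floor.
    for (double l = clippedLow; l > 0.01 && e < 1000000; l /= 4) {
        e *= 4;
        exposures.push_back(e);
    }
    std::sort(exposures.begin(), exposures.end());
    return exposures;
}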
Figure 1. Two student projects with programmable cameras. (a) Abe Davis's system captured, transmitted, and displayed a 4D light field. (b) Derek Chen mounted five camera phones facing backward on his car's ceiling, creating a virtual rearview mirror with no blind spots.
In addition, published techniques often aren't robust enough for everyday photography. Flash/no-flash imaging is still an active research area partly because existing techniques produce visible artifacts in many common photographic situations. Again, I believe progress has been slow in these areas for lack of a portable, programmable camera.

Other Problems
Although most SLR cameras offer a software development kit (SDK), these SDKs give you no more control than the buttons on the camera. You can change the aperture, shutter speed, and ISO, but you can't change the metering or focusing algorithms or modify the pipeline that performs demosaicing, white balancing, denoising, sharpening, and image compression. Hackers have managed to run scripts on some cameras (see http://chdk.wikia.com/wiki/CHDK_for_Dummies), but these scripts mainly just fiddle with the user interface.

Another potential source of programmable cameras is development kits for the processing chips embedded in all cameras. But, except for Texas Instruments' OMAP (Open Multimedia Application Platform), on which our cameras are based, these kits (and the hardware interfaces to the underlying chips) are either completely secret (as with Canon and Nikon) or carefully protected by nondisclosure agreements (as with Zoran and Ambarella).

Of course, you can always buy a machine vision camera (from Point Grey, Elphel, or others) and program everything yourself, but you won't enjoy trekking to Everest Base Camp tethered to a laptop.

Frankencamera: An Architecture for Programmable Cameras
My laboratory's interest in building programmable cameras grew out of computational photography courses we've offered at Stanford since 2004. In these courses, I found it frustrating that students could perform computations on photographs but couldn't perform them in a camera or use these computations' results to control the camera. I raised this Achilles' heel of computational photography in a town hall meeting at the 2005 symposium I mentioned earlier. But, at the time, it wasn't clear how to address it.

The pivotal event for us was a 2006 visit to my laboratory by Kari Pulli, a Nokia research manager. He pointed out that over the past five years, the cameras in cell phones had dramatically improved in resolution, optical quality, and photographic functionality. Moreover, camera phones offer features that dedicated cameras don't—wireless connectivity, a high-resolution display, 3D graphics, and high-quality audio. Perhaps most important, these platforms run real operating systems, which vendors have begun opening to third-party developers. With Nokia funding, Mark Horowitz (chair of Stanford's Department of Electrical Engineering), Kari, and I began developing computational photography applications for commercially available cell phones.

Early Experiments
Among our early experiments in this area was a real-time, in-camera algorithm for aligning successive frames captured by a Nokia N95 smartphone's video camera. Real-time image alignment is a low-level tool with many immediate applications, such as automated panorama capture and frame-averaged low-light photography.3 In my spring 2008 computational photography course, we loaned N95s to every student. The resulting projects (see http://graphics.stanford.edu/courses/cs448a-08-spring) demonstrated the value of a programmable camera platform and the added value of having that platform portable and self-powered.

For example, Abe Davis developed a system for capturing, transmitting, and displaying a 4D light field (see Figure 1a). He waved a phone around a physical object to capture a light field. He then transmitted the light field to a second phone, on which you could view it by waving that phone. The second phone computed its pose in real time using its camera, displaying the appropriate slice from the light field so that the object appeared stationary behind the viewfinder.
Derek Chen mounted five N95s facing backward on his car's ceiling (see Figure 1b). He combined the video streams to form a synthetic aperture picture that could, in principle, be displayed live on the rearview mirror. Because the aperture's baseline was large, Chen could blur out the posts supporting the car's roof, thereby providing a virtual rearview mirror with no blind spots. In the experiment shown in this figure, he blurs out wide strips of blue tape arranged on the rear window in a way that partly obscured his view. Note that you can barely see the tape in these synthetic pictures.

Figure 2. A gallery of Frankencameras. (a) Frankencamera F1, (b) Frankencamera F2, and (c) a Nokia N900 F—a retail smartphone with a custom software stack.

Frankencamera F2 (see Figure 2b) employed a Texas Instruments OMAP 3 system-on-a-chip mounted on a Mistral EVM (evaluation module). The Mistral has an integral touch screen display, thus solving the bandwidth problem, letting us encase the entire camera in a laser-cut plastic body. Attached to the Mistral is the Elphel 353's sensor tile and hence the same Aptina sensor.

The third camera (see Figure 2c) was a retail Nokia N900 smartphone with a Toshiba ET8EK8 5-megapixel CMOS (complementary metal-oxide semiconductor) sensor and a custom software stack.
Our architecture lets an application request a burst of images having different exposures, ISOs, focus settings, and even different spatial resolutions or regions of interest in the field of view. This, in turn, facilitates many computational photography algorithms.

In the end, our architecture consisted of a hardware specification, a software stack based on Linux, and FCam, an API with bindings for C++.
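To give a flavor of the API: a two-exposure burst on the N900 might look roughly like this. The sketch follows the spirit of the published FCam API,4 but treat the exact names and signatures as approximate rather than authoritative.

#include <vector>
#include <FCam/N900.h> // platform header, per the FCam distribution

int main() {
    FCam::N900::Sensor sensor;

    // Two shots with per-frame parameters: a short and a long exposure.
    std::vector<FCam::Shot> burst(2);
    burst[0].exposure = 10000;  // microseconds
    burst[0].gain = 1.0f;       // unity gain (base ISO)
    burst[0].image = FCam::Image(640, 480, FCam::UYVY); // where pixels go
    burst[1].exposure = 80000;  // three stops more light
    burst[1].gain = 1.0f;
    burst[1].image = FCam::Image(640, 480, FCam::UYVY);

    sensor.capture(burst);      // enqueue the whole burst at once

    // Retrieve the frames; each is tagged with the settings actually used.
    FCam::Frame shortExp = sensor.getFrame();
    FCam::Frame longExp  = sensor.getFrame();
    // ...fuse shortExp and longExp into an HDR result here...
    return 0;
}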
To demonstrate our architecture's viability, we implemented it on the Frankencamera F2 and our modified Nokia N900. We then programmed six applications: HDR viewfinding and capture, low-light viewfinding and capture, automated acquisition of extended-dynamic-range panoramas, foveal imaging, gyroscope-based hand-shake detection, and rephotography.4

Figure 3. A weekend hack. (a) Frankencamera F2 with two flash units attached. (b) Using our FCam API, David Jacobs programmed one flash to strobe repeatedly during a one-second exposure, producing the card trails. Programming the other, more powerful flash to fire once at the end of the exposure produced the three cards' crisp images.
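FCam expresses within-exposure events such as the caption's flash firings as actions scheduled relative to the start of the exposure. Here is a hedged sketch of that two-flash setup; it is my reconstruction, not Jacobs's code, and the F2 platform header and several member names are assumptions.

#include <FCam/F2.h> // hypothetical platform header for the F2

int main() {
    FCam::F2::Sensor sensor;
    FCam::F2::Flash stroboscope, bigFlash; // the two attached flash units

    FCam::Shot shot;
    shot.exposure = 1000000; // one-second exposure, in microseconds
    shot.gain = 1.0f;
    shot.image = FCam::Image(640, 480, FCam::UYVY);

    // Fire the weak flash every 100 ms during the exposure (card trails).
    for (int t = 0; t < 1000000; t += 100000) {
        FCam::Flash::FireAction strobe(&stroboscope);
        strobe.time = t; // microseconds after the exposure starts
        strobe.duration = stroboscope.minDuration();
        strobe.brightness = stroboscope.maxBrightness();
        shot.addAction(strobe);
    }

    // Fire the strong flash once at the end (the crisp final image).
    FCam::Flash::FireAction finale(&bigFlash);
    finale.duration = bigFlash.minDuration();
    finale.time = shot.exposure - finale.duration;
    finale.brightness = bigFlash.maxBrightness();
    shot.addAction(finale);

    sensor.capture(shot);
    FCam::Frame frame = sensor.getFrame();
    return 0;
}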
As it happens, my favorite application was none of these six; it was a weekend hack in which PhD student David Jacobs mounted two Canon flash units on the Frankencamera F2 (see Figure 3a) and programmed them to obtain bizarre lighting effects on playing cards thrown into the air (see Figure 3b). What's cool about this application is that, beginning with Frankencamera F2 and FCam, Jacobs took only a few hours to assemble the hardware and write the code. It's exactly the sort of ad hoc experimentation we had hoped our architecture would enable.

FCam's Pros and Cons
So, how useful is FCam? In my 2010 computational photography course, we loaned Nokia N900 Fs to every student and asked them to replace the autofocus algorithm (a minimal sketch of one such algorithm appears below). The phone's camera, although small, has a movable lens. We graded the assignment on the accuracy with which they could focus on a test scene we provided and on focusing speed in milliseconds. An assignment such as this would have been impossible before FCam. Two students submitted algorithms that were better than Nokia's; we presented these in Finland at the course's conclusion.

That said, FCam isn't perfect. While implementing it, we ran up against limitations in our reference platforms' hardware and low-level software.4 For example, although our API supports changing spatial resolution (the total number of pixels) on every frame in a video stream, the Video for Linux 2 (V4L2) software layer on which our system is built has a pipeline with fixed-size buffers. Changing resolutions requires flushing and resetting this pipeline, making it slower than we wanted.

Bootstrapping a World of Open Source Cameras
Defining a new architecture isn't a goal; it's only one step on a journey. Our goal is to create a community of photographer-programmers who develop algorithms, applications, and hardware for computational cameras. By the time this article appears in print, we'll have published our architecture and released the source code for our implementation on the Nokia N900. We'll have also released a binary you can download to any retail N900 (without bricking the phone), thereby making its camera programmable. When we finish building Frankencamera F3 (based on the LUPA-4000 sensor), we'll make it available at cost to anyone who wants one—with luck, by December 2010.

My hope is that, once programmable cameras are widely available to the research community, camera technology can be studied at universities, published in conferences and journals, and fiddled with by hobbyists and photographers. This should foster a computational photography software industry, including plug-ins for cameras. Headed out to shoot your daughter's soccer game? Don't forget to download the new "soccer ball focus" application everybody's raving about! When I walk through the Siggraph 2020 exhibition, I expect to see a dizzying array of cameras, add-ons, software, photography asset management systems, and schools offering vocational training in computational photography.

Speaking of vocational training, I'd like to see our Frankencameras used for not only research but also education. Photography is a US$50 billion business—as large as moviemaking and video gaming combined. If we succeed in opening up the industry, we're going to need more university-trained engineers in this area. To seed this growth, my laboratory will loan a Frankencamera F3 and 10 Nokia N900s to any US university that uses them to teach a computational photography course. (This program, which will begin in 2011, is supported by a grant from the US National Science Foundation and a matching gift from Nokia.)
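Here, as promised, is a bare-bones contrast-detection autofocus loop of the kind that assignment called for: sweep the lens, score each frame's sharpness, and return to the sharpest position. This is my own illustrative sketch, not a student submission, and the FCam lens and image accessors are approximate.

#include <FCam/N900.h> // platform header, per the FCam distribution

// Contrast score for one frame: sum of squared horizontal luma
// differences. Pixel access is approximate; in UYVY images, luma
// is assumed to be the second byte of each two-byte pixel.
static double sharpness(FCam::Frame f) {
    FCam::Image im = f.image();
    double s = 0;
    for (int y = 0; y < im.height(); ++y)
        for (int x = 1; x < im.width(); ++x) {
            int d = im(x, y)[1] - im(x - 1, y)[1];
            s += d * d;
        }
    return s;
}

int main() {
    FCam::N900::Sensor sensor;
    FCam::N900::Lens lens;

    FCam::Shot shot;
    shot.exposure = 30000;
    shot.gain = 1.0f;
    shot.image = FCam::Image(640, 480, FCam::UYVY);

    float bestDiopters = lens.nearFocus();
    double bestScore = -1;

    // Coarse sweep from near focus (high diopters) to infinity (0).
    for (float d = lens.nearFocus(); d >= lens.farFocus(); d -= 0.5f) {
        lens.setFocus(d); // move the lens, position in diopters
        sensor.capture(shot);
        FCam::Frame f = sensor.getFrame();
        double s = sharpness(f);
        if (s > bestScore) { bestScore = s; bestDiopters = d; }
    }
    lens.setFocus(bestDiopters); // return to the sharpest position
    return 0;
}

A production autofocus would refine this with a fine sweep around the best coarse position and would score only a region of interest, but the skeleton is the same.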
In thinking through what it might take to triple the number of universities teaching computational photography, it's interesting to compare this nascent field to the older, larger field of computer graphics. I conjecture that the ubiquity and quality of university computer graphics curricula are due to three factors: the existence of good textbooks (such as Newman and Sproull, and Foley, van Dam, Feiner, and Hughes), experimental platforms with an open API (Silicon Graphics + OpenGL), and a high-quality publication venue (Siggraph). To accomplish this in computational photography, we'll need the same three ingredients. Good publication venues are already in place. The Frankencamera is a platform and an API, but it's only one—we need more.

For introductory textbooks, however, we're starting from zero. There are good photography books, but they don't cover its technical aspects. As a glaring example, not one how-to book on photography contains a formula for depth of field. To find one, you must turn to an optics monograph such as Rudolf Kingslake's classic Optics in Photography (SPIE, 1992). (For the record, the formula appears at the end of this section.) To fill this gap in my courses at Stanford, my students and I have written narrated Flash applets on the technical aspects of photography (see http://graphics.stanford.edu/courses/cs178/applets). However, these aren't an adequate substitute for a textbook. Somebody needs to write one.

Grand Challenges
Suppose the community succeeds in making cameras programmable and teaching a generation of students about camera technology. What should these students work on? What are the hard problems "on the dark side of the lens"?5 Here are a few suggestions.

Shader Languages for Cameras
When students write camera applications using FCam, they typically do their image processing on the CPU, then complain about speed. It's not their fault. Although cameras and cell phones have other computing resources—GPUs, digital signal processors, and image signal processors—these aren't designed with computational photography in mind, and their hardware interfaces are often secret.

Fortunately, we believe this situation will change. Similar to the revolution occurring in graphics chips, we envision photographic image processors that use software-reconfigurable execution stages. To control such a processor, you could design a domain-specific language akin to CUDA (Compute Unified Device Architecture), OpenCL, or Adobe's PixelBender, but better suited to computational photography.

Single-Shot Computational Photography
Many papers on computational photography take this form: "Capture a burst of images varying camera setting X and combine them to produce a single image that exhibits better Y." It's a clever strategy. However, once researchers start doing this in a portable camera, they'll discover that few photographic situations lend themselves to this solution—people walk, breezes blow, foliage moves, and hands shake.

If we want to make computational photography robust, we must focus on single-frame solutions. But why must cameras record the world in discrete frames? Isn't this a holdover from film days? Using novel optics and sensor designs, we can measure the light falling on the focal plane when and where we wish.

Computational Video
Although this article emphasizes still photography, the computational photography community is also interested in video. Representative techniques here include video stabilization, spatiotemporal upsampling, video summarization, and continuous versions of still-photography algorithms—for example, HDR video, light field videography, and object insertion and removal. Here, a special challenge is to ensure that the processing is temporally coherent—that is, that it doesn't introduce jarring discontinuities from frame to frame.

Integration of In-Camera and In-Cloud Computing
Although high-end cameras don't yet contain 3G radios, this will change soon. Once cameras are connected to the cloud, you can imagine many applications that, for the foreseeable future, would take too long to run on the camera or burn too much battery power. A good example is video stabilization. I don't need this to run on the camera; just make sure my video is stable when it's posted to YouTube. You can also imagine using images from photo-sharing sites to help control the camera—my photo albums prove that my dog is black; why does my camera's light meter insist on making her look gray?
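And here, as promised, is the depth-of-field formula missing from those how-to books, in the standard thin-lens approximation found in optics monographs such as Kingslake's (valid when the subject distance is much larger than the focal length). With focal length $f$, f-number $N$, acceptable circle of confusion $c$, and subject distance $s$:

\[
H \approx \frac{f^2}{N c}, \qquad
D_\text{near} \approx \frac{H s}{H + s}, \qquad
D_\text{far} \approx \frac{H s}{H - s} \quad (s < H),
\]
\[
\text{DoF} = D_\text{far} - D_\text{near} \approx \frac{2 H s^2}{H^2 - s^2},
\]

where $H$ is the hyperfocal distance. Focus at $s = H$ and $D_\text{far}$ goes to infinity: everything from $H/2$ outward is acceptably sharp.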
86 September/October 2010
Progress in science and engineering is a dialogue between theory and experimentation. In this dialogue, the experimental platform's quality is paramount. We built our Camera 2.0 project on the premise that a tethered SLR isn't an adequate experimental platform for computational photography. In particular, an SLR's lack of controllability makes it hard to do innovative research, and its lack of transparency makes it hard to do definitive research. (What are the noise characteristics of a Canon 5D Mark II's image sensor? Only Canon knows.)

Michael Faraday, one of the greatest experimental scientists, built his own laboratory equipment. Faraday wasn't a fanatic; the devices available to him in the early 19th century were neither flexible enough nor accurate enough for his delicate experiments in electromagnetism. As Thomas Kuhn writes in his seminal study, The Structure of Scientific Revolutions, "Novelty ordinarily emerges only for the man who, knowing with precision what he should expect, is able to recognize that something has gone wrong."6 Kuhn further hypothesizes that intellectual revolutions occur only when a scientist casts aside the tools of the dominant paradigm and begins to invent wild theories and experiment at random. The Camera 2.0 project's primary goal is to facilitate this sort of wild experimentation.

References
1. R. Raskar and J. Tumblin, Computational Photography: Mastering New Techniques for Lenses, Lighting, and Sensors, AK Peters, 2010.
2. E.S. Raymond, The Cathedral and the Bazaar, O'Reilly, 2001.
3. A. Adams, N. Gelfand, and K. Pulli, "Viewfinder Alignment," Proc. Eurographics, Eurographics Assoc., 2008, pp. 597–606.
4. A. Adams et al., "The Frankencamera: An Experimental Platform for Computational Photography," ACM Trans. Graphics, vol. 29, no. 4, 2010, article 29.
5. N. Goldberg, Camera Technology: The Dark Side of the Lens, Academic Press, 1992.
6. T. Kuhn, The Structure of Scientific Revolutions, Univ. of Chicago Press, 1962.

Marc Levoy is a professor in Stanford University's Computer Science Department. Contact him at levoy@cs.stanford.edu.

Contact the department editors at cga-vr@computer.org.