
BakedSDF: Meshing Neural SDFs for Real-Time View Synthesis

LIOR YARIV∗† , Weizmann Institute of Science, Israel and Google Research, United Kingdom
PETER HEDMAN∗ , Google Research, United Kingdom
CHRISTIAN REISER† , Tübingen AI Center, Germany and Google Research, United Kingdom
DOR VERBIN, Google Research, United States of America
PRATUL P. SRINIVASAN, Google Research, United States of America
RICHARD SZELISKI, Google Research, United States of America
JONATHAN T. BARRON, Google Research, United States of America
BEN MILDENHALL, Google Research, United States of America
arXiv:2302.14859v1 [cs.CV] 28 Feb 2023

Fig. 1. Our method, BakedSDF, optimizes a neural surface-volume representation of a complex real-world scene and (a) “bakes” that representation into a high-resolution mesh. These meshes (b) can be rendered in real time (∼105 FPS) on commodity hardware, and support other applications such as (c) separating diffuse and specular material components, (d) appearance editing with accurate cast shadows, and (e) physics simulation for inserted objects. Interactive demo at https://bakedsdf.github.io/.

∗Both authors contributed equally to this research.
†Work done while interning at Google.

Authors’ addresses: Lior Yariv, Weizmann Institute of Science, Israel and Google Research, United Kingdom; Peter Hedman, Google Research, United Kingdom; Christian Reiser, Tübingen AI Center, Germany and Google Research, United Kingdom; Dor Verbin, Google Research, United States of America; Pratul P. Srinivasan, Google Research, United States of America; Richard Szeliski, Google Research, United States of America; Jonathan T. Barron, Google Research, United States of America; Ben Mildenhall, Google Research, United States of America.

We present a method for reconstructing high-quality meshes of large unbounded real-world scenes suitable for photorealistic novel view synthesis. We first optimize a hybrid neural volume-surface scene representation designed to have well-behaved level sets that correspond to surfaces in the scene. We then bake this representation into a high-quality triangle mesh, which we equip with a simple and fast view-dependent appearance model based on spherical Gaussians. Finally, we optimize this baked representation to best reproduce the captured viewpoints, resulting in a model that can leverage accelerated polygon rasterization pipelines for real-time view synthesis on commodity hardware. Our approach outperforms previous scene representations for real-time rendering in terms of accuracy, speed, and power consumption, and produces high quality meshes that enable applications such as appearance editing and physical simulation.

1 INTRODUCTION

Current top-performing approaches for novel view synthesis — the task of using captured images to recover a 3D representation that can be rendered from unobserved viewpoints — are largely based on Neural Radiance Fields (NeRF) [Mildenhall et al. 2020]. By representing a scene as a continuous volumetric function parameterized by a multilayer perceptron (MLP), NeRF is able to produce photorealistic renderings that exhibit detailed geometry and view-dependent effects. Because the MLP underlying a NeRF is expensive to evaluate and must be queried hundreds of times per pixel, rendering a high resolution image from a NeRF is typically slow.

Recent work has improved NeRF rendering performance by trading compute-heavy MLPs for discretized volumetric representations such as voxel grids. However, these approaches require substantial GPU memory and custom volumetric raymarching code and are not amenable to real-time rendering on commodity hardware, since modern graphics hardware and software is oriented towards rendering polygonal surfaces rather than volumetric fields.

While current NeRF-like approaches are able to recover high-quality real-time-renderable meshes of individual objects with simple geometry [Boss et al. 2022], reconstructing detailed and well-behaved meshes from captures of real-world unbounded scenes
(such as the “360 degree captures” of Barron et al. [2022]) has proven to be more difficult. Recently, MobileNeRF [Chen et al. 2022a] addressed this problem by training a NeRF whose volumetric content is restricted to lie on the faces of a polygon mesh, then baking that NeRF into a texture map. Though this approach yields reasonable image quality, MobileNeRF initializes the scene geometry as a collection of axis-aligned tiles that turns into a textured polygon “soup” after optimization. The resulting geometry is not suitable for common graphics applications such as texture editing, relighting, and physical simulation.

In this work, we demonstrate how to extract high-quality meshes from a NeRF-like neural volumetric representation. Our system, which we call BakedSDF, extends the hybrid volume-surface neural representation of VolSDF [Yariv et al. 2021] to represent unbounded real-world scenes. This representation is designed to have a well-behaved zero level set corresponding to surfaces in the scene, which lets us extract high-resolution triangle meshes using marching cubes. We equip this mesh with a fast and efficient view-dependent appearance model based on spherical Gaussians, which is fine-tuned to reproduce the input images of the scene. The output of our system can be rendered at real-time frame rates on commodity devices, and we show that our real-time rendering system outperforms prior work in terms of realism, speed, and power consumption. Additionally we show that (unlike comparable prior work) the mesh produced by our model is accurate and detailed, which enables standard graphics applications such as appearance editing and physics simulation.

2 RELATED WORK

View synthesis, i.e., the task of rendering novel views of a scene given a set of captured images, is a longstanding problem in the fields of computer vision and graphics. In scenarios where the observed viewpoints are sampled densely, synthesizing new views can be done with light field rendering — straightforward interpolation into the set of observed rays [Gortler et al. 1996; Levoy and Hanrahan 1996]. However, in practical settings where observed viewpoints are captured more sparsely, reconstructing a 3D representation of the scene is crucial for rendering convincing novel views. Most classical approaches for view synthesis use triangle meshes (typically reconstructed using a pipeline consisting of multi-view stereo [Furukawa and Hernández 2015; Schönberger et al. 2016], Poisson surface reconstruction [Kazhdan et al. 2006; Kazhdan and Hoppe 2013], and marching cubes [Lorensen and Cline 1987]) as the underlying 3D scene representation, and render novel views by reprojecting observed images into each novel viewpoint and blending them together using either heuristically-defined [Buehler et al. 2001; Debevec et al. 1996; Wood et al. 2000] or learned [Hedman et al. 2018; Riegler and Koltun 2020, 2021] blending weights. Although mesh-based representations are well-suited for real-time rendering with accelerated graphics pipelines, the meshes produced by these approaches tend to have inaccurate geometry in regions with fine details or complex materials, which leads to errors in rendered novel views. Alternatively, point-based representations [Kopanas et al. 2021; Rückert et al. 2022] are better suited for modeling thin geometry, but cannot be rendered efficiently without visible cracks or unstable results when the camera moves.

Most recent approaches to view synthesis sidestep the difficulty of high-quality mesh reconstruction by using volumetric representations of geometry and appearance, such as voxel grids [Lombardi et al. 2019; Penner and Zhang 2017; Szeliski and Golland 1999; Vogiatzis et al. 2007] or multiplane images [Srinivasan et al. 2019; Zhou et al. 2018]. These representations are well-suited to gradient-based optimization of a rendering loss, so they can be effectively optimized to reconstruct detailed geometry seen in the input images. The most successful of these volumetric approaches is Neural Radiance Fields (NeRF) [Mildenhall et al. 2020], which forms the basis for many state-of-the-art view synthesis methods (see Tewari et al. [2022] for a review). NeRF represents a scene as a continuous volumetric field of matter that emits and absorbs light, and renders an image using volumetric ray-tracing. NeRF uses an MLP to parameterize the mapping from a spatial coordinate to a volumetric density and emitted radiance, and that MLP must be evaluated at a set of sampled coordinates along a ray to yield a final color.

Subsequent works have proposed modifying NeRF’s representation of scene geometry and appearance for improved quality and editability. Ref-NeRF [Verbin et al. 2022] reparameterizes NeRF’s view-dependent appearance to enable appearance editing and improve the reconstruction and rendering of specular materials. Other works [Boss et al. 2021; Kuang et al. 2022; Srinivasan et al. 2021; Zhang et al. 2021a,b] attempt to decompose a scene’s view-dependent appearance into material and lighting properties. In addition to modifying NeRF’s representation of appearance, papers including UNISURF [Oechsle et al. 2021], VolSDF [Yariv et al. 2021], and NeuS [Wang et al. 2021] augment NeRF’s fully-volumetric representation of geometry with hybrid volume-surface models.

The MLP NeRF uses to represent a scene is usually large and expensive to evaluate, and this means that a NeRF is slow to train (hours or days per scene) and slow to render (seconds or minutes per megapixel). Recent methods have proposed reducing computation at the expense of increasing storage by replacing that single large MLP with a voxel grid [Karnewar et al. 2022; Sun et al. 2022], a grid of small MLPs [Reiser et al. 2021], low-rank [Chen et al. 2022b] or sparse [Yu et al. 2022] grid representations, or a multiscale hash encoding equipped with a small MLP [Müller et al. 2022]. While these representations reduce the computation required for both training and rendering (at the cost of increased storage), rendering can be further accelerated by precomputing and storing, i.e., “baking”, a trained NeRF into a more efficient representation. SNeRG [Hedman et al. 2021], FastNeRF [Garbin et al. 2021], Plenoctrees [Yu et al. 2021], and Scalable Neural Indoor Scene Rendering [Wu et al. 2022] all bake trained NeRFs into sparse volumetric structures and use simplified models of view-dependent appearance to avoid evaluating an MLP at each sample along each ray. These methods have enabled real-time rendering of NeRFs on high-end hardware, but their use of volumetric raymarching precludes real-time performance on commodity hardware.
3 PRELIMINARIES

In this section, we describe the neural volumetric representation that NeRF [Mildenhall et al. 2020] uses for view synthesis as well as improvements introduced by mip-NeRF 360 [Barron et al. 2022] for representing unbounded “360 degree” scenes.

A NeRF is a 3D scene representation consisting of a learned function that maps a position x and outgoing ray direction d to a volumetric density 𝜏 and color c. To render the color of a single pixel in a target camera view, we first compute the ray corresponding to that pixel r = o + 𝑡d, and then evaluate the NeRF at a series of points {𝑡𝑖} along the ray. The resulting outputs 𝜏𝑖, c𝑖 at each point are composited together into a single output color value C:

C = ∑𝑖 exp(−∑𝑗<𝑖 𝜏𝑗 𝛿𝑗) (1 − exp(−𝜏𝑖 𝛿𝑖)) c𝑖 ,  𝛿𝑖 = 𝑡𝑖 − 𝑡𝑖−1 .  (1)

This definition of C is a quadrature-based approximation of the volume rendering equation [Max 1995].

NeRF parametrizes this learned function using an MLP whose weights are optimized to implicitly encode the geometry and color of the scene: A set of training input images and their camera poses are converted into a set of (ray, color) pairs, and gradient descent is used to optimize the MLP weights such that the rendering of each ray resembles its corresponding input color. Formally, NeRF minimizes a loss between the ground truth color Cgt and the color C produced in Equation 1, averaged over all training rays:

Ldata = E[ ∥C − Cgt∥² ] .  (2)

If the input images provide sufficient coverage of the scene (in terms of multiview 3D constraints), this simple process yields a set of MLP weights that accurately describe the scene’s 3D volumetric density and appearance.

Mip-NeRF 360 [Barron et al. 2022] extends the basic NeRF formulation to reconstruct and render real-world “360 degree” scenes where cameras can observe unbounded scene content in all directions. Two improvements introduced in mip-NeRF 360 are the use of a contraction function and a proposal MLP. The contraction function maps unbounded scene points in R3 to a bounded domain:

contract(x) = x if ∥x∥ ≤ 1, and (2 − 1/∥x∥)(x/∥x∥) if ∥x∥ > 1 ,  (3)

which produces contracted coordinates that are well-suited to be positionally encoded as inputs to the MLP. Additionally, mip-NeRF 360 showed that large unbounded scenes with detailed geometry require prohibitively large MLPs and many more samples along each ray than is tractable in the original NeRF framework. Mip-NeRF 360 therefore introduced a proposal MLP: a much smaller MLP that is trained to bound the density of the actual NeRF MLP. This proposal MLP is used in a hierarchical sampling procedure that efficiently generates a set of input samples for the NeRF MLP that are tightly focused around non-empty content in the scene.

4 METHOD

Our method is composed of three stages, which are visualized in Figure 2. First we optimize a surface-based representation of the geometry and appearance of a scene using NeRF-like volume rendering. Then, we “bake” that geometry into a mesh, which we show is accurate enough to support convincing appearance editing and physics simulation. Finally, we train a new appearance model that uses spherical Gaussians (SGs) embedded within each vertex of the mesh, which replaces the expensive NeRF-like appearance model from the first step. The 3D representation that results from this approach can be rendered in real time on commodity devices, as rendering simply requires rasterizing a mesh and querying a small number of spherical Gaussians.

Fig. 2. An illustration of the three stages of our method (surface reconstruction using volume rendering, baking a high-resolution triangle mesh, and modeling appearance with spherical Gaussians). We first reconstruct the scene using a surface-based volumetric representation (Section 4.1), then bake it into a high-quality mesh (Section 4.2), and finally optimize a view-dependent appearance model based on spherical Gaussians (Section 4.3).

4.1 Modeling density with an SDF

Our representation combines the benefits of mip-NeRF 360 for representing unbounded scenes with the well-behaved surface properties of VolSDF’s hybrid volume-surface representation [Yariv et al. 2021]. VolSDF models the volumetric density of the scene as a parametric function of an MLP-parameterized signed distance function (SDF) 𝑓 that returns the signed distance 𝑓(x) from each point x ∈ R3 to the surface. Because our focus is reconstructing unbounded real-world scenes, we parameterize 𝑓 in contracted space (Equation 3) rather than world-space. The underlying surface of the scene is the zero-level set of 𝑓, i.e., the set of points at distance zero from the surface:

S = {x : 𝑓(x) = 0} .  (4)

Following VolSDF, we define the volume density 𝜏 as:

𝜏(x) = 𝛼 Ψ𝛽(𝑓(x)) ,  (5)

where Ψ𝛽 is the cumulative distribution function of a zero-mean Laplace distribution with scale parameter 𝛽 > 0. Note that as 𝛽 approaches 0, the volumetric density approaches a function that returns 𝛼 inside any object and 0 in free space. To encourage 𝑓 to approximate a valid signed distance function (i.e. one where 𝑓(x) returns the signed Euclidean distance to the level set of 𝑓 for all x), we penalize the deviation of 𝑓 from satisfying the Eikonal equation [Gropp et al. 2020]:

LSDF = Ex[ (∥∇𝑓(x)∥ − 1)² ] .  (6)

Note that as 𝑓 is defined in contracted space, this constraint also operates on contracted space.
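To make the pieces defined above concrete, the following is a minimal NumPy sketch of the contraction (Equation 3), the SDF-to-density conversion (Equation 5), the compositing weights implied by Equation 1, and the Eikonal penalty (Equation 6). The function names, the toy sphere SDF, and the sign convention (𝑓 > 0 inside objects, matching the stated limiting behavior of Equation 5) are our own illustrative assumptions, not code from the paper.

```python
import numpy as np

def contract(x):
    # mip-NeRF 360 contraction (Eq. 3): identity inside the unit ball,
    # (2 - 1/||x||) * x/||x|| outside, mapping all of R^3 into a ball of radius 2.
    n = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.where(n <= 1.0, x, (2.0 - 1.0 / n) * (x / n))

def sdf_to_density(f, alpha, beta):
    # VolSDF density (Eq. 5): tau(x) = alpha * Psi_beta(f(x)), with Psi_beta the
    # CDF of a zero-mean Laplace distribution of scale beta. Convention assumed
    # here: f > 0 inside objects, so density tends to alpha inside, 0 in free space.
    psi = np.where(f <= 0.0, 0.5 * np.exp(f / beta), 1.0 - 0.5 * np.exp(-f / beta))
    return alpha * psi

def render_weights(tau, t):
    # Per-sample compositing weights implied by Eq. (1): the rendered color is
    # sum_i w_i * c_i with w_i = exp(-sum_{j<i} tau_j delta_j) * (1 - exp(-tau_i delta_i)).
    delta = np.diff(t, prepend=t[0])               # delta_i = t_i - t_{i-1}
    accum = np.cumsum(tau * delta) - tau * delta   # sum over j < i
    return np.exp(-accum) * (1.0 - np.exp(-tau * delta))

def eikonal_penalty(grad_f):
    # Eq. (6): penalize deviation of ||grad f|| from 1 so that f stays SDF-like.
    return np.mean((np.linalg.norm(grad_f, axis=-1) - 1.0) ** 2)

# Toy usage: a sphere of radius 0.5 stands in for the MLP-parameterized SDF.
sphere_sdf = lambda p: 0.5 - np.linalg.norm(p, axis=-1)
t = np.linspace(0.0, 1.8, 128)
pts = np.array([0.0, 0.0, -0.9]) + t[:, None] * np.array([0.0, 0.0, 1.0])
tau = sdf_to_density(sphere_sdf(contract(pts)), alpha=100.0, beta=0.01)
w = render_weights(tau, t)  # weights concentrate near the first crossing at t ~ 0.4
```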
Fig. 3. Our method produces an accurate mesh and decomposes appearance into diffuse and specular color (shown left to right: mesh, diffuse color, specular component, full appearance).

Recently, Ref-NeRF [Verbin et al. 2022] improved view-dependent appearance by parameterizing it as a function of the view direction reflected about the surface normal. Our use of an SDF-parameterized density allows this to be easily adopted, as SDFs have well-defined surface normals: n(x) = ∇𝑓(x)/∥∇𝑓(x)∥. Therefore, when training this stage of our model we adopt Ref-NeRF’s appearance model and compute color using separate diffuse and specular components, where the specular component is parameterized by the concatenation of the view direction reflected about the normal direction, the dot product between the normal and view direction, and a 256 element bottleneck vector output by the MLP that parametrizes 𝑓.

We use a variant of mip-NeRF 360 as our model (see Appendix A in the supplementary material for specific training details). Similarly to VolSDF [Yariv et al. 2021], we parameterize the density scale factor as 𝛼 = 𝛽⁻¹ in Equation 5. However, we find that scheduling 𝛽 rather than leaving it as a free optimizable parameter results in more stable training. We therefore anneal 𝛽 according to 𝛽𝑡 = 𝛽0 (1 + ((𝛽0 − 𝛽1)/𝛽1) 𝑡^0.8)⁻¹, where 𝑡 goes from 0 to 1 during training, 𝛽0 = 0.1, and 𝛽1 for the three hierarchical sampling stages is 0.015, 0.003, and 0.001 respectively. Because the Eikonal regularization needed for an SDF parameterization of density already removes floaters and results in well-behaved normals, we do not find it necessary to use the orientation loss or predicted normals from Ref-NeRF, or the distortion loss from mip-NeRF 360.

4.2 Baking a high-resolution mesh

After optimizing our neural volumetric representation, we create a triangle mesh from the recovered MLP-parameterized SDF by querying it on a regular 3D grid and then running Marching Cubes [Lorensen and Cline 1987]. Note that VolSDF models boundaries using a density fall-off that extends beyond the SDF zero crossing (parameterized by 𝛽). We account for this spread when extracting the mesh and choose 0.001 as the iso-value for surface crossings, as otherwise we find the scene geometry to be slightly eroded.

When running Marching Cubes, the MLP-parameterized SDF may contain spurious surface crossings in regions that are occluded from the observed viewpoints as well as regions that the proposal MLP marks as “free space”. The SDF MLP’s values in both of these types of regions are not supervised during training, so we must cull any surface crossings that would show up as spurious content in the reconstructed mesh. To address this, we inspect the 3D samples taken along the rays in our training data. We compute the volumetric rendering weight for each sample, i.e., how much it contributes to the training pixel color. We then splat any sample with a sufficiently large rendering weight (> 0.005) into the 3D grid and mark the corresponding cell as a candidate for surface extraction.

We sample our SDF grid at evenly spaced coordinates in the contracted space, which yields unevenly spaced non-axis-aligned coordinates in world space. This has the desirable property of creating smaller triangles (in world space) for foreground content close to the origin and larger triangles for distant content. Effectively, we leverage the contraction operator as a level-of-detail strategy: because our desired rendered views are close to the scene origin, and because the shape of the contraction is designed to undo the effects of perspective projection, all triangles will have approximately equal areas when projected onto the image plane.

After extracting the triangle mesh, we use a region growing procedure to fill small holes that might exist in regions that were either unobserved by input viewpoints or missed by the proposal MLP during the baking procedure. We iteratively mark voxels in a neighborhood around the current mesh and extract any surface crossings that exist in these newly active voxels. This effectively remedies situations where a surface exists in the SDF MLP but was not extracted by marching cubes due to insufficient training view coverage or errors in the proposal MLP. We then transform the mesh into world space so it is ready for rasterization by a conventional rendering engine that operates in Euclidean space. Finally, we post-process the mesh using vertex order optimization [Sander et al. 2007], which speeds up rendering performance on modern hardware by allowing vertex shader outputs to be cached and reused between neighboring triangles. In Appendix B we detail additional steps for mesh extraction which do not strictly improve reconstruction accuracy, but enable a more pleasing interactive viewing experience.

4.3 Modeling view-dependent appearance

The baking procedure described above extracts high-quality triangle mesh geometry from our MLP-based scene representation. To model the scene’s appearance, including view-dependent effects such as specularities, we equip each mesh vertex with a diffuse color c𝑑 and a set of spherical Gaussian lobes. As far-away regions are only observed from a limited set of view directions, we do not need to model view dependence with the same fidelity everywhere in the scene. In our experiments, we use three spherical Gaussian lobes in the central regions (∥x∥ ≤ 1) and one lobe in the periphery. Figure 3 demonstrates our appearance decomposition.
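As a complement to the prose of Section 4.2 above, here is a hedged sketch of the baking step: evaluating the SDF on a regular grid in contracted space, masking cells that were never supported by training rays, extracting the iso-surface, and mapping vertices back to world space. It uses scikit-image's marching cubes rather than the paper's implementation, assumes the visibility mask has already been computed elsewhere by splatting high-weight ray samples, and all names are illustrative.

```python
import numpy as np
from skimage import measure  # provides measure.marching_cubes(volume, level)

def uncontract(y):
    # Inverse of the mip-NeRF 360 contraction: identity inside the unit ball,
    # otherwise x = y / (||y|| * (2 - ||y||)), valid for ||y|| < 2.
    n = np.linalg.norm(y, axis=-1, keepdims=True)
    return np.where(n <= 1.0, y, y / (n * (2.0 - n)))

def bake_mesh(sdf_fn, visible, resolution=512, iso_value=0.001):
    # sdf_fn: maps contracted-space points (N, 3) to signed distances (f > 0 inside).
    # visible: boolean (res, res, res) grid, True where training-ray samples with
    #          rendering weight > 0.005 were splatted (candidate cells).
    lin = np.linspace(-2.0, 2.0, resolution)              # contracted domain
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1)
    sdf = sdf_fn(grid.reshape(-1, 3)).reshape(grid.shape[:3])

    # Cull unsupervised / proposal-MLP "free space" regions so marching cubes
    # cannot produce spurious crossings there.
    sdf = np.where(visible, sdf, -1e3)

    # Extract the surface at a slightly offset iso-value (the paper uses 0.001
    # to compensate for the density fall-off around the zero crossing).
    verts, faces, _, _ = measure.marching_cubes(sdf, level=iso_value)

    # Vertex indices -> contracted coordinates -> world space for rasterization.
    verts = verts / (resolution - 1) * 4.0 - 2.0
    return uncontract(verts), faces
```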
Fig. 4. Test-set renderings (with insets) for our model and the two state-of-the-art real-time baselines we evaluate against, shown alongside the ground truth for the bicycle, treehill, and flowerbed scenes from the mip-NeRF 360 dataset. Deep Blending [Hedman et al. 2018] produces posterized renderings when the proxy geometry used as input is incorrect (such as in the background of the bicycle scene) and renderings from MobileNeRF [Chen et al. 2022a] tend to exhibit aliasing artifacts or oversmoothing.

This appearance representation satisfies our efficiency goal for both compute and memory and can thus be rendered in real-time. Each spherical Gaussian lobe has seven parameters: a 3D unit vector 𝜇 for the lobe mean, a 3D vector c for the lobe color, and a scalar 𝜆 for the width of the lobe. These lobes are parameterized by the view direction vector d, so the rendered color C for a ray intersecting any given vertex can be computed as:

C = c𝑑 + ∑𝑖=1,…,𝑁 c𝑖 exp(𝜆𝑖 (𝜇𝑖 · d − 1)) .  (7)

To optimize this representation, we first rasterize the mesh into all training views and store the vertex indices and barycentric coordinates associated with each pixel. After this preprocessing, we can easily render a pixel by applying barycentric interpolation to the learned per-vertex parameters and then running our view-dependent appearance model (simulating the operation of a fragment shader). We can therefore optimize the per-vertex parameters by minimizing a per-pixel color loss as in Equation 2. As detailed in Appendix B, we also optimize for a background clear color to provide a more pleasing experience with the interactive viewer. To prevent that optimization from being biased by pixels that are not well-modeled by mesh geometry (e.g. pixels at soft object boundaries and semi-transparent objects), instead of the L2 loss that was minimized by VolSDF we use a robust loss 𝜌(·, 𝛼, 𝑐) with hyperparameters 𝛼 = 0, 𝑐 = 1/5 during training, which allows optimization to be more robust to outliers [Barron 2019]. We also model quantization with a straight-through estimator [Bengio et al. 2013], ensuring that the optimized values for view-dependent appearance are well represented by 8 bits of precision.

We find that directly optimizing this per-vertex representation saturates GPU memory, which prevents us from scaling up to high-resolution meshes. We instead optimize a compressed hash-grid representation based on Instant NGP [Müller et al. 2022] (see Appendix A in supplemental material). During optimization, we query this representation at each 3D vertex location within a training batch to produce our diffuse colors and spherical Gaussian parameters. After optimization is complete, we bake out the compressed scene representation contained in the hash grids by querying the NGP model at each vertex location for the appearance-related parameters. Finally, we export the resulting mesh and per-vertex appearance parameters using the gLTF format [ISO/IEC 12113:2022 2022] and compress it with gzip, a format natively supported by web protocols.
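For reference, the per-vertex shading of Equation 7 is simple enough to write out directly; the sketch below mirrors what a fragment shader would do after barycentric interpolation of the per-vertex parameters. It is a toy NumPy version under our own naming, not the shader shipped with the method.

```python
import numpy as np

def shade_vertex(view_dir, c_d, sg_color, sg_mean, sg_lambda):
    # Eq. (7): C = c_d + sum_i c_i * exp(lambda_i * (mu_i . d - 1)).
    # view_dir: unit view direction d, shape (3,); c_d: diffuse RGB, shape (3,);
    # sg_color: (N, 3) lobe colors; sg_mean: (N, 3) unit lobe means; sg_lambda: (N,).
    cos_sim = sg_mean @ view_dir                    # mu_i . d, in [-1, 1]
    lobes = np.exp(sg_lambda * (cos_sim - 1.0))     # in (0, 1], peaks where d == mu_i
    return c_d + lobes @ sg_color

# One vertex with three lobes (the lobe count used in the central region).
d = np.array([0.0, 0.0, 1.0])
c_d = np.array([0.2, 0.3, 0.4])
mu = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
lam = np.array([20.0, 5.0, 5.0])
c_i = np.array([[0.5, 0.5, 0.5], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
print(shade_vertex(d, c_d, c_i, mu, lam))  # dominated by the first lobe since d == mu_0
```

The appeal of this design is that each lobe costs only a handful of floating-point operations to evaluate, which is the contrast Section 5.3 later draws against the roughly 2070 FLOPS of a small view-dependent MLP.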
(columns: PSNR ↑ | SSIM ↑ | LPIPS ↓, reported for the outdoor and then the indoor scenes)

Offline:
NeRF [Mildenhall et al. 2020]: outdoor 21.46 | 0.458 | 0.515; indoor 26.84 | 0.790 | 0.370
NeRF++ [Zhang et al. 2020]: outdoor 22.76 | 0.548 | 0.427; indoor 28.05 | 0.836 | 0.309
Stable View Synthesis [Riegler and Koltun 2021]: outdoor 23.01 | 0.662 | 0.253; indoor 28.22 | 0.907 | 0.160
Mip-NeRF 360 [Barron et al. 2022]: outdoor 24.47 | 0.691 | 0.283; indoor 31.72 | 0.917 | 0.180
Instant-NGP [Müller et al. 2022]: outdoor 22.90 | 0.566 | 0.371; indoor 29.15 | 0.880 | 0.216
Ours (offline): outdoor 23.40 | 0.619 | 0.379; indoor 30.21 | 0.888 | 0.243

Real-time:
Deep Blending [Hedman et al. 2018]: outdoor 21.54 | 0.524 | 0.364; indoor 26.40 | 0.844 | 0.261
Mobile-NeRF [Chen et al. 2022a]: outdoor 21.95 | 0.470 | 0.470; indoor − | − | −
Ours (real-time): outdoor 22.47 | 0.585 | 0.349; indoor 27.06 | 0.836 | 0.258

Table 1. Quantitative results of our model on the “outdoor” and “indoor” scenes from mip-NeRF 360 [Barron et al. 2022], with evaluation split for “offline” and “real-time” algorithms. Metrics not provided by a baseline are denoted with “−”.

5 EXPERIMENTS

We evaluate our method’s performance both in terms of the accuracy of its output renderings and in terms of its speed, energy, and memory requirements. For accuracy, we test two versions of our model: the intermediate volume rendering results described in Section 4.1, which we refer to as our “offline” model, and the baked real-time model described in Sections 4.2 and 4.3, which we call the “real-time” model. As baselines we use prior offline models [Barron et al. 2022; Mildenhall et al. 2020; Müller et al. 2022; Riegler and Koltun 2021; Zhang et al. 2020] designed for fidelity, as well as prior real-time methods [Chen et al. 2022a; Hedman et al. 2018] designed for performance. We additionally compare our method’s recovered meshes with those extracted by COLMAP [Schönberger et al. 2016], mip-NeRF 360 [Barron et al. 2022], and MobileNeRF [Chen et al. 2022a]. All FPS (frames-per-second) measurements are for rendering at 1920 × 1080 resolution.

5.1 Real-time rendering of unbounded scenes

We evaluate our method on the dataset of real-world scenes from mip-NeRF 360 [Barron et al. 2022], which contains complicated indoor and outdoor scenes captured from all viewing angles. In Table 1 we present a quantitative evaluation of both the offline and real-time versions of our model against our baselines. Though our offline model is outperformed by some prior works (as we might expect, given that our focus is performance), our real-time method outperforms the two recent state-of-the-art real-time baselines we evaluate against across all three error metrics used by this benchmark. In Figure 4 we show a qualitative comparison of renderings from our model and these two state-of-the-art real-time baselines, and we observe that our approach exhibits significantly more detail and fewer artifacts than prior work.

In Table 2 we evaluate our method’s rendering performance by comparing against Instant-NGP (the fastest “offline” model we evaluate against) and MobileNeRF (the real-time model that produces the highest quality renderings after our own). We measure performance of all methods at 1920 × 1080. Both MobileNeRF and our method are running in-browser on a 16" MacBook Pro with a Radeon 5500M GPU while Instant NGP is running on a workstation equipped with a powerful NVIDIA RTX 3090 GPU. Though our approach requires more on-disk storage than MobileNeRF (1.27×) and Instant NGP (4.07×), we see that our model is significantly more efficient than both baselines — our model yields FPS/Watt metrics that are 1.44× and 77× greater respectively, in addition to producing higher quality renderings.

Instant-NGP [Müller et al. 2022]: 350 W | 3.78 FPS | 0.011 FPS/W | 106.8 MB (disk)
Mobile-NeRF [Chen et al. 2022a]: 85 W | 50.06 FPS | 0.589 FPS/W | 341.9 MB (disk)
Ours: 85 W | 72.21 FPS | 0.850 FPS/W | 434.5 MB (disk)
Table 2. The performance (Watts consumed, frames per second, and their ratio) and storage requirements for our real-time method and two baselines. FPS is measured when rendering at 1920 × 1080 resolution.

Our significantly improved performance relative to MobileNeRF may seem unusual at first glance, as both our approach and MobileNeRF yield optimized meshes that can be easily and quickly rasterized. This discrepancy is likely due to MobileNeRF’s reliance on alpha masking (which results in a significant amount of compute-intensive overdraw) and MobileNeRF’s use of an MLP to model view-dependent radiance (which requires significantly more compute to evaluate than our spherical Gaussian approach).

Compared to Deep Blending [Hedman et al. 2018], we see from Table 1 that our method achieves higher quality. However, it is also worth noting that our representation is much simpler: while our meshes can be rendered in a browser, Deep Blending relies on carefully tuned CUDA rendering and must store both color and geometry for all training images in the scene. As a result, total storage cost for Deep Blending in the outdoor scenes is 2.66× higher (1154.78 MB on average) than for our corresponding meshes.

5.2 Mesh extraction

In Figure 5 we present a qualitative comparison of our mesh with those obtained using COLMAP [Schönberger et al. 2016], MobileNeRF [Chen et al. 2022a] and an iso-surface of Mip-NeRF 360 [Barron et al. 2022]. We evaluate against COLMAP not only because it represents a mature structure-from-motion software package, but also because the geometry produced by COLMAP is used as input by Stable View Synthesis and Deep Blending. COLMAP uses volumetric
graph cuts on a tetrahedralization of the scene [Jancosek and Pajdla 2011; Labatut et al. 2007] to obtain a binary segmentation of the scene and then forms a triangle mesh as the surface between these regions. Because this binary segmentation does not allow for any averaging of the surface, small noise in the initial reconstruction tends to result in noisy reconstructed meshes, which results in a “bumpy” appearance. MobileNeRF represents the scene as a disconnected collection of triangles, as its sole focus is view synthesis. As a result, its optimized and pruned “triangle soup” is highly noisy and cannot be used for downstream tasks such as appearance editing.

As recently shown [Oechsle et al. 2021; Wang et al. 2021; Yariv et al. 2021], extracting an iso-surface directly from the density field predicted by NeRF can sometimes fail to faithfully capture the geometry of the scene. In Figure 5 we show this effect using Mip-NeRF 360 and extract the iso-surface where its density field exceeds a value of 50. Note how the surface of the table is no longer flat, as the reflection of the vase is modeled using mirror-world geometry. In contrast, our method produces a smooth and high-fidelity mesh, which is better suited for appearance and illumination editing, as demonstrated in Figure 1.

Fig. 5. Comparing the meshes produced by our technique with baselines that yield meshes. Our meshes are higher in quality compared to those of COLMAP, MobileNeRF, and Mip-NeRF 360. COLMAP’s mesh contains noise, floaters, and irregular object boundaries, MobileNeRF’s mesh is a “polygon soup” that does not accurately represent scene geometry, and iso-surfaces from Mip-NeRF 360’s density field tend to be noisy and represent reflections with inaccurate geometry.

5.3 Appearance model ablation

In Table 3 we present the results of an ablation study of our spherical Gaussian appearance model. We see that reducing the number of SGs to 2, 1, and 0 (i.e., a diffuse model) causes accuracy to degrade monotonically. However, when using 3 SGs in the periphery our model tends to overfit to the training views, causing a slight drop in quality compared to our proposed model with just a single peripheral SG. Furthermore, compared to 3 SGs everywhere, using a single SG in the periphery reduces the average vertex size by 1.52× (from 36 to 23.76 bytes), which significantly reduces the memory bandwidth consumption (a major performance bottleneck for rendering). Perhaps surprisingly, replacing our SG appearance model with the small view-dependent MLP used by both SNeRG [Hedman et al. 2021] and MobileNeRF [Chen et al. 2022a] significantly reduces rendering quality and yields error metrics that are roughly comparable to the “1 Spherical Gaussian” ablation. This is especially counter-intuitive given the significant cost of evaluating a small MLP (∼ 2070 FLOPS per pixel) compared to a single spherical Gaussian (21 FLOPS per pixel). Additionally, we ablate the robust loss used to train our appearance representation with a simple L2 loss, which unsurprisingly boosts PSNR (which is inversely proportional to MSE) at the expense of the other metrics.

(columns: PSNR ↑ | SSIM ↑ | LPIPS ↓ | MB (GPU) ↓)
Diffuse (0 Spherical Gaussians): 22.32 | 0.636 | 0.352 | 436.1
1 Spherical Gaussian: 24.02 | 0.680 | 0.322 | 549.1
2 Spherical Gaussians: 24.39 | 0.693 | 0.312 | 662.2
3 SGs in the periphery: 24.34 | 0.688 | 0.317 | 775.3
View-dependent MLP [2021]: 24.30 | 0.687 | 0.318 | 516.8
L2 loss: 24.52 | 0.690 | 0.316 | 572.6
Ours: 24.51 | 0.697 | 0.309 | 572.6
Table 3. An ablation study of our view-dependent appearance model on all scenes from the mip-NeRF 360 dataset.

5.4 Limitations

Although our model achieves state-of-the-art speed and accuracy for the established task of real-time rendering of unbounded scenes, there are several limitations that represent opportunities for future improvement: We represent the scene using a fully opaque mesh representation, and as such our model may struggle to represent semi-transparent content (glass, fog, etc.). And as is common for mesh-based approaches, our model sometimes fails to accurately represent areas with small or detailed geometry (dense foliage, thin structures, etc.). These concerns could perhaps be addressed by augmenting the mesh with opacity values, but allowing for continuous opacity would require a complex polygon sorting procedure that is difficult to integrate into a real-time rasterization pipeline. One additional limitation of our technique is that our model’s output meshes occupy a significant amount of on-disk space (∼ 430 megabytes per scene), which may prove challenging to store or stream for some applications. This could be ameliorated through mesh simplification followed by UV atlasing. However, we found that existing tools for simplification and atlasing, which are mostly designed for artist-made 3D assets, did not work well for our meshes extracted by marching cubes.

6 CONCLUSION

We have presented a system that produces a high-quality mesh for real-time rendering of large unbounded real-world scenes. Our technique first optimizes a hybrid neural volume-surface representation of the scene that is designed for accurate surface reconstruction. From this hybrid representation, we extract a triangle mesh whose vertices contain an efficient representation of view-dependent appearance, then optimize this meshed representation to best reproduce the captured input images. This results in a mesh that yields state-of-the-art results for real-time view synthesis in terms of both speed and accuracy, and is of a high enough quality to enable downstream applications.
ACKNOWLEDGMENTS

We would like to thank Forrester Cole and Srinivas Kaza for their implementation of the JAX rasterizer, Simon Rodriguez as an invaluable source of knowledge for real-time graphics programming, and Marcos Seefelder for brainstorming the real-time renderer. We further thank Thomas Müller for his valuable advice on tuning Instant-NGP for the Mip-NeRF 360 dataset, and Zhiqin Chen for generously sharing with us the MobileNeRF evaluations. Lastly, we thank Keunhong Park for thoughtful review of our manuscript.

REFERENCES

Jonathan T. Barron. 2019. A General and Adaptive Robust Loss Function. CVPR (2019).
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. 2022. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. CVPR (2022).
Yoshua Bengio, Nicholas Léonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432 (2013).
Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, and Hendrik P. A. Lensch. 2021. NeRD: Neural Reflectance Decomposition from Image Collections. ICCV (2021).
Mark Boss, Andreas Engelhardt, Abhishek Kar, Yuanzhen Li, Deqing Sun, Jonathan T. Barron, Hendrik P. A. Lensch, and Varun Jampani. 2022. SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections. NeurIPS (2022).
Chris Buehler, Michael Bosse, Leonard McMillan, Steven Gortler, and Michael Cohen. 2001. Unstructured Lumigraph Rendering. SIGGRAPH (2001).
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. 2022b. TensoRF: Tensorial Radiance Fields. ECCV (2022).
Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and Andrea Tagliasacchi. 2022a. MobileNeRF: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. arXiv:2208.00277 (2022).
Paul E. Debevec, Camillo J. Taylor, and Jitendra Malik. 1996. Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach. SIGGRAPH (1996).
Yasutaka Furukawa and Carlos Hernández. 2015. Multi-View Stereo: A Tutorial. Foundations and Trends in Computer Graphics and Vision (2015).
Stephan J. Garbin, Marek Kowalski, Matthew Johnson, Jamie Shotton, and Julien Valentin. 2021. FastNeRF: High-Fidelity Neural Rendering at 200FPS. ICCV (2021).
Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen. 1996. The lumigraph. SIGGRAPH (1996).
Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, and Yaron Lipman. 2020. Implicit Geometric Regularization for Learning Shapes. Proceedings of Machine Learning and Systems (2020).
Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. 2018. Deep blending for free-viewpoint image-based rendering. SIGGRAPH Asia (2018).
Peter Hedman, Pratul P. Srinivasan, Ben Mildenhall, Jonathan T. Barron, and Paul Debevec. 2021. Baking Neural Radiance Fields for Real-Time View Synthesis. ICCV (2021).
Dan Hendrycks and Kevin Gimpel. 2016. Gaussian Error Linear Units (GELUs). arXiv:1606.08415 (2016).
ISO/IEC 12113:2022 2022. Information technology — Runtime 3D asset delivery format — Khronos glTF 2.0. Standard. International Organization for Standardization.
Michal Jancosek and Tomás Pajdla. 2011. Multi-view reconstruction preserving weakly-supported surfaces. (2011).
Animesh Karnewar, Tobias Ritschel, Oliver Wang, and Niloy Mitra. 2022. ReLU fields: The little non-linearity that could. SIGGRAPH (2022).
Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. 2006. Poisson Surface Reconstruction. Symposium on Geometry Processing (2006).
Michael Kazhdan and Hugues Hoppe. 2013. Screened Poisson Surface Reconstruction. ACM TOG (2013).
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR (2015).
Georgios Kopanas, Julien Philip, Thomas Leimkühler, and George Drettakis. 2021. Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (2021).
Zhengfei Kuang, Kyle Olszewski, Menglei Chai, Zeng Huang, Panos Achlioptas, and Sergey Tulyakov. 2022. NeROIC: Neural Rendering of Objects from Online Image Collections. SIGGRAPH (2022).
Patrick Labatut, Jean-Philippe Pons, and Renaud Keriven. 2007. Efficient multi-view reconstruction of large-scale scenes using interest points, Delaunay triangulation and graph cuts. ICCV (2007).
Marc Levoy and Pat Hanrahan. 1996. Light field rendering. SIGGRAPH (1996).
Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, and Yaser Sheikh. 2019. Neural Volumes: Learning Dynamic Renderable Volumes from Images. SIGGRAPH (2019).
W. E. Lorensen and H. E. Cline. 1987. Marching cubes: A high resolution 3D surface construction algorithm. SIGGRAPH (1987).
Nelson Max. 1995. Optical models for direct volume rendering. IEEE TVCG (1995).
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV (2020).
Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. 2022. Instant neural graphics primitives with a multiresolution hash encoding. SIGGRAPH (2022).
Michael Oechsle, Songyou Peng, and Andreas Geiger. 2021. UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. ICCV (2021).
Eric Penner and Li Zhang. 2017. Soft 3D Reconstruction for View Synthesis. SIGGRAPH Asia (2017).
Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. 2021. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. ICCV (2021).
Gernot Riegler and Vladlen Koltun. 2020. Free View Synthesis. ECCV (2020).
Gernot Riegler and Vladlen Koltun. 2021. Stable view synthesis. CVPR (2021).
Darius Rückert, Linus Franke, and Marc Stamminger. 2022. ADOP: Approximate differentiable one-pixel point rendering. SIGGRAPH (2022).
Pedro V. Sander, Diego Nehab, and Joshua Barczak. 2007. Fast Triangle Reordering for Vertex Locality and Reduced Overdraw. SIGGRAPH (2007).
Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. 2016. Pixelwise View Selection for Unstructured Multi-View Stereo. ECCV (2016).
Pratul P. Srinivasan, Boyang Deng, Xiuming Zhang, Matthew Tancik, Ben Mildenhall, and Jonathan T. Barron. 2021. NeRV: Neural reflectance and visibility fields for relighting and view synthesis. CVPR (2021).
Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, and Noah Snavely. 2019. Pushing the Boundaries of View Extrapolation with Multiplane Images. CVPR (2019).
Cheng Sun, Min Sun, and Hwann-Tzong Chen. 2022. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. CVPR (2022).
Richard Szeliski and Polina Golland. 1999. Stereo Matching with Transparency and Matting. IJCV (1999).
Ayush Tewari, Justus Thies, Ben Mildenhall, Pratul Srinivasan, Edgar Tretschk, W Yifan, Christoph Lassner, Vincent Sitzmann, Ricardo Martin-Brualla, Stephen Lombardi, et al. 2022. Advances in neural rendering. Computer Graphics Forum (2022).
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, and Pratul P. Srinivasan. 2022. Ref-NeRF: Structured view-dependent appearance for neural radiance fields. CVPR (2022).
G. Vogiatzis, C. Hernández, P. Torr, and R. Cipolla. 2007. Multi-View Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency. IEEE TPAMI (2007).
Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. 2021. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction. NeurIPS (2021).
Daniel N. Wood, Daniel I. Azuma, Ken Aldinger, Brian Curless, Tom Duchamp, David H. Salesin, and Werner Stuetzle. 2000. Surface Light Fields for 3D Photography. SIGGRAPH (2000).
Xiuchao Wu, Jiamin Xu, Zihan Zhu, Hujun Bao, Qixing Huang, James Tompkin, and Weiwei Xu. 2022. Scalable Neural Indoor Scene Rendering. ACM TOG (2022).
Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. 2021. Volume rendering of neural implicit surfaces. NeurIPS (2021).
Alex Yu, Sara Fridovich-Keil, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. 2022. Plenoxels: Radiance fields without neural networks. CVPR (2022).
Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. 2021. PlenOctrees for real-time rendering of neural radiance fields. ICCV (2021).
Kai Zhang, Fujun Luan, Qianqian Wang, Kavita Bala, and Noah Snavely. 2021a. PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting. CVPR (2021).
Kai Zhang, Gernot Riegler, Noah Snavely, and Vladlen Koltun. 2020. NeRF++: Analyzing and Improving Neural Radiance Fields. arXiv:2010.07492 (2020).
Xiuming Zhang, Pratul P. Srinivasan, Boyang Deng, Paul Debevec, William T. Freeman, and Jonathan T. Barron. 2021b. NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination. SIGGRAPH Asia (2021).
Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. 2018. Stereo Magnification: Learning View Synthesis Using Multiplane Images. SIGGRAPH (2018).
A TRAINING AND OPTIMIZATION DETAILS

SDF model definition and optimization. As stated in Section 4.1, we model our SDF using a variant of mip-NeRF 360. We train our model using the same optimization settings as mip-NeRF 360 (250k iterations of Adam [Kingma and Ba 2015] with a batch size of 2^14 and a learning rate that is warm-started and then log-linearly interpolated from 2·10^−3 to 2·10^−5, with 𝛽1 = 0.9, 𝛽2 = 0.999, 𝜖 = 10^−6) and similar MLP architectures (a proposal MLP with 4 layers and 256 hidden units, and a NeRF MLP with 8 layers and 1024 hidden units, both using swish/SiLU rectifiers [Hendrycks and Gimpel 2016] and 8 scales of positional encoding). Following the hierarchical sampling procedure of mip-NeRF 360, we perform two resampling stages using 64 samples evaluated using the proposal MLP, and then one evaluation stage using 32 samples of the NeRF MLP. The proposal MLP is optimized by minimizing Lprop + 0.1 LSDF, where Lprop is the proposal loss described in [Barron et al. 2022], designed to bound the weights output by the NeRF MLP density.

Optimizing for per-vertex attributes via a compressed hash grid. As stated in Section 4.3, during optimization we use Instant NGP [Müller et al. 2022] as the underlying representation for our vertex attributes. We use the following hyperparameters: L = 18, T = 2^21 and Nmax = 8192. We remove the view-direction input from the NGP model, as we incorporate it later in Equation 7. We use a weight decay of 0.1 for the hash grids but not the MLP, optimize using Adam [Kingma and Ba 2015] for 150k iterations with a batch size of 2^14 and an initial learning rate of 0.001 that we drop by 10× every 50k iterations.
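The "warm-started and then log-linearly interpolated" learning rate above can be read as follows; this is our own small sketch, and the warm-up length and shape are assumptions since the paper does not specify them.

```python
import numpy as np

def sdf_stage_lr(step, total_steps=250_000, lr_init=2e-3, lr_final=2e-5,
                 warmup_steps=2_500):
    # Log-linear interpolation from lr_init to lr_final over training, scaled by
    # a linear warm-up (the warm-up duration is an assumption, not from the paper).
    t = np.clip(step / total_steps, 0.0, 1.0)
    lr = np.exp((1.0 - t) * np.log(lr_init) + t * np.log(lr_final))
    return lr * np.clip(step / warmup_steps, 0.0, 1.0)
```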

B TWEAKS FOR A COMPELLING VIEWER


Here we detail a few tweaks to the pipeline which do not strictly
improve reconstruction accuracy, but rather make for a more com-
pelling viewing experience. With this in mind, we found it important
to alleviate jarring transitions between the reconstructed scene con-
tent and the background color. To this end, we also include a global
clear color into the appearance parameters we optimize for in Sec-
tion 4.3. That is, we assign this color to any pixel in the training
data which does not have a valid triangle index.
To further mask the transition between geometry and background,
we enclose the SDF with bounding geometry before extracting the mesh in Section 4.2. We compute a convex hull as the intersection of 32 randomly oriented planes, where the location of each plane has been set to bound 99.75% of the voxels that have been marked
as candidates for surface extraction. We then further make this hull
conservative by inflating it by a slight margin of ×1.025. However,
since the extracted mesh needs to be transformed into world space
for rendering, we must take care to avoid numerical precision issues
that may arise from using unbounded vertex coordinates during
rasterization. We solve this by bounding the scene with a distant
sphere with a radius of 500 world-space units. These two operations
are easily implemented by setting the SDF value in each grid cell
to the pointwise minimum of the MLP-parameterized SDF and the
SDF of the defined bounding geometry.
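Those two operations can be implemented with a couple of pointwise minimums over the SDF grid. The sketch below is ours; the half-space representation of the hull and the positive-inside sign convention are assumptions made for illustration, and contracted-space versus world-space details are glossed over.

```python
import numpy as np

def hull_sdf(points, normals, offsets):
    # Approximate signed distance (positive inside) to a convex hull given as
    # half-spaces n_k . x <= b_k with unit normals: the margin to the nearest
    # plane (exact inside the hull, an approximation outside).
    return np.min(offsets[None, :] - points @ normals.T, axis=-1)

def sphere_sdf(points, radius=500.0):
    # Distant bounding sphere of radius 500 world-space units.
    return radius - np.linalg.norm(points, axis=-1)

def bounded_sdf(points, mlp_sdf_values, normals, offsets):
    # Appendix B: the value used for marching cubes in each grid cell is the
    # pointwise minimum of the MLP-parameterized SDF and the SDFs of the
    # bounding geometry, which carves everything outside the bounds into
    # free space under the positive-inside convention.
    bound = np.minimum(hull_sdf(points, normals, offsets), sphere_sdf(points))
    return np.minimum(mlp_sdf_values, bound)
```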

C BASELINES DETAILS

MobileNeRF viewer configuration. Note that by default the MobileNeRF viewer runs at a reduced resolution for high framerates across a variety of devices. For our comparisons we modify it to run at different resolutions. When we compute image quality metrics, we choose the resolution of the test set images. Furthermore, when we measure run-time performance we use 1920 × 1080, which is a resolution that is representative of most modern displays.

Instant NGP. Table 1 reports quality results for the Instant NGP [Müller et al. 2022] method, which we carefully adapt to work on unbounded large scenes. We asked the authors of Instant NGP for help with tuning their method and made the following changes:

• We use the big.json configuration file provided with the official code release,
• we increased the batch size by 4× to 2^20, and
• we increased the scene scale from 16 to 32.

Note that none of these changes has a significant impact on the render time for Instant NGP.

By default, the Instant NGP viewer is equipped with a dynamic upscaling implementation, which renders images at a lower resolution and then applies smart upscaling. For a fair comparison we turn this off when measuring performance, as these dynamic upscalers can be applied to any renderer. More importantly, we want the performance numbers to correspond with the test-set quality metrics, and none of the test-set images were computed using upscaling.
