Accelerating Marching Cubes with Graphics Hardware

Gunnar Johansson

Hamish Carr

Linköping University

University College Dublin

Abstract
For large data sets in medicine and science, efficient isosurface extraction and rendering is crucial for interactive visualization. Previous GPU
acceleration techniques have been restricted to
tetrahedral meshes. We generalize this work
to arbitrary meshes by caching local topology
on the video card to reduce both CPU load and
bandwidth consumption, demonstrating our results with the Marching Cubes cases. We also
present improvements to span space techniques
that pre-classify the ranges over which individual cases are used in a given cube. Our results
indicate that speedups in excess of tenfold are
feasible, compared with speedups of less than
twofold demonstrated in previous papers.

Copyright © 2006 Gunnar Johansson and Hamish Carr. Permission to copy is hereby granted provided the
original copyright notice is reproduced in copies made.

1 Introduction

Applications in medical imaging and scientific
simulations often generate large amounts of
volumetric data, which are commonly visualized using isosurfaces. The standard isosurface extraction method is Marching Cubes [8],
which marches through each cube in the grid
in turn, generating a triangulated approximation of the isosurface. Algorithmic improvements and optimization techniques have been
proposed to reduce the runtime of this algorithm. In this paper, we contribute improvements with respect to both aspects: algorithmic improvements to span space acceleration,
and optimizations based on GPUs. Algorithmically, we improve span space performance with
a time-space trade-off that stores the topology
of the isosurface in each cell in the span space
structure, avoiding computations at runtime.
Our main contribution is to improve existing
GPU-based isosurface acceleration by caching
the topology on the video card and using a vertex program to perform geometric interpolation. Previous work [10, 6] demonstrated that
acceleration by 30% was feasible for tetrahedral
data. Reck et al. [11] further accelerated this
method to approximately 850% using an interval tree. However, for cubic data, tetrahedral
subdivision generates surfaces with pronounced
artifacts [2]. Goetz et al. [5] therefore accelerated Marching Cubes on the GPU, but without
span space techniques or the correct normals.
By comparison, we generate isosurfaces that
are more accurate than Goetz et al. [5], and
whose acceleration is at least as good as Pascucci [10] but superior in quality and speed
when tetrahedral subdivision is accounted for.
Moreover, combining the GPU acceleration
with the span space improvements results in
acceleration of as much as 1300%. Our method
is also applicable to higher-order isosurface interpolation, while further acceleration is anticipated in the next generation of GPUs.
Section 2 will describe the data and outline
the GPU pipeline. Section 3 presents the previous work on Marching Cubes and isosurface
acceleration. Section 4 then presents the contributions of this paper. Finally, results, conclusions and future work are presented in Sections 5, 6 and 7.

2 Background

For a 3D scalar field $f : \mathbb{R}^3 \to \mathbb{R}$, an isosurface is the inverse image $f^{-1}(h)$ of a particular isovalue $h$. In practice $f$ is assumed to be
reconstructed from discrete samples by dividing the domain of the function into polyhedral
cells, often cubes or tetrahedra, with simple interpolants. Isosurfaces are then visualized by
generating an approximation of the mathematically precise surface, usually by constructing a
triangulated approximation in each cell [8].
Since many problems in graphics are decomposable into smaller problems for later
compositing, modern GPUs use deep parallel
pipelines. The input to the pipeline is a stream
of vertices arriving at the vertex processor, followed by rasterization to fragments for final
computations and compositing.
Vertex programs run in the vertex processor(s) and allow the programmer to control the
properties of each vertex individually. However, the parallel nature of the processor prohibits access to properties of other vertices, and
prevents the creation or deletion of any vertices. Although triangles of zero area can be
deleted later in the pipeline, the inability to
create new vertices is a major limitation in migrating isosurface extraction to the GPU.
Fragment programs run in the fragment
processor(s), and allow the programmer to control properties of individual pixels. As with the
vertex processor, access to other fragments is
prohibited, and the fragment's position in the
image cannot be changed.

3 Previous Work

The most important contribution to isosurface
extraction is Marching Cubes [8]. We start by
describing this algorithm, then discuss algorithmic methods of accelerating the task, followed by
optimization and parallelization methods that
reduce the runtime by a constant factor.

3.1 Marching Cubes

Marching Cubes [8] exploits the decomposability of isosurface extraction. Each corner in a
cubic cell is classified as above (black) or below
(white) a given isovalue $h$. Since each cube has
8 corners, there are exactly $2^8 = 256$ different
ways in which the vertices can be classified. By
applying arguments of symmetry, rotation and
complementarity, these 256 cases reduce to 15
(Figure 1). For each case, the normals and the
vertices are generated using central differencing
and linear interpolation along each edge.
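
The classification and edge interpolation steps described above can be illustrated with a short Python sketch. It is not taken from the paper: the names `case_index` and `interpolate_edge`, the bit order of the corners, and the convention that a corner equal to the isovalue counts as below are assumptions made for illustration.

```python
def case_index(values, isovalue):
    """Return the 8-bit case index of a cube.

    Bit i is set when corner i lies above the isovalue (assumed: strictly
    greater), so the result is one of the 2**8 = 256 classifications.
    """
    index = 0
    for i, v in enumerate(values):
        if v > isovalue:
            index |= 1 << i
    return index


def interpolate_edge(p0, p1, v0, v1, isovalue):
    """Linearly interpolate the point on edge p0-p1 where the field equals isovalue.

    Assumes v0 != v1, which holds for any edge crossed by the isosurface.
    """
    t = (isovalue - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

For instance, a cube whose only corner above the isovalue is corner 0 yields index 1, and an edge with end values 0 and 1 is cut one quarter of the way along for an isovalue of 0.25.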
The original algorithm was later found to
generate holes in the surface under certain conditions [4]. A number of solutions were presented to this problem, the simplest of which,
proposed by Montani et al. [9], added complementary cases to the original 15 basic cases to
resolve the problem.
An alternate solution subdivides each cube
into tetrahedra, then applies the cases shown
in Figure 2, usually referred to as marching
tetrahedra. Although convenient for its simplicity, this approach requires more cells to be
processed, increasing both computations and
storage. Furthermore, it has been shown that
this subdivision introduces artifacts [2].

Figure 2: Marching Tetrahedra Cases (0: Empty, 1: Triangle, 2: Quad)


For a data set of $N$ samples and an isosurface of $k$ triangles, Marching Cubes is $O(N)$ in
time complexity. Since $N$ may be well in excess of 10,000,000, and $k \ll N$, algorithmic
improvements have been devised to reduce the
time complexity to $O(\log N + k)$, complemented by optimization and parallelization that reduce the
constant factor.

3.2 Accelerating Algorithmically

The span space [7] is a representation of the
maximum and minimum values spanned by
each cell in a dataset. All cells with a minimum lower than, and a maximum higher than,
the current isovalue must be intersected by the
surface. This is represented in a two-dimensional space as depicted in Figure 3. Each cell
is represented by a point in this space, and the
intersected cells are found by looking in the upper left corner restricted by the isovalue.
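
As a minimal sketch of the span space selection criterion, the following Python code maps each cell to its (minimum, maximum) point and selects the cells whose span contains the isovalue. The linear scan is a stand-in chosen for brevity, not the k-d tree or interval tree search; the names `cell_span` and `active_cells` are hypothetical.

```python
def cell_span(values):
    """Map the corner values of one cell to its point (minimum, maximum) in span space."""
    return (min(values), max(values))


def active_cells(spans, isovalue):
    """Return the indices of the cells intersected by the isosurface.

    A cell qualifies when its minimum is lower than, and its maximum higher
    than, the isovalue. This brute-force scan costs O(N); a tree-based
    search structure would replace it in practice.
    """
    return [i for i, (lo, hi) in enumerate(spans) if lo < isovalue < hi]


# Three illustrative cells, given directly as (minimum, maximum) points.
spans = [(0, 10), (20, 30), (5, 25)]
```

With these illustrative spans, an isovalue of 8 selects cells 0 and 2, while an isovalue of 22 selects cells 1 and 2.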

Figure 1: Marching Cubes Cases [9]. Black: corner value is above isovalue; white: corner value is below isovalue.

Figure 3: The Span Space (axes: minimum and maximum, each marked at the current isovalue)

The span space is often implemented using a k-d tree [1, 7], which uses $O(N)$ storage
and $O(N \log N)$ construction time for searches
in $O(\sqrt{N} + k)$ time. Other implementations
use the interval tree [3] for $O(N)$ storage and
$O(N \log N)$ construction time, with a search
cost of $O(\log N + k)$ time, at the expense of
greater memory requirements than the k-d tree.
These methods discard cubes not intersected
by the surface, but still use Marching Cubes or
one of its derivatives to compute the triangles
to be rendered. Accordingly, improvements to
triangle extraction are broadly applicable.

3.3 GPU Acceleration

Recent advances in GPU programmability have
made it possible to shift parts of the isosurface
extraction from the CPU to the GPU. Pascucci
[10] showed how to accelerate marching tetrahedra by noting that each cell can be intersected by at most two triangles. For each cell,
four vertices were sent to the GPU, adding the
data values as parameters to each vertex, and
a vertex program classified each cell and interpolated the vertex position. Surplus triangles were suppressed by moving two or more
vertices to the same position. Although easily
implemented and elegant, this solution places
high demands on the video bus bandwidth.
Later work by Klein et al. [6] improved this
by moving the computations to the more powerful fragment processor, requiring all data to
be encoded in textures. The output of the fragment program is fed back to the vertex processor to be rendered in the final image, leading
to a fairly complex implementation.
Using an approach similar to [10], Reck et
al. [11] applied span space techniques with an
interval tree. This reduces both bandwidth demands and computations on the GPU since
only the cells intersected by the surface are
processed. They report a speed-up of approximately 9.5 times for the interval tree combined
with GPU interpolation, showing that the bottleneck in the extraction process is not the computations, but the data transfer.
The solutions described above are all restricted to tetrahedral meshes. Recently, Goetz
et al. [5] have presented a GPU-accelerated solution for Marching Cubes. They compute the
Marching Cubes case on the CPU and pass the
resulting geometry to the GPU, which interpolates the vertices in a vertex program. However, since they do not apply span space methods, their solution is limited by the high bandwidth demands. Moreover, their method does
not compute correct normals.

4 GPU Isosurfaces

In this section, we describe our GPU-accelerated solution. We start by describing the
technique of caching cell topology, which makes it
possible to apply complex topology such as the
Marching Cubes cases on the GPU. We then
present a pre-computation step for optimizing
the case classification on the CPU.

4.1 Caching Cell Topology

The solutions presented in Section 3.3 all send
the full geometry of each cell into the graphics
pipeline. Since the hardware used did not have
general GPU memory access, all vertex data
had to be included in the vertex stream, leading
to very high bandwidth demands.
We start with the idea that the GPU can
cache geometry very efficiently. In particular,
we store each possible cell configuration (the
Marching Cubes cases) on the graphics card, using, for example, display lists. After performing
the case classification on the CPU, we invoke
the corresponding case display list, instead of
sending the full geometry through the graphics bus. Each vertex is stored as two indices
(0-7), giving the cube vertices between which
to interpolate. A vertex program then adjusts
the position by interpolating these vertices to
get the correct position, using vertex textures
to locate the corresponding data values.
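
The division of labour can be sketched on the CPU in Python. This is a simulation under stated assumptions rather than the Cg vertex program itself: the corner ordering in `CORNERS`, the single cached case `CACHED_CASE`, and the function names are hypothetical, and the list of corner values stands in for the vertex texture lookup.

```python
# Corner offsets of a unit cube. This ordering is an assumption for illustration.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]

# Hypothetical cached topology for one case: only corner 0 is above the isovalue.
# Each vertex is stored as two corner indices (0-7) naming the edge it lies on.
CACHED_CASE = [(0, 1), (0, 3), (0, 4)]


def vertex_program(edge, origin, values, isovalue):
    """Place one cached vertex by interpolating between its two cube corners."""
    a, b = edge
    # values[a] and values[b] play the role of the vertex texture fetches.
    t = (isovalue - values[a]) / (values[b] - values[a])
    return tuple(o + ca + t * (cb - ca)
                 for o, ca, cb in zip(origin, CORNERS[a], CORNERS[b]))


def render_cell(case, origin, values, isovalue):
    """Invoke the cached case for one cell, as a display list call would."""
    return [vertex_program(edge, origin, values, isovalue) for edge in case]
```

Only the cell origin, the case identifier and the isovalue vary per cell in this sketch; the edge list itself is reused unchanged, which is the property that lets the topology reside on the graphics card.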
We expect this to accelerate rendering because the computationally expensive interpolation step is deferred to the GPU, which has
hardware optimized for such calculations. Moreover, the bandwidth from GPU to VRAM is
higher than that from CPU to RAM. Finally,
this method is simple to implement, and works
for any cell topology and interpolant.

4.2 Pre-computing Cell Topology

Despite transferring most of the computation
to the GPU, we discovered that the computation was still CPU-bound, as the case classification step involves multiple branches and stalls
the pipeline. We reduced this cost by applying
span space techniques to both the cell selection
phase and the cell classification phase.
This approach is based on the fact that the
isovalue is used both for cell selection in the
interval tree and for cell classification. Instead
of performing redundant computation, we precompute the classification of each cell, storing
the isovalue range for each case as a separate
entry in the interval or k-d tree. For example, the cube in Figure 4 has 7 non-empty case
classifications, each of which is stored in the interval tree. For clarity, we refer to this modified
tree as the case interval tree or case k-d tree.
This transfers the classification cost to preprocessing instead of run-time extraction. The sole
disadvantage is that this increases the memory
footprint by a factor of 7 in the worst case.
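
The pre-classification can be sketched in Python as follows. This is an illustration under assumptions, not the authors' data structure: the corner order of the example values, the names `case_intervals` and `lookup_case`, and the use of a raw 8-bit index (rather than one of the reduced Marching Cubes cases) are all hypothetical, and a linear scan stands in for the tree query.

```python
def case_intervals(values):
    """Return (low, high, case_index) entries, one per non-empty case of a cell.

    Every isovalue h with low <= h < high gives the same classification,
    assuming a corner is above h only when its value is strictly greater.
    """
    levels = sorted(set(values))
    entries = []
    for low, high in zip(levels, levels[1:]):
        index = 0
        for i, v in enumerate(values):
            if v >= high:  # corner i is above every isovalue in [low, high)
                index |= 1 << i
        entries.append((low, high, index))
    return entries


def lookup_case(entries, isovalue):
    """Linear scan standing in for the tree search; None means the cell is empty."""
    for low, high, index in entries:
        if low <= isovalue < high:
            return index
    return None


# The eight corner values shown in Figure 4, in an assumed corner order.
corner_values = [240, 23, 150, 43, 78, 4, 189, 18]
entries = case_intervals(corner_values)
```

Eight distinct corner values produce seven entries, which matches the worst-case sevenfold growth in memory footprint noted above; cells with repeated corner values produce fewer.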

Figure 4: Cell Configurations in a Single Cell. For a cube with corner values 240, 189, 150, 78, 43, 23, 18 and 4, the isovalue axis is divided at each corner value, giving cases 0, 1, 3, 5, 8, 5C, 2C, 1C and 0C in order from the highest isovalue range to the lowest.

5 Results

We tested on a Pentium 4 3.4GHz processor with 2GB of RAM, and an nVIDIA
GeForce 6800 GT with 256MB VRAM, using C++, OpenGL and the Cg shading language. We tested five of the datasets found
at http://www.volvis.org/, all 8-bit datasets
with increasing resolutions.
Initially, we only cached the cell topology, using a vertex program for interpolation and computation of normals. As we see in Table 1, our
results were disappointing, with acceleration
of 1.2- to 1.6-fold. With pre-computed normals
stored in the same texture as the data values,
this improved only slightly (Table 2). Since
the bottleneck here is the CPU cell selection,
this is unsurprising. Once the CPU bottleneck
was reduced with span space techniques, however, we achieved acceleration factors of 6.9 to 12.9 compared to pure Marching Cubes, where
span space alone only achieved 2.3 to 3.3 for the
same data sets. Adding the case interval tree
or case k-d tree generally added less than 10%
performance, however, indicating that we were
probably approaching a memory access bottleneck.

                      Fuel           Hydrogen atom  Engine         Aneurism       Skull
Size                  64x64x64       128x128x128    256x256x128    256x256x256    256x256x256
Isovalue              10             20             155            100            80
CPU
  Marching Cubes      58             9.4            2.3            1.3            1.0
  Kd-Tree             117 (2.0)      23 (2.5)       5.3 (2.3)      5.1 (3.8)      1.7 (1.7)
  Case Kd-Tree        125 (2.2)      24 (2.5)       5.8 (2.6)      6.2 (4.6)      2.1 (2.1)
  Interval Tree       122 (2.1)      27 (2.9)       6.2 (2.7)      6.3 (4.7)      -
  Case Interval Tree  130 (2.2)      32 (3.4)       -              6.7 (5.0)      -
GPU
  Marching Cubes      91 (1.6)       12 (1.3)       2.8 (1.2)      1.6 (1.2)      1.5 (1.5)
  Kd-Tree             89 (1.5)       22 (2.4)       5.3 (2.3)      5.8 (4.4)      2 (2.0)
  Case Kd-Tree        89 (1.5)       22 (2.4)       5.3 (2.4)      5.9 (4.4)      2.1 (2.1)
  Interval Tree       91 (1.6)       23 (2.5)       5.5 (2.4)      6.0 (4.5)      -
  Case Interval Tree  97 (1.7)       25 (2.7)       -              6.1 (4.6)      -

Table 1: Framerate (frames per second) without pre-computed normals. Numbers in
parentheses show the ratio between the respective acceleration technique and the CPU implementation
of brute force Marching Cubes. Missing results indicate that our data structures would not fit in
main memory.

6 Conclusions

We have described an approach for accelerating isosurface extraction using graphics hardware, by storing the Marching Cubes cases on
the GPU and interpolating the vertices using a
vertex program. We have also extended the use
of the kd-tree and the interval tree to contain
pre-computed cases. This transfers the case
classification to a pre-processing stage on the
CPU, and completely removes the need for the
CPU to access the original dataset.
Our results demonstrate that the principal
bottleneck in isosurface extraction is in the
CPU rather than the GPU, and that with judicious algorithmic and pipeline optimization,
significant acceleration can be achieved for any
isosurface extraction kernel. We note that the
limiting factor on performance appeared to be
the vertex texture performance, and in particular the lack of efficient 3D vertex textures. We
expect that future hardware will improve on
this situation.

7 Future Work

In addition to wishing to repeat the work with
future hardware, we would like to investigate
different possibilities of moving the computations to the more powerful fragment processor
which has better support for various texture
formats. We would also like to optimize the
texture memory usage, allowing larger datasets
to be rendered. Finally, we would like to apply our approach to other, higher-order, interpolants.

About the Authors


Gunnar Johansson is currently completing
his M.Sc. in Media Technology at Linköping
University in Sweden. This is a follow-up
from his year in Dublin at the Computational
Science programme at UCD.
Hamish Carr completed his Ph.D. at the
University of British Columbia in May, 2004
and was appointed as a Lecturer at University
College Dublin effective September, 2004. His
Ph.D. research involved computing a topological abstraction of scalar fields called the contour tree and applying it to problems in scientific and medical visualization.

                      Fuel           Hydrogen atom  Engine         Aneurism       Skull
Size                  64x64x64       128x128x128    256x256x128    256x256x256    256x256x256
Isovalue              10             20             155            100            80
CPU
  Marching Cubes      65             9.4            2.5            1.4            1.1
  Kd-Tree             146 (2.3)      22 (2.4)       6.9 (2.8)      6.4 (4.6)      2.1 (1.9)
  Case Kd-Tree        160 (2.5)      25 (2.7)       8 (3.2)        8.1 (5.8)      2.4 (2.4)
  Interval Tree       157 (2.4)      27 (2.9)       8.3 (3.3)      8.1 (5.8)      -
  Case Interval Tree  168 (2.6)      31 (3.3)       -              -              -
GPU
  Marching Cubes      91 (1.4)       12.4 (1.3)     3.1 (1.2)      -              -
  Kd-Tree             448 (6.9)      90 (9.6)       24 (9.6)       -              -
  Case Kd-Tree        450 (6.9)      121 (12.9)     28 (11.2)      -              -
  Interval Tree       459 (7.1)      112 (12.0)     28 (11.2)      -              -
  Case Interval Tree  479 (7.4)      134 (14.3)     -              -              -

Table 2: Framerate (frames per second) with pre-computed normals. Numbers in parentheses show the ratio between the respective acceleration technique and the CPU implementation of
brute force Marching Cubes. Missing results indicate that our data structures would not fit in main
memory, or that the pre-computed normals would not fit in VRAM.

References
[1] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, 1975.
[2] Hamish Carr, Torsten Möller, and Jack Snoeyink. Artifacts caused by simplicial subdivision. IEEE Transactions on Visualization and Computer Graphics, 12(2):231–242, March 2006.
[3] Paolo Cignoni, Paola Marino, Claudio Montani, Enrico Puppo, and Roberto Scopigno. Speeding up isosurface extraction using interval trees. IEEE Transactions on Visualization and Computer Graphics, 3(2):158–170, 1997.
[4] Martin J. Durst. Re: Additional reference to marching cubes. SIGGRAPH Comput. Graph., 22(5):243, 1988.
[5] Frank Goetz, Theodor Junklewitz, and Gitta Domik. Real-time marching cubes on the vertex shader. In Eurographics 2005 Short Presentations. Eurographics Association, 2005.
[6] Thomas Klein, Simon Stegmaier, and Thomas Ertl. Hardware-accelerated reconstruction of polygonal isosurface representations on unstructured grids. In Computer Graphics and Applications, 12th Pacific Conference on (PG'04), pages 186–195, 2004.
[7] Yarden Livnat, Han-Wei Shen, and Christopher R. Johnson. A near optimal isosurface extraction algorithm using the span space. IEEE Transactions on Visualization and Computer Graphics, 2(1):73–84, 1996.
[8] William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In SIGGRAPH '87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pages 163–169, New York, NY, USA, 1987. ACM Press.
[9] Claudio Montani, Riccardo Scateni, and Roberto Scopigno. A modified look-up table for implicit disambiguation of marching cubes. The Visual Computer, 10(6):353–355, December 1994.
[10] Valerio Pascucci. Isosurface computation made simple: Hardware acceleration, adaptive refinement and tetrahedral stripping. In Joint Eurographics - IEEE TVCG Symposium on Visualization (VisSym), pages 293–300, 2004.
[11] Frank Reck, Carsten Dachsbacher, Marc Stamminger, Günther Greiner, and Roberto Grosso. Realtime isosurface extraction with graphics hardware. In Eurographics 2004 - Short Presentations & Interactive Demos, 2004.
