
RESound: Interactive Sound Rendering for Dynamic Virtual Environments∗

Micah T. Taylor, University of North Carolina, taylormt@cs.unc.edu
Anish Chandak, University of North Carolina, achandak@cs.unc.edu
Lakulish Antani, University of North Carolina, lakulish@cs.unc.edu
Dinesh Manocha, University of North Carolina, dm@cs.unc.edu

ABSTRACT

We present an interactive algorithm and system (RESound) for sound propagation and rendering in virtual environments and media applications. RESound uses geometric propagation techniques for fast computation of propagation paths from a source to a listener and takes into account specular reflections, diffuse reflections, and edge diffraction. In order to perform fast path computation, we use a unified ray-based representation to efficiently trace discrete rays as well as volumetric ray-frusta. RESound further improves sound quality by using statistical reverberation estimation techniques. We also present an interactive audio rendering algorithm to generate spatialized audio signals. The overall approach can handle dynamic scenes with no restrictions on source, listener, or obstacle motion. Moreover, our algorithm is relatively easy to parallelize on multi-core systems. We demonstrate its performance on complex game-like and architectural environments.

Categories and Subject Descriptors
H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing—modeling, systems; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—ray tracing

General Terms
Performance

Keywords
Acoustics, sound, ray tracing

∗Project webpage: http://gamma.cs.unc.edu/Sound/RESound/

MM'09, October 19–24, 2009, Beijing, China. Copyright 2009 ACM 978-1-60558-608-3/09/10.

1. INTRODUCTION

Extending the frontier of visual computing, an auditory display uses sound to communicate information to a user and offers an alternative means of visualization or media. By harnessing the sense of hearing, sound rendering can further enhance a user's experience in multimodal virtual worlds [10, 27]. In addition to immersive environments, auditory display can provide a natural and intuitive human-computer interface for many desktop or handheld applications (see Figure 1). Realistic sound rendering can directly impact the perceived realism of users of interactive media applications. An accurate acoustic response for a virtual environment is attuned to the geometric representation of the environment. This response can convey important details about the environment, such as the location and motion of objects. The most common approach to sound rendering is a two-stage process (sketched in code at the end of this section):

• Sound propagation: the computation of impulse responses (IRs) that represent an acoustic space.

• Audio rendering: the generation of a spatialized audio signal from the impulse responses and dry (anechoically recorded or synthetically generated) source signals.

Sound propagation from a source to a listener conveys information about the size of the space surrounding the sound source and identifies the source to the listener even when the source is not directly visible. This considerably improves the immersion in virtual environments. For instance, in a first-person shooter game scenario (see Figure 1(b)), the distant cries of a monster coming around a corner or the soft steps of an opponent approaching from behind can alert the player and save them from a fatal attack. Sound propagation is also used for acoustic prototyping (see Figure 1(d)) for computer games, complex architectural buildings, and urban scenes. Audio rendering also provides sound cues which give directional information about the position of the sound source relative to a listener. The cues are generated for headphones or a 3D surround sound speaker system. Thus, the listener can identify the sound source even when the sound source is out of the field of view of the listener. For example, in a VR combat simulation (see Figure 1(a)), it is critical to simulate the 3D sounds of machine guns, bombs, and missiles. Another application of 3D audio is user interface design, where sound cues are used to search for data on a multi-window screen (see Figure 1(c)).
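To make the two-stage structure concrete, here is a minimal C++ sketch with hypothetical names (the paper does not specify RESound's actual interfaces): a propagation stub produces an impulse response, and the rendering stage convolves it with the dry source signal.

#include <cstddef>
#include <vector>

// Hypothetical two-stage pipeline (illustrative; not RESound's actual API).
struct ImpulseResponse {
    std::vector<float> taps;  // pressure IR, one tap per output sample
};

// Stage 1 (stub): sound propagation. A real implementation would trace
// paths through the scene and accumulate delayed, attenuated impulses;
// here a single direct-path arrival stands in for that computation.
ImpulseResponse propagate() {
    ImpulseResponse ir;
    ir.taps.assign(441, 0.0f);  // 10 ms of IR at 44.1 kHz
    ir.taps[44] = 0.5f;         // one arrival: ~1 ms delay, gain 0.5
    return ir;
}

// Stage 2: audio rendering. Direct-form convolution of the dry (anechoic)
// signal with the impulse response.
std::vector<float> render(const std::vector<float>& dry,
                          const ImpulseResponse& ir) {
    std::vector<float> out(dry.size() + ir.taps.size() - 1, 0.0f);
    for (std::size_t n = 0; n < dry.size(); ++n)
        for (std::size_t k = 0; k < ir.taps.size(); ++k)
            out[n + k] += dry[n] * ir.taps[k];
    return out;
}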
Figure 1: Multimedia applications that need interactive sound rendering. (a) Virtual reality training: Virtual Iraq simulation to treat soldiers suffering from post-traumatic stress disorder (top) and emergency training for medical personnel using Second Life (bottom). (b) Games: Half-Life 2 (top) and Crackdown, winner of the best use of audio at the British Academy of Film and Television Arts awards (bottom). (c) Interfaces and visualization: multimodal interfaces (top) and a data exploration & visualization system (bottom). (d) Computer-aided design: game level design (top) and architectural acoustic modeling (bottom).

The main computational cost in sound rendering is the real-time computation of the IRs based on the propagation paths from each source to the listener. The IR computation relies on the physical modeling of the sound field based on an accurate description of the scene and material properties. The actual sensation of sound is due to small variations in the air pressure. These variations are governed by the three-dimensional wave equation, a second-order linear partial differential equation, which relates the temporal and spatial derivatives of the pressure field [40]. Current numerical methods used to solve the wave equation are limited to static scenes and can take minutes or hours to compute the IRs. Moreover, computing a numerically accurate solution, especially for high frequencies, is considered a very challenging problem.

Main Results: We present a system (RESound) for interactive sound rendering in complex and dynamic virtual environments. Our approach is based on geometric acoustics, which represents acoustic waves as rays. The geometric propagation algorithms model the sound propagation based on rectilinear propagation of waves and can accurately model the early reflections (up to 4–6 orders). Many algorithms have been proposed for interactive geometric sound propagation using beam tracing, ray tracing, or ray-frustum tracing [5, 14, 37, 40]. However, they are either limited to static virtual environments or can only handle propagation paths corresponding to specular reflections.

In order to perform interactive sound rendering, we use fast techniques for sound propagation and audio rendering. Our propagation algorithms use a hybrid ray-based representation that traces discrete rays [18] and ray-frusta [24]. Discrete ray tracing is used for diffuse reflections and frustum tracing is used to compute the propagation paths for specular reflections and edge diffraction. We fill in the late reverberation using statistical methods. We also describe an audio rendering pipeline combining specular reflections, diffuse reflections, diffraction, 3D sound, and late reverberation.

Our interactive sound rendering system can handle models consisting of tens of thousands of scene primitives (e.g. triangles) as well as dynamic scenes with moving sound sources, listener, and scene objects. We can perform interactive sound propagation including specular reflections, diffuse reflections, and diffraction of up to 3 orders on a multi-core PC. To the best of our knowledge, RESound is the first interactive sound rendering system that can perform plausible sound propagation and rendering in dynamic virtual environments.

Organization: The paper is organized as follows. We review related methods on acoustic simulation in Section 2. Section 3 provides an overview of RESound and highlights its various components. We present the underlying representations and fast propagation algorithms in Section 4. The reverberation estimation is described in Section 5 and the audio rendering algorithm is presented in Section 6. The performance of our system is described in Section 7. In Section 8, we discuss the quality and limitations of our system.

2. PREVIOUS WORK

In this section, we give a brief overview of prior work in acoustic simulation. Acoustic simulation for virtual environments can be divided into three main components: sound synthesis, sound propagation, and audio rendering. In this paper, we only focus on interactive sound propagation and audio rendering.

2.1 Sound Synthesis

Sound synthesis generates audio signals based on interactions between the objects in a virtual environment. Synthesis techniques often rely on physical simulators to generate the forces and object interactions [7, 30]. Many approaches have been proposed to synthesize sound from object interactions using offline [30] and online [32, 47, 48] computations. Anechoic signals in a sound propagation engine can be replaced by synthetically generated audio signals as input. Thus, these approaches are complementary to the presented work and could be combined with RESound for an improved immersive experience.
2.2 Sound Propagation

Sound propagation deals with modeling how sound waves propagate through a medium. Effects such as reflection, transmission, and diffraction are the important components. Sound propagation algorithms can be classified into two approaches: numerical methods and geometric methods.

Numerical Methods: These methods [6, 19, 26, 29] solve the wave equation numerically to perform sound propagation. They can provide very accurate results but are computationally expensive. Despite recent advances [31], these methods are too slow for interactive applications and are limited to static scenes.

Geometric Methods: The most widely used methods for interactive sound propagation in virtual environments are based on geometric acoustics. They compute propagation paths from a sound source to the listener and the corresponding impulse response from these paths. Specular reflections of sound are modeled with the image-source method [2, 34]. Image-source methods recursively reflect the source point about all of the geometry in the scene to find specular reflection paths. BSP acceleration [34] and beam tracing [13, 21] have been used to accelerate this computation in static virtual environments. Other methods to compute specular paths include ray tracing based methods [18, 49] and approximate volume tracing methods [5, 23].

There has also been work on complementing specular reflections with diffraction effects. Diffraction effects are very noticeable at corners, as the diffraction causes the sound wave to propagate in regions that are not directly visible to the sound source. Two diffraction models are commonly used: the Uniform Theory of Diffraction (UTD) [17] and a recent formulation of the Biot-Tolstoy-Medwin (BTM) method [41]. The BTM method is more costly to compute than the UTD and has only recently been used in interactive simulation [35]. The UTD, however, has been adapted for use in several interactive simulations [3, 42, 44].

Another important effect that can be modeled with GA is diffuse reflection. Diffuse reflections have been shown to be important for modeling sound propagation [9]. Two common existing methods for handling diffuse reflections are radiosity based methods [37, 38] and ray tracing based methods [8, 16].

The GA methods described thus far are used to render the early reflections. The later acoustic response must also be calculated [15]. This is often done through statistical methods [12] or ray tracing [11].

2.3 Audio Rendering

Audio rendering generates the final audio signal which can be heard by a listener over headphones or speakers [20]. In the context of geometric sound propagation, it involves convolving the impulse response computed by the propagation algorithm with an anechoic input audio signal and introducing 3D cues in the final audio signal to simulate the direction of incoming sound waves. In a dynamic virtual environment, sound sources, the listener, and scene objects may be moving. As a result, the impulse responses change frequently and it is critical to generate an artifact-free, smooth audio signal. Tsingos [43] and Wenzel et al. [52] describe techniques for artifact-free audio rendering in dynamic scenes. Introducing 3D cues in the final audio signal requires convolution of an incoming sound wave with a Head-Related Impulse Response (HRIR) [1, 22]. This can only be performed for a few sound sources in real-time. Recent approaches based on audio perception [28, 46] and sampling of sound sources [51] can handle 3D sound for thousands of sound sources.

3. SYSTEM OVERVIEW

In this section, we give an overview of our approach and highlight the main components. RESound simulates the sound field in a scene using geometric acoustics (GA) methods.

3.1 Acoustic modeling

All GA techniques deal with finding propagation paths between each source and the listener. The sound waves travel from a source (e.g. a speaker) and arrive at a listener (e.g. a user) by traveling along multiple propagation paths representing different sequences of reflections, diffractions, and refractions at the surfaces of the environment. Figure 3 shows an example of such paths. In this paper, we limit ourselves to reflection and diffraction paths. The overall effect of these propagation paths is to add reverberation (e.g. echoes) to the dry sound signal. Geometric propagation algorithms need to account for the different wave effects that directly influence the response generated at the listener.

Figure 3: Example scene showing (a) specular, (b) diffraction, and (c) diffuse propagation paths.

When a small, point-like sound source generates non-directional sound, the pressure wave expands out in a spherical shape. If the listener is set a short distance from the source, the wave field eventually encounters the listener. Due to the spreading of the field, the amplitude at the listener is attenuated. The corresponding GA component is a direct path from the source to the listener. This path represents the sound field that is diminished by distance attenuation.

As the sound field propagates, it is likely that it will also encounter objects in the scene. These objects may reflect or otherwise scatter the waves. If an object is large relative to the field's wavelength, the field is reflected specularly, as a mirror does for light waves. In GA, these paths are computed by enumerating all possible reflection paths from the source to the listener, which can be a very costly operation. There has been much research focused on reducing the cost of this calculation [14], as most earlier methods were limited to static scenes with fixed sources. The delay and attenuation of these contributions help the listener estimate the size of the propagation space and provide important directional cues about the environment.

Objects that are similar in size to the wavelength may also be encountered. When a sound wave encounters such an object, the wave is influenced by the object. We focus on two such scattering effects: edge diffraction and diffuse reflection.

Diffraction effects occur at the edges of objects and cause the sound field to be scattered around the edge. This scattering results in a smooth transition as a listener moves around edges. Most notably, diffraction produces a smooth transition when the line-of-sight between the source and listener is obstructed. The region behind an edge in which the diffraction field propagates is called the shadow region.
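Each of the GA contributions above is ultimately characterized by a delay and an attenuation. A minimal sketch of that mapping, with hypothetical names (the speed of sound, ~343 m/s in air at 20 C, is an assumed constant):

#include <algorithm>

// Illustrative sketch: a propagation path's total length determines both
// its arrival delay and its spherical-spreading (1/r) attenuation.
struct PathContribution {
    double delaySeconds;  // when the contribution arrives at the listener
    double gain;          // distance attenuation applied to the dry signal
};

PathContribution contributionFor(double pathLengthMeters) {
    const double speedOfSound = 343.0;  // m/s, assumed
    PathContribution c;
    c.delaySeconds = pathLengthMeters / speedOfSound;
    c.gain = 1.0 / std::max(pathLengthMeters, 1.0);  // clamp near-field
    return c;
}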
Surfaces that have fine details or roughness on the same order as the wavelength can diffusely reflect the sound wave. This means that the wave is not specularly reflected, but reflected in a Lambertian manner, such that the reflected direction is isotropic. These diffuse reflections complement the specular components [9].

As the sound field continues to propagate, the number of reflection and scattering components increases and the amplitude of these components decreases. The initial orders (e.g. up to four or six) of reflection are termed early reflections. These components have the greatest effect on a listener's ability to spatialize the sound. However, the early components are not sufficient to provide an accurate acoustic response for any given scene. The later reverberation effects are a function of the scene size [12] and convey an important sense of space.

3.2 Ray-based Path Tracing

RESound uses a unified ray representation for specular reflection, diffuse reflection, and diffraction path computations. The underlying framework exploits recent advances in interactive ray tracing in the computer graphics literature. We compute diffuse reflections using a discrete ray representation [25, 50] and specular reflections and diffraction using a ray-frustum representation [5, 24]. A frustum is a convex combination of four corner rays [24]. We use fast ray tracing algorithms to perform intersection tests for the discrete rays as well as the volumetric frusta.

We assume that the scene is composed of triangles and is represented using a bounding volume hierarchy (BVH) of axis-aligned bounding boxes (AABBs). A BVH can be used to handle dynamic scenes efficiently [25]. The same underlying hierarchy is used for both discrete rays and ray-frusta as part of our unified representation. Rays are shot as ray packets [25] and efficient frustum culling is used for fast intersection of ray packets and frusta with the BVH. In order to perform fast intersection tests with scene triangles, the frustum representation uses Plücker coordinates [36].

3.3 RESound Components

Our system consists of three main processing steps. These are outlined in Figure 2.

Preprocessing: As part of preprocessing, a scene bounding volume hierarchy is created. This is a hierarchy of axis-aligned bounding boxes and is updated when the objects in the scene move. This hierarchy is used to perform fast intersection tests for discrete ray and frustum tracing. The edges of objects in the scene are also analyzed to determine appropriate edges for diffraction.

Interactive Sound Propagation: This stage computes the paths between the source and the listener. The direct path is quickly found by checking for obstruction between the source and listener. A volumetric frustum tracer is used to find the specular and edge diffraction paths. A stochastic ray tracer is used to compute the diffuse paths. These paths are adjusted for frequency band attenuation and converted to appropriate pressure components.

Audio Rendering: After the paths are computed, they need to be auralized. A statistical reverberation filter is estimated using the path data. Using the paths and the estimated filter as input, the waveform is attenuated by the auralization system. The resulting signal represents the acoustic response and is output to the system speakers.

Figure 2: The main components of RESound: scene preprocessing; geometric propagation for specular, diffuse, and diffraction components; estimation of reverberation from the impulse response; and final audio rendering.

4. INTERACTIVE SOUND PROPAGATION

In this section, we give an overview of our sound propagation algorithm. Propagation is the most expensive step in the overall sound rendering pipeline. The largest computational cost is the calculation of the acoustic paths that the sound takes as it is reflected or scattered by the objects in the scene. Under the assumption of geometric acoustics, this is primarily a visibility calculation. Thus, we have chosen rays as our propagation primitive. For example, the direct sound contribution is easily modeled by casting a ray between the source and listener (see the sketch below). If the path is not obstructed, there is a direct contribution from the source to the listener. The other propagation components are more expensive to compute, but rely on similar visibility computations.

When computing the propagation components, many intersection computations between the scene triangles and the ray primitives are performed. In order to reduce the computation time, we would like to minimize the cost of the intersection tests. Since our propagation method is ray based, an acceleration structure to minimize ray intersections against scene geometry can be used. Specifically, our system constructs a bounding volume hierarchy (BVH) of axis-aligned bounding boxes [25]. This structure can be updated for dynamic scene objects with refitting algorithms. Also, we mark all possible diffraction edges. This allows the diffraction propagation to abort early if a scene edge is not marked as a diffracting edge.
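A minimal sketch of the direct-contribution test, under the assumption that scene occluders are simple axis-aligned boxes (the real system tests triangles through the BVH described above; all names here are hypothetical):

#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Aabb { Vec3 lo, hi; };

// Slab test: does the segment from a to b intersect the box?
bool segmentHitsBox(const Vec3& a, const Vec3& b, const Aabb& box) {
    double t0 = 0.0, t1 = 1.0;
    const double A[3] = {a.x, a.y, a.z}, B[3] = {b.x, b.y, b.z};
    const double L[3] = {box.lo.x, box.lo.y, box.lo.z};
    const double H[3] = {box.hi.x, box.hi.y, box.hi.z};
    for (int i = 0; i < 3; ++i) {
        double d = B[i] - A[i];
        if (std::abs(d) < 1e-12) {                    // parallel to slab
            if (A[i] < L[i] || A[i] > H[i]) return false;
        } else {
            double u = (L[i] - A[i]) / d, v = (H[i] - A[i]) / d;
            if (u > v) std::swap(u, v);
            t0 = std::max(t0, u);
            t1 = std::min(t1, v);
            if (t0 > t1) return false;                // slabs do not overlap
        }
    }
    return true;
}

// Direct sound: audible iff the source-listener segment is unobstructed.
bool hasDirectContribution(const Vec3& src, const Vec3& lis,
                           const std::vector<Aabb>& occluders) {
    for (const Aabb& box : occluders)
        if (segmentHitsBox(src, lis, box)) return false;
    return true;
}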
Figure 4: Unified ray engine: Both (a) frustum tracing and (b) ray tracing share a similar rendering pipeline.

4.1 Specular paths

We use volumetric frustum tracing [24] to calculate the specular paths between the source and listener. From our basic ray primitive, we form a convex volume bounded by 4 rays. In order to model a uniform point sound source, we cast many of these frustum primitives such that all the space around the source is covered. For each frustum, the bounding rays of the volume are intersected with the scene primitives. After the rays have hit the geometric primitives, they are specularly reflected. This gives rise to another frustum that is recursively propagated. This continues until a specified order of reflection is achieved.

However, it is possible that the 4 bounding rays of the frustum do not all hit the same object in the scene. In this case, it cannot be guaranteed that the resulting specular frustum correctly contains the reflection volume. As such, we employ an adaptive subdivision strategy [5] to reduce the error in the volume. If it is found that the 4 rays do not intersect the same geometric primitive, that is, the frustum face is not fully contained within the bounds of the geometric primitive, the frustum is subdivided using a quad-tree-like structure into 4 sub-frusta. The sub-frusta are then intersected with the scene and the subdivision process continues until a user-defined subdivision level is reached. When the subdivision is complete, any ambiguous intersections are resolved by choosing the closest intersected object and reflecting the subdivided frustum's rays against it. This process results in a reasonably [5] accurate volumetric covering of the scene space.

Given any propagation frustum, if the listener is contained within the volume, there must exist some sound path from the source to the listener. This path is verified by casting a ray from the listener towards the frustum origin. If the ray intersection point is contained in the frustum origin face on the triangle, the path segment is valid. This validation process is repeated using the computed intersection point to the origin of the previous frustum. If the entire path is valid, the path distance and attenuation are recorded. Figure 4(a) shows an overview of the frustum engine.

4.2 Edge Diffraction paths

Frustum tracing can be modified to account for diffraction contributions [42] using the Uniform Theory of Diffraction (UTD). The UTD can be used to calculate the diffraction attenuation for ray paths used in GA. When a sound ray encounters an edge, the ray is scattered about the edge. In the UTD formulation, the region covered by the diffraction contribution is defined by the angle of the entrance ray: if a ray hits the edge at an angle of θ, the ray is scattered about the edge in a cone shape where the cone makes an angle θ with the edge.

As the frusta intersect the scene triangles, the triangle edges are checked to determine whether they are marked as diffracting edges. If the triangle has diffracting edges, and the edges are contained within the frustum face, a new diffraction frustum is created. Similar to other approaches [42, 44], we compute the diffraction component only in the shadow region. As such, the frustum is bounded by the line-of-sight from the frustum origin and the far side of the triangle. This frustum then propagates through the scene as normal.

The final sound path is verified using the same process described for specular paths. However, for diffraction sequences, the path is attenuated using the UTD equation [17]. The UTD equation is in the frequency domain, and is thus computed for a number of frequency bands. The resulting UTD coefficients are combined with the attenuation for the other path segments to create the final path attenuation.

4.3 Diffuse component

In order to compute sound reflected off diffuse materials, we use a stochastic ray tracer (Figure 4(b)). Rays are propagated from the sound source in all directions. When a ray encounters a triangle, it is reflected and tracing continues. The reflection direction is determined by the surface material. The listener is modeled by a sphere that approximates the listener's head. As the rays propagate, we check for intersections with this sphere. If there is an intersection, the path distance and the surfaces encountered are recorded for the audio rendering step.

The scattering coefficient for surface materials varies for different sound frequencies. Thus, for one frequency incoming rays may be heavily scattered, while for another frequency the reflection is mostly specular. Since intersecting rays with the objects in the scene is a costly operation, we wish to trace rays only once for all the frequencies. As such, for each ray intersection, we randomly select between diffuse and specular reflection [11] (see the sketch below).

If the ray hits the listener, we scale the energy for each frequency band appropriately based on the material properties and the type of reflections selected. If a path is found to be composed entirely of specular reflections, it is discarded, as such paths are found in the frustum tracing step. Once all paths have been computed and attenuated, the resulting values are converted to a histogram which combines nearby contributions into single, larger contributions. The energy for each contribution is reduced based on the number of rays that have been propagated. The square root of each contribution is used to compute a final pressure value.
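The sketch below illustrates this per-hit random selection (hypothetical names; a single scattering coefficient stands in for the frequency-dependent coefficients of the real system): each intersection chooses once between a Lambertian bounce and a mirror bounce, so the ray is traced only once for all bands.

#include <cmath>
#include <random>

struct Vec3 { double x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
static Vec3 normalize(const Vec3& v) {
    double l = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/l, v.y/l, v.z/l };
}

// Mirror reflection of direction d about the unit normal n.
static Vec3 reflectSpecular(const Vec3& d, const Vec3& n) {
    double k = 2.0 * (d.x*n.x + d.y*n.y + d.z*n.z);
    return { d.x - k*n.x, d.y - k*n.y, d.z - k*n.z };
}

// Cosine-weighted (Lambertian) direction about the unit normal n.
static Vec3 sampleLambertian(const Vec3& n, std::mt19937& rng) {
    const double kPi = 3.14159265358979323846;
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double phi = 2.0 * kPi * u(rng), r2 = u(rng), s = std::sqrt(r2);
    Vec3 a = (std::abs(n.x) < 0.9) ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
    Vec3 t = normalize(cross(n, a));   // tangent
    Vec3 b = cross(n, t);              // bitangent
    double z = std::sqrt(1.0 - r2);    // cosine-weighted elevation
    return { t.x*s*std::cos(phi) + b.x*s*std::sin(phi) + n.x*z,
             t.y*s*std::cos(phi) + b.y*s*std::sin(phi) + n.y*z,
             t.z*s*std::cos(phi) + b.z*s*std::sin(phi) + n.z*z };
}

// Per hit: one random choice between diffuse and specular, shared by all
// frequency bands so each ray is traced only once [11].
Vec3 bounce(const Vec3& dir, const Vec3& n, double scattering,
            std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return (u(rng) < scattering) ? sampleLambertian(n, rng)
                                 : reflectSpecular(dir, n);
}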
5. REVERBERATION ESTIMATION

The propagation paths computed by the frustum tracer and stochastic ray tracer described in Section 4 are used only for the early reflections that reach the listener. While they provide important perceptual cues for spatial localization of the source, capturing late reflections (reverberation) contributes significantly to the perceived realism of the sound simulation.

We use well-known statistical acoustics models to estimate the reverberant tail of the energy IR. The Eyring model [12] is one such model that describes the energy decay within a single room as a function of time:

E(t) = E0 e^((cS/4V) t log(1−α))    (1)

where c is the speed of sound, S is the total absorbing surface area of the room, V is the volume of the room, and α is the average absorption coefficient of the surfaces in the room.

Figure 5: Extrapolating the IR to estimate late reverberation: The red curve is obtained from a least-squares fit (in log-space) of the energy IR computed by GA, and is used to add the reverberant tail to the IR.

Given the energy IR computed using GA, we perform a simple linear least-squares fit to the IR in log-space. This gives us an exponential curve which fits the IR and can easily be extrapolated to generate the reverberation tail. From the curve, we are most interested in estimating the RT60, which is defined as the time required for the energy to decay by 60 dB. Given the slope computed by the least-squares fit of the IR data, it is a simple matter to estimate the value of RT60 (see the sketch below).
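A sketch of this estimate, assuming a uniformly sampled energy IR and hypothetical names: fit a least-squares line to the decay in dB, then read off the time for a 60 dB drop.

#include <cmath>
#include <cstddef>
#include <vector>

// Estimate RT60 from an energy IR sampled every 'dt' seconds: least-squares
// line fit to the decay in dB, then the time for a 60 dB drop. Sketch only.
double estimateRT60(const std::vector<double>& energyIR, double dt) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0, n = 0;
    for (std::size_t i = 0; i < energyIR.size(); ++i) {
        if (energyIR[i] <= 0.0) continue;             // skip empty bins
        double t = i * dt;                            // time of this bin
        double db = 10.0 * std::log10(energyIR[i]);   // energy in dB
        sx += t; sy += db; sxx += t * t; sxy += t * db; n += 1;
    }
    if (n < 2) return 0.0;
    double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx); // dB per second
    return (slope < 0.0) ? -60.0 / slope : 0.0;       // seconds to -60 dB
}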
This value is used in the audio rendering step to generate late reverberation effects.

Note that Equation (1) is for a single-room model, and is not as accurate for scenes with multiple rooms (by "rooms" we mean regions of the scene which are separated by distinct apertures, such as doors or windows). The single-room model is a good approximation for large interior spaces and many outdoor scenes. Other models exist for coupled rooms [39], but they would require fitting multiple curves to the IR, and the number of curves to fit would depend on the number of rooms in the scene. In the interests of speed and simplicity, we have chosen to use a single-room model.

6. AUDIO RENDERING

Audio rendering is the process of generating an audio signal which can be heard by a listener using headphones or speakers. In this section, we provide details on the real-time audio rendering pipeline implemented in our interactive sound propagation system. Our audio rendering pipeline is implemented using XAudio2 (http://msdn.microsoft.com/en-us/library/bb694503(VS.85).aspx), a cross-platform audio library for Windows and Xbox 360.

Our sound propagation algorithm generates a list of specular, diffuse, and diffracted paths from each source to the listener. These paths are accessed asynchronously by the audio rendering pipeline, as shown in Figure 6, at different rates. Furthermore, each path can be represented as a virtual source with some attenuation, distance from the listener, and incoming direction relative to the listener. The direction of a virtual source relative to the listener is simulated by introducing 3D sound cues in the final audio. Additionally, the source, listener, and scene objects can move dynamically. In such cases, the impulse response (IR) computed during the sound propagation step could vary significantly from one frame to another. Thus, our approach mitigates the occurrence of artifacts by various means. Our system also uses the previously described reverberation data to construct the appropriate sound filters.

6.1 Integration with Sound Propagation

The paths computed by the sound propagation algorithm in Section 4 are updated at different rates for different orders of reflection. These paths are then queried by the audio rendering system in a thread-safe manner. To achieve a high quality final audio signal, the audio rendering system needs to query at the sampling rate of the input audio signal (44.1 kHz). However, our audio rendering system queries per audio frame. We have found frames containing 10 ms worth of audio samples suitable. Various user studies support that a lower update rate [33] can be used without any perceptual difference. It should be noted that the direct sound component and the early reflection components are very fast to compute. Thus, we update the direct contribution and first order reflections at a higher rate than the other components. For the direct and first order reflection paths, we also introduce 3D sound cues in the final audio signal. To produce the final audio, we band-pass the input signal into eight octave bands. For each octave band we compute an impulse response, which is convolved with the band-passed input audio to compute the final audio, as shown in Figure 7. The details on computing an impulse response using the paths from the sound propagation engine are below.

Specular and Diffraction IR: The specular reflections and diffraction are formulated as a function of the sound pressure, as described in the previous sections. Thus, any path reaching from a source to the listener has a delay computed as d/C, where d is the distance traveled and C is the speed of sound. Each impulse is attenuated based on frequency dependent wall absorption coefficients and the distance traveled. For all the paths reaching from a source to the listener, a value with attenuation A_path is inserted at time index d/C in the impulse response (see the sketch below). One such impulse response is computed for each of the octave bands for a source-listener pair.
Figure 6: An overview of the integration of the audio rendering system with the sound propagation engine. The sound propagation engine updates the computed paths in a thread-safe buffer. The direct path and first order reflection paths are updated at a higher frequency. The audio rendering system queries the buffer and performs 3D audio for direct and first order paths and convolution for higher order paths. The cross-fading and interpolation components smooth the final audio output signal.

Diffuse IR: The diffuse reflections are formulated as a function of the energy of the sound waves. Using the paths collected at the listener, an energy IR is constructed for all the reflection paths reaching the listener. This energy IR is converted into a pressure IR for audio rendering: we take the square root of the energy response to create a pressure IR for each frequency band. This IR is combined with the specular and diffraction IRs to produce the final IR used in the audio rendering.

Figure 7: IR Convolution: The input audio signal S is band-passed into N octave bands which are convolved with the IR of the corresponding band.

6.2 Issues with Dynamic Scenes

Our sound propagation system is general and can handle moving sources, a moving listener, and dynamic geometric primitives. This introduces a unique set of challenges for our real-time audio rendering system. Due to the motion of the sources, listener, and scene objects, the propagation paths can change dramatically, and producing artifact-free audio rendering can be challenging. Therefore, we impose physical restrictions on the motion of sources, the listener, and the geometric primitives to produce artifact-free audio rendering. To further mitigate the effects of the changing IRs, we convolve each audio frame with both the current and the previous IRs and crossfade them to produce the final audio signal (see the sketch below). The window of cross-fading can be adjusted to minimize the artifacts due to motion. Other, more sophisticated approaches, like predicting the positions and velocities of the source or the listener, can also be used [43, 52].
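A minimal sketch of this cross-fade (hypothetical names): the same frame is rendered twice, once with the previous IR and once with the current one, and the two results are linearly blended across the frame.

#include <cstddef>
#include <vector>

// Blend one audio frame rendered with the previous and current IRs.
// A linear cross-fade over the frame suppresses discontinuities when the
// IR changes between frames. Illustrative sketch only.
std::vector<float> crossfadeFrame(const std::vector<float>& withOldIR,
                                  const std::vector<float>& withNewIR) {
    const std::size_t n = withOldIR.size();  // assume equal frame sizes
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        float w = static_cast<float>(i) / static_cast<float>(n); // 0 -> 1
        out[i] = (1.0f - w) * withOldIR[i] + w * withNewIR[i];
    }
    return out;
}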
6.3 3D Sound Rendering

In a typical sound simulation, many sound waves reach the listener from different directions. These waves diffract around the listener's head and provide cues regarding the direction of the incoming wave. This diffraction effect can be encoded in a Head-Related Impulse Response (HRIR) [1]. Thus, to produce a realistic 3D sound rendering effect, each incoming path to the listener can be convolved with an HRIR (see the sketch below). However, for large numbers of contributions this computation can quickly become expensive and it may not be possible to perform audio rendering in real-time. Thus, only the direct and first order reflections are convolved with a normalized HRIR [1]. Some recent approaches have been proposed to handle audio rendering of large numbers of sound sources [45, 51]. These approaches can also be integrated with our system.
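A sketch of per-path binaural filtering (hypothetical names; a measured HRIR set such as CIPIC [1] would supply the filter pair for the path's incoming direction):

#include <cstddef>
#include <vector>

// Left/right head-related impulse responses for one incoming direction.
struct Hrir { std::vector<float> left, right; };

// Spatialize one path: convolve its (already delayed and attenuated)
// signal with the HRIR pair for its incoming direction. Sketch only.
void spatializePath(const std::vector<float>& pathSignal, const Hrir& h,
                    std::vector<float>& outL, std::vector<float>& outR) {
    outL.assign(pathSignal.size() + h.left.size() - 1, 0.0f);
    outR.assign(pathSignal.size() + h.right.size() - 1, 0.0f);
    for (std::size_t n = 0; n < pathSignal.size(); ++n) {
        for (std::size_t k = 0; k < h.left.size(); ++k)
            outL[n + k] += pathSignal[n] * h.left[k];
        for (std::size_t k = 0; k < h.right.size(); ++k)
            outR[n + k] += pathSignal[n] * h.right[k];
    }
}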
6.4 Adding Late Reverberation

XAudio2 supports the use of user-defined filters and other audio processing components through the XAPO interface. One of the built-in filters is an artificial reverberation filter, which can add late decay effects to a sound signal. This filter can be attached to the XAudio2 pipeline (one filter per band) to add late reverberation in a simple manner. The reverberation filter has several configurable parameters, one of which is the RT60 for the room. In Section 5, we described a method for estimating this value. The reverberation filter is then updated with the estimate. This approach provides a simple, efficient way of complementing the computed IRs with late reverberation effects.

7. PERFORMANCE

Our system makes use of several levels of parallel algorithms to accelerate the computation. Ray tracing is known to be a highly parallelizable algorithm, and our system uses threads to take advantage of multi-core computers. Also, frustum tracing uses vector instructions to perform operations on a frustum's corner rays in parallel. Using these optimizations, our system achieves interactive performance on common multi-core PCs.

In this section, we detail the performance of RESound. We highlight each subsystem's performance on a varying set of scenes. The details of the scenes and system performance are presented in Table 1, and the scenes are shown in Figure 8. In all benchmarks, we run RESound on a multi-core PC at 2.66 GHz; the number of threads per component is described in each section.

Figure 8: Test scenes used: (a) Room, (b) Conference, (c) Sibenik, and (d) Sponza.
                          Specular + diffraction     Specular + diffraction     Diffuse
                          (3 orders)                 (1 order)                  (3 orders)
Scene        Triangles    Time      Frusta   Paths   Time     Frusta   Paths    Time     Paths
Room         6k           359ms     278k     4       77ms     7k       3        274ms    228
Conference   282k         1137ms    320k     7       157ms    5k       2        323ms    318
Sibenik      76k          2810ms    900k     14      460ms    10k      5        437ms    26
Sponza       66k          1304ms    598k     8       260ms    10k      3        516ms    120

Table 1: Performance: Test scene details and the performance of the RESound components.

# Impulses    Compute time (ms)
10            0.026
50            0.111
100           0.425
1000          37.805
5000          1161.449

Table 2: Timings for late reverberation estimation.

Figure 9: Specular paths: (a) With a subdivision level of 2, frustum tracing finds 13 paths. (b) A subdivision level of 5 finds 40 paths. (c) The image-source solution has 44 paths.
Specular and Diffraction: We generate two separate IRs using frustum tracing. One IR includes only the first order specular and diffraction contributions. Since these paths are fast to compute, we devote one thread to this task. The other IR we generate includes the contributions for 3 orders of reflection and 2 orders of diffraction. This is done using 7 threads. The performance details for both simulation cycles are described in Table 1.

Diffuse tracing: Our diffuse tracer stochastically samples the scene space during propagation. As such, the rays are largely incoherent and it is difficult to use ray packets. Nonetheless, even when tracing individual rays, RESound can render at interactive rates, as shown in the performance table. The timings are for 200k rays with 3 reflections using 7 threads.

Late reverberation: We measured the time taken by our implementation to perform the least-squares fitting while estimating late reverberation. The execution time was measured by averaging over 10 frames. During testing, we vary the density of the impulse response. The reverberation calculation is not threaded due to its minimal time cost. The results are summarized in Table 2.

8. QUALITY AND LIMITATIONS

The algorithms used in RESound are based on the physical properties of high frequency acoustic waves. We discuss the output quality of each component in the RESound system and compare against accurate known simulations. We also note the benefits that RESound offers over simpler audio rendering systems. The underlying limitations of the methods used are also discussed.

8.1 Quality

Since adaptive frustum tracing approximates the image-source reflection model, its accuracy has been compared to image-source methods [5]. It was found that as the subdivision level increases, the number of contributions found by the frustum simulation approaches the number found by the image-source method. Moreover, the attenuation of the resulting impulse response from frustum tracing is similar to that found by the image-source method (Figure 9).

Similarly, the validity of diffraction using frustum tracing has also been compared to an accurate beam tracing system with diffraction [42]. Due to limitations of frustum engines, it was found that certain types of diffraction paths could not be enumerated. However, as the frustum subdivision level was increased, the number of diffraction paths found approached an ideal solution (Figure 10) and the paths accurately matched the reference solution.

Figure 10: Diffraction paths: Increasing the frustum subdivision improves the diffraction accuracy.

The diffuse IR in RESound is generated by stochastic ray tracing. The sampling and attenuation model RESound uses has previously been shown to be statistically valid with sufficient sampling. Detailed analysis and validation have been presented by Embrechts [11].

We compare our reverberation decay times to statistically estimated times in two simple scenes. Similar scenes are described in other work [15, 16]. The results are presented in Table 3.

Room size (m)        Absorption    Predicted    RESound
4 x 4 x 4            0.1           1030 ms      1170 ms
27.5 x 27.5 x 27.5   0.0           8890 ms      7930 ms

Table 3: Reverberation decay times for two models.

8.2 Benefits

Interactive audio simulations used in current applications are often very simple and use precomputed reverberation effects and arbitrary attenuations. In RESound, the delays and attenuations for both reflection and diffraction are
based on physical approximations. This allows RESound to generate acoustic responses that are expected given the scene materials and layout.

In addition to calculating physically based attenuations and delays, RESound also provides accurate acoustic spatialization. When compared to simple binaural rendering, RESound provides more convincing directional cues. Consider a situation where the sound source is hidden from the listener's view (Figure 11). In this case, without reflection and diffraction, the directional component of the sound field appears to pass through the occluder. However, propagation paths generated by RESound arrive at the listener with a physically accurate directional component.

Figure 11: Path direction: (a) Binaural paths are physically impossible, but (b) diffraction and (c) reflection paths direct the listener as physically expected.

8.3 Limitations

RESound has several limitations. The accuracy of our algorithm is limited by the underlying GA algorithms. In practice, GA is only accurate for higher frequencies. Moreover, the accuracy of our frustum-tracing reflection and diffraction varies as a function of the maximum subdivision. Our diffraction formulation is based on the UTD and assumes that the edge lengths are significantly larger than the wavelength. Also, frustum tracing based diffraction is limited in the types of diffraction paths that can be found. Our approach for computing the diffuse IR is subject to statistical error [11] that must be overcome with dense sampling. In terms of audio rendering, we impose physical restrictions on the motion of the source, listener, and scene objects to generate an artifact-free rendering.

9. CONCLUSION AND FUTURE WORK

We have presented an interactive sound rendering system for dynamic virtual environments. RESound uses GA methods to compute the propagation paths. We use an underlying ray-based representation to compute specular reflections, diffuse reflections, and edge diffraction. We also use statistical late reverberation estimation techniques and present an interactive audio rendering algorithm for dynamic virtual environments. We believe RESound is the first interactive system that can generate plausible sound rendering in complex, dynamic virtual environments.

There are many avenues for future work. We would like to further analyze the accuracy of our approach. It is possible to further improve the accuracy of edge diffraction by using the BTM formulation, as opposed to the UTD. Similarly, the accuracy of diffuse reflections can be improved with better sampling methods. Many interactive applications such as games or VR need 30–60 Hz update rates, and we may need faster methods to achieve such performance on current commodity hardware. We are also investigating using frustum tracing for very accurate GA simulations [4]. Finally, we would like to use RESound for other applications such as tele-conferencing and the design of sound-based user interfaces.

10. ACKNOWLEDGMENTS

This research is supported in part by ARO Contract W911NF-04-1-0088, NSF award 0636208, DARPA/RDECOM Contracts N61339-04-C-0043 and WR91CRB-08-C-0137, Intel, and Microsoft.

11. REFERENCES

[1] V. Algazi, R. Duda, and D. Thompson. The CIPIC HRTF Database. In IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.
[2] J. B. Allen and D. A. Berkley. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4):943–950, April 1979.
[3] F. Antonacci, M. Foco, A. Sarti, and S. Tubaro. Fast modeling of acoustic reflections and diffraction in complex environments using visibility diagrams. In Proceedings of the 12th European Signal Processing Conference (EUSIPCO '04), pages 1773–1776, September 2004.
[4] A. Chandak, L. Antani, M. Taylor, and D. Manocha. FastV: From-point visibility culling on complex models. Eurographics Symposium on Rendering, 2009.
[5] A. Chandak, C. Lauterbach, M. Taylor, Z. Ren, and D. Manocha. AD-Frustum: Adaptive frustum tracing for interactive sound propagation. IEEE Transactions on Visualization and Computer Graphics, 14(6):1707–1722, Nov.–Dec. 2008.
[6] R. Ciskowski and C. Brebbia. Boundary Element Methods in Acoustics. Computational Mechanics Publications and Elsevier Applied Science, 1991.
[7] P. R. Cook. Real Sound Synthesis for Interactive Applications. A. K. Peters, 2002.
[8] B. Dalenbäck. Room acoustic prediction based on a unified treatment of diffuse and specular reflection. The Journal of the Acoustical Society of America, 100(2):899–909, 1996.
[9] B.-I. Dalenbäck, M. Kleiner, and P. Svensson. A macroscopic view of diffuse reflection. Journal of the Audio Engineering Society (JAES), 42(10):793–807, October 1994.
[10] N. Durlach and A. Mavor. Virtual Reality Scientific and Technological Challenges. National Academy Press, 1995.
[11] J. J. Embrechts. Broad spectrum diffusion model for room acoustics ray-tracing algorithms. The Journal of the Acoustical Society of America, 107(4):2068–2081, 2000.
[12] C. F. Eyring. Reverberation time in "dead" rooms. The Journal of the Acoustical Society of America, 1(2A):217–241, January 1930.
[13] T. Funkhouser, I. Carlbom, G. Elko, G. Pingali, M. Sondhi, and J. West. A beam tracing approach to acoustic modeling for interactive virtual environments. In Proc. of ACM SIGGRAPH, pages 21–32, 1998.
[14] T. Funkhouser, N. Tsingos, and J.-M. Jot. Survey of methods for modeling sound propagation in interactive virtual environment systems. Presence and Teleoperation, 2003.
[15] M. Hodgson. Evidence of diffuse surface reflection in rooms. The Journal of the Acoustical Society of America, 88(S1):S185, 1990.
[16] B. Kapralos, M. Jenkin, and E. Milios. Acoustic modeling utilizing an acoustic version of phonon mapping. In Proc. of IEEE Workshop on HAVE, 2004.
[17] R. G. Kouyoumjian and P. H. Pathak. A uniform geometrical theory of diffraction for an edge in a perfectly conducting surface. Proc. of IEEE, 62:1448–1461, Nov. 1974.
[18] A. Krokstad, S. Strom, and S. Sorsdal. Calculating the acoustical room response by the use of a ray tracing technique. Journal of Sound and Vibration, 8(1):118–125, July 1968.
[19] K. Kunz and R. Luebbers. The Finite Difference Time Domain for Electromagnetics. CRC Press, 1993.
[20] H. Kuttruff. Acoustics. Routledge, 2007.
[21] S. Laine, S. Siltanen, T. Lokki, and L. Savioja. Accelerated beam tracing algorithm. Applied Acoustics, 70(1):172–181, 2009.
[22] V. Larcher, O. Warusfel, J.-M. Jot, and J. Guyard. Study and comparison of efficient methods for 3-D audio spatialization based on linear decomposition of HRTF data. In Audio Engineering Society 108th Convention Preprints, preprint no. 5097, January 2000.
[23] C. Lauterbach, A. Chandak, and D. Manocha. Adaptive sampling for frustum-based sound propagation in complex and dynamic environments. In Proceedings of the 19th International Congress on Acoustics, 2007.
[24] C. Lauterbach, A. Chandak, and D. Manocha. Interactive sound rendering in complex and dynamic scenes using frustum tracing. IEEE Transactions on Visualization and Computer Graphics, 13(6):1672–1679, Nov.–Dec. 2007.
[25] C. Lauterbach, S. Yoon, D. Tuft, and D. Manocha. RT-DEFORM: Interactive ray tracing of dynamic scenes using BVHs. IEEE Symposium on Interactive Ray Tracing, 2006.
[26] J. Lehtinen. Time-domain numerical solution of the wave equation, 2003.
[27] R. B. Loftin. Multisensory perception: Beyond the visual in visualization. Computing in Science and Engineering, 5(4):56–58, 2003.
[28] T. Moeck, N. Bonneel, N. Tsingos, G. Drettakis, I. Viaud-Delmon, and D. Alloza. Progressive perceptual audio rendering of complex scenes. In I3D '07: Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, pages 189–196, New York, NY, USA, 2007. ACM.
[29] P. Monk. Finite Element Methods for Maxwell's Equations. Oxford University Press, 2003.
[30] J. F. O'Brien, C. Shen, and C. M. Gatchalian. Synthesizing sounds from rigid-body simulations. In The ACM SIGGRAPH 2002 Symposium on Computer Animation, pages 175–181. ACM Press, July 2002.
[31] N. Raghuvanshi, N. Galoppo, and M. C. Lin. Accelerated wave-based acoustics simulation. In ACM Solid and Physical Modeling Symposium, 2008.
[32] N. Raghuvanshi and M. C. Lin. Interactive sound synthesis for large scale environments. In Symposium on Interactive 3D Graphics and Games, pages 101–108, 2006.
[33] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen. Creating interactive virtual acoustic environments. Journal of the Audio Engineering Society (JAES), 47(9):675–705, September 1999.
[34] D. Schröder and T. Lentz. Real-time processing of image sources using binary space partitioning. Journal of the Audio Engineering Society (JAES), 54(7/8):604–619, July 2006.
[35] D. Schröder and A. Pohl. Real-time hybrid simulation method including edge diffraction. In EAA Auralization Symposium, Espoo, Finland, June 2009.
[36] K. Shoemake. Plücker coordinate tutorial. Ray Tracing News, 11(1), 1998.
[37] S. Siltanen, T. Lokki, S. Kiminki, and L. Savioja. The room acoustic rendering equation. The Journal of the Acoustical Society of America, 122(3):1624–1635, September 2007.
[38] S. Siltanen, T. Lokki, and L. Savioja. Frequency domain acoustic radiance transfer for real-time auralization. Acta Acustica united with Acustica, 95:106–117, 2009.
[39] J. E. Summers, R. R. Torres, and Y. Shimizu. Statistical-acoustics models of energy decay in systems of coupled rooms and their relation to geometrical acoustics. The Journal of the Acoustical Society of America, 116(2):958–969, August 2004.
[40] P. Svensson and R. Kristiansen. Computational modelling and simulation of acoustic spaces. In 22nd International Conference: Virtual, Synthetic, and Entertainment Audio, June 2002.
[41] U. P. Svensson, R. I. Fred, and J. Vanderkooy. An analytic secondary source model of edge diffraction impulse responses. Acoustical Society of America Journal, 106:2331–2344, Nov. 1999.
[42] M. Taylor, A. Chandak, Z. Ren, C. Lauterbach, and D. Manocha. Fast edge-diffraction for sound propagation in complex virtual environments. In EAA Auralization Symposium, Espoo, Finland, June 2009.
[43] N. Tsingos. A versatile software architecture for virtual audio simulations. In International Conference on Auditory Display (ICAD), Espoo, Finland, 2001.
[44] N. Tsingos, T. Funkhouser, A. Ngan, and I. Carlbom. Modeling acoustics in virtual environments using the uniform theory of diffraction. In Proc. of ACM SIGGRAPH, pages 545–552, 2001.
[45] N. Tsingos, E. Gallo, and G. Drettakis. Perceptual audio rendering of complex virtual environments. Technical Report RR-4734, INRIA, REVES/INRIA Sophia-Antipolis, Feb 2003.
[46] N. Tsingos, E. Gallo, and G. Drettakis. Perceptual audio rendering of complex virtual environments. ACM Trans. Graph., 23(3):249–258, 2004.
[47] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, 1998.
[48] K. van den Doel, P. G. Kry, and D. K. Pai. FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In SIGGRAPH '01: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 537–544, New York, NY, USA, 2001. ACM Press.
[49] M. Vorlander. Simulation of the transient and steady-state sound propagation in rooms using a new combined ray-tracing/image-source algorithm. The Journal of the Acoustical Society of America, 86(1):172–178, 1989.
[50] I. Wald. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Computer Graphics Group, Saarland University, 2004.
[51] M. Wand and W. Straßer. Multi-resolution sound rendering. In SPBG '04: Symposium on Point-Based Graphics, pages 3–11, 2004.
[52] E. Wenzel, J. Miller, and J. Abel. A software-based system for interactive spatial sound synthesis. In International Conference on Auditory Display (ICAD), Atlanta, GA, April 2000.