Der Auwera1
3, SEPTEMBER 2008
Abstract—The recently developed H.264/AVC video codec with Scalable Video Coding (SVC) extension compresses non-scalable (single-layer) and scalable video significantly more efficiently than MPEG-4 Part 2. Since the traffic characteristics of encoded video have a significant impact on its network transport, we examine the bit rate-distortion and bit rate variability-distortion performance of single-layer video traffic of the H.264/AVC codec and SVC extension using long CIF resolution videos. We also compare the traffic characteristics of the hierarchical B frames (SVC) versus classical B frames. In addition, we examine the impact of frame size smoothing on the video traffic to mitigate the effect of bit rate variabilities. We find that compared to MPEG-4 Part 2, the H.264/AVC codec and SVC extension achieve lower average bit rates at the expense of significantly increased traffic variabilities that remain at a high level even with smoothing. Through simulations we investigate the implications of this increase in rate variability on (i) frame losses when transmitting a single video, and (ii) a bufferless statistical multiplexing scenario with restricted link capacity and information loss. We find increased frame losses, and rate-distortion/rate-variability/encoding complexity tradeoffs. We conclude that solely assessing bit rate-distortion improvements of video encoder technologies is not sufficient to predict the performance in specific networked application scenarios.

Index Terms—Frame loss ratio, H.264/AVC, hierarchical B frames, rate variability-distortion (VD), rate-distortion (RD), single-layer video, statistical multiplexing, SVC, video quality, video traffic.

I. INTRODUCTION

specifications, such as DVB, ATSC, 3GPP, 3GPP2, MediaFLO, DMB, DVD Forum (HD-DVD), and Blu-Ray Disc Association (BD-ROM). At the same time, the introduction of IPTV over high speed access network links is ongoing, e.g., over Ethernet Passive Optical Networks (EPONs) or ADSL2+/VDSL2, and mobile TV technologies are made widely available. IPTV, mobile TV, and satellite TV are considered key applications that can make H.264/AVC the dominant video encoder in the broadcasting and consumer market.

In general, video can be encoded (i) with fixed quantization scales, which results in nearly constant video quality at the expense of variable video traffic (bit rate), or (ii) with rate control, which adapts the quantization scales to keep the video bit rate nearly constant at the expense of variable video quality [4]. In order to examine the fundamental traffic characteristics of the H.264/AVC video coding standard, which does not specify a normative rate control mechanism, we focus primarily on encodings with fixed quantization scales (and provide a brief study of encodings with rate control in Section V-D). An additional motivation for the focus on variable bit rate video encoded with fixed quantization scales is that the variable bit rate streams allow for statistical multiplexing gains that have the potential to improve the efficiency of video transport over communication networks [4]. The development of video network transport mechanisms that meet the strict playout deadlines of the video frames and efficiently accommodate the variability of the video
The bit rate-distortion characteristics of H.264/AVC, H.264 SVC, and MPEG-4 Part 2 have been extensively studied in the literature [1], [7], [8]. In contrast, in the present study, we research the joint characterization of bit rate-distortion and higher order bit rate statistics, such as the variability of the bit rate, as a function of the distortion. First, we perform a detailed analysis of elementary statistics of the video traffic. We study statistics of frame sizes, group of picture (GoP) sizes, frame and GoP qualities, and correlations between frame sizes and qualities. We use bit rate-distortion (RD) and bit rate variability-distortion (VD) curves to compare H.264/AVC and SVC single-layer traffic to the traffic of the MPEG-4 Part 2 [9] encoder, which is the predecessor of H.264/AVC. In addition, we study several GoP structures (including classic B frame prediction and hierarchical B frame prediction) and analyze the impact of frame size smoothing on the video traffic variability.

Our main findings are that H.264/AVC and H.264 SVC single-layer video traffic is significantly more variable than MPEG-4 Part 2 traffic under similar encoding conditions. At the same time, we confirm the significant average bit rate savings. The increased bit rate variability is observed over a wide range of average qualities of the encoded streams and for all tested video sequences. This makes the transport of H.264/AVC and H.264 SVC single-layer traffic more challenging than MPEG-4 Part 2 traffic. Even when frame size smoothing is employed to mitigate the effect of the increased variability, we find that the smoothed traffic is still significantly more variable compared to MPEG-4 Part 2 traffic when the same smoothing is applied. We simulate two streaming scenarios to quantify the effect of the increased bit rate variability on (i) the frame loss ratio when transmitting a single video stream over a fixed-bandwidth bottleneck link, and (ii) a basic real-time bufferless statistical multiplexing model. We observe that the increased bit rate variability results in significantly higher frame losses for H.264/AVC encoded streams compared to MPEG-4 Part 2 encoded streams. Secondly, we observe that a significant improvement in bit rate-distortion efficiency does not suffice to conclude that there is an equal gain in the number of supported streams on a link with constrained bandwidth and information loss probability. We find that the increased bit rate variability can lead to insignificant gains in number of supported streams when the additional encoding complexity is taken into consideration. Therefore, we conclude that solely assessing bit rate-distortion improvements of video encoder technologies is not sufficient to predict the performance in certain networked application scenarios, such as statistical multiplexing of streams.

All encodings presented in this study are publicly available from the video traces library at http://trace.eas.asu.edu. Frame size video traces [10] are files mainly containing video frame time stamps, frame types (e.g., I, P, or B), encoded frame sizes (in bits), and frame qualities (PSNR). Video traces are employed in simulation studies of the transport of video over communication networks, see e.g., [11]–[15], and as a basis for video traffic models, as for instance in [16]–[24]. Advantages over using regular encoded bit streams in simulations are the availability of a large number of traces of long and real video sequences, the fact that video traces are not copyrighted, and that only knowledge of basic concepts of video encoding is required. We also provide tools that interface with popular network simulators, resulting in fast and reliable network simulation results, otherwise only available to networking researchers with in-depth video coding expertise and large computational resources for the encoding of many long video sequences with numerous encoding parameters.

This paper is structured as follows. In Section II, we review related work. In Section III, we present a brief overview of the examined video coding standards. In Section IV, we describe the employed video test sequences, encoding tools, and video traffic metrics. In Section V, we study the video traffic statistics for the different encoders and GoP structures considering frame and GoP size statistics, autocorrelations, and frame size smoothing. In Section VI, we examine the implications of the higher traffic variability with the new H.264 and SVC codecs for basic single video stream and multiplexed multiple video stream network transport. We summarize our conclusions in Section VII.

II. RELATED WORK

The traffic characterizations of MPEG-1 and MPEG-4 Part 2 [9] encoded video, examined e.g., in [25]–[30], have formed the basis for a plethora of studies addressing the challenges of modeling the video traffic, see e.g., [16]–[24], and of efficiently transporting the variable bit rate video traffic over networks to meet the playout deadlines of the video frames, see for instance [5], [6], [11]–[15], [31]. To the best of our knowledge, the bit rate variability of H.264/AVC and SVC is examined for the first time in the present study.

Existing studies of the H.264/AVC codec and its extensions, such as [1], [7], [8], focus primarily on the rate-distortion (RD) performance, i.e., the video quality (PSNR) as a function of the average bit rate, and typically consider only short video sequences up to a few hundred frames. In contrast, for the transport over communication networks, the traffic variability is also a key concern. Therefore, we study the bit rate variability as a function of the video quality or distortion, which we express in the bit rate variability-distortion (VD) curve. In order to obtain reliable and meaningful statistical estimates of the traffic variability and other properties, it is necessary to examine long video sequences with several thousand frames as we do in this study.

We note that for one fixed GoP pattern, a preliminary study [32] briefly compared the bit rate variability-distortion of the H.264/AVC encoder with the variability of the MPEG-4 Part 2 and MPEG-2 encoders. In contrast, in this study we comprehensively compare the H.264/AVC encoder, the H.264 SVC encoder, and the MPEG-4 Part 2 encoder for a range of GoP patterns. In addition, we compare hierarchical B frames with classical B frames, examine the impact of rate control on the traffic
700 IEEE TRANSACTIONS ON BROADCASTING, VOL. 54, NO. 3, SEPTEMBER 2008
variability, and explore the implications of the increased variabilities on network transport in this study.

III. MPEG-4 VIDEO STANDARDS

We briefly introduce the state-of-the-art video codecs (encoder/decoder) in the MPEG-4 family and their applications. MPEG-4 is a family of open international standards that provide tools for the delivery of multimedia. The tools include codecs for the compression of audio and video, graphics and interactive features. MPEG-4’s latest video codec is Part 10 or AVC, Advanced Video Coding, which is also identically standardized as ITU-T H.264. The latest standardization effort addressing scalability is the extension of H.264/AVC called Scalable Video Coding (SVC). In the following sections we briefly introduce the following video codecs: MPEG-4 Part 2, H.264/AVC, and H.264 SVC.

A. MPEG-4 Part 2

The MPEG-4 Part 2 [9] standard combines tools in profiles, and levels provide a way to limit computational complexity, e.g., by specifying the bit rate. For applications where hardware cost or power considerations make implementing H.264/AVC difficult, MPEG-4 Part 2 offers the Simple and Advanced Simple Profile specifications.

The most used profile for streaming video is the Simple Profile (SP). This profile is defined for two-way and very low complexity receivers, such as wireless videophones. Therefore, the tools are selected by giving priority to low delay and low complexity. SP includes the compression tools to encode I frames and P frames, 1/2 pixel motion compensation, AC/DC prediction, 4 motion vectors per macroblock (4-MV) and Unrestricted MV. Furthermore, error-resilience tools are supported.

The Advanced Simple Profile (ASP) was defined with Internet and streaming video in mind. For these applications the delay is less of an issue and the targeted platforms have high processing power. Therefore, ASP has tools that improve the quality of video over SP. For example, the ASP profile contains 1/4 pixel motion compensation, B frames, and global motion compensation.

B. H.264/AVC

H.264/AVC represents a big leap in video compression technology with typically a 50% reduction of average bit rate for a given video quality compared to MPEG-2 and about a 30% reduction compared with MPEG-4 Part 2 [33]. Block transforms in conjunction with motion compensation and prediction are still the core of the encoder as in previous standards, but a number of new encoding mechanisms have been added which give a much better performance over previous standards [1].

The H.264/AVC standard defines several profiles. The Baseline profile is intended for low-delay applications, low processing power platforms, and for high packet loss environments. The Main profile encompasses all tools for achieving high coding efficiency for high bit rate applications. The Extended profile is meant for error-resilient streaming applications. The FRExt amendment adds four High profiles: High (HP), High 10 (Hi10P), High 4:2:2 (Hi422P), and High 4:4:4 (Hi444P) [7], [34]. The High profile has improved tools which can result in up to 10% compression gains over the Main profile and up to 59% over MPEG-2 for High Definition video with only a small increase in computational complexity compared to the Main profile. Recently, five additional profiles have been added for professional applications, e.g., supporting intra-only encoding.

A major improvement is the introduction of the entropy coding scheme Context Adaptive Binary Arithmetic Coding (CABAC), which typically gives 10–15% bit rate savings [33] over previous variable length coding schemes used in MPEG-2/4. Since arithmetic coding is compute intensive, the Main profile also supports a scheme called Context Adaptive Variable Length Coding (CAVLC), which is an improved version of older variable length coding schemes. Other new normative tools include spatial intra frame prediction which predicts a region of a given frame from other regions of the same frame, a new integer transform which significantly reduces ringing artifacts, and an adaptive in-loop deblocking filter which reduces artifacts [33]. H.264/AVC also introduces a new tool called Variable Block sizes which introduces a number of different square and rectangular block sizes, such as (4×4), (8×8), and (16×8) pixels. These different block sizes permit selecting the optimal block size for motion compensation and prediction. H.264/AVC also uses Lagrangian based rate-distortion optimization [33].

In previous standards, one reference frame (I or P) from the past for prediction of P frame blocks was allowed, and one reference frame (I or P) from the past and one reference frame (I or P) from the future for prediction of B frame blocks were allowed, whereby the blocks from these past and future reference frames were weighted equally to form the predicted B frame block. Similarly, for prediction of a B frame block in H.264/AVC, two blocks are selected from the reference frames; however, there are two lists that each can contain multiple reference frames. One block is selected from a frame in each of the two reference lists and these blocks can be weighted unequally [35].

C. H.264 SVC

In 2007, the SVC scalability extension [2], [3] was added to the H.264/AVC standard. The SVC extension provides temporal scalability, coarse (CGS), medium (MGS), and SNR scalability in general, spatial scalability, and combined spatio-temporal-SNR scalability (restricted set of spatio-temporal-SNR points can be extracted from a global scalable bit stream).

In the following, we discuss the concept of hierarchical B frames in more detail, since our study refers to this concept repeatedly. SVC’s temporal scalability is built on the hierarchical prediction concept for B frames. The introduction of hierarchical B frames has allowed the H.264 SVC encoder to achieve
Fig. 1. B frame prediction structures. (a) Classical B frame prediction structure. (b) Hierarchical B frame prediction structure.
temporal scalability while at the same time improving RD efficiency compared to the classical B frame prediction method employed by the older MPEG standards (MPEG-1/2/4-Part 2) and by default in H.264/AVC. In Fig. 1, we illustrate both concepts for predicting B frames.

Hierarchical B frames are an important new concept that was first introduced in H.264/AVC using generalized B frames and was later found to be the best method to build the Scalable Video Coding (SVC) extension on. Hence, the H.264 SVC encoded single-layer stream is decodable by existing H.264/AVC decoders. The scalability modes do require new SVC capability, with the supported modes depending on the applications or equivalently on the H.264 SVC profiles. In this description we do not go into detail about low-delay or constrained delay B frame prediction structures. We refer to [3] for a detailed discussion and further reading.

Fig. 1(a) depicts the classical B frame prediction structure, where each B frame is predicted only from the preceding I or P frame and from the subsequent I or P frame. Other B frames are not referenced since this is not allowed by video standards preceding H.264/AVC. This restriction is lifted in the generalized B frame paradigm that was first introduced in the H.264/AVC standard. Fig. 1(b) depicts the hierarchical B frame structure which uses B frames for the prediction of B frames. The illustrated case is the dyadic hierarchy of B frames, meaning that the number of B frames in between the key pictures (I or P frames) equals 2^k − 1 for some integer k ≥ 0.

The hierarchy with 3 B frames (I frame period is 16) is depicted in Fig. 1(b). In this example, the frame sequence is I0 B2 B1 B2 P0 B2 B1 B2 P0 ..., where the index represents the temporal layer number.

The coding efficiency of hierarchical B frames depends on the number of hierarchical B frames (temporal levels) and on the choice of quantization parameters for each B frame. Therefore, H.264 SVC introduces cascading quantizers which assign a higher quantization parameter value (lower quality) to B frames belonging to higher temporal layers. This concept is based on the insight that the lowest temporal layer 0 requires higher quality than the next temporal layer, since all other predictions depend on it. The quality of each subsequent temporal layer can be gradually reduced since fewer layers depend on it. Apparently the quality fluctuation that is introduced within a GoP is not subjectively noticeable by human observers, as studied by the standard committee.

IV. VIDEO SEQUENCES, ENCODING TOOLS, AND VIDEO TRAFFIC METRICS

A. Video Sequences

The CIF video sequences used for the statistics presented in this study are the ten minute Sony Digital Video Camera Recorder demo sequence (17,682 frames at 30 frames/sec), which we refer to as Sony Demo sequence, the first half hour of the Silence of the Lambs movie (54,000 frames at 30 frames/sec), the Star Wars IV movie (54,000 frames at 30 frames/sec), and the first hour of the Tokyo Olympics video (133,128 frames at 30 frames/sec). We also use about 30 minutes of the NBC 12 News (49,523 frames at 30 frames/sec), including the commercials. The video sequences Silence of the Lambs, Star Wars IV, Tokyo Olympics, and NBC 12 News can respectively be described as drama/thriller, science fiction/action, sports, and news video. Due to space constraints, we present in Sections V-B and V-C only illustrative plots for encodings with Silence of the Lambs and in Section V-E only illustrative plots for Silence of the Lambs and Star Wars IV. The corresponding plots for the other video sequences are available in [36].
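As a rough illustration of the dyadic hierarchical B frame structure discussed in Section III-C, the following Python sketch assigns a temporal layer number to each frame between two key pictures, with layer 0 for the key picture. The function name `temporal_layers` and the bit-counting approach are assumptions made for this illustration; they are not taken from the SVC reference software.

```python
def temporal_layers(num_b_frames):
    """Temporal layer of each frame from one key picture up to (not
    including) the next key picture, for a dyadic B frame hierarchy.

    num_b_frames must equal 2**k - 1 (e.g., 1, 3, 7, 15).
    """
    period = num_b_frames + 1            # B frames plus one key picture
    if period < 1 or period & (period - 1):
        raise ValueError("a dyadic hierarchy needs 2**k - 1 B frames")
    depth = period.bit_length() - 1      # k, the number of B frame layers
    layers = [0]                         # the key picture (I or P) is layer 0
    for pos in range(1, period):
        # Count the trailing zero bits of the position: the B frame in the
        # middle of the interval has the most and gets the lowest layer.
        trailing_zeros = (pos & -pos).bit_length() - 1
        layers.append(depth - trailing_zeros)
    return layers


for b in (1, 3, 7):
    print(b, temporal_layers(b))
```

With three B frames between key pictures, the middle B frame falls into layer 1 and its two neighbors into layer 2, so higher layers contain more frames on which fewer other frames depend.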
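C. Video Traffic Metrics

Here we provide a brief overview of essential video traffic metrics. For a video sequence consisting of $N$ frames encoded with a given quantization scale, we let $X_n$, $n = 1, \ldots, N$, denote the sizes [bits] of the encoded video frames. The mean frame size [bits] of the encoded video sequence is defined as

$$\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n. \qquad (3)$$

The coefficient of variation of the frame sizes [unit free], i.e., the standard deviation $S_X$ of the frame sizes normalized by the mean frame size, $\mathrm{CoV}_X = S_X / \bar{X}$, is widely employed as a measure of the variability of the frame sizes, i.e., the bit rate variability of the encoded video. Plotting the $\mathrm{CoV}_X$ as a function of the quantization scale (or equivalently, the PSNR video quality) gives the rate variability-distortion (VD) curve [30]. Alternatively, the peak-to-mean (Peak/Mean or P/M) ratio of the frame sizes is commonly used to express the traffic variability. If $X_{\max}$ is the maximum size of all frames, then the peak-to-mean frame size ratio [unit free] is defined as

$$\frac{X_{\max}}{\bar{X}}.$$

Analogously, for the GoP sizes $Y_m$ with mean $\bar{Y}$ and standard deviation $S_Y$, if $Y_{\max}$ is the maximum of all GoP sizes, then the peak-to-mean GoP size ratio [unit free] is defined as

$$\frac{Y_{\max}}{\bar{Y}}. \qquad (8)$$

The coefficient of variation of GoP sizes is

$$\mathrm{CoV}_Y = \frac{S_Y}{\bar{Y}}. \qquad (9)$$

(11)

We denote the PSNR quality of a video frame $n$ by $Q_n$ and define the average PSNR quality of a video sequence as

$$\bar{Q} = \frac{1}{N} \sum_{n=1}^{N} Q_n. \qquad (12)$$

The coefficient of quality variation is defined as

$$\mathrm{CoQV} = \frac{S_Q}{\bar{Q}}, \qquad (13)$$

where $S_Q$ denotes the standard deviation of the frame qualities.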
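The traffic metrics of this section can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling used for the study: the function names, the toy frame sizes and PSNR values, and the choice of the sample standard deviation (normalization by N − 1) are assumptions made here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation normalized by the mean (unit free).

    Assumption: the sample standard deviation (N - 1 normalization) is used.
    """
    return stdev(values) / mean(values)


def peak_to_mean(values):
    """Largest value divided by the mean (unit free)."""
    return max(values) / mean(values)


def gop_sizes(frame_sizes, gop_length):
    """Sum frame sizes over consecutive GoPs; a trailing partial GoP is dropped."""
    n_gops = len(frame_sizes) // gop_length
    return [sum(frame_sizes[m * gop_length:(m + 1) * gop_length])
            for m in range(n_gops)]


# Hypothetical toy trace: frame sizes in bits and frame qualities (PSNR) in dB.
frame_sizes = [4000, 1000, 1000, 2000, 6000, 1000, 1000, 2000]
frame_psnr = [38.0, 36.0, 36.0, 37.0, 38.5, 36.5, 36.0, 37.0]

gops = gop_sizes(frame_sizes, gop_length=4)
print("mean frame size [bits]:", mean(frame_sizes))
print("frame size CoV:", coefficient_of_variation(frame_sizes))
print("frame size peak-to-mean:", peak_to_mean(frame_sizes))
print("GoP size CoV:", coefficient_of_variation(gops))
print("GoP size peak-to-mean:", peak_to_mean(gops))
print("average PSNR [dB]:", mean(frame_psnr))
print("coefficient of quality variation:", coefficient_of_variation(frame_psnr))
```

Summing frames into GoPs averages out the size differences between I, P, and B frames, which is why the GoP-level variability measures are typically lower than the frame-level ones.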
Fig. 2. RD and VD curves comparing GoP structures G16-B1, G16-B3, G16-B7, and G16-B15 for Silence of the Lambs. (a) H.264/AVC RD curves (b) H.264/AVC
VD curves (c) H.264 SVC RD curves (d) H.264 SVC VD curves (e) MPEG-4 Part 2 RD curves (f) MPEG-4 Part 2 VD curves.
encoding is compatible with H.264/AVC encoding. Hence, our single-layer comparison between H.264/AVC and SVC is equivalent to a comparison between the classical B frame prediction and hierarchical B frames.

A. Encoding Setup

In the subsequent experiments, we employ four different GoP structures, namely IBPBPBPBPBPBPBPB (16 frames, with 1 B frame per I/P frame), which we denote by G16-B1, IBBBPBBBPBBBPBBB (16 frames, with 3 B frames per I/P frame) denoted by G16-B3, IBBBBBBBPBBBBBBB (16 frames, with 7 B frames per I/P frame) denoted by G16-B7, and IBBBBBBBBBBBBBBB (16 frames, with 15 B frames per I frame) denoted by G16-B15. In the context of SVC, these four GoP structures are respectively designated by their “GoP size” which is the number of hierarchical B frames plus one key picture, either of type I or P. Hence, G16-B1 has GoP size 2, G16-B3 has GoP size 4, G16-B7 has GoP size 8, and G16-B15 has GoP size 16. In the following, we employ our own GoP structure notation to emphasize the repetitive I-P-B frame type patterns in the encodings and to avoid confusion. These four GoP structures are natural structures for hierarchical B frames and allow us to compare the three encoders based on identical underlying GoP patterns.

We employ the H.264/AVC encoder in the Main profile with all compression tools enabled, as specified in Section III-B, i.e., using variable block sizes, three reference frames for the past and the future, referenced B frames, P and B frame weighted prediction, CABAC, and rate-distortion optimization (RDO). We designate these settings by “Full-RDO”. The H.264 SVC settings are similar.

We use the MPEG-4 Part 2 encoder in the Advanced Simple profile (ASP) to encode the sequences, for comparison with the H.264/AVC encodings. This ASP profile adds B frames to the Simple profile. We employ half pixel motion compensated prediction; RDO is not supported by the reference encoder implementation. The MPEG-4 Part 2 encoder uses one reference frame for the past and one for the future, and 16×16 blocks for motion estimation that can be split into 8×8 blocks.

B. GoP Structure Comparison

Selected RD graphs for the Silence of the Lambs sequence encoded with H.264/AVC, H.264 SVC, and MPEG-4 Part 2 are depicted in Figs. 2(a), 2(c), and 2(e). Each figure depicts
the RD curves for all GoP structures for a particular encoder. We observe that the H.264/AVC encoder achieves the best RD performance for GoP structure G16-B3 with almost coinciding RD curves. For the MPEG-4 Part 2 encoder the RD efficiency decreases significantly with increasing number of B frames in the GoP structures. Contrary to these two encoders, the H.264 SVC encoder achieves best RD performance for the G16-B15 GoP structure and lowest for G16-B1. From RD comparison plots between all three encoders, not included due to space constraints, we find that for GoP structure G16-B1, H.264/AVC and H.264 SVC have comparable RD performance. However, H.264 SVC increasingly outperforms H.264/AVC for GoP structures G16-B3 to G16-B15.

In addition to the RD graphs, the VD graphs are provided in Figs. 2(b), (d), and (f). From the H.264/AVC figure, we observe that the bit rate variability increases from GoP structure G16-B1 to G16-B3, and then decreases for G16-B7 and G16-B15, with the latter having a lower variability than G16-B1. For the MPEG-4 Part 2 encodings, the highest rate variability occurs for G16-B1 and decreases with increasing number of B frames. On the contrary, for the H.264 SVC encoder the highest variability occurs for the G16-B15 GoP structure and gradually decreases with decreasing number of B frames. For the GoP structures G16-B3 to G16-B15, the variabilities of the SVC encodings are significantly higher than for H.264/AVC, with values around 3.0 for the Silence of the Lambs and even surpassing this high level for Sony Demo [36].

These observed RD and VD behaviors as a function of GoP structures are explained as follows. First, there is some influence of the choices of quantization parameters for each frame type (I, P, or B). For the H.264/AVC encodings, the quantization parameter of the B frames is set two units larger than the parameters for the I and P frames (which are equal), while for the MPEG-4 Part 2 encodings we set all quantization parameters equal for all frame types. H.264 SVC employs a complex, but deterministic assignment of quantization parameters to frames belonging to the temporal layers (cascading of quantization parameters), with the lowest QPs (highest quality) assigned to frames belonging to the temporal base layer and gradually higher QPs (lower quality) assigned to frames of higher temporal layers. Second, H.264 SVC uses a hierarchical reference frame structure (dyadic) inside each GoP that is completely different from the reference frame structure employed by the other two encoders. Both reasons, cascading QP assignments and hierarchical B frame structure, are the cause of the significantly different behavior of the RD and VD curves of the H.264 SVC encoder as a function of the GoP structures compared to the other encoders. Furthermore, we observe that the better the RD performance of a particular GoP structure, the higher the corresponding traffic variability.

C. Frame Size and GoP Size Statistics

We summarize key frame size and GoP size statistics in Table I by reporting the minimum, mean, and maximum across the five considered video sequences for the various statistical measures. We group H.264/AVC, H.264 SVC, and MPEG-4 Part 2 results for selected quantization scales that provide similar minimum to maximum ranges (across the five sequences) of the mean PSNR frame qualities (30–35 dB, 35–40 dB, and 40–45 dB) to facilitate the comparison of the statistical measures across encoders. (We refer to [36] for the detailed statistics for all sequences.) We provide statistics for GoP structure G16-B3 and also for GoP structure G16-B15 for the SVC encoder, since SVC has best RD efficiency for G16-B15 among all four GoP structures. The SVC statistics for G16-B3 allow for a comparison across encoders between classical and hierarchical B frames based on identical GoP structures, eliminating influences of different numbers of P and B frames within the GoPs. In the first column of each table the encoding mode is specified as the GoP structure, e.g., G16B3, followed by a code representing the encoder ( for H.264/AVC with Full-RDO, SV for H.264 SVC, and for MPEG-4 Part 2), and ending with the quantization scale.

For each average PSNR quality range, we observe the much higher compression ratios, or equivalently smaller average frame sizes and bit rates, obtained with the H.264/AVC and H.264 SVC encoders compared to the MPEG-4 Part 2 encoder, as well as the significantly higher coefficient of variation and peak-to-mean values. The coefficient of variation and peak-to-mean values of the GoP sizes are significantly lower than the values of the frame sizes. We provide a detailed analysis of smoothing on frame size statistics in Section V-E. In the following, we provide plots to illustrate the statistical properties of the G16-B3 encodings of Silence of the Lambs for relatively high quality settings ( for H.264/AVC, for H.264 SVC, and for MPEG-4 Part 2) and relatively low quality settings ( for H.264/AVC, for H.264 SVC, and for MPEG-4 Part 2). We have chosen these particular settings, because the corresponding average video qualities of the Silence of the Lambs encodings are very close for all three encoders.

Fig. 3 depicts frame sizes as a function of the frame number. We observe that the frame sizes have similar behaviors for all encodings with peaked and smoothed traffic for approximately the same indices, which is related to the video content, with peak values occurring for frames that are harder to compress. The MPEG-4 Part 2 traces overall have larger frame sizes than the H.264/AVC and SVC encodings, except for a few peaks in the H.264/AVC and H.264 SVC plots that exceed the corresponding peaks in the MPEG-4 Part 2 plots. The coefficient of variation is harder to observe visually, but one can estimate the observed average frame sizes and compare with the peak values. The average frame size values of the MPEG-4 Part 2 encodings appear to be higher compared to the peaks than for H.264/AVC and SVC encodings, hence the higher variability of the latter two. For each encoder, we observe that the variability is higher for the low video quality compared to the high quality.

In Fig. 4 we present histograms of the frame sizes which are plotted up to the maximum frame sizes, which are 31,061, 8,291,
TABLE I
OVERVIEW OF FRAME SIZE, GoP SIZE, BIT RATE, AND QUALITY STATISTICS OF SINGLE-LAYER ENCODINGS WITH H.264/AVC, H.264 SVC, AND MPEG-4 PART 2
29,044, 7,104, 35,555, and 5,702 Bytes in Figs. 4(a)–(f), respectively. We observe that H.264/AVC and SVC encodings have narrower histograms with longer tails than the MPEG-4 Part 2 encodings. This is the case both for low and high qualities. This reflects the higher energy compaction property of the H.264 encoders, or equivalently, their better compression efficiency. The GoP size histograms of the H.264/AVC and SVC encoders, not included due to space constraints, exhibit similar narrowness compared to MPEG-4 Part 2.

In Fig. 5, we plot the autocorrelation coefficient of the frame sizes as a function of the lag in frames. The frame size autocorrelation is a “comb of spikes” superimposed on a slowly decaying curve. The larger peaks occur for lags that are multiples of 16, i.e., the I frame period, and are the result of the correlation of the large I frames with each other and also the P frames, and to a lesser extent the B frames. The three smaller peaks in between the larger peaks are the result of the correlation of the I and the P frames with each other. For other lag values, the I or P frames are correlated with the B frames, resulting in relatively small autocorrelation. We observe that the decay of the autocorrelation curves is somewhat faster for the high qualities than for the low qualities. The decay of the MPEG-4 Part 2 encodings is much faster than for the H.264/AVC and SVC autocorrelations. Small negative autocorrelation values appear for large lags and are the result of signal symmetries around the average frame size. Representative GoP size sequence autocorrelation plots are provided in Fig. 6. None of the curves have an exponential decay, indicating the presence of long range dependencies.

D. Impact of Rate Control on Rate Variabilities

So far we have focused on open-loop variable bit rate encoding, which allows us to examine the pure impact of video encoding technologies on traffic statistics. Nevertheless, rate control algorithms are often used to adapt the bit rate of a video stream towards a specified target bit rate. Studying rate controlled video traffic implies the selection of a particular algorithm [37], and hence dependency of the traffic analysis on this algorithm. With these limitations in mind, we provide rate control results for comparison with the variable bit rate statistics of MPEG-4 Part 2 and H.264/AVC encodings provided in Table I. We consider the TM5 rate control technique for MPEG-4 Part 2 encodings and the rate control algorithm of the JM 12.2 reference software for H.264/AVC encodings [37]. We set the target bit rates for each sequence equal to the mean bit rates of the corresponding variable bit rate encodings with GoP structure
706 IEEE TRANSACTIONS ON BROADCASTING, VOL. 54, NO. 3, SEPTEMBER 2008
Fig. 3. Frame size plots of Silence of the Lambs G16-B3 encodings. (a) H.264/AVC (b) H.264/AVC (c) H.264 SVC (d)
H.264 SVC (e) MPEG-4 Part 2 (f) MPEG-4 Part 2 .
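As an illustration of the frame size autocorrelation analysis shown in Fig. 5, the following minimal Python sketch estimates the sample autocorrelation coefficient of a frame size trace as a function of the lag. The function name and the synthetic G16-B3 frame sizes are assumptions made for illustration only; they are not taken from the traces used in this study.

```python
def autocorrelation(sizes, max_lag):
    """Sample autocorrelation coefficients of a frame size trace for lags 0..max_lag."""
    n = len(sizes)
    mean = sum(sizes) / n
    var = sum((x - mean) ** 2 for x in sizes) / n
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant trace")
    coeffs = []
    for lag in range(max_lag + 1):
        # Average the lagged products over the n - lag available frame pairs.
        cov = sum((sizes[i] - mean) * (sizes[i + lag] - mean)
                  for i in range(n - lag)) / (n - lag)
        coeffs.append(cov / var)
    return coeffs


# Hypothetical G16-B3 pattern (sizes in bytes): I B B B P B B B P B B B P B B B
GOP = [9000] + [600, 600, 600, 3000] * 3 + [600, 600, 600]
trace = GOP * 50
rho = autocorrelation(trace, 32)
print(round(rho[4], 2), round(rho[16], 2))
```

For such a strictly periodic synthetic trace, the coefficient at lag 16 (the I frame period) is the largest, and the lags that align I and P frames give smaller peaks, which is the "comb of spikes" structure. Real traces additionally show the slow decay discussed in the text, which this toy pattern does not reproduce.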
G16-B3. Table I summarizes the traffic statistics, whereby FRC means H.264/AVC with rate control and MpRC means MPEG-4 Part 2 with rate control. The H.264/AVC rate control achieved all target rates quite accurately for all sequences, while TM5 mostly achieved its target rates within a small margin.
We first observe from Table I that the mean and of the frame sizes as well as the values with rate control are typically larger than the corresponding metrics without rate control. On the other hand, the mean of the GoP sizes with rate control is typically smaller than without rate control. Furthermore, the maximum and values for frame and GoP sizes, are typically significantly larger for the rate controlled traffic, while the minimum and values are smaller for GoP sizes with rate control. These observations can be explained by the long video sequences with many scene changes that make prediction of rates by the control algorithm more challenging, resulting in larger maximum and . Moreover, the larger time horizons, such as GoP lengths, that the rate control algorithms work on to achieve the target bit rate, and the different treatment of I, P, and B frames to maintain compression efficiency, result in widely varying individual frame sizes and qualities.
From this brief rate control experiment, we conclude that rate control has very limited effectiveness in mitigating the observed increases of the bit rate variabilities between MPEG-4 Part 2 and H.264/AVC. We leave a detailed analysis of rate control for future work.
E. Frame Size Smoothing
In order to mitigate the effect of variable video frame sizes on network transport, a wide variety of frame size smoothing mechanisms have been developed and studied in the context of the MPEG-4 Part 2, H.263, and preceding codecs, see for instance [38]–[45]. In this section, we examine the fundamental impact of frame size smoothing on H.264/AVC, H.264 SVC, and MPEG-4 Part 2 traffic by considering the elementary smoothing of the frames over non-overlapping blocks of frames each. More specifically, with the aggregation level , the sizes of consecutive frames are averaged, and transmitted at the corresponding average bit rate across a
Fig. 4. Frame size histogram plots of Silence of the Lambs G16-B3 encodings. (a) H.264/AVC (b) H.264/AVC (c) H.264 SVC
(d) H.264 SVC (e) MPEG-4 Part 2 (f) MPEG-4 Part 2 .
network. Given the original (unsmoothed) frame size sequence, we obtain the smoothed frame sizes
(14)
and examine their CoV.
To illustrate the effect of frame size smoothing on the bit rate variability, we plot the VD curves of both the unsmoothed and the smoothed (denoted by in the figures) H.264/AVC, SVC, and MPEG-4 Part 2 video traffic of selected Silence of the Lambs and Star Wars IV encodings in Figs. 7 and 8. The traffic is smoothed over respectively and frames. From Figs. 7 and 8, and VD plots of other encodings [36], we observe that the variability of the H.264/AVC and SVC traffic smoothed over two frames is significantly higher than that of the unsmoothed MPEG-4 Part 2 traffic for all sequences and all GoP structures, except for G16-B1 [36]. For the latter, the variability of the smoothed traffic is partially higher and partially lower than that of the unsmoothed MPEG-4 Part 2 traffic. However, it is always higher than the variability of the MPEG-4 traffic smoothed over two frames. More smoothing (achieved with a larger aggregation level) of the H.264/AVC and SVC traffic lowers the variability; however, for the same smoothing the MPEG-4 traffic variability also drops and stays well below that of the smoothed H.264/AVC and SVC traffic. In some cases, such as for the Silence of the Lambs sequence with GoP structure G16-B15 [36], the variability of the H.264/AVC and SVC traffic smoothed over eight frames is still higher than or comparable to that of the unsmoothed MPEG-4 Part 2 traffic.
These encoding results illustrate the significantly higher bit rate variability of H.264/AVC and H.264 SVC video traffic compared to MPEG-4 Part 2 video traffic, even when frame size smoothing is applied. This increased rate variability must be taken into account and its impact evaluated when using existing network protocols and mechanisms for streaming H.264/AVC and H.264 SVC encoded video.
F. Quality and Correlation Statistics
Next, we analyze the video quality of our encodings. We use the PSNR as our quality metric, which is overall a good measure of video frame quality and is easy to compute for large numbers
Fig. 5. Frame size autocorrelation plots of Silence of the Lambs G16-B3 encodings. (a) H.264/AVC (b) H.264/AVC (c) H.264 SVC
(d) H.264 SVC (e) MPEG-4 Part 2 (f) MPEG-4 Part 2 .
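The elementary frame size smoothing over non-overlapping blocks and the resulting coefficient of variation (CoV) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the parameter name `a` for the aggregation level, the choice to drop a trailing partial block, and the synthetic frame sizes are all hypothetical and are not taken from the paper.

```python
import statistics


def smooth_frames(sizes, a):
    """Replace each non-overlapping block of a frames by the block average.

    Assumption of this sketch: a trailing partial block is dropped.
    """
    if a < 1:
        raise ValueError("aggregation level must be at least 1")
    smoothed = []
    usable = len(sizes) - len(sizes) % a
    for start in range(0, usable, a):
        block = sizes[start:start + a]
        average = sum(block) / a
        # Every frame of the block is sent at the block's average size.
        smoothed.extend([average] * a)
    return smoothed


def cov(sizes):
    """Coefficient of variation: standard deviation normalized by the mean."""
    return statistics.pstdev(sizes) / statistics.fmean(sizes)


# Hypothetical G16-B3 pattern (sizes in bytes): I B B B P B B B P B B B P B B B
GOP = [9000] + [600, 600, 600, 3000] * 3 + [600, 600, 600]
trace = GOP * 10
for a in (1, 2, 8, 16):
    print(a, round(cov(smooth_frames(trace, a)), 3))
```

Block averaging preserves the mean frame size, so any reduction of the CoV comes from the reduced standard deviation. For this periodic toy trace, smoothing over a full GoP (16 frames) removes the variability entirely, which is not the case for real traces with varying GoP sizes.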
of long video encodings. For a detailed specification of the statistics used in this section, we refer to [28]. We focus on the luminance component in our analysis.
We observe from Table I for all three encoders that the mean PSNR decreases as the quantization parameter used in the encodings increases. This is expected for decreasing bit rates. Conversely, the coefficient of quality variation increases when the video quality decreases. This means that the relative quality fluctuations are larger and more visible when the video quality is low. The same observations are valid on the GoP level. (The GoP quality metrics are not included in Table I due to space constraints.) Furthermore, we found that the values of the coefficient of quality variation on the GoP level are close to the values on the frame level. However, from an examination of the quality ranges (difference between highest and lowest PSNR frame quality) we found a distinction between the frame level and the GoP level, with the latter ranges being consistently smaller. These trends are independent of the GoP structures.
The report [36] also presents the frame size-PSNR quality correlation coefficients , as well as the corresponding correlation coefficient for the GoP aggregation level. In summary, we found that there exists a general trend that the magnitude of on the frame level decreases as the quality decreases. On the GoP level, the magnitude of tends to be higher than on the frame level and tends to increase with decreasing quality for the H.264/AVC encodings. Conversely, for the MPEG-4 Part 2 encodings, the GoP level magnitudes tend to decrease with decreasing video quality as do the frame level magnitudes. This is an interesting distinction between both encoders.
VI. IMPLICATIONS OF INCREASED RATE VARIABILITIES
In the previous sections, we focused on the statistical characterization of the single-layer (non-scalable) video traffic as generated by the H.264/AVC (classical B), H.264 SVC (hierarchical B), and MPEG-4 Part 2 (classical B) encoders. We observed the improved rate-distortion (RD) efficiency of hierarchical B frames (H.264 SVC) compared to the classical B frames (H.264/AVC), and a tremendous RD improvement over MPEG-4 Part 2. However, together with this increase in RD efficiency, the bit rate variability, measured in the coefficient of variation and the peak-to-mean ratio of the frame and GoP
Fig. 6. GoP size autocorrelation plots of Silence of the Lambs G16-B3 encodings. (a) H.264/AVC (b) H.264 SVC (c) MPEG-4 Part 2.
Fig. 7. VD curves for Silence of the Lambs with GoP structure G16-B3, unsmoothed and smoothed. (a) Unsmoothed, smoothed (b) Unsmoothed, smoothed.
Fig. 8. VD curves for Star Wars IV with GoP structure G16-B3, unsmoothed and smoothed. (a) Unsmoothed, smoothed (b) Unsmoothed, smoothed.
A. Implications for Frame Loss Ratio
1) Encoding Setup: Ten different half hour video sequences, namely Silence of the Lambs, Star Wars IV, Indiana Jones, Citizen Kane, Die Hard, The Firm, Terminator 1, Gandhi, Tokyo Olympics, and NBC News were encoded with H.264/AVC and MPEG-4 Part 2 with GoP structure G16-B3 in CIF resolution as in the previous sections.
2) Results and Discussion: We evaluate the frame loss ratio, i.e., the ratio of the number of frames dropped in the network to the number of transmitted frames, through NS-2 [46] simulations. We consider
Fig. 9. simulation and curves for five long CIF sequences encoded with H.264/AVC (G16-B3), H.264 SVC (G16-B15), and MPEG-4 Part 2
(G16-B3). The channel capacity is Mbps and the bit loss probability is . Perfect CBR curves are included for reference. (a) Silence
of the Lambs (b) Star Wars IV (c) Sony Demo (d) NBC 12 News (e) Tokyo Olympics.
Fig. 10. Detail plots for the quality range 35–40/42 dB of the simulations in Fig. 9. Perfect CBR curves are included for reference. (a) Silence
of the Lambs (b) Star Wars IV (c) Sony Demo (d) NBC 12 News (e) Tokyo Olympics.
B. Implications for Statistical Multiplexing
1) Experimental Setup: We investigate a basic real-time frame-based video streaming scenario modeled by a bufferless statistical multiplexer [30], [47]–[50]. In this model, a channel with bandwidth capacity connects a video server with a bufferless statistical multiplexer to receivers. Each video frame is transmitted during one frame period (e.g., 33 ms for a frame rate of 30 frames/s). If the frame size equals bits, with denoting the frame index and the stream index, then the bit rate required to transmit frame of stream is given by . If frame of each stream is statistically multiplexed onto the channel, then the aggregated bit rate is given by .
In each experiment, we stream identical video sequences, however, for each stream the starting phase is randomly selected according to a uniform distribution over all frames of this one sequence [10], [48]. The streams are wrapped around to obtain streams of equal lengths.
We define this basic real-time video streaming scenario to provide a “ground truth” for studying the implications of the bit rate variabilities. We could have chosen a complex streaming scenario with several routers, buffers, aggregated traffic consisting of diverse video streams (content) and data cross traffic, etc., however, this would introduce “arbitrary” parameters that influence the outcome of the experiment.
In our simulations, we measure the information loss probability [48], [49], i.e., the information loss (bits) that occurs when the aggregated bit rate exceeds the channel capacity , and is given by:
(15)
where . The goal of the first set of simulations is to estimate the maximum number of video streams that can be accommodated by the link capacity , while constraining the information loss probability to a value smaller than . We consider and , and set
Mbps. Many independent replications of each simulation were run until the 90% confidence interval of the information loss probability estimate was less than 10% of the corresponding sample mean. In the second set of simulations, we estimate the minimum link capacity that accommodates a prescribed number of streams while keeping the information loss probability smaller than a specified information loss probability , which we set to . For each estimate we perform 500 runs, each consisting of 1000 independent video streaming simulations.
We consider the five long CIF sequences described in Section IV, and encode them with H.264/AVC using GoP structure G16-B3, with H.264 SVC using GoP structure G16-B15, and with MPEG-4 Part 2 using GoP structure G16-B3. The chosen quantization parameters correspond to the range of average PSNR qualities from approximately 30 dB (acceptable quality) to at least 40 dB (high quality). We selected the GoP structures so that overall the highest RD efficiency is achieved for each encoder, as we observed in Section V. This way we are able to study the implications of the higher rate variability of hierarchical B frames, which result in higher RD efficiency at the expense of a significant increase in computational complexity.
2) Results for : Fig. 9 depicts the curves and the simulation curves that are obtained with for the five sequences. Next, we discuss the simulation curves for H.264/AVC encodings (SIM-G16B3-H.264), for H.264 SVC encodings (SIM-G16B15-SVC), and for MPEG-4 Part 2 encodings (SIM-G16B3-MP4).
For each sequence, the average bit rate difference between the three encoders is immediately clear. The curves in the quality range up to 35 dB demonstrate a significant increase in the number of streams that the link supports for the H.264 SVC and H.264/AVC encodings compared to MPEG-4 Part 2. However, the values are affected by the rate variability of the video traffic. To illustrate this effect, we additionally plot the curves corresponding to the multiplexing of perfect constant bit rate (PCBR) traffic, denoted by PCBR in Fig. 9. We define PCBR video traffic as the sequence of identical frame size values that are all equal to the average frame size of the video stream. Hence, the rate variability of a PCBR video stream is zero and is determined by dividing the link capacity by the stream’s average bit rate, resulting in the theoretical maximum value for . Comparing the curves of the unsmoothed VBR traffic with those of the PCBR video traffic, we observe large differences that are attributable to the rate variability. The VBR traffic clearly results in fewer supported streams on the link than the PCBR video traffic. We also observe for all sequences that the gap between the PCBR curves of the H.264 SVC and the H.264/AVC encodings is much wider than the gap between the corresponding H.264 SVC and H.264/AVC VBR traffic curves. This indicates that the relatively large reduction in the average bit rate with H.264 SVC compared to H.264/AVC translates generally into only relatively small increases in the number of supported H.264 SVC VBR traffic streams compared to the number of H.264/AVC streams, providing evidence of the profound impact of the rate variability increase of H.264 SVC on compared to H.264/AVC. We remark that there is no data available for the MPEG-4 Part 2 curves in Figs. 9(a), (b), and (e) below respectively 35 dB, 34 dB, and 32.5 dB. The reason is that for these sequences the lowest quality is achieved around these qualities.
In Fig. 10, we zoom in on the high quality range (above 35 dB) of Fig. 9. In the high quality range, the average bit rates of the streams are quite large compared to . Therefore, the number of streams that can be supported by the link is relatively small and as a result the statistical multiplexing effect that mitigates the rate variability of the streams is reduced. We observe that the gap between the curves of the unsmoothed H.264/AVC and H.264 SVC traffic is quite narrow for all sequences. Very interesting is that the curves of the Sony Demo and NBC 12 News intersect around 35 dB when the number of multiplexed streams drops below approximately 20. This is a very important observation, because this means that the RD efficiency gain of H.264 SVC is completely compensated by the associated increased rate variability. For very high quality (above 38 dB) the H.264 SVC curve for the Sony Demo sequence even approaches the MPEG-4 Part 2 curve, and surprisingly, for the NBC 12 News sequence the H.264 SVC curve is below the MPEG-4 Part 2 curve.
While the RD gains of H.264 SVC compared to the other two encoders are generally the largest for high qualities, this advantage does not necessarily translate into effective gains in the number of supported streams. On the contrary, the very advanced H.264 SVC encoder is outperformed by the MPEG-4 Part 2 encoder in case of the complex NBC 12 News sequence at very high qualities. The above observations are clearly dependent on the video content, but also on the number of multiplexed streams. Therefore, in Section VI-B-3 we perform a second set of simulations that estimate the required link capacity given the number of supported streams subject to a maximum information loss probability.
Next, we provide in Fig. 11 the detailed analysis of the gains of H.264 SVC with respect to H.264/AVC for the information loss probability . Each plot shows two curves and two curves for respectively H.264/AVC (G16-B3) and H.264 SVC (G16-B15). A quick survey of the curves shows a significant increase of the number of streams that the link supports for the H.264 SVC encodings compared to H.264/AVC, especially in the lower half of the quality range in each figure.
To obtain insight into the stream gains versus the bit rate gains, we fit fourth order polynomials (least mean squares fit) through each set of simulation points and corresponding points. These fitted polynomials allow for the resampling of the curves and for the computation of relative gains (%) based on samples with corresponding average qualities (PSNR). We define the RD gain or average bit rate gain as the difference between the bit rates of two streams with identical average qualities, respectively encoded with H.264/AVC and SVC. We divide this difference by the H.264/AVC bit rate and express it as
Fig. 11. simulation and curves for five long CIF sequences encoded with H.264/AVC (G16-B3) and H.264 SVC (G16-B15). The channel capacity
is Mbps and the bit loss probability is . (a) Silence of the Lambs (b) Star Wars IV (c) Sony Demo (d) NBC 12 News (e) Tokyo Olympics.
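The bufferless statistical multiplexing experiment described in the experimental setup can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the simulator used for the reported results: the information loss probability is estimated here as the fraction of offered bits that exceed what the link can carry in each frame period, and the function names, parameter names, and the synthetic frame sizes are hypothetical.

```python
import random


def info_loss_probability(sizes_bits, num_streams, capacity_bps, frame_period_s, rng):
    """Fraction of offered bits lost at a bufferless multiplexer (one replication)."""
    n = len(sizes_bits)
    # Each stream replays the same trace from a uniformly random starting
    # phase and wraps around, so all streams have equal length.
    phases = [rng.randrange(n) for _ in range(num_streams)]
    capacity_bits = capacity_bps * frame_period_s  # bits carried per frame period
    lost = 0.0
    offered = 0.0
    for t in range(n):
        aggregate = sum(sizes_bits[(t + p) % n] for p in phases)
        offered += aggregate
        lost += max(0.0, aggregate - capacity_bits)
    return lost / offered


def pcbr_max_streams(sizes_bits, capacity_bps, frame_period_s):
    """Streams supported if every frame had the average size (perfect CBR)."""
    mean_rate_bps = sum(sizes_bits) / len(sizes_bits) / frame_period_s
    return int(capacity_bps // mean_rate_bps)


# Hypothetical G16-B3 pattern, converted from bytes to bits
GOP_BITS = [8 * s for s in [9000] + [600, 600, 600, 3000] * 3 + [600, 600, 600]]
trace_bits = GOP_BITS * 10
rng = random.Random(1)
print(pcbr_max_streams(trace_bits, 10e6, 1 / 30))
print(info_loss_probability(trace_bits, 20, 10e6, 1 / 30, rng))
```

A full experiment along the lines of the text would repeat `info_loss_probability` over many independent replications and search for the largest number of streams whose estimated loss stays below the target.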
a percentage in Fig. 12. We define the gain or supported stream gain as the difference between the value of the SVC stream and the value of the H.264/AVC stream, again for identical qualities. We divide this difference by the value of the SVC stream and express it as a percentage in Fig. 12. Positive gains correspond to an increase in the number of supported streams and a reduction in the average bit rate of H.264 SVC. The reason for this choice of gain definitions is explained next.
In Fig. 12, the linear trend curves represent the average bit rate gains as a function of the average quality and the parabolic trend curves represent the supported stream gains. We observe that the average bit rate gains exceed 10% and reach values of more than 25%. In the perfect constant bit rate streaming scenario (PCBR), the average bit rate determines the number of supported streams on the link. This implies that the supported stream gains equal the bit rate gains according to our gain definitions above. However, in our streaming scenario, this is clearly not the case as we observe supported stream gains that are significantly smaller than the bit rate gains for the entire range of average qualities. Secondly, the supported stream gain curves reach a maximum and are parabolic in contrast to the linear bit rate gain curves. Therefore, the observed gain differences between the bit rate gains and the supported stream gains depend on the average quality of the stream. Since supported stream gains differ strongly with the video sequence, there is a strong content dependency as well. All these observations point to the strong implications of the bit rate variability of the stream under test, which results in significant supported stream losses compared to the PCBR scenario. There can even be negative gains of supported streams or, equivalently, fewer streams are supported by the link with H.264 SVC than with H.264/AVC even though there is a significant bit rate gain (i.e., average bit rate reduction) with H.264 SVC compared to H.264/AVC. This is the case for high quality encodings of the sequences NBC 12 News (above 36 dB), Sony Demo (above 35 dB), and Tokyo Olympics (above 40.5 dB).
From these observations, we could conclude that the average bit rate efficiency improvements of H.264 SVC, using the
Fig. 13. Minimum channel capacity simulation results for five long CIF sequences encoded with H.264/AVC (G16-B3), H.264 SVC (G16-B15), and
MPEG-4 Part 2 (G16-B3). The bit loss probability is . curves are provided for the number of streams , 16, and 64. (a) Silence of the Lambs
(b) Star Wars IV (c) Sony Demo (d) NBC 12 News (e) Tokyo Olympics.
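The two gain definitions used for Fig. 12 can be written out in a short Python sketch, which also checks the statement that under perfect CBR streaming the supported stream gain equals the bit rate gain. The function names and all numerical values are hypothetical and chosen for illustration; the sketch assumes that the bit rates and stream counts have already been resampled at identical average qualities, and it ignores the rounding of stream counts to integers.

```python
def bit_rate_gain(rate_avc, rate_svc):
    """Average bit rate gain (%) of SVC, normalized by the H.264/AVC bit rate."""
    return 100.0 * (rate_avc - rate_svc) / rate_avc


def stream_gain(streams_avc, streams_svc):
    """Supported stream gain (%) of SVC, normalized by the SVC stream count."""
    return 100.0 * (streams_svc - streams_avc) / streams_svc


# Hypothetical operating point at identical average quality
capacity = 20e6   # link capacity in bit/s
rate_avc = 400e3  # average bit rate of the H.264/AVC encoding
rate_svc = 300e3  # average bit rate of the H.264 SVC encoding

# Perfect CBR: the stream count is the capacity divided by the average bit rate.
pcbr_avc = capacity / rate_avc
pcbr_svc = capacity / rate_svc

print(bit_rate_gain(rate_avc, rate_svc))  # bit rate gain in percent
print(stream_gain(pcbr_avc, pcbr_svc))    # equals the bit rate gain under PCBR

# With VBR traffic the simulated stream counts can be closer together, or even
# reversed, which gives a smaller or negative supported stream gain.
print(stream_gain(20, 18))
```

Normalizing the bit rate gain by the H.264/AVC value and the stream gain by the SVC value is what makes the two quantities coincide in the PCBR case, so any difference between them in the simulations reflects the rate variability.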
can theoretically be determined by building the probability density distribution of the aggregated streams (aggregated frame sizes) and determining the capacity that results in a specified loss . Since these distributions are long-tailed, for a small loss probability, such as , the value is determined by the probability situated in the tail of the distribution. The distribution’s tail length (maximum aggregated frame size) is determined by the maximum frame size of the multiplexed stream. We have observed that H.264 SVC streams (G16-B15) have larger maximum frame sizes than H.264/AVC streams (G16-B3) for approximately the same average video quality. The larger maximum frame sizes of H.264 SVC are caused by the cascading quantizers of H.264 SVC that assign a different quality or quantizer parameter to frames of different temporal layers. Since I and P frames need the highest quality, they are also assigned the smallest quantizer parameter, while subsequent temporal layers are assigned larger quantizers and, hence, lower quality, as explained in Section III-C. For H.264 SVC with GoP structure G16-B15, i.e., with five temporal layers, this means that to achieve a similar average quality as for H.264/AVC with classical B frames, the qualities of the H.264 SVC I frames must be higher than the qualities of the H.264/AVC I frames. Consequently, the H.264 SVC I frame sizes will be larger than the H.264/AVC I frame sizes. This means that the aggregated stream distributions of H.264 SVC streams will have longer tails than the distributions of H.264/AVC streams. Since the value for small losses is dependent on the probability in the tail and the tail length, H.264 SVC can have higher values than H.264/AVC, even though H.264 SVC has smaller average frame sizes than H.264/AVC.
VII. CONCLUSIONS
We have examined in detail the network traffic characteristics of variable bit rate H.264/AVC and H.264 SVC single-layer (non-scalable) encoded video. We have focused on a set of long video test sequences with a wide range of typical texture and
motion features. In summary, we found the following distinct characteristics of the H.264/AVC and H.264 SVC video traffic:
• From our joint characterization of the average bit rate and bit rate variability for a fixed desired video quality, we confirmed that the H.264/AVC and H.264 SVC codecs lead to significant average bit rate savings with respect to the MPEG-4 Part 2 codec. At the same time, the variability of the H.264/AVC and H.264 SVC video traffic is significantly higher than the variability of the MPEG-4 Part 2 video traffic. Whereas the coefficient of variation (standard deviation normalized by mean) of the frame sizes reaches levels above 2.4 for H.264/AVC, and even above 3.0 for H.264 SVC, it generally does not exceed 1.5 with MPEG-4 Part 2 [24], [30].
• The comparison between classical B frames (default in H.264/AVC) and hierarchical B frames (H.264 SVC), based on four GoP structure patterns that are supported by the encoders (G16-B1, G16-B3, G16-B7, and G16-B15), indicates that hierarchical B frames outperform classical B frames at the expense of higher rate variability. Of the four tested GoP structures, G16-B3 results in the best RD efficiency for H.264/AVC with classical B frames and G16-B15 results in the best RD efficiency for H.264 SVC with hierarchical B frames.
• Depending on the application scenario, it may be possible to smooth the video traffic before sending it into the network, thus reducing the traffic variability at the expense of introducing smoothing delay. We observed that the smoothed H.264/AVC and H.264 SVC video traffic exhibits variabilities at the same level or above the unsmoothed MPEG-4 Part 2 video traffic, indicating that even when smoothing is employed, the transport mechanisms for the new H.264/AVC (and extensions) video will need to be designed to accommodate substantial traffic variabilities.
• Our streaming simulation studies demonstrated (i) that the increased bit rate variability results in significantly higher frame losses for H.264/AVC encoded video compared to MPEG-4 Part 2 encoded video when transmitting a single video stream over a bottleneck link, and (ii) that for bufferless statistical multiplexing, a significant improvement in bit rate-distortion efficiency does not suffice to conclude that there is an equal gain in the number of streams that can be statistically multiplexed onto a link subject to an information loss probability constraint. We have thus demonstrated the relevance and importance of investigating the implications of increased video traffic rate variabilities for video network transport, and that solely focusing on rate-distortion efficiency improvements may not necessarily lead to optimal operating points for all networking scenarios.
There are several directions for important future work. One direction is to examine the suitability of existing traffic models and video transport mechanisms for H.264/AVC and SVC video traffic. The existing traffic models, such as [16]–[23], and video transport mechanisms for a wide range of communication networks, including general IP networks, see e.g., [51]–[54], wireless networks, see e.g., [55]–[57], and peer-to-peer networks [58]–[60], were primarily developed based on MPEG-4 Part 2 video traffic. It is therefore necessary to examine how well these existing traffic models describe and how efficiently the existing mechanisms can transport the significantly more variable H.264/AVC and SVC video traffic. If necessary, the existing traffic models and transport mechanisms need to be extended to accommodate the unprecedented variability of the H.264/AVC and SVC video traffic.
ACKNOWLEDGMENT
The authors thank Prof. Lina Karam of Arizona State University for insightful discussions on SVC and the bufferless statistical multiplexing experiment.
REFERENCES
[1] D. Marpe, T. Wiegand, and G. Sullivan, “The H.264/MPEG-4 advanced video coding standard and its applications,” IEEE Communications Magazine, vol. 44, no. 8, pp. 134–143, Aug. 2006.
[2] R. Schafer, H. Schwarz, D. Marpe, T. Schierl, and T. Wiegand, “MCTF and scalability extension of H.264/AVC and its application to video transmission, storage and surveillance,” in Proceedings of Visual Communications and Image Processing (VCIP), Proceedings of SPIE, Volume 5960, Beijing, China, July 2005, pp. 596 011-1–596 011-12.
[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103–1120, Sept. 2007.
[4] T. Lakshman, A. Ortega, and A. Reibman, “VBR video: Tradeoffs and potentials,” Proceedings of the IEEE, vol. 86, no. 5, pp. 952–973, May 1998.
[5] A. R. Reibman and M. T. Sun, Compressed Video Over Networks. New York: Marcel Dekker, 2000.
[6] D. Wu, Y. Hou, W. Zhu, Y.-Q. Zhang, and J. Peha, “Streaming video over the internet: Approaches and directions,” IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 282–300, Mar. 2001.
[7] D. Marpe, T. Wiegand, and S. Gordon, “H.264/MPEG-4 AVC Fidelity Range Extensions: Tools, profiles, performance, and application areas,” in Proc. IEEE Int. Conf. on Image Proc. (ICIP), Sept. 2005, pp. 593–596.
[8] M. Wien, H. Schwarz, and T. Oelbaum, “Performance analysis of SVC,” IEEE Trans. Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1194–1203, Sept. 2007.
[9] ISO/IEC JTC 1/SC 29/WG 11 N2802, Information Technology-Generic Coding of Audio-Visual Objects-Part 2: Visual, Final Proposed Draft Amendment 1. Geneva, July 1999.
[10] P. Seeling, M. Reisslein, and B. Kulapala, “Network performance evaluation with frame size and quality traces of single-layer and two-layer video: A tutorial,” IEEE Communications Surveys and Tutorials, vol. 6, no. 3, pp. 58–78, Third Quarter, 2004 [Online]. Available: http://trace.eas.asu.edu, video traces
[11] S. Bakiras and V. O. K. Li, “Maximizing the number of users in an interactive video-on-demand system,” IEEE Trans. Broadcasting, vol. 48, no. 4, pp. 281–292, Dec. 2002.
[12] P. Koutsakis and M. Paterakis, “Policing mechanisms for the transmission of videoconference traffic from MPEG-4 and H.263 video coders in wireless ATM networks,” IEEE Trans. Vehicular Technology, vol. 53, no. 5, pp. 1525–1530, 2004.
[13] B. Nikolaus, J. Ott, C. Borrmann, and U. Borrmann, “Generalized greedy broadcasting for efficient media-on-demand transmissions,” IEEE Trans. Broadcasting, vol. 51, no. 3, pp. 354–359, 2005.
[14] J. Roberts, “Internet traffic, QoS, and pricing,” Proceedings of the IEEE, vol. 92, no. 9, pp. 1389–1399, 2004.
[15] Y. Xu and R. Guerin, “Individual QoS versus aggregate QoS: A loss performance study,” IEEE/ACM Trans. Networking, vol. 13, no. 2, pp. 370–383, 2005.
[16] A. Alheraish, S. Alshebeili, and T. Alamri, “A GACS modeling approach for MPEG broadcast video,” IEEE Trans. Broadcasting, vol. 50, no. 2, pp. 132–141, June 2004.
[17] N. Ansari, H. Liu, Y. Q. Shi, and H. Zhao, “On modeling MPEG video traffics,” IEEE Trans. Broadcasting, vol. 48, no. 4, pp. 337–347, Dec. 2002.
[18] M. Dai and D. Loguinov, “Analysis and modeling of MPEG-4 and H.264 multi-layer video traffic,” in Proc. of IEEE INFOCOM, Miami, FL, Mar. 2005, pp. 2257–2267.
[19] X.-D. Huang, Y.-H. Zhou, and R.-F. Zhang, “A multiscale model for MPEG-4 varied bit rate video traffic,” IEEE Trans. Broadcasting, vol. 50, no. 3, pp. 323–334, Sept. 2004.
[20] M. M. Krunz and A. M. Makowski, “Modeling video traffic using M/G/∞ input processes: A compromise between Markovian and LRD models,” IEEE Journal on Selected Areas in Communications, vol. 16, pp. 733–748, June 1998.
[21] C. H. Liew, C. K. Kodikara, and A. M. Kondoz, “MPEG-encoded variable bit-rate video traffic modelling,” IEE Proceedings Communications, vol. 152, no. 5, pp. 749–756, Oct. 2005.
[22] U. K. Sarkar, S. Ramakrishnan, and D. Sarkar, “Modeling full-length video using Markov-modulated gamma-based framework,” IEEE/ACM Trans. Networking, vol. 11, no. 4, pp. 638–649, Aug. 2003.
[23] U. K. Sarkar, S. Ramakrishnan, and D. Sarkar, “Study of long duration MPEG-trace segmentation methods for developing frame size based traffic models,” Computer Networks, vol. 44, no. 2, pp. 177–188, 2004.
[24] G. Van der Auwera, M. Reisslein, and L. J. Karam, “Video texture and motion based modeling of rate variability-distortion (VD) curves,” IEEE Trans. Broadcasting, vol. 53, no. 3, pp. 637–648, Sept. 2007.
[38] C. Bewick, R. Pereira, and M. Merabti, “Network constrained smoothing: Enhanced multiplexing of MPEG-4 video,” in Proceedings of IEEE International Symposium on Computers and Communications, Taormina, Italy, July 2002, pp. 114–119.
[39] H.-C. Chao, C. L. Hung, and T. G. Tsuei, “ECVBA traffic-smoothing scheme for VBR media streams,” International Journal of Network Management, vol. 12, pp. 179–185, 2002.
[40] W.-C. Feng and J. Rexford, “Performance evaluation of smoothing algorithms for transmitting prerecorded variable-bit-rate video,” IEEE Trans. Multimedia, vol. 1, no. 3, pp. 302–312, Sept. 1999.
[41] T. Gan, K.-K. Ma, and L. Zhang, “Dual-plan bandwidth smoothing for layer-encoded video,” IEEE Trans. Multimedia, vol. 7, no. 2, pp. 379–392, Apr. 2005.
[42] M. Krunz, W. Zhao, and I. Matta, “Scheduling and bandwidth allocation for distribution of archived video in VoD systems,” Journal of Telecommunication Systems, Special Issue on Multimedia, vol. 9, no. 3/4, pp. 335–355, Sept. 1998.
[43] H. Lai, J. Y. Lee, and L.-K. Chen, “A monotonic-decreasing rate scheduler for variable-bit-rate video streaming,” IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 221–231, Feb. 2005.
[44] A. Solleti and K. J. Christensen, “Efficient transmission of stored video for improved management of network bandwidth,” International Journal of Network Management, vol. 10, pp. 277–288, 2000.
[45] J. C. H. Yuen, E. Chan, and K.-Y. Lam, “Real time video frames allocation in mobile networks using cooperative pre-fetching,” Multimedia Tools and Applications, vol. 32, no. 3, pp. 329–352, Mar. 2007.
[46] “NS-2 The Network Simulator,” 2007 [Online]. Available: http://www.isi.edu/nsnam/ns/
[25] W.-C. Feng, Buffering Techniques for Delivery of Compressed Video [47] S. Racz, T. Jakabfy, J. Farkas, and C. Antal, “Connection admission
in Video-on-Demand Systems. : Kluwer Academic Publisher, 1997. control for flow level QoS in bufferless models,” in Proc. IEEE IN-
[26] M. Garrett and W. Willinger, “Analysis, modeling and generation of FOCOM, 2005, pp. 1273–1282.
self-similar VBR video traffic,” in Proceedings of ACM Sigcomm, [48] M. Reisslein and K. W. Ross, “Call admission for prerecorded sources
London, UK, Sept. 1994, pp. 269–280. with packet loss,” IEEE Journal on Selected Areas in Communications,
[27] M. Krunz, R. Sass, and H. Hughes, “Statistical characteristics and mul- vol. 15, no. 6, pp. 1167–1180, Aug. 1997.
tiplexing of MPEG streams,” in Proceedings of IEEE Infocom ’95, [49] J. Roberts, U. Mocci, and J. Virtamo, Broadband Network Traffic: Per-
April 1995, pp. 455–462. formance Evaluation and Design of Broadband Multiservice Networks,
[28] M. Reisslein, J. Lassetter, S. Ratman, O. Lotfallah, F. Fitzek, and Final Report of Action COST 242, (Lecture Notes in Computer Science,
S. Panchanathan, “Traffic and quality characterization of scalable Vol. 1155). New York: Springer Verlag, 1996.
encoded video: A large-scale trace-based study, Part 1: Overview [50] Z. Zhang, J. Kurose, J. Salehi, and D. Towsley, “Smoothing, statistical
and definitions,” ASU, Tempe, AZ, Dec. 2003 [Online]. Available: multiplexing and call admission control for stored video,” IEEE
Journal on Selected Areas in Communications, vol. 13, no. 6, pp.
http://trace.eas.asu.edu
1148–1166, Aug. 1997.
[29] O. Rose, “Simple and efficient models for variable bit rate MPEG video
[51] T. Ahmed, A. Mehaoua, R. Boutaba, and Y. Iraqi, “Adaptive packet
traffic,” Performance Evaluation, vol. 30, no. 1–2, pp. 69–85, 1997.
video streaming over IP networks: A cross-layer approach,” IEEE
[30] P. Seeling and M. Reisslein, “The rate variability-distortion (VD)
Journal on Selected Areas in Communications, vol. 23, no. 2, pp.
curve of encoded video and its impact on statistical multiplexing,”
385–401, Feb. 2005.
IEEE Trans. Broadcasting, vol. 51, no. 4, pp. 473–492, Dec. 2005.
[52] T. Kim and M. H. Ammar, “Optimal quality adaptation for scalable
[31] P. Cuenca, A. Garrido, F. Quiles, and L. Orozco-Barbosa, “An effi-
encoded video,” IEEE Journal on Selected Areas in Communications,
cient protocol architecture for error-resilient MPEG-2 video commu-
vol. 23, no. 2, pp. 344–356, Feb. 2005.
nications over ATM networks,” IEEE Trans. Broadcasting, vol. 45, no.
[53] M. Krunz, “Bandwidth allocation strategies for transporting variable-
1, pp. 129–140, Mar. 1999.
bit-rate video traffic,” IEEE Communications Magazine, vol. 37, no. 1,
[32] G. Van der Auwera, P. T. David, and M. Reisslein, “Traffic character- pp. 40–46, Jan. 1999.
istics of H.264/AVC variable bit rate video,” IEEE Communications [54] G.-M. Muntean, P. Perry, and L. Murphy, “A new adaptive multi-
Magazine 2008 [Online]. Available: http://www.fulton.asu.edu/mre/ media streaming system for all-IP multi-service networks,” IEEE
h264CommMag07.pdf, in print Trans. Broadcasting, vol. 50, no. 1, pp. 1–10, Mar. 2004.
[33] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, [55] L. Haratcherev, J. Taal, K. Langendoen, R. Lagendijk, and H. Sips,
T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, “Optimized video streaming over 802.11 by cross-layer signaling,”
performance and complexity,” IEEE Circuits and Systems Magazine, IEEE Communications Magazine, vol. 44, no. 1, pp. 115–121, Jan.
vol. 4, no. 1, pp. 7–28, First Quarter, 2004. 2006.
[34] G. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC advanced [56] S. Khan, Y. Peng, E. Steinbach, M. Sgroi, and W. Kellerer, “Appli-
video coding standard: Overview and introduction to the fidelity range cation-driven cross-layer optimization for video streaming over wire-
extensions,” in Proc. of SPIE 5558, Conference on Applications of less networks,” IEEE Communications Magazine, vol. 44, no. 1, pp.
Digital Image Processing XXVII, Special Session on Advances in 122–130, Jan. 2006.
New Emerging Standard: H.264/AVC I, Denver, CO, Aug. 2004, pp. [57] F. Yang, Q. Zhang, W. Zhu, and Y.-Q. Zhang, “Bit allocation for scal-
454–474. able video streaming over mobile wireless internet,” in Proc. IEEE IN-
[35] A. Puri, X. Chen, and A. Luthra, “Video coding using the FOCOM, 2004, pp. 2142–2151.
H.264/MPEG-4 AVC compression standard,” Journal of Visual [58] H.-Y. Hsieh and R. Sivakumar, “Accelerating peer-to-peer networks
Communication and Image Representation, vol. 19, no. 9, pp. for video streaming using multipoint-to-point communication,” IEEE
793–849, Oct. 2004. Communications Magazine, vol. 42, no. 8, pp. 111–119, Aug. 2004.
[36] G. Van der Auwera, P. T. David, and M. Reisslein, “Video traffic anal- [59] E. Kim and J. Liu, “Design of HD-quality streaming networks for real-
ysis of H.264/AVC and extensions: Single-layer statistics,” ASU. time content distribution,” IEEE Trans. Consumer Electronics, vol. 52,
Tempe, AZ, Tech. Rep., July 2007, at [Online]. Available: http://www. no. 2, pp. 392–401, May 2006.
fulton.asu.edu/mre/h264_traffic_single_layer_ext.pdf [60] J. Liang and K. Nahrstedt, “DagStream: Locality aware and failure re-
[37] Z. Chen and K. Ngan, “Recent advances in rate control for video silient peer-to-peer streaming,” in Proc. SPIE/ACM Multimedia Com-
coding,” Signal Processing: Image Communication, vol. 22, no. 1, pp. puting and Networking, Proceedings of SPIE—Volume 6071, Jan. 2006,
19–38, Jan. 2007. pp. 60 710L-1–60 710L-15.
Geert Van der Auwera received the Ph.D. degree in Electrical Engineering from Arizona State University, Tempe, USA, in 2007, and the Belgian MSEE degree from Vrije Universiteit Brussel (VUB), Brussels, Belgium, in 1997. His research interests are video traffic and quality characterization, video streaming mechanisms and protocols, and video coding. Until the end of 2004, he was a scientific advisor for IWT-Flanders, the Institute for the Promotion of Innovation by Science and Technology in Flanders, Belgium. In 2000, he joined IWT-Flanders after researching wavelet video coding at the Electronics and Information Processing Department (ETRO), VUB. In 1998, Mr. Van der Auwera’s thesis on motion estimation in the wavelet domain received the Barco and IBM prizes by the Fund for Scientific Research of Flanders, Belgium.

Prasanth T. David received his B.Tech. degree from the University of Kerala, Trivandrum, India in 2001. From 2001, he was a Systems Design Engineer with G4 Matrix Technologies, India, where he worked on video processing for a media processing engine. From 2005 to 2007 he was with the Electrical Engineering Dept. at Arizona State University, Tempe, from where he obtained his M.S. degree. Since then, he has been with Intel Corp., CA. His fields of interest are multimedia coding, media processors, and video streaming technologies.

Martin Reisslein is an Associate Professor in the Department of Electrical Engineering at Arizona State University (ASU), Tempe. He received the Dipl.-Ing. (FH) degree from the Fachhochschule Dieburg, Germany, in 1994, and the M.S.E. degree from the University of Pennsylvania, Philadelphia, in 1996, both in electrical engineering. He received his Ph.D. in systems engineering from the University of Pennsylvania in 1998. During the academic year 1994–1995 he visited the University of Pennsylvania as a Fulbright scholar. From July 1998 through October 2000 he was a scientist with the German National Research Center for Information Technology (GMD FOKUS), Berlin and lecturer at the Technical University Berlin. From October 2000 through August 2005 he was an Assistant Professor at ASU. He served as editor-in-chief of the IEEE Communications Surveys and Tutorials from January 2003 through February 2007 and has served on the Technical Program Committees of IEEE Infocom, IEEE Globecom, and the IEEE International Symposium on Computer and Communications. He has organized sessions at the IEEE Computer Communications Workshop (CCW). He maintains an extensive library of video traces for network performance evaluation, including frame size traces of MPEG-4 and H.264 encoded video, at http://trace.eas.asu.edu. His research interests are in the areas of Internet Quality of Service, video traffic characterization, wireless networking, optical networking, and engineering education. (web: http://www.fulton.asu.edu/~mre).