Adaptive 360-Degree Video Streaming Using Layered Video Coding


Afshin Taghavi Nasrabadi*, Anahita Mahzari†, Joseph D. Beshay‡, Ravi Prakash§

The University of Texas at Dallas, Texas, U.S.A.

* e-mail: afshin@utdallas.edu
† e-mail: anahita.mahzari@utdallas.edu
‡ e-mail: joseph.beshay@utdallas.edu
§ e-mail: ravip@utdallas.edu

2017 IEEE Virtual Reality (VR), March 18-22, 2017, Los Angeles, CA, USA
978-1-5090-6647-6/17/$31.00 © 2017 IEEE

ABSTRACT

Virtual reality and 360-degree video streaming are growing rapidly; however, streaming 360-degree video is very challenging due to its high bandwidth requirements. To address this problem, the video quality is adjusted according to a prediction of the user's viewport. High-quality video is streamed only for the user's viewport, reducing the overall bandwidth consumption. Existing solutions use shallow buffers whose depth is limited by the accuracy of viewport prediction. Playback is therefore prone to video freezes, which are very destructive for the Quality of Experience (QoE). We propose using layered encoding for 360-degree video to improve QoE by reducing the probability of video freezes and the latency of response to the user's head movements. Moreover, this scheme significantly reduces the storage requirements and improves in-network cache performance.

Keywords: Adaptive 360 video streaming, SVC, Video freeze

Index Terms: H.5.1 [Multimedia Information Systems]: Artificial, augmented, and virtual realities—Video;

1 INTRODUCTION

With advances in multimedia technologies, the field is evolving toward providing an immersive experience for users. The demand for Virtual Reality (VR) Head Mounted Displays (HMDs) is growing in order to enable the immersive video experience. VR HMDs utilize 360-degree videos, also known as spherical videos, which provide a 360-degree view of the scene captured from a single point. The captured video is mapped to the internal surface of a sphere. An HMD shows a limited portion of the spherical video as seen from the center of the sphere, referred to as the viewport. The area covered by the viewport is limited to the HMD's Field of View (FoV), and its coordinates are based on the orientation of the user's head.

The growth in popularity of VR has led video service providers, such as YouTube and Facebook, to enable streaming of 360-degree videos. However, streaming 360-degree video brings up new challenges. Presenting high video quality in the viewport requires a complete high-quality 360-degree frame. For example, the Oculus Rift has a viewing resolution of 2160x1200 with a FoV of 110 degrees; thus, the complete 360-degree frame should have a resolution equal to 8K video, with a bitrate of 50 Mbps, to exploit the highest quality that the device can offer. Therefore, not only does streaming the whole video at the highest quality require a large amount of bandwidth that most end users do not have, but transmitting the complete video at the highest quality also wastes bandwidth, since only a subset, namely the viewport, is viewed by the user.

One solution is adaptive 360 video streaming. The scheme is similar to Dynamic Adaptive Streaming over HTTP (DASH), but a spatial dimension is added to the adaptation space. The client predicts the user's future viewport and requests the future segments with higher quality in the predicted viewport. The server stores a representation set that provides each time-domain segment at different qualities, with higher quality at a set of possible viewports. Each representation of a segment has a higher quality for a specific viewport that could be of interest to the viewer and lower quality for the rest of the 360-degree frame. Thus, higher quality can be provided to the user with less bandwidth usage. The current state-of-the-art solutions try to predict the user's viewport for a short duration into the future and request the segments that provide this viewport at higher quality. When there is a mismatch between the user's viewport and the predicted one, the lower quality parts are displayed. To minimize the occurrence of such mismatches, the video is not buffered for durations longer than the duration for which the viewport can be predicted accurately, which increases the probability of a video freeze in the event of network bandwidth fluctuation. Video freeze is the most destructive factor for user QoE [3]. In this paper, we argue that using Scalable Video Coding (SVC) [6] in adaptive 360-degree video streaming can mitigate this problem. SVC also provides additional benefits: (1) using enhancement layers at the client allows more flexibility in the adaptation, and (2) the storage requirement on the server side is reduced and caching performance is improved.

2 RELATED WORK

This section provides an overview of 360-degree video streaming. First, mapping and encoding techniques are introduced. Then, recent work on adaptive 360-degree video is discussed.

Existing video encoders cannot encode spherical videos directly, so the video must be mapped to a 2D plane before encoding. There are various map projection methods, such as equirectangular, pyramid and cube projections. Equirectangular projection maps the sphere into a rectangle; however, it results in severe stretching at the north and south poles of the sphere, which in turn reduces coding efficiency and increases bandwidth consumption. In the pyramid projection, the viewport is mapped to the base of the pyramid, and the remaining parts are mapped to the sides. This causes severe quality degradation for areas far from the base. Moreover, this mapping is GPU-intensive [2]. Cube projection, as shown in Figure 1-a,b, maps a sphere into the six faces of a cube. It provides smooth quality degradation with lower processing complexity than the pyramid projection and lower bitrate overhead than the equirectangular projection.

Adaptive 360-degree video streaming solutions divide a video into segments of several seconds each, and each segment is sliced into tiles in the spatial domain. Each tile is encoded at several quality levels. The client fetches a set of tiles for each segment based on viewport prediction and network conditions. A streaming system based on this scheme is proposed in [5], whose authors analyze the relation between tile size and bandwidth saving. A similar method is proposed in [4] with the goal of reducing bandwidth consumption in cellular networks. Different methods are studied for viewport prediction, and it has been shown that they reduce bandwidth consumption by 78%. Unlike [5] and [4], which use equirectangular mapping, Corbillon et al. [1] have shown that cube mapping provides higher quality. Additionally, instead of encoding each tile of a segment independently, they make different representations of a segment available according to a set of possible viewports. The client then requests segments according to the viewport prediction and network conditions.
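The tile-based scheme surveyed in Section 2 (each segment sliced into tiles, each tile encoded at several quality levels, and the client choosing tiles from a viewport prediction and the network condition) can be illustrated with a minimal Python sketch. This is not the algorithm of [4] or [5]; the function name `select_tile_qualities`, the greedy upgrade rule and the bitrate figures are all assumptions made purely for illustration.

```python
def select_tile_qualities(bitrates, viewport_tiles, budget_kbps):
    """Pick a quality index per tile for one segment (hypothetical greedy rule).

    bitrates: dict mapping tile id -> list of bitrates in kbps, lowest quality first.
    viewport_tiles: tile ids predicted to be visible.
    budget_kbps: estimated throughput available for this segment.
    """
    # Every tile starts at its lowest quality so the whole sphere stays viewable.
    choice = {tile: 0 for tile in bitrates}
    spent = sum(levels[0] for levels in bitrates.values())

    # Raise viewport tiles one level at a time while the budget allows.
    upgraded = True
    while upgraded:
        upgraded = False
        for tile in sorted(viewport_tiles):
            level = choice[tile]
            levels = bitrates[tile]
            if level + 1 < len(levels):
                extra = levels[level + 1] - levels[level]
                if spent + extra <= budget_kbps:
                    choice[tile] = level + 1
                    spent += extra
                    upgraded = True
    return choice, spent
```

Because each tile here is encoded independently at every quality level, upgrading a tile that has already been downloaded means fetching it again in full, which is the cost that the layered approach of Section 3 aims to avoid.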

Figure 1: Mapped sphere into cube, and layered encoding

Figure 2: Mapping viewport and adaptation

3 PROPOSED METHOD

Given a video sequence with a duration of D seconds, we propose segmenting it in the time domain into segments of s seconds each. S_t denotes the t-th segment, where t ∈ {0, ..., D/s}. Each segment contains the six faces of the cube-mapped 360-degree video. Each face of the cube is sliced in the spatial domain into m × m tiles denoted T_{f,i,j,l}, where f ∈ {0, ..., 5} is the face number, i, j ∈ {0, ..., m − 1} are the latitude and longitude of the tile on its face, respectively, and l ∈ {0, ..., L − 1} is the quality level. Tiles are encoded at L different quality levels according to the layered scheme of SVC [6]. The lowest quality is the base layer, and each enhancement layer contains additional data that can be added to the lower layers to improve the quality. Figure 1-c shows a frame of spherical video mapped into a cube, with each face sliced into four tiles and encoded in three layered quality levels. Each color represents a face of the cube. The client always requests the base layer for all tiles to ensure that all viewports are available for viewing if the user's viewport changes quickly. For tiles currently within the user's viewport, the client requests enhancement layers to provide better quality. Depending on the available bandwidth, enhancement layers can also be requested for the tiles to which the user's viewport may shift. The range of surrounding tiles to request depends on the accuracy of viewport prediction. Under this scheme, we provide the base layer for all tiles in a single file so the client can fetch it in one request. For all higher quality levels, each tile is made available in a separate file to be requested separately by the client.

The adaptation algorithm in the client works as follows. Assuming the maximum duration of the client buffer is B seconds, the client first fills the buffer with the base layer. Then, it starts predicting the viewport for the segment with the earliest playback deadline, based on the previous viewports. The tiles covered by the predicted viewport are identified, and the enhancement layers for those tiles are requested. Figure 2 shows an example of a viewport with a FoV of 100 degrees on the sphere. The area covered by the viewport is shown in yellow. By mapping the viewport onto the cube, the client can determine which tiles are covered. Figure 2 also shows the tiles requested for each layer. The client can request the enhancement layers only for the tiles in the viewport, or also fetch the first enhancement layer for the tiles neighboring the viewport.

The proposed method maintains the bandwidth reduction achieved by the current state-of-the-art method while introducing several new benefits. Using the layered encoded tiles, the client can buffer B seconds of the video, as opposed to the existing methods, where the client cannot buffer segments for durations longer than the horizon over which the viewport can be predicted accurately (usually about 2 seconds [4]). When network bandwidth drops, our proposed method can survive long-lived drops by continuing playback at base quality, while other methods will encounter a video playback freeze, which is known to be the most destructive factor for user QoE. Additionally, our proposed method significantly reduces the storage requirement for 360 video. When encoding a video into multiple quality levels, SVC removes the redundancy between different levels. It encodes only the residual signal for higher layers, thus reducing the total storage requirements. If 3 quality levels are required, our proposed method requires 31% less storage space. Furthermore, SVC makes better use of existing in-network caching capabilities, since all viewers need the base layer of a video regardless of their viewport. The base layer can be cached at intermediate devices to reduce the load on the server. Also, using SVC in adaptive 360-degree video streaming reduces the cost of updating the quality of already downloaded segments when the viewport changes suddenly or when the prediction is wrong. Rather than discarding whole tiles or even segments (as in [1]), the client only requests the enhancement layers for the tiles corresponding to the new viewport.

4 CONCLUSION

Using SVC layered video encoding to deliver streaming 360-degree video is a promising approach to address a number of the issues with the existing solutions. We presented an overview of a novel presentation and adaptation technique. To the end user, it provides a higher QoE by reducing video freezes even under challenging network conditions. To the network and content providers, it reduces the storage requirements and avoids unnecessary bandwidth waste.

We are currently working on implementing and evaluating the performance of our proposed approach.

REFERENCES

[1] X. Corbillon, A. Devlic, G. Simon, and J. Chakareski. Viewport-adaptive navigable 360-degree video delivery. arXiv preprint arXiv:1609.08042, 2016.
[2] E. Kuzyakov and D. Pio. Next-generation video encoding techniques for 360 video and VR. 2016. https://code.facebook.com/posts/1126354007399553/next-generation-video-encoding-techniques-for-360-video-and-vr/.
[3] R. K. Mok, E. W. Chan, and R. K. Chang. Measuring the quality of experience of HTTP video streaming. In 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops, pages 485–492. IEEE, 2011.
[4] F. Qian, L. Ji, B. Han, and V. Gopalakrishnan. Optimizing 360 video delivery over cellular networks. In Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges, pages 1–6. ACM, 2016.
[5] P. Rondao Alface, J.-F. Macq, and N. Verzijp. Interactive omnidirectional video delivery: A bandwidth-effective approach. Bell Labs Technical Journal, 16(4):135–147, 2012.
[6] H. Schwarz, D. Marpe, and T. Wiegand. Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1103–1120, 2007.
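As an illustration of the client-side adaptation described in Section 3, the following Python sketch builds the request list for one segment: one request for the base-layer file covering all tiles, then enhancement layers for the tiles in the predicted viewport, then the first enhancement layer for neighboring tiles if budget remains. It is only a sketch under stated assumptions. The names `all_tiles` and `plan_segment_requests`, the request tuples, the use of a request count (`max_enh_requests`) as a stand-in for the bandwidth budget, and the ordering of viewport layers ahead of neighbors are hypothetical and do not come from the paper. Viewport prediction and the mapping of the viewport onto cube tiles are left outside the sketch.

```python
from itertools import product

FACES = 6  # cube-mapped frame: face index f in {0, ..., 5}


def all_tiles(m):
    """Enumerate (f, i, j) for every tile when each face is cut into m x m tiles."""
    return [(f, i, j) for f, i, j in product(range(FACES), range(m), range(m))]


def plan_segment_requests(t, m, num_layers, viewport, neighbors, max_enh_requests):
    """Return the ordered requests for segment t.

    viewport, neighbors: sets of (f, i, j) tiles supplied by a viewport predictor.
    max_enh_requests: assumed budget, counted in enhancement-layer tile files.
    """
    valid = set(all_tiles(m))
    if not (viewport <= valid and neighbors <= valid):
        raise ValueError("tile index outside the m x m cube tiling")

    # The base layer (l = 0) of all tiles is one file, fetched in one request.
    requests = [("base", t)]
    budget = max_enh_requests

    # Enhancement layers l = 1 .. L-1 for viewport tiles, lower layers first,
    # since each layer refines the layers beneath it.
    for layer in range(1, num_layers):
        for tile in sorted(viewport):
            if budget == 0:
                return requests
            requests.append(("enh", t, tile, layer))
            budget -= 1

    # Remaining budget: first enhancement layer for tiles around the viewport.
    if num_layers > 1:
        for tile in sorted(neighbors - viewport):
            if budget == 0:
                break
            requests.append(("enh", t, tile, 1))
            budget -= 1
    return requests
```

In this sketch the base-layer request is issued regardless of the budget, so playback can continue at base quality when the budget is zero, which mirrors the freeze-avoidance argument made in Section 3.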
