Adaptive 360-Degree Video Streaming Using Layered Video Coding
ABSTRACT

Virtual reality and 360-degree video streaming are growing rapidly; however, streaming 360-degree video is very challenging due to high bandwidth requirements. To address this problem, the video quality is adjusted according to the user viewport prediction. High-quality video is only streamed for the user viewport, reducing the overall bandwidth consumption. Existing solutions use shallow buffers limited by the accuracy of viewport prediction. Therefore, playback is prone to video freezes, which are very destructive for the Quality of Experience (QoE). We propose using layered encoding for 360-degree video to improve QoE by reducing the probability of video freezes and the latency of response to the user's head movements. Moreover, this scheme significantly reduces the storage requirements and improves in-network cache performance.

Keywords: Adaptive 360 video streaming, SVC, Video freeze

Index Terms: H.5.1 [Multimedia Information Systems]: Artificial, augmented, and virtual realities—Video

1 INTRODUCTION

With advances in multimedia technologies, the field is evolving toward providing an immersive experience for users. The demand for Virtual Reality (VR) Head-Mounted Displays (HMDs) is growing in order to enable the immersive video experience. VR HMDs utilize 360-degree videos, also known as spherical videos, which involve a 360-degree view of the scene captured from a single point. The captured video is mapped to the internal surface of a sphere. An HMD views a limited portion of the spherical video as seen from the center of the sphere, referred to as the viewport. The area covered by the viewport is limited to the HMD's Field of View (FoV), and its coordinates are based on the orientation of the user's head.

The growth in popularity of VR has led video service providers, such as YouTube and Facebook, to enable streaming of 360-degree videos. However, streaming 360-degree video brings up new challenges. Presenting high video quality in the viewport requires a complete high-quality 360-degree frame. For example, the Oculus Rift's viewing resolution is 2160x1200 with a FoV of 110 degrees; thus, the complete 360-degree frame should have a resolution equal to 8K video, with a bitrate of 50 Mbps, to exploit the highest quality that the device can offer. Therefore, not only does streaming the whole video at the highest quality require a large amount of bandwidth that most end users do not have, but transmitting the complete video at the highest quality also wastes bandwidth, since only a subset, namely the viewport, is viewed by the user.

One solution is adaptive 360-degree video streaming. The scheme is similar to Dynamic Adaptive Streaming over HTTP (DASH), but a spatial dimension is also added to the adaptation space. The client predicts the user's future viewport and requests the future segments with higher quality in the predicted viewport. The server stores a representation set that provides each time-domain segment at different qualities, with higher quality at a set of possible viewports. Each representation of a segment has a higher quality for a specific viewport that could be of interest to the viewer, and lower quality for the rest of the 360-degree frame. Thus, higher quality can be provided to the user with less bandwidth usage. The current state-of-the-art solutions try to predict the user viewport for a short duration in the future and request the segments that provide this viewport at higher quality. When there is a mismatch between the user's viewport and the predicted one, the lower-quality parts are displayed. To minimize the occurrence of such mismatches, the video is not buffered for durations longer than the duration for which the viewport can be predicted accurately, thus increasing the probability of video freeze in the event of network bandwidth fluctuation. Video freeze is the most destructive factor for user QoE [3]. In this paper, we argue that using Scalable Video Coding (SVC) [6] in adaptive 360-degree video streaming can mitigate this problem. SVC also provides additional benefits: (1) using enhancement layers at the client allows more flexibility in the adaptation; (2) the storage requirement on the server side is reduced, and caching performance is improved.

2 RELATED WORK

This section provides an overview of 360-degree video streaming. First, mapping and encoding techniques are introduced. Then, recent works on adaptive 360-degree video streaming are discussed.

Existing video encoders cannot encode spherical videos directly. In this respect, the video should be mapped to a 2D plane before encoding. There are various map projection methods, such as equirectangular, pyramid, and cube projections. Equirectangular projection maps the sphere onto a rectangle; however, it results in severe stretching at the north and south poles of the sphere, which in turn reduces coding efficiency and increases bandwidth consumption. In the pyramid projection, the viewport is mapped to the base of the pyramid, and the remaining parts are mapped to the sides. This causes severe quality degradation for areas far from the base. Moreover, this mapping is GPU-intensive [2]. Cube projection, as shown in Figure 1-a,b, maps the sphere onto the six faces of a cube. It provides smoother quality degradation with lower processing complexity than the pyramid projection, and lower bitrate overhead than the equirectangular projection.

Adaptive 360-degree video streaming solutions divide a video into segments of several seconds' duration, and each segment is sliced into tiles in the spatial domain. Each tile is encoded at several quality levels. The client fetches a set of tiles for each segment based on viewport prediction and network conditions. Based on this scheme, a streaming system is proposed in [5]. The authors have analyzed the relation between tile size and bandwidth saving. A similar method is proposed in [4] with the goal of reducing bandwidth consumption in cellular networks. Different methods are studied for viewport prediction, and it has been shown that they reduce bandwidth consumption by 78%. Unlike [5] and [4], which have used equirectangular mapping, Corbillon et al. [1] have shown that cube mapping provides higher quality. Additionally, instead of encoding each tile of a segment independently, different representations of a segment are available according to a set of possible viewports. Then, the client requests segments according to the viewport prediction and network conditions.

∗ e-mail: afshin@utdallas.edu
† e-mail: anahita.mahzari@utdallas.edu
‡ e-mail: joseph.beshay@utdallas.edu
§ e-mail: ravip@utdallas.edu

2017 IEEE Virtual Reality (VR), March 18-22, 2017, Los Angeles, CA, USA
978-1-5090-6647-6/17/$31.00 © 2017 IEEE
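The tile-based adaptation used by the related work can be sketched in a few lines: tiles that intersect the predicted viewport are upgraded as far as the bandwidth budget allows, while all other tiles stay at the lowest quality. This is a minimal illustration under assumed parameters (the 4x3 tile grid and the per-tile bitrates are made up for the example), not the actual algorithm of any cited system.

```python
# Sketch of tile-based quality selection for one segment.
# Per-tile bitrates in Mbps are assumed values for illustration.
TILE_BITRATE = {"low": 0.5, "mid": 1.5, "high": 4.0}

def select_tile_qualities(viewport_tiles, all_tiles, budget_mbps):
    """Pick a quality per tile: low outside the predicted viewport,
    and the highest uniform quality inside it that fits the budget."""
    others = [t for t in all_tiles if t not in viewport_tiles]
    base_cost = len(others) * TILE_BITRATE["low"]
    # Try viewport qualities from best to worst until one fits.
    for quality in ("high", "mid", "low"):
        cost = base_cost + len(viewport_tiles) * TILE_BITRATE[quality]
        if cost <= budget_mbps:
            plan = {t: "low" for t in others}
            plan.update({t: quality for t in viewport_tiles})
            return plan
    return None  # even all-low does not fit; the player must rebuffer

# Example: a 4x3 tile grid with the viewport predicted to cover 4 tiles.
tiles = [(r, c) for r in range(3) for c in range(4)]
viewport = {(1, 1), (1, 2), (2, 1), (2, 2)}
plan = select_tile_qualities(viewport, tiles, budget_mbps=12.0)
```

With a 12 Mbps budget the sketch settles on "mid" for the four viewport tiles (4.0 Mbps for the eight low tiles plus 6.0 Mbps for the viewport), showing how the spatial dimension enters the rate adaptation.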
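The core argument of the paper, that SVC decouples freeze avoidance from viewport prediction, can be illustrated with a toy download scheduler: the base layer (whole frame, low quality) is buffered deep, so bandwidth dips degrade quality instead of freezing playback, while enhancement layers are fetched only within the short horizon where prediction is reliable. This is a hypothetical sketch of the idea, not the authors' implementation; the buffer depths and segment indexing are invented.

```python
# Toy SVC download scheduler. Buffer depths are illustrative only.
BASE_BUFFER_SEGMENTS = 10  # deep base-layer buffer: protects against freezes
ENH_BUFFER_SEGMENTS = 2    # shallow buffer: limited by prediction accuracy

def schedule_downloads(playhead, base_buffered, enh_buffered):
    """Return the (segment_index, layer) requests to issue next."""
    requests = []
    # 1) Keep the base layer deeply buffered for every upcoming segment.
    for seg in range(playhead, playhead + BASE_BUFFER_SEGMENTS):
        if seg not in base_buffered:
            requests.append((seg, "base"))
    # 2) Fetch enhancement layers just-in-time, only where the
    #    predicted viewport is still likely to be correct.
    for seg in range(playhead, playhead + ENH_BUFFER_SEGMENTS):
        if seg in base_buffered and seg not in enh_buffered:
            requests.append((seg, "enhancement"))
    return requests

reqs = schedule_downloads(playhead=5, base_buffered={5, 6, 7}, enh_buffered={5})
```

In the example, segments 8 through 14 are requested at the base layer well ahead of playback, while only segment 6 gets an enhancement request; a wrong viewport prediction then costs quality, never continuity.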
Figure 1: Sphere mapped onto a cube, and layered encoding
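As a concrete illustration of the cube projection shown in Figure 1, the sketch below maps a 3D viewing direction to one of the six cube faces using the standard dominant-axis construction. This is generic illustrative code, not taken from any cited system, and real cube-map formats additionally fix per-face orientation conventions that are omitted here.

```python
# Map a viewing direction (a nonzero 3D vector) to a cube face plus
# face-local coordinates in [-1, 1] x [-1, 1]: the face is chosen by
# the axis of largest magnitude, and the remaining two coordinates
# are divided by that magnitude.
def direction_to_cube_face(x, y, z):
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:      # +X or -X face dominates
        face = "+x" if x > 0 else "-x"
        u, v = y / ax, z / ax
    elif ay >= az:                 # +Y or -Y face dominates
        face = "+y" if y > 0 else "-y"
        u, v = x / ay, z / ay
    else:                          # +Z or -Z face dominates
        face = "+z" if z > 0 else "-z"
        u, v = x / az, y / az
    return face, u, v
```

Because every sphere direction lands on exactly one face with bounded local coordinates, the projection avoids the polar stretching of the equirectangular layout discussed in Related Work.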
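The bandwidth argument in the introduction can be checked with quick arithmetic: the HMD viewport holds only a small fraction of the pixels of a full 8K frame, which bounds how much of the full-frame bitrate is actually seen. The resolutions and the 50 Mbps figure come from the text; treating bitrate as proportional to pixel count is a rough back-of-the-envelope assumption that ignores projection distortion and codec effects.

```python
# Back-of-the-envelope check of the introduction's numbers.
full_frame_px = 7680 * 4320   # 8K frame, as referenced in the text
viewport_px = 2160 * 1200     # Oculus Rift panel resolution
full_frame_mbps = 50.0        # full-frame bitrate quoted in the text

# Assumes bitrate scales with pixel count (a simplification).
pixel_fraction = viewport_px / full_frame_px
viewport_only_mbps = full_frame_mbps * pixel_fraction
```

The viewport covers only about 7.8% of the frame's pixels, so under this simplification over 90% of a naively streamed full-quality frame is spent on pixels the user never sees, which is the waste that viewport-adaptive streaming targets.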