Fine-Grained Motion Estimation for Video Frame Interpolation
Bo Yan, Weimin Tan, Chuming Lin, and Liquan Shen
IEEE Transactions on Broadcasting, vol. 67, no. 1, March 2021
Abstract—Recent advances in video frame interpolation have shown that convolutional neural networks combined with optical flow are capable of producing a high-quality intermediate frame between two consecutive input frames in most scenes. However, existing methods have difficulties dealing with the large and non-uniform motions that widely exist in real-world scenes because they often adopt the same strategy to deal with different motions, which easily results in unsatisfactory results. In this article, we propose a novel fine-grained motion estimation approach (FGME) for video frame interpolation. It mainly contains two strategies: multi-scale coarse-to-fine optimization and multiple motion features estimation. The first strategy gradually refines optical flows and weight maps, both of which are used to synthesize the target frame. The second strategy provides fine-grained motion features by generating multiple optical flows and weight maps. To demonstrate its effectiveness, we propose a fully convolutional neural network with three refinement scales and four motion features. Surprisingly, this simple network produces state-of-the-art results on three standard benchmark datasets and real-world examples, with advantages in terms of effectiveness, simplicity, and network size over other existing approaches. Furthermore, we demonstrate that the FGME approach has good generality and can significantly improve the synthesis quality of other methods.

Index Terms—Video frame interpolation, multiple motion features estimation, multi-scale coarse-to-fine optimization.

Manuscript received November 27, 2019; revised February 26, 2020 and May 2, 2020; accepted July 8, 2020. Date of publication October 20, 2020; date of current version March 4, 2021. This work was supported by NSFC under Grant 61772137 and Grant 61902076. (Corresponding author: Bo Yan.) Bo Yan, Weimin Tan, and Chuming Lin are with the School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 201203, China (e-mail: byan@fudan.edu.cn). Liquan Shen is with the Key Laboratory of Advanced Display and System Application, School of Communication and Information Engineering, Shanghai University, Shanghai 200072, China (e-mail: jsslq@163.com). Digital Object Identifier 10.1109/TBC.2020.3028323
I. INTRODUCTION

Recent years have witnessed the significant development in video frame synthesis, which aims at producing an in-between frame given two consecutive input frames to generate both spatially and temporally consistent video clips [1], [2]. Different from spatially super-resolving tasks, such as image/video super-resolution algorithms, video frame interpolation focuses on time-domain super-resolution to increase temporal information, and outputs slow-motion video. Such a special property makes video frame interpolation play an important role in a wide range of applications including video compression [2]–[6], video enhancement [7]–[10], social entertainment [11], [12], etc.

The emergence of the convolutional neural network (CNN) has greatly promoted the development of video interpolation [13]–[15]. Benefiting from the powerful feature extraction and information reconstruction abilities of CNNs, CNN-based video interpolation methods have achieved new records compared with traditional solutions. Some recent works investigate how to directly hallucinate the intermediate frame by using generative CNNs [16]–[18]. Without being limited to accurate pixel-wise blending, i.e., estimating accurate optical flow to warp the input frames and then blending them, these techniques are promising but easily generate blurry results [19], [20].

Thanks to the explosive development of deep learning, optical flow research has achieved great success in recent years [21]–[26]. The state-of-the-art optical flow method, PWC-Net [27], shows impressive performance in a large variety of scenes. Benefiting from this progress, optical flow based video interpolation approaches are capable of better predicting the pixel-wise correspondence between the target frame and the input frames. These approaches commonly exploit the estimated optical flow to map the input frames in order to obtain warped frames, and then fuse them to generate interpolation results [19], [20], [28], [29]. The nature of optical flow allows CNN-based approaches to find the pixel-wise correspondence between the target frame and the neighboring frames in a relatively simple and effective way. Therefore, recent optical flow based approaches are able to produce high-quality synthesis results in most scenes, but they may fail to deal with some challenging scenes such as large motion, significant motion blur, unpleasing artifacts, massive occlusion, etc. An intuitive example is shown in Figure 1.

In the task of video frame interpolation, a large number of pixels with various motions are required to be inferred in a unified system, which poses great difficulty in the architecture design, correlation learning, and optimization of the network.
Intuitively, large motions reveal a relatively small optical flow at a coarse scale (low resolution) and hence can be better located at the coarse scale, while small motions have a detailed description of optical flow at a fine scale (large resolution) and hence can be successfully inferred at the fine scale despite the large number of small-motion pixels. Besides, non-uniform motions commonly exist in the background and foreground. Obviously, the optical flow between neighboring
Fig. 2. Overview of the proposed fine-grained motion estimation (FGME) approach. Firstly, we optionally employ the PWC-Net approach [27] as the optical flow estimator in order to provide a useful initialization of optical flow for fast-moving objects. The inputs of PWC-Net are the input frames at full resolution, and its outputs are optical flows at a quarter of the input resolution. Afterwards, the optical flows output by PWC-Net, together with the input frames at a quarter of the original size, are used to estimate motion features. Specifically, we exploit small sub-networks (Net_i) to attentively estimate pixel motions at different scales, which has the advantage of gradually refining the optical flows and weight maps. Then, we generate multiple optical flows and weight maps to provide fine-grained motion features (purple block). Finally, the estimated optical flows and weight maps are used to synthesize the target frame with the image generation block. Note that for convenience, only three refinement scales are demonstrated.
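To make the coarse-to-fine idea of Figure 2 concrete, the following is a minimal sketch of such a refinement loop. The sub-network architecture, channel counts, and the exact inputs fed to each Net_i are assumptions made for illustration; they are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineNet(nn.Module):
    """Stand-in for one sub-network Net_i: predicts a residual update of the
    two optical flows (4 channels) and two weight maps (2 channels)."""
    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 6, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

def coarse_to_fine(i1, i2, init_flow, nets):
    """i1, i2: (B,3,H,W) input frames; init_flow: (B,4,H/4,W/4) bidirectional flow,
    e.g., from PWC-Net; nets: sub-networks ordered from coarsest (1/4) to finest scale."""
    n = len(nets)
    est = torch.cat([init_flow, torch.zeros_like(init_flow[:, :2])], dim=1)  # flows + zero weight maps
    estimates = []
    for s, net in enumerate(nets):
        down = 2 ** (n - 1 - s)                               # e.g., 4, 2, 1 for three scales
        f1 = F.interpolate(i1, scale_factor=1.0 / down, mode='bilinear', align_corners=False)
        f2 = F.interpolate(i2, scale_factor=1.0 / down, mode='bilinear', align_corners=False)
        if s > 0:
            # bring the previous estimate to the current, twice larger resolution;
            # flow magnitudes double when the resolution doubles
            est = F.interpolate(est, scale_factor=2, mode='bilinear', align_corners=False)
            est = torch.cat([est[:, :4] * 2.0, est[:, 4:]], dim=1)
        est = est + net(torch.cat([f1, f2, est], dim=1))      # residual refinement at this scale
        estimates.append(est)
    return estimates                                          # per-scale flows and weight maps

# Usage with three refinement scales (input channels: 3 + 3 + 6 = 12):
# nets = nn.ModuleList([RefineNet(12) for _ in range(3)])
# outs = coarse_to_fine(i1, i2, init_flow, nets)
```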
frames are combined to generate multiple intermediate frames at different scales by using a deep submodule. Finally, these intermediate frames and the input frames are further processed by a synthesis refinement module, producing the final intermediate frame. The loss functions used are an adversarial loss (a discriminator network) and two content losses (an L1-norm loss and a perceptual loss based on the VGG network). This work uses multi-scale information to improve the interpolation quality, but the multiple intermediate frames at different scales are used as input to generate the final interpolation frame through a deep submodule, which ignores the fact that different scales are suited to processing different motions.

Besides, Liu et al. [38] proposed the PRBME approach, a block-matching based progressively reduced block-size motion estimation scheme that produces multiple motion vectors. Firstly, it uses multiple blocks with different sizes to generate multiple motion vectors based on block matching. Then, it selects the optimal motion vector based on the block-matching distance and the spatial consistency of the motion field. Finally, the intermediate frame is generated by averaging the forward and backward blocks based on the estimated motion vector. The PRBME method focuses on the estimation and selection of motion vectors with different block sizes. In contrast, the multiple motion features estimation mechanism of our FGME approach aims to automatically learn multiple optical flows and weight maps at the level of deep representations, which allows the network to deal with different motions separately.

Different from the approaches mentioned above, considering the success of optical flow in the task of video frame interpolation, we present an optical flow oriented framework that gradually refines the pixel flow and fusion weight in order to produce both spatially and temporally consistent interpolation results.

III. THE PROPOSED FGME APPROACH

Figure 2 demonstrates the overall framework of the proposed FGME approach. Different from a single unified video frame interpolation framework that tries to infer optical flow and fusion weight directly by designing a complex network architecture, the FGME approach deals with different motions differently by exploiting two key strategies: multi-scale coarse-to-fine optimization and multiple motion features estimation. The trainable components are only the sub-networks (Net_i). Figure 3 shows the motion features (MF) and image generation (G) blocks, which are used to generate the target frame. The following subsections describe the multi-scale coarse-to-fine optimization and the multiple motion features estimation.

For simplicity, we only describe the case of three refinement scales, but other cases can be understood analogously with simple modifications, e.g., more refinement scales can be implemented by cascading the 3rd sub-network, as shown in Figure 2. Given arbitrary two input frames I_1 and I_2, the video interpolation framework outputs multiple optical flows f^i_{t→1} and f^i_{t→2}, and fusion weights w^i_{t→1} and w^i_{t→2}. Here, we use MF_t = {MF_t^1, MF_t^2, ..., MF_t^n} to denote the estimated motion features (see the purple block in Figure 2), where MF_t^i = {f^i_{t→1}, f^i_{t→2}, w^i_{t→1}, w^i_{t→2}}. The inferred target frame I_t can be obtained by using the image generation block (G).
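A minimal sketch of one possible image generation step G is given below. It assumes the common recipe of backward-warping the two input frames with the estimated flows and blending them with the weight maps; the function names, the use of bilinear backward warping, and the weight normalization are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp img (B,C,H,W) with flow (B,2,H,W) given in pixels as (dx, dy)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device, dtype=img.dtype),
                            torch.arange(w, device=img.device, dtype=img.dtype),
                            indexing='ij')
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # sample location x + dx
    grid_y = ys.unsqueeze(0) + flow[:, 1]          # sample location y + dy
    # normalize sampling locations to [-1, 1] as required by grid_sample
    grid = torch.stack([2.0 * grid_x / (w - 1) - 1.0,
                        2.0 * grid_y / (h - 1) - 1.0], dim=-1)
    return F.grid_sample(img, grid, mode='bilinear', padding_mode='border', align_corners=True)

def generate_frame(i1, i2, motion_features):
    """motion_features: list of dicts holding flows f_t1, f_t2 (B,2,H,W)
    and weight maps w_t1, w_t2 (B,1,H,W), one dict per motion feature MF_t^i."""
    acc, wsum = 0.0, 1e-8
    for mf in motion_features:
        w1, w2 = mf['w_t1'], mf['w_t2']
        acc = acc + w1 * backward_warp(i1, mf['f_t1']) + w2 * backward_warp(i2, mf['f_t2'])
        wsum = wsum + w1 + w2
    return acc / wsum                              # weighted blend over all motion features
```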
Fig. 4. The proposed network architecture for each trainable sub-network shown in blue in Figure 2. The output channel number 6 corresponds to two optical flows (i.e., the forward and backward optical flows) and two weight maps.
Fig. 5. Demonstration of the pipeline of the implemented model (n = 1). The residual maps (see (e) and (h)) clearly show that, as the refinement scale increases, the estimation of optical flow around motion boundaries becomes more accurate. Besides, the fusion weight around motion boundaries becomes smoother and more consistent with the moving object.
TABLE I
Results of Ablation Study With Different Refinement Scales on Vimeo-90K [7]

TABLE II
Results of Ablation Study With Different Motion Features on Vimeo-90K [7]

B. Loss Function

The proposed framework gradually refines the optical flow and fusion weight with the mechanisms of multiple motion features estimation and multi-scale coarse-to-fine refinement. Each sub-network outputs the refinement results corresponding to its scale. We employ the L1 norm for constraining each sub-network. Therefore, the loss function is the sum of the L1 norms at the different refinement scales:
L = \sum_{s \in \{1,\dots,n\}} \frac{\sum_{x,y} \left\| \hat{I}_t^{\frac{H}{2^{s-1}},\frac{W}{2^{s-1}}}(x,y) - I_t^{\frac{H}{2^{s-1}},\frac{W}{2^{s-1}}}(x,y) \right\|_1}{H \times W}    (13)

where \hat{I}_t^{H/2^{s-1}, W/2^{s-1}} and I_t^{H/2^{s-1}, W/2^{s-1}} denote the frame synthesized at the s-th refinement scale and the corresponding ground-truth frame, respectively, both of resolution H/2^{s-1} × W/2^{s-1}, and H × W is the full resolution.
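A minimal sketch of this multi-scale L1 objective is shown below. It assumes the ground-truth frame is simply downsampled to each refinement scale and that every per-scale term is normalized by the full-resolution H × W, as written above; both are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def multiscale_l1_loss(preds, gt):
    """preds: list of synthesized frames, one per refinement scale, where the
    finest-scale prediction has the same (B,3,H,W) shape as the ground truth gt."""
    _, _, h, w = gt.shape
    loss = gt.new_zeros(())
    for pred in preds:
        # downsample the ground truth to the resolution of this prediction
        target = F.interpolate(gt, size=pred.shape[-2:], mode='bilinear', align_corners=False)
        loss = loss + torch.abs(pred - target).sum() / (h * w)
    return loss
```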
TABLE III
Quantitative Comparison With State-of-the-Art Approaches in Terms of PSNR, SSIM, and Network Size on Three Benchmarks. MB-Other Represents the Other Set in the Middlebury Dataset. Note That Methods With an Asterisk Represent the Performance of Models Retrained on the Vimeo-90K Training Set for a Fair Comparison
TABLE IV
Quantitative Comparison With State-of-the-Art Approaches in Terms of IE and NIE on the Evaluation Set of the Middlebury Dataset. The Bold Font Represents the Best Value. The Table Shows That Our Approach Achieves the Best Performance on Both Average Scores of IE and NIE (Smaller Is Better)
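For reference, PSNR (Tables III, VI, and VII) and the Middlebury-style interpolation error IE (Table IV) can be computed as sketched below. This is a generic reference implementation, not the evaluation code used in the paper, and it assumes 8-bit images with a peak value of 255.

```python
import numpy as np

def psnr(pred, gt, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def interpolation_error(pred, gt):
    """Middlebury-style interpolation error: root-mean-square pixel difference."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))
```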
Fig. 7. Interpolation results of different approaches on Vimeo-90K. From the close-up images, we observe that the proposed approach produces better
structural detail than other competing methods.
TABLE V
Demonstration of the Good Generality of Our Approach, Which Can Improve the Synthesis Quality of the Super SloMo [20] and VoxelFlow [19] Methods

the least network size, with only 2.57 M parameters, which is conducive to fast fitting of the network. Besides, although our full model (the FGME* approach) has the largest number of network parameters, it achieves the best performance. Note that most of the parameters of our full model are spent on the optional optical flow estimator (PWC-Net), which can be further reduced by a better optical flow algorithm.
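Model sizes such as the 2.57 M figure above are simply totals of trainable parameters; a generic way to obtain this count for any PyTorch model (illustrative, not the paper's tooling):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```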
TABLE VI
Quantitative Comparison With State-of-the-Art Approaches in Terms of PSNR and SSIM on Nine Videos With 1920 × 1080 Resolution
Fig. 10. A real-world example (540 × 960) to evaluate the practical ability of different approaches.
TABLE VII
Quantitative Comparison With State-of-the-Art Approaches in Terms of PSNR and SSIM on Streets of India, a Video With 540 × 960 Resolution
factor affecting the synthesis quality of most approaches. In addition, our FGME with and without PWC-Net achieve the best performance in each range of motion. Despite the increasing range of motion, the FGME with PWC-Net achieves the most stable performance.

D. Benefits for Other Approaches

In this section, we demonstrate the generality of the multiple motion features estimation mechanism of our FGME approach. We apply it to other video frame interpolation methods. Specifically, we modify the models of other methods slightly so that they can generate multiple optical flows and weight maps. As a result, these methods are capable of providing fine-grained motion features. As shown in Table V, benefiting from our proposed multiple motion features estimation mechanism, the PSNR and SSIM of the Super SloMo [20] and VoxelFlow [19] methods are consistently improved, while the number of network parameters is only slightly increased. The result indicates the good generality of our multiple motion features estimation mechanism.

E. Real-World Example and High Definition Videos

To evaluate the effectiveness of video interpolation approaches on real-world examples, we use a real-world video, Streets of India, to test them. In addition, we further conduct an experiment to evaluate video frame interpolation approaches on nine newly collected videos (https://www.youtube.com/user/HarmonicIncVideo/videos) with 1920 × 1080 resolution. Table VI and Table VII report the quantitative comparison with state-of-the-art approaches on these two datasets in terms of PSNR and SSIM. Our approach performs favorably against existing methods. In addition, a visual comparison result is demonstrated in Figure 10. The right area of Figure 10 (zoom in for a better view) contains the blue truck with large motion. We observe that the proposed approach produces better structural detail and has less distortion than other competing methods. The same phenomenon can also be observed in Figure 1, i.e., our approach produces the best result in preserving the details of the motorcycle head (shown in the yellow box).

V. CONCLUSION

In this article, we have presented a fine-grained motion estimation approach for video frame interpolation. Instead of uniformly handling pixel motions, we propose to deal with pixel motions differently based on the proposed two key mechanisms of multi-scale coarse-to-fine optimization and multiple motion features estimation, which are utilized to refine optical flows and weight maps. To demonstrate its effectiveness, we present a simple fully convolutional neural network with three refinement scales and four motion features (n = 4). Extensive experiments including an ablation study demonstrate that our FGME approach significantly advances the state-of-the-art on three standard benchmark datasets, with advantages in terms of effectiveness, simplicity, and network size.
REFERENCES

[1] D. Wang, A. Vincent, P. Blanchfield, and R. Klepko, "Motion-compensated frame rate up-conversion—Part II: New algorithms for frame interpolation," IEEE Trans. Broadcast., vol. 56, no. 2, pp. 142–149, Jan. 2010.
[2] K. Yang, A. Huang, T. Q. Nguyen, C. C. Guest, and P. K. Das, "A new objective quality metric for frame interpolation used in video compression," IEEE Trans. Broadcast., vol. 54, no. 3, pp. 680–711, Sep. 2008.
[3] C. Wu, N. Singhal, and P. Krahenbuhl, "Video compression through image interpolation," in Proc. Comput. Vis. Pattern Recognit., 2018, pp. 425–440.
[4] S. Tsekeridou, F. A. Cheikh, M. Gabbouj, and I. Pitas, "Vector rational interpolation schemes for erroneous motion field estimation applied to MPEG-2 error concealment," IEEE Trans. Multimedia, vol. 6, no. 6, pp. 876–885, Dec. 2004.
[5] H. Chen, Y. Zhang, Y. Tao, B. Zou, and W. Tang, "An improved temporal frame interpolation algorithm for H.264 video compression," in Proc. Data Compression Conf., Mar. 2011, p. 449.
[6] N. Inamoto and H. Saito, "Virtual viewpoint replay for a soccer match by view interpolation from multiple cameras," IEEE Trans. Multimedia, vol. 9, no. 6, pp. 1155–1166, Oct. 2007.
[7] T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, "Video enhancement with task-oriented flow," Int. J. Comput. Vis., vol. 127, no. 8, pp. 1106–1125, 2019.
[8] H. F. Ates, "Enhanced low bitrate H.264 video coding using decoder-side super-resolution and frame interpolation," Opt. Eng., vol. 52, no. 7, pp. 2131–2139, 2013.
[9] L. Yan, Z. Zhaoyang, and A. Ping, "Stereo video coding based on frame estimation and interpolation," IEEE Trans. Broadcast., vol. 49, no. 1, pp. 14–21, Mar. 2003.
[10] Y. T. Yang, Y. S. Tung, and J. L. Wu, "Quality enhancement of frame rate up-converted video by adaptive frame skip and reliable motion extraction," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 12, pp. 1700–1713, Nov. 2007.
[11] J. Janai, F. Guney, J. Wulff, M. J. Black, and A. Geiger, "Slow flow: Exploiting high-speed cameras for accurate and diverse optical flow reference data," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1406–1416.
[12] M. Usman, X. He, K.-M. Lam, M. Xu, S. M. M. Bokhari, and J. Chen, "Frame interpolation for cloud-based mobile video streaming," IEEE Trans. Multimedia, vol. 18, no. 5, pp. 831–839, May 2016.
[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, p. 6.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.
[15] S. Y. Zhu, B. Zeng, L. Zeng, and M. Gabbouj, "Image interpolation based on non-local geometric similarities and directional gradients," IEEE Trans. Multimedia, vol. 18, no. 9, pp. 1707–1719, Sep. 2016.
[16] T. Zhou, S. Tulsiani, W. Sun, J. Malik, and A. A. Efros, "View synthesis by appearance flow," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 286–301.
[17] M. Mathieu, C. Couprie, and Y. LeCun, "Deep multi-scale video prediction beyond mean square error," in Proc. Int. Conf. Learn. Represent. (ICLR), 2016, p. 6.
[18] S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive separable convolution," in Proc. Int. Conf. Comput. Vis. (ICCV), 2017, pp. 261–270.
[19] Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala, "Video frame synthesis using deep voxel flow," in Proc. Int. Conf. Comput. Vis. (ICCV), 2017, pp. 4473–4481.
[20] H. Jiang et al., "Super SloMo: High quality estimation of multiple intermediate frames for video interpolation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 9000–9008.
[21] A. Ranjan and M. J. Black, "Optical flow estimation using a spatial pyramid network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2720–2729.
[22] J. Xu, R. Ranftl, and V. Koltun, "Accurate optical flow via direct cost volume processing," in Proc. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 5807–5815.
[23] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," Int. J. Comput. Vis., vol. 92, no. 1, pp. 1–31, 2011.
[24] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black, "A naturalistic open source movie for optical flow evaluation," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2012, pp. 611–625.
[25] A. Dosovitskiy et al., "FlowNet: Learning optical flow with convolutional networks," in Proc. Int. Conf. Comput. Vis. (ICCV), 2015, pp. 2758–2766.
[26] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, "FlowNet 2.0: Evolution of optical flow estimation with deep networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1647–1655.
[27] D. Sun, X. Yang, M. Liu, and J. Kautz, "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 8934–8943.
[28] D. Wang, Z. Liang, and A. Vincent, "Motion-compensated frame rate up-conversion—Part I: Fast multi-frame motion estimation," IEEE Trans. Broadcast., vol. 56, no. 2, pp. 133–141, Feb. 2010.
[29] T. Tsai, A. Shi, and K. Huang, "Accurate frame rate up-conversion for advanced visual quality," IEEE Trans. Broadcast., vol. 62, no. 2, pp. 426–435, Jun. 2016.
[30] S. Niklaus and F. Liu, "Context-aware synthesis for video frame interpolation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 1701–1710.
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778.
[32] S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive convolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2270–2279.
[33] S. Meyer, O. Wang, H. Zimmer, M. Grosse, and A. Sorkine-Hornung, "Phase-based frame interpolation for video," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 1410–1418.
[34] W. Bao, W.-S. Lai, C. Ma, X. Zhang, Z. Gao, and M.-H. Yang, "Depth-aware video frame interpolation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 3703–3712.
[35] W. Bao, W.-S. Lai, X. Zhang, Z. Gao, and M.-H. Yang, "MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement," IEEE Trans. Pattern Anal. Mach. Intell., early access, Sep. 17, 2019, doi: 10.1109/TPAMI.2019.2941941.
[36] H. Zhang, R. Wang, and Y. Zhao, "Multi-frame pyramid refinement network for video frame interpolation," IEEE Access, vol. 7, pp. 130610–130621, 2019.
[37] J. van Amersfoort et al., "Frame interpolation with multi-scale deep loss functions and generative adversarial networks," 2017. [Online]. Available: arXiv:1711.06045.
[38] H. Liu, R. Xiong, D. Zhao, S. Ma, and W. Gao, "Multiple hypotheses Bayesian frame rate up-conversion by adaptive fusion of motion-compensated interpolations," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 8, pp. 1188–1198, May 2012.
[39] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, p. 9.
[40] L. Xu, J. Jia, and Y. Matsushita, "Motion detail preserving optical flow estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1744–1757, Sep. 2012.
[41] J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, "DeepMatching: Hierarchical deformable dense matching," Int. J. Comput. Vis., vol. 120, no. 3, pp. 300–323, Dec. 2016.

Bo Yan (Senior Member, IEEE) received the B.E. and M.E. degrees in communication engineering from Xi'an Jiaotong University in 1998 and 2001, respectively, and the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong in 2004. From 2004 to 2006, he worked with the National Institute of Standards and Technology, USA, as a Postdoctoral Guest Researcher. He is currently a Professor with the School of Computer Science, Fudan University, Shanghai, China. His research interests include video processing, computer vision, and multimedia communications. He has served as the Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and the Guest Editor of the Special Issue on "Content-Aware Visual Systems: Analysis, Streaming and Retargeting" for the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS.
Weimin Tan (Member, IEEE) received the master's degree from the College of Communication Engineering, Chongqing University, Chongqing, China, and the Ph.D. degree from the School of Computer Science, Fudan University, Shanghai, China, where he is currently working as a Postdoctoral Researcher. His research interests include digital image and video processing.

Chuming Lin is currently pursuing the bachelor's degree with the School of Computer Science, Fudan University, Shanghai, China. His research interests include computer vision and machine learning.

Liquan Shen (Member, IEEE) received the B.S. degree in automation control from Henan Polytechnic University, Henan, China, in 2001, and the M.E. and Ph.D. degrees in communication and information systems from Shanghai University, Shanghai, China, in 2005 and 2008, respectively. Since 2008, he has been with the Faculty of the School of Communication and Information Engineering, Shanghai University, where he is currently a Professor. He has authored or coauthored more than 100 refereed technical papers in international journals and conferences in the field of video coding and image processing. He holds ten patents in the areas of image/video coding and communications. His major research interests include high efficiency video coding, perceptual coding, video codec optimization, 3DTV, and video quality assessment.