Fine-Grained Motion Estimation for Video Frame Interpolation
Bo Yan, Weimin Tan, Chuming Lin, and Liquan Shen
IEEE Transactions on Broadcasting, vol. 67, no. 1, March 2021
Abstract—Recent advances in video frame interpolation have shown that convolutional neural networks combined with optical flow are capable of producing a high-quality intermediate frame between two consecutive input frames in most scenes. However, existing methods have difficulties dealing with the large and non-uniform motions that widely exist in real-world scenes because they often adopt the same strategy to deal with different motions, which easily results in unsatisfactory results. In this article, we propose a novel fine-grained motion estimation approach (FGME) for video frame interpolation. It mainly contains two strategies: multi-scale coarse-to-fine optimization and multiple motion features estimation. The first strategy gradually refines optical flows and weight maps, both of which are used to synthesize the target frame. The second strategy provides fine-grained motion features by generating multiple optical flows and weight maps. To demonstrate its effectiveness, we propose a fully convolutional neural network with three refinement scales and four motion features. Surprisingly, this simple network produces state-of-the-art results on three standard benchmark datasets and real-world examples, with advantages in terms of effectiveness, simplicity, and network size over other existing approaches. Furthermore, we demonstrate that the FGME approach has good generality and can significantly improve the synthesis quality of other methods.

Index Terms—Video frame interpolation, multiple motion features estimation, multi-scale coarse-to-fine optimization.

Manuscript received November 27, 2019; revised February 26, 2020 and May 2, 2020; accepted July 8, 2020. Date of publication October 20, 2020; date of current version March 4, 2021. This work was supported by NSFC under Grant 61772137 and Grant 61902076. (Corresponding author: Bo Yan.) Bo Yan, Weimin Tan, and Chuming Lin are with the School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 201203, China (e-mail: byan@fudan.edu.cn). Liquan Shen is with the Key Laboratory of Advanced Display and System Application, School of Communication and Information Engineering, Shanghai University, Shanghai 200072, China (e-mail: jsslq@163.com). Digital Object Identifier 10.1109/TBC.2020.3028323
I. INTRODUCTION

Recent years have witnessed the significant development in video frame synthesis, which aims at producing an in-between frame given two consecutive input frames to generate both spatially and temporally consistent video clips [1], [2]. Different from spatially super-resolving tasks, such as image/video super-resolution algorithms, video frame interpolation focuses on time-domain super-resolution to increase temporal information, and outputs slow-motion video. Such a special property makes video frame interpolation play an important role in a wide range of applications including video compression [2]–[6], video enhancement [7]–[10], social entertainment [11], [12], etc.

The emergence of the convolutional neural network (CNN) has greatly promoted the development of video interpolation [13]–[15]. Benefiting from the powerful feature extraction and information reconstruction abilities of CNNs, CNN-based video interpolation methods have achieved new records compared with traditional solutions. Some recent works investigate how to directly hallucinate the intermediate frame by using generative CNNs [16]–[18]. Without being limited to accurate pixel-wise blending, i.e., estimating accurate optical flow to warp the input frames and then blending them, these techniques are promising but easily generate blurry results [19], [20].

Thanks to the explosive development of deep learning, optical flow research has achieved great success in recent years [21]–[26]. The state-of-the-art optical flow method, PWC-Net [27], shows impressive performance in a large variety of scenes. Benefiting from this progress, optical flow based video interpolation approaches are capable of better predicting the pixel-wise correspondence between the target frame and the input frames. These approaches commonly exploit the estimated optical flow to map the input frames in order to obtain warped frames, and then fuse them to generate interpolation results [19], [20], [28], [29]. The nature of optical flow allows CNN-based approaches to find the pixel-wise correspondence between the target frame and the neighboring frames in a relatively simple and effective way. Therefore, recent optical flow based approaches are able to produce high-quality synthesis results in most scenes, but they may fail to deal with some challenging scenes such as large motion, significant motion blur, unpleasing artifacts, massive occlusion, etc. An intuitive example is shown in Figure 1.

In the task of video frame interpolation, a large number of pixels with various motions are required to be inferred in a unified system, which poses great difficulty in the architecture design, correlation learning, and optimization of the network.
Intuitively, large motions reveal a relatively small optical flow at a coarse scale (low resolution) and hence can be better located at the coarse scale, while small motions have a detailed description of optical flow at a fine scale (large resolution) and hence can be successfully inferred at the fine scale despite the large number of small-motion pixels. Besides, non-uniform motions commonly exist in the background and foreground. Obviously, the optical flow between neighboring
Fig. 2. Overview of the proposed fine-grained motion estimation (FGME) approach. Firstly, we optionally employ the PWC-Net approach [27] as the optical flow estimator in order to provide a useful initialization of optical flow for fast-moving objects. The inputs of PWC-Net are the input frames at full resolution, and its outputs are optical flows at a quarter of the input resolution. Afterwards, the optical flows output by PWC-Net, together with the input frames at a quarter of the original size, are used to estimate motion features. Specifically, we exploit small sub-networks (Net_i) to attentively estimate pixel motions at different scales, which has the advantage of gradually refining the optical flows and weight maps. Then, we generate multiple optical flows and weight maps to provide fine-grained motion features (purple block). Finally, the estimated optical flows and weight maps are used to synthesize the target frame with the image generation block. Note that for convenience, only three refinement scales are demonstrated.
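To make the coarse-to-fine idea of Figure 2 concrete, the following is a minimal sketch of such a refinement loop. The sub-network architecture, channel counts, and the exact inputs fed to each Net_i are assumptions made for illustration; they are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineNet(nn.Module):
    """Stand-in for one sub-network Net_i: predicts a residual update of the
    two optical flows (4 channels) and two weight maps (2 channels)."""
    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 6, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

def coarse_to_fine(i1, i2, init_flow, nets):
    """i1, i2: (B,3,H,W) input frames; init_flow: (B,4,H/4,W/4) bidirectional flow,
    e.g., from PWC-Net; nets: sub-networks ordered from coarsest (1/4) to finest scale."""
    n = len(nets)
    est = torch.cat([init_flow, torch.zeros_like(init_flow[:, :2])], dim=1)  # flows + zero weight maps
    estimates = []
    for s, net in enumerate(nets):
        down = 2 ** (n - 1 - s)                               # e.g., 4, 2, 1 for three scales
        f1 = F.interpolate(i1, scale_factor=1.0 / down, mode='bilinear', align_corners=False)
        f2 = F.interpolate(i2, scale_factor=1.0 / down, mode='bilinear', align_corners=False)
        if s > 0:
            # bring the previous estimate to the current, twice larger resolution;
            # flow magnitudes double when the resolution doubles
            est = F.interpolate(est, scale_factor=2, mode='bilinear', align_corners=False)
            est = torch.cat([est[:, :4] * 2.0, est[:, 4:]], dim=1)
        est = est + net(torch.cat([f1, f2, est], dim=1))      # residual refinement at this scale
        estimates.append(est)
    return estimates                                          # per-scale flows and weight maps

# Usage with three refinement scales (input channels: 3 + 3 + 6 = 12):
# nets = nn.ModuleList([RefineNet(12) for _ in range(3)])
# outs = coarse_to_fine(i1, i2, init_flow, nets)
```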
frames are combined to generate multiple intermediate frames at different scales by using a deep submodule. Finally, these intermediate frames and the input frames are further processed by a synthesis refinement module, producing the final intermediate frame. The loss functions used are an adversarial loss (a discriminator network) and two content losses (an L1-norm loss and a perceptual loss based on the VGG network). This work uses multi-scale information to improve the interpolation quality, but the multiple intermediate frames at different scales are used as input to generate the final interpolation frame through a deep submodule, which ignores the fact that different scales are suited to processing different motions.

Besides, Liu et al. [38] proposed the PRBME approach, a block-matching based progressively reduced block-size motion estimation scheme that produces multiple motion vectors. Firstly, it uses multiple blocks with different sizes to generate multiple motion vectors based on block matching. Then, it selects the optimal motion vector based on the block-matching distance and the spatial consistency of the motion field. Finally, the intermediate frame is generated by averaging the forward and backward blocks based on the estimated motion vector. The PRBME method focuses on the estimation and selection of motion vectors with different block sizes. In contrast, the multiple motion features estimation mechanism of our FGME approach aims to automatically learn multiple optical flows and weight maps at the level of deep representations, which allows the network to deal with different motions separately.

Different from the approaches mentioned above, considering the success of optical flow in the task of video frame interpolation, we present an optical flow oriented framework that gradually refines the pixel flow and fusion weight in order to produce both spatially and temporally consistent interpolation results.

III. THE PROPOSED FGME APPROACH

Figure 2 demonstrates the overall framework of the proposed FGME approach. Different from a single unified video frame interpolation framework that tries to infer optical flow and fusion weight directly by designing a complex network architecture, the FGME approach deals with different motions differently by exploiting two key strategies: multi-scale coarse-to-fine optimization and multiple motion features estimation. The trainable components are only the sub-networks (Net_i). Figure 3 shows the motion features (MF) and image generation (G) blocks, which are used to generate the target frame. The following subsections describe the multi-scale coarse-to-fine optimization and the multiple motion features estimation.

For simplicity, we only describe the case of three refinement scales, but other cases can be understood analogously with simple modifications, e.g., more refinement scales can be implemented by cascading the 3rd sub-network, as shown in Figure 2. Given arbitrary two input frames I_1 and I_2, the video interpolation framework outputs multiple optical flows f^i_{t→1} and f^i_{t→2}, and fusion weights w^i_{t→1} and w^i_{t→2}. Here, we use MF_t = {MF_t^1, MF_t^2, ..., MF_t^n} to denote the estimated motion features (see the purple block in Figure 2), where MF_t^i = {f^i_{t→1}, f^i_{t→2}, w^i_{t→1}, w^i_{t→2}}. The inferred target frame I_t can be obtained by using the image generation block (G).
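A minimal sketch of one possible image generation step G is given below. It assumes the common recipe of backward-warping the two input frames with the estimated flows and blending them with the weight maps; the function names, the use of bilinear backward warping, and the weight normalization are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp img (B,C,H,W) with flow (B,2,H,W) given in pixels as (dx, dy)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device, dtype=img.dtype),
                            torch.arange(w, device=img.device, dtype=img.dtype),
                            indexing='ij')
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # sample location x + dx
    grid_y = ys.unsqueeze(0) + flow[:, 1]          # sample location y + dy
    # normalize sampling locations to [-1, 1] as required by grid_sample
    grid = torch.stack([2.0 * grid_x / (w - 1) - 1.0,
                        2.0 * grid_y / (h - 1) - 1.0], dim=-1)
    return F.grid_sample(img, grid, mode='bilinear', padding_mode='border', align_corners=True)

def generate_frame(i1, i2, motion_features):
    """motion_features: list of dicts holding flows f_t1, f_t2 (B,2,H,W)
    and weight maps w_t1, w_t2 (B,1,H,W), one dict per motion feature MF_t^i."""
    acc, wsum = 0.0, 1e-8
    for mf in motion_features:
        w1, w2 = mf['w_t1'], mf['w_t2']
        acc = acc + w1 * backward_warp(i1, mf['f_t1']) + w2 * backward_warp(i2, mf['f_t2'])
        wsum = wsum + w1 + w2
    return acc / wsum                              # weighted blend over all motion features
```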
Fig. 4. The proposed network architecture for each trainable sub-network shown in blue in Figure 2. The output channel number 6 corresponds to two optical flows (i.e., the forward and backward optical flows) and two weight maps.
Fig. 5. Demonstration of the pipeline of the implemented model (n = 1). The residual maps (see (e) and (h)) clearly show that, as the refinement scale increases, the estimation of optical flow around motion boundaries becomes more accurate. Besides, the fusion weight around motion boundaries becomes smoother and more consistent with the moving object.
TABLE I
Results of Ablation Study With Different Refinement Scales on Vimeo-90K [7]

TABLE II
Results of Ablation Study With Different Motion Features on Vimeo-90K [7]

B. Loss Function

The proposed framework gradually refines the optical flow and fusion weight with the mechanisms of multiple motion features estimation and multi-scale coarse-to-fine refinement. Each sub-network outputs the refinement results corresponding to its scale. We employ the L1 norm for constraining each sub-network. Therefore, the loss function is the sum of the L1 norms at the different refinement scales:
L = \sum_{s \in \{1,\dots,n\}} \frac{\sum_{x,y} \left\| \hat{I}_t^{\frac{H}{2^{s-1}},\frac{W}{2^{s-1}}}(x,y) - I_t^{\frac{H}{2^{s-1}},\frac{W}{2^{s-1}}}(x,y) \right\|_1}{H \times W}    (13)

where \hat{I}_t^{H/2^{s-1}, W/2^{s-1}} and I_t^{H/2^{s-1}, W/2^{s-1}} denote the frame synthesized at the s-th refinement scale and the corresponding ground-truth frame, respectively, both of resolution H/2^{s-1} × W/2^{s-1}, and H × W is the full resolution.
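A minimal sketch of this multi-scale L1 objective is shown below. It assumes the ground-truth frame is simply downsampled to each refinement scale and that every per-scale term is normalized by the full-resolution H × W, as written above; both are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def multiscale_l1_loss(preds, gt):
    """preds: list of synthesized frames, one per refinement scale, where the
    finest-scale prediction has the same (B,3,H,W) shape as the ground truth gt."""
    _, _, h, w = gt.shape
    loss = gt.new_zeros(())
    for pred in preds:
        # downsample the ground truth to the resolution of this prediction
        target = F.interpolate(gt, size=pred.shape[-2:], mode='bilinear', align_corners=False)
        loss = loss + torch.abs(pred - target).sum() / (h * w)
    return loss
```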
TABLE III
Quantitative Comparison With State-of-the-Art Approaches in Terms of PSNR, SSIM, and Network Size on Three Benchmarks. MB-Other Represents the Other Set in the Middlebury Dataset. Note That Methods With an Asterisk Represent the Performance of Models Retrained on the Vimeo-90K Training Set for a Fair Comparison
TABLE IV
Quantitative Comparison With State-of-the-Art Approaches in Terms of IE and NIE on the Evaluation Set of the Middlebury Dataset. The Bold Font Represents the Best Value. The Table Shows That Our Approach Achieves the Best Performance on Both Average Scores of IE and NIE (Smaller Is Better)
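For reference, PSNR (Tables III, VI, and VII) and the Middlebury-style interpolation error IE (Table IV) can be computed as sketched below. This is a generic reference implementation, not the evaluation code used in the paper, and it assumes 8-bit images with a peak value of 255.

```python
import numpy as np

def psnr(pred, gt, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def interpolation_error(pred, gt):
    """Middlebury-style interpolation error: root-mean-square pixel difference."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))
```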
Fig. 7. Interpolation results of different approaches on Vimeo-90K. From the close-up images, we observe that the proposed approach produces better
structural detail than other competing methods.
TABLE V
Demonstration of the Good Generality of Our Approach, Which Can Improve the Synthesis Quality of the Super SloMo [20] and VoxelFlow [19] Methods

the least network size, with only 2.57 M parameters, which is conducive to fast fitting of the network. Besides, although our full model (the FGME* approach) has the largest number of network parameters, it achieves the best performance. Note that most of the parameters of our full model are spent on the optional optical flow estimator (PWC-Net), which can be further reduced by a better optical flow algorithm.
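Model sizes such as the 2.57 M figure above are simply totals of trainable parameters; a generic way to obtain this count for any PyTorch model (illustrative, not the paper's tooling):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```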
TABLE VI
Quantitative Comparison With State-of-the-Art Approaches in Terms of PSNR and SSIM on Nine Videos With 1920 × 1080 Resolution
Fig. 10. A real-world example (540 × 960) to evaluate the practical ability of different approaches.
TABLE VII
Quantitative Comparison With State-of-the-Art Approaches in Terms of PSNR and SSIM on Streets of India, a Video With 540 × 960 Resolution
factor affecting the synthesis quality of most approaches. In addition, our FGME with and without PWC-Net achieve the best performance in each range of motion. Despite the increasing range of motion, the FGME with PWC-Net achieves the most stable performance.

D. Benefits for Other Approaches

In this section, we demonstrate the generality of the multiple motion features estimation mechanism of our FGME approach. We apply it to other video frame interpolation methods. Specifically, we modify the models of other methods slightly so that they can generate multiple optical flows and weight maps. As a result, these methods are capable of providing fine-grained motion features. As shown in Table V, benefiting from our proposed multiple motion features estimation mechanism, the PSNR and SSIM of the Super SloMo [20] and VoxelFlow [19] methods are consistently improved, while the number of network parameters is only slightly increased. The result indicates the good generality of our multiple motion features estimation mechanism.

E. Real-World Example and High Definition Videos

To evaluate the effectiveness of video interpolation approaches on real-world examples, we use a real-world video, Streets of India, to test them. In addition, we further conduct an experiment to evaluate video frame interpolation approaches on nine newly collected videos (https://www.youtube.com/user/HarmonicIncVideo/videos) with 1920 × 1080 resolution. Table VI and Table VII report the quantitative comparison with state-of-the-art approaches on these two datasets in terms of PSNR and SSIM. Our approach performs favorably against existing methods. In addition, a visual comparison result is demonstrated in Figure 10. The right area of Figure 10 (zoom in for a better view) contains the blue truck with large motion. We observe that the proposed approach produces better structural detail and has less distortion than other competing methods. The same phenomenon can also be observed in Figure 1, i.e., our approach produces the best result in preserving the details of the motorcycle head (shown in the yellow box).

V. CONCLUSION

In this article, we have presented a fine-grained motion estimation approach for video frame interpolation. Instead of uniformly handling pixel motions, we propose to deal with pixel motions differently based on the proposed two key mechanisms of multi-scale coarse-to-fine optimization and multiple motion features estimation, which are utilized to refine optical flows and weight maps. To demonstrate its effectiveness, we present a simple fully convolutional neural network with three refinement scales and four motion features (n = 4). Extensive experiments including an ablation study demonstrate that our FGME approach significantly advances the state-of-the-art on three standard benchmark datasets, with advantages in terms of effectiveness, simplicity, and network size.
REFERENCES

[1] D. Wang, A. Vincent, P. Blanchfield, and R. Klepko, "Motion-compensated frame rate up-conversion—Part II: New algorithms for frame interpolation," IEEE Trans. Broadcast., vol. 56, no. 2, pp. 142–149, Jan. 2010.
[2] K. Yang, A. Huang, T. Q. Nguyen, C. C. Guest, and P. K. Das, "A new objective quality metric for frame interpolation used in video compression," IEEE Trans. Broadcast., vol. 54, no. 3, pp. 680–711, Sep. 2008.
[3] C. Wu, N. Singhal, and P. Krahenbuhl, "Video compression through image interpolation," in Proc. Comput. Vis. Pattern Recognit., 2018, pp. 425–440.
[4] S. Tsekeridou, F. A. Cheikh, M. Gabbouj, and I. Pitas, "Vector rational interpolation schemes for erroneous motion field estimation applied to MPEG-2 error concealment," IEEE Trans. Multimedia, vol. 6, no. 6, pp. 876–885, Dec. 2004.
[5] H. Chen, Y. Zhang, Y. Tao, B. Zou, and W. Tang, "An improved temporal frame interpolation algorithm for H.264 video compression," in Proc. Data Compression Conf., Mar. 2011, p. 449.
[6] N. Inamoto and H. Saito, "Virtual viewpoint replay for a soccer match by view interpolation from multiple cameras," IEEE Trans. Multimedia, vol. 9, no. 6, pp. 1155–1166, Oct. 2007.
[7] T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, "Video enhancement with task-oriented flow," Int. J. Comput. Vis., vol. 127, no. 8, pp. 1106–1125, 2019.
[8] H. F. Ates, "Enhanced low bitrate H.264 video coding using decoder-side super-resolution and frame interpolation," Opt. Eng., vol. 52, no. 7, pp. 2131–2139, 2013.
[9] L. Yan, Z. Zhaoyang, and A. Ping, "Stereo video coding based on frame estimation and interpolation," IEEE Trans. Broadcast., vol. 49, no. 1, pp. 14–21, Mar. 2003.
[10] Y. T. Yang, Y. S. Tung, and J. L. Wu, "Quality enhancement of frame rate up-converted video by adaptive frame skip and reliable motion extraction," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 12, pp. 1700–1713, Nov. 2007.
[11] J. Janai, F. Guney, J. Wulff, M. J. Black, and A. Geiger, "Slow flow: Exploiting high-speed cameras for accurate and diverse optical flow reference data," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1406–1416.
[12] M. Usman, X. He, K.-M. Lam, M. Xu, S. M. M. Bokhari, and J. Chen, "Frame interpolation for cloud-based mobile video streaming," IEEE Trans. Multimedia, vol. 18, no. 5, pp. 831–839, May 2016.
[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, p. 6.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.
[15] S. Y. Zhu, B. Zeng, L. Zeng, and M. Gabbouj, "Image interpolation based on non-local geometric similarities and directional gradients," IEEE Trans. Multimedia, vol. 18, no. 9, pp. 1707–1719, Sep. 2016.
[16] T. Zhou, S. Tulsiani, W. Sun, J. Malik, and A. A. Efros, "View synthesis by appearance flow," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 286–301.
[17] M. Mathieu, C. Couprie, and Y. LeCun, "Deep multi-scale video prediction beyond mean square error," in Proc. Int. Conf. Learn. Represent. (ICLR), 2016, p. 6.
[18] S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive separable convolution," in Proc. Int. Conf. Comput. Vis. (ICCV), 2017, pp. 261–270.
[19] Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala, "Video frame synthesis using deep voxel flow," in Proc. Int. Conf. Comput. Vis. (ICCV), 2017, pp. 4473–4481.
[20] H. Jiang et al., "Super SloMo: High quality estimation of multiple intermediate frames for video interpolation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 9000–9008.
[21] A. Ranjan and M. J. Black, "Optical flow estimation using a spatial pyramid network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2720–2729.
[22] J. Xu, R. Ranftl, and V. Koltun, "Accurate optical flow via direct cost volume processing," in Proc. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 5807–5815.
[23] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," Int. J. Comput. Vis., vol. 92, no. 1, pp. 1–31, 2011.
[24] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black, "A naturalistic open source movie for optical flow evaluation," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2012, pp. 611–625.
[25] A. Dosovitskiy et al., "FlowNet: Learning optical flow with convolutional networks," in Proc. Int. Conf. Comput. Vis. (ICCV), 2015, pp. 2758–2766.
[26] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, "FlowNet 2.0: Evolution of optical flow estimation with deep networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1647–1655.
[27] D. Sun, X. Yang, M. Liu, and J. Kautz, "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 8934–8943.
[28] D. Wang, Z. Liang, and A. Vincent, "Motion-compensated frame rate up-conversion—Part I: Fast multi-frame motion estimation," IEEE Trans. Broadcast., vol. 56, no. 2, pp. 133–141, Feb. 2010.
[29] T. Tsai, A. Shi, and K. Huang, "Accurate frame rate up-conversion for advanced visual quality," IEEE Trans. Broadcast., vol. 62, no. 2, pp. 426–435, Jun. 2016.
[30] S. Niklaus and F. Liu, "Context-aware synthesis for video frame interpolation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 1701–1710.
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778.
[32] S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive convolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2270–2279.
[33] S. Meyer, O. Wang, H. Zimmer, M. Grosse, and A. Sorkine-Hornung, "Phase-based frame interpolation for video," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 1410–1418.
[34] W. Bao, W.-S. Lai, C. Ma, X. Zhang, Z. Gao, and M.-H. Yang, "Depth-aware video frame interpolation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 3703–3712.
[35] W. Bao, W.-S. Lai, X. Zhang, Z. Gao, and M.-H. Yang, "MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement," IEEE Trans. Pattern Anal. Mach. Intell., early access, Sep. 17, 2019, doi: 10.1109/TPAMI.2019.2941941.
[36] H. Zhang, R. Wang, and Y. Zhao, "Multi-frame pyramid refinement network for video frame interpolation," IEEE Access, vol. 7, pp. 130610–130621, 2019.
[37] J. van Amersfoort et al., "Frame interpolation with multi-scale deep loss functions and generative adversarial networks," 2017. [Online]. Available: arXiv:1711.06045.
[38] H. Liu, R. Xiong, D. Zhao, S. Ma, and W. Gao, "Multiple hypotheses Bayesian frame rate up-conversion by adaptive fusion of motion-compensated interpolations," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 8, pp. 1188–1198, May 2012.
[39] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, p. 9.
[40] L. Xu, J. Jia, and Y. Matsushita, "Motion detail preserving optical flow estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1744–1757, Sep. 2012.
[41] J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, "DeepMatching: Hierarchical deformable dense matching," Int. J. Comput. Vis., vol. 120, no. 3, pp. 300–323, Dec. 2016.

Bo Yan (Senior Member, IEEE) received the B.E. and M.E. degrees in communication engineering from Xi'an Jiaotong University in 1998 and 2001, respectively, and the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong in 2004. From 2004 to 2006, he worked with the National Institute of Standards and Technology, USA, as a Postdoctoral Guest Researcher. He is currently a Professor with the School of Computer Science, Fudan University, Shanghai, China. His research interests include video processing, computer vision, and multimedia communications. He has served as the Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and the Guest Editor of the Special Issue on "Content-Aware Visual Systems: Analysis, Streaming and Retargeting" for the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS.
Weimin Tan (Member, IEEE) received the master's degree from the College of Communication Engineering, Chongqing University, Chongqing, China, and the Ph.D. degree from the School of Computer Science, Fudan University, Shanghai, China, where he is currently working as a Postdoctoral Researcher. His research interests include digital image and video processing.

Chuming Lin is currently pursuing the bachelor's degree with the School of Computer Science, Fudan University, Shanghai, China. His research interests include computer vision and machine learning.

Liquan Shen (Member, IEEE) received the B.S. degree in automation control from Henan Polytechnic University, Henan, China, in 2001, and the M.E. and Ph.D. degrees in communication and information systems from Shanghai University, Shanghai, China, in 2005 and 2008, respectively. Since 2008, he has been with the Faculty of the School of Communication and Information Engineering, Shanghai University, where he is currently a Professor. He has authored or coauthored more than 100 refereed technical papers in international journals and conferences in the field of video coding and image processing. He holds ten patents in the areas of image/video coding and communications. His major research interests include high efficiency video coding, perceptual coding, video codec optimization, 3DTV, and video quality assessment.