Article
Deep Quantified Visibility Estimation for Traffic Image
Fang Zhang, Tingzhao Yu * , Zhimin Li, Kuoyin Wang, Yu Chen, Yan Huang and Qiuming Kuang *

Public Meteorological Service Center, China Meteorological Administration, Beijing 100081, China
* Correspondence: tsingzao@hotmail.com (T.Y.); qmkuang@hotmail.com (Q.K.)

Abstract: Image-based quantified visibility estimation is an important task for both atmospheric science and computer vision. Traditional methods rely largely on meteorological observations or manual camera calibration, which restricts their performance and generality. In this paper, we propose a new end-to-end pipeline for single image-based quantified visibility estimation through an elaborate integration of meteorological physical constraints and deep learning architecture design. Specifically, the proposed Deep Quantified Visibility Estimation Network (abbreviated as DQVENet) consists of three modules, i.e., the Transmission Estimation Module (TEM), the Depth Estimation Module (DEM), and the Extinction coEfficient Estimation Module (E3M). Building on these modules, the meteorological prior constraint can be combined with deep learning. To validate the performance of DQVENet, this paper also constructs a traffic image dataset (named QVEData) with accurate visibility calibration. Experimental results compared with many state-of-the-art methods on QVEData demonstrate the effectiveness and superiority of DQVENet.

Keywords: quantified visibility estimation; deep learning; traffic image; atmospheric observation

1. Introduction
Quantified visibility estimation is of great significance in applications such as air safety [1], ground transport control [2], as well as air quality assessment [3]. Recognizing the current visibility state has been an urgent need for many areas [4,5] and plays a key role in machine learning [6].

Typical visibility estimation methods rely largely on professional meteorological stations with expensive sensors and human observations. However, limited by the manufacturing costs, these stations are often distributed in a non-uniform way, which in turn restricts the capability of accurate visibility estimation. Recently, with the widespread use of mobile cameras, thousands of images have been collected under different weather conditions, which might provide an alternative solution for quantified visibility estimation.

In fact, many researchers have tried to explore the possibility of estimating visibility from single images in the past decades [7–12]. Hautiére et al. [7] propose a probabilistic model-based approach, which takes into account the distribution of contrasts in the scene. Thus, the proposed model is more robust to illumination variations. Though helpful, this method is still limited to images captured in daylight. In order to improve the performance at night, Varjo et al. [8] propose a method based on feature vectors that are projections of the scene images with lighting normalization. Li et al. [9] estimate the extinction coefficient in a clear atmosphere by assuming it to be approximately a constant. Combined with the dark channel prior [13], the ratio of the two extinction coefficients in the current and the clear atmosphere can be calculated, and the quantified visibility can then be obtained. This approach is further improved by employing an edge collapse-based transmission refinement [10].

With the rapid development of visibility estimation methods, one notable issue is that precise visibility labels are difficult to obtain, and labels that are not necessarily accurate hamper image-based visibility estimation. On one hand, Song et al. [11] address this by associating a label distribution with each image. The label distribution contains all of the possible visibilities with corresponding probabilities. After that, typical machine learning-based methods can be employed. On the other hand, Xun et al. [12] use the extra ordinal information and relative relation of images for visibility estimation. They pre-train a model using classified outdoor foggy images and then fine-tune the model via indoor synthetic continuous annotation.
Though effective, most of the aforementioned methods face the following problems: (a) affected by camera angle, previous methods require great effort on parameter calibration, which restricts model generalization and increases model costs; (b) they are also sensitive to image scale: for instance, some cameras focus on local road details (small scale), while others capture global information (large scale); (c) precedent visibility estimation approaches exploit either data-driven or physics-constrained techniques alone. We believe that physical constraints play a key role in visibility estimation and that combining them with data-driven strategies is equally important.
Consequently, to solve these issues and achieve quantified visibility estimation from
images, this paper proposes a new network architecture, named Deep Quantified Visibility
Estimation Network (abbreviated DQVENet), especially for traffic images. The contribu-
tions of this paper are summarized as follows.
1. By integrating physical constraint with deep learning network, a novel framework
named DQVENet is proposed for single image visibility estimation.
2. Within this framework, a Transmission Estimation Module (TEM), a Depth Estimation Module (DEM), and an Extinction Coefficient Estimation Module (E3M) are unified as a whole according to meteorological theory.
3. A new benchmark dataset, which is especially designed for traffic image-based quan-
tified visibility estimation, is constructed.

2. Related Work
From the perspective of machine learning, image-based quantified visibility estimation is equivalent to learning a regression model. This paper focuses on deep learning-based visibility estimation. As a result, this section first presents a short survey of deep learning methods, followed by a discussion of single image visibility estimation. Considering the great similarity between visibility estimation and weather recognition, we also give a brief introduction to the latter. Furthermore, since visibility can be estimated via dehazing techniques, single image-based haze removal is discussed as well.

2.1. Deep Learning


Since the emergence of AlexNet [14] in computer vision, deep learning has attracted attention in areas such as speech recognition [15], natural language processing [16], video understanding [17], etc. Recently, it has also shown great potential in atmospheric science. For example, through a thorough combination of data sources such as satellite, radar, and ground stations, Sønderby et al. [18] propose MetNet for precipitation forecasting. Results show that it is superior to pure numerical weather prediction. To obtain precise and accurate predictions, Yu et al. [19] propose an auxiliary guided spatial distortion network. However, few existing methods take both deep learning architecture design and meteorological theory into account, which might be a possible avenue for future research. As a step in this direction, Kuang et al. [20] introduce three branches corresponding to the temperature variation equation into deep learning and propose a new method for temperature forecasting.

2.2. Image-Based Visibility Estimation


Visibility-related studies have been a hot research topic for decades. Basically, both the parameter settings of the camera and the environmental conditions highly affect image quality, which in turn influences estimation accuracy. To address this issue, Li et al. [21] employ a pre-trained convolutional neural network to extract visibility features automatically instead of extracting them manually. Giyenko et al. [22] implement a simple but useful three-layer network for visibility estimation and train it on a dataset collected in South Korea.

Palvanov et al. [23] propose a new approach based on deep integrated convolutional neural
networks for the estimation of visibility distances from camera imagery. This network uses
three paralleled streams of deep integrated convolutional neural networks.

2.3. Weather Recognition from Images


Apart from quantified visibility estimation, recognizing the weather conditions from
images is also of great significance. Under the framework of pattern recognition, a possible
solution is treating weather recognition as image classification [6]. Building on this mechanism, some methods concentrate on extracting discriminative features, such as a global histogram [24], as input for classifiers such as the support vector machine [25].
With the overwhelming successes of deep learning, many brilliant methods have also been
proposed [24,26,27].

2.4. Single Image Haze Removal


There is a deterministic correlation between the fog image and clear image, and
the visibility can be mathematically calculated through an investigation of these two
images. Consequently, there are also methods estimating image visibility via dehazing
techniques. Specifically, Zhou et al. [28] propose a visibility estimation method based
on dark channel prior and image entropy. The dark channel prior is used to estimate
atmospheric transmittance and optimized via guided filter. The road region is extracted
based on a region growing algorithm. After that, the depth map is calculated using lane
information and the haze visibility is obtained based on the minimum image entropy of road
images. Bae et al. [29] propose a visibility distance estimation method using dark channel prior-based single image haze removal. In their method, the dark channel of an input sea-fog image is first calculated. The binary transmission image is obtained by applying a threshold to the transmission estimated from the dark channel. Then, the distance values of the pixels corresponding to the sea-fog boundary are averaged in order to derive the visibility distance. Furthermore, image dehazing is a highly ill-posed problem [30] and various methods have been developed to tackle it. Specifically, the key ingredient for haze removal is the estimation of the transmission map. Besides the dark channel prior [13], Cai et al. [31] introduce an end-to-end network that estimates it with a novel activation unit. Ren et al. [32] and Li et al. [33] also provide effective frameworks for transmission map estimation.

3. Materials
3.1. Motivation
Quantified visibility estimation is quite important to traffic control. Nevertheless, there are few datasets focusing on this task, which limits the development of estimation methods. To illustrate this key issue and demonstrate the effectiveness of the proposed method, this paper provides a new dataset named QVEData (abbreviation for Quantified Visibility Estimation Dataset).

3.2. Dataset Construction


Images are first obtained from real highway cameras, and 24,031 traffic images are collected during this stage. The corresponding quantified visibility observations are acquired from the nearest meteorological station. Basically, five cameras (denoted as C1, C2, C3, C4, and C5) are employed, and the distances between each camera and its nearest visibility station are listed in Table 1.

Table 1. Distances between the camera and the nearest visibility station.

Camera Id C1 C2 C3 C4 C5
Distance 450 m 402 m 268 m 501 m 171 m

Unfortunately, most of these images are highly similar and the corresponding visibility observations tend to be consistent. After eliminating similar images and performing manual quality control, 3236 images are finally preserved to construct the final dataset. We denote this dataset the Quantified Visibility Estimation Dataset (abbreviated as QVEData). Since this paper is devoted to visibility estimation for traffic images, the selected cameras are along a highway; therefore, the road is the main content of the images. Nevertheless, due to the variation of traffic cameras, camera angles, weather conditions, illuminations, moving vehicles, etc., the 3236 preserved images are quite diverse, which increases the difficulty of visibility estimation. Figure 1 presents the visibility distribution of QVEData. Specifically, most of the corresponding visibility values lie in the interval of 0–20 km, with a concentration in 0–10 km.

Figure 1. Visibility distribution of QVEData (x-axis: visibility in units of 200 m; y-axis: number of images). Note that ×200 m denotes the unit of the x-axis; in other words, if x = 25, then the visibility is 25 × 200 m = 5000 m.

3.3. Comparison with Other Datasets


Generally, there are many significant fog-related datasets. Table 2 gives an intuitive comparison between these datasets and the proposed QVEData. Specifically, we divide these datasets into three categories according to their corresponding tasks, i.e., image dehazing, weather recognition, and visibility estimation.

Table 2. Comparison of different fog-related datasets.

Task                   Name            Year  Reference  Property
Image Dehazing         D-Haze          2016  [34]       indoor, synthetic images
                       I-Haze          2018  [35]       indoor, real images
                       O-Haze          2018  [36]       outdoor, real images
                       RESIDE          2018  [37]       indoor and outdoor, synthetic and real images
                       Dense-Haze      2019  [38]       outdoor, synthetic images
                       Haze4K          2021  [39]       synthetic images, with clean image, transmission map, and atmospheric light as groundtruth
                       NH-Haze         2020  [40]       outdoor, real, non-homogeneous images
                       REVIDE          2021  [41]       real, videos
Weather Recognition    MWI             2015  [42]       Sunny, Rainy, Snowy, Haze
                       Image2Weather   2017  [43]       Sunny, Cloudy, Snowy, Rainy, Foggy, Others
                       MWD             2017  [44]       Cloudy, Rainy, Snowy, Haze, Thunder, Sunny
                       RFS             2018  [27]       Rain, Fog, Snow
                       Five-class      2019  [45]       Sunny, Cloudy, Foggy, Rainy, Snowy
                       TWData          2020  [6]        Sunny, Fog50, Fog200, Fog500, RoadSnow, RoadWet, RoadIce
Visibility Estimation  WILD            2002  [46]       real images
                       QVEData         2022  ours       real traffic images, with professional meteorological calibration

4. Methods
4.1. Preliminary
For a given hazy image I, it is degraded due to the presence of haze, which can be mathematically formulated as

I(x) = J(x) t(x) + A(x)(1 − t(x))    (1)

where J is the groundtruth image (without haze), t is the transmission map, A is the global atmospheric light, and x is the pixel position. Specifically, the transmission map t can be expressed as

t(x) = e^{−β d(x)}    (2)

where β represents the atmospheric extinction coefficient and d is the scene depth.
According to Koschmieder's law, the visibility can be expressed as a function of the atmospheric extinction coefficient, i.e.,

C = e^{−β V}    (3)

where C is a threshold contrast and V is the visibility in metres. In other words, the visibility can be quantified as

V = −ln C / β    (4)

Typically, the threshold C is set to 0.05, so that −ln C ≈ 3. Consequently, Equation (4) can be approximately formulated as

V = 3 / β    (5)

From Equation (2), the extinction coefficient can be rewritten as

β = −ln t(x) / d(x)    (6)

Consequently, for a given position, the corresponding visibility can be roughly estimated as

V(x) = −3 d(x) / ln t(x)    (7)

by substituting Equation (6) into Equation (5).
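To make Equations (5)–(7) concrete, the following is a minimal NumPy sketch (not code from the paper) that turns a per-pixel transmission map and depth map into a rough visibility estimate; the clipping constant eps is an assumption added to avoid division by zero.

import numpy as np

def visibility_from_t_and_d(t, d, eps=1e-6):
    """Rough per-pixel visibility from Equation (7): V(x) = -3 d(x) / ln t(x).

    t: transmission map in (0, 1), shape (H, W)
    d: scene depth in metres, shape (H, W)
    """
    t = np.clip(t, eps, 1.0 - eps)             # avoid log(0) and log(1)
    beta = -np.log(t) / np.maximum(d, eps)     # Equation (6): extinction coefficient
    return 3.0 / np.maximum(beta, eps)         # Equation (5): V ~ 3 / beta

# Toy check: uniform haze with beta = 0.003 m^-1 should give V of about 1000 m.
d = np.full((4, 4), 500.0)                     # depth of 500 m everywhere
t = np.exp(-0.003 * d)                         # Equation (2): t = exp(-beta * d)
print(visibility_from_t_and_d(t, d).mean())    # ~1000.0

In the toy example, a uniform extinction coefficient of 0.003 m^-1 recovers a visibility of about 1000 m, as expected from Equation (5).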

4.2. Overall Architecture

From Equation (7), the visibility V is highly correlated with transmission map t and
scene depth d. This motivates us to design a network that is physically constrained by
Equation (7). As a result, this paper proposes Deep Quantified Visibility Estimation network
(abbreviated as DQVENet), an elaborately designed convolutional architecture, especially
for single image-based quantified visibility estimation. The overall architecture can be
found in Figure 2.

Figure 2. Overall architecture of the proposed DQVENet. The input image is fed to the Transmission Estimation Module and the Depth Estimation Module; their outputs are combined by the DenseNet-based Extinction Coefficient Estimation Module for quantified visibility estimation.

Following the working mechanism of Equation (5), DQVENet mainly consists of three
basic modules, i.e., the Transmission Estimation Module (TEM), the Depth Estimation
Module (DEM), and the Extinction coEfficient Estimation Module (E3M). We then present
the details of each module.

4.3. Transmission Estimation Module


To estimate the transmission map from a single image, previous methods [30,32,47]
often use the multi-level features. DQVENet employs the densely connected pyramid
network [30] as the Transmission Estimation Module (TEM). Specifically, TEM adopts a
densely connected UNet [30,48] structure for feature encoder-decoder.
Denote the input image as X ∈ R^{w×h×c}, where w, h, and c represent the width, height, and the number of channels of X. It is first processed by a 3 × 3 convolution with stride 2, followed by batch normalization, ReLU activation, and Max-pooling, which can be mathematically formulated as

X^0_{TEM} = MaxPool(ReLU(BN(Conv_{3×3}(X))))    (8)

Here, X^0_{TEM} ∈ R^{(w/4)×(h/4)×c_0} is the pre-processed feature for the following cascaded encoder-decoder. Without loss of generality and for simplicity, we still use X to represent the input feature in the following.
The encoder comprises both a dense block and a transition block. The dense block contains two sequential BN-ReLU-Conv layers (batch normalization, ReLU activation, convolution). Mathematically, it can be defined as

X^{E1}_{dense} = Conv_{1×1}(ReLU(BN(X)))
X^{E2}_{dense} = Conv_{3×3}(ReLU(BN(X)))    (9)

The transition block contains an additional Max-pooling layer for enlarging the receptive field compared with the dense block, i.e.,

X^{E1}_{trans} = MaxPool(Conv_{1×1}(ReLU(BN(X))))    (10)

Similar to the encoder, the decoder also involves a dense block and a transition block:

X^{D1}_{dense} = Conv_{1×1}(ReLU(BN(X)))
X^{D2}_{dense} = Conv_{3×3}(ReLU(BN(X)))    (11)
X^{D1}_{trans} = Upsample(TransConv_{1×1}(ReLU(BN(X))))

We should note that the stride of the transpose convolution is set to 1, and the up-sampling operation is actually achieved by nearest-neighbor interpolation. Moreover, the encoder features at different levels are also fed to the corresponding decoder layers for precise feature extraction.
To obtain global structural information at different scales, a multi-level pyramid pooling block is employed. Specifically, four Max-pooling operations with down-sampling ratios 1/4, 1/8, 1/16, and 1/32 are used, i.e.,

X^1_{pool} = Upsample(ReLU(Conv_{1×1}(MaxPool_4(X))))
X^2_{pool} = Upsample(ReLU(Conv_{1×1}(MaxPool_8(X))))
X^3_{pool} = Upsample(ReLU(Conv_{1×1}(MaxPool_16(X))))    (12)
X^4_{pool} = Upsample(ReLU(Conv_{1×1}(MaxPool_32(X))))

and they are concatenated for estimating the final transmission map X_t using a Tanh activation:

X_t = Tanh(Conv_{1×1}(concat([X^1_{pool}, X^2_{pool}, X^3_{pool}, X^4_{pool}])))    (13)
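To make the pooling block concrete, below is a minimal PyTorch sketch of the multi-level pyramid pooling of Equations (12) and (13); the channel widths and the use of nearest-neighbor interpolation for Upsample are assumptions for illustration, not the exact TEM configuration of [30].

import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Sketch of Equations (12)-(13): pool the feature map at ratios 1/4, 1/8,
    1/16, and 1/32, project each level with a 1x1 convolution and ReLU,
    upsample back to the input size, concatenate, and predict the transmission
    map with a Tanh-activated 1x1 convolution. Channel sizes are illustrative."""
    def __init__(self, in_ch=64, mid_ch=16):
        super().__init__()
        self.ratios = [4, 8, 16, 32]
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, mid_ch, kernel_size=1) for _ in self.ratios])
        self.out = nn.Conv2d(mid_ch * len(self.ratios), 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for r, conv in zip(self.ratios, self.convs):
            y = F.max_pool2d(x, kernel_size=r)                 # MaxPool_r(X)
            y = F.relu(conv(y))                                # ReLU(Conv_1x1(.))
            y = F.interpolate(y, size=(h, w), mode="nearest")  # Upsample(.)
            feats.append(y)
        return torch.tanh(self.out(torch.cat(feats, dim=1)))   # Equation (13)

# Example: a 64-channel feature map of size 128 x 128 -> 1-channel transmission map.
t_map = PyramidPooling()(torch.randn(1, 64, 128, 128))
print(t_map.shape)  # torch.Size([1, 1, 128, 128])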

4.4. Depth Estimation Module


Depth estimation from a single image has been explored through various approaches
such as non-parametric scene sampling [49], supervised end-to-end learning [50], semi-
supervised estimation [51], and generative adversarial prediction [52].
Considering that obtaining ground-truth scene depth from unconstrained images is relatively hard, DQVENet seeks an approximate scene depth in an unsupervised or self-supervised manner. Specifically, the pre-trained self-supervised depth estimation method proposed by Godard et al. [53] is employed as the Depth Estimation Module (DEM).
Basically, the DEM consists of several ResNet [54]-based encoder-decoder blocks, i.e., a Res-UNet [55]. Compared with the vanilla UNet [48] or other backbones, taking Res-UNet as the backbone of our quantified visibility-oriented depth estimation module, i.e., DEM, has the following advantages (a simplified structural sketch is given after the list).
1. It contains more layers with more parameters. Consequently, it can describe more abstract high-level features accurately.
2. The structure is more sophisticated. In application, images can be obtained at any place, any time, and from any view, which increases the difficulty of image recognition; this architecture promotes generalization.
3. Benefiting from the residual connections, training such a network is expected to be easier, with less risk of gradient vanishing.
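The sketch below is a deliberately small residual encoder-decoder in PyTorch that illustrates the Res-UNet idea (residual blocks inside a down/up-sampling path); it is not the pre-trained DEM of Godard et al. [53], and the layer counts and channel widths are illustrative assumptions.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the identity skip connection eases optimization
    and reduces the risk of vanishing gradients (advantage 3 above)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class TinyResUNetDepth(nn.Module):
    """Toy residual encoder-decoder mapping an RGB image to a one-channel
    depth-like map; layer counts and channel widths are illustrative only."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, stride=2, padding=1)   # downsample by 2
        self.enc = ResidualBlock(32)
        self.up = nn.ConvTranspose2d(32, 32, 2, stride=2)      # upsample by 2
        self.dec = ResidualBlock(32)
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e = self.enc(self.stem(x))
        return torch.sigmoid(self.head(self.dec(self.up(e))))  # normalized depth

print(TinyResUNetDepth()(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 1, 128, 128])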

4.5. Joint Atmospheric Extinction Coefficient and Quantified Visibility Estimation


Denoting the estimated transmission map and the depth map as X_t ∈ R^{w×h×3} and X_d ∈ R^{w×h×1}, respectively, the atmospheric extinction coefficient can be obtained according to Equation (6), and the quantified visibility estimation can be further achieved via Equation (5).
However, imposing X_t and X_d on Equation (6) directly leads to the following two main inconveniences.
1. Due to the lack of labeled transmission and depth maps, we cannot fine-tune the TEM and DEM on our own data or other weather images. Consequently, we only use the pre-trained models, and the pre-obtained transmission map X_t and depth map X_d are relatively coarse, which restricts the performance of quantified visibility estimation.
2. The two modules, i.e., TEM and DEM, work separately. Limited by this strategy, the obtained transmission map and depth map are relatively independent. Nevertheless, for a given image, these two quantities should be highly correlated. Moreover, bringing X_t and X_d into Equation (6) makes the network not end-to-end, which in turn constrains its efficiency.
To overcome the aforementioned problems, this paper integrates the TEM and DEM for joint atmospheric extinction coefficient and quantified visibility estimation. As a result, the DenseNet [56]-based Extinction coEfficient Estimation Module (E3M) with joint quantified visibility estimation is introduced.
For E3M, the estimated transmission map Xt and depth map Xd are first concatenated
as the input, and then processed via a pre-trained 121-layer DenseNet by replacing the
last classification layer with a regression layer. The number of channel inputs for the first
convolutional layer is also set to 4 instead of the original 3. This makes the whole network
end-to-end trainable, and we denote this architecture DQVENet.
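As an illustration, the following torchvision-based sketch shows only the two structural changes mentioned above, a 4-channel first convolution and a single-output regression layer (it assumes a recent torchvision; the authors' exact E3M configuration and weight-transfer procedure may differ).

import torch
import torch.nn as nn
from torchvision import models

def build_e3m(pretrained=False):
    """Sketch of the E3M backbone described above: a DenseNet-121 whose first
    convolution accepts 4 channels (3-channel transmission map + 1-channel depth
    map) and whose classifier is replaced by a single-output regression layer.
    Transferring pre-trained weights to the new layers is omitted here."""
    net = models.densenet121(weights="DEFAULT" if pretrained else None)
    net.features.conv0 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)      # 4-channel input
    net.classifier = nn.Linear(net.classifier.in_features, 1)  # visibility regression
    return net

# Example: concatenate X_t (w x h x 3) and X_d (w x h x 1) along the channel axis.
x_t, x_d = torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224)
print(build_e3m()(torch.cat([x_t, x_d], dim=1)).shape)  # torch.Size([2, 1])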

4.6. Iterative Training


To obtain better performance, this paper employs an iterative training strategy. To be specific, both TEM and DEM are first pre-trained using the corresponding dehazing dataset [30] and depth estimation dataset [53]. After that, these two modules are frozen and combined into DQVENet for training E3M. Finally, the whole DQVENet is fine-tuned with a relatively small learning rate.
DQVENet is implemented using PyTorch (https://pytorch.org/, accessed on 28 November 2022). During the pre-training of TEM and DEM, we use the same settings as the original papers. For E3M, the stochastic gradient descent optimizer with learning rate 1 × 10^{−4} is employed. The learning rate drops by a factor of 0.1 at epochs 150 and 225. An early stopping strategy is also implemented, i.e., training stops after 300 epochs or if the validation loss does not decrease within 10 epochs. For fine-tuning DQVENet, a smaller learning rate of 1 × 10^{−5} is employed.
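The skeleton below sketches this schedule in PyTorch; only the hyperparameters stated above come from the text, whereas the L1 loss, the data-loader interface, and the loop structure are our assumptions.

import torch

def train_e3m(model, train_loader, val_loader, device="cuda"):
    """Training skeleton following the schedule above: SGD with lr 1e-4, decay
    by 0.1 at epochs 150 and 225, at most 300 epochs, early stopping with a
    patience of 10 epochs. The L1 regression loss is an assumption."""
    model.to(device)
    criterion = torch.nn.L1Loss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[150, 225], gamma=0.1)
    best_val, patience = float("inf"), 0
    for epoch in range(300):
        model.train()
        for images, visibility in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)).squeeze(1),
                             visibility.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                criterion(model(x.to(device)).squeeze(1), y.to(device)).item()
                for x, y in val_loader) / max(len(val_loader), 1)
        if val_loss < best_val:
            best_val, patience = val_loss, 0
        else:
            patience += 1
            if patience >= 10:          # early stopping
                break
    return model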

5. Results
To validate the performance of the proposed DQVENet, this section conducts experiments on QVEData. Specifically, there are a total of 3236 images in QVEData. We split these images randomly into training, validation, and testing sets with ratios of 60%, 10%, and 30%, respectively. Based on this split, the experiments can be mainly divided into two categories, i.e., qualitative results and quantitative results.
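A minimal PyTorch sketch of such a random 60/10/30 split is shown below; the stand-in dataset, the seed, and the rounding of subset sizes are assumptions, not the exact split used in the paper.

import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for QVEData: 3236 samples with scalar visibility labels in metres
# (a real implementation would wrap the traffic images and station observations).
qve_dataset = TensorDataset(torch.arange(3236), torch.rand(3236) * 20000.0)

n_total = len(qve_dataset)
n_train, n_val = int(0.6 * n_total), int(0.1 * n_total)
n_test = n_total - n_train - n_val                # remaining ~30%
train_set, val_set, test_set = random_split(
    qve_dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0))   # fixed seed, assumed
# Exact subset sizes depend on rounding; the text reports 971 testing images.
print(len(train_set), len(val_set), len(test_set))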

5.1. Qualitative Results


Figure 3 first presents an intuitive illustration of the learned depth map and transmission map together with the corresponding original images. For the depth map, the green area represents short depth and the blue area denotes long depth; for the transmission map, the dark area indicates low transmissivity and the light area stands for high transmissivity.
From this figure, both the depth map estimated by DEM and the transmission map estimated by TEM can describe the scene depth and transmission to some extent. This is mainly because these two modules are pre-trained on the corresponding datasets.

Figure 3. Visualization of the obtained depth map and transmission map for DQVENet (columns: input, depth map, transmission map).

5.2. Quantitative Results


Figure 4 shows the quantified visibility estimates of DQVENet together with the ground-truth visibility observations. To be specific, the first two rows present examples where DQVENet underestimates the visibility, the third row gives cases where DQVENet overestimates it, and the last row shows instances where DQVENet fits the groundtruth observations well.
Figure 5 presents the DQVENet results and the groundtruth visibility observations on the testing set of QVEData. Specifically, the 971 testing images are first sorted from small to large according to their groundtruth visibility, i.e., the orange line. Then, the corresponding predictions are plotted along with the groundtruth, i.e., the blue line. Finally, we also present the soft gap, i.e., the red area, to better demonstrate the effectiveness of DQVENet. Except for some overestimated cases, DQVENet fits the visibility well in general.

Figure 4. Quantified visibility estimation of DQVENet. Each panel reports the predicted visibility and the corresponding ground truth (e.g., a prediction of 829 m against a ground truth of 1094 m).

Figure 5. Quantified visibility estimation of DQVENet vs. groundtruth observation (x-axis: index of testing images; y-axis: visibility in km; blue: DQVE prediction; orange: ground truth).

Furthermore, Figure 6 shows the correlation analysis of DQVENet. In this scatter plot, the DQVENet prediction is used as the x-axis and the groundtruth observation as the y-axis. The more the dots converge toward the middle red line, the better the performance of the method. Nevertheless, we cannot expect the scatter points to lie within a fixed gap, as shown in Figure 6a. Instead, a proportional gap, as in Figure 6b, might be more feasible. From this figure, DQVENet is capable of describing the tendency of visibility estimations.

Figure 6. Correlation analysis of the proposed DQVENet (x-axis: DQVE prediction in km; y-axis: ground truth in km); (a) hard gap; (b) soft gap.

Generally, for meteorological forecasting or machine learning-based regression, the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Bias Error, and Correlation are considered to be significant indicators. Nevertheless, for quantified visibility estimation, the former three criteria, i.e., MAE, RMSE, and Bias, are not suitable, because their values for different visibility groundtruths are not comparable. In other words, if the groundtruth is 20,000 m, estimated results within (19,000 m, 21,000 m) are acceptable for a traffic manager; however, if the groundtruth is 1000 m, the desired results should lie in the range (900 m, 1100 m). As a result, we set these criteria aside and adopt the Correlation as a basic criterion, since correlation is an overall statistical measure rather than a point-to-point comparison. Suppose the predicted visibility is Y_p and the groundtruth visibility is Y; then the correlation is defined as

Corr(Y, Y_p) = E[(Y − µ_Y)(Y_p − µ_{Y_p})] / (σ_Y σ_{Y_p})    (14)

where µ and σ denote the mean and standard deviation, respectively. The larger the correlation coefficient, the better the model performance. The correlation coefficient of DQVENet is 0.7237, demonstrating that DQVENet is competent for quantified visibility estimation.
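For reference, Equation (14) can be computed directly in NumPy as below; the toy visibility values are hypothetical and only serve to check the formula against np.corrcoef.

import numpy as np

def correlation(y_true, y_pred):
    """Correlation of Equation (14):
    Corr = E[(Y - mu_Y)(Y_p - mu_{Y_p})] / (sigma_Y * sigma_{Y_p})."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
                 / (y_true.std() * y_pred.std()))

# Toy values (not from the paper), checked against the NumPy built-in.
gt = np.array([1000.0, 2500.0, 5000.0, 12000.0])
pred = np.array([1200.0, 2300.0, 5600.0, 10000.0])
print(correlation(gt, pred), np.corrcoef(gt, pred)[0, 1])  # both ~0.99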

6. Discussion
To further illustrate the effectiveness of DQVENet, this section compares it with several state-of-the-art methods, including the ResNet family [54] and the EfficientNet family [57], in terms of classification accuracy.
To evaluate the performance of the estimation methods, we define the classification accuracy as

Acc_t = N(|Y_p − Y| < Y × t) / N(Y) × 100%    (15)

Here, Y_p and Y denote the predicted and groundtruth visibility, respectively, |·| denotes the absolute value, and N(·) represents the number of samples satisfying the condition in its argument (N(Y) is the total number of samples). 0 < t < 1 is a threshold that can be chosen arbitrarily, and Acc_t is the accuracy under threshold t.
In other words, the classification accuracy under threshold t is the ratio between the number of samples that lie in the interval [Y − Y × t, Y + Y × t] and the total number of samples. We refer to this accuracy as a soft gap. This soft gap differs from a hard gap, which rigidly splits visibilities into intervals such as [0–100 m], [100–500 m], etc., because this paper focuses on quantified visibility estimation instead of classification. More specifically, if the groundtruth visibility is 200 m, an estimate of 100 m would be regarded as better than 200.1 m under the hard gap, whereas within this soft partition 200.1 m is treated as a better estimate than 100 m. Detailed results can be found in Figure 7.
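A minimal NumPy sketch of this soft-gap accuracy is given below; the function name is ours, and the two-sample example simply reproduces the 200 m case from the text.

import numpy as np

def soft_gap_accuracy(y_pred, y_true, t):
    """Equation (15): percentage of samples whose absolute error is smaller
    than a proportion t of the ground-truth visibility."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return 100.0 * np.mean(np.abs(y_pred - y_true) < y_true * t)

# At t = 0.1, a 200.1 m prediction for a 200 m ground truth counts as correct,
# while a 100 m prediction does not, giving 50% accuracy on these two samples.
print(soft_gap_accuracy([200.1, 100.0], [200.0, 200.0], t=0.1))  # 50.0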

Figure 7. Classification accuracy of the proposed DQVENet and the compared methods (EfficientNet-B0/B1/B2/B4 and ResNet-50/101/152) under different thresholds (x-axis: threshold; y-axis: accuracy).

As shown in this figure, the classification accuracy of all compared methods increases as the threshold grows. In general, DQVENet outperforms the ResNet family (including ResNet-50, ResNet-101, and ResNet-152) and the EfficientNet family (including EfficientNet-B0, EfficientNet-B1, EfficientNet-B2, and EfficientNet-B4), illustrating that DQVENet is a feasible method for quantified visibility estimation.

7. Conclusions
In this paper, we propose a framework (DQVENet) for quantified visibility estimation
based on deep learning. Specifically, the physical constraint is employed as the guideline of
network design. Under this framework, a transmission estimation module, a depth estima-
tion module, and an extinction coefficient estimation module are introduced. Furthermore, a new dataset named QVEData, which is especially collected for traffic image-based quantified visibility estimation, is proposed. Experimental results on this dataset demonstrate the effectiveness of DQVENet. The framework is flexible, so the three modules can be replaced by other backbones. We should also note that DQVENet attempts to integrate meteorological priors into deep learning network design, which might be a promising avenue for future research. Our future work will focus on exploring the interdisciplinary theory between meteorology and deep learning more deeply.

Author Contributions: Conceptualization, F.Z. and T.Y.; methodology, T.Y. and F.Z.; software, K.W.;
validation, T.Y.; formal analysis, Y.H.; investigation, Z.L.; resources, Q.K.; data curation, Y.C.; writing—
original draft preparation, T.Y.; visualization, F.Z.; supervision, T.Y.; project administration, T.Y.;
funding acquisition, T.Y. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded in part by the Natural Science Foundation of China under grant
number 62106270 and in part by the Application of FY-4B for Highway Traffic Meteorological Service
under grant number FY-APP-2021.0111.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: QVEData will be released at https://github.com/Tsingzao/DQVENet (accessed on 28 November 2022) for scientific purposes once the manuscript is published.
Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

DQVENet Deep Quantified Visibility Estimation Network


QVEData Quantified Visibility Estimation Dataset
TEM Transmission Estimation Module
DEM Depth Estimation Module
E3M Extinction coEfficient Estimation Module
ResNet Residual Network

References
1. Ding, J.; Zhang, G.; Wang, S.; Xue, B.; Yang, J.; Gao, J.; Wang, K.; Jiang, R.; Zhu, X. Forecast of Hourly Airport Visibility Based on
Artificial Intelligence Methods. Atmosphere 2022, 13, 75. [CrossRef]
2. Zhang, Y.; Wang, Y.; Zhu, Y.; Yang, L.; Ge, L.; Luo, C. Visibility Prediction Based on Machine Learning Algorithms. Atmosphere
2022, 13, 1125. [CrossRef]
3. Gueymard, C.A. Visibility estimates from atmospheric and radiometric variables using artificial neural networks. Air Pollut.
XXV 2017, 211, 129.
4. Long, Q.; Wu, B.; Mi, X.; Liu, S.; Fei, X.; Ju, T. Review on Parameterization Schemes of Visibility in Fog and Brief Discussion of
Applications Performance. Atmosphere 2021, 12, 1666. [CrossRef]
5. Cordeiro, F.M.; França, G.B.; de Albuquerque Neto, F.L.; Gultepe, I. Visibility and Ceiling Nowcasting Using Artificial Intelligence
Techniques for Aviation Applications. Atmosphere 2021, 12, 1657. [CrossRef]
6. Yu, T.; Kuang, Q.; Hu, J.; Zheng, J.; Li, X. Global-similarity local-salience network for traffic weather recognition. IEEE Access
2020, 9, 4607–4615. [CrossRef]
7. Hautiére, N.; Babari, R.; Dumont, É.; Brémond, R.; Paparoditis, N. Estimating meteorological visibility using cameras: A
probabilistic model-driven approach. In Proceedings of the Asian Conference on Computer Vision, Queenstown, New Zealand,
8–12 November 2010; pp. 243–254.
8. Varjo, S.; Hannuksela, J. Image based visibility estimation during day and night. In Proceedings of the Asian Conference on
Computer Vision, Singapore, 1–5 November 2014; pp. 277–289.
9. Li, Q.; Xie, B. Visibility estimation using a single image. In Proceedings of the CCF Chinese Conference on Computer Vision,
Tianjin, China, 11–14 October 2017; pp. 343–355.
10. Li, Q.; Li, Y.; Xie, B. Single image-based scene visibility estimation. IEEE Access 2019, 7, 24430–24439. [CrossRef]
11. Song, M.; Xu, H.; Liu, X.F.; Li, Q. Visibility Estimation via Deep Label Distribution Learning. J. Cloud Comput. 2021, 10, 46.
[CrossRef]
12. Xun, L.; Zhang, H.; Yan, Q.; Wu, Q.; Zhang, J. VISOR-NET: Visibility Estimation Based on Deep Ordinal Relative Learning under
Discrete-Level Labels. Sensors 2022, 22, 6227. [CrossRef]
13. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010,
33, 2341–2353.
14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017,
60, 84–90. [CrossRef]
15. Abdel-Hamid, O.; Mohamed, A.r.; Jiang, H.; Deng, L.; Penn, G.; Yu, D. Convolutional neural networks for speech recognition.
IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1533–1545. [CrossRef]
16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need.
Adv. Neural Inf. Process. Syst. 2017, 30 .
17. Yu, T.; Wang, L.; Guo, C.; Gu, H.; Xiang, S.; Pan, C. Pseudo low rank video representation. Pattern Recognit. 2019, 85, 50–59.
[CrossRef]
18. Sønderby, C.K.; Espeholt, L.; Heek, J.; Dehghani, M.; Oliver, A.; Salimans, T.; Agrawal, S.; Hickey, J.; Kalchbrenner, N. Metnet:
A neural weather model for precipitation forecasting. arXiv 2020, arXiv:2003.12140.
19. Yu, T.; Kuang, Q.; Zheng, J.; Hu, J. Deep precipitation downscaling. IEEE Geosci. Remote. Sens. Lett. 2021, 19, 1001405. [CrossRef]
20. Kuang, Q.; Yu, T. MetPGNet: Meteorological Prior Guided Network for Temperature Forecasting. IEEE Geosci. Remote. Sens. Lett.
2021, 19, 1004305. [CrossRef]
21. Li, S.; Fu, H.; Lo, W.L. Meteorological visibility evaluation on webcam weather image using deep learning features. Int. J. Comput.
Theory Eng. 2017, 9, 455–461. [CrossRef]

22. Giyenko, A.; Palvanov, A.; Cho, Y. Application of convolutional neural networks for visibility estimation of CCTV images. In
Proceedings of the 2018 International Conference on Information Networking (ICOIN), Chiang Mai, Thailand, 10–12 January
2018; pp. 875–879.
23. Palvanov, A.; Cho, Y.I. Visnet: Deep convolutional neural networks for forecasting atmospheric visibility. Sensors 2019, 19, 1343.
[CrossRef]
24. Yan, X.; Luo, Y.; Zheng, X. Weather recognition based on images captured by vision system in vehicle. In Proceedings of the
International Symposium on Neural Networks, Wuhan, China, 26–29 May 2009; pp. 390–398.
25. Lu, C.; Lin, D.; Jia, J.; Tang, C.K. Two-class weather classification. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3718–3725.
26. An, J.; Chen, Y.; Shin, H. Weather classification using convolutional neural networks. In Proceedings of the 2018 International
SoC Design Conference (ISOCC), Daegu, Korea, 12–15 November 2018; pp. 245–246.
27. Guerra, J.C.V.; Khanam, Z.; Ehsan, S.; Stolkin, R.; McDonald-Maier, K. Weather Classification: A new multi-class dataset,
data augmentation approach and comprehensive evaluations of Convolutional Neural Networks. In Proceedings of the 2018
NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Edinburgh, UK, 6–9 August 2018; pp. 305–310.
28. Zhou, K.; Cheng, X.; Tan, M.; Li, H. Visibility estimation based on dark channel prior and image entropy. J. Nanjing Univ. Posts
Telecommun. (Nat. Sci. Ed.) 2016, 36, 90–95.
29. Bae, T.W.; Han, J.H.; Kim, K.J.; Kim, Y.T. Coastal Visibility Distance Estimation Using Dark Channel Prior and Distance Map
Under Sea-Fog: Korean Peninsula Case. Sensors 2019, 19, 4432. [CrossRef] [PubMed]
30. Zhang, H.; Patel, V.M. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3194–3203.
31. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image
Process. 2016, 25, 5187–5198. [CrossRef] [PubMed]
32. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks. In
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 154–169.
33. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. An all-in-one network for dehazing and beyond. arXiv 2017, arXiv:1707.06543.
34. Ancuti, C.; Ancuti, C.O.; De Vleeschouwer, C. D-hazy: A dataset to evaluate quantitatively dehazing algorithms. In Proceedings
of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2226–2230.
35. Ancuti, C.; Ancuti, C.O.; Timofte, R.; Vleeschouwer, C.D. I-HAZE: A dehazing benchmark with real hazy and haze-free indoor
images. In Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems, Poitiers, France,
24–27 September 2018; pp. 620–631.
36. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. O-haze: A dehazing benchmark with real hazy and haze-free outdoor
images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT,
USA, 18–22 June 2018; pp. 754–762.
37. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Trans. Image
Process. 2018, 28, 492–505. [CrossRef]
38. Ancuti, C.O.; Ancuti, C.; Sbert, M.; Timofte, R. Dense-haze: A benchmark for image dehazing with dense-haze and haze-free
images. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September
2019; pp. 1014–1018.
39. Liu, Y.; Zhu, L.; Pei, S.; Fu, H.; Qin, J.; Zhang, Q.; Wan, L.; Feng, W. From synthetic to real: Image dehazing collaborating with
unlabeled real data. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021;
pp. 50–58.
40. Ancuti, C.O.; Ancuti, C.; Timofte, R. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free
images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA,
14–19 June 2020; pp. 444–445.
41. Zhang, X.; Dong, H.; Pan, J.; Zhu, C.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Wang, F. Learning to restore hazy video: A new real-world
dataset and a new method. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville,
TN, USA, 19–25 June 2021; pp. 9239–9248.
42. Zhang, Z.; Ma, H. Multi-class weather classification on single images. In Proceedings of the 2015 IEEE International Conference
on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 4396–4400.
43. Chu, W.T.; Zheng, X.Y.; Ding, D.S. Camera as weather sensor: Estimating weather information from single images. J. Vis. Commun.
Image Represent. 2017, 46, 233–249. [CrossRef]
44. Lin, D.; Lu, C.; Huang, H.; Jia, J. RSCM: Region selection and concurrency model for multi-class weather recognition. IEEE Trans.
Image Process. 2017, 26, 4154–4167. [CrossRef]
45. Zhao, B.; Hua, L.; Li, X.; Lu, X.; Wang, Z. Weather recognition via classification labels and weather-cue maps. Pattern Recognit.
2019, 95, 272–284. [CrossRef]
46. Narasimhan, S.G.; Wang, C.; Nayar, S.K. All the images of an outdoor scene. In Proceedings of the European Conference on
Computer Vision, Copenhagen, Denmark, 28–31 May 2002; pp. 148–162.
47. Ancuti, C.O.; Ancuti, C. Single image dehazing by multi-scale fusion. IEEE Trans. Image Process. 2013, 22, 3271–3282. [CrossRef]

48. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the
International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October
2015; pp. 234–241.
49. Karsch, K.; Liu, C.; Kang, S.B. Depth transfer: Depth extraction from video using non-parametric sampling. IEEE Trans. Pattern
Anal. Mach. Intell. 2014, 36, 2144–2158. [CrossRef]
50. Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep ordinal regression network for monocular depth estimation. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018;
pp. 2002–2011.
51. Kuznietsov, Y.; Stuckler, J.; Leibe, B. Semi-supervised deep learning for monocular depth map prediction. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6647–6655.
52. Pilzer, A.; Xu, D.; Puscas, M.; Ricci, E.; Sebe, N. Unsupervised adversarial depth estimation using cycled generative networks. In
Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 587–595.
53. Godard, C.; Mac Aodha, O.; Firman, M.; Brostow, G.J. Digging into self-supervised monocular depth estimation. In Proceedings
of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019;
pp. 3828–3838.
54. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
55. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote. Sens. Lett. 2018, 15, 749–753. [CrossRef]
56. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
57. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International
Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
