Electronics 13 02790

electronics
Article
Real-Time Deep Learning Framework for Accurate Speed
Estimation of Surrounding Vehicles in Autonomous Driving
Iván García-Aguilar 1,2 , Jorge García-González 1,2 , Enrique Domínguez 1,2 , Ezequiel López-Rubio 1,2
and Rafael M. Luque-Baena 1,2, *
1 Institute of Software Technologies and Engineering (ITIS), University of Málaga, C/Arquitecto Francisco
Peñalosa, 18, 29010 Málaga, Spain; ivangarcia@uma.es (I.G.-A.); jorgegarcia@uma.es (J.G.-G.);
enriqued@uma.es (E.D.); elr@uma.es (E.L.-R.)
2 Biomedical Research Institute of Málaga (IBIMA), C/Doctor Miguel Díaz Recio, 28, 29010 Málaga, Spain
* Correspondence: rmluque@uma.es
Abstract: Accurate speed estimation of surrounding vehicles is of paramount importance for au-
tonomous driving to prevent potential hazards. This paper emphasizes the critical role of precise
speed estimation and presents a novel real-time framework based on deep learning to achieve this
from images captured by an onboard camera. The system detects and tracks vehicles using convolu-
tional neural networks and analyzes their trajectories with a tracking algorithm. Vehicle speeds are
then accurately estimated using a regression model based on random sample consensus. A synthetic
dataset using the CARLA simulator has been generated to validate the presented methodology. The
system can simultaneously estimate the speed of multiple vehicles and can be easily integrated into
onboard computer systems, providing a cost-effective solution for real-time speed estimation. This
technology holds significant potential for enhancing vehicle safety systems, driver assistance, and
autonomous driving.
Keywords: speed estimation; deep learning; convolutional neural networks; regression

Citation: García-Aguilar, I.;
García-González, J.; Domínguez, E.;
López-Rubio, E.; Luque-Baena, R.M.
Real-Time Deep Learning Framework 1. Introduction
for Accurate Speed Estimation of The impact of the actual Advanced Driver Assistance Systems (ADAS) is profound in
Surrounding Vehicles in Autonomous today’s automotive landscape. ADAS contributes to a reduction in road fatalities by mini-
Driving. Electronics 2024, 13, 2790. mizing human errors [1,2]. However, pursuing even higher levels of autonomy (specifically,
https://doi.org/10.3390/
self-driving vehicles rather than mere driver support features) remains an ambitious challenge.
electronics13142790
We must delve into the intricacies of subsystems within the perception and decision-
Academic Editors: Qiang Wang, making engines to achieve this goal. These subsystems operate under adverse conditions,
Weihong Ren and Huijie Fan including lightning, severe weather, and complex traffic scenarios. The Society of Auto-
motive Engineers (SAE) has meticulously defined six levels of driving automation. At the
Received: 21 June 2024
pinnacle of this hierarchy lies Level 5, denoting a vehicle capable of navigating all situations
Revised: 11 July 2024
without human intervention [3].
Accepted: 12 July 2024
Published: 16 July 2024
To ascend to these lofty levels of automation, robust perception algorithms and
anomaly detection methods are essential. Additionally, assimilating vast amounts of
data (coupled with continuous learning) is crucial. The journey toward Level 5 automation
remains exciting and challenging as we strive for safer, more autonomous transportation.
Copyright: © 2024 by the authors. Tracking surrounding vehicles is pivotal as a critical subsystem within perception
Licensee MDPI, Basel, Switzerland. tasks. Its significance extends across various features, including lane detection, active
This article is an open access article brake systems, and identifying hazardous maneuvers [4,5]. The field of target tracking
distributed under the terms and has undergone extensive study, particularly in the context of surveillance and defense-
conditions of the Creative Commons related applications [6–9]. Research in this area delves into complex challenges related to
Attribution (CC BY) license (https:// multi-target tracking, extent estimation, and track-before-detect methods [8,10,11]. In these
creativecommons.org/licenses/by/ scenarios, RADAR emerges as the sensor of choice.
4.0/).
Electronics 2024, 13, 2790. https://doi.org/10.3390/electronics13142790 https://www.mdpi.com/journal/electronics

Electronics 2024, 13, 2790 2 of 15
Modern automobiles come equipped with two types of automotive RADAR sen-
sors: short-range (SRR) and long-range (LRR). These sensors operate in the 24 GHz and
76–81 GHz frequency bands, respectively. While RADAR sensors remain unaffected by
adverse weather conditions such as lightning, fog, and rain, their performance can be sig-
nificantly compromised. The prevalence of metallic surfaces in automotive environments
leads to cluttered signals and low signal-to-noise ratios (SNR).
As an appealing alternative to RADAR, LiDAR sensing offers several advantages.
LiDAR is less susceptible to clutter noise and provides substantially higher resolution.
However, the current production costs of LiDARs render them prohibitively expensive
for most vehicles. Additionally, integrating LiDAR data necessitates specialized graphical
processing [12].
In response to these challenges, windshield-installed cameras are gaining popularity.
These cameras capture images that are subsequently processed to identify lane markings,
pedestrians, and speed limits on traffic signs [13]. Moreover, user experience design
increasingly relies on vision-based stimuli rather than traditional RADAR information [14].
In the dynamic realm of automobile technology, the domain of target tracking within
video sequences remains surprisingly underexplored. Despite the growing popularity of
camera-based algorithms, the intricacies of video-based target tracking have not received
the attention they deserve. This gap in research motivated our investigation into camera-
based relative speed estimation.
Our innovative approach centers around a single windshield-mounted camera, which
captures images of the surrounding environment. An object detector then meticulously
processes these images, allowing us to extract precise vehicle bounding boxes. The centroids
of these bounding boxes serve as key reference points for tracking the vehicles as they move
through the scene. By analyzing changes in their size over time, we can infer the relative
velocity of the tracked vehicles. Remarkably, this entire process occurs autonomously,
without any human intervention.
Our contribution to the field is significant: we present a video-based relative speed
estimation solution. What sets our approach apart is its adaptability. We allow flexibility in
choosing the vehicle detection algorithm, ensuring compatibility with various scenarios
and sensor setups. Additionally, our method exhibits resilience against unwanted outliers
and inaccuracies thanks to our utilization of robust estimation techniques.
In summary, our work bridges a critical gap in the study of target tracking, unlocking
new possibilities for safer and more efficient automotive systems.
The remainder of this paper is structured as follows. Section 2 details the state of
the art and related works in the field of the presented methodology. Section 3 introduces
the convolutional neural networks for detection, the regression models, and the proposed
methodology outlined in this article. Section 4 describes the experiments and results, includ-
ing the selected dataset, the metrics used for evaluation, and the quantitative and qualitative
results. Section 5 discusses the results. Finally, Section 6 comprises the conclusions and
future directions.
2. Related Works
The first step of this proposal is to detect vehicles within an image obtained by an
onboard camera. Generic object detection is a long-standing problem that has been widely
explored in computer vision. Classical object detection methods using manually designed
features could be used, but models based on deep convolutional neural networks trained
to infer features are faster and more accurate automatically. These methods can be divided
into two main paradigms: two-stage detection models and one-stage detection models.
Methods from the first set first propose region candidates where there could be an object
and then apply an object classifier model to determine if there is any object and its class.
Examples are CenterNet [15], R-CNN [16], Fast R-CNN [17], Faster R-CNN [18], and
EfficientNet [19]. Although the accuracy of these methods is usually high, their two-step
paradigm is computationally demanding and requires more time to perform the detection.
Electronics 2024, 13, 2790 3 of 15
The reason for discarding them is the time-dependent nature of the proposal. Two-stage
models include SSD (single-shot multi-box detector) and YOLO (you only look once). These
models create candidate regions and their detected objects within a single process flow,
avoiding unnecessary calculations. Usually, they are faster than two-stage models but with
lower accuracy or more instability. However, YOLO’s latest proposals are not only fast but
also reliable. This makes them the default candidate when it comes to real-time applications.
After detecting vehicles, our proposal must track them for a long time. The same
vehicle in consecutive video frames must be identified as the same vehicle. This problem
has been studied within computer vision for surveillance tasks for decades. Although deep
learning solutions for this problem are viable, classical methods are also competitive ways
to solve it. The SORT algorithm tracks multiple objects simultaneously and online using
state estimation techniques and data association. DeepSORT is an extension that also uses
appearance features obtained via deep learning. Other options like [20] use backtracking to
refine anomalies. Combining DeepSORT, YOLO, and non-maximum-suppression is used
by [21] to filter trajectories.
Speed estimation is the last step once vehicles are detected and properly tracked.
Several ways of dealing with this problem have been proposed when working with static
cameras. Homographic transformations based on manually selected reference points to
correct the camera’s perspective are widely used to correctly measure distances properly
and, thus, speeds [22]. Ref. [23] uses evolutionary algorithms to align points between
planes and adapt to perspective changes. Neural networks are also used to measure speed.
Ref. [24] proposes the creation of a perspective transformation based on a geometry of
vanishing points. In that way, from 2D, bounding boxes can be extended to 3D to obtain
speeds. When the camera is not static, the problem is more complex since most options
applied to the static problem are not useful. This is why, when working with measures
from a moving vehicle, more specific hardware like RADAR and LiDAR is usually required.
Ref. [25] summarizes the general approach to vehicle speed estimation from static traffic
or speed cameras. The general approach implies obtaining a relation between pixels and
meters, usually obtained from camera intrinsic and extrinsic parameters in combination
with terrain information. Ref. [26] is an example of this approach.
Another typical approach is to rely on specific hardware such as LiDAR to perform
the speed estimation [27–29]. Even if these approximations are the most accurate and have
been extensively studied in the literature, their major problem is that they rely on hardware
systems useful only for measuring distances and with a high monetary cost.
Synthetic datasets have become increasingly prevalent in validating vehicle detection
and speed estimation systems. The CARLA simulator, as discussed by Dosovitskiy et al. [30],
has provided a versatile platform for generating realistic driving scenarios, enabling extensive
testing and validation of autonomous driving algorithms. This approach has been adopted by
several studies, such as Ros et al. [31] and Gaidon et al. [32], to augment training data and
enhance model performance.
3. Models and Methods

In this section, the proposed methodology is denoted as RT-VE (real-time vehicle esti-
mation) is described in detail. The provided method is shown in Figure 1. The methodology
is designed to accurately estimate the speed of surrounding vehicles in real time, leveraging
deep learning techniques. Initially, images captured by an onboard camera are used as
input. Convolutional neural networks (CNNs) are employed to detect and track vehicles
within these images. The detected trajectories are then analyzed using a robust tracking
algorithm. Vehicle speeds are estimated using a regression model based on random sample
consensus (RANSAC). A synthetic dataset has been generated to validate the presented
methodology, ensuring the system’s reliability and accuracy. The framework is capable of
simultaneously estimating the speed of multiple vehicles and can be seamlessly integrated
into onboard computer systems, offering a cost-effective solution for real-time speed esti-
mation. This technology has significant potential applications in vehicle safety systems,
Electronics 2024, 13, 2790 4 of 15
driver assistance, and autonomous driving. Below, each of the components comprising the
proposed methodology is detailed.
Figure 1. Flow diagram of the proposed approach.
3.1. Convolutional Neural Networks for Object Detection

The object detection step is the most computationally heavy of all the processes in
the proposed methodology. To achieve real-time, the YOLO model was selected following
other state-of-the-art proposals, such as [33]. Identifying vehicles whose speed must be
estimated requires the application of an object detection model in real time. For this
purpose, YOLOv7x (you only look once version 7X) [34] has been utilized. Nevertheless,
the methodology presented is not exclusively dependent on this specific model, allowing for
the potential use of other models. YOLOv7x represents a significant advancement within
the YOLO model family, renowned for its exceptional accuracy and real-time processing
capabilities. Starting from a pre-trained state with the COCO dataset [35], this model has
demonstrated the ability to rapidly and efficiently identify and locate vehicles in high-traffic
areas under a wide range of conditions.
The operation of the YOLOv7x model is predicated on dividing the input image
into a grid. Subsequently, each cell within the grid simultaneously predicts a series of
bounding boxes and the probability of the class to which each detected object belongs. The
architecture of this model is meticulously designed for real-time processing, enabling high-
speed detection of numerous objects in a single pass. This capability is crucial for scenarios
that demand immediate and precise object detection, such as autonomous driving systems.
Integrating this model with a tracking algorithm makes it feasible to uniquely identify
and locate vehicles in each frame of the video sequence under evaluation. This integration
is essential for determining the position of vehicles over time, thereby establishing the
foundation for the subsequent speed estimation phase. The speed estimation is based on
analyzing the temporal changes in the bounding box that encloses each vehicle. As the
bounding box’s dimensions and position evolve over consecutive frames, the tracking
algorithm provides the necessary data to accurately calculate the vehicle’s speed.
Electronics 2024, 13, 2790 5 of 15
3.2. Regression Models

The random sample consensus algorithm (RANSAC) [36] is one of the most robust
methods for estimating model parameters from a dataset containing a significant proportion
of outliers. Unlike several traditional techniques, which outliers can highly influence,
RANSAC performs an iterative selection of randomly generated subsets, fitting the model.
It then performs a consensus counting of the number of outliers, determined as those
points that fit well within a specified margin of error. In this way, it is possible to identify
the model that best represents most of the data, thus effectively minimizing the impact
of outliers.
In scenarios based on driver support systems, the data captured by onboard cameras
may contain several inaccuracies due to factors such as changing lighting conditions and
occlusions, which can influence the determination of the exact location of vehicles, thus
introducing some outliers into the dataset. This can significantly bias the results. By
employing RANSAC, this estimation process becomes more robust in obtaining reliable
estimates. Although RANSAC is suitable for this application, the use of other regression
models is not mutually exclusive. The methodology is designed to be flexible, allowing
the substitution of such a model for a more appropriate one according to the scope of
the application.
3.3. Methodology
This section outlines our approach to identifying vehicles moving at dangerous speeds.
Prior to deployment in real-world scenarios, the methodology underwent extensive testing
on a computer platform. Synthetic video data were utilized to simulate diverse driving
scenarios within the CARLA simulator, a tool designed for autonomous driving research.
This simulated environment mimics urban settings with varying traffic dynamics. The
processing was conducted under conditions similar to those expected in deployment, lever-
aging the computer’s higher computational capabilities to facilitate parallel testing and
model refinement. This approach ensured that the detection and tracking algorithms were
thoroughly validated and optimized before transitioning to field trials and actual deploy-
ment scenarios. Initially, we utilize a deep convolutional neural network to perform object
detection on each video frame, producing a list of detected vehicles and their bounding
boxes for each frame. Subsequently, we employ an object-tracking method to ascertain
the trajectories of the vehicles. A trajectory is considered as a sequence of a vehicle’s posi-
tions across video frames, marked by the corresponding bounding boxes, acknowledging
that some intermediate frames may lack detections if the vehicle was not identified in
those frames.
Let t denote the current frame index within the acquired video sequence. Additionally,
let δ represent the angular diameter of an object, expressed in radians, i.e., the apparent
diameter. Then we have the following:
d
δ = 2 arctan (1)
2D
where D represents the distance from the camera to the object of interest, and d is the actual
diameter of the object, both measured in meters. From (1), we have the following:
d
=D (2)
2 tan 2δ
In practice, the approaching vehicle is visible for a brief period. Thus, it can be
assumed that its speed, v, remains constant relative to the camera during this interval:
D = e0 + vt (3)
where e0 represents the distance at time t = 0 to the other vehicle.

Electronics 2024, 13, 2790 6 of 15
Given the assumption that the speed is constant, from (2) and (3), the following can
be obtained:
d
= e0 + vt (4)
2 tan 2δ
Additionally, let us assume that the distance from the camera to the vehicle is signif-
icantly larger than the vehicle’s size, i.e., D ≫ d. Under this condition, the small angle
approximation α ≈ arctan α can be applied to (4). This results in the following:
1 e + vt
= 0 (5)
δ d
It is essential to highlight that d and e0 are assumed constant for each vehicle. Therefore,
from Equation (5), it can be obtained that the inverse of the apparent diameter 1δ is linearly
related to the time index t. A practical approach to approximating the apparent diameter δ
of a vehicle is to take the square root of the number of pixels (area) in its bounding box.
Let y denote the collected samples for the approximation, i.e., the values of 1δ . Using
this information, the slope of the line associated with Equation (5) can be computed via
linear regression. This slope defines the speed v of the incoming vehicle, relative to the
onboard camera. For each frame of the acquired video where the incoming vehicle is
detected, a sample is collected for the linear regression method. It can be assumed without
loss of generality that d = 1 in Equation (5) for the sake of performing the linear regression.
However, this implies that the relative speeds v are not calibrated.
After the linear regression is carried out, the computation of a calibration constant K
is performed by comparison of the non-calibrated speeds v with the ground truth speeds.
Once this procedure is completed, the uncalibrated speeds should be multiplied by K to
yield the calibrated estimated speeds.
The random sample consensus (RANSAC) linear regression method has been utilized
to estimate the speed v. This method automatically identifies and excludes outliers yi
within the context of a standard linear regression. The RANSAC algorithm for linear
regression includes the following steps:
1. Draw a random subset S of n data samples:
S ⊆ {( xi , yi )}iN=1 (6)
2. A linear model is fit to the following subset:
y = β0 + β1 x (7)
where the coefficients β 0 and β 1 are found by minimization:
min
β 0 ,β 1
∑ (yi − ( β 0 + β 1 xi ))2 (8)
( xi ,yi )∈S
3. Inliers are identified, i.e., data samples from the complete set that are found to lie
within a threshold ϵ of the model:
inliers = {( xi , yi ) | |yi − ( β 0 + β 1 xi )| < ϵ} (9)
4. If the inlier count |inliers| exceeds a preset threshold T, then the model must be fit
again, this time employing all inliers:
min
β 0 ,β 1
∑ (yi − ( β 0 + β 1 xi ))2 (10)
( xi ,yi )∈inliers
Electronics 2024, 13, 2790 7 of 15
5. Repeat the preceding steps 1–4 for a preset iteration count k or until convergence.
The final model is the best one that has been encountered through the execution of
the loop:
ŷ = β̂ 0 + β̂ 1 x (11)
The final stage of our method involves applying a threshold to the calibrated estimated
speeds. The purpose of this procedure is to identify and flag approaching vehicles with
excessively high speeds as dangerous.
4. Experiments and Results

Below, the selected dataset is presented, along with the evaluation metrics estab-
lished to verify the robustness of the proposed methodology. Finally, the results obtained
are shown.
4.1. Dataset
CARLA [30], an open-source simulator for research in the autonomous driving envi-
ronment, has been used to generate the required data for this study. It provides a realistic
and flexible environment that allows the generation of various driving scenarios. Thus,
it offers many advantages in simulating complex urban environments, such as applying
multiple weather conditions or creating specific traffic scenarios. The simulator is an ideal
tool for generating large and customizable synthetic data sets based on this flexibility.
A 20-min driving sequence has been generated to analyze the results obtained after
applying the proposed methodology, which contains a series of vehicles circulating on the
road performing a series of maneuvers. The generated driving sequence represents a typical
urban environment with a mix of straight roads, intersections, and roundabouts to reflect
common driving conditions. The simulation was conducted under clear weather conditions
to ensure consistent data quality. The vehicles within the scenario performed a variety
of maneuvers, including acceleration, deceleration, and lane changes, to capture a broad
spectrum of driving behaviors. Each vehicle’s speed and position data were recorded at a
high frequency, providing detailed ground truth information for every frame. This detailed
data collection allows for precise validation of the speed estimation model. A single, well-
defined scenario ensures controlled conditions, minimizing external variables that could
affect the results, and allows for a focused assessment of the model’s performance.
During this simulation, the real data of the speed of each vehicle expressed in meters
per second has been recorded for each frame. This information makes it possible to
accurately validate the model by estimating the established speeds. The use of synthetic
data generated by Carla is particularly advantageous in this context because, at the moment,
there are no publicly available datasets that provide complete real scenarios with the exact
actual speeds of each vehicle and of the vehicle capturing the images at each instant.
Another advantage of using this simulator is that it allows large volumes of information
generation without the ethical concerns associated with real-world data collection. An
example of one of the frames that make up the sequence is shown in Figure 2.
Electronics 2024, 13, 2790 8 of 15
Figure 2. Example of a frame generated with Carla’s simulator.
4.2. Metrics
A carefully selected set of metrics was employed to evaluate the performance of
the velocity estimation method. These metrics were chosen to provide a comprehensive
assessment of actual and estimated velocities and the accuracy and consistency of the
model predictions.
The mean speed metric serves as a baseline measure of the actual and estimated
speeds, helping contextualize the model’s performance. Alongside this, the median speed
offers a robust measure of central tendency less sensitive to outliers than the mean, giving a
more accurate representation of typical speeds. The standard deviation of both the real and
estimated speeds is used to indicate variability, allowing us to see the range and distribution
of the speeds. This helps understand the consistency and spread of the speed values in
the dataset.
Regarding error metrics, the mean absolute error provides a straightforward measure
of prediction accuracy by indicating the average error magnitude in the model’s speed
estimations. Complementing this, the mean squared error emphasizes more significant
errors due to the squaring process, offering a more sensitive measure of prediction accuracy.
Similarly, the median absolute error provides a robust measure of prediction accuracy
less affected by outliers, indicating the typical prediction error. While similar to the mean
squared error, the median squared error is more robust to outliers, providing a balanced
measure of more significant errors.
Additionally, the standard deviation of error indicates the variability in the prediction
errors, which helps understand the consistency of the model’s performance. Finally, the
coefficient of determination measures how well the estimated speeds approximate the real
speeds, with a value closer to 1 indicating a better fit for the model.
4.3. Results
This section presents a detailed analysis of the speed estimation for several vehicles.
Tests have been conducted using synthetic data on a high-performance computer to evaluate
the accuracy and performance of the detection model. The experiments were performed on
a system with an NVIDIA (Santa Clara, CA, USA) GeForce RTX 3080 GPU and 64 GB of
RAM, ensuring robust computational capabilities. Despite these tests being conducted in a
controlled, high-resource environment, the detection model has been specifically adapted
for real-time execution on low-cost devices embedded directly within the vehicle. This
adaptation is crucial for practical deployment scenarios where computational resources are
limited. The decision to utilize YOLO as the detection algorithm stems from its efficiency
Electronics 2024, 13, 2790 9 of 15
and effectiveness in real-time object detection, making it well-suited for implementation on

embedded systems.
The experiments utilized an object detection model to conduct the numerical studies.
The YOLO (you only look once) algorithm was employed explicitly for real-time object
detection tasks. Several assumptions and boundary conditions were established for the
experiments. Firstly, it was assumed that the synthetic data used for training and testing
accurately represented real-world scenarios, including variations in vehicle speed, lighting
conditions, and environmental factors. The boundary conditions included setting vehicles’
initial positions and speeds within a predefined range to ensure consistency across all
tests. Moreover, it was assumed that the detection model would operate under controlled
conditions, meaning that extreme weather conditions or highly irregular road surfaces were
not part of the testing scenarios. The model was also tested assuming that the input data had
minimal noise, although some degree of noise and anomalies were considered to evaluate
the model’s robustness. Finally, the model’s performance was assessed, assuming it would
be deployed on low-cost devices with limited computational resources, which guided the
choice of the YOLO algorithm for its efficiency and effectiveness in real-time processing.
The analysis compares the ground truth (GT) speeds with the estimated speeds, and
various statistical measures are computed, including the absolute error, mean squared error
(MSE), median absolute error, median squared error, and coefficient of determination (R2 ).
Table 1 provides a summary of the statistical measures for each vehicle.
Table 1. Comparison of real and estimated vehicle speeds (m/s) with error metrics for several vehicles.
Real Estimated Absolute Mean Median Median Coefficient of

Car
Speed Speed Error Squared Absolute Squared Determination
Error Error Error
10 10.09 ± 5.86 11.05 ± 7.21 3.29 31.59 2.08 4.34 0.08
22 11.68 ± 5.21 11.73 ± 6.64 2.64 27.49 1.12 1.24 −0.01
50 10.78 ± 4.79 11.12 ± 6.47 3.11 27.04 1.63 2.65 −0.18
53 6.75 ± 3.69 6.39 ± 4.19 2.81 21.46 1.35 1.82 −0.58
56 2.09 ± 2.06 2.90 ± 2.12 2.26 11.85 1.19 1.42 −1.80
59 7.86 ± 4.35 8.53 ± 6.51 3.18 28.21 1.71 2.93 −0.49
60 6.90 ± 6.15 7.40 ± 6.88 2.60 23.38 0.55 0.30 0.38
82 5.94 ± 3.31 6.65 ± 3.82 2.43 26.56 0.96 0.92 −1.42
122 11.08 ± 5.56 10.97 ± 6.81 2.32 17.34 1.22 1.50 0.44
125 8.53 ± 5.39 9.55 ± 6.42 3.21 30.34 1.80 3.23 −0.04
143 11.34 ± 6.56 12.02 ± 7.07 2.70 20.94 1.42 2.01 0.51
148 5.28 ± 4.31 4.81 ± 4.68 1.73 9.45 0.78 0.61 0.49
157 9.60 ± 4.22 9.47 ± 4.88 2.05 12.74 1.11 1.23 0.29
166 6.63 ± 2.73 7.01 ± 4.75 2.29 25.35 0.74 0.54 −2.40
176 9.67 ± 4.24 9.78 ± 6.27 3.11 27.76 1.94 3.75 −0.54
The estimated speeds are generally close to the real speeds, with some deviations.
For example, vehicle 10 has an average real speed of 10.09 ± 5.86 and an estimated speed
of 11.05 ± 7.21. This indicates that the estimation method performs reasonably well in
general. According to the absolute and mean squared error, it varies among the vehicles.
For instance, vehicle 56 has a relatively high Absolute Error of 2.26 and an MSE of 11.85,
suggesting significant deviations in speed estimation for this vehicle.
The median errors provide additional insight into the performance of the estimation
method. Vehicle 60 shows a low median absolute error of 0.55 and a median squared error
of 0.30, indicating accurate estimation for the majority of the data points.
Finally, the coefficient of determination varies significantly, with several vehicles
showing positive R2 values, indicating a good fit, while others have negative values,
suggesting poor performance in those cases. For instance, vehicle 166 has an R2 of −2.40,
indicating a poor fit.
Electronics 2024, 13, 2790 10 of 15
Figures 3–6 provide a visual representation of the speeds for selected vehicles:
Figure 3. Comparison between actual and estimated speeds for the vehicle with ID 122.
The speeds for vehicle 122 are shown in Figure 3. There is a good alignment between
the GT and estimated speeds, although some discrepancies are observed, particularly in
the middle section of the frame range.
Figure 4 depicts the speeds for vehicle 148. The GT and estimated speeds show a high
level of correspondence throughout the frames, with minor deviations. The plot suggests
that the estimation method performs well for this vehicle.
Electronics 2024, 13, 2790 11 of 15
Figure 5 illustrates the speeds for vehicle 157. The GT and estimated speeds are
closely aligned for the majority of the frames. However, there are spikes in the estimated
speed that do not correspond to the GT speed, indicating some outliers or noise in the
estimation process.
In Figure 6, the GT speed (red) and the estimated speed (green) for vehicle 166 are plot-
ted against the frame number. The plot shows periods of close alignment between the real
and estimated speeds, with occasional significant deviations, particularly at higher speeds.
In Figure 7, an example is presented in which the speed of a vehicle is estimated. This
illustration highlights the methodology employed to calculate the velocity, providing a
clear demonstration of the estimation process in practical application.
Electronics 2024, 13, 2790 12 of 15
Figure 7. Example of speed estimation for two specific vehicles.
5. Discussion
Analyzing the speed estimation for several vehicles reveals several key insights and
areas for improvement. This section discusses the implications of the statistical results and
figures, as well as potential factors influencing the performance of the estimation method.
One of the most prominent observations from the results is the variability in the esti-
mation performance across different vehicles. While the mean estimated speeds generally
align closely with the real speeds, the errors and the coefficient of determination (R2 ),
values exhibit considerable variation. Vehicles such as 122 and 143 show relatively high R2
values (0.44 and 0.51, respectively), indicating a good fit between the real and estimated
speeds. Conversely, vehicles like 166 and 56 have significantly negative R2 values (−2.40
and −1.80, respectively), suggesting poor performance.
The variability in performance can be attributed to several factors:
• Vehicle speed dynamics: Vehicles with more dynamic and variable speeds may pose a
greater challenge for the estimation method, leading to higher errors. This is evident
in vehicles like 56, with high absolute and mean squared errors.
• Sensor and environmental factors: The quality of sensor data and environmental
conditions such as lighting, weather, and road conditions can impact speed estima-
tion accuracy. Variations in these factors across different vehicles and frames may
contribute to discrepancies in the results.
• Algorithm limitations: The underlying assumptions and limitations of the estimation
algorithm itself could result in varying accuracy. For example, if the algorithm is more
sensitive to specific speed ranges or vehicle types, this could explain the observed
performance differences.
According to the error analysis metrics, absolute, mean squared error (MSE), median
absolute, and median squared error provide a comprehensive picture of the estimation
accuracy. Vehicles with lower errors, such as 148 and 60, indicate that the estimation
method is highly reliable for those cases. In contrast, vehicles like 166 and 56, with higher
errors, highlight potential outliers or specific scenarios where the method struggles.
• Absolute and median errors: Lower median errors in vehicles such as 60 (Median
absolute error of 0.55) suggest that the estimation method performs consistently well
for most frames. However, higher absolute errors indicate that there are instances
with significant deviations.
• Squared errors: The MSE and median squared error metrics emphasize the impact of
larger deviations. For instance, vehicle 56 has an MSE of 11.85, reflecting the influence
of a few large errors on the overall performance.
Figures provide visual evidence of the alignment and discrepancies between the
ground truth and estimated speeds, serving as a diagnostic tool to identify specific frames or
periods where the estimation method succeeds or fails. For instance, Figure 1 (Vehicle 166)
reveals both good alignment and significant deviations, particularly at higher speeds,
suggesting challenges with fluctuating speeds. Figure 2 (Vehicle 157) indicates generally
Electronics 2024, 13, 2790 13 of 15
good performance with occasional outliers. Figure 3 (Vehicle 148) demonstrates high
effectiveness with minor deviations, while Figure 4 (Vehicle 122) shows good overall
alignment but highlights specific segments with potential challenges. These visual insights
are crucial for understanding and improving the estimation method’s performance.
The discrepancies between the real and predicted values can be attributed to several
factors. One primary factor is vehicle speed’s dynamic and variable nature, which can pose
a significant challenge for the estimation model. Vehicles that exhibit abrupt or frequent
speed changes, as observed in the cases of vehicles 56 and 166, tend to generate higher
errors due to the model’s difficulty in quickly adapting to these changes. Additionally,
environmental conditions play a crucial role. Factors such as lighting, weather conditions,
and road surface quality can impact the accuracy of the input data, leading to discrepancies
in the estimated speeds. The algorithm’s capacity to handle noise or anomalies in the data
is also limited; spikes or deviations in the estimated speeds, as seen in vehicle 157, suggest
the presence of noise or anomalous data that the model fails to filter adequately. Finally,
inherent limitations of the estimation algorithm, especially if it is not optimized for all
speed ranges or vehicle types, can contribute to variability in performance.
Based on the analysis, several recommendations can be made to enhance the speed
estimation method:
• Algorithm refinement: Further refining the estimation algorithm to better accommo-
date dynamic speed changes and reduce sensitivity to outliers could improve accuracy.
Incorporating machine learning techniques to adapt to different vehicle dynamics
might be beneficial.
• Error handling and correction: Implementing error correction mechanisms, such as
filtering techniques to smooth out spikes or anomalies in the estimated speeds, could
enhance overall performance.
• Contextual adaptation: Developing context-aware algorithms that adjust their param-
eters based on real-time conditions (e.g., road type, traffic density) could lead to more
accurate and reliable speed estimation.
6. Conclusions and Future Work

In this work, based on vehicles detected from a single onboard camera, the speeds of
those other vehicles are estimated using a linear regression model. The proposal requires no
other input or hardware to measure distance or relative speed, so we consider it a promising
research line to obtain an alternative to other highly costly hardware such as LIDAR.
Experiments have been carried out using data obtained through the CARLA simulator.
This approach was selected due to the difficulty of obtaining actual data with ego and
relative velocity for all vehicles. Experiments show that the proposal works within a
reasonable error margin, given the limited information used to obtain the estimation.
As stated, the proposal is based on detections made using an object detection model.
Improvements in this key element would imply greater stability in tracking the vehicles
and their speeds. Also, including error handling during the tracking phase could lead to
fewer anomalies in the speed estimation. Other techniques to detect and smooth those
anomalies could be explored. However, anomaly detection needs to be online to work in
real-time as a drive assistant system, and very few samples after the anomaly event are
required not to slow the detection of genuine speed spikes.
Author Contributions: Conceptualization, E.L.-R. and R.M.L.-B.; methodology, I.G.-A., J.G.-G. and
R.M.L.-B.; software, E.D.; validation, I.G.-A., J.G.-G. and E.L.-R.; formal analysis, E.L.-R.; investigation,
I.G.-A., J.G.-G., E.D., E.L.-R. and R.M.L.-B.; resources, E.D.; data curation, I.G.-A.; writing—original
draft preparation, I.G.-A., J.G.-G., E.D., E.L.-R. and R.M.L.-B.; writing—review and editing, I.G.-A.
and E.L.-R.; visualization, I.G.-A. and J.G.-G.; supervision, E.D., E.L.-R. and R.M.L.-B.; project
administration, R.M.L.-B.; funding acquisition, E.L.-R. All authors have read and agreed to the
published version of the manuscript.
Electronics 2024, 13, 2790 14 of 15
Funding: This work is partially supported by the Ministry of Science and Innovation of Spain,
grant number PID2022-136764OA-I00, project name Automated Detection of Non-Lesional Focal
Epilepsy by Probabilistic Diffusion Deep Neural Models. It includes funds from the European
Regional Development Fund (ERDF). It is also partially supported by the University of Málaga
(Spain) under grants B1-2022_14, project name Detección de trayectorias anómalas de vehículos
en cámaras de tráfico; and, by the Fundación Unicaja under project PUNI-003_2023, project name
Intelligent System to Help the Clinical Diagnosis of Non-Obstructive Coronary Artery Disease in
Coronary Angiography.
Data Availability Statement: The data supporting this study are publicly available in the GitHub
repository at https://github.com/IvanGarcia7/SpeedCARLADataset (accessed on 15 July 2024).
Acknowledgments: The authors thankfully acknowledge the computer resources, technical expertise,
and assistance provided by the SCBI (Supercomputing and Bioinformatics) Center of the University
of Málaga. They also gratefully acknowledge the support of NVIDIA Corporation with the donation
of an RTX A6000 GPU with 48 Gb. The authors also thankfully acknowledge the grant of the
Universidad de Málaga and the Instituto de Investigación Biomédica de Málaga y Plataforma en
Nanomedicina-IBIMA Plataforma BIONAND.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Hamid, U.Z.A.; Zakuan, F.R.A.; Zulkepli, K.A.; Azmi, M.Z.; Zamzuri, H.; Rahman, M.A.A.; Zakaria, M.A. Autonomous
emergency braking system with potential field risk assessment for frontal collision mitigation. In Proceedings of the 2017 IEEE
Conference on Systems, Process and Control (ICSPC), Malacca, Malaysia, 15–17 December 2017 ; pp. 71–76.
2. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging
Technologies. IEEE Access 2020, 8, 58443–58469. [CrossRef]
3. J3016_202104; Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. Society
of Automobile Engineers: Warrendale, PA, USA, 2018.
4. Tang, J.; Li, S.; Liu, P. A review of lane detection methods based on deep learning. Pattern Recognit. 2021, 111, 107623. [CrossRef]
5. Zou, Q.; Jiang, H.; Dai, Q.; Yue, Y.; Chen, L.; Wang, Q. Robust lane detection from continuous driving scenes using deep neural
networks. IEEE Trans. Veh. Technol. 2019, 69, 41–54. [CrossRef]
6. Bar-Shalom, Y.; Willett, P.K.; Tian, X. Tracking and Data Fusion; YBS Publishing: Storrs, CT, USA, 2011; Volume 11.
7. Vo, B.N.; Vo, B.T.; Phung, D. Labeled random finite sets and the Bayes multi-target tracking filter. IEEE Trans. Signal Process. 2014,
62, 6554–6567. [CrossRef]
8. McPhee, H.; Ortega, L.; Vilà-Valls, J.; Chaumette, E. Accounting for Acceleration–Signal Parameters Estimation Performance
Limits in High Dynamics Applications. IEEE Trans. Aerosp. Electron. Syst. 2022, 59, 610–622. [CrossRef]
9. Blackman, S.S. Multiple-Target Tracking with Radar Applications; Artech House, Inc.: Dedham, MA, USA, 1986.
10. Granstrom, K.; Lundquist, C.; Orguner, O. Extended target tracking using a Gaussian-mixture PHD filter. IEEE Trans. Aerosp.
Electron. Syst. 2012, 48, 3268–3286. [CrossRef]
11. Baum, M.; Hanebeck, U.D. Shape tracking of extended objects and group targets with star-convex RHMs. In Proceedings of the
14th International Conference on Information Fusion, Chicago, IL, USA, 5–8 July 2011; pp. 1–8.
12. Wang, P. Research on Comparison of LiDAR and Camera in Autonomous Driving. J. Phys. Conf. Ser. 2021, 2093, 012032.
[CrossRef]
13. Olaverri-Monreal, C.; Gomes, P.; Fernandes, R.; Vieira, F.; Ferreira, M. The See-Through System: A VANET-enabled assistant for
overtaking maneuvers. In Proceedings of the 2010 IEEE Intelligent Vehicles Symposium, La Jolla, CA, USA, 21–24 June 2010;
pp. 123–128.
14. Kato, S.; Takeuchi, E.; Ishiguro, Y.; Ninomiya, Y.; Takeda, K.; Hamada, T. An open approach to autonomous vehicles. IEEE Micro
2015, 35, 60–68. [CrossRef]
15. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
16. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 580–587.
17. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile,
7–13 December 2015; pp. 1440–1448.
18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv
2015, arXiv:1506.01497.
19. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
Electronics 2024, 13, 2790 15 of 15
20. Chen, J.; Ding, G.; Yang, Y.; Han, W.; Xu, K.; Gao, T.; Zhang, Z.; Ouyang, W.; Cai, H.; Chen, Z. Dual-Modality Vehicle
Anomaly Detection via Bilateral Trajectory Tracing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, Nashville, TN, USA, 1 September 2021; pp. 4011–4020.
21. Wang, L.; Lam, C.T.; Law, K.; Ng, B.; Ke, W.; Im, M. Real-Time Traffic Monitoring and Status Detection with a Multi-vehicle
Tracking System. In Proceedings of the International Conference on Intelligent Transport Systems, Indianapolis, IN, USA, 19–22
September 2021; pp. 13–25.
22. García-González, J.; Molina-Cabello, M.A.; Luque-Baena, R.M.; de Lazcano-Lobato, J.M.O.; López-Rubio, E. Road pollution
estimation from vehicle tracking in surveillance videos by deep convolutional neural networks. Appl. Soft Comput. 2021,
113, 107950. [CrossRef]
23. Mejia, H.; Palomo, E.; López-Rubio, E.; Pineda, I.; Fonseca, R. Vehicle Speed Estimation Using Computer Vision and Evolutionary
Camera Calibration. In Proceedings of the NeurIPS 2021 Workshop LatinX in AI, Virtually, 7 December 2021.
24. Kocur, V.; Ftáčnik, M. Detection of 3D bounding boxes of vehicles using perspective transformation for accurate speed
measurement. Mach. Vis. Appl. 2020, 31, 62. [CrossRef]
25. Fernández Llorca, D.; Hernández Martínez, A.; Garcia Daza, I. Vision-based vehicle speed estimation: A survey. IET Intell. Transp.
Syst. 2021, 15, 987–1005. [CrossRef]
26. Kumar, T.; Kushwaha, D.S. An Efficient Approach for Detection and Speed Estimation of Moving Vehicles. Procedia Comput. Sci.
2016, 89, 726–731. . [CrossRef]
27. Zhang, J.; Xiao, W.; Coifman, B.; Mills, J.P. Vehicle Tracking and Speed Estimation From Roadside Lidar. IEEE J. Sel. Top. Appl.
Earth Obs. Remote. Sens. 2020, 13, 5597–5608. [CrossRef]
28. Wu, J.; Zhuang, X.; Tian, Y.; Cheng, Z.; Liu, S. Real-Time Point Cloud Clustering Algorithm Based on Roadside LiDAR. IEEE
Sensors J. 2024, 24, 10608–10619. [CrossRef]
29. Gong, Z.; Wang, Z.; Yu, G.; Liu, W.; Yang, S.; Zhou, B. FecNet: A Feature Enhancement and Cascade Network for Object Detection
Using Roadside LiDAR. IEEE Sensors J. 2023, 23, 23780–23791. [CrossRef]
30. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the
1st Annual Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017; pp. 1–16.
31. Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; Lopez, A.M. The SYNTHIA Dataset: A Large Collection of Synthetic Images
for Semantic Segmentation of Urban Scenes. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3234–3243. [CrossRef]
32. Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. VirtualWorlds as Proxy for Multi-object Tracking Analysis. In Proceedings of the 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4340–4349.
[CrossRef]
33. Cheng, T.; Song, L.; Ge, Y.; Liu, W.; Wang, X.; Shan, Y. Yolo-world: Real-time open-vocabulary object detection. In Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle WA, USA, 17–21 June 2024; pp. 16901–16911.
34. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object
detectors. arXiv 2022, arXiv:2207.02696.
35. Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Doll’a r, P.; Zitnick, C.L. Microsoft
COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
36. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and
automated cartography. Commun. ACM 1981, 24, 381–395. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Electronics 13 02790

Uploaded by

Copyright:

Available Formats

Electronics 13 02790

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Electronics 13 02790

Uploaded by

Copyright:

Available Formats

electronics

Keywords: speed estimation; deep learning; convolutional neural networks; regression

Electronics 2024, 13, 2790. https://doi.org/10.3390/electronics13142790 https://www.mdpi.com/journal/electronics

3. Models and Methods

Figure 1. Flow diagram of the proposed approach.

3.1. Convolutional Neural Networks for Object Detection

3.2. Regression Models

where e0 represents the distance at time t = 0 to the other vehicle.

2. A linear model is fit to the following subset:

where the coefficients β 0 and β 1 are found by minimization:

inliers = {( xi , yi ) | |yi − ( β 0 + β 1 xi )| < ϵ} (9)

4. Experiments and Results

Figure 2. Example of a frame generated with Carla’s simulator.

and effectiveness in real-time object detection, making it well-suited for implementation on

Real Estimated Absolute Mean Median Median Coefficient of

Figure 7. Example of speed estimation for two specific vehicles.

6. Conclusions and Future Work

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.