Article
A Pavement Crack Detection and Evaluation Framework for a
UAV Inspection System Based on Deep Learning
Xinbao Chen * , Chang Liu *, Long Chen, Xiaodong Zhu, Yaohui Zhang and Chenxi Wang

School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology,
Xiangtan 411201, China; 1901080111@mail.hnust.edu.cn (L.C.); xiaodong@mail.hnust.edu.cn (X.Z.);
eminem@mail.hnust.edu.cn (Y.Z.); 2101080215@mail.hnust.edu.cn (C.W.)
* Correspondence: xchen@hnust.edu.cn (X.C.); liuchang@mail.hnust.edu.cn (C.L.); Tel.: +86-18670925719 (X.C.);
+86-18845160205 (C.L.)

Abstract: Existing studies often lack a systematic solution for an Unmanned Aerial Vehicle (UAV)
inspection system, which hinders their widespread application in crack detection. To enhance its
substantial practicality, this study proposes a formal and systematic framework for UAV inspection
systems, specifically designed for automatic crack detection and pavement distress evaluation. The
framework integrates UAV data acquisition, deep-learning-based crack identification, and road
damage assessment in a comprehensive and orderly manner. Firstly, a flight control strategy is
presented, and road crack data are collected using DJI Mini 2 UAV imagery, establishing high-quality
UAV crack image datasets with ground truth information. Secondly, a validation and comparison
study is conducted to enhance the automatic crack detection capability and provide an appropriate
deployment scheme for UAV inspection systems. This study develops automatic crack detection
models based on mainstream deep learning algorithms (namely, Faster-RCNN, YOLOv5s, YOLOv7-
tiny, and YOLOv8s) in urban road scenarios. The results demonstrate that the Faster-RCNN algorithm
achieves the highest accuracy and is suitable for the online data collection of UAV and offline
inspection at work stations. Meanwhile, the YOLO models, while slightly lower in accuracy, are the
fastest algorithms and are suitable for the lightweight deployment of UAV with online collection and
real-time inspection. Quantitative measurement methods for road cracks are presented to assess road damage, which will enhance the application of UAV inspection systems and provide factual evidence for the maintenance decisions made by road authorities.

Keywords: road cracks; UAV; deep learning; target detection; road damage evaluation; framework

Citation: Chen, X.; Liu, C.; Chen, L.; Zhu, X.; Zhang, Y.; Wang, C. A Pavement Crack Detection and Evaluation Framework for a UAV Inspection System Based on Deep Learning. Appl. Sci. 2024, 14, 1157. https://doi.org/10.3390/app14031157
Academic Editor: Luís Picado Santos
Received: 13 December 2023; Revised: 20 January 2024; Accepted: 27 January 2024; Published: 30 January 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Roads are one of the crucial transportation infrastructures that deteriorate over time, due to factors such as heavy vehicles, changing weather conditions, human activity, and the use of inferior materials. This deterioration impacts economic development, travel safety, and social activities [1]. Therefore, it is crucial to periodically assess the condition of roads to ensure their longevity and safety. Additionally, it is imperative to accurately and promptly identify road damage, especially cracks, in order to prevent further deterioration and enable timely repairs.

Currently, pavement condition inspection technologies mainly include traditional manual measurements and automatic distress inspections, such as vehicle-mounted inspection [2]. Manual inspection methods rely primarily on visual discrimination, requiring personnel to travel along roads to identify damage points. However, this approach is slow, laborious, subjective, time-consuming, and has a lower accuracy [3]. Therefore, the development of automatic inspection technologies is crucial for quickly and accurately detecting and identifying cracks on the road. In recent years, intelligent crack inspection systems have gained increasing attention and application, such as vehicle-mounted inspections and their intelligent systems [4]. Guo et al. [5] utilized core components such as on-mounted
high-definition image sensors, laser sensors, and infrared sensors, etc. These components
enable the acquisition of high-precision road crack data in real-time. However, the overall
configuration of the vehicle-mounted system is expensive and limited in scope, making it
challenging to widely apply [2].
Notably, automatic pavement distress inspection has traditionally utilized image-
processing techniques such as Gabor filtering [6], edge detection, intensity thresholding [7],
and texture analysis. Cracks are identified by analyzing the changes in edge gradients
and intensity differences compared to the background, and then extracting them through
threshold segmentation [2]. However, these methods are highly influenced by environmen-
tal factors, including lighting conditions, which can affect their accuracy. Moreover, these
methods are not effective when the camera configurations vary, making their widespread
use impractical [1,8]. Given the limitations of these traditional approaches, it is crucial to
develop a cost-effective, fast, and independent method for the accurate detection
of road cracks.
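The intensity-thresholding idea referenced above can be made concrete with a toy sketch (our own illustration, not any cited method; the constant k and the synthetic image are arbitrary, and practical systems add denoising and adaptive, locally computed thresholds):

```python
import numpy as np

def threshold_crack_mask(gray, k=1.5):
    """Naive intensity thresholding: flag pixels that are much darker
    than the image mean as crack candidates. k is an illustrative
    constant controlling how far below the mean a pixel must fall."""
    thresh = gray.mean() - k * gray.std()
    return gray < thresh

# Synthetic 8x8 "pavement" patch with a dark, crack-like streak
road = np.full((8, 8), 200.0)
road[4, 1:7] = 40.0
mask = threshold_crack_mask(road)
print(int(mask.sum()))  # 6 candidate crack pixels
```

As the surrounding text notes, such a global threshold is fragile: a change in lighting shifts the mean and breaks the segmentation, which motivates the learning-based methods below.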
In recent years, there have been significant advancements in machine learning and
deep learning algorithms, leading to the emergence of automatic deep learning methods
as accurate alternatives to traditional object recognition methods. These methods have
shown immense potential in visual applications and image analysis, particularly in road
distress inspection [1,8]. Krizhevsky et al. [9] proposed a deep convolutional neural net-
work (CNN) architecture for image classification, especially in the detection of distresses
in asphalt pavements. Cao et al. [3] presented an attention-based crack network (AC-
Net) for automatic pavement crack detection. Extensive experiments on the CRACK500
demonstrated that ACNet achieved a higher detection accuracy compared to eight other
methods. Tran et al. [10] utilized a supervised machine learning network called RetinaNet
to detect and classify various types of cracks that had developed in asphalt pavements,
including lane markers. The validation results showed that the trained network model
achieved an overall detection and classification accuracy of 84.9%, considering both the
crack type and severity level. Xiao et al. [11] proposed an improved model called C-Mask
RCNN, which enhances the quality of crack region proposal generation through cascad-
ing multi-threshold detectors. The experimental results indicated that the mean average
precision of the C-Mask RCNN model’s detection component was 95.4%, surpassing the
conventional model by 9.7%. Xu K et al. [12] also proposed a crack detection method based
on an improved Faster-RCNN for small cracks in asphalt pavements, even under complex
backgrounds. The experiments demonstrated that the improved Faster-RCNN model
achieved a detection accuracy of 85.64%. Xu X et al. [13] conducted experiments to evaluate
the effectiveness of Faster R-CNN and Mask R-CNN and compared their performances in
different scenarios. The results showed that Faster R-CNN exhibited a superior crack detec-
tion accuracy compared to Mask R-CNN, while both models demonstrated efficiency in
completing the detection task with small training datasets. The study focuses on comparing
Faster R-CNN and Mask R-CNN, but does not compare the proposed methods with other
existing crack detection methods. In general, these above-mentioned methods not only
detect the category of an object, but also determine the object’s location in the image [14].
The use of deep learning methods can reduce labor costs and improve work efficiency and
intelligence in recognizing road cracks [1].
Meanwhile, unmanned aerial vehicles (UAV) have demonstrated their versatility in a
wide range of applications, including urban road inspections. This is attributed to their
exceptional maneuverability, extensive coverage, and cost effectiveness [2]. Equipped with
high-resolution cameras and various sensors, these vehicles can capture images of the
road surface from multiple angles and heights, providing a comprehensive assessment
of its condition. Several researchers have utilized UAV imagery to study deep learning
methods for road crack object detection, and they have achieved impressive accuracy
results. Yokoyama et al. [15] proposed an automatic crack detection technique using arti-
ficial neural networks. The study focused on classifying cracks and non-cracks, and the
algorithm achieved a success rate of 79.9%. Zhu et al. [2] utilized images collected by a
UAV to conduct experimental comparisons of three deep learning target detection methods
(Faster R-CNN, YOLOv3, and YOLOv4) via convolutional neural networks (CNN). The
study verified that the YOLOv3 algorithm is optimal, with an accuracy of 56.6% mAP. In
another study, Jiang et al. [16] proposed an RDD-YOLOv5 algorithm with Self-Attention
for UAV road crack detection, which significantly improved the accuracy with an mAP of
91.48%. Furthermore, Zhang et al. [17] proposed an improved YOLOv3 algorithm for road
damage detection from UAV imagery, incorporating a multi-layer attention mechanism.
This enhancement resulted in an improved detection accuracy with an mAP of 68.75%.
Samadzadegan et al. [1] utilized the YOLOv4 deep learning network and evaluated its
performance using various metrics such as F1-score, precision, recall, mAP, and IoU. The
results showed that the proposed model had an acceptable performance in road crack
recognition. Additionally, Zhou et al. [18] introduced a UAV visual inspection method
based on deep learning and image segmentation for detecting cracks on crane surfaces.
Moreover, Xiang et al. [19] presented a lightweight UAV road crack detection algorithm
called GC-YOLOv5s, which achieved an accuracy validation mAP of 74.3%, outperforming
the original YOLOv5 by 8.2%. Wang et al. [20] introduced BL-YOLOv8, an improved road
defect detection model that enhances the accuracy of detecting road defects compared to
the original YOLOv8 model. BL-YOLOv8 surpasses other mainstream object detection
models, such as Faster R-CNN, SSD, YOLOv3-tiny, YOLOv5s, YOLOv6s, and YOLOv7-tiny, by achieving detection accuracy improvements of 17.5%, 18%, 14.6%, 5.5%, 5.2%,
2.4%, and 3.3%, respectively. Furthermore, Omoebamije et al. [21] proposed an improved
CNN method based on UAV imagery, demonstrating a remarkable accuracy of 99.04% on a customized test dataset. Lastly, Zhao et al. [22] proposed a highway crack detection
and CrackNet classification method using UAV remote sensing images, achieving 85% and
78% accuracy for transverse and longitudinal crack detection, respectively. These aforemen-
tioned studies primarily aim to enhance the deep learning algorithm using UAV images.
This enhancement improves the accuracy of road crack detection and also establishes
the methodological foundation for the crack target recognition algorithm discussed in
this paper.
However, most of the above-mentioned studies primarily focused on UAV detection
algorithms and neglected UAV data acquisition and high-quality imagery integrated into
detection methods. For instance, the flight settings required for capturing high-quality
images have not been thoroughly studied [2]. Flying too high or too fast may result in
poor-quality images [22]. Zhu et al. [2] and Jiang et al. [16] both introduced flight setup and
experimental tricks for efficient UAV inspection. Liu K.C. et al. [23] proposed a systematic
solution for automatic crack detection for UAV inspection. These studies are still incomplete
due to a lack of detailed data acquisition and pavement distress assessment. Additionally,
there is a lack of quantitative measurement methods for cracks, which hampers accurate
data support for road distress evaluation. Furthermore, inconsistency in flight altitude and
the absence of ground real-scale information of cracks adversely impact the subsequent
quantitative assessment of cracks.
Obviously, existing studies frequently lack a systematic solution or integrated frame-
work for UAV inspection technology, which hinders its widespread application in pavement
distress detection. Therefore, this study aims to propose a formal and systematic framework
for automatic crack detection and pavement distress evaluation in UAV inspection systems,
with the goal of making them widely applicable.
Our proposed framework for a UAV inspection system for automatic road crack
detection offers several advantages: (1) It demonstrates a more systematic solution. The
framework integrates data acquisition, crack identification, and road damage assessment
in orderly and closely linked steps, making it a comprehensive system. (2) It exhibits
a greater robustness. By adhering to the flight control strategy and model deployment
scheme, the drone ensures high-quality data collection while employing state-of-the-art
automatic detection algorithms based on deep learning models that guarantee accurate
crack identification. (3) It presents an enhanced practicality. The system utilizes the cost-
effective DJI (DJ-Innovations Company, headquartered in Shenzhen, China) Mini 2 drone for
imagery acquisition and DL-based model deployment, making it an economically viable
solution with significant potential for widespread implementation.
The rest of this paper is organized as follows: Section 2 presents the framework for the
UAV inspection system designed specifically for pavement distress analysis. In Section 3,
we provide a comprehensive overview of four prominent deep-learning-based crack de-
tection algorithms, namely Faster-RCNN, YOLOv5s, YOLOv7-tiny, and YOLOv8s, along
with their distinctive characteristics. Section 4 elaborates on the well-defined procedures
employed for UAV data acquisition and subsequent data reprocessing. The experimental
setup and comparative results are presented in Section 5. In Section 6, we propose quan-
titative methods to evaluate road cracks and assess pavement distress levels. Finally, in
Section 7, we summarize our research while discussing future work.

2. Framework of UAV Inspection System


To enhance the practical application of UAV inspection systems in road crack detection,
this study presents a comprehensive DL-based method and technical solution framework.
As illustrated in Figure 1, the technical framework consists of four main components:
(1) Data Acquisition: a flight suitability parameter model is established to ensure high-
quality pavement imagery acquisition by the UAV. Prior image data are utilized to create
crack datasets for model training, while the subsequent phase is directly employed for pave-
ment crack detection. (2) Model Training and Evaluation: UAV imagery is pre-processed
through frame extraction, image dividing, and data enhancement, and then labeled ac-
cording to five major categories of cracks (longitudinal, transverse, diagonal, mesh, and no
cracks) to create the datasets. Based on this, four mainstream DL target detection algorithms
(Faster-RCNN, YOLOv5, YOLOv7, and YOLOv8) are individually conducted for the road
crack detection model training. Finally, the models are compared and validated using
precision (P), recall (R), F1-score, and mean accuracy precision (mAP) as evaluation metrics,
and the best model is selected. (3) Model Application and Road Crack Detection: The
preferred model is employed to identify road crack targets using UAV imagery. To reduce
the computing resources, the full-scale images are divided into smaller images before
detection. (4) Road Distress Evaluation: Quantitative assessments (for instance, crack count, crack length, and crack area) are conducted to evaluate pavement distress,
which provide factual evidence and a solid data foundation for evaluating road damage and planning road repair work for transportation departments.
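The evaluation metrics named in step (2) can be sketched numerically (an illustrative snippet of ours, not the paper's code; mAP additionally averages precision over recall thresholds and crack categories):

```python
def precision_recall_f1(tp, fp, fn):
    """Detection metrics from counts of true positives (tp),
    false positives (fp), and false negatives (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: 80 correctly detected cracks, 10 false alarms, 20 missed cracks
p, r, f1 = precision_recall_f1(80, 10, 20)
```

A model with high precision but low recall misses cracks; the F1-score balances the two, which is why all three are reported alongside mAP when comparing the four detectors.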

Figure 1. A framework of UAV inspection system based on deep learning for pavement distress.
3. Deep Learning Algorithms
In recent years, there has been significant progress in deep learning technology, leading to a paradigm shift in target detection methods from traditional algorithms based on manual features to deep-neural-network-based detection methods [24]. These deep learning algorithms can be categorized into two major approaches (Figure 2): (1) Two-stage methods (two-stage algorithms), which involve labeling multiple target candidate regions in the image and subsequently classifying and regressing the boundary of each candidate region. Representative algorithms belonging to this approach include the RCNN series. (2) Single-stage methods (one-stage algorithms), which directly perform the localization and classification of all detected targets across the entire image without requiring the explicit labeling of candidate regions. Representative algorithms belonging to this approach include the YOLO (You Only Look Once) series. Both approaches have their own advantages, with the single-stage algorithm being faster and the two-stage algorithm being more accurate. Therefore, this study selects the Faster RCNN algorithm [25] and the YOLOv5 algorithm [26] as typical representatives of these two major approaches. Additionally, the latest improved algorithms of YOLOv5, namely YOLOv7 [27] and YOLOv8 [28], are introduced into comparative validation in this study on the use of deep learning algorithms for road crack detection using drones.

Figure 2. A road map of object detection models (modified from [23]).

3.1. Faster-RCNN Algorithm


The Faster-RCNN algorithm is a typical representative example of a two-stage al-
gorithm for target detection. The Faster-RCNN model consists of four components: a
Backbone, Region Proposal Networks (RPN), ROI (Region of Interest) Pooling, and Classi-
fier. The Backbone extracts a feature map that is used for candidate detection area extraction
and classification. The RPN further refines the candidate detection areas based on the initial
feature map, which may contain the target features. These refined areas are then used for
further classification and localization. The ROI Pooling fine-tunes the candidate detection
areas based on their candidate box coordinates. Finally, the Classification component uses
the proposals and feature maps to determine the category of the proposal and regress the
candidate detection frames to obtain their final precise locations.
The network architecture of Faster-RCNN is illustrated in Figure 3. Firstly, an arbitrary
input image (P × Q) is resized to a standard image (M × N) and then fed into the
network. The backbone (e.g., VGG and ResNet, etc.) extracts features from the M × N
image, followed by convolution and pooling operations, resulting in feature maps for
this input. These feature maps contain information about different scales and semantics,
enabling the detection of objects with various scales and shapes in the image. The Region Proposal Network (RPN) performs a 3 × 3 convolution to generate
Positive Anchors and the corresponding Bounding Box Regression offsets. It then calculates
Proposals, which are utilized by the ROI pooling layer to extract the Proposals from the
Feature Maps. The Proposal Feature is further processed through fully connected and
softmax networks for classification.
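Positive Anchors are typically assigned by their overlap with ground-truth boxes, measured by Intersection-over-Union; a minimal IoU computation (our own illustration of the standard definition, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Anchors with high IoU against a ground-truth crack box become positives
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ≈ 0.143
```

The same measure is reused at evaluation time: a predicted crack box counts as a true positive only if its IoU with a labeled box exceeds a chosen threshold.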

Figure 3. An illustration of Faster-RCNN (modified from [29]).

3.2. YOLO Series Algorithms
The YOLO series algorithm is a typical representative example of the one-stage target detection model. In comparison to the Faster RCNN algorithm, YOLO eliminates the need to extract candidate regions that may contain targets. It completes the detection task using only one network and predicts the category and location of the target object in the detection output through regression. Currently, YOLOv5 is the initial model of the series, which has been proven to be stable and is widely used in lightweight road crack detection methods due to its excellent accuracy [17,21]. YOLOv5 consists of several networks with different depths, namely n, s, m, l, and x. The depth and width of the network increase in the order of n, s, m, l, and x. Among these options, YOLOv5s is suitable for small deep networks or small-scale datasets.

The network architecture of YOLOv5 is depicted in Figure 4. The model comprises three main components: the backbone network (BackBone), the neck network (Neck), and
the head detection network (Head). The backbone network (Backbone) primarily performs
feature extraction by utilizing a convolutional network to extract object information from
the image. This information is then used to create a feature pyramid, which is later
employed for target detection. The backbone network consists of various modules, such
as the Focus module, Conv module, C3 module, and SPPF module. Notably, the SPPF
(Spatial Pyramid Pooling Faster) module is capable of converting feature maps of any size
into fixed-size feature vectors. This allows for the fusion of local and global features at the
Feature Map level and further expands the receptive field of the feature map. Consequently,
objects can be effectively detected even when input at different scales. The neck network
(Neck) is responsible for the multi-scale feature fusion of the feature map. It adopts the
structure of the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN),
which enhances the model’s ability to capture object features at various scales and improves
the accuracy and performance of target detection. The head network (Head), also known
as the detection module, utilizes techniques like anchor boxes to process the input feature
mapping and generate regression predictions. These predictions include information about
the type, location, and confidence of the crack detection object.
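The fixed-size-vector property attributed to the SPPF module above can be illustrated with the classic spatial pyramid pooling idea (our own sketch; YOLOv5's SPPF is a faster serial max-pooling variant of this, operating on multi-channel feature maps rather than a single 2D map):

```python
import numpy as np

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Pool an HxW feature map into a fixed-length vector regardless of
    input size, by max-pooling over 1x1, 2x2, and 4x4 grids of cells."""
    out = []
    h, w = feat.shape
    for n in levels:
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                out.append(feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max())
    return np.array(out)  # always length 1 + 4 + 16 = 21

v = spatial_pyramid_pool(np.arange(64.0).reshape(8, 8))
print(v.shape)  # (21,)
```

Because the output length does not depend on H or W, the downstream layers can accept inputs of different scales, which is the property the text describes.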
YOLOv7 [27] is an enhanced target detection framework based on YOLOv5. It in-
corporates a deeper network structure and robust methods, resulting in an improved
accuracy and speed compared to YOLOv5. YOLOv7 introduces several techniques, such
as the Efficient Layer Aggregation Network (ELAN) and the Bottleneck Attention Module (BAM), to enhance its learning capability. ELAN expands, shuffles, and merges cardinality, thereby improving the equilibrium state of the learning network. To prevent
overfitting, YOLOv7 employs a regularization method similar to DropBlock. This regular-
ization method enhances the stability and robustness of the model, enabling it to be trained
on larger datasets.
Figure 4. Network architecture of YOLOv5.

YOLOv8 [28] was released in January 2023 by Ultralytics, the company that developed YOLOv5. YOLOv8 further optimizes the model structure and training strategy based on YOLOv7 to enhance both detection speed and accuracy. Notably, YOLOv8 incorporates
a more efficient long-range attention network called Extended-ELAN (E-ELAN), which
enhances the model’s feature extraction capability. Moreover, YOLOv8 introduces new
loss functions, such as VFL (Varifocal Loss) and DFL (Distribution Focal Loss), to improve the
model’s localization accuracy and category differentiation ability. Additionally, YOLOv8
employs new data enhancement methods, including Mosaic + MixUp, to enhance the
generalization and robustness of the model.
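The Mosaic augmentation mentioned above stitches four training images into one composite; a bare-bones illustration with NumPy (the fixed 2 × 2 layout is a simplifying assumption of ours — real implementations jitter the center point, crop, and remap the bounding-box labels accordingly):

```python
import numpy as np

def mosaic_2x2(imgs):
    """Stitch four equally sized HxWxC images into one 2Hx2WxC mosaic."""
    a, b, c, d = imgs
    top = np.concatenate([a, b], axis=1)     # left-right
    bottom = np.concatenate([c, d], axis=1)
    return np.concatenate([top, bottom], axis=0)  # top-bottom

# Four tiny single-color "images" stand in for training photos
tiles = [np.full((2, 2, 3), v, dtype=np.uint8) for v in (0, 64, 128, 255)]
m = mosaic_2x2(tiles)
print(m.shape)  # (4, 4, 3)
```

Exposing the network to several images and scales per training sample in this way is what gives Mosaic its regularizing, generalization-improving effect.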
In the current field of deep learning models, Faster-RCNN, YOLOv5, YOLOv7, and
YOLOv8 are all target detection methods known for their high accuracy and advanced
algorithms. However, there are some variations in terms of model structure, accuracy,
speed, training strategy, and robustness. The selection of the appropriate algorithm should
be based on specific requirements and application scenarios to effectively address the needs
of UAV road crack target detection.

4. UAV Data Acquisition and Preprocessing


4.1. Flight Control Strategy
During the flight process of a UAV equipped with a high-definition camera, the
acquired imagery may suffer from distortion, degradation, or incomplete road coverage due to improper human control or mismatched flight parameters. Therefore, it is crucial to
establish a flight control strategy and experimental techniques for the UAV flight parameters
to enhance the quality of the imagery captured by the UAV in real-world scenarios.

4.1.1. Flight Height


To determine the optimal altitude of the UAV and ensure its efficient flight, the
following considerations should be taken into account: (i) the UAV camera view should
cover the full width of the road that needs to be inspected; (ii) it is important to avoid any
interference from auxiliary facilities such as road trees and street lights during the flight;
and (iii) to minimize image distortion, it is crucial to maintain a constant altitude, consistent
speed, and capture vertical imagery.
To cover the full width of the pavement, a minimum flight altitude is required. Based
on the Pinhole Imaging Principle and the Triangular Similarity Geometric Relationship
(Figure 5a), the minimum flight altitude should satisfy Equation (1):

H ≥ (f ∗ W)/Sw (1)

where H represents flight vertical height. f represents the focal length of the camera. W
represents the width of the road to be inspected. Sw represents the camera sensor size
(Sw × Sh ).

Figure 5. Diagram of the main flight parameters ((a,c) modified from [16]). (a) flight height (H); (b) ground sampling distance (GSD); and (c) flight velocity (v).

In our experiment, the DJI Mini 2 drone was chosen to perform the flight mission.
The camera sensor format was CMOS 1/2.3 inches, with a full-frame sensor size of
17.3 mm × 13.0 mm. The main focal length (f ) was 24.0 mm. The experimental pave-
ment consisted of a bi-directional eight-lane road. To ensure high-definition imagery
quality, the experiment was conducted only on the left lane, from east to west. The width
of the road (W) was measured to be 16 m. The minimum flight altitude was calculated at
22.20 m. Taking into account the tolerance for flight stability, the final flight height was
chosen as 22.5 m.

4.1.2. Ground Sampling

The Ground Sampling Distance (GSD) is a crucial parameter in remote sensing and
image processing. It quantifies the distance between individual pixels in an image and
the ground truth, which directly affects the accuracy of geospatial measurements for
cracks. (i) DJI can officially provide GSD values that are applicable to a wide range of focal
lengths [16]. The most commonly observed GSD value, typically associated with a 24 mm
focal length, is calculated as H/55. (ii) Alternatively, the GSD can be derived directly from
the diagram in Figure 5b, using Equation (2):

GSD ≈ (µ ∗ H )/ f (2)

where GSD represents the ground sampling distance of a flight, and its unit is cm/pixel;
µ is the image pixel calibration size (µm), which can be officially provided by DJI. Take the
DJI Mini 2 as an example, where µ is given as 4.4 µm. If the flight height is 22.5 m, the
GSD can be computed as 0.4125 cm/pixel.
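Equation (2) and the unit conversions it implies can be sketched as follows, assuming µ in µm, H in m, and f in mm as in the text (names are illustrative, not from the paper):

```python
def ground_sampling_distance(pixel_size_um: float, h_m: float, f_mm: float) -> float:
    """GSD in cm/pixel from Equation (2): GSD ≈ (mu * H) / f."""
    # mu [um] -> m, f [mm] -> m, result in m/pixel, then convert to cm/pixel
    return (pixel_size_um * 1e-6) * h_m / (f_mm * 1e-3) * 100

# DJI Mini 2 values from the text: mu = 4.4 um, H = 22.5 m, f = 24.0 mm
gsd = ground_sampling_distance(4.4, 22.5, 24.0)
print(round(gsd, 4))  # -> 0.4125 (cm/pixel)
```

This reproduces the 0.4125 cm/pixel figure used throughout the rest of the pipeline.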

4.1.3. Flight Velocity


The appropriate flight velocity is also essential in UAV imagery acquisition to avoid
redundancy and motion blurring. It should be determined based on the degree of overlap
between neighboring images and the consistency and quality stability of the aerial images.
Typically, a minimum forward overlap of 75% and a minimum side overlap of 60% are
recommended. Figure 5c illustrates how the flight velocity can be calculated based on
the desired overlap degree and the sampling frequency of the neighboring frames, using
Formula (3):
v = L ∗ (1 − r )/t (3)
where v is the flight velocity (m/s) and t represents the shooting interval between two adjacent
images (s), typically set to 2 s. The overlap degree (r) is defined as the forward overlap and is
commonly taken as 50–75%, since UAVs typically fly at a constant speed in uniform
linear motion along the forward direction. L represents the ground-truth length of the road
in an image (m). L can be determined based on the GSD and the road width (W) covered
by the UAV imagery, using the following formula.

L = W/GSD = (W ∗ f )/(µ ∗ H ) (4)

In our experiment, the road width (W) was chosen as 16.0 m and the GSD was
computed as 0.4125 cm/pixel using Equation (2); thereby, the ground-truth road length
(L) in an image was calculated as 38.78 m using Equation (4). Given that the sampling
interval t was 2 s, the forward overlap of the captured images was set to 75%. According
to Equation (3), the minimum speed v was 4.85 m/s, and finally, 5 m/s was determined
as the flight velocity for this experiment.
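Equations (3) and (4) can be replayed numerically as below; this simply reproduces the paper's arithmetic (function names are ours), yielding the reported minimum of about 4.85 m/s:

```python
def ground_length(w_m: float, gsd_cm_per_px: float) -> float:
    """Ground-truth road length L in an image, as computed in Equation (4)."""
    return w_m / gsd_cm_per_px

def flight_velocity(l_m: float, overlap: float, t_s: float) -> float:
    """Minimum flight velocity v = L * (1 - r) / t (Equation (3))."""
    return l_m * (1.0 - overlap) / t_s

L = ground_length(16.0, 0.4125)   # ~38.79 (the paper reports 38.78)
v = flight_velocity(L, 0.75, 2.0)
print(round(v, 2))  # -> 4.85; 5 m/s was chosen in practice
```

Rounding v up to 5 m/s keeps the forward overlap at or above the 75% target while simplifying flight control.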

4.2. UAV Imagery Data Preprocessing


4.2.1. Frame Extraction and Fusion from UAV Imagery Video
Video frame data play a crucial role in acquiring UAV pavement crack images. In order
to obtain and supplement the original pavement crack datasets, it is necessary to extract
and crop the frames. During the frame extraction process, it is important to consider the
overlap, spacing, and seamlessness of neighboring video frames to ensure the integrity and
independence of each frame. Additionally, the setting of the frame extraction interval is of
the utmost importance. If the interval is too large, it may result in gaps that prevent seamless
fusion and stitching. On the other hand, if the interval is too small, there will be significant overlap
between frames, leading to an excessive number of frames and an increased computing
cost. The formula for calculating the extraction interval number (N) is as follows:

GSD × Fl × (1 − r )
 
N≤ × f ps (5)
v

where Fl is the frame image size (px) along the flight direction; fps represents the frames per
second of the video; and the other variables are as described in the previous section.

For this experiment, UAV imagery in the DJI Mini 2 was set to 4K HD, which cor-
responds to the DJI official frame image size of 3840 px × 2160 px. Namely, Fl was
taken as 3840 px. The fps was officially 24 f·s−1 , and GSD and v were calculated to be
0.4125 cm/pixel and 5.0 m/s as above, respectively. The overlap (r) was taken as 75%.
Using Equation (5), the extraction interval number (N) was found to be 19.01, which was
rounded down to 19. Finally, to ensure sufficient overlap, this study extracted an image every
19 frames from the video. The extracted frame images were then used to stitch together
overlapping parts of neighboring frames using the picture fusion technique.
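Equation (5) can be sketched as a small helper; this is an illustrative reimplementation (not the authors' code), with the cm-to-m conversion made explicit:

```python
import math

def extraction_interval(gsd_cm_px: float, frame_len_px: int, overlap: float,
                        v_mps: float, fps: float) -> int:
    """Maximum frame-extraction interval N (Equation (5)), floored so the
    retained frames still meet the required forward overlap."""
    gsd_m_px = gsd_cm_px / 100.0                       # cm/pixel -> m/pixel
    ground_len_m = gsd_m_px * frame_len_px             # ground footprint along flight
    n = ground_len_m * (1.0 - overlap) / v_mps * fps   # frames between extractions
    return math.floor(n)

# Values from the text: GSD = 0.4125 cm/px, Fl = 3840 px, r = 75%, v = 5 m/s, 24 fps
N = extraction_interval(0.4125, 3840, 0.75, 5.0, 24)
print(N)  # -> 19, i.e. extract one image every 19 frames
```

In practice the extracted frames would then be read out with a video reader (e.g. OpenCV's `cv2.VideoCapture`), keeping every N-th frame.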

4.2.2. Pavement Cracks Datasets with GSD Information


Due to the large size of the acquired images or frame images, utilizing them directly
as inputs for model training would lead to a sluggish training speed and a significant
consumption of processing resources. Therefore, to enhance the parallel batch comput-
ing speed, deep learning models have specific requirements for training image datasets.
The original images need to be trimmed after frame extraction and fusion, resulting in
images with consistent specifications. In this experiment, the extracted frame images from
videos were used as the initial images, which were further cropped into 640 px × 640 px
specification images. The cropping process is shown in Figure 6. Assuming the original
image size of a frame was a 3840 px × 2160 px road image, 18 640 px × 640 px specification
images could be cropped. To expand the number of samples for crack categories, data
augmentation methods such as translation, flipping, and rotation were applied.
Additionally, the blurred images were removed to ensure they did not affect the training
of the model. The UAV crack original datasets were constructed by manually
screening, classifying, and confirming the coverage of the four major crack categories and
the no-crack images.

Figure 6. Diagram of the trimming process for the large frame image (modified from [10]).
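The trimming step described above (a 3840 px × 2160 px frame yielding 18 non-overlapping 640 px × 640 px tiles) can be sketched as follows. This is an illustrative implementation, not the authors' preprocessing code; actual cropping would slice the image array at these origins (e.g. with OpenCV or NumPy):

```python
TILE = 640

def tile_origins(width_px: int, height_px: int, tile: int = TILE):
    """Top-left corners of full tile x tile crops, dropping partial edge tiles,
    as in the trimming step (3840 x 2160 -> 6 x 3 = 18 tiles)."""
    return [(x, y)
            for y in range(0, height_px - tile + 1, tile)
            for x in range(0, width_px - tile + 1, tile)]

origins = tile_origins(3840, 2160)
print(len(origins))  # -> 18
```

Note that 2160 is not a multiple of 640, so a 240 px strip at the bottom of each frame is discarded; this is consistent with the 18-tile count stated in the text.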

Road crack labeling plays a crucial role in training and testing deep learning models.
The accuracy of labeling directly impacts the quality of model learning. In this experiment,
we employed various methods, including manual visual labeling and the Labelimg tool,
to decipher, mark, and categorize different types of cracks based on the original UAV
crack datasets. The goal was to create an improved training set for crack recognition.
Based on the prominence of cracks and their associated damage hazards, road cracks were
categorized into four types: longitudinal cracks, transverse cracks, oblique cracks, and
alligator cracks. These categories are illustrated in Table 1.

Table 1. Classification and description of road cracks.

Longitudinal Crack (LC) | Transverse Crack (TC) | Oblique Crack (OC) | Alligator Crack (AC) | No-Cracks (Other)



Generally, the problem of imbalanced sample distribution in datasets can often lead to
overfitting of the model [30]. To address this issue, this experiment fully considered the
balance of the sample distribution when creating the labeling datasets. Each type of road
pavement crack had a more equal number distribution, as shown in Figure 7. A total of
1388 pavement crack images based on a UAV were collected and labeled, with 304 samples
identified as being of the longitudinal crack (LC) type, 303 samples of the transverse crack
(TC) type, 313 samples of the oblique crack (OC) type, 368 samples of the alligator crack
(AC) type, and 100 samples of the no-crack type. To ensure the DL-based model's
effectiveness, the datasets were divided into training, validation, and test sets in the ratio
of 80%, 10%, and 10%, respectively.

Figure 7. Samples and distribution of the pavement crack datasets.
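The 80/10/10 split described above can be sketched as below. This is a plain random split for illustration (names are ours); the paper does not state whether the split was stratified by crack category, which would be a reasonable refinement given the balanced class counts:

```python
import random

def split_dataset(paths, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle image paths and split them into train/val/test subsets
    in the 80/10/10 ratio used in the text."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # fixed seed for reproducibility
    n = len(paths)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]

# 1388 labeled crack images, as reported above (file names are hypothetical)
images = [f"img_{i:04d}.jpg" for i in range(1388)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # -> 1110 138 140
```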

Notably, existing crack datasets often do not provide ground-truth information, par-
ticularly regarding the spatial resolution of UAV imagery. This lack of information di-
rectly affects the accuracy of crack identification and measurement in future studies. In
this study, the UAV data collection process included the recording of the real-time flight
height, an important parameter for each image. Thereby, the Ground Sample Distance
(GSD) can be calculated using Formula (2), and also documented in each treated image,
which is crucial for the subsequent automated evaluation of pavement damage.

5. Experiments and Results


5.1. Experimental Scenario
In this experiment, the flight mission was located on Xuefu Road, Xiangtan City,
Hunan Province, China, as shown in Figure 8. The Xuefu Road is an asphalt pavement
with eight two-way lanes and a one-way road width of 16 m. The UAV aerial photography
covered a distance of 1.5 km. The road was built and opened to traffic in 2010. After more
than 13 years, the road surface has suffered significant damage, including transverse cracks,
longitudinal cracks, alligator cracks, and oblique cracks. The experiment was conducted at
10:00 a.m. on a sunny day with relatively sparse traffic. Based on the previously obtained
UAV flight parameters, the flight height (H) was set to 22.5 m and the flight velocity (v)
was set to 5.0 m/s.

Figure 8. Experimental road and scenario.

5.2. Experimental Configuration

The deep learning algorithms used in this experiment were executed on the same
specifications. The specific configuration and experimental environment are detailed in
Table 2. The Faster-RCNN model employed the VGG feature extraction network, while
YOLOv5, YOLOv7, and YOLOv8 utilized YOLOv5s, YOLOv7-tiny, and YOLOv8s, re-
spectively. The input image size of these models was unified to 640 px × 640 px. The train-
ing iterations (Epoch) were set to 200, as depicted in Figure 9. The YOLO algorithm series
models were trained with a batch size of eight, whereas the Faster-RCNN used a batch
size of four. The experiment's hyperparameters were configured as follows: the initial
learning rate was set to 0.01, the learning rate decay employed the Cosine Annealing al-
gorithm, the optimizer used was SGD (Stochastic Gradient Descent), and the Momentum
was set to 0.937.

Table 2. Configuration of the experimental environment.
Software Configuration | Matrix Versions
Operating system: Windows 10 | Python 3.9
CPU: Intel Core i5-9300H | PyTorch 2.0
GPU: NVIDIA GeForce GTX 1660Ti 6G | CUDA 11.8
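The cosine-annealing decay named above follows the standard schedule. A minimal sketch of the curve is given below (our own illustration; PyTorch users would typically rely on `torch.optim.lr_scheduler.CosineAnnealingLR`, which implements the same formula):

```python
import math

def cosine_annealing_lr(epoch: int, total_epochs: int = 200,
                        lr_max: float = 0.01, lr_min: float = 0.0) -> float:
    """Cosine-annealing learning-rate schedule:
    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(round(cosine_annealing_lr(0), 6))    # -> 0.01 (initial learning rate from the text)
print(round(cosine_annealing_lr(100), 6))  # -> 0.005 (halfway through 200 epochs)
print(round(cosine_annealing_lr(200), 6))  # -> 0.0
```

The schedule decays smoothly from the 0.01 initial rate to zero over the 200 training epochs used in the experiment.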

Figure 9. Loss plot of the YOLOv5s model (the optimal iterations number is 200).
5.3. Evaluation Metrics of Models
5.3.1. Running Performance
To validate the computational complexity of deep learning models, five evaluation
metrics in this experiment were firstly used to assess the algorithm’s running performance:
the number of parameters, video memory usage, training duration, memory consumption,
and frame rate (FPS). It is important to note that the FPS measures the number of images
processed per second and serves as a significant indicator of prediction speed.

5.3.2. Accuracy Effectiveness


Furthermore, in order to demonstrate the algorithm's effectiveness for deep learning
models, five evaluation metrics in this experiment were used to assess the detection
accuracy: Precision (P), Recall (R), F1-Score, Average Precision (AP), and Mean Average
Precision (mAP). P represents the probability of correct target detection and is calculated as
the ratio of the number of correctly detected positive samples (TP) to the total number of detected samples.
R represents the probability of correctly recognizing the target among all positive samples
and is calculated as the ratio of the number of correctly classified positive samples to the
number of all positive samples. F1-Score is a comprehensive evaluation index that takes
into account the effects of accuracy and recall. AP is obtained by calculating the area under
the Precision–Recall curve and reflects the precision of individual crack categories. The
mAP characterizes the average across the four crack categories and reflects the overall
classification precision of the crack prediction.
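The P, R, and F1 definitions above can be sketched directly; the TP/FP/FN counts below are illustrative, and AP (which additionally integrates the Precision–Recall curve over confidence thresholds) is omitted for brevity:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Detection metrics from true positives, false positives and false negatives:
    P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2*P*R/(P+R)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g. 76 cracks detected correctly, 24 false alarms, 24 cracks missed
p, r, f1 = precision_recall_f1(76, 24, 24)
print(round(p, 2), round(r, 2), round(f1, 2))  # -> 0.76 0.76 0.76
```

mAP then averages the per-category AP values over the four crack classes, as stated above.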

5.4. Experimental Results


To validate the viability of our proposed framework for analyzing UAV imagery
crack datasets, this study employed four prominent deep learning algorithms (Faster-
RCNN, YOLOv5s, YOLOv7-tiny, and YOLOv8s) for conducting pavement crack object
detection and a comparative analysis. The experiments were conducted using identical
hardware and software environments, with consistent iteration numbers, training datasets,
validation datasets, and test datasets. The results were evaluated based on the model
performance during execution, the recognition accuracy of the models, and variations in
crack category classification.

5.4.1. Comparison Results of Running Performance


The operational performances of the four models are presented in Table 3. Among
them, the Faster-RCNN model exhibited by far the lowest running performance, with the
highest number of parameters (136.75 × 106 ), memory consumption (534.2 MB), and video
memory usage (5.6 GB), as well as the longest training duration (7.1 h) and the lowest
frame rate (12.80 f·s−1 ). On the other hand, the YOLO series models, which are single-stage
algorithms, demonstrated significantly faster running performances. The YOLOv7-tiny
model had the fewest parameters and the smallest memory footprint, while YOLOv5s and
YOLOv8s achieved the highest frame rates and the fastest execution speeds.

Table 3. The results of running performance with various models.

Models | Number of Parameters (×10^6) | Training Duration (h) | Memory Consumption (MB) | Video Memory Usage (GB) | FPS (f·s−1)
Faster-RCNN | 136.75 | 7.1 | 534.2 | 5.6 | 12.80
YOLO v5s | 7.02 | 3.7 | 14.12 | 3.5 | 127.42
YOLO v7-tiny | 6.01 | 3.8 | 12.01 | 1.9 | 82.56
YOLO v8s | 11.13 | 3.1 | 21.98 | 3.6 | 125.74

When considering identical datasets, it can be concluded that the Faster-RCNN model
demanded a more powerful hardware and environment configuration, whereas the
YOLO model series required a lower hardware and software environment configura-
tion while offering faster training speeds. Consequently, the YOLO model series algo-
rithms are highly suitable for the lightweight deployment of real-time detection tasks on
UAV platforms.

5.4.2. Comparison Results of Detection Accuracy


The Results of Overall Detection Accuracy
The results of comparing the overall detection accuracy are presented in Table 4.
Among all models, the Faster-RCNN model demonstrated the highest accuracy, surpassing
the YOLO series models in all evaluation indexes of accuracy. It achieved a precision (P),
recall (R), F1 value, and mean average precision (mAP) of 75.6%, 76.4%, 75.3%, and 79.3%,
respectively. Among the YOLO series models, YOLOv7-tiny exhibited a lower overall
precision with values of 66.9% (P), 66.5% (R), 66.7% (F1-score), and 65.5% (mAP). On
the other hand, both YOLOv5s and YOLOv8s showed similar overall precision, only
slightly inferior to Faster-RCNN by a margin of roughly 3–5%.

Table 4. Results of overall accuracy with various models (%).

Models Precision Recall F1-Score mAP


Faster-RCNN 75.6 76.4 75.3 79.3
YOLO v5s 75.1 71.0 72.6 74.0
YOLO v7-tiny 66.9 66.5 66.7 65.5
YOLO v8s 74.4 75.6 75.0 77.1

The Results of Detection Accuracy under Different Crack Types


To further clarify the discrepancies in the model recognition accuracy among the
different crack categories, a comparative analysis of the model recognition accuracy was
conducted for the four types of cracks: longitudinal cracks (LC), transverse cracks (TC),
oblique cracks (OC), and alligator cracks (AC). The results are presented in Table 5.
(i) Regarding the identification of longitudinal cracks (LC), the Faster-RCNN model
exhibited the highest accuracy, with an average precision (AP) of 85.7% and the highest
F1 value of 82.3%. In contrast, the YOLO series demonstrated a relatively inferior

average precision, with YOLOv7-tiny exhibiting the lowest performance. Therefore,


Faster-RCNN outperformed the other models in recognizing longitudinal cracks.
(ii) For transverse cracks (TC), the YOLOv8s model achieved a superior recognition accu-
racy with an AP score of 89.5%, followed by YOLOv5s. Although there was a slight
decrease in F1 score for YOLOv8s compared to YOLOv5s, their overall recognition ac-
curacies did not significantly differ from each other; however, YOLOv7-tiny displayed
a weaker recognition accuracy.
(iii) All four algorithm models exhibit low recognition accuracy and F1 values for oblique
cracks (OC) compared to the other types of cracks; however, among them, Faster-
RCNN still maintained the highest level of recognition accuracy, while all models
belonging to the YOLO series demonstrated lower levels of recognition accuracy—this
explains why Faster-RCNN performed better overall.
(iv) In terms of recognizing alligator cracks (AC), the YOLOv8s model delivered an out-
standing performance, attaining a remarkable recognition accuracy and F1 value of
91.0% and 90.6%, respectively; the YOLOv5s model, although slightly less effective,
also showcased a commendable performance, whereas the YOLOv7-tiny model per-
formed poorly.

Table 5. Results of detection accuracy with various models under four crack types (%).

Models | AP (%): LC, TC, OC, AC | F1-Score (%): LC, TC, OC, AC
Faster-RCNN | 85.7, 83.4, 60.2, 87.8 | 82.3, 78.0, 58.1, 82.9
YOLO v5s | 75.5, 87.4, 43.8, 89.1 | 72.3, 86.5, 43.5, 88.0
YOLO v7-tiny | 70.4, 81.2, 40.7, 80.7 | 70.0, 79.0, 44.8, 77.1
YOLO v8s | 75.4, 89.5, 45.4, 91.0 | 74.4, 85.0, 48.5, 90.6

(Column order follows the per-type discussion above: longitudinal (LC), transverse (TC), oblique (OC), and alligator (AC) cracks.)

The Results of Detection Accuracy under Different Crack Datasets


In this study, our self-made pavement crack datasets strictly followed the UAV flight
parameter settings and data acquisition process mentioned in Section 4. To validate the
reliability and advantages of our self-made crack datasets, we conducted a comparative
study using these four model algorithms on existing various open-source UAV pavement
crack datasets. Our experiment involved comparing the detection accuracy of our crack
datasets with datasets such as UAPD [2], RDD2022 [31], UMSC [19], UAVRoadCrack [21],
and CrackForest [32]. We evaluated and compared the accuracy performances of Faster-
RCNN, YOLOv5, YOLOv7-tiny, and YOLOv8s after 200 training cycles, as well as Faster-
RCNN after 15 rounds.
The results, as presented in Table 6, indicate that our lab’s datasets outperformed
other datasets used in similar models on most metrics, exhibiting the highest accuracy
for crack recognition and algorithmic efficiency. However, the model performance varied
across different datasets; while UAVRoadCrack performed relatively well, the UAPD
dataset showed the worst performance. These findings strongly highlight the advantages
of utilizing our self-collected pavement images via a UAV and emphasize the importance
of flight parameter modeling for the quality control of UAV imagery.

The Results of Detection Effectiveness


To facilitate a more intuitive comparison of the effects, a specific image with four
types of cracks was selected from the test set to evaluate and compare their recognition
performances, as presented in Table 7. Based on the results obtained, it is evident that
Faster-RCNN outperformed the YOLO series algorithms in terms of overall performance.
It is worth noting that, for challenging oblique cracks (OC), all the YOLO series algorithms
exhibited unsatisfactory recognition with low confidence levels, often resulting in the
omission or separate identification of complete cracks, whereas the Faster-RCNN model
demonstrated a superior capability in recognizing oblique cracks (OC) more comprehen-
sively. Additionally, the Faster-RCNN model also exhibited an excellent performance in
detecting subtle cracks, as shown in Table 7. For instance, it successfully identified a subtle
transverse crack within a longitudinal crack. In the other crack types, all four models
demonstrated effectiveness in complete crack detection. A comparative analysis considering
both the combined effects and confidence levels reveals that Faster-RCNN achieved the best
overall performance; among the YOLO series algorithms, YOLOv5s and YOLOv8s showed
comparable results, while YOLOv7-tiny performed relatively poorly, with lower confidence
levels observed across all detected results.

Table 6. Comparison of model effectiveness evaluation with various UAV crack datasets.

Datasets | Faster-RCNN: FPS (f·s−1), F1 (%), mAP (%) | YOLO v5s: FPS, F1, mAP | YOLO v7-tiny: FPS, F1, mAP | YOLO v8s: FPS, F1, mAP
UAPD [2] | 9.14, 47.9, 48.8 | 59.7, 52.7, 57.7 | 74.51, 56.7, 52.8 | 65.4, 57.4, 58.6
RDD2022 [31] | 11.36, 69.5, 68.8 | 63.21, 65.2, 60.9 | 65.47, 63.1, 65.6 | 53.71, 66.5, 67.7
UMSC [19] | 11.72, 73.4, 68.8 | 97.87, 68.7, 74.3 | 76.81, 63.8, 70.1 | 89.78, 72.8, 70.4
UAVRoadCrack [21] | 10.57, 68.9, …

Table 7. Comparison of detection results with various models.

Input Images | Faster-RCNN | …
68.97. Comparison
Table 68.5 108.6 YOLOv5s
of detection77.8results
75.7 75.39
with various YOLOv7-Tiny
62.5
models. 65.3 69.36 YOLOv8s
71.0 68.8
CrackForest [32] / 57.4 59.1 / 57.8 58.8 67.45 61.2 63.5 61.21 60.9 65.2
Our Input
DatasetsImages 12.80 Faster-RCNN
75.37.
Table
Table 79.3
7. Comparison
Comparison127.4
of YOLOv5s
72.6results
of detection
detection 74.0
results with YOLOv7-Tiny
82.56
with various
various 66.7
models.
models. 65.5 125.7 YOLOv8s
75.0 77.1
Input Images Faster-RCNN YOLOv5s YOLOv7-Tiny YOLOv8s
Table 7. Comparison of detection results with various models.
Input Images Faster-RCNN YOLOv5s YOLOv7-Tiny YOLOv8s
Input Images Faster-RCNN YOLOv5s YOLOv7-Tiny YOLOv8s
Appl. Sci. 2024, 14, 1157 17 of 23
In summary, this experiment compared the accuracy and effectiveness of different models for crack recognition from UAV imagery. The Faster-RCNN model demonstrated the highest accuracy and effectiveness in recognizing fine cracks. On the other hand, the YOLO series models showed significant advantages in terms of training speed and low video memory requirements. Among the YOLO models, YOLOv5s and YOLOv8s exhibited comparable recognition accuracy, while YOLOv7-tiny performed the worst. The experiment primarily focused on evaluating the data acquisition quality of UAV imagery, which yielded optimal results in the testing phase.
6. Road Crack Measurements and Pavement Distress Evaluations

The primary goal of road crack recognition is to evaluate pavement damage on roads. This will help to enhance the application of these models and provide factual evidence for the maintenance decisions made by road authorities. After conducting a comparative study of various modeling algorithms, it was determined that the model trained by Faster-RCNN outperformed the YOLO serial models and could be identified as the refined model for this experiment.
Due to the large size of the obtained images, it is not efficient to use them directly for road crack recognition. This would result in a slow recognition speed and require a significant amount of processing resources. To address this issue and ensure that the UAV recognition model remained small and fast, the ‘Divide and Merge’ strategy was applied to the large-size UAV imagery. This strategy utilizes a ‘Divide-Recognition-Merge-Fusion’ method during crack detection, as illustrated in Figure 10. The original frame image (3840 px × 2160 px) was divided into 18 consistently smaller images (640 px × 640 px), each assigned a unique number. Using the optimal model trained by Faster-RCNN in this experiment, cracks were identified within each cropped image. Finally, these identified images were stitched together, with overlapping multi-crack confidence recognition boxes merging with neighboring combinations.

Figure 10. Illustration of “Divide and Merge strategy” of UAV imagery for crack detection.
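The ‘Divide-Recognition-Merge-Fusion’ procedure above can be sketched in a few lines. Note one assumption: the paper does not state how eighteen 640 px × 640 px tiles are laid out over a 3840 px × 2160 px frame (the height is not an exact multiple of 640), so this sketch assumes a 6 × 3 grid whose three tile rows overlap slightly; the helper names `tile_origins` and `to_frame_coords` are illustrative, not the authors’ implementation.

```python
def tile_origins(width, height, tile, cols, rows):
    """Top-left corners of a cols x rows grid of tile-sized crops.

    Tiles are spaced evenly; when the image span is not an exact
    multiple of the tile size, neighboring tiles overlap slightly.
    """
    def spread(span, n):
        if n == 1:
            return [0]
        step = (span - tile) / (n - 1)  # allows overlap when span % tile != 0
        return [round(i * step) for i in range(n)]
    return [(x, y) for y in spread(height, rows) for x in spread(width, cols)]

def to_frame_coords(box, origin):
    """Map an (x1, y1, x2, y2) box detected inside a tile back to frame pixels."""
    ox, oy = origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)

# A 3840 x 2160 frame split into 18 numbered tiles of 640 x 640 (6 columns, 3 rows).
origins = tile_origins(3840, 2160, 640, cols=6, rows=3)
```

After per-tile detection, each box is shifted by its tile origin before the overlapping confidence boxes are merged with their neighbors.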
6.1. Measurement Methods of Pavement Cracks

The measurement methods for crack analysis play a crucial role in statistically analyzing the quantity of cracks. These methods consider various factors, such as crack location, crack type, crack length, crack width, crack depth, and crack area. In order to improve the practicality of these methods in road damage maintenance, the quantity of cracks can be roughly estimated, temporarily excluding small cracks.
(i) Pavement Crack Location: The pixel position of the detected crack in the original UAV imagery can be determined based on the corresponding image number; meanwhile, the actual ground position can be inferred through GSD calculation.
(ii) Pavement Crack Length: This can be determined based on the pixel size of the confidence frame model, as illustrated in Figure 11. Horizontal cracks are measured by their horizontal border pixel lengths; vertical cracks by their vertical border pixel lengths; diagonal cracks by estimated border diagonal distance pixels; and mesh cracks primarily by measuring border pixel areas.
(iii) Pavement Crack Width: The maximum width of a crack can be determined by identifying the region with the highest concentration of extracted crack pixels.
(iv) Pavement Crack Area: This mainly aims at alligator cracks (AC), with a measurement of the crack area. It can be calculated by the pixels of AC based on the confidence frame model.

Figure 11. Diagram of crack measurements.
Finally, to determine the location, length (L), and area (A) of road cracks with ground truth, the quantitative results of cracks can be determined by multiplying the ground sampling distance (GSD, unit: cm/pixel) by the pixels at which they are located. The actual length or width of the crack in meters can be calculated as the pixel length (px) × GSD/100, while the actual area of the block affected by the crack in square meters can be derived from the pixel area (px²) × GSD²/100².

6.2. Evaluation Methods of Pavement Distress

The evaluation of pavement damage can be determined using the internationally recognized pavement damage index (PCI), which is also adopted in China. The PCI provides a crucial indicator for assessing the level of pavement integrity. Additionally, the pavement damage rate (DR) represents the most direct manifestation and reflection of the physical properties related to the pavement condition. In this study, we refer to specifications such as ‘Technical Code of Maintenance for Urban Road (CJJ36-2016)’ [33] and ‘Highway Performance Assessment Standards (DB11/T1614-2019)’ [34] from the Chinese government, incorporating their respective calculation formulas as follows:

DR = 100 × ∑_{i=1}^{N} wi·Ai / A    (6)

PCI = 100 − a0·DR^{a1}    (7)

where Ai is the damage area of the pavement of the ith crack type (m²); N is the total number of damage types, taken here as 4; A is the pavement area of the investigated road section (the investigated road length multiplied by the effective pavement width, m²); and wi is the damage weight of the pavement of the ith crack type, directly set as 1. According to the ‘Highway Performance Assessment Standards (DB11/T 1614-2019)’ [34], a0 and a1 represent the material coefficients of the pavement, in which asphalt pavement is taken as a0 = 10 and a1 = 0.4, while concrete pavement is taken as a0 = 9 and a1 = 0.42. It is evident that a higher DR leads to a lower PCI value, indicating poorer pavement integrity.
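The GSD conversions and Equations (6) and (7) above can be combined into a short script. This is a minimal sketch assuming wi = 1 (as stated) and the asphalt coefficients a0 = 10, a1 = 0.4 by default; the function and variable names are illustrative.

```python
def crack_length_m(length_px: float, gsd_cm_per_px: float) -> float:
    """Pixel length -> meters, via GSD in cm/pixel: L = px * GSD / 100."""
    return length_px * gsd_cm_per_px / 100.0

def crack_area_m2(area_px: float, gsd_cm_per_px: float) -> float:
    """Pixel area -> square meters: A = px_area * GSD^2 / 100^2."""
    return area_px * gsd_cm_per_px ** 2 / 100.0 ** 2

def damage_rate(damage_areas_m2, pavement_area_m2, weights=None):
    """DR = 100 * sum(wi * Ai) / A, with wi = 1 by default (Equation (6))."""
    if weights is None:
        weights = [1.0] * len(damage_areas_m2)
    return 100.0 * sum(w * a for w, a in zip(weights, damage_areas_m2)) / pavement_area_m2

def pci(dr: float, a0: float = 10.0, a1: float = 0.4) -> float:
    """PCI = 100 - a0 * DR^a1 (Equation (7)); defaults are the asphalt coefficients."""
    return 100.0 - a0 * dr ** a1
```

With the DR of 29.5% reported in Section 6.3, these asphalt coefficients reproduce the PCI of 61.28 given there.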
6.3. Visualization Results of Pavement Distress

The original frame image was utilized for crack detection in this experiment, as visualized in Figure 12. On the right side, the statistical results of the four types of crack measurement are presented. This study employed the preferred Faster-RCNN trained model with a remarkable detection accuracy of 87.2% (mAP). By conducting crack measurement and statistics on a regional road section, a damage rate (DR) of 29.5% and a pavement damage index (PCI) of 61.28 were calculated, indicating a medium rating for the road section integrity in this region.

Figure 12. Visualization results of crack detection and pavement distress evaluation.
7. Discussion

This study proposes a comprehensive and systematic framework and method for automatic crack detection and pavement distress evaluation in a UAV inspection system. The framework begins by establishing the flight parameter settings and experimental techniques to ensure high-quality imagery using the DJI Mini 2 drone in real-world scenarios. Additionally, a benchmark dataset was created and has been made available to the community. The dataset includes important information such as the GSD, which is essential for evaluating pavement distress. In this experiment, our self-made crack dataset demonstrated its superiority compared to existing datasets used in similar algorithms, achieving the highest accuracy in crack recognition and algorithmic efficiency. The experimental result (refer to Table 6) revealed the significance of data acquisition quality in the accuracy of crack target recognition, with high-quality image data from the UAV imagery effectively improving the recognition accuracy.
In this experiment, the detection capability for road cracks in a UAV inspection system could be enhanced through a range of strategies. Firstly, adhering to a drone flight control strategy ensured a consistent height and stable speed during data acquisition on urban roads. This guaranteed the collection of clear, high-quality drone images with attached real spatial scale information for distress assessments. Secondly, the sampling ‘divide and conquer’ strategy for model training and target detection involves several key steps: frame extraction from video and image cropping for large images, model learning and crack detection on small images, and fusion and splicing of the small images. This approach effectively improves the accuracy of identifying cracks in large-scale images while enhancing the operational efficiency of these models. Thirdly, the deployment of drone detection algorithms using both ‘online–offline’ and ‘online–online’ strategies provides flexibility based on different scenarios. The ‘one-stage’ algorithms operate quickly but have a lower detection accuracy, whereas the ‘two-stage’ algorithms exhibit a slower running efficiency but a higher detection accuracy. These deep learning models can be deployed accordingly, depending on the specific application scenarios. For instance, in sudden situations requiring fast real-time detection, lightweight deployment using a ‘one-stage’ algorithm such as the YOLO series models can be employed.
To propose a suitable deployment scheme for the UAV inspection system, this study
utilized prominent deep learning algorithms, namely Faster-RCNN, YOLOv5s, YOLOv7-
tiny, and YOLOv8s, for pavement crack object detection and a comparative analysis. The
results revealed that Faster-RCNN demonstrated the best overall performance, with a
precision (P) of 75.6%, a recall (R) of 76.4%, an F1-score of 75.3%, and a mean Average
Precision (mAP) of 79.3%. Moreover, the mAP of Faster-RCNN surpassed that of YOLOv5s,
YOLOv7-tiny, and YOLOv8s by 4.7%, 10%, and 4%, respectively. This indicates that Faster-RCNN outperformed the other models in terms of detection accuracy but required a more demanding environment configuration, making it suitable for online data collection using a UAV and offline inspection at work stations. On the other hand, the YOLO serial models, while slightly less
accurate, were the fastest algorithms and are suitable for the lightweight deployment of
UAVs with online collection and real-time inspection. Many studies have also proposed re-
fined YOLO-based algorithms for crack detection in drones, mainly due to their lightweight
deployment in UAV systems. For instance, the BL-YOLOv8 model [20] reduces both the
number of parameters and computational complexity compared to the original YOLOv8
model and other YOLO serial models. This offers the potential to directly deploy the YOLO
serial models on cost-effective embedded devices or mobile devices.
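As a side note, the F1-score cited above is the harmonic mean of precision and recall. A minimal sketch follows; plugging in the overall P = 75.6% and R = 76.4% gives roughly 76.0%, slightly above the reported 75.3%, presumably because the paper averages F1 per class rather than computing it from the overall P and R.

```python
def f1_score(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)

# Overall Faster-RCNN figures quoted in the text above.
overall_f1 = f1_score(75.6, 76.4)  # ~76.0
```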
Finally, road crack measurement methods are presented to assess road damage, which
will enhance the application of the UAV inspection system and provide factual evidence
for the maintenance decisions made by road authorities. Notably, a crack is a significant
indicator for evaluating road distress. In this study, the evaluation results were primarily
obtained through a comprehensive assessment of the crack area, degree of damage, and
their proportions. However, relying solely on cracks to determine road distress may be
deemed limited, and this should only be considered as a reference for the relevant road
authorities. Therefore, it is essential to conduct a comprehensive evaluation that takes into
account multiple factors, such as rutting and potholes.

8. Conclusions
The traditional manual inspection of road cracks is inefficient, time-consuming, and
labor-intensive. Additionally, using multifunctional road inspection vehicles can be ex-
pensive. However, the use of UAVs equipped with high-resolution vision sensors offers
a solution. These UAVs can remotely capture and display images of the pavement from
high altitudes, allowing for the identification of local damages such as cracks. The UAV
inspection system, which is based on the commercial DJI Mini 2 drone, offers several advan-
tages. It is cost-effective, non-contact, highly precise, and enables remote visualization. As
a result, it is particularly well-suited for remote pavement detection. In addition, automatic
crack detection technology based on deep learning models brings significant additional
value to the field of road maintenance and safety. It can be integrated into the commercial
UAV system, thereby reducing the workload of maintenance personnel.
In this study, the contributions are summarized as follows: (1) A pavement crack
detection and evaluation framework of a UAV inspection system based on deep learn-
ing was proposed and can provide technical guidelines for road authorities. (2) To en-
hance automatic crack detection capability and design a suitable scheme for implementing
deep-learning-based models in a UAV inspection system, we conducted a validation and
comparative study on prevalent deep learning algorithms for detecting pavement cracks
in urban road scenarios. The study demonstrates the robustness of these algorithms in
terms of their performance and accuracy, as well as their effectiveness in handling our
customized crack image datasets and other popular crack datasets. Furthermore, this
research provides recommendations for leveraging UAVs in deploying these algorithms.
(3) Quantitative methods for road cracks were proposed and pavement distress evaluations
were also carried out in our experiment. Notably, our final evaluation results were also grounded in real-world scale according to the GSD. (4) A pavement crack image dataset integrated with GSD
was established and has been made publicly available for the research community, serving
as a valuable supplement to existing crack databases.
In summary, the UAV inspection system, under the guidance of our proposed frame-
work, has been proven to be feasible, yielding more satisfactory results. However, drone
inspection has the inherent limitation of a limited battery life, making it difficult to perform
long-distance continuous road inspection tasks. Drones are better suited for short-distance
inspections in complex urban scenarios [16]. With advancements in drone and computer vision technology, drones equipped with lightweight sensors and these lightweight
crack detection algorithms are expected to gain popularity for road distress inspection.
In the future, this study aims to incorporate improved YOLO algorithms into the UAV
inspection system to enhance road crack recognition accuracy. Furthermore, in order to
establish a comprehensive UAV inspection system for road distress, we plan to continue
researching multi-category defect detection systems in the future, including various road issues such as rutting and potholes in addition to cracks. Additionally, efforts will be
made to enhance UAV flight autonomy for stability and high-speed aerial photography,
further improving the quality of aerial images and catering to the requirements of various
complex road scenarios.

Author Contributions: Conceptualization, X.C., C.L. and L.C.; methodology, X.C. and L.C.; software,
C.L. and L.C.; validation, C.L., X.Z. and L.C.; formal analysis, X.C., X.Z. and Y.Z.; investigation, L.C.,
C.L., C.W. and Y.Z.; resources, X.C. and C.L.; data curation, C.L. and C.W.; writing—original draft
preparation, X.C., C.L. and Y.Z.; writing—review and editing, X.C. All authors have read and agreed
to the published version of the manuscript.
Funding: This research was funded by the China Postdoctoral Science Foundation (2017M622577), the Hunan Provincial Natural Science Foundation (2018JJ2118), and the Chinese National College Students Innovation and Entrepreneurship Training Program (S202310534031, S202310534169).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The UAV crack dataset presented in this study is openly available in FigShare at 10.6084/m9.figshare.25103138.
Acknowledgments: The authors would like to express many thanks to all the anonymous reviewers.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Samadzadegan, F.; Dadrass Javan, F.; Hasanlou, M.; Gholamshahi, M.; Ashtari Mahini, F. Automatic Road Crack Recognition
Based on Deep Learning Networks from UAV Imagery. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, X-4/W1-2022,
685–690. [CrossRef]
2. Zhu, J.; Zhong, J.; Ma, T.; Huang, X.; Zhang, W.; Zhou, Y. Pavement distress detection using convolutional neural networks with
images captured via UAV. Autom. Constr. 2022, 133, 103991. [CrossRef]
3. Cao, J.; Yang, G.T.; Yang, X.Y. Pavement Crack Detection with Deep Learning Based on Attention Mechanism. J. Comput. Aided
Des. Comput. Graph. 2020, 32, 1324–1333.
4. Qi, S.; Li, G.; Chen, D.; Chai, M.; Zhou, Y.; Du, Q.; Cao, Y.; Tang, L.; Jia, H. Damage Properties of the Block-Stone Embankment in
the Qinghai–Tibet Highway Using Ground-Penetrating Radar Imagery. Remote Sens. 2022, 14, 2950. [CrossRef]
5. Guo, S.; Xu, Z.; Li, X.; Zhu, P. Detection and Characterization of Cracks in Highway Pavement with the Amplitude Variation of
GPR Diffracted Waves: Insights from Forward Modeling and Field Data. Remote Sens. 2022, 14, 976. [CrossRef]
6. Salman, M.; Mathavan, S.; Kamal, K.; Rahman, M. Pavement crack detection using the Gabor filter. In Proceedings of the 16th
international IEEE Conference on Intelligent Transportation Systems (ITSC 2013), The Hague, The Netherlands, 6–9 October 2013;
pp. 2039–2044. [CrossRef]
7. Ayenu-Prah, A.; Attoh-Okine, N. Evaluating pavement cracks with bidimensional empirical mode decomposition. EURASIP J.
Adv. Signal Process. 2008, 2008, 861701. [CrossRef]
8. Majidifard, H.; Adu-Gyamfi, Y.; Buttlar, W.G. Deep machine learning approach to develop a new asphalt pavement condition
index. Constr. Build. Mater. 2020, 247, 118513. [CrossRef]
9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017,
60, 84–90. [CrossRef]
10. Tran, V.P.; Tran, T.S.; Lee, H.J.; Kim, K.D.; Baek, J.; Nguyen, T.T. One stage detector (RetinaNet)-based crack detection for asphalt
pavements considering pavement distresses and surface objects. J. Civ. Struct. Health Monit. 2021, 11, 205–222. [CrossRef]
11. Xiao, L.Y.; Li, W.; Yuan, B.; Cui, Y.Q.; Gao, R.; Wang, W.Q. Pavement Crack Automatic Identification Method Based on Improved
Mask R-CNN Model. Geomat. Inf. Sci. Wuhan Univ. 2022, 47, 623–631. [CrossRef]
12. Xu, K.; Ma, R.G. Crack detection of asphalt pavement based on improved faster RCNN. Comput. Syst. Appl. 2022, 31, 341–348.
[CrossRef]
13. Xu, X.; Zhao, M.; Shi, P.; Ren, R.; He, X.; Wei, X.; Yang, H. Crack detection and comparison study based on faster R-CNN and
mask R-CNN. Sensors 2022, 22, 1215. [CrossRef]
14. Yan, K.; Zhang, Z. Automated asphalt highway pavement crack detection based on deformable single shot multi-box detector
under a complex environment. IEEE Access 2021, 9, 150925–150938. [CrossRef]
15. Yokoyama, S.; Matsumoto, T. Development of an automatic detector of cracks in concrete using machine learning. Procedia Eng.
2017, 171, 1250–1255. [CrossRef]
16. Jiang, Y.T.; Yan, H.T.; Zhang, Y.R.; Wu, K.Q.; Liu, R.Y.; Lin, C.Y. RDD-YOLOv5: Road Defect Detection Algorithm with Self-
Attention Based on Unmanned Aerial Vehicle Inspection. Sensors 2023, 23, 8241. [CrossRef] [PubMed]
17. Zhang, Y.; Zuo, Z.; Xu, X.; Wu, J.; Zhu, J.; Zhang, H.; Wang, J.; Tian, Y. Road damage detection using UAV images based on
multi-level attention mechanism. Autom. Constr. 2022, 144, 104613. [CrossRef]
18. Zhou, Q.; Ding, S.; Qing, G.; Hu, J. UAV vision detection method for crane surface cracks based on Faster R-CNN and image
segmentation. J. Civ. Struct. Health Monit. 2022, 12, 845–855. [CrossRef]
19. Xiang, X.; Hu, H.; Ding, Y.; Zheng, Y.; Wu, S. GC-YOLOv5s: A Lightweight Detector for UAV Road Crack Detection. Appl. Sci.
2023, 13, 11030. [CrossRef]
20. Wang, X.; Gao, H.; Jia, Z.; Li, Z. BL-YOLOv8: An Improved Road Defect Detection Model Based on YOLOv8. Sensors 2023,
23, 8361. [CrossRef] [PubMed]
21. Omoebamije, O.; Omoniyi, T.M.; Musa, A.; Duna, S. An improved deep learning convolutional neural network for crack detection
based on UAV images. Innov. Infrastruct. Solut. 2023, 8, 236. [CrossRef]
22. Zhao, Y.; Zhou, L.; Wang, X.; Wang, F.; Shi, G. Highway Crack Detection and Classification Using UAV Remote Sensing Images
Based on CrackNet and CrackClassification. Appl. Sci. 2023, 13, 7269. [CrossRef]
23. Liu, K. Learning-based defect recognitions for autonomous UAV inspections. arXiv 2023, arXiv:2302.06093v1.
24. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [CrossRef]
25. Bubbliiiing. Faster-RCNN-PyTorch[CP]. 2023. Available online: https://github.com/bubbliiiing/faster-rcnn-pytorch (accessed
on 26 January 2024).
26. Ultralytics. YOLOv5[CP]. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 26 January 2024).
27. Wong, K.Y. YOLOv7[CP]. 2023. Available online: https://github.com/WongKinYiu/yolov7 (accessed on 26 January 2024).
28. Ultralytics. YOLOv8[CP]. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 26 January 2024).
29. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 1137–1149. [CrossRef]
30. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks.
Neural Netw. 2018, 106, 249–259. [CrossRef]
31. Sami, A.A.; Sakib, S.; Deb, K.; Sarker, I.H. Improved YOLOv5-Based Real-Time Road Pavement Damage Detection in Road
Infrastructure Management. Algorithms 2023, 16, 452. [CrossRef]
32. Faramarzi, M. Road damage detection and classification using deep neural networks (YOLOv4) with smartphone images.
SSRN 2020. [CrossRef]
33. CJJ36-2016; Technical Code of Maintenance for Urban Road. Ministry of Housing and Urban-Rural Development of the People’s
Republic of China: Beijing, China, 2017. Available online: https://www.mohurd.gov.cn/gongkai/zhengce/zhengcefilelib/2017
02/20170228_231174.html (accessed on 10 May 2023).
34. JTG 5210-2018; Highway Performance Assessment Standards. Ministry of Transport of the People’s Republic of China:
Beijing, China, 2018. Available online: https://xxgk.mot.gov.cn/2020/jigou/glj/202006/t20200623_3313114.html (accessed on
10 May 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.