Article
Bin-Picking Solution for Randomly Placed Automotive
Connectors Based on Machine Learning Techniques
Pedro Torres 1,2, * , Janis Arents 3 , Hugo Marques 1 and Paulo Marques 1,4
1 Instituto Politécnico de Castelo Branco, 6000-084 Castelo Branco, Portugal; hugo@ipcb.pt (H.M.);
paulo.marques@stshield.com (P.M.)
2 SYSTEC—Research Center for Systems & Technologies, 4200-465 Porto, Portugal
3 Institute of Electronics and Computer Science, LV-1006 Riga, Latvia; janis.arents@edi.lv
4 StoneShield—Engineering, Lda, 6000-790 Castelo Branco, Portugal
* Correspondence: pedrotorres@ipcb.pt
Abstract: This paper presents the development of a bin-picking solution based on low-cost vision
systems for the manipulation of automotive electrical connectors using machine learning techniques.
The automotive sector has always been in a state of constant growth and change, which also implies
constant challenges in the wire harnesses sector, and the emerging growth of electric cars is proof of
this and represents a challenge for the industry. Traditionally, this sector is based on strong human
work manufacturing and the need arises to make the digital transition, supported in the context of
Industry 4.0, allowing the automation of processes and freeing operators for other activities with
more added value. Depending on the car model and its feature packs, a connector can interface with
a different number of wires, but the connector holes are the same. Holes not connected with wires
need to be sealed, mainly to guarantee the tightness of the cable. Seals are inserted manually or, more
recently, through robotic stations. Due to the huge variety of references and connector configurations,
layout errors sometimes occur during seal insertion due to changed references or problems with the
seal insertion machine. Consequently, faulty connectors are dumped into boxes, piling up different
types of references. These connectors are not trash and need to be reused. This article proposes a
bin-picking solution for classification, selection and separation, using a two-finger gripper, of these
connectors for reuse in a new operation of removal and insertion of seals. Connectors are identified
through a 3D vision system, consisting of an Intel RealSense camera for object depth information and
the YOLOv5 algorithm for object classification. The advantage of this approach over other solutions
is the ability to accurately detect and grasp small objects through a low-cost 3D camera even when
the image resolution is low, benefiting from the power of machine learning algorithms.
Keywords: bin-picking; machine learning; robotics; YOLOv5; Industry 4.0
Citation: Torres, P.; Arents, J.; Marques, H.; Marques, P. Bin-Picking Solution for Randomly Placed Automotive Connectors Based on Machine Learning Techniques. Electronics 2022, 11, 476. https://doi.org/10.3390/electronics11030476
Academic Editor: Gemma Piella
Received: 11 January 2022; Accepted: 3 February 2022; Published: 6 February 2022
1. Introduction
Vehicles are becoming more comfortable, safer, more efficient and less polluting, but they are also
increasingly complex systems with lots of electronics. The Electric Distribution System
(EDS) has to constantly adapt to these changes in terms of concept quality and technological
requirements. According to recent market reports [2,3], the rise of electric vehicles is driving
the market. In 2020, global sales of plug-in electric cars increased 39% from the previous
year to 3.1 million units. By the end of 2026, annual sales of battery-powered electric
cars are expected to exceed 7 million and to contribute about 15% of total vehicle sales.
This increase in sales is mainly due to increased regulatory standards imposed by various
organizations and governments to limit emissions and promote zero-emission automobiles.
As more electric vehicles circulate, the electric harness market is also expected to witness
growth, since electric harnesses are used more in electric vehicles than in conventional
fossil fuel vehicles.
To meet the growing needs, the electric harness market needs to digitize and automate
processes to increase production levels and also reduce the number of failures, often
associated with human error. The tasks performed in wire harness assembly have traditionally
been difficult for robots. Therefore, the solution involves changing the harness architectures,
through a Design-for-Automation logic, as well as automating some current processes
through robotic stations.
Traditionally, grasping and sorting randomly positioned objects requires human re-
sources, which is a very monotonous task, lacks creativity and is no longer sustainable
in the context of smart manufacturing [4]. Industrial robots, however, require a supple-
mentary cognitive sensing system that can acquire and process information about the
environment and guide the robot to grasp arbitrarily placed objects out of the bin. In
industry settings, this problem has been commonly referred to as bin-picking [5] and also
historically addressed as one of the greatest robotic challenges in manufacturing automa-
tion [6]. Bin-picking depends on visual-based robotic grasping approaches, which can
be divided into methods where the shape of the object is analyzed (analytic approaches)
or machine learning-based methods (data-driven approaches). Data-driven approaches
can be categorized as model-free or model-based, where model-based approaches require
prior knowledge of the object to determine the grasping position and model-free meth-
ods directly search for possible grasping points [7]. In the process of sorting automotive
connectors, several different object types can be present and mixed in one pile. To be
efficient, it is crucial to determine the object type before grasping as different grasping
approaches are required for different connectors. Analytic methods fall short due to the
high level of diversity in the region of interest. However, machine learning approaches
tend to generalize and cope with uncertainties of the environment. Therefore, in this article,
we focus on model-based, machine learning grasping methods.
The remainder of this paper is organized as follows: Section 2 describes the state-of-
the-art and related works. Section 3 describes the materials and methods for the bin-picking
solution. Section 4 presents the experimental results achieved and, finally, Section 5 presents
the conclusions and future work.
2. Related Work
Bin-picking is a methodology used in Vision-Guided Robotics systems in which randomly
placed pieces are extracted from a container, using a vision system for localization
and a robotic system for extraction and subsequent placement. In recent years, a
large number of 3D vision systems have emerged on the market that make it possible to
implement bin-picking solutions in a smart factory context. Photoneo [8] is one of these
brands which provides a 3D vision system, with software capable of training Convolutional
Neural Networks (CNN) to recognize and classify objects and integrate with different
models of robots. In addition, several other players bring machine vision solutions to the
market for bin-picking applications, including Zivid [9], Solomon [10], Pickit [11] and more.
All of these systems provide very efficient and robust features for the industry, but they
are still very expensive systems and are not accessible to the vast majority of small- and
medium-sized enterprises (SME). This has led to the pursuit of alternative solutions based
on more low-cost 3D vision cameras, investing in the research and the improvements of
the machine learning algorithms. One such solution is proposed in [12], where the authors
propose an object detection method based on the YOLOv5 algorithm, which can perform
accurate positioning and recognition of objects to be grasped by an arm robot with an Intel
RealSense D415 camera in an eye-to-hand configuration.
Bin-picking solutions have been studied for a long time, and in [13], some limitations
and challenges of current solutions for the industry are identified and a system for grasping
sheet metal parts is proposed. In [14], a solution is proposed with an ABB IRB2400 robot
with a 3D vision system for picking and placing randomly located pieces. More recently,
in [15] the authors propose a CAD-based 6-DoF (degree of freedom) pose estimation
pipeline for robotic random bin-picking tasks using the 3D camera.
Picking only one object in a pile of random objects is a very challenging task, and
in [16] a method is proposed to first compute grasping pose candidates by using the
graspability index. Then, a CNN is trained to predict whether or not the robot can pick
one and only one object from the bin. In [17], an approach for bin-picking industrial parts
with arbitrary geometries is proposed based on the YOLOv3 algorithm. In [18], a flexible
system for the integration of 3D computer vision and artificial intelligence solutions with
industrial robots is proposed using the ROS framework, a Kinect V2 sensor and the UR5
collaborative robot.
One of the challenging tasks in bin-picking systems is identifying the best way to grip
an object; therefore, it is necessary to identify the best gripper for the operation, in addition
to locating the objects and calculating the pose. In [19], the authors propose a system for the
detection of object location, pose estimation, distance measurement and surface orientation
angle detection. In [20], an object pose estimation method based on a landmark feature
is proposed to estimate the rotation angle of the object. The sensitivity of the 3D vision
system is very important to the success rate of a bin-picking solution; obviously, low-cost
vision systems are useful for demonstrating concepts, but they are not usually suitable for
working day-to-day in industrial scenarios.
The success rate of the bin-picking solution beyond the vision sensor depends a lot
on the efficiency of the implemented algorithms. In [21], the authors compared the results
of point cloud registration based on ICP (Iterative Closest Point) with data from different
3D sensors to analyze the success rate in bin-picking solutions. Object detection is one
of the main tasks of computer vision, which consists of determining the location in the
image where certain objects are present, as well as classifying them. The rapid advances of
machine learning and deep learning techniques have greatly accelerated the achievements
in object detection. With deep learning networks and the computing power of GPUs, the
performance of object detectors and trackers has greatly improved. In [22], a review of
object detection methods with deep learning is performed, where the fundamentals of this
technique are discussed. One of the algorithms that have emerged in recent years is YOLO
(You Only Look Once). The YOLO model is designed to encompass an architecture that
processes all image features (the authors called it the Darknet architecture) and is followed
by two fully connected layers performing bounding box prediction for objects. Since its
inception in 2015, YOLO has evolved, and in 2020 the company Ultralytics converted the
previous version of YOLO into the PyTorch framework, giving it more visibility. YOLOv5 is
written in Python instead of C as in previous versions. In addition, the PyTorch community
is also larger than the Darknet community, which means that PyTorch will receive more
contributions and growth potential in the future. The complete study of the evolution
of YOLO can be found in [23]. YOLOv5 is now a reference and is extensively used in
object detection tasks in various domains. As an example, in [24] it is used as a face
detection algorithm suitable for complex scenes, and in [25] it is used as a real-time detection
algorithm for kiwifruit defects. In this work, YOLOv5 is the algorithm implemented for the
identification and recognition of electrical connectors in a bin-picking application.
3. Materials and Methods

This section describes the methodology followed to implement our bin-picking solution for small automotive connectors and the machine learning algorithm used for object recognition and the respective robot navigation process. The core concept for our approach is depicted in Figure 1.

Figure 1. Bin-picking concept applied to unsorted small plastic connectors.

The process of assembling seals into connectors may produce a significant amount of poorly assembled connectors. The connectors that fail the quality tests are placed in large boxes for reuse. In the end, each box will contain multiple types of unsorted connectors, as depicted in Figure 2a. Each box is then emptied (still unsorted) into open trays, and our goal is for the robot to perform the bin-picking process and sort the connectors into different output boxes.

Figure 2. (a) Boxes of connectors for reuse. (b,c) Sample of some of the connectors used for recognition, in different poses.

The operation takes place in a 'Bin-Picking Station' (see Figure 3), which consists of a collaborative robot for parts' manipulation, one Intel RealSense camera for stereoscopic (3D) vision, one working table, two 'Open Trays' containing unsorted small plastic connectors and eight 'boxes', where the robot will put the sorted connectors.

Figure 3. Layout concept for the Bin-Picking Station.

This station is responsible for grasping the connectors in the 'Open Trays' and sorting the connectors into the output boxes, correctly aligned to be reused in other workstations to remove seals and re-insert the connectors into the production lines. As can be seen in Figure 3, the layout has been prepared to maximize the robot's working area and operating times. As a collaborative robot is used, it can work on one side of the station, left or right, while an operator can insert new bins and remove sorted boxes on the other side, reducing downtimes as much as possible.
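To make the vision side of this layout concrete, the sketch below shows one possible way to combine the RealSense colour and depth streams with a YOLOv5 model trained on the connector references. It is only an illustrative outline, not the station's actual code: the pyrealsense2 and torch packages, the weights file best.pt, the stream settings and the class handling are all assumptions made for the example.

```python
import numpy as np
import pyrealsense2 as rs
import torch

# Hypothetical custom-trained YOLOv5 weights (file name is illustrative).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

# Aligned colour and depth streams from the RealSense camera.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)

try:
    frames = align.process(pipeline.wait_for_frames())
    color_frame = frames.get_color_frame()
    depth_frame = frames.get_depth_frame()
    color_image = np.asanyarray(color_frame.get_data())

    # Run YOLOv5 on the colour image (convert BGR to RGB first).
    results = model(np.ascontiguousarray(color_image[:, :, ::-1]))
    detections = results.pandas().xyxy[0]  # one row per detected connector

    intrinsics = color_frame.profile.as_video_stream_profile().get_intrinsics()
    for _, det in detections.iterrows():
        # Centre of the bounding box in pixel coordinates.
        u = int((det["xmin"] + det["xmax"]) / 2)
        v = int((det["ymin"] + det["ymax"]) / 2)
        depth = depth_frame.get_distance(u, v)  # metres at the box centre
        # Back-project the pixel to a 3D point in the camera frame.
        point = rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], depth)
        print(det["name"], round(float(det["confidence"]), 2), point)
finally:
    pipeline.stop()
```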
The success of object detection is strongly influenced by the labeling and training of the objects to be detected. The tasks performed in the training process for any object detection are typically composed of four stages, as depicted in Figure 4. Several images of the connectors in different poses and under different lighting conditions were acquired to cover as much of the variability as possible in the Bin-Picking Workstation. Using a labeling application, by defining regions, references were created for all connectors, as depicted in Figure 5.

Figure 4. Typical tasks performed in the object recognition training process: training image acquisition, labeling, training with YOLOv5 and results analysis.

The outputs generated in the labeling application were then used in the training process. In the training task, the images and labels were organized into training, testing and validation groups. The PyTorch-based algorithm, YOLOv5, was configured and used to train the data. YOLOv5 allowed us to work with different levels of complexity associated with neural networks. At the time, four models were of interest to us: YOLOv5s (small), YOLOv5m (medium), YOLOv5l (large) and YOLOv5x (extra large).

Figure 5. Creating references through a labeling application.

The main difference of each version is in the complexity and number of hidden layers of each deep neural network, which varies from simpler (small) to more complex (extra large). To choose the best version to use in each application, a trade-off analysis between speed, computational processing time and accuracy is required. Larger neural networks favor
better accuracy results, but on the other hand the computational cost tends to be very high, sometimes making them unsuitable for real-time applications. All YOLOv5 versions were tested and YOLOv5s was selected for the final application, as it produced the best accuracy–speed–robustness relation for our use case. The YOLOv5 object detection algorithm treats detection as a regression problem and has three main components or sections, the Backbone, the Head and the Detection, as illustrated in Figure 6. The Backbone is a CNN that collects and models image features at different granularities. The Head is a series of layers that combine image features and pass them to the prediction step. The Detection step uses the Head features and performs the box and class prediction. To do this, a loss function for bounding-box prediction based on the distance information between the predicted box and the ground-truth box, known as Generalized Intersection over Union (GIoU), is used.
This function is proposed in [26] and described by Equation (1):
$$\mathrm{GIoU}_{\mathrm{Loss}} = 1 - \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|} \qquad (1)$$
where IoU is the Intersection over Union, a common evaluation metric used to measure
the accuracy of an object detector, by comparing two arbitrary shapes (volumes) A and B:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \qquad (2)$$
and C is the smallest convex shape involving A and B.
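As a concrete reading of Equations (1) and (2), the short Python sketch below computes the IoU and the corresponding GIoU loss for two axis-aligned boxes. It is a simplified 2D illustration of the metric, not the actual loss implementation inside YOLOv5.

```python
def iou_and_giou_loss(box_a, box_b):
    """IoU (Equation (2)) and GIoU loss (Equation (1)) for two axis-aligned
    boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection A ∩ B and union A ∪ B.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # C: smallest enclosing box of A and B.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    giou = iou - (area_c - union) / area_c  # Generalized IoU
    return iou, 1.0 - giou                  # GIoU loss


# Example with illustrative boxes: a predicted box vs. a ground-truth box.
print(iou_and_giou_loss((10, 10, 60, 60), (20, 20, 70, 70)))
```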
Figure 6. YOLOv5 architecture.
To obtain valid outputs, YOLOv5 requires training datasets to have a minimum of 100 images. It is known that by increasing the dataset size, the output results improve; however, there is a side effect, since we need to consider the compromise between the dimension of the datasets and the associated processing time in the training process. Several training experiments were performed, with a total of about 2000 different images of connectors.
The evaluation of the classification algorithms' performance is carried out by a confusion matrix. A confusion matrix is a table for summarizing the performance of a classification algorithm. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa. The computation of a confusion matrix can provide a better idea of what the classification model is getting right and what types of errors it is making.
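As a small illustration of how such a matrix can be produced (the labels and predictions below are invented, not the paper's data), scikit-learn's confusion_matrix can be used:

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

classes = ["ref1", "ref2", "ref3", "ref4", "ref5"]
# Invented ground-truth and predicted labels for a handful of test images.
y_true = ["ref2", "ref3", "ref4", "ref5", "ref2", "ref5"]
y_pred = ["ref2", "ref3", "ref4", "ref4", "ref2", "ref5"]

# Rows correspond to actual classes, columns to predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)

# Optional plot, similar in spirit to Figures 8 and 11 (requires matplotlib).
ConfusionMatrixDisplay(cm, display_labels=classes).plot()
```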
This process is performed by a calibration procedure, as illustrated in Figure 7 and defined by Equation (3):

$${}^{G}T_{P} = \left({}^{B}T_{G}\right)^{-1} \times {}^{B}T_{C} \times {}^{C}T_{P} \qquad (3)$$

where all parameters are homogeneous matrices, with rotation and position elements. ${}^{G}T_{P}$ is the homogeneous matrix that represents the pose of each recognized connector in the reference frame of the gripper. As expressed by the equation, to achieve this, it is necessary to compute the kinematics of each reference frame involved, such as the relationship between the gripper and the robot base (${}^{B}T_{G}$), the camera pose in the robot's reference frame (${}^{B}T_{C}$) and the object pose in the camera reference frame (${}^{C}T_{P}$).

Figure 7. Coordinates reference system.

This equation allows knowing, at each instant, the position and orientation of the connector to be grasped. The computation is performed in Python and the interaction with the robot controller is performed through Modbus. Trajectory planning is performed in the main program running at the controller.
4. Results

This section presents the results obtained in the training and classification tasks, as well as the solution for grasping the recognized objects. As stated in Section 3, about 2000 images were acquired in different poses and lighting conditions to obtain a large dataset that can represent the greatest possible variability of the system. With this dataset, several tests and training tasks were performed to obtain the most robust model that can be used in real-time object identification.

Table 1 presents the results of just two setups, from a larger number of setup exercises. The computation was performed by a portable computer with an Intel Core™ i7-10510U CPU running at 1.80–2.30 GHz and with 16 GB of RAM memory.

Table 1. Time taken for training.

                               Setup 1 (For Comparison)    Setup 2 (Best Results)
Image Size                     640 × 480                   640 × 480
Number of training images      192                         552
Number of test images          40                          56
Number of validation images    40                          72
Method                         YOLOv5s                     YOLOv5s
Epochs                         15                          15
Batch Size                     8                           8
Training time                  40 min                      150 min
4.1. Results for Setup 1

Figure 8. Obtained confusion matrix for Setup 1.

Figure 9. Setup 1 output results using the YOLOv5s algorithm.
Nonetheless, by correlating the results with the confusion matrix, it can be concluded
that the outcome suffers from an insufficient dataset for training (low number of samples)
(only references ref2 and ref3 were correctly identified).
Figure 10 depicts classification results produced by the image testing phase of the
training process. These were used to validate and fine-tune our classification model. As
can be seen, almost all connectors were classified either as ref2 or ref3. Therefore, the
algorithm was not able to correctly detect the false positives.
Figure 10. Setup 1 classification results using the YOLOv5s algorithm.

4.2. Results for Setup 2

Setup 2 produced better results. Although all references were classified, due to the very close similarity between ref4 and ref5, depending on their position on the tray, the system still becomes confused. These two references have the same shape and overall color; only the top layer has a different color. By correlating the output results from the confusion matrix (see Figure 11) and the results from the YOLOv5s algorithm (Figure 12), we can conclude that the results are not yet as desired, as they still suffer from an insufficient dataset for training (insufficient number of samples); however, it is now possible to classify the five references with good precision values.
Figure 13 depicts classification results produced by the image testing phase of the
training process. These were used to validate and fine-tune our classification model.
By comparing with the first experiment (Setup 1), in this experiment, all five connector
references were detected with good accuracy, even in the presence of connectors with
similar characteristics, such as shape and color.
Figure 11. Confusion matrix for Setup 2.

Figure 12. Setup 2 output results using the YOLOv5s algorithm.
This trained model was the one chosen for the implementation of real-time object detection in the bin-picking solution. This choice took into account the algorithm's performance, considering its accuracy and processing time, and its comparison with other state-of-the-art algorithms when using the same datasets. The comparison results are depicted in Table 2.

Table 2. Object detection results: comparison between different state-of-the-art algorithms.

Measure     SSD       Faster R-CNN   YOLOv5s   YOLOv5m      YOLOv5l      YOLOv5x
Precision   0.70      0.80           0.82      0.83         0.85         0.90
Recall      0.41      0.45           0.99      0.99         0.98         0.99
mAP@0.5     0.74      0.76           0.97      0.97         0.98         0.99
CPU time    120 min   135 min        150 min   3 h 40 min   4 h 50 min   6 h 20 min
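For reference, the Precision and Recall rows of Table 2 follow the usual per-class definitions; the toy sketch below (with invented detection counts) shows how they are obtained from true positives, false positives and false negatives.

```python
# Invented counts, purely to illustrate the definitions behind Table 2.
tp, fp, fn = 90, 10, 12

precision = tp / (tp + fp)   # fraction of detections that are correct
recall = tp / (tp + fn)      # fraction of ground-truth objects that are found
print(f"precision={precision:.2f}, recall={recall:.2f}")
```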
Figure 13. Setup 2 classification results using the YOLOv5s algorithm.

Clearly, YOLOv5s presented better results than SSD (Single Shot Detector) [30] and Faster R-CNN [31]. We can also observe that no significant gains were obtained when using higher versions of YOLOv5, which provide similar precision but are more computationally demanding.
4.3. Identification

After the pattern recognition and pose detection, a match between the identified and trained patterns needs to take place, which occurs by moving and changing the orientation of the robot gripper, a process illustrated in Figure 14. This task is of special interest for identifying the best pose for grasping each kind of object.

Figure 14. Steps in identifying connector orientation: (a) search for a shape and orientation; (b) approach to the shape; (c) matching shape.

The core goal was to achieve an average cycle time of 10 s for a robotic arm to successfully recognize and pick up a cable connector with unpredictable positions, thanks to AI-based machine vision. Table 3 depicts the times measured. Since this is not a collaborative operation, the average time used for the pick-and-place operation is a reference value for non-collaborative robots.

Table 3. Average cycle time taken to successfully pick up the cable connectors with unpredictable positions.
5. Conclusions
This work aimed to demonstrate that for small objects, such as automotive connectors,
bin-picking solutions with a low-cost 3D vision system are possible. The machine vision
algorithm plays an important role in correctly identifying objects, and this is only possible
due to the contribution of machine learning algorithms. The YOLO algorithm has been
shown to have great potential for these tasks, in particular, YOLOv5 was shown to recognize
these kinds of small objects with high accuracy and repeatability. Grasping this type of
connector is a challenging task because its body is not solid and does not lend itself to
vacuum-based handling, which makes manipulation difficult. Our test scenario used a two-finger
gripper, which requires identifying the best pose for grasping the connector and is more
prone to collisions when interacting with very close objects. Despite these challenges,
this work demonstrated that it is possible to grasp small objects in bulk, classifying them
and sorting them into different output boxes.
As future work, the main focus will be further reducing the cycle time, possibly by
improving the time required to identify the best posture to grip the connectors. Additionally,
a new type of gripper is being considered that would be more suitable for grasping these
types of objects.
Author Contributions: Conceptualization, P.T., H.M. and P.M.; methodology, P.T. and H.M.; software
and hardware, P.T.; validation, P.T., J.A., H.M. and P.M.; investigation, P.T., J.A., H.M. and P.M.;
writing—original draft preparation, P.T.; writing review and editing, P.T., J.A., H.M. and P.M.; funding
acquisition, P.M. All authors have read and agreed to the published version of the manuscript.
Funding: This work results from a sub-project that has indirectly received funding from the European
Union’s Horizon 2020 research and innovation programme via an Open Call issued and executed
under project TRINITY (grant agreement No 825196).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Indicators and a Monitoring Framework. Available online: https://indicators.report/targets/12-5/ (accessed on 26
December 2021).
2. Automotive Wiring Harness Market—Growth, Trends, COVID-19 Impact, and Forecasts (2021–2026). Available online: https:
//www.mordorintelligence.com/industry-reports/automotive-wiring-harness-market (accessed on 26 December 2021).
3. Automotive Wiring Harness Market by Application. Available online: https://www.marketsandmarkets.com/Market-Reports/
automotive-wiring-harness-market-170344950.html (accessed on 26 December 2021).
4. Ghobakhloo, M. Industry 4.0, digitization, and opportunities for sustainability. J. Clean. Prod. 2020, 252, 119869. [CrossRef]
5. Fujita, M.; Domae, Y.; Noda, A.; Garcia Ricardez, G.A.; Nagatani, T.; Zeng, A.; Song, S.; Rodriguez, A.; Causo, A.; et al.;
Ogasawara, T. What are the important technologies for bin picking? Technology analysis of robots in competitions based on a set
of performance metrics. Adv. Robot. 2020, 34, 560–574. [CrossRef]
6. Marvel, J.; Eastman, R.; Cheok, G.; Saidi, K.; Hong, T.; Messina, E.; Bollinger, B.; Evans, P.; Guthrie, J.; et al.; Martinez, C.
Technology Readiness Levels for Randomized Bin Picking. In Proceedings of the Workshop on Performance Metrics for Intelligent
Systems, Hyattsville, MD, USA, 20–22 March 2012; pp. 109–113.
7. Kleeberger, K.; Bormann, R.; Kraus, W.; Huber, M.F. A Survey on Learning-Based Robotic Grasping. Curr. Robot. Rep. 2020, 1,
239–249. [CrossRef]
8. Photoneo. Available online: https://www.photoneo.com/ (accessed on 29 December 2021).
9. Zivid. Available online: https://www.zivid.com/ (accessed on 29 December 2021).
10. Solomon. Available online: https://www.solomon-3d.com/ (accessed on 29 December 2021).
11. Pickit. Available online: https://www.pickit3d.com/en/ (accessed on 29 December 2021).
12. Song, Q.; Li, S.; Bai, Q.; Yang, J.; Zhang, X.; Li, Z.; Duan, Z. Object Detection Method for Grasping Robot Based on Improved
YOLOv5. Micromachines 2021, 12, 1273. [CrossRef] [PubMed]
13. Pochyly, A.; Kubela, T.; Singule, V.; Cihak, P. 3D Vision Systems for Industrial Bin-Picking Applications. In Proceedings of the
15th International Conference MECHATRONIKA, Prague, Czech Republic, 5–7 December 2012; pp. 1–6.
14. Martinez, C.; Chen, H.; Boca, R. Automated 3D Vision Guided Bin Picking Process for Randomly Located Industrial Parts.
In Proceedings of the 2015 IEEE International Conference on Industrial Technology (ICIT), Seville, Spain, 17–19 March 2015;
pp. 3172–3177. [CrossRef]
15. Yan, W.; Xu, Z.; Zhou, X.; Su, Q.; Li, S.; Wu, H. Fast Object Pose Estimation Using Adaptive Threshold for Bin-Picking. IEEE
Access 2020, 8, 63055–63064. [CrossRef]
16. Matsumura, R.; Domae, Y.; Wan, W.; Harada, K. Learning Based Robotic Bin-picking for Potentially Tangled Objects. In
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8
November 2019; pp. 7990–7997. [CrossRef]
17. Lee, S.; Lee, Y. Real-Time Industrial Bin-Picking with a Hybrid Deep Learning-Engineering Approach. In Proceedings of the 2020
IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Korea, 19–22 February 2020; pp. 584–588.
[CrossRef]
18. Arents, J.; Cacurs, R.; Greitans, M. Integration of Computervision and Artificial Intelligence Subsystems with Robot Operating
System Based Motion Planning for Industrial Robots. Autom. Control Comput. Sci. 2018, 52, 392–401. [CrossRef]
19. Kim, K.; Kang, S.; Kim, J.; Lee, J.; Kim, J. Bin Picking Method Using Multiple Local Features. In Proceedings of the 2015
12th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Goyang-si, Korea, 28–30 October 2015;
pp. 148–150. [CrossRef]
20. Pyo, J.; Cho, J.; Kang, S.; Kim, K. Precise Pose Estimation Using Landmark Feature Extraction and Blob Analysis for Bin Picking.
In Proceedings of the 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Jeju, Korea,
28 June–1 July 2017; pp. 494–496. [CrossRef]
21. Dolezel, P.; Pidanic, J.; Zalabsky, T.; Dvorak, M. Bin Picking Success Rate Depending on Sensor Sensitivity. In Proceedings of the
2019 20th International Carpathian Control Conference (ICCC), Krakow-Wieliczka, Poland, 26–29 May 2019; pp. 1–6. [CrossRef]
22. Zhao, Z.-Q.; Zheng, P.; Xu, S.-T.; Wu, X. Object Detection With Deep Learning: A Review. IEEE Trans. Neural Netw. Learn. Syst.
2019, 30, 3212–3232. [CrossRef] [PubMed]
23. Thuan, D. Evolution Of Yolo Algorithm And Yolov5: The State-Of-The-Art Object Detection Algorithm. Bachelor’s Thesis, Oulu
University of Applied Sciences, Oulu, Finland, 2021.
24. Xu, Q.; Zhu, Z.; Ge, H.; Zhang, Z.; Zang, X. Effective Face Detector Based on YOLOv5 and Superresolution Reconstruction.
Comput. Math. Methods Med. 2021, 2021, 1–9. [CrossRef] [PubMed]
25. Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5.
Electronics 2021, 10, 1711. [CrossRef]
26. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and A Loss
for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference On Computer Vision and Pattern Recognition, Long
Beach, CA, USA, 16–20 June 2019; pp. 658–666.
27. Zhang, B.; Xie, Y.; Zhou, J.; Wang, K.; Zhang, Z. State-of-the-art robotic grippers, grasping and control strategies, as well as their
applications in agricultural robots: A review. Comput. Electron. Agric. 2020, 177, 105694. [CrossRef]
28. Fantoni, G.; Santochi, M.; Dini, G.; Tracht, K.; Scholz-Reiter, B.; Fleischer, J.; Lien, T.K.; Seliger, G.; Reinhart, G.; Franke, J.; et al.
Grasping devices and methods in automated production processes. CIRP Ann. 2014, 63, 679–701. [CrossRef]
29. Guo, J.; Fu, L.; Jia, M.; Wang, K.; Liu, S. Fast and Robust Bin-picking System for Densely Piled Industrial Objects. In Proceedings
of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 2845–2850. [CrossRef]
30. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In European
Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
31. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 91–99. [CrossRef] [PubMed]