
Journal of Manufacturing Systems 76 (2024) 133–157

journal homepage: www.elsevier.com/locate/jmansys
Technical paper
Developing a deep learning-based uncertainty-aware tool wear prediction method using smartphone sensors for the turning process of Ti-6Al-4V
Gyeongho Kim a,1, Sang Min Yang b,1, Dong Min Kim c, Jae Gyeong Choi a, Sunghoon Lim a,d,e,∗, Hyung Wook Park b,∗
a Department of Industrial Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, Republic of Korea
b Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, Republic of Korea
c Dongnam Division, Korea Institute of Industrial Technology, 25, Yeonkkot-ro 165 gil, Jinju 52845, Republic of Korea
d Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, Republic of Korea
e Industry Intelligentization Institute, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, Republic of Korea
ARTICLE INFO
Keywords: Aleatoric uncertainty; Deep learning; Epistemic uncertainty; Machining; Smartphone sensor; Tool wear prediction

ABSTRACT
Accurately predicting tool wear is crucial for intelligent machining process monitoring, control, and quality improvement. Recent studies on tool wear prediction predominantly apply deep learning-based data-driven approaches that use multivariate time-series signals from high-precision sensors. However, the reliance on these sensors incurs high installation and operation costs, posing practical challenges for small and medium-sized enterprises. This work proposes a novel deep learning-based approach that employs smartphone sensors to predict tool wear, which addresses the problems associated with smartphone sensor data, including higher noise levels and increased data and model uncertainties. To this end, this work develops various data-driven techniques for effective tool wear prediction and uncertainty quantification. First, a Kalman filter-based noise suppression method is applied to reduce undesired noise effects. Second, a novel uncertainty modeling method consisting of a Bayesian deep learning approach and a density output structure is proposed to capture both aleatoric and epistemic uncertainties during tool wear prediction. The proposed method not only takes into account high noise levels and induced uncertainty, but also continuously quantifies and dissects predictive uncertainty. The proposed method’s effectiveness is validated with real-world datasets from Ti-6Al-4V turning experiments under three different machining conditions. The comprehensive experimental results indicate the superior prediction performance of the proposed method compared to existing data-driven methods, probabilistic deep learning-based methods, and state-of-the-art methods. For each of the three distinct datasets, the proposed method provides the lowest mean absolute error (MAE) values of 2.5815, 1.2414, and 1.2269, with the highest $R^2$ values of 0.9951, 0.9971, and 0.9982, respectively.
1. Introduction

Recent advances in data-driven approaches based on artificial intelligence (AI) techniques have accelerated digital transformation in the manufacturing industry, leading to the smart manufacturing era. Data-driven approaches have also played an important role in cyber manufacturing, where physical models cannot be easily employed due to theoretical assumptions, non-linear variable relationships, and high entry barriers [1,2]. The advantages of data-driven approaches include real-time monitoring and control, process optimization flexibility, and facile quality management. Notably, large-scale data collected from manufacturing systems using various high-precision sensors and industrial internet of things (IIoT) have paved the way for the application of machine learning (ML) and deep learning (DL)-based approaches suitable for extracting helpful information inherent in data. DL-based approaches with a high expressive power and modeling capacity, accompanied by the use of high-precision sensors and measurement data, have led to unprecedented success in diverse manufacturing tasks, such as predictive maintenance, fault diagnosis, and process control [3–6]. Machining, a traditional manufacturing process, has also benefited from advanced data-driven approaches, especially from prognostics and
∗ Corresponding authors at: Department of Industrial Engineering and Department of Mechanical Engineering, Ulsan National Institute of Science and
Technology, 50 UNIST-gil, Ulsan 44919, Republic of Korea.
E-mail addresses: kkh0608@unist.ac.kr (G. Kim), yangsangmin@unist.ac.kr (S.M. Yang), dkim0707@kitech.re.kr (D.M. Kim), choil6043@unist.ac.kr
(J.G. Choi), sunghoonlim@unist.ac.kr (S. Lim), hwpark@unist.ac.kr (H.W. Park).
1 Equal contribution.
https://doi.org/10.1016/j.jmsy.2024.07.010
Received 20 February 2024; Received in revised form 10 July 2024; Accepted 21 July 2024
Available online 30 July 2024
0278-6125/© 2024 The Society of Manufacturing Engineers. Published by Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training,
and similar technologies.
Nomenclature
AI Artificial intelligence
BMA Bayesian model averaging
BNN Bayesian neural network
CNC Computer numerical control
CNN Convolutional neural network
CPS Cyber-physical system
DL Deep learning
ELBO Evidence lower bound
EN EfficientNet
EOL End-of-life
GRU Gated recurrent unit
IIoT Industrial internet of things
KF Kalman filter
LR Linear regression
LSTM Long short-term memory
MC Monte Carlo
ML Machine learning
PF Particle filter
PHM Prognostics and health management
RF Random forest
RNN Recurrent neural network
RUL Remaining useful life
SGD Stochastic gradient descent
SME Small and medium-sized enterprise
SVM Support vector machine
TCM Tool condition monitoring
VAE Variational auto-encoder
VI Variational inference

health management (PHM) applications. For instance, the DL-based data-driven prediction of tool wear or remaining useful life (RUL) of tools and equipment has not only improved prediction accuracy but also enhanced process efficiency and productivity through preventive maintenance [7,8]. Alongside this trend, this work presents a novel data-driven tool wear prediction method that uses mobile sensor measurements with advanced DL techniques for efficient and reliable operation, which is suitable for practical machining processes.

Predicting tool wear during the machining process is one of the crucial tasks in manufacturing, as it enables real-time tool condition monitoring (TCM) and helps prevent tool breakage, thus leading to quality improvement in the machining process. Recently, various proposed data-driven prediction methods have effectively monitored tool wear using data collected from computer numerical control (CNC) machines via programmable logic controllers or time-series data from multiple sensors. Traditionally, statistical data-driven approaches based on state space models (e.g., linear dynamical systems) and conventional ML algorithms have been widely utilized for tool wear prediction. Zhu et al. utilize the hidden Markov model with physics-informed feature learning for tool wear monitoring [9]. Zhu and Liu present a hidden semi-Markov model-based online tool wear monitoring system for a high-speed milling process [10]. Wang and Gao develop a particle filter (PF)-based tool wear prediction model with an adaptive sampling approach [11]. A probabilistic tool wear prediction method based on an enhanced PF has also been presented [12]. Wang et al. propose using an exponential tool degradation model with a sequential Bayesian update to predict the RUL of machining tools [13]. In addition, conventional ML algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), and gradient boosting machine, have been widely employed for tool wear prediction based on diverse sensor signals, such as cutting force, vibration, and acoustic emission signals [14]. However, conventional data-driven approaches have limited model capacity to capture such complex implicit temporal characteristics and high-level features in multivariate time-series sensor signals [15,16]. Furthermore, conventional ML-based methods are unsuitable for real-time prediction, which uses long sensor signal sequences collected during machining processes [14].

Many recent works on data-driven tool wear prediction utilize highly expressive DL-based predictive models capable of handling complex sensor signals [17,18]. In particular, DL algorithms like recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their variants, which take multivariate time-series data as inputs, are widely adopted for the online prediction of ongoing tool wear degrees. Zhang et al. propose a CNN-based tool wear monitoring method that fuses multi-channel information during the machining process [12]. Wang et al. present a hybrid approach using a bidirectional gated recurrent unit (GRU) model with a local feature extraction method for tool wear prediction [19]. Sun et al. propose a long short-term memory (LSTM)-based model that forecasts future tool wear autoregressively and a CNN-based module that predicts flank wear using sensor signals [20]. Shi et al. present a DL-based tool wear prediction method based on a multiple stacked sparse auto-encoder that captures vibration signals for deep feature learning and multi-feature fusion [21].

There also are advanced DL-based approaches that use other feature-learning techniques with different model architectures to improve tool wear prediction performance. Qin et al. perform tool wear identification and prediction using a stack sparse self-coding network based on extracted features from various sensor signals [22]. Self-supervised learning has been applied to train a disentangled variational auto-encoder (VAE) with a temporal CNN for tool wear prediction [23]. Wang et al. propose a multi-task learning approach using a deep belief network that simultaneously performs TCM and surface quality prediction [24]. Liu et al. develop meta-invariant feature space learning to enable tool wear prediction under multiple machining conditions [25]. There have also been several works that propose appropriate approaches for practical applications by combining data-driven predictive models with advanced DL techniques [3,26,27]. For instance, Li et al. develop a hybrid approach to tool wear monitoring by integrating a physics-based model with a DL model to perform a physics-informed prediction, which shows superior performance compared to existing data-driven approaches [28].

While DL-based approaches have shown superior performance in machining processes, several obstacles prohibit their use in real-world practice. First, data used to train DL-based predictive models in the literature predominantly rely on high-precision sensors. These high-precision sensors collect vibration, acoustic, acceleration, and force signals in a time-series form [29]. However, high-precision sensors are expensive to install and operate for many small and medium-sized enterprises (SMEs), which account for most currently operating manufacturing companies [30]. In addition, using multiple sensors to collect diverse features incurs higher costs that impair practicality. For instance, in the recent work for tool wear RUL estimation [31], the acoustic emission and acceleration sensors are predominantly used to estimate the degree of tool wear. However, the average cost of installing the high-precision sensors used in existing work [31] is assumed to be approximately $15,000 as of June 2024, which is burdensome for SMEs. Moreover, installing related systems integrated with existing manufacturing systems for data collection further aggravates the burden for real-world applications [1,32] and even incurs ongoing additional costs in terms of sensor maintenance. One feasible solution to alleviate the expenditure burden of high-precision sensors is mobile sensing [33]. For example, a smartphone already has built-in sensors to collect data online that can monitor manufacturing processes [34]. As mobile sensing can reduce installation and operation costs by almost 90%, compared to the high-precision sensors for data collection [31], and improves adaptability and usability, it has been practically utilized
in manufacturing and related industries for collecting various sensor signals, such as audio, vibration, and acceleration [35–37]. However, despite its potential benefits for improved productivity and monitoring efficiency, which will be especially useful for SMEs, the smartphone sensing technique has not been applied to machining processes. Therefore, this work proposes using a data-driven tool wear monitoring method that solely uses a smartphone and its sensors for data measurement, enabling the cost-effective predictive maintenance of machining tools. In particular, the proposed method can improve productivity for SMEs, as managing multiple machining equipment can be conducted only using a smartphone without specialized personnel.

However, multiple problems arise when using smartphones to collect input data for tool wear prediction. In reality, there are several differences between the measurements from smartphone sensors (e.g., audio, accelerometer) and those from high-precision sensors. First, data collected from the built-in smartphone sensors contain a higher level of sensor noise (i.e., measurement error), similar to other mobile sensing techniques, compared to high-precision sensors [38]. Furthermore, smartphone sensors are more vulnerable to irrelevant noise generated during machining processes because the embedded sensors in smartphones usually have lower product specification levels, such as resolution and frequency, than high-precision sensors. These noisy sensor measurements are likely to hinder the training of deep neural network (DNN)-based predictive models, as the DNN model is likely to learn uninformative features that do not help the accurate prediction of ongoing tool wear. Therefore, considering that the various types of potential noise arising from mechanical and electrical disturbances are prevalent in real-world manufacturing processes [39,40], the sensor noise problem must be addressed to develop an accurate tool wear prediction method using smartphone sensors.

Several studies have been conducted to reduce sensor noise for various prediction tasks, including tool wear prediction in manufacturing. Wegener et al. simulate various machining noise types composed of wave-based, geometrical acoustics, diffusion-based, and stochastic to demonstrate that noise reduction crucially affects prediction performance during the manufacturing processes [41]. Guo et al. have used a wavelet transform adapted for TCM and noise removal using a sound signal and have shown its effectiveness in the milling process [42]. Peng et al. develop a low-pass filter for an accurate online fan tray diagnosis system [43]. The low-pass filter allows signals with a frequency lower than a specific cutoff frequency to pass through while attenuating or blocking signals with higher frequencies.

Among these several noise reduction methods, the Kalman filter (KF) is one of the most predominantly used filtering algorithms that handle noisy measurement data to infer true state information, which helps mitigate the effects of undesired noise in sensor signals [44]. Zhong et al. utilize KF to remove sensor noise and enhance the predictive model’s accuracy during manufacturing [45]. Sjöberg et al. use a KF-based calibration approach to mitigate the effects of noise signals in manufacturing systems and digital twins [40]. This work focuses on the KF due to its several advantages compared to other sensor noise reduction methods, especially for online health monitoring. The KF performs online filtering, allowing it to update the state variables in real-time as new data are received. This is particularly useful for tracking rapidly changing states in dynamic environments. Furthermore, the KF can adapt to dynamic system characteristics; it adjusts weights to maintain optimal state estimation, making it suitable for scenarios where the system undergoes rapid changes. Hence, this work proposes a KF-based noise suppression method for accurate tool wear prediction using mobile sensing data.

Another problem induced by the higher level of sensor noise is the increased uncertainty in tool wear prediction. Due to the noisy data signals collected from smartphone sensors, the uncertainty that exists in data (i.e., aleatoric uncertainty) increases as well and thus should be addressed for accurate tool wear prediction [46]. In particular, since aleatoric uncertainty stems from measurement noise, error, and randomness, it cannot be reduced and, therefore, should be directly modeled. The need to capture model uncertainty (i.e., epistemic uncertainty, knowledge uncertainty) is also elevated for the DL-based predictive models to enable calibrated model confidence in prediction results and improved prediction performance. Since noisy mobile sensing data increase uncertainty in prediction results, epistemic uncertainty should be captured to better understand the underlying dynamics of tool wear phenomena. Therefore, it is essential to take uncertainty into account when developing a reliable predictive model under noisy sensing conditions. Few existing works in the literature have proposed uncertainty-aware tool wear prediction methods based on probabilistic approaches. Some analytical model-based approaches enable uncertainty calculation based on measurement error and model precision, which are difficult to apply in real-time [47,48]. In addition, conventional analytical approaches fail to generate continuous uncertainty estimates during machining processes.

A complete modeling of uncertainty in tool wear prediction requires capturing both aleatoric and epistemic uncertainties because they stem from different sources (i.e., data and model). However, existing data-driven methods that provide probabilistic tool wear prediction lack the holistic incorporation of aleatoric and epistemic uncertainties [49,50]. Huang et al. utilize a Gaussian process (GP) for probabilistic tool wear prediction [51]. However, GP has limited scalability and cannot capture aleatoric uncertainty. A few DL-based probabilistic approaches use Bayesian learning techniques for uncertainty-aware tool wear prediction [14,52]. Nevertheless, existing DL-based approaches lack the full consideration of both aleatoric and epistemic uncertainties and do not consider noisy sensing scenarios using smartphone sensors. Based on the aforementioned uncertainty problems induced by mobile sensing using smartphone sensors, this work performs a technically novel tool wear prediction by developing a holistic uncertainty modeling method for the first time in the literature. First, the Bayesian learning approach based on a mean-field variational Bayes is applied to incorporate epistemic uncertainty in the DL-based predictive model. In addition, a density output structure is utilized in the model architecture, with a modified probabilistic objective function for loss attenuation, to capture aleatoric uncertainty during tool wear prediction. During inference, the proposed method can simultaneously capture and estimate the two uncertainty types. Therefore, this work develops and combines the noise suppression method with a novel uncertainty modeling method to improve prediction performance and the robustness of the data-driven approach to TCM. In particular, it is worth noting that this work’s primary technical novelty comes from combining and developing existing techniques (i.e., KF) and novel data-driven approaches (i.e., Bayesian learning, density output) for smartphone sensor-enabled TCM. Due to these qualities, the proposed method can also be applied for quality improvement and process control in advanced manufacturing areas, such as cyber manufacturing and cyber–physical systems (CPS), where uncertainty stemming from the complexity of manufacturing systems should be taken into account [53]. In addition, incorporating and analyzing aleatoric and epistemic uncertainties using smartphone sensor data can facilitate efficient preventive maintenance for SMEs, thus improving productivity.

In this work, experimental data collected from real-world machining (i.e., turning) processes have been used to validate the effectiveness of the proposed tool wear prediction method. Extensive experiments and performance comparisons that utilize existing methods prove the proposed method’s effectiveness in tool wear prediction and robustness to noisy measurement data. Furthermore, the proposed method’s ability to quantify uncertainty in tool wear prediction has been validated via qualitative and quantitative analysis. In particular, this work validates data-driven uncertainty estimates using a conventional analytical approach for the first time in the literature. An overview of the proposed uncertainty-aware tool wear prediction method, including the main components, is visualized in Fig. 1. This work’s main contributions to the literature are as follows.
• Mobile sensing data collected with smartphone sensors are utilized for tool wear prediction during machining processes, which enhances its applicability for small and medium-sized manufacturing companies due to low installation and operation costs for data collection.
• A KF-based noise suppression method is applied to reduce the effects of noise caused by mobile sensing in tool wear prediction. In addition, a deep learning-based uncertainty-aware model that uses Bayesian learning and a density output structure is also developed to enable robust prediction and uncertainty quantification, holistically capturing both aleatoric and epistemic uncertainties that stem from different sources.

Fig. 1. An overview of the proposed uncertainty-aware tool wear prediction method.

The rest of this paper is organized as follows. Section 2 provides a detailed illustration of the proposed uncertainty-aware tool wear prediction method and its components. In particular, the KF-based noise suppression method for mobile sensing data, the use of Bayesian learning and a density output structure to develop an uncertainty-aware model, and the validation method based on a conventional analytical approach are explained. Section 3 presents the experimental setup, data description, tool wear measurement and calculation, data preprocessing, architecture details, and implementation. Extensive experimental results and discussion are provided in Section 4. Finally, the conclusions and future works are delineated in Section 5.

2. Proposed tool wear prediction method

This section illustrates in detail the proposed tool wear prediction method, which takes into account noise and induced uncertainty for modeling and inference. The proposed method consists of three parts: (1) noise suppression, (2) uncertainty modeling, and (3) predictive model development. First, the proposed noise suppression method based on KF, which handles noisy mobile sensing data, is illustrated. Then, how aleatoric and epistemic uncertainties are modeled using Bayesian learning and a density output structure for tool wear prediction is delineated. Lastly, detailed descriptions of the proposed predictive model for tool wear prediction are provided. In addition to the explanations of the proposed uncertainty-aware tool wear prediction method, a novel validation method of the uncertainty estimates using a conventional analytical approach is illustrated.

In data-driven tool wear prediction during machining processes, the data are collected sequentially as multivariate time-series. The input data used for prediction are denoted as $\mathcal{D} = \{(x_i, y_i) \mid i = 1, \ldots, N\}$. All data are assumed to be independent and identically distributed, so that the likelihood factorizes as $p(\mathcal{D}|\boldsymbol{\omega}) = \prod_i p(y_i|x_i, \boldsymbol{\omega})$. Each pair of $x$ and $y$ indicates the input signal (i.e., sensor measurements) and the corresponding tool wear degree, respectively. Based on the given data $\mathcal{D}$, this work postulates a data-driven predictive model $f$ parameterized by a set of learnable parameters $\boldsymbol{\omega}$. In a supervised learning scheme, the predictive model that represents the conditional likelihood of the data, $f_{\boldsymbol{\omega}} = p(y|x)$, is trained with a proper loss function $\ell(\boldsymbol{\omega})$.

2.1. Noise suppression for mobile sensing data

As mentioned above, data collected by smartphone sensors have relatively higher noise levels than the high-precision sensors [38], and thus, this noise could significantly deteriorate the performance of the data-driven tool wear prediction method. To alleviate the effects of inherent noise from mobile sensing, the noise suppression process must be performed to apply the data-driven method effectively. In general, wavelet transform, fast Fourier transform, inverse Fourier transform, and spectral subtraction are commonly used to reduce the noise. However, these methods are crucially related to the frequency domain; some machining process (e.g., turning) characteristics do not exist in the form of static frequency. Furthermore, for subtractive manufacturing processes, such as the turning, milling, and drilling process, noise reduction has to be conducted regardless of the frequency domain area. Considering the aforementioned machining characteristics, this work employs KF to reduce the noise effects from mobile sensing data. KF is known for its effectiveness in noise suppression due to its ability to estimate a system’s state in the presence of uncertain and noisy measurements. Moreover, several studies have been conducted using KF, especially regarding the machining process [54–56]. The noise of sensor measurement data is reduced through the KF process, as described below.

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k \tag{1}$$

$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k \tag{2}$$

$$K_k = P_{k|k-1} H_k^T \left( H_k P_{k|k-1} H_k^T + R_k \right)^{-1} \tag{3}$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - H_k \hat{x}_{k|k-1} \right) \tag{4}$$

$$P_{k|k} = (I - K_k H_k) P_{k|k-1} \tag{5}$$

Assuming discrete time intervals, $\hat{x}_{k|k-1}$ is the predicted state at time $k$ given the information available up to time $k-1$, $P_{k|k-1}$ is the corresponding predicted error covariance, and $z_k$ is the actual measurement at time $k$. $\hat{x}_{k|k}$ and $P_{k|k}$ are the updated state estimate and error covariance after $z_k$ has been incorporated. $F_k$ is the state transition matrix, $B_k$ is the control input matrix, and $u_k$ is the control input vector at time $k$. $Q_k$ is the covariance matrix of the process noise, $K_k$ is the Kalman gain, $H_k$ is the measurement matrix, and $R_k$ is the covariance matrix of the measurement noise. The noise suppression process is performed
through two steps: (1) the prediction step and (2) the update step. As shown in Eq. (1), the state prediction is performed based on the information available up to the previous time step $k-1$. The error covariance prediction represents the best estimate of the uncertainty or error in the predicted state, as shown in Eq. (2). During the prediction step, KF estimates forward in time, taking into account the system dynamics. The Kalman gain is a key factor that determines the weight given to the measurement update through the update step, as expressed in Eq. (3). Then, the state update combines the predicted state with the measured data to obtain a refined estimate of the current state through Eq. (4). Finally, the error covariance update refines the uncertainty estimate in the state based on the incorporation of new measurements, as shown in Eq. (5). The update step in KF uses the Kalman gain to combine the predicted state with the actual measurements, producing an improved state estimate with reduced noise and data uncertainty. In this way, the proposed method aims to suppress unexpected noise effects from mobile sensing data, so as to achieve accurate and robust tool wear prediction during machining processes.

2.2. Incorporating uncertainty for tool wear prediction

While the KF-based noise suppression has been performed to reduce the effects of noisy sensor measurements that occur during smartphone mobile sensing, the remaining uncertainty still has to be taken into account for more accurate and robust tool wear prediction. In particular, the uncertainty should be directly incorporated into a predictive model to enable uncertainty-aware tool wear prediction. To this end, this work considers two different primary sources of uncertainty in data-driven prediction: (1) epistemic uncertainty and (2) aleatoric uncertainty, as expounded in Section 1. First, as noisy mobile sensing data lead to increased uncertainty during prediction, it is essential to model and quantify epistemic uncertainty. Second, aleatoric uncertainty can be directly modeled using a specific predictive model architecture, which enables the modeling of inherent aleatoric uncertainty by learning. Therefore, this work proposes an uncertainty-aware method that not only captures the two primary uncertainty types but also discriminates between the two during prediction. In this way, different uncertainty types can be adeptly captured using a single predictive model, improving prediction performance and robustness. An illustration of the proposed method for uncertainty modeling, including a Bayesian learning approach and a density output structure, is shown in Fig. 2. From the data processing perspective of the predictive model, the epistemic and aleatoric uncertainties are captured sequentially, via probabilistic model parameters and the predictive distribution, respectively. Detailed descriptions of the components of the proposed uncertainty modeling method are provided below.

Fig. 2. Illustration of the proposed uncertainty modeling method.

2.2.1. Modeling epistemic uncertainty

Most existing uncertainty-aware prediction approaches focus on incorporating epistemic uncertainty through various methods, such as bootstrapping, ensembling, and Bayesian learning (i.e., Bayesian inference). While these models have differences in technical details, the underlying idea of using multiple predictive models to incorporate epistemic uncertainty is shared. Among all models, Bayesian learning-based approaches not only have solid theoretical groundings in uncertainty-aware modeling but also have shown superior performance and scalability in the manufacturing domain [14,57]. In particular, recent developments in Bayesian neural networks (BNNs) that combine Bayesian inference with DNN-based predictive models have further accelerated the widespread application of BNNs in real-world practice. Hence, this work proposes applying a novel Bayesian learning approach to capture epistemic uncertainty for mobile sensing data-based tool wear prediction.

Compared to the maximum likelihood estimation (MLE) that aims to find a set of model parameters $\boldsymbol{\omega}$ that maximize the likelihood $p(\mathcal{D}|\boldsymbol{\omega})$, the proposed method takes a Bayesian learning approach to model epistemic uncertainty for tool wear prediction. In particular, the proposed method seeks to find a posterior distribution over model parameters $p(\boldsymbol{\omega}|\mathcal{D})$ to incorporate epistemic uncertainty. Using Bayes’ theorem, the exact posterior is calculated as shown in Eq. (6). By treating the model parameter as a random variable that follows a probability distribution and by utilizing the computed posterior, the epistemic uncertainty can be captured during prediction. In detail, the prediction using the Bayesian learning approach takes into account the uncertainty in model parameters by marginalization over the calculated posterior, as indicated in Eq. (7). This process is interpreted as an ensemble of infinitely many models in a Bayesian manner, called Bayesian model averaging (BMA). However, when using a DL-based data-driven predictive model, the calculation of the exact posterior is analytically intractable (see the denominator of Eq. (6)), and thus is impractical to optimize due to the large number of model parameters. To overcome this problem and enable the modeling of epistemic uncertainty using Bayesian learning, the proposed method adopts variational inference (VI), one of the most scalable and theoretically sound approximate Bayesian inference techniques.

$$p(\boldsymbol{\omega}|\mathcal{D}) = \frac{p(\mathcal{D}|\boldsymbol{\omega})\, p(\boldsymbol{\omega})}{p(\mathcal{D})} = \frac{p(\mathcal{D}|\boldsymbol{\omega})\, p(\boldsymbol{\omega})}{\int_{\boldsymbol{\omega}'} p(\mathcal{D}|\boldsymbol{\omega}')\, p(\boldsymbol{\omega}')\, d\boldsymbol{\omega}'} \tag{6}$$

$$p(y^*|x^*, \mathcal{D}) = \int_{\boldsymbol{\omega}} p(y^*|x^*, \boldsymbol{\omega})\, p(\boldsymbol{\omega}|\mathcal{D})\, d\boldsymbol{\omega} = \mathbb{E}_{\boldsymbol{\omega} \sim p(\boldsymbol{\omega}|\mathcal{D})}\left[ p(y^*|x^*, \boldsymbol{\omega}) \right] \tag{7}$$

There are several approaches to overcoming the intractability issue of exact posterior calculation, including various approximate inference techniques, such as Laplace approximation, expectation propagation, and sampling-based methods (e.g., Markov chain Monte Carlo). Among existing approximate inference techniques, VI (i.e., variational Bayes) is known to have high approximation accuracy to the true posterior distribution while being scalable to large-sized models like DNNs. Therefore, the proposed method develops a VI-based uncertainty-aware tool wear prediction approach for practical, real-world applications. Based on VI, the proposed method postulates a variational posterior $q_{\boldsymbol{\phi}}(\boldsymbol{\omega})$ parameterized with variational parameters $\boldsymbol{\phi}$ to approximate the true posterior $p(\boldsymbol{\omega}|\mathcal{D})$ of the tool wear prediction model. Hence, the set of parameters $\boldsymbol{\phi}$ of the approximate posterior is optimized rather than $\boldsymbol{\omega}$ during training. To measure the distance between two distributions, the Kullback–Leibler divergence $D_{KL}$ is used, as expressed in Eq. (8). Using VI, $D_{KL}$ between the variational posterior $q_{\boldsymbol{\phi}}(\boldsymbol{\omega})$ and the true posterior $p(\boldsymbol{\omega}|\mathcal{D})$ is minimized,
as shown in Eq. (9). The derivation leads to the evidence lower bound (ELBO), which is a lower bound on the log marginal likelihood $\log p(\mathcal{D})$, because (1) the marginal likelihood does not depend on the variational parameters $\boldsymbol{\phi}$ and (2) $D_{KL}$ is always greater than or equal to 0.

$$
D_{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log\left(\frac{p(x)}{q(x)}\right) dx \tag{8}
$$

$$
\begin{aligned}
D_{KL}(q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \,\|\, p(\boldsymbol{\omega} \mid \mathcal{D}))
&= \int q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \log\left(\frac{q_{\boldsymbol{\phi}}(\boldsymbol{\omega})}{p(\boldsymbol{\omega} \mid \mathcal{D})}\right) d\boldsymbol{\omega} \\
&= \int q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \log\left(\frac{q_{\boldsymbol{\phi}}(\boldsymbol{\omega})\, p(\mathcal{D})}{p(\boldsymbol{\omega}, \mathcal{D})}\right) d\boldsymbol{\omega} \\
&= \int q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \left[ \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega})) + \log(p(\mathcal{D})) - \log(p(\boldsymbol{\omega}, \mathcal{D})) \right] d\boldsymbol{\omega} \\
&= \log(p(\mathcal{D})) + \int q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \left[ \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega})) - \log(p(\boldsymbol{\omega}, \mathcal{D})) \right] d\boldsymbol{\omega} \\
&= \log(p(\mathcal{D})) - ELBO
\end{aligned}
\tag{9}
$$

Based on the derivation of ELBO provided above, finding a surrogate posterior using VI via minimization of $D_{KL}$ is equivalent to the maximization of ELBO, as shown in Eq. (9). This approximate inference process is also known as a minimum description length [58]. Therefore, the proposed method based on VI enables a tractable computation of a surrogate posterior of the tool wear prediction model. This enables a streamlined integration with any DNN-based predictive model, which is one technical novelty of this work. For practical calculation, ELBO can be expressed as Eq. (10), which only requires the expectation over the variational posterior, which is analytically tractable. In particular, the likelihood $p(\mathcal{D} \mid \boldsymbol{\omega}) = \prod_{i} p(y_i \mid x_i, \boldsymbol{\omega})$ is modeled with a DNN predictive model, and the prior distribution is set to a standard Gaussian $\mathcal{N}(0, I)$ [14]. In addition, the proposed method employs mean-field VI, in which every model parameter is assumed to be independent. Therefore, the variational posterior is fully factorized as $q(\boldsymbol{\omega}) = \prod_{j} q(\omega_j)$. Based on the aforementioned VI process, during training, ELBO is maximized with respect to the variational parameters $\boldsymbol{\phi}$ to find the variational posterior $q(\boldsymbol{\omega})$ that most closely approximates the true posterior distribution of the model parameters.

$$
\begin{aligned}
ELBO &= \int q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \left[ \log(p(\boldsymbol{\omega}, \mathcal{D})) - \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega})) \right] d\boldsymbol{\omega} \\
&= \int q_{\boldsymbol{\phi}}(\boldsymbol{\omega}) \left[ \log(p(\mathcal{D} \mid \boldsymbol{\omega})) + \log(p(\boldsymbol{\omega})) - \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega})) \right] d\boldsymbol{\omega} \\
&= \mathbb{E}_{\boldsymbol{\omega} \sim q_{\boldsymbol{\phi}}} \left[ \log(p(\mathcal{D} \mid \boldsymbol{\omega})) + \log(p(\boldsymbol{\omega})) - \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega})) \right]
\end{aligned}
\tag{10}
$$

However, several issues should be addressed before applying VI to a DNN-based tool wear prediction model. First, the variational posterior should be designed to have analytical tractability and sampling efficiency. Therefore, the proposed method designs the approximate posterior $q_{\phi}$ as a Gaussian distribution $\mathcal{N}(\mu, \rho^2)$ with learnable parameters $\phi = (\mu, \rho)$. To facilitate the backpropagation-based training of the DNN-based predictive model and the sampling of model parameters, a reparameterization trick is used in the proposed method, as expressed in Eq. (11). In this way, the proposed method makes the estimate of the expectation in Eq. (10) differentiable with respect to the learnable parameters. Based on this technique, the proposed method can be applied to any DNN-based model architecture, such as CNN and RNN. Second, as the expectation over $q_{\boldsymbol{\phi}}$ is analytically difficult to derive, an unbiased Monte Carlo (MC) estimation is used to calculate ELBO during training [59], as shown in Eq. (12).

$$
\begin{aligned}
q_{\phi}(\omega) &= \mathcal{N}(\mu, \rho^2) \\
\omega &= \mu + \epsilon \cdot \rho
\end{aligned}
\tag{11}
$$

where:
$\epsilon \sim \mathcal{N}(0, 1)$: randomly sampled Gaussian noise

$$
ELBO \approx \sum_{m=1}^{M} \left[ \log(p(\mathcal{D} \mid \boldsymbol{\omega}^{(m)})) + \log(p(\boldsymbol{\omega}^{(m)})) - \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega}^{(m)})) \right] \tag{12}
$$

where:
$\boldsymbol{\omega}^{(m)} \sim q_{\boldsymbol{\phi}}(\boldsymbol{\omega})$ for $m = 1, \ldots, M$

2.2.2. Modeling aleatoric uncertainty

As explained above, the proposed method incorporates epistemic uncertainty for tool wear prediction via a VI-based Bayesian learning approach. On the other hand, the proposed method captures aleatoric uncertainty for tool wear prediction with a novel technique that directly represents measurement noise and the data's inherent randomness. Specifically, to model aleatoric uncertainty, the proposed method designs the output of the DNN-based predictive model to resemble the structure of a density network [60] with an inherent noise assumption [46]. To the best of our knowledge, this work is the first to address aleatoric uncertainty under a structured probabilistic framework to improve tool wear prediction performance and robustness. In particular, this work adopts the heteroscedastic noise assumption, which supposes that the noise and uncertainty levels of the data differ for every machining condition and vary by machining pass. In the proposed method, the predictive model $f_{\boldsymbol{\omega}}$ is trained to predict the inherent uncertainty of the target (i.e., the tool wear degree) by itself. In particular, the proposed method imposes a Gaussian distribution on the tool wear degree, whose distributional parameters (i.e., location and scale) are predicted by the model. Therefore, the predictive model is designed to have a density output structure that returns the data-generating distribution of the tool wear degree, as shown in Eq. (13). In this manner, the aleatoric uncertainty of the data can be captured naturally by learning (i.e., training), as both the predictive mean $\hat{y}$ and the observation noise $\hat{\sigma}$ depend on what the model has learned from data.

$$
f_{\boldsymbol{\omega}}(x) = [\hat{y}, \hat{\sigma}] \quad \text{where} \quad p(y \mid x) = \mathcal{N}(\hat{y}, \hat{\sigma}^2) \tag{13}
$$

Therefore, the likelihood is transformed into that of a Gaussian predictive distribution that captures aleatoric uncertainty, as shown in Eq. (14). One desirable effect of designing the output as a density to capture aleatoric uncertainty, as done in the proposed method, is a complementary behavior of the model predictions $[\hat{y}(x), \hat{\sigma}(x)]$. In an effect also known as loss attenuation [46], the predicted scale parameter $\hat{\sigma}(x)$ not only captures aleatoric uncertainty but also balances the residual error $(y - \hat{y})$. This is observed during model training, which maximizes the likelihood term (as ELBO is maximized during training, as shown in Eq. (10)). When the model predicts a high level of aleatoric uncertainty, a large value of $\hat{\sigma}^2(x)$ attenuates the effect of the residual error, down-weighting the contribution of uncertain data samples in the ELBO computation. At the same time, the logarithm term of the predicted variance in Eq. (14) penalizes predicting an excessively large aleatoric uncertainty for every data sample. In this way, the proposed method effectively captures aleatoric uncertainty during tool wear prediction. Furthermore, the level of $\hat{\sigma}(x)$, which is continuously predicted during machining, is used as an uncertainty estimate.

$$
\begin{aligned}
\log(p(y \mid x, \boldsymbol{\omega})) &= \log\left( \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} \cdot \exp\left( -\frac{1}{2} \frac{(y - \hat{y})^2}{\hat{\sigma}^2} \right) \right) \\
&= -\frac{1}{2} \log(\hat{\sigma}^2) - \frac{1}{2} \frac{(y - \hat{y})^2}{\hat{\sigma}^2} + C
\end{aligned}
\tag{14}
$$

where:
$C$: constant term independent of $\boldsymbol{\omega}$
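The loss-attenuation behavior of Eq. (14) can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `gaussian_log_likelihood` and all numeric values are illustrative assumptions, and the constant $C$ is taken to be $-\tfrac{1}{2}\log(2\pi)$.

```python
import math


def gaussian_log_likelihood(y, y_hat, sigma_hat):
    """Log-density of y under N(y_hat, sigma_hat^2), following Eq. (14)."""
    var = sigma_hat ** 2
    constant = -0.5 * math.log(2.0 * math.pi)  # the term C in Eq. (14)
    return -0.5 * math.log(var) - 0.5 * (y - y_hat) ** 2 / var + constant


# Hypothetical values: the same residual (y - y_hat = 20) under two predicted noise levels.
confident = gaussian_log_likelihood(y=120.0, y_hat=100.0, sigma_hat=5.0)
uncertain = gaussian_log_likelihood(y=120.0, y_hat=100.0, sigma_hat=20.0)

# Zero residual: only the log-variance penalty distinguishes the two noise levels.
exact_small_sigma = gaussian_log_likelihood(y=100.0, y_hat=100.0, sigma_hat=5.0)
exact_large_sigma = gaussian_log_likelihood(y=100.0, y_hat=100.0, sigma_hat=20.0)
```

For the same residual, the larger predicted $\hat{\sigma}$ yields the higher log-likelihood, so the residual is attenuated. For a zero residual, the log-variance term favors the smaller $\hat{\sigma}$, which discourages inflating the predicted uncertainty everywhere.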
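The reparameterized Monte Carlo estimate of the ELBO in Eqs. (10) to (12) can be illustrated on a toy problem. The sketch below assumes a single-weight linear model $y = \omega x$ with unit observation noise and a standard Gaussian prior on $\omega$; this toy model, the data, the function names, and the sample count are hypothetical and do not come from this work. The estimate is averaged over the $M$ samples so that it approximates the expectation in Eq. (10).

```python
import math
import random


def log_normal(value, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at value."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((value - mean) / std) ** 2


def elbo_estimate(xs, ys, mu, rho, num_samples, rng):
    """Monte Carlo ELBO estimate for q(omega) = N(mu, rho^2) on the toy model."""
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)   # epsilon ~ N(0, 1)
        omega = mu + eps * rho      # reparameterization, Eq. (11)
        log_lik = sum(log_normal(y, omega * x, 1.0) for x, y in zip(xs, ys))
        log_prior = log_normal(omega, 0.0, 1.0)
        log_q = log_normal(omega, mu, rho)
        total += log_lik + log_prior - log_q  # bracketed term of Eq. (10)
    return total / num_samples


# Hypothetical toy data (not from the experiments in this work).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.1, 1.9, 3.2, 3.9]

# For this conjugate toy model the exact posterior is Gaussian in closed form.
post_precision = 1.0 + sum(x * x for x in xs)
post_mean = sum(x * y for x, y in zip(xs, ys)) / post_precision
post_std = 1.0 / math.sqrt(post_precision)

rng = random.Random(0)
elbo_exact_q = elbo_estimate(xs, ys, post_mean, post_std, 200, rng)
elbo_poor_q = elbo_estimate(xs, ys, 0.0, 1.0, 200, rng)
```

When $q$ equals the exact posterior, $D_{KL}$ in Eq. (9) is zero and the ELBO coincides with the log marginal likelihood. A poorly matched $q$ (here, the prior itself) gives a lower value, which is why maximizing ELBO moves $q$ toward the true posterior.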
2.3. Probabilistic predictive model

Based on the components of the proposed uncertainty modeling method for tool wear prediction described above, this section details the predictive model's architecture and how the proposed method is optimized and used for prediction. In particular, it explains in detail how the two types of uncertainty (i.e., aleatoric and epistemic uncertainties) modeled in the proposed method can be dissected from the predictive distribution.

2.3.1. Model architecture

To apply the proposed uncertainty modeling method, a backbone architecture for the predictive model is required. In particular, a DNN architecture that can handle multivariate time-series data with efficient computation is required (as discussed in Section 1) because, in practice, the tool wear prediction should be continually performed in real time during machining processes. To this end, this work employs a backbone model based on an EfficientNet (EN) [61] that achieves high performance using efficient convolutional operations. The selected predictive model uses a mobile inverted bottleneck convolution (MBConv) [62] that achieves computational efficiency using inverted residuals that combine linear bottlenecks with depthwise separable convolution operations. In addition, a squeeze-and-excitation mechanism [63] is adopted that fuses spatial and channel-wise information in local receptive fields. To improve prediction performance, this mechanism automatically extracts latent information and recalibrates it to facilitate feature learning.

In this way, the proposed EN-based predictive model has a computationally efficient structure and is thus suitable for real-time tool wear prediction. On top of the EN-based architecture, a prediction module, which consists of global average pooling and several fully connected layers, is constructed. In particular, as previously mentioned, the module yields two outputs that comprise the density's distributional parameters (i.e., $\hat{y}$, $\hat{\sigma}$). Batch normalization (BN) and dropout are additionally used for model regularization. The complete architecture details of the proposed tool wear prediction model, determined based on hyperparameter tuning, are illustrated later in Section 3.

2.3.2. Model optimization

Applying the proposed uncertainty modeling method to the EN-based predictive model requires proper training procedures that incorporate two types of uncertainty (i.e., aleatoric and epistemic uncertainties) by learning from data. During training, the learnable parameters of the proposed VI-based BNN with a density output structure are optimized using an ELBO-based loss function. In particular, the negative ELBO with a likelihood term represented as Eq. (14) is minimized, as expressed in Eq. (15), to find the approximate variational parameters $\boldsymbol{\phi} = (\boldsymbol{\mu}, \boldsymbol{\rho})$. As previously mentioned, the expectation over the variational posterior is approximated with MC sampling during training. For optimization, as the proposed predictive model comprises a set of trainable weights, gradient-based backpropagation is utilized for training. In practice, a stochastic gradient descent (SGD)-based update is performed, as shown in Eq. (16). One technical novelty is that the proposed method can be trained with any SGD-type optimization algorithm used for DL training.

$$
\ell = \mathbb{E}_{x, y \sim \mathcal{D}} \left[ \mathbb{E}_{\boldsymbol{\omega} \sim q_{\boldsymbol{\phi}}} \left[ \log(q_{\boldsymbol{\phi}}(\boldsymbol{\omega})) - \log(p(y \mid x, \boldsymbol{\omega})) - \log(p(\boldsymbol{\omega})) \right] \right] \tag{15}
$$

$$
\begin{aligned}
\Delta\boldsymbol{\mu} &= -\gamma \left( \frac{\partial \ell}{\partial \boldsymbol{\omega}} + \frac{\partial \ell}{\partial \boldsymbol{\mu}} \right) \\
\Delta\boldsymbol{\rho} &= -\gamma \left( \frac{\partial \ell}{\partial \boldsymbol{\omega}} \epsilon + \frac{\partial \ell}{\partial \boldsymbol{\rho}} \right)
\end{aligned}
\tag{16}
$$

2.3.3. Model inference

To use the trained predictive model $f_{\boldsymbol{\omega}}$, the proposed method obtains model parameters from the variational posterior $\hat{q}_{\boldsymbol{\phi}}(\boldsymbol{\omega})$. Because Bayesian learning is used to capture epistemic uncertainty, multiple samples from the predictive model are utilized in a BMA manner through the posterior predictive, as expressed in Eq. (7). Hence, the tool wear prediction on a new input data sample $x^{*}$ is performed based on MC estimation, with each set of model parameters $\omega^{(t)} = \mu^{(t)} + \epsilon \cdot \rho^{(t)}$ sampled from the obtained variational posterior, as shown in Eq. (17). In practice, the number of MC samples (i.e., $T$) during inference is usually set to 10 to 30 for a stable prediction and likelihood computation [58].

$$
p(y^{*} \mid x^{*}, \mathcal{D}) = \{ f_{\boldsymbol{\omega}^{(t)}}(x^{*}) \mid t = 1, \ldots, T \} \tag{17}
$$

2.3.4. Uncertainty dissection

Using the aggregated tool wear prediction results, the uncertainty captured by the predictive model can be computed directly as the total predictive variance. As the proposed method utilizes the Bayesian learning approach and the density output structure to capture both aleatoric and epistemic uncertainties, each type of uncertainty can be calculated separately per data sample. In other words, the proposed method can distinguish between sources of the predictive uncertainty in an online manner during tool wear prediction, unlike existing uncertainty-aware prediction methods in the literature. The dissection of different uncertainty types is an essential characteristic of the proposed method because it not only ensures that uncertainties from various sources are captured for more robust prediction but also enables reliable quantification of prediction confidence during machining processes. This work claims that the predictive uncertainty of the proposed method $u_{pred}$ can be dissected into aleatoric uncertainty $u_a$ and epistemic uncertainty $u_e$, as shown in Eq. (18). The proposed method's ability to capture both uncertainty types enables this streamlined uncertainty dissection, which is one of the crucial technical novelties. In practice, the dissection of the resulting predictive uncertainty can be used to diagnose the source of uncertainty for informed decision-making, such as developing tool maintenance strategies. In addition, the experimental results empirically demonstrate that incorporating both uncertainty types consistently improves tool wear prediction performance, which will be discussed later in Section 4.

$$
\begin{aligned}
u_a &= \frac{1}{T} \sum_{t=1}^{T} \left( \hat{\sigma}(x^{*})_t \right)^2 \\
u_e &= \frac{1}{T} \sum_{t=1}^{T} \left( \hat{y}(x^{*})_t \right)^2 - \left( \frac{1}{T} \sum_{t=1}^{T} \hat{y}(x^{*})_t \right)^2
\end{aligned}
\tag{18}
$$

where:
$u_a$: aleatoric uncertainty
$u_e$: epistemic uncertainty
$u_{pred} = u_a + u_e$: predictive uncertainty

2.4. Validation using a conventional analytical approach

The analytical tool wear model has uncertainty due to systematic errors and statistical estimates of random errors. In addition to providing uncertainty-aware tool wear predictions, yielding a valid degree of predictive uncertainty during the machining process is also important. Therefore, this work uses the conventional analytical model of uncertainty to validate the proposed method's predictive uncertainty [64,65] based on the measured tool wear through Eq. (19) to Eq. (23). It is worth noting that, to the best of our knowledge, this is the first work to validate a data-driven uncertainty-aware method with a conventional analytical approach to verify the reliability of the uncertainty estimates. The given process model $R$ (i.e., the analytical tool wear model) is a function of the process parameters $x_i$. Each process parameter has a variation $\delta x_i$, which causes the model uncertainty $\delta R_i$, as shown in
Eq. (20). To calculate model uncertainty, the relative variation for the process model is expressed using the chain rule, as shown in Eq. (21). This process is performed using an analytical model that estimates the relative uncertainty in the results arising from the uncertainty of the process parameters. Based on the calculated $u_{x_i}$, the uncertainty for each process parameter can be estimated as Eq. (22), where $\langle x_i \rangle$ is an average value of the process parameters.

$$
R = R(x_1, x_2, x_3, \ldots, x_N) \tag{19}
$$

$$
\delta R_i = \frac{\partial R}{\partial x_i} \delta x_i \tag{20}
$$

$$
\frac{\delta R_i}{R} = \frac{1}{R} \frac{\partial R}{\partial x_i} \delta x_i = \frac{x_i}{R} \frac{\partial R}{\partial x_i} \frac{\delta x_i}{x_i} \tag{21}
$$

$$
u_{R_i} = \frac{x_i}{R} \frac{\partial R}{\partial x_i} \frac{\delta x_i}{x_i} = \frac{x_i}{R} \frac{\partial R}{\partial x_i} u_{x_i} \tag{22}
$$

where:
$u_{x_i} = \delta x_i / \langle x_i \rangle$

Therefore, the final uncertainty from the analytical model can be calculated as Eq. (23), which is the square root of the sum of the squared relative uncertainty contributions of all process parameters. Through the conventional analytical approach, the model uncertainty can be calculated at a specific machining distance, because each process parameter is a value that is calculated for a certain machining length. Although this analytical approach cannot calculate continuous uncertainty estimates, it can still be used for the comparative verification of the proposed tool wear prediction method, discussed later in Section 4. To the best of our knowledge, this work is the first to technically validate data-driven uncertainty quantification results using a conventional analytical approach, thereby further providing theoretical validity.

$$
\frac{\delta R}{R} = u_R = \pm \left\{ \left( \frac{x_1}{R} \frac{\partial R}{\partial x_1} u_{x_1} \right)^2 + \left( \frac{x_2}{R} \frac{\partial R}{\partial x_2} u_{x_2} \right)^2 + \cdots + \left( \frac{x_N}{R} \frac{\partial R}{\partial x_N} u_{x_N} \right)^2 \right\}^{\frac{1}{2}} \tag{23}
$$

where:
$u_{x_i} = \delta x_i / \langle x_i \rangle$

3. Experiments

3.1. Experimental setup

In this work, a turning process is performed on a CNC lathe machine (SKT 28; Hyundai Wia Co.) to identify the actual tool wear. Ti-6Al-4V is used as the work material, and a tungsten carbide cutting tool (CNGP120408; Kennametal Co.) is used with a rake angle of 10° and a clearance angle of 7°. To acquire the sensor measurements, a smartphone (iPhone 13; Apple Co.) is attached to the tool jig. The acceleration and acoustic emission data are collected at a sampling frequency of 100 Hz. While an iPhone is used to collect input data in this work, other types of smartphones (e.g., Samsung Galaxy) can also be used for data collection because similar sensors are installed in other smartphone types. The detailed experimental setup is shown in Fig. 3. The machining conditions comprise cutting speeds of 80 and 100 m/min, feed rates of 0.15 and 0.2 mm/rev, and a depth of cut of 1 mm, as shown in Table 1. To demonstrate the robustness of the tool wear prediction model across various turning conditions, the experiments aim to present different patterns of flank wear by employing conditions with both conservative and non-conservative machining parameter settings. The spindle speed is adjusted by considering the diameter of the work material. Each machining pass has a machining distance of 60 mm, and the three experiment conditions use material removal rates of 250.0 mm³/s, 200.0 mm³/s, and 333.3 mm³/s, respectively. All three experiments are performed until the end-of-life (EOL) of the cutting tools, which covers the total wear generation trend before tool replacement.

Table 1
Machining conditions and information on the turning experiments of Ti-6Al-4V.

| Condition | Experiment 1 | Experiment 2 | Experiment 3 |
|---|---|---|---|
| Cutting speed (m/min) | 100.00 | 80.00 | 100.00 |
| Feed (mm/rev) | 0.15 | 0.15 | 0.20 |
| Axial depth (mm) | 1.00 | 1.00 | 1.00 |
| Machining length (pass) | 1 to 6 | 1 to 6 | 1 to 6 |
| Machining distance (mm) | 60.00 | 60.00 | 60.00 |
| Material removal rate (mm³/s) | 250.00 | 200.00 | 333.30 |
| Machining time (s) | 229.90 | 232.50 | 161.10 |

3.2. Data description

Three datasets are collected under different machining conditions and used for tool wear prediction experiments in this work. Installed sensors in the smartphone include acceleration, audio, and gyroscope sensors, which can be readily used to collect real-time signals during the machining process. However, as gyroscopic sensor signals do not show significant changes during machining, they are not used for tool wear prediction in this work. Hence, data collected from the experiments take the form of multivariate time-series sensor signals consisting of four variables: the accelerations on each axis, denoted as $Acc_x$, $Acc_y$, and $Acc_z$, and an audio signal. The acceleration variables are measured in m/s² and the audio signal is measured in dB. Exemplary sensor measurements used as input data are visualized in Fig. 4, which indicates that each variable shows different patterns over time. The descriptive statistics for each experimental dataset are provided in Table 2.

Table 2
Descriptive statistics of datasets.

| Dataset | Variable | Mean | SD | Min | Max | Unit |
|---|---|---|---|---|---|---|
| 1 | $Acc_x$ | −0.5066 | 0.0011 | −0.5134 | −0.5005 | m/s² |
| 1 | $Acc_y$ | −0.8566 | 0.0024 | −0.9394 | −0.7812 | m/s² |
| 1 | $Acc_z$ | −0.0117 | 0.0025 | −0.1177 | 0.0864 | m/s² |
| 1 | Audio | 78.3776 | 7.7395 | 48.9995 | 95.8572 | dB |
| 2 | $Acc_x$ | −0.5072 | 0.0010 | −0.5125 | −0.4999 | m/s² |
| 2 | $Acc_y$ | −0.8569 | 0.0022 | −0.9362 | −0.7823 | m/s² |
| 2 | $Acc_z$ | −0.0116 | 0.0024 | −0.1166 | 0.0840 | m/s² |
| 2 | Audio | 76.9458 | 6.3028 | 60.3477 | 95.3525 | dB |
| 3 | $Acc_x$ | −0.5074 | 0.0009 | −0.5129 | −0.5023 | m/s² |
| 3 | $Acc_y$ | −0.8570 | 0.0024 | −0.9373 | −0.7832 | m/s² |
| 3 | $Acc_z$ | −0.0121 | 0.0026 | −0.1161 | 0.0847 | m/s² |
| 3 | Audio | 77.0196 | 7.4985 | 44.2063 | 92.9458 | dB |

3.3. Resulting tool wear

A 3D confocal laser scanning microscope (MSV266; Leica Co.) is used to measure tool wear three times at every machining distance. The tool wear images for various machining conditions and distances are shown in Fig. 5. The measured tool wear degree varies by machining condition, as shown in Fig. 6. The high cutting speed and material removal rate of Conditions 1 and 3 (as provided in Table 1) tend to produce more rapid tool wear generation than Condition 2. The experiments are performed for identical machining distances under different conditions. Condition 3 shows the highest tool wear degree, especially in the early phase of the machining process (i.e., passes 1 and 2); the high cutting speed and feed rate generate a large contact area directly related to the elevated temperature. However, as the machining process continues, the tool wear trend is reversed due to the thermal softening of the work material. Elevated temperatures induce thermal softening in the work material, resulting in a reduction in material hardness. Remarkably, an extended machining length correlates with decreased tool wear. On the other hand, the lowest tool wear is observed in Condition 2. Low cutting speeds and feeds are employed in titanium machining to reduce tool wear by minimizing heat generation at the tool-chip and tool-work material interface. Additionally, lower cutting speeds
help mitigate chemical reactions between the tool and work material, preserving tool integrity and minimizing wear associated with material work hardening. During the repeated experiments, a comparatively reduced standard deviation in tool wear is observed under Condition 2 as opposed to Conditions 1 and 3. This phenomenon can be attributed to the elevated cutting and transverse speeds that induce a rapid increase in tool wear. However, in Condition 2, the abrupt occurrence of tool wear is not evident, highlighting the mitigating effect of the specified conditions on the observed tool wear rates.

Fig. 3. An experimental setup for the turning process of Ti-6Al-4V.

Fig. 4. Visualization of sensor measurements, including acceleration signals in $x$, $y$, $z$ directions, and an audio signal. The $x$-axis indicates measured time and the $y$-axis indicates signal amplitude.

In general, the stages of tool wear are categorized into three: (1) an initial stage where tool wear rapidly increases, (2) an intermediate stage where tool wear gradually increases, and (3) a final stage where tool wear rapidly increases again. In the three experiments performed in this work, all the possible stages of tool wear are observed in each machining condition. Therefore, each dataset covers the entire tool wear generation trend throughout the tool's operational lifetime. However, compared to existing works [66], the experiments exhibit shorter machining times (e.g., around 4 min), as provided in Table 1. This can be attributed to several reasons. First, using different work materials and lubricants might lead to different tool wear generation trends and a different total machining time before the EOL of the tool. Second, coated cutting tools might also affect tool wear behavior, while this work's experiments use uncoated tungsten carbide cutting tools, which negatively affects tool wear generation. In addition, this work employs dry machining (i.e., without lubricant) for rough machining, which generally leads to faster tool wear generation. Yet, the datasets used in this work can sufficiently validate the effectiveness of the tool wear prediction methods since the experiments cover entire tool wear trends.

3.4. Tool wear calculation

The tool wear model is empirically applied to predict continuous tool wear using the abrasive wear model, as expressed in Eq. (24). The abrasive tool wear model refers to an empirical model that estimates the wear phenomenon occurring during the abrasive machining process. In this process, the cutting tool surface undergoes wear over time due to its interaction with the work material. The employed tool wear model is a function of $a_k$, $V_c$, $b_k$, and $t$, where $a_k$ and $b_k$ are parameters, $V_c$ is the cutting speed, and $t$ is the time. Ideally, tool wear must be measured as frequently (e.g., continually) as possible. However, it may not be feasible to change the cutting tool when considering factors such as tool replacement costs and productivity. Moreover, the tool wear trend usually changes depending on the machining time. Because this trend exhibits nonlinear characteristics, in this work, the abrasive wear model is computed at each time step for more accurate wear calculation. To estimate the tool wear model's parameters, the Levenberg–Marquardt (LM) method is applied, as shown in Eq. (25). $\mathbf{p}_{k+1}$ is the parameter vector at iteration $k + 1$, $\mathbf{p}_k$ is the parameter vector at iteration $k$, and $\mathbf{r}$ is the residual vector at the current parameters. $J_{\mathbf{r}}$ is the Jacobian matrix of the residual vector $\mathbf{r}$, $\upsilon_k$ is a damping parameter, and $diag(J_{\mathbf{r}}^{T} J_{\mathbf{r}})$ is the diagonal matrix formed from the diagonal of $J_{\mathbf{r}}^{T} J_{\mathbf{r}}$. The LM method is an iterative optimization algorithm used for solving non-linear least squares problems. In particular, this algorithm is popularly used in the numerical optimization field. Each iteration of the LM method computes a parameter update that reduces a cost function defined as the sum of squared residuals between observed and predicted values. The algorithm dynamically adjusts the damping parameter during the optimization process, enabling it to navigate both steep and shallow regions of the cost function. The computed tool wear model parameters and constants used to generate a continuous degree of tool wear are shown in Table 3.

$$
VB_k = a_k V_c t + b_k \tag{24}
$$
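A Levenberg–Marquardt fit of the linear wear model in Eq. (24) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' code: the function name `lm_fit_wear`, the accept/reject rule that scales the damping parameter by a factor of 10, the stopping thresholds, and the synthetic measurements are all hypothetical. The update itself follows the damped normal-equation form of Eq. (25).

```python
def lm_fit_wear(ts, vbs, v_c, damping=1e-2, max_iter=100, tol=1e-18):
    """Fit VB = a * v_c * t + b by Levenberg-Marquardt and return (a, b)."""
    a, b = 0.0, 0.0

    def cost(a_, b_):
        # Sum of squared residuals between observed and modeled wear.
        return sum((vb - (a_ * v_c * t + b_)) ** 2 for t, vb in zip(ts, vbs))

    for _ in range(max_iter):
        r = [vb - (a * v_c * t + b) for t, vb in zip(ts, vbs)]  # residual vector
        ja = [-v_c * t for t in ts]   # d r_i / d a (first Jacobian column)
        jb = [-1.0] * len(ts)         # d r_i / d b (second Jacobian column)
        h11 = sum(x * x for x in ja)  # entries of J^T J
        h12 = sum(x * y for x, y in zip(ja, jb))
        h22 = sum(y * y for y in jb)
        g1 = sum(x * ri for x, ri in zip(ja, r))  # entries of J^T r
        g2 = sum(y * ri for y, ri in zip(jb, r))
        m11 = h11 * (1.0 + damping)   # J^T J + damping * diag(J^T J)
        m22 = h22 * (1.0 + damping)
        det = m11 * m22 - h12 * h12
        step_a = (m22 * g1 - h12 * g2) / det  # solve the 2x2 system explicitly
        step_b = (m11 * g2 - h12 * g1) / det
        a_new, b_new = a - step_a, b - step_b
        old, new = cost(a, b), cost(a_new, b_new)
        if new < old:
            a, b = a_new, b_new
            damping /= 10.0           # trust the Gauss-Newton direction more
            if old - new < tol:
                break
        else:
            damping *= 10.0           # fall back toward a short gradient step
            if damping > 1e12:
                break
    return a, b


# Synthetic, noise-free measurements generated from known parameters (hypothetical).
true_a, true_b, cutting_speed = 0.02, 1.5, 100.0
times = [5.0, 10.0, 20.0, 30.0]
wear = [true_a * cutting_speed * t + true_b for t in times]
a_fit, b_fit = lm_fit_wear(times, wear, cutting_speed)
```

Because Eq. (24) is linear in $a_k$ and $b_k$, the fit converges in a few iterations here; the diagonal scaling in the damping term keeps the update well conditioned even though the two Jacobian columns differ greatly in magnitude.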
$$
\mathbf{p}_{k+1} = \mathbf{p}_k - \left( J_{\mathbf{r}}^{T} J_{\mathbf{r}} + \upsilon_k \, diag(J_{\mathbf{r}}^{T} J_{\mathbf{r}}) \right)^{-1} J_{\mathbf{r}}^{T} \mathbf{r}(\mathbf{p}_k), \quad k \geq 0 \tag{25}
$$

$$
J_{\mathbf{r}}(\mathbf{p}) = \begin{bmatrix}
\frac{\partial r_1(\mathbf{p})}{\partial p_1} & \cdots & \frac{\partial r_1(\mathbf{p})}{\partial p_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial r_n(\mathbf{p})}{\partial p_1} & \cdots & \frac{\partial r_n(\mathbf{p})}{\partial p_m}
\end{bmatrix} \tag{26}
$$

$$
\mathbf{r}(\mathbf{p}) = \begin{bmatrix} r_1(\mathbf{p}) \\ r_2(\mathbf{p}) \\ \vdots \\ r_n(\mathbf{p}) \end{bmatrix}
= \begin{bmatrix} y_1 - f(x_1, \mathbf{p}) \\ y_2 - f(x_2, \mathbf{p}) \\ \vdots \\ y_n - f(x_n, \mathbf{p}) \end{bmatrix} \tag{27}
$$

Fig. 5. Tool wear measurement at 1 to 6 passes (from left to right) of (a) Experiment 1, (b) Experiment 2, and (c) Experiment 3, respectively.

Fig. 6. Measured tool wear at various machining distances (i.e., at 1 to 6 passes) for each machining condition. The $y$-axis indicates the tool flank wear degree.

3.5. Data preprocessing

A series of data preprocessing steps is applied to facilitate tool wear prediction model development. First, all experimental data are standardized, as expressed in Eq. (28). To transform the original data into suitable input shapes for predictive models, a sliding window method [67] is applied to generate sliced time-series data samples with identical sequence lengths. To reduce training time and downsample the data, a sliding window of size 400 with a stride of 1 is used [14,15]. The number of sliced data samples for each dataset correlates with the machining time, resulting in 19,901, 20,096, and 12,977 samples, respectively. For each experiment, the entire data are divided into training, validation, and test sets with a ratio of 70%, 10%, and 20%, respectively.

$$
x_{scaled} = \frac{x - \bar{x}}{\sigma} \tag{28}
$$

where:
$x$: the original independent variable
$\bar{x}$: the mean value
$\sigma$: the standard deviation
$x_{scaled}$: the scaled independent variable

3.6. Evaluation metrics

Considering that the proposed tool wear prediction method outputs a continuous scalar value, several appropriate regression metrics are employed for evaluation and performance comparison [22,28]. The mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE) are used. The coefficient of determination (i.e., $R^2$) is also used. The evaluation metrics used in this work are shown in Eq. (29) to Eq. (32). In addition, to evaluate the predictive models' probabilistic predictions, a predictive log-likelihood (LL) is used [59], as shown in Eq. (33).

$$
MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \tag{29}
$$

$$
RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{30}
$$

$$
MAPE = \frac{100}{N} \sum_{i=1}^{N} \frac{|y_i - \hat{y}_i|}{|y_i|} \tag{31}
$$

$$
R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{32}
$$

where:
$y$: actual target value
$\hat{y}$: predicted target value
$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$: average target value
$N$: number of data samples

Table 3
Estimated parameters for the abrasive tool wear model.

| Experiment number | Constant | Pass 1 | Pass 2 | Pass 3 | Pass 4 | Pass 5 | Pass 6 |
|---|---|---|---|---|---|---|---|
| 1 | $a_k$ | 2.37E−02 | 1.36E−02 | 3.23E−03 | 8.74E−04 | 1.80E−02 | 3.90E−02 |
| 1 | $b_k$ | 1.00E−02 | 35.78 | 1.18E+02 | 1.46E+02 | −1.26E+02 | −5.36E+02 |
| 1 | $t$ | 35.60 | 78.90 | 118.60 | 158.50 | 195.40 | 229.90 |
| 2 | $a_k$ | 1.67E−02 | 2.15E−03 | 1.85E−04 | 9.95E−03 | 2.27E−03 | 1.19E−02 |
| 2 | $b_k$ | 1.00E−02 | 52.70 | 6.63E+02 | −3.05E+01 | 6.65E+01 | −8.49E+01 |
| 2 | $t$ | 45.30 | 86.60 | 123.90 | 157.70 | 196.20 | 232.50 |
| 3 | $a_k$ | 4.14E−02 | 4.61E−03 | 1.08E−02 | 2.22E−02 | 1.89E−02 | 1.44E−02 |
| 3 | $b_k$ | 1.00E−02 | 1.07E+02 | 7.17E+01 | −2.54E+01 | 1.05E+01 | 7.27E+01 |
| 3 | $t$ | 29.20 | 57.70 | 85.10 | 111.30 | 136.20 | 161.10 |

Table 4
Hyperparameter tuning of the proposed method.

| Hyperparameter | Search space |
|---|---|
| Number of MBConv layers | [3,8] |
| Expansion rate of MBConv | {2,3,6} |
| Number of convolutional kernels | {4,8,12,16,32,64} |
| Number of FC kernels | {2,4,8,16,32} |
| Dropout rate | {0.1,0.2,0.3,0.4,0.5} |
| Stride size | {1,2,3} |
| Activation function | {Linear,Sigmoid,ReLU,LeakyReLU,Swish} |
| Optimizer | {SGD,RMSProp,Adadelta,Adam} |
| Batch size | {32,64,128,256,512,1024} |
| Learning rate | [0.001,0.1] |
| Learning rate scheduler | {Step decay,Cosine decay,Cosine restart} |

Table 5
The detailed backbone architecture of the proposed method.

| Stage | Operator | Size of kernels | Output shape | Number of layers | Activation function |
|---|---|---|---|---|---|
| 1 | Input | – | 400 × 4 | 1 | – |
| 2 | Conv1 (+BN) | 3 | 200 × 8 | 1 | LeakyReLU |
| 3 | MBConv1 | 3 | 200 × 8 | 1 | Swish |
| 4 | MBConv2 | 3 | 200 × 16 | 2 | Swish |
| 5 | MBConv3 | 5 | 100 × 16 | 2 | Swish |
| 6 | MBConv4 | 3 | 100 × 16 | 3 | Swish |
| 7 | MBConv5 | 5 | 50 × 16 | 3 | Swish |
| 8 | MBConv6 | 5 | 25 × 32 | 4 | Swish |
| 9 | MBConv7 | 3 | 13 × 32 | 1 | Swish |
| 10 | Conv2 (+BN) | 1 | 13 × 64 | 1 | LeakyReLU |
| 11 | Global average pooling | – | 64 | 1 | – |
| 12 | FC 1 | 32 | 32 | 1 | LeakyReLU |
| 13 | FC 2 | 16 | 16 | 1 | LeakyReLU |
| 14 | FC 3 | 2 | 2 | 1 | Linear |
| 15 | Output 1: $\hat{y}$; Output 2: $\hat{\sigma}$ | 2 | 2 | 1 | Linear |
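As a worked example of Eq. (24), the piecewise-linear wear curve of Experiment 1 can be evaluated from the parameters in Table 3. The sketch below is illustrative rather than the authors' code. It assumes that the row $t$ in Table 3 gives the end time (in s) of each pass, that the parameters of pass $k$ apply for all times up to that value, and that $V_c$ is the 100 m/min cutting speed listed for Experiment 1 in Table 1; the function name `tool_wear` is hypothetical.

```python
V_C = 100.0  # cutting speed of Experiment 1 in m/min (Table 1)

# (a_k, b_k, assumed end time of pass k in s) for Experiment 1, copied from Table 3.
PASSES = [
    (2.37e-02, 1.00e-02, 35.60),
    (1.36e-02, 35.78, 78.90),
    (3.23e-03, 1.18e+02, 118.60),
    (8.74e-04, 1.46e+02, 158.50),
    (1.80e-02, -1.26e+02, 195.40),
    (3.90e-02, -5.36e+02, 229.90),
]


def tool_wear(t):
    """Evaluate VB_k = a_k * V_c * t + b_k for the pass that contains time t."""
    if t < 0.0 or t > PASSES[-1][2]:
        raise ValueError("t is outside the machining time covered by Table 3")
    for a_k, b_k, t_end in PASSES:
        if t <= t_end:
            return a_k * V_C * t + b_k
    raise AssertionError("unreachable: the range check above covers all passes")


# Wear at the end of pass 1, and the pass-2 line evaluated at the same instant.
wear_end_pass_1 = tool_wear(35.60)
wear_start_pass_2 = 1.36e-02 * V_C * 35.60 + 35.78
```

Under these assumptions the two segments nearly agree at the pass boundary (about 84.4 versus 84.2), which is consistent with the per-pass parameters describing one continuous wear curve.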
$$
LL = \frac{1}{N} \sum_{i=1}^{N} \log p(y_i \mid x_i, \boldsymbol{\omega}) \tag{33}
$$

where:
$p(y_i \mid x_i, \boldsymbol{\omega}) = \mathcal{N}(y_i; \hat{y}_i(x_i, \boldsymbol{\omega}), \hat{\sigma}_i^2(x_i, \boldsymbol{\omega}))$
$N$: number of data samples

3.7. Hyperparameter tuning

The proposed method has various hyperparameters that affect the prediction performance, training procedure, and inference phase in different ways. Some hyperparameters, such as the number of kernels, the number of layers, and the activation function types, determine the predictive model's detailed setting, while others, including the batch size, optimizer, and learning rate, affect the training and optimization process. As the number of possible hyperparameter configurations increases exponentially with the number of hyperparameters, this work employs a random search [32,68] to find an optimal hyperparameter configuration for the proposed method. The hyperparameter tuning is performed based on the prediction performance on a holdout validation set. The hyperparameters tuned in this work for the optimal model design, setting, and network training are shown in Table 4.

3.8. Architecture details

Based on the hyperparameter tuning results, the detailed settings of the EN-based backbone architecture for the predictive model are determined. Table 5 shows the detailed model architecture of the proposed tool wear prediction method. Essentially, seven distinct MBConv blocks are used to efficiently extract features with every weight under Bayesian treatment. Three fully connected layers are stacked on top of the main feature extracting layers (i.e., MBConv, Conv) to generate two output values consisting of the output density's distributional parameters (i.e., $\hat{y}$, $\hat{\sigma}$). To facilitate training, variance scaling initialization is applied for every learnable weight.

3.9. Implementation details

As mentioned above, for fair evaluation, the entire data are split into holdout training, validation, and test sets with a ratio of 70%, 10%, and 20%, respectively. Five independent trials with distinct random seeds are performed for the experiments, and the average quantitative results with error bars (i.e., standard errors) are reported. For the model training, the optimization algorithm with a specific learning rate and a learning rate scheduler, selected by the hyperparameter tuning (as shown in Table 4), is used. While training the proposed method, the maximum number of epochs is set to 50,000, and early stopping is used with a patience of 200 epochs. Each training epoch takes approximately 2.28 s with a batch size of 512. The experiments in this work are conducted using an Intel Xeon Gold 5220 CPU and four NVIDIA RTX A6000 GPUs. For DNN implementations, the TensorFlow and TensorFlow Probability libraries are used with the Python programming language (version 3.9.18).

4. Results and discussion

This work rigorously validates the effectiveness of the proposed uncertainty-aware tool wear prediction method that uses real-world experimental data. In particular, the prediction performance is individually evaluated on three independent datasets collected from the turning processes. For a comprehensive analysis of the experimental results and the effectiveness of this work, the proposed method's prediction performance is first compared with existing data-driven tool
wear prediction methods, including conventional ML and DL-based approaches. As the proposed method is a probabilistic approach, existing DL-based probabilistic methods based on Bayesian DL techniques are also used for performance comparison. In addition, existing state-of-the-art (SOTA) methods in the literature are compared with the proposed method to verify its effectiveness. Moreover, validation of the uncertainty quantification is performed based on the conventional analytical approach, further reinforcing the efficacy and reliability of the proposed uncertainty-aware tool wear prediction method. The descriptions of the three datasets collected using different machining conditions used in this work are summarized in Table 6. More detailed information on machining conditions is provided in Table 1.
Table 6
Descriptions of datasets used in the experiments.
Condition                                 Experiment 1   Experiment 2   Experiment 3
Number of samples in the training set     13,931         14,067         9,084
Number of samples in the validation set   1,990          2,010          1,298
Number of samples in the test set         3,980          4,019          2,595
Maximum tool wear degree (μm)             346.3974       132.9756       299.4535
Minimum tool wear degree (μm)             0.0000         0.0000         0.0000
Sliding window size                       400            400            400
Number of features                        4              4              4
Shape of a data sample                    (400,4)        (400,4)        (400,4)
First, KF-based noise suppression has been applied to the raw sensor measurements to reduce the sensor noise. The exemplary data after applying the proposed noise suppression method and the removed residual signals are visualized in Fig. 7. The measured raw signal of $Acc_z$ is transformed into the filtered signal by the KF-based noise suppression method, as shown in (a). The removed noise (i.e., residual signal) is visualized in (b). The sensor noise is reduced for all experimental data from Conditions 1 to 3. The maximum noise reduction ratios for each condition are 7.65%, 6.72%, and 8.54%, respectively. This indicates that the proposed KF-based noise suppression might have significantly reduced the unexpected noise effects from the mobile sensing data. There are differences in the relative amount of reduced noise among datasets because the sensor noise is higher for the machining condition of high chip thickness due to the additional impact and vibrations. Moreover, in the case of noise reduction for accelerometer signals (i.e., $Acc_x$, $Acc_y$, $Acc_z$), the most significant noise reduction occurred during the initial stages of the machining process under all machining conditions. This reduction occurs because the collisions between the work material and the cutting tool during the early stages of machining might have led to substantial fluctuations in sensor data. The abrupt changes in the sensor signals during these collisions become a primary factor contributing to noise generation. The noise suppression results imply that the proposed method, which uses smartphone sensors, can sufficiently replace costly traditional sensors for tool wear prediction.
After applying the KF-based noise suppression method, the main tool wear prediction is conducted on each dataset. First, training the proposed tool wear prediction model with SGD-based optimization has been performed. As the proposed predictive model has two types of variational parameters (i.e., $\boldsymbol{\mu}$, $\boldsymbol{\rho}$) to train, the training process has taken slightly longer than conventional DNN-based models like LSTM, GRU, and CNN. The learning curves of the proposed method with the training loss and the validation loss (on Dataset 1) over the training epochs are visualized in Fig. 8. The training and validation losses seem to decrease continually and stably as the learning process proceeds. Small fluctuations in the validation loss might be induced by the use of the learning rate scheduler. In most cases, the proposed method has shown stable training processes for every dataset.
For a qualitative analysis of the proposed method’s performance, each dataset’s tool wear prediction results are visualized, as shown in Fig. 9. For all three datasets, the proposed method shows high prediction performance throughout the entire machining time. Comparing the ground-truth tool wear degree, shown as the red solid line, with the proposed method’s predicted tool wear, denoted as the blue dotted line, the proposed method seems to effectively capture the trend of ongoing tool wear degree. For Dataset 1, except for a short time during the fourth machining pass (i.e., around 130 s after machining starts) and the last machining pass (i.e., around 20 s before the machining ends), the proposed method accurately predicts ongoing tool wear during the turning process. A slight deviation from the ground-truth tool wear at the end of the machining process might be due to the cutting tool’s increased variability. The proposed method also correctly predicts tool wear for Dataset 2 throughout the entire machining process. Similar to Dataset 1, the proposed method shows a marginally higher prediction error during the first machining pass (i.e., around 50 s after start) and the last machining pass (i.e., after around 200 s). The same prediction behavior is observed in Dataset 3, which indicates the effectiveness of the proposed method in three different turning processes. Overall, the proposed method shows highly accurate tool wear prediction results throughout the machining processes on every dataset, indicating its efficacy in using noisy mobile sensing data.
When closely looking at the prediction error measured in MAE (i.e., the shaded navy area in Fig. 9) of every dataset, the error seems to be larger at the beginning and end of machining. In the earlier stage of machining, a significant collision between the tool and the material leads to substantial vibration. Additionally, due to the high edge sharpness before wear occurs, stress becomes concentrated at the tool tip, resulting in rapid and pronounced wear. Consequently, the proposed method tends to exhibit a high prediction error. In addition, during the later machining stage, the titanium alloy’s low thermal conductivity induces heat accumulation and causes an accelerated tool wear process, which might have increased the difficulty of tool wear prediction. Interestingly, however, while the proposed method tends to show relatively higher prediction errors at certain intervals, it also demonstrates higher predictive uncertainty simultaneously, as illustrated in the later parts in Fig. 13. As mentioned in Section 3, the datasets show relatively shorter machining time and tool wear generation trends [66] due to machining conditions’ characteristics. Still, the prediction results show the entire tool wear duration and sufficiently demonstrate the proposed method’s predictive capabilities over the tool’s whole operational lifetime before EOL.
4.1. Comparison with existing data-driven methods
For a quantitative comparative study, the proposed method’s tool wear prediction performance is first compared with existing data-driven methods, including ML and DL-based approaches. For conventional data-driven and ML-based methods, the statistical feature extraction technique combined with various ML algorithms is employed. In particular, time-series feature extraction on basis of scalable hypothesis tests (TSFRESH) [69] is used with LR models (i.e., Lasso, Ridge), SVM, and RF. Following [70], TSFRESH extracts 77 statistical features, including the mean, standard deviation, root mean square, entropy, linear trend, kurtosis, and Friedrich coefficient, from the multivariate time-series input data. In addition, DL-based methods based on LSTM, GRU, and CNN, which are most widely adopted for online tool wear prediction [14], are used for performance comparison. Training the DL-based methods has followed the same procedure as the proposed method for a fair prediction performance comparison.
The prediction performances of the proposed method and existing data-driven methods for each dataset are provided in Table 7, Table 8, and Table 9, respectively. For Dataset 1, DL-based tool wear prediction
Fig. 7. Visualization of noise suppression using Kalman filter (KF). (a) shows an exemplary raw signal and the filtered signal using KF. (b) shows the removed residual (noise)
signal from (a). The 𝑥-axis indicates the measured time and the 𝑦-axis indicates signal amplitude.
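As a rough illustration of the kind of filtering shown in Fig. 7, the sketch below applies a scalar Kalman filter with a random-walk state model to a single sensor channel and returns both the filtered signal and the removed residual. The state model, the function name `kalman_smooth_1d`, and the variances `process_var` and `meas_var` are assumptions made for illustration only; the KF configuration actually used in this work is not restated in this section.

```python
def kalman_smooth_1d(signal, process_var=1e-3, meas_var=1e-1):
    """Scalar Kalman filter with a random-walk state model (illustrative only).

    signal      : list of raw sensor readings for one channel
    process_var : assumed process-noise variance (hypothetical value)
    meas_var    : assumed measurement-noise variance (hypothetical value)
    Returns (filtered, residual) with residual[i] = signal[i] - filtered[i].
    """
    if not signal:
        return [], []
    x_est = signal[0]  # initial state estimate
    p_est = 1.0        # initial estimate variance
    filtered, residual = [], []
    for z in signal:
        # Predict: a random walk keeps the state, but its variance grows.
        p_pred = p_est + process_var
        # Update: blend the prediction with the new measurement.
        gain = p_pred / (p_pred + meas_var)
        x_est = x_est + gain * (z - x_est)
        p_est = (1.0 - gain) * p_pred
        filtered.append(x_est)
        residual.append(z - x_est)
    return filtered, residual


if __name__ == "__main__":
    raw = [1.0, -1.0] * 50  # toy alternating "noise" around zero
    smooth, removed = kalman_smooth_1d(raw)
    print(max(abs(v) for v in smooth[10:]))
```

A larger `meas_var` relative to `process_var` makes the filter trust its own prediction more and smooths more aggressively; both values would need to be tuned per sensor in practice.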
Fig. 8. Convergence analysis of training of the proposed method based on training and validation loss plotted over epochs.
methods show superior performance compared to the conventional data-driven methods, conceivably due to their expressive power, as shown in Table 7. In addition, the proposed method shows higher performance by a large margin, with MAE of 2.5815, RMSE of 3.9914, MAPE of 0.0217, and $R^2$ of 0.9951. For Dataset 2, the prediction results for every prediction method have improved, while the proposed method still shows the best prediction performance with MAE of 1.2414, RMSE of 1.3949, MAPE of 0.0186, and $R^2$ of 0.9971. CNN and GRU also exhibit high prediction performance in terms of MAE and RMSE, as indicated in Table 8. The same trend is observed in Dataset 3 as well, with the proposed method outperforming the existing data-driven tool wear prediction methods by a noticeable margin in all metrics, with MAE of 1.2269, RMSE of 1.3881, MAPE of 0.0138, and $R^2$ of 0.9982, as shown in Table 9. In all datasets, the proposed method particularly shows higher $R^2$ values, which indicates that it accurately follows ongoing tool wear throughout the machining processes, as observed in Fig. 9.
The proposed method’s superior tool wear prediction performance can be attributed to several characteristics. First, the KF-based noise suppression method might have helped the predictive model extract and utilize core informative features for tool wear prediction. In particular, considering that data-driven models make predictions solely based on sensor signals, the proposed noise suppression might have been beneficial for accurate tool wear prediction. On the other hand, as conventional ML-based methods using TSFRESH extract statistical features from time-series data, signal noise might have deteriorated the feature engineering quality. The incorporation of aleatoric and epistemic uncertainties during machining processes might also have improved the proposed method’s prediction performance. The quantitative performance evaluation and comparison empirically support these advantages for accurate tool wear prediction. In addition, as the proposed method eliminates the statistical feature extraction processes, it has advantages in computational time for both training and inference for online tool wear prediction. The proposed method’s superior performance indicates that it can adeptly perform tool wear prediction using mobile sensing data, enabling an efficient TCM in real-world practice.
4.2. Comparison with existing probabilistic methods
Considering that the proposed method takes uncertainty into account during tool wear prediction, a performance comparison with existing probabilistic approaches has been carried out. Advanced probabilistic methods based on Bayesian DL that are capable of yielding predictive distributions are used for comparison. The first used is Monte Carlo dropout (MCD) [59], which utilizes the dropout technique during inference as an approximate sampling from the posterior. MCD is one of the most widely adopted Bayesian DL techniques in practice due to its implementation efficiency, especially for applications in the manufacturing domain [57,71]. An ensemble approach to Bayesian DL that employs multiple randomly initialized models for approximate inference, called the deep ensemble (DE) [72], is also used. Bayesian DL methods that have shown effectiveness in probabilistic prediction tasks,
Fig. 9. Prediction results of the proposed method on (a) Dataset 1, (b) Dataset 2, and (c) Dataset 3. The actual tool wear is plotted in a solid red line and the predicted tool wear
is plotted in a dotted blue line. The shaded navy area indicates prediction error measured in MAE, which changes over time. The prediction error plot shows how accurately the
proposed method predicts the ongoing tool wear. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
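The four regression metrics reported in the comparison tables (MAE, RMSE, MAPE, and $R^2$) can be computed as in the minimal sketch below, written in plain Python. The function name `regression_metrics` is hypothetical, and skipping zero-valued targets in MAPE is an assumption: the minimum tool wear in Table 6 is 0.0000 μm, and how such samples are handled is not stated in this section.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (as a fraction) and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: samples with a zero target are skipped to avoid division by zero.
    ratios = [abs(e) / abs(t) for e, t in zip(errors, y_true) if t != 0]
    mape = sum(ratios) / len(ratios) if ratios else float("nan")
    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean target value.
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


if __name__ == "__main__":
    print(regression_metrics([2.0, 4.0], [1.0, 5.0]))
```

Because $R^2$ is normalized by the spread of the targets while MAE and RMSE are in μm, the absolute error values are not directly comparable across datasets with different maximum tool wear.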
Table 7
Performance comparison with existing deterministic methods on Dataset 1.
Dataset  Model       MAE (↓)             RMSE (↓)            MAPE (↓)           $R^2$ (↑)
1        FE + Lasso  43.0206 (±0.3278)   54.9478 (±0.3578)   0.5194 (±0.0101)   0.4604 (±0.0060)
1        FE + Ridge  43.1883 (±0.2079)   53.4027 (±0.2651)   0.5005 (±0.0089)   0.4902 (±0.0084)
1        FE + SVM    43.6508 (±1.2671)   60.5118 (±1.2475)   0.5540 (±0.0209)   0.3473 (±0.0173)
1        FE + RF     25.2576 (±0.1501)   37.7465 (±0.4819)   0.2329 (±0.0049)   0.7426 (±0.0050)
1        LSTM        12.5230 (±3.5731)   18.5996 (±4.4968)   0.1235 (±0.0266)   0.9310 (±0.0322)
1        GRU         8.6472 (±2.3702)    12.5260 (±3.9555)   0.0811 (±0.0509)   0.9560 (±0.0680)
1        CNN         4.6221 (±0.3149)    5.6050 (±0.9362)    0.0491 (±0.0042)   0.9742 (±0.0235)
1        Proposed    2.5815 (±0.4483)    3.9914 (±0.9532)    0.0217 (±0.0056)   0.9951 (±0.0039)
Table 8
Performance comparison with existing deterministic methods on Dataset 2.
Dataset  Model       MAE (↓)             RMSE (↓)            MAPE (↓)           $R^2$ (↑)
2        FE + Lasso  12.9758 (±0.2040)   18.3862 (±0.3053)   0.3976 (±0.0128)   0.6105 (±0.0058)
2        FE + Ridge  10.9320 (±0.0872)   15.8053 (±0.1678)   0.3770 (±0.0115)   0.7122 (±0.0018)
2        FE + SVM    14.5151 (±0.0882)   20.9576 (±0.0696)   0.4351 (±0.0115)   0.4938 (±0.0129)
2        FE + RF     5.3615 (±0.0807)    8.5551 (±0.0282)    0.1341 (±0.0003)   0.9156 (±0.0019)
2        LSTM        3.6280 (±1.0570)    5.4409 (±1.5492)    0.0932 (±0.0353)   0.9588 (±0.0387)
2        GRU         2.6740 (±0.6397)    3.3588 (±0.9552)    0.0490 (±0.0118)   0.9857 (±0.0085)
2        CNN         1.9922 (±0.6673)    2.0296 (±0.3072)    0.0327 (±0.0047)   0.9903 (±0.0119)
2        Proposed    1.2414 (±0.3960)    1.3949 (±0.6365)    0.0186 (±0.0059)   0.9971 (±0.0010)
which use SGD iterates as samples from an approximate posterior, including variational SGD (V-SGD) [73] and stochastic weight averaging Gaussian (SWAG) [58], are used for performance comparison.
In addition to the four metrics (i.e., MAE, RMSE, MAPE, $R^2$) used in the previous performance comparison with existing data-driven approaches, $LL$ is also used for comparison, which quantifies the quality of uncertainty estimates (as described in Eq. (33)). The tool wear prediction performance comparison with existing probabilistic approaches for each dataset is provided in Table 10, Table 11, and Table 12, respectively. For Dataset 1, the proposed method shows superior tool wear prediction performance to other uncertainty-aware probabilistic methods in all metrics, as shown in Table 10. In particular, the proposed method’s higher $LL$ indicates its effectiveness in uncertainty modeling and quantification for tool wear prediction. The lower MAPE and higher $R^2$ values also suggest that the proposed method effectively captures the uncertainty, which empirically leads to better prediction performance. This corroborates the technical novelty of the proposed method to incorporate uncertainty during prediction to improve tool wear prediction accuracy, especially when using mobile sensing data. For Dataset 2, all other Bayesian DL methods show similarly high prediction performance, even compared to existing data-driven approaches (refer to Table 8). While the proposed method still exhibits the highest prediction performance among probabilistic approaches, as shown in Table 11, this result indicates that incorporating uncertainty during tool wear prediction can lead to improved prediction performance with the ability to provide uncertainty estimates. For Dataset 3, as shown in
Table 9
Performance comparison with existing deterministic methods on Dataset 3.
Dataset  Model       MAE (↓)             RMSE (↓)            MAPE (↓)           $R^2$ (↑)
3        FE + Lasso  29.0209 (±0.2186)   36.3308 (±0.2316)   0.2274 (±0.0044)   0.7741 (±0.0015)
3        FE + Ridge  26.7284 (±0.2734)   33.8770 (±0.3350)   0.2047 (±0.0039)   0.8036 (±0.0023)
3        FE + SVM    29.6650 (±1.4308)   38.9033 (±2.8300)   0.2190 (±0.0141)   0.7398 (±0.0388)
3        FE + RF     22.0535 (±0.2025)   30.4749 (±0.3104)   0.1517 (±0.0028)   0.8410 (±0.0041)
3        LSTM        8.8505 (±1.2098)    11.2058 (±2.7243)   0.0741 (±0.0256)   0.9566 (±0.0332)
3        GRU         4.6183 (±1.0350)    6.1640 (±0.0684)    0.0415 (±0.0252)   0.9911 (±0.0098)
3        CNN         3.4705 (±0.5193)    4.4223 (±0.6458)    0.0315 (±0.0077)   0.9951 (±0.0024)
3        Proposed    1.2269 (±0.2189)    1.3881 (±0.3278)    0.0138 (±0.0044)   0.9982 (±0.0012)
Table 12, the proposed method outperforms existing probabilistic tool wear prediction methods by a large margin, except for DE. However, considering that DE with $k$ ensemble components requires $k$ times more training and inference time as well as $k$ times as many saved models, the proposed method has the edge on computational and memory costs.
While the proposed method outperforms existing Bayesian DL-based probabilistic methods in all datasets, it is observed that $LL$ does not necessarily correlate with other regression measures, including MAE, RMSE, MAPE, and $R^2$, for other methods. For instance, SWAG shows higher prediction performance than MCD, DE, and V-SGD in regression metrics but shows a much worse $LL$ value for Dataset 2. Conversely, V-SGD shows a higher $LL$ value but lower prediction performance in terms of other regression metrics. This indicates that the magnitude of prediction error does not imply the quality of the generated predictive distributions. Hence, to further examine the behavior of the proposed method and existing probabilistic methods, the tool wear prediction results are visualized for a qualitative analysis.
The probabilistic prediction results from existing Bayesian DL-based methods include the average prediction values and the corresponding predictive distribution. 95% confidence intervals (i.e., credible intervals) of the probabilistic tool wear prediction models and the proposed method are visualized for each dataset in Fig. 10, Fig. 11, and Fig. 12, respectively. For all datasets, the proposed method shows reliable uncertainty-aware predictive distributions throughout the whole machining process. First, compared to some of the other probabilistic methods that seem to underestimate predictive uncertainty, the proposed method not only provides continuously changing uncertainty estimates but also provides larger uncertainty where the prediction error is larger (also see Fig. 9). In other words, the epistemic uncertainty is better modeled with the proposed method, indicating its ability to effectively quantify the degree of prediction confidence through predictive uncertainty, as shown in Fig. 10. Second, the proposed method yields more reasonable predictive uncertainty that aligns with the machining perspective. During the early stage of the machining processes, the proposed method outputs larger uncertainty, which might be due to the high tool vibration and instability of the turning process. The proposed method also shows high uncertainty between machining passes, where the inherent uncertainty increases due to the initial collision between the cutting tool and the work material, as shown in Fig. 12. In addition, compared to other probabilistic methods, the proposed method consistently exhibits increasing predictive uncertainty as the machining distance increases, as observed in Fig. 11. The increased tool stiffness and wear at the final stage of machining lead to tool vibration and fluctuations in the sensor signals (e.g., acceleration signals), which might increase uncertainty in tool wear prediction. Therefore, the proposed method seems to show the most reliable and valid predictive uncertainty during tool wear prediction, which is also consistent with the machining perspective.
While probabilistic models can provide uncertainty-aware predictions, they are known to take longer inference time compared to deterministic models in general [14]. Therefore, maintaining a moderate inference time while having a desirable uncertainty awareness for tool wear prediction is imperative for real-world application. Table 13 shows a comparison of the inference time of the proposed method and the existing probabilistic models. As previously mentioned in Section 3, since all datasets have the same machining distance, the inference time for each probabilistic method is averaged over datasets for comparison. According to the results provided in Table 13, MCD, which uses dropout during inference, has the shortest inference time, as it does not require additional parameters compared to standard DNNs. The proposed method shows the second fastest inference time, which is a moderate inference speed considering its superior prediction accuracy and ability to provide reliable uncertainty estimates. Other Bayesian DL-based probabilistic methods tend to require longer inference time than the proposed method, because they store and use multiple copies of the predictive model for prediction.
4.3. Comparison with state-of-the-art methods
In addition to the performance comparison with existing data-driven tool wear prediction methods and probabilistic methods based on Bayesian DL approaches, a comparison with SOTA methods in the literature is conducted to rigorously validate the proposed method’s effectiveness. In particular, the SOTA tool wear prediction methods published in peer-reviewed journals with publicly available source code are selected. The SOTA methods chosen in this work for performance comparison include the heterogeneous GRU-based method [19], temporal CNN with the disentangled VAE-based method [23], and LSTM-ResNet-based method [20], all of which have shown high performances in tool wear prediction tasks [14,26].
The prediction performance of SOTA methods and the proposed method for each dataset is compared in Table 14, Table 15, and Table 16, respectively. For Dataset 1, the proposed tool wear prediction method outperforms SOTA methods in all metrics, which validates the proposed model’s effectiveness. Compared to other methods, the LSTM-ResNet-based method [20] shows the prediction performance closest to the proposed method, as shown in Table 14. The proposed method also shows superior prediction performance in Dataset 2, as shown in Table 15, but with marginal performance improvement compared to [19] and [20]. This indicates that when the proposed uncertainty modeling is combined or integrated with existing methods, performance improvement might be achieved. Table 16 also shows how the proposed method outperforms the SOTA methods in tool wear prediction in all metrics on Dataset 3. In addition to the superior prediction performance to existing SOTA methods in all datasets, the proposed method can also incorporate and quantify uncertainty during tool wear prediction, which is a crucial advantage compared to SOTA methods. Overall, the performance comparison with existing data-driven, probabilistic, and SOTA methods consistently demonstrates that the proposed method is highly effective for tool wear prediction using mobile sensing data, with the comprehensive advantages detailed above.
4.4. Dissection of predictive uncertainty of the proposed method
As mentioned in Section 2, one of the major strengths of the proposed method is that the predictive uncertainty generated with the predictive model can be effortlessly dissected into two types of uncertainty that are incorporated during prediction. Based on the prediction results for the experimental data $\{f_{\boldsymbol{\omega}^{(t)}}(x^*) \mid t = 1, \ldots, T\}$ and the calculated total predictive uncertainty (as expressed in Eq. (18)), uncertainty dissection is performed. Epistemic uncertainty $\frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}(x^*)_t\right)^2 -$
Table 10
Performance comparison with existing probabilistic methods on Dataset 1.
Dataset  Model     MAE (↓)            RMSE (↓)           MAPE (↓)           $R^2$ (↑)          LL (↑)
1        MCD       4.0228 (±1.1926)   5.6478 (±1.7067)   0.0492 (±0.0171)   0.9929 (±0.0059)   −3.3383 (±0.4670)
1        DE        4.1242 (±0.2142)   5.6083 (±0.3080)   0.0445 (±0.0051)   0.9924 (±0.0043)   −3.2749 (±0.3556)
1        V-SGD     5.0700 (±1.5225)   6.3791 (±2.9433)   0.0425 (±0.0118)   0.9920 (±0.0079)   −3.3089 (±0.6305)
1        SWAG      3.6546 (±0.9712)   4.6639 (±1.6936)   0.0355 (±0.0285)   0.9918 (±0.0089)   −3.1344 (±0.5850)
1        Proposed  2.5815 (±0.4483)   3.9914 (±0.9532)   0.0217 (±0.0056)   0.9951 (±0.0039)   −2.8847 (±0.5013)
Table 11
Performance comparison with existing probabilistic methods on Dataset 2.
Dataset  Model     MAE (↓)            RMSE (↓)           MAPE (↓)           $R^2$ (↑)          LL (↑)
2        MCD       1.5766 (±0.3498)   2.1887 (±0.5021)   0.0305 (±0.0075)   0.9940 (±0.0024)   −2.2755 (±0.1836)
2        DE        1.7906 (±0.4589)   2.7634 (±0.3685)   0.0362 (±0.0093)   0.9935 (±0.0045)   −2.4277 (±0.0628)
2        V-SGD     1.6799 (±0.2560)   2.2500 (±0.3664)   0.0359 (±0.0044)   0.9938 (±0.0019)   −2.4347 (±0.2507)
2        SWAG      1.2806 (±0.3155)   1.6005 (±0.3754)   0.0274 (±0.0079)   0.9967 (±0.0012)   −3.0052 (±0.4139)
2        Proposed  1.2414 (±0.3960)   1.3949 (±0.6365)   0.0186 (±0.0059)   0.9971 (±0.0010)   −2.2429 (±0.1698)
Table 12
Performance comparison with existing probabilistic methods on Dataset 3.
Dataset  Model     MAE (↓)            RMSE (↓)           MAPE (↓)           $R^2$ (↑)          LL (↑)
3        MCD       3.1412 (±0.6283)   3.9098 (±0.5740)   0.0299 (±0.0090)   0.9941 (±0.0023)   −3.3918 (±0.2214)
3        DE        1.4363 (±0.0610)   1.8295 (±0.0499)   0.0148 (±0.0005)   0.9974 (±0.0025)   −2.4612 (±0.4627)
3        V-SGD     2.3965 (±0.6391)   3.0880 (±0.8349)   0.0228 (±0.0057)   0.9980 (±0.0006)   −2.7292 (±0.2126)
3        SWAG      3.0872 (±0.7099)   3.8631 (±0.8641)   0.0296 (±0.0054)   0.9961 (±0.0009)   −3.7961 (±0.3086)
3        Proposed  1.2269 (±0.2189)   1.3881 (±0.3278)   0.0138 (±0.0044)   0.9982 (±0.0012)   −2.3883 (±0.1984)
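To make the LL column in Tables 10 to 12 easier to interpret, the sketch below averages a Gaussian log-density over samples, using each sample's predicted mean and standard deviation in the spirit of Eq. (33). Taking the logarithm of the density is an assumption, made because the reported LL values are negative; the function name `gaussian_log_likelihood` and the toy numbers are hypothetical.

```python
import math


def gaussian_log_likelihood(y_true, y_mean, y_std):
    """Average log N(y_i; mean_i, std_i^2) over all samples (higher is better)."""
    total = 0.0
    for y, mu, sigma in zip(y_true, y_mean, y_std):
        var = sigma * sigma
        total += -0.5 * math.log(2.0 * math.pi * var) - (y - mu) ** 2 / (2.0 * var)
    return total / len(y_true)


if __name__ == "__main__":
    # Same point error (2.0) with two different predicted standard deviations.
    overconfident = gaussian_log_likelihood([0.0], [2.0], [0.1])
    calibrated = gaussian_log_likelihood([0.0], [2.0], [2.0])
    print(overconfident, calibrated)
```

In the toy call, both predictions have the same absolute error, yet the one that claims a very small standard deviation receives a far lower score. This is one way LL can rank methods differently from MAE or RMSE.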
Table 13
Inference time comparison of probabilistic methods.
Model     Inference time (ms)
MCD       0.1686 (±0.0719)
DE        0.3885 (±0.0316)
V-SGD     0.4532 (±0.0086)
SWAG      0.2829 (±0.0158)
Proposed  0.2197 (±0.0496)
Table 14
Performance comparison with state-of-the-art methods on Dataset 1.
Dataset  Model                    MAE (↓)            RMSE (↓)           MAPE (↓)           $R^2$ (↑)
1        Wang et al. [19]         3.8056 (±1.3338)   5.4097 (±2.0309)   0.0327 (±0.0179)   0.9896 (±0.0035)
1        Hahn and Mechefske [23]  6.4601 (±1.7042)   6.3076 (±1.4857)   0.0661 (±0.0291)   0.9704 (±0.0081)
1        Sun et al. [20]          2.8368 (±0.9977)   4.1280 (±1.5858)   0.0239 (±0.0145)   0.9920 (±0.0081)
1        Proposed                 2.5815 (±0.4483)   3.9914 (±0.9532)   0.0217 (±0.0056)   0.9951 (±0.0039)
Table 15
Performance comparison with state-of-the-art methods on Dataset 2.
Dataset  Model                    MAE (↓)            RMSE (↓)           MAPE (↓)           $R^2$ (↑)
2        Wang et al. [19]         1.3821 (±0.2435)   1.8246 (±0.3060)   0.0229 (±0.0048)   0.9959 (±0.0013)
2        Hahn and Mechefske [23]  1.7435 (±0.5503)   2.7607 (±1.0344)   0.0483 (±0.0406)   0.9916 (±0.0042)
2        Sun et al. [20]          1.4457 (±0.4385)   1.5058 (±0.5237)   0.0223 (±0.0091)   0.9964 (±0.0051)
2        Proposed                 1.2414 (±0.3960)   1.3949 (±0.6365)   0.0186 (±0.0059)   0.9971 (±0.0010)
$\left(\frac{1}{T}\sum_{t=1}^{T}\hat{y}(x^*)_t\right)^2$ and aleatoric uncertainty $\frac{1}{T}\sum_{t=1}^{T}\left(\hat{\sigma}(x^*)_t\right)^2$ are calculated individually. The epistemic uncertainty and the aleatoric uncertainty estimated with the proposed method during tool wear prediction are visualized for each dataset respectively in Fig. 13. In all datasets, the proposed method’s aleatoric uncertainty tends to increase as the machining process begins and gradually decreases midway through the process. On the other hand, the epistemic uncertainty generally tends to increase at the later stages of the machining processes.
From the machining perspective, the increased aleatoric uncertainty is assumed to stem from the use of new tools (i.e., cutting tools with high edge sharpness) in the early stage of machining, which leads to increased stress concentration that produces noisy data signals and rapid tool wear [74]. Typically, during those early stages, the
Fig. 10. Visualization of 95% prediction intervals of (a) MCD, (b) DE, (c) V-SGD, (d) SWAG, and (e) the proposed method on Dataset 1, each of which varies over time. The
actual tool wear is plotted in a solid red line and the predicted tool wear is plotted in a dotted blue line. (For interpretation of the references to color in this figure legend, the
reader is referred to the web version of this article.)
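Intervals such as those in Fig. 10 can be derived from $T$ stochastic forward passes for one input, each returning a predicted mean $\hat{y}$ and standard deviation $\hat{\sigma}$. The sketch below computes the epistemic term (variance of the means) and the aleatoric term (average of the predicted variances) as described in Section 4.4. Treating their sum as the total predictive variance and using the Gaussian factor 1.96 for a 95% interval are assumptions of this sketch, and the function name `dissect_uncertainty` is hypothetical.

```python
import math


def dissect_uncertainty(means, stds):
    """Split predictive uncertainty for one input from T stochastic forward passes.

    means[t], stds[t] : predicted mean and standard deviation of pass t
    """
    t = len(means)
    mean_pred = sum(means) / t
    # Epistemic: mean of squared predictions minus squared mean prediction.
    epistemic = sum(m * m for m in means) / t - mean_pred ** 2
    epistemic = max(epistemic, 0.0)  # guard against tiny negative round-off
    # Aleatoric: average of the predicted variances.
    aleatoric = sum(s * s for s in stds) / t
    total = epistemic + aleatoric  # assumption: total variance is the sum
    half_width = 1.96 * math.sqrt(total)  # assumption: Gaussian 95% interval
    return {
        "mean": mean_pred,
        "epistemic": epistemic,
        "aleatoric": aleatoric,
        "interval": (mean_pred - half_width, mean_pred + half_width),
    }


if __name__ == "__main__":
    print(dissect_uncertainty([1.0, 3.0], [1.0, 1.0]))
```

When the passes agree on the mean but each reports a large $\hat{\sigma}$, the interval is wide because of aleatoric uncertainty alone; when the passes disagree, the epistemic term widens it.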
Table 16
Performance comparison with state-of-the-art methods on Dataset 3.
Dataset  Model                    MAE (↓)            RMSE (↓)           MAPE (↓)           $R^2$ (↑)
3        Wang et al. [19]         1.2882 (±0.2687)   1.7136 (±0.3606)   0.0140 (±0.0049)   0.9962 (±0.0027)
3        Hahn and Mechefske [23]  2.8541 (±0.7552)   3.7104 (±0.2520)   0.0279 (±0.0090)   0.9921 (±0.0048)
3        Sun et al. [20]          1.8563 (±0.5145)   2.3744 (±0.6547)   0.0172 (±0.0047)   0.9972 (±0.0017)
3        Proposed                 1.2269 (±0.2189)   1.3881 (±0.3278)   0.0138 (±0.0044)   0.9982 (±0.0012)
work material is not adequately heated, leading to a situation where the hardness of the titanium material remains high. Consequently, there is a rapid wear of cutting tools due to the titanium alloy’s inherent characteristics. In addition, cutting tools without wear generally exhibit a significantly higher edge sharpness compared to worn-out tools. These factors contribute to a substantial increase in measured signal noise and the aleatoric uncertainty during machining. Hence, the proposed method’s generated aleatoric uncertainty seems to match the existing knowledge of the machining process and material characteristics.
The tendency for increasing epistemic uncertainty at later stages of machining is due to increased prediction difficulty from unstable machining status. As tool wear progresses, the contact area between the work material and the cutting tool expands, leading to varying pressures exerted on the tool. Furthermore, the continuous nature of turning operations (cf. milling) results in the accumulation of heat during machining, contributing to changes in the properties of both the work material and the cutting tool [66]. This, in turn, causes data irregularities and escalates the epistemic uncertainty, particularly due to the thermal softening characteristic inherent to titanium materials [75]. These factors might have contributed to increased prediction difficulty, leading to the high epistemic uncertainty of the proposed method. In addition, an increase in cutting speed is closely correlated
with a rise in thermal effects within the machining process [76]. Interestingly, compared to relatively low cutting speeds (i.e., Condition 2), higher cutting speeds (i.e., Condition 1, Condition 3) result in lower uncertainties in the latter stages of machining.
4.5. Validation using a conventional analytical approach
Given the previous results, the proposed method has outperformed existing data-driven, probabilistic, and SOTA methods in tool wear prediction on all datasets. In addition, the proposed method generates more reliable predictive distributions on tool wear prediction during turning processes. Based on the previous observations, this work further validates the uncertainty estimates of the proposed method with the conventional analytical approach. There are several reasons for this additional validation procedure. The first is to check if the proposed method can generate reliable uncertainty estimates that align with the analytical approach, which has theoretical groundings. In addition, if the uncertainty estimates from the proposed method match those from the analytical approach, the proposed method has advantages in that it can generate continuous estimates throughout the machining process.
The uncertainty estimates from the analytical approach are calculated based on Eq. (19) to Eq. (23), as mentioned in Section 2. Moreover, the conventional analytical uncertainties are calculated through Eq. (34) to Eq. (36), which are the relative uncertainty for $a_k$ and $b_k$, depending on the time step. Process parameters $a_k$ and $b_k$ are calibrated through the LM method with an initial value of 0.
$$u_{VB_{a_k}} = \frac{\langle a_k \rangle}{\langle VB_k \rangle}\,\frac{\partial VB_k}{\partial a_k}\,\frac{\delta a_k}{a_k} = \frac{\langle a_k \rangle}{\langle VB_k \rangle}\, V_c t \cdot u_{a_k} \tag{34}$$
$$u_{VB_{b_k}} = \frac{\langle b_k \rangle}{\langle VB_k \rangle}\,\frac{\partial VB_k}{\partial b_k}\,\frac{\delta b_k}{b_k} = \frac{\langle b_k \rangle}{\langle VB_k \rangle} \cdot u_{b_k} \tag{35}$$
$$u_{VB_k} = \left( \left(u_{VB_{a_k}}\right)^2 + \left(u_{VB_{b_k}}\right)^2 \right)^{\frac{1}{2}} \tag{36}$$
In the earlier stage of the machining process, the conventional uncertainty value is relatively higher in Condition 3, compared to other machining conditions, as shown in Table 17. The initial collision force
between the tool and the work material is increased due to the high cutting speed and feed rate, leading to a higher level of uncertainty in predictions. During the initial machining, the interaction between the cutting tool and work material is most pronounced. This stage primarily exhibits erosion and thermal effects concentrated on the tool due to initial cutting impacts. Initial wear patterns emerge on the tool surface, predominantly reflecting wear and erosion at the contact points. As machining continues, the initial wear progresses, and the interaction between the cutting tool and work material becomes more uniform in the intermediate stage. Although cutting effects and thermal influences remain significant, they are less emphasized compared to the initial stage. In the final machining stage, wear and erosion, particularly at the cutting tool edge, become more distinct, and cutting tool deformation rapidly occurs. While cutting effects and thermal influences persist, their impact increases as the tool approaches the end of its lifespan. There is an overall trend of increasing uncertainty with the increment of machining length. However, there are also intervals where uncertainty experiences a notable increase due to transitions between stages (i.e., initial, intermediate, final). In machining conditions that are slightly more severe, such as Conditions 1 and 3, all three stages of wear have manifested. However, in Condition 2, which represents a transition to the final stage, the abrupt increase in uncertainty during this stage change is less pronounced.

Using the aforementioned uncertainty estimates calculated based on the analytical approach, the validity of the predictive uncertainty from the proposed method is verified. Since the conventional analytical approach can yield only machining pass-wise uncertainty estimates rather than continuous estimates, the predictive uncertainty at each machining pass is used for evaluation. The similarity between the uncertainty of the proposed method and that of the conventional analytical approach is measured with the Pearson correlation coefficient ($r$). The uncertainty validation results are provided in Table 17. For all datasets, the similarity between the uncertainties of the proposed method and the analytical approach, measured in $r$, is high, indicating a strong correlation. For Dataset 1, the calculated $r$ between the uncertainty from the analytical approach and that from the proposed method is 0.8592. For Datasets 2 and 3, the calculated $r$ is 0.8377 and 0.8408, respectively,
Fig. 13. Predictive uncertainty results of the proposed method on (a) Dataset 1, (b) Dataset 2, and (c) Dataset 3. The actual tool wear is plotted in a solid red line and the
predicted tool wear is plotted in a dotted blue line. The shaded green and orange areas indicate the estimated epistemic and aleatoric uncertainties, each of which varies over
time. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 17
Uncertainty validation results using a conventional analytical approach.
Dataset  Method      Uncertainty per machining pass                            r
                     1        2        3        4        5        6
1        Analytical  0.0690   3.6454   20.7431  8.3766   25.6393  50.4826   0.8592
1        Proposed    6.3063   3.3084   4.2079   4.6368   8.4527   13.5001
2        Analytical  0.0032   0.2368   0.0459   0.2353   0.1221   0.6885    0.8377
2        Proposed    7.1768   5.2418   4.6156   4.8696   4.1705   14.0067
3        Analytical  0.3205   23.6810  4.5918   23.5347  12.2093  68.8504   0.8408
3        Proposed    4.3541   4.1345   4.3443   7.1206   8.6039   12.8861
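The pass-wise comparison summarized in Table 17 can be checked with a few lines of code. The following is a minimal sketch, not the authors' implementation: it assumes the standard sample Pearson correlation and uses the six pass-wise values per dataset exactly as printed in Table 17. The names `pearson_r` and `TABLE_17` are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Pass-wise uncertainty values copied from Table 17 (passes 1 to 6).
TABLE_17 = {
    1: {
        "analytical": [0.0690, 3.6454, 20.7431, 8.3766, 25.6393, 50.4826],
        "proposed": [6.3063, 3.3084, 4.2079, 4.6368, 8.4527, 13.5001],
    },
    2: {
        "analytical": [0.0032, 0.2368, 0.0459, 0.2353, 0.1221, 0.6885],
        "proposed": [7.1768, 5.2418, 4.6156, 4.8696, 4.1705, 14.0067],
    },
    3: {
        "analytical": [0.3205, 23.6810, 4.5918, 23.5347, 12.2093, 68.8504],
        "proposed": [4.3541, 4.1345, 4.3443, 7.1206, 8.6039, 12.8861],
    },
}

for dataset, rows in TABLE_17.items():
    r = pearson_r(rows["analytical"], rows["proposed"])
    print(f"Dataset {dataset}: r = {r:.4f}")
```

With only six passes per dataset, each coefficient rests on six paired values, so it is best read as an indication of agreement in trend rather than in magnitude.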
both of which suggest that the predictive uncertainty from the proposed method matches the analytical uncertainty estimates. In addition, the correlation analysis results indicate that the proposed method generates well-grounded uncertainty estimates during tool wear prediction. Therefore, the proposed method has been shown to output a sensible degree of predictive uncertainty during tool wear prediction and to be capable of outputting uncertainty continuously throughout the machining process, further overcoming the inherent limitations of the conventional approach.

4.6. Ablation study of the proposed method

Extensive experimental results presented above on three datasets show that the proposed method consistently outperforms existing data-driven, probabilistic, and SOTA methods in tool wear prediction. In addition, the proposed method has been shown to provide more reliable predictions via uncertainty quantification. To further examine the individual effects of the main components of the proposed method on tool wear prediction performance, an ablation study is conducted. In particular, the effects of the three main components of the proposed method, including the KF-based noise suppression method, VI-based Bayesian learning, and the density output structure, are systematically verified. Based on the three components, a total of eight (i.e., $2^3 = 8$) different model settings are tested on every dataset. The base model represents the EN-based predictive model with deterministic weights and a single output structure that uses non-filtered input signals, just like existing data-driven models (e.g., LSTM, GRU). Models (A), (B), and (C) each use one of the three main components, whereas (D), (E), and (F) use a pair of two components among the three, as indicated in Table 18, Table 19, and Table 20. For the compared models in the ablation study, MC estimation is used during inference for tool wear prediction when a Bayesian learning component is employed (i.e., for (B), (D), (F)).

The ablation study results on Dataset 1 show that while only using the EN-based predictive model (i.e., base model) itself shows decent prediction performance, using any of the proposed method’s main components brings performance improvement, as shown in Table 18. The results of (A), (B), and (C) show that using each component consistently improves tool wear prediction performance in all metrics (i.e., MAE, RMSE, MAPE, $R^2$), suggesting that noise suppression as well as incorporating each uncertainty type during machining is effective for accurate tool wear prediction. Comparing the results of (B) with (D), and (C) with (E) shows that it is indispensable to apply the proposed KF-based noise suppression for the mobile sensing data to achieve a high prediction performance. In addition, incorporating aleatoric and
Table 18
Prediction performance comparison using the ablation study of the proposed method’s components on Dataset 1.
                                Main component                             Metric
Dataset  Model                  KF  Bayesian learning  Density output     MAE (↓)           RMSE (↓)          MAPE (↓)          R² (↑)
1        base                   ✗   ✗                  ✗                  3.6935 (±0.2613)  5.0172 (±0.5847)  0.0376 (±0.0117)  0.9921 (±0.0037)
1        (A)                    ✓   ✗                  ✗                  3.2311 (±0.4111)  4.4383 (±0.4113)  0.0341 (±0.0049)  0.9930 (±0.0015)
1        (B)                    ✗   ✓                  ✗                  3.3728 (±0.1525)  4.7976 (±0.3945)  0.0354 (±0.0034)  0.9925 (±0.0040)
1        (C)                    ✗   ✗                  ✓                  3.5232 (±0.3678)  4.9226 (±0.2555)  0.0322 (±0.0037)  0.9931 (±0.0014)
1        (D)                    ✓   ✓                  ✗                  2.9396 (±0.4943)  4.2728 (±0.3809)  0.0333 (±0.0071)  0.9943 (±0.0004)
1        (E)                    ✓   ✗                  ✓                  2.6713 (±0.2642)  4.2035 (±0.3040)  0.0266 (±0.0038)  0.9945 (±0.0005)
1        (F)                    ✗   ✓                  ✓                  2.6401 (±0.3563)  4.3125 (±0.8141)  0.0253 (±0.0060)  0.9945 (±0.0026)
1        Proposed (full model)  ✓   ✓                  ✓                  2.5815 (±0.4483)  3.9914 (±0.9532)  0.0217 (±0.0056)  0.9951 (±0.0039)
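The eight model settings of the ablation study follow directly from switching each of the three main components on or off. The snippet below is an illustrative sketch of that enumeration, not code from the paper: the labels are taken from Tables 18 to 20, while the names `COMPONENTS`, `LABELS`, and `ablation_settings` are hypothetical. Flagging the full model for MC inference is an assumption, since the text explicitly names only (B), (D), and (F).

```python
from itertools import product

COMPONENTS = ("KF", "Bayesian learning", "Density output")

# Model labels as used in Tables 18-20, keyed by (KF, Bayesian, density) flags.
LABELS = {
    (False, False, False): "base",
    (True, False, False): "(A)",
    (False, True, False): "(B)",
    (False, False, True): "(C)",
    (True, True, False): "(D)",
    (True, False, True): "(E)",
    (False, True, True): "(F)",
    (True, True, True): "Proposed (full model)",
}


def ablation_settings():
    """Enumerate all 2^3 on/off combinations of the three main components."""
    settings = []
    for flags in product((False, True), repeat=len(COMPONENTS)):
        settings.append({
            "label": LABELS[flags],
            "components": [c for c, on in zip(COMPONENTS, flags) if on],
            # MC estimation applies whenever Bayesian learning is switched on
            # (assumed to include the full model as well).
            "mc_inference": flags[1],
        })
    # Order by number of active components, then by label, as in the tables.
    return sorted(settings, key=lambda s: (len(s["components"]), s["label"]))


for setting in ablation_settings():
    used = ", ".join(setting["components"]) or "none"
    print(f"{setting['label']:<22} components: {used}")
```

Generating the settings programmatically rather than listing them by hand makes it harder to skip a combination when the same study is repeated on each dataset.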
epistemic uncertainties has shown consistent improvement in tool wear prediction performance, as manifested in the results of (D), (E), and (F) in Table 18. The results of the performance comparison indicate that the main components of the proposed method have different effects on the tool wear prediction performance. Most importantly, using all the main components (i.e., full model) has shown superior tool wear prediction performance to the ablation models, demonstrating the effectiveness of the proposed method.

The ablation study results on Dataset 2 also corroborate the effectiveness of the proposed method’s main components on tool wear prediction performance, as shown in Table 19. Compared to the base model, the prediction performances of (A), (B), and (C), each of which employs one of the proposed method’s main components, exhibit performance improvement in all metrics. In addition, using two of the main components further enhances tool wear prediction performances in every case (see results of (D), (E), and (F)). However, on Dataset 2, the combination of Bayesian learning and the density output structure provides only a marginal performance improvement in MAE, compared to the other combinations. The proposed method also provides superior tool wear prediction performance to ablation models by a large margin in every regression metric. This indicates that the main proposal of this work, which is to consider noise effects and the different uncertainty types generated from mobile sensing during machining, is effective in enabling accurate tool wear prediction. The ablation study on Dataset 3 also shows promising results that confirm the effectiveness of the main components, showing similar prediction performance tendencies as Datasets 1 and 2, as provided in Table 20. The consistently increasing prediction performance obtained by using more components of the proposed method suggests not only that the components can be used orthogonally but also that other advanced techniques can help further improve the performance. Overall, the ablation study of the proposed method on every dataset provides consistent results, demonstrating that using each of the proposed method’s main components contributes to improved tool wear prediction.

5. Conclusion and future works

This work presents a novel tool wear prediction method that uses mobile sensing data collected with smartphones for efficient data-driven TCM in real-world manufacturing practices. Considering the high costs of installing and operating high-precision sensors to collect measurement data, the proposed method, which is compatible with mobile sensing data, can be particularly practical for SMEs. Furthermore, this work proposes to utilize various technically novel data-driven techniques to handle the high level of sensor noise and the induced uncertainty for accurate online TCM. First, to reduce the undesired effects of inherent noise in mobile sensing data, the KF-based noise suppression method is applied to the sensor measurements. Since KF is applicable regardless of the frequency domain, the proposed noise suppression can be used in diverse machining processes, such as turning and drilling. Second, the proposed predictive method is designed to holistically capture both aleatoric and epistemic uncertainties that stem from different sources during tool wear prediction. In particular, the epistemic uncertainty is captured using a VI-based Bayesian learning approach, while the aleatoric uncertainty is directly modeled by a density output structure. In this way, the proposed method effectively captures the two uncertainties using any DNN-based predictive model via data-driven learning under a probabilistic framework. Hence, these two components of predictive uncertainty from different sources can be directly incorporated and dissected during tool wear prediction. This uncertainty modeling method enables more accurate tool wear prediction during machining and provides reliable uncertainty estimates. In particular, the proposed method’s ability to reduce noise effects and to incorporate and quantify different sources of uncertainty can be beneficial for developing preventive maintenance strategies for SMEs with relatively low expenses. In addition, the proposed tool wear prediction method adopts an EN-based predictive model architecture, which can be easily trained with an SGD-based optimization in an end-to-end manner.

The efficacy of the proposed tool wear prediction method has been extensively validated using real-world datasets collected from turning experiments with multiple machining conditions. The proposed method shows accurate tool wear prediction results on every dataset, outperforming existing data-driven methods in all metrics. In addition, compared to the Bayesian DL-based probabilistic methods, the proposed method shows superior prediction performances and more reasonable predictive uncertainty estimates during machining processes. The proposed method also shows higher tool wear prediction performance than existing SOTA methods in the literature, further verifying its effectiveness. In particular, the proposed tool wear prediction method demonstrates the capability of capturing and providing predictive uncertainty dissected into aleatoric and epistemic uncertainties. This aspect of the generated predictive uncertainty leads to reasonable uncertainty estimates that align well with domain knowledge in machining processes and tool wear generation. The proposed method’s predictive uncertainty is validated using the conventional analytical approach, and the analysis results verify its ability to generate continuous and reliable uncertainty estimates throughout the machining process. Furthermore, a comprehensive ablation study is conducted to systematically evaluate the individual contributions of the proposed
Table 19
Prediction performance comparison using the ablation study of the proposed method’s components on Dataset 2.
                                Main component                             Metric
Dataset  Model                  KF  Bayesian learning  Density output     MAE (↓)           RMSE (↓)          MAPE (↓)          R² (↑)
2        base                   ✗   ✗                  ✗                  1.8279 (±0.5321)  2.4340 (±0.6915)  0.0405 (±0.0078)  0.9922 (±0.0040)
2        (A)                    ✓   ✗                  ✗                  1.7702 (±0.3226)  2.1593 (±0.4142)  0.0343 (±0.0085)  0.9935 (±0.0019)
2        (B)                    ✗   ✓                  ✗                  1.7625 (±0.1473)  2.1805 (±0.3867)  0.0355 (±0.0135)  0.9934 (±0.0007)
2        (C)                    ✗   ✗                  ✓                  1.7573 (±0.4974)  2.0077 (±0.3774)  0.0295 (±0.0053)  0.9940 (±0.0011)
2        (D)                    ✓   ✓                  ✗                  1.3926 (±0.1985)  1.7944 (±0.0760)  0.0278 (±0.0038)  0.9951 (±0.0019)
2        (E)                    ✓   ✗                  ✓                  1.4486 (±0.3851)  1.7602 (±0.3703)  0.0266 (±0.0039)  0.9957 (±0.0015)
2        (F)                    ✗   ✓                  ✓                  1.6747 (±0.5197)  1.7568 (±0.4572)  0.0255 (±0.0146)  0.9958 (±0.0038)
2        Proposed (full model)  ✓   ✓                  ✓                  1.2414 (±0.3960)  1.3949 (±0.6365)  0.0186 (±0.0059)  0.9971 (±0.0010)
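For the ablation variants that include Bayesian learning, the text states that MC estimation is used at inference time. A minimal sketch of how such MC samples might be turned into separate aleatoric and epistemic estimates is given below. It assumes the commonly used decomposition in the spirit of Kendall and Gal [46] (aleatoric as the mean of the predicted variances, epistemic as the variance of the predicted means); the paper's own definitions are Eq. (19) to Eq. (23) in Section 2 and may differ. The function `toy_stochastic_forward` is a hypothetical stand-in for one forward pass through a network with sampled weights and a density output.

```python
import random
from statistics import fmean, pvariance


def decompose_uncertainty(samples):
    """Split predictive uncertainty from MC samples.

    samples: list of (mean, variance) pairs, one per stochastic forward pass.
    Returns (predictive_mean, aleatoric, epistemic).
    """
    if len(samples) < 2:
        raise ValueError("need at least two MC samples")
    means = [m for m, _ in samples]
    variances = [v for _, v in samples]
    predictive_mean = fmean(means)
    aleatoric = fmean(variances)   # average predicted data-noise variance
    epistemic = pvariance(means)   # spread of the means across weight samples
    return predictive_mean, aleatoric, epistemic


def toy_stochastic_forward(x, rng):
    """Hypothetical stand-in for one pass with freshly sampled weights."""
    weight = rng.gauss(2.0, 0.1)   # sampled weight -> epistemic spread
    predicted_mean = weight * x
    predicted_variance = 0.25 + 0.01 * x   # density output -> aleatoric noise
    return predicted_mean, predicted_variance


rng = random.Random(0)
mc_samples = [toy_stochastic_forward(10.0, rng) for _ in range(200)]
mean, aleatoric, epistemic = decompose_uncertainty(mc_samples)
print(f"mean={mean:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

In this toy setup the aleatoric term depends only on the input, while the epistemic term shrinks as the sampled weights concentrate, which mirrors the qualitative distinction drawn in the text.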
Table 20
Prediction performance comparison using the ablation study of the proposed method’s components on Dataset 3.
                                Main component                             Metric
Dataset  Model                  KF  Bayesian learning  Density output     MAE (↓)           RMSE (↓)          MAPE (↓)          R² (↑)
3        base                   ✗   ✗                  ✗                  3.4376 (±0.3809)  4.3970 (±0.4757)  0.0341 (±0.0054)  0.9954 (±0.0004)
3        (A)                    ✓   ✗                  ✗                  3.1889 (±0.3654)  4.1654 (±0.3990)  0.0336 (±0.0036)  0.9957 (±0.0003)
3        (B)                    ✗   ✓                  ✗                  3.2970 (±0.5901)  4.1165 (±0.4603)  0.0320 (±0.0064)  0.9957 (±0.0020)
3        (C)                    ✗   ✗                  ✓                  3.1018 (±0.3145)  4.0839 (±0.7149)  0.0326 (±0.0070)  0.9959 (±0.0009)
3        (D)                    ✓   ✓                  ✗                  2.6094 (±0.5930)  2.6704 (±0.3839)  0.0248 (±0.0056)  0.9967 (±0.0011)
3        (E)                    ✓   ✗                  ✓                  2.2526 (±0.4606)  2.4110 (±0.7280)  0.0252 (±0.0093)  0.9963 (±0.0004)
3        (F)                    ✗   ✓                  ✓                  1.8526 (±0.4321)  2.1533 (±0.6674)  0.0196 (±0.0046)  0.9967 (±0.0020)
3        Proposed (full model)  ✓   ✓                  ✓                  1.2269 (±0.2189)  1.3881 (±0.3278)  0.0138 (±0.0044)  0.9982 (±0.0012)
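To put the ablation results in Tables 18 to 20 on a common scale, the gain of the full model over the base model can be expressed as a relative MAE reduction. The short sketch below does this arithmetic with the mean MAE values printed in the tables; the helper name `relative_reduction` is illustrative, and the reported ± spreads are ignored here.

```python
# Mean MAE values copied from Tables 18-20 (base model vs. full model).
MAE = {
    "Dataset 1": {"base": 3.6935, "full": 2.5815},
    "Dataset 2": {"base": 1.8279, "full": 1.2414},
    "Dataset 3": {"base": 3.4376, "full": 1.2269},
}


def relative_reduction(baseline, improved):
    """Fractional error reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


for name, row in MAE.items():
    pct = 100 * relative_reduction(row["base"], row["full"])
    print(f"{name}: MAE reduced by {pct:.1f}%")
```

On these values the reduction is roughly 30% on Dataset 1, 32% on Dataset 2, and 64% on Dataset 3, so the largest relative gain from combining all three components appears on Dataset 3.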
method on tool wear prediction. It has been shown that while the main components have different effects, using more components has consistently improved the tool wear prediction performance, confirming the proposed method’s efficacy.

One of the main advantages of the proposed tool wear prediction method is its potential usage in real-world manufacturing sites, where expensive sensor installation and measurement costs are prohibitive. In particular, the proposed tool wear prediction method can enable efficient digital transformation for SMEs [30] through intelligent machining process monitoring and control. Another potential application area is mobile machine tool robots. Recently, mobile machine tools and industrial robots equipped with edge technologies have been used to move and produce large workpieces in the automobile and aerospace industries [77]. In addition, with the increasing size of workpieces, mobile machining robots that directly manufacture larger parts have started replacing the traditional processes, including welding of smaller machined parts [78]. The proposed method can be effectively applied to mobile machine tool robots that require wireless sensing and communication technology to achieve higher productivity. Furthermore, reliable predictive uncertainty estimates, continually generated from the proposed method, can help make informed decisions in advanced manufacturing, such as cyber manufacturing systems and CPS. In particular, the proposed method’s ability to capture both aleatoric and epistemic uncertainties for prediction can help address noise and uncertainty stemming from the complexity of advanced cyber manufacturing systems to improve quality and process design. In addition, reliable prediction results can enable intelligent applications, such as tool replacement scheduling and preventive maintenance, enhancing productivity and resilience. While this work employs a smartphone for mobile sensing, other device types, such as IIoT devices and virtual sensors in cyber manufacturing systems, can also be applied to the proposed tool wear prediction method.

The proposed method, which handles noisy mobile sensing data and uncertainty modeling, can be applied to other machining processes, including subtractive and additive manufacturing [16,79,80], where problems similar to those this work tackles exist, such as the high costs of high-precision sensors and the need for uncertainty awareness. In particular, validation of the proposed method’s predictive capabilities on other machining environments, such as using coated cutting tools and lubricants under different machining conditions [66], in which tool wear generation typically shows slower trends before the EOL of tools, remains future work. The proposed method and the use of mobile sensing technology can be extended to cyber manufacturing systems, as in the framework shown in Fig. 14. In general, data acquired from various manufacturing processes can be transferred to the server or the communication network to develop smart manufacturing systems. In this framework, using mobile sensors can lead to cost savings and scalability in various locations. By replacing the conventional high-precision sensors in the sensor network with smartphone
Fig. 14. A cyber manufacturing system framework using mobile sensing and the proposed method.
sensors, the framework for manufacturing data analysis and prediction enables wireless sensing, thereby increasing the degree of freedom in cyber manufacturing systems. Additionally, the proposed uncertainty-aware method will enhance the versatility and productivity of the single and mass production units not only in the manufacturing process but also across various industrial applications. Moreover, the proposed uncertainty-aware method can improve the reliability of data-driven approaches, resulting in quality control and improvement via preventive maintenance. Future research directions include the application of the proposed uncertainty-aware tool wear prediction method in advanced manufacturing systems, including digital twin, cyber manufacturing [2], and large-scale maintenance strategies [81], for efficient design and quality improvement [5]. In addition, using the proposed method to employ virtual sensors in cyber-enabled manufacturing [82] that performs online tool wear prediction remains another future research objective. Integration with existing methods for uncertainty quantification in out-of-domain detection tasks and adversarial attacks, including the Dirichlet-based uncertainty-aware methods, for improved generalization in tool wear prediction remains one future research direction.

CRediT authorship contribution statement

Gyeongho Kim: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Sang Min Yang: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Dong Min Kim: Writing – review & editing, Methodology. Jae Gyeong Choi: Writing – review & editing, Methodology. Sunghoon Lim: Writing – review & editing, Writing – original draft, Validation, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization. Hyung Wook Park: Writing – review & editing, Validation, Supervision, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to express appreciation to the editors and referees for their helpful comments to improve the quality of this work. This work was supported by the National Research Foundation of Korea (NRF), South Korea grant funded by the Korea government (MSIT) [grant number RS-2024-00335260]; the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [grant number RS-2020-II201336]; the Advanced Technology Center Plus (ATC+) Program grant funded by the Ministry of Trade, Industry and Energy (MOTIE) of Korea [grant number 20017932]; the National Research Foundation of Korea (NRF), South Korea grant funded by the Korea government (MSIT) [grant number 2022R1A2C3007963]; and AICP (AI Challengers Program) of UNIST. The authors thank Hui Chan Moon, Sujin Jeon, Soyeon Park at UNIST, and Minjoo Ku at LG Electronics for their helpful discussions.

References

[1] Wu D, Liu S, Zhang L, Terpenny J, Gao RX, Kurfess T, et al. A fog computing-based framework for process monitoring and prognosis in cyber-manufacturing. J Manuf Syst 2017;43:25–34. http://dx.doi.org/10.1016/j.jmsy.2017.02.011.
[2] Chen X, Jin R. Adapipe: A recommender system for adaptive computation pipelines in cyber-manufacturing computation services. IEEE Trans Ind Inf 2020;17:6221–9. http://dx.doi.org/10.1109/TII.2020.3035524.
[3] Hansen EB, Bøgh S. Artificial intelligence and internet of things in small and medium-sized enterprises: A survey. J Manuf Syst 2021;58:362–72. http://dx.doi.org/10.1016/j.jmsy.2020.08.009.
[4] Chen X, Kang X, Jin R, Deng X. Bayesian sparse regression for mixed multi-responses with application to runtime metrics prediction in fog manufacturing. Technometrics 2023;65:206–19. http://dx.doi.org/10.1080/00401706.2022.2134928.
[5] Kim G, Choi JG, Lim S. Using transformer and a reweighting technique to develop a remaining useful life estimation method for turbofan engines. Eng Appl Artif Intell 2024;133:108475. http://dx.doi.org/10.1016/j.engappai.2024.108475.
[6] Chen X, Zeng Y, Kang S, Jin R. INN: An interpretable neural network for AI incubation in manufacturing. ACM Trans Intell Syst Technol 2022;13:1–23. http://dx.doi.org/10.1145/3519313.
[7] Ferreira C, Gonçalves G. Remaining useful life prediction and challenges: A literature review on the use of machine learning methods. J Manuf Syst 2022;63:550–62. http://dx.doi.org/10.1016/j.jmsy.2022.05.010.
[8] Grasso M, Chatterton ST, Pennacchi P, Colosimo BM. A data-driven method to enhance vibration signal decomposition for rolling bearing fault analysis. Mech Syst Signal Process 2016;81:126–47. http://dx.doi.org/10.1016/j.ymssp.2016.02.067.
[9] Zhu K, Li X, Li S, Lin X. Physics-informed hidden Markov model for tool wear monitoring. J Manuf Syst 2024;72:308–22. http://dx.doi.org/10.1016/j.jmsy.2023.11.003.
[10] Zhu K, Liu T. Online tool wear monitoring via hidden semi-Markov model with dependent durations. IEEE Trans Ind Inf 2017;14:69–78. http://dx.doi.org/10.1109/TII.2017.2723943.
[11] Wang P, Gao RX. Adaptive resampling-based particle filtering for tool life prediction. J Manuf Syst 2015;37:528–34. http://dx.doi.org/10.1016/j.jmsy.2015.04.006.
[12] Wang J, Wang P, Gao RX. Enhanced particle filter for tool wear prediction. J Manuf Syst 2015;36:35–45. http://dx.doi.org/10.1016/j.jmsy.2015.03.005.
[13] Wang Y, Zheng L, Wang Y. Event-driven tool condition monitoring methodology considering tool life prediction based on industrial internet. J Manuf Syst 2021;58:205–22. http://dx.doi.org/10.1016/j.jmsy.2020.11.019.
[14] Kim G, Yang SM, Kim DM, Kim S, Choi JG, Ku M, et al. Bayesian-based uncertainty-aware tool-wear prediction model in end-milling process of titanium alloy. Appl Soft Comput 2023;148:110922. http://dx.doi.org/10.1016/j.asoc.2023.110922.
[15] Zhang P, Gao D, Hong D, Lu Y, Wang Z, Liao Z. Intelligent tool wear monitoring based on multi-channel hybrid information and deep transfer learning. J Manuf Syst 2023;69:31–47. http://dx.doi.org/10.1016/j.jmsy.2023.06.004.
[16] Kim G, Park S, Choi JG, Yang SM, Park HW, Lim S. Developing a data-driven system for grinding process parameter optimization using machine learning and metaheuristic algorithms. CIRP J Manuf Sci Technol 2024;51:20–35. http://dx.doi.org/10.1016/j.cirpj.2024.04.001.
[17] Yangue E, Ye Z, Kan C, Liu C. Integrated deep learning-based online layer-wise surface prediction of additive manufacturing. Manuf Lett 2023;35:760–9. http://dx.doi.org/10.1016/j.mfglet.2023.08.108.
[18] Liu C, Wang RR, Ho I, Kong ZJ, Williams C, Babu S, et al. Toward online layer-wise surface morphology measurement in additive manufacturing using a deep learning-based approach. J Intell Manuf 2023;34:2673–89. http://dx.doi.org/10.1007/s10845-022-01933-0.
[19] Wang J, Yan J, Li C, Gao RX, Zhao R. Deep heterogeneous GRU model for predictive analytics in smart manufacturing: Application to tool wear prediction. Comput Ind 2019;111:1–14. http://dx.doi.org/10.1016/j.compind.2019.06.001.
[20] Sun H, Zhang J, Mo R, Zhang X. In-process tool condition forecasting based on a deep learning method. Robot Comput-Integr Manuf 2020;64:101924. http://dx.doi.org/10.1016/j.rcim.2019.101924.
[21] Shi C, Luo B, He S, Li K, Liu H, Li B. Tool wear prediction via multidimensional stacked sparse autoencoders with feature fusion. IEEE Trans Ind Inf 2019;16:5150–9. http://dx.doi.org/10.1109/TII.2019.2949355.
[22] Qin Y, Liu X, Yue C, Zhao M, Wei X, Wang L. Tool wear identification and prediction method based on stack sparse self-coding network. J Manuf Syst 2023;68:72–84. http://dx.doi.org/10.1016/j.jmsy.2023.02.006.
[23] Hahn TV, Mechefske CK. Self-supervised learning for tool wear monitoring with a disentangled-variational-autoencoder. Int J Hydromechatron 2021;4:69–98. http://dx.doi.org/10.1504/IJHM.2021.114174.
[24] Wang Y, Qin B, Liu K, Shen M, Niu M, Han L. A new multitask learning method for tool wear condition and part surface quality prediction. IEEE Trans Ind Inf 2020;17:6023–33. http://dx.doi.org/10.1109/TII.2020.3040285.
[25] Liu C, Li Y, Li J, Hua J. A meta-invariant feature space method for accurate tool wear prediction under cross conditions. IEEE Trans Ind Inf 2021;18:922–31. http://dx.doi.org/10.1109/TII.2021.3070109.
[26] Kim G, Yang SM, Kim S, Kim DY, Choi JG, Park HW, et al. A multi-domain mixture density network for tool wear prediction under multiple machining conditions. Int J Prod Res 2023. http://dx.doi.org/10.1080/00207543.2023.2289076.
[27] Yan B, Zhu L, Dun Y. Tool wear monitoring of TC4 titanium alloy milling process based on multi-channel signal and time-dependent properties by using deep learning. J Manuf Syst 2021;61:495–508. http://dx.doi.org/10.1016/j.jmsy.2021.09.017.
[28] Li Y, Wang J, Huang Z, Gao RX. Physics-informed meta learning for machining tool wear prediction. J Manuf Syst 2022;62:17–27. http://dx.doi.org/10.1016/j.jmsy.2021.10.013.
[29] Han S, Mannan N, Stein DC, Pattipati KR, Bollas GM. Classification and regression models of audio and vibration signals for machine state monitoring in precision machining systems. J Manuf Syst 2021;61:45–53. http://dx.doi.org/10.1016/j.jmsy.2021.08.004.
[30] Omri N, Al Masry Z, Mairot N, Giampiccolo S, Zerhouni N. Industrial data management strategy towards an SME-oriented PHM. J Manuf Syst 2020;56:23–36. http://dx.doi.org/10.1016/j.jmsy.2020.04.002.
[31] Sun M, Guo K, Zhang D, Yang B, Sun J, Li D, et al. A novel exponential model for tool remaining useful life prediction. J Manuf Syst 2024;73:223–40. http://dx.doi.org/10.1016/j.jmsy.2024.01.009.
[35] Lin Y, Wang X, Hao F, Wang L, Zhang L, Zhao R. An on-demand coverage based self-deployment algorithm for big data perception in mobile sensing networks. Futur Gener Comput Syst 2018;82:220–34. http://dx.doi.org/10.1016/j.future.2018.01.007.
[36] Alam F, Elsherif M, AlQattan B, Ali M, Ahmed IM, Salih A, et al. Prospects for additive manufacturing in contact lens devices. Adv Energy Mater 2021;23:2000941. http://dx.doi.org/10.1002/adem.202000941.
[37] You I, Choo KK, Ho CL. A smartphone-based wearable sensors for monitoring real-time physiological data. Comput Electr Eng 2018;65:376–92. http://dx.doi.org/10.1016/j.compeleceng.2017.06.031.
[38] Mastakouris A, Andriosopoulou G, Masouros D, Benardos P, Vosniakos GC, Soudris D. Human worker activity recognition in a production floor environment through deep learning. J Manuf Syst 2023;71:115–30. http://dx.doi.org/10.1016/j.jmsy.2023.08.020.
[39] Mertes J, Lindenschmitt D, Amirrezai M, Tashakor N, Glatt M, Schellenberger C, et al. Evaluation of 5G-capable framework for highly mobile, scalable human-machine interfaces in cyber–physical production systems. J Manuf Syst 2022;64:578–93. http://dx.doi.org/10.1016/j.jmsy.2022.08.009.
[40] Sjöberg A, Önnheim M, Frost O, Cronrath C, Gustavsson E, Lennartson B, et al. Online geometry assurance in individualized production by feedback control and model calibration of digital twins. J Manuf Syst 2023;66:71–81. http://dx.doi.org/10.1016/j.jmsy.2022.11.011.
[41] Wegener K, Bleicher F, Heisel U, Hoffmeister HW, Moehring HC. Noise and vibrations in machine tools. CIRP Ann 2021;70:611–33. http://dx.doi.org/10.1016/j.cirp.2021.05.010.
[42] Guo K, Sun J. Sound singularity analysis for milling tool condition monitoring towards sustainable manufacturing. Mech Syst Signal Process 2021;157:107738. http://dx.doi.org/10.1016/j.ymssp.2021.107738.
[43] Peng CC, Chen TY. A recursive low-pass filtering method for a commercial cooling fan tray parameter online estimation with measurement noise. Meas 2022;205:112193. http://dx.doi.org/10.1016/j.measurement.2022.112193.
[44] Polotski V, Kenne JP, Gharbi A. Kalman filter based production control of a failure-prone single-machine single-product manufacturing system with imprecise demand and inventory information. J Manuf Syst 2020;56:558–72. http://dx.doi.org/10.1016/j.jmsy.2020.07.010.
[45] Zhong Y, Wang Z, Yalamanchili AV, Yadav A, Srivatsa BR, Saripalli S, et al. Image-based flight control of unmanned aerial vehicles (UAVs) for material handling in custom manufacturing. J Manuf Syst 2020;56:615–21. http://dx.doi.org/10.1016/j.jmsy.2020.04.004.
[46] Kendall A, Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? In: Proceedings of the 31st int conf neural inf process syst; 2017; Dec 4-9. Long Beach, CA, USA; http://dx.doi.org/10.5555/3295222.3295309.
[47] Sequera A, Guo YB. Uncertainty analysis of tool wear and surface roughness in end milling. In: Proceedings of the 8th int manuf sci eng conf; 2013 Jun 10-14. Madison, Wisconsin, USA; 2013, http://dx.doi.org/10.1115/MSEC2013-1245.
[48] Panda A, Sahoo AK, Kumar R, Das D. A concise review of uncertainty analysis in metal machining. Mater Today Proc 2020;26:1734–9. http://dx.doi.org/10.1016/j.matpr.2020.02.365.
[49] Song G, Zhang J, Ge Y, Zhu K, Fu Z, Yu L. Tool wear predicting based on weighted multi-kernel relevance vector machine and probabilistic kernel principal component analysis. Int J Adv Manuf Technol 2022;122:2625–43. http://dx.doi.org/10.1007/s00170-022-09762-4.
[50] Li X, Liu X, Yue C, Wang L, Liang SY. Data-model linkage prediction of tool remaining useful life based on deep feature fusion and Wiener process. J Manuf Syst 2024;73:19–38. http://dx.doi.org/10.1016/j.jmsy.2024.01.008.
[51] Huang Z, Shao J, Guo W, Li W, Zhu J, He Q, et al. Tool wear prediction based on multi-information fusion and genetic algorithm-optimized Gaussian process regression in milling. IEEE Trans Instrum Meas 2023;72:2516716. http://dx.doi.org/10.1109/TIM.2023.3280531.
[52] Dey A, Yodo N, Yadav OP, Shanmugam R, Ramoni M. Addressing uncertainty in tool wear prediction with dropout-based neural network. Comput 2023;12:187. http://dx.doi.org/10.3390/computers12090187.
[53] Adamson G, Wang L, Moore P. Feature-based control and information framework for adaptive and distributed manufacturing in cyber physical systems. J Manuf Syst 2017;43:305–15. http://dx.doi.org/10.1016/j.jmsy.2016.12.003.
[54] Xi X, Chen M, Zhao W. Improving electrical discharging machining efficiency by using a Kalman filter for estimating gap voltages. Precis Eng 2017;47:182–90. http://dx.doi.org/10.1016/j.precisioneng.2016.08.003.
[32] Kim G, Choi JG, Ku M, Lim S. Developing a semi-supervised learning and ordinal
classification framework for quality level prediction in manufacturing. Comput [55] Niaki F, Ulutan D, Mears L. In-process tool flank wear estimation in ma-
Ind Eng 2023;181:109286. http://dx.doi.org/10.1016/j.cie.2023.109286. chining gamma-prime strengthened alloys using Kalman filter. Proc Manuf
[33] Tao F, Zhang Y, Cheng Y, Ren J, Wang D, Qi Q, et al. Digital twin and blockchain 2015;1:696–707. http://dx.doi.org/10.1016/j.promfg.2015.09.018.
enhanced smart manufacturing service collaboration and management. J Manuf [56] Wang J, Zheng Y, Wang P, Gao RX. A virtual sensing based augmented particle
Syst 2022;62:903–14. http://dx.doi.org/10.1016/j.jmsy.2020.11.008. filter for tool condition prognosis. J Manuf Process 2017;28:472–8. http://dx.
[34] Gupta N, Gupta S, Khosravy M, Dey N, Joshi N, Crespo RG, et al. Economic doi.org/10.1016/j.jmapro.2017.04.014.
IoT strategy: The future technology for health monitoring and diagnostic of [57] Deng Y, Shichang D, Shiyao J, Chen Z, Zhiyuan X. Prognostic study of ball
agriculture vehicles. J Intell Manuf 2021;32:1117–28. http://dx.doi.org/10.1007/ screws by ensemble data-driven particle filters. J Manuf Syst 2020;56:359–72.
s10845-020-01610-0. http://dx.doi.org/10.1016/j.jmsy.2020.06.009.
[58] Maddox WJ, Garipov T, Izmailov P, Vetrov D, Wilson AG. A simple baseline for Bayesian uncertainty in deep learning. In: Proceedings of the 33rd int conf neural inf process syst; 2019 Dec 8-14. Vancouver, CAN; http://dx.doi.org/10.5555/3454287.3455466.
[59] Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd int conf mach learn; 2016 Jun 19-24. New York, NY, USA; http://dx.doi.org/10.5555/3045390.3045502.
[60] Choi S, Lee K, Lim S, Oh S. Uncertainty-aware learning from demonstration using mixture density networks with sampling-free variance modeling. In: Proceedings of the 35th IEEE int conf robot autom; 2018 May 21-6. Brisbane, AUS; http://dx.doi.org/10.1109/ICRA.2018.8462978.
[61] Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th int conf mach learn; 2019 Jun 9-15. Long Beach, CA, USA; http://dx.doi.org/10.48550/arXiv.1905.11946.
[62] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the 29th IEEE/CVF conf comput vision pattern recognit; 2018 Jun 18-22. Salt Lake City, Utah, USA; http://dx.doi.org/10.1109/CVPR.2018.00474.
[63] Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 2019;42:2011–23. http://dx.doi.org/10.1109/TPAMI.2019.2913372.
[64] Taylor JR, Thompson W. An introduction to error analysis: The study of uncertainties in physical measurements. Berlin: Springer; 1982.
[65] Bevington PR, Robinson DK, Blair JM, Mallinckrodt AJ, McKay S. Data reduction and error analysis for the physical sciences. Comput Phys 1993;7:415–6. http://dx.doi.org/10.1063/1.4823194.
[66] Paiva JM, Shalaby MAbdulMonim, Chowdhury M, Shuster L, Chertovskikh S, Covelli D, et al. Tribological and wear performance of carbide tools with TiB2 PVD coating under varying machining conditions of TiAl6V4 aerospace alloy. Coat 2017;7:187. http://dx.doi.org/10.3390/coatings7110187.
[67] Kim G, Shin DH, Choi JG, Lim S. A deep learning-based cryptocurrency price prediction model that uses on-chain data. IEEE Access 2022;10:56232–48. http://dx.doi.org/10.1109/ACCESS.2022.3177888.
[68] Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13:281–305. http://dx.doi.org/10.5555/2188385.2188395.
[69] Christ M, Braun N, Neuffer J, Kempa-Liehr AW. Time series feature extraction on basis of scalable hypothesis tests (tsfresh – A Python package). Neurocomputing 2018;307:72–7. http://dx.doi.org/10.1016/j.neucom.2018.03.067.
[70] Fernandes TE, Ferreira MA, Miranda GP, Dutra AF, Antunes MP, et al. Classification of lathe’s cutting tool wear based on an autonomous machine learning model. J Control Autom Electr Syst 2022;33:167–82. http://dx.doi.org/10.1007/s40313-021-00819-5.
[71] Li D, Chen J, Huang R, Chen Z, Li W. Sensor-aware CapsNet: Towards trustworthy multisensory fusion for remaining useful life prediction. J Manuf Syst 2024;72:26–37. http://dx.doi.org/10.1016/j.jmsy.2023.11.009.
[72] Lakshminarayanan B, Pritzel A, Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. In: Proceedings of the 31st int conf neural inf process syst; 2017 Dec 4-9. Long Beach, CA, USA; http://dx.doi.org/10.5555/3295222.3295387.
[73] Mandt S, Hoffman MD, Blei DM. Stochastic gradient descent as approximate Bayesian inference. J Mach Learn Res 2017;18:4873–907. http://dx.doi.org/10.5555/3122009.3208015.
[74] Oliaei SN, Karpat Y. Influence of tool wear on machining forces and tool deflections during micro milling. Int J Adv Manuf Technol 2016;84:1963–80. http://dx.doi.org/10.1007/s00170-015-7744-4.
[75] Khan MA, Jaffery SH, Baqai AA, Khan M. Comparative analysis of tool wear progression of dry and cryogenic turning of titanium alloy Ti-6Al-4V under low, moderate and high tool wear conditions. Int J Adv Manuf Technol 2022;121:1269–87. http://dx.doi.org/10.1007/s00170-022-09196-y.
[76] Hou J, Zhou W, Duan H, Yang G, Xu H, Zhao N. Influence of cutting speed on cutting force, flank temperature, and tool wear in end milling of Ti-6Al-4V alloy. Int J Adv Manuf Technol 2014;70:1835–45. http://dx.doi.org/10.1007/s00170-013-5433-8.
[77] Law M, Rentzsch H, Ihlenfeldt S. Predicting mobile machine tool dynamics by experimental dynamic substructuring. Int J Mach Tools Manuf 2016;108:127–34. http://dx.doi.org/10.1016/j.ijmachtools.2016.06.006.
[78] Liu C, Tian W, Kan C. When AI meets additive manufacturing: Challenges and emerging opportunities for human-centered products development. J Manuf Syst 2022;64:648–56. http://dx.doi.org/10.1016/j.jmsy.2022.04.010.
[79] Grasso M, Colosimo BM. A statistical learning method for image-based monitoring of the plume signature in laser powder bed fusion. Robot Comput-Integr Manuf 2019;57:103–15. http://dx.doi.org/10.1016/j.rcim.2018.11.007.
[80] Liu C, Law AC, Roberson D, Kong ZJ. Image analysis-based closed loop quality control for additive manufacturing with fused filament fabrication. J Manuf Syst 2019;51:75–86. http://dx.doi.org/10.1016/j.jmsy.2019.04.002.
[81] Xia T, Shi G, Si G, Du S, Xi L. Energy-oriented joint optimization of machine maintenance and tool replacement in sustainable manufacturing. J Manuf Syst 2021;59:261–71. http://dx.doi.org/10.1016/j.jmsy.2021.01.015.
[82] Li Y, Wang L, Chen X, Jin R. Distributed data filtering and modeling for fog and networked manufacturing. IISE Trans 2024;56:485–96. http://dx.doi.org/10.1080/24725854.2023.2184884.