Advanced Science - 2023 - Su

RESEARCH ARTICLE
www.advancedscience.com
Battery Charge Curve Prediction via Feature Extraction and Supervised Machine Learning
Laisuo Su, Shuyan Zhang, Alan J. H. McGaughey, B. Reeja-Jayan,
and Arumugam Manthiram*
Real-time onboard state monitoring and estimation of a battery over its lifetime is indispensable
for the safe and durable operation of battery-powered devices. In this study, a methodology to
predict the entire constant-current cycling curve with limited input information that can be
collected in a short period of time is developed. A total of 10 066 charge curves of LiNiO2-based
batteries at a constant C-rate are collected. With the combination of a feature extraction step
and a multiple linear regression step, the method can accurately predict an entire battery charge
curve with an error of < 2% using only 10% of the charge curve as the input information. The
method is further validated across other battery chemistries (LiCoO2-based) using open-access
datasets. The prediction error of the charge curves for the LiCoO2-based battery is around 2%
with only 5% of the charge curve as the input information, indicating the generalization of the
developed methodology for predicting battery cycling curves. The developed method paves the way
for fast onboard health status monitoring and estimation for batteries during practical
applications.

L. Su, A. Manthiram
Materials Science and Engineering Program & Texas Materials Institute
The University of Texas at Austin
Austin, TX 78712-1591, USA
E-mail: manth@austin.utexas.edu
S. Zhang, A. J. H. McGaughey, B. Reeja-Jayan
Department of Mechanical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213, USA
The ORCID identification number(s) for the author(s) of this article
can be found under https://doi.org/10.1002/advs.202301737
© 2023 The Authors. Advanced Science published by Wiley-VCH GmbH.
This is an open access article under the terms of the Creative Commons
Attribution License, which permits use, distribution and reproduction in
any medium, provided the original work is properly cited.
DOI: 10.1002/advs.202301737

1. Introduction

Lithium-ion batteries (LIBs) are becoming the dominant rechargeable batteries and are widely used
in portable electronic devices and electric vehicles (EVs).[1–3] Hundreds or even thousands of
LIBs are connected to provide sufficient energy for EVs. For example, the Standard-range version
of the Tesla Model 3 carries 2,976 LIBs arranged in 96 groups of 31 cells and the Long-range
version contains 4,416 LIBs arranged in 96 groups of 46 cells.[4] The failure of one battery could
propagate quickly through the entire battery pack, which triggers the malfunction of the battery
system and may lead to safety issues like smoke, fire, and explosion.[5] Therefore, the states,
such as state of charge (SOC) and remaining energy, and statuses, such as health condition, of
batteries need to be accurately monitored to ensure their reliable and safe use.
A battery management system (BMS) is generally adopted to monitor the state of batteries, record
battery usage information, analyze the status of batteries, and provide feedback and suggestions
to customers.[6] The BMS can directly measure some key information with sensors, such as voltage,
current, and temperature.[7] The combination of this information can further estimate the state
of each battery, including SOC, remaining energy, and health conditions. Accurately estimating
the health conditions of LIBs is very important, but challenging, to guide the use of batteries
and at the same time prevent accidents and malfunctions.[8] The health condition of a battery is
generally reflected by a decreased maximum capacity, growth of internal resistance, and
appearance of fatal aging mechanisms, such as formation of lithium dendrite.[9] The assessment of
these parameters is not trivial as BMSs typically only sample charging/discharging current and
voltage of batteries at a SOC range that is defined by the usage habits of customers.[10]
Many efforts have been made to estimate the health state of batteries in real applications. One
common method is based on models, such as equivalent circuit models[11] and mechanism-based
models,[12] to simulate the behaviors of batteries, followed by various optimization algorithms
and observations to identify the parameters in the models and the health states.[13] The
estimation capability of battery health states relies on the accuracy of the models and the
optimization algorithms. Therefore, building a representative model is crucial. Data-driven
methods are gaining increasing attention for battery health estimation and prediction due to
their flexibility.[14] The data-driven methods have been demonstrated to predict the state of
health (SOH) and remaining useful life of LIBs using impedance spectroscopy[15] and to predict
the cycle life using information from early cycles.[14]
Various useful battery characteristic information can be derived from the charge curve, such as
maximum capacity that can be used to calculate SOH, available battery capacity that can be
Adv. Sci. 2023, 10, 2301737 2301737 (1 of 10) © 2023 The Authors. Advanced Science published by Wiley-VCH GmbH
used to estimate SOC, and other energy-related states. For example, Feng et al. proposed a
support vector machine-based algorithm that can use a partial charging segment (15 min charging)
to predict the battery SOH with less than 2% error for 80% of all the cases.[16] Zheng et al.
proposed an online capacity estimation method based on a partial charge curve that can be
utilized for battery lifetime prediction.[17] Duan et al. developed a convolutional neural
network to accurately predict the battery impedance spectroscopy based on limited
constant-current charging information (500 mV voltage window).[18] Recently, Tian et al. further
applied a deep neural network to predict the complete charge curve of a battery based on a small
portion of the curve, which can be further used to estimate the SOC and SOH of the battery.[10]
Therefore, the charge curve is important for understanding the status of a battery.
The charge curve of a battery depends on the chemistry of battery electrodes, the charging
current, and the health status of the battery. As the first two parameters are known and
measurable in real applications, quantifying the aging mechanisms, i.e., health status, of the
battery is crucial for accurately predicting the charge curve. In this study, we compare the
ability of human experts and machine learning algorithms to quantify the aging mechanisms of
batteries. Unsupervised learning algorithms perform better in capturing features necessary to
quantify the battery aging mechanisms, which agrees well with existing studies that show the
advantages of machine learning algorithms for feature extraction and selection.[19,20] These
features were further used as the inputs for predicting the complete charge curves. Different
from the previous study that can only take one continuous charge curve segment as the input,[10]
our developed methodology can also take multiple separated segments as the input, which will
largely increase the practical applicability of the method. Finally, we demonstrate the
applicability of the developed methodology in predicting open-source battery charge curves with
different battery chemistries.

2.2. Feature Extraction

Charge curves were selected for this study because the charging protocols are more controllable
than discharge protocols to provide more consistent input in real-world applications. Three
commonly used unsupervised learning algorithms were applied to extract features from the charge
curves, which are principal component analysis (PCA), non-negative matrix factorization (NMF),
and Autoencoder (AE). PCA and NMF are techniques that can decompose a matrix Q into two separated
matrices W and H, such that it can be written as Equation (1),

$$Q_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{p} W_{ia} H_{a\mu} \qquad (1)$$

where Q is an n × m matrix that contains all the raw data information. W and H have dimensions of
n × p and p × m. The hyperparameter p is the number of features. The p columns of W can be
interpreted as the features of charge curves, which will be used to predict the complete charge
curves by a supervised learning model. p is chosen based on the prediction accuracy of the
validation set for all feature extraction algorithms. Each column of H contains the weights in a
one-to-one correspondence with a basis feature in W.[25]
To obtain the elements of W and H, an optimization problem with the objective function
$\lVert Q - WH \rVert_F$ was solved, where $\lVert \cdot \rVert_F$ is the Frobenius norm. The
difference between PCA and NMF lay in the constraints on the optimization. In PCA, the columns of
W were orthonormal, and the rows of H were orthogonal, such that a unique solution was
guaranteed.[26] In NMF, the elements of Q, W, and H were constrained to be non-negative. There is
no unique solution because the problem is non-convex.[27] As such, we employed an initialization
scheme called non-negative double singular value decomposition, which rapidly reduces the
approximation error to a value that is lower than that using a random initialization.[28] PCA and
NMF are performed using the Scikit-Learn package.[29]
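As a minimal illustration of Equation (1), the sketch below factorizes a small synthetic matrix Q (n voltage points × m curves) into W and H with a truncated singular value decomposition in NumPy. It is a stand-in under stated assumptions, not the pipeline of this study, which uses the Scikit-Learn PCA and NMF implementations: no mean-centering is applied, and the synthetic curves, the function names `decompose` and `frobenius_error`, and all sizes are hypothetical.

```python
import numpy as np

def decompose(Q, p):
    """Rank-p factorization Q ≈ W @ H via truncated SVD (uncentered, PCA-like)."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    W = U[:, :p]                    # n x p, orthonormal columns (feature curves)
    H = s[:p, None] * Vt[:p, :]     # p x m, mutually orthogonal rows (weights)
    return W, H

def frobenius_error(Q, p):
    """Frobenius norm of the residual Q - W @ H for p features."""
    W, H = decompose(Q, p)
    return np.linalg.norm(Q - W @ H)   # Frobenius norm is the default for 2-D input

# Hypothetical data: m synthetic "charge curves" sampled at n voltage points.
rng = np.random.default_rng(0)
n, m = 101, 200
v = np.linspace(0.0, 1.0, n)                    # normalized voltage axis
capacity = rng.uniform(160.0, 220.0, size=m)    # per-curve capacity scale
shift = rng.uniform(-0.05, 0.05, size=m)        # per-curve polarization-like shift
Q = capacity * np.clip(v[:, None] - shift, 0.0, None) ** 0.7   # shape (n, m)

# The residual cannot grow when more features are kept.
errors = [frobenius_error(Q, p) for p in (1, 2, 5, 10)]
```

With this construction the columns of W are orthonormal and the rows of H are orthogonal, which mirrors the PCA constraints described above; an NMF variant would instead require non-negative W and H and an iterative solver.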
2. Experimental Section

2.1. Data Generation

Two datasets were used in this study, including a lab-generated dataset and an open-access
dataset. The lab-generated dataset was collected in CR-2032 type coin cells with LiNiO2 as the
cathode and Li metal as the anode.[21–23] The coin cells were tested at a C/10 rate three times
after assembling, followed by cycling at room temperature with a C/2 charge rate and 1C discharge
rate. The C/2 charge curves of these cells were used for this study, and a total of 10066 charge
curves were selected with a minimum charge capacity of 160 mA h g−1.
The open-access dataset was also used to evaluate the developed methodology. A total number of
4522 charge curves were taken from the Center for Advanced Life Cycling Engineering (CALCE)
dataset (CS2_3, CS2_8, CS2_9, CS2_21, CS2_33∼CS2_38) provided by the A. James Clark School of
Engineering at the University of Maryland.[24] The CALCE dataset was obtained from batteries with
LiCoO2 as the cathode material with trace elements of manganese, which is different from the
LiNiO2 cathode tested in our lab.

Autoencoder is an unsupervised learning method that adopts neural network architectures for the
task of feature learning.[30] The neural network is constructed with a bottleneck layer that
enforces a compressed representation of the input layer, which has the same dimension as the
output layer. The autoencoder with a single hidden layer implemented in this work is shown in
Figure S1 (Supporting Information). This autoencoder is similar to NMF in that the hidden layer
contains the weight matrix (H) corresponding to the charge curves in the decoder weight matrix
(W),[31] and the product of the two matrices approximates the input charge curve matrix. The
autoencoder differs from NMF in that there is no non-negativity constraint on the decoder weight
matrix. It can also be easily extended by adding more fully connected layers or other layer
types, such as recurrent neural networks and convolutional neural networks.[31] The autoencoders
are implemented using the PyTorch package.[32]

2.3. Charge Curve Prediction

Figure 1 shows the workflow of this study. The 10066 charge curves of the LiNiO2 cells were
randomly divided into a training
Figure 1. The workflow for predicting an entire charge curve of a battery based on a portion of the charge curve. Both a continuous segment and multiple
separated segments can be used as the input. The output charge curve can derive many key states (SOC, SOH, and remaining energy) and even the
aging mechanism of the battery using the dQ dV−1 analysis.
set and a testing set with a ratio of 8 : 2, and 20% of the training set was used as the
validation set to determine the hyper-parameters in the models. Five-fold cross-validation was
conducted to avoid overfitting. Features were extracted from the training dataset using PCA, NMF,
and AE, from which we obtained the matrix W that contains p columns. Each column in matrix W
represents a feature extracted from the training set.

2.3.1. Single Segment Input

Given a partial charge curve with an arbitrary starting voltage that corresponds to a starting
capacity Q0 and a voltage window length that corresponds to a capacity range of Qpartial, it is
assumed that it can be approximated by the linear combinations of the features in W. Thus, for
the ith point on the partial charge curve, we can write the following Equation (2),

$$Q_0 + \Delta Q_i = \sum_{j=1}^{p} W_j(V_i) \cdot h_j + \varepsilon_i \qquad (2)$$

where Q0 is the unknown starting capacity and is equivalent to the intercept of the linear
regression and ΔQi is the incremental capacity relative to the starting capacity Q0, obtainable
from experimental measurements. Wj is the jth column of W, hj is the unknown weight corresponding
to the features in W, and 𝜀i is the error that follows a normal distribution.

charge curve with n points, the cost function can be defined as Equation (3),

$$\mathcal{L}(h, Q_0) = \sum_{i=0}^{n-1} \Bigl( Q_0 + \Delta Q_i - \sum_{j=1}^{p} W_j(V_i) \cdot h_j \Bigr)^2 + \lambda \sum_{j=1}^{p} \lvert h_j \rvert \qquad (3)$$

where 𝜆 ≥ 0 is the regularization parameter that controls the trade-off between approximation
error and regularization strength.[33] The hyperparameter 𝜆 is determined through the validation
set.
Solving this multiple linear regression will lead us to the h and Q0 that minimize the cost
function. To determine the complete charge curve Qcomplete, we just need to take the linear
combination of the charge curves W obtained from feature extraction and the weight vector h
corresponding to the partial charge curve,

$$Q_{\mathrm{complete}} = W \cdot h \qquad (4)$$

The accuracy of the prediction is quantified by the root-mean-squared error (RMSE) between the
predicted complete charge curve Qcomplete and the ground-truth charge curve Qtrue, as shown in
Equation (5).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} \bigl( Q_{\mathrm{complete}}(i) - Q_{\mathrm{true}}(i) \bigr)^2} \qquad (5)$$
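The regression in Equations (2) to (5) can be sketched as follows. This is a hedged illustration, not the implementation used in this study: the Lasso problem is solved with a plain proximal-gradient (ISTA) loop in NumPy, the intercept Q0 is eliminated by centering, and the two feature curves, the weights (200, −5), the value of `lam`, and the function names are hypothetical choices made only to keep the example self-contained.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |x| (element-wise shrinkage toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_partial_curve(W_part, dQ, lam=1e-3, n_iter=2000):
    """Minimize sum_i (Q0 + dQ_i - W_part[i] @ h)^2 + lam * sum_j |h_j| over h and Q0."""
    # For fixed h the optimal Q0 is mean(W_part @ h - dQ), so centering removes it.
    Xc = W_part - W_part.mean(axis=0)
    yc = dQ - dQ.mean()
    step = 1.0 / (2.0 * np.linalg.norm(Xc, 2) ** 2)   # 1 / Lipschitz constant
    h = np.zeros(W_part.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Xc.T @ (Xc @ h - yc)             # gradient of the squared error
        h = soft_threshold(h - step * grad, step * lam)
    Q0 = float(np.mean(W_part @ h - dQ))
    return h, Q0

def rmse(a, b):
    """Root-mean-squared error between two curves, as in Equation (5)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical features on a normalized voltage grid (stand-ins for the columns of W).
v = np.linspace(0.0, 1.0, 101)
W = np.column_stack([v, np.cos(2.0 * np.pi * v)])
Q_true = W @ np.array([200.0, -5.0])        # synthetic "complete" charge curve

window = slice(30, 71)                      # 40% of the voltage range as input
dQ = Q_true[window] - Q_true[window][0]     # capacity relative to the unknown start
h, Q0 = fit_partial_curve(W[window], dQ)
Q_complete = W @ h                          # Equation (4)
error = rmse(Q_complete, Q_true)            # Equation (5)
```

Because the synthetic curve lies exactly in the span of the two features, the fit recovers the full curve from the partial window up to the small shrinkage introduced by the L1 penalty; real data would add noise and require tuning 𝜆 on a validation set as described above.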
The relationship between the incremental capacity ΔQi and the feature at a specific voltage
Wj(Vi) can thus be modeled as a multiple linear regression problem. We use the L1-norm as the
regularization term to penalize the parameters and reduce overfitting. This regularizer (also
called Lasso regularizer) can lead to some parameters being zero, i.e., removing the parameters
for output evaluation.[33] Thus, the Lasso regularizer can also serve as a feature selection
method. Given a single-segment partial

2.3.2. Multiple Separated Segments Input

In practical applications, regenerative braking was widely adopted in electric or hybrid vehicles
to restore the wasted energy from the process of slowing down a car and using it to recharge the
batteries.[34] The process of regenerative braking results in many separated charge segments
instead of a continuous charge
Figure 2. Battery charge curves feature extraction and reconstruction. a) The visualization of all the 10066 charge curves. b) Normalized charge curves
and the averaged normalized charge curve calculated by taking the average of the normalized capacities at different voltages. c) Comparison between
three measured charge curves and the corresponding reconstructed charge curves. d) Distribution of the reconstruction error of all the charge curves.
curve during the charging process. These separated segments can also be used as the input in the
model to predict the entire charge curve.
Given m input segments with n points in total, we have m starting capacities Q0,k (k = 1, 2, …,
m). This makes the multiple linear regression problem challenging to solve. For each segment, the
linear relation at each point is displayed in Equation (6).

$$Q_{0,k} + \Delta Q_{i,k} = \sum_{j=1}^{p} W_j(V_{i,k}) \cdot h_j + \varepsilon_{i,k} \qquad (6)$$

All input segments share the same h but have different starting capacity Q0,k. This implies that
the problem becomes m linear regressions with the same hyperplanes and different intercepts. We
took the derivative of the input segments with respect to voltage, which generates the
incremental capacity (IC) curves or dQ dV−1 curves. The dQ dV−1 analysis removes the unknown Q0,k
because the analysis is based on the relative change of capacity. For this approach to work, we
need to create another dataset with dQ dV−1 curves and perform feature extraction to obtain the
dQ dV−1 curve matrix WIC. The ith point on the n total points of the input can then be written as
Equation (7),

$$\Bigl( \frac{dQ}{dV} \Bigr)_i = \sum_{j=1}^{p} W_{\mathrm{IC},j}(V_i) \cdot h_{\mathrm{IC},j} + \varepsilon_{\mathrm{IC},i} \qquad (7)$$

where WIC,j is the jth column of WIC. The cost function can be defined as Equation (8).

$$\mathcal{L}_{\mathrm{IC}}(h_{\mathrm{IC}}) = \sum_{i=0}^{n-1} \Bigl( \Bigl( \frac{dQ}{dV} \Bigr)_i - \sum_{j=1}^{p} W_{\mathrm{IC},j}(V_i) \cdot h_{\mathrm{IC},j} \Bigr)^2 + \lambda \sum_{j=1}^{p} \lvert h_{\mathrm{IC},j} \rvert \qquad (8)$$

The complete dQ dV−1 curve is obtained by $(dQ/dV)_{\mathrm{complete}} = W_{\mathrm{IC}} h_{\mathrm{IC}}$,
and the complete charge curve was recovered by integrating the dQ dV−1 curve with respect to the
voltage.

3. Results and Discussion

3.1. Charge Curve Feature Extraction by Experts

Figure 2a displays all the charge curves collected in our lab with LiNiO2 as the cathode
material, and the capacity distribution of these curves is shown in Figure S2 (Supporting
Information). Figure 2b shows the normalized charge curves and the averaged normalized curve. A
normalized charge curve was calculated by dividing its capacity by the maximum charge capacity of
the curve. And the average normalized curve was calculated by taking the average of the
normalized capacity for all the normalized curves at different voltages. This averaged normalized
curve is considered an expert-extracted feature, where the maximum charge capacity can be
correlated to the amount of LiNiO2 active materials.
To quantify the capability of representing the battery degradation using the expert-extracted
feature (averaged normalized curve in Figure 2b), we compared the difference between the
reconstructed charge curves and three tested charge curves
Figure 3. Battery charge curve feature extraction and reconstruction using unsupervised learning algorithms. a) Visual representation of charge curves
matrix decomposition with two components. b,c) Comparison between three measured charge curves and the corresponding reconstructed charge
curves based on PCA-captured features with b) two components and c) ten components. d) Evolution of the reconstruction errors of all the 10066
charge curves with the number of components in the PCA and NMF. The error bar represents the standard deviation of the errors.
(Figure 2c). The three charge curves were selected to be the ones with the maximum capacity
(curve 3), the minimum capacity (curve 1), and the median capacity (curve 2). The reconstructed
curve 2 matches well with the measured curve, while there are noticeable differences between the
reconstructed and the measured data for curve 1 and curve 3. Moreover, Figure 2d summarizes the
distribution of the reconstruction errors of all the charge curves calculated from Equation (9),

$$\mathrm{Error} = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} \bigl( Q_{\mathrm{measured}}(i) - Q_{\mathrm{reconstructed}}(i) \bigr)^2} \qquad (9)$$

where Qmeasured(i) is the actual capacity value at the sampling point i, Qreconstructed(i) is the
reconstructed capacity value at the sampling point i, and n is the total number of sampling
points. The average error is 4.7 mA h g−1 with a standard deviation of 2.0 mA h g−1.
The averaged normalized curve in Figure 2b represents the characteristic voltage profile of
LiNiO2 during delithiation, while the maximum charge capacity is determined by the amount of
LiNiO2 active materials. If the loss of active materials is the only degradation mechanism for
the capacity fading of LiNiO2 cells, all the charge curves can then be accurately reconstructed
using the extracted feature (the averaged normalized curve) and the corresponding parameter
(maximum charge capacity). However, there are many other degradation mechanisms, such as
impedance growth and reaction heterogeneity.[35] The impedance growth leads to an increased
voltage polarization, which shifts the charge curve upward. Moreover, the impedance of a battery
is a function of the SOC,[36] complicating the reconstruction process of the charge curve.
Similarly, the LiNiO2 electrode is composed of many secondary particles with a diameter of around
12 μm, which is further composed of hundreds of primary particles with a length of around
100 nm.[21] The extraction of Li+ from these primary particles and secondary particles is
nonuniform. The heterogeneity of the Li+ extraction also depends on the SOC and different aging
status, making the reconstruction process of the actual charge curve even more complicated.
Therefore, the features that represent other degradation mechanisms of LiNiO2 cells need to be
considered and extracted to predict the health status of batteries.

3.2. Charge Curve Feature Extraction by Machine Learning

Unsupervised learning algorithms were further applied to extract features from the charge curves.
Figure 3a shows the decomposition of all the 10066 charge curves (Q) into two components (p = 2)
and the corresponding weights using PCA. The mathematical principle of charge curve matrix
decomposition can be found in Experimental Section. Component 1 has a similar shape as the
expert-extracted feature shown in Figure 2b, indicating that it represents the characteristic
voltage profile of LiNiO2 during de-lithiation and the corresponding weight 1 represents the
amount of active materials. Interestingly, component 2 shows a similar shape to the dQ dV−1
analysis of the charge curve, where each phase transition corresponds to a peak in the dQ dV−1
curve.[37] As Li+ diffusion kinetics within LiNiO2 cathode particles follows the same trend of
the dQ dV−1 curve,[23] component 2 may represent the kinetic effect of LiNiO2 during
de-lithiation, and the corresponding weight 2 represents the kinetic contribution to the overall
charge capacity.
The accuracy of the reconstructed charge curve can be improved by increasing the number of
components in the unsupervised learning algorithms. Figure 3b,c shows the reconstruction of three
charge curves based on the two components (p
Figure 4. Charge curve prediction based on PCA-extracted features. a) The average prediction error of the charge curves using the features captured by
PCA. The x axis is the ratio between the voltage window of (Vstart − 3.4 V) and the total voltage window (4.4 V − 3.4 V), and the y axis is ratio between
the voltage window of (Vend − Vstart ) and the total voltage window (4.4 V − 3.4 V). The inset shows the average prediction error across different starting
positions with respect to the input length. b) The best and worst prediction results of the first charge curve and the last charge curve. The locations of
the best and worst predictions are marked in (a). c,d) The corresponding best and worst predictions of the dQ dV−1 curves of the two charge curves in
(b).
= 2, Figure 3b) and ten components (p = 10, Figure 3c) using PCA. The three charge curves were
selected to be the ones with the maximum capacity (curve 3), the minimum capacity (curve 1), and
the median capacity (curve 2). Compared to Figure 2d, which is reconstructed based on only one
expert-extracted feature, Figure 3b shows a much more improved reconstruction accuracy for the
three curves, especially for curve 1 and curve 3. The three reconstructed curves overlapped with
the measured curves when the number of components is increased to ten (Figure 3c). Figure 3d
further shows the average reconstruction error of all the 10066 charge curves with respect to the
number of components using two different unsupervised algorithms (PCA and NMF), and the standard
deviation of the errors is shown by the error bar. It needs to be noted that the NMF algorithm
does not converge when the number of components is one (p = 1), thus it starts at p = 2. The
result suggests that the reconstruction error decreases with the number of components in both
algorithms.

3.3. Charge Curve Prediction Based on a Single Input Segment

The PCA algorithm was first applied to extract features from training charge curves to be used
for predicting the entire charge curves. Figure 4 suggests that the model based on PCA-extracted
features can accurately predict the battery charge curve with the limited input information. A
total number of 13 components (p = 13) was extracted to achieve a reasonable prediction accuracy
on the validation set. Figure 4a shows that the prediction accuracy depends on the input data,
which can be defined by the starting voltage (viz., starting position) and the voltage window
(viz., length of the input data). A selected input length from 40% to 100% of the charge curve
was shown to highlight the prediction accuracy. The inset shows the average prediction error with
different input lengths, and the error bars represent the standard deviation of the errors across
different starting positions. The result reveals that the model based on PCA-extracted features
generally has a better prediction performance when the input sequence has a starting position at
10–25%, which may depend on the investigated cathode and anode materials that determine the shape
of the charge curve. The maximum relative error is 1.8% (4.0 mA h g−1) with 40% of input length,
regardless of the starting position. The averaged relative error is only 1.3% (2.8 mA h g−1) with
40% of input, and it further reduces to less than 1.0% (2.2 mA h g−1) when the input goes beyond
50%.
The averaged prediction errors of all the test data (2013 charge curves) are further plotted in
Figure S3 (Supporting Information). The large prediction error at the bottom left corner in
Figure S3 (Supporting Information) is caused by the limited meaningful input information between
3.4–3.6 V, which feeds a capacity value of zero into the model. In actual battery applications,
the amount of input data depends on the charge time, which is directly related to the capacity
rather than the voltage. No zero capacity values will be fed into the model, and the poor
prediction performance at the bottom left region can be avoided. Figure S4 (Supporting
Information) shows the evolution of the averaged prediction error with respect to the input
length. The averaged prediction error reaches 6.5 mA h g−1 with only 20% of input length,
corresponding to 3.0% of the relative error when normalized by the maximum specific capacity of
220 mA h g−1. Moreover, the NMF-extracted features and AE-extracted features can also be used for
predicting the entire charge curves, and the performance of the model is shown in Figures S5 and
S6
Figure 5. Charge curve prediction based on multiple separated input segments. a) The prediction error of the model based on AE-extracted features.
The x axis is the number of segments, and the y axis is the total length of the input sequence. b,c) Comparison between measured and predicted dQ
dV−1 curves of b) the first cycle (with the maximum charge capacity) and c) the last cycle (80% of the maximum charge capacity). Two different types of
inputs are examined, which are marked in (a). d) The corresponding charge curves calculated from the dQ dV−1 curves in (b) and (c).
(Supporting Information). Both show high prediction accuracy, but slightly worse than the
performance of the model based on the PCA-extracted features. Therefore, no further analysis was
conducted on these two models for the single-segment input.
To visualize the performance of the model based on PCA-extracted features, we show in Figure 4b
the prediction of the charge curves with the largest charge capacity (equivalent to the first
cycle) and 80% of the largest charge capacity (equivalent to the last cycle), where 40% of the
charge curve is chosen as the input. As the prediction accuracy depends on the starting position,
the starting positions of the best and worst predictions are marked in Figure 4b, which displays
both the best and the worst prediction results of the charge curve in the first and the last
cycle. The prediction error of the first cycle is 1.83 mA h g−1 (0.83% relative error) and
3.28 mA h g−1 (1.49% relative error) for the best and worst prediction, respectively. And the
prediction error of the last cycle is 1.31 mA h g−1 (0.60%) and 2.99 mA h g−1 (1.36%) for the
best and worst predictions, respectively. The small prediction errors suggest an outstanding
performance of the model based on the PCA-extracted features.
Moreover, the corresponding dQ dV−1 curves derived from the charge curves are shown in
Figure 4c,d. The dQ dV−1 curve has been reported to be a versatile tool for diagnosing battery
degradation mechanisms, such as loss of active materials, impedance increase, and lithium
plating.[38] A good match among these dQ dV−1 curves highlights the significance of the
prediction method. Thus, a full charge curve at a constant current is no longer needed to
evaluate the health status of batteries, which is time-consuming to collect and, in certain
cases, unrealistic. Instead, a partial charge curve is sufficient to construct the full dQ dV−1
curve for analyzing the degradation mechanisms of batteries using the methodology that we
developed here.

3.4. Charge Curve Prediction with Multiple Separated Input Segments

The features extracted by the three unsupervised algorithms can also predict the entire battery
charge curves using multiple separated input segments. Among the three algorithms, the AE with
one hidden layer containing 20 neurons has the best prediction performance on the validation set.
Figure 5 shows the performance of the model using multiple separated input segments based on
AE-extracted features, and Figure S7 (Supporting Information) displays the performance of the
model based on PCA-extracted features and NMF-extracted features. The segments were randomly
selected on a charge curve, and the error is averaged throughout all the test data. It needs to
be noted that the best feature extraction algorithms depend on the shape of the charge curve,
which is determined by the materials of the two electrodes in a battery. Thus, other feature
extraction algorithms may be applied for batteries with different types of electrodes to achieve
optimal prediction performance.
Figure 5a indicates that the prediction error decreases with the increase in the number of
segments and the total input length in the model. The model achieves high prediction accuracy
when the number of segments is more than 15, even with a small input length. For example, the
prediction error can be as low as 4.1 mA h g−1 (the relative error is 1.9%) with only 10% of
input length when the number of segments reaches 20, which corresponds to around 12 min of charge
data collected at a C/2 rate. It should be mentioned that dQ dV−1 curves were predicted first
when applying the multiple separated input segments, which were then used to calculate the charge
curves by integrating the dQ dV−1 curves on voltage. We also examined the performance of the
model by predicting the charge curves directly from the separated charge
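The dQ dV−1 route for separated segments (Equations (6) to (8)) can be sketched with NumPy as below. Several simplifications are assumptions of this example and not the procedure of this study: the dQ dV−1 features are obtained by numerically differentiating two hypothetical feature curves rather than by feature extraction on a separate dQ dV−1 dataset, ordinary least squares (the 𝜆 = 0 case of Equation (8)) replaces the Lasso fit, and the segment positions and weights are made up.

```python
import numpy as np

# Hypothetical feature curves on a uniform, normalized voltage grid.
v = np.linspace(0.0, 1.0, 101)
W = np.column_stack([v, np.cos(2.0 * np.pi * v)])
W_ic = np.gradient(W, v, axis=0)             # dQ/dV version of each feature
Q_true = W @ np.array([200.0, -5.0])         # synthetic complete charge curve

# Three separated segments; each is measured from its own unknown start Q0,k.
segments = [slice(5, 16), slice(40, 51), slice(80, 91)]
rows, targets = [], []
for seg in segments:
    dq_rel = Q_true[seg] - Q_true[seg][0]    # the offset Q0,k is not observed
    ic = np.gradient(dq_rel, v[seg])         # dQ/dV inside the segment
    rows.append(W_ic[seg][1:-1])             # keep interior points only, where
    targets.append(ic[1:-1])                 # both sides use central differences

A = np.vstack(rows)
b = np.concatenate(targets)
h_ic, *_ = np.linalg.lstsq(A, b, rcond=None)   # lambda = 0 case of Equation (8)

ic_complete = W_ic @ h_ic                      # complete dQ/dV curve
# Cumulative trapezoidal integration over voltage recovers Q(V) - Q(V[0]).
steps = 0.5 * (ic_complete[1:] + ic_complete[:-1]) * np.diff(v)
Q_rec = np.concatenate([[0.0], np.cumsum(steps)])
error = float(np.sqrt(np.mean((Q_rec - (Q_true - Q_true[0])) ** 2)))
```

Differentiating each segment removes its unknown starting capacity, so all segments can share one weight vector; the integration step then returns capacity relative to the lowest voltage, which is why the comparison subtracts `Q_true[0]`.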
Figure 6. Applying the methodology to other battery chemistries. a) Charge curves and normalized charge curves of batteries taken from the
CALCE dataset. b) Distribution of the charge capacity of all the 4522 charge curves. c) Performance of the model based on PCA-extracted features and
a single input segment. The inset shows the average prediction error across different starting positions. d) The best and worst prediction results of the
first charge curve and last charge curve. The corresponding starting positions are marked in (c). e) Performance of the model based on the PCA-extracted
features and multiple separated input segments. f) Prediction results of the first and last charge curves based on only 5% of the input length and 10
segments. The corresponding input condition is marked in (e).
curve segments, but the prediction accuracy is not as good, as shown in Figure S8 (Supporting Information).

Figure 5b,c compares the tested and predicted dQ dV−1 curves with the maximum capacity (Figure 5b, first cycle) and 80% of the maximum capacity (Figure 5c, last cycle). Two types of inputs are selected for the model: a total input length of 20% with 10 segments and a total input length of 10% with 20 segments, which are marked in Figure 5a. Increasing the number of segments is more effective in improving the prediction accuracy than increasing the input length. For example, Figure 5a shows that the prediction error of the model with 10% input and 20 segments (4.0 mA h g−1) is smaller than that with 20% input and 10 segments (5.1 mA h g−1). Figure 5b,c further shows that the peak positions and intensities of the dQ dV−1 curves are accurately predicted by the model that uses 20 segments and 10% of input length as the input. By comparison, there is a slight mismatch of peak intensities at 4.15 V (Figure 5b) and 3.65 V (Figure 5c) between the tested curves and the predicted ones with 10 segments and 20% of input length as the input. Such a mismatch can lead to over- or under-estimation of the total charge capacity, as shown in Figure 5d.

3.5. Charge Curve Prediction for Different Batteries

To evaluate the applicability of the methodology developed in this work, we applied the workflow to open-access battery cycling data.[24] A total of 4522 charge curves were taken from the CALCE dataset. The CALCE dataset was collected from batteries with LiCoO2 as the cathode material, which is different from the LiNiO2 cathode that we tested in the lab, and thus shows different shapes of the charge curves, as shown in Figure 6a. The accurate charge curve prediction of these batteries would illustrate the wide applicability of the developed methodology.

The batteries from the CALCE dataset show a maximum charge capacity of ≈1 Ah and a minimum capacity of 0.6 Ah. Figure 6a displays all the charge curves with the same charging rate (C/2) and the corresponding normalized charge curves. Figure 6b shows the charge capacity distribution of the 4522 curves. The decrease in the charge capacity could be attributed to the loss of active materials, growth of the resistance, and other mechanisms that can hardly be extracted or quantified by human experts. The three unsupervised learning algorithms (PCA, NMF, and AE) were applied to extract features to be fed into the multiple linear regression model for predicting the health status of the battery.

Figure 6c and Figure S9 (Supporting Information) show the performance of the model for predicting the overall charge curve using one continuous input segment. Overall, the model based on the PCA-extracted features outperforms the models based on features extracted by the other algorithms on the validation set, and the PCA-extracted features were further used to predict the entire charge curves of the CALCE batteries. Figure 6c suggests that the prediction error depends on the starting position and the length of the input data. A relatively large error appears in the bottom left corner that corresponds to 0–15% of the starting position, which is also shown in Figure S10 (Supporting Information) with a full range of the input length from 1% to 100%. This large error is caused by the sharp voltage increase between 3.5 and 3.7 V (Figure 6a). As the
input length was defined by the voltage range rather than the charge capacity, the starting position at around 3.5 V would lead to much less meaningful input information compared to that started at a higher voltage. To account for this sharp voltage increase, we exclude this specific region when calculating the average prediction error with respect to the input length, as shown in the inset of Figure 6c. The average prediction error is less than 0.01 Ah when the input length goes beyond 50%, which corresponds to a relative error of < 1.0% after being normalized by the maximum capacity of 1 Ah.

Figure 6d displays the prediction of the charge curves with the largest charge capacity (first cycle) and 80% of the largest charge capacity (last cycle). The input was 50% of the charge curve, and two different starting positions were selected, as marked in Figure 6c, to represent the best and worst performances of the model. The good agreement between the predicted curves and the tested curves indicates the outstanding performance of the model. Moreover, the corresponding dQ dV−1 curves derived from the charge curves also match well between the prediction and the test data (Figure S11, Supporting Information), including the positions and intensities of all the peaks.

Figure 6e shows the performance of the model with multiple separated input segments based on the PCA-extracted features. The results show that the prediction accuracy increases with the increase in the number of segments and the input length. When the number of segments is more than 10, the prediction error is close to 0.02 Ah (relative error is 2.0%) with only 5% of the input length. Figure 6f displays the performance of the model to predict the charge curves in the first and the last cycle with 5% of the total input length and 10 separated segments. Figure S12 (Supporting Information) further displays the corresponding dQ dV−1 curves derived from the charge curves. The almost perfect overlap between the predicted curves and the tested curves in both Figure 6f and Figure S12 (Supporting Information) highlights the significance of the prediction method to diagnose the health status of batteries.

Last, we investigated the effect of data size and the capability of the model in predicting the maximum capacity (SOH). To evaluate the effect of data size on the performance of the developed methodology, we randomly selected 3000, 1500, and 500 charge curves from the CALCE dataset (4522 in total). Figure S13 (Supporting Information) compares the prediction accuracy of the model with different sizes of training datasets when a single segment is used as the input. The result suggests that the size of the dataset has a negligible effect on the performance of the model. It can still accurately predict the entire charge curve even if only 500 charge curves are used. Figure S14 (Supporting Information) further indicates that the size of the dataset has little effect on the performance of the model when multiple segments are used as the input. These results highlight the robustness of the developed approach in predicting the entire charge curve, which will be encouraging for practical applications where a limited dataset is available. Moreover, Figure S15 (Supporting Information) displays a parity plot of the maximum capacity. A single segment with 40% of the charge curve is used as the input to showcase the capability of SOH prediction. Training and test errors of less than 1% indicate the high accuracy of the developed methodology in predicting battery SOH.

3.6. Moving forward to the Real-World Applications

The proposed methodology for predicting battery cycle life includes two steps: feature extraction and multiple linear regression. Unlike the previous literature that treats the data-driven method as a “black box”,[8] the feature extraction step captures important information about the battery system that reflects the degradation mechanisms, such as loss of active materials, impedance growth, and increase of reaction heterogeneity. Moreover, the linear regression step provides the parameters that can be used to predict the health status of the battery, including remaining useful life, SOC, and an entire charge curve. Therefore, the developed methodology has wide application in understanding aging mechanisms, predicting health status, and providing advice to customers to optimize the application of batteries in their devices. However, there are some gaps between this study and the real-world applications of the method in a BMS, which warrant further investigation.

First, the charge curves in the study were collected at a constant C-rate (C/2) and at room temperature. However, the current and temperature vary in real-world battery applications. Collecting and selecting the appropriate information to be used as the input will be an important step to improve the robustness of the model. An alternative solution is to develop a more robust model that can take all the information (current, temperature, voltage, capacity, etc.) as the inputs, and neural networks could be a candidate for solving the problem.

Second, batteries with different types of chemistries (cathodes and anodes) have been widely used, which leads to different shapes of cycling curves. Although we examined two different batteries and demonstrated the applicability of the methodology in both cases, further evaluation of the model in other battery systems is needed. Moreover, the optimal feature extraction algorithms may differ from one battery system to another. More algorithms should be examined to obtain the model with the best prediction performance for a specific battery system.

Moreover, it is worth noting that dQ dV−1 curves are generally plotted at low rates (C/10 or below) to investigate the thermodynamical aspects of the battery. The accuracy of the dQ dV−1 analysis also depends on the quality of the measurement data.[38] For example, it is important to ensure environmental consistency during the test (temperatures and contacts), and the sampling rate should be reasonable to ensure enough data points for analysis and avoid large data files. Best practices for testing have been introduced in the literature.[39] In our study, the charging rate was C/2 for the batteries. The prediction accuracy of the dQ dV−1 curves is expected to increase with a slower charging rate, which warrants further investigation.

Finally, correlating the algorithm-extracted features with the degradation mechanisms of a battery is an important step to deepen our understanding of the system. As a battery is a complex nonlinear system, the evolution of the electrodes (cathode and anode), electrolytes, and the interface between them could lead to a change in the capacity and resistance, which will be reflected in the cycling curve. Uncovering the battery degradation mechanisms and quantifying their effect on the shape of the charge curve could help build physics-informed models to reach an optimal prediction of battery performance.
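
The two-step workflow named in Section 3.6 (feature extraction followed by multiple linear regression) can be illustrated with a minimal sketch. It is a toy under stated assumptions rather than the implementation used in this study: the charge curves are synthetic (a sum of two hypothetical sigmoid-shaped plateaus plus small noise), PCA is computed with a plain NumPy SVD, the regression is ordinary least squares, and the input is a single continuous segment covering 10% of a shared voltage grid. All names (shape_1, shape_2, design, start, length) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for measured charge curves (assumption, not the study's data).
n_curves, n_points = 200, 100
voltage = np.linspace(3.0, 4.2, n_points)                 # shared voltage grid, V
shape_1 = 1.0 / (1.0 + np.exp(-(voltage - 3.7) / 0.08))   # hypothetical plateau 1
shape_2 = 1.0 / (1.0 + np.exp(-(voltage - 3.9) / 0.05))   # hypothetical plateau 2
a = rng.uniform(0.5, 0.8, n_curves)                       # plateau capacities, Ah
b = rng.uniform(0.1, 0.2, n_curves)
curves = a[:, None] * shape_1 + b[:, None] * shape_2      # Q(V), one row per cycle
curves += rng.normal(0.0, 1e-4, curves.shape)             # small measurement noise

train, test = curves[:150], curves[150:]

# Step 1: feature extraction. PCA via SVD of the mean-centered training curves.
n_features = 2
mean_curve = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean_curve, full_matrices=False)
components = vt[:n_features]                              # (n_features, n_points)
train_scores = (train - mean_curve) @ components.T        # (n_train, n_features)

# Step 2: multiple linear regression from a partial segment to the PCA scores.
start, length = 55, 10                                    # 10% of the voltage grid


def design(q):
    """Intercept column plus the capacities observed inside the input segment."""
    segment = q[:, start:start + length]
    return np.hstack([np.ones((q.shape[0], 1)), segment])


weights, *_ = np.linalg.lstsq(design(train), train_scores, rcond=None)

# Prediction: input segment -> PCA scores -> reconstructed full curve.
predicted = mean_curve + design(test) @ weights @ components

# Mean absolute error, normalized by the largest capacity in the dataset.
relative_error = np.mean(np.abs(predicted - test)) / curves.max()
baseline_error = np.mean(np.abs(mean_curve - test)) / curves.max()
print(f"relative error: {relative_error:.4%}")
print(f"mean-curve baseline: {baseline_error:.4%}")
```

The mean-curve baseline is included only as a sanity check: a model that ignores the input segment should do clearly worse than one that uses it. Swapping the SVD step for NMF or an autoencoder, or concatenating several short segments inside `design`, would mirror the other variants discussed in the text, again as a sketch rather than a reproduction.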
4. Conclusion

Data-driven methods have a superior ability to capture features in cycling curves. The features can be correlated to the aging mechanism of batteries, such as loss of active materials and growth of resistance. Moreover, these features can be combined with a multiple linear regression model to predict a complete cycling curve based on a limited portion of it. We demonstrate that a single continuous segment and multiple separated segments can be used as the input to predict the complete cycling curve. The model achieves a 2% prediction error of an entire charge curve using only 10% of the curve as the input for the LiNiO2-based batteries and achieves the same accuracy with only 5% of the curve as the input for the LiCoO2-based batteries. The complete charge curve can be used to evaluate the health status of batteries, which can not only guide the use of batteries but prevent accidents and malfunctions.

Supporting Information

Supporting Information is available from the Wiley Online Library or from the author.

Acknowledgements

This work was supported by the Assistant Secretary for Energy Efficiency and Renewable Energy, Office of Vehicle Technologies of the U.S. Department of Energy through the Advanced Battery Materials Research (BMR) Program (Battery500 Consortium) award number DE-EE0007762. B.R.J. acknowledges support from the National Science Foundation (NSF) CAREER Award (CMMI1751605).

Conflict of Interest

The corresponding author (A. M.) is a co-founder of TexPower, Inc., a start-up company focusing on cobalt-free cathode materials for lithium-based batteries.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Keywords

batteries, charge curves, feature extraction, prediction, machine learning

Received: March 16, 2023
Revised: May 31, 2023
Published online: July 2, 2023

[1] A. Manthiram, Nat. Commun. 2020, 11, 1550.
[2] Q. Wang, B. Liu, Y. Shen, J. Wu, Z. Zhao, C. Zhong, W. Hu, Adv. Sci. 2021, 8, 2101111.
[3] H. Yaghoobnejad Asl, A. Manthiram, Nat. Sustain. 2021, 4, 379.
[4] T. Model, M. Reichweite, Tesla Model 3, Access October 28, 2022.
[5] J. Zhang, L. Su, Z. Li, Y. Sun, N. Wu, Batteries 2016, 2, 12.
[6] M. Nizam, H. Maghfiroh, R. A. Rosadi, K. D. Kusumaputri, AIP Conf. Proc. 2020, 2217, 030157.
[7] J. Huang, S. T. Boles, J.-M. Tarascon, Nat. Sustain. 2022, 5, 194.
[8] Y. Li, K. Liu, A. M. Foley, A. Zülke, M. Berecibar, E. Nanini-Maury, J. Van Mierlo, H. E. Hoster, Renewable Sustainable Energy Rev. 2019, 113, 109254.
[9] A. Farmann, W. Waag, A. Marongiu, D. U. Sauer, J. Power Sources 2015, 281, 114.
[10] J. Tian, R. Xiong, W. Shen, J. Lu, X.-G. Yang, Joule 2021, 5, 1521.
[11] Z. Liu, Z. Li, J. Zhang, L. Su, H. Ge, Energies 2019, 12, 757.
[12] X.-G. Yang, Y. Leng, G. Zhang, S. Ge, C.-Y. Wang, J. Power Sources 2017, 360, 28.
[13] L. Lu, X. Han, J. Li, J. Hua, M. Ouyang, J. Power Sources 2013, 226, 272.
[14] K. A. Severson, P. M. Attia, N. Jin, N. Perkins, B. Jiang, Z. Yang, M. H. Chen, M. Aykol, P. K. Herring, D. Fraggedakis, M. Z. Bazant, S. J. Harris, W. C. Chueh, R. D. Braatz, Nat. Energy 2019, 4, 383.
[15] Y. Zhang, Q. Tang, Y. Zhang, J. Wang, U. Stimming, A. A. Lee, Nat. Commun. 2020, 11.
[16] X. Feng, C. Weng, X. He, X. Han, L. Lu, D. Ren, M. Ouyang, IEEE Trans. Veh. Technol. 2019, 68, 8583.
[17] Y. Zheng, C. Qin, X. Lai, X. Han, Y. Xie, Appl. Energy 2019, 251, 113327.
[18] Y. Duan, J. Tian, J. Lu, C. Wang, W. Shen, R. Xiong, Energy Storage Mater. 2021, 41, 24.
[19] L. Su, M. Wu, Z. Li, J. Zhang, eTransportation 2021, 10, 100137.
[20] Y. Liu, J.-M. Wu, M. Avdeev, S. Q. Shi, Adv. Theory Simul. 2020, 3, 1900215.
[21] L. Su, E. Jo, A. Manthiram, ACS Energy Lett. 2022, 7, 2165.
[22] L. Su, X. Zhao, M. Yi, H. Charalambous, H. Celio, Y. Liu, A. Manthiram, Adv. Energy Mater. 2022, 12, 2201911.
[23] L. Su, K. Jarvis, H. Charalambous, A. Dolocan, A. Manthiram, Adv. Funct. Mater. 2023, 33, 2213675.
[24] Open Source Battery Research Data, https://calce.umd.edu/data#INR.
[25] D. D. Lee, H. S. Seung, Nature 1999, 401, 788.
[26] J. C. Liao, R. Boscolo, Y. Yang, L. M. Tran, C. Sabatti, V. P. Roychowdhury, Proc. Natl. Acad. Sci. USA 2003, 100, 15522.
[27] K. Huang, N. D. Sidiropoulos, A. Swami, IEEE Trans. Acoust., Speech, Signal Process. 2013, 62, 211.
[28] C. Boutsidis, E. Gallopoulos, Pattern Recognit. 2008, 41, 1350.
[29] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Mach. Learn. Res. 2011, 12, 2825.
[30] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, Massachusetts 2016.
[31] P. Smaragdis, S. Venkataramani, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017, 86–90.
[32] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch 2017.
[33] R. Tibshirani, J. R. Stat. Soc. Series B Stat. Methodol. 1996, 58, 267.
[34] S. R. Cikanek, K. E. Bailey, Proceedings of the 2002 American Control Conference (IEEE Cat. No. CH37301) 2002, 4, 3129–3134.
[35] L. Su, J. L. Weaver, M. Groenenboom, N. Nakamura, E. Rus, P. Anand, S. K. Jha, J. S. Okasinski, J. A. Dura, B. Reeja-Jayan, ACS Appl. Mater. Interfaces 2021, 13, 9919.
[36] L. Su, J. Zhang, J. Huang, H. Ge, Z. Li, F. Xie, B. Y. Liaw, J. Power Sources 2016, 315, 35.
[37] L. de Biasi, A. Schiele, M. Roca Ayats, G. Garcia, T. Brezesinski, P. Hartmann, J. Janek, ChemSusChem 2019, 12, 2240.
[38] M. Dubarry, D. Anseán, Front. Energy Res. 2022, 10.
[39] M. Dubarry, G. Baure, Electronics 2020, 9, 152.

