Neural Network for SSCV Hydrodynamics
A Study on the Potential of a Neural Network
Based Model in Predicting Hydrodynamic
Behavior of a Semi-Submersible Crane
Vessel
by
A.J. Haenen
to obtain the degree of Master of Science
at the Delft University of Technology,
to be defended on September 25, 2018
A.J. Haenen
Delft, September 2018
Abstract
Heerema Marine Contractors (HMC) is a contractor in the international offshore oil, gas and
renewables industry. It is specialized in transporting, installing and removing large offshore
facilities. HMC operates three crane vessels, two of which are semi-submersibles (Thialf and
Balder); the other is the monohull Aegir. A third semi-submersible, the Sleipnir, is currently
under construction.
To ensure safe operations, make accurate fatigue predictions and extend operational win-
dows, HMC relies on vessel motion calculations. Currently, vessel motion estimations are
based on response amplitude operators calculated by the diffraction software package WAMIT.
As HMC cannot rely on diffraction software in case of non-linear vessel motions, the need for
a method capable of capturing non-linear effects arises.
The goal of this study is to determine the potential of a neural network based model
in predicting hydrodynamic behavior of semi-submersible crane vessels. Hindcast weather
data, vessel motion measurements and model test data are used to train several different
neural network architectures. The research into the potential of neural networks in predict-
ing hydrodynamic behavior is split into two main categories: the frequency domain and the
time domain.
Within the frequency domain, wave forecasts can be used to predict a response spectrum.
The neural network in this case acts as a conventional RAO. In an artificial environment, four
architectures are tested and the results show that neural networks are able to make accurate
predictions in a fully linear environment. When tested on project data, where the vessel sails
at operational draft, the neural network predictions show a slightly higher accuracy than the
diffraction based predictions for the specific test case. Another network is tested on transit
data, where the vessel sails at an inconvenient draft. The results from these tests show that
there is potential for a neural network to be used as a substitute for Response Amplitude
Operators.
The time domain models focus on predicting ship response based on surface height sig-
nals and/or hindcast vessel motion measurements. The first model is trained and tested
on model test data from an SSCV. The input of the neural network is surface height mea-
surements and the output is pitch motion prediction. The model shows that it is capable of
predicting both first and second order pitch motions. Another time domain model has MRU
roll measurements as input and it tries to predict the future 60 seconds of roll motion. Many
network topologies and optimizer settings are tested but none are capable of predicting fu-
ture motions.
List of Abbreviations
Adam Adaptive moment estimation
AI Artificial Intelligence
DP Dynamic Positioning
NN Neural Network
List of Symbols
Contents

Abstract
List of Abbreviations
List of Symbols
1 Introduction
2 Research Setup
2.1 Problem Statement
2.2 Research Question
2.2.1 Secondary Research Questions
2.3 Research Approach
3 Literature Study
3.1 Neural Networks
3.2 Neural Networks in Offshore Engineering
4 Background
4.1 Established Method for Assessing Ship Motions
4.1.1 RAOs
4.1.2 Limitations of Diffraction Software
4.2 Artificial Neural Networks
4.2.1 Neural Networks
4.2.2 Keras & TensorFlow
4.2.3 Nodes
4.2.4 Activation Functions
4.2.5 Cost Function
4.2.6 Backpropagation
4.2.7 Optimization
4.2.8 Training Process
5 Methodology; Network Architecture and Training
6 Frequency Domain
6.1 Artificial Environment
6.1.1 Data
6.1.2 Architectures
6.1.3 Cases
6.1.4 1D
6.1.5 2D
6.1.6 2D-SDA
6.1.7 Parametric
6.1.8 Including Ship Parameters
6.2 Measurement Data
6.3 Standard Cases; Project
6.3.1 2D
6.4 Non-standard Cases; Transit
6.4.1 2D
7 Time Domain
7.1 Wave ⇒ Response
7.1.1 Data
7.1.2 Topology Selection
7.1.3 Results
7.1.4 Potential Application
7.2 Motion Hindcast ⇒ Future Motion
7.2.1 Data
7.2.2 Topology Selection
8 Discussion
9 Conclusions
10 Recommendations
A Vector Fitting
A.0.1 Vector Fitting
2
Research Setup
Can a neural network based model be used as an alternative to an RAO in the time
domain?
Time domain predictions are better suited for neural network based models than high-input,
high-output calculations. Therefore, it is expected that a time domain model can achieve
better results than the frequency domain model. If this is indeed the case, it might also be
possible to use the time domain model to reconstruct an RAO. An extra question is thus: can
an RAO be reconstructed using a working time domain model?
Figure 3.1: image from ’A logical calculus of the ideas immanent in nervous activity’ [16]
relatively simple way to incorporate measurement data into future predictions and it factors
in all encountered contributions to strain responses automatically. However, this also reveals
one of the major drawbacks of neural networks. Major contributions to fatigue damage can
be phenomena that occur once every couple of years or even less frequently. These can fall
beyond the scope of the measurement hindcast data and are thus not part of the training data
on which the network bases its predictions.
Fatigue damage monitoring systems could benefit from continuous time-domain analysis
with neural network based models, but these are often not yet viable because of a lack
of computational power [12]. If Moore's law continues, this may become possible within
a few years. For now, though, fatigue monitoring systems that are based on neural
networks are often still bound to the spectral approach.
One of the research fields applying neural networks within the offshore industry is wave
prediction. One example is the use of an ANN as an alternative to wave spectra based on
empirical relationships such as Pierson-Moskowitz, JONSWAP and Scott. Decisions on which
spectrum to use can be quite subjective, and these spectra often fail to generalize actual site
conditions [17]. ANNs can be a viable alternative for estimating these spectral shapes. The
same is true for nonlinear interaction models for wind wave spectra [27].
Like the field of wave spectra, other fields in the industry also rely heavily on statistical
methods. A 2011 paper provides a neural network based method as an alternative to the
Fokker-Planck approach [5]. The Fokker-Planck equation is a partial differential equation
that describes the time evolution of a probability density function under the influence of
forces.
Another possible application of neural networks within the offshore sector is workability
analysis. A 2009 study, for example, coupled a finite element method and an artificial neural
network to predict safe sea-states [29]. Here, a table of safe sea-states was generated using
a finite element dynamic analysis, and sea-state predictions were
made by a neural network. These sea-state predictions used an input of previous significant
wave heights and peak periods of preceding consecutive 3-hour periods in addition to a few
fuzzy values that are related to the season.
4
Background
This chapter discusses the theoretical background of this thesis. As this research combines
the fields of ship hydrodynamics and artificial neural networks, it is necessary to provide
information for both.
4.1.1. RAOs
As mentioned, an RAO acts as a transfer function between a wave spectrum and a vessel re-
sponse spectrum. An RAO, like any transfer function, describes the linear relation between
the input and the output. The RAO is a function of frequency and heading. There are multiple
methods of assessing RAOs for ships. CFD analysis and model tests can be used to calculate
RAOs accurately but these are often too expensive and time consuming. More common cal-
culation methods are based on linear potential theory. Within HMC, the diffraction software
package WAMIT is used to calculate the RAOs. WAMIT solves the boundary value problem
in terms of a complex potential describing the flow. Linearizing the problem enables the de-
composition of the velocity potential ϕ into the radiation and the diffraction components
[28][15].
$$S_s(\omega) = \int_0^{\infty}\!\int_0^{2\pi} |RAO(\omega,\theta)|^2 \cdot S_\zeta(\omega)\,\mathrm{d}\theta\,\mathrm{d}\omega \tag{4.1}$$
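As a numerical illustration, Eq. 4.1 can be evaluated on a discretized frequency-heading grid. The RAO and wave spectrum below are synthetic stand-ins chosen for this sketch, not WAMIT output:

```python
import numpy as np

# Hypothetical discretization; in practice the RAO grid would come from WAMIT.
omega = np.linspace(0.1, 2.0, 50)                          # wave frequencies [rad/s]
theta = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)  # headings [rad]
d_omega = omega[1] - omega[0]
d_theta = theta[1] - theta[0]

# Toy RAO(omega, theta) and wave spectrum S_zeta(omega), for illustration only.
rao = np.exp(-((omega[:, None] - 0.8) ** 2) / 0.1) * np.cos(theta[None, :]) ** 2
s_zeta = np.exp(-((omega - 0.6) ** 2) / 0.05)

# Eq. 4.1: |RAO|^2 * S_zeta integrated over heading gives the response spectrum;
# integrating that over frequency gives the response variance (spectral moment m0).
s_response = (np.abs(rao) ** 2 * s_zeta[:, None]).sum(axis=1) * d_theta
m0 = (s_response * d_omega).sum()
```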
The limitations of the diffraction software calculations at an inconvenient draft are caused
by the significant wave elevations on top of the pontoons. These are present in the calcula-
tions but physically they cannot exist. At 15m draft the water column above the pontoons is
3m which limits the wave height to only a few meters due to the depth limited wave height
[22]. Diffraction software however predicts wave elevations up to 9m for a 1m incident wave
[21]. This large wave height is caused by resonant motions in the fluid between the pontoons
and columns of the semi-submersible. This resonant effect can also occur in reality, however
the wave height is limited due to turbulence and wave breaking. As there is a linear relation-
ship between ship motion and wave height in the WAMIT, unrealistically high wave elevation
such as the 9m wave for a 1m incident wave causes large differences between reality and the
predictions based on diffraction software.
Another cause of the changes between reality and the predictions based on WAMIT is
because of forward speed effect. The 3D panel method being used in WAMIT does not include
forward speed. Instead, the assumption is made that the vessel response at forward speed is
the same as the response to encounter frequencies [7].
$$\omega_e = \omega - \frac{U\omega^2}{g}\cos(\phi) \tag{4.2}$$
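Eq. 4.2 translates directly into a small helper. The sign convention assumed here is that heading phi = 0 corresponds to following seas, so forward speed lowers the encounter frequency:

```python
import numpy as np

def encounter_frequency(omega, speed, heading, g=9.81):
    # Eq. 4.2: omega_e = omega - (U * omega^2 / g) * cos(phi), heading phi in radians.
    return omega - speed * omega ** 2 / g * np.cos(heading)

# In following seas (phi = 0) the encounter frequency is lower than the wave frequency.
we_following = encounter_frequency(1.0, 5.0, 0.0)
we_zero_speed = encounter_frequency(1.0, 0.0, 0.0)
```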
TensorFlow is an open source software library for high performance numerical computa-
tion. The library is developed within Google’s AI organization. It supports machine learning
and deep learning [26].
4.2.3. Nodes
An ANN is composed of nodes. Each node has a similar structure. As an input, it receives
the connection ’strengths’ from the nodes of the previous layer (with the input layer as an
exception). The output of each node is the result of two mathematical computations. First
is the summation, which is the sum of the product of all strengths and weights plus a certain
bias (Eq. 4.3). The weights are values held by each connection between nodes, and the biases
are constants held by each node.
$$I_i = \sum_j w_{ij}\, x_j + b_i \tag{4.3}$$
The second computation is the transfer, which is the application of a transfer function.
In the machine learning field this transfer function is called the activation function. This
sequence of computations through the entire network is called forward propagation (Figure
4.2).
$$y_i = f(I_i) \tag{4.4}$$
$$L_{MSE}(Y_n, Y_t) = \frac{1}{n}\sum_{i=1}^{n}\left(Y_n(i) - Y_t(i)\right)^2 \tag{4.5}$$
The cost function or objective function is the value that flows back into the network via
backpropagation (Section 4.2.6) to update the weight and bias matrices. It can consist of a
single loss, but it can also be the sum of multiple losses in a batch (Section 4.2.8). Another
addition to the cost function can be a regularizer (Section 5.2.2).
The computation of the cost function is performed by the use of a forward-propagation
algorithm. This maps the parameters to the training loss L(Yn , Y t ). An example of such an
algorithm is provided (Algorithm 1). It is the forward propagation through a typical neural
network (Multilayer Perceptron) and the computation of the cost function.
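Algorithm 1 itself is not reproduced here, but forward propagation through an MLP with an MSE cost (Eqs. 4.3-4.5) can be sketched in NumPy. The layer sizes and weights below are arbitrary choices for illustration:

```python
import numpy as np

def forward(x, layers):
    # Forward propagation (Eqs. 4.3-4.4): affine step, then activation, per layer.
    for W, b, act in layers:
        x = act(W @ x + b)      # I_i = sum_j w_ij x_j + b_i, then y_i = f(I_i)
    return x

def mse(y_pred, y_true):
    # Eq. 4.5: mean squared error cost.
    return np.mean((y_pred - y_true) ** 2)

def linear(z):
    return z

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 3)), np.zeros(8), np.tanh),   # hidden layer
          (rng.standard_normal((1, 8)), np.zeros(1), linear)]    # output layer
y = forward(np.array([0.5, -0.2, 1.0]), layers)
cost = mse(y, np.array([0.3]))
```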
4.2.6. Backpropagation
Using a neural network, information flows from the input layer to the output layer (forward
propagation). This produces a prediction, or in the case of training a cost scalar. In order
to train the network, the information has to flow back into the network. This is done by the
back-propagation algorithm (Algorithm 2). Backpropagation is a means of computing the
gradient of the cost function. This provides the gradient for the weight and bias matrices
which can subsequently be altered by an optimization algorithm.
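A minimal sketch of the gradients backpropagation computes, for the special case of a single linear layer with an MSE cost (the general algorithm applies this layer by layer via the chain rule); values here are random stand-ins:

```python
import numpy as np

# Backpropagation for a single linear layer y = W x + b with MSE cost:
# the cost gradient at the output flows back to dL/dW and dL/db.
rng = np.random.default_rng(1)
W, b = rng.standard_normal((2, 3)), np.zeros(2)
x, t = rng.standard_normal(3), rng.standard_normal(2)

y = W @ x + b
grad_y = 2.0 * (y - t) / y.size     # dL/dy for L = mean((y - t)^2)
grad_W = np.outer(grad_y, x)        # gradient for the weight matrix
grad_b = grad_y                     # gradient for the bias vector

# Finite-difference check of one weight gradient.
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
numerical = (np.mean((W_pert @ x + b - t) ** 2) - np.mean((y - t) ** 2)) / eps
```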
4.2.7. Optimization
The goal of the learning stage of the neural network is to minimize the cost function. Ap-
proaching the minimum of this function is done with a technique called gradient descent [2].
Suppose there is a function y = f(x). The derivative f'(x) gives the slope of f(x) at the point
x. It thus tells how a small change in the input scales with the output. The derivative can
be used to minimize a function because it shows how to change x in order to lower y. This
method however stagnates when f'(x) = 0, which is the case at local minima, local maxima
and saddle points [6].
In deep learning, cost functions are very complex functions with many local minima and
very flat regions. Optimization can therefore be a challenge and usually one settles for a very
low local minimum instead of the global minimum (Figure 4.5).
Figure 4.5: (A) Global minimum, optimal solution; (B) local minimum, acceptable solution; (C) local minimum,
poor solution
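The derivative-following procedure described above can be sketched for a simple one-dimensional function (the function and learning rate are arbitrary illustration choices):

```python
# Plain gradient descent on y = f(x): repeatedly step against the derivative f'(x).
def f(x):
    return (x - 3.0) ** 2 + 1.0

def df(x):
    return 2.0 * (x - 3.0)

x, learning_rate = 0.0, 0.1
for _ in range(200):
    x -= learning_rate * df(x)   # progress stagnates once f'(x) approaches 0
```

For this convex toy function the only stationary point is the global minimum at x = 3; real cost functions have many stationary points, which is exactly the difficulty discussed above.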
Three gradient descent variants exist, representing the trade-off between accuracy and
computational cost [6].
In practice, people also refer to mini-batch gradient descent as stochastic gradient descent
(SGD), with the original SGD having a mini-batch of size 1. Throughout the rest of this
thesis, SGD refers to vanilla mini-batch gradient descent, meaning mini-batch gradient
descent without additions. Furthermore, batch size will be used instead of mini-batch size.
Vanilla gradient descent algorithms do not guarantee convergence to an acceptable local
minimum. Complex cost functions can contain sub-optimal local minima and saddle points
which are difficult for SGD to escape, and convergence is not always fast enough to avoid
overfitting. Numerous extensions to the vanilla gradient descent optimization algorithm are
used in machine learning, each with its own advantages and pitfalls. Two are discussed in
this section, as these are the algorithms that proved useful in this project: Momentum and
Adam [23].
Momentum Common around local optima are ’ravines’. These are areas with a vastly differ-
ent steepness in one dimension than the others. Momentum is an addition to SGD that
helps to navigate through these ravines much faster (Figure 4.6) by adding a fraction of
the previous parameter update to the new one (Eq. 4.6). This may however result in
very high learning rates in some dimensions which can lead to overshooting.
$$v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta), \qquad \theta = \theta - v_t \tag{4.6}$$
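A sketch of the momentum update of Eq. 4.6, applied to a toy quadratic cost (the cost, learning rate and iteration count are illustration choices):

```python
def momentum_step(theta, v, grad, lr=0.1, gamma=0.9):
    # Eq. 4.6: v_t = gamma * v_{t-1} + eta * grad, then theta is updated by -v_t.
    v = gamma * v + lr * grad
    return theta - v, v

# Minimizing J(theta) = theta^2 (gradient 2 * theta) from theta = 5.
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, v, 2.0 * theta)
```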
Adam One of the most popular optimization algorithms for machine learning is Adam (adaptive
moment estimation). It computes adaptive learning rates for each parameter in a
more comprehensive way than momentum, as it stores an exponentially decaying average
of both past gradients and past squared gradients (Eq. 4.7) [13]. The main downside
to Adam is that the average and variance of each parameter are stored, which
increases the computational cost.
$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t} \\
\hat{v}_t &= \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
\tag{4.7}
$$
The default values for Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) proposed by the original
paper result in a robust optimization algorithm, leaving the learning rate and batch size
as the only hyper-parameters to be tuned.
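A sketch of the Adam update of Eq. 4.7, again on a toy quadratic cost (the cost and iteration count are illustration choices; the betas and epsilon are the defaults above):

```python
import numpy as np

def adam_step(theta, m, v, grad, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Eq. 4.7: decaying averages of the gradient and squared gradient,
    # bias-corrected, then a per-parameter scaled update.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimizing J(theta) = theta^2 from theta = 5 with the default betas.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 301):
    theta, m, v = adam_step(theta, m, v, 2.0 * theta, t)
```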
Multilayer Perceptron The Multilayer Perceptron (MLP) is an ANN in its simplest and most
common form. These networks can also be referred to as feedforward neural networks
or deep feedforward networks depending on the number of hidden layers. The work-
ings of this network are already discussed in chapter 4. Common applications for MLPs
are pattern classification, pattern matching, function approximation and nonlinear
mapping.
5
Methodology; Network Architecture and Training
Recurrent Neural Network The principle of recurrent neural networks is that the model holds
information of previous computations. This 'memory' can be useful if there is a strong
correlation between the previous computation(s) and the next one, for example in time
series prediction and translation models. Many different RNN architectures exist but
only two are mentioned in this thesis: the basic, fully recurrent network, and the Long
Short-Term Memory (LSTM) network (Section 5.1.1).
In recurrent neural networks, every node in the hidden layers depends both on its input
and on its output of the previous timestep (Eq. 5.1). Both the input and the memorized
output have individual corresponding weights. For a tanh activation function, for
example, the node output is calculated as in Equation 5.2.
$$h_t = f(h_{t-1}, x_t) \tag{5.1}$$
$$h_t = \tanh\!\left(w_{ij}^{h}\, h_{t-1} + w_{ij}^{x}\, x_t\right) \tag{5.2}$$
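A sketch of the recurrent update of Eqs. 5.1 and 5.2 for a single scalar node, with arbitrarily chosen weights and inputs:

```python
import numpy as np

def rnn_step(h_prev, x, w_h, w_x):
    # Eqs. 5.1-5.2: the new hidden state depends on the previous state and the input.
    return np.tanh(w_h * h_prev + w_x * x)

# Unrolling a scalar recurrent node over a short input sequence.
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(h, x, w_h=0.8, w_x=0.5)
```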
Radial Basis Function Neural Network A radial basis function (RBF) is a function whose value
depends on the distance of a point from a center. RBF networks typically consist of
three layers: an input layer containing a vector of real numbers, an output layer with a
linear activation containing a scalar function of the input vector, and a hidden layer with
a non-linear RBF activation function.
The defining feature of RBFs is that their response changes monotonically with distance
from a central point. Typical RBFs are the multiquadric (Eq. 5.4) and the Gaussian
function (Eq. 5.3).
$$h(x) = \exp\!\left(-\frac{(x-c)^2}{r^2}\right) \tag{5.3}$$
$$h(x) = \frac{\sqrt{r^2 + (x-c)^2}}{r} \tag{5.4}$$
Radial basis functions can be seen as a type of activation function that can be employed
in any MLP, although they are often considered as their own type of network. In the
case that the basis functions are able to move or change size, and/or when the network
has multiple hidden layers, RBF networks become nonlinear [20].
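The Gaussian (Eq. 5.3) and multiquadric (Eq. 5.4) basis functions can be written directly:

```python
import numpy as np

def gaussian_rbf(x, c, r):
    # Eq. 5.3: response decays monotonically with distance from the center c.
    return np.exp(-((x - c) ** 2) / r ** 2)

def multiquadric_rbf(x, c, r):
    # Eq. 5.4: response grows monotonically with distance from the center c.
    return np.sqrt(r ** 2 + (x - c) ** 2) / r
```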
5.1.2. Approach
For MLPs, there are many potentially well-performing architectures ranging from shallow to
deep with all kinds of activation functions. Finding the right architecture for a particular
problem can be a challenge. This should be done using a structured approach while minimizing
the number of influencing parameters.
Reducing the choice of activation functions significantly lowers the number of network
architecture options. For this research, the four most commonly used activation functions
are considered: linear, sigmoid, tanh and ReLU. Not combining these different activation
functions has this same effect, with the exception of a linear input and output layer when
using one of the other activation functions. This is necessary because the output ranges of
these activation functions are limited and some input values can cause problems (Section
4.2.4).
The optimization of an LSTM based architecture is a less tedious process. The topology of
the LSTM layer is bound by the number of input parameters, there are standard activation
functions in the LSTM modules, and although a fully connected linear layer can often be a
valuable addition, adding any extra layers is not likely to improve the network.
Some of these techniques alter the topology of the network, others have to do with the training
and optimization process. The ones altering the topology that are used in this research are
explained below:
Dropout This technique randomly leaves out a fraction of the network's nodes (along with
their connections) for each training example. This can significantly reduce overfitting
[25]. In a Keras built network, dropout is implemented by introducing a dropout layer.
This layer 'cuts' the connections of a certain fraction of the next layer's nodes.
Accuracy The accuracy of the input data, by which is meant the size of the steps in frequency
and heading, influences the potential quality of the output data. A high input resolution
is therefore preferred. However, the number of parameters in the network often
increases exponentially with a higher resolution. A larger number of parameters can
lead to a less efficient training process and thus to overfitting.
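Keras provides dropout as a layer; the underlying mechanism can be sketched in NumPy. The sketch below assumes the common 'inverted dropout' variant, in which surviving activations are rescaled during training:

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    # Randomly zero a fraction `rate` of node outputs during training; surviving
    # outputs are rescaled so the expected activation sum stays unchanged.
    if not training or rate == 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000), rate=0.5, rng=rng)
```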
Besides network changes, choices made in the training process also have a major influence
on the chance of overfitting the network on the training data. It all comes down to the
simple reasoning that more epochs result in a higher chance of overfitting. Reducing the
number of epochs is achieved by increasing the rate of convergence per epoch. This can be
done by either having a more efficient training process or somehow increasing the amount of
training data. The methods used in this research to reduce the number of epochs are listed
and described below.
Validation set A validation set is a part (usually about 10%) of the training data set that is
not used for updating the network parameters. Instead it is used to evaluate network
performance every epoch. The comparison between training loss and validation loss
is used to diagnose overfitting. In general, it is best to stop network training when the
validation loss stops decreasing. This can however be a pitfall because this can also
happen when a network is (temporarily) stuck on a saddle point of the cost function.
Learning rate The learning rate is the most influential parameter in the optimization algorithm.
Taking larger steps while updating the weights can increase the rate of convergence,
but only up to a point. Too high a learning rate will cause overshooting of the
local minimum and will halt convergence or even cause divergence.
Optimizer There is no single best optimizer for training a neural network. Often there is a
trade-off between robustness, tedious hyper-parameter optimization and likelihood of
overfitting. In general, SGD and Adam are good choices for network training and so
these are the ones used for this project. Like the network architecture, it is best to limit
your own choices of optimizers as it takes too much time to try everything and there
are no reliable rules of thumb in this field.
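The validation-set stopping rule described above can be sketched as a small helper. This is an illustrative simplification; frameworks such as Keras provide an EarlyStopping callback with more options:

```python
# Early stopping on the validation loss: stop when it has not improved
# for `patience` consecutive epochs (a guard against noisy loss curves,
# though it can also trigger prematurely on a saddle-point stall).
def train_with_early_stopping(val_losses, patience=3):
    # Return the epoch index at which training would stop.
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

stop = train_with_early_stopping([1.0, 0.6, 0.4, 0.41, 0.42, 0.43, 0.5])
```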
6
Frequency Domain
The purpose of the research in the frequency domain is to find out whether a neural network
can function as an alternative for an RAO, either close to its current form with a network
without hidden layers, or in an alternative form.
Time     Time
Hs       Significant wave height (Wind, Swell, Total)
Hsdir    Heading (Wind, Swell, Total)
Tz       Zero crossing period (Wind, Swell, Total)
Tm       Mean period (Wind, Swell, Total)
Tp       Peak period (Wind, Swell, Total)
Vw       Wind speed
Vwdir    Wind direction
s3d      Full polar wave spectrum
6.1.2. Architectures
The aim for the frequency spectrum model is to provide an alternative to the RAOs calculated
by WAMIT. It would make sense then that the input would be a 2D (frequency and heading)
wave spectrum and that the output would be a response spectrum for a certain degree of
freedom. This makes the network a direct substitute for an RAO. However, this is not the
only possible architecture.
6.1.3. Cases
The main goal for the artificial environment is the exploration of topologies and hyper-parameters.
The liberty that comes with an unrestricted amount of data and adjustable parameter ranges
allows for a more insight-driven exploration of these topologies and hyper-parameters. In aid
of gaining more insight, different ’cases’ are provided to the model. These different cases are
not used in any way in optimizing the network topology or hyper-parameters; they serve only
as a means to determine network capabilities when confronted with different datasets. The
following six cases are used in the artificial environment:
Standard The standard case for the neural network is a dataset composed of 80% training
data and 20% test data (Figures 6.1 & 6.5). Sea state parameters H s [m] and T p [s] range
from 0 − 8 and 1 − 20 respectively and the main heading is random. The topology of the
network and the order of magnitude for the hyper-parameters is based on this case.
Interpolation A lack of overlap of training data and test data can occur in real-world appli-
cations of the network. The interpolation case is used to assess the network's ability to
interpolate. The dataset (Figures 6.3 & 6.7) is composed of 80% training data and 20%
test data. For the test data set, sea state parameters H s [m] and T p [s] range from 4 − 5
and 9 − 11 respectively. For the training data set, sea state parameters H s [m] and T p [s]
range from 2 − 7 and 7 − 13 respectively, excluding the ranges 4 − 5 for H s and 9 − 11 for
T p . The main headings are random.
Extrapolation A lack of overlap of training data and test data can occur in real-world appli-
cations of the network. The extrapolation case is used to assess the network's ability to
extrapolate. The dataset (Figures 6.2 & 6.6) is composed of 80% training data and 20%
test data. For the training data set, sea state parameters H s [m] and T p [s] range from
4 − 5 and 9 − 11 respectively. For the test data set, sea state parameters H s [m] and T p [s]
range from 2 − 7 and 7 − 13 respectively, excluding the ranges 4 − 5 for H s and 9 − 11 for
T p . The main headings are random.
Scarce This case assesses the functionality of the network when trained on a limited number
of datapoints (Figures 6.4 & 6.8). The sea state parameters H s [m] and T p [s] range from 0 − 8
and 1 − 20 respectively. The main headings are random.
Spectral wave climate same location In an effort to bring the artificial environment closer
to reality, the spectral wave climate of the North Sea from the Argoss database is used
to generate the dataset. This provides a realistic spreading in H s and T p . The main
headings are random.
Spectral wave climate different location The second case based on Argoss data is used to
assess the performance of a network trained on a specific spectral wave climate in mak-
ing predictions for a spectral wave climate of a different environmental area. The two
data sets used for this case are North Sea (Snorre) and Trinidad (Cassia). H s and T p are
plotted in Figures 6.9 & 6.10. The heading is randomized.
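As an illustration of how the interpolation case's train/test split could be constructed, the sketch below draws hypothetical uniform (Hs, Tp) samples; the thesis generates its artificial sea states differently:

```python
import numpy as np

# Hypothetical uniform (Hs, Tp) samples standing in for artificial sea states.
rng = np.random.default_rng(0)
hs = rng.uniform(2.0, 7.0, 1000)
tp = rng.uniform(7.0, 13.0, 1000)

# Interpolation case: the box Hs in [4, 5] and Tp in [9, 11] is held out as
# test data; everything outside the box is training data.
in_box = (hs >= 4.0) & (hs <= 5.0) & (tp >= 9.0) & (tp <= 11.0)
train_hs, train_tp = hs[~in_box], tp[~in_box]
test_hs, test_tp = hs[in_box], tp[in_box]
```

Swapping the roles of the box and its complement gives the extrapolation case.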
Figure 6.1: Standard case with 100 datapoints Figure 6.2: Extrapolation case with 100 datapoints
Figure 6.3: Interpolation case with 100 datapoints Figure 6.4: Scarce case with 20 datapoints
Figure 6.5: Standard case with 100 datapoints Figure 6.6: Extrapolation case with 100 datapoints
Figure 6.7: Interpolation case with 100 datapoints Figure 6.8: Scarce case with 20 datapoints
Figure 6.9: Snorre, 1000 datapoints Figure 6.10: Cassia, 1000 datapoints
6.1.4. 1D
The 1D model is only used for the artificial environment as there is a lack of measurement
data for this model. Model tests and possibly sea states that are almost unidirectional are the
only kinds of useful data for this model. It is of course able to make predictions in an artificial
environment. Also, the relatively low number of input parameters causes the weight and bias
matrices to be small enough to be assessable. This makes it possible to gain some insight in
network behavior.
The 1D model in the artificial environment has to be capable of a fairly simple computation:
a linear transfer function for all corresponding frequencies. A neural network for this
application can be designed by hand and would not require machine learning. It is however
a nice introductory exercise in the method and it can still answer some questions regarding
network behavior. Because of the simplicity of the computation, the number of potential
topologies is fairly low: Either a network without hidden layers and a linear activation func-
tion, or a network with one hidden layer and two linear activation functions. The second
option will also require a search for the optimal hidden layer size. The network without a hid-
den layer would have the RAO on the diagonal of its weight matrix in the case that the input
and output layer have the same size and represent the same frequency bins (Figure 6.11). The
network with one hidden layer on the other hand, is a universal approximator (Appendix B).
This enables the network to represent the RAO as well. While both topology types are able
to represent the necessary computation, performance can still differ due to differences in
optimizer behavior.
Standard case
The general methodology of finding the optimal topology and optimizer for this particular
model is as follows. First, a small dataset of 100 training/testing examples is used to find
the right setting, and then a larger dataset of 1000 training/testing examples is used to assess
the model. Initially, using the smaller dataset, ten topology options are tested with a robust
optimizer (Adam, with default hyper-parameters). These topology options are displayed in
Table 6.2.
Option             1    2    3    4    5    6    7    8    9    10
Number of layers   2    3    3    3    3    3    3    3    3    3
Hidden layer size  n/a  4    8    16   32   64   128  256  512  1024
Activation         Linear (all options)
Batch size         5 (all options)
Epochs             100 (all options)
Optimizer          Adam (all options)
None of these topology options resulted in satisfying results within a reasonable number of epochs. The results of the best performing topology from this series are shown in Figure 6.12. It appears that non-corresponding frequencies still get excited by these networks. A possible reason for this is that the learning rate is too high and therefore keeps overshooting the desired value of 0 (instead of a random value between 0 and 1) for biases and unrelated weights. A lower learning rate would, however, result in much slower convergence.
Alternative solutions for this problem are initializing the weights and biases at 0 and/or restricting the value ranges of the weights and biases. Because the first of these solutions is less invasive, it is the preferred option. All topology options from Table 6.2 are tested again with weights and biases initialized at 0.
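The effect of zero initialization can be demonstrated on a toy version of the 1D model; the sizes, learning rate and data below are arbitrary illustrative choices, and plain gradient descent stands in for the Adam optimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins = 8                                # toy size; the thesis uses 81 bins
true_rao = np.linspace(0.5, 1.2, n_bins)  # hypothetical per-bin transfer values

# Toy training set: wave spectra in, response spectra out (per-bin linear transfer).
X = rng.random((100, n_bins))
Y = X * true_rao

# No-hidden-layer linear model, weights and biases initialized at 0,
# trained with full-batch gradient descent on the mean squared error.
W = np.zeros((n_bins, n_bins))
b = np.zeros(n_bins)
lr = 0.1
for _ in range(5000):
    err = X @ W + b - Y
    W -= lr * X.T @ err / len(X)
    b -= lr * err.mean(axis=0)

# The optimizer only has to grow the diagonal; the off-diagonal weights and
# the biases start at their desired value of 0 and stay close to it.
assert np.allclose(np.diag(W), true_rao, atol=1e-3)
assert np.abs(W - np.diag(np.diag(W))).max() < 1e-3
```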
Tests with the initialized parameters show a clear pattern, shown in Figure 6.13: faster and further convergence with an increasing number of hidden nodes, with the topology without a hidden layer (model 0010) deviating from this trend. All topologies are still converging at 100 epochs, but not all show a sufficient rate of convergence, and these show a clear limit to their maximum capability. This is true for the models with no, or a low (4, 8 and 16), number of hidden nodes.
The next step, applying the neural network with topology option 10 to a larger (1000-example) dataset, shows similar results (Figure 6.15). The model loss curve suggests that the learning rate is too high after about 10 epochs, as it starts to oscillate heavily. The network still has some problems when dealing with low-SDA spectra, as can be seen in the SDA plot. The reason for this is that, even though the parameters are initialized at 0, small values are assigned to irrelevant parameters by the optimizer. In low-SDA cases these small values are significant and cause inaccurate results.
Alternative cases
The topology found for the standard case is now used to evaluate the model's extrapolation and interpolation capabilities, its performance on a low amount of data, and its performance on data that is more representative of real-world conditions. The specifics of these datasets are discussed earlier in Section 6.1.3. To avoid the convergence issues seen in the model loss of Figure 6.15, a small (100-example) dataset is used for the alternative cases, with the obvious exception of the scarce case.
The interpolation case depicted in Figure 6.16a shows a near perfect match between the neural network SDA and the diffraction based calculations. As the more extreme values in the training set cover a larger part of the frequency domain than the ones in the test set, this is no surprise. The extrapolation case, on the other hand, shows some deviating results because the training set does not contain values in the entire frequency domain, but most of the test examples are still on the diagonal.
In the results from the scarce case (Figure 6.16c), the validation loss curve is converging well, but a higher learning rate or an increased number of epochs both result in overfitting on the small dataset. Still, with this very short learning process, the right trend is clearly visible in the SDA plot. This shows that such a simple neural network can be trained on a very small amount of data.
6.1.5. 2D
For the 2D model, there is a significant increase in the number of nodes in the input layer. The size of the input layer in this model is equal to the number of frequencies times the number of headings used to describe the wave spectrum. For a wave spectrum described by bins of a 0.05 rad/s frequency step and a 5° heading step, the number of input nodes is 81 · 73 = 5913. With a large input layer, the number of parameters in the network can easily skyrocket when increasing the size of the hidden layer(s). Because this can lead to long training times, searching for the right architecture is best not done with the entire training set. While a network designed for a small dataset may not be the optimal architecture for a case where a large amount of data is available, this approach is still a lot more robust than other methods of reducing training time, such as reducing the number of epochs or increasing the batch size.
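The input-size arithmetic above can be sketched as follows; the bin counts come from the text, while the axis ranges are assumptions chosen to reproduce those counts:

```python
import numpy as np

# Assumed axes: 0 to 4 rad/s in 0.05 rad/s steps -> 81 frequency bins,
# 0 to 360 degrees in 5 degree steps -> 73 heading bins.
frequencies = np.arange(0, 4.0 + 1e-9, 0.05)
headings = np.arange(0, 360 + 1e-9, 5)
assert len(frequencies) == 81 and len(headings) == 73

# A 2D wave spectrum (placeholder values) becomes one flat input vector.
spectrum_2d = np.zeros((len(frequencies), len(headings)))
input_vector = spectrum_2d.ravel()
assert input_vector.size == 81 * 73 == 5913
```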
Standard case
The success of the shallow network for the 1D model implies that this would also be a good option for the 2D case, as it is the same type of computation: a linear transfer between each heading-frequency bin. A threat for this topology, however, is that it is possible that not all bins get excited in the training examples. This means that if the initial value of all parameters is 0, it will still be (close to) 0 after the training process for the un-excited heading-frequency combinations.
The same topology options are tested as in the 1D case, both with random initial values and with the parameters initialized at 0.
Results displayed in Figure 6.17 show a similarity with the 1D case: initializing the network parameters at 0 significantly improves their convergence. However, in Figure 6.17b, the networks with a high number of nodes do not converge as far as the other networks. Instead, their loss curves start to oscillate, but do not diverge. This indicates optimizer issues, and therefore variations in the optimizer and optimizer settings are applied. The variations are: Adam with lr = 1e-5, SGD with lr = 1 and SGD with lr = 10. Figure 6.18 shows the comparison of these optimizer settings.
Figure 6.18: 2D, topology options 1-8, initial parameter value 0, optimizer variations ((c) SGD lr = 1; (d) SGD lr = 10); validation loss curves
The optimizer with the best results is Adam with a learning rate ten times lower than the default value (Figure 6.18b). This also restores the correlation between hidden layer size and network loss; just like with the 1D model, a higher number of hidden nodes results in a higher accuracy. The high number of hidden nodes, in combination with a large input layer, leads to a very high number of network parameters. For example, the number of network parameters in a model with one hidden layer consisting of 1024 nodes is 6,138,961 (Equation 6.1). The consequence of this higher number of parameters is that the sum of all parameters is spread out over more weights and biases, while the sum itself does not scale linearly with the number of parameters. As the average value of the network parameters decreases, the optimizer requires smaller step sizes when approaching a minimum. This explains why the default values in Figure 6.17b cause oscillatory behavior.
n_θ = n_{W1} + n_{B1} + n_{Wo} + n_{Bo} = 5913 · 1024 + 1024 + 1024 · 81 + 81 = 6,138,961    (6.1)
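The parameter count in Equation 6.1 generalizes to any fully connected topology; a small helper function (illustrative, not thesis code) makes the arithmetic explicit:

```python
def dense_param_count(layer_sizes):
    """Number of weights and biases in a fully connected network.

    layer_sizes lists node counts from input to output, e.g. [5913, 1024, 81].
    Each layer transition contributes n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# The 2D model from Equation 6.1: 5913 inputs, one 1024-node hidden layer,
# 81 outputs.
assert dense_param_count([5913, 1024, 81]) == 6_138_961
```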
For the final model, the network with the highest number of hidden nodes is chosen. The optimizer is Adam with a lower learning rate (lr = 1e-5). The network is trained and tested separately on both a 100-example dataset and a 1000-example dataset. The validation curve of the smaller dataset (Figure 6.19a) stops converging before the training process is done, while the training loss curve still converges. This shows that the network is overfitting on the training dataset. Using the larger dataset, the overfitting issue is solved and the SDA plot shows good results. However, the validation curve is oscillating, which indicates that the learning rate is too high for a loss below 10^-4. The consequence of this can be clearly seen in the 'Worst Prediction' plot; the learning rate is too high to deal with very small values. This issue can be mitigated by lowering the learning rate after a certain number of epochs, or by using learning rate decay, which gradually lowers the learning rate. Because this is only an issue for low-SDA sea-states and the general trend of the SDA curve shows a good match with the WAMIT results, the model is considered satisfactory and no further changes are made to the network.
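Learning rate decay can be sketched as a simple schedule; the formula below is the common time-based decay (as implemented, for example, in Keras optimizers) — whether the thesis would use exactly this form is an assumption:

```python
def decayed_learning_rate(lr0, decay, iteration):
    """Time-based decay: the learning rate shrinks as training progresses,
    so steps get smaller near a minimum and oscillations are damped.
    (Sketch of the idea; the exact schedule may differ.)"""
    return lr0 / (1.0 + decay * iteration)

lr0, decay = 1e-4, 1e-3
lrs = [decayed_learning_rate(lr0, decay, i) for i in range(0, 10001, 1000)]
# Monotonically decreasing: later iterations take smaller steps.
assert all(a > b for a, b in zip(lrs, lrs[1:]))
```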
Alternative cases
Training and testing the network on the alternative cases shows the same results as with the 1D network: the interpolation case shows near perfect results, while the extrapolation case shows poorer, but still acceptable, results. In this case, the network especially has trouble making predictions in low-SDA sea-states.
For the scarce case the results are also similar to the 1D case. The general trend of the test examples is good, but the validation curve shows clear signs of overfitting.
The network trained and tested on the North Sea data shows good results, as can be expected from the standard case results. However, the network does show a lot of fluctuations in the loss curves. In an attempt to mitigate this, the network trained on the North Sea data and tested on the Trinidad data is trained with larger batches (50 training examples) and more epochs (10,000). The results show that the loss curves only have very limited fluctuations and the SDA plot follows the right trend, although this dataset seems to be much harder to predict than the extrapolation case.
6.1.6. 2D-SDA
The 2D-SDA model aims to predict only a single value that represents the entire response spectrum. The computation that this model should represent consists of a linear transfer function per heading-frequency bin and a form of integration. As the linear transfer was best represented by two linear activation functions, one would expect this network to need at least one extra layer to represent the integration. It is, however, possible that the network can combine both computations in the first two layers.
The first topology options that are tested vary in network depth and hidden layer size. The first twelve options are linearly activated networks with 1-3 hidden layers and 32-256 nodes per hidden layer, as can be seen in Table 6.3.
Option 1 2 3 4 5 6 7 8 9 10 11 12
Number of layers 4 4 4 4 3 3 3 3 5 5 5 5
Hidden layer size 32 64 128 256 32 64 128 256 32 64 128 256
Activation Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear
Batchsize 5 5 5 5 5 5 5 5 5 5 5 5
Epochs 100 100 100 100 100 100 100 100 100 100 100 100
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
Figure 6.21: Initial variations in layer size and network depth, validation loss curves
Out of the tests of the first twelve topology options (Figure 6.21), a few conclusions can be drawn. The first is that the results get better with each added hidden layer. In the next topology variations, more layers are added to see whether this trend continues.
The second conclusion is that this model is prone to overfitting and that this effect increases with a higher number of nodes. Option 12, with 256 nodes per layer, does however reach the lowest loss. There therefore seems to be a trade-off between rate of convergence and robustness (a lower chance of overfitting).
The reason for this overfitting is related to the network output and the size of the training dataset. Because only the SDA is predicted, and the training set, excluding the validation data, contains only 72 cases, the network can settle for learning these 72 values instead of the physical relationship between the input and output data. As network capacity (the number of nodes) increases, the network becomes more able to memorize these 72 values.
The next set of topology options explores the trend seen in the first 12 options: higher net-
work depth leads to better convergence. Because of the overfitting issue, all depth variations
are tested with both 32 and 256 nodes per layer (Table 6.4).
Option 13 14 15 16 17 18 19 20 21 22
Number of layers 6 7 8 9 10 6 7 8 9 10
Hidden layer size 32 32 32 32 32 256 256 256 256 256
Activation Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear
Batchsize 5 5 5 5 5 5 5 5 5 5
Epochs 100 100 100 100 100 100 100 100 100 100
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
Figure 6.22 shows that the overfitting issue also exists for the narrower networks. Overfitting does happen after more epochs, but the wider networks have still converged further by the time overfitting starts. There does seem to be a positive effect of higher network depth, but it is hard to draw conclusions because of oscillatory behavior. For the next set of tests, the learning rate is lowered in an effort to mitigate these oscillations in the validation loss curve. This next set of tests uses the topologies shown in Table 6.5.
Option 23 24 25 26 27 28 29 30
Number of layers 6 7 8 9 10 15 20 25
Hidden layer size 256 256 256 256 256 256 256 256
Activation Linear Linear Linear Linear Linear Linear Linear Linear
Batchsize 5 5 5 5 5 5 5 5
Epochs 100 100 100 100 100 100 100 100
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 1e-5 1e-5 1e-5 1e-5 1e-5 1e-5 1e-5 1e-5
The correlation between network performance and network depth is clear in Figure 6.23. This trend halts at about 20 layers, and increasing the network depth from that point on does not lead to further convergence.
A final effort to increase network convergence involves different activation functions. The sigmoid, ReLU and TanH activation functions are tested. Besides these different activation functions, a narrow network with 20 layers is also tested. The learning rate is adjusted for the narrow network because the number of parameters in this network is much lower. For the ReLU activation function, the learning rate is also adjusted, as preliminary tests showed much better convergence with a higher learning rate.
Option 32 33 34 35
Number of layers 20 20 20 20
Hidden layer size 32 256 256 256
Activation Linear Sigmoid ReLU TanH
Batchsize 5 5 5 5
Epochs 100 100 100 100
Optimizer Adam Adam Adam Adam
Learning rate 1e-4 1e-5 1e-3 1e-5
The narrow network of 32 nodes per layer performs as well as the wide network, while the narrow network has only 207,000 network parameters and the wide network 2.6 million. Increasing the number of nodes per layer greatly increases the computational cost with only a small gain in accuracy.
The sigmoid and ReLU activation functions are clearly not able to represent the computations of this particular model. TanH, on the other hand, performed slightly better than the linear activation functions and will therefore be used on the bigger dataset and for the alternative cases.
Figure 6.25 shows that the network is capable of imitating the two computations discussed at the beginning of this chapter. There is, however, an issue with sea-states that have an SDA of (nearly) 0: here, the network outputs negative values in some cases. This could be solved by implementing constraints.
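One possible constraint is to clip the prediction at zero, sketched below; an output-layer ReLU or a non-negativity constraint on the final weights would be alternatives. The function name is an illustrative assumption:

```python
import numpy as np

def constrain_sda(raw_output):
    """Clip predictions at zero so near-zero sea-states can never yield
    a negative SDA. (One possible constraint, not the thesis code.)"""
    return np.maximum(raw_output, 0.0)

preds = np.array([0.8, -0.03, 0.0, 2.1])
# Negative outputs are mapped to 0; physically valid values are untouched.
assert (constrain_sda(preds) >= 0).all()
```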
As the 2D model with the full response spectrum as output gives satisfactory results for the artificial environment, the 2D-SDA model, which contains far less information in its output, is essentially not a useful addition. However, should the results of the 2D model be disappointing when dealing with real measurement data, this model type is a potential alternative.
Alternative cases
The network, with its topology designed for the standard case, is also tested on the alternative cases: interpolation, extrapolation and scarce. The results are presented in Figure 6.26. In the interpolation case, with the network trained on high and low sea-states with high and low peak periods, the model performs well. With the absence of very low sea-states, the only downside seen in the standard case is also gone. The model performs less well when the extrapolation case is applied, but the general trend is still clear. The network is thus capable of making somewhat accurate predictions in sea-states it has not encountered before.
As encountered earlier in the topology optimization, the single output parameter makes this network prone to overfitting. The difference in trend between the training loss and validation loss curves in Figure 6.26c also indicates overfitting and explains the outliers in the SDA plot.
6.1.7. Parametric
One of the issues with the 2D model is the number of input and output nodes. Too many nodes cause training difficulties that can only be overcome by more data (and in some cases more training time). The amount of data, however, is limited. Decreasing the number of input nodes will lower the accuracy, and reducing the number of output nodes reduces the resolution.
In some applications, the vessel motions are not represented by the entire frequency response spectrum but by the Significant Double Amplitude (SDA). This is a value that represents the entire spectrum in one integrated value. It of course holds less information, and it is possible for two spectra to have the same SDA, but in many practical applications it is a useful measure. As for the input, the wave spectrum can also be represented by a few variables instead of the entire spectrum in all heading and frequency components. A simple spectrum can be represented by the significant wave height (Hs), the peak period (Tp) and the main wave direction (Θ). More complex spectra can be represented by multiple sets of these values, for example one set for swell and two sets for wind waves: (Hs,s, Tp,s, Θs, Hs,1, Tp,1, Θ1, Hs,2, Tp,2, Θ2).
Using this 'parametric' model, the number of nodes can be decreased dramatically. The question that arises is: does this limited number of variables hold enough information to describe the complex nature of the SDA value?
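A sketch of how such a parametric input vector could be assembled; the names, units and ordering are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def parametric_input(wave_systems):
    """Stack (Hs, Tp, direction) triplets of the wave systems into one
    small input vector, replacing the 5913-node binned spectrum."""
    return np.array([value for system in wave_systems for value in system])

# One swell system plus two wind-wave systems -> nine input nodes.
swell = (2.5, 14.0, 210.0)   # Hs [m], Tp [s], direction [deg] (illustrative)
wind1 = (1.0, 6.0, 180.0)
wind2 = (0.5, 4.5, 250.0)
x = parametric_input([swell, wind1, wind2])
assert x.shape == (9,)
```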
The validation loss curves in Figure 6.27 show faster convergence with an increase in the number of nodes per layer, but they do not show further convergence. To keep training time low, further tests are done with 32 nodes per hidden layer. In a later stage, optimization of the layer size will be applied again. In an effort to reduce the sudden fluctuations in the validation loss curves, the learning rate is decreased to 1e-5.
Figure 6.28 shows that adding more layers has the same effect as adding more nodes per layer: the rate of convergence is higher, but all loss curves stop decreasing at the same loss value. This means that either the limit of correlation between input and output has been reached, or a network consisting of linear activations is not capable of finding this correlation. This is determined by performing tests with sigmoid, ReLU and TanH activations.
Variations in the activation functions show that the sigmoid and ReLU functions stop converging fairly quickly. This is due to the biases taking over as the dominant parameters, so that all predictions are the same no matter the network input. The TanH function, on the other hand, is a slight improvement over the linear activation. As different activation functions can react differently to layer sizes, the TanH function is also tested with 256 nodes per layer in Figure 6.30.
As the increase in layer size did not increase network performance, the layer size of 32 nodes is used for the final network topology. The network is used on the 1000-example dataset, the result of which is shown in Figure 6.31a. The SDA plot shows a clear trend around the diagonal, but with a large spread. Because the validation curve is still converging, an effort is made to increase convergence. Increasing the learning rate, however, results in divergence, and the network training time is already at the maximum that is still workable. Increasing the number of epochs is therefore only acceptable if the batch size is also increased. This training process is presented in Figure 6.31b. The result of this effort is a good match in the SDA plot, but a learning process that does not seem to be very robust. Increasing the robustness of this training process could be achieved by tuning the optimizer hyper-parameters or by using a different optimizer altogether. As the SDA plot results are satisfactory, no further changes are made to the optimizer or the network.
The SDA plot shows that the model is capable of adapting to knowing only three parameters instead of the full wave spectrum in heading-frequency bins. This does not necessarily mean that it is capable of this to the same extent when dealing with real-world predictions and measurements. For the artificial environment, a random set of wave spectrum parameters is used to create a 2D spectrum using the Ochi-Hubble method. In reality, wave spectra cannot be described with the same simplicity. However, the method does show promise as an alternative to the 2D method, should that method not work with real-world data.
Figure 6.31: Standard case, 1000 training examples, validation loss curves
Alternative cases
In the alternative cases, this model shows similar behavior to the other architectures, but a little more extreme. The interpolation case yields good results and shows a near-perfect SDA plot. The extrapolation case and the scarce case, on the other hand, show poorer results. This shows that this model is not the most robust and is sensitive to large input variations because of the low number of input parameters. The real-world cases show similar network performance to the extrapolation case, unlike the 2D model, where the Snorre-Cassia case showed much less convergence than the extrapolation case.
Linear Network This network is established by simply adding an extra dimension to the input layer. This way, the network is able to use the same simple topology as before. However, the size of the input layer is now dependent on the resolution of θ, ω and D. Adding this extra dimension to the already sizable input layer can lead to unworkable sizes.
Another disadvantage of this topology is its limited predictive capability, as the network is just filling in a very extensive RAO. It is likely that it will not perform well in sea-states that it has not encountered yet.
Multiple Linear Networks As only a few vessel drafts are common, it is also a possibility to
have separate networks for each draft or each set of ship parameters. This way, the
network cannot be used in predicting situations in unfamiliar drafts due to the lack of
data, but it can become very accurate in predicting motion behavior in common vessel
drafts.
Parametric The implementation of draft or other ship parameters can lead to large input sizes. Using only a few parameters to describe the wave spectrum avoids this issue. The downside is that there is much less detail in the spectrum and some wind or swell spectral contributions could be completely lost.
Projects
The project data is used to train the network on the 'standard' case, where the vessel sails at operational draft. Roll motion MRU measurement data is taken from two projects: 'Kaombo' and 'Bigfoot'. For the training data, the Kaombo project is used, a project near the coast of Angola (Figure 6.33a); it totals 822 training examples. For the test data, Bigfoot data is used; Bigfoot is a project in the Gulf of Mexico and totals 276 examples.
Figure 6.33: (a) Kaombo-B32; (b) Bigfoot
Transit
For the non-standard case, the dataset consists of transit data of the Balder from 2010-2011. Like the projects, roll motion MRU measurements are used. The main differences in ship parameters compared to the projects are a shallow draft and forward speed. In Figure 6.34, the three transits of that period are plotted. The dataset is randomly separated into test data and training data: the training dataset is composed of 563 examples and the test dataset of 161 examples. This amounts to roughly 70 + 20 = 90 days worth of data.
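Such a random separation can be sketched as follows; the helper name and seed are illustrative, not the thesis code:

```python
import numpy as np

def random_split(n_examples, n_test, seed=0):
    """Randomly separate example indices into a training and a test set,
    as done for the 2010-2011 Balder transit data (563 training, 161 test)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examples)
    return order[n_test:], order[:n_test]

train_idx, test_idx = random_split(563 + 161, 161)
assert len(train_idx) == 563 and len(test_idx) == 161
# No example ends up in both sets.
assert len(set(train_idx) & set(test_idx)) == 0
```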
bin, may perform better on training data wherein a relatively small part of the 2D spectrum is covered. The downside to this is that there is a huge number of possible topologies for this model. The restrictions on the considered topologies discussed in Section 5.1.2 are applied for pragmatic reasons.
Nevertheless, the first set of tests is performed on single-hidden-layer topologies with an increasing number of nodes. An overview is depicted in Table 6.7.
Option 1 2 3 4 5 6 7
Number of layers 3 3 3 3 3 3 3
Hidden layer size 16 32 64 128 256 512 1024
Activation Linear Linear Linear Linear Linear Linear Linear
Batchsize 5 5 5 5 5 5 5
Epochs 100 100 100 100 100 100 100
Optimizer Adam Adam Adam Adam Adam Adam Adam
The validation loss curves in Figure 6.35 show a clear trend of a higher rate of convergence with a higher number of nodes per layer. Since this trend slows down significantly after 128 nodes per layer, this becomes the number of nodes for the next series of topology options. Notice that a higher rate of convergence does not lead to further convergence, because all validation loss curves eventually settle on the same asymptote.
The next series of topology options explores the effect of network depth on the validation loss. This series, consisting of options 8-17, is depicted in Table 6.8.
Option 8 9 10 11 12 13 14 15 16 17
Number of layers 4 5 6 7 8 9 10 15 20 25
Hidden layer size 128 128 128 128 128 128 128 128 128 128
Activation Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear
Batchsize 5 5 5 5 5 5 5 5 5 5
Epochs 100 100 100 100 100 100 100 100 100 100
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
The effects of network depth are limited, as can be seen in Figure 6.36a. The curve with the highest rate of convergence is model 6, which has only a single hidden layer. Because of the highly oscillatory behavior, the same series 8-17 is run with a lower learning rate. This yielded the same result (Figure 6.36b). Therefore, a single-hidden-layer neural network is trained on the full dataset of the Kaombo project.
Using the model on the entire dataset, however, shows overfitting (Figure 6.37, blue curve). Overfitting mitigation measures, as described in Section 5.2.2, are applied in an attempt to reduce it. The red curve in the graph represents the validation loss curve of the model with dropout and weight regularization, and overfitting is indeed stopped. The results of this model are shown in Figure 6.38.
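Dropout and weight regularization, the two measures applied here, can be sketched mechanically as follows; this is a numpy illustration of the concepts, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def dropout(activations, rate, training=True):
    """Inverted dropout: randomly silence a fraction of the activations
    during training and rescale the survivors, so that no rescaling is
    needed at prediction time."""
    if not training:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

def l2_penalty(weights, factor):
    """Weight regularization: an extra loss term punishing large weights."""
    return factor * np.sum(weights ** 2)

h = np.ones((4, 10))
dropped = dropout(h, rate=0.1)
# Kept units are scaled up by 1/(1-rate); dropped units are exactly zero.
assert set(np.round(np.unique(dropped), 6)) <= {0.0, round(1 / 0.9, 6)}
assert np.isclose(l2_penalty(np.array([1.0, -2.0]), 1e-6), 5e-6)
```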
The training and validation loss curves both show that the network is converging. This, together with the absence of large fluctuations, shows that the network is robust and no overfitting takes place. The SDA plot shows a spread around the y(x) = x line. The tests in the artificial environment showed that a neural network is capable of making near perfect predictions in a fully linear environment. The spread is therefore caused by uncertainties in the weather data, vessel manoeuvres, deck activity and non-linear hydrodynamic effects.
The results from the SDA plot show that the neural network's predictions have a similar accuracy to the diffraction based predictions. However, it is hard to draw conclusions from the SDA plot with different dots overlapping. Therefore, the differences between the measured SDAs and the predicted SDAs are depicted in Figures 6.39a & 6.39b by means of the contours of the distribution. The plots show that the neural network has a tendency to overpredict the SDA value, but has fewer outliers than the diffraction method predictions.
In order to quantify the predictive capabilities of both the neural network and the diffraction method, the mean squared error (Equation 6.2) and a normalized mean squared error (Equation 6.3) are calculated. The results are depicted in Table 6.9 and show that the neural network performs better according to both measures.
MSE(A, B) = (1/N) · Σ_{i=1}^{N} (A_i − B_i)^2    (6.2)

NMSE(A, B) = (1/N) · Σ_{i=1}^{N} ((A_i − B_i) / A_i)^2    (6.3)
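The two error measures can be implemented directly from Equations 6.2 and 6.3; in this small numpy sketch A is taken to be the measured SDA series and B the predicted one:

```python
import numpy as np

def mse(a, b):
    """Mean squared error (Equation 6.2)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.mean((a - b) ** 2)

def nmse(a, b):
    """Normalized mean squared error (Equation 6.3): each error is
    scaled by the measured value before squaring."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.mean(((a - b) / a) ** 2)

measured = np.array([1.0, 2.0, 4.0])
predicted = np.array([1.1, 1.8, 4.4])
assert np.isclose(mse(measured, predicted), (0.01 + 0.04 + 0.16) / 3)
assert np.isclose(nmse(measured, predicted), 0.01)
```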
Option 1 2 3 4 5 6 7 8 9 10 11 12
Number of layers 3 3 3 3 3 3 3 3 3 3 3 3
Hidden layer size 16 32 64 128 256 512 16 32 64 128 256 512
Activation Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
The validation loss curves of the first twelve topology variations show that the network with 256 nodes and a learning rate of 0.0002 converges the furthest. However, the validation loss curve contains high peaks as the rate of convergence decreases. This indicates that the learning rate is too high; learning rate decay may solve this problem. The network containing 512 nodes diverges in an early stage, which might have the same cause. Therefore, both these layer sizes are used in the next batch of topology variations. The following variations are displayed in Table 6.11 and contain different decay values.
Option 13 14 15 16 17 18 19 20
Number of layers 3 3 3 3 3 3 3 3
Hidden layer size 256 512 256 512 256 512 256 512
Activation Linear Linear Linear Linear Linear Linear Linear Linear
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
Decay 1e-4 1e-4 1e-5 1e-5 1e-6 1e-6 1e-7 1e-7
In Figure 6.42, the new topology variations are compared. Although learning rate decay is able to reduce the fluctuation of the loss curve in some cases, none of the networks containing decay converge further than the model without decay. Another method of increasing convergence has to be found. In the next batch of variations, multiple network additions are tested: dropout, regularization and parameter constraints (all discussed in Chapter 5). All tested variations are shown in Table 6.12.
Option 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Number of layers 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Hidden layer size 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256
Activation Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 0.0002 0.0003 0.0003 0.0004 0.0003 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0003 0.0003 0.0004 0.0003
Decay 1e-5
Dropout 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
Weights regularizer 1e-6 1e-7 5e-8 5e-8 5e-8 5e-8 2e-8 1e-8 5e-9 1e-9 1e-8
Weights constraint >0 >0 >0 >0 >0 >0 >0
Bias constraint =0 =0 =0 =0 =0 =0 =0 =0 =0 =0 =0
The validation loss curves are displayed in Figure 6.43a and, for readability, the best performing networks are shown again in Figure 6.43b. A combination of all introduced measures (network 30) results in the best network performance.
As a network with only a single hidden layer has only limited capability to adapt to non-linear physical phenomena, a batch with an increasing number of hidden layers is tested, along with an additional network with TanH activation functions. This batch of topology variations is depicted in Table 6.13.
Option 37 38 39 40 41 42 43 44 45 46 47
Number of layers 4 5 6 7 8 9 10 15 20 25 25
Hidden layer size 256 256 256 256 256 256 256 256 256 256 256
Activation Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear TanH
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
Dropout 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
As shown in Figure 6.44, extending the network with more layers does not lead to better predictions. Therefore, no further changes are made to the network. The single-hidden-layer neural network with measures to increase robustness and decrease the chance of overfitting is used to make predictions on the test dataset. Serving as a benchmark are predictions made by means of an identified RAO (Section 4.1.2). An acceleration RAO, available at HMC, was originally calibrated on roll accelerations. For the purpose of serving as a benchmark for the neural network, it is transformed to a motion RAO by dividing its amplitude components by the squared frequency.
The results are shown in Figure 6.45. Figures 6.46a & 6.46b show the SDA difference distributions. The majority of the test examples show a good trend on the diagonal of the SDA plot. However, a part of the test examples is significantly under-predicted. Part of the spread around the y(x) = x line can be ascribed to the same causes as in the project data tests: uncertainties in the weather data, vessel manoeuvres, deck activity and non-linear hydrodynamic effects. The two main differences in circumstances compared to the project data are the shallow draft and the forward speed, which introduce extra non-linearities. The forward speed also changes the encounter frequency of the incoming waves. As the network input (a 2D wave spectrum) is adjusted for heading but not for forward speed, the network will adapt to the mean forward speed (or a speed close to it). The network is, however, not capable of adjusting for changes in forward speed, as speed is not provided as an input.
The neural network predictions show a closer match with the measurements than those of the identified RAO. To quantify this, the mean squared error (Eq. 6.4) and the normalized mean squared error (Eq. 6.5) are calculated. The NMSE is modified to limit the effect of very low SDAs by dividing by a value of at least 0.1. The calculated values are listed in Table 6.14.
The tests for the non-standard (transit) case show that there is potential for a neural network to act as an RAO when sailing at an inconvenient draft, as the vast majority of the test examples clearly follows the SDA plot diagonal. However, like that of the RAO based method, the network accuracy is also reduced significantly during transits.
6.4. Non-standard cases; transit
MSE(A, B) = (1/N) \sum_{i=1}^{N} (A_i - B_i)^2    (6.4)

NMSE(A, B) = (1/N) \sum_{i=1}^{N} ( (A_i - B_i) / max(A_i, 0.1) )^2    (6.5)
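As a minimal numpy sketch (not code from the thesis), Equations 6.4 and 6.5 translate directly; the 0.1 floor in the denominator implements the modification described above:

```python
import numpy as np

def mse(a, b):
    # Eq. 6.4: mean squared error between measured SDAs a and predicted SDAs b
    return np.mean((a - b) ** 2)

def nmse(a, b, floor=0.1):
    # Eq. 6.5: normalized MSE; the denominator is clipped at `floor` (0.1)
    # to limit the effect of very low SDAs
    return np.mean(((a - b) / np.maximum(a, floor)) ** 2)
```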
7.1.1. Data
For this assessment of neural network capability, data from two model tests conducted at MARIN is used. During these model tests, both surface height and pitch motion are measured. Each model test lasts 3.5 hours and the measurement sample rate is 4 Hz. Both datasets are split, with the first 80% (Figures 7.1 & 7.5) serving as training data and the last 20% (Figures 7.3 & 7.7) as test data. The noise contained in the signal is mitigated using the Matlab 'smoothdata' function, which returns a moving average of the signal. The result can be seen in two snapshots of the datasets in Figures 7.2, 7.4, 7.6 & 7.8.
For both model tests, the model is at operational draft (T = ##m in full scale) and the model heading is 0°. The waves are uni-directional and come from 180°. The sea-state conditions are listed in Table 7.1. Sea-state 2 is particularly interesting because the pitch motion of the vessel model is at a lower frequency than that of the waves, as can be observed in Figure 7.6. The observed motion is caused by second order wave drift forces.
Sea-state nr. | Duration [hours] | 4·√(m_0) [m] | T_p [s] | Shape factor γ
1 | ## | ## | ## | ##
2 | ## | ## | ## | ##
7. Time Domain
training times are much longer than for the frequency domain tests. The training run times for these variations vary between 2 and 12 hours.
Variation 1 2 3 4 5 6 7 8 9 10 11 12
Input seconds 15 30 45 60 75 90 105 120 135 150 165 180
Input timesteps 60 120 180 240 300 360 420 480 540 600 660 720
LSTM modules 60 120 180 240 300 360 420 480 540 600 660 720
Variation 13 14 15 16 17 18 19
Input seconds 180 15 300 600 600 600 600
Input timesteps 720 60 1200 2400 2400 2400 2400
LSTM modules 60 720 60 60 60 60 60
Decay 1e-5 1e-4 1e-3
The training curves of the model test network are shown in Figure 7.9. The first twelve topologies form a series of increasing input size and network size: for each additional network input datapoint, an extra LSTM module is added. Each step results in an increase in network performance. At 180 seconds, or 720 timesteps, the number of network nodes is at its maximum. The number of network parameters (Equation 7.1) for 720 LSTM modules and 720 input values is 4,150,080. The increase in network performance is caused by the increase in input values, the increase in LSTM modules, or both. Further variations show that the increase in input values outweighs the increase in LSTM modules, but that both have a positive influence on network performance. A balance between network performance and computational cost is found at 600 seconds of input signal and 60 LSTM modules. As symptoms of too high a learning rate show up after a few epochs, the effect is mitigated using learning rate decay. The network performance of variations 15-19 is compared in Figure 7.10. Network variation 18 (Table 7.3) is selected as the final network. Tuning the optimizer hyperparameters would probably lead to better network performance, as only a few variations are tried, but since network training takes a long time, no further changes are made to the network.
n_θ = 4 · ((n_input + 1) · n_LSTM + n_LSTM^2)    (7.1)

4,150,080 = 4 · ((720 + 1) · 720 + 720^2)
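Equation 7.1 can be verified with a short helper function (a sketch; the symbol names follow Eq. 7.1):

```python
def lstm_param_count(n_input, n_lstm):
    """Trainable parameters of a single LSTM layer (Equation 7.1).

    Each of the four gates has input weights (n_input x n_lstm),
    recurrent weights (n_lstm x n_lstm) and a bias vector (n_lstm).
    """
    return 4 * ((n_input + 1) * n_lstm + n_lstm ** 2)
```

For 720 input values and 720 LSTM modules this yields the 4,150,080 parameters quoted above.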
Figure 7.10: Modeltest neural network variations (selection), training loss curves
7.1.3. Results
The final network topology (variation 18, Table 7.3) is tested on the test dataset from the second sea-state model test. The network predictions are compared to the measurements in Figure 7.11. In the first stage (in grey), the network is not able to make accurate predictions. The reason for this is the large number of input values of the network: the input consists of 2400 timesteps containing 10 minutes worth of surface height data. During the first ten minutes of the test, the network input signal is therefore not long enough and is supplemented with zeros.
7.1. Wave ⇒ Response
The LSTM modules have an internal memory, which extends the time during which the zeros supplemented to the network input in the first stage have an effect. To determine how many seconds this internal memory lasts, a second test is done starting 750 seconds later in the timeseries. The full timeseries comparison is depicted in Figure 7.13. A zoom-in shows that the two neural networks produce the same output starting 1350 seconds into the timeseries (Figure 7.14). This means the internal memory of the LSTM network does not outlast the effect of the zeros supplemented to the network input.
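The zero-supplementing during the first stage can be sketched as follows (`padded_window` is a hypothetical illustration, not code used in the study):

```python
import numpy as np

def padded_window(series, t, n_input):
    # input window ending at sample t; left-padded with zeros while fewer
    # than n_input samples are available (the grey first stage of the test)
    start = t - n_input + 1
    if start < 0:
        return np.concatenate([np.zeros(-start), series[:t + 1]])
    return series[start:t + 1]
```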
Part of the first ten minutes of the timeseries is characterized by large vessel motions. After ten minutes, when there are no supplemented zeros left in the input, the network tends to slightly under-predict the vessel motions (Figure 7.12). This is a result of the large vessel motions, which are rare in the training set. Overall, the network predictions are in phase and the amplitudes also match quite closely. The predictions for the full timeseries are displayed in Appendix D. The network makes accurate predictions for this sea-state and can capture second order vessel motions caused by wave drift forces. However, because of the dominance of the second order vessel motions, the first order response is of low priority in the network and there are instances where these first order motions are not captured.
The network is also tested on the first sea-state, where the first order response is dominant. Because these first order vessel motions can be described by diffraction software, the neural network output is compared with RAO based predictions as well as measurements. The results are displayed in full in Figure 7.15, with a zoom-in of seconds 500-1000 in Figure 7.16 and the rest in Appendix D.
As both the diffraction based prediction and the neural network based prediction correspond well with the measurement data, it is hard to judge which method produces the best
results. Quantifying the timeseries correlation by means of the Pearson correlation coefficient is a way to solve this issue. Equation 7.2 shows the correlation coefficient definition, with A being the measurement timeseries and B the predicted timeseries; μ and σ represent the mean and the standard deviation of the corresponding timeseries. As the Pearson correlation coefficient uses a form of normalization (the correlation coefficient between A = sin(t) and B = 2·sin(t) is 1), the two prediction methods are also judged on the basis of the mean squared error (Equation 7.3).
ρ(A, B) = (1/(N-1)) \sum_{i=1}^{N} ((A_i - μ_A)/σ_A)((B_i - μ_B)/σ_B)    (7.2)

L_MSE = (1/N) \sum_{i=1}^{N} (A_i - B_i)^2    (7.3)
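Both measures are straightforward to compute; a numpy sketch (the function names are illustrative, not from the thesis):

```python
import numpy as np

def pearson(a, b):
    # Eq. 7.2: sample Pearson correlation coefficient
    za = (a - a.mean()) / a.std(ddof=1)
    zb = (b - b.mean()) / b.std(ddof=1)
    return np.sum(za * zb) / (len(a) - 1)

def l_mse(a, b):
    # Eq. 7.3: mean squared error between measurement a and prediction b
    return np.mean((a - b) ** 2)
```

The normalization property mentioned above is easy to check: pearson(sin(t), 2·sin(t)) evaluates to 1.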
The Pearson correlation coefficient is calculated for both prediction methods on the test dataset from 600 s to 2500 s. The correlation coefficient for the diffraction method predictions is ρ_D = 0.8636; for the neural network predictions it is ρ_N = 0.8988. The mean squared error is calculated for the same timeseries and yields L_MSE,D = 0.0039 for the diffraction method and L_MSE,N = 0.0030 for the neural network.
Both means of quantifying the results show a better performance by the neural network compared to the diffraction method. However, both methods perform well in the sea-state where the first order responses are dominant, and it would be premature to conclude, based on a single model test, that the neural network has better predictive capabilities in all sea-states where the first order responses are dominant. The neural network does show that it is more robust, as it performs well in both sea-states.
7.2.1. Data
HMC has equipped its vessels with several sensors continuously logging data such as accelerations, motions, stresses, draft and more, as part of an extensive monitoring effort. The Fatigue Monitoring System (FMS) on board of the DCV Balder has been logging since 2009. The Motion Reference Units (MRUs) enable HMC to log the vessel motions in all six degrees of freedom. Only the roll motion is taken into account in this assessment of network potential. There is no reason why this type of network configuration would be more or less suitable for any particular degree of freedom; it is therefore not of value to assess all DoFs separately when merely assessing the potential of this method. For the MRU model, data is selected from a 2010 transit. In the selection, the draft of the vessel is relatively constant (Figure 7.17). The signal of the roll motion is depicted in Figures 7.18 & 7.19.
7.2. Motion Hindcast ⇒ Future Motion
The dataset is not separated into training and test data. Because the dataset is so large, only a small number of epochs is needed in the training process. This means that the network rarely gets fed the same data twice, so there is no risk of overfitting. This also means that the neural network should be able to adapt itself to a different sea-state at the same rate at which sea-states change in the real ocean environment.
MRU Filtering
The MRU signals for the vessel motions are sampled at 25 Hz. The signal is filtered in Matlab using the smoothdata() function, as can be seen in Figure 7.20. This function outputs the running average of the signal.
25 Hz is an unnecessarily high sample rate for the MRU signals: it increases the number of input nodes, which increases training time and causes training difficulties. The sample rate should be high enough to accurately capture the waves, but any higher than necessary will result in training complications. The sample rate was set to 3.125 Hz, as seen in Figure 7.21. For readability, only four sample rates are shown in the figure, although more have been considered.
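The filtering and downsampling can be approximated in Python (a sketch of the Matlab workflow; the window length of 9 samples is an assumption, not a setting from the study):

```python
import numpy as np

def smooth_and_resample(signal, fs_in=25.0, fs_out=3.125, window=9):
    # centered moving average, comparable to Matlab's smoothdata 'movmean'
    kernel = np.ones(window) / window
    smoothed = np.convolve(signal, kernel, mode="same")
    # 25 Hz / 3.125 Hz = 8, so keep every 8th sample
    step = int(round(fs_in / fs_out))
    return smoothed[::step]
```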
Variation 1 2 3 4 5 6 7 8
Input seconds (approx) 15 30 45 60 75 90 105 120
Input timesteps 45 90 135 180 225 270 315 360
LSTM modules 45 90 135 180 225 270 315 360
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
During training, the loss curves seem to converge slowly, but none of the topology changes seems to significantly increase network performance. In Figure 7.22, the training curves are displayed and compared to the moving average of the variance of the roll motion. Figure 7.23 shows a simplified version in which all training curves are represented by their mean. The moving average variance of the roll motion is a measure of how difficult it is for the network to predict the sea-state. A strong correlation can be seen between the two curves, which indicates that the networks are not actually converging.
Figure 7.22: Training loss curves (left y-axis) versus roll motion variance (right y-axis)
Figure 7.23: Mean of training loss curves (left y-axis) versus roll motion variance (right y-axis)
Network topology variations 1-8 follow the same strategy to quickly reduce their training loss: they repeat the last input value as their output value. This ensures a quick descent down the cost function. The networks all end up in a local minimum from which the optimization algorithm is unable to escape. Further topology changes do not yield better results. Tables 7.5 and 7.6 show all tested topology variations, which include MLPs, multi-layer LSTM networks and optimizer changes. All these variations performed worse than the first eight topologies and either ended up in the same local minimum, or diverged in an early stage.
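The local minimum described above corresponds to a simple persistence baseline: for a slowly varying signal, repeating the previous sample already yields a small loss, which explains the quick initial descent. An illustration with a synthetic stand-in for the roll signal (not thesis data):

```python
import numpy as np

# synthetic, slowly varying stand-in for the roll motion at ~3.125 Hz
t = np.arange(0.0, 600.0, 0.32)
roll = np.sin(0.5 * t)

# persistence "prediction": repeat the previous sample as the forecast
persistence = roll[:-1]
loss = np.mean((roll[1:] - persistence) ** 2)

# the persistence loss is far below the signal variance (the loss of
# simply predicting the mean), so it resembles rapid convergence
```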
Variation 9 10 11 12 13 14 15 16 17 18
Input seconds (approx) 60 60 60 60 60 60 60 60 60 60
Input timesteps 180 180 180 180 180 180 180 180 180 180
Layers 3 7 12 22 12 22 12 22 12 22
Layer size 500 500 500 500 500 500 500 500 500 500
Activation function Linear Linear Linear Linear TanH TanH ReLU ReLU Sigmoid Sigmoid
Optimizer Adam Adam Adam Adam Adam Adam Adam Adam Adam Adam
Learning rate 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
Variation 19 20 21 22 23
Input seconds (approx) 60 60 60 60 60
Input timesteps 180 180 180 180 180
Layers 3 3 3 4 5
LSTM modules (per layer) 180 180 180 Linear Linear
Optimizer SGD SGD Adam Adam Adam
Learning rate 0.001 0.01 0.1 0.0001 0.0001
Can a neural network based model be used as an alternative to the RAOs calculated
by a diffraction based method?
The results from the artificial environment in Chapter 6 show that, in theory, all four proposed neural network architectures are capable of acting as an alternative to an RAO. As the 2D network provides the most information and gives accurate results in the artificial environment, it is the best option for tests with real world data.
A distinction is made between projects and transits. During projects, when the vessel sails at operational draft, the diffraction based calculations are accurate enough to provide reliable predictions for workability analysis, fatigue damage prediction, etc. During transits, the vessel sails at an inconvenient draft, and non-linearities and other effects mean that the diffraction based calculations fail to represent reality.
When tested on project data, the neural network based model is able to attain the same accuracy as the diffraction based calculations, which makes it a viable alternative for these cases. The spread around the y(x) = x line in the SDA plot (Figure 6.45) is most probably due to uncertainties such as weather data, vessel manoeuvres, deck activity and non-linear hydrodynamic effects.
For the model that was tested on transit data, the results show that the vast majority of the predictions follow the right trend; however, the SDAs of some test examples are significantly under-predicted by the network. Part of the spread around the y(x) = x line (Figure 6.45) can be ascribed to the same causes as in the project data tests: uncertainties such as weather data, vessel manoeuvres, deck activity and non-linear hydrodynamic effects. The two main differences in circumstances compared to the project data are the shallow draft and the forward speed, which introduce extra non-linearities. The forward speed also changes the encounter frequency of the incoming waves. As the network input (a 2D wave spectrum) is adjusted for heading but not for forward speed, the network will adapt to the mean forward speed (or a speed close to it), but it is not capable of adjusting for changes in forward speed, as speed is not provided as an input.
A neural network based model shows promise as an alternative to a diffraction based calculation method. Training a neural network in this way does, however, take a large amount of data, and using the model on sea-states that are not in the training dataset may cause reduced accuracy.
8. Discussion
Can a neural network based model be used as an alternative to an RAO in the time
domain?
One of the researched potential time domain uses of a neural network is a model that acts as a real time transfer function between waves and vessel motion. Two datasets from model tests at operational draft, containing surface height and pitch measurements, are used to train and test the network. The first dataset contains a sea-state that invokes first order vessel motion responses; the sea-state from the second dataset invokes mainly second order responses.
A network topology with 2400 input values containing a 10 minute, 4 Hz surface height signal, 60 LSTM modules and a single linearly activated output node is chosen after a topology optimization process. Results are displayed in Appendix D. The results from both datasets show that the network is able to predict vessel motions with an acceptable accuracy.
From the model test results it can be concluded that there is potential in a neural network based model acting as a real time transfer function. It can predict first order vessel motions with a similar accuracy as the diffraction based method. Unlike the conventional RAO based method, it is not limited to linear computations, as it showed it is able to predict vessel motions that are caused by second order wave drift forces.
Frequency domain
• All four proposed architectures (1D, 2D, 2D-SDA & Parametric) tested in the artificial
environment show satisfactory results.
• The 2D architecture is tested on project data with the Balder at operational draft. A single hidden layer neural network with dropout and regularization was the best performing network after a topology selection process.
• The 2D architecture is tested on project data with the Balder at operational draft. The
neural network predictions show similarly good results as the predictions based on the
RAOs.
• The 2D architecture is tested on transit data with the Balder at transit draft. The neural network predictions show a closer match to the measurements than the predictions based on the identified RAOs.
Time domain
• The results from the first dataset show that the neural network can predict first order pitch motions with similar accuracy as a diffraction based method.
• The results from the second dataset show that the neural network can predict second
order (pitch) motions that a first order diffraction based model is not able to.
• All tested topologies either diverge, or end up in a local minimum in which the network mimics its input. Attempts to predict future motions based only on past motions were therefore unsuccessful.
9. Conclusions
• The overfitting mitigation measures Dropout and Regularization increase the robustness of the training process.
Frequency Domain
• Make separate networks per draft range using the final topologies from Sections 6.3 & 6.4. There will be a trade-off between the maximum network accuracy and the amount of training data when choosing a draft range. These networks can be made for any vessel.
• The current model test networks are trained in a few epochs. This cannot be done real time. Develop a network training process in which the network is trained multiple times on recent data, while also being trained real time.
• Test if the network works with input from FutureWaves, using the dominant wave heading.
• Develop a network that uses the entire heading spectrum that FutureWaves provides.
• If computational capability allows it, increase the number of LSTM modules and input values (longer timeseries). This should increase network performance.
10. Recommendations
The vector fitting method turned out not to be applicable in this case. The main reasons for this are the high number of zeros and the low resolution. The zeros form a problem because the vector fitting method produces a form of polynomial, which in general cannot accurately describe flat lines. The low resolution is a problem because the surface of the wave spectrum is already described with too few values to produce a smooth line. The method works best when one wants to go from hundreds of values to dozens, not from dozens to only a few, simply because describing the complexity of the wave spectrum needs at least as many parameters as the current method uses.
B
Universal approximation theorem
The universal approximation theorem states that a neural network with a single hidden layer containing a finite number of nodes can approximate any continuous function on a bounded and closed subset of Euclidean space. This does impose some constraints on the activation function.
In 1989, this was proven for the sigmoid activation function by George Cybenko [3]. A visual representation of the proof by Cybenko is depicted in Figure B.1. It shows how a neural network with two hidden nodes and a sigmoidal activation function is capable of representing a step function. Any function within the bounds of the universal approximation theorem can be approximated by a finite number of step functions. From this it follows that any continuous function on a bounded and closed subset of Euclidean space can be approximated by a neural network with a single hidden layer with a finite number of nodes and a sigmoidal activation function.
Figure B.1: Step function by a single hidden layer neural network using f(I_i) = 1/(1 + e^{-I_i})
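The construction of Figure B.1 is easy to reproduce numerically: the difference of two steep sigmoidal hidden nodes approximates the indicator of an interval, the building block of the step-function argument (a sketch; the steepness k is an arbitrary choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bump(x, a, b, k=50.0):
    # two hidden nodes with sigmoidal activation: an upward step at a
    # minus an upward step at b approximates the indicator of [a, b]
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))
```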
Later, the proof was delivered that single hidden layer networks are universal approximators as long as the activation function is continuous, bounded and non-constant [9].
Having established that neural networks are universal approximators does not guarantee that a neural network can give accurate predictions. It is guaranteed only under the conditions that the training data is representative of the test data, that enough training data is provided, that the training process is not bound by computational cost and that the neural network is not bound to a certain size.
C
MLP for timeseries
Recurrent neural networks such as LSTM networks are designed for time series prediction; multilayer perceptrons are not. Figure C.1 visualizes how a timeseries is 'fed' to an MLP network.
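The feeding scheme of Figure C.1 amounts to sliding a fixed-size window over the series (an illustrative helper, not code from the study):

```python
import numpy as np

def make_windows(series, n_input):
    # each window of n_input past samples is one MLP input example;
    # the sample directly after the window is the target
    X = np.array([series[i:i + n_input] for i in range(len(series) - n_input)])
    y = series[n_input:]
    return X, y
```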
D
Modeltest predictions
[2] M.A. Cauchy. Méthode générale pour la résolution des systèmes d'équations simultanées.
[4] M.C. Deo. Artificial neural networks in coastal and ocean engineering. 39(4):589–596.
[5] A.A. Elshafey, M.R. Haddara, and H. Marzouk. Estimation of excitation and reaction
forces for offshore structures by neural networks. 1(1):1–15.
[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
http://www.deeplearningbook.org.
[7] HMC. Fatigue Tool - Balder tower fatigue during transit.
[10] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators.
[11] M. Khashei and M. Bijari. An artificial neural network (p, d, q) model for timeseries forecasting. pages 479–489.
[12] Y. Kim, H. Kim, and I. Ahn. A study on the fatigue damage model for gaussian wideband
process of two peaks by an artificial neural network. 111:310–322.
[13] D.P. Kingma and J. Lei Ba. Adam: A method for stochastic optimization. In ICLR.
[15] C.H. Lee and J.N. Newman. Computation of wave effects using the panel method. WIT
Press.
[16] W.S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity.
5(4):115–133.
[17] S. Namekar and M.C. Deo. Application of artificial neural network model in estimation
of wave spectra.
[18] S.J. Nowlan and G.E. Hinton. Simplifying neural networks by soft weight-sharing. pages
473–493.
[21] I. Rentoulis. Improvement of fatigue damage prediction for the Balder J-lay tower during transits.
[22] H.P. Riedel and A.P. Byrne. Random breaking waves - horizontal seabed. In Coastal Engineering Proceedings, pages 903–908.
[24] D. Skandali. Identification of response amplitude operators for ships based on full scale
measurements.
[25] N. Srivastava et al. Dropout: A simple way to prevent neural networks from overfitting. (15):1929–1958.
[27] H.L. Tolman, V.M. Krasnopolsky, and D.V. Chalikov. Neural network approximations for nonlinear interactions in wind wave spectra: direct mapping for wind seas in deep water. 8:253–278.
[29] S.F. Yasseri, H. Bahai, H. Bazargab, and A. Aminzadeh. Prediction of safe sea-state using
finite element method and artificial neural networks.
[30] J.H. Yi, J.S. Park, and K.S. Lee. Long-term strain measurement on a jacket-type offshore structure and neural networks based prediction model.