
Parallel Computing and Optimization with

COMSOL Multiphysics

Sol Maja Bjørnsdotter Fossen

Master of Energy and Environmental Engineering


Submission date: July 2017
Supervisor: Robert Nilssen, IEL

Norwegian University of Science and Technology


Department of Electric Power Engineering

Parallel Computing and Optimization with COMSOL Multiphysics
Fossen, Sol Maja B.

Master student
Department of Electric Power Engineering
Norwegian University of Science and Technology
Trondheim, Norway
Email: solmajab@stud.ntnu.no

Abstract—In this thesis the parallel capabilities of COMSOL Multiphysics are investigated. A description of how one can run COMSOL on a Linux cluster is presented. The speedup was found to be poor for medium-sized simulations with less than 10 million degrees of freedom. The speedup for parametric sweeps was found to be excellent. Particle swarm optimization (PSO) was implemented using LiveLink for Matlab, and run on the supercomputer at NTNU. It was found to perform very well without any tuning of the algorithm.

I. INTRODUCTION

Motivation
This thesis was motivated by the wish of the electrical power department to run larger FEM simulations.

Research objectives are often restricted by the computational resources available. There is a trade-off between the accuracy of a simulation and the simulation time, and simulations will be created with time and memory constraints in mind. When it comes to FEM software, models are usually simplified in order to achieve a reasonable simulation time, for example by using linear models instead of complex models.

NTNU has a supercomputer (Vilje) on campus which is available for students and PhD candidates, but it is being utilized by master students to a very small degree. Some reasons that Vilje has not been utilized more are:
• No awareness: students do not know it exists, or that it is available to them
• Limited knowledge: students do not know how to use it
• Limited need: master projects seldom require heavy computations

There is a gap between the electrical power department and the world of cluster computing. The goal of this thesis is to bridge that gap, so that future students are not restricted by the computational resources of a laptop.

COMSOL Multiphysics is the FEM software chosen by NTNU's electrical power engineering department. It is a powerful and versatile simulation software with cluster capabilities.

Scope of Work
The main objective at the start of this thesis was to gain experience in running COMSOL Multiphysics on the supercomputer Vilje. The most common types of jobs that require a great deal of computational resources are
• Large 3D simulations with millions of degrees of freedom
• Time dependent simulations
• Optimization methods that run hundreds or thousands of simulations

A general description of how one can run COMSOL models on a cluster will be presented.

In master projects, time dependent simulations and optimization are perhaps the most common jobs. Of these, optimization is the type of job that will benefit the most from parallel computing. Different optimization methods are reviewed, and a global optimization method is selected for implementation on Vilje. The goal is to 1) describe a method for how one can optimize COMSOL models on Vilje, and 2) use the method to optimize a model from fellow student Charlie Bjørk.

This thesis is partially meant to be a "how to" manual for future students. Hopefully this work will make it easier for future master students to utilize high performance computing in their COMSOL projects.

II. SCIENTIFIC AND PARALLEL COMPUTING

Scientific computing can be defined as "the study of how to use computers to solve mathematical models in science and engineering". Scientific computing emerged in 1938 when Konrad Zuse built the first programmable computer in order to solve systems of linear equations [1]. Before that, scientists had to make many simplifications to be able to solve a problem by hand, and only simple PDEs could be solved accurately. Since then, scientific computing has become an increasingly important enterprise for researchers and engineers. Most industrial sectors rely on scientific computing when developing new designs and products. The development is driven by the need to solve larger and more complex problems.

Scientific computing combines mathematical models and numerical analysis to solve complex problems. The first step is to use understanding of the physical problem to set up a suitable mathematical model. The model will in most cases consist of differential equations and a number of initial and boundary conditions [2]. Numerical methods and computer science can then be used to solve the system of equations.
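As a simple illustration of what such a model looks like (a generic textbook example, not one of the models studied in this thesis), stationary heat conduction in a domain Ω can be written as

    −∇ · (k ∇T) = Q  in Ω,
    T = T0  on one part of the boundary,
    −k ∂T/∂n = q0  on the rest of the boundary,

where the temperature field T is the unknown. The finite element method described below turns problems of this form into systems of algebraic equations that a computer can solve.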
Fig. 1: Different element shapes [3]

There are many different methods available for solving linear and non-linear PDEs. One of the most practical numerical methods is the finite element method (FEM).

A. The Finite Element Method

The finite element method was discovered by several people independently, and gained traction in the 1960-70s. One of the main advantages of FEM is that it can handle complex geometry and material topology, and the ability to include general boundary conditions. FEM can solve systems made up of many different geometric and material regions.

The main principle of FEM is to divide a domain into many small, discrete elements, and apply the relevant differential equations to each element. A system of equations is constructed, and the system can then be solved using linear algebra or numerical schemes.

FEM can be divided into five separate steps:
1) Discretize the domain into finite elements. The most common elements are triangular and rectangular linear elements for 2D, and tetrahedral, hexahedral and pyramid elements for 3D [4]. The elements are connected together at nodes.
2) Select interpolation functions. FEM is discrete in nature, meaning that the solution is not computed for every point in the domain. The unknown variables are found at the nodal points. To approximate the variables at all points inside an element, an interpolation function is used. The type of interpolation function depends on the shape of the element. The interpolation function is often a polynomial, where the degree depends on the number of nodes in an element.
3) Derive the finite element equations. There are two popular methods of deriving the finite element equations, the variational approach and the Galerkin approach [5]. The Galerkin approach is a special case of the method of weighted residuals, and is the most popular due to its greater generality [5]. Boundary conditions are also imposed.
4) Assemble the element equations. Assembly is the process of adding all the element matrices together to form the system matrix [5]. For a model with m unknown nodal values, the system matrix will be an m × m matrix. The connectivity of the elements is used to assemble all the equations together to a linear system of equations Ax = b, where A is called the system matrix, or the stiffness matrix. The system matrix will be square, symmetric, singular and in most cases sparse. The sparsity of the matrix depends on the shape of the elements and the connectivity of the nodes. Normally, each node is only connected to the neighboring nodes in the mesh, so most terms in the system matrix will be zero. Some cases can have nodes connected to non-neighboring nodes (for example models using radiative heat transfer), which results in a denser system matrix. Assembling the system matrix is one of the most memory-intensive steps when computing a solution. Denser system matrices will require much more memory. The number of degrees of freedom and the sparsity of the system matrix determine the memory requirement of a model.
5) Solve the system of equations. The matrix system is solved using either direct or iterative methods.

B. Parallel Computing and Speedup

Depending on the problem, scientific computing is done on everything from personal computers to supercomputers. More powerful computers and more efficient algorithms allow scientists to solve larger and more realistic problems. Supercomputers often use parallel processing, so the development of parallel numerical methods has been an important area of research.

The speedup is defined as the ratio of the time T1 used by one processor to the time TP used by P processors:

    SU = T1 / TP    (1)

The speedup for partially parallel problems is determined by how much of the problem can be parallelized. Let f be the part of the computation, by time, that must be done serially [6]. Then (1 − f) is the amount of time that can be completely parallelized. The run time when using P processors is then

    TP = f + (1 − f)/P    (2)

The speedup SU is then

    SU = T1 / TP = 1 / (f + (1 − f)/P) = P / (f(P − 1) + 1)    (3)

Expression (3) is known as Amdahl's law, and it gives the theoretical maximum speedup for a job. Amdahl's law is plotted in figure 2.

The gained benefit of using a supercomputer is not always as large as one would expect. For high values of f, increasing the number of processors is not going to increase the speedup. Even a small serial fraction will limit the maximum speedup greatly. If the computational resources go toward infinity, a 95% parallel job still only has a maximum theoretical speedup of 20.

Problems can be classified according to the degree of parallelism:
• Embarrassingly parallel problems are problems which are easy to separate into a number of parallel tasks.
There is little communication between the tasks, and no dependencies between them. A Parametric Sweep in COMSOL Multiphysics is embarrassingly parallel: each simulation is completely independent of the others. Of course, each simulation in the sweep has its own degree of parallelism.
• Partially parallel problems are problems where part of the problem is parallelizable. All FEM models are at least partially parallel, because solving a system of linear equations is a partially parallel problem.
• Inherently serial problems are impossible to divide into a number of parallel tasks, because the tasks all depend on each other. A good example of this is time dependent studies, where the initial values for a time step n are given by time step n − 1. It is not possible to parallelize the time step simulations, so the speedup is determined by the possible speedup of a single time-step simulation.

Fig. 2: Amdahl's law

For inherently serial problems, throwing more computing power at the problem will gain you little in speedup. In some cases, it might even take longer to solve a problem with more resources. When more processors are added to a parallel job, the computational task for each becomes smaller, and the communication overhead increases. Communication overhead is the time not spent on running the job, but on things like distributing the job, collecting the results and communication between nodes. Figure 3 shows how the speedup decreases because of communication overhead.

Fig. 3: Ideal speedup vs. realistic speedup [7]

There are two ways to benefit from parallel computing. With more computational resources one can split a problem into smaller subproblems, reducing the computation time. This is called strong scaling (speedup). Weak scaling, on the other hand, is keeping the work per node constant while increasing the number of nodes. Weak scaling allows you to solve larger problems in a constant amount of time.

C. Supercomputers (Vilje)

The term "supercomputer" is changing as computers are rapidly improving. A normal smart phone today could be considered a supercomputer 20 years ago. Some terminology regarding supercomputers is given in table I.

TABLE I: Some terminology regarding supercomputers
Core       The basic unit that performs calculations.
Processor  The central processing unit in a computer. It contains cores and supporting hardware. A processor can have more than one core.
Node       A physical, self-contained computer. Each node on Vilje consists of two processors, with 8 cores each.
Cluster    A connected group of computers (nodes), working together.

The famous Moore's law (1965) states that the number of transistors per chip doubles every two years. This law held true for many years, but in recent decades the law has been stalling. In order to increase performance in the post-Moore era, the industry has been moving towards parallel processing, specialized chips and reconfigurable chips.

The national e-infrastructure for universities in Norway consists of four supercomputers: "Abel" at UiO, "Vilje" at NTNU, "Hexagon" at UiB, and the new supercomputer "Fram" at UiT. "Fram" was installed at UiT in the spring of 2017, and it will eventually replace Vilje and Hexagon. The new supercomputer is currently rated as one of the world's top 200 supercomputers [8]. The lifetime of "Fram" is estimated to be four years. Uninett Sigma2 is responsible for the operation and maintenance of the e-infrastructure for universities.

III. RUNNING COMSOL MULTIPHYSICS IN PARALLEL

In this chapter the parallel capabilities of COMSOL are looked into. A detailed explanation of how to run COMSOL on a cluster is presented.

COMSOL Multiphysics is a flexible finite element analysis software geared towards research purposes. The interface makes it easy to set up complex problems without extensive knowledge about the underlying mathematics or physics.

COMSOL Multiphysics has two modes of parallel operation:
• The shared memory model. When running on a personal computer, COMSOL uses shared-memory parallelism. On personal computers all the cores have access to the same memory, and the communication between the cores is as fast as the memory access. COMSOL will by default use all the available cores.
• The distributed memory model. When running COMSOL on a Linux or Windows cluster consisting of several computers, the distributed memory model is used. The nodes in a cluster do not share the same memory, and the speed of communication between nodes will depend on the physical distance between them. COMSOL uses MPI for node-to-node communication on a cluster. MPI is the dominating message passing protocol for parallel computing.

Both memory models are combined to give the best performance. When you start a COMSOL job on a cluster, COMSOL will use the shared memory model within each node, and MPI to communicate between nodes. It is also possible to use MPI (distributed mode) to communicate between the cores within a CPU, but this will be slower than using the shared memory mode.

An advantage of COMSOL Multiphysics is that it will set up and manage the communication between nodes automatically. Little knowledge about parallel computing is required to use it, but in order to get the most out of distributed mode, the model should be set up with parallelism in mind. Choosing the right solver is important in order to best utilize a computing cluster.

A. Solvers

There are two main methods for solving systems of linear equations: direct and iterative methods. Direct methods require a fixed, deterministic number of steps to produce a solution. Gaussian elimination and LU factorization are some examples of direct methods. Iterative solvers improve on an initial guess for each iteration. The process can be repeated until the residual Ax − b is sufficiently close to 0.

Direct solvers are very robust and will work on most problems, but iterative methods are usually more efficient in both time and memory consumption. Because of this, the default solver uses a direct method only for 2D problems and for smaller 3D problems.

The initial sparseness of the system matrix will not be maintained when using a direct solver. Many of the zero terms will become non-zero during the solution process [10]. This is called "fill-in", and it is undesirable because it increases the memory requirements and the number of arithmetic operations. Different strategies can be used to reduce the fill-in.

Iterative methods can be challenging to set up for complex multi-physics models, as they are less robust. There is a trade-off between the robustness of a solver and its time and memory requirements. The fastest iterative solvers are the least robust, and do not work for all cases.

There are many iterative solvers available in COMSOL, but they are all similar to the conjugate gradient method. Iterative methods rely on good preconditioners to be efficient. A widely used preconditioner is the geometric multigrid (GMG) technique, which can handle a large class of problems [?]. Preconditioners can use more time and memory than the iterative solver itself [11].

There are many iterative solvers to choose between in COMSOL, but two of the most widely used are multigrid methods and domain decomposition.

Fig. 4: A schematic description of the full multigrid algorithm [12]

Multigrid Solvers
Basic iterative methods remove the high frequency errors quickly, but use a long time to remove the low frequency errors. The multigrid method is based on the observation that if a problem is transferred to a coarser grid, the highest frequency errors disappear and the low frequency errors turn into higher frequency errors. Transferring the problem to a coarser grid is called restriction.

The first step in multigrid methods is to remove the high frequency errors with a basic iterative method, giving the solution xl. The residual

    r = b − A · xl    (4)

of the solution is computed, and the system is restricted from a mesh with size h to a mesh with size 2 · h. The solution on the mesh n · h is then prolongated up to the mesh (n/2) · h by interpolation.

In order to solve the residual on the coarser mesh, the system can again be transferred to a coarser grid. This can be repeated recursively until the system is small enough to solve with direct solvers. This recursive scheme is called the multigrid V-cycle. Multigrid methods are popular because of their rapid convergence rate. The computational work increases linearly with the number of unknowns.

Domain Decomposition
Domain decomposition works by dividing model domains into sub-domains, and solving the problem for each sub-domain. The total solution is then found by iterating between the computed solutions for each sub-domain, using the current neighboring sub-domain solutions as boundary conditions [11].

To solve each sub-domain, a "Domain solver" is used. In COMSOL, MUMPS is the default domain solver, but there is a wide range of solvers to choose between. The idea is to divide the domains into small enough pieces that direct solvers are efficient. Domain decomposition is combined with a global coarse solver to accelerate convergence [13]. Figure 5 shows the domain decomposition tree in COMSOL, with the coarse solver and the domain solver nodes.
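The trade-off between direct and preconditioned iterative solvers can be illustrated outside COMSOL with a few lines of MATLAB on a small model Poisson matrix. This is an illustration only, not COMSOL's solver machinery:

    % Illustration only: a sparse SPD test matrix (2D Poisson problem), solved
    % directly and with a preconditioned iterative method.
    n = 200;
    A = delsq(numgrid('S', n));          % sparse, symmetric system matrix
    b = ones(size(A, 1), 1);             % right-hand side

    x_direct = A \ b;                    % direct solve (factorization causes fill-in)

    L = ichol(A);                        % incomplete Cholesky preconditioner
    [x_iter, flag, relres, iters] = pcg(A, b, 1e-8, 500, L, L');

    r = b - A*x_iter;                    % residual, cf. equation (4)
    fprintf('pcg converged in %d iterations, residual norm %.2e\n', iters, norm(r));

Running the same pcg call without the preconditioner typically needs far more iterations, which is the point made above about the importance of good preconditioners.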
Domain decomposition is especially useful for parallel computing, as each sub-domain problem is independent of the others. The default method is the overlapping Schwarz method, where the domains overlap by more than the interface.

Fig. 5: A study tree in COMSOL 5.2 using the "domain decomposition" solver

All iterative solvers (except Incomplete LU) in COMSOL are parallelized. Setting up the solvers to maximize the speedup can be challenging. From the COMSOL blog [?]: "It is also fair to say that setting up and solving large models in the most efficient way possible is something that can require some deep expertise of not just the solver settings, but also of finite element modeling in general."

B. Meshing in Parallel

The free mesher in COMSOL runs in parallel both in shared memory mode and in distributed mode. The free mesher starts by meshing the faces in a model, and then moves on to the interior volumes of the domains. After the faces between two domains are meshed, the domains can be meshed independently of each other, and the two jobs can be distributed on the available cores.

The free mesher will distribute the domains automatically, but it cannot divide a domain into sub-domains in order to parallelize the job further. If there is only one domain in the model, which can be the case for some imported CAD models, there will be limited speedup from using more processors.

Reducing the meshing time by partitioning
In order to parallelize the meshing, the domains can be partitioned manually. To do this, add a work plane to the model. A "Partition Objects" geometry operation can then be used to partition the domain with the work plane. If you only want the partition to affect the meshing, and not the geometry, the geometry operation "Virtual Operations" is useful. It allows you to partition a model only for meshing purposes, without influencing the physics settings of the model.

To test the effects of partitioning on meshing time, two simple models were created:
1) Model "Square" consists of a simple 3D box geometry. The mesh size was set to extremely small.
2) Model "Coil" consists of a simple coil with 10 windings. The mesh was set to "Free tetrahedral", and the size to extremely small. See figure 6.

Fig. 6: A coil geometry before and after partitioning with work planes: (a) the coil before partitioning, (b) the coil after partitioning into 2 domains, (c) the coil after partitioning into 20 domains

The models were meshed with a different number of partitions to investigate the speedup. The simulations were run on my laptop, which has an Intel Core i7-3720QM CPU with 4 physical cores, and 8 GB of memory. The results are plotted in figures 7 and 8.

Fig. 7: Meshing time for the model "Box" as a function of the number of domains in the model: (a) large scale, (b) small scale

In figure 7 b) it is possible to see that a parallel job is most efficient if the number of jobs is divisible by the number of cores.
The lines for 4 cores (blue line) and 3 cores (green line) have local minima at [4, 8, 12] and [3, 6, 12] respectively. This effect becomes smaller and smaller as the size of the jobs decreases. For one core, there is a very small speedup in the meshing time when partitioning the model. When the number of domains goes over 100, the meshing time increases. According to COMSOL support, this is because of the increased number of faces in the model.

Fig. 8: Meshing time for the model "Coil" as a function of the number of domains in the model

For the coil geometry, one can see in figure 8 that there is a very good speedup even for the single core case. According to COMSOL support, this is because the faces of the coil have a complicated surface parametrization, which means that the majority of the meshing time is spent generating the surface mesh. When the coil is partitioned, the resulting faces get less complex, and the surface meshing time decreases.

COMSOL will mesh a meshing sequence serially in the order the nodes are listed. This means that if there are, for example, several "Free Tetrahedral" nodes, only one of them will be meshed at a time. As table II shows, the meshing time increases when there are more mesh nodes in the meshing sequence.

TABLE II: Meshing time for model "Coil" (20 domains) for a different number of meshing nodes in the meshing sequence
20 "Free Tetrahedral" nodes    5.4 minutes
10 "Free Tetrahedral" nodes    4 minutes
1 "Free Tetrahedral" node      2.5 minutes

If you are setting up a parametric sweep that only changes one part of the geometry, it is a good idea to put the meshing node for that part at the bottom of the list. COMSOL will start at the top of the meshing sequence and check whether the geometry corresponding to that mesh node has changed. If it has, that node and all the nodes under it will be meshed again.

Partitioning the mesh is perhaps more important for models that are being run serially, like in time studies or for some optimization algorithms. Saving meshing time in each step can give a good reduction in the total simulation time. However, this is only important if there are large or complex domains in the model.

C. Running COMSOL on a Cluster

In this chapter, a detailed explanation of how to run COMSOL on a cluster is given. The speedup of single simulations and parametric sweeps is tested and discussed. In the end, solutions to some common problems are given.

To run COMSOL on Vilje/Fram, there are 4 basic steps:
• Prepare the model
• Create/edit a job script
• Move the files to the cluster, and submit the job
• Move the resulting output file to your personal computer

In addition to this, before you run COMSOL the first time on a cluster you need to log in to COMSOL with a username and password. To do this, write "module load comsol/5.2" and then start COMSOL. You will be asked for a username and password. After this, press Ctrl+C to close the program. It is important to quit COMSOL because you are not supposed to run programs directly on Vilje. Luckily, you only have to do this once.

1) Preparing the model: Under the "Study" node in COMSOL, there are several options for cluster computing. These are "Batch", "Batch Sweep", "Cluster Computing", and "Cluster Sweep". These nodes are all mainly for running COMSOL on a cluster remotely from your computer, by simply pressing the compute button. However, it is not allowed to run jobs directly on Vilje, so I have not used this functionality. To run a COMSOL job on a cluster, it is not necessary to add any of the above study nodes. It is mentioned because it can cause some confusion.

Setting up parametric sweeps
If you are running a large parametric sweep, with hundreds or thousands of simulations, storing all the solutions in a file can take up a lot of RAM. If the file grows too large, it can be cumbersome to work with the file on a personal computer afterwards. To avoid this, use probes to measure the relevant variables for each simulation. Create the probes you need by right-clicking on "Component 1 -> Definitions -> Probes". In the settings for the parametric sweep, check the box for "Accumulated probe table". COMSOL will collect all the probe values during the sweep and store them in a table. If the sweep is large, it can be necessary to change the maximum number of rows in the table settings, as the default is 10 000 rows. To avoid storing all the solutions, go to "Output While Solving" and set "Keep solutions in memory:" to "Only last".

Note: when setting up a parametric sweep, remember to check the "Study Extensions -> Distribute parametric sweep" box.

COMSOL will by default stop a parametric sweep if one of the simulations produces an error. To disable this, go to "Study -> Job Configurations -> Parametric Sweep -> Error" and uncheck the "Stop if error" box. If a simulation fails, the sweep will then simply move on to the next simulation.
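If the accumulated probe table is the only result you need, it can also be pulled out of the solved file afterwards with LiveLink for Matlab (described in section IV-C). A minimal sketch, assuming the output file is named out.mph and the accumulated probe table has the tag 'tbl1' (both are placeholders for the names in your own model):

    % Sketch: read an accumulated probe table from a solved model with
    % LiveLink for Matlab. File name and table tag are assumptions.
    mphstart;                          % connect Matlab to a running mphserver
    import com.comsol.model.*
    import com.comsol.model.util.*

    model = mphopen('out.mph');        % load the solved model
    tab   = mphtable(model, 'tbl1');   % extract the accumulated probe table
    data  = tab.data;                  % numeric matrix: one row per sweep case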
Before running a COMSOL model on a cluster, there are a few things you can always do to increase the chance of the simulation running smoothly:
• If possible, check that everything works correctly by running a simulation (on a coarse mesh, perhaps).
• Press "File -> Compact History".
• Clear the mesh by right-clicking on the mesh and selecting "Clear Mesh". Having a partially meshed model can result in a "Domain already meshed" error.
• Clear all solutions.

2) Setting up the job script: To run a job on Vilje, it must first be submitted to the job scheduler. The scheduler controls which jobs run on which nodes at any given time. The scheduler will
• Put the job in a queue
• Allocate resources and decide which job to start
• Report back any output and error messages from the job

The order in which the jobs are started depends on how the jobs are ordered (in the queue), how many resources are available, and how many resources the jobs are requesting. On Vilje, the scheduler PBS is used.

To submit a job to the queue, the user must tell the scheduler how many nodes they request, and how long they estimate their job will take. This is done by creating a job script. A job script is a text file containing information for the scheduler, and the commands you want to run.

Running a "normal" COMSOL job without any interaction is called running a batch job on a cluster. From my experience, the batch job is the most reliable and stable way of running COMSOL on Vilje. The batch command will take in a file name, run all the studies in the model, and store the solution. Figure 9 gives an example of a job script for running a batch job.

Fig. 9: An example job script for running a COMSOL batch file

    1  #!/bin/bash
    2  #PBS -N job_name
    3  #PBS -A project_account
    4  #PBS -l select=2:ncpus=32:mpiprocs=2:ompthreads=8
    5  #PBS -l walltime=24:00:00
    6  #
    7
    8  module load comsol/5.2
    9
    10 cd $PBS_O_WORKDIR
    11
    12 w=/work/$PBS_O_LOGNAME/some_folder
    13 if [ ! -d $w ]; then mkdir -p $w; fi
    14
    15 cp model.mph $w
    16 cd $w
    17
    18 comsol -nn 20 -np 8 -clustersimple -mpiarg -rmk -mpiarg pbs -f nodefile batch -batchlog log.txt -inputfile model.mph -outputfile out.mph
    19
    20 cp log.txt $PBS_O_WORKDIR
    21 cp out.mph $PBS_O_WORKDIR

The lines starting with #PBS contain information for the scheduler. The line

    #PBS -l select=2:ncpus=32:mpiprocs=2:ompthreads=8

tells the scheduler how many nodes and processors the job is requesting. Change "select" from 2 to the number of nodes you want. "ncpus" determines how many cores per node you are given. If you ask for 1 node, the scheduler will reserve the whole node for your job. That means that there is rarely any point in requesting less than all the cores, as nobody else will be able to use them. "mpiprocs=2" sets the number of MPI processes per node. Each node on Vilje has two physical processors which do not share memory, so "mpiprocs" should always be at least 2. If you are running a parametric sweep, this number can be increased depending on how you want to distribute the sweep.

Line 18 in figure 9 starts the simulation. There are two flags which determine how COMSOL is set up on the nodes:
1) -nn 20: The -nn flag tells COMSOL how many COMSOL processes to start in total. If you are running a batch job without a parametric sweep, set this number to 2·N, where N is the number of nodes you are requesting.
2) -np 8: This flag tells COMSOL how many physical cores each COMSOL process can use. In this case, there are two COMSOL processes per node, and 16 cores per node, so -np is set to 16/2 = 8.

The documentation for the rest of the commands in line 18 can be found in the chapter "Running COMSOL -> Running COMSOL in Parallel -> The COMSOL Commands" in the reference manual [11].

When you are running a parametric sweep on a cluster, the number of COMSOL processes will decide how many simulations in the sweep are run simultaneously. The parametric sweep will be distributed over the COMSOL processes, so the number of simultaneous simulations is equal to the -nn value. If you want the parametric sweep to be as fast as possible, set -nn equal to the length of the sweep.

NOTE: The number of MPI processes should be equal to the number of COMSOL processes per node. Set "mpiprocs" to nn/N. Each COMSOL process needs a communication interface.

The size of the simulations should decide how many COMSOL processes you start per node. If the simulation is small, you can run many simulations in parallel on one node. For example, if one core is enough for one simulation, you can solve 16 simulations simultaneously per node by setting -nn to 16·N and -np to 1. Figure 10 shows an example of two different configurations.

3) Submitting and checking a job:
1) Move your COMSOL file to a folder on your home area on Vilje using WinSCP. See the appendix for some information on WinSCP and PuTTY.
2) Create or tweak a job script, and put it in the same folder as your COMSOL file.
3) Log into Vilje with PuTTY and navigate to the correct folder. To submit your job to the queue, type "qsub yourjobscript.pbs". The job should now be submitted to the queue.
4) When the job is submitted, you can check the status with "qstat -u username".
5) When your job is finished, there should be two new files in the folder: one output and one error file, named something like jobname.o/e0455732123. If you are using -batchlog log.txt in the job script, the error file will be empty, as everything is outputted to the log file.
6) If the job finished successfully, use WinSCP to move the COMSOL .mph file back to your own computer. The file should contain a solution.

Fig. 10: Two different ways to distribute a parametric sweep. (a) COMSOL started with nn=4, np=8: 4 simulations in the sweep are started simultaneously, each running on 8 cores. (b) COMSOL started with nn=8, np=4.

D. Testing the Speedup of COMSOL Models

So what kind of speedup can you expect from COMSOL Multiphysics? According to [14], for large COMSOL models the maximum speedup is normally in the range of 8 to 13. Generally, models requiring large amounts of memory have better speedup [14] [11]. The speedup is also largely dependent on the mesh, the solver configurations, and the physics.

1) Speedup of Models: To investigate the speedup, the model "Electronic Enclosure Cooling" from the application library was run on a different number of nodes. The model uses the "Heat Transfer Module" to study the thermal behavior of a computer power supply unit [15]. The model uses the geometric multigrid solver, and it has 1.2 million degrees of freedom.

TABLE III: Simulation times for the model "Electronic Enclosure Cooling"
Nodes  NN  NP  Cores  Time
1      1   2   2      11083
1      1   4   4      6808
1      1   8   8      4929
1      2   8   16     4468
2      4   8   32     3567
3      6   8   48     3316
4      8   8   64     3074
5      10  8   80     3207
7      14  8   112    3187
10     20  8   160    3575

Fig. 11: Speedup for the model "Electronic Enclosure Cooling"

The speedup is plotted in figure 11. The maximum speedup for this model was found to be around 3.5. The speedup is quite bad, and one of the reasons for this might be that the model is too small.

Some speedup tests were also run on a simple model, "Box.mph". The model consists of a simple 3D box, partitioned into 100 pieces. The model was run with two different solvers, geometric multigrid and MUMPS. It was run for two different mesh sizes, resulting in 2.8 and 7.9 million degrees of freedom.

It is clear from figure 12 that the iterative solver performs much better than the direct solver. The speedup of the iterative solver was not better than for the direct solver, and the speedup was very small in both cases.

2) Speedup of Parametric Sweeps: To test the speedup of parametric sweeps, a model by my fellow student Charlie Bjørk [16] was used. A parametric sweep of length 180 was added to the model. The sweep takes approximately 1 hour on my personal computer. The sweep was run in different configurations on Vilje. Figure 13 shows the speedup of the sweep.
The speedup is linear in the beginning, then it flattens out as the number of COMSOL processes reaches the number of simulations in the sweep. Running a parametric sweep in COMSOL is an embarrassingly parallel problem, and the speedup is excellent.

Fig. 12: Simulation times for the model "Box", for different solvers and mesh sizes

TABLE IV: Simulation times for the parametric sweep
Nodes  NN   NP  Time
1      1    8   79 minutes, 21 seconds
12     90   2   6 minutes
12     180  1   6 minutes, 57 seconds
6      45   2   7 minutes, 25 seconds (?)
23     180  2   5 minutes, 54 seconds

Fig. 13: Speedup of a parametric sweep of length 180

3) Common Problems with Distributed COMSOL: Some of the errors I have experienced with COMSOL, and the solutions to them, are listed below.
• Problem: Simulation takes unreasonably long time. When using COMSOL 5.3, COMSOL would often not stop running after finishing a simulation. It would instead keep "running" until the job exceeded the walltime. If the top command shows that all the COMSOL processes are using 100 %
• Problem: Error using the mphserver on Vilje. It is not possible to use the mphserver on more than one node when using COMSOL 5.2, as there is a bug in that version. Often the error message will say something like "Inconsistent numbers of degrees of freedom". According to COMSOL Support, one should use COMSOL 5.3 to fix this. However, I did not manage to get the COMSOL 5.3 mphserver to run smoothly on Vilje. If you can, use the batch mode.
• Problem: The probe table is empty after a parametric sweep. Solution: Delete the solver configurations node and the study node. Add a new study, the required nodes, and a parametric sweep. Don't run a stationary or time-dependent study before you add the parametric sweep. Run a short parametric sweep. Check that a "Parametric Solutions" node shows up under "Solution 1 -> Solver Configurations".
• Problem: No username detected. This error shows up if you accidentally delete your login information in the .comsol folder, or the first time you run a job if you have not created a user.
• Problem: Disk quota exceeded. Delete some of the hidden COMSOL files on your home area. The files in .comsol/v52/configurations and .comsol/v52/workspace can be deleted.
• Problem: No batch sweep detected. Sometimes, if you have messed around with the job configurations before adding a parametric sweep, the parametric sweep node will not appear under job configurations. Try deleting the configurations and creating them again by right-clicking job configurations and selecting "Show Default Solver". If this does not work, try deleting the configurations, then run a single simulation in the sweep, and try creating it again.

IV. OPTIMIZATION

Optimization is the practice of finding the best solution to a problem from all feasible solutions. In most cases, finding a good enough solution is sufficient. How extensively a problem can be optimized is a function of the cost and time of the optimization process. An optimization problem can be expressed as

    min f(x),  x ∈ Rⁿ

subject to the constraints

    Gi(x) > 0
    Gi(x) = 0

where n is the number of control variables, Gi(x) is a set of constraints, and f(x) is the objective function to be minimized. In the case of optimizing a COMSOL model, the control variables can for example parameterize the geometry and materials.

There are two main categories of optimization methods: gradient-based and gradient-free methods. They can also be labeled as stochastic, deterministic or metaheuristic [17]. Table ?? gives some advantages/disadvantages of the two different methods. Gradient-based techniques require the function to be smooth and differentiable. They converge quickly to optima, but can easily get stuck in local optima. Global methods converge slower near optima, but do not get trapped in local optima.
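Gradient-free methods do not need derivatives of f(x), but they still need some way to respect the constraints Gi(x). One common approach, shown below as a sketch (this is not the method used in this thesis; the objective and constraints are placeholders), is to add a penalty to the objective whenever a constraint is violated:

    % Sketch of a penalty approach for handing constraints to a
    % gradient-free optimizer. Objective and constraints are placeholders.
    f = @(x) (x(1) - 2)^2 + (x(2) + 1)^2;      % placeholder objective
    G = @(x) [x(1) + x(2);                      % placeholder constraints, G_i(x) > 0
              4 - x(1)];

    % Large quadratic penalty for any violated constraint.
    penalized = @(x) f(x) + 1e6*sum(max(0, -G(x)).^2);

    penalized([1, 0.5])    % feasible point: penalty is zero, returns f([1, 0.5])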
In 1996 and 1997 David Wolpert and William Macready presented the "No Free Lunch" theorem. They observed that the performance of all search algorithms, averaged over all possible problems, is exactly the same [18]. This includes blind guessing. No algorithm is universally better than another on average. This means that there is no search algorithm that always performs better than others on any class of problems. Instead of looking for the best general optimization algorithm, one must look for the best optimization algorithm for the specific problem at hand. Much research has been done in the last decades on finding the strengths and weaknesses of different optimization algorithms, and on which problems they excel.

Hybrid algorithms take advantage of both local and global optimization methods. A hybrid algorithm uses a global method to find the area of the global optimum, and then switches to a gradient-based method to find the optimum with fast convergence.

A. Metaheuristics

Metaheuristic methods cannot guarantee that the solution is the best solution, but are useful as they make few assumptions about the problem and can handle very large search spaces. Metaheuristics can also solve a wide range of "hard" optimization problems without needing to adapt to each problem [19]. A metaheuristic is useful when a problem cannot be solved with an exact method within a reasonable time, and they are used in many different areas, from engineering to finance. Metaheuristics can solve problems with several control variables. Metaheuristic methods are very often inspired by nature, and often use random variables.

Metaheuristic methods have gained more and more attention over the last 30 years. "The considerable development of metaheuristics can be explained by the significant increase in the processing power of the computers, and by the development of massively parallel architectures. These hardware improvements relativize the CPU time costly nature of metaheuristics." (Boussaid, Lepagnot, Siarry) [19].

Two important processes decide the success of a metaheuristic: exploration and exploitation. Exploration is a global sampling of the search space, with the purpose of finding areas of interest. Exploitation is the local sampling in these areas in order to close in on the optimal solution. The main difference between metaheuristics is how they balance exploration and exploitation [19].

Metaheuristic methods can be divided into single-solution and population based methods. Single-solution based metaheuristics start with a single point in the search space and move around; they are also called "trajectory methods". The most popular trajectory method is Simulated Annealing. Simulated Annealing was proposed by Kirkpatrick et al. in 1982 and is inspired by annealing in metallurgy. The objective function is thought of as the temperature of a metal, which is then lowered to a state of minimal energy.

Population based metaheuristics initialize a set of solutions and move these through the search space. One main advantage of population-based methods over trajectory methods is that they can be implemented in parallel. Evolutionary Algorithms (EAs) and Swarm Intelligence (SI) are the most studied groups of population based methods. EAs are based on Darwin's evolutionary theory and contain, among others, Evolutionary Programming and Genetic Algorithms (GA).

Swarm Intelligence algorithms are methods inspired by the social behavior of insects and birds. Typically the individuals will be simple, and need to cooperate to find a good solution [19]. Together the individuals can perform complex tasks, similar to ants and bees. The most studied SI methods are Particle Swarm Optimization (PSO) and Ant Colony Optimization. Some other examples are Bacterial Foraging Optimization, Artificial Immune Systems and Bee Colony Optimization.

In these algorithms the solutions are imagined to be points in an n-dimensional space where each control variable represents one dimension. The solutions (called particles or insects depending on the algorithm) can then fly or crawl through this multidimensional space, often called the search space.

Fig. 14: Overview of basic PSO

1) PSO: PSO was introduced by Kennedy, Eberhart and Shi in 1995 as a way to simulate the social behavior of a flock of birds or a school of fish. An overview of the generic PSO algorithm can be found in figure 14. PSO is being used on many different problems, from reactive power and voltage control to human tremor analysis [18].

The algorithm works by first initializing a random group of solutions, called particles. The values of the decision variables are the position of the particle. Each particle will have a velocity and a position in the search space. For each iteration (called a time step) the particles will move to a new position and evaluate the fitness of the solution. How a particle moves depends on its own memory of the best found position, and the memory of the neighborhood particles. The neighborhood size can vary from a few to all of the particles in the swarm, depending on the implementation.

The best known position found by all the particles is called the global best, and the best position found by a single particle the local best.
The particles will move based on their own experience, and the experience of their neighbors. As a particle is drawn towards another, it might discover new, better regions, and will start to attract other particles, and so on. The particles will be drawn towards and influence each other similar to individuals in a social group. Interestingly, if looked on as social individuals, one of the driving factors of the behavior of the particles is confirmation bias. From [?]: "...individuals seek to confirm not only their own hypotheses but also those of their neighbors".

Two equations are updated for each particle and time step: the velocity and the position. The position is given by:

    x_p^(i+1) = x_p^i + v_p    (5)

where x_p^i is the position vector for the previous time step, and v_p is a velocity vector. The velocity is given by:

    v_p = v_p + c1·r1·(p_l − x_p) + c2·r2·(p_g − x_p)    (6)
        = v_current + v_local + v_global    (7)

The new velocity is the sum of three vectors: the current velocity, the new velocity towards the local best position and the new velocity towards the global best position. r1 and r2 are uniformly distributed random numbers, r ∼ U(0, 1). The local and global best positions are given by p_l and p_g. Often a Vmax parameter is defined, such that if v_p > Vmax then v_p = Vmax. This is to limit the velocity.

c1 and c2 are called the cognitive and social scaling parameters [20]. These parameters determine the balance between exploration and exploitation: c1 and c2 determine how much the movement of the particle will be affected by the local and global best found positions respectively. A large c2/c1 ratio will restrict the search space quicker and shift the balance towards exploitation, as the particles will be pulled fast towards the global best known position. A small ratio means that the balance will shift towards exploration. The values chosen for c2 and c1 will affect the convergence significantly.

A parameter that can improve convergence was introduced by Shi and Eberhart [21] in 1998. Adding an inertia weight to the particles gives increased control of the velocity. With inertia weight the expression for the velocity becomes:

    v_p = w·v_p + c1·r1·(p_l − x_p) + c2·r2·(p_g − x_p)    (8)

where w ∈ [0, 1] is the weight. w determines how much the particle accelerates or decelerates. A large inertia weight will give the particles higher speed and result in a more global search, while a small w makes them search locally [?]. The velocity needs to be large enough that particles can escape local optima. The benefit of adding inertia weight is that the velocity can be changed dynamically, controlling the transition from exploration to exploitation.

Many different dynamic inertia weight strategies have been introduced. Linear Decreasing Weight simply decreases the inertia weight for each time step. As the particles have explored the search space and are closing in on the global optimum, the speed is reduced, moving from a global search to a local search. Random Inertia Weight simply speeds up or slows down the particle randomly. [20] found in experiments that random inertia weight was the most efficient strategy (fewest iterations) for a handful of benchmark problems, while linear decreasing weight gave near optimum results compared to other methods, but required a very high number of iterations. Random inertia weight is given by

    w = 0.5 + rand()/2    (9)

where rand() is a uniformly distributed random number in [0, 1].

There is no known best population size to initialize a particle swarm [21]. It is important to have enough particles for the algorithm to be effective. The particles need to sample the available solutions early, because the search space will become more and more restricted as the particles move towards the global best position [22]. However, having a large number of particles will limit the number of time steps owing to limited computational resources. [?] recommends, based on experience, using between 10 and 50 particles.

To achieve quick convergence and a small error, it is necessary to tune the parameters w, c1 and c2. Small changes to the parameters can give large changes in the behavior of the particles [?]. Choosing the parameters is an optimization problem in itself, and the optimal values will be different for each problem. A well known, common choice is c1 = c2 = 2, with c1 + c2 ≤ 4 and w in [0.4, 0.9]. Pedersen [23] recommends that c2 > c1.

The size of the neighborhood is often set to be the size of the swarm, that is, all the particles communicate with each other. This is called star structure, and has been found to perform well [18]. The main disadvantage of using only one neighborhood is that the particles cannot explore more than one optimum at a time; they are all drawn towards the same location. How many optima the objective function has should be considered when deciding the size of the neighborhood. For optimization of electrical motors, it seems intuitive that there are several optima, and so using several neighborhoods might be the best solution.

B. Optimization in COMSOL

There are several ways to optimize a COMSOL model:
• Using understanding of the model and intuition to tweak the parameters by hand. One can also set up parametric sweeps manually.
• Using the built-in optimization capabilities of COMSOL.
• Using LiveLink for Matlab to implement your own optimization algorithm.

(Explain why you do PSO, LiveLink?)

C. LiveLink for Matlab

LiveLink for Matlab is a very useful product from COMSOL that allows you to create, edit and run COMSOL models from Matlab scripts. LiveLink for Matlab is one of COMSOL's "interactive" products, a group of products that allow you to work on COMSOL models through other programs.
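Returning briefly to the PSO update rules in equations (5)-(9), the following is a minimal stand-alone MATLAB sketch of basic PSO with random inertia weight and a star neighborhood. It is an illustration only, not the implementation used in this thesis; the objective function, bounds and parameter values are placeholders.

    % Minimal stand-alone PSO sketch of Eqs. (5)-(9). Illustration only;
    % objective, bounds and parameters are placeholders.
    f     = @(x) sum(x.^2, 2);                 % placeholder objective (minimized)
    nVar  = 4;   nPart = 20;   nIter = 100;
    lb    = -5*ones(1, nVar);   ub = 5*ones(1, nVar);
    c1    = 2;   c2 = 2;   vmax = 0.2*(ub - lb);

    x  = lb + rand(nPart, nVar).*(ub - lb);    % random initial positions
    v  = zeros(nPart, nVar);                   % initial velocities
    pl = x;   fl = f(x);                       % local (personal) best positions
    [fg, i] = min(fl);   pg = x(i, :);         % global best position

    for it = 1:nIter
        w  = 0.5 + rand/2;                     % random inertia weight, Eq. (9)
        r1 = rand(nPart, nVar);   r2 = rand(nPart, nVar);
        v  = w*v + c1*r1.*(pl - x) + c2*r2.*(pg - x);   % velocity, Eq. (8)
        v  = max(min(v, vmax), -vmax);         % limit the velocity to Vmax
        x  = max(min(x + v, ub), lb);          % position, Eq. (5), kept in bounds
        fx = f(x);
        better        = fx < fl;               % update local bests
        pl(better, :) = x(better, :);
        fl(better)    = fx(better);
        [fbest, i] = min(fl);
        if fbest < fg,  fg = fbest;  pg = pl(i, :);  end   % update global best
    end
    fprintf('Best objective value found: %g\n', fg);

In the COMSOL optimization described in section V, the call to f would instead run a simulation through LiveLink for Matlab and return the quantity of interest, such as the torque.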
Fig. 15: The model "box.mph" before and after running a Matlab script. The figure shows the COMSOL desktop and Matlab side by side, connected to the same server. COMSOL automatically updates the graphics window when you change the model in Matlab.

LiveLink gives you a lot more control over how you want to set up a model. You can for example use Matlab to run and edit models in a loop, calculate and set geometry parameters, post-process results and so on. The COMSOL documentation "Introduction to LiveLink for MATLAB" is recommended as a starting point.

When talking about LiveLink, the terms "COMSOL mphserver" and "the COMSOL desktop" are used. The COMSOL desktop is the "normal" desktop program. The COMSOL mphserver works as the interface between COMSOL and Matlab. It is a non-graphical application that can be found in the COMSOL installation folder. The COMSOL mphserver can take in commands from Matlab and execute them.

Connecting to the mphserver
On Windows, connecting Matlab to the COMSOL mphserver is straightforward. Start the server, and use the command "mphstart" in Matlab to connect to the server. The "LiveLink for Matlab User Guide" in the COMSOL documentation covers this topic nicely.

A nice functionality is the ability to connect the COMSOL desktop to the same mphserver as Matlab. If you are working on a model with LiveLink and want to see the model, you can display the geometry in the COMSOL desktop, see figure 15. This can be done in "File -> Client Server -> Connect to Server", and then "File -> Client Server -> Import Application From Server". The graphics window will then automatically update when you run the Matlab command "model.geom('geom1').run;", which is equal to pressing "Build All" in COMSOL.

When running a Matlab script on a cluster, I found that it is a good idea to run mphstart in a loop. The reason for this is that mphstart will by default try to communicate with the mphserver through port number 2036. If some other process is using this port, the mphserver will connect to the next available port, for example port 2038. If the mphserver is not connected to the default port, Matlab will only be able to connect to it if you specify the correct port number, for example "mphstart(2038)". So if you don't know the correct port, one solution is to simply try different ports until you connect, see figure 16.

Fig. 16: A suggestion for how to connect Matlab to the COMSOL mphserver on a cluster

    for port = 2036:2055
        try
            mphstart(port);
            break;
        catch
            s = ['tried port number ', num2str(port), ' unsuccessfully'];
            disp(s)
        end
    end

To get an overview of some of the COMSOL commands in Matlab, type "help mli" at the Matlab command line.

One of the best ways to learn LiveLink for Matlab is to take a COMSOL model and save it as a Matlab file. Normally it is a good idea to use the function "File -> Compact History" beforehand, which removes any redundant history from the model so that the file is easy to read.

The syntax is quite straightforward; the commands in figure 17 will create a model and add a 3D box to the geometry. The first command "mphstart" connects Matlab with the COMSOL mphserver. The next two lines (lines 2 and 3) are always the same; they import the COMSOL class so the COMSOL commands are available.

Fig. 17: Example of LiveLink for Matlab syntax

    1  mphstart;
    2  import com.comsol.model.*
    3  import com.comsol.model.util.*
    4
    5  model = ModelUtil.create('Model');
    6  model.modelNode.create('comp1');
    7  model.geom.create('geom1', 3);
    8  model.geom('geom1').create('blk1', 'Block');
    9  model.geom('geom1').feature('blk1').set('size', {'100' '100' '100'});
    10 model.geom('geom1').run;

Working with a Model
MASTER THESIS, JUNE 2017, NTNU 13

I found that the easiest way to work with a model through Matlab was not to save the model as a .m file and edit it, but to use the "mphopen" function, which asks the mphserver to open an existing model. Figure 18 gives an example of the "mphopen" function. Here the model "Box.mph" is loaded by the mphserver. The data in the table with tag "tbl1" is then extracted and returned to Matlab with the "mphtable" function.

When you are working with a model, it is important to use the right tags for the objects you want to change. The command "mphnavigate" will open a window in Matlab with all the objects and their tags, but I found it a lot easier to just open the model in COMSOL and find the tags there. You can find the tags of a study, geometry object, mesh and so on by clicking the object and going to the "Properties" window.

If you write a Matlab script that interacts with a COMSOL model, make sure that the tags you use in the script still refer to the right objects if you later change the model using the COMSOL desktop.

mphstart;
import com.comsol.model.*
import com.comsol.model.util.*

model = mphopen('Box.mph');

str = mphtable(model, 'tbl1');
table_data = str.data

Fig. 18: Example of LiveLink for Matlab syntax
If you are wondering how to do something, say for example that you want to change a parametric sweep list, you can start by making some change to the list in the COMSOL desktop. If you then save the file as a .m file without using "compact history", the command you want can be found at the bottom of the file. This is often quicker than searching through the documentation for the correct command.

Note: it is always a good idea to run a study in COMSOL before you start working with it in Matlab. Abort the simulation if you want to; the important thing is to start the simulation, because this sets up the "Job Configurations" node.

To run a parametric sweep, the command is "model.batch('p1').run;". "p1" is the tag found in "Study -> Solver Configurations -> Parametric Sweep". So if the "Job Configurations" node is not set up correctly, there might not exist an object in the model with the tag "p1". If "p1" doesn't exist in the model, the run command will give you an error saying it can't find the tag.
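As a small illustration (this is not code from the thesis), the run command can be wrapped in a try/catch so that a missing tag produces a readable warning instead of stopping the whole script; the model object and the tag "p1" are assumed to exist as described above:

    % Minimal sketch: run the parametric sweep, but fail gracefully if the
    % batch feature 'p1' was never created (i.e. "Job Configurations" is missing).
    % Assumes 'model' was loaded earlier, e.g. model = mphopen('Box.mph');
    try
        model.batch('p1').run;
    catch err
        % Typically happens when the study was never started in the COMSOL
        % desktop, so the Job Configurations node does not exist yet.
        warning('Could not run sweep "p1": %s', err.message);
    end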
When working with models, I found that sometimes the "Job Configurations" node would freeze and not include any newly added studies. If this happens, try to delete the entire study node and create it again from scratch.

D. Running LiveLink for Matlab on a Cluster

To run the mphserver with Matlab on Vilje/Fram, you can use a job script that looks something like the one in figure 19.

1 module load comsol/5.2
2 module load matlab
3
4 COMSOL_MATLAB_INI='matlab -nosplash -nodisplay -nojvm'
5
6 comsol -nn 6 -np 8 -clustersimple -mpiarg -rmk -mpiarg pbs mphserver -silent -tmpdir $w &
7 sleep 8
8 matlab -nodisplay -nosplash -r "addpath $COMSOL_INC/mli, yourScript"

Fig. 19: An example job script for running the COMSOL mphserver

The & symbol at the end of line 6 tells the mphserver to run in the background. "sleep 8" is added because the server takes a little while to start. MATLAB is then started and asked to run the script "yourScript". The script should contain an "mphstart" command, which will connect the COMSOL mphserver to MATLAB.
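The contents of "yourScript" depend on the model, but a minimal skeleton could look like the sketch below. The file name "Box.mph", the table tag "tbl1" and the name of the results file are placeholders, not values taken from the thesis:

    % yourScript.m - minimal skeleton for the script started by the job script.
    % Connects to the mphserver running in the background, loads a model,
    % runs the sweep and stores the probe table before Matlab exits.
    mphstart;                          % connect to the mphserver (default port 2036)
    import com.comsol.model.*
    import com.comsol.model.util.*

    model = mphopen('Box.mph');        % ask the server to open an existing model
    model.batch('p1').run;             % run the parametric sweep
    result = mphtable(model, 'tbl1');  % pull the probe table back into Matlab
    save('results.mat', 'result');     % store the results before the job ends
    exit;                              % quit Matlab so the batch job can finish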
If the Matlab script is small, there is little need to think too much about parallel programming within the script. Any simulations run through the COMSOL mphserver will normally take much longer than running the Matlab script itself. There is likely more to gain from parallelizing the COMSOL model than from spending time on parallelizing the Matlab script.

COMSOL started out as a GUI-based program, and batch mode execution was added later. The desktop version is more stable; you might quickly run into problems with the COMSOL server.

V. SETTING UP PSO

Particle swarm optimization was implemented in Matlab and applied to a model created by my fellow student Charlie Bjork. The model is an integrated magnetic gear permanent magnet machine; see his thesis for further details [16]. He was interested in optimizing the machine to achieve a feasible machine with the highest possible torque.

In order to run PSO on the model, three things were needed from Bjork:
• A parametrized model
• An objective function
• Lower and upper bounds for the parameters

The problem had 12 dimensions, and it was a mixed problem, with two of the values being integers (the number of poles and slots). The problem was defined as:

max f(x) = |torque|,   x ∈ R^n
Subject to the constraints

2 < P_i < 30
12 < N_i < 150
P_i + 10 < N_i ≤ 8 · P_i
0.01 < HSR_iron_thickness < 0.10
0.01 < Slot_thickness < 0.10
...
0.01 < AG_i < 0.1

Each parameter represents one dimension in the search space, a 12-dimensional space where the lengths of the "walls" are determined by the parameter bounds. The bounds on each parameter are called the barriers of the search space. All of the dimensions, except N_i, had static barriers. N_i was handled by first calculating the new particle positions in dimension P_i, and then setting the barriers for N_i to [P_i + 10, 8 · P_i] in every iteration.

The barriers were handled with "bouncing walls": if a particle hits a wall, the velocity will change direction. The integers were handled by simply rounding off the values.
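The exact implementation is not reproduced here, but a sketch of how one particle's move could be handled under these rules is shown below. The variable names, the three example dimensions and the numerical values are illustrative only:

    % Sketch: one particle's move with "bouncing walls", dynamic bounds for N_i
    % and rounding of the integer dimensions. Only 3 of the 12 dimensions shown.
    lb = [2 12 0.01];   ub = [30 150 0.10];   % static lower and upper bounds
    x  = [14 60 0.05];  v  = [3 -20 0.02];    % position and velocity of one particle

    x = x + v;                                % move the particle

    lb(2) = x(1) + 10;                        % the N_i barriers follow the new P_i value
    ub(2) = 8 * x(1);

    for d = 1:numel(x)
        if x(d) < lb(d) || x(d) > ub(d)
            x(d) = min(max(x(d), lb(d)), ub(d));  % put the particle back on the wall
            v(d) = -v(d);                         % "bouncing wall": reverse the velocity
        end
    end

    x(1:2) = round(x(1:2));                   % number of poles and slots are integers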
A. Implementing PSO with Matlab

The general program flow of the implementation can be seen in figure 20.

Fig. 20: General program flow
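The update step itself follows the standard PSO formulation (see for example [18]). A compact sketch of one iteration of the loop in figure 20 could look like the code below; the inertia weight w, the placeholder bounds and the placeholder fitness values are assumptions, while cp, cg and the maximum velocity match the values chosen later (table V):

    % Sketch of one PSO iteration. The swarm is stored as S-by-n matrices;
    % w (inertia weight), the bounds and the fitness values are placeholders.
    S = 20;  n = 12;                          % swarm size and number of dimensions
    lb = zeros(1, n);  ub = ones(1, n);       % placeholder bounds
    X = lb + rand(S, n) .* (ub - lb);         % particle positions
    V = zeros(S, n);                          % particle velocities
    P = X;  pbestVal = -inf(1, S);            % personal bests
    G = X(1, :);  gbestVal = -inf;            % global best

    w = 0.7;  cp = 0.5;  cg = 2.5;            % inertia and scaling parameters
    vmax = (ub - lb) / 45;                    % maximum velocity per dimension

    fitness = rand(1, S);                     % placeholder: one sweep evaluates all particles

    for i = 1:S                               % update personal and global bests
        if fitness(i) > pbestVal(i)
            pbestVal(i) = fitness(i);  P(i, :) = X(i, :);
        end
        if fitness(i) > gbestVal
            gbestVal = fitness(i);  G = X(i, :);
        end
    end

    for i = 1:S                               % velocity and position update
        V(i, :) = w * V(i, :) ...
                + cp * rand(1, n) .* (P(i, :) - X(i, :)) ...
                + cg * rand(1, n) .* (G - X(i, :));
        V(i, :) = max(min(V(i, :), vmax), -vmax);   % clamp to the maximum velocity
        X(i, :) = X(i, :) + V(i, :);
    end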
A lot of time was spent trying to implement PSO on Vilje. In order to parallelize the script as much as possible, it would be best if the particles could act independently of each other; that is, if the fitness value of one particle could be evaluated as soon as that particle has moved to a new position. To achieve this with LiveLink, the first attempt was to start one mphserver per particle. However, it was discovered that each instance of the mphserver uses one license. This means that the number of particles would be limited to the number of licenses, which is not a feasible solution.

In the end, the only working solution discovered was to use only one mphserver. The particles then have to wait until all of them are finished before they can move on to the next step. This is not an optimal solution, especially if there exist parameter combinations that create models which are difficult to solve; in that case, many particles could be left waiting in each iteration. A time limit on each simulation would help to limit the waiting time, but COMSOL has, as of today, no functionality for aborting a simulation after a set time. Instead, some care was taken to choose good barriers.

The fitness evaluation was done by creating a parametric sweep in the COMSOL model. For each iteration, the particle positions were calculated and saved in a matrix called "positions". The parametric sweep list was then set to the position matrix as in figure 21. For reasons not entirely understood, it is necessary to change the list in both "Study 1 -> Parametric Sweep" and "Study 1 -> Job Configurations -> Parametric Sweep", or COMSOL gives you an error. It might be because the "Job Configurations" node does not update automatically.

model.result.table('tbl10').clearTableData;
model.study('std1').feature('param').set('plistarr', positions);
model.batch('p1').set('plistarr', positions);

model.sol('sol1').runAll;
model.batch('p1').run;

Fig. 21: Matlab code showing how you can change a parametric sweep list, and run the sweep using the mphserver

The method in figure 21 worked nicely on my personal computer, but I could not get it to run successfully on Vilje. Many weeks were spent trying to run the script, and a lot of emails were sent to COMSOL Support. Eventually, COMSOL suggested that there is a bug in COMSOL 5.2 which makes it impossible to run the 5.2 mphserver distributed on a cluster. Their suggested solution was to use COMSOL 5.3, and 5.3 was installed on Vilje.

Tests showed that the 5.3 mphserver distributes sweeps nicely on the cluster. Unfortunately, it turned out that there was another type of bug in the 5.3 mphserver, which caused the program to run forever.
COMSOL recently published an update to 5.3, which will probably fix the bug, but as there was limited time, the update was not tested. To summarize, my experience is that the COMSOL mphserver works well on Windows, but is not very stable on Linux in distributed mode.

In the end, an alternative way to run the sweep was found. Matlab has a "system" command, which will run a command at the Linux command line and wait for it to finish. By using this command, you can start a COMSOL batch job from the Matlab script. The COMSOL batch job is the most stable way to run a job, and no problems were encountered with this method.

Figure 22 shows the final code. There are three programs involved: Matlab, the COMSOL mphserver, and a COMSOL batch process. The mphserver takes the parametric sweep list from Matlab and sets up the sweep. Instead of solving the model, the model is saved as "tempfile.mph" in line 2. The system command is then used to solve the temporary model using the "normal" batch job. When the batch job is finished, the results are saved to the model "outfile.mph". The mphserver then loads the new model, and the results can be extracted.

1 model.batch('p1').set('plistarr', positions);
2 mphsave(model, 'tempfile.mph');
3
4 status = system('comsol -nn 200 -np 4 -verbose -clustersimple -f comsol_nodefile -batchlog log.txt batch -tmpdir $w -inputfile tempfile.mph -outputfile outfile.mph');
5
6 model = mphopen('outfile.mph');

Fig. 22: Matlab code showing how the sweep was run using the system command
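Since the Matlab "system" command returns the exit status of the batch job, a little defensive checking can be added around the call in figure 22. The sketch below (with the batch command shortened) also shows how the probe table can be pulled out of the reloaded model with "mphtable"; the table tag "tbl10" is the one used in figure 21:

    % Sketch: check that the COMSOL batch job finished before trusting outfile.mph.
    % The command string is shortened here; see figure 22 for the full call.
    status = system('comsol batch -inputfile tempfile.mph -outputfile outfile.mph');
    if status ~= 0
        warning('COMSOL batch job returned status %d; results may be incomplete.', status);
    end

    model = mphopen('outfile.mph');     % reload the solved model
    str   = mphtable(model, 'tbl10');   % probe table with the sweep results
    data  = str.data;                   % numeric matrix used for the fitness values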

Hopefully, COMSOL will fix the problems with the mphserver so that this roundabout method will not be necessary in future projects.

If one of the simulations in the parametric sweep fails, there will be a missing line in the probe table. To handle this, a function was created to find missing lines in the matrix and insert a fitness value of 0.
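The repair function itself is not listed here. A sketch of the idea could look like the code below; the function name, and the assumptions that the probe table repeats the swept parameter values in its first columns, that the torque probe is in the last column, and that the positions are available as a numeric matrix, are all illustrative:

    % Sketch: match probe-table rows back to the requested particle positions
    % and give failed simulations a fitness of 0. All assumptions are noted
    % in the surrounding text; this is not the function used in the thesis.
    function fitness = fill_missing_rows(tableData, positions)
        S = size(positions, 1);           % number of requested parameter sets
        fitness = zeros(S, 1);            % failed simulations keep fitness 0
        nPar = size(positions, 2);
        for i = 1:S
            % find the table row whose parameter columns match particle i
            match = all(abs(tableData(:, 1:nPar) - positions(i, :)) < 1e-9, 2);
            if any(match)
                row = find(match, 1);
                fitness(i) = abs(tableData(row, end));   % objective is |torque|
            end
        end
    end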
The maximum velocity was set by testing the algorithm and looking at the paths taken by the particles. It was given a value that gave a nice-looking path without any large jumps. The chosen PSO parameters can be found in table V.

TABLE V: Chosen parameter values for PSO

Parameter   Value
Vmax        (1/45) * interval length
cg          2.5
cp          0.5
S           20-30

B. Results

PSO was run several times with different numbers of particles, and the results can be found in figure 23. As the figure shows, it looks like a higher number of iterations would have been beneficial. Looking at the dark blue line, it rises rapidly towards the end of the optimization.

Fig. 23: Results from six runs of PSO on Vilje, using cp = 0.5, cg = 2.5
Figure 24 shows the paths taken by the particles in two of the dimensions, for one of the runs (the dark blue line). In figure 24 b), the particles converge early, at around iteration 30-40. Studying the particle paths for all the dimensions showed that the particles converged early in all dimensions, except for the one shown in figure 24 a). If the particles had been given enough iterations to converge in all the dimensions of the search space, a better objective function value might have been achieved.

Fig. 24: Particle paths for a run with S = 30, showing the paths of each particle for the control variables "AG_i" (inner air gap length, panel a) and "Stat_PM_thick" (thickness of the stator permanent magnets, panel b).

The algorithm could also have been tested with several neighborhoods, and with different scaling parameters. The optimization algorithm was not tuned further, however, because the results were found to be excellent by Bjork, and because of time restrictions.

A large parametric sweep of the same model was also run for Bjork on Vilje, with close to 2000 simulations. The results were used to select good parameter values in his iterative optimization approach. PSO delivered a slightly better objective function than his approach. PSO needed to run more simulations in total to achieve a good result (about 10 000 simulations), but it was still a relatively small amount of computational resources.

It is surprising that PSO performed so well without any tweaking. The chosen scaling parameters most likely fit the problem very well.

VI. CONCLUSION

Running COMSOL jobs on Vilje requires some technical know-how, but it should be relatively easy for most students to learn. Very little knowledge about parallel computing is required, as COMSOL will set up the communication between nodes automatically.

The COMSOL batch job and the COMSOL mphserver were both run on Vilje. The batch job was found to be the most stable of the two. The COMSOL mphserver was not run successfully on more than one node.

The speedup of medium-sized single stationary studies was found to be very limited, and according to COMSOL [14] the maximum possible speedup for large models is in the range 8-13. The speedup is generally greater for models with tens to hundreds of millions of degrees of freedom.

In some cases, the speedup can be improved by partitioning the mesh. The most important thing is to select the right solver. Iterative solvers were found to be much faster for a selected model, but the speedup was in the same range.

The best possible speedup when running COMSOL jobs on Vilje is achieved when parametric sweeps are distributed on the cluster. Population-based optimization algorithms will benefit from this, as in each iteration the population individuals can be evaluated by a parametric sweep.

PSO was implemented, and gave good results when applied to a machine optimization case. It is surprising that PSO was able to deliver such good results without any tweaking of the algorithm, as most literature suggests that this is important in order to achieve good convergence. This suggests that PSO might be more powerful than first assumed, or that the chosen parameters fit the problem very well. PSO was found to be a very useful tool for design optimization.

There were many problems with running LiveLink for Matlab distributed on a cluster, and a great deal of time was spent trying to make it work. In the end, an alternative method was found for running PSO on Vilje.

Vilje will be replaced by the supercomputer "Fram" in the near future, but the methods and tips presented in this thesis should be valid and relevant for Fram as well.

VII. ACKNOWLEDGMENTS

I would like to thank:
Supervisor Robert Nilssen, for help and encouragement
Boyfriend Andreas Roessland, for moral support
Friend Muhammed Usman Hassan, for his inputs and for creating a positive working environment
Charlie Bjørk, for a pleasant collaboration

REFERENCES

[1] W. Gander, M. J. Gander, and F. Kwok, Scientific Computing: An Introduction using Maple and MATLAB. Switzerland: Springer, 2014.
[2] G. H. Golub and J. M. Ortega, Scientific Computing and Differential Equations: An Introduction to Numerical Methods. Academic Press, 1992.
[3] W. Frei, "Meshing your geometry: When to use the various element types," https://www.comsol.com/blogs/meshing-your-geometry-various-element-types/, accessed: 2017-07-03.
[4] C. M. Cyclopedia, "The finite element method (FEM)," https://www.comsol.no/multiphysics/finite-element-method, accessed: 2017-07-01.
[5] S. J. Salon, Finite Element Analysis of Electrical Machines. Kluwer Academic Publishers, 1995.
[6] R. Shonkwiler and L. Lefton, An Introduction to Parallel and Vector Scientific Computing. Cambridge University Press, 2006.
[7] E. Rønquist, A. M. Kvarving, and E. Fonn, Introduction to Supercomputing, TMA4280, 2016.
[8] "Top500 list - June 2017," https://www.top500.org/list/2017/06/?page=2, accessed: 2017-07-06.
[9] D. Padua, Encyclopedia of Parallel Computing. Springer, 2011.
[10] J. Akin, Finite Element Analysis with Error Estimates: An Introduction to the FEM and Adaptive Error Analysis for Engineering Students. Elsevier Butterworth-Heinemann, 2005.
[11] COMSOL, "COMSOL Multiphysics Reference Manual," 2016.
[12] I. Yavneh, "Why multigrid methods are so efficient," 2006.
[13] J.-P. Weiss, "Using the domain decomposition solver in COMSOL Multiphysics," https://www.comsol.com/blogs/using-the-domain-decomposition-solver-in-comsol-multiphysics/, accessed: 2017-07-09.
[14] "COMSOL Knowledge Base: COMSOL and multithreading," https://www.comsol.com/support/knowledgebase/1096/, accessed: 2017-07-09.
[15] COMSOL, "Forced convection cooling of an enclosure with fan and grille," 2010.
[16] C. Bjørk, "Design, optimization and analysis of an integrated magnetic gear permanent magnet machine for marine applications," 2017.
[17] S. Genovesi, D. Bianchi, A. Monorchio, and R. Mittra, "Optimization techniques for electromagnetic design with application to loaded antenna and arrays," 2011.
[18] J. Kennedy, R. C. Eberhart, and Y. Shi, Swarm Intelligence. Academic Press, 2001.
[19] I. Boussaïd, J. Lepagnot, and P. Siarry, "A survey on optimization metaheuristics," Information Sciences, vol. 237, pp. 82-117, Feb. 2013.
[20] J. C. Bansal, P. K. Singh, M. Saraswat, A. Verma, S. S. Jadon, and A. Abraham, "Inertia weight strategies in particle swarm optimization," 2011.
[21] R. C. Eberhart, Y. Shi, and J. Kennedy, Swarm Intelligence. The Morgan Kaufmann Series in Artificial Intelligence, Mar. 2001.
[22] A. E. Olsson, Particle Swarm Optimization: Theory, Techniques and Applications. Hauppauge, US: Nova Science Pub Inc, Mar. 2011.
[23] M. E. H. Pedersen, "Good parameters for particle swarm optimization," 2010.
[24] T. R. Bjørtuf, E. Attar, M. Saxegaard, A. Sitko, O. Granhaug, and H. Landsverk, "Dielectric and thermal challenges for next generation ring main units (RMU)," 2013.
[25] A. Pedersen, T. Christen, A. Blaszczyk, and H. Boehme, "Streamer inception and propagation models for designing air insulated power devices," 2009.
[26] R. W. Blower, Distribution Switchgear. 8 Grafton Street, London, Great Britain: Collins Professional and Technical Books, 1986.
[27] V. Adams and A. Askenazi, Building Better Products with Finite Element Analysis. 2530 Camino Entrada, Santa Fe, USA: OnWord Press, 1999.
[28] J. Yström, "Fast solvers for complex problems," 2007.
[29] S. B. Desai, S. R. Madhvapathy, A. B. Sachid, J. P. Llinas, Q. Wang, G. H. Ahn, G. Pitner, M. J. Kim, J. Bokor, C. Hu, H.-S. P. Wong, and A. Javey, "MoS2 transistors with 1-nanometer gate lengths," Science, vol. 354, no. 6308, pp. 99-102, Oct. 2016.
[30] ITRS, "International technology roadmap for semiconductors 2.0," ITRS, Tech. Rep., 2015.
[31] COMSOL, "Optimization Module User's Guide, version COMSOL 5.2," 2016.
[32] COMSOL Multiphysics Reference Manual, COMSOL.
Appendix: Some Practicalities for Windows users

To use Vilje to solve a COMSOL model you need to:


1) Make a user account on Vilje (see hpc.ntnu.no)

2) Get a project account approved (see hpc.ntnu.no)

3) Download an SCP client for Windows, for example winSCP (https://winscp.net/eng/download.php).


It looks like this:

Fig. 25: winSCP

winSCP allows you to move files from your computer to Vilje by simply dragging your files from one folder to the other. The left window is your own computer; navigate to the right folder and drag your COMSOL file across.

4) Download an SSH client, for example puTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html)

(a) puTTY start screen (b) How puTTY looks when you are logged in to Vilje

puTTY allows you to log in to Vilje from another computer. To log on to Vilje, write "vilje.hpc.ntnu.no" under "Host name" and press Open. In the future this will change, as Vilje is replaced by "Fram". You will be prompted to give your username and password.
Vilje runs Unix, and has a command-line interface. To navigate and submit your job, you need to learn a few Linux
commands. As a minimum, learn how to navigate.
• cd Change directory, e.g. "cd myfolder"
• cd .. Move to the parent directory.
• ls List the contents of the directory
• cat myfile Print the contents of "myfile" to screen
• vi myfile Edit the file "myfile" with Vi
To summarize: With winSCP you move your files to Vilje, with puTTY you can tell Vilje what to do with the files.
It is also very useful to learn how to use a text editing program like Vi/Vim, so you can easily change a job script. Vi
has its own set of commands, but you only need a few in order to edit a text file.
• i Insert mode will move you from the Vi command line and "into" the text, allowing you to edit the text.
• Esc Use escape to leave "insert mode" and move back to the Vi command line.
• :q Quit the program without writing to file.
• :w Write to file
• :wq Write to file, and quit Vi.
• gg and shift+g In "command line mode", these commands will move you to the top or bottom of the text. Useful if
you are reading a large log file.
