Easybuild: Building Software With Ease: November 2012


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/261019865

EasyBuild: Building Software With Ease

Conference Paper · November 2012


DOI: 10.1109/SC.Companion.2012.81


All content following this page was uploaded by Andy Georges on 02 April 2015.



EasyBuild: Building Software With Ease.
Kenneth Hoste, Jens Timmerman, Andy Georges, Stijn De Weirdt
HPC team – Unit ICT infrastructure (DICT) – Ghent University
Krijgslaan 281, building S9, 9000 Gent, BELGIUM
E-mail: {kenneth.hoste, jens.timmerman, andy.georges, stijn.deweirdt}@ugent.be

Abstract: Maintaining a collection of software installations for a diverse user base can be a tedious, repetitive, error-prone and time-consuming task. Because most end-user software packages for an HPC environment are not readily available in existing OS package managers, they require significant extra effort from the user support team. Reducing this effort would free up a large amount of time for tackling more urgent tasks.

In this work, we present EasyBuild, a software installation framework written in Python that aims to support the various installation procedures used by the vast collection of software packages that are typically installed in an HPC environment – catering to widely different user profiles. It is built on top of existing tools, and provides support for well-established installation procedures. Supporting customised installation procedures requires little effort, and sharing implementations of installation procedures becomes very easy. Installing software packages that are supported can be done by issuing a single command, even if dependencies are not available yet.

Hence, it simplifies the task of HPC site support teams, and even allows end-users to keep their software installations consistent and up to date.

I. INTRODUCTION

HPC software environments can be quite diverse. Some environments have a rather limited number of installed software packages they offer to end-users, while others offer support to users with very diverse needs in terms of the software packages they use and which they require to be installed on the system. Unfortunately, not all (scientific) software packages build and install according to the same procedure or using the same tools. This makes the task of installing the diversity of software often required by end-users time-consuming and error-prone [2], [13].

Of course, the problem of implementing software installation procedures is not new. Most UNIX-like systems such as Linux and BSD based distributions employ a package manager – closely coupled to the distribution – for deploying new software packages and for maintaining the installation. Most package managers have some custom format to define how a software package should be built and which packages it depends on. For example, RedHat-based systems use rpm packages and yum to install them, where the build specification is detailed in the RPM spec files. SUSE, Debian-based systems, BSD derivatives, etc. have similar tools. While all of these make the maintenance of software environments simpler by supporting easy upgrade paths, automatic dependency resolution, etc., there are several shortcomings for maintaining one or more installations of scientific software. This is mostly due to the differences between scientific software and software that is provided as a part of an operating system defined and controlled by software maintainers. In the scientific community, typically fewer resources are spent on the maintenance of build procedures; almost all of the effort is put into the development (and testing) of the code. We found the following issues with installation procedures for the scientific software packages we support at the UGent HPC site¹.

• Incompleteness. For example, only compilation in the source directory is supported and it is a hassle to actually install the executables, libraries and include files, etc.
• Non-standard procedure. Installation procedures are often far more involved than a sequence of configure, build and install steps. For example, the installation procedure can be interactive, i.e., requiring human intervention during the configuration and installation.
• Custom-built scripts. On various occasions, custom shell scripts need to be used, as opposed to a set of standard tools such as configure, make, cmake, etc.
• Hard-coded parameters. System-specific parameters such as the compiler commands or the list of libraries, e.g., BLAS/LAPACK, MPI, etc., are hard-coded in the configuration files or in the installation scripts.

¹ http://www.ugent.be/hpc

Commercial software packages, which are commonly used in various scientific research domains, rarely follow any standard. They all provide their own procedures, and thus also suffer from similar problems.

The problem is aggravated further in an HPC software installation environment where users typically have several requirements that are not (well) handled by traditional package managers or build tools. Scientists need to have particular builds – versions built with a specific compiler toolchain – of software packages available for an extensive period of time, preferably indefinitely.² Additionally, researchers often desire to use the latest and greatest version of the software when they start (new) experiments. Often, they like to experiment with various builds of a particular software package using different compilers or libraries, for evaluating both performance and correctness. Although package managers are good at keeping software up to date and taking care of dependencies, to the best of our knowledge, most do not support these other requirements well.

² The reason is quite straightforward: they should be able to reproduce – this is good scientific practice – or extend their previously obtained results whenever the need arises.

Due to these shortcomings, scientific software packages are getting less support from distributions and their package manager maintainers – detailed customisation is time-consuming and difficult; package managers rarely provide sufficient flexibility for dealing with this.

In short, to maintain the software in an HPC environment, a tool is required that offers:
• Flexibility. There are many different installation procedures and they should be supported with minimal effort. This results in a tool that is able to build and install software in a flexible, reproducible and robust way.
• Co-existence of versions. In principle, installed software should never have to be removed. Hence, different versions of builds must be able to be installed independently of each other.
• Dependency handling. Many critical software packages in an HPC environment have dependencies on each other, e.g., numerical libraries such as BLAS, LAPACK, etc., but also cooperative stacks of software applications – see for example the WRF dependency graph discussed in Section IV. Handling these dependencies is an aspect that is traditionally handled well by most package managers. Such automatic dependency resolution significantly simplifies the maintenance of a collection of software installations, and is thus indispensable in any relevant framework.
• Sharing implementations of installation procedures. Although usually the installation procedure for software packages is (well) documented, a lot of work is duplicated among user support teams: (i) digging through the documentation, (ii) following the installation procedures, and likely (iii) scripting the installation in some way. To reduce this inefficiency, it should be easy to share implementations of software installation procedures with others. First and foremost, this requires a modular plug-and-play enabled infrastructure, to which support for any particular software package can be added with minimal effort. This should be independent of the package source, whether it is self-written or obtained from others. Note that package managers also allow sharing of installation procedures, but they are more rigid than what an HPC site typically requires.

A tool that meets these criteria has several advantages: (i) it reduces the effort on behalf of the user support team when the result from earlier installations can be reproduced in a simple way, and (ii) it enables forming a community to tackle the software maintenance problem in a collective manner.

Because we were unable to find another tool that matches the requirements listed above, we started the development of EasyBuild, a modular build and install framework for software, written in Python [11]. This framework replaced our earlier effort of deploying through custom RPM spec files using a traditional package manager, which was ill suited for addressing the issues we faced and mostly resulted in a number of large, hardly maintainable shell scripts for expressing customisation.

While EasyBuild originally supported just a few software packages that featured custom build procedures, it quickly grew to become a very important part of the user support tools used by the HPC UGent team. Today, EasyBuild has support for over 250 scientific software packages – it is being developed and improved continuously. It allows us to limit the amount of time and effort required to install and update end-user software packages, by investing time once to implement the installation procedure in an easyblock, see Section II-E. Subsequent builds of new versions of a software package or builds that are using different parameters can usually be obtained with very little or no effort, thereby saving lots of time and manpower.

In April 2012, after more than three years of in-house development, we released EasyBuild on GitHub³ as open source software under the GPLv2 license. It is also available from the Python Package Index (PyPI)⁴. We are currently in the process of migrating all supported software packages from our legacy version to the publicly available version. This paper presents EasyBuild version 1.0, which was released in November 2012, and is considered to be the first stable and robust public release of this framework.

³ http://hpcugent.github.com/easybuild/
⁴ http://pypi.python.org/pypi/easybuild/

With this paper, we want to show the potential EasyBuild has for both user support teams and for end-users, given the targeted ease-of-use and the fact that no admin rights are needed. Our aim is twofold: (i) to encourage the use of EasyBuild, and (ii) to receive feedback and ideas to further improve this framework, increasing its usefulness for tackling the issues HPC sites have when maintaining software installations. EasyBuild is structured in a very modular way, simplifying collaboration and allowing the HPC community to provide support for new software packages. Contributing is as easy as forking the repository on GitHub and filing a pull request so we can incorporate the contributions in the framework.

The remainder of this paper presents an overview of EasyBuild in Section II, highlights its main features in Section III, and discusses a software installation use case in Section IV. Finally, we compare our framework to related tools in Section V, and conclude in Section VI.

II. EASYBUILD OVERVIEW

In this section, we give a detailed overview of EasyBuild by presenting its basic usage and configuration, listing the required dependencies, and discussing the design.

A. Basic usage and configuration

The basic usage of EasyBuild is simple: run the eb command with the appropriate arguments. Usually, the path to the easyconfig files, see Section II-D, should be specified explicitly. Several command-line options for eb are available, e.g., --debug to enable debug mode, --robot to enable the automatic dependency resolver and supply a path to it, etc. A full list of options can be obtained by running eb --help.

Before EasyBuild is used the first time, a simple configuration file containing plain Python code should be created. There, fixed-name variables are defined that specify (i) the paths where the temporary log files are stored, (ii) the paths of the build, source and installation directories, (iii) the format of the log file names, and (iv) the path to the easyconfig files repository – see Section III-B. The location of this configuration file can be provided to the eb command with the --config option or by setting the EASYBUILDCONFIG environment variable.

EasyBuild comes with a default configuration file easybuild_config.py that uses the $HOME/.local/easybuild directory as a prefix for the build, source and installation directories.

B. Dependencies

EasyBuild only has two direct dependencies: Python and environment modules. We use Python version 2.x, where the version should be at least 2.4. The reason is quite straightforward: on the UGent HPC clusters we run Scientific Linux 5.x and 6.x, where the system Python versions are 2.4 and 2.6, respectively.

The environment modules software package is a well-known tool in the HPC community. Through simple text files, referred to as environment modules, an easy-to-use interface can be offered to users to prepare their session environment for using a particular software package. Environment modules describe the changes to the session environment that are necessary for a piece of software to work correctly. They can append to the PATH and LD_LIBRARY_PATH environment variables, such that binaries and shared libraries are readily available.
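The effect of an environment module on the session environment can be sketched in a few lines of Python. This is only an illustration of the idea, not how the environment modules tool is implemented: the function name apply_module, the installation prefix and the ":" separator are assumptions made for this example.

```python
def apply_module(env, path_changes, sep=":"):
    """Return a copy of env with each listed variable extended by a new path."""
    new_env = dict(env)
    for var, path in path_changes:
        current = new_env.get(var, "")
        # Append to the existing value, or start a new one if the variable is unset.
        new_env[var] = current + sep + path if current else path
    return new_env


# Hypothetical installation prefix, chosen only for this example.
prefix = "/apps/software/zlib/1.2.7"
changes = [
    ("PATH", prefix + "/bin"),
    ("LD_LIBRARY_PATH", prefix + "/lib"),
]

original = {"PATH": "/usr/bin"}
session = apply_module(original, changes)
print(session["PATH"])
print(session["LD_LIBRARY_PATH"])
```

Because the function returns a modified copy, the original environment is left untouched, which mirrors the way a module can be loaded for one session without changing the system defaults.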
EasyBuild heavily relies on environment modules in a number of ways, making the environment modules software package an important prerequisite. EasyBuild automatically generates environment modules for every software package it installs, thereby relieving the user of having to manually create appropriate module files. Furthermore, it relies on the set of available environment modules for obtaining information about installed software packages and their versions, and for resolving dependencies.

By generating an environment module for every completed installation, EasyBuild allows for keeping different versions and/or builds of software packages side-by-side, without affecting each other. Next to providing access to the installed package for end-users, these modules also help EasyBuild to locate the software during subsequent installations of other dependent software packages, i.e., for resolving dependencies – as will be explained in more detail later on.

C. High-level design

We first give a high-level overview of the EasyBuild design before diving into the details in the next sections, see Figure 1. EasyBuild consists of (i) the framework Python package containing several modules that form the core of the tool, (ii) the easyblocks package providing the easyblocks that implement specific installation procedures that can be used to install one or more software packages (see Section II-E), (iii) the tools package providing a set of tools that offer supporting functionality, (iv) the toolchains package providing support for compiler toolchains, (v) the eb command, the main script and a default configuration file, and (vi) a unit testing framework in the test package and some useful stand-alone scripts. Next to this, a collection of easyconfig files for specifying installation parameters is available, see Section II-D.

Fig. 1: Overview of the EasyBuild design. (Diagram of the framework, easyblocks, tools, toolchains and test packages, together with the eb command, main.py, the scripts and easybuild_config.py.)

From an easyconfig file and a matching easyblock, EasyBuild is able to determine what is required by the various steps that form the installation procedure and how they should be performed, see Section II-H.

The EasyBuild design is purposely very modular: easyblocks that provide support for new software packages can be easily plugged in without modifications to the existing code. If easyblocks are available in the Python search path, EasyBuild will find them and use them when appropriate. Likewise, new compiler toolchain definitions and support for additional compilers or libraries can be plugged in to be a part of toolchains (see Section II-G).

D. Easyconfig files

Essentially, easyconfig files are text files with the .eb file name extension that contain a description in Python code format. They specify the software package that should be installed along with a number of parameters to steer the build and installation procedure, including dependencies. For the sake of space, we only discuss the most important parameters here; a complete list of available parameters⁵ can be obtained by running ‘eb --avail-easyconfig-params’.

There are several mandatory parameters: (i) name denotes the software package name, (ii) version denotes the specific version of the package, (iii) toolchain denotes the compiler toolchain, specified as a dictionary with the name and the version, (iv) homepage denotes the URL of the software package’s website, and (v) description. The last two parameters are only used to include some documentation in the environment module that is generated upon successful completion of the installation.

The following are noteworthy optional parameters. easyblock specifies the custom easyblock that provides the build and installation procedure, see Section II-E. sources specifies the source⁶ files that should be present at any of the predetermined locations – if not, EasyBuild will try to download them. Software packages specified in the dependencies parameter have a twofold effect. First, before building the packages, EasyBuild makes sure that the environment modules for all dependencies are available and loaded. If this fails, EasyBuild can recursively build the required dependencies, see Section III-C. Note that automatic dependency resolution is not enabled by default. Second, they are added in the generated environment module file, such that all dependencies are correctly loaded in the environment. This makes sure the environment for the user or for subsequent EasyBuild runs is set up correctly and the installed software can be found. sanity_check_paths specifies a dictionary of files and directories that should be present after the software package files have been installed.

Next to these parameters, it is also possible to specify configuration options, compiler flags, optimisation levels, etc.

The framework package contains the easyconfig module with the EasyConfig class that processes the easyconfig files at run time. For each easyconfig file supplied to EasyBuild, an EasyConfig instance will be created. This allows our framework to obtain the information required to set up and run the installation procedure according to the given specifications.

An example easyconfig file for the WRF software package is discussed in Section IV.

⁵ Documentation for all easyconfig parameters is provided at https://github.com/hpcugent/easybuild/wiki/Easyconfig-files.
⁶ Note that source does not mean source code; it can also refer to a binary installer or binary package.

E. Easyblocks
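Since easyblocks consume the parameters defined in easyconfig files (Section II-D), it may help to first see what such a file could look like. The sketch below is a hedged illustration only: the package name, versions, paths and the dependency tuple format are placeholders chosen for this example, and the helper functions parse_easyconfig and missing_mandatory are hypothetical, not part of EasyBuild. Only the parameter names and the list of mandatory parameters are taken from the text.

```python
# Illustrative easyconfig-style content; all values are made-up placeholders.
EASYCONFIG_TEXT = """
name = 'example'
version = '1.0'
homepage = 'http://example.org/'
description = "Placeholder package used for illustration only."
toolchain = {'name': 'goalf', 'version': '1.0'}
sources = ['example-1.0.tar.gz']
dependencies = [('zlib', '1.2.7')]  # assumed (name, version) format
sanity_check_paths = {'files': ['bin/example'], 'dirs': ['lib']}
"""

# Mandatory parameters as listed in Section II-D.
MANDATORY = ['name', 'version', 'toolchain', 'homepage', 'description']


def parse_easyconfig(text):
    """Evaluate easyconfig-style Python code and collect the assigned names."""
    params = {}
    exec(text, {}, params)
    return params


def missing_mandatory(params):
    """Return the mandatory parameter names that are not defined."""
    return [key for key in MANDATORY if key not in params]


params = parse_easyconfig(EASYCONFIG_TEXT)
print(missing_mandatory(params))
print(params['toolchain']['name'])
```

Evaluating the file as plain Python keeps the format simple for its authors, at the cost of executing whatever code the file contains, so this approach only suits files from a trusted source.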
As mentioned above, an easyblock defines the build and installation procedure that can be used by one or more software packages. The framework Python package provides support for implementing easyblocks through the EasyBlock class which resides in the easyblock module. This class implements generic support for software installation procedures. It serves as the base class that should be sub-classed to obtain an easyblock that describes the installation procedure for a particular (group of) software package(s).

The extension module provides support for installation procedures of software package extensions, e.g., Python packages, R libraries, Perl modules, etc., via the Extension class. Such extensions can be installed in two ways: (i) using Extension, as a part of the installed base software package they extend, and (ii) completely separately from the base software package. In the former case, the extensions are listed in the exts_list parameter of the software package’s easyconfig file. In the latter case, they are specified in their own easyconfig and treated as a stand-alone software package with the base software as a dependency, and with their own dedicated environment module files.

The easyblocks themselves are placed into a separate Python package, aptly named easyblocks. Each easyblock is implemented as a Python module. For example, the cp2k module provides the EB_CP2K Python class, which implements the installation procedure for the molecular simulation software CP2K. Likewise, the EB_Armadillo and EB_WRF Python classes shown in Figure 1 provide support for the corresponding software packages.

All the classes in the easyblock modules for specific software packages are named according to a fixed class name encoding scheme. This allows us to adequately cope with names of software packages that do not directly map to valid Python class names, for example python-meep or 7zip. To avoid potential name clashes with existing functionality, we prefix all class names of easyblocks bound to a particular software package with EB_.

Next to software package-specific easyblocks, EasyBuild offers a number of generic easyblocks in the generic sub-package of the easyblocks package. We briefly discuss a couple of these shown in Figure 1. The configuremake module defines the ConfigureMake class, which implements the commonly used GNU configure, make, make install installation procedure. This class allows specifying custom options to the configure and make commands. Software packages that use this well-known installation procedure likely do not require a dedicated easyblock to be implemented, because the configuremake easyblock already provides support for them. Another example of a generic easyblock is the CMakeMake class from the cmakemake module, which supports software packages that use cmake instead of configure for their build configuration.

Figure 2 shows the hierarchy for the easyblock classes discussed above. The generic EasyBlock class sits at the root of the class hierarchy. Classes that implement custom support for a particular software package directly sub-class this hierarchy root, for example EB_CP2K and EB_WRF. Others implement an installation scheme that can be used by multiple software packages, such as ConfigureMake. Further customisation of already supported installation procedures is implemented by sub-classing existing easyblocks. An example is CMakeMake where only the build configuration step differs from its parent class ConfigureMake. All classes that offer support for a particular software package sub-class one or more of the generic easyblock classes. In Figure 2, this is illustrated by EB_Armadillo, which slightly modifies the CMakeMake installation procedure with configuration parameters and dependency checks that are specific to Armadillo.

Fig. 2: Schematic of the hierarchical organisation of selected easyblocks, all deriving from the EasyBlock class. (Classes shown: EasyBlock, ConfigureMake, CMakeMake, EB_CP2K, EB_WRF, EB_Armadillo.)

By organising the implementation of software install procedures in isolated Python modules as is done with the easyblocks, sharing them becomes particularly easy – one only needs to provide the Python module. Making easyblocks available to EasyBuild amounts to extending the easyblocks package, allowing the Python modules to be found in the Python search path. We feel this is an important feature of the EasyBuild framework.

The EasyBuild framework provides a very flexible interface for implementing software install procedures. Not only does it provide a lot of useful functionality required when installing software, it also makes it easy to plug in easyblocks, thereby adding support for new software packages, and to build on existing easyblocks by extending or customising them.

A concrete example of an easyblock implementation for the WRF software package is shown in Section IV.

F. Scripts, tests and tools

The main EasyBuild script is main.py. A handy wrapper script named eb that searches for main.py in the Python runtime search path is also provided, and is generally used as a command line tool for EasyBuild.

Additionally, there are several stand-alone Python scripts available that are useful during EasyBuild development, next to the Python package test for running unit tests, but these fall outside the scope of this paper.

The tools package provides the backbone modules of the EasyBuild framework. We will briefly highlight some functionality provided there. More details are provided later in dedicated sections.

Two important elements in the tools package are the filetools module and the toolchain package. The filetools module provides various wrapper functions for shell commands. A couple of noteworthy functions are (i) extract_file for extracting source files with a command determined by the file extension, (ii) apply_patch for applying patch files and automatically determining patch levels, and (iii) run_cmd and run_cmd_qa for running (interactive) shell commands. The toolchain package provides the necessary support for compiler toolchains, see Section II-G. Python modules providing an interface for communicating with the environment modules tool and with the PBS cluster
resource manager are also available in the tools package.

Fig. 3: Modular design of support for compiler toolchains. (The generic classes Compiler, Mpi, LinAlg and Fft in tools.toolchain, with the derived classes Gcc and IntelIccIfort, OpenMPI and IntelMPI, Atlas and IntelMKL, Fftw and IntelFFTW in the compiler, mpi, linalg and fft modules of the toolchains package.)

G. Compiler toolchains

All software packages that are supported by EasyBuild and for which source code is available are compiled from source before they are installed. This is preferred over using readily available binary packages in the case of scientific software. This way, EasyBuild maintains complete control over the compiler and the libraries that provide low-level functionality, e.g., MPI, linear algebra support like BLAS, LAPACK, etc. As a consequence, system-specific optimisations can be used and the overall performance of the installed binaries can be increased. In EasyBuild, a (set of) compiler(s) and accompanying libraries are grouped together in a compiler toolchain. For every software installation procedure, a compiler toolchain should be provided in the obligatory easyconfig toolchain parameter.

The support for custom compiler toolchains is organised in a very modular manner, see Figure 3. Essentially, any compiler toolchain may be defined if EasyBuild supports all of its constituent parts. The framework already provides generic classes for compilers and different library types (MPI, linear algebra and FFT) in the tools.toolchain package. It suffices to define the specifics of the toolchain elements in custom classes derived from these generic classes, thus providing the information required by EasyBuild for using the compiler and for accessing its associated libraries in a particular compiler toolchain. Definitions of compiler toolchains and their constituent parts are provided in the toolchains package (see Figure 1). By making additional modules available in the toolchains package, custom toolchains can be provided without any changes required to the EasyBuild codebase, again highlighting the modular design of the EasyBuild framework.

An example of a custom compiler toolchain is goalf which entirely comprises open-source tools: the GCC compiler suite, and the OpenMPI, ATLAS, BLACS, ScaLAPACK and FFTW libraries. To implement this, Gcc sub-classes the generic Compiler class. It supplies EasyBuild with, e.g., the commands to run the GNU C, C++ and Fortran compilers (gcc, g++ and gfortran, respectively). Similarly, it states how to enable OpenMP support (-fopenmp), how to control floating-point precision, etc.

Next to goalf, other compiler toolchains can be defined. They may comprise Intel-provided compilers and libraries (e.g., the ictce toolchain) or they can be based on another compiler suite with the AMD Core Math Library (ACML), etc.

A toolchain should be installed like any other software package. This requires an easyconfig file that lists the name and version along with the dependencies. The install procedure is provided by the Toolchain easyblock.

In a few special cases – when building a compiler for use in a custom toolchain or when installing binary software packages – it is necessary to use a dummy toolchain. This amounts to setting both name and version to dummy in the easyconfig toolchain parameter.

In practice, the first thing to do in an HPC environment is to build and install a custom toolchain that is then subsequently used to deploy the rest of the end-user software packages and their dependencies. This allows the low-level libraries to remain in place – recall that typically, software is never removed; using OS supplied compilers and libraries may break the installation should there be an upgrade of the HPC systems.

Fig. 4: Installation procedure broken down into steps, as performed by EasyBuild. (Steps: I: read easyconfig, II: fetch sources, III: check readiness, IV: unpack sources, V: apply patches, VI: prepare, VII: configure build, VIII: build, IX: test, X: install, XI: extensions, XII: sanity check, XIII: cleanup, XIV: env. module, XV: test cases.)

H. Step-wise installation procedure

In this section, we briefly discuss the step-wise installation procedure that EasyBuild follows, see Figure 4. Most of the steps can be tailored to a particular (group of) software package(s) by providing a custom easyblock that sub-classes either the generic EasyBlock class or one of the existing easyblocks, see Section IV-A for an example.

Figure 4 should be mostly self-explanatory: the step-wise installation procedure consists of reading the easyconfig file, obtaining the source files, checking whether everything is ready to start the build (modules for dependencies are loaded, build directory is present, etc.), unpacking the sources, applying patches, preparing for build by setting up the toolchain, configuring and running the build process, running the provided software test suite (if any), installing the software, adding extensions if specified, performing a sanity check, cleaning up any temporary files or directories, and finishing with the generation of an environment module. If any user-defined test cases are specified in the easyconfig file, they are run after completing the install procedure. This way, they validate and/or benchmark the installed software as it will be
used by the user. If any of these steps fail, EasyBuild reports relevant information about the failure in the installation log file, and throws an EasyBuildError which results in a termination of the installation procedure.
We limit this discussion to highlighting a couple of interesting aspects of the installation procedure. For a detailed description of each step, we refer to the EasyBuild documentation [7].
If no easyblock is specified through the easyblock parameter in the easyconfig and if EasyBuild is unable to locate a suitable easyblock based on the software package name, it will fall back to the ConfigureMake easyblock by default. This easyblock implements the GNU configure, make, make install installation procedure.
If, for some reason, EasyBuild is unable to locate a source file that is listed in the easyconfig, it tries to download it from the URLs provided in the source_urls parameter. Should this fail as well, the installation procedure is terminated and an appropriate message is written to the installation log.
During the unpack step, EasyBuild determines the command for unpacking the sources from the source file name extension. Similarly, when patches are applied in the next step, the patch level is determined automatically, unless it was hard-coded in the easyconfig.
If the software package does not provide a way to move built files to their target installation location, EasyBuild allows building inside the installation location. This can easily be specified in the constructor of the easyblock class, by setting the instance variable build_in_installdir to True (see Section IV-A).
After the software has been installed, a sanity check is performed. This check comprises making sure that a predefined list of files and (non-empty) directories is available in the installation directory. This check aims to catch cases where the installer fails to return a non-zero exit code when it is terminated prematurely, which causes EasyBuild to mistakenly assume all is well. For example, we saw installation procedures running to completion without actually producing any of the expected binaries or libraries in the target installation directory.
By default, the sanity check makes sure that the bin and lib sub-directories of the target installation location are non-empty. Custom sanity checking can be triggered by specifying a list of files and directories in the sanity_check_paths easyconfig parameter or by adding a default sanity check in the easyblock. If any of these files or directories cannot be found, EasyBuild assumes that the installation procedure has at least partially failed.
After a successful installation procedure, an environment module corresponding to the installed software package is created. This makes it easy to set up the shell session environment for running the installed software at a later point. Usually, this involves adjusting some standard system environment variables such as PATH for binaries, LD_LIBRARY_PATH for libraries required at runtime, etc.
EasyBuild also defines a couple of custom environment variables in each environment module it generates. For example, the EBROOTWRF and EBVERSIONWRF environment variables are set in the WRF environment module. These variables are useful both for EasyBuild when resolving dependencies and for end-users who load the module. This allows them to determine the installation path of the software, e.g., for accessing examples, header files or libraries that were installed. The generated environment module also includes the documentation provided by the homepage and description easyconfig parameters.
Finally, any user-supplied test cases are considered; these are specified in the tests easyconfig parameter. While a test suite provided by the software package is primarily intended to test the correctness of (certain aspects of) the software, the user test cases have a different purpose. They aim to test the installed software package as it would be used by an end-user or by EasyBuild in subsequent installation runs of dependent packages. However, these tests can also be used as benchmarks to evaluate the performance, the correctness and the accuracy of the installed software.

III. FEATURES
We now discuss the key features of EasyBuild.

A. Keeping track of build logs
EasyBuild thoroughly keeps track of the executed installation procedure via the custom logger class EasyBuildLog. Easyblocks can (and should) produce informative or debug log messages that describe the actions being performed. The log messages produced by the EasyBuild framework are intertwined in a single log file, and supply sufficient information should issues arise during any step of the installation procedure and when the software packages are being used.
The installation log is stored in a sub-directory named easybuild of the install directory for future reference, along with a copy of the used easyconfig file.

B. Archiving easyconfig files
Easyconfig files that led to a successful installation procedure are archived in an easyconfig repository. This can be either a regular directory, or a Subversion or Git repository, as specified in the EasyBuild configuration file.
The original easyconfig file is augmented with extra information before it is stored in the archive, e.g., a comment mentioning the EasyBuild version that was used, the git commit ID if it can be determined, a list of build statistics specified in the build_stats easyconfig parameter, etc.

C. Automatic dependency resolution
One of the key features is automatic dependency resolution, which is indispensable for a software installation framework as mentioned earlier. This feature is referred to as the EasyBuild robot.
Recall that the easyconfig dependencies are checked by verifying they are available through environment modules generated by EasyBuild. If the robot feature is turned on, EasyBuild automatically tries to locate easyconfig files for missing dependencies, including the compiler toolchain. These easyconfigs are then installed in a hierarchical order, which makes it possible to bootstrap a complete installation. Paths where the robot should look for the easyconfig files can be provided on the command line as arguments to the --robot option. For any supported software package, a dependency graph can be generated using the --dep-graph option.
Automatic dependency resolution is a very useful and powerful tool when installing large sets of software packages at once, e.g. when setting up a new system, or when installing a software package with a large and/or complex dependency graph.
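The idea of installing packages in a hierarchical order can be illustrated with a small depth-first topological sort. This is only a sketch and not the implementation of the EasyBuild robot: the resolve_order helper is hypothetical, and the dependency edges below are assumed for illustration (only the three direct WRF dependencies are taken from the WRF easyconfig; the remaining edges are not read off Figure 5).

```python
def resolve_order(deps):
    """Return an installation order in which each package follows its dependencies.

    deps maps a package name to the list of packages it depends on.
    Hypothetical helper for illustration, not part of EasyBuild.
    """
    order = []
    state = {}  # package name -> 'visiting' or 'done'

    def visit(pkg):
        if state.get(pkg) == 'done':
            return
        if state.get(pkg) == 'visiting':
            raise ValueError("cyclic dependency involving %s" % pkg)
        state[pkg] = 'visiting'
        # install all dependencies before the package itself
        for dep in sorted(deps.get(pkg, [])):
            visit(dep)
        state[pkg] = 'done'
        order.append(pkg)

    for pkg in sorted(deps):
        visit(pkg)
    return order


# Assumed edges, for illustration only.
example_deps = {
    'WRF': ['JasPer', 'netCDF', 'netCDF-Fortran'],
    'netCDF-Fortran': ['netCDF'],
    'netCDF': ['HDF5'],
    'HDF5': ['zlib', 'Szip'],
    'JasPer': [],
    'zlib': [],
    'Szip': [],
}

install_order = resolve_order(example_deps)
print(install_order)
```

Packages without dependencies end up first and WRF last, so each installation only starts once everything it needs is in place. A cycle in the graph is reported as an error rather than silently ignored.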
D. Support for interactive installers
Some scientific packages use an interactive installation procedure. Moreover, there often is no alternative that allows providing a completed configuration file to allow the installation to proceed autonomously. This is clearly an undesirable feature for a framework that focuses on automating software installation and maintenance.
EasyBuild addresses this issue through the run_cmd_qa function which supports a Q&A mechanism for handling interactive installers. It suffices to provide a Python dictionary that maps regular expression patterns for questions to the correct answer strings or actions (for example, hitting return to continue) for the interactive installer to run to completion autonomously. A concrete example is given in Section IV-A for the WRF software package.
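The Q&A idea can be sketched with the standard re module alone. The code below is not the implementation of run_cmd_qa and does not use its interface; the answer_for helper, the patterns and the prompt text are hypothetical, chosen only to show how a dictionary of question patterns can yield answers, including an answer filled in from a named group captured in the question.

```python
import re


def answer_for(output, qa_patterns):
    """Return the answer for the first question pattern that matches the
    end of the installer output, or None if no pattern matches.

    Hypothetical helper for illustration, not part of EasyBuild.
    """
    for pattern, answer in qa_patterns.items():
        match = re.search(pattern + r"\s*$", output)
        if match:
            groups = match.groupdict()
            # fill named groups captured from the question into the answer
            return answer % groups if groups else answer
    return None


qa_patterns = {
    # simply hit return
    r"Press return to continue": "",
    # pick the number printed in front of the desired option
    r"(?P<nr>[0-9]+)\.\s*gcc build.*\n(.*\n)*Enter selection\s*\[[0-9]+-[0-9]+\]\s*:": "%(nr)s",
}

prompt = "1. icc build\n2. gcc build\n3. other\nEnter selection [1-3] :"
selection = answer_for(prompt, qa_patterns)
print(selection)
```

A driver built on this idea would read the installer's output, look up an answer whenever the output stops at a question, and abort when no pattern matches, rather than waiting forever for input.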
E. Installing multiple software packages in parallel
When multiple software packages are being installed at the same time, either because they belong to a set of (independent) software packages or because they are independent parts of a dependency graph, it is often desirable to be able to run the installation procedures in parallel, provided that the required resources are available and that the dependencies between the software packages do not prevent it.
Through the parallel_build module in the tools package, EasyBuild provides support for running installation procedures in parallel. This is done by submitting jobs to a resource manager like Torque/PBS, while setting dependencies between jobs if necessary, to make sure that installations are performed in the correct order. To enable this feature, the command line option --job should be used.

F. Regression and unit testing
Unit tests for EasyBuild are available in the test package. They can be used to check whether the functionality provided by the framework and covered by these tests was not broken during development. Running the suite of EasyBuild unit tests is done using python -m easybuild.test.suite.
The unit tests are run by a Jenkins continuous integration server on a regular basis, i.e., on every commit to the develop branch of the EasyBuild git repository as well as before every release. The Jenkins instance is located at https://jenkins1.ugent.be/view/EasyBuild/.
Additionally, the entire set of example easyconfig files that are provided as a part of EasyBuild is installed in a dedicated and pristine installation path by means of the regression test. This way, the installation procedures for all the supported software packages are tested, as well as their interoperability. This is important for both compiler toolchains and for software packages that have a non-empty set of dependencies. The results of these regression tests will also be available through the EasyBuild Jenkins project.

IV. USE CASE: INSTALLING WRF
To illustrate the use of EasyBuild, we discuss the installation of the WRF weather modeling software package. WRF is a particularly interesting use case, because its installation procedure is by no means standard. However, actually installing WRF using its easyconfig file is trivial. It suffices to run the following command:

eb WRF-3.4.eb --robot

Using the --robot option without an argument requires that easyconfig files for any WRF dependencies (see Figure 5) are provided by EasyBuild. In this use case, we assume that EasyBuild already provides these easyconfig files and has the support required for installing these dependencies. Although additional command line parameters can be specified, for example, --debug to obtain detailed debug information as the installation procedure progresses, or --job to perform the installation through PBS jobs, they are not required to successfully build WRF, or any other software package for that matter.
After successfully completing the installation procedure, EasyBuild creates an environment module named (by default) WRF/3.4-ictce-4.0.6. This name is derived from the name, version and toolchain parameters in the easyconfig file.
We now discuss how EasyBuild can be augmented to support installing this software package.
The WRF source provides a configuration script called configure. This script, however, is an interactive script that generates a configuration file configure.wrf rather than a classical GNU configure script. On top of this, said generated configuration file should be edited to gain full control of the build parameters, e.g., compiler commands, optimisation options, etc. Compilation is done using the compile script, which wraps around the make command to perform some extra custom tasks.
Note that the WRF installation procedure does not feature an actual installation step; the sources must be unpacked in the installation directory. Thus, the compile script must be run there as well (see footnote 7).
Footnote 7: The WRF easyblock could copy the binaries, libraries and other files required at runtime to the installation directory after running the compile script, but this is not implemented at this point. We do welcome contributions to EasyBuild.
Besides this, WRF has a rather complex dependency graph, see Figure 5. Assuming that EasyBuild provides the support for all dependencies in this graph, the automatic dependency resolution makes it possible to install WRF using a single simple command, as shown above.

[Fig. 5: Dependency graph for the WRF software package. The arrows indicate a depends-on relationship. Nodes shown: WRF, netCDF-Fortran, JasPer, netCDF, Doxygen, HDF5, Bison, flex, M4, zlib, Szip and ictce.]

In the following two sections we discuss how to add support for the WRF install procedure, and how to subsequently use it to obtain a custom installation of WRF.
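As a small illustration of the default module name mentioned in this section, the sketch below assembles WRF/3.4-ictce-4.0.6 from a software name, a version and a toolchain. The default_module_name helper is hypothetical and only mirrors the pattern visible in that example name; it is not EasyBuild's own naming code.

```python
def default_module_name(name, version, toolchain):
    """Build a name of the form <name>/<version>-<toolchain name>-<toolchain version>.

    Hypothetical helper for illustration, not part of EasyBuild.
    """
    return "%s/%s-%s-%s" % (name, version, toolchain['name'], toolchain['version'])


# values taken from the module name quoted in the text
module_name = default_module_name('WRF', '3.4', {'name': 'ictce', 'version': '4.0.6'})
print(module_name)  # WRF/3.4-ictce-4.0.6
```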
A. Implementing the WRF easyblock
To add support for installing WRF to EasyBuild, a Python module named wrf.py should be created in the easyblocks package. This module provides a class named EB_WRF that inherits from the generic base easyblock (EasyBlock).
The EB_WRF class implements the WRF installation procedure, as shown in Listing 1. We only show the relevant lines in the listed code, otherwise the easyblock does not fit on a single page (see footnote 8). The complete code is available from the EasyBuild GitHub repository (see footnote 9). Note that this example is rather involved due to the complexity of the WRF installation procedure. Typically, a new easyblock should be less complex and contain far less Python code. Nonetheless, this example shows that even non-standard complex install procedures can easily be captured in a custom easyblock.
Footnote 8: Omissions include docstrings and comments, the required custom implementations of the sanity_check_step, make_module_req_guess and make_module_extra functions, the definition of the test function that runs the test suite included with WRF, support for building WRF with other compilers, e.g., GCC, custom easyconfig parameters that are irrelevant to this discussion, etc.
Footnote 9: The easyblock detailing the WRF install procedure can be found at https://github.com/hpcugent/easybuild-easyblocks/blob/develop/easybuild/easyblocks/w/wrf.py
At the top of the Python module wrf.py, we import the required Python modules and EasyBuild functionality. Next to useful functions such as run_cmd_qa, this also comprises the generic EasyBlock which EB_WRF sub-classes.
The __init__ constructor is customised to specify that WRF should be built in the installation directory. This only needs to be stated here; EasyBuild will use this information during the installation process.
Through the static extra_options method, the buildtype custom easyconfig parameter for WRF is added. It is marked as mandatory, and will be enforced as such.
The configuration step is implemented in the configure_step method, and consists of three parts.
First, several required preparations are made (lines 23–40):
• The environment variables for the dependencies are set with the env.setvar function. For netCDF and netCDF-Fortran, this is done with the set_netcdf_env_vars function provided by the netCDF easyblock.
• The WRFIO_NCD_LARGE_FILE_SUPPORT environment variable is set to indicate that WRF should be built with support for large files.
• The Config_new.pl Perl script, which is the actual interactive configure script, is patched using the patch_perl_script_autoflush function provided by EasyBuild to enable auto-flushing. This is required to run it correctly in an autonomous way.
• The buildtype easyconfig parameter is checked, to make sure a valid value was supplied.
Now that the environment is properly set up, the configure script is executed (see lines 43–51), using the run_cmd_qa function provided by EasyBuild. A number of question patterns and matching answers are supplied to allow the interactive configure script to autonomously run to completion. Should the script crash for some reason or if an answer cannot be supplied to a question being asked, an EasyBuildError will be thrown, halting the installation procedure.
Note that for one of the questions posed by the configure script, the answer is being determined dynamically (see line 49). The desired answer for selecting the Intel compiler toolkit is determined by EasyBuild from the matching option line.
Finally, the generated configuration file, configure.wrf, is patched using the fileinput and re Python modules, to ensure that the correct compiler settings are being used (see lines 54–64). The appropriate toolchain module has set the environment variables such as CC, F90, MPICC, etc. These can be used in an easyblock without having to know the exact compiler commands. Similarly, the default optimisation settings in the WRF configure file can also be overridden to use the settings provided by EasyBuild, but due to space limitations this has been omitted in Listing 1.
The next step is to perform the compilation procedure using the compile wrapper script. This is implemented in the build_step method. The -j N option is passed to compile, enabling parallel compilation. The actual value is obtained through the parallel easyconfig parameter. The compile script is run three times: once to build WRF and twice to build two test cases. This ensures that both the ideal.exe and real.exe tools are also compiled.
As mentioned above, the WRF installation procedure does not feature an actual installation step (the build is performed in the installation directory), so the install_step method is defined as empty using the pass statement.

B. Installing WRF using an easyconfig file
The easyconfig file shown in Listing 2 specifies the version of the WRF software package that should be built along with the build parameters. Some optional easyconfig parameters are used here to steer the installation procedure. The toolchainopts parameter indicates that aggressive compiler optimisations should be avoided. The patches parameter defines the list of patches that need to be applied in order to correctly build WRF with the Intel compiler toolchain ictce. The custom easyconfig parameter buildtype is also defined, as required by the WRF easyblock. This easyconfig file can then be used to install the WRF software package as was shown above.

Listing 2: Example easyconfig file for WRF (WRF-3.4.eb)
name = 'WRF'
version = '3.4'

homepage = 'http://www.wrf-model.org'
description = 'Weather Research and Forecasting'

tcver = '3.2.2.u3'
toolchain = {'name': 'ictce', 'version': tcver}
toolchainopts = {'opt': False, 'optarch': False}

sources = ['%sV%s.TAR.gz' % (name, version)]
patches = [
    'WRF_parallel_build_fix.patch',
    'WRF-3.4_known_problems.patch',
    'WRF_tests_limit-runtimes.patch',
    'WRF_netCDF-Fortran_separate_path.patch']

dependencies = [('JasPer', '1.900.1'),
                ('netCDF', '4.2'),
                ('netCDF-Fortran', '4.2')]

buildtype = 'dmpar'

This example illustrated how EasyBuild can be used in practice. Usually, after a one-time effort implementing the installation procedure for a particular software package as an easyblock, custom installations of that software package can
Listing 1: Shortened version of the easyblock implementation for WRF (wrf.py).
 1 import fileinput, os, re, sys
 2
 3 import easybuild.tools.environment as env
 4 from easybuild.easyblocks.netcdf import set_netcdf_env_vars
 5 from easybuild.framework.easyblock import EasyBlock
 6 from easybuild.framework.easyconfig import MANDATORY
 7 from easybuild.tools.filetools import patch_perl_script_autoflush, run_cmd, run_cmd_qa
 8 from easybuild.tools.modules import get_software_root
 9
10 class EB_WRF(EasyBlock):
11
12     def __init__(self, *args, **kwargs):
13         super(EB_WRF, self).__init__(*args, **kwargs)
14         self.build_in_installdir = True  # build inside the installation directory
15
16     @staticmethod
17     def extra_options():
18         extra_vars = [('buildtype', [None, "Type of build (e.g., dmpar, dm+sm).", MANDATORY])]
19         return EasyBlock.extra_options(extra_vars)
20
21     def configure_step(self):
22         # prepare to configure
23         set_netcdf_env_vars(self.log)
24
25         jasper = get_software_root('JasPer')
26         if jasper:  # only derive JasPer paths when JasPer is available
27             jasperlibdir = os.path.join(jasper, "lib")
28             env.setvar('JASPERINC', os.path.join(jasper, "include"))
29             env.setvar('JASPERLIB', jasperlibdir)
30
31         env.setvar('WRFIO_NCD_LARGE_FILE_SUPPORT', '1')
32
33         patch_perl_script_autoflush(os.path.join("arch", "Config_new.pl"))
34
35         known_build_types = ['serial', 'smpar', 'dmpar', 'dm+sm']
36         self.parallel_build_types = ["dmpar", "smpar", "dm+sm"]
37         bt = self.cfg['buildtype']
38
39         if bt not in known_build_types:
40             self.log.error("Unknown build type: '%s' (supported: %s)" % (bt, known_build_types))
41
42         # run configure script
43         bt_option = "Linux x86_64 i486 i586 i686, ifort compiler with icc"
44         bt_question = r"\s*(?P<nr>[0-9]+).\s*%s\s*\(%s\)" % (bt_option, bt)
45
46         cmd = "./configure"
47         qa = {"(1=basic, 2=preset moves, 3=vortex following) [default 1]:": "1",
48               "(0=no nesting, 1=basic, 2=preset moves, 3=vortex following) [default 0]:": "0"}
49         std_qa = {r"%s.*\n(.*\n)*Enter selection\s*\[[0-9]+-[0-9]+\]\s*:" % bt_question: "%(nr)s"}
50
51         run_cmd_qa(cmd, qa, no_qa=[], std_qa=std_qa, log_all=True, simple=True)
52
53         # patch configure.wrf
54         cfgfile = 'configure.wrf'
55
56         comps = {
57             'SCC': os.getenv('CC'), 'SFC': os.getenv('F90'),
58             'CCOMP': os.getenv('CC'), 'DM_FC': os.getenv('MPIF90'),
59             'DM_CC': "%s -DMPI2_SUPPORT" % os.getenv('MPICC'),
60         }
61         for line in fileinput.input(cfgfile, inplace=1, backup='.orig.comps'):
62             for (k, v) in comps.items():
63                 line = re.sub(r"^(%s\s*=\s*).*$" % k, r"\1 %s" % v, line)
64             sys.stdout.write(line)
65
66     def build_step(self):
67         # build WRF using the compile script
68         par = self.cfg['parallel']
69         cmd = "./compile -j %d wrf" % par
70         run_cmd(cmd, log_all=True, simple=True, log_output=True)
71
72         # build two test cases to produce ideal.exe and real.exe
73         for test in ["em_real", "em_b_wave"]:
74             cmd = "./compile -j %d %s" % (par, test)
75             run_cmd(cmd, log_all=True, simple=True, log_output=True)
76
77     def install_step(self):
78         pass
then be performed relatively easily. Additionally, easyblocks that are sufficiently general can be re-used for supporting other software packages with similar install procedures. Sharing implementations of install procedures is quite simple as well: usually it suffices to provide the module that implements the custom easyblock.

V. RELATED WORK
There are various other frameworks or tools that have similar goals as EasyBuild. However, none of them have full support for the required features discussed in Section I.
The Ports frameworks [5] provide so-called Port files, which are basically Makefiles together with patches to easily build software packages. This framework is used by FreeBSD, NetBSD, OpenBSD, as well as re-implemented by Gentoo (Portage), Arch (Arch Build System), CRUX and for OS X by MacPorts [9]. These Ports frameworks all share the goal of easily sharing known installation procedures. However, they are all very operating system dependent, have no support for installing different software builds alongside each other, and most of them only have support for one specific compiler, i.e., the installed system compiler.
Compile [10] is the compilation system used in GoboLinux. It has support for over 10,000 so-called recipes. Compile only supports GCC, and has a built-in system that resembles environment modules. The recipes are bash scripts, and thus a lot harder to maintain than a Python framework.
The Red Hat software collections [12] are lists of software that can be installed alongside each other by extending the spec files with some extra macros. Scriptlets are used to achieve similar features to environment modules. Because spec files are being used, this system is again very operating system dependent, and less flexible compared to EasyBuild.
Rocks [1] provides bash scripts and Makefiles named Rolls to build software on HPC clusters, e.g. the UCSD Triton cluster [15]. The same concerns apply as with the other projects, mainly w.r.t. maintenance and flexibility.
Homebrew [6] is a package manager for OS X that installs packages in dedicated directories. Although it does not readily provide environment modules, this should be easy to support. Homebrew is very much tied to OS X, and thus is a lot less flexible than EasyBuild.
Slawinska et al. present a so-called system-call virtual machine (SCVM) based approach to make software installations more portable [14]. This approach involves intercepting system calls through strace and adding directives to build scripts in a custom language. For now this framework is limited to a proof-of-concept, and seems to be significantly harder to maintain.
Finally, there are some frequently used installation frameworks for Python packages, such as Buildout [16] and Setuptools [3]. However, these tools focus on installing Python packages and applications, whereas we require the ability to install any application, irrespective of the language it is written in. SCons [8] on the other hand does allow installing general software packages. Unlike EasyBuild, SCons essentially requires (re-)writing the actual makefiles in Python, rather than allowing the existing makefiles to be used. As such, EasyBuild is more general, and requires less work as it is able to re-use the existing installation scripts offered by the target software packages.

VI. CONCLUSION
This paper presented the EasyBuild Python framework for installing software packages. We outlined its major features, discussed the design of the framework, and showed how it can be used for installing (scientific) software packages so end-users can easily access them for their experiments. Through its modular design, EasyBuild allows adding custom installation procedures for new software packages or support for new compiler toolchains with minimal effort. Software installations are performed in isolated directories. This effectively supports installing various versions of a software package side-by-side. It also minimises the potential effect of operating system updates on the installed end-user software. Sharing implementations of software installation procedures, the so-called easyblocks, is particularly easy.
EasyBuild is open source software; we aim to make it a community effort, thereby reducing the time required by a user support team for installing the diverse set of software HPC sites may require. We hope this framework is useful for the community at large, including end-users. Merging support for over 250 software packages from the in-house to the public version is ongoing.

ACKNOWLEDGMENTS
The authors would like to thank Dries Verdegem, Pieter De Baets, Toon Willems, Wouter Depypere, Luis Fernando Muñoz Mejías, Fotis Georgatos and Cédric Laczny for their valuable discussions and contributions.
Kenneth Hoste, Jens Timmerman, Andy Georges and Stijn De Weirdt are supported by the Unit ICT Infrastructure of Ghent University and the Flemish Supercomputer Centre (VSC).
This work was carried out using the STEVIN Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Flemish Supercomputer Centre (VSC), the Hercules Foundation and the Flemish Government – department EWI.

REFERENCES
[1] Rock Clusters. Rocks - Open Source Toolkit for Real and Virtual Clusters. http://www.rocksclusters.org/roll-documentation/base/5.5/, 2012.
[2] P. F. Dubois, T. Epperly, G. Kumfert. Why Johnny Can't Build. Computing in Science and Engineering, Vol. 5(5), pp. 83–88, 2003.
[3] P.J. Eby. Setuptools and easy_install. http://pypi.python.org/pypi/setuptools.
[4] J.L. Furlani, P.W. Osel. Abstract yourself with modules. In LISA, 1996, pp. 193–204.
[5] The FreeBSD Project. FreeBSD Ports. http://www.freebsd.org/ports, 2000.
[6] M. Howell. Homebrew, the missing package manager for OS X. http://mxcl.github.com/homebrew/, 2012.
[7] HPC UGent. EasyBuild documentation. http://github.com/hpcugent/easybuild/wiki, 2012.
[8] S. Knight. SCons User Guide. http://www.scons.org/doc/production/PDF/scons-user.pdf, 2010.
[9] MacPorts. The MacPorts Project. http://www.freshports.org, 2002.
[10] H. Muhammad. The ideas behind Compile. http://www.gobolinux.org/index.php?page=doc/articles/compile, 2003.
[11] Python Software Foundation. Python Programming Language. http://www.python.org, 1990.
[12] Red Hat. Red Hat Developer Toolset 1.0 – Software Collections Guide. https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Developer_Toolset/1/html/Software_Collections_Guide/index.html, 2012.
[13] M. Slawinska. Enhancing Portability of HPC Applications across High-end Computing Platforms. In IPDPS, 2007, pp. 1–8.
[14] M. Slawinska, J. Slawinski, V. Sunderam. A Practical, SCVM-based Approach to Enhance Portability and Adaptability of HPC Application Build Systems. In IMECS, 2012.
[15] UCSD. Triton Resource - Build Your Own Cluster. http://tritonresource.sdsc.edu/build_own.php, 2012.
[16] Zope Foundation. Buildout. http://www.buildout.org, 2009.
