Easybuild: Building Software With Ease: November 2012
All content following this page was uploaded by Andy Georges on 02 April 2015.
Abstract—Maintaining a collection of software installations for a diverse user base can be a tedious, repetitive, error-prone and time-consuming task. Because most end-user software packages for an HPC environment are not readily available in existing OS package managers, they require significant extra effort from the user support team. Reducing this effort would free up a large amount of time for tackling more urgent tasks.

In this work, we present EasyBuild, a software installation framework written in Python that aims to support the various installation procedures used by the vast collection of software packages that are typically installed in an HPC environment – catering to widely different user profiles. It is built on top of existing tools, and provides support for well-established installation procedures. Supporting customised installation procedures requires little effort, and sharing implementations of installation procedures becomes very easy. Installing software packages that are supported can be done by issuing a single command, even if dependencies are not available yet.

Hence, it simplifies the task of HPC site support teams, and even allows end-users to keep their software installations consistent and up to date.

I. INTRODUCTION

HPC software environments can be quite diverse. Some environments have a rather limited number of installed software packages they offer to end-users, while others offer support to users with very diverse needs in terms of the software packages they use and which they require to be installed on the system. Unfortunately, not all (scientific) software packages build and install according to the same procedure or using the same tools. This makes the task of installing the diversity of software often required by end-users time-consuming and error-prone [2], [13].

Of course, the problem of implementing software installation procedures is not new. Most UNIX-like systems such as Linux and BSD based distributions employ a package manager – closely coupled to the distribution – for deploying new software packages and for maintaining the installation. Most package managers have some custom format to define how a software package should be built and which packages it depends on. For example, RedHat-based systems use rpm packages and yum to install them, where the build specification is detailed in the RPM spec files. SUSE, Debian-based systems, BSD derivatives, etc. have similar tools. While all of these make the maintenance of software environments simpler by supporting easy upgrade paths, automatic dependency resolution, etc., there are several shortcomings for maintaining one or more installations of scientific software. This is mostly due to the differences between scientific software and software that is provided as a part of an operating system defined and controlled by software maintainers. In the scientific community, typically fewer resources are spent on the maintenance of build procedures; almost all of the effort is put into the development (and testing) of the code. We found the following issues with installation procedures for the scientific software packages we support at the UGent HPC site¹.

• Incompleteness. For example, only compilation in the source directory is supported and it is a hassle to actually install the executables, libraries and include files, etc.
• Non-standard procedure. Installation procedures are often far more involved than a sequence of configure, build and install steps. For example, the installation procedure can be interactive, i.e., requiring human intervention during the configuration and installation.
• Custom-built scripts. On various occasions, custom shell scripts need to be used, as opposed to a set of standard tools such as configure, make, cmake, etc.
• Hard-coded parameters. System-specific parameters such as the compiler commands or the list of libraries, e.g., BLAS/LAPACK, MPI, etc., are hard-coded in the configuration files or in the installation scripts.

¹ http://www.ugent.be/hpc

Commercial software packages, which are commonly used in various scientific research domains, rarely follow any standard. They all provide their own procedures, and thus also suffer from similar problems.

The problem is aggravated further in an HPC software installation environment where users typically have several requirements that are not (well) handled by traditional package managers or build tools. Scientists need to have particular builds – versions built with a specific compiler toolchain – of software packages available for an extensive period of time, preferably indefinitely.² Additionally, researchers often desire to use the latest and greatest version of the software when they start (new) experiments. Often, they like to experiment with various builds of a particular software package using different compilers or libraries, for evaluating both performance and correctness. Although package managers are good at keeping software up to date and taking care of dependencies, to the best of our knowledge, most do not support these other requirements well.

² The reason is quite straightforward: they should be able to reproduce – this is good scientific practice – or extend their previously obtained results whenever the need arises.

Due to these shortcomings, scientific software packages are getting less support from distributions and their package manager maintainers – detailed customisation is time-consuming and difficult; package managers rarely provide sufficient flexibility for dealing with this.

In short, to maintain the software in an HPC environment, a tool is required that offers:

• Flexibility. There are many different installation procedures and they should be supported with minimal effort. This results in a tool that is able to build and install software in a flexible, reproducible and robust way.
• Co-existence of versions. In principle, installed software should never have to be removed. Hence, different versions of builds must be able to be installed independently of each other.
• Dependency handling. Many critical software packages in an HPC environment have dependencies on each other, e.g., numerical libraries such as BLAS, LAPACK, etc., but also cooperative stacks of software applications – see for example the WRF dependency graph discussed in Section IV. Handling these dependencies is an aspect that is traditionally handled well by most package managers. Such automatic dependency resolution significantly simplifies the maintenance of a collection of software installations, and is thus indispensable in any relevant framework.
• Sharing implementations of installation procedures. Although usually the installation procedure for software packages is (well) documented, a lot of work is duplicated among user support teams: (i) digging through the documentation, (ii) following the installation procedures, and likely (iii) scripting the installation in some way. To reduce this inefficiency, it should be easy to share implementations of software installation procedures with others. First and foremost, this requires a modular plug-and-play enabled infrastructure, to which support for any particular software package can be added with minimal effort. This should be independent of the package source, whether it is self-written or obtained from others. Note that package managers also allow sharing of installation procedures, but they are more rigid than what an HPC site typically requires.

A tool that meets these criteria has several advantages: (i) it reduces the effort on behalf of the user support team when the result from earlier installations can be reproduced in a simple way, and (ii) it enables forming a community to tackle the software maintenance problem in a collective manner.

Because we were unable to find another tool that matches the requirements listed above, we started the development of EasyBuild, a modular build and install framework for software, written in Python [11]. This framework replaced our earlier effort of deploying through custom RPM spec files using a traditional package manager, which was ill suited for addressing the issues we faced and mostly resulted in a number of large, hard-to-maintain shell scripts for expressing customisation.

While EasyBuild originally supported just a few software packages that featured custom build procedures, it quickly grew to become a very important part of the user support tools used by the HPC UGent team. Today, EasyBuild has support for over 250 scientific software packages – it is being developed and improved continuously. It allows us to limit the amount of time and effort required to install and update end-user software packages, by investing time once to implement the installation procedure in an easyblock, see Section II-E. Subsequent builds of new versions of a software package or builds that are using different parameters can usually be obtained with very little or no effort, thereby saving lots of time and manpower.

In April 2012, after more than three years of in-house development, we released EasyBuild on GitHub³ as open source software under the GPLv2 license. It is also available from the Python Package Index (PyPI)⁴. We are currently in the process of migrating all supported software packages from our legacy version to the publicly available version. This paper presents EasyBuild version 1.0, which was released in November 2012, and is considered to be the first stable and robust public release of this framework.

³ http://hpcugent.github.com/easybuild/
⁴ http://pypi.python.org/pypi/easybuild/

With this paper, we want to show the potential EasyBuild has for both user support teams and for end-users, given the targeted ease-of-use and the absence of any need for admin rights. Our aim is twofold: (i) to encourage the use of EasyBuild, and (ii) to receive feedback and ideas to further improve this framework, increasing its usefulness for tackling the issues HPC sites have when maintaining software installations. EasyBuild is structured in a very modular way, simplifying collaboration and allowing the HPC community to provide support for new software packages. Contributing is as easy as forking the repository on GitHub and filing a pull request so we can incorporate the contributions in the framework.

The remainder of this paper presents an overview of EasyBuild in Section II, highlights its main features in Section III, and discusses a software installation use case in Section IV. Finally, we compare our framework to related tools in Section V, and conclude in Section VI.

II. EASYBUILD OVERVIEW

In this section, we give a detailed overview of EasyBuild by presenting its basic usage and configuration, listing the required dependencies, and discussing the design.

A. Basic usage and configuration

The basic usage of EasyBuild is simple: run the eb command with the appropriate arguments. Usually, the path to the easyconfig files, see Section II-D, should be specified explicitly. Several command-line options for eb are available, e.g., --debug to enable debug mode, --robot to enable the automatic dependency resolver and supply a path to it, etc. A full list of options can be obtained by running eb --help.

Before EasyBuild is used the first time, a simple configuration file containing plain Python code should be created. There, fixed-name variables are defined that specify (i) the paths where the temporary log files are stored, (ii) the paths of the build, source and installation directories, (iii) the format of the log file names, and (iv) the path to the easyconfig files repository – see Section III-B. The location of this configuration file can be provided to the eb command with the --config option or by setting the EASYBUILDCONFIG environment variable.

EasyBuild comes with a default configuration file easybuild_config.py that uses the $HOME/.local/easybuild directory as a prefix for the build, source and installation directories.

B. Dependencies

EasyBuild only has two direct dependencies: Python and environment modules. We use Python version 2.x, where the version should be at least 2.4. The reason is quite straightforward: on the UGent HPC clusters we run Scientific Linux 5.x and 6.x, where the system Python versions are 2.4 and 2.6, respectively.

The environment modules software package is a well-known tool in the HPC community. Through simple text files, referred to as environment modules, an easy-to-use interface can be offered to users to prepare their session environment for using a particular software package. Environment modules describe the changes to the session environment that are necessary for a piece of software to work correctly. They can append to the PATH and LD_LIBRARY_PATH environment variables, such that binaries and shared libraries are readily available.
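To make the role of such a module file concrete, the following is a minimal sketch in Python that composes the text of a Tcl-style environment module for a single installation. The function name make_modulefile and the example installation path are hypothetical and chosen only for illustration; this is not the module generator that ships with EasyBuild. The sketch uses the prepend-path modulefile command; the environment modules tool also offers append-path for adding entries at the end of a variable.

```python
def make_modulefile(name, version, install_dir):
    """Return the text of a minimal Tcl modulefile (illustrative sketch only).

    The function name and arguments are hypothetical; they are not part of
    the EasyBuild framework.
    """
    lines = [
        "#%Module",                                  # magic cookie that marks a modulefile
        "module-whatis {%s %s}" % (name, version),   # one-line description of the module
        "prepend-path PATH %s/bin" % install_dir,    # make the binaries available
        "prepend-path LD_LIBRARY_PATH %s/lib" % install_dir,  # make the shared libraries available
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical installation prefix, for illustration only.
    print(make_modulefile("WRF", "3.4", "/opt/software/WRF/3.4"))
```

Loading a module file of this shape adjusts PATH and LD_LIBRARY_PATH for the current session, which is the kind of session change the paragraph above describes.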
EasyBuild heavily relies on environment modules in a number of ways, making the environment modules software package an important prerequisite. EasyBuild automatically generates environment modules for every software package it installs, thereby relieving the user of having to manually create appropriate module files. Furthermore, it relies on the set of available environment modules for obtaining information about installed software packages and their versions, and for resolving dependencies.

By generating an environment module for every completed installation, EasyBuild allows for keeping different versions and/or builds of software packages side-by-side, without affecting each other. Next to providing access to the installed package for end-users, these modules also help EasyBuild to locate the software during subsequent installations of other dependent software packages, i.e., for resolving dependencies – as will be explained in more detail later on.

C. High-level design

We first give a high-level overview of the EasyBuild design before diving into the details in the next sections, see Figure 1.

[Figure 1: package layout showing the framework, easyblocks (generic and software-specific), tools and toolchains packages, the test package and scripts, and the eb command, main.py and easybuild_config.py.]
Fig. 1: Overview of the EasyBuild design.

EasyBuild consists of (i) the framework Python package containing several modules that form the core of the tool, (ii) the easyblocks package providing the easyblocks that implement specific installation procedures that can be used to install one or more software packages (see Section II-E), (iii) the tools package providing a set of tools that offer supporting functionality, (iv) the toolchains package providing support for compiler toolchains, (v) the eb command, the main script and a default configuration file, and (vi) a unit testing framework in the test package and some useful stand-alone scripts. Next to this, a collection of easyconfig files for specifying installation parameters is available, see Section II-D.

From an easyconfig file and a matching easyblock, EasyBuild is able to determine what is required by the various steps that form the installation procedure and how they should be performed, see Section II-H.

The EasyBuild design is purposely very modular: easyblocks that provide support for new software packages can be easily plugged in without modifications to the existing code. If easyblocks are available in the Python search path, EasyBuild will find them and use them when appropriate. Likewise, new compiler toolchain definitions and support for additional compilers or libraries can be plugged in to be a part of toolchains (see Section II-G).

D. Easyconfig files

Essentially, easyconfig files are text files with the .eb file name extension that contain a description in Python code format. They specify the software package that should be installed along with a number of parameters to steer the build and installation procedure, including dependencies. For the sake of space, we only discuss the most important parameters here; a complete list of available parameters⁵ can be obtained by running ‘eb --avail-easyconfig-params’.

⁵ Documentation for all easyconfig parameters is provided at https://github.com/hpcugent/easybuild/wiki/Easyconfig-files.

There are several mandatory parameters: (i) name denotes the software package name, (ii) version denotes the specific version of the package, (iii) toolchain denotes the compiler toolchain, specified as a dictionary with the name and the version, (iv) homepage denotes the URL of the software package’s website, and (v) description. The last two parameters are only used to include some documentation in the environment module that is generated upon successful completion of the installation.

The following are noteworthy optional parameters. easyblock specifies the custom easyblock that provides the build and installation procedure, see Section II-E. sources specifies the source⁶ files that should be present at any of the predetermined locations – if not, EasyBuild will try to download them. Software packages specified in the dependencies parameter have a twofold effect. First, before building the packages, EasyBuild makes sure that the environment modules for all dependencies are available and loaded. If this fails, EasyBuild can recursively build the required dependencies, see Section III-C. Note that automatic dependency resolution is not enabled by default. Second, they are added in the generated environment module file, such that all dependencies are correctly loaded in the environment. This makes sure the environment for the user or for subsequent EasyBuild runs is set up correctly and the installed software can be found. sanity_check_paths specifies a dictionary of files and directories that should be present after the software package files have been installed.

⁶ Note that source does not mean source code; it can also refer to a binary installer or binary package.

Next to these parameters, it is also possible to specify configuration options, compiler flags, optimisation levels, etc.

The framework package contains the easyconfig module with the EasyConfig class that processes the easyconfig files at run time. For each easyconfig file supplied to EasyBuild, an EasyConfig instance will be created. This allows our framework to obtain the information required to set up and run the installation procedure according to the given specifications.

An example easyconfig file for the WRF software package is discussed in Section IV.

E. Easyblocks
As mentioned above, an easyblock defines the build and installation procedure that can be used by one or more software packages. The framework Python package provides support for implementing easyblocks through the EasyBlock class which resides in the easyblock module. This class implements generic support for software installation procedures. It serves as the base class that should be sub-classed to obtain an easyblock that describes the installation procedure for a particular (group of) software package(s).

The extension module provides support for installation procedures of software package extensions, e.g., Python packages, R libraries, Perl modules, etc., via the Extension class. Such extensions can be installed in two ways: (i) using Extension, as a part of the installed base software package they extend, and (ii) completely separately from the base software package. In the former case, the extensions are listed in the exts_list parameter of the software package’s easyconfig file. In the latter case, they are specified in their own easyconfig and treated as a stand-alone software package with the base software as a dependency, and with their own dedicated environment module files.

The easyblocks themselves are placed into a separate Python package, aptly named easyblocks. Each easyblock is implemented as a Python module. For example, the cp2k module provides the EB_CP2K Python class, which implements the installation procedure for the molecular simulation software CP2K. Likewise, the EB_Armadillo and EB_WRF Python classes shown in Figure 1 provide support for the corresponding software packages.

All the classes in the easyblock modules for specific software packages are named according to a fixed class name encoding scheme. This allows us to adequately cope with names of software packages that do not directly map to valid Python class names, for example python-meep or 7zip. To avoid potential name clashes with existing functionality, we prefix all class names of easyblocks bound to a particular software package with EB_.

Next to software package-specific easyblocks, EasyBuild offers a number of generic easyblocks in the generic sub-package of the easyblocks package. We briefly discuss a couple of these shown in Figure 1. The configuremake module defines the ConfigureMake class, which implements the commonly used GNU configure, make, make install installation procedure. This class allows specifying custom options to the configure and make commands. Software packages that use this well-known installation procedure likely do not require a dedicated easyblock to be implemented, because the configuremake easyblock already provides support for them. Another example of a generic easyblock is the CMakeMake class from the cmakemake module, which supports software packages that use cmake instead of configure for their build configuration.

Figure 2 shows the hierarchy for the easyblock classes discussed above. The generic EasyBlock class sits at the root of the class hierarchy. Classes that implement custom support for a particular software package directly sub-class this hierarchy root, for example EB_CP2K and EB_WRF. Others implement an installation scheme that can be used by multiple software packages, such as ConfigureMake. Further customisation of already supported installation procedures is implemented by sub-classing existing easyblocks. An example is CMakeMake where only the build configuration step differs from its parent class ConfigureMake. All classes that offer support for a particular software package sub-class one or more of the generic easyblock classes. In Figure 2, this is illustrated by EB_Armadillo, which slightly modifies the CMakeMake installation procedure with configuration parameters and dependency checks that are specific to Armadillo.

[Figure 2: class hierarchy with EasyBlock at the root; EB_CP2K, EB_WRF and ConfigureMake derive from EasyBlock, CMakeMake derives from ConfigureMake, and EB_Armadillo derives from CMakeMake.]
Fig. 2: Schematic of the hierarchical organisation of selected easyblocks, all deriving from the EasyBlock class.

By organising the implementation of software install procedures in isolated Python modules as is done with the easyblocks, sharing them becomes particularly easy – one only needs to provide the Python module. Making easyblocks available to EasyBuild amounts to extending the easyblocks package, allowing the Python modules to be found in the Python search path. We feel this is an important feature of the EasyBuild framework.

The EasyBuild framework provides a very flexible interface for implementing software install procedures. Not only does it provide a lot of useful functionality required when installing software, it also makes it easy to plug in easyblocks, thereby adding support for new software packages, and to build on existing easyblocks by extending or customising them.

A concrete example of an easyblock implementation for the WRF software package is shown in Section IV.

F. Scripts, tests and tools

The main EasyBuild script is main.py. A handy wrapper script named eb that searches for main.py in the Python runtime search path is also provided, and is generally used as a command line tool for EasyBuild.

Additionally, there are several stand-alone Python scripts available that are useful during EasyBuild development, next to the Python package test for running unit tests, but these fall outside the scope of this paper.

The tools package provides the backbone modules of the EasyBuild framework. We will briefly highlight some functionality provided there. More details are provided later in dedicated sections.

Two important elements in the tools package are the filetools module and the toolchain package. The filetools module provides various wrapper functions for shell commands. A couple of noteworthy functions are (i) extract_file for extracting source files with a command determined by the file extension, (ii) apply_patch for applying patch files and automatically determining patch levels, and (iii) run_cmd and run_cmd_qa for running (interactive) shell commands. The toolchain package provides the necessary support for compiler toolchains, see Section II-G.

Python modules providing an interface for communicating with the environment modules tool and with the PBS cluster
[Figure 3: the toolchain package in tools, with its Compiler, Mpi, LinAlg and Fft components, and the toolchains package.]
Fig. 3: Modular design of support for compiler toolchains.

[Figure 4: installation steps; the labels recoverable here are I: read easyconfig, II: fetch sources, III: check readiness, XIII: cleanup, XIV: env. module, XV: test cases.]
Fig. 4: Installation procedure broken down into steps, as performed by EasyBuild.