Pelican HPC
CLUTTER TO CLUSTER
Crunch big numbers with your very own high-performance computing cluster. BY MAYANK SHARMA
If your users are clamoring for the power of a data center but your penurious employer tells you to make do with the hardware you already own, don't give up hope. With some time, a little effort, and a few open source tools, you can transform your mild-mannered desktop systems into a number-crunching supercomputer. For the impatient, the PelicanHPC Live CD will cobble off-the-shelf hardware into a high-performance cluster in no time.
The PelicanHPC project is the natural evolution of ParallelKnoppix, which was a remastered Knoppix with packages for clustering. Michael Creel developed PelicanHPC for his own research work. Creel was interested in learning about clustering, and because adding packages was so easy, he added PVM, cluster tools like the Ganglia monitor, applications like GROMACS, and so forth. He also included some simple examples of parallel computing in Fortran, C, Python, and Octave to give beginners a working starting point.
However, the process of maintaining the distribution was pretty time consuming, especially when it came to updating packages such as X and KDE. That's when Creel discovered Debian Live, spent time wrapping his head around the live-helper package, and created a more systematic way to make a Live distro for clustering. So in essence, PelicanHPC is a single script that fetches required packages off a Debian repository, adds some configuration scripts and example software, and outputs a bootable ISO.
Boot PelicanHPC
Later in the article, I'll use the script to create a custom version. For now, I'll use the stock PelicanHPC release (v1.8) from the website [1] to put those multiple cores to work. Both 32-bit and 64-bit versions are available, so grab the one that matches your hardware.
The developer claims that with PelicanHPC you can get a cluster up and running in five minutes. However, this is a complete exaggeration: you can do it in under three.
First, make sure you get all the ingredients right: You need a computer to act as a front-end node and others that'll act as slave computing nodes. The front-end and slave nodes connect via the network, so they need to be part of the same LAN. Although you can connect them via wireless, depending on the amount of data being exchanged, you could run into network bottlenecks. Also, make sure the router between the front end and the slaves isn't running a DHCP server, because the front end doles out IP addresses to the slaves.
Although you don't really need a monitor, keyboard, or mouse on the slave nodes, you do need them on the front end. If you have a dual-core machine with enough memory, it wouldn't be a bad idea to run the front end on a virtual machine and the slaves on physical machines. Primarily, PelicanHPC runs from memory, so make sure you have plenty. If you're doing serious work on the cluster, you can make it save your work on the hard disk, in which case, make sure you have a hard disk attached. In fact, to test PelicanHPC, you can run it completely on virtual hardware with virtual network connections, provided the physical host has the juice to power so much virtual hardware.
Figure 1: If your slave node isn't headless, this is what it'll say.
With the hardware in place, pop the Live CD into the front-end node and let it boot. If you want to choose a custom language, turn off ACPI, or tweak some other boot parameters, you can explore the boot options with the F1 key; press Enter to boot with the default options.
During bootup, PelicanHPC prompts you three times. First, it wants you to select a permanent storage device that'll house the /home directory. The default option, ram1, stores the data in physical RAM. If you want something more permanent, you just need to enter the device, such as hda1 or sda5. The device can be a hard disk partition or a USB disk; just make sure it's formatted as ext2 or ext3. If you replace the default option ram1 with a device, PelicanHPC will create a user directory at the root of that device.
Next, PelicanHPC asks whether it should copy all the configuration scripts and the examples to the home directory on the specified device. If this is the first time you are running PelicanHPC, you'll want to choose Yes. If you've selected a permanent storage location, such as a disk partition, you should choose No here on subsequent boots. Of course, if you are running PelicanHPC from RAM, you'll always have to choose Yes.
Finally, you're prompted to change the default password. This password will be for the user named user on the front-end node, as well as on the slave nodes. PelicanHPC is designed for a single user, and the password is in cleartext. When it has this info, PelicanHPC will boot the front-end node and drop you into the Xfce desktop environment.
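The advice for the second boot prompt boils down to a small decision rule. As a minimal sketch in Python (the function name and its arguments are illustrative and not part of PelicanHPC), it could be summarized like this:

```python
def copy_config_answer(home_device: str, first_boot: bool) -> str:
    """Suggest an answer to the 'copy configuration scripts and examples?' prompt.

    home_device: what was entered at the first prompt, e.g. "ram1" or "sda5".
    first_boot:  True if PelicanHPC has not been booted with this device before.
    """
    # With the default ram1, /home lives in RAM and starts out empty on
    # every boot, so the scripts and examples must be copied each time.
    if home_device == "ram1":
        return "Yes"
    # On a persistent partition the files survive a reboot, so they only
    # need to be copied the first time around.
    return "Yes" if first_boot else "No"


if __name__ == "__main__":
    for device, first in [("ram1", False), ("sda5", True), ("sda5", False)]:
        print(device, first, "->", copy_config_answer(device, first))
```

The helper only restates the rule of thumb from the text; the actual prompt is answered interactively during bootup.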
If you have multiple network interfaces on the machine, you'll be asked to select the one that is connected to the cluster. Next, you're prompted to allow the scripts to start the DHCP server, followed by a confirmation to start the services that'll allow the slave nodes to join the cluster. At first, the constant confirmations seem irritating, but they are necessary to prevent you from throwing the network into a tizzy with conflicting DHCP services or from accidentally interrupting ongoing computations.
Once it has your permission to start the cluster, the script asks you to turn on the slave nodes. Slave nodes are booted over the network, so make sure the Network boot option is prioritized over other forms of booting in the BIOS. When it sees the front-end node, the slave displays the PelicanHPC splash screen and lets you enter any boot parameters (language, etc.), just as it did on the front-end node earlier. Instead of booting into Xfce, when it's done booting, the slave node displays a notice that it's part of a cluster and shouldn't be turned off (Figure 1). Of course, if your slave nodes don't have a monitor, just make sure the boot order in the BIOS is correct and turn them on.
When the node is up and running, head back to the front end and press the No button, which rescans the cluster and updates the number of connected nodes (Figure 2). When the number of connected nodes matches the number of slaves you turned on, press Yes. PelicanHPC displays a confirmation message and points you to the script that's used to reconfigure the cluster when you decide to add or remove a node (Figure 3). To resize the cluster, run the following script:
sh pelican_restarthpc
That's it. Your cluster is up and running, waiting for your instructions.
Figure 2: Two nodes up and running; continue scanning for more.
Crunchy Bar
The developer, Creel, is a professor of economics at the Autonomous University of Barcelona in Catalonia, Spain. He works in econometrics, which involves a lot of number crunching. Therefore, you'll find some text and example GNU Octave code related to Creel's research and teaching.
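To give a flavor of the number crunching that econometrics involves, here is a minimal Python sketch of a Monte Carlo experiment that compares two estimators of the center of a normal distribution, the sample mean and the sample median, and spreads the replications over several worker processes. It is not one of the Octave examples shipped with PelicanHPC: the function names, the sample sizes, and the choice of estimators are assumptions made for illustration, and it uses Python's multiprocessing module on a single machine rather than MPI across cluster nodes.

```python
import random
import statistics
from multiprocessing import Pool


def run_replications(args):
    """Run one batch of Monte Carlo replications in a worker process.

    Returns the summed squared errors of the sample mean and the sample
    median when estimating the center (0.0) of a standard normal.
    """
    seed, n_reps, sample_size = args
    rng = random.Random(seed)  # separate, reproducible stream per batch
    sq_err_mean = 0.0
    sq_err_median = 0.0
    for _ in range(n_reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        sq_err_mean += statistics.fmean(sample) ** 2
        sq_err_median += statistics.median(sample) ** 2
    return sq_err_mean, sq_err_median


def monte_carlo_mse(total_reps=2000, sample_size=50, workers=4):
    """Estimate the mean squared error of both estimators in parallel."""
    reps_per_worker = total_reps // workers
    tasks = [(seed, reps_per_worker, sample_size) for seed in range(workers)]
    with Pool(processes=workers) as pool:
        partial_sums = pool.map(run_replications, tasks)
    done = reps_per_worker * workers
    mse_mean = sum(part[0] for part in partial_sums) / done
    mse_median = sum(part[1] for part in partial_sums) / done
    return mse_mean, mse_median


if __name__ == "__main__":
    mse_mean, mse_median = monte_carlo_mse()
    print(f"MSE of sample mean:   {mse_mean:.4f}")
    print(f"MSE of sample median: {mse_median:.4f}")
```

Because each replication is independent of the others, the work splits cleanly into batches, which is what makes Monte Carlo studies a natural fit for a cluster. For normally distributed data, theory says the sample mean should come out with the lower mean squared error.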
If you're interested in econometrics, the econometrics.pdf file under the /home/user/Econometrics directory is a good starting point. Also check out the ParallelEconometrics.pdf file under /home/user/Econometrics/ParallelEconometrics. This presentation is a nice introduction to parallel computing and econometrics.
For the uninitiated, GNU Octave [2] is a high-level computational language for numerical computations. It is the free software alternative to the proprietary MATLAB program, both of which are used for hardcore arithmetic.
Some sample code is in the /home/user/Econometrics/Examples/ directory for performing tests such as kernel density [3] and maximum likelihood estimations, as well as for running Monte Carlo simulations of how a new econometric estimator performs.
Figure 3: Two nodes are up and running besides the front end.
Run Tests
To run the tests, open a terminal and start GNU Octave by typing octave on the command line, which brings you to the Octave interface. Here you can run various examples of sample code by typing in a name. For example, the kernel estimations are performed by typing kernel_example. Similarly, pea_example shows the parallel implementation of the parameterized expectation algorithm, and mc_example2, shown in Figure 4, shows the result of the Monte Carlo test.
Creel also suggests that PelicanHPC can be used for molecular dynamics with the open source software GROMACS (GROningen MAchine for Chemical Simulations). The distributed project for studying protein folding, Folding@home, also uses GROMACS, and Creel believes that one could also replicate this setup on a cluster created by PelicanHPC.
Creel also suggests that users solely interested in learning about high-performance computing should look to ParallelKnoppix, the last version of which is still available for download [4].
Parallel Programming with PelicanHPC
One of the best uses for PelicanHPC is for compiling and running parallel programs. If this is all you want to use PelicanHPC for, you don't really need the slave nodes because the tools can compile your programs on the front-end node itself.
PelicanHPC includes several tools for writing and processing parallel code. OpenMPI compiles programs in C, C++, and Fortran. SciPy and NumPy [5] are Python-based apps for scientific computing. PelicanHPC also has the MPI toolbox (MPITB) for Octave, which lets you call MPI library routines from within Octave.
MPI
PelicanHPC includes two MPI implementations: LAM/MPI and OpenMPI. When writing parallel programs in C or C++, make sure you include the mpi.h header file (#include <mpi.h>). To compile the programs, you need mpicc for C programs, mpic++ or mpiCC for C++ programs, and mpif77 for Fortran.
Listing 1 has a sample Hello World program in C that uses the MPI library to print a message from all the nodes in the cluster. Compile it with mpicc:
mpicc borg-greeting.c -o borg-greeting
borg-greeting
This command tells the MPI library to explicitly run four copies of the hello
online from Peter Pacheco's book, Parallel Programming with MPI [7]. See the OpenMPI website for additional documentation, including a very detailed FAQ [8].
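The core idea behind an MPI Hello World, several copies of one program that each know their own rank and the total number of copies, can be sketched without MPI at all. The following Python sketch is only an analogue: it uses the standard multiprocessing module on one machine, and greeting, worker, and launch are hypothetical helper names, not MPI calls. Each copy is handed its rank and the total count, which is the information a real MPI program obtains from MPI_Comm_rank and MPI_Comm_size.

```python
from multiprocessing import Process, Queue


def greeting(rank, size):
    """Build the message for one copy; only rank differs between copies."""
    return f"Greetings from copy {rank} of {size}"


def worker(rank, size, outbox):
    """Body run by every copy: report back with this copy's greeting."""
    outbox.put((rank, greeting(rank, size)))


def launch(size=4):
    """Start `size` copies of worker() and collect one message per copy."""
    outbox = Queue()
    procs = [Process(target=worker, args=(rank, size, outbox))
             for rank in range(size)]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    messages = [outbox.get() for _ in range(size)]
    for proc in procs:
        proc.join()
    # Arrival order is not deterministic, so sort by rank for display.
    return [text for _, text in sorted(messages)]


if __name__ == "__main__":
    for line in launch():
        print(line)
```

Unlike this single-machine sketch, an MPI launcher can place the copies on different nodes of the cluster, and the copies exchange messages through the MPI library rather than through a shared queue.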
So to roll out your own ISO or USB image, first install a recent Ubuntu or Debian release. I've used Lenny to create a customized PelicanHPC release. Next, grab the live-helper package from the distro's repository. Finally, grab the latest version of the make_pelican script (currently v1.8) from Pelican's download page [4].
Open the script in your favorite text editor. The script is divided into various sections. After the initial comments, which include a brief changelog, the first section lists the packages that will be available on the ISO. Here is where you make the changes.
Listing 2 shows a modified version of this section, in which I've commented out the binary blobs for networking because I don't need them for my networks. I've also added AbiWord and the GROMACS package. Because these packages are fetched off your distribution's repositories, make sure you spell them as they appear there. GROMACS has several dependencies, but you don't have to worry about adding them because they'll be fetched automatically.
The next bit in the make_pelican script you have to tinker with is the architecture you want to build the ISO for and whether you want the ISO or USB image. This section also specifies the series of network addresses doled out by PelicanHPC:
PELICAN_NETWORK="10.11.12"
MAXNODES="100"
#ARCHITECTURE="amd64"
#KERNEL="amd64"
ARCHITECTURE="i386"
KERNEL="686"
IMAGETYPE="iso"
#IMAGETYPE="usb-hdd"
DISTRIBUTION="lenny"
MIRROR="en"
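How the front end derives node addresses from the first two settings is not spelled out here, so the following Python sketch rests on an assumption: that PELICAN_NETWORK holds the first three octets of a /24 network and that MAXNODES caps how many host addresses in it are handed out. The function name and the choice to number hosts upward from .1 are hypothetical; the sketch only shows what range such values would describe.

```python
import ipaddress


def candidate_addresses(network_prefix="10.11.12", max_nodes=100):
    """List up to max_nodes host addresses in the /24 given by a three-octet prefix.

    Assumption: hosts are numbered upward from .1; the real script may differ.
    """
    network = ipaddress.ip_network(f"{network_prefix}.0/24")
    hosts = list(network.hosts())  # .1 through .254 for a /24
    return [str(addr) for addr in hosts[:max_nodes]]


if __name__ == "__main__":
    addrs = candidate_addresses()
    print(len(addrs), "addresses:", addrs[0], "...", addrs[-1])
```

Under these assumptions, the default values describe 100 addresses from 10.11.12.1 to 10.11.12.100. A private range like this is also why no other DHCP server should be answering on the cluster network.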
The rest of the script deals with PelicanHPC internals and shouldn't be tweaked unless you know what you're doing. However, it's advisable to browse through the other sections to get a better idea about how PelicanHPC magically transforms ordinary machines into extraordinary computing clusters.
When you've tweaked the script, execute it from the console:
sh make_pelican
Now sit back and enjoy, or if you have a slow connection and are running this on a slow computer, you'd better do your taxes, because it'll take a while to fetch all the packages and compile them into a distro image. When it's done, you'll have a shiny new ISO named binary.iso under either the i386/ or the amd64/ directory, depending on the architecture you build for. Now transfer the USB image onto a USB stick, or test the ISO image with VirtualBox or Qemu before burning it onto a disc. Figure 5 shows the password screen of a modified PelicanHPC Live CD.
Figure 5: Tweak the make_pelican script to create your custom prompts.
PelicanHPC is designed with ease of use in mind for anyone who wants to use their spare computers to do some serious number crunching. Building on the experience of ParallelKnoppix, the developer has put a lot of effort behind PelicanHPC's no-fuss approach to get your cluster off the ground in a jiffy. The customization abilities are the icing on the cake and make PelicanHPC an ideal platform for building your own custom cluster environment.
Listing 2 (excerpt):
# X stuff
xorg xfce4 konqueror ksysguard ksysguardd kate kpdf
konsole kcontrol kdenetwork kdeadmin abiword
PACKAGELIST
### END OF PACKAGELIST ###
THE AUTHOR
Mayank Sharma has written for various Linux publications, including Linux.com, IBM developerWorks, and Linux Format, and has published two books through Packt on administering Elgg and Openfire. Occasionally he teaches FLOSS technologies. You can reach him via: http://www.geekybodhi.net.
INFO
[1] PelicanHPC: http://pareto.uab.es/mcreel/PelicanHPC/
[2] GNU Octave: http://www.gnu.org/software/octave/
[3] Kernel density estimation: http://en.wikipedia.org/wiki/Kernel_density_estimation
[4] ParallelKnoppix download: http://pareto.uab.es/mcreel/PelicanHPC/download/
[5] SciPy and NumPy: http://www.scipy.org/
[6] MPI tutorial: http://www.dartmouth.edu/~rc/classes/intro_mpi/
[7] Parallel Programming with MPI: http://www.cs.usfca.edu/mpi/
[8] OpenMPI FAQ: http://www.open-mpi.org/faq/?category=mpi-apps
JUNE 2009
ISSUE 103