Article

Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus

Authors:

Gabrielle Allen,

Thomas Dramlitsch,

Nicholas T. Karonis,

Brian Toonen

SC '01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing

Page 52

https://doi.org/10.1145/582034.582086

Published: 10 November 2001

Abstract

Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than in any previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters at run time to dramatically increase the efficiency of a distributed Grid simulation, without modification of the application and without any knowledge of the underlying network connecting the distributed computers.


Published In

SC '01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing

November 2001

756 pages

ISBN:158113293X

DOI:10.1145/582034

Conference Chair:
Charles Slocomb
Los Alamos National Laboratory

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SC '01

SC '01: International Conference for High Performance Computing, Networking, Storage and Analysis

November 10 - 16, 2001

Denver, Colorado

Acceptance Rates

SC '01 Paper Acceptance Rate 60 of 240 submissions, 25%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
