- Sponsor:
- SIGHPC
Proceeding Downloads
Parallel hierarchical molecular structure estimation
Determining the structure of biological macromolecules such as proteins and nucleic acids is an important element of molecular biology because of the intimate relation between form and function of these molecules. Individual sources of data about ...
A data-parallel implementation of O(N) hierarchical N-body methods
The O(N) hierarchical N-body algorithms and Massively Parallel Processors allow particle systems of 100 million particles or more to be simulated in acceptable time. We present a data-parallel implementation of Anderson's method and demonstrate both ...
The design of a portable scientific tool: a case study using SnB
Developing and maintaining a large software package is a complex task. Decisions are made early in the design process that affect i) the ability of a user to effectively exploit the package and ii) the ability of a software engineer to maintain it. This ...
Runtime performance of parallel array assignment: an empirical study
Compiling the array assignment statement of High Performance Fortran in the presence of block-cyclic distributions of data arrays is considered difficult, and several algorithms have been published to solve this problem. We present a comprehensive study ...
ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance
- Laura Susan Blackford,
- J. Choi,
- A. Cleary,
- A. Petitet,
- R. C. Whaley,
- J. Demmel,
- I. Dhillon,
- K. Stanley,
- J. Dongarra,
- S. Hammarling,
- G. Henry,
- D. Walker
This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces ...
Network performance modeling for PVM clusters
The advantages of workstation clusters as a parallel computing platform include a superior price-performance ratio, availability, scalability, and ease of incremental growth. However, the performance of traditional LAN technologies such as Ethernet and ...
Scalable parallel algorithms for interactive visualization of curved surfaces
We present efficient parallel algorithms for interactive display of higher order surfaces on current graphics systems. At each frame, these algorithms approximate the surface by polygons and rasterize them over the graphics pipeline. The time for ...
STREN: a highly scalable parallel stereo terrain renderer for planetary mission simulations
In this paper, we describe STREN, a parallel stereo renderer for fixed-location terrain rendering tasks required for the simulation of planetary exploration missions. The renderer is based on a novel spatial data representation, called the TANPO map. ...
Education in high performance computing via the WWW: designing and using technical materials effectively
Cornell Theory Center, a national center for high performance computing, has been designing and delivering education programs on parallel processing in traditional workshops for years. With the advent and growth of the World Wide Web, we have been able ...
Compiler-directed shared-memory communication for iterative parallel applications
Many scientific applications are iterative and specify repetitive communication patterns. This paper shows how a parallel-language compiler and custom cache-coherence protocols in a distributed shared memory system together can implement shared-memory ...
Dynamic data distribution with control flow analysis
This paper describes the design of a data distribution tool which automatically derives the data mapping for the arrays and the parallelization strategy for the loops in a Fortran 77 program. The layout generated can be static or dynamic, and the ...
Transformations for imperfectly nested loops
Loop transformations are critical for compiling high-performance code for modern computers. Existing work has focused on transformations for perfectly nested loops (that is, loops in which all assignment statements are contained within the innermost ...
Earthquake ground motion modeling on parallel computers
- Hesheng Bao,
- Jacobo Bielak,
- Omar Ghattas,
- Loukas F. Kallivokas,
- David R. O'Hallaron,
- Jonathan R. Shewchuk,
- Jifeng Xu
We describe the design and discuss the performance of a parallel elastic wave propagation simulator that is being used to model and study earthquake-induced ground motion in large sedimentary basins. The components of the system include mesh generators, ...
Performance analysis and optimization on the UCLA parallel atmospheric general circulation model code
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the parallel AGCM code aimed at ...
Climate data assimilation on a massively parallel Supercomputer
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512-...
Performance analysis using the MIPS R10000 performance counters
Tuning supercomputer application performance often requires analyzing the interaction of the application and the underlying architecture. In this paper, we describe support in the MIPS R10000 for non-intrusively monitoring a variety of processor events -...
Profiling a parallel language based on fine-grained communication
Fine tuning the performance of large parallel programs is a very difficult task. A profiling tool can provide detailed insight into the utilization and communication of the different processors, which helps identify performance bottlenecks. In this ...
Modeling, evaluation, and testing of Paradyn instrumentation system
This paper presents a case study of modeling, evaluating, and testing the data collection services (called an instrumentation system) of the Paradyn parallel performance measurement tool using well-known performance evaluation and experiment design ...
An analytical model of the HINT performance metric
The HINT Benchmark was developed to provide a broad-spectrum metric for computers, and to measure performance over the full range of memory sizes and time scales. We have extended our understanding of why HINT performance curves look the way they do, ...
Communication patterns and models in Prism: a spectral element-Fourier parallel Navier-Stokes solver
In this paper we analyze communication patterns in the parallel three-dimensional Navier-Stokes solver Prism, and present performance results on the IBM SP2, the Cray T3D and the SGI Power Challenge XL. Prism is used for direct numerical simulation of ...
The C3I parallel benchmark suite - introduction and preliminary results
- Rakesh Jha,
- Richard C. Metzger,
- Brian VanVoorst,
- Luiz S. Pires,
- Wing Au,
- Minesh Amin,
- David A. Castanon,
- Vipin Kumar
Current parallel benchmarks, while appropriate for scientific applications, lack the defense relevance and representativeness for developers who are considering parallel computers for their Command, Control, Communication, and Intelligence (C3I) ...
Architecture and application: the performance of the NEC SX-4 on the NCAR benchmark suite
In November 1994, the NEC Corporation announced the SX-4 supercomputer. It is the third in the SX series of supercomputers and is upward compatible from the SX-3R vector processor with enhancements for scalar processing, short vector processing, and ...
Minimal adaptive routing with limited injection on Toroidal k-ary n-cubes
Virtual channels can be used to implement deadlock-free adaptive routing algorithms and increase network throughput. Unfortunately, they introduce asymmetries in the use of buffers of symmetric networks such as the toroidal k-ary n-cubes. In this paper we ...
Low-latency communication on the IBM RISC system/6000 SP
The IBM SP is one of the most powerful commercial MPPs, yet, in spite of its fast processors and high network bandwidth, the SP's communication latency is inferior to older machines such as the TMC CM-5 or Meiko CS-2. This paper investigates the use of ...
Compiled communication for all-optical TDM networks
While all-optical networks offer large bandwidth for transferring data, the control mechanisms to dynamically establish all-optical paths incur large overhead. In this paper, we consider the problem of adapting all-optical multiplexed networks in ...
Increasing the effective bandwidth of complex memory systems in multivector processors
In multivector processors, the cycles lost to conflicts between concurrent vector streams make the effective throughput lower than the peak throughput. When the request rate of all the concurrent vector streams to every memory module is less than ...
A parallel cosmological hydrodynamics code
Formation by gravitational collapse of galaxies and the large-scale structure of the universe is a nonlinear, multi-scale, multi-component problem. This complex process involves dynamics of the gaseous baryons as well as of the gravitationally dominant ...
Transient dynamics simulations: parallel algorithms for contact detection and smoothed particle hydrodynamics
Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian ...
Performance of a computational fluid dynamics code on NEC and Cray supercomputers: beyond 10 gigaflops
The implementation and optimization of a production mode Computational Fluid Dynamics (CFD) software to NEC and Cray supercomputing platforms are discussed. It is intended to assess the impact of different computer architectures and High Power Computing ...
Parallel preconditioners for elliptic PDEs
Iterative schemes for solving sparse linear systems arising from elliptic PDEs are very suitable for efficient implementation on large scale multiprocessors. However, these methods rely heavily on effective preconditioners which must also be amenable to ...
Index Terms
- Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
SC '17 | 327 | 61 | 19% |
SC '16 | 442 | 81 | 18% |
SC '15 | 358 | 79 | 22% |
SC '14 | 394 | 83 | 21% |
SC '13 | 449 | 91 | 20% |
SC '12 | 461 | 100 | 22% |
SC '11 | 352 | 74 | 21% |
SC '10 | 253 | 51 | 20% |
SC '09 | 261 | 59 | 23% |
SC '08 | 277 | 59 | 21% |
SC '07 | 268 | 54 | 20% |
SC '06 | 239 | 54 | 23% |
SC '05 | 260 | 62 | 24% |
SC '04 | 200 | 60 | 30% |
SC '03 | 207 | 60 | 29% |
SC '02 | 230 | 67 | 29% |
SC '01 | 240 | 60 | 25% |
SC '00 | 179 | 62 | 35% |
Supercomputing '95 | 241 | 69 | 29% |
Supercomputing '93 | 300 | 72 | 24% |
Supercomputing '92 | 220 | 75 | 34% |
Supercomputing '91 | 215 | 83 | 39% |
Overall | 6,373 | 1,516 | 24% |