10.5555/369028
Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing
1996 Proceeding
Publisher:
  • IEEE Computer Society
  • 1730 Massachusetts Ave., NW Washington, DC
  • United States
Conference:
SC '96: International Conference for High Performance Computing, Networking, Storage and Analysis, Pittsburgh, Pennsylvania, USA, 1 January 1996
ISBN:
978-0-89791-854-1
Published:
17 November 1996
Sponsors:
SIGARCH, IEEE-CS
Proceeding Downloads

Article
Free
Parallel hierarchical molecular structure estimation

Determining the structure of biological macromolecules such as proteins and nucleic acids is an important element of molecular biology because of the intimate relation between form and function of these molecules. Individual sources of data about ...

Article
Free
A data-parallel implementation of O(N) hierarchical N-body methods

The O(N) hierarchical N-body algorithms and Massively Parallel Processors allow particle systems of 100 million particles or more to be simulated in acceptable time. We present a data-parallel implementation of Anderson's method and demonstrate both ...

Article
Free
The design of a portable scientific tool: a case study using SnB

Developing and maintaining a large software package is a complex task. Decisions are made early in the design process that affect i) the ability of a user to effectively exploit the package and ii) the ability of a software engineer to maintain it. This ...

Article
Free
Runtime performance of parallel array assignment: an empirical study

Compiling the array assignment statement of High Performance Fortran in the presence of block-cyclic distributions of data arrays is considered difficult, and several algorithms have been published to solve this problem. We present a comprehensive study ...

Article
Free
ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance

This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces ...

Article
Free
Network performance modeling for PVM clusters

The advantages of workstation clusters as a parallel computing platform include a superior price-performance ratio, availability, scalability, and ease of incremental growth. However, the performance of traditional LAN technologies such as Ethernet and ...

Article
Free
Scalable parallel algorithms for interactive visualization of curved surfaces

We present efficient parallel algorithms for interactive display of higher order surfaces on current graphics systems. At each frame, these algorithms approximate the surface by polygons and rasterize them over the graphics pipeline. The time for ...

Article
Free
STREN: a highly scalable parallel stereo terrain renderer for planetary mission simulations

In this paper, we describe STREN, a parallel stereo renderer for fixed-location terrain rendering tasks required for the simulation of planetary exploration missions. The renderer is based on a novel spatial data representation, called the TANPO map. ...

Article
Free
Education in high performance computing via the WWW: designing and using technical materials effectively

Cornell Theory Center, a national center for high performance computing, has been designing and delivering education programs on parallel processing in traditional workshops for years. With the advent and growth of the World Wide Web, we have been able ...

Article
Free
Compiler-directed shared-memory communication for iterative parallel applications

Many scientific applications are iterative and specify repetitive communication patterns. This paper shows how a parallel-language compiler and custom cache-coherence protocols in a distributed shared memory system together can implement shared-memory ...

Article
Free
Dynamic data distribution with control flow analysis

This paper describes the design of a data distribution tool which automatically derives the data mapping for the arrays and the parallelization strategy for the loops in a Fortran 77 program. The layout generated can be static or dynamic, and the ...

Article
Free
Transformations for imperfectly nested loops

Loop transformations are critical for compiling high-performance code for modern computers. Existing work has focused on transformations for perfectly nested loops (that is, loops in which all assignment statements are contained within the innermost ...

Article
Free
Earthquake ground motion modeling on parallel computers

We describe the design and discuss the performance of a parallel elastic wave propagation simulator that is being used to model and study earthquake-induced ground motion in large sedimentary basins. The components of the system include mesh generators, ...

Article
Free
Performance analysis and optimization on the UCLA parallel atmospheric general circulation model code

An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the parallel AGCM code aimed at ...

Article
Free
Climate data assimilation on a massively parallel supercomputer

We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512-...

Article
Free
Performance analysis using the MIPS R10000 performance counters

Tuning supercomputer application performance often requires analyzing the interaction of the application and the underlying architecture. In this paper, we describe support in the MIPS R10000 for non-intrusively monitoring a variety of processor events -...

Article
Free
Profiling a parallel language based on fine-grained communication

Fine tuning the performance of large parallel programs is a very difficult task. A profiling tool can provide detailed insight into the utilization and communication of the different processors, which helps identify performance bottlenecks. In this ...

Article
Free
Modeling, evaluation, and testing of Paradyn instrumentation system

This paper presents a case study of modeling, evaluating, and testing the data collection services (called an instrumentation system) of the Paradyn parallel performance measurement tool using well-known performance evaluation and experiment design ...

Article
Free
An analytical model of the HINT performance metric

The HINT Benchmark was developed to provide a broad-spectrum metric for computers, and to measure performance over the full range of memory sizes and time scales. We have extended our understanding of why HINT performance curves look the way they do, ...

Article
Free
Communication patterns and models in Prism: a spectral element-Fourier parallel Navier-Stokes solver

In this paper we analyze communication patterns in the parallel three-dimensional Navier-Stokes solver Prism, and present performance results on the IBM SP2, the Cray T3D and the SGI Power Challenge XL. Prism is used for direct numerical simulation of ...

Article
The C3I parallel benchmark suite - introduction and preliminary results

Current parallel benchmarks, while appropriate for scientific applications, lack the defense relevance and representativeness for developers who are considering parallel computers for their Command, Control, Communication, and Intelligence (C3I) ...

Article
Free
Architecture and application: the performance of the NEC SX-4 on the NCAR benchmark suite

In November 1994, the NEC Corporation announced the SX-4 supercomputer. It is the third in the SX series of supercomputers and is upward compatible from the SX-3R vector processor with enhancements for scalar processing, short vector processing, and ...

Article
Free
Minimal adaptive routing with limited injection on toroidal k-ary n-cubes

Virtual channels can be used to implement deadlock-free adaptive routing algorithms and increase network throughput. Unfortunately, they introduce asymmetries in the use of buffers of symmetric networks such as toroidal k-ary n-cubes. In this paper we ...

Article
Free
Low-latency communication on the IBM RISC System/6000 SP

The IBM SP is one of the most powerful commercial MPPs, yet, in spite of its fast processors and high network bandwidth, the SP's communication latency is inferior to older machines such as the TMC CM-5 or Meiko CS-2. This paper investigates the use of ...

Article
Free
Compiled communication for all-optical TDM networks

While all-optical networks offer large bandwidth for transferring data, the control mechanisms to dynamically establish all-optical paths incur large overhead. In this paper, we consider the problem of adapting all-optical multiplexed networks in ...

Article
Free
Increasing the effective bandwidth of complex memory systems in multivector processors

In multivector processors, the cycles lost to conflicts between concurrent vector streams make the effective throughput lower than the peak throughput. When the request rate of all the concurrent vector streams to every memory module is less than ...

Article
Free
A parallel cosmological hydrodynamics code

Formation by gravitational collapse of galaxies and the large-scale structure of the universe is a nonlinear, multi-scale, multi-component problem. This complex process involves dynamics of the gaseous baryons as well as of the gravitationally dominant ...

Article
Free
Transient dynamics simulations: parallel algorithms for contact detection and smoothed particle hydrodynamics

Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian ...

Article
Free
Performance of a computational fluid dynamics code on NEC and Cray supercomputers: beyond 10 gigaflops

The implementation and optimization of a production mode Computational Fluid Dynamics (CFD) software to NEC and Cray supercomputing platforms are discussed. It is intended to assess the impact of different computer architectures and High Power Computing ...

Article
Free
Parallel preconditioners for elliptic PDEs

Iterative schemes for solving sparse linear systems arising from elliptic PDEs are very suitable for efficient implementation on large scale multiprocessors. However, these methods rely heavily on effective preconditioners which must also be amenable to ...

Acceptance Rates

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

Year                Submitted  Accepted  Rate
SC '17                    327        61   19%
SC '16                    442        81   18%
SC '15                    358        79   22%
SC '14                    394        83   21%
SC '13                    449        91   20%
SC '12                    461       100   22%
SC '11                    352        74   21%
SC '10                    253        51   20%
SC '09                    261        59   23%
SC '08                    277        59   21%
SC '07                    268        54   20%
SC '06                    239        54   23%
SC '05                    260        62   24%
SC '04                    200        60   30%
SC '03                    207        60   29%
SC '02                    230        67   29%
SC '01                    240        60   25%
SC '00                    179        62   35%
Supercomputing '95        241        69   29%
Supercomputing '93        300        72   24%
Supercomputing '92        220        75   34%
Supercomputing '91        215        83   39%
Overall                 6,373     1,516   24%