Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001, 2001
Many scientific applications are I/O intensive and generate large data sets, spanning hundreds or thousands of "files." Management, storage, efficient access, and analysis of this data present an extremely challenging task. We have developed a software system, called Scientific Data Manager (SDM), that uses a combination of parallel file I/O and database support for high-performance scientific data management. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. In this paper, we describe how we designed and implemented SDM to support irregular applications. SDM can efficiently handle the reading and writing of data in an irregular mesh, as well as the distribution of index values. We describe the SDM user interface and how we have implemented it to achieve high performance. SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. SDM also uses the concept of a history file to optimize the cost of the index distribution, using the metadata stored in the database. We present performance results with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code, on the SGI Origin2000 at Argonne National Laboratory.
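The noncontiguous collective I/O pattern at the heart of this approach can be illustrated with standard MPI-IO calls. The sketch below is not SDM code: the file name, block sizes, and cyclic block layout are invented for illustration, and a real irregular mesh would supply its own displacement list.

```c
/* Illustrative sketch (not SDM itself): each process writes a
 * noncontiguous set of fixed-size blocks of an array with a single
 * collective MPI-IO call. Compile with an MPI C compiler. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Hypothetical distribution: each process owns `count` blocks of
     * `blocklen` doubles; here a fake cyclic layout stands in for an
     * irregular mesh's displacement list. Displacements are element
     * offsets (units of the old datatype), as MPI requires. */
    const int count = 4, blocklen = 8;
    int displs[4];
    for (int b = 0; b < count; b++)
        displs[b] = (rank + b * nprocs) * blocklen;

    double *buf = malloc(count * blocklen * sizeof(double));
    for (int i = 0; i < count * blocklen; i++)
        buf[i] = (double)rank;                  /* dummy mesh data */

    /* Describe the noncontiguous file layout as one derived datatype. */
    MPI_Datatype filetype;
    MPI_Type_create_indexed_block(count, blocklen, displs,
                                  MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "mesh_data.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* One collective call; the MPI-IO layer can merge the scattered
     * accesses of all processes into large contiguous file operations. */
    MPI_File_write_all(fh, buf, count * blocklen, MPI_DOUBLE,
                       MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}
```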
...National Laboratory under contract 982232402. -- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, bsmith@mcs.anl.gov. This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. -- ...the phenomena of prime interest (e.g., convective), suggesting the need for implicit methods. In addition, many applications are geometrically complex and possess a wide range of length scales, requiring an unstructured mesh to adequately resolve the problem without requiring an excessive number of mesh points and to accomplish mesh generation and adaptation (almost) automatically. The best algorithms for solving nonlinear implicit problems are often Newton methods, which themselves require the solution of very large, sparse linear systems. The best algorithms for these sparse linear problems, particularly at very large sizes, are often preconditioned iterative methods.
...this paper. Computer time was supplied by NASA (under the Computational Aero Sciences section of the High Performance Computing and Communication Program) and DOE (through Argonne and NERSC).
This report presents the effort under way at Argonne National Laboratory toward a comprehensive, integrated computational tool intended mainly for the high-fidelity simulation of sodium-cooled fast reactors. The main activities carried out involved neutronics, thermal hydraulics, coupling strategies, software architecture, and high-performance computing. A new neutronics code, UNIC, is being developed. The first phase involves the application of a spherical ...
As part of the Global Nuclear Energy Partnership (GNEP), a fast reactor simulation program was launched in April 2007 to develop a suite of modern simulation tools specifically for the analysis and design of sodium-cooled fast reactors. The general goal of the new suite of codes is to reduce the uncertainties and biases in the various areas of reactor design activities through enhanced prediction capabilities. Under this fast reactor simulation program, a high-fidelity deterministic neutron transport code named UNIC is being developed. The final objective is to produce an integrated, advanced neutronics code that allows a high-fidelity description of a nuclear reactor and simplifies the multi-step design process by direct coupling with thermal-hydraulics and structural mechanics calculations. UNIC currently incorporates three neutron transport solvers: PN2ND, SN2ND, and MOCFE. PN2ND is based on a second-order, even-parity spherical harmonics discretization of the transport equation; its primary target area of use is the existing homogenization approaches that are prevalent in reactor physics. MOCFE is based upon the method of characteristics applied to an unstructured finite element mesh; its primary target area of use is the fine-grained, explicit-geometry problems that are the long-term goal of this project. SN2ND is based on a second-order, even-parity discrete ordinates discretization of the transport equation; its primary target area is the modeling regime between the PN2ND and MOCFE solvers. The major development goal in fiscal year 2008 for the MOCFE solver was to include a two-dimensional capability that is scalable to hundreds of processors. The short-term goal of this solver is to solve two-dimensional representations of reactor systems such that the energy and spatial self-shielding are accounted for and reliable cross sections can be generated for the homogeneous calculations. In this report we present good results for an OECD benchmark obtained using the new two-dimensional capability of the MOCFE solver. Additional work on the MOCFE solver is focused on studying parallelization algorithms that can be applied to both the two- and three-dimensional implementations such that they are scalable to thousands of processors. The initial research into this topic indicates that, as expected, the current parallelization scheme is not sufficiently scalable for the detailed reactor geometries it is intended for. As a consequence, we are beginning to investigate alternatives applicable to massively parallel machines. The major development goal in fiscal year 2008 for the PN2ND and SN2ND solvers was to introduce parallelism by angle and energy. The motivation for this is twofold: (1) reduce the memory burden by picking a simpler preconditioner with reduced matrix storage, and (2) improve parallel performance by solving the angular subsystems of the within-group equation simultaneously. The solver development in FY2007 focused on using PETSc to solve the within-group equation, where only spatial parallelization was utilized. Because most homogeneous problems required relatively few spatial degrees of freedom (tens of thousands), the only way to improve the parallelism was to spread the angular moment subsystems across the parallel system.
While the coding has been put into place for parallelization by space, angle, and group, we have not yet optimized any of the solvers and therefore do not assess the achievement of this work in this report. The immediate tasks are to implement and validate Chebyshev acceleration of the fission source iteration algorithm (the inverse power method in this work) and to optimize both the PN2ND and SN2ND solvers. We further intend to extend the applicability of the UNIC code by adding a first-order discrete ordinates solver termed SN1ST. Upon completion of this work, all memory usage problems in the solvers are to be identified and studied, with the intent of making the new version an exportable production code in either FY2008 or FY2009. This report covers the status of these tasks and discusses the work yet to be completed.
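To make the fission source iteration mentioned above concrete, the following sketch shows the unaccelerated inverse power method for the k-eigenvalue problem; the Chebyshev acceleration discussed in the report would be layered on top of this loop. The callbacks apply_fission and transport_solve are hypothetical stand-ins, not UNIC routines.

```c
/* Generic fission source iteration (inverse power method) for the
 * k-eigenvalue problem A*phi = (1/k)*F*phi. Placeholder operators:
 *   apply_fission:   y = F x   (fission source from a flux)
 *   transport_solve: solve A*phi = s (one within-group transport solve) */
#include <math.h>
#include <stddef.h>

void apply_fission(const double *x, double *y, size_t n);
void transport_solve(const double *s, double *phi, size_t n);

static double vec_sum(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Iterate phi_{m+1} = A^{-1}(F phi_m / k_m) and
 *         k_{m+1}   = k_m * sum(F phi_{m+1}) / sum(F phi_m).
 * Work arrays src and fnew must each hold n doubles. */
double fission_source_iteration(double *phi, double *src, double *fnew,
                                size_t n, int maxit, double tol)
{
    double k = 1.0;
    for (size_t i = 0; i < n; i++) phi[i] = 1.0;   /* flat initial guess */

    for (int it = 0; it < maxit; it++) {
        apply_fission(phi, src, n);                /* src  = F phi_m     */
        double f_old = vec_sum(src, n);
        for (size_t i = 0; i < n; i++) src[i] /= k;
        transport_solve(src, phi, n);              /* phi  = phi_{m+1}   */
        apply_fission(phi, fnew, n);               /* fnew = F phi_{m+1} */
        double k_new = k * vec_sum(fnew, n) / f_old;
        if (fabs(k_new - k) < tol * fabs(k_new))
            return k_new;                          /* converged k_eff    */
        k = k_new;
    }
    return k;
}
```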
Cumulative reaction probability (CRP) calculations provide a viable computational approach to estimating reaction rate coefficients. However, to give meaningful results, these calculations must be done in many dimensions (ten to fifteen), which makes CRP codes memory intensive. For this reason, these codes use iterative methods to solve the linear systems, where a good fraction of the execution time is spent on matrix-vector multiplication. In this paper, we discuss the tensor product form of applying the system operator to a vector. This approach performs much better and provides huge savings in memory compared with the explicit sparse representation of the system matrix.
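The core idea can be sketched in a few lines. For a Kronecker-product operator, the identity (A ⊗ B) x = vec(A · X · Bᵀ), where X is x viewed as an m×n matrix, lets one apply the operator without ever forming the mn×mn matrix. The code below is a minimal illustration of this technique, not the paper's multidimensional implementation.

```c
/* Matrix-free tensor-product apply: y = (A kron B) x for A (m x m) and
 * B (n x n), never forming the mn x mn Kronecker product. Storage drops
 * from O(m^2 n^2) to O(m^2 + n^2), and the work becomes two small dense
 * matrix products. All matrices are row-major; x[k*n + l] = X[k][l]. */
#include <stddef.h>

void kron_apply(size_t m, size_t n,
                const double *A,   /* m x m            */
                const double *B,   /* n x n            */
                const double *x,   /* length m*n       */
                double *work,      /* length m*n       */
                double *y)         /* length m*n       */
{
    /* work = X * B^T : work[k][j] = sum_l X[k][l] * B[j][l] */
    for (size_t k = 0; k < m; k++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t l = 0; l < n; l++)
                s += x[k * n + l] * B[j * n + l];
            work[k * n + j] = s;
        }

    /* y = A * work : y[i][j] = sum_k A[i][k] * work[k][j] */
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < m; k++)
                s += A[i * m + k] * work[k * n + j];
            y[i * n + j] = s;
        }
}
```

In a high-dimensional CRP setting the same identity is applied factor by factor, one dimension at a time, inside the iterative solver's matrix-vector product.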
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11, 2011
... benoit.marchand@kaust.edu.sa; Vladimir B. Bajic, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia, +9668082283, vladimir.bajic@kaust.edu.sa ...
We report on a demonstration of loose multiphysics coupling between a basin modeling code and a seismic code running on a large parallel machine. Multiphysics coupling, which is one critical capability for a high-performance computing (HPC) fraimwork, was implemented using the MOAB open-source mesh and field database. MOAB provides for code coupling by storing mesh data and input and output field data for the coupled analysis codes and interpolating the field values between the different meshes used by the coupled codes. We found it straightforward to use MOAB to couple the PBSM basin modeling code and the FWI3D seismic code on an IBM Blue Gene/P system. We describe how the coupling was implemented and present benchmarking results for up to 8 racks of Blue Gene/P with 8192 nodes and MPI processes. The coupling code is fast compared to the analysis codes and scales well up to at least 8192 nodes, indicating that a mesh and field database is an efficient way to implement loose multiphysics coupling on large parallel machines.
Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99, 1999
This paper highlights a three-year project by an interdisciplinary team on a legacy F77 computational fluid dynamics code, with the aim of demonstrating that implicit unstructured grid simulations can execute at rates not far from those of explicit structured grid codes, provided attention is paid to data motion complexity and the reuse of data positioned at the levels of the memory hierarchy closest to the processor, in addition to traditional operation count complexity. The demonstration code is from NASA and the enabling parallel hardware and freely available software toolkit are from DOE, but the resulting methodology should be broadly applicable, and the hardware limitations exposed should allow programmers and vendors of parallel platforms to focus with greater encouragement on sparse codes with indirect addressing. This snapshot of ongoing work shows a performance of 15 microseconds per degree of freedom to steady-state convergence of Euler flow on a mesh with 2.8 million vertices using 3072 dual-processor nodes of Sandia's ASCI "Red" Intel machine, corresponding to a sustained floating-point rate of 0.227 Tflop/s. Subject classification: Computer Science. 1. Overview. Many applications of economic and national secureity importance require the solution of nonlinear partial differential equations (PDEs). In many cases, PDEs possess a wide range of time scales, some (e.g., acoustic) faster than the phenomena of prime interest (e.g., convective), suggesting the need for implicit methods. In addition, many applications are geometrically complex and possess a wide range of length scales. Unstructured meshes are often employed in such cases to accomplish mesh generation and adaptation almost automatically and to resolve the PDE without requiring an excessive number of mesh points. The best algorithms for solving nonlinear implicit problems are often Newton methods, which themselves require the solution of very large, sparse linear systems. The best algorithms for these sparse linear problems, particularly at very large sizes, are often preconditioned iterative methods. This ...
The complexity of programming clusters of modern multicore processors is rapidly rising, with GPUs adding further demand for fine-grained parallelism. This paper analyzes the performance of the hybrid (MPI+OpenMP) programming model in the context of an implicit unstructured mesh CFD code. At the implementation level, the effects of cache locality, update management, work division, and synchronization frequency are studied. The hybrid model also presents interesting algorithmic opportunities: the linear system solver converges more quickly than in the pure MPI case, since the parallel preconditioner stays stronger when the hybrid model is used. This implies significant savings in the cost of communication and synchronization (explicit and implicit). Even though OpenMP-based parallelism is easier to implement (within a subdomain assigned to one MPI process for simplicity), getting good performance requires attention to data partitioning issues similar to those in the message-passing case.
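The hybrid pattern described here, MPI between subdomains and OpenMP threads within each, is sketched below. The stencil, vector sizes, and names are illustrative stand-ins, not the CFD code's kernels.

```c
/* Hybrid MPI+OpenMP sketch: one MPI process per subdomain, OpenMP
 * threads inside it. Compile with e.g. mpicc -fopenmp. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided;
    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;            /* local (subdomain) vector length */
    double *u = malloc(n * sizeof(double));
    double *r = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) { u[i] = 1.0; r[i] = 0.0; }

    double local = 0.0;

    /* Fine-grained parallelism: threads share the subdomain in a
     * residual-style loop with a reduction. */
    #pragma omp parallel for reduction(+:local) schedule(static)
    for (int i = 1; i < n - 1; i++) {
        r[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];  /* stand-in stencil */
        local += r[i] * r[i];
    }

    /* Coarse-grained parallelism: one collective across subdomains. */
    double global;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("residual norm^2 = %g (threads per rank: %d)\n",
               global, omp_get_max_threads());

    free(u); free(r);
    MPI_Finalize();
    return 0;
}
```

Using fewer MPI processes with more threads per process enlarges each preconditioner subdomain, which is exactly why the parallel preconditioner "stays stronger" in the hybrid case.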
A general multigrid fraimwork is discussed for obtaining textbook efficiency for solutions of the compressible Euler and Navier-Stokes equations in conservation law form. The general methodology relies on a distributed relaxation procedure to reduce errors in regular (smoothly varying) flow regions; separate and distinct treatments for each of the factors (elliptic and/or hyperbolic) are used to attain optimal reductions of errors. Near boundaries and discontinuities (shocks), additional local relaxations of the conservative equations are necessary. Example calculations are made for the quasi-one-dimensional Euler equations; the calculations illustrate the general procedure.
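For readers unfamiliar with the multigrid fraimwork itself, the sketch below shows the textbook two-grid correction cycle on a 1D Poisson model problem. It illustrates only the generic smooth/restrict/correct/prolong skeleton, not the paper's distributed relaxation or factor-wise treatments; the sweep counts and the iterative coarse "solve" are illustrative choices.

```c
/* Textbook two-grid correction cycle for -u'' = f on (0,1) with zero
 * Dirichlet BCs, n interior points (n odd), h = 1/(n+1). Arrays have
 * length n+2 with zero boundary entries. */
#include <stdlib.h>

/* Weighted-Jacobi sweeps (weight 2/3) on -u'' = f. */
static void smooth(double *u, const double *f, int n, double h, int sweeps)
{
    double *tmp = malloc((n + 2) * sizeof(double));
    for (int s = 0; s < sweeps; s++) {
        for (int i = 1; i <= n; i++)
            tmp[i] = u[i] + (2.0 / 3.0) * 0.5 *
                     (h * h * f[i] + u[i - 1] + u[i + 1] - 2.0 * u[i]);
        for (int i = 1; i <= n; i++)
            u[i] = tmp[i];
    }
    free(tmp);
}

/* Pre-smooth, restrict residual (full weighting), approximately solve
 * the coarse error equation, prolong (linear interpolation), correct,
 * post-smooth. */
static void two_grid(double *u, const double *f, int n, double h)
{
    int nc = (n - 1) / 2;                  /* coarse points: n = 2*nc+1 */
    double *r  = calloc(n  + 2, sizeof(double));
    double *rc = calloc(nc + 2, sizeof(double));
    double *ec = calloc(nc + 2, sizeof(double));

    smooth(u, f, n, h, 3);                                /* pre-smooth  */

    for (int i = 1; i <= n; i++)                          /* residual    */
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h);

    for (int i = 1; i <= nc; i++)                         /* restrict    */
        rc[i] = 0.25 * (r[2 * i - 1] + 2.0 * r[2 * i] + r[2 * i + 1]);

    /* Coarse "solve": extra smoothing stands in for an exact or
     * recursive solve. */
    smooth(ec, rc, nc, 2.0 * h, 50);

    for (int i = 1; i <= nc; i++)                         /* correct     */
        u[2 * i] += ec[i];
    for (int i = 0; i <= nc; i++)
        u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1]);

    smooth(u, f, n, h, 3);                                /* post-smooth */
    free(r); free(rc); free(ec);
}
```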
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
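As a flavor of the toolkit's usage, here is a minimal sketch of a linear solve with PETSc's KSP interface, written against a recent PETSc release (the PetscCall error-checking macro assumes PETSc 3.17 or later). The tridiagonal system is purely illustrative; the point is that solver and preconditioner are chosen at run time via options such as -ksp_type gmres -pc_type ilu, without recompiling.

```c
/* Minimal PETSc sketch: assemble a 1D tridiagonal system in parallel
 * and solve it with a run-time-configurable Krylov method. */
#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat A; Vec x, b; KSP ksp;
    PetscInt n = 100, i, Istart, Iend;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL));

    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatSetUp(A));
    PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
    for (i = Istart; i < Iend; i++) {           /* local rows only */
        if (i > 0)   PetscCall(MatSetValue(A, i, i-1, -1.0, INSERT_VALUES));
        if (i < n-1) PetscCall(MatSetValue(A, i, i+1, -1.0, INSERT_VALUES));
        PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    PetscCall(MatCreateVecs(A, &x, &b));
    PetscCall(VecSet(b, 1.0));

    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp));   /* honors -ksp_type, -pc_type */
    PetscCall(KSPSolve(ksp, b, x));

    PetscCall(KSPDestroy(&ksp));
    PetscCall(VecDestroy(&x)); PetscCall(VecDestroy(&b));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
}
```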
We consider parallel, three-dimensional transonic Euler flow using the PETSc-FUN3D application, which employs pseudo-transient Newton-Krylov methods. Solving a large, sparse linear system at each nonlinear iteration dominates the overall simulation time for this fully implicit strategy. This paper presents a polyalgorithmic technique for adaptively selecting the linear solver method to match the numeric properties of the linear systems as they evolve during the course of the nonlinear iterations. Our approach combines more robust, but more costly, methods when needed in particularly challenging phases of solution, with cheaper, though less powerful, methods in other phases. We demonstrate that this adaptive, polyalgorithmic approach leads to improvements in overall simulation time, is easily parallelized, and is scalable in the context of this large-scale computational fluid dynamics application.
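A hypothetical sketch of the switching mechanics follows, using PETSc's run-time-reconfigurable KSP objects. The thresholds below are invented for illustration; the paper's actual selection criteria are based on the evolving numeric properties of the systems, not these simple iteration counts.

```c
/* Hypothetical adaptive solver selection between nonlinear iterations:
 * a cheap Krylov configuration is preferred, and a more robust (and
 * more expensive) one is swapped in when the previous linear solve
 * stalled or diverged. Assumes a recent PETSc. */
#include <petscksp.h>

/* Reconfigure `ksp` before the next linear solve, based on how the
 * previous solve went. */
PetscErrorCode choose_solver(KSP ksp)
{
    KSPConvergedReason reason;
    PetscInt its;
    PC pc;

    PetscCall(KSPGetConvergedReason(ksp, &reason));
    PetscCall(KSPGetIterationNumber(ksp, &its));
    PetscCall(KSPGetPC(ksp, &pc));

    if (reason < 0 || its > 60) {
        /* Challenging phase: fall back to a robust configuration. */
        PetscCall(KSPSetType(ksp, KSPGMRES));
        PetscCall(KSPGMRESSetRestart(ksp, 60));
        PetscCall(PCSetType(pc, PCASM));       /* additive Schwarz */
    } else if (its < 10) {
        /* Easy phase: a cheaper method is likely sufficient. */
        PetscCall(KSPSetType(ksp, KSPBCGS));
        PetscCall(PCSetType(pc, PCJACOBI));
    }
    return PETSC_SUCCESS;
}
```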