Papers by Mohamed H. Aissa
A Computational Fluid Dynamics (CFD) code for steady simulations solves a set of non-linear partial differential equations using an iterative time-stepping process, which can follow an explicit or an implicit scheme. On the CPU, the difference between the two time-stepping methods with respect to stability and performance has been well covered in the literature. However, this comparison has not been extended to modern high-performance computing systems such as Graphics Processing Units (GPUs). In this work, we first present an implementation of the two time-stepping methods on the GPU, highlighting the different challenges for the programming approach. Then we introduce a classification of basic CFD operations, based on the degree of parallelism they expose, and study the potential of GPU acceleration for every class. The classification provides local speedups of basic operations, which are finally used to compare the performance of both methods on the GPU. The aim of this work is to enable an informed decision on the most efficient combination of hardware and method when facing a new application. Our findings show that the choice between explicit and implicit time integration depends mainly on the convergence of explicit solvers and the efficiency of preconditioners on the GPU.
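As a rough illustration of the explicit/implicit trade-off described in this abstract, the following sketch drives a scalar model problem, du/dt = -K (u - U_STEADY), to its steady state with both kinds of pseudo-time step. It is not the solver from the paper: the model equation, the constants `K` and `U_STEADY`, and all function names are assumptions chosen only to show that the explicit step is limited by a stability bound on the time step (here dt < 2/K), while the implicit step tolerates a much larger one at the cost of a solve per iteration.

```python
K = 10.0         # hypothetical decay rate of the model problem
U_STEADY = 1.0   # hypothetical steady-state solution


def residual(u):
    """Right-hand side of the model problem du/dt = -K * (u - U_STEADY)."""
    return -K * (u - U_STEADY)


def explicit_step(u, dt):
    """Forward Euler: uses only the current residual; stable only for dt < 2/K."""
    return u + dt * residual(u)


def implicit_step(u, dt):
    """Backward Euler in delta form: (1/dt + K) * du = residual(u).

    For a scalar the 'linear solve' is a single division; in a real CFD
    code this is where a preconditioned linear solver would be needed.
    """
    return u + residual(u) / (1.0 / dt + K)


def iterate_to_steady_state(step, dt, u0=0.0, tol=1e-10, max_iter=10_000):
    """Repeat `step` until the residual drops below `tol`.

    Returns (u, iterations, converged). Iteration stops early and reports
    failure if the solution blows up.
    """
    u = u0
    for n in range(1, max_iter + 1):
        u = step(u, dt)
        if abs(u) > 1e12:          # diverged: time step too large
            return u, n, False
        if abs(residual(u)) < tol:
            return u, n, True
    return u, max_iter, False


if __name__ == "__main__":
    for name, step, dt in [
        ("explicit, dt=0.05", explicit_step, 0.05),
        ("explicit, dt=0.25", explicit_step, 0.25),
        ("implicit, dt=100 ", implicit_step, 100.0),
    ]:
        u, n, ok = iterate_to_steady_state(step, dt)
        print(f"{name}: converged={ok}, iterations={n}")
```

In this toy setting the implicit scheme needs far fewer iterations, but each of its iterations hides a linear solve, which is consistent with the abstract's point that preconditioner efficiency on the GPU matters for the comparison.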
Master's thesis, Feb 18, 2013
The aim of this work is to accelerate an existing two-dimensional explicit Euler solver implemented on a single CPU. The solver, which uses the Discontinuous Galerkin method for the spatial discretization, is ported to run on a graphics processing unit (GPU) using the CUDA programming language.
The input has been rearranged into one-dimensional buffers to take full advantage of the GPU's SIMD architecture. The GPU implementation of the explicit solver runs on a Tesla C1060 GPU and a Tesla K20 GPU. The CPU implementation has also been extended to run on multi-core CPUs using OpenMP. The multi-core CPU program runs on a hyper-threaded Intel Xeon E5540 processor.
A peak speedup of 66x has been achieved for the simulation of an acoustic pulse with the GPU implementation relative to the single-CPU implementation, and a peak speedup of 6x relative to the multi-core CPU implementation.
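The rearrangement of the input into one-dimensional buffers mentioned in this abstract could look roughly like the sketch below. The thesis does not state the actual layout here, so everything in the block is an assumption: the nesting `nested[elem][var][node]`, the function names, and the choice to make the element index vary fastest (a common structure-of-arrays layout, so that neighbouring GPU threads handling neighbouring elements read neighbouring memory locations).

```python
def flat_index(var, node, elem, n_nodes, n_elems):
    """Position of (var, node, elem) in the flat buffer; elem varies fastest."""
    return (var * n_nodes + node) * n_elems + elem


def flatten(nested):
    """Copy nested[elem][var][node] into one flat list.

    Returns the flat buffer and its logical shape (n_vars, n_nodes, n_elems).
    The nested input layout is a hypothetical stand-in for the CPU data.
    """
    n_elems = len(nested)
    n_vars = len(nested[0])
    n_nodes = len(nested[0][0])
    flat = [0.0] * (n_elems * n_vars * n_nodes)
    for e in range(n_elems):
        for v in range(n_vars):
            for n in range(n_nodes):
                flat[flat_index(v, n, e, n_nodes, n_elems)] = nested[e][v][n]
    return flat, (n_vars, n_nodes, n_elems)


if __name__ == "__main__":
    # Three elements, four conserved variables, two nodes per element;
    # each value encodes its own (elem, var, node) position for easy checking.
    nested = [[[100.0 * e + 10.0 * v + n for n in range(2)]
               for v in range(4)]
              for e in range(3)]
    flat, (n_vars, n_nodes, n_elems) = flatten(nested)
    i = flat_index(2, 1, 0, n_nodes, n_elems)
    # Entries for the same variable and node in consecutive elements are adjacent.
    print(flat[i], flat[i + 1], flat[i + 2])
```

With this ordering, one variable at one node occupies a contiguous run of `n_elems` values, which is the access pattern that typically allows coalesced loads when one thread is assigned per element.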
Optical inspection systems consist of hardware components (e.g. measurement sensors, lighting systems, positioning systems, etc.) and software components (system calibration techniques, image processing algorithms for defect detection and classification, data fusion, etc.). Given an inspection task, choosing the most suitable components is not a trivial process and requires expert knowledge. For multiscale measurement systems, the optimization of the measurement system is an unsolved problem even for human experts. In this contribution we propose two assistant systems (a hardware assistant and a software assistant), which help in choosing the most suitable components for the task, considering the properties of the object (e.g. material, surface roughness, etc.) and of the defects (e.g. defect types, dimensions, etc.). The hardware assistant system uses general rules of thumb, sensor models/simulations and stored expert knowledge to specify the sensors along with their parameters and, if necessary, their hierarchy in a multiscale measurement system. The software assistant system then simulates many measurements with all possible defect types for the chosen sensors. Artificial neural networks (ANN) are used for pre-selection, and genetic algorithms are used for finer selection of the defect detection algorithms along with their optimized parameters. In this contribution we show the general architecture of the assistant system and results obtained for the detection of typical defects on technical surfaces at the micro-scale using a multiscale measurement system.