
Bioinformatics on Embedded Systems

This document analyzes the performance of several bioinformatics tools on a VLIW embedded system architecture. It finds that: 1) the number and type of functional units impact performance by determining how many operations the compiler can schedule simultaneously; 2) instruction cache miss rates are high with direct mapping but drop significantly with 4-way set-associative caches, indicating that bioinformatics applications have small code footprints; and 3) compiler optimizations such as superblocks and hyperblocks provide 1.1X to 2.0X speedups, suggesting that they are important for efficient VLIW execution of bioinformatics applications.

Uploaded by DhiviyaSampath
Copyright © Attribution Non-Commercial (BY-NC)

Bioinformatics on Embedded Systems: A Case Study of Computational Biology Applications on VLIW Architecture

INTRODUCTION

Bioinformatics applications represent an increasingly important class of workloads. Their characteristics and their implications for the underlying hardware design, however, are largely unknown. Currently, biological data processing relies ubiquitously on high-end systems equipped with expensive, general-purpose processors. The next generation of bioinformatics requires more flexible and cost-effective computing platforms to serve its rapidly growing market. Programmable, application-specific embedded systems appear to be an attractive solution in terms of ease of programming, design cost, power, portability and time-to-market. The first step towards such systems is to characterize bioinformatics applications on the target architecture. Such studies help in understanding the design issues and trade-offs involved in specializing hardware and software systems for the bioinformatics market.

This paper evaluates several representative bioinformatics tools on VLIW-based embedded systems. We investigate the basic characteristics of the benchmarks, the impact of functional units, the efficiency of VLIW execution, cache behavior and the impact of compiler optimizations. The architectural implications observed in this study can guide design optimizations. To the best of our knowledge, this is one of the first such studies.

EXPERIMENTAL METHODOLOGY

Simulation Framework

Our experimental framework is based on the Trimaran system, designed for research in instruction-level parallelism [10]. Trimaran uses the IMPACT compiler [11] as its front end. The IMPACT compiler performs C parsing, code profiling, block formation and traditional optimizations [12]. It also exploits support for speculation and predicated execution through superblock [13] and hyperblock [14] optimizations. The Trimaran back end, ELCOR, performs instruction selection, register allocation and machine-dependent code optimizations for the specified machine architecture. The Trimaran simulator generator then produces a simulator for a parameterized VLIW microprocessor architecture.

Machine Configuration

The simulated machine comprises a VLIW microprocessor core and a two-level memory hierarchy. The VLIW processor relies on the compiler to exploit instruction-level parallelism, achieving higher instruction throughput with minimal hardware. The core has 64 general-purpose registers, 64 floating-point registers, 64 predicate registers, 64 control registers and 16 branch target registers. Unlike a superscalar processor, it has no support for register renaming. Predicate registers are special 1-bit registers that hold a true or false value; comparison operations use them as their target registers. The core can issue up to eight operations per cycle, one to each of its eight functional units: 4 integer units, 2 floating-point units, 1 memory unit and 1 branch unit. The memory unit performs load/store operations, and the branch unit performs branch, call and comparison operations. The level-one (L1) memory is organized as separate instruction and data caches, while the level-two (L2) cache is unified.
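To make the issue constraint concrete, here is a minimal Python sketch of greedy list scheduling into VLIW bundles for the 4-integer, 2-floating-point, 1-memory, 1-branch unit mix described above. It is not Trimaran's or ELCOR's scheduler. The operation names, the dependence graph and the 1-cycle load latency (an assumed L1 hit) are hypothetical; register names are assumed to be single-assignment and the graph acyclic, so only true dependences are tracked. The 3-cycle integer multiply latency comes from the machine configuration.

```python
from dataclasses import dataclass, field

# Functional-unit mix of the simulated core: at most one operation per unit per cycle.
UNITS = {"int": 4, "fp": 2, "mem": 1, "branch": 1}


@dataclass
class Op:
    name: str                                  # hypothetical operation name
    unit: str                                  # functional-unit class that executes it
    latency: int                               # cycles until its result can be used (>= 1)
    deps: list = field(default_factory=list)   # names of producer operations


def schedule(ops, units=UNITS):
    """Greedy cycle-by-cycle list scheduling into VLIW bundles.

    Returns a list of bundles; bundle i holds the names of the ops issued in cycle i.
    """
    names = {op.name for op in ops}
    for op in ops:
        missing = [d for d in op.deps if d not in names]
        if missing:
            raise ValueError(f"{op.name} depends on unknown ops {missing}")

    ready_at = {}        # op name -> first cycle its result is available
    pending = list(ops)  # unscheduled ops, in program (priority) order
    bundles = []
    cycle = 0
    while pending:
        free = dict(units)  # unit slots still free in this cycle
        bundle = []
        for op in list(pending):
            operands_ready = all(d in ready_at and ready_at[d] <= cycle for d in op.deps)
            if operands_ready and free[op.unit] > 0:
                free[op.unit] -= 1
                ready_at[op.name] = cycle + op.latency
                bundle.append(op.name)
                pending.remove(op)
        bundles.append(bundle)  # an empty bundle is a cycle with nothing to issue
        cycle += 1
    return bundles


example_ops = [
    Op("ld_a", "mem", 1),                    # assumed 1-cycle L1 hit
    Op("ld_b", "mem", 1),
    Op("add1", "int", 1),
    Op("add2", "int", 1),
    Op("add3", "int", 1),
    Op("add4", "int", 1),
    Op("add5", "int", 1),
    Op("mul", "int", 3, ["ld_a", "ld_b"]),   # integer multiply: 3 cycles
    Op("cmp", "branch", 1, ["mul"]),         # comparisons run on the branch unit
]

for cycle, bundle in enumerate(schedule(example_ops)):
    print(cycle, bundle)
```

With a single memory unit the second load cannot issue alongside the first, and the fifth add waits because all four integer units are busy in cycle 0; the two empty bundles are cycles in which the comparison is still waiting for the multiply result. These are the kinds of resource limits the functional-unit study examines.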

Memory Hierarchy

L1 I-cache: 8KB, direct-mapped, 32 bytes/line, 1-cycle hit
L1 D-cache: 8KB, 2-way set-associative, 32 bytes/line, 1-cycle hit
L2 cache (unified): 64KB, 4-way set-associative, 64 bytes/line, 5-cycle hit; external memory latency 35 cycles

Machine Configuration (VLIW Core)

Issue width: 8
General-purpose registers: 64, 32-bit
Floating-point registers: 64, 64-bit
Predicate registers: 64, 1-bit (store the Boolean values used by predicated instructions)
Control registers: 64, 32-bit (hold the internal state of the processor)
Branch target registers: 16, 64-bit (hold branch target addresses and static branch predictions)
Integer units: 4 (most integer arithmetic operations 1 cycle, integer multiply 3 cycles, integer divide 8 cycles)
Floating-point units: 2 (floating-point multiply 3 cycles, floating-point divide 8 cycles)
Memory units: 1
Branch units: 1 (1-cycle latency)
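The cache parameters above fix each cache's geometry. The following Python sketch derives the number of sets and the tag/index/offset split for each cache and evaluates the standard average memory access time (AMAT) formula. Two assumptions are not in the source: addresses are 32 bits wide, and on an L2 miss the 35-cycle external memory latency is charged on top of the 5-cycle L2 hit time. The miss rates passed to `amat` are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass

ADDRESS_BITS = 32  # assumption: 32-bit addresses


@dataclass(frozen=True)
class CacheConfig:
    name: str
    size_bytes: int
    ways: int        # 1 = direct-mapped
    line_bytes: int

    @property
    def sets(self) -> int:
        return self.size_bytes // (self.line_bytes * self.ways)

    @property
    def offset_bits(self) -> int:
        return self.line_bytes.bit_length() - 1  # log2, valid for powers of two

    @property
    def index_bits(self) -> int:
        return self.sets.bit_length() - 1

    @property
    def tag_bits(self) -> int:
        return ADDRESS_BITS - self.index_bits - self.offset_bits

    def split(self, addr: int):
        """Return (tag, set index, byte offset) for an address."""
        offset = addr & (self.line_bytes - 1)
        index = (addr >> self.offset_bits) & (self.sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset


L1I = CacheConfig("L1 I-cache", 8 * 1024, 1, 32)
L1D = CacheConfig("L1 D-cache", 8 * 1024, 2, 32)
L2 = CacheConfig("L2 cache", 64 * 1024, 4, 64)


def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time in cycles for a two-level hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


for c in (L1I, L1D, L2):
    print(f"{c.name}: {c.sets} sets, tag/index/offset = "
          f"{c.tag_bits}/{c.index_bits}/{c.offset_bits} bits")

# Hypothetical miss rates (5% in L1, 20% in L2), for illustration only.
print(f"AMAT = {amat(1, 0.05, 5, 0.20, 35):.2f} cycles")
```

Under these assumptions the L1 I-cache has 256 sets, the L1 D-cache 128 and the L2 256, and the placeholder miss rates give an AMAT of 1.6 cycles.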

RESULTS

Impact of Functional Units

On VLIW processors, the number and type of functional units determine the resources available to the compiler for scheduling operations. Having several instances of a given functional unit allows the compiler to schedule several operations that use that unit in the same cycle. This subsection investigates the impact of the integer and memory units on benchmark performance.

Cache Performance

Direct-mapped instruction caches yield high miss rates on nearly all of the studied benchmarks, and conflict misses caused by the lack of associativity dominate. The instruction cache miss rates drop significantly with increased associativity: a 4-way set-associative, 8KB L1 I-cache shows a miss rate of less than 1%. This indicates that bioinformatics applications usually have small code footprints.
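The conflict-miss effect can be reproduced with a toy model. The Python sketch below simulates an 8KB instruction cache with 32-byte lines and LRU replacement, once direct-mapped and once 4-way set-associative. The address trace is synthetic and hypothetical: a loop body and a helper routine placed exactly 8KB apart, so that both map to the same direct-mapped set. It is not the study's workload or Trimaran's cache simulator.

```python
from collections import OrderedDict


def simulate(trace, size_bytes=8 * 1024, line_bytes=32, ways=1):
    """Count misses in a set-associative cache with LRU replacement.

    ways=1 models a direct-mapped cache.
    """
    num_sets = size_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags in LRU order
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        s = sets[index]
        if tag in s:
            s.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)   # evict the least recently used line
            s[tag] = None
    return misses


# Hypothetical trace: fetches alternate between two code lines 8KB apart.
LOOP_BODY = 0x0000
HELPER = 0x2000
trace = [LOOP_BODY, HELPER] * 100

for ways in (1, 4):
    misses = simulate(trace, ways=ways)
    print(f"{ways}-way: {misses} misses, miss rate {misses / len(trace):.0%}")
```

In the direct-mapped cache the two lines evict each other on every fetch, so every access misses; with four ways both lines fit in the same set and only the two cold misses remain.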

A small, highly associative instruction cache can attain good performance on bioinformatics applications.

Compiler Optimizations

The IMPACT compiler provides a set of classic optimizations such as constant propagation, copy propagation, constant folding and strength reduction. These optimizations require no additional microarchitectural support. The IMPACT compiler's optimization levels are cumulative:
Level 0: no optimization
Level 1: local optimizations
Level 2: local and global optimizations
Level 3: local, global and jump optimizations
Level 4: local, global, jump and loop optimizations
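As a rough illustration of the classic optimizations named above, here is a toy Python pass over straight-line three-address code that performs constant and copy propagation, constant folding and strength reduction (multiplication by a power of two rewritten as a left shift). The tuple instruction format is a hypothetical encoding, not IMPACT's intermediate representation, and the pass assumes each variable is assigned only once, so it never has to invalidate a known value.

```python
def fold(op, a, b):
    """Evaluate an operation whose operands are both constants."""
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    if op == "shl":
        return a << b
    raise ValueError(f"unknown op {op}")


def optimize(block):
    """Optimize one basic block of (dest, op, src1, src2) instructions.

    Operands are ints (constants) or strs (variable names); "copy" uses only src1.
    Assumes single assignment, so known values are never invalidated.
    """
    known = {}  # variable -> constant, or variable it is a copy of

    def resolve(x):
        # Constant and copy propagation: replace a variable by its known value.
        while isinstance(x, str) and x in known:
            x = known[x]
        return x

    out = []
    for dest, op, a, b in block:
        a, b = resolve(a), resolve(b)
        if op == "copy":
            known[dest] = a
            out.append((dest, "copy", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            value = fold(op, a, b)  # constant folding
            known[dest] = value
            out.append((dest, "copy", value, None))
        elif op == "mul" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
            # Strength reduction: x * 2**k becomes x << k (constant on the right only).
            out.append((dest, "shl", a, b.bit_length() - 1))
        else:
            out.append((dest, op, a, b))
    return out


block = [
    ("n", "copy", 4, None),    # n = 4
    ("m", "copy", "n", None),  # m = n
    ("k", "add", "m", 4),      # k = m + 4
    ("y", "mul", "x", "k"),    # y = x * k   (x is unknown at compile time)
]
for instr in optimize(block):
    print(instr)
```

After the pass, `k` is the constant 8 and `y = x * k` becomes `y = x << 3`. The now-unused copies would normally be removed by dead-code elimination, which this sketch does not perform.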

The hyperblock optimization yields speedups ranging from 1.1X to 2.0X. On average, superblock and hyperblock optimizations improve performance by factors of 1.3X and 1.5X, respectively. These speedups present an opportunity to improve the efficiency of VLIW execution on bioinformatics applications.
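Hyperblocks rely on if-conversion: a data-dependent branch is replaced by a comparison that writes a predicate register, and the operations on the formerly conditional path are guarded by that predicate. The Python sketch below models these semantics on a hypothetical "keep the best score" loop; it is not IMPACT's hyperblock formation. Python has no predicated instructions, so the guarded updates are modeled explicitly: each one is issued on every iteration and commits only when its predicate is true.

```python
def branchy_best(scores):
    """Reference version: the loop body contains a data-dependent branch."""
    best, pos = scores[0], 0
    for i, s in enumerate(scores):
        if s > best:
            best, pos = s, i
    return best, pos


def predicated_best(scores):
    """If-converted version: a compare sets a 1-bit predicate that guards the updates."""
    regs = {"best": scores[0], "pos": 0}
    for i, s in enumerate(scores):
        p = s > regs["best"]  # comparison writes a predicate register
        # The two guarded updates are independent of each other, so a VLIW compiler
        # could issue them in the same bundle instead of branching around them.
        for guard, dest, value in ((p, "best", s), (p, "pos", i)):
            if guard:  # a false predicate nullifies the operation
                regs[dest] = value
    return regs["best"], regs["pos"]


scores = [3, 7, 2, 9, 9, 1]
print(branchy_best(scores), predicated_best(scores))
```

Both versions compute the same result; the predicated form turns control flow into straight-line code, which gives the scheduler a larger block from which to fill bundles.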

CONCLUSION

Classic compiler optimizations provide a 1.0X to 1.15X performance improvement. More aggressive compiler optimizations, such as superblock and hyperblock optimizations, provide an additional 1.1X to 2.0X improvement, suggesting that they are important for the VLIW machine to sustain the desired performance on bioinformatics applications.

EMBEDDED SYSTEMS ASSIGNMENT-III

Submitted by: D. MADHUVANTHI (08EIR054, EIE-A)
