Multiplier-Less Hardware Realization of Trigonometric Functions For High Speed Applications
Jadavpur University
All content following this page was uploaded by K. Gaurav Kumar on 08 July 2019.
Abstract: This paper presents a unified architecture for trigonometric functions using the CORDIC algorithm and its implementation on an FPGA. The hardware finds applications in signal processing, mathematical calculators, and various other engineering fields. The CORDIC algorithm, based on the principle of vector rotations, computes the trigonometric functions using only add and shift operations. The proposed architecture is a structural model, coded in Verilog HDL and implemented on a Virtex-4 FPGA kit. The proposed architecture, which determines five trigonometric functions, utilizes fewer hardware resources than previously reported architectures that can compute only sine and cosine functions. This architecture also provides an improvement in terms of chip area consumed and the maximum frequency of operation of the hardware. The proposed architecture can also be easily reconfigured for applications where higher accuracy is required. The novelty of this paper is implementing the unified architecture along with reducing the device resource utilization and number of clock cycles required with respect to previous works.

Keywords: Unified architecture, Trigonometric functions, CORDIC, FPGA

I. INTRODUCTION
Trigonometric functions like sine, cosine, tangent, arcsine, and arccosine, derived from complex exponential functions, are used in a wide range of applications, like digital signal processing, wireless communications, biomedical engineering, robotics, etc. There are several approaches to realize hardware that performs these functionalities, i.e., Lookup Table (LUT), CORDIC (Coordinate Rotation Digital Computer) and Polynomial Approximation.

The lookup table method uses memory blocks to store the values of the functions to be computed, for every possible input argument [1]. This approach is relatively simple, but the hardware requires more registers, i.e., more memory. Moreover, if fewer entries are stored in the LUT, the results obtained are inaccurate. If more entries are used, the method produces accurate results, but the realized hardware becomes uneconomical and memory-inefficient.

Approximation algorithms such as polynomial approximation use the Maclaurin series to calculate trigonometric functions. The hardware realized by this method requires a large number of multipliers, adders and shifters, making it area inefficient. As the values of the factorials remain fixed, LUTs are used to store these fixed values.

CORDIC, invented by J. E. Volder in 1959 [2], is an efficient algorithm to calculate trigonometric, hyperbolic, exponential and logarithmic functions. This algorithm uses only add and shift operations, which are easy to implement in hardware that requires less computational time and consumes fewer device resources.

In this work, multiplier-less unified hardware for calculating trigonometric functions is proposed. The architecture is then programmed in Verilog HDL and implemented on a Virtex 4 FPGA. The present architecture has also been compared in detail with previously published works.

The structure of the rest of the paper is as follows. In Section II, CORDIC, its mathematical background and the unified algorithm are discussed. The proposed architecture is presented in Section III. In Section IV, implementation details are highlighted along with results. Section V presents the performance of our architecture in comparison with other architectures available in the literature. Section VI concludes the paper.

II. CORDIC ALGORITHM
The CORDIC algorithm is used in calculators, discrete signal processors, etc. [3]. Vector rotations are used to compute all the trigonometric functions. The algorithm provides an efficient iterative method to calculate the vector rotations by an angle, using only add and shift operations.

Fig. 1 shows the rotation of a vector $P(x, y)$ through an angle $\alpha$ in the anti-clockwise direction to give a vector $P'(x', y')$. The coordinates of the point $P'$ are given by:

$$x' = x\cos\alpha - y\sin\alpha = \cos\alpha\,(x - y\tan\alpha) \tag{1}$$

$$y' = y\cos\alpha + x\sin\alpha = \cos\alpha\,(y + x\tan\alpha) \tag{2}$$

Fig. 1. Rotating vector on 2-D plane.

If the angle of rotation is restricted such that $\tan\alpha = 2^{-i}$, where $i$ is an integer, then multiplication by the tangent can be performed using only shift operations, and the total calculation can be done using addition and shifting [4]. By choosing a proper sequence of $\alpha_0, \alpha_1, \ldots$ and direction of rotation, we can calculate the trigonometric functions [5].
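As a worked step, the shift-and-add form of one rotation can be written out explicitly. The sketch below assumes the standard rotation identities and uses a direction symbol $d_i \in \{+1, -1\}$ ($+1$ for anti-clockwise) and a gain symbol $K_N$, both introduced here for illustration only. The $\cos\alpha_i$ factor is dropped from every step and collected into a single constant, which is why a starting value near 0.6073 (the $x_0$ entry of Table I) compensates for it.

```latex
\begin{aligned}
x' &= \cos\alpha\,(x - y\tan\alpha), \qquad y' = \cos\alpha\,(y + x\tan\alpha) \\
\tan\alpha_i = 2^{-i} \;&\Rightarrow\; x_{i+1} = x_i - d_i\,y_i\,2^{-i}, \qquad y_{i+1} = y_i + d_i\,x_i\,2^{-i} \\
K_N &= \prod_{i=0}^{N-1}\cos\alpha_i = \prod_{i=0}^{N-1}\frac{1}{\sqrt{1 + 2^{-2i}}} \approx 0.6073 \quad (N = 12)
\end{aligned}
```

Each multiplication by $2^{-i}$ is a right shift by $i$ bits, so one iteration costs two shifts and two additions or subtractions.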
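How a fixed sequence of angles and a per-step direction choice yields a function value can be sketched in software. The following is a minimal Python model, not the authors' Verilog design: the 16-bit fractional fixed-point format, the constant names (`FRAC_BITS`, `ATAN_LUT`, `K_FIXED`) and the function names are assumptions made for illustration. It uses the conventional rotation rule (drive the residual angle to zero) for sine and cosine, and the conventional vectoring rule (drive $y$ to zero) for arctangent.

```python
import math

FRAC_BITS = 16              # assumed fixed-point format: 16 fractional bits
N = 12                      # number of CORDIC iterations
ONE = 1 << FRAC_BITS        # fixed-point representation of 1.0

# Lookup table of arctan(2^-i), quantized to fixed point.
ATAN_LUT = [round(math.atan(2.0 ** -i) * ONE) for i in range(N)]

# Reciprocal of the accumulated CORDIC gain (about 0.6073 for N = 12).
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_FIXED = round(K * ONE)


def cordic_sin_cos(angle_rad):
    """Return (sin, cos) of an angle in [-pi/2, pi/2] using shifts and adds."""
    x, y, z = K_FIXED, 0, round(angle_rad * ONE)
    for i in range(N):
        if z >= 0:   # residual angle positive: rotate anti-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_LUT[i]
        else:        # residual angle negative: rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_LUT[i]
    return y / ONE, x / ONE


def cordic_atan(t):
    """Return arctan(t) by rotating the vector (1, t) onto the x-axis."""
    x, y, z = ONE, round(t * ONE), 0
    for i in range(N):
        if y >= 0:   # vector above the x-axis: rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_LUT[i]
        else:        # vector below the x-axis: rotate anti-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_LUT[i]
    return z / ONE


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(f"sin = {s:.4f}, cos = {c:.4f}, atan(1) = {cordic_atan(1.0):.4f}")
```

The only operations inside the loops are right shifts, additions, subtractions and table lookups, which is the property that makes the method multiplier-less. Python's `>>` on negative integers is an arithmetic shift, which matches the behaviour a signed hardware shifter would need.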
A. Unified Algorithm
The algorithm is initialized with a vector $(x_0, y_0)$ and an angle $z_0$. At every iteration, the vector is rotated either clockwise or anti-clockwise depending on the values of $x_{i-1}$, $y_{i-1}$, $z_{i-1}$ obtained in the previous iteration, and the angle $z_i$ is updated accordingly. The algorithm stops after a predefined number of iterations.

ALGORITHM 1: Unified Algorithm
Require: Input $\alpha$ and the function to be computed
1: Initialize $x_0$, $y_0$ and $z_0$.
2: for $i = 0$ to $N-1$ do
3:   Choose $d_x$, $d_y$, $d_z$ // d is the direction of rotation
4:   $x_{i+1} = x_i + d_x\,y_i\,2^{-i}$
5:   $y_{i+1} = y_i + d_y\,x_i\,2^{-i}$
6:   $z_{i+1} = z_i + d_z\,\tan^{-1}(2^{-i})$
7: end for

Here, $N$ denotes the number of iterations. Values of $x_0$, $y_0$, $z_0$ are chosen appropriately for separate functions. Table I shows the different values of $x_0$, $y_0$, $z_0$ for computing the corresponding trigonometric functions:

TABLE I. SELECTION OF X0, Y0, Z0
Required Function                       $x_0$     $y_0$       $z_0$
$\sin\alpha$, $\cos\alpha$              0.6073    0           $\alpha$
$\sin^{-1}\alpha$, $\cos^{-1}\alpha$    0.6073    0           0
$\tan^{-1}\alpha$                       1         $\alpha$    0

Depending upon the direction of rotation, $d_x$, $d_y$, $d_z$ can be $+1$ or $-1$. Their values are selected from Table II.

TABLE II. SELECTION OF DIRECTION OF ROTATIONS
Required Function             $d_x$                               $d_y$                               $d_z$
$\sin\alpha$, $\cos\alpha$    -1 if $z_i \ge 0$, +1 otherwise     +1 if $z_i \ge 0$, -1 otherwise     -1 if $z_i \ge 0$, +1 otherwise
$\sin^{-1}\alpha$             -1 if $y_i < \alpha$, +1 otherwise  +1 if $y_i < \alpha$, -1 otherwise  +1 if $y_i < \alpha$, -1 otherwise
$\cos^{-1}\alpha$             +1 if $x_i < \alpha$, -1 otherwise  -1 if $x_i < \alpha$, +1 otherwise  -1 if $x_i < \alpha$, +1 otherwise
$\tan^{-1}\alpha$             +1 if $y_i \ge 0$, -1 otherwise     -1 if $y_i \ge 0$, +1 otherwise     +1 if $y_i \ge 0$, -1 otherwise

The number of iterations ($N$) is chosen according to the desired accuracy of results. Higher $N$ implies better precision, but demands more computation time. $N = 12$ implies that the precision of results is $2^{-12}$ and the hardware requires 12 clock cycles. For applications that require less precision, the value of $N$ can be reduced to optimize computation time. Since the loop body performs a fixed amount of work per iteration, the running time complexity of the algorithm is $O(N)$.

III. PROPOSED ARCHITECTURE
The unified algorithm has been implemented as FPGA based hardware. The proposed architecture is a structural model with the capability of calculating the values of five trigonometric functions, namely sine, cosine, arcsine, arccosine and arctangent. Components like barrel shifter, adder/subtractor, lookup table etc. are designed to perform various micro-operations of the algorithm.

A barrel shifter is used for the controlled shift operation. An N-bit barrel shifter has been implemented which rotates an arbitrary number of bits of the input number to the right. The circuit has N-bit input lines and a control line that dictates the amount of shift.

A controlled adder/subtractor is used for controlled addition and subtraction to update $x$, $y$, and $z$ as and when required. Lookup tables are used to store the $\tan^{-1}$ values.

The proposed architecture for the unified algorithm is shown in Fig. 2. Here, three registers $x_i$, $y_i$ and $z_i$ store the values of $x$, $y$, $z$ after the $i$-th iteration. The rotating vector is denoted by $(x, y)$ and the angle of rotation is represented by $z$. After each iteration, the values of $x_i$, $y_i$ and $z_i$ are calculated from the previous values and the registers are updated at the positive edge of the clock.

IV. FPGA IMPLEMENTATION
The architecture presented in Section III has been coded in Verilog HDL. The code, synthesized in Xilinx ISE Design Suite 14.7, has been implemented on a Virtex 4 FPGA kit (xc4vlx60). The timing waveform analysis report and the device utilization summary are discussed in the following subsections to justify the novelty of our proposed architecture.

A. Timing waveform analysis
The simulation uses a clock with 100 MHz frequency. Table III and Figure 3 provide the timing analysis of the proposed hardware for N = 12, 13, 14. It is observed that the number of clock cycles required to get the output increases linearly with N. Thus, an optimized value of N has to be chosen based on the trade-off between accuracy and speed of operation.

B. Device Utilization Summary
A device utilization summary is also provided for the proposed architecture. There are 22664 slices, 53248 slice flip-flops, 53248 four-input LUTs and 448 bonded IOBs present in the Virtex 4 FPGA (xc4vlx60) kit. Table IV provides the number of each type of component needed for different values of N. This table clearly shows that device utilization of the hardware goes up as the number of iterations (N) increases. The trade-off between the accuracy and the area of the hardware must be taken into consideration while choosing the value of N. Figure 4 shows the graphical representation of the device utilization summary for different N.

The value of N for the proposed hardware is chosen as 12, based on the accuracy, the maximum operating frequency of the application and the area of the hardware.

V. COMPARATIVE ANALYSIS
The performance of existing designs reported in the literature is compared with the proposed unified architecture. To have the same parameter setting for reference, our proposed design is synthesized using Xilinx ISE design suite with Virtex 4 as target device and N = 12. To compare the performance, a parameter, area multiplied by delay (A × T), is defined. The parameter Area (A) is the number of resources required (i.e., A = Slices + Slice flip flops + LUTs + Bonded IOBs), while
Fig. 5. Comparative Analysis for validating performance of the hardware.

area delay product is almost 50% better than the previously reported architectures. Based on this performance analysis, it can be concluded that the proposed architecture is better than all the other existing implementations based on area of hardware, maximum operating frequency (MOF) and area-delay product (A×T). Future works involve introducing parallel processing and pipelining techniques, along with approximate circuits to further increase the MOF and decrease the area.

ACKNOWLEDGMENT
The authors are thankful to Asim Maiti, Dr. Swarup Kumar Mitra, and Rathindra Nath Biswas for their insightful suggestions and inputs during the preparation of the manuscript.

REFERENCES
[1] P. T. P. Tang, “Table-lookup Algorithms for Elementary Functions and Their Error Analysis,” Proceedings 10th IEEE Symposium on Computer Arithmetic, pp. 232-236, 1991.
[2] J. E. Volder, “The CORDIC Trigonometric Computing Technique,” IRE Transactions on Electronic Computers, vol. EC-8, pp. 330-334, 1959.
[3] V. Considine, “CORDIC Trigonometric Function Generator for DSP,” International Conference on Acoustics, Speech, and Signal Processing, pp. 2381-2384, 1984.
[4] A. Saha, K. G. Kumar, and A. Ghosh, “Area Efficient Architecture of Hyperbolic functions for high frequency applications,” International Conference on Circuits, Controls and Communications, pp. 139-142, 2017.
[5] J. S. Walther, “A Unified Algorithm for Elementary Functions,” Spring Joint Computer Conference, pp. 379-385, 1971.
[6] K. Maharatna, A. Troya, S. Banerjee, and E. Grass, “Virtually scaling-free adaptive CORDIC rotator,” IEE Proceedings - Computers and Digital Techniques, vol. 151, no. 6, pp. 448-456, 2004.
[7] E. Garcia, R. Cumplido, and M. Arias, “Pipelined CORDIC design on FPGA for a digital sine and cosine waves generator,” 3rd International Conference on Electrical and Electronics Engineering, IEEE, pp. 1-4, 2006.
[8] L. Vachhani, K. Sridharan, and P. K. Meher, “Efficient CORDIC algorithms and architectures for low area and high throughput implementation,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, no. 1, pp. 61-65, 2009.
[9] S. Aggarwal and K. Khare, “Hardware efficient architecture for generating sine/cosine waves,” 25th International Conference on VLSI Design (VLSID), IEEE, pp. 57-61, 2012.
[10] A. P. Renardy, N. Ahmadi, A. A. Fadila, N. Shidqi, and T. Adiono, “FPGA Implementation of CORDIC Algorithms for Sine and Cosine Generator,” 5th International Conference on Electrical Engineering and Informatics, Bali, Indonesia, 2015.