Compatible Class Encoding in Roth-Karp Decomposition For Two-Output LUT Architecture
Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.
Abstract
Roth-Karp decomposition is one of the most popular techniques for LUT-based FPGA technology mapping because it can decompose a node into a set of nodes with fewer fanins. In this paper, we show how to formulate the compatible class encoding problem in Roth-Karp decomposition as a symbolic-output encoding problem in order to exploit the features of the two-output LUT architecture. Based on this formulation, we also develop an encoding algorithm to minimize the number of LUT's required to implement the logic circuit. Experimental results show that our encoding algorithm produces promising results in the logic synthesis environment for the two-output LUT architecture.
1. Introduction
Field Programmable Gate Arrays (FPGA's) are modern logic devices which can be programmed by users to implement their own logic circuits. Because of their short turnaround time compared with that of the standard ASIC process, they have become increasingly popular for rapid system prototyping. Many FPGA architectures have been proposed, and the Look-Up Table (LUT)-based architecture is the most popular one. It consists of many configurable LUT's, and each LUT can implement any k-input function. There is also a similar LUT-based FPGA architecture which implements not only single-output functions but also two-output functions. A LUT of this architecture can implement either one k-input function or two (k-1)-input functions with k inputs in total. For example, in the Xilinx XC3000 architecture [1], k is equal to 5.
Many algorithms for LUT-based FPGA technology mapping have been proposed in previous studies [2-11]. Most of these algorithms first decompose the given Boolean network to be k-feasible. A Boolean network is said to be k-feasible if all nodes in the network are k-feasible, and a node is said to be k-feasible if the number of its fanins is no more than k. Hence, the corresponding circuit can be directly realized by a one-to-one mapping between nodes and LUT's. Nodes that are not k-feasible need to be decomposed into a set of k-feasible nodes. Many decomposition techniques, such as AND-OR decomposition, cofactoring, disjoint decomposition, if-then-else DAG, communication complexity reduction [12], and Roth-Karp decomposition [13], are widely used here. In this paper, we focus only on Roth-Karp decomposition. Given the LUT-based FPGA as the target architecture, two interesting problems in Roth-Karp decomposition should be noticed:
1. How to select the lambda set?
2. How to encode the compatible classes?
The algorithm proposed in [11] provides a heuristic to choose a good lambda set. Another algorithm, proposed in [10], formulates Problem 2 as a symbolic-input encoding problem. However, both algorithms consider only the single-output LUT architecture. In this paper, we propose a new formulation for Problem 2 and develop a new compatible class encoding algorithm which can fully exploit the features of the two-output LUT architecture.
This paper is organized as follows. Section 2 describes the compatible class encoding problem in Roth-Karp decomposition. In Section 3, our new encoding algorithm, which addresses the two-output LUT architecture, is given in detail. Section 4 shows experimental results, and the concluding remarks are given in Section 5.
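2. Compatible Class Encoding Problem in Roth-Karp Decomposition
Let $F(X, Y)$ be a Boolean function whose inputs are partitioned into two disjoint sets $X$ and $Y$. Two elements $x_1, x_2 \in B^{|X|}$ are said to be compatible with respect to $F$, denoted as $x_1 \sim x_2$, if $F(x_1, y) = F(x_2, y)$ for every $y \in B^{|Y|}$.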
Each element of $B^{|X|}$ is called a $\lambda$ minterm. All mutually compatible $\lambda$ minterms can be grouped together to form a compatible class, and all compatible classes are pairwise disjoint.
Theorem 2.1: Given two functions $\vec{\alpha}: B^{|X|} \to W$ and $G: W \times B^{|Y|} \to B$ such that
$F(X, Y) = G(\vec{\alpha}(X), Y)$    (1)
then
$\forall x_1, x_2 \in B^{|X|},\ \vec{\alpha}(x_1) = \vec{\alpha}(x_2) \Rightarrow x_1 \sim x_2$    (2)
□
$\vec{\alpha}$ is a function with binary inputs and a symbolic output. $X$ is called the bound ($\lambda$) set. $Y$ is called the free ($\mu$) set.
Property 2.1: The number of admissible values in $W$, $|W|$, must be no less than the number of compatible classes in $X$. □
From Property 2.1, the minimum $|W|$ is equal to the number of compatible classes in $X$ and can be obtained by redefining Eq. (2) as:
$\forall x_1, x_2 \in B^{|X|},\ \vec{\alpha}(x_1) = \vec{\alpha}(x_2) \Leftrightarrow x_1 \sim x_2$    (2')
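As an illustration of the definitions above, the following minimal sketch groups the bound-set minterms of a small function into compatible classes and derives the minimum number of code bits. The function `f_example` and the helper names `compatible_classes` and `min_code_bits` are assumptions made for this sketch and do not come from the paper.

```python
from itertools import product


def compatible_classes(f, n_bound, n_free):
    """Partition bound-set (lambda) minterms into compatible classes.

    Two minterms x1, x2 land in the same class exactly when
    f(x1, y) == f(x2, y) for every free-set assignment y.
    """
    groups = {}
    for x in product((0, 1), repeat=n_bound):
        # The "column" of F under x: its value for every free-set assignment.
        column = tuple(f(x, y) for y in product((0, 1), repeat=n_free))
        groups.setdefault(column, []).append(x)
    return list(groups.values())


def min_code_bits(n_classes):
    """Smallest t with 2**t >= n_classes, i.e. ceil(log2(n_classes))."""
    return (n_classes - 1).bit_length()


# Hypothetical function for illustration: F = (x1 XOR x2 XOR x3) AND y1.
def f_example(x, y):
    return (x[0] ^ x[1] ^ x[2]) & y[0]


classes = compatible_classes(f_example, n_bound=3, n_free=1)
# alpha maps each lambda minterm to the id of its compatible class,
# so equal ids coincide with compatibility, as in Eq. (2').
alpha = {m: cid for cid, group in enumerate(classes) for m in group}
t = min_code_bits(len(classes))
print(len(classes), t)
```

Here the three bound variables collapse into two compatible classes (even and odd parity), so a single encoding bit suffices and the decomposition reduces the fanin count.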
In order to implement $\vec{\alpha}$ by binary logic, the symbolic-output encoding for $W$ has to be performed. At least $t = \lceil \log_2 |W| \rceil$ binary-output functions $\alpha_1, \alpha_2, \ldots, \alpha_t$ are required to encode $\vec{\alpha}$. Hence, Eq. (1) can be rewritten as:
$F(X, Y) = G(\alpha_1(X), \alpha_2(X), \ldots, \alpha_t(X), Y)$    (1')
It is clear that the disjoint Roth-Karp decomposition can be used to reduce the number of fanins of a function under the condition $t < |X|$.
From the above discussion, we find that Roth-Karp decomposition only requires that $\lambda$ minterms belonging to the same compatible class are encoded with the same code, i.e., if $x_1 \sim x_2$, then $\vec{\alpha}(x_1) = \vec{\alpha}(x_2)$. In other words, to have a correct Roth-Karp decomposition, we only need to assign a unique code to each compatible class and do not have to care what the code is. However, different encoding combinations for the compatible classes will result in different $\alpha_1, \alpha_2, \ldots, \alpha_t$, and $G$.
A compatible class encoding strategy is proposed in [10]. It models the compatible class encoding problem as the classical symbolic-input encoding problem. Many existing techniques are then used to encode the compatible classes so as to minimize the literal count of $G$. It assumes that a better decomposition quality is obtained if the resulting $G$ is simple, i.e., has a smaller number of literals. However, this strategy concentrates only on single-output functions. Furthermore, it does not properly take advantage of the features of either the single-output or the two-output LUT architecture.
To exploit the property of the two-output LUT architecture, we formulate the compatible class encoding problem as: to find a set of compatible class encoding patterns that makes as many $\alpha$ functions as possible independent of at least one of their input variables. Thus, two $\alpha$ functions which are each independent of at least one of their input variables can be merged into a two-output k-LUT. In fact, our formulation of this encoding problem is a kind of symbolic-output encoding because we encode the symbolic output variable $W$ into the binary-output encoding functions $\alpha_1, \alpha_2, \ldots, \alpha_t$.
3. Our Compatible Class Encoding Algorithm
For ease of illustration, the LUT used in this section is either a 4-LUT or a two-output 4-LUT.
Example 1: Given a compatible class encoding function $\vec{\alpha}$ with a set of inputs $X$ as the $\lambda$ set and a symbolic output variable $W$:
[Karnaugh map of $\vec{\alpha}$ over $x_1 x_2$ and $x_3 x_4$, with $W = \{0, 1, 2, 3, 4\}$. The map entries are not recoverable from the source.]
The admissible values of $W$ are defined to be the ids of the compatible classes. Thus, $\vec{\alpha}$ maps each $\lambda$ minterm to the id of the compatible class to which it belongs. In this example, 3 ($= \lceil \log_2 5 \rceil$) binary-output functions $\alpha_1$, $\alpha_2$, and $\alpha_3$ are needed to implement $\vec{\alpha}$. Suppose these symbolic values are encoded without any strategy; for instance, each symbolic value is encoded with its binary bit pattern. Then the encoding of $\vec{\alpha}$ and the resulting functions are shown below:
[Encoded Karnaugh map and the sum-of-products expressions of $\alpha_1$, $\alpha_2$, and $\alpha_3$. Not recoverable from the source.]
Obviously, three LUT's are required to implement these functions since all of them depend on all four input variables.
Example 2: As in Example 1, but a better encoding of $\vec{\alpha}$ is used here. The encoding of $\vec{\alpha}$ and the resulting Boolean functions are shown below:
[Encoded Karnaugh map and the expressions of $\alpha_1$, $\alpha_2$, and $\alpha_3$. The complemented literals are lost in the source; Example 3 restates $\alpha_3 = x_2 x_4 + \bar{x}_2 \bar{x}_4$.]
We find that $\alpha_3$ is independent of $x_1$ and $x_3$, and $\alpha_2$ is independent of $x_2$. Therefore, $\alpha_3$ and $\alpha_2$ can be merged into a two-output LUT. Two instead of three LUT's are thus required to implement these $\alpha$ functions.
Now we give some definitions and derive some properties from them. Based on these definitions and properties, our compatible class encoding algorithm is then constructed.
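The LUT counts claimed in Examples 1 and 2 can be checked numerically with a small sketch. Because the Karnaugh map of the examples is not available, the class map `class_of` below is a hypothetical stand-in, chosen only so that the binary encoding needs three LUT's and the better encoding needs two. The helpers `support` and `lut_count` are likewise assumptions, and the greedy pairing is a simplification rather than the merging step used in the paper.

```python
from itertools import product

N = 4  # bound-set variables x1..x4 (0-based indices 0..3 in the code)


def class_of(m):
    """Hypothetical class map with W = {0, 1, 2, 3, 4} (not the paper's map)."""
    x1, x2, x3, x4 = m
    same = (x2 == x4)
    hit = (x1, x3, x4) in {(0, 0, 1), (1, 1, 0)}
    if same:
        return 0 if hit else 2
    if hit:
        return 3
    return 4 if m == (0, 1, 1, 0) else 1


def support(bit_of_class):
    """Indices of the variables that the encoded function really depends on."""
    deps = set()
    for m in product((0, 1), repeat=N):
        for i in range(N):
            flipped = m[:i] + (1 - m[i],) + m[i + 1:]
            if bit_of_class[class_of(m)] != bit_of_class[class_of(flipped)]:
                deps.add(i)
    return deps


def lut_count(codes, k=4):
    """Greedy count of k-LUTs when two functions with at most k-1 inputs each
    and at most k inputs in total may share one two-output LUT."""
    t = len(next(iter(codes.values())))
    sups = [support({c: code[i] for c, code in codes.items()}) for i in range(t)]
    used, luts = set(), 0
    for i in range(t):
        if i in used:
            continue
        used.add(i)
        luts += 1
        for j in range(i + 1, t):
            if (j not in used and len(sups[i]) < k and len(sups[j]) < k
                    and len(sups[i] | sups[j]) <= k):
                used.add(j)  # functions i and j share one two-output LUT
                break
    return luts


# Codes are (alpha1, alpha2, alpha3) per class id.
binary = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 0), 3: (0, 1, 1), 4: (1, 0, 0)}
better = {0: (0, 1, 1), 1: (0, 0, 0), 2: (0, 0, 1), 3: (0, 1, 0), 4: (1, 0, 0)}
print(lut_count(binary), lut_count(better))
```

Under this assumed map, the second encoding makes the third function depend only on $x_2$ and $x_4$ and the second one only on $x_1$, $x_3$, and $x_4$, so the two fit into one two-output 4-LUT.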
Definition 3.1: An independent set with respect to an input variable $x$, denoted as $IS_x$, is defined as a set of class ids such that there exists a binary-input/output function $\alpha$ that is independent of $x$ and whose ON-set is $\alpha_{ON} = \{ m \mid m \text{ is a } \lambda \text{ minterm with } \vec{\alpha}(m) \in IS_x \}$. □
Example 3: In Example 1, $\{0, 2\}$ is an $IS_{x_1}$, because the function $\alpha_3 = x_2 x_4 + \bar{x}_2 \bar{x}_4$ illustrated in Example 2 satisfies Definition 3.1.
Definition 3.2: An $IS_x$ is a minimum independent set with respect to the input variable $x$, denoted as $MIS_x$, if and only if $\forall T$, $T \subset IS_x$ and $T \neq \emptyset$ $\Rightarrow$ $T$ is no longer an $IS_x$. □
Example 4: In Example 1, $\{0, 2\}$ and $\{1, 3, 4\}$ are two $MIS_{x_1}$'s by Definition 3.2.
Property 3.1: Given two $\lambda$ minterms $m$ and $m'$, where $m'$ is the same as $m$ except that it is complemented at the position of variable $x$, let $\vec{\alpha}(m) = k$ and $\vec{\alpha}(m') = k'$. Then $k$ and $k'$ must belong to the same $MIS_x$ by Definition 3.1. □
Property 3.2: Two $MIS_x$'s are either identical or disjoint, and the union of all $MIS_x$'s is $W$. □
From Properties 3.1 and 3.2, we can easily develop an algorithm to find all MIS's for a given variable $x$.
Example 5: The set $Set\text{-}MIS_x$, containing all MIS's with respect to each variable $x$ of Example 1, is shown below.
$Set\text{-}MIS_{x_1} = \{\{0, 2\}, \{1, 3, 4\}\}$
$Set\text{-}MIS_{x_2} = \{\{0, 3\}, \{1, 2, 4\}\}$
$Set\text{-}MIS_{x_3} = \{\{0, 2\}, \{1, 3, 4\}\}$
$Set\text{-}MIS_{x_4} = \{\{0, 1, 2, 3, 4\}\}$
Property 3.3: Each combination of an arbitrary number of $MIS_x$'s represents an $IS_x$. So the size of the set $Set\text{-}IS_x$ containing all $IS_x$'s is equal to $2^{|Set\text{-}MIS_x|}$. □
Thus, $Set\text{-}IS_x$ can be derived from $Set\text{-}MIS_x$ by applying Property 3.3. For example, $Set\text{-}IS_{x_1}$ in Example 1 is $\{\emptyset, \{0, 2\}, \{1, 3, 4\}, \{0, 1, 2, 3, 4\}\}$. Notice that $\emptyset$ and $\{0, 1, 2, 3, 4\}$ ($= W$) are discarded since they are useless for the encoding purpose described later.
At this point, we introduce a term, dichotomy, which was first used for symbolic-input encoding in [14]. The notion of dichotomy is slightly modified here for convenience.
Definition 3.3: A dichotomy with respect to $x$, $D_x$, is given by an ordered pair, denoted as $(l : r)$, where $l$ is an $IS_x$ and $r$ is $W - IS_x$. □
Definition 3.4: Two dichotomies, $D_1 = (l_1 : r_1)$ and $D_2 = (l_2 : r_2)$, are equivalent if ($l_1 = l_2$ and $r_1 = r_2$) or ($l_1 = r_2$ and $r_1 = l_2$). □
From Definitions 3.3 and 3.4, a set of distinct, i.e., nonequivalent, dichotomies with respect to $x$, $Set\text{-}D_x$, can be generated from $Set\text{-}IS_x$. For instance, $Set\text{-}D_{x_1}$ in Example 1 is $\{(02 : 134)\}$. After discarding the two useless $IS_x$'s ($\emptyset$ and $W$) and exploiting the equivalence relation among dichotomies, the size of $Set\text{-}D_x$ is given by Property 3.4.
Property 3.4: $|Set\text{-}D_x| = (2^{|Set\text{-}MIS_x|} - 2) / 2 = 2^{|Set\text{-}MIS_x| - 1} - 1$. □
Property 3.5: A dichotomy $D_x = (l : r)$ can be used to make the encoding function $\alpha_i$ independent of $x$ if the $i$-th bit of the code of the compatible class $c$ is equal to 1 if the id of $c \in l$, and 0 if the id of $c \in r$. □
Example 6: In Example 1, there exists a dichotomy $D = (02 : 134) \in Set\text{-}D_{x_1}$. If $\alpha_3$ is encoded by $D$, [...]
[...] $Set\text{-}D_{x_2}$, respectively, then $\alpha_i$ encoded by either $D_1$ or $D_2$ is independent of both $x_1$ and $x_2$. In this case, the number of fanins of $\alpha_i$ can be further reduced, and the number of interconnection nets required for routing is also reduced. Therefore, a dichotomy which is independent of the most inputs should be chosen for encoding first.
A procedure, Exhaustive-Search-Encoding, is developed to encode the compatible classes under the three constraints described above. It first tries to find the maximum number ($t$) of dichotomies for encoding that satisfies Constraints 1 and 2. If it fails, it reduces the number of encoding dichotomies by 1 and tries again. This procedure is guaranteed to find the optimum encoding solution, i.e., the maximum number of dichotomies that can be used to encode $\alpha$ functions without violating Constraints 1 and 2.
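Properties 3.1 to 3.4 and the exhaustive search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `class_of` is a hypothetical class map constructed so that it reproduces the $Set\text{-}MIS_x$ values listed in Example 5, the function names are invented for the sketch, and only the unique-code requirement is checked during the search because the statement of the other constraints is not available in the text.

```python
from itertools import combinations, product

N = 4         # bound-set variables x1..x4 (0-based indices 0..3)
W = range(5)  # class ids


def class_of(m):
    """Hypothetical class map that reproduces the Set-MIS values of Example 5."""
    x1, x2, x3, x4 = m
    same = (x2 == x4)
    hit = (x1, x3, x4) in {(0, 0, 1), (1, 1, 0)}
    if same:
        return 0 if hit else 2
    if hit:
        return 3
    return 4 if m == (0, 1, 1, 0) else 1


def set_mis(var):
    """Set-MIS for one variable: merge the classes of every minterm pair that
    differs only in that variable (Property 3.1)."""
    parent = {c: c for c in W}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for m in product((0, 1), repeat=N):
        flipped = m[:var] + (1 - m[var],) + m[var + 1:]
        parent[find(class_of(m))] = find(class_of(flipped))
    groups = {}
    for c in W:
        groups.setdefault(find(c), set()).add(c)
    return sorted((frozenset(g) for g in groups.values()), key=min)


def set_d(var):
    """Nonequivalent dichotomies (l, r); l is always the side holding class 0."""
    mis = set_mis(var)
    dichos = []
    for n in range(1, len(mis)):
        for combo in combinations(mis[1:], n):
            r = frozenset().union(*combo)
            dichos.append((frozenset(W) - r, r))
    return dichos


def exhaustive_search(t):
    """Largest set of dichotomies that still allows unique t-bit codes."""
    indep = {}  # dichotomy -> variables it makes the encoded function independent of
    for var in range(N):
        for d in set_d(var):
            indep.setdefault(d, []).append(var)
    # Dichotomies independent of more inputs are tried first.
    cands = sorted(indep, key=lambda d: -len(indep[d]))
    for n in range(min(t, len(cands)), 0, -1):
        for combo in combinations(cands, n):
            blocks = {}
            for c in W:
                key = tuple(c in l for l, _ in combo)
                blocks[key] = blocks.get(key, 0) + 1
            # The t - n remaining bits must be able to split every block.
            if max(blocks.values()) <= 2 ** (t - n):
                return [(d, indep[d]) for d in combo]
    return []


def show(d):
    return ":".join("".join(map(str, sorted(side))) for side in d)


result = exhaustive_search(3)
print([(show(d), vars_) for d, vars_ in result])
```

With this assumed map, the search selects the dichotomies (02:134) and (03:124), the same pair reported in Example 7, and `len(set_d(var))` agrees with the count $2^{|Set\text{-}MIS_x| - 1} - 1$ for every variable.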
Example 7: Reexamining Example 1, we find that two dichotomies, $(02 : 134)$ and $(03 : 124)$, can be found by Exhaustive-Search-Encoding to encode $\alpha_3$ and $\alpha_2$, respectively. $\alpha_1$ is then encoded to satisfy Constraint 1. Therefore, the encoding result is identical to that illustrated in Example 2.
The time efficiency of this procedure is poor. Suppose there are $d$ dichotomies available to encode $t$ bits. Then the worst case, in which none of them can be used for encoding, takes
$C^d_t + C^d_{t-1} + \cdots + C^d_1$
iterations. The number of iterations increases dramatically as $d$ increases. Thus, a more efficient algorithm should be developed for practical use.
Consider an example in which the numbers of compatible classes and dichotomies are 6 and 5, respectively; $t$ is then equal to 3. The search space for finding the optimum encoding solution for this example is shown in Fig. 1. Any path from node S to a node of the third level in the tree represents a possible combination of three dichotomies. Similarly, any path from node S to a node of the second level represents a possible combination of two dichotomies.
[Fig. 1: The search space of finding the maximum number of dichotomies for the encoding.]
In this example, 25 iterations are required to find the optimum solution in the worst case. Suppose that the first dichotomy is $(0 : 12345)$. It is impossible to distinguish classes 1 to 5 no matter what the remaining dichotomies are, because only two more bits are available. Thus, the subtree rooted at the first dichotomy can be pruned without affecting the search for the optimum solution. This pruning algorithm is illustrated in Fig. 2.
[Fig. 2: An example of the pruning strategy on the search space of the dichotomies.]
Hence, the pruning algorithm can be formally described as follows: at any node of the search space of the dichotomies, if the compatible classes cannot all be partitioned into sets whose sizes are no more than $2^b$ by using the current dichotomy, where $b$ is the number of encoding bits that remain available, then the subtree rooted at this node is pruned. This pruning algorithm is also guaranteed to find the optimum encoding solution and executes more efficiently than the procedure Exhaustive-Search-Encoding.
4. Experimental Results
The algorithm described above has been implemented in the SIS environment developed at UC Berkeley [15]. Our algorithm is integrated into a version of Roth-Karp decomposition [11] that has the lambda set selection strategy, to enhance its performance. In order to investigate the quality of our encoding algorithm, denoted as Algorithm 3, two experiments are conducted over a large set of MCNC and ISCAS benchmark circuits to compare its results with those of two other versions of Roth-Karp decomposition. Algorithm 1 has neither the lambda set selection strategy nor the compatible class encoding strategy and is used in mis-pga [3]. Algorithm 2 has only the lambda set selection strategy and is implemented in [11]. The target architecture is the Xilinx XC3000 FPGA, which can implement either one 5-input function or two 4-input functions with 5 inputs in total, as described above.
The initial networks of one experiment are two-level circuits and are obtained by performing the SIS script:
    collapse
    simplify -d -m nocomp
The initial networks of the other experiment are multi-level circuits and are obtained by performing the SIS standard multi-level optimization script. After obtaining the initial networks, the same mapping script:
    xl-k-decomp    /* various algorithms applied here */
    xl-partition -tm
    xl-cover
    xl-merge -1
is used in both experiments. This script first decomposes the network to be 5-feasible; Algorithms 1, 2, and 3 are applied at this step. Thus, each node can be implemented by a 5-LUT. Then it tries to merge as many pairs of nodes as possible into two-output 5-LUT's. Both experiments run on a SUN SPARC 5 workstation, and the results are shown in Table I and Table II, respectively.
Table I shows the results on 16 two-level benchmarks. On average, Algorithm 3 requires 41% and 20% fewer LUT's than Algorithms 1 and 2, respectively. Moreover, Algorithm 3 is also very time-efficient: it requires only 14% and 90% of the CPU time of Algorithms 1 and 2, respectively.
Table II shows the results on 26 multi-level benchmarks. We find that Algorithms 2 and 3 produce almost the same results. On average, they both require 32% fewer LUT's than Algorithm 1, and also take 15-20% less CPU time than Algorithm 1 in this experiment.
These two experiments show that our encoding strategy provides a greater improvement for circuits starting from two-level forms but no significant improvement for circuits starting from multi-level forms. The reason is that the node functions of multi-level circuits optimized by SIS are simpler and have fewer inputs than the node functions of two-level circuits. A similar reason has also been suggested in [10]. Therefore, our encoding strategy does not make much difference for multi-level circuits once a good lambda set has been chosen for decomposing functions.
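The pruning rule of Section 3 can be sketched as a depth-first search over combinations of dichotomies. The five dichotomies below are hypothetical, each represented only by its left part; apart from (0:12345), which is the one named in the text, they were picked for illustration. The function names are also assumptions, and the sketch enforces only the unique-code requirement.

```python
from math import comb


def worst_case_iterations(d, t):
    """C(d, t) + C(d, t-1) + ... + C(d, 1): nodes of the unpruned search tree."""
    return sum(comb(d, i) for i in range(1, t + 1))


def pruned_search(n_classes, dichos, t):
    """Depth-first search over combinations of dichotomies with pruning.

    Each dichotomy is given by its left part, a set of class ids.
    Returns (indices of the largest feasible combination, nodes visited).
    """
    best, visited = [], 0

    def feasible(chosen):
        # Blocks of classes that the chosen dichotomies cannot tell apart.
        blocks = {}
        for c in range(n_classes):
            key = tuple(c in dichos[i] for i in chosen)
            blocks[key] = blocks.get(key, 0) + 1
        # The remaining t - len(chosen) bits must be able to split every block.
        return max(blocks.values()) <= 2 ** (t - len(chosen))

    def dfs(start, chosen):
        nonlocal best, visited
        for i in range(start, len(dichos)):
            visited += 1
            nxt = chosen + [i]
            if not feasible(nxt):
                continue  # prune the whole subtree rooted at this node
            if len(nxt) > len(best):
                best = nxt
            if len(nxt) < t:
                dfs(i + 1, nxt)

    dfs(0, [])
    return best, visited


# Six classes, five hypothetical dichotomies, t = 3 bits.
dichos = [{0}, {0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {0, 1}]
best, visited = pruned_search(6, dichos, 3)
print(worst_case_iterations(5, 3), visited, best)
```

For these assumed inputs the unpruned tree has 25 nodes, matching the count given for the example with six classes and five dichotomies, while the pruned search skips the subtree under (0:12345) and visits fewer nodes.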
5. Concluding Remarks
[...] architecture. We show how to formulate this problem as a symbolic-output encoding problem. Based on this formulation, we also develop an encoding algorithm and integrate it into a version of Roth-Karp decomposition with the lambda set selection strategy. Experimental results show that, for the two-output LUT architecture, our new encoding algorithm efficiently reduces the number of LUT's needed to implement circuits starting from two-level forms. Considering the optimization strategies of current logic synthesis systems, our new encoding technique is useful in both two-level and multi-level logic synthesis systems targeting the two-output LUT-based FPGA.
References
[1] Xilinx Inc., 2100 Logic Drive, San Jose, CA 95124, The Programmable Logic Data Book.
[2] R. Murgai, Y. Nishizaki, N. Shenoy, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Logic Synthesis for Programmable Gate Arrays," in Proc. 27th Design Automation Conf., June 1990, pp. 620-625.
[3] R. Murgai, N. Shenoy, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Improved Logic Synthesis Algorithms for Table Look Up Architectures," in Proc. Int. Conf. Computer-Aided Design, Nov. 1991, pp. 564-567.
[4] R. J. Francis, J. Rose, and K. Chung, "Chortle: A Technology Mapping Program for Lookup Table-Based Field Programmable Gate Arrays," in Proc. 27th Design Automation Conf., June 1990, pp. 613-619.
[5] R. J. Francis, J. Rose, and Z. Vranesic, "Chortle-crf: Fast Technology Mapping for Lookup Table-Based FPGA's," in Proc. 28th Design Automation Conf., June 1991, pp. 227-233.
[6] K. Karplus, "Xmap: A Technology Mapper for Table-Lookup Field Programmable Gate Arrays," in Proc. 28th Design Automation Conf., June 1991, pp. 240-243.
[7] N. Woo, "A Heuristic Method for FPGA Technology Mapping Based on the Edge Visibility," in Proc. 28th Design Automation Conf., June 1991, pp. 248-251.
[8] D. Filo, J. C. Yang, F. Mailhot, and G. D. Micheli, "Technology Mapping for a Two-Output RAM-based Field-Programmable Gate Arrays," in Proc. European Design Automation Conf., Feb. 1991, pp. 534-538.
[9] Y. T. Lai, M. Pedram, and S. B. K. Vrudhula, "BDD Based Decomposition of Logic Functions with Application to FPGA Synthesis," in Proc. 30th Design Automation Conf., June 1993, pp. 642-647.
[10] R. Murgai, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Optimum Functional Decomposition Using Encoding," in Proc. 31st Design Automation Conf., June 1994, pp. 408-414.
[11] W.-Z. Shen, J.-D. Huang, and S.-M. Chao, "Lambda Set Selection in Roth-Karp Decomposition for LUT-Based FPGA Technology Mapping," in Proc. 32nd Design Automation Conf., June 1995, pp. 65-69.
[12] T. Hwang, R. M. Owens, M. J. Irwin, and K. H. Wang, "Logic Synthesis for Field-Programmable Gate Arrays," in IEEE Trans. on Computer-Aided Design, Oct. 1994, pp. 1280-1287.
[13] J. P. Roth and R. M. Karp, "Minimization Over Boolean Graphs," in IBM Journal of Research and Development, April 1962, pp. 227-238.
[14] S. Yang and M. Ciesielski, "Optimum and Suboptimal Algorithm for Input Encoding and Its Relationship to Logic Minimization," in IEEE Trans. on Computer-Aided Design, Jan. 1991, pp. 4-12.
[15] R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. R. Wang, "MIS: A Multi-Level Logic Optimization System," in IEEE Trans. on Computer-Aided Design, Nov. 1987, pp. 1062-1081.
Table II: [The numeric entries of the table are not recoverable from the source.]
Circuits: 5xp1, 9sym, 9symml, alu2, alu4, apex6, apex7, b9, bw, c499, C880, clip, count, des, duke2, e64, f51m, misex1, misex2, misex3, rd73, rd84, rot, sao2, vg2, z4ml.
Algorithm 1: R.-K. decomposition in SIS.
Algorithm 2: R.-K. decomposition with $\lambda$ set selection strategy [11].
Algorithm 3: Algorithm 2 with compatible class encoding strategy.