Parallel CRC Generator
Abstract
This paper presents a theoretical result in the context of realizing high speed hardware for parallel CRC
checksums. Starting from the serial implementation widely reported in literature, we have identified a recursive
formula from which our parallel implementation is derived. In comparison with previous works, the new scheme is
faster and more compact and is independent of the technology used in its realization. In our solution, the number
of bits processed in parallel can be different from the degree of the polynomial generator. Lastly, we have also
developed high level parametric codes that are capable of generating the circuits autonomously, when only the
polynomial is given.
Index Terms
parallel CRC, LFSR, error-detection, VLSI, FPGA, VHDL, digital logic
I. INTRODUCTION
Cyclic Redundancy Check (CRC) [1]-[5] is widely used in data communications and storage devices as
a powerful method for dealing with data errors. It is also applied to many other fields such as the testing of
integrated circuits and the detection of logical faults [6]. One of the more established hardware solutions
for CRC calculation is the Linear Feedback Shift Register (LFSR), consisting of a few flip-flops (FFs) and
logic gates. This simple architecture processes bits serially. In some situations, such as high-speed data
communications, the speed of this serial implementation is inadequate. In these cases, a parallel
computation of the CRC, where successive units of $w$ bits are handled simultaneously, is
desirable.
Giuseppe Campobello is with the Department of Physics, University of Messina, Contrada Papardo, Salita Sperone 31, 98166 Messina,
ITALY and INFN Section of Catania, 64, Via S.Sofia, I-95123 Catania, ITALY; e-mail: gcampo@ai.unime.it; Tel: +39 (0)90 6765231
Giuseppe Patanè is with the Department of Physics, University of Messina, Contrada Papardo, Salita Sperone 31, 98166 Messina, ITALY
and INFN Section of Catania, 64, Via S.Sofia, I-95123 Catania, ITALY; e-mail: gpatane@ai.unime.it; Tel: +39 (0)90 6765231
Marco Russo (corresponding author) is with the Department of Physics, University of Catania, 64, Via S.Sofia, I-95123 Catania, ITALY
and INFN Section of Catania, 64, Via S.Sofia, I-95123 Catania, ITALY; e-mail: marco.russo@ct.infn.it
Like any other combinatorial circuit, parallel CRC hardware could be synthesized with only two levels of
gates. This is defined by laws governing digital logic. Unfortunately, this implies a huge number of gates.
Furthermore, the minimization of the number of gates is an NP-hard optimization problem. Therefore,
when complex circuits must be realized, one generally uses heuristics or seeks customized solutions.
This paper presents a customized, elegant, and concise formal solution for building parallel CRC
hardware. The new scheme generalizes and improves previous works. By making use of some mathematical
principles, we will derive a recursive formula that can be used to deduce the parallel CRC circuits.
Furthermore, we will show how to apply this formula and to generate the CRC circuits automatically. As
in modern synthesis tools, where it is possible to specify the number of inputs of an adder and automatically
generate necessary logic, we developed the necessary parametric codes to perform the same tasks with
parallel CRC circuits. The compact representation proposed in the new scheme provides the possibility of
saving hardware significantly and reaching higher frequencies in comparison to previous works. Finally,
in our solution, the degree of the polynomial generator, $m$, and the number of bits processed in
parallel, $w$, can be different.
The article is structured as follows: Sect. II illustrates the key elements of CRC. In Sect. III we
summarize previous works on parallel CRCs to provide appropriate background. In Sect. IV we derive
our logic equations and present the parallel circuit. In addition, we illustrate the performance by some
examples. Finally, in Sect. V we evaluate our results by comparing them with those presented in previous
works. The codes implemented are included in the appendix.
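II. CYCLIC REDUNDANCY CHECK
Fig. 1. The original sequence to transmit, $S_1 = \{b_0, b_1, \ldots, b_{k-1}\}$, and the FCS, $S_2 = \{b'_0, b'_1, \ldots, b'_{m-1}\}$, form the sequence with redundancy for error detecting, $S$. The divisor sequence is $P = \{p_0, p_1, \ldots, p_m\}$. Division with no remainder gives $S = P \cdot Q$, where $Q$ is the quotient.
A transmitter, T, sends a sequence $S_1$ of $k$ bits, $\{b_0, b_1, \ldots, b_{k-1}\}$, to a receiver, R. Together with it,
T generates a second sequence $S_2$ of $m$ bits, $\{b'_0, b'_1, \ldots, b'_{m-1}\}$, which allows R to detect
possible errors. The sequence $S_2$ is commonly
known as a Frame Check Sequence (FCS). It is generated by taking into account the fact that the
complete sequence, $S$, obtained by concatenating $S_1$ and $S_2$, is exactly divisible (following a particular
arithmetic) by some predetermined sequence $P$, $\{p_0, p_1, \ldots, p_m\}$, of $m+1$ bits. T sends $S$
to R. R divides $S$ (i.e. the message and the FCS) by $P$, using the same particular
arithmetic, after it receives the message. If there is no remainder, R assumes there was no error. Fig. 1
illustrates how this mechanism works.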
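The mechanism can be illustrated with a minimal Python sketch of modulo-2 long division. The function names (`mod2_remainder`, `make_fcs`) and the example bit values are illustrative assumptions, not taken from the paper. Sequences are lists of bits with the first transmitted (most significant) bit first, so the divisor is written here as $p_m \ldots p_0$.

```python
def mod2_remainder(bits, divisor):
    """Remainder of bits / divisor under modulo-2 (XOR) arithmetic.

    Both arguments are lists of 0/1, most significant bit first.
    The remainder has len(divisor) - 1 bits.
    """
    m = len(divisor) - 1
    work = list(bits)
    for i in range(len(work) - m):
        if work[i]:  # leading bit set: subtract (XOR) the aligned divisor
            for j, p in enumerate(divisor):
                work[i + j] ^= p
    return work[-m:]


def make_fcs(s1, divisor):
    """FCS S2 such that S1 followed by S2 is divisible by the divisor."""
    m = len(divisor) - 1
    return mod2_remainder(s1 + [0] * m, divisor)


# Hypothetical example values (assumed for illustration only).
P = [1, 1, 0, 0, 1]             # x^4 + x^3 + 1, written p_4 ... p_0
S1 = [1, 0, 1, 1, 0, 0, 1, 0]   # k = 8 message bits
S2 = make_fcs(S1, P)            # m = 4 FCS bits
S = S1 + S2                     # sequence actually transmitted

print("FCS:", S2)                                      # [1, 1, 0, 1]
print("remainder at receiver:", mod2_remainder(S, P))  # [0, 0, 0, 0]
```

Flipping any single bit of `S` before calling `mod2_remainder` yields a non-zero remainder, which is how the receiver detects the error.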
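A modulo 2 arithmetic is used in the digital realization of the above concepts [3]: the product operator
is accomplished by a bitwise AND, whereas both the sum and subtraction are accomplished by bitwise
XOR operators. In this case, a CRC circuit (modulo 2 divisor) can be easily realized as a special shift
register, called LFSR. Fig. 2 shows a typical architecture. It can be used by both the transmitter and the
receiver. In the case of the transmitter, the dividend is the sequence $S_1$ followed by $m$ zeros, and the
divisor is $P$. In the case of the receiver, the dividend is the received sequence and the divisor is the same $P$.
In Fig. 2 the $m$ FFs share common clock and clear signals. The input $x'_i$ of the $i$th FF is obtained by
taking a XOR of the $(i-1)$th FF output and a term given by the logical AND between $p_i$
and $x_{m-1}$. The signal $x'_0$ is obtained by taking a XOR of the input $d$ and $x_{m-1}$. If $p_i$ is zero, only a shift
operation is performed (i.e. the XOR related to $x'_i$ is not required); otherwise the feedback $x_{m-1}$ is XOR-ed
with $x_{i-1}$. We point out that the AND gates in Fig. 2 are unnecessary if the divisor $P$
is time-invariant.
The sequence $S_1$ is sent serially to the input $d$ of the circuit, starting from the most significant bit, $b_0$.
The process begins by clearing all FFs. Then the $k$ bits of $S_1$ are sent, one per clock cycle. Finally, $m$ zero
bits are sent through $d$. In the end, the FCS appears at the output end of the FFs.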
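The serial circuit can be mimicked in software with a short Python sketch. It assumes the update rule $x'_0 = d \oplus (p_0 \cdot x_{m-1})$ and $x'_i = x_{i-1} \oplus (p_i \cdot x_{m-1})$, and it takes the divisor as the list $[p_0, p_1, \ldots, p_m]$; the function name `lfsr_serial` and the example values are assumptions made for illustration.

```python
def lfsr_serial(bits, p):
    """Serial LFSR, one input bit per clock.

    bits: input sequence, first transmitted bit first.
    p:    divisor coefficients [p_0, p_1, ..., p_m] with p_m = 1.
    Returns the FF outputs [x_{m-1}, ..., x_0] after the last clock.
    """
    m = len(p) - 1
    x = [0] * m                       # x[i] holds x_i; all FFs cleared
    for d in bits:
        fb = x[m - 1]                 # feedback from the last FF
        nxt = [0] * m
        nxt[0] = d ^ (p[0] & fb)      # input bit enters the first FF
        for i in range(1, m):
            nxt[i] = x[i - 1] ^ (p[i] & fb)   # shift, XOR feedback if p_i = 1
        x = nxt
    return x[::-1]


# Hypothetical example values (assumed for illustration only).
P = [1, 0, 0, 1, 1]                   # p_0 ... p_4, i.e. x^4 + x^3 + 1
S1 = [1, 0, 1, 1, 0, 0, 1, 0]

fcs = lfsr_serial(S1 + [0] * 4, P)    # message followed by m zeros
print("FCS:", fcs)                                  # [1, 1, 0, 1]
print("receiver check:", lfsr_serial(S1 + fcs, P))  # [0, 0, 0, 0]
```

The loop needs $k + m$ iterations, one per clock, which is the cost that the parallel schemes discussed in this paper reduce.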
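Another possible implementation of the CRC circuit [7] is shown in Fig. 3. In this paper we will call it
LFSR2. In this circuit, the outputs of FFs (after $k$ clock periods) are the same FCS computed by LFSR.
Because no sequence of $m$ zeros has to be sent through $d$,
LFSR2 computes FCS faster than LFSR. In practice, the message length is usually much greater than
$m$; the two circuits therefore have similar performance.
III. RELATED WORKS
When the message is processed in blocks of $w$ bits each, it is
possible to reach a speed-up of up to $w$ over the serial implementation. Here we
report the main works in literature. Later, in Section V we compare our results with those presented in
literature.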
As stated by Albertengo and Sisto [7] in 1990, previous works [8]-[10] dealt empirically with the
problem of parallel generation of CRCs. Furthermore, the validity of the results is in any case
restricted to a particular generator polynomial. Albertengo and Sisto [7] proposed an interesting
analytical approach. Their idea was to apply the digital filter theory to the classical CRC circuit.
They derived a method for determining the logic equations for any generator polynomial. Their
formalization is based on a $z$-transform. To obtain logic equations, many polynomial divisions are
needed. Thus, it is not possible to write a synthesizable VHDL code that automatically generates the
equations for parallel CRCs. The theory they developed is restricted to cases where the number of
bits processed in parallel is equal to the polynomial degree ($w = m$).
Fig. 2. LFSR architecture.
In 1996, Braun et al. [11] presented an approach suitable for FPGA implementation. A very complex
analytical proof is presented. They developed a special heuristic logic minimization to compute CRC
checksums on FPGA in parallel. Their main results are a precise formalism and a set of proofs to
derive the parallel CRC computation starting from the bit-serial case. Their work is similar to our
work but their proofs are more complex.
Fig. 3. LFSR2 architecture.
Later, McCluskey [12] developed a high speed VHDL implementation of CRC suitable for FPGA
implementation. He also proposed a parametric VHDL code accepting the polynomial and the input
data width, both of arbitrary length. This work is similar to ours, but the final circuit has a worse
performance.
In 2001 Sprachmann [13] implemented parallel CRC circuits of LFSR2. He proposed interesting
VHDL parametric codes. The derivation is valid for any polynomial and data-width
$w$, but the equations are not optimized (see Section V).
In the same year, Shieh et al. [14] proposed another approach based on the theory of the Galois field.
The theory they developed is quite general, like that presented in our paper (i.e. $w$ may differ
from $m$). However, their hardware implementation is strongly based on lookahead techniques [15], [16];
thus their final circuits require more area and elaboration time. The possibility to use several smaller
look-up tables (LUTs) is also shown, but the critical path of the final circuits grows substantially.
Their derivation method is similar to ours but, as in [13], equations are not optimized (see Section
V).
IV. PARALLEL CRC COMPUTATION
Starting from the circuit represented in Fig. 2 we have developed our parallel implementation of the
CRC. In the following, we assume that the degree of polynomial generator ($m$) and the length of the
message to be processed ($k$) are both multiples of the number of bits to be processed in parallel ($w$). This
is typical in data transmission where a message consists of many bytes and the polynomial generator, like the
desired parallelism, consists of a few nibbles.
In the final circuit that we will obtain, the sequence $S_1$, followed by $m$ zeros, is sent to the circuit in blocks
of $w$ bits each.
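From linear systems theory [17] we know that a discrete-time, time-invariant linear system can be
expressed as follows:
$$\begin{cases} X(i+1) = F\,X(i) + G\,U(i) \\ Y(i) = H\,X(i) + J\,U(i) \end{cases} \qquad (1)$$
where $X$ is the state of the system, $U$ the input and $Y$ the output. We use $F$, $G$, $H$, $J$ to denote matrices,
and use $X$, $Y$, and $U$ to denote column vectors. The solution of the first equation of system (1) is:
$$X(i) = F^i X(0) + [F^{i-1}G \;\cdots\; FG \;\; G]\,[U(0) \;\cdots\; U(i-1)]^T \qquad (2)$$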
We can apply eq. (2) to the LFSR circuit (Fig. 2). In fact, if we use $\oplus$ to denote the XOR operation,
the symbol $\cdot$ to denote bitwise AND, and $B$ to denote the set $\{0, 1\}$, it is easy to demonstrate that the
structure $\{B, \oplus, \cdot\}$ is a ring with identity (Galois Field GF(2) [18]). From this consideration the solution
of the system (1) (expressed by (2)) is valid even if we replace multiplication and addition with the AND
and XOR operators respectively. In order to point out that the XOR and AND operators must be also
used in the product of matrices, we will denote their product by $\otimes$.
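Let us consider the circuit shown in Fig. 2. It is just a discrete-time, time-invariant linear system for
which: the input $U(i)$ is the $i$-th bit of the input sequence; the state $X$ represents the FF outputs; and the
vector $Y$ coincides with $X$, i.e. $H$ and $J$ are the identity and zero matrices respectively. Matrices $F$ and $G$
follow from the equations of the serial LFSR:
$$X = [x_{m-1} \cdots x_1\; x_0]^T, \quad U = d, \quad G = [0 \cdots 0\; 1]^T, \quad F = \begin{bmatrix} p_{m-1} & 1 & 0 & \cdots & 0 \\ p_{m-2} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_1 & 0 & 0 & \cdots & 1 \\ p_0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$
where $p_i$ are the bits of the divisor $P$ (i.e. the coefficients of the generator polynomial).
When $i$ coincides with $w$, the solution derived from eq. (2) with substitution of the operators, is:
$$X(w) = F^w \otimes X(0) \oplus [0 \cdots 0\; d(0) \cdots d(w-1)]^T \qquad (3)$$
where $X(0)$ is the initial state of the FFs. Considering that the system is time-invariant, we obtain a
recursive formula:
$$X' = F^w \otimes X \oplus D \qquad (4)$$
where, for clarity, we have indicated with $X'$ and $X$, respectively the next state and the present state of the
system, and $D = [0 \cdots 0\; d_{w-1} \cdots d_0]^T$ assumes the values $[0 \cdots 0\; b_0 \cdots b_{w-1}]^T$, $[0 \cdots 0\; b_w \cdots b_{2w-1}]^T$,
etc., where $b_i$ are the bits of the sequence $S_1$ followed by a sequence of $m$ zeros. After $(k+m)/w$ clock periods,
$X$ is the desired FCS.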
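The block-wise recursion of eq. (4) can be checked numerically with a small Python sketch. It assumes the state is stored as $[x_{m-1}, \ldots, x_0]$, that each block of $w$ input bits fills the last $w$ positions of $D$ (earliest bit highest), and that $F^w$ is obtained here by plain repeated multiplication over GF(2). All function names and the example values are illustrative assumptions.

```python
def make_F(p):
    """Matrix F of the serial LFSR; rows and columns ordered x_{m-1} ... x_0.

    p: divisor coefficients [p_0, ..., p_m].
    """
    m = len(p) - 1
    F = [[0] * m for _ in range(m)]
    for r in range(m):
        F[r][0] = p[m - 1 - r]        # first column: p_{m-1} ... p_0
        if r + 1 < m:
            F[r][r + 1] = 1           # shifted identity
    return F


def mat_mul(A, B):
    """Matrix product over GF(2): AND for products, XOR for sums."""
    n = len(A)
    return [[sum(A[i][t] & B[t][j] for t in range(n)) & 1
             for j in range(n)] for i in range(n)]


def mat_vec(A, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & v for a, v in zip(row, x)) & 1 for row in A]


def crc_parallel(bits, p, w):
    """Apply X' = F^w (x) X (+) D once per block of w bits (w <= m)."""
    m = len(p) - 1
    F = make_F(p)
    Fw = [[int(i == j) for j in range(m)] for i in range(m)]
    for _ in range(w):
        Fw = mat_mul(Fw, F)
    x = [0] * m                                    # FFs cleared
    for start in range(0, len(bits), w):
        D = [0] * (m - w) + bits[start:start + w]  # block in the last w rows
        x = [a ^ b for a, b in zip(mat_vec(Fw, x), D)]
    return x


# Hypothetical example values (assumed for illustration only).
P = [1, 0, 0, 1, 1]                   # p_0 ... p_4, i.e. x^4 + x^3 + 1
S1 = [1, 0, 1, 1, 0, 0, 1, 0]
for w in (1, 2, 4):
    print(w, crc_parallel(S1 + [0] * 4, P, w))     # [1, 1, 0, 1] each time
```

With $w = 1$ the loop degenerates into the serial LFSR; larger $w$ gives the same FCS in $(k+m)/w$ iterations.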
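Now, it is important to evaluate the matrix $F^w$. It can be constructed recursively, when $i$
ranges from 2 to $w$:
$$F^i = \left[\, F^{i-1} \otimes [p_{m-1} \cdots p_1\; p_0]^T \;\middle|\; \text{the first } m-1 \text{ columns of } F^{i-1} \,\right] \qquad (5)$$
This formula permits an efficient VHDL code to be written as we will show later.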
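A minimal Python sketch of the recursive construction in eq. (5) follows. It assumes the matrix is held as a list of columns, so that each step only computes one new column and shifts the others; the names `f_power` and `rows_to_hex` are hypothetical helpers introduced for illustration. The hexadecimal form encodes each row with its first bit as the most significant.

```python
def f_power(p, w):
    """F^w built column by column following the recursion of eq. (5).

    p: divisor coefficients [p_0, ..., p_m]. Returns the matrix as rows.
    """
    m = len(p) - 1
    P = [p[m - 1 - r] for r in range(m)]           # [p_{m-1}, ..., p_0]
    # F itself: first column P, then the first m-1 columns of the identity.
    cols = [P] + [[int(r == c) for r in range(m)] for c in range(m - 1)]
    for _ in range(2, w + 1):
        first = [0] * m
        for c in range(m):                         # first = F^{i-1} (x) P
            if P[c]:
                first = [a ^ b for a, b in zip(first, cols[c])]
        cols = [first] + cols[:m - 1]              # shift the old columns right
    return [[cols[c][r] for c in range(m)] for r in range(m)]


def rows_to_hex(rows):
    """Hexadecimal representation of each row, first bit most significant."""
    digits = (len(rows) + 3) // 4
    return [format(int("".join(map(str, row)), 2), "0{}X".format(digits))
            for row in rows]


print(rows_to_hex(f_power([1, 0, 0, 1, 1], 4)))    # ['7', 'C', 'E', 'F']

crc16 = [1, 0, 1] + [0] * 12 + [1, 1]              # x^16 + x^15 + x^2 + 1
print(rows_to_hex(f_power(crc16, 16)))
```

For the divisor $\{1, 0, 0, 1, 1\}$ the result is the vector 7 C E F; for CRC-16 the first and last rows come out as DFFF and BFFF.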
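From eq. (5) we can obtain $F^w$ when $F^m$ is already available. If we indicate with $P'$
the vector $[p_{m-1} \cdots p_1\; p_0]^T$,
we have:
$$F^w = \left[\begin{array}{cccc|c} F^{w-1} \otimes P' & \cdots & F \otimes P' & P' & \begin{matrix} I_{m-w} \\ 0 \end{matrix} \end{array}\right]$$
where $I_{m-w}$ is the identity matrix of order $m-w$,
when $w \le m$. Furthermore, we have:
$$F^m = [\,F^{m-1} \otimes P' \;\cdots\; F \otimes P' \;\; P'\,] \qquad (6)$$
So, the first $w$ columns of $F^w$ are the last $w$ columns of $F^m$. The upper right part of $F^w$
is $I_{m-w}$ and the lower right part must be filled with zeros.
For example, if $P = \{1, 0, 0, 1, 1\}$, then $m = 4$ and
$$F = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \qquad F^4 = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \qquad (7)$$
Finally, if we use $x'_i$, $x_i$, and $d_i$ to denote the elements of $X'$, $X$, and $D$, eq. (4) with $w = m = 4$ gives:
$$\begin{aligned} x'_3 &= x_2 \oplus x_1 \oplus x_0 \oplus d_3 \\ x'_2 &= x_3 \oplus x_2 \oplus d_2 \\ x'_1 &= x_3 \oplus x_2 \oplus x_1 \oplus d_1 \\ x'_0 &= x_3 \oplus x_2 \oplus x_1 \oplus x_0 \oplus d_0 \end{aligned}$$
As indicated above, having $F^4$ available, a power of $F$ of lower order is immediately obtained. So, for
example:
$$F^2 = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{bmatrix}$$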
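The same procedure may be applied to derive equations for parallel version of LFSR2. In this case the
matrix $G$ is $G = [p_{m-1} \cdots p_1\; p_0]^T$ and eq. (4) becomes:
$$X' = F^w \otimes (X \oplus D) \qquad (8)$$
where $D$ holds the $w$ input bits in its first $w$ positions, earliest bit first, followed by $m-w$ zeros.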
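A short Python sketch can illustrate the LFSR2 variant. It assumes the update $X' = F^w \otimes (X \oplus D)$ with the $w$ input bits placed in the first $w$ positions of $D$ (earliest bit first), which matches the input placement used for LFSR2 in the VHDL listing of the appendix. The function names and example values are hypothetical.

```python
def f_power(p, w):
    """F^w over GF(2), built with the column recursion; returned as rows."""
    m = len(p) - 1
    P = [p[m - 1 - r] for r in range(m)]           # [p_{m-1}, ..., p_0]
    cols = [P] + [[int(r == c) for r in range(m)] for c in range(m - 1)]
    for _ in range(2, w + 1):
        first = [0] * m
        for c in range(m):
            if P[c]:
                first = [a ^ b for a, b in zip(first, cols[c])]
        cols = [first] + cols[:m - 1]
    return [[cols[c][r] for c in range(m)] for r in range(m)]


def crc_lfsr2_parallel(bits, p, w):
    """Parallel LFSR2: X' = F^w (x) (X (+) D), no trailing zeros needed."""
    m = len(p) - 1
    Fw = f_power(p, w)
    x = [0] * m                                    # [x_{m-1}, ..., x_0]
    for start in range(0, len(bits), w):
        D = bits[start:start + w] + [0] * (m - w)  # block in the first w rows
        y = [a ^ b for a, b in zip(x, D)]          # the extra XOR level
        x = [sum(f & v for f, v in zip(row, y)) & 1 for row in Fw]
    return x


# Hypothetical example values (assumed for illustration only).
P = [1, 0, 0, 1, 1]                   # p_0 ... p_4, i.e. x^4 + x^3 + 1
S1 = [1, 0, 1, 1, 0, 0, 1, 0]
for w in (1, 2, 4):
    print(w, crc_lfsr2_parallel(S1, P, w))         # [1, 1, 0, 1] each time
```

The FCS is available after $k/w$ iterations, without appending $m$ zeros; the price is the XOR between state and input that precedes the matrix product.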
A. Hardware realization
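A parallel implementation of the CRC can be derived from the above considerations. Yet again, it
consists of a special register. In this case the inputs of the FFs are the exclusive sum of some FF outputs
and inputs. Fig. 4 shows a possible implementation. The signals $e_{i,j}$ (enables) are taken from $F^w$; more
precisely, each $e_{i,j}$ is an element of $F^w$ and selects whether the corresponding FF output contributes to a
given FF input. If the divisor $P$
is fixed, then the AND gates are unnecessary. Furthermore, the number of FFs remains unchanged. If $w < m$,
then inputs $d_{m-1}, \ldots, d_w$ are not needed. For LFSR2, from
eq. (8) we have a circuit very similar to that in Fig. 4 where inputs $d_i$ are XOR-ed with the FF outputs before the enables are applied.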
Fig. 4. Parallel CRC circuit: parallel sequence inputs $d_0, \ldots, d_{m-1}$, enables $e_{i,j}$, common clock and clear signals, and FF outputs $x_0, \ldots, x_{m-1}$ (FCS).
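In the appendix we report the parametric VHDL code describing the circuit, together with a
MATLAB code that is able to generate a VHDL code. The generated code contains the logic equations of the desired
CRC directly; thus it is synthesized much faster than the parametric VHDL code.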
B. Examples
Here, our results, applied to four commonly used CRC polynomial generators, are reported. As we stated
in the previous paragraph, $F^w$ may be derived from $F^m$, so we report only the $F^m$ matrix. In order to
save space we use a
vector in which each element is the hexadecimal representation of the binary sequence obtained from the
corresponding row of $F^m$, where the first bit is the most significant. For the example reported above we
have $F^4 = [7\; C\; E\; F]^T$.
CRC-12: $P = \{1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1\}$
$F^{12}_{\text{CRC-12}} = [\text{CFF 280 140 0A0 050 028 814 40A 205 DFD A01 9FF}]^T$
CRC-16: $P = \{1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1\}$
$F^{16}_{\text{CRC-16}} = [\text{DFFF 3000 1800 0C00 0600 0300 0180 00C0 0060 0030 0018 000C 8006 4003 7FFE BFFF}]^T$
CRC-CCITT: $P = \{1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1\}$
$F^{16}_{\text{CRC-CCITT}} = [\text{0C88 0644 0322 8191 CC40 6620 B310 D988 ECC4 7662 3B31 9110 C888 6444 3222 1911}]^T$
CRC-32: $P = \{1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1\}$
CRC-32
FB808B20 58374486 AC1BA243 AD8D5A01 AD462620 56A31310 2B518988 95A8C4C4
CAD46262 656A3131 493593B8 249AC9DC 924D64EE C926B277 9F13D21B B409622D
21843A36 90C21D1B 33E185AD 627049F6 313824FB E31C995D
V. COMPARISONS
Albertengo and Sisto [7] based their formalization on the $z$-transform. In their approach, many polynomial divisions are required to obtain logic equations. This implies that it is not possible to write
synthesizable VHDL codes that automatically generate CRC circuits. Their work is based on the
LFSR2 circuit. As we have already shown in Sect. IV, our theory can be applied to the same circuit.
However eq. (8) shows that, generally speaking, one more level of XOR is required with respect
to the parallel LFSR circuit we propose. This implies that our proposal is, generally, faster. Further
considerations can be made if FPGA is chosen as the target technology. Large FPGAs are, generally,
based on look-up-tables (LUTs). A LUT is a little SRAM which usually has more than two inputs
(typically four or more). In the case of high speed LFSR2 there are many two-input XOR gates (see
[7] page 68). This implies that, if the CRC circuit is realized using FPGAs, many LUTs are not
completely utilized. This phenomenon is less critical in the case of LFSR. As a consequence, parallel
LFSR realizations are cheaper than LFSR2 ones. In order to give some numerical results to confirm
our considerations, we have synthesized the CRC32 in both cases. With LFSR we needed 162 LUTs
to obtain a critical path of 7.3 ns, whereas, for LFSR2, 182 LUTs and 10.8 ns are required.
The main difference between our work and that of Braun et al. [11] is the dimension of the matrices to
be dealt with and the complexity of the proofs. Our results are simpler, i.e., we work with smaller
matrices and our proofs are not as complex as those presented in [11].
McCluskey [12] developed a high speed VHDL implementation of CRC suitable for FPGA implementation. The results he obtained are similar to ours (i.e. he started from the LFSR circuit and
derived an empirical recursive equation). But he deals with matrices a little greater than the ones
we use. Even in this case only XOR are used in the final representation of the formula. Accurate
results are reported in the paper dealing with two different possible FPGA solutions. One of them
offers us the possibility of comparing our results with those presented by McCluskey. The solution
suitable for our purpose is related to the use of the ORCA 3T30-6 FPGA. This kind of FPGA uses a
technology of 0.3-0.35 µm containing 196 Programmable Function Units (PFUs) and 2436 FFs. Each
PFU has 8 LUTs each with 4 inputs and 10 FFs. One LUT introduces a delay of 0.9 ns. There is a
flexible input structure inside the PFUs and a total of 21 inputs per PFU. The PFU structure permits
the implementation of a 21-input XOR in a single PFU. We have at our disposal the MAX+PLUSII
ver. 10.0 software (available at www.altera.com) to synthesize VHDL using ALTERA FPGAs. We have synthesized a 21-input
XOR and have found that 1 Logic Array Block (LAB) is required, for synthesizing both small
and fast circuits. In short, each LAB contains some FFs and 8 LUTs each with 4 inputs. We have
synthesized our results regarding the CRC-32. The technology target was the FLEX-10KATC144-1.
It is smaller than ORCA3T30. The process technology is 0.35 µm and the typical LUT delay is
0.9 ns. When $w = m = 32$, the synthesized circuit needed 162 logic cells (i.e., a total of 162 LUTs)
with a maximum operating frequency of 137 MHz. McCluskey, in this case, required 36 PFUs (i.e.,
a total of 288 LUTs) with a speed of 105.8 MHz. So, our final circuit requires less area (162 LUTs against 288)
and has greater speed (137 MHz against 105.8 MHz). The better results obtained with our approach are due to the fact
that matrices are different (i.e. different starting equations) and, what is more, McCluskey's matrix
is larger.
We have compiled VHDL codes reported in [13] using our Altera tool. The implementation of
our parallel circuit usually requires less area (70-90%) and has higher speed (a speedup of 4 is
achieved). For details see Table I. There are two main motives that explain these results: the former
is the same mentioned at the beginning of this section regarding the differences between LFSR and
LFSR2 FPGA implementation. The latter is that our starting equations are optimized. More precisely,
in our equation of $x'_i$, term $x_j$
appears only once or not at all, while, in the starting equation of [13], $x_j$
may appear more (up to $w$ times), as it is possible to observe in Fig. 4 in [13]. Optimizations like
$x_j \oplus x_j = 0$ and $x_j \oplus x_j \oplus x_j = x_j$ must be processed by a synthesis tool. When $w$ grows, many
expressions of this kind are present in the final equations. Even if very powerful VHDL synthesis
tools are used, it is not sure that they are able to find the most compact logical form. Even when
they are able to, more synthesis time is necessary with respect to our final VHDL code.
TABLE I
PERFORMANCE EVALUATION FOR A VARIETY OF PARALLEL CRC CIRCUITS. LUTs IS THE NUMBER OF LOOK-UP TABLES USED IN THE FPGA; CPD IS THE CRITICAL PATH DELAY; THE LAST COLUMN IS THE SYNTHESIS TIME.
Polynomial         LUTs   CPD       Synthesis time
CRC12 [our]        21     6 ns      20,7
CRC12 [13]         27     24.2 ns
CRC16 [our]        28     7.2 ns    100,8
CRC16 [13]         31     28.1 ns
CRC-CCITT [our]    39     6.9 ns    113,8
CRC-CCITT [13]     30     10.2 ns   10
CRC32 [our]        162    7.3 ns    long,15
CRC32 [13]         220    30.5 ns   360
In [14] a detailed performance evaluation of the CRC-32 circuit is reported. For comparison purposes
we take results from [14] when $w = m = 32$: a total
of 448 2-input XORs and a critical path of 15 levels of XORs. After Synopsys optimization they
obtain 408 2-input XORs and 7 levels of gates. We evaluated the number of required 2-input XORs
starting from the matrix $F^{32}$, counting the ones and supposing the realization of XORs with more
than two inputs with binary tree architectures. So, for example, to realize an 8-input XOR, 3 levels
of 2-input XORs are required with a total of 7 gates. This approach gives 452 2-input XORs and
only 5 levels of gates before optimization. This implies that our approach produces faster circuits,
but the circuits are a little bit larger. However, with some simple manual tricks it is possible to obtain
hardware savings. For example, identifying common XOR sub-expressions and realizing them only
once, the number of required gates decreases to 370. With other smart tricks it is possible to obtain
more compact circuits. We do not have at our disposal the Synopsys tool, so we do not know which
is the automatic optimization achievable starting from the 452 initial XORs.
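The counting rule described above (count the ones, then realize each multi-input XOR as a binary tree of 2-input gates) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `xor_cost` is hypothetical, each of the last $w$ rows is assumed to receive one extra data input as in eq. (4), and no sharing of common sub-expressions is modelled, so the figures it produces need not coincide with the ones reported in the text.

```python
def xor_cost(rows, w):
    """Estimate 2-input XOR gates and XOR levels for X' = F^w (x) X (+) D.

    rows: matrix F^w as a list of 0/1 rows, ordered x_{m-1} ... x_0.
    w:    number of data inputs, assumed to enter the last w rows.
    An n-input XOR is assumed to be a balanced binary tree: n - 1 gates
    and ceil(log2(n)) levels.
    """
    m = len(rows)
    gates = 0
    levels = 0
    for r, row in enumerate(rows):
        fan_in = sum(row) + (1 if r >= m - w else 0)
        if fan_in > 1:
            gates += fan_in - 1
            levels = max(levels, (fan_in - 1).bit_length())
    return gates, levels


# F^4 for the divisor {1, 0, 0, 1, 1}, i.e. the hexadecimal rows 7, C, E, F.
F4 = [[0, 1, 1, 1],
      [1, 1, 0, 0],
      [1, 1, 1, 0],
      [1, 1, 1, 1]]
print(xor_cost(F4, 4))          # (12, 3)
print(xor_cost([[1] * 8], 0))   # (7, 3): a single 8-input XOR
```

The second call reproduces the 8-input example quoted in the text: 7 gates arranged in 3 levels.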
VI. ACKNOWLEDGMENTS
The authors wish to thank Chao Yang of the ILOG, Inc. for his useful suggestions during the revision
process. The authors also wish to thank the anonymous reviewers for their work, a valuable aid that
contributed to the improvement of the quality of the paper.
APPENDIX
A. VHDL code
In Figs. 5 and 6 we present two VHDL listings, named respectively crcpack.vhd and crcgen.vhd. They
are the codes we developed describing our parallel realization of both LFSR and LFSR2 circuits. It is
necessary to assign to the constant CRC the divisor $P$, to CRCDIM its degree $m$, and to DATA_WIDTH
the number of bits processed in parallel, $w$.

    library ieee;
    use ieee.std_logic_1164.all;

    package crcpack is
      -- Divisor P for CRC-16, written p_16 downto p_0
      constant CRC16 : std_logic_vector(16 downto 0) :=
        "11000000000000101";
      constant CRCDIM : integer := 16;
      constant CRC : std_logic_vector(CRCDIM downto 0) :=
        CRC16;
      constant DATA_WIDTH : integer range 1 to CRCDIM := 16;
      type matrix is array (CRCDIM-1 downto 0) of
        std_logic_vector(CRCDIM-1 downto 0);
    end crcpack;

Fig. 5. VHDL listing crcpack.vhd.
    library ieee;
    use ieee.std_logic_1164.all; use work.crcpack.all;
    entity crcgen is
      port (res, clk : in std_logic;
            Din  : in std_logic_vector(DATA_WIDTH-1 downto 0);
            Xout : out std_logic_vector(CRCDIM-1 downto 0));
    end crcgen;
    architecture rtl of crcgen is
      signal X, X1, X2, Dins :
        std_logic_vector(CRCDIM-1 downto 0);
    begin
      process (Din)
        variable Dinv : std_logic_vector(CRCDIM-1 downto 0);
      begin
        Dinv := (others => '0');
        Dinv(DATA_WIDTH-1 downto 0) := Din;                  -- LFSR
        -- Dinv(CRCDIM-1 downto CRCDIM-DATA_WIDTH) := Din;   -- LFSR2
        Dins <= Dinv;
      end process;
      X2 <= X;                -- LFSR
      -- X2 <= X xor Dins;    -- LFSR2
      process (res, clk)
      begin
        if res = '0' then X <= (others => '0');
        elsif rising_edge(clk) then X <= X1 xor Dins;   -- LFSR
        -- elsif rising_edge(clk) then X <= X1;         -- LFSR2
        end if;
      end process;
      Xout <= X;
      -- This process builds matrix M = F^w
      process (X2)
        variable Xtemp, vect, vect2 :
          std_logic_vector(CRCDIM-1 downto 0);
        variable M, F : matrix;
      begin
        -- Matrix F
        F(0) := CRC(CRCDIM-1 downto 0);
        for i in 0 to CRCDIM-2 loop
          vect := (others => '0'); vect(CRCDIM-i-1) := '1';
          F(i+1) := vect;
        end loop;
        -- Matrix M = F^w
        M(DATA_WIDTH-1) := CRC(CRCDIM-1 downto 0);
        for k in 2 to DATA_WIDTH loop
          vect2 := M(DATA_WIDTH-k+1); vect := (others => '0');
          for i in 0 to CRCDIM-1 loop
            if vect2(CRCDIM-1-i) = '1' then vect := vect xor F(i);
            end if;
          end loop;
          M(DATA_WIDTH-k) := vect;
        end loop;
        for k in DATA_WIDTH-1 to CRCDIM-1 loop
          M(k) := F(k-DATA_WIDTH+1);
        end loop;
        -- Combinational logic equations: X1 = M (x) X
        Xtemp := (others => '0');
        for i in 0 to CRCDIM-1 loop
          vect := M(i);
          for j in 0 to CRCDIM-1 loop
            if vect(j) = '1' then
              Xtemp(j) := Xtemp(j) xor X2(CRCDIM-1-i);
            end if;
          end loop;
        end loop;
        X1 <= Xtemp;
      end process;
    end rtl;

Fig. 6. VHDL listing crcgen.vhd.
    ['Maximal number of xor input is ', ...
      num2str(max(Dimxor))]
    for i = 1:n,
      Dec(i) = 0;
      for j = 1:n, Dec(i) = Dec(i) + 2^(n-j)*Fk(i,j); end;
      % dec2hex with a minimum length pads with leading zeros to n/4 digits
      FH(i,:) = dec2hex(Dec(i), n/4);
    end;
    if (nargin == 3),
      if (strcmp(opt, 'vhdl') == 1),
        % Make VHDL file named crcgen.vhd
        echo off;
        Package = str2mat('package crcpack is', ...
          ['constant CRCDIM : integer := ', num2str(n), ';'], ...
          ['constant DATA_WIDTH : integer := ', num2str(k), ';'], ...
          'end crcpack;');
        if exist('crcgen.vhd'), delete crcgen.vhd; end;
        diary crcgen.vhd
        disp(Package)
        type crcgen.txt
        disp(S)
        disp('end rtl;')
        diary off;
        echo on;
      end;
    end;

Fig. 7. MATLAB listing crcgen.m.
B. Matlab code
In Fig. 7 we report the Matlab code, named crcgen.m, used to directly produce the VHDL listing
of the desired CRC where only the logical equations of the CRC are present. This code is synthesized
much faster than the previous one. In order to work correctly, the crcgen.m file needs another file, named
crcgen.txt; this file contains the first 34 rows of crcgen.vhd.
REFERENCES
[1] W. W. Peterson, D. T. Brown, "Cyclic Codes for Error Detection," Proc. IRE, Jan. 1961.
[2] A. S. Tanenbaum, Computer Networks. Prentice Hall, 1981.
[3] W. Stallings, Data and Computer Communications. Prentice Hall, 2000.
[4] T. V. Ramabadran and S. S. Gaitonde, "A tutorial on CRC computations," IEEE Micro, Aug. 1988.
[5] N. R. Saxena, E. J. McCluskey, "Analysis of Checksums, Extended Precision Checksums and Cyclic Redundancy Checks," IEEE Transactions on Computers, July 1990.
[6] M. J. S. Smith, Application-Specific Integrated Circuits. Addison-Wesley Longman, Jan. 1998.
[7] G. Albertengo, R. Sisto, "Parallel CRC Generation," IEEE Micro, Oct. 1990.
[8] R. Lee, "Cyclic Codes Redundancy," Digital Design, July 1977.
[9] A. Perez, "Byte-wise CRC Calculations," IEEE Micro, June 1983.
[10] A. K. Pandeya, T. J. Cassa, "Parallel CRC Lets Many Lines Use One Circuit," Computer Design, Sept. 1975.
[11] M. Braun et al., "Parallel CRC Computation in FPGAs," in Workshop on Field Programmable Logic and Applications, 1996.
[12] J. McCluskey, "High Speed Calculation of Cyclic Redundancy Codes," in Proc. of the 1999 ACM/SIGDA seventh Int. Symp. on Field Programmable Gate Arrays, p. 250, ACM Press New York, NY, USA, 1999.
[13] M. Sprachmann, "Automatic Generation of Parallel CRC Circuits," IEEE Design & Test of Computers, May 2001.
[14] M. D. Shieh et al., "A Systematic Approach for Parallel CRC Computations," Journal of Information Science and Engineering, May 2001.
[15] G. Griffiths and G. Carlyle Stones, "The Tea-Leaf Reader Algorithm: An Efficient Implementation of CRC-16 and CRC-32," Communications of the ACM, July 1987.
[16] D. V. Sarwate, "Computation of Cyclic Redundancy Checks via Table Look-Up," Communications of the ACM, Aug. 1988.
[17] Gene F. Franklin et al., Feedback Control of Dynamic Systems. Addison Wesley, 1994.
[18] J. Borges and J. Rifa, "A Characterization of 1-Perfect Additive Codes," Pirdi-1/98, 1998.
Giuseppe Campobello was born in Messina (Italy) in 1975. He received the Laurea (MS) degree in Electronic
Engineering from the University of Messina (Italy) in 2000. Since 2001 he has been a Ph.D. student in Information
Technology at the University of Messina (Italy). His primary interests are reconfigurable computing, VLSI design,
microprocessor architectures, computer networks, distributed computing and soft computing. Currently, he is also
associated with the National Institute for Nuclear Physics (INFN), Italy.
Giuseppe Patanè was born in Catania (Italy) in 1972. He received the Laurea and the Ph.D. in Electronic Engineering
from the University of Catania (Italy) in 1997 and the University of Palermo (Italy) in 2001, respectively. In 2001, he
joined the Department of Physics at the University of Messina (Italy) where, currently, he is a Researcher of Computer
Science. His primary interests are soft computing, VLSI design, optimization techniques and distributed computing.
Currently, he is also associated with the National Institute for Nuclear Physics (INFN), Italy.
Marco Russo was born in Brindisi in 1967. He received the M.A. and Ph.D. degrees in electronic engineering
from the University of Catania, Catania, Italy, in 1992 and 1996. Since 1998, he has been with the Department of
Physics, University of Messina, Messina, Italy as an Associate Professor of Computer Science. His primary interests
are soft computing, VLSI design, optimization techniques and distributed computing. He has more than 90 technical
publications appearing in international journals, books and conferences. He is coeditor of the book Fuzzy Learning and
Applications (CRC Press, Boca Raton, FL). Currently, he is in charge of Research at the National Institute for Nuclear Physics (INFN).