
A Low-complexity FPGA TDC based on a DSP
Delay Line and a Wave Union Launcher

Zijie Wang∗, Jiajun Lu∗, Jose Nunez-Yanez†
University of Bristol, UK∗, University of Linkoping, Sweden†
Email: ∗sv20118@bristol.ac.uk, ∗gf2064@bristol.ac.uk, †jose.nunez-yanez@liu.se
Abstract—High-precision time-to-digital converters (TDCs) are key components for controlling quantum systems, and FPGAs have gained popularity for this task thanks to their low cost and flexibility compared with Application Specific Integrated Circuits (ASICs). This paper investigates a novel FPGA-based TDC architecture that combines a wave union launcher and delay lines constructed with DSP blocks. The configuration achieves an 8.07 ps RMS resolution on a low-cost Zynq FPGA with a power usage of only 0.628 W. The low power consumption is achieved thanks to a combination of operating frequency and logic resource usage that are lower than those of other methods, such as multi-chain DSP based TDCs and multi-chain CARRY4 based TDCs.

Index Terms—Quantum computing, Field Programmable Gate Array (FPGA), Time-to-digital converter (TDC), Wave Union Launcher, DSP delay lines

I. INTRODUCTION

The field of Quantum Computing is advancing rapidly with applications in the areas of medicine, communications, and artificial intelligence [1] [2] [3]. TDCs play a vital role in almost all existing computational systems for measuring the arrival time of particles such as photons [4]. FPGAs are widely used in the TDC field due to their superior flexibility compared to ASICs, as well as their high performance, relatively low cost and ease of use. Furthermore, FPGAs can offer the ps-level TDC detection accuracy required by a low-latency quantum controller capable of running closed-loop control algorithms. An additional important requirement in many quantum systems is the need to operate at very low temperatures inside a cryostat [5]. In essence, the electronic control system should be physically close to the cryostat to reduce latency, but any effects on cryostat temperature should be minimised. The power dissipation capacity of the cryostat is very limited and is negatively affected by the electronic control system. Motivated by these observations, this research investigates a novel realisation in FPGA technology of a high-performance TDC based on a wave union launcher and DSP (Digital Signal Processing) blocks. The paper contributions are as follows:
• We design a low-complexity and low-frequency TDC that minimizes the power dissipation and temperature effects.
• We perform a comparison of the proposed circuit in terms of resolution, power and logic utilization with other mainstream FPGA-TDCs.
• We make the hardware that achieves an accuracy of 8 ps on low-cost FPGA devices open-source at https://github.com/LGG1997/FPGA_TDC_1_0 to encourage validation and further work in the area.

II. RELATED WORK

Currently, there are several mature time-to-digital converter (TDC) architectures based on different platforms. In the application specific integrated circuit (ASIC) domain, most digital TDCs consist of coarse counters and fine counters, where the coarse component is implemented with an ordinary counter to roughly calculate the pulse width, and the fine part is usually constructed from delay lines to obtain a finer time resolution. For example, the flash TDC published by Takahiro J. Yamaguchi et al. achieves 1.9 ps resolution [6], with the flash delay line constructed in a 65 nm CMOS process. H. Huang and C. Sechen claim to have achieved a 0.1 ps resolution TDC utilising a dynamic buffer delay line based on a 45 nm CMOS process [7]. In contrast to the previously mentioned digital TDCs, two kinds of analog TDCs have been proposed by Haruo Kobayashi et al., the Integral and the Delta-Sigma TDC [8], where the former method is built with two counters, and the latter detects phase differences to measure time intervals. Although ASIC TDCs have higher accuracy than FPGA TDCs, the flexibility and low NRE (non-recurring engineering) cost of FPGAs have increased their popularity in this field. Traditional FPGAs can implement a wide variety of logical functions but they are not suitable for designing analog circuits. Hence, the FPGA-based TDC is similar to the digital ASIC-based TDC, containing a coarse part and a fine part, in which the fine part has three common realizations: vernier TDC [10], multi-phase clock TDC [9] [11] and tapped-delay-line (TDL) TDC [12] [13]. Modern FPGAs include large numbers of look-up tables (LUTs), digital signal processors (DSPs) and CARRY4 delay components, and the fabric is particularly suitable for TDL TDCs such as the CARRY4 delay-line-based TDC [14] and the DSP48E1 delay-line-based TDC [15]. However, FPGA-based TDL-TDC resolution suffers from ultra-width bins at some logic boundaries, which are predominantly caused by long wiring between the component logic, such as adjacent DSP48E1s or CLBs. It is possible to utilize multiple delay lines to decrease the influence of the ultra-width bins and to increase the TDC's resolution at the cost of extra logic usage. J. Wu and Z. Shi provided an alternative solution based on using a wave union launcher (WUL) to generate multi-edge pulses in order to subdivide the ultra-width bins in the CARRY4 delay lines [16] [17]. The advantage is that this can increase the resolution
and it can also reduce the amount of logic used compared with creating multiple delay lines.

Considering the merits of the low propagation delay in the integrated DSP blocks and the low logic consumption of the wave union launcher, this paper proposes an approach that combines a DSP delay line and a wave union launcher as its main novelty, achieving a TDC with a good compromise between resolution and low power consumption.

III. FPGA-TDC ARCHITECTURE

The main component in the architecture is the fine counter, which is composed of a wave union launcher, a single delay line created with several DSP blocks, a two-level register, a multi-edge detection encoder block and a calibration module, as seen in Fig. 1. The purpose of the wave union launcher, which is realized through several CARRY4 logic components, is to launch multiple pulses into the delay line, and the function of the two-level register array is to eliminate the effects of meta-stability. The multi-edge encoder can be configured to detect rising, falling or any clock edge. The positions of these edges in the delay line are encoded to Binary-Coded Decimal (BCD) for subsequent processing.

Fig. 1. The structure of the Wave Union based FPGA-TDC.

There is a single clock in the system, clk_172m, with a frequency of 172 MHz, driving the two-level register array, the multi-edge encoder and the calibration module. A global clock block generates the system clock using a Mixed-Mode Clock Manager (MMCM) and a global buffer (BUFG) integrated in the FPGA. The MMCM uses the 100 MHz onboard clock in the Zedboard and generates a 172 MHz clock with minimum clock jitter. Due to variations in operating temperature, the bin width of the TDC is not constant. For example, any increase in the chip's temperature would lengthen the propagation delay in the CARRY4 block, which would expand the bin width and reduce the accuracy of the system. For this reason, it is essential to carry out a real-time calibration of the system to improve resolution. The calibration module aims to address this limitation, emitting a series of pulses with a frequency close to the system clock when the chip's temperature fluctuation exceeds a certain range. The pulses hit each bin multiple times and the wider bins collect more hits. Thus, the block can measure the actual width of each bin in the delay line and correct the fine time output.

IV. SUB-BLOCKS ARCHITECTURE

A. Clock and Random-Pulse

This module has two functions. The first function is to generate the TDC module clock and reset signal, and the second one is to generate a 172.02 MHz pulse into the TDC Fine block as a calibration signal. The interconnection of the block is shown in Fig.2, which instantiates three MMCMs with different configurations. The “i_clk” signal is connected to the onboard system clock in the Zedboard (100 MHz), and “clk_inst0” uses it to generate two 100 MHz output clocks as the clock sources for “clk_inst1” and “clk_inst2”. The presence of “clk_inst0” ensures that the routing is successful and avoids the routing errors otherwise reported by Vivado during the implementation phase. “clk_inst1” uses “clk_out1” (100 MHz) to produce a 172 MHz system clock (o_clk) for the TDC Fine block, while “clk_inst2” uses “clk_out2” to generate a 172.02 MHz clock that is then output to the “pulse_inst” block. The purpose of using 172 MHz as the system clock is to reduce the logic utilisation of the delay line, which means that the delay line can be shortened without increasing the clock frequency. On the other hand, random pulses near the system clock frequency (172 MHz) are useful for performing an efficient code density test for self-calibration; meanwhile, the pulse frequency closest to 172 MHz can only be set to 172.03 MHz due to the limitation of the MMCM block in the Zedboard.

Fig. 2. The clock and reset signal module.

The main purpose of the pulse_inst module is frequency division. In this project, the auto-calibration system requires a pulse source, and the project aims to make full use of the resources in the FPGA, avoiding any extra attached device as far as possible. Thus, using a 172 MHz clock source as a stimulus is a good choice. The first problem is that the input pulse's frequency should be as close to the clock frequency as possible to produce high-precision self-calibration results. An available solution is to create a pulse with a frequency of 172.03 MHz based on the 172 MHz clock by using an MMCM in the Xilinx FPGA. Fig.3 (a) shows a case using
an original 172.02 MHz clock as a stimulation source into the delay line for auto-calibration. In that case, the pulse's width is too short, which will induce the multi-edge encoder to sample an unpredictable number of pulse edges, which affects the FPGA-TDC reliability. In addition, if the 172.02 MHz pulse is input directly into the DSP delay line, it may cause one pulse to be sampled multiple times in the DSP delay line. Generally, the total propagation delay time of the DSP delay line will be larger than the period of the 172.02 MHz calibration pulse, which will cause the pulse to be sampled multiple times. Our solution is to subdivide the calibration pulse so that, on the one hand, the pulse maintains the same phase as the original calibration pulse and, on the other hand, it is not sampled multiple times (Fig.3 (b)).

Fig. 3. The random-pulse transmission diagram in the pulse_gen block.

B. Wave Union Launcher

There are two different wave union styles, the Finite Step Response (FSR) type and the Infinite Step Response (ISR) type. The FSR wave union launcher can produce a limited number of edges, such as a 3-edge or 5-edge sequence, into the delay line, while the ISR can generate an unlimited number of edges. Ideally, increasing the number of edges can improve the resolution of the system [18]. However, the maximum number of edges that can be accommodated by the delay chain is related to its length, which means that in practice the accuracy of the ISR wave launcher is also limited. Moreover, the uncertain number of edges produced by the ISR complicates the accuracy analysis, and for this reason the FSR wave union launcher is selected in this work. The schematic of an FSR launcher can be seen in Fig.4, which can emit 3 positive edges and 2 negative edges. The launcher consists of 5 CARRY4 components with different configurations. One of the reasons for choosing the CARRY4 is that it has an extremely short propagation delay, as it was initially designed for constructing a fast Carry Propagate Adder (CPA). In addition, each CARRY4 contains four 2-to-1 multiplexers (MUX), four selection signals (S0, S1, S2, S3), a carry input (CIN), a carry output (COUT) and four data outputs (CO0, CO1, CO2, CO3). When the selection signal is 0, the multiplexer will output the signal of channel 0, and when the selection signal is 1, it will output the signal of channel 1. The trigger signal is directly connected to the CIN port of the first CARRY4, and the COUT port of each CARRY4 is connected to the CIN port of the next CARRY4. Such a configuration pattern can minimize the propagation delay between CARRY4 blocks to avoid a large pulse width. Furthermore, the S0 port of the last four CARRY4s is connected to the trigger signal to act as a switch that decides whether the signal from the previous stage can be transmitted to the next stage. The bypass input ports (DI0) of the first multiplexer in the last four CARRY4s are assigned to 1, 0, 1 and 0 respectively to initialize the state of the launcher. The launcher can be extended by attaching more CARRY4 components.

Fig. 4. The wave union launcher's structure.

The transmission process of the signal is depicted in Fig.5. Fig.5 (a) shows that without any trigger signal input there are 4 edges in the launcher unit, 2 falling edges and 2 rising edges, with blue grids representing logic low and orange grids representing logic high. The first MUX of each of the last group of four CARRY4s outputs the signal of the DI0 port, which also plays the role of isolating the previous stage signal. When the trigger signal is high, the first MUX of the last four CARRY4 components starts to transmit signals in channel 1 from the previous stage to the next CARRY4. At this point the wave union signals move towards the right. Fig.5 (b) shows that the signal has shifted one grid to the right and that there are three rising edges in the wave union generator. The example case shown in Fig.5 (c) demonstrates that the signal has been transmitted for one period and has a phase difference of 90° from the initial signal.

C. DSP48E1 delay lines

The shorter the propagation delay within each bin of the delay line, the higher the accuracy the TDC can achieve. A delay line based on the integrated DSP blocks available in the FPGA can achieve a smaller bin width [19] than one using the CARRY4 logic. In this research, a single delay line is composed of 16 DSPs with a total of 768 output ports (768 bins). Except for the first DSP48E1, the rest are configured as a 48-bit post adder to sum the values of CARRYCASCIN and input C, where the input signal C has 48 bits and CARRYCASCIN has only one bit. The result of
the addition is output to P[47:0] and the carry data is assigned to CARRYCASCOUT.

Fig. 5. The transmission process of the wave union.

The first DSP48E1 of the delay line must use the CARRYIN port to connect the output of the WUL instead of CARRYCASCIN, since CARRYCASCIN is used to connect to the previous-level DSP48E1's CARRYCASCOUT. If the output of the WUL is directly connected to CARRYCASCIN, then Vivado generates a synthesis error. The CARRYCASCIN ports of the other DSPs are connected to the CARRYCASCOUT of the previous stage to form the delay chain. The 48-bit input C of each DSP block is set to 48'b1111...1111 in order to generate a carry to the next stage when CARRYCASCIN is high. The 768 ports output 1 when no trigger is applied to CARRYIN, and when a signal arrives at the CARRYIN port of the first DSP, the outputs of the carry line successively change from 1 to 0 from left to right. It is important to note that the carry-lookahead adder inside the DSP blocks could lead to output jumps, so that the Nth output port changes to 0 before the (N-1)th output port; a solution to this unwanted situation is discussed in the next section. The total length of the delay line is 768 bins, and a longer delay line can be achieved by appending additional DSP components.

Fig. 6. The structure of the single DSP48E1 delay line.

D. Multi-edge detection module

Multi-edge detection is necessary to read out the position of the edges accurately. A common issue is how to avoid a “data bubble”. In the ideal case, the data of the DSP delay line is recorded through two-level registers and is transmitted into the detection module. Usually, the 1 to 0 transitions are clean thermometer codes such as 11111111 to 00001111. However, due to the uneven propagation delays in the FPGA and structures such as carry-lookahead inside the DSP blocks, the pulse transmission may randomly generate “bubbles” such as 11111111 to 00001011. It is important to design a proper edge detection module to address this problem. The raw circuit schematic of the rising edge detection is realized by multiple three-input AND gates (Fig.7 (a)), which recognize the transition “100” rather than “10” and so eliminate bubbles. In addition, the same structure can be used for detecting falling edges by modifying the input pattern. If there are larger bubbles such as 00001001, the number of inputs in the AND gate can be increased. In fact, such a circuit cannot be directly implemented in the FPGA, and it has to be mapped into the corresponding FPGA circuit through EDA tools such as Vivado. Because the AND gate has 3 inputs and 1 output, EDA tools will map it to a LUT with 6 inputs and 1 output, as shown in Fig.7 (b). It is noteworthy that if only bin 0 and bin 1 show high in the delay chain, it is difficult to determine whether the input pattern is a valid rising edge or noise. Therefore, to avoid this uncertainty, outputs 0 and 1 are bypassed by directly connecting them to ground.

Fig. 7. The Look-Up Table based edge detection block. (a) The raw circuit schematic of the edge-detection block implemented in Verilog. (b) The equivalent circuit mapping in the FPGA.

E. Multi-edge encoder module

The multi-edge encoder transfers the data from the multi-edge detection module into a binary code. A delay line based on CARRY4 is encoded using a priority encoder, which only converts the edge located in the highest position of the delay line into a binary code. In this research, it is necessary to sum all edge positions in the delay line, so the priority encoder is not suitable. In contrast to the priority encoder, the multi-edge encoder encodes the position of each edge in the delay line and adds them up to obtain the total result [20]. The schematic of the module is shown in Fig.8, consisting of 16 encoder units. Each block has 48 input ports, connecting the output of the multi-edge detection, the binary code resulting
from the sum of the edge positions, and a block offset that is the product of the number of edges in the block and the offset index. The encoder unit is designed with 48 input ports in order to match the number of output ports on a DSP, so that the same number of encoder units can be added when the delay chain is extended by attaching more DSPs to the tail. The outputs of the 16 blocks are summed together through 31 adders arranged in 5 rows, and each row represents a pipeline stage, which means that the encoder module needs 5 clock cycles to obtain the final accumulation result (OUT[15:0]). All of the components are driven by the 172 MHz clock.
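The block decomposition described above can be checked numerically. The following Python sketch is only an illustration and is not the authors' RTL. It assumes 0-based bin indices, assumes that the offset index of block k is 48·k, and assumes that the 31 adders reduce 32 partial values (16 local position sums plus 16 block offsets), which is one reading consistent with the 5 rows quoted above. The function names `encode_block`, `adder_tree` and `multi_edge_encode` are hypothetical.

```python
BLOCK_BITS = 48   # one encoder unit per DSP48E1 (48 outputs each)
NUM_BLOCKS = 16   # 16 DSPs -> 768 bins


def encode_block(bits, block_index):
    """Return (local position sum, block offset) for one 48-bit block.

    Assumption: block offset = edge count * (block_index * BLOCK_BITS).
    """
    local_sum = sum(i for i, b in enumerate(bits) if b)
    edge_count = sum(bits)
    return local_sum, edge_count * block_index * BLOCK_BITS


def adder_tree(values):
    """Pairwise-reduce values; return (total, adders used, pipeline rows)."""
    adders = rows = 0
    while len(values) > 1:
        pairs = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        adders += len(pairs)
        if len(values) % 2:          # an odd element passes through unchanged
            pairs.append(values[-1])
        values = pairs
        rows += 1
    return values[0], adders, rows


def multi_edge_encode(edge_flags):
    """Sum all edge positions of a 768-entry edge-flag vector block by block."""
    partials = []
    for k in range(NUM_BLOCKS):
        block = edge_flags[k * BLOCK_BITS:(k + 1) * BLOCK_BITS]
        partials.extend(encode_block(block, k))
    return adder_tree(partials)


# Example: edges detected at bins 5, 100, 101 and 700.
flags = [0] * (BLOCK_BITS * NUM_BLOCKS)
for pos in (5, 100, 101, 700):
    flags[pos] = 1
total, adders, rows = multi_edge_encode(flags)
print(total, adders, rows)  # 906 31 5
```

Under these assumptions the blockwise result equals the direct sum of the edge positions (5 + 100 + 101 + 700 = 906), and reducing 32 partial values pairwise takes 16 + 8 + 4 + 2 + 1 = 31 adders in 5 rows.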
Fig. 8. Multi-edge encode block with 5-level output pipeline.

The schematic of the “Encoder Unit” depicted in Fig.9 (a) is composed of 8 sub-encoder modules shown in Fig.9 (b). There are six input ports on the left of each sub-encoder to connect the output of the delay line, while there are two different types of outputs on the right side. The upper port outputs the sum of the edge positions, obtained by multiplying the different input ports by their corresponding coefficients. The bottom port sums all input ports to get the number of edges. For example, assuming that the input ports of the first and fifth sub-encoder are set to 1, the first port is multiplied by 1, the fifth port is multiplied by 6, and then their results are added to obtain the position sum of all edges (7). Meanwhile, the port “Number of edges” outputs 2. Considering the characteristics of the Look-Up Table (LUT) in the FPGA, which has 6 input ports and one output port, it is important to design a sub-encoder module that corresponds to the LUT ports, which can reduce the logic depth. For example, implementing a function with 6 inputs and one output only requires one LUT, while two LUTs are required to realize a function with 8 inputs and one output, which also increases the fan-out and extends the delay time. Therefore, the sub-encoder module is designed with 6 inputs. The structure of the “Encoder module” is similar to the “sub-encoder”. It should be noted that each adder has a pipeline stage, so there is a two-stage pipeline in the “Encoder module”, and a total of five pipeline stages in the multi-edge encoder.

Fig. 9. The implementation detail of the encode unit (a) and sub-encode (b).

V. EXPERIMENTS

This section verifies the performance and complexity of the proposed architecture and its accuracy compared with other mainstream methods. All of the blocks are built based on parameterization and modularization, which means that the FPGA TDC can switch to different types by setting different parameters, such as multi-carry4 chains based TDC, multi-DSP chains based TDC, single-DSP chain Wave Union TDC and single-carry4 chain Wave Union TDC. The first three types of TDC utilize a priority encoder while the Wave Union based TDC uses a multi-edge encoder. Similarly, the length of the delay line can also be modified with parameters. All the variations target the Zynq-7000 device available in the Zedboard and the tests are carried out at the same room temperature (25°C). The Xilinx ILA IP core is used to obtain the data output by the encoders.

A. Code density test

The linear code density test can measure the relative width of each bin. Theoretically, the width of each bin should be identical, but due to the influence of process variation, routing, temperature and other factors, the bin width can vary [21]. Consequently, the code density test is an effective approach to measure the width of each bin in order to calculate the accuracy of the TDC. To enable the test, the mode of the TDC needs to be switched to auto-calibration mode. In this case, the WUL will shield the “trigger” input signal while enabling random pulses in the “Clock & Reset” module, generating 80,000 to 120,000 random pulses. In the test, we have obtained the code density histogram of four different wave union configurations, generating one-edge, two-edge, four-edge, and six-edge waves respectively. The objective is to explore whether the proposed structure can reach an acceptable resolution. The width of the Nth bin in the test equals $5813 \times M_N / M_{TOTAL}$ ps, where $M_N$ and $M_{TOTAL}$ represent the number of edges that appear in the Nth bin and the total number of pulses respectively.

The TDC's bin width based on the DSP delay line architecture is shown in Fig.10. These figures show the histograms of the bin width of the TDC (DSP + Wave Union) corresponding to one-edge (a), two-edge (b), four-edge (c), and six-edge (d) respectively, where the abscissa represents the effective bins' position, and the ordinate shows the width of each bin. The one-edge case indicates that the launcher is disabled. According to Fig.10 (a), the ultra-width bin located in the 48th
bin and the 96th bin reaches 250 ps and 300 ps respectively. Since a DSP has only 48 output ports, the cascading of two DSPs extends the delay time due to long routing in the FPGA, increasing the 48th bin width to form an undesirable ultra-width bin. As seen in Fig.10 (b), the peak value of the ultra-width bin has decreased below 200 ps in the two-wave case, and some of the missing bins that appear in the one-wave case have been compensated, which are the factors that reduce the RMS resolution to 45.27 ps. The remaining Fig.10 (c), (d) show the same trend in reducing the bin width and improving the resolution to 17.65 ps and 8.07 ps respectively. Comparing the four histograms, it is apparent that increasing the number of edges produced in the WUL can significantly improve the uniformity of the TDC, which also shows that subdividing the DSP's ultra-width bin with the wave union is a feasible method. However, when the number of edges is further increased, the improvement in accuracy is small. One of the reasons is that the multi-edge encoder cannot recognize additional edges when they are in the same bin, and the encoder can only consider them as one edge. In this case, the coding of the bin is lost, resulting in a poor subdivision effect. Another factor is the width of the waves. The wave union launcher is based on the CARRY4 components and the gap between the rising edges of two waves is equal to the delay time of two CARRY4s. The distance between waves is a factor that limits the possible enhancement of accuracy in the TDC.

Fig. 10. The code density histogram of four different wave union configurations. (a) DSP based TDC without wave union launcher with single edge detection. (b) Two-edge mode in wave union launcher (only detecting falling edges). (c) Four-edge mode in wave union launcher (detecting rising-edge and falling-edge). (d) Six-edge mode in wave union launcher (detecting rising-edge and falling-edge).

B. Differential Nonlinearity and Integral Nonlinearity

In the ideal situation, the differential nonlinearity (DNL) of the TDC is 0 Least Significant Bit (LSB) [22]. The LSB can be expressed as $T_{clk} / Bin_{total}$, where $T_{clk}$ is the sampling clock period in the delay line and $Bin_{total}$ is the total number of bins. In the six-edge configuration, the sampling clock period is 5.813 ns and $Bin_{total}$ is 2358 (effective bins); therefore, 1 LSB is 2.465 ps. If the absolute value of the DNL is less than 1 LSB, the transfer function has guaranteed monotonicity without code loss. However, due to the structural characteristics of the FPGA, it is difficult to ensure that the deviation of each bin is within 1 LSB.

The DNL of the DSP Wave Union TDC is shown in Fig.11. Due to the ultra-width bins in the DSP, several DNL values are much larger than the surrounding values. Some of the bins have a very large DNL value that nearly reaches 30 LSB, while the lowest value is below 1 LSB, which confirms that ultra-width bins still exist. Fortunately, most of the bins are in the range of 1 LSB to 4 LSB.

Fig. 11. Differential non-linearity of the bins.

Integral Nonlinearity (INL) expresses the degree of deviation between the actual transfer function and the theoretical value. We expect the curve to be approximately a straight line with a constant slope. As seen in Fig.12, and in line with the DNL analysis above, the values of the DNL concentrate between 1 LSB and 4 LSB. The slope of the curve changes only slightly, and it shows an upward trend.

VI. COMPLEXITY ANALYSIS

The resource utilization and power consumption of the FPGA are key metrics for the TDC. Reducing the logic resources can contribute to a lower power and temperature for the FPGA-TDC. Table I evaluates the implementation complexity of the TDC in different wave union configurations and compares it with alternative TDC designs based on multi-chain DSP and multi-chain CARRY4. In order to ensure the
reliability of the experiment, all TDCs use a common clock generation block and random pulse generation block consisting of the same number of MMCMs and BUFGs. As the number of delay lines in the multi-chain CARRY4 based TDC increases, the RMS improves at the cost of a higher consumption of logic. For example, the FDCE resource requirement of the eight-chain design (13406) is double that of the four-chain design (6723). The RMS improves to 12.5 ps with the eight-chain configuration compared to 22.7 ps with the four-chain configuration. Similarly, the multi-chain DSP based TDC shows the same trend in RMS, power and logic usage. The RMS of the eight-chain DSP based TDC is better still, reaching 10.1 ps. However, the method has the highest LUT (18963) and FDCE (17772) count, and power consumption reaches 0.707 W. The advantage of the wave union structure is that the logic utilisation and power consumption only increase slightly with the number of wave union edges, while the RMS improves from 113.33 ps to 8.07 ps. The wave union launcher composed of four CARRY4s can produce four edges, while a two-edge TDC just detects the rising edge. Hence, appending a falling edge detection yields the four-edge TDC without modifying the launcher or the DSP delay line. Therefore, the change from two-edge to four-edge only slightly increases the resources and the power consumption. The TDC's RMS has a significant improvement from 45.27 ps to 17.65 ps, and reaches 8.07 ps RMS with the six-edge configuration.

TABLE I
RESOURCE USAGE AND POWER REQUIREMENTS IN DIFFERENT TDCS

Configuration              RMS (ps)  POWER (W)  LUT    FDCE   MMCM  DSP  CARRY4
One CARRY4 delay line      76.76     0.481      7270   1698   3     0    220
Two CARRY4 delay lines     41.07     0.503      8395   3380   3     0    429
Four CARRY4 delay lines    22.71     0.549      10664  6723   3     0    848
Eight CARRY4 delay lines   12.50     0.645      14947  13406  3     0    1674
One DSP delay line         121.45    0.510      14562  2242   3     16   28
Two DSP delay lines        57.31     0.546      15196  4462   3     32   45
Four DSP delay lines       22.75     0.606      16447  8894   3     64   80
Eight DSP delay lines      10.33     0.707      18963  17772  3     128  138
One-edge WUL               113.33    0.620      17077  5017   3     16   419
Two-edge WUL               45.27     0.620      17077  5018   3     16   427
Four-edge WUL              17.65     0.628      17093  5018   3     16   427
Six-edge WUL               8.07      0.628      17095  5018   3     16   433

Fig. 12. Integral non-linearity of the bins.

VII. CONCLUSIONS

The DSP delay line and wave union launcher architecture successfully achieves 8.07 ps RMS with 0.628 W power consumption. The DSP delay line based TDC has a lower transmission delay and a smaller bin width compared with the CARRY4 based delay line. The wave union can subdivide the ultra-width bin and also reduce the resource utilization in the FPGA. The wave union TDC's results demonstrate the possibility of combining a DSP delay line and a wave union launcher to achieve high resolution while maintaining low resource usage. Future work involves modifying the multi-edge encoder to adopt multiple delay lines based on DSPs and the wave union method. This should further reduce the FPGA resources because each delay line in the multi-chain structure requires an independent priority encoder, whereas the multi-edge encoder needs just one. We also plan to evaluate the self-calibration capabilities of the architecture when the FPGA is operating at low temperatures inside the cryostat and how the adaptive voltage scaling techniques developed can further reduce power consumption. Overall, the solution of combining a single DSP delay line and a wave union TDC achieves accuracy comparable to using multiple DSP delay lines, but it also shows great potential in saving FPGA resources.

REFERENCES

[1] V. Hlukhov, "FPGA Based Digital Quantum Computer Verification," 2020 IEEE 11th International Conference on Dependable Systems, Services and Technologies (DESSERT), 2020, pp. 178-182.
[2] V. G. Kanas, O. Karamitrou and K. N. Sgarbas, "On the development of brain quantum-computer interfaces," 2014 13th International Conference on Control Automation Robotics Vision (ICARCV), 2014, pp. 239-242.
[3] N. Elsayed, A. S. Maida and M. Bayoumi, "A Review of Quantum Computer Energy Efficiency," 2019 IEEE Green Technologies Conference (GreenTech), 2019, pp. 1-3.
[4] Q. Shen et al., "Time interval analyzer with FPGA-based TDC for free space quantum key distribution: Principle and validation with prototype setup," 2012 18th IEEE-NPSS Real Time Conference, 2012, pp. 1-6.
[5] E. Charbon et al., "Cryo-CMOS for quantum computing," 2016 IEEE International Electron Devices Meeting (IEDM), 2016, pp. 13.5.1-13.5.4.
[6] T. J. Yamaguchi, S. Komatsu, M. Abbas, K. Asada, N. N. Mai-Khanh and J. Tandon, "A CMOS flash TDC with 0.84–1.3 ps resolution using standard cells," 2012 IEEE Radio Frequency Integrated Circuits Symposium, June 2012, pp. 527-530.
[7] H. Huang and C. Sechen, "A 14-b, 0.1ps resolution coarse-fine time-to-digital converter in 45 nm CMOS," 2014 IEEE Dallas Circuits and Systems Conference (DCAS), 2014, pp. 1-4.
[8] H. Kobayashi et al., "Fine Time Resolution TDC Architectures - Integral and Delta-Sigma Types," 2019 IEEE 13th International Conference on ASIC (ASICON), 2019, pp. 1-4.
[9] Y. Wang, P. Kuang and C. Liu, "A 256-channel multi-phase clock sampling-based time-to-digital converter implemented in a Kintex-7 FPGA," 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings, 2016, pp. 1-5.
[10] Y. Jiao, X. Shi, L. Zhou, W. Chen and C. Chen, "A Vernier Caliper Time-to-Digital Converters with Ultralow Nonlinearity in FPGAs," 2020 IEEE 20th International Conference on Communication Technology (ICCT), 2020, pp. 1655-1659.
[11] N. Lusardi, S. Salgaro, F. Garzetti, N. Corna, G. Ticozzi and A. Geraci, "FPGA-based Multi-Phase Shift-Clock Fast-Counter Time-to-Digital Converter for Extremely-Large Number of Channels," 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2020, pp. 1-4.
[12] P. Chen, Y. Hsiao and Y. Chung, "A high resolution FPGA TDC converter with 2.5 ps bin size and -3.79~6.53 LSB integral nonlinearity," 2016 2nd International Conference on Intelligent Green Building and Smart Grid (IGBSG), 2016, pp. 1-5.
[13] E. Bayer and M. Traxler, "A high-resolution (< 10 ps RMS) 32-Channel Time-to-Digital Converter (TDC) implemented in a Field Programmable Gate Array (FPGA)," 2010 17th IEEE-NPSS Real Time Conference, 2010, pp. 1-5.
[14] Y. Wang, X. Zhou, Z. Song, J. Kuang and Q. Cao, "A 3.0-ps rms Precision 277-MSamples/s Throughput Time-to-Digital Converter Using Multi-Edge Encoding Scheme in a Kintex-7 FPGA," in IEEE Transactions on Nuclear Science, vol. 66, no. 10, pp. 2275-2281, Oct. 2019.
[15] S. Tancock and N. Dahnoun, "A 5.25ps-resolution TDC on FPGA using DSP blocks," International Conference on Digital Image & Signal Processing, Oxford, United Kingdom, 2019.
[16] J. Wu and Z. Shi, "The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay," 2008 IEEE Nuclear Science Symposium Conference Record, 2008, pp. 3440-3446.
[17] J. Wu, "On-Chip processing for the wave union TDC implemented in FPGA," 2009 16th IEEE-NPSS Real Time Conference, 2009, pp. 279-282.
[18] J. Wang, S. Liu, L. Zhao, X. Hu and Q. An, "The 10-ps Multitime Measurements Averaging TDC Implemented in an FPGA," in IEEE Transactions on Nuclear Science, vol. 58, no. 4, pp. 2011-2018, Aug. 2011.
[19] S. Tancock, E. Arabul, N. Dahnoun and S. Mehmood, "Can DSP48A1 adders be used for high-resolution delay generation?" 2018 7th Mediterranean Conference on Embedded Computing (MECO), 2018, pp. 1-6.
[20] Y. Wang, J. Kuang, C. Liu and Q. Cao, "A 3.9-ps RMS Precision Time-to-Digital Converter Using Ones-Counter Encoding Scheme in a Kintex-7 FPGA," in IEEE Transactions on Nuclear Science, vol. 64, no. 10, pp. 2713-2718, Oct. 2017.
[21] J. Wu, "Several Key Issues on Implementing Delay Line Based TDCs Using FPGAs," in IEEE Transactions on Nuclear Science, vol. 57, no. 3, pp. 1543-1548, June 2010.
[22] M. W. Fishburn and E. Charbon, "Time-to-digital converters for PET: An examination of metrology aspects," 2012 IEEE Nuclear Science Symposium and Medical Imaging Conference Record (NSS/MIC), 2012, pp. 839-840.
