The Promise and Challenge of Stochastic Computing
Fig. 4(b) implement such a complex arithmetic function?—now began to receive attention. The relation between the logic circuits and the stochastic functions they implement has been clarified, resulting in general design procedures for implementing arithmetic operations [75]. Correlation effects in SC have recently been quantified, leading to the surprising conclusion that correlation can serve as a valuable computational resource [5]. Bit-stream length can be reduced by careful management of correlation and precision (progressive precision [6]). The high contribution of stochastic-BN conversion circuits to overall SC costs [75] is being recognized and addressed. New technologies, notably memristors, have appeared that have naturally stochastic properties which reduce data-conversion needs [43].

Despite these successes, SC still has limitations that must be considered when it is used in certain applications. Most importantly, the run time of SC circuits increases prohibitively when high-precision or highly accurate computations are needed. Recent investigations have shown that the long computation time may lead to excessive energy consumption, thus making low-precision BC a better choice [1], [58], [62]. Manohar [58] provided a theoretical comparison between SC and BC and showed that, even for multiplication, SC ends up having more gate invocations (i.e., the number of times an AND gate is called). De Aguiar and Khatri [1] performed a similar comparison, but instead of comparing the number of gate invocations, they actually implemented BC and SC multipliers with different bit widths. They concluded that SC multiplication is more energy-efficient for computations that require 6 bits of precision or lower. However, if conversion circuits are needed, SC is almost always worse than BC [1].

This poses an important challenge to SC designers: their designs must be competitive in terms of energy efficiency with BC circuits of similar accuracy/precision. Some of the topics that can potentially address this problem are as follows.

1) Exploiting progressive precision to reduce overall run time.
2) Exploiting SC's error tolerance to improve energy usage.
3) Reducing or eliminating the cost of data conversion.

Examples of these techniques appear in the current literature.

This paper focuses on more recent SC work than the survey [3], and attempts to highlight the big challenges facing SC and their potential solutions. The remainder of this paper is organized as follows. Section II provides a formal introduction to SC and its terminology, including SC data formats, basic operations, and randomness requirements. Readers familiar with the topic can skip this section. General synthesis methods for combinational and sequential SC circuits are discussed in Section III. Section IV examines the application domains of SC, as well as some emerging new applications. The conclusion and future challenges of SC are discussed in Section V.

II. BASIC CONCEPTS

Probabilities are inherently analog quantities that correspond to continuous real numbers. Stochastic circuits can therefore be interpreted as hybrid analog-digital circuits because they employ digital components and signals to process analog data. Theoretically, the AND gate of Fig. 2 can perform multiplication on numbers of arbitrary precision. However, to find the probability pZ = pX × pY, we must obtain a finite number of discrete samples of the circuit's output from which to estimate pZ. The estimation's accuracy increases slowly with the number of samples, and is limited by noise considerations, making it impractical to estimate pZ with high precision.

A. Stochastic Number Formats

Interpreting SNs as probabilities is natural, but it limits them to the unit interval [0, 1]. To implement arithmetic operations outside this interval, we need to scale the number range in application-dependent ways. For example, integers in the range [0, 256] can be mapped to [0, 1] by dividing them by a scaling factor of 256, so that {0, 1, 2, . . . , 255, 256} is replaced by {0, 1/256, 2/256, . . . , 255/256, 1}. Such scaling can be considered a preprocessing step required by SC.

SC can readily be defined to handle signed numbers. An SN X whose numerical value is interpreted in the most obvious fashion as pX is said to have the unipolar format. To accommodate negative numbers, many SC systems employ the bipolar format, where the value of X is interpreted as 2pX − 1, so the SC range effectively becomes [−1, 1]. Thus, an all-0 bit-stream has unipolar value 0 and bipolar value −1, while a bit-stream with equal numbers of 0s and 1s has unipolar value 0.5 but bipolar value 0. Note that the function of an SC circuit usually changes with the data format used. For instance, the AND gate of Fig. 2 does not perform multiplication in the bipolar domain. Instead, an XNOR gate must be used, as shown in Example 1 below. On the other hand, both formats can use the same adder circuit. In what follows, to reduce confusion, we use X to denote the numerical value of the SN X. With this convention, X = pX in the unipolar domain, while X = 2pX − 1 in the bipolar domain.

Several other SN formats have appeared in the literature. Inverted bipolar is used in [2] to simplify the notation for spectral transforms. In [61], the value of a bit-stream is interpreted as the ratio of 1s to 0s, which creates a very wide, albeit sparse, number range. Table I shows the various number formats mentioned so far. These formats deal with single bit-streams only. Dual-rail and multirail representations have also been proposed. Gaines [29], for example, presented dual-rail unipolar and bipolar number formats, along with the basic circuits for each format. Toral et al. [94] proposed another dual-rail encoding that represents a ternary SN X = x1 x2 . . . xN, where each xi ∈ {−1, 0, 1}; it will be discussed in Section IV-A. The binomial distribution generator of [75], which is discussed in Section III, produces a multirail SN.

B. Stochastic Number Generation

We can map an ordinary BN to an SN in unipolar format using the SNG in Fig. 1. To convert the unipolar SN back to binary, it suffices to count the number of 1s in the bit-stream using a plain (up) counter. Slight changes to these circuits allow for conversion between bipolar SNs and BNs.
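To make this conversion chain concrete, the following minimal Python sketch models the elements just described: an SNG as a comparator against a uniform random source, an AND gate multiplying two unipolar SNs, and a counter converting the result back to a BN. The helper names and the stream length N are our own illustrative choices, not notation from the paper.

```python
import random

def sng(p, n):
    """Model of an SNG (cf. Fig. 1): output 1 whenever a fresh
    uniform random sample falls below the target value p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def to_binary(bits):
    """Counter-based SN-to-BN conversion: count the 1s."""
    return sum(bits) / len(bits)

N = 1024
x = sng(0.5, N)                     # unipolar SN with value 0.5
y = sng(0.25, N)                    # unipolar SN with value 0.25
z = [a & b for a, b in zip(x, y)]   # AND gate: unipolar multiplier
print(to_binary(z))                 # close to 0.5 * 0.25 = 0.125
```

Repeated runs fluctuate around 0.125; the size of the fluctuation is quantified by (3) in Section II-C.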
[TABLE I: Possible Interpretations of a Bit-Stream of Length N Containing N1 1s and N0 0s.]
In SC, number-conversion circuits tend to cost much more than number-processing circuits. For example, to multiply two 8-bit BNs using the SC multiplier in Fig. 2, we need two SNGs and a counter. A rough gate count reveals that the conversion circuits have about 250 gates, while the computation part is just a single AND gate. Extensive use of conversion circuits can severely affect the cost of SC circuits. Qian et al. [76] reported that the conversion circuits consume up to 80% of the total area of several representative designs. For this reason, it is highly desirable to reduce the cost of conversion circuits.

Methods to reduce the cost of constant number generation are investigated in [25] and [79]. For massively parallel applications such as LDPC decoding, a single random number generator can be shared among multiple copies of an SC circuit to provide random inputs, thus effectively amortizing the cost of conversion circuits [21], [89]. Furthermore, inherently stochastic nanotechnologies like memristors offer the promise of very low-cost SNGs [43]. The cost of data conversion can also be lowered if analog inputs are provided to the SC circuit. In this case, it may be feasible to directly convert the inputs from analog to stochastic using ramp-compare analog-to-digital converters [46], [64] or delta-sigma converters [83].

C. Accuracy and Randomness

The generation of an SN X resembles an ideal Bernoulli process producing an infinite sequence of random 0s and 1s. In such a process, each 1 is generated independently with fixed probability pX; 0s thus appear with probability 1 − pX. The difference between the exact value pX and its estimate p̂X (computed over N samples) indicates the accuracy of X. This difference is usually expressed by the mean square error (MSE) EX given by

EX = E[(p̂X − pX)²] = pX(1 − pX)/N.   (3)

Equation (3) implies that inaccuracies due to random fluctuations in the SN bit-patterns can be reduced as much as desired by increasing the bit-stream length N. Hence the precision of X can be increased by increasing N or, loosely speaking, the quality of a stochastic computation tends to improve over time. This property is termed progressive precision, and is a feature of SC that will be discussed further later.

Stochastic circuits are subject to another error source that is much harder to deal with, namely insufficient independence, or correlation, among the input bit-streams of a stochastic circuit. Correlation is due to signal reuse caused by reconvergent fanout, shared randomness sources, and the like. As noted in Section I, if a bit-stream representing X is fanned out to both inputs of the AND gate in Fig. 2, the gate computes X instead of X squared. This major error is due to maximal (positive) correlation between the AND's input signals. In general, if correlation changes the output number, the resulting error does not necessarily go toward zero as N increases.
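The fanout error just described is easy to reproduce in simulation; the sketch below (reusing the comparator-style generator idea from the earlier example) shows that the error persists even for very long streams.

```python
import random

def sng(p, n):
    return [1 if random.random() < p else 0 for _ in range(n)]

N = 100_000
x = sng(0.5, N)
fanout = [a & b for a, b in zip(x, x)]           # same stream on both inputs
indep = [a & b for a, b in zip(x, sng(0.5, N))]  # independent streams
print(sum(fanout) / N)   # ~0.50: the gate computes X, not X^2
print(sum(indep) / N)    # ~0.25 = 0.5^2, as intended
```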
It is instructive to interpret SN generation as a Monte Carlo sampling process [6]. Consider again the SNG in Fig. 1 and, for simplicity, assume that both the input B and the random source R have arbitrary precision. Assume further that the value pX of B is unknown. The SNG effectively generates a sequence X of N samples, and we can get an estimate p̂X of pX by counting the number of 1s in X. It is known that p̂X converges to the exact value pX at the rate of O(1/√N).

For most stochastic designs, LFSRs are used as the random number sources (RNSs) to produce stochastic bit-streams. Although these random sources are, strictly speaking, deterministic, they pass various randomness tests [32], [44] and so are considered pseudo-random. Such tests measure certain properties of a bit-stream, e.g., the frequency of 1s, the frequency of runs of k 1s, etc., and check the extent to which these properties match the behavior of a true random number generator.

Despite what is commonly believed, SNs do not need to pass many randomness tests. In fact, in order to have p̂X = pX, we only need X to have the correct frequency of 1s. So it is possible to replace RNSs by so-called deterministic sources, which employ predictable patterns and lack most of the usual randomness attributes [6], [38]. An example of a deterministic format is one where all the 1s of an SN are grouped together and followed by all the 0s, as in 111111100000 [13].

To generate a deterministic bit-stream of the above form, we can use a counter to generate the sequence of deterministic values 0, 1/N, 2/N, . . . , (N − 1)/N and feed it to the comparator of Fig. 1. It can be proved that the difference between p̂X (the value of the generated bit-stream) and pX (the constant number fed to the comparator) is no more than 1/N, implying that p̂X converges to pX at the faster rate of O(1/N). This motivates the use of deterministic number sources in SC, and indeed some SC circuits use such deterministic numbers [6]. However, there are several challenges to overcome when deterministic number formats are used, including limited scalability and the cost of the number generation needed to preserve the deterministic formats.

When many mutually uncorrelated SNs are needed, we can still extend the foregoing deterministic number-generation approach, but its cost increases significantly with the number of inputs. Gupta and Kumaresan [34] described an SN multiplier that produces exact results for any given input precision.
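A minimal sketch of the counter-plus-comparator scheme described above: since the comparator sees the values 0, 1/N, . . . , (N − 1)/N in order, the stream is all 1s followed by all 0s, and the representation error is bounded by 1/N.

```python
def deterministic_sng(p, n):
    """Counter-based SNG: compare p against 0, 1/n, ..., (n-1)/n."""
    return [1 if i / n < p else 0 for i in range(n)]

for n in (16, 256, 4096):
    bits = deterministic_sng(0.3, n)
    err = abs(sum(bits) / n - 0.3)
    print(n, err)   # error is at most 1/n, so it shrinks as O(1/n)
```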
proposed [2], [7], [19], [49], [52], [82], [101]. These methods can be classified into two types depending on whether the target design is reconfigurable or fixed. A reconfigurable design has some programmable inputs that allow the same design to be reused for different functions. A fixed design can only implement one target function. In this section, unless otherwise specified, we only discuss SC design in the unipolar domain.
where the c(a1, . . . , an)'s are constant coefficients. This is a multilinear polynomial on X1, . . . , Xn [see (4)]. An important finding in [7] is that the coefficient vector c = [c(0, . . . , 0), . . . , c(1, . . . , 1)]^T is the spectrum vector S specified by (8).

Example 2: Consider an XOR gate, which serves as a multiplier in IBP format (see Example 1). Its original truth table vector is [0 1 1 0]^T. Replacing 0s and 1s by +1s and −1s, we get the vector T = [+1 −1 −1 +1]^T. Applying (8) to perform the Fourier transform yields the spectrum vector

        ⎡ +1 +1 +1 +1 ⎤ ⎡ +1 ⎤   ⎡ 0 ⎤
S = 1/2² ⎢ +1 −1 +1 −1 ⎥ ⎢ −1 ⎥ = ⎢ 0 ⎥.
        ⎢ +1 +1 −1 −1 ⎥ ⎢ −1 ⎥   ⎢ 0 ⎥
        ⎣ +1 −1 −1 +1 ⎦ ⎣ +1 ⎦   ⎣ 1 ⎦

This again shows that the stochastic function of XOR is IBP multiplication.
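Example 2 can be checked mechanically. The sketch below builds the 2^n × 2^n Walsh-Hadamard matrix by the Sylvester construction, maps the XOR truth table to ±1 form, and applies the transform with the 1/2^n scaling that the example uses for (8); we infer that scaling from the example's numbers, since (8) itself is not reproduced in this excerpt.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of the 2^n x 2^n Walsh-Hadamard matrix."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

truth_table = np.array([0, 1, 1, 0])   # XOR truth table vector
T = 1 - 2 * truth_table                # replace 0 -> +1, 1 -> -1
S = hadamard(2) @ T / 2**2             # transform with 1/2^n scaling
print(S)                               # [0 0 0 1]: IBP multiplication
```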
Based on the relation between spectral transforms and stochastic circuits, a method to synthesize a stochastic circuit for a target function S is proposed in [7]. The basic idea is to apply the inverse Fourier transform T = Hn S to obtain the vector T. However, this vector may contain entries that are neither +1 nor −1, implying that S does not correspond to a Boolean function. For example, consider the scaled addition function 1/2(X1 + X2). Its S (coefficient) vector is [0 1/2 1/2 0]^T, and the inverse Fourier transform T = H2 S yields T = [1 0 0 −1]^T, which contains the non-Boolean element zero. This problem is implicitly resolved in the standard MUX-based scaled adder (Fig. 5), which has a third input r that introduces the constant probability 0.5.

In general, an entry −1 < q < 1 in the T vector corresponds to an SN of constant probability (1 − q)/2. STRAUSS employs extra SNs of probability 0.5 to generate these SNs, since a probability of 0.5 can be easily obtained from an LFSR. A heuristic method is introduced to synthesize a low-cost circuit that produces multiple constant probabilities simultaneously.

A synthesis problem similar to that of [7] is addressed in [101]. The authors first analyzed the stochastic behavior of a general combinational circuit whose inputs comprise n variable SNs X1, . . . , Xn and m constant input SNs R1, . . . , Rm of value 0.5, as shown in Fig. 10. If the Boolean function of the combinational circuit is f(x1, . . . , xn, r1, . . . , rm), then the stochastic circuit in Fig. 10 realizes a polynomial of the form

F(X1, . . . , Xn) = Σ_{(a1,...,an) ∈ {0,1}^n} [g(a1, . . . , an)/2^m] Π_{j=1}^{n} Xj^{aj} (1 − Xj)^{1−aj}.   (9)

In this equation, for any (a1, . . . , an) ∈ {0, 1}^n, g(a1, . . . , an) denotes the weight of the Boolean function f(a1, . . . , an, r1, . . . , rm) on r1, . . . , rm, i.e., the number of input vectors (b1, . . . , bm) ∈ {0, 1}^m such that f(a1, . . . , an, b1, . . . , bm) = 1.

Example 3: Consider the case where the combinational circuit in Fig. 10 is a MUX, with x1 and x2 as its data inputs and r1 as its select input. Then the circuit's Boolean function is f(x1, x2, r1) = (x1 ∧ ¬r1) ∨ (x2 ∧ r1). We have f(0, 0, r1) = 0, f(0, 1, r1) = r1, f(1, 0, r1) = ¬r1, and f(1, 1, r1) = 1. Correspondingly, we have g(0, 0) = 0, g(0, 1) = g(1, 0) = 1, and g(1, 1) = 2. According to (9), the circuit's stochastic function is

F(X1, X2) = (1/2)(1 − X1)X2 + (1/2)X1(1 − X2) + (2/2)X1X2 = 1/2(X1 + X2).

This again shows that the stochastic function of the MUX is a scaled addition.
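The weights g(a1, . . . , an) in (9) can be computed directly from a gate's truth table. The following sketch does so for the MUX of Example 3 and evaluates (9) numerically; the function and variable names are ours.

```python
from itertools import product

def mux(x1, x2, r1):
    """2-to-1 MUX of Example 3: output x2 when r1 = 1, else x1."""
    return (x1 & (1 - r1)) | (x2 & r1)

def weight(f, a, m):
    """g(a): number of r-vectors for which f(a, r) = 1."""
    return sum(f(*a, *r) for r in product((0, 1), repeat=m))

def F(f, X, m):
    """Evaluate the polynomial of eq. (9) for input values X."""
    total = 0.0
    for a in product((0, 1), repeat=len(X)):
        term = weight(f, a, m) / 2**m
        for xj, aj in zip(X, a):
            term *= xj if aj else 1 - xj
        total += term
    return total

print(F(mux, (0.6, 0.2), 1))   # 0.4 = (0.6 + 0.2)/2: scaled addition
```

The same enumeration works for any combinational f, which is the starting point for the weight constraints used in [101].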
A synthesis method is further proposed in [101] to realize a general polynomial. It first converts the target to a multilinear polynomial. Then, it transforms the multilinear polynomial into a polynomial of the form shown in (9). This transformation is unique and can be easily obtained. After that, the problem reduces to finding an optimal Boolean function f*(x1, . . . , xn, r1, . . . , rm) such that, for each (a1, . . . , an) ∈ {0, 1}^n, the weight of f*(a1, . . . , an, r1, . . . , rm) is equal to the value g(a1, . . . , an) specified by the multilinear polynomial. A greedy method is applied to find a good Boolean function. The authors also found that, in synthesizing polynomials of degree more than 1, all (a1, . . . , an) ∈ {0, 1}^n can be partitioned into a number of equivalence classes, and the weight constraint can be relaxed so that the sum of the weights of f(a1, . . . , an, r1, . . . , rm) over all (a1, . . . , an)'s in each equivalence class is equal to a fixed value derived from the target polynomial. Zhao and Qian [101] exploited this freedom to further reduce the circuit cost.
IV. APPLICATIONS

SC has been applied to a variety of application domains, including artificial neural networks (ANNs) [12], [14], [15], [17], [24], [39], [46], [93], [95], control systems [59], [100], reliability estimation [35], data mining [21], DSP [4], [18], [40], [48], [50], [54], [83], and the decoding of modern error-correcting codes (ECCs) [26], [30], [47], [63], [85], [86], [89], [90], [96], [97]. Most of these applications are characterized by the need for a large amount of arithmetic computation, which can leverage the simple circuitry provided by SC. They also have low precision requirements for the final results, which avoids the use of excessively long SNs to represent data values. In this section, we review four important applications for which SC has had some success: 1) filter design; 2) image processing; 3) LDPC decoding; and 4) ANNs.

A. Filter Design

The design of finite impulse response (FIR) filters is considered in [18] and [36]. A general M-tap FIR filter computes an output based on the M most recent inputs as follows:

Y[n] = H0 X[n] + H1 X[n − 1] + . . . + HM−1 X[n − M + 1]   (10)

where X[n] is the input signal, Y[n] is the output signal, and Hi is the ith filter coefficient. The FIR filter thus computes the inner product of two vectors [see (2)]. A conventional binary implementation of (10) requires M multipliers and M − 1 adders, which has high hardware complexity. SC-based designs can potentially mitigate this problem.

Since the values of H, X, and Y may be negative, bipolar SNs are used to encode them. A straightforward way to implement (10) uses M XNOR gates for the multiplications and an M-to-1 MUX for the additions. However, this implementation has the problem that the output of the MUX is 1/M times the desired output. Such down-scaling causes severe accuracy loss when M is large.

To address the foregoing problem, a stochastic design based on an uneven-weighted MUX tree has been proposed [18], [36]. Fig. 11 shows such a design for a five-tap FIR filter. [Fig. 11: Stochastic implementation of a five-tap FIR filter with an uneven-weighted MUX tree.] The input Sign(Hi) is a stream of bits, each equal to the sign bit of Hi in its 2s-complement binary representation. The probability for the select input of each MUX is shown in the figure. The output probability of the design is Y[n]/Σ_{i=0}^{4} |Hi|. In the general case, the output probability of an uneven-weighted MUX tree is Y[n]/Σ_{i=0}^{M−1} |Hi|. Note that the scaling factor is reduced, since Σ_{i=0}^{M−1} |Hi| ≤ M. In the case where Σ_{i=0}^{M−1} |Hi| < 1, the proposed design will even scale up the result.

Although the datapath of the stochastic FIR filter consists of just a few logic gates, as shown in Fig. 11, the interface SNGs (not shown) may occupy a large area, offsetting the potential area benefit brought by the simple datapath. To further reduce the area of the SNGs, techniques for sharing the RNSs used in the SNGs, and for circularly shifting the outputs of an RNS to generate multiple random numbers with low correlation, are proposed in [36].
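A behavioral Python sketch of the uneven-weighted idea: on each clock cycle one tap is selected with probability |Hi|/Σj |Hj|, and the selected data bit is complemented when Hi is negative (complementing a bipolar stream negates its value), so the output SN carries Y[n]/Σi |Hi| in bipolar form. For clarity we collapse the MUX tree into a single weighted selection; the hardware of Fig. 11 realizes the same distribution with a tree of 2-way MUXes, and its sign handling may differ in detail.

```python
import random

def bipolar_sng(v, n):
    """Bipolar SN: P(bit = 1) = (v + 1)/2 encodes v in [-1, 1]."""
    p = (v + 1) / 2
    return [1 if random.random() < p else 0 for _ in range(n)]

def fir_bit_stream(H, X_streams, n):
    """Pick tap i with probability |H[i]|/sum|H|; complement the
    data bit when H[i] < 0 to multiply by the coefficient's sign."""
    total = sum(abs(h) for h in H)
    weights = [abs(h) / total for h in H]
    signs = [0 if h >= 0 else 1 for h in H]
    out = []
    for t in range(n):
        i = random.choices(range(len(H)), weights=weights)[0]
        out.append(X_streams[i][t] ^ signs[i])
    return out

N = 200_000
H = [0.3, -0.2, 0.1, 0.25, -0.15]    # tap coefficients
x = [0.5, -0.4, 0.8, 0.1, -0.9]      # current window of input values
Xs = [bipolar_sng(v, N) for v in x]
z = fir_bit_stream(H, Xs, N)
scale = sum(abs(h) for h in H)
y_hat = (2 * sum(z) / N - 1) * scale  # undo bipolar coding and scaling
print(y_hat, sum(h * v for h, v in zip(H, x)))  # should be close
```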
Area-efficient stochastic designs for the discrete Fourier transform (DFT) and the fast Fourier transform (FFT), which are important transformation techniques between the time and frequency domains, are described in [99]. An M-point DFT for discrete signals X[n] (n = 0, 1, . . . , M − 1) computes the frequency-domain values Y[k] (k = 0, 1, . . . , M − 1) as follows:

Y[k] = Σ_{n=0}^{M−1} X[n] WM^{kn}

where WM = e^{−j(2π/M)}. The FFT is an efficient way to realize the DFT by using a butterfly architecture [70].

The basic DFT computation resembles that of an FIR filter. Although the technique of the uneven-weighted MUX tree can be applied [98], the accuracy of the result degrades as the number of points becomes larger, due to the growing scaling factor. To address this problem, the work in [99] proposes a scaling-free stochastic adder based on a two-line stochastic encoding scheme [94]. This encoding represents a value in the interval [−1, 1] by a magnitude stream M(X) and a sign stream S(X). Fig. 12(a) shows an example of encoding the value −0.5. [Fig. 12: Two-line stochastic encoding. (a) Example of encoding the value −0.5. (b) Multiplier for the encoding.] Indeed, this encoding can be viewed as employing a ternary stochastic stream X = x1 x2 . . . xN with each xi ∈ {−1, 0, 1}. The magnitude and the sign of xi are represented by the ith bit of the magnitude stream and of the sign stream, respectively. If the sign bit is 0 (1), the value is positive (negative). Fig. 12(b) shows the multiplier for this encoding. Experimental results indicate that using the stochastic multiplier and the special stochastic adder to implement the DFT/FFT achieves much higher accuracy than an implementation based on the uneven-weighted MUX tree when the number of points M is large.
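The following sketch shows how the two-line encoding composes under multiplication, assuming (consistent with the ternary view above) that each digit xi splits into a magnitude bit and a sign bit, that digits multiply element-wise, and that the represented value is the mean of the digits. The digit-level multiplier here, an AND of magnitudes with an XOR of signs, is our reconstruction for illustration, not necessarily the exact circuit of Fig. 12(b).

```python
import random

def two_line(v, n):
    """Encode v in [-1, 1] as (magnitude, sign) bit-streams whose
    ternary digits m_i * (-1)^s_i average to v."""
    mag = [1 if random.random() < abs(v) else 0 for _ in range(n)]
    sign = [1 if v < 0 else 0] * n
    return mag, sign

def multiply(a, b):
    (ma, sa), (mb, sb) = a, b
    mag = [p & q for p, q in zip(ma, mb)]    # nonzero only if both nonzero
    sign = [p ^ q for p, q in zip(sa, sb)]   # signs multiply via XOR
    return mag, sign

def value(stream):
    mag, sign = stream
    return sum(m * (-1 if s else 1) for m, s in zip(mag, sign)) / len(mag)

N = 100_000
x, y = two_line(-0.5, N), two_line(0.6, N)
print(value(multiply(x, y)))   # close to -0.3
```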
C. LDPC Decoding

The earliest stochastic decoder was proposed for binary LDPC codes (for simplicity, hereafter referred to as LDPC codes), which have very efficient decoding performance that approaches the Shannon capacity limit [81]. They have been adopted in several recent digital communication standards, such as the DVB-S2, IEEE 802.16e (WiMAX), and IEEE 802.11n (WiFi) standards.

A binary LDPC code is characterized by a bipartite factor graph consisting of two groups of nodes: 1) variable nodes (VNs) and 2) parity-check nodes (PNs). A widely used method to decode an LDPC code applies the sum-product algorithm (SPA) to the factor graph. The SPA iteratively passes a probability value, which represents the belief that a bit in the code block is 1, from a VN to a connected PN, or vice versa. The codeword is determined by comparing the final probabilities against a threshold.

The major computation in the decoder involves the following two operations on probabilities:

pC = pA(1 − pB) + pB(1 − pA)   (12)

pZ = pX pY / [pX pY + (1 − pX)(1 − pY)].   (13)

A binary implementation of (12) and (13) requires complicated arithmetic circuits, such as adders, multipliers, and dividers. To alleviate this problem, Gaudet and Rapley [30] proposed a stochastic implementation of LDPC decoding in which (12) and (13) are realized efficiently by the circuits in Fig. 4(a) and (b), respectively.
To alleviate this problem, Gaudet and Rapley [30] proposed
A widely used type of ANN is the feed-forward network
a stochastic implementation of LDPC decoding in which (12)
shown in Fig. 15 [37]. It is composed of an input layer, several
and (13) are realized efficiently by the circuits in Fig. 4(a)
hidden layers, and an output layer. A node in the network is
and (b), respectively.
referred to as a neuron. Each neuron in a hidden or an output
Besides reducing the area of the processing units, SC also
layer is connected to a number of neurons in the previous layer
reduces routing area. In a conventional binary implementa-
via weighted edges. The output 0 (inactive) or 1 (active) of
tion, the communication of probability values of precision k
a neuron is determined by applying an activation function to
between two nodes requires k wires connecting the two nodes,
the weighted sum of its inputs. For example, the output of the
which leads to a large routing area. However, with SC, due to
neuron Y1 in Fig. 15 is given by
its bit-serial nature, communication between two nodes only
n
requires a single wire. Another benefit of SC is its support
of an asynchronous pipeline. In SN representation, bit order Y1 = F Wi Xi (14)
does not matter, so we do not require the input of the PNs i=1
and VNs to be the output bits of the immediately previous where Xi is the signal produced by the ith input neuron of
cycle. This allows different edges to use different numbers Y1 , Wi is the weight of the edge from Xi to Y1 , and F(Z) is
of pipeline stages, thus increasing the clock frequency and the activation function. A frequent choice for F is the sigmoid
throughput [89]. function defined by
To improve the SPA convergence rate, Tehrani et al. [89]
added a module called edge memory (EM) to each edge in 1
F(Z) =
the factor graph. Since one EM is assigned to each edge, 1 + e−βZ
the hardware usage of EMs can be large. To further reduce where β is the slope parameter.
this hardware cost, Tehrani et al. [90] introduced a module A key problem in ANN design is the addition of a large
called a majority-based tracking forecast memory, which is number of items supplied to a neuron; a similar problem occurs
assigned to each VN. This method has been integrated into in FIR filters with a large number of taps. The straightforward
a fully parallel stochastic decoder ASIC that decodes the use of MUX-based adders to perform the scaled addition is not
(2048, 1723) LDPC code from the IEEE 802.3an (10GBASE- a good solution, because the scaling factor is proportional to
T) standard [90]. This decoder turns out to be one of the most the number of a neuron’s connections. When rescaling the final
area-efficient fully parallel LDPC decoders. MUX output, even a very small error due to stochastic vari-
Stochastic LDPC decoders essentially implement the belief ation may be enlarged significantly. To address this problem,
propagation algorithm [73]. This fundamental approach can Li et al. [51] revived the old idea of using an OR gate as
also be used to decode other ECCs, such as polar codes and an adder [29]. OR combines two unipolar SNs X and Y as
nonbinary LDPC codes. Given their algorithm-level similar- follows:
ity, researchers have proposed SC-based decoders for these
codes [85], [86], [96], [97]. For example, to resolve the slow Z = X + Y − XY.
1526 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO. 8, AUGUST 2018
most of the input data can be classified easily because they are
far from the decision boundary. For these input data, compu-
tation with low-precision SNs is enough to obtain the correct
results. Based on this, the authors devised an early decision ter-
mination (EDT) strategy which adaptively selects the number
of bits to use in the computation depending on the difficulty of
the classification tasks. The resulting design has a misclassifi-
cation error rate very close to the conventional implementation.
Furthermore, EDT reduces energy consumption with a slight
increase in misclassification errors.
Efficient stochastic implementation of convolutional neu-
ral networks (CNNs), a special type of feed-forward ANN,
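A quick numerical check of the OR-gate adder in and out of its small-input regime; the factor-of-10 down-scaling is an arbitrary illustration of the trick from [51].

```python
import random

def sn(p, n):
    return [1 if random.random() < p else 0 for _ in range(n)]

N = 200_000
for x, y in [(0.4, 0.3), (0.04, 0.03)]:   # raw vs. pre-scaled inputs
    z = [a | b for a, b in zip(sn(x, N), sn(y, N))]
    print(x + y, sum(z) / N)   # OR computes x + y - x*y
# (0.4, 0.3):   0.7 vs ~0.58   -- large error
# (0.04, 0.03): 0.07 vs ~0.0688 -- near-exact after scaling down by 10
```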
Some other studies have addressed the addition problem with new stochastic data representations [11], [17]. In [17], an encoding scheme called extended stochastic logic (ESL) is proposed which uses two bipolar SNs X and Y to represent the number X/Y. ESL addition has the advantage of being exact, with no scaling factor. Moreover, ESL encoding allows easy implementation of multiplication, division, and the sigmoid function. Together, these operations lead to an efficient neuron design.

Ardakani et al. [11] have proposed the concept of the integer SN (ISN), in which a sequence of random integers represents a value equal to the mean of those integers. For example, the sequence 2, 0, 4, 1 represents 7/4. With this encoding, any real number can be represented without prior scaling. The weights in an ANN, which can lie outside the range [−1, 1], do not need to be scaled. The addition of two ISNs uses a conventional binary adder, which makes the sum exact. Multiplication of two ISNs requires a conventional binary multiplier, which is expensive. Fortunately, in the ANN implementation proposed in [11], one input to the multiplier, which corresponds to the neuron signal, is always a binary SN. The conventional multiplier then reduces to several AND gates. The sigmoid activation function is implemented by a counter similar to that in [14]. Although the hardware cost of the ISN implementation is larger than that of a binary stochastic implementation, the former has much lower latency and energy consumption. Compared to the conventional binary design, the ISN design produces fewer misclassification errors, while reducing energy and area costs substantially.
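The ISN idea is easy to state in code: the represented value is the mean of an integer sequence, addition is an element-wise integer add (exact), and multiplying by a binary {0, 1} stream reduces to gating. The 2, 0, 4, 1 example is from the text; the generator used for arbitrary values below is our own illustrative choice.

```python
import math, random

def isn_value(seq):
    return sum(seq) / len(seq)

print(isn_value([2, 0, 4, 1]))   # 7/4, the example from the text

def isn(v, n):
    """Illustrative ISN generator: mix floor(v) and ceil(v) so the
    sequence mean converges to v, with no pre-scaling needed."""
    lo, hi = math.floor(v), math.ceil(v)
    frac = v - lo
    return [hi if random.random() < frac else lo for _ in range(n)]

N = 100_000
a, b = isn(1.75, N), isn(2.5, N)
total = [x + y for x, y in zip(a, b)]    # exact add: binary adder per element
print(isn_value(total))                  # ~4.25
gate = [1 if random.random() < 0.5 else 0 for _ in range(N)]
prod = [x * g for x, g in zip(a, gate)]  # ISN x binary SN: reduces to gating
print(isn_value(prod))                   # ~0.875 = 1.75 * 0.5
```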
Another recent work [41] proposes two new ways to design ANNs with SC. The first considers training in the design phase to make the network friendly to a stochastic implementation. The authors observed that weights close to zero, which correspond to (bipolar) SNs of probability 0.5, contribute the most to random fluctuation errors. Therefore, they proposed to iteratively drop near-zero weights and then retrain the network, deriving a network with high classification accuracy but no near-zero weights. The second technique exploits the progressive precision property of SC. The authors observed that most input data can be classified easily because they are far from the decision boundary. For such inputs, computation with low-precision SNs is enough to obtain the correct results. Based on this, the authors devised an early decision termination (EDT) strategy that adaptively selects the number of bits to use in the computation, depending on the difficulty of the classification task. The resulting design has a misclassification error rate very close to that of the conventional implementation. Furthermore, EDT reduces energy consumption with only a slight increase in misclassification errors.

Efficient stochastic implementation of convolutional neural networks (CNNs), a special type of feed-forward ANN, is the focus of [10]. In a CNN, the signals of all the neurons in a layer are obtained by first convolving a kernel with the signals in the input layer (a special kind of filtering operation) and then applying an activation function. The size of the kernel is much less than that of the input layer, which means a neuron signal depends only on a subset of the neurons in its input layer. CNNs have been successfully applied to machine learning tasks such as face and speech recognition. A major contribution of [10] is an efficient stochastic implementation of the convolution operation. Unlike SC that uses SNs to encode real values, the proposed method uses the probability mass function of a random variable to represent an array of real values. An efficient implementation of convolution is developed based on this representation. Furthermore, a few other techniques are introduced in [10] to implement other components of a CNN, such as the pooling and nonlinear activation components. Compared to a conventional binary CNN, the proposed SC implementation achieves large improvements in performance and power efficiency.

Efficient stochastic implementation of CNNs has also been studied by Ren et al. [80]. They performed a comprehensive study of SC operators and how they should be optimized to obtain energy-efficient CNNs. Ren et al. [80] adopted the approximate APC of [42] to add a large number of input stochastic bit-streams. Kim et al. [42] reported that the approximate APC has negligible accuracy loss and is about 40% smaller than the exact APC.

V. DISCUSSION

Since the turn of the present century, significant progress has been made in developing the theory and application of SC. New questions and challenges have emerged, many of which still need to be addressed. With the notable exception of LDPC decoder chips, few large-scale SC-based systems have actually been built and evaluated. As a result, real-world experience with SC is limited, making it likely that many practical aspects of SC, such as its true design costs, run-time performance, and energy consumption, are not yet fully appreciated. Small-scale theoretical and simulation-based studies are fairly plentiful, but they often consider only a narrow range of issues under restrictive assumptions.

A. Conclusion

Based on what is now known, we can draw some general conclusions about what SC is, and is not, good for.
1) Precision and Errors: SC is inherently approximate and inexact. Its probability-based and redundant data encoding makes it a relatively low-precision technology, but one that is very tolerant of errors. It has been successfully applied to image processing using 256-bit SNs, which correspond roughly to 8-bit (fixed-point) BNs. SC is unsuited to the very high-precision (32- or 64-bit), error-sensitive calculations that are the domain of BNs and BC. This is seen in the random noise-like fluctuations that are normal to SNs, in the way SNs are squeezed into the unit interval, producing errors near the boundaries, and in the fact that SNs grow in length exponentially faster than BNs as the desired level of precision increases. Also, the stochastic encoding of numbers does not provide a dynamic range similar to the one provided by floating-point numbers.

On the other hand, low precision and error tolerance have definite advantages. They have evolved in the natural world for use by the human brain and nervous system. Similar features are increasingly seen in artificial constructs like deep learning networks that aim to mimic brain operations [23]. Thus it seems pointless to compare SC and BC purely on the basis of precision or precision-related costs alone [1], [58].

Finally, we observe that while BC circuits have fixed precision, SC circuits have the advantage of inherently variable precision in their bit-streams. Moreover, the bit-streams can be endowed with progressive precision, where accuracy improves monotonically as computation proceeds, as has been demonstrated for some image-processing tasks [4]. If variable precision cannot be exploited, a simple bit-reduction technique in BC often provides better energy efficiency than SC. As reported in recent work, with fixed precision, SC becomes worse for designs above 6 bits of precision [1], [46].

2) Area-Related Costs: The use of tiny circuits for operations like multiplication and addition remains SC's strongest selling point. A stochastic multiplier contains orders-of-magnitude fewer gates than a typical BC multiplier. However, many arithmetic operations, including multiplication, require uncorrelated inputs to function correctly. This implies a need for randomization or decorrelation circuits incorporating many independent random sources or phase-shifting delay elements (isolators), whose combined area can easily exceed that of the arithmetic logic [92]. The low-power benefit of stochastic components must be weighed against the additional power consumed by their randomization circuitry.

3) Speed-Related Costs: Perhaps the clearest drawback of SC is its need for long, multicycle SNs to generate satisfactory results. This leads to long run times, which are compensated for, in part, by the fact that the clock cycles tend to be very short. Parallel processing, where long bit-streams are partitioned into segments that are processed in parallel, is a speed-up possibility that has often been proposed but not been studied much [21]. The same can be said of progressive precision.

Small stochastic circuits have relatively low power consumption. However, since energy = power × time, the longer run times of stochastic circuits can lead to higher energy use than their BC counterparts [62]. Reducing energy usage is therefore emerging as a significant challenge for SC.

4) Design Issues: Until recently, SC design was an ad hoc process with little theory to guide it. However, thanks to a deeper understanding of the properties of stochastic functions and circuits, several general synthesis techniques have been developed, which can variously be classified as reconfigurable or fixed, and combinational or sequential [7], [49], [76]. The new understanding has revealed unexpected and novel solutions to some of SC's basic problems.

For example, it has come to be recognized that different circuits realizing different logic functions can have the same stochastic behavior [19]. Far from just being the enemy, correlation can sometimes be harnessed as a design resource to reduce circuit size and cost, as the edge detectors of Fig. 13(a) vividly illustrate. Common circuits like the MUX-based scaled adder turn out to have a correlation insensitivity that enables RNSs to be removed or shared (see Fig. 8). A fundamental redesign of the SC scaled adder itself is shown in Fig. 16, which converts it from a three-input to a two-input element while improving both its accuracy and its correlation properties [46]. Despite such progress, many questions concerning the properties of stochastic circuits that influence design requirements remain unanswered.

5) Circuit-Level Aspects: Since SC employs digital components, conventional digital design processes (synthesis, automatic placement and routing, timing closure, etc.) have been used to implement SC ASIC and FPGA-based designs. However, as discussed in this paper, SC shares similarities with analog circuits, so its digital design aspects may differ from those of conventional digital circuits.

Various circuit-level aspects of SC designs have been investigated very recently as a means of improving SC's energy efficiency [9], [65]. They suggest that SC circuits are probably not optimal if they are designed using standard digital design tools. Najafi et al. [65] demonstrated that SC circuits do not need clock trees. Eliminating the clock tree significantly reduces the energy consumption of the circuit. In fact, employing analog components, rather than digital ones, can lead to significant energy savings [66]. One example is the use of analog integrators, instead of counters, to collect the computation results.

Alaghi et al. [9] have investigated a different circuit-level aspect of SC. They showed that SC's inherent error tolerance makes it robust against errors caused by voltage overscaling. Voltage overscaling, i.e., reducing the supply voltage of a circuit without reducing its clock frequency, usually leads to critical-path timing failures and catastrophic errors in regular digital circuits. However, timing violations in SC manifest as extra or missing pulses in the output SN. The extra and missing pulses tend to cancel each other out, leading to negligible error. An optimization method is described in [9] that balances the circuit paths to guarantee maximum error cancellation. It is worth noting that the observations of [9] have been confirmed through a fabricated chip.

The new results suggest that circuit-level aspects of SC must be considered at design time, as they provide valuable
REFERENCES

[1] J. M. de Aguiar and S. P. Khatri, "Exploring the viability of stochastic computing," in Proc. Int. Conf. Comput. Design (ICCD), New York, NY, USA, 2015, pp. 391–394.
[2] A. Alaghi and J. P. Hayes, "A spectral transform approach to stochastic circuits," in Proc. Int. Conf. Comput. Design (ICCD), Montreal, QC, Canada, 2012, pp. 315–321.
[3] A. Alaghi and J. P. Hayes, "Survey of stochastic computing," ACM Trans. Embedded Comput. Syst., vol. 12, no. 2, pp. 1–19, May 2013.
[4] A. Alaghi, C. Li, and J. P. Hayes, "Stochastic circuits for real-time image-processing applications," in Proc. Design Autom. Conf. (DAC), Austin, TX, USA, 2013, pp. 1–6.
[5] A. Alaghi and J. P. Hayes, "Exploiting correlation in stochastic circuit design," in Proc. Int. Conf. Comput. Design (ICCD), Asheville, NC, USA, Oct. 2013, pp. 39–46.
[6] A. Alaghi and J. P. Hayes, "Fast and accurate computation using stochastic circuits," in Proc. Design Autom. Test Europe Conf. (DATE), Dresden, Germany, 2014, pp. 1–4.
[7] A. Alaghi and J. P. Hayes, "STRAUSS: Spectral transform use in stochastic circuit synthesis," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 11, pp. 1770–1783, Nov. 2015.
[8] A. Alaghi and J. P. Hayes, "Dimension reduction in statistical simulation of digital circuits," in Proc. Symp. Theory Model. Simulat. (TMS/DEVS), Alexandria, VA, USA, 2015, pp. 1–8.
[9] A. Alaghi, W.-T. J. Chan, J. P. Hayes, A. B. Kahng, and J. Li, "Optimizing stochastic circuits for accuracy-energy tradeoffs," in Proc. ICCAD, Austin, TX, USA, 2015, pp. 178–185.
[10] M. Alawad and M. Lin, "Stochastic-based deep convolutional networks with reconfigurable logic fabric," IEEE Trans. Multi-Scale Comput. Syst., vol. 2, no. 4, pp. 242–256, Oct./Dec. 2016.
[11] A. Ardakani, F. Leduc-Primeau, N. Onizawa, T. Hanyu, and W. J. Gross, "VLSI implementation of deep neural network using integral stochastic computing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 10, pp. 2688–2699, Oct. 2017.
[12] S. L. Bade and B. L. Hutchings, "FPGA-based stochastic neural networks-implementation," in Proc. IEEE Workshop FPGAs Custom Comput. Mach., Napa County, CA, USA, 1994, pp. 189–198.
[13] D. Braendler, T. Hendtlass, and P. O'Donoghue, "Deterministic bit-stream digital neurons," IEEE Trans. Neural Netw., vol. 13, no. 6, pp. 1514–1525, Nov. 2002.
[14] B. D. Brown and H. C. Card, "Stochastic neural computation I: Computational elements," IEEE Trans. Comput., vol. 50, no. 9, pp. 891–905, Sep. 2001.
[15] B. D. Brown and H. C. Card, "Stochastic neural computation II: Soft competitive learning," IEEE Trans. Comput., vol. 50, no. 9, pp. 906–920, Sep. 2001.
[16] A. W. Burks, H. H. Goldstine, and J. von Neumann, Preliminary Discussion of the Logical Design of an Electronic Computing Instrument. Princeton, NJ, USA: Inst. Adv. Study, Jan. 1946.
[17] V. Canals, A. Morro, A. Oliver, M. L. Alomar, and J. L. Rosselló, "A new stochastic computing methodology for efficient neural network implementation," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 3, pp. 551–564, Mar. 2016.
[18] Y.-N. Chang and K. K. Parhi, "Architectures for digital filters using stochastic computing," in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), Vancouver, BC, Canada, 2013, pp. 2697–2701.
[19] T.-H. Chen and J. P. Hayes, "Equivalence among stochastic logic circuits and its application," in Proc. Design Autom. Conf. (DAC), San Francisco, CA, USA, 2015, pp. 131–136.
[20] T.-H. Chen and J. P. Hayes, "Design of division circuits for stochastic computing," in Proc. IEEE Symp. VLSI (ISVLSI), Pittsburgh, PA, USA, 2016, pp. 116–121.
[21] V. K. Chippa, S. Venkataramani, K. Roy, and A. Raghunathan, "StoRM: A stochastic recognition and mining processor," in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2014, pp. 39–44.
[22] S. S. Choi, S. H. Cha, and C. Tappert, "A survey of binary similarity and distance measures," J. Syst. Cybern. Informat., vol. 8, no. 1, pp. 43–48, 2010.
[23] M. Courbariaux, Y. Bengio, and J.-P. David, "BinaryConnect: Training deep neural networks with binary weights during propagations," in Proc. Int. Conf. Neural Inf. Process. Syst. (NIPS), Montreal, QC, Canada, 2015, pp. 3123–3131.
[24] J. A. Dickson, R. D. McLeod, and H. C. Card, "Stochastic arithmetic implementations of neural networks with in situ learning," in Proc. Int. Conf. Neural Netw., San Francisco, CA, USA, 1993, pp. 711–716.
[25] Y. Ding, Y. Wu, and W. Qian, "Generating multiple correlated probabilities for MUX-based stochastic computing architecture," in Proc. ICCAD, San Jose, CA, USA, 2014, pp. 519–526.
[26] Q. T. Dong, M. Arzel, C. Jego, and W. J. Gross, "Stochastic decoding of turbo codes," IEEE Trans. Signal Process., vol. 58, no. 12, pp. 6421–6425, Dec. 2010.
[27] D. Fick, G. Kim, A. Wang, D. Blaauw, and D. Sylvester, "Mixed-signal stochastic computation demonstrated in an image sensor with integrated 2D edge detection and noise filtering," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), San Jose, CA, USA, 2014, pp. 1–4.
[28] B. R. Gaines, "Stochastic computing," in Proc. AFIPS Spring Joint Comput. Conf., 1967, pp. 149–156.
[29] B. R. Gaines, "Stochastic computing systems," in Advances in Information Systems Science, vol. 2, J. T. Tou, Ed. Boston, MA, USA: Springer-Verlag, 1969, pp. 37–172.
[30] V. C. Gaudet and A. C. Rapley, "Iterative decoding using stochastic computation," Electron. Lett., vol. 39, no. 3, pp. 299–301, Feb. 2003.
[31] W. Gerstner and W. M. Kistler, Spiking Neuron Models. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[32] S. W. Golomb, Shift Register Sequences. Laguna Hills, CA, USA: Aegean Park Press, 1982.
[33] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
[34] P. K. Gupta and R. Kumaresan, "Binary multiplication with PN sequences," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 4, pp. 603–606, Apr. 1988.
[35] J. Han et al., "A stochastic computational approach for accurate and efficient reliability evaluation," IEEE Trans. Comput., vol. 63, no. 6, pp. 1336–1350, Jun. 2014.
[36] H. Ichihara, T. Sugino, S. Ishii, T. Iwagaki, and T. Inoue, "Compact and accurate digital filters based on stochastic computing," IEEE Trans. Emerg. Topics Comput., to be published. [Online]. Available: http://ieeexplore.ieee.org/document/7565493/
[37] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[38] D. Jenson and M. Riedel, "A deterministic approach to stochastic computation," in Proc. Int. Conf. Comput.-Aided Design (ICCAD), Austin, TX, USA, 2016, pp. 1–8.
[39] Y. Ji, F. Ran, C. Ma, and D. J. Lilja, "A hardware implementation of a radial basis function neural network using stochastic logic," in Proc. Design Autom. Test Europe Conf. (DATE), Grenoble, France, 2015, pp. 880–883.
[40] H. Jiang, C. Shen, P. Jonker, F. Lombardi, and J. Han, "Adaptive filter design using stochastic circuits," in Proc. IEEE Symp. VLSI (ISVLSI), Pittsburgh, PA, USA, 2016, pp. 122–127.
[41] K. Kim et al., "Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks," in Proc. Design Autom. Conf. (DAC), Austin, TX, USA, 2016, Art. no. 124.
[42] K. Kim, J. Lee, and K. Choi, "Approximate de-randomizer for stochastic circuits," in Proc. Int. SoC Design Conf., 2015, pp. 123–124.
[43] P. Knag, W. Lu, and Z. Zhang, "A native stochastic computing architecture enabled by memristors," IEEE Trans. Nanotechnol., vol. 13, no. 2, pp. 283–293, Mar. 2014.
[44] D. E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd ed. Redwood City, CA, USA: Addison-Wesley, 1998.
[45] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, May 2015.
[46] V. T. Lee, A. Alaghi, J. P. Hayes, V. Sathe, and L. Ceze, "Energy-efficient hybrid stochastic-binary neural networks for near-sensor computing," in Proc. Design Autom. Test Europe Conf. (DATE), Lausanne, Switzerland, 2017, pp. 13–18.
[47] X.-R. Lee, C.-L. Chen, H.-C. Chang, and C.-Y. Lee, "A 7.92 Gb/s 437.2 mW stochastic LDPC decoder chip for IEEE 802.15.3c applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 2, pp. 507–516, Feb. 2015.
[48] P. Li and D. J. Lilja, "Using stochastic computing to implement digital image processing algorithms," in Proc. Int. Conf. Comput. Design (ICCD), Amherst, MA, USA, 2011, pp. 154–161.
[49] P. Li, D. J. Lilja, W. Qian, K. Bazargan, and M. Riedel, "The synthesis of complex arithmetic computation on stochastic bit streams using sequential logic," in Proc. Int. Conf. Comput.-Aided Design (ICCAD), San Jose, CA, USA, 2012, pp. 480–487.
[50] P. Li, D. J. Lilja, W. Qian, K. Bazargan, and M. D. Riedel, "Computation on stochastic bit streams: Digital image processing case studies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 449–462, Mar. 2014.
[51] B. Li, M. H. Najafi, and D. J. Lilja, "Using stochastic computing to reduce the hardware requirements for a restricted Boltzmann machine classifier," in Proc. Int. Symp. FPGA, Monterey, CA, USA, 2016, pp. 36–41.
[52] P. Li, W. Qian, M. D. Riedel, K. Bazargan, and D. J. Lilja, "The synthesis of linear finite state machine-based stochastic computational elements," in Proc. Asia South Pac. Design Autom. Conf. (ASP-DAC), Sydney, NSW, Australia, 2012, pp. 757–762.
[53] Y. Liu and K. K. Parhi, "Architectures for stochastic normalized and modified lattice IIR filters," in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, 2015, pp. 1351–1381.
[54] Y. Liu and K. K. Parhi, "Architectures for recursive digital filters using stochastic computing," IEEE Trans. Signal Process., vol. 64, no. 14, pp. 3705–3718, Jul. 2016.
[55] G. G. Lorentz, Bernstein Polynomials, 2nd ed. New York, NY, USA: AMS Chelsea, 1986.
[56] K. Ma et al., "Architecture exploration for ambient energy harvesting nonvolatile processors," in Proc. Int. Symp. High Perform. Comput. Archit. (HPCA), Burlingame, CA, USA, 2015, pp. 526–537.
[57] W. Maass and C. M. Bishop, Eds., Pulsed Neural Networks. Cambridge, MA, USA: MIT Press, 1999.
[58] R. Manohar, "Comparing stochastic and deterministic computing," IEEE Comput. Archit. Lett., vol. 14, no. 2, pp. 119–122, Jul./Dec. 2015.
[59] S. L. T. Marin, J. M. Q. Reboul, and L. G. Franquelo, "Digital stochastic realization of complex analog controllers," IEEE Trans. Ind. Electron., vol. 49, no. 5, pp. 1101–1109, Oct. 2002.
[60] P. Mars and W. J. Poppelbaum, Stochastic and Deterministic Averaging Processors. London, U.K.: Peter Peregrinus, 1981.
[61] S.-J. Min, E.-W. Lee, and S.-I. Chae, "A study on the stochastic computation using the ratio of one pulses and zero pulses," in Proc. Int. Symp. Circuits Syst. (ISCAS), London, U.K., 1994, pp. 471–474.
[62] B. Moons and M. Verhelst, "Energy-efficiency and accuracy of stochastic computing circuits in emerging technologies," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 4, no. 4, pp. 475–486, Dec. 2014.
[63] A. Naderi, S. Mannor, M. Sawan, and W. J. Gross, "Delayed stochastic decoding of LDPC codes," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5617–5626, Nov. 2011.
[64] M. H. Najafi et al., "Time-encoded values for highly efficient stochastic circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1644–1657, May 2017.
[65] M. H. Najafi, D. J. Lilja, M. Riedel, and K. Bazargan, "Polysynchronous stochastic circuits," in Proc. Asia South Pac. Design Autom. Conf. (ASP-DAC), Macau, China, 2016, pp. 492–498.
[66] M. H. Najafi and D. J. Lilja, "High-speed stochastic circuits using synchronous analog pulses," in Proc. Asia South Pac. Design Autom. Conf. (ASP-DAC), 2017, pp. 481–487.
[67] M. H. Najafi and M. E. Salehi, "A fast fault-tolerant architecture for Sauvola local image thresholding algorithm using stochastic computing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 2, pp. 808–812, Feb. 2016.
[68] N. Nedjah and L. de Macedo Mourelle, "Stochastic reconfigurable hardware for neural networks," in Proc. Euromicro Conf. Digit. Syst. Design (DSD), 2003, pp. 438–442.
[69] F. Neugebauer, I. Polian, and J. P. Hayes, "Framework for quantifying and managing accuracy in stochastic circuit design," in Proc. Design Autom. Test Europe Conf. (DATE), Lausanne, Switzerland, 2017, pp. 1–6.
[70] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals & Systems, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[71] B. Parhami and C.-H. Yeh, "Accumulative parallel counters," in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, 1995, pp. 966–970.
[72] K. K. Parhi and Y. Liu, "Architectures for IIR digital filters using stochastic computing," in Proc. Int. Symp. Circuits Syst. (ISCAS), Melbourne, VIC, Australia, 2014, pp. 373–376.
[73] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA, USA: Morgan Kaufmann, 1988.
[74] W. J. Poppelbaum, C. Afuso, and J. W. Esch, "Stochastic computing elements and systems," in Proc. AFIPS Fall Joint Comput. Conf., Anaheim, CA, USA, 1967, pp. 635–644.
[75] W. Qian, "Digital yet deliberately random: Synthesizing logical computation on stochastic bit streams," Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Minnesota, Minneapolis, MN, USA, 2011.
[76] W. Qian, X. Li, M. D. Riedel, K. Bazargan, and D. J. Lilja, "An architecture for fault-tolerant computation with stochastic logic," IEEE Trans. Comput., vol. 60, no. 1, pp. 93–105, Jan. 2011.
[77] W. Qian and M. D. Riedel, "The synthesis of robust polynomial arithmetic with stochastic logic," in Proc. Design Autom. Conf. (DAC), Anaheim, CA, USA, 2008, pp. 648–653.
[78] W. Qian, M. D. Riedel, and I. Rosenberg, "Uniform approximation and Bernstein polynomials with coefficients in the unit interval," Eur. J. Combinatorics, vol. 32, no. 3, pp. 448–463, 2011.
[79] W. Qian, M. D. Riedel, H. Zhou, and J. Bruck, "Transforming probabilities with combinational logic," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 9, pp. 1279–1292, Sep. 2011.
[80] A. Ren et al., "SC-DCNN: Highly-scalable deep convolutional neural network using stochastic computing," in Proc. Int. Conf. Archit. Support Program. Lang. Oper. Syst. (ASPLOS), Xi'an, China, 2017, pp. 405–418.
[81] T. J. Richardson and R. L. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[82] N. Saraf, K. Bazargan, D. J. Lilja, and M. D. Riedel, "Stochastic functions using sequential logic," in Proc. Int. Conf. Comput. Design (ICCD), Asheville, NC, USA, 2013, pp. 507–510.
[83] N. Saraf, K. Bazargan, D. J. Lilja, and M. Riedel, "IIR filters using stochastic arithmetic," in Proc. Design Autom. Test Europe Conf. (DATE), Dresden, Germany, 2014, pp. 1–6.
[84] N. Saraf and K. Bazargan, "Polynomial arithmetic using sequential stochastic logic," in Proc. Great Lakes Symp. VLSI (GLSVLSI), Boston, MA, USA, 2016, pp. 245–250.
[85] G. Sarkis and W. J. Gross, "Efficient stochastic decoding of non-binary LDPC codes with degree-two variable nodes," IEEE Commun. Lett., vol. 16, no. 3, pp. 389–391, Mar. 2012.
[86] G. Sarkis, S. Hemati, S. Mannor, and W. J. Gross, "Stochastic decoding of LDPC codes over GF(q)," IEEE Trans. Commun., vol. 61, no. 3, pp. 939–950, Mar. 2013.
[87] J. Sauvola and M. Pietikäinen, "Adaptive document image binarization," Pattern Recognit., vol. 33, no. 2, pp. 225–236, 2000.
[88] I. Schur, "Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind," J. für die Reine und Angewandte Mathematik, vol. 147, pp. 205–232, 1917.
[89] S. S. Tehrani, S. Mannor, and W. J. Gross, "Fully parallel stochastic LDPC decoders," IEEE Trans. Signal Process., vol. 56, no. 11, pp. 5692–5703, Nov. 2008.
[90] S. S. Tehrani et al., "Majority-based tracking forecast memories for stochastic LDPC decoding," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4883–4896, Sep. 2010.
[91] P.-S. Ting and J. P. Hayes, "Stochastic logic realization of matrix operations," in Proc. Euromicro Conf. Digit. Syst. Design (DSD), Verona, Italy, 2014, pp. 356–364.
[92] P.-S. Ting and J. P. Hayes, "Isolation-based decorrelation of stochastic circuits," in Proc. Int. Conf. Comput. Design (ICCD), Scottsdale, AZ, USA, 2016, pp. 88–95.
[93] J. E. Tomberg and K. K. K. Kaski, "Pulse-density modulation technique in VLSI implementations of neural network algorithms," IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1277–1286, Oct. 1990.
[94] S. L. Toral, J. M. Quero, and L. G. Franquelo, "Stochastic pulse coded arithmetic," in Proc. Int. Symp. Circuits Syst. (ISCAS), Geneva, Switzerland, 2000, pp. 599–602.
[95] D. E. Van Den Bout and T. K. Miller, III, "A digital architecture employing stochasticism for the simulation of Hopfield neural nets," IEEE Trans. Circuits Syst., vol. 36, no. 5, pp. 732–738, May 1989.
[96] B. Yuan and K. K. Parhi, "Successive cancellation decoding of polar codes using stochastic computing," in Proc. Int. Symp. Circuits Syst. (ISCAS), Lisbon, Portugal, 2015, pp. 3040–3043.
[97] B. Yuan and K. K. Parhi, "Belief propagation decoding of polar codes using stochastic computing," in Proc. Int. Symp. Circuits Syst. (ISCAS), Montreal, QC, Canada, 2016, pp. 157–160.
[98] B. Yuan, Y. Wang, and Z. Wang, "Area-efficient error-resilient discrete Fourier transformation design using stochastic computing," in Proc. Great Lakes Symp. VLSI (GLSVLSI), Boston, MA, USA, 2016, pp. 33–38.
[99] B. Yuan, Y. Wang, and Z. Wang, "Area-efficient scaling-free DFT/FFT design using stochastic computing," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 63, no. 12, pp. 1131–1135, Dec. 2016.
[100] D. Zhang and H. Li, "A stochastic-based FPGA controller for an induction motor drive with integrated neural network algorithms," IEEE Trans. Ind. Electron., vol. 55, no. 2, pp. 551–561, Feb. 2008.
[101] Z. Zhao and W. Qian, "A general design of stochastic circuit and its synthesis," in Proc. Design Autom. Test Europe Conf. (DATE), Grenoble, France, 2015, pp. 1467–1472.