SiPterposer: A Fault-Tolerant Substrate for Flexible System-in-Package Design
Pete Ehrett Todd Austin Valeria Bertacco
University of Michigan, Ann Arbor
{wpehrett, austin, valeria}@umich.edu

Abstract: As Moore’s Law scaling slows down, specialized heterogeneous designs are needed to sustain computing performance improvements. Unfortunately, the non-recurring engineering (NRE) costs of chip design – designing interconnects, creating masks, etc. – are often prohibitive. Chiplet-based disintegrated design solutions could address these economic issues, but current technologies lack the flexibility to express a rich variety of designs without redesigning the communication substrate. Moreover, as the number of chiplets increases, yield suffers due to 2.5D assembly defects. This work addresses these problems by presenting a flexible communication fabric that supports construction of arbitrary network topologies and provides robust fault-tolerance, demonstrating near-100% chip assembly yield at typical bonding defect rates. We achieve these goals with less than 3% additional power and zero exposed latency overhead for various real-world applications running on an example SiP.
Fig. 1: SiPterposer proposes an off-the-shelf solution to reduce the cost of custom SiPs by eliminating the need for custom interposer designs and slashing chiplet bonding yield loss.

I. INTRODUCTION

Application-specific hardware design is widely regarded as one of the most promising methods for continuing improvements in chip performance and power efficiency, particularly when Moore’s Law can no longer be relied upon as a primary driver of advancement [1]. Unfortunately, creating custom hardware requires a huge investment in development resources. First, per-design NRE costs are very high – a set of 16nm masks for a new design, for example, may cost nearly $6 million [2]. For high-volume chips, these costs can be amortized effectively, but they quickly become prohibitive for smaller-volume designs. Second, creating a custom chip often requires integrating common logic alongside custom, application-specific logic in a single monolithic die. Even though some soft IP components may be reused across many custom chips, which saves design time, there is no associated economy of scale in the fabrication process because each chip is still manufactured monolithically – thus, costs remain high.

Recently, research on reusable hardware design – building system components that can be reused across a wide variety of application-specific designs – has begun to address these problems. Some works target NRE costs, proposing architectures that compose application-specific hardware from arrays of generic functional units on a mass-produced chip [3]. Others focus on manufacturing costs, proposing systems in which chips are fabricated in small blocks, called ‘chiplets’, which are bonded in 2.5D to an interposer that delivers power, clock, etc. and provides inter-chiplet data wiring [4]. This System-in-Package (SiP) concept could enable solutions where resources used in many different designs are fabricated at high volume (hence, at lower per-unit cost) and integrated with smaller pieces of chip-specific custom logic on an interposer. Rather than incurring the high costs of designing and building a large monolithic die for each new chip, it would instead be possible to design and build only the smaller pieces of custom logic, saving time and money. Moreover, the chiplets in a given SiP need not even be manufactured at the same technology node, which could enable novel cost/performance optimizations.

However, these methods require a robust means of integrating the components used to compose a new application-specific design – and integration alone can represent a quarter of the total NRE cost for a custom chip [5]. Thus, for reusable hardware design to succeed at reducing custom chip costs, it is crucial to consider not just the functional units or chiplets, but also the cost of the underlying integration/communication layer. In particular, cost-effective chiplet-centric design requires an interposer with three key properties: generality, flexibility, and resilience. Generality allows the interposer to be mass-produced at low cost. Flexibility is important because the substrate must support a wide range of applications – including arbitrary chiplets and interconnect topologies – without costly redesign. Resilience matters because the chiplet-to-interposer bonding process is quite defect-prone [6], lowering yields and raising costs if not handled carefully.

To that end, we propose a novel integration fabric, called SiPterposer, based on a generic passive interposer structure and the use of off-the-shelf bridge chiplets to create desired interconnect topologies, as outlined in Fig. 1. This work discusses SiPterposer’s economic viability, demonstrates its capacity to achieve the design goals of generality, flexibility, and resilience, and evaluates its interconnect performance and system overheads. Our key contributions are:
• A generic, fully-passive interposer structure that may be configured at chip assembly-time to generate any custom interconnect topology, eliminating the need for custom interposer design and fabrication when creating a custom SiP.
• A set of design-independent, mass-producible ‘bridge’ chiplets that can be used to connect distinct regions of a SiP. These, along with the interposer structure, enable flexible assembly-time interconnect formation.
• Designs and analyses of three low-overhead ECC methods to improve resilience of chiplet-to-interposer bonds.

978-3-9819263-2-3/DATE19/©2019 EDAA
Fig. 2: Assembly-time customization with SiPterposer. The bridge chiplet (Fig. 4b) connects two cores on different rails (denoted by green arrows). The blown fuses enable the cores to communicate in isolation from accelerator pairs 0/1 and 2/3, and vice versa.

(a) Intra-rail buffering bridge (b) 2-rail mesh bridge
Fig. 4: Bridge chiplet examples. (a) is an active bridge that connects two adjacent clusters on the same rail via bidirectional buffers; if attached across a set of blown fuses, it can act as a repeater for long interconnect paths. (b) is a passive bridge suitable for constructing a small mesh by directly connecting parts of two adjacent rails; this pattern can extend across additional rails to enable larger designs.

Fig. 3: µbump clusters comprise a grid of 512 µbumps, each attached to a distinct interposer wire. The top shows a simplified view of the internal structure; the white dots indicate which µbump connects to each wire. Each wire may be fused between each cluster on a rail.

Our evaluation finds that SiPterposer has significant economic advantages over traditional 2.5D methods while providing increased reliability and assembly-time flexibility, which could substantially lower the cost of custom silicon.

II. RELATED WORK

Recent work in the SiP space has detailed the economic and technological advantages of building large chips in a ‘disintegrated’ fashion – dividing them into multiple independently-fabricated chiplets and then integrating them on an interposer [7]. Most such work assumes the use of a custom interposer, and those that attempt to provide greater flexibility either continue to impose some design restrictions or else require active logic within the interposer [4].

The materials and mechanical reliability of microbumps (µbumps) in chiplet/interposer bonds have been studied extensively [8]. [9] presented an empirical study of defect rates across an image sensor bonded to a substrate using a large array of fine-pitch µbumps. There has also been extensive prior work on correcting bonding defects between layers of a 3D chip, most of which focuses on replacing defective through-silicon vias (TSVs) with redundant ones [10]. [11] proposed adding ECC to correct TSV link defects. However, all these are designed either to minimize the number of TSVs in a 3D system or to provide error correction far stronger than SiPs demand. Few, if any, address µbump defects in general or 2.5D integration in particular. By contrast, this paper takes a targeted approach by accounting for 2.5D-specific design considerations and realistic defect models.

Finally, in the NoC space, interconnect reliability has been explored broadly, with various works discussing routing around failed components [12] or applying ECC to correct transient faults and crosstalk [13]. To our knowledge, however, these methods have not been leveraged in the SiP space, and no prior work has applied ECC to tolerate SiP assembly defects.

III. SYSTEM OVERVIEW

SiPterposer structure. To achieve our goals of generality, flexibility, and resilience, we propose constructing SiPs using a generic, fully-passive interposer based on a simple internal wiring pattern consisting of long, straight data rails (see Fig. 2). Each rail consists of a large number of parallel wires that span the entire width of the interposer but are not directly connected either to each other or to wires in any other rail. µbumps are connected to each wire at regular intervals, and a group of µbumps, one per wire in the rail, comprises a cluster. A chiplet may span and connect to one or more of the clusters, either on the same rail or on different rails.

For our analysis throughout the remainder of this paper, we define each cluster to support a 512-bit data connection. Because µbumps are much larger than interwire distances, they are offset slightly from the centers of the wires to which they attach, forming a two-dimensional grid (Fig. 3). In our design, the µbumps are a typical 20µm wide with 40µm pitch [8] and are configured in a 16x32 grid, resulting in cluster dimensions of ~0.7mm by ~1.4mm. Further, we partition each cluster into eight 64-bit logical links (akin to one node in a full-mesh network with 64-bit full-duplex links). Different sets of chiplets may communicate simultaneously by using disjoint subsets of the available links. Fig. 3 illustrates this layout/partitioning scheme.

Portions of interposer wiring between each cluster of µbumps may be separated during the assembly process, causing them to act as small electronic fuses (Figs. 2 and 3). Although we refer to these wire segments as ‘fuses’, there is no need to add specialized fuse components or other discontinuities to the fabric; existing technology can fuse link wires at the pitch we propose [14]. At assembly-time, blowing all fuses at a specific point in a rail can completely disconnect a set of chiplets from others. Alternatively, blowing fuses to disconnect only a subset of the logical links within a rail would allow, for example, system-wide broadcasts over one link, while other links carry out communication between adjacent chiplets. Blowing fuses can also enable a chiplet to act as a repeater (Fig. 4a) or as a node in a mesh with different parts of a cluster connecting different network edges (Fig. 5).

Other electrical- and protocol-level considerations either can be handled with standard techniques or are design-dependent.

Design, Automation And Test in Europe (DATE 2019) 511


Fig. 5: Construction of a 2x2 mesh with SiPterposer. Each line between µbump clusters represents 64 fuses on the interposer. East/West
edges (C0-C1 and C2-C3) use half the available interposer wires. North/South edges (C0-C2 and C1-C3) use passive bridge chiplets B0 and
B1 (see Fig. 4b) in conjunction with the other half of the interposer wires. The blown fuses electrically isolate the edges from one another.
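
As a rough illustration of the fuse-based isolation this caption describes, the sketch below models one rail as a row of clusters whose logical links are joined by fuses, and reports which clusters remain electrically joined on a given link once some fuses are blown. The `Rail` class, its method names, and the four-cluster example are hypothetical and are not taken from the SiPterposer design; only the figure of eight logical links per cluster comes from the text.

```python
from dataclasses import dataclass, field

LINKS_PER_CLUSTER = 8  # eight 64-bit logical links per cluster (from the text)


@dataclass
class Rail:
    """Hypothetical model of one data rail: clusters in a row, fuses between neighbours."""
    num_clusters: int
    # (gap, link) is in `blown` when logical link `link` has been cut
    # between cluster `gap` and cluster `gap + 1`.
    blown: set = field(default_factory=set)

    def blow(self, gap, links):
        """Blow the fuses of the given logical links in one inter-cluster gap."""
        for link in links:
            self.blown.add((gap, link))

    def segments(self, link):
        """Return the groups of adjacent clusters still joined on `link`."""
        groups, current = [], [0]
        for gap in range(self.num_clusters - 1):
            if (gap, link) in self.blown:
                groups.append(current)
                current = [gap + 1]
            else:
                current.append(gap + 1)
        groups.append(current)
        return groups


rail = Rail(num_clusters=4)
# Cut links 1..7 between clusters 1 and 2, leaving link 0 intact.
rail.blow(gap=1, links=range(1, LINKS_PER_CLUSTER))

print(rail.segments(0))  # [[0, 1, 2, 3]]   link 0 still spans the whole rail
print(rail.segments(3))  # [[0, 1], [2, 3]] link 3 is split into two isolated segments
```

In this toy example link 0 could carry a rail-wide broadcast while the remaining links serve two isolated pairs of clusters. Bridge chiplets, which join wires across rails, are deliberately left out of the model.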

Power distribution, for instance, may use typical VLSI [15] and SiP [16] methods, while interfaces between clock domains are handled within the chiplets using existing NoC methodologies [17]. The proposed physical structure may also be tessellated to produce a different interposer size without costly redesign. Since SiPterposer is based on a passive interposer, communication protocols are defined and handled chiplet-side. These can range from AMBA buses to packet-switched networks to fully-custom designs. For the rest of this work, all chiplets are assumed to use a packet-based network protocol.

Bridge chiplets. In addition, we propose the use of dedicated, generic bridge chiplets which, in conjunction with the interposer structure, enable assembly-time customization of the system’s connectivity. These bridge chiplets can be designed in a handful of different patterns and then mass-produced. Simple units (e.g., Fig. 4b) include only passive wiring; these ‘bridge’ electrical gaps by directly connecting wires in one data rail to another. Other bridges may be active devices, deployed horizontally across portions of a rail separated by blown fuses to act as a buffer in the middle of that rail (e.g., Fig. 4a), create clock boundaries, or be full-blown routing devices. To illustrate the reusability of a small set of bridges across many different designs, we limit our selection for the remainder of this work to Fig. 4b’s passive unit (and its derivatives).

Arbitrary topology construction. By blowing fuses in the interposer wiring and connecting bridge chiplets across disconnected regions, we can create any arbitrary interconnect topology that a chip may require, as follows:
1) Align the network graph to a Manhattan layout.
2) Rotate the graph so as many edges as possible run along the axis of SiPterposer’s internal wiring, lowering overhead by reducing the number of bridge chiplets required.
3) Map the nodes in the graph to chiplets and arrange them as blocks atop SiPterposer’s µbump clusters according to their logical layout in the network graph.
4) For each edge in the graph:
   a) If possible, map the edge to an unused subset of interposer/bridge wires already in the design.
   b) If too few bridge wires: (i) add or extend a bridge, or (ii) time-multiplex access via chiplet-side logic, akin to virtual channels (VCs) in a NoC.
   c) If too few interposer wires: (i) move the nodes connected by the edge to another rail (adjusting previously mapped edges accordingly), or (ii) time-multiplex access.
   d) Blow fuses at the endpoints of the new link, to reduce wire loading and permit other parts of the newly separated interposer wires to be used freely for other edges.
As an example, we illustrate building a simple 2x2 mesh using this process, in Fig. 5.

Importantly, because SiPterposer’s electrical structure does not require an active interposer, it may be implemented on any desired substrate material – Si, organic, glass, etc. (our evaluation assumes Si). Furthermore, it may easily be layered with other chiplet placement or system configuration techniques (e.g., [18]) as part of a holistic design methodology.

Defect-tolerance. The overall SiP concept is quite promising, but it also introduces a new point of failure into chip fabrication via the chiplet-to-interposer assembly process. Prior work estimates this process yield at 99%-99.5% per 1024-µbump chiplet – a loss that, even in a system with relatively few chiplets, can be responsible for as much as 26% of the total manufacturing cost [6]. Rather than trying to reduce the incidence of these defects, we instead propose tolerating them by adding a module to each chiplet to provide lightweight ECC on each interposer bond. Although this requires both additional wires to carry parity information and chiplet-side encode/decode logic, it requires no active logic on the interposer, preserving its simplicity and its low manufacturing cost.

In our design, each µbump cluster provides a 512-bit data connection to an interposer rail, partitioned into eight 64-bit logical links. We further divide each 64-bit link into four 16-bit sublinks and then apply a form of ECC to each sublink, either Hamming single-error-correction (SEC) or Bose-Chaudhuri-Hocquenghem double-error-correction (DEC). SEC requires 5 parity bits per sublink (672 total µbumps per cluster), while DEC requires 10 parity bits per sublink (832 µbumps per cluster). The sublinks within each logical link are interleaved to better resist physically-adjacent defects [13].

The nature of chip warpage during die bonding [8] inspired us to also design a third, defect-pattern-aware coding method. Since warpage-induced mechanical stress causes bonding defects to occur most often at a chip’s edges, we suggest a hybrid concentric coding structure, with four logical links in the center of the chiplets (using SEC) and four links along their more-defect-prone edges (using DEC). As our evaluation shows, this hybrid approach provides a better yield-overhead balance than either SEC-only or DEC-only.

IV. EVALUATION

We evaluated SiPterposer’s defect-tolerance and overall performance by simulating assembly of a hypothetical 48-chiplet system. First, we determined the whole-chip assembly yield for varying defect rates, ECC, and bonding defect patterns. Second, we synthesized our ECC hardware and used the results with whole-system models to simulate SiPterposer’s impact on network performance and overall chip power/area overhead.
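
The per-cluster µbump counts quoted for SEC and DEC in Section III follow from simple arithmetic, which the sketch below reproduces. It is a bookkeeping aid rather than any part of the authors' tooling. The Hamming parity count uses the standard bound 2^r >= k + r + 1, while the DEC parity count of 10 is copied from the text, not derived. The hybrid total is our own derivation from the four-SEC/four-DEC split and is not a figure the paper states. All function and constant names are illustrative.

```python
DATA_BITS_PER_CLUSTER = 512   # from the text
LINKS_PER_CLUSTER = 8         # eight 64-bit logical links
SUBLINKS_PER_LINK = 4         # four 16-bit sublinks per link
SUBLINK_DATA_BITS = 16


def hamming_parity_bits(k):
    """Smallest r with 2**r >= k + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


SEC_PARITY = hamming_parity_bits(SUBLINK_DATA_BITS)  # evaluates to 5, matching the text
DEC_PARITY = 10                                      # taken directly from the text


def ubumps_per_cluster(parity_per_link):
    """Total µbumps when each logical link uses the given parity bits per sublink."""
    parity = sum(p * SUBLINKS_PER_LINK for p in parity_per_link)
    return DATA_BITS_PER_CLUSTER + parity


sec_total = ubumps_per_cluster([SEC_PARITY] * LINKS_PER_CLUSTER)
dec_total = ubumps_per_cluster([DEC_PARITY] * LINKS_PER_CLUSTER)
# Assumption: hybrid = four SEC links (center) plus four DEC links (edges).
hybrid_total = ubumps_per_cluster([SEC_PARITY] * 4 + [DEC_PARITY] * 4)

print(sec_total, dec_total, hybrid_total)  # 672 832 752
```

The SEC and DEC totals agree with the 672 and 832 µbumps given in the text. Under the stated assumption, the hybrid scheme would fall midway between them at 752.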



(a) Uniform (b) Edge-weighted (c) Empirical
Fig. 6: SEC, DEC, and Hybrid coding performance on uniform, edge-weighted, and empirical defect patterns. The vertical lines on
each graph denote specific per-chiplet bond yields (a 90%, 70%, etc., chance that there are zero faulty µbump bonds between a single chiplet
and the interposer), corresponding to particular per-µbump failure probabilities on the x-axis.

A. Chiplet Bond Resilience with ECC

We began our evaluation of defect-tolerance by creating a worst-case scenario for the error correction schemes we propose, configuring our 48 chiplets into a fully-connected system. Each chiplet uses exactly one µbump cluster, and every µbump on a given chiplet is directly connected to the corresponding µbump on every other chiplet (assuming no defects). We defined a failed chip as one in which there exists an uncorrectable fault in any link between any pair of chiplets. We assumed known-good-dies in our simulations in order to isolate the effects of coding on µbump bonding defects.

To model assembly defects, we assigned each µbump bond an independent failure probability based on its physical position within a cluster, relative to one of three potential defect patterns. The first pattern, uniform, assumes that each µbump bond has an equal chance of failure. The second pattern, edge-weighted, incorporates the effect of die warpage via a linear increase in bond failure probability with a µbump’s distance from the center of a chiplet, from a baseline at the center to 10x that value at the outermost corner. The third pattern, empirical, simulates real-world failures using data derived from [9].

For each coding method and defect pattern, we conducted Monte Carlo simulations (100K trials) of chip assembly to calculate whole-chip assembly yields while sweeping the base per-µbump failure probability. For the edge-weighted and empirical defect patterns, we normalized the failure probability of an overall chiplet bond to that of a chiplet bond having a uniform defect pattern with the same base per-µbump failure probability. Fig. 6 compares each coding method vs. a system with no error correction. In general, there is little effect on chip assembly yield with increasing defect pattern complexity, from uniform to edge-weighted to empirical. Hybrid coding is an exception: its defect-tolerance increases significantly on the edge-weighted defect pattern. It performs even better with an empirical defect distribution, since this pattern’s defects are even more heavily biased towards the edges of each chiplet.

B. Interconnect Performance

To evaluate SiPterposer’s network performance and electrical characteristics, we modeled two systems-in-package – one synthetic, one inspired by real-world SoCs – and evaluated their overheads vs. SoC and traditional-SiP equivalents.

Mesh network, synthetic traffic. Our first, synthetic, system is an 8x6 full-mesh network containing 48 identical 4mm² chiplets with 1W nominal power consumption, in which each chiplet has a 64-bit full-duplex data connection to each of its neighbors. This is representative of a homogeneous multicore chip, and constitutes a worst-case scenario for SiPterposer, since the large number of links required in the topology increases the number of bridge chiplets needed, which, in turn, accentuates our proposed system’s overheads.

TABLE I: Router synthesis results (with ECC)
                          Baseline   SEC     DEC     Hybrid
Power (mW)                3.09       4.92    13.25   9.09
Area (µm²)                2108       4973    15752   10363
Area overhead, 8x6 mesh   –          0.07%   0.34%   0.21%
Area overhead, SoC        –          0.04%   0.22%   0.13%

TABLE II: Network params
Network clk        2GHz
Routing fn         dor (mesh), min (SoC)
VCs/buffer size    3/8
Router pipeline    4 cycles
Link traversal     1 cycle
Pkt size (flits)   16 (mesh), 1 (SoC)

TABLE III: Chipset params
Chiplet area   72mm²
Chiplet pwr    4W total (56mW/mm²)
CPU clk        900MHz
Chiplet clk    500MHz
Memory         2GB/500MHz
Vid/img size   1080p

We established two baseline systems: a monolithic SoC, and a traditional SiP using a fixed-topology passive interposer. The latter is necessary because the novelty of this work lies in the methods we propose for constructing SiPs; thus, it is important to evaluate SiPterposer against both traditional SoCs and non-reusable interposer designs. All active logic in each system is assumed to use a 45nm process node; interposers use 65nm global wire widths. Since the interposers and bridges are passive, the three systems differ only in encode/decode logic overhead and link wire dimensions.

First, we constructed HDL models of our SEC and DEC ECC modules. We then integrated these into an open-source NoC router model [19] and synthesized the modified router with Synopsys Design Compiler using an IBM 45nm library to determine the ECC modules’ area and power overheads; these are summarized in Table I. As the ECC modules were inserted directly before/after the input/output buffers, off the critical path of the router, they exposed no additional timing overhead to the design. Hybrid coding would entail area and power consumption exactly between the SEC and DEC routers.

Next, we evaluated network performance and power overhead using a combination of BookSim [20], ORION 2.0 [21], and LTSPICE. Since adding ECC to the routers introduced no additional latency, and because we assume equal link widths across the three example systems, performance overhead could only come from added delay from longer inter-chiplet links on SiPterposer. Using LTSPICE with ORION’s wire models and [22]’s µbump models, we computed the delay of the longest link as 62.2ps, small enough to not impact network timing.

Finally, we constructed a model of each interconnect (SoC, SiP, and SiPterposer) in BookSim and analyzed the link wire power with ORION for uniform random traffic at varying



injection rates. Network parameters are given in Table II. We then combined these results with the synthesized router designs and our prior chiplet power assumption to determine the total power overhead (vs. SoC) of each system. This ranged from 0.98x for the SiP (using SEC) at packet injection rate r=0.001 to 1.10x for SiPterposer (using DEC) at r=0.0225.

Fig. 7: Realistic-chipset layout on SiPterposer. All inter-chiplet connections are dedicated, 64-bit, full-duplex links. Green arrows denote links via the interposer alone. Orange arrows show links that require bridges (Fig. 4b). µbumps, interposer traces, and bridges use RC models derived from [21] and [22].

Fig. 8: Whole-system power overhead on real-world applications, normalized to SoC. The impact of increased wire length on SiPterposer vs. a traditional SiP is minimal. SiPterposer actually saves power in most cases vs. a monolithic SoC since its larger interconnect wires are more energy-efficient.

Realistic chipset, real-world traffic. Our second evaluation framework approximates a real-world mobile chipset with a mesh-like network topology. Using die shot analyses [23] and other SoC power/area data [24] as guides, we defined a 72mm² system with 4W average power consumption, comprising 12 core chiplets (see Fig. 7 and Table III). As before, we use 45nm technology for chiplets/SoC, and 65nm for interposers.

In this system, each chiplet has one radix-5 router with 64-bit full-duplex links, similar to that of the previous section (see Table II), which may communicate through µbump clusters at the corners of the chiplet (this permits shorter inter-chiplet links in both the baseline SiP and SiPterposer, but could have worse yield due to die warpage). Adapting this system for SiPterposer requires three bridges (see Fig. 7). Our analysis is based on closed-loop simulation with GemDroid [25] and BookSim, plus ORION’s link power models, with GemDroid’s IP block power models normalized to a 4W 45nm SoC. We evaluated real-world performance on various application traces from the Android Emulator [26]. Again, since integrating ECC into our router requires no additional latency and the delay contribution of increased wire length is small (222.3ps at maximum, well within our 0.5ns link traversal budget), the only overhead we need to consider is wire power. These results, in whole-system context, are detailed in Fig. 8. Even using DEC, SiPterposer incurs no more than 0.2% power overhead vs. the baseline SoC and 2.9% vs. a traditional SiP.

V. DISCUSSION

Resilience and performance. At currently achievable 2.5D chiplet bonding defect rates (99%+ per-bond yield), any of our proposed ECC methods can achieve assembled-chip yields near 100%. Differences in resilience become more apparent at higher defect rates. SEC drops off quickly below a 90%-95% per-chiplet bond yield, but DEC remains strong even with bond yield as low as 50%-70%, at the cost of a traffic-dependent increase in power consumption. Overheads for any of our methods are quite low, particularly with real-world application traffic, considering the substantial yield gains they provide.

Hybrid coding, especially with our empirical defect pattern, is an intriguing case. It has chip yield substantially better than SEC at low to moderate defect rates, but at high defect rates, yield drops far more quickly than with DEC. In addition, the hybrid structure has power and area overhead exactly in between SEC-only and DEC-only. Applying the principle of routing around failed network components [12] further improves the case for hybrid coding. As a proof-of-concept, we examined the average number of non-defective links in a SiP with each ECC method and used the result as a proxy for the total bisection bandwidth available to the system, as shown in Fig. 9. Here, hybrid coding performs closely to DEC-only, but with much lower power and area overhead.

Note that our hybrid coding technique is applied to a single µbump cluster; for chiplets spanning multiple clusters, this technique will become less effective as its structure less closely mirrors the warpage pattern created on the chiplet at any individual cluster. However, a similar principle could be applied at the inter-cluster level – i.e., using DEC for whole clusters towards the edges of a chiplet and SEC for clusters near its center. We leave further exploration of this and other aspects of SiP defect-pattern-aware coding for future work.

To understand the impact of SiPterposer’s flexibility on chip performance, we compared its power consumption to that of a traditional SiP with a custom interposer. With synthetic traffic on our 8x6 mesh, we observed a roughly linear increase in power consumption with increasing network load: 1%-7% with no ECC, 1%-9% with SEC, and 1%-11% with DEC (reaching a limit as the network saturates). As previously noted, though, this mesh topology with uniform random traffic represents a worst-case-scenario for SiPterposer. More-realistic loads on our example chipset showed overheads on the low end of these ranges, averaging 1.5% with no ECC, 2.0% with SEC, and 2.3% with DEC. These values are quite reasonable considering the degree of flexibility SiPterposer offers – and the impact of that flexibility on chip cost, as discussed in the next section.

Finally, using LTSPICE, we performed rudimentary analysis of the longest interposer-bridge-µbump links within each system to understand how they might perform in mixed-signal environments (note that discontinuities at µbump junctions were not modeled). An AC frequency sweep showed worst-case -3dB points of 2.48GHz for the mesh and 681.1MHz for the SoC, which could be further improved by, for instance, adjusting wire dimensions, modifying chiplet placement, or including dedicated analog transmission lines as part of the interposer fabric. We leave more-comprehensive analysis and optimization of such possibilities for future work.

Economic analysis. Since the novelty of this work lies in the methods we propose for constructing SiPs, we compare the economics of SiPterposer against custom, non-reusable interposer designs. In this analysis, we use the 48-chiplet mesh and the realistic-chipset from Section IV-B as examples.
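
To make the amortization argument behind this comparison concrete, the sketch below evaluates the interposer cost model used in this analysis, c = f/(d*n) + v, for a custom interposer (d = 1) and for a generic interposer shared across many designs. Every number in it is hypothetical and chosen only for easy arithmetic; none is taken from the paper. It also holds the variable cost v equal for both cases, so it ignores any extra interposer area a generic design might need.

```python
def interposer_unit_cost(f, d, n, v):
    """Per-unit interposer cost c = f / (d * n) + v.

    f: fixed/NRE cost, d: number of distinct designs sharing that fixed cost,
    n: units produced per design, v: variable cost per unit.
    """
    if d <= 0 or n <= 0:
        raise ValueError("d and n must be positive")
    return f / (d * n) + v


# Hypothetical inputs, for illustration only.
FIXED_COST = 1_000_000.0   # NRE for one interposer design
VARIABLE_COST = 20.0       # per-unit manufacturing cost
UNITS_PER_DESIGN = 10_000

custom = interposer_unit_cost(FIXED_COST, d=1, n=UNITS_PER_DESIGN, v=VARIABLE_COST)
shared = interposer_unit_cost(FIXED_COST, d=50, n=UNITS_PER_DESIGN, v=VARIABLE_COST)

print(custom)  # 120.0
print(shared)  # 22.0
```

With these made-up inputs the fixed-cost share of each unit drops from 100 to 2 once fifty designs reuse the same interposer, leaving the variable cost as the dominant term.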



Fig. 9: Avg. bisection bandwidth vs. defect rate (empirical pattern). Hybrid coding outperforms SEC-only and nearly matches DEC-only.

Fig. 10: Cost of SiPterposer vs. custom-interposer solutions.

We can characterize the cost of producing an interposer as c = (f/(d ∗ n)) + v, where f represents the fixed/NRE costs of design and verification, masks, etc., d is the number of distinct interposer designs built using those fixed costs, n is the quantity produced of each design, and v is the variable cost of making each die. For a traditional custom-interposer SiP, d=1; for SiPterposer, d may be much higher – i.e., SiPterposer’s NRE costs are amortized across a far greater total volume.
First, we assume that a custom interposer has the same variable cost per unit area as SiPterposer. As Figs. 5 and 7 suggest, SiPterposer requires more interposer area because of the additional space needed to attach bridge chiplets, plus the area of the bridge chiplets themselves. To create either the mesh or the chipset on SiPterposer, only passive bridge chiplets are needed; thus, we lump the added interposer and bridge chiplet area together to determine SiPterposer’s total area overhead vs. a custom interposer (mesh=1.42x, chipset=1.35x). From these values, using a manufacturing cost of ~$1,500 per 300mm-diameter interposer wafer (~2.19 cents per mm²) [27], we determine v for each scenario: $4.20 vs. $5.96 for the mesh and $1.58 vs. $2.12 for the chipset on a custom interposer vs. SiPterposer, respectively.
Next, we assume that the NRE cost f for each distinct interposer design is $1 million – an intentionally conservative figure. Given that 65nm masks alone cost about $700,000 [2], this assumption creates a worst-case scenario for SiPterposer since, the higher the NRE cost, the stronger the argument for using a single interposer design. Letting d=100 (i.e., SiPterposer is used for 100 different designs for which a custom interposer would otherwise be needed), we can compute the break-even quantities for the mesh and the chipset on SiPterposer vs. a custom interposer (see Fig. 10).
Thus, even when using unfavorable assumptions, SiPterposer’s flexibility has economic benefits for designs with volumes up to the hundreds of thousands or low millions of units. The benefits are even greater at lower volumes; for the example chipset with n=10,000, each custom interposer would cost $101.58; SiPterposer, just $3.12. These estimates do not include the effect of yield gains from ECC on chiplet/interposer bonds, which would further reduce fabrication costs.

VI. CONCLUSIONS
In this paper, we presented SiPterposer, a mass-producible, flexible, and defect-resistant communication fabric for SiPs. We showed how it can be used to build arbitrary network topologies, evaluated the potential cost savings of our assembly-time-configurable structure, and demonstrated how tolerating defects by applying ECC to chiplet/interposer µbump bonds allows us to realize near-100% chip assembly yields at typical defect rates. An example SiPterposer system achieves these benefits on real-world applications with less than 3% additional power and zero exposed latency overhead compared with traditional SiPs or SoCs.
Acknowledgment. This work was supported by the Center for Applications Driving Architectures (ADA), one of six centers of JUMP, a Semiconductor Research Corporation (SRC) program co-sponsored by DARPA.

REFERENCES
[1] N. Lu, “A new silicon age 4.0: Generating semiconductor-intelligence paradigm with a virtual Moore’s law, economics and heterogeneous technologies,” in ISLPED ’17.
[2] M. Khazraee et al., “Moonwalk: NRE optimization in ASIC clouds,” SIGPLAN Not., Apr 2017.
[3] C. Tan et al., “Stitch: Fusible heterogeneous accelerators enmeshed with many-core architecture for wearables,” in ISCA ’18.
[4] D. Stow et al., “Cost analysis and cost-driven IP reuse methodology for SoC design based on 2.5D/3D integration,” in ICCAD ’16.
[5] M. Deneroff, “Building an SoC: How to do it? What will it cost?” in Workshop on System-on-Chip Design for HPC, Jul 2015.
[6] C. Palesko and A. Palesko, “Cost breakdown of 2.5D and 3D packaging,” Additional Confs. (Dev. Pkg., HiTEC, HiTEN, & CICMT), vol. 2016.
[7] A. Kannan, N. E. Jerger, and G. H. Loh, “Enabling interposer-based disintegration of multi-core processors,” in MICRO ’15.
[8] T. Hisada et al., “Study of warpage and mechanical stress of 2.5D package interposers during chip and interposer mount process,” in Int’l Symposium on Microelectronics, vol. 2012, Jan 2012, pp. 967–974.
[9] Y. Takemoto et al., “Characterization of 4m micro-bump interconnections at 7.6μm pitch for 3D stacked 16m pixel image sensor,” IEEE Trans. on Semicond. Mfg., 2017.
[10] L. Jiang, Q. Xu, and B. Eklow, “On effective TSV repair for 3D-stacked ICs,” in DATE ’12.
[11] V. Pasca, L. Anghel, and M. Benabdenbi, “Fault tolerant communication in 3D integrated systems,” in 2010 Int’l Conf. on Dependable Systems and Networks Workshops (DSN-W), June 2010, pp. 131–135.
[12] R. Parikh and V. Bertacco, “uDIREC: Unified diagnosis and reconfiguration for frugal bypass of NoC faults,” in MICRO ’13.
[13] S. Shamshiri, A. Ghofrani, and K. T. Cheng, “End-to-end error correction and online diagnosis for on-chip networks,” in 2011 IEEE Int’l Test Conf., Sept 2011, pp. 1–10.
[14] J. Lee and J. J. Griffiths, “Polarization effect in laser processing of fine pitch link structures for advanced memory designs,” IEEE Trans. on Semicond. Mfg., vol. 22, no. 4, pp. 572–578, Nov 2009.
[15] M. Zhao et al., “Hierarchical analysis of power distribution networks,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 2002.
[16] G. Kim et al., “Chip-package co-design of power distribution network for system-in-package applications,” in EPTC 2004.
[17] T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network-on-chip,” ACM Comput. Surv., vol. 38, no. 1, Jun. 2006.
[18] A. Coskun et al., “A cross-layer methodology for design and optimization of networks in 2.5D systems,” in ICCAD ’18.
[19] “LISNoC.” [Online]. Available: http://www.lisnoc.org
[20] N. Jiang et al., “A detailed and flexible cycle-accurate network-on-chip simulator,” in ISPASS ’13.
[21] A. B. Kahng et al., “Orion 2.0: A power-area simulator for interconnection networks,” IEEE Trans. Very Large Scale Integr. Syst., 2012.
[22] P. Ehrett et al., “Analysis of microbump overheads for 2.5D disintegrated design,” University of Michigan, Ann Arbor, MI, Tech. Rep., 2017.
[23] C. Ward, “Chipworks tears down Apple’s A5 chip,” in Engadget.
[24] H. Esmaeilzadeh et al., “Looking back and looking forward: Power, performance, and upheaval,” Commun. ACM, vol. 55, no. 7, Jul. 2012.
[25] C. Nachiappan et al., “GemDroid: A framework to evaluate mobile platforms,” in SIGMETRICS ’14.
[26] “The Android emulator.” [Online]. Available: https://developer.android.com/studio/run/emulator.html
[27] H. Y. Li et al., “The cost study of 300mm through silicon interposer (TSI) with BEOL interconnect,” in EPTC 2013.
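As an illustrative aid, the cost model from the economic analysis, c = f/(d·n) + v, can be sketched in a few lines of Python using the values stated in the text (f = $1 million, d = 100 for SiPterposer, and the per-unit variable costs v for the mesh and the chipset). The function names `unit_cost` and `break_even_volume` are hypothetical, and the closed-form break-even expression is our own rearrangement of the model, not a formula given in the paper.

```python
def unit_cost(f, d, n, v):
    """Per-unit interposer cost c = f / (d * n) + v."""
    return f / (d * n) + v


def break_even_volume(f, d, v_custom, v_shared):
    """Per-design volume n at which both options cost the same per unit.

    Assumption: obtained by solving f / n + v_custom = f / (d * n) + v_shared
    for n (custom interposer uses d = 1, shared interposer uses d designs).
    """
    return f * (1 - 1 / d) / (v_shared - v_custom)


F_NRE = 1_000_000   # NRE cost per distinct interposer design, from the text
D_SHARED = 100      # designs sharing one SiPterposer, from the text
# Variable cost per unit (custom interposer, SiPterposer), from the text.
V = {"mesh": (4.20, 5.96), "chipset": (1.58, 2.12)}

# Example chipset at n = 10,000 units per design.
custom = unit_cost(F_NRE, 1, 10_000, V["chipset"][0])
shared = unit_cost(F_NRE, D_SHARED, 10_000, V["chipset"][1])
print(f"{custom:.2f} {shared:.2f}")  # prints "101.58 3.12"

# Break-even volume per design for each example system.
for name, (v_custom, v_shared) in V.items():
    n_star = break_even_volume(F_NRE, D_SHARED, v_custom, v_shared)
    print(name, round(n_star))
```

Under these inputs the sketch reproduces the $101.58 and $3.12 figures quoted for the chipset, and the computed break-even volumes come out near 562,500 units for the mesh and about 1.83 million units for the chipset, in line with the "hundreds of thousands or low millions" range stated in the text. These break-even values are derived here from the model and are not read off Fig. 10.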

