Dagatan Nino PR
Architecture
A RESEARCH PAPER
The 3rd and 5th simulations give encouraging results. The higher miss-rate delays caused by splitting the L2 cache are mitigated by the larger L3 cache. The higher bandwidth of the parallel point-to-point connections between the corresponding L1 and L2 caches of each core yields a considerable IPC improvement. The difference in improvement between the 1st, 2nd, and 4th simulation sets on one hand and the 3rd and 5th sets on the other suggests that splitting the L2 cache does not pay off when no L3 cache is present to compensate for the higher L2 miss rates; however, it yields a considerable improvement when an L3 cache is present. The improvement differences between the 2nd and 4th simulation sets, and likewise between the 3rd and 5th, suggest that the IPC increase is higher in CMPs with a larger number of cores, since the bandwidth demand on the L2 caches is greater. Keeping in mind the many hardware costs and complications that splitting the L2 cache implies (which remain to be analyzed), separate per-core Instruction and Data L2 caches may become a reality in the imminent many-core CPUs with off-chip L4 caches.
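The trend described above can be illustrated with a classic average memory access time (AMAT) model. All latencies and miss rates in this sketch are assumed placeholder values, not figures from the simulations; it only shows why a large L3 can absorb the extra misses caused by splitting the L2 into smaller halves.

```python
# Illustrative AMAT sketch (all numbers below are assumed, not simulation
# results): splitting the L2 raises its miss rate, and an L3 reduces the
# penalty each extra miss pays.

L1_HIT = 2          # cycles, assumed
L2_HIT = 12         # cycles, assumed
L3_HIT = 40         # cycles, assumed
MEM = 200           # cycles to main memory, assumed
L1_MISS = 0.05      # assumed L1 miss rate

def amat(l2_miss_rate, miss_penalty):
    """AMAT = L1 hit + L1 miss rate * (L2 hit + L2 miss rate * penalty)."""
    return L1_HIT + L1_MISS * (L2_HIT + l2_miss_rate * miss_penalty)

# Assume the split raises the effective L2 miss rate from 0.20 to 0.25.
no_l3_unified = amat(0.20, MEM)              # misses go straight to memory
no_l3_split   = amat(0.25, MEM)
l3_penalty    = L3_HIT + 0.30 * MEM          # L3 with an assumed 30% miss rate
with_l3_unified = amat(0.20, l3_penalty)
with_l3_split   = amat(0.25, l3_penalty)

# Without an L3 the split costs 0.5 cycles of AMAT; with the L3 the gap
# shrinks to 0.25 cycles, mirroring the compensation effect described above.
```

The exact numbers are arbitrary; the point is that the AMAT gap between unified and split organizations narrows once an L3 caps the per-miss penalty.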
RESEARCH QUESTIONS
HOW DOES CACHE PERFORMANCE VARY ACROSS DIFFERENT CPU
ARCHITECTURES AND GENERATIONS?
In this paper I presented a set of simulations of different CPU L2 cache organizations, comparing the overall IPC of unified vs. separate Instruction and Data L2 caches. The results indicate that splitting the L2 cache into two equally sized instruction and data caches provides a higher, though not always considerable, IPC. The improvement is merely 0.4 % in a single-core L2 organization, and 0.5 % and 1.2 % in the dual-core and quad-core organizations, respectively. The highest improvements are attained in dual-core and quad-core CPUs with a shared L3: 2.4 % and 3 %, respectively. The results suggest that splitting the L2 cache into an L2 Instruction cache and an L2 Data cache makes sense in many-core (i.e. at least four cores) CPUs with at least an L3 cache present on-chip.
Even though the results of the last simulation may seem encouraging, separate per-core L2 caches come with various hardware costs and complications. In [12] the author proposes a logical split of the L1 data cache based on run-time data locality analysis. He presents an interesting evaluation of the hardware cost of this organization and concludes that the major problem is the extra space required for storing the extra tags of the two caches. A similar analysis of the extra complications and hardware costs of having separate per-core L2 Instruction and Data caches is a tough undertaking and a possible future direction.
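To give a feel for the tag-storage concern, here is a rough back-of-the-envelope sketch. It is not the analysis from [12] (which concerns a logical L1 split); it only illustrates, with assumed cache parameters, why replacing one L2 with two half-sized caches increases total tag storage: halving capacity at fixed line size and associativity halves the set count, so the index field shrinks by one bit and every tag grows by one bit.

```python
# Rough sketch with assumed parameters (48-bit physical addresses, 64-byte
# lines, 8-way associativity); not taken from [12] or the paper.
import math

def tag_bits(cache_bytes, line_bytes, ways, addr_bits=48):
    """Tag width for a set-associative cache: address minus index and offset bits."""
    sets = cache_bytes // (line_bytes * ways)
    index_bits = int(math.log2(sets))
    offset_bits = int(math.log2(line_bytes))
    return addr_bits - index_bits - offset_bits

def total_tag_storage_bits(cache_bytes, line_bytes, ways, addr_bits=48):
    """Total bits spent on tags across all lines of the cache."""
    lines = cache_bytes // line_bytes
    return lines * tag_bits(cache_bytes, line_bytes, ways, addr_bits)

unified_bits = total_tag_storage_bits(1 << 20, 64, 8)       # one 1 MiB L2
split_bits = 2 * total_tag_storage_bits(1 << 19, 64, 8)     # two 512 KiB L2s

# The split spends one extra tag bit per line: 16384 lines -> 16384 extra
# bits of tag storage, before counting duplicated status bits and control logic.
```

The overhead grows with line count, and any duplicated valid/dirty/coherence state and per-cache control logic comes on top, which is why a full cost analysis is a substantial undertaking.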