Assignment 1
In a system with several processors, a bus connects each processor to the main memory.
Maintaining a single shared cache for all processors would increase the cache's size and decrease
the system's performance, so each CPU has its own cache for improved performance. Processors can
share the same data block by each storing a copy of it in their own cache. The Cache
Coherence Problem is the challenge of keeping multiple local caches synchronized when one of
the processors updates its local copy of data that is shared among multiple caches.
Suppose there are three processors, each with its own cache, and each cache holds a copy of the
shared memory block X. In this example, processor P1 updates its cached copy of X to X1.
The data is now inconsistent: P1 holds the changed copy, X1, while the main memory and the
caches of the other CPUs still hold the old block X. This is known as the cache coherence problem.
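The three-processor scenario can be sketched in a few lines of Python. This is only a toy model
under stated assumptions: the Cache class, the value 10 for X, and the value 99 for the changed
copy are all hypothetical, and no coherence mechanism of any kind is modeled.

```python
# Toy model (hypothetical names): three private caches with NO coherence protocol.
main_memory = {"X": 10}


class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch the block from main memory and keep a local copy.
        if addr not in self.lines:
            self.lines[addr] = main_memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Only the local copy changes; memory and other caches are not told.
        self.lines[addr] = value


p1, p2, p3 = Cache("P1"), Cache("P2"), Cache("P3")

# Every processor reads X, so every cache holds a copy of it.
for p in (p1, p2, p3):
    p.read("X")

# P1 updates its cached copy (X becomes X1 in the text's notation).
p1.write("X", 99)

# P2 and P3 keep hitting their stale copies; main memory is stale too.
stale = {p.name: p.read("X") for p in (p2, p3)}
print(p1.read("X"), stale, main_memory["X"])
```

P1 sees the new value while P2, P3, and main memory still hold the old one, which is exactly the
inconsistency described above.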
(B)
Software-Based and Hardware-Based Solutions to the Cache Coherence Problem
Software-Based Solution
Compiler-based coherence techniques analyze the code to determine which data items may become
unsafe for caching and mark them accordingly. The operating system or hardware then prevents
noncacheable items from being cached. The simplest approach is to prevent any shared data
variable from being cached. This is too conservative, because a shared data structure may be
used exclusively by one processor during some periods and be effectively read-only during others.
Cache coherence is an issue only when at least one processor can change the variable and at least
one other processor can access it. More efficient approaches analyze the code to determine safe
periods for shared variables. The compiler then inserts instructions into the generated code to
enforce cache coherence during the critical periods.
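The safety rule stated above (a variable is a problem only when one processor can change it and
another processor can access it) can be written as a small check. This is a minimal sketch, not a
real compiler pass: the access summary format, the function name is_cacheable, and the variable
names are all assumptions made for illustration.

```python
def is_cacheable(accesses):
    """Return True if caching the variable is safe during this period.

    `accesses` is a hypothetical summary: processor name -> set of "r"/"w".
    """
    writers = {p for p, kinds in accesses.items() if "w" in kinds}
    users = set(accesses)
    # Unsafe only if some processor writes AND at least one other processor accesses.
    return not any(users - {w} for w in writers)


# Assumed access summary for one period of a program.
period = {
    "table": {"P1": {"r"}, "P2": {"r"}},         # read-only on every processor
    "counter": {"P1": {"r", "w"}, "P2": {"r"}},  # written by P1, read by P2
    "scratch": {"P3": {"r", "w"}},               # used by a single processor
}

marks = {var: is_cacheable(acc) for var, acc in period.items()}
print(marks)
```

Under this rule only "counter" would be marked noncacheable for the period, while the read-only
and single-processor variables remain safe to cache.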
Hardware-Based Solution
Hardware-based solutions are usually referred to as cache coherence protocols. These solutions
recognize potential inconsistency conditions dynamically at run time. Because the problem is
handled only when it actually arises, caches are used more effectively, which improves performance
compared to a software solution. Additionally, these methods are transparent to the compiler and
programmer, which reduces the burden of software development. Hardware schemes differ in several
particulars, including where the state information about data lines is held, how that information
is organized, where coherence is enforced, and the enforcement mechanisms. Hardware
techniques can be broadly categorized into two groups: directory protocols and snoopy protocols.
Directory Protocol: Directory protocols collect and maintain information about where copies
of lines reside. Typically, a centralized controller that is part of the main memory controller
maintains a directory stored in main memory. The directory contains global state information
about the contents of the various local caches. When an individual cache controller makes a
request, the centralized controller checks the directory and issues the commands needed to
transfer data between memory and caches or between caches. It is also responsible for keeping the
state information up to date, so every local action that can affect the global state of a line
must be reported to the central controller.
The controller typically keeps track of which processors have copies of which lines. Before a
processor may write to a local copy of a line, it must request exclusive access to that line from
the controller. Before granting this exclusive access, the controller sends a message to all
processors that hold a cached copy of the line, forcing each of them to invalidate its copy. After
receiving acknowledgments from each of these processors, the controller grants exclusive access
to the requesting processor. If another processor then tries to read a line that is exclusively
granted to another processor, it sends a miss notification to the controller. The controller then
issues a command to the processor holding that line, requiring it to write the line back to main
memory. The line may now be shared for reading by both the original processor and the requesting
processor. Directory schemes suffer from the drawbacks of a central bottleneck and the overhead
of communication between the various cache controllers and the central controller. However, they
are effective in large-scale systems that involve multiple buses or some other complex
interconnection scheme.
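The request, invalidate, grant, and write-back sequence described above can be sketched as a toy
simulation. All class and method names here (Directory, Cache, request_exclusive, read_miss) are
hypothetical, each line holds a single value, and acknowledgments are implicit in the method
calls; a real directory controller is considerably more involved.

```python
class Directory:
    """Toy central controller: tracks sharers and the exclusive owner per line."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = {}   # cache name -> Cache
        self.sharers = {}  # line -> names of caches holding a copy
        self.owner = {}    # line -> name of cache with exclusive access

    def request_exclusive(self, requester, line):
        # Force every other holder to invalidate its copy, then grant access.
        for name in self.sharers.get(line, set()) - {requester}:
            self.caches[name].lines.pop(line, None)
        self.sharers[line] = {requester}
        self.owner[line] = requester

    def read_miss(self, requester, line):
        owner = self.owner.get(line)
        if owner is not None and owner != requester:
            # Command the exclusive holder to write the line back to memory.
            self.memory[line] = self.caches[owner].lines[line]
            del self.owner[line]  # the line is now shared for reading
        self.sharers.setdefault(line, set()).add(requester)
        return self.memory[line]


class Cache:
    def __init__(self, name, directory):
        self.name, self.directory, self.lines = name, directory, {}
        directory.caches[name] = self

    def read(self, line):
        if line not in self.lines:
            self.lines[line] = self.directory.read_miss(self.name, line)
        return self.lines[line]

    def write(self, line, value):
        if self.directory.owner.get(line) != self.name:
            self.directory.request_exclusive(self.name, line)
        self.lines[line] = value


memory = {"X": 10}
directory = Directory(memory)
p1, p2, p3 = (Cache(n, directory) for n in ("P1", "P2", "P3"))
for p in (p1, p2, p3):
    p.read("X")

p1.write("X", 99)                       # P2 and P3 are invalidated first
invalidated = "X" not in p2.lines and "X" not in p3.lines
value_seen_by_p2 = p2.read("X")         # miss -> P1 writes back -> shared
print(invalidated, value_seen_by_p2, memory["X"], sorted(directory.sharers["X"]))
```

In this sketch every coherence action passes through the single Directory object, which also
illustrates why the central controller can become a bottleneck.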
Snoopy Protocol: Snoopy protocols distribute the responsibility for maintaining cache
coherence among all of the cache controllers in a multiprocessor. A cache must recognize when a
line that it holds is shared with other caches. When an update is performed on a shared cache
line, it must be announced to all other caches by a broadcast mechanism. Each cache controller
can "snoop" on the network to observe these broadcast notifications and react accordingly.
A bus-based multiprocessor is well suited to snoopy protocols because the shared bus provides
a simple means for broadcasting and snooping. However, because one of the objectives of using
local caches is to avoid bus accesses, care must be taken that the additional bus traffic
required for broadcasting and snooping does not cancel out the gains from the local caches. Two
basic approaches to the snoopy protocol have been explored:
Write-Update protocol: With this protocol, when a processor updates data in its cache, the
update is immediately propagated to all other cached copies. The broadcast mechanism is used to
send the new data block to every cache that holds a copy.
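A write-update broadcast can be sketched as follows. This is a toy model with hypothetical names
(Bus, UpdateCache, snoop_update); it also assumes that every write goes through to main memory,
which the text does not specify, so that a cache without a copy can still fetch current data later.

```python
class Bus:
    """Toy shared bus: every attached cache observes every broadcast."""

    def __init__(self):
        self.caches = []

    def broadcast_update(self, sender, addr, value):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_update(addr, value)


class UpdateCache:
    def __init__(self, bus, memory):
        self.bus, self.memory, self.lines = bus, memory, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value  # assumption: write-through to main memory
        self.bus.broadcast_update(self, addr, value)

    def snoop_update(self, addr, value):
        # Only caches that already hold a copy take the new data block.
        if addr in self.lines:
            self.lines[addr] = value


memory = {"X": 10}
bus = Bus()
p1, p2, p3 = (UpdateCache(bus, memory) for _ in range(3))
p1.read("X")
p2.read("X")          # P3 never reads X, so it holds no copy

p1.write("X", 99)     # the new value is broadcast on the bus
print(p2.lines["X"], "X" in p3.lines, p3.read("X"))
```

P2's copy is refreshed in place, while P3, which never held a copy, ignores the broadcast and
simply reads the current value on its first access.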
Write-invalidate protocol: Here, the modified cache block is not immediately sent to the
other caches. Instead, an invalidate command is broadcast to all other caches, which marks their
copies of the data as invalid; the copy in shared memory is now out of date as well. The
letter "I" denotes this invalid state (the modified copy held by the updating processor, not the
invalidated ones, is the "dirty" copy). When any other processor later wishes to access the data,
the updating processor supplies it with the updated data. The write-invalidate technique is
employed in multiprocessor systems built on processors such as the Pentium 4 and PowerPC.
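A write-invalidate exchange can be sketched in the same toy style. The names Bus, InvalidateCache,
and the dirty set are hypothetical, and the model is deliberately simplified: a missing entry
stands for the invalid state, and an invalidate is broadcast on every write, even when the writer
already holds the only copy.

```python
class Bus:
    """Toy shared bus that carries invalidate commands and read misses."""

    def __init__(self, memory):
        self.memory, self.caches = memory, []

    def invalidate(self, sender, addr):
        # Every other cache drops its copy (state "I" = no entry in this model).
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)
                cache.dirty.discard(addr)

    def read_miss(self, requester, addr):
        # A cache holding a modified (dirty) copy supplies the data.
        for cache in self.caches:
            if cache is not requester and addr in cache.dirty:
                self.memory[addr] = cache.lines[addr]
                cache.dirty.discard(addr)
                break
        return self.memory[addr]


class InvalidateCache:
    def __init__(self, bus):
        self.bus, self.lines, self.dirty = bus, {}, set()
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.read_miss(self, addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate(self, addr)  # no data is sent, only the command
        self.lines[addr] = value
        self.dirty.add(addr)


memory = {"X": 10}
bus = Bus(memory)
p1, p2, p3 = (InvalidateCache(bus) for _ in range(3))
for p in (p1, p2, p3):
    p.read("X")

p1.write("X", 99)
after_write = ("X" in p2.lines, "X" in p3.lines, memory["X"])
value_seen_by_p2 = p2.read("X")   # miss: P1 supplies the updated data
print(after_write, value_seen_by_p2, memory["X"])
```

Compared with the write-update approach, the write itself moves no data across the bus; the
updated block travels only when another processor actually asks for it.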