
Case Study

ABC.com is a website where you can watch original movie DVDs. It currently maintains
the list of visitors and the details of their visits. The website gets almost 1 billion visitors
every day, and at midnight it processes all the information. Processing takes almost
5 hours, and the system remains down for that long, which causes the company a huge
loss. The company decided to buy a supercomputer for faster analysis. The
supercomputer has 10 processors. The need now is to design parallel algorithms for the
following problems:

We now have the list of visitors for the day and the number of movies they watched.

Question 1: Design a parallel algorithm that would sort the names alphabetically.

Question 2: Write a parallel search algorithm that would find a visitor "John" in this
sorted list and show how many movies he watched.

Question 3: Can either sorting or searching achieve super-linear speedup?


Ans 3

The degree of increase in computational speed between a parallel algorithm and the
corresponding sequential algorithm is called speedup, and it is expressed as the ratio of
T(sequential) to T(parallel).
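
Plugging in the case study's numbers makes the definition concrete; note that the
parallel runtime below is an assumed value, not one given in the text:

t_seq = 5.0              # hours: the sequential nightly processing run (from the case)
t_par = 0.5              # hours: assumed runtime on the 10-processor machine
print(t_seq / t_par)     # 10.0 -> linear speedup; a ratio above 10 would be super-linear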

If this ratio exceeds p, where p is the number of processors (cores) used, super-linear
speedup takes place. The most common reason for it is the cache effect: the total cache
size in a multiprocessor system is larger, which increases the effective data transfer rate
between RAM and the CPUs, and this is cardinal when working with large data sets.

Traditional parallel computer performance evaluation fixed the problem size and varied
the number of processors, the so-called fixed-size model. In the mid-1980s the scaled-size
model was developed and subsequently substantiated by experiments on a 1024-processor
hypercube; it specifies that the storage complexity grows in proportion to the number of
processors. A third model, the fixed-time model, scales the problem so that it takes
constant time as processors are added; it is rarely used in real-world applications. The
algorithm described here is optimized for the fixed-size model. It is a modification of the
Quicksort algorithm of C. A. R. Hoare (1962), adapted for a system with several
processors (or cores).

In the first step, the original data set is viewed as blocks of twice the size of the L1
cache (which is typically 32 or 64 KB). The processor with the smallest PID chooses the
pivot element. Then all processors in parallel invoke a "neutralization" function on the
leftmost and rightmost remaining blocks, swapping elements according to their value
relative to the pivot, which leaves at most P+1 blocks still to be sorted. After that, the
remaining non-neutralized blocks are swapped with neutralized ones and sorted
sequentially.
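
As a rough illustration, here is a minimal Python sketch of one such "neutralization"
call, assuming the usual semantics of this scheme: misplaced elements are swapped
between a left block and a right block until at least one block lies entirely on the correct
side of the pivot. The function name and return convention are illustrative, not from the
text.

def neutralize(left, right, pivot):
    # Swap elements so that, as far as possible, the left block holds
    # only values <= pivot and the right block only values > pivot.
    # Returns which block(s) were fully neutralized.
    i, j = 0, 0
    while i < len(left) and j < len(right):
        while i < len(left) and left[i] <= pivot:
            i += 1                    # already on the correct side
        while j < len(right) and right[j] > pivot:
            j += 1                    # already on the correct side
        if i < len(left) and j < len(right):
            left[i], right[j] = right[j], left[i]   # fix both misplacements
            i += 1
            j += 1
    if i == len(left) and j == len(right):
        return 'both'
    return 'left' if i == len(left) else 'right'

# Example: neutralize([3, 9, 1, 8], [2, 7, 0, 6], 5) swaps 9<->2 and
# 8<->0, fully neutralizing the left block.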

The next step is to split the data set at the pivot point and assign processors to each
half in proportion to its size. A stack is used to keep track of the state of the sorting
algorithm, and the sequential steps of the recursion are turned into PUSH and POP
operations on this stack. Whenever a processor encounters a subarray small enough to
fit in its cache, it sorts it with insertion sort instead of PUSHing it onto the stack. When a
processor finishes its own work, it begins helping other processors by POPing unclaimed
(yet unsorted) subarrays from their stacks, as sketched below.
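
The following is a single-threaded Python sketch of that stack discipline (CUTOFF and
the helper names are mine, not from the text). In the parallel version the stack entries
are shared, so an idle processor can POP a pending subrange and sort it independently.

CUTOFF = 32  # stand-in for "subarray fits in the processor cache"

def insertion_sort(a, lo, hi):
    # sort a[lo:hi] in place; used once a subrange is cache-sized
    for k in range(lo + 1, hi):
        x, j = a[k], k - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def partition(a, lo, hi):
    # Lomuto partition of a[lo:hi] around a[hi-1]; returns the pivot's index
    pivot, i = a[hi - 1], lo
    for j in range(lo, hi - 1):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]
    return i

def stack_quicksort(a):
    stack = [(0, len(a))]               # the recursion becomes explicit PUSH/POP
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= CUTOFF:
            insertion_sort(a, lo, hi)   # small subarray: sort it, don't PUSH it
        else:
            mid = partition(a, lo, hi)
            stack.append((mid + 1, hi)) # PUSH both halves for later POPs
            stack.append((lo, mid))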

Such optimization brings the average time of the partition phase to O(N/P) for N >> B,
where N is the number of elements, B the number of elements in one block, and P the
number of processors. The sorting phase yields a speedup of O(P), provided that all
processors are largely independent of one another at this stage and no synchronization
is required. This brings the total speedup to T(s)/T(p) = P, i.e. linear speedup.

In addition, the reduced memory access time due to the cache effect further decreases
overhead and can yield super-linear speedup.
Ans 1

Merge sort first divides the unsorted list into the smallest possible sub-lists, compares
each with the adjacent sub-list, and merges them in sorted order. It parallelizes very
nicely because it follows the divide-and-conquer approach, as the pseudocode below
shows.

procedure parallelmergesort(id, n, data, newdata)
begin
    data = sequentialmergesort(data)        // each processor sorts its local chunk
    for dim = 1 to n                        // n communication dimensions
        data = parallelmerge(id, dim, data) // merge with the partner along dim
    endfor
    newdata = data
end
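
A runnable Python sketch of the same idea follows. The function names and the final
k-way merge are my simplification; the pseudocode above instead appears to merge
pairwise across n communication dimensions (a hypercube-style pattern).

import heapq
from multiprocessing import Pool

def parallel_merge_sort(names, p=10):
    # split the visitor list into p chunks, one per processor
    chunk = (len(names) + p - 1) // p
    pieces = [names[i:i + chunk] for i in range(0, len(names), chunk)]
    with Pool(p) as pool:
        sorted_pieces = pool.map(sorted, pieces)  # local sorts run in parallel
    return list(heapq.merge(*sorted_pieces))      # k-way merge of the results

if __name__ == "__main__":
    visitors = ["Zoe", "John", "Amy", "Mia", "Bob"]
    print(parallel_merge_sort(visitors, p=2))     # ['Amy', 'Bob', 'John', 'Mia', 'Zoe']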

Ans 2

In the conventional sequential BFS algorithm, two data structures are created to store
the frontier and the next frontier. The frontier contains the vertices that have the same
distance (also called "level") from the source vertex; these are the vertices to be
explored in the current BFS step. Every neighbour of these vertices is checked, and
those neighbours that have not been explored yet are discovered and put into the next
frontier. At the beginning of the BFS algorithm, a given source vertex s is the only vertex
in the frontier. All direct neighbours of s are visited in the first step, and they form the
next frontier. After each layer traversal, the "next frontier" becomes the frontier and
newly discovered vertices are stored in the new next frontier. The following pseudocode
outlines the idea, with the data structures for the frontier and next frontier called FS
and NS respectively.

define bfs_sequential(graph(V, E), source s):
    for all v in V do
        d[v] = -1;                  // -1 marks "not yet discovered"
    d[s] = 0; level = 1; FS = {}; NS = {};
    push(s, FS);
    while FS !empty do
        for u in FS do
            for each neighbour v of u do
                if d[v] = -1 then
                    push(v, NS);
                    d[v] = level;   // v lies one layer beyond u
        FS = NS, NS = {}, level = level + 1;
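
The pseudocode translates directly to runnable Python; here is a sketch using a
dict-of-lists graph representation (the representation is my choice, not the text's):

def bfs_sequential(graph, s):
    # graph: {vertex: [neighbours]}; returns each vertex's level (distance from s)
    d = {v: -1 for v in graph}       # -1 marks "not yet discovered"
    d[s] = 0
    level, FS, NS = 1, [s], []
    while FS:
        for u in FS:
            for v in graph[u]:
                if d[v] == -1:       # v discovered for the first time
                    NS.append(v)
                    d[v] = level
        FS, NS, level = NS, [], level + 1
    return d

# Example: bfs_sequential({'s': ['a', 'b'], 'a': ['s'], 'b': ['s']}, 's')
# returns {'s': 0, 'a': 1, 'b': 1}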
