
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

COURSE LESSON PLAN

Semester: 7th Academic Year: 2025-2026


Course Name: Parallel Computing Course Code: BCS702
Total Teaching Hours: 50 Duration of Exam: 03 Hours
Lecture-Tutorial-Practical (LTP): 4:2:0 Total No. of contact hours per week: 4
Exam Marks: 100 IA Marks: 50
Lesson Plan Author/Employee ID: Prof. Shammi L Date: 28/06/2025
Checked By: Date:

Preamble: Parallel computing uses multiple processors, cores, or accelerators cooperatively to solve a
single computational problem faster, or to tackle problems too large for one processor. With multicore
CPUs, clusters, and GPUs now standard in systems ranging from laptops to supercomputers, a solid
understanding of parallel hardware, programming models such as MPI, OpenMP, and CUDA, and
performance analysis is essential for making effective use of modern computing resources.

Prerequisites: Students are expected to have basic proficiency in C programming, along with a
fundamental understanding of computer organization, data structures, and operating systems.
Familiarity with process and thread management will be helpful for grasping concepts related
to memory models and synchronization in parallel computing.

Course Outcomes:
(Indicate levels of learning in accordance with Bloom’s Taxonomy)

Course        Course Outcome Statement                                        Module       Bloom's
Outcome                                                                       Number/s     Taxonomy
Number                                                                                     Level
CO1           Explain the need for parallel programming.                      1            L2
CO2           Demonstrate parallelism in MIMD systems.                        2            L2
CO3           Apply the MPI library to parallelize the code to solve the      3            L3
              given problem.
CO4           Apply OpenMP pragmas and directives to parallelize the code     4            L3
              to solve the given problem.
CO5           Design a CUDA program for the given problem.                    5            L3


Scheme from VTU website:


COURSE CONTENT (CBCS)

Module – 1                                                                                  10 hrs
Introduction to Parallel programming, Parallel hardware and parallel software:
Classifications of parallel computers, SIMD systems, MIMD systems, Interconnection
networks, Cache coherence, Shared-memory vs distributed memory, Coordinating the
processes/threads, Shared-memory, Distributed-memory.
Textbook: Ch.1, 2: 2.3, 2.4 (2.4.2 – 2.4.4)

Module – 2                                                                                  10 hrs
GPU programming, Programming hybrid systems, MIMD systems, GPUs,
Performance: Speedup and efficiency in MIMD systems, Amdahl’s law, Scalability in
MIMD systems, Taking timings of MIMD programs, GPU performance.
Textbook: Ch.2: 2.4 (2.4.5, 2.4.6), 2.5, 2.6

Module – 3                                                                                  10 hrs
Distributed memory programming with MPI: MPI functions, The trapezoidal rule in
MPI, Dealing with I/O, Collective communication, MPI-derived datatypes, Performance
evaluation of MPI programs, A parallel sorting algorithm.
Textbook 1: Ch.3: 3.1 – 3.7

Module – 4                                                                                  10 hrs
Shared-memory programming with OpenMP: OpenMP pragmas and directives, The
trapezoidal rule, Scope of variables, The reduction clause, Loop-carried dependency,
Scheduling, Producers and consumers, Caches, cache coherence and false sharing in
OpenMP, Tasking, Thread safety.
Textbook: Ch.5: 5.1 – 5.11

Module – 5                                                                                  10 hrs
GPU programming with CUDA: GPUs and GPGPU, GPU architectures, Heterogeneous
computing, Threads, blocks, and grids, Nvidia compute capabilities and device
architectures, Vector addition, Returning results from CUDA kernels, CUDA trapezoidal
rule I, CUDA trapezoidal rule II: improving performance, CUDA trapezoidal rule III:
blocks with more than one warp.
Textbook: Ch.6: 6.1 – 6.11, 6.13


REFERENCES:

Text Books:
T1   Peter S. Pacheco, Matthew Malensek – An Introduction to Parallel Programming, 2nd Edition,
     Morgan Kaufmann.
T2   Michael J. Quinn – Parallel Programming in C with MPI and OpenMP, McGraw-Hill.

Reference Books:
R1   Calvin Lin, Lawrence Snyder – Principles of Parallel Programming, Pearson.
R2   Barbara Chapman – Using OpenMP: Portable Shared Memory Parallel Programming,
     Scientific & Engineering Computation series.
R3   William Gropp, Ewing Lusk – Using MPI: Portable Parallel Programming, 3rd Edition,
     Scientific & Engineering Computation series.

Evaluation Scheme for INTERNAL ASSESSMENT

Assessment                                   Weightage in Marks

Internal Assessment Exam 1                   15
Internal Assessment Exam 2                   15
Internal Assessment Exam 3                   15
Assignments/Any other activity               10
Lab Internals/Record and Observation         25
Total                                        50

Course Unitization for IA Exams and Semester Examination

IA Exam I:                Module 1 and the first 50% of Module 2 (15 teaching hours);
                          VTU Exam Pattern (answer any two questions)
IA Exam II:               Last 50% of Module 2 and Module 3 (15 teaching hours);
                          VTU Exam Pattern (answer any two questions)
Improvement IA Exam III:  Modules 4 and 5 (20 teaching hours); VTU Exam Pattern

Note: Each Question carries 20 marks and may consist of sub-questions.

Date: Head of the department



Module - wise Plan

MODULE - I

Lesson Schedule:
Class No.   Portions Covered                                                      Text
1           Introduction to Parallel programming                                  T1
2           Parallel hardware                                                     T1
3           Parallel software: Classifications of parallel computers              T1
4           SIMD systems, MIMD systems                                            T1
5           Interconnection networks                                              T1
6           Cache coherence, Shared-memory vs distributed memory                  T1
7           Coordinating the processes/threads, Shared-memory,                    T1
            Distributed-memory

Questions as per Bloom’s Taxonomy

Level 1:

1. What are the two main classifications of parallel computers?
2. Define SIMD and MIMD systems.
3. List any four interconnection network types used in parallel computing.
4. What is cache coherence?
5. Identify whether a given system is a shared-memory or a distributed-memory system
(provide examples).

Level 2:

1. Explain the key differences between SIMD and MIMD architectures.
2. Describe the concept of shared-memory and how it differs from distributed-memory
systems.
3. Why is cache coherence important in shared-memory systems?
4. Summarize the role of interconnection networks in parallel hardware.
5. Distinguish between thread-level and process-level parallelism.

Level 3:

1. Apply the concept of shared-memory architecture to illustrate how threads
communicate in OpenMP (see the sketch after this list).
2. Given a simple problem, demonstrate how MIMD systems can execute it in parallel.
3. Use a diagram to show how an interconnection network supports communication
between processors.
4. Illustrate how to coordinate processes in a distributed-memory system using MPI.
5. Apply knowledge of cache coherence to identify performance issues in a multi-threaded
program.
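
Level 3 Question 1 above asks how threads communicate through shared memory in OpenMP. The
following is a minimal illustrative sketch (the array name, values, and thread count are assumptions
for illustration, not taken from the textbook): every thread writes its partial result into a shared
array, and after the implicit barrier the serial region reads what the threads wrote.

    #include <stdio.h>
    #include <omp.h>

    #define NTHREADS 8                      /* illustrative thread count */

    int main(void) {
        int partial[NTHREADS] = {0};        /* shared array: all threads can read and write it */
        int total = 0;

        #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            partial[id] = id * id;          /* each thread communicates its result via shared memory */
        }                                   /* implicit barrier: all writes are visible past this point */

        for (int i = 0; i < NTHREADS; i++)  /* the serial region (one thread) reads the shared results */
            total += partial[i];

        printf("total = %d\n", total);
        return 0;
    }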


MODULE – II
Lesson Schedule:
Class No.   Portions Covered                                                      Text
9           GPU programming                                                       T1
10          Programming hybrid systems                                            T1
11          MIMD systems, GPUs                                                    T1
12          Performance: Speedup and efficiency in MIMD systems                   T1
13          Amdahl’s law, Scalability in MIMD systems                             T1
14          Taking timings of MIMD programs                                       T1
15          GPU performance                                                       T1

Questions as per Bloom’s Taxonomy

Level 1:
1. Define MIMD (Multiple Instruction, Multiple Data) system.
2. What is speedup in the context of parallel computing?
3. State Amdahl’s Law.
4. List the key components of a hybrid system involving CPU and GPU.

Level 2:

1. Explain the difference between GPU and CPU architectures.
2. Describe how speedup and efficiency are calculated in MIMD systems.
3. Interpret the meaning of Amdahl’s Law and its implication on scalability.
4. Explain how GPU performance can be measured and improved.
5. Compare shared-memory and hybrid computing models.

Level 3:

1. Apply Amdahl’s Law to calculate the theoretical speedup when 80% of a task is
parallelized over 4 processors (a worked sketch follows this list).
2. Use timing functions to measure and compare execution time of a sequential vs. parallel
MIMD program.
3. Given a real-world task (e.g., image processing), show how to implement it using GPU
for performance gain.
4. Apply knowledge of hybrid systems to design a simple CPU-GPU cooperative task.
5. Demonstrate performance scaling by running a parallel code with increasing processor
count and plotting speedup.
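
Level 3 Question 1 above has a short worked answer. Amdahl's Law gives speedup
S = 1 / ((1 - p) + p/n); with parallel fraction p = 0.8 and n = 4 processors,
S = 1 / (0.2 + 0.8/4) = 1 / 0.4 = 2.5. A tiny C sketch of the same calculation
(the function name is illustrative):

    #include <stdio.h>

    /* Amdahl's Law: speedup for parallel fraction p on n processors */
    double amdahl_speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        /* 80% parallelized over 4 processors: 1 / (0.2 + 0.8/4) = 2.5 */
        printf("speedup = %.2f\n", amdahl_speedup(0.8, 4));
        return 0;
    }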


MODULE – III
Lesson Schedule:

Class No.   Portions Covered                                                      Text
17          Distributed memory programming with MPI: MPI functions                T1
18          The trapezoidal rule in MPI                                           T1
19          Dealing with I/O                                                      T1
20          Collective communication                                              T1
21          MPI-derived datatypes                                                 T1
22          Performance evaluation of MPI programs                                T1
23          A parallel sorting algorithm                                          T1

Questions as per Bloom’s Taxonomy

Level 1:
1. Name any four basic MPI functions used in every MPI program.
2. Define collective communication in MPI.
3. What is the purpose of MPI_Comm_rank and MPI_Comm_size?
4. List any two MPI collective communication functions.

Level 2:

1. Explain the role of the trapezoidal rule in demonstrating parallel computation with MPI.
2. Describe how MPI handles communication between processes.
3. Explain how MPI-derived datatypes can help in structuring communication.
4. Summarize the difference between point-to-point and collective communication in MPI.
5. Describe the need for performance evaluation in MPI programs.

Level 3:

1. Apply MPI_Bcast to distribute input data from the root process to all other processes.
2. Implement the trapezoidal rule in MPI to approximate definite integrals in parallel (a minimal sketch follows this list).
3. Use MPI functions to write a parallel program that sorts a set of numbers using a
distributed sorting algorithm.
4. Apply MPI file I/O routines to write output from multiple processes into a common file.
5. Evaluate the performance of an MPI-based matrix multiplication program by measuring
execution time with increasing process count.
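
Level 3 Question 2 above is illustrated by the following minimal MPI sketch in the spirit of the
textbook's trapezoidal rule program (the integrand f(x) = x*x, the interval [0, 1], and the trapezoid
count are assumptions for illustration; input handling and error checks are omitted): each process
integrates its own sub-interval and MPI_Reduce sums the partial results on process 0.

    #include <stdio.h>
    #include <mpi.h>

    /* Illustrative integrand: f(x) = x * x */
    double f(double x) { return x * x; }

    /* Serial trapezoidal rule on [a, b] with n trapezoids of width h */
    double trap(double a, double b, int n, double h) {
        double sum = (f(a) + f(b)) / 2.0;
        for (int i = 1; i < n; i++)
            sum += f(a + i * h);
        return sum * h;
    }

    int main(int argc, char *argv[]) {
        int rank, size;
        double a = 0.0, b = 1.0;           /* illustrative interval */
        int n = 1024;                      /* total number of trapezoids */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double h = (b - a) / n;
        int local_n = n / size;            /* assumes size divides n evenly */
        double local_a = a + rank * local_n * h;
        double local_b = local_a + local_n * h;

        double local_int = trap(local_a, local_b, local_n, h);

        double total_int = 0.0;            /* collective sum of the partial integrals */
        MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Integral of x^2 on [%.1f, %.1f] ~= %.6f\n", a, b, total_int);

        MPI_Finalize();
        return 0;
    }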

MODULE – IV
Lesson Schedule:

Class No.   Portions Covered                                                      Text
27          Shared-memory programming with OpenMP: OpenMP pragmas                 T1
            and directives
28          The trapezoidal rule                                                  T1
29          Scope of variables                                                    T1
30          The reduction clause                                                  T1
31          Loop-carried dependency                                               T1
32          Scheduling                                                            T1
33          Producers and consumers                                               T1
34          Caches                                                                T1
35          Cache coherence and false sharing in OpenMP                           T1
36          Tasking                                                               T1
37          Thread safety                                                         T1

Questions as per Bloom’s Taxonomy

Level 1:
1. What is OpenMP used for in parallel programming?
2. List any four commonly used OpenMP directives.
3. Define the reduction clause in OpenMP.
4. What is loop-carried dependency?
5. What does the #pragma omp parallel directive do?
Level 2:
1. Explain the purpose of the trapezoidal rule and how it's implemented in OpenMP.
2. Describe how variable scope (shared vs. private) affects parallel execution in OpenMP.
3. Explain the impact of false sharing on OpenMP program performance.
4. Summarize the scheduling types supported by OpenMP and their differences.
5. Explain how tasking is used in OpenMP and why it is useful.

Level 3:
1. Apply OpenMP pragmas to parallelize a trapezoidal rule-based numerical integration
program.
2. Write a parallel program using OpenMP to compute the sum of an array using the
reduction clause (a minimal sketch follows this list).
3. Use scheduling strategies to optimize load balancing in a loop-heavy program.
4. Demonstrate false sharing using a parallel array update and suggest a fix.
5. Implement a producer-consumer problem using OpenMP sections or tasks with proper
synchronization.
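
Level 3 Question 2 above can be sketched as follows (the array size and contents are assumptions
for illustration): reduction(+:sum) gives each thread a private copy of sum and combines the
private copies when the loop ends.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000                         /* illustrative array size */

    int main(void) {
        double a[N], sum = 0.0;

        for (int i = 0; i < N; i++)        /* fill with illustrative data: 1, 2, ..., N */
            a[i] = i + 1.0;

        /* Each thread accumulates into its own private copy of sum;
           OpenMP combines the copies at the end of the loop. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.1f\n", sum);       /* expected: 500500.0 */
        return 0;
    }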

MODULE – V

Lesson Schedule:
Class No.   Portions Covered                                                      Text
38          GPU programming with CUDA: GPUs and GPGPU, GPU architectures          T1
39          Heterogeneous computing, Threads, blocks, and grids                   T1
40          Nvidia compute capabilities and device architectures                  T1
41          Vector addition                                                       T1
42          Returning results from CUDA kernels                                   T1
43          CUDA trapezoidal rule I                                               T1
44          CUDA trapezoidal rule II                                              T1
45          Improving performance                                                 T1
46          CUDA trapezoidal rule III: blocks with more than one warp             T1

Questions as per Bloom’s Taxonomy

Level 1:
1. List the three main components in CUDA thread hierarchy.
2. What is a CUDA kernel?
3. Name any two Nvidia compute capability versions.
4. Define a CUDA block and a CUDA grid.

Level 2:
1. Explain the difference between CPU and GPU architectures.
2. Describe the structure and role of threads, blocks, and grids in a CUDA program.
3. Summarize the concept of heterogeneous computing and its benefits.
4. Explain how CUDA kernels return results to the host.
5. Describe how the trapezoidal rule is parallelized in CUDA.

Level 3:
1. Apply CUDA programming to implement a simple vector addition using kernel launch (a minimal sketch follows this list).
2. Use CUDA to implement the trapezoidal rule for numerical integration (Version I).
3. Modify your trapezoidal rule implementation to improve performance by reducing
memory accesses (Version II).
4. Apply thread and block configurations for large input sizes using multiple warps
(Version III).
5. Demonstrate the impact of block size on performance by benchmarking different
configurations in a CUDA program.
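
Level 3 Question 1 above is illustrated by the following minimal CUDA sketch (the vector length,
block size, and variable names are assumptions for illustration; error checking is omitted): one
thread adds one pair of elements, with host-to-device copies before the kernel launch and a
device-to-host copy afterwards.

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* One thread adds one pair of elements */
    __global__ void vec_add(const float *x, const float *y, float *z, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) z[i] = x[i] + y[i];
    }

    int main(void) {
        const int n = 1 << 16;                  /* illustrative vector length */
        size_t bytes = n * sizeof(float);

        float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes), *hz = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { hx[i] = i; hy[i] = 2.0f * i; }

        float *dx, *dy, *dz;                    /* device copies of the vectors */
        cudaMalloc((void **)&dx, bytes);
        cudaMalloc((void **)&dy, bytes);
        cudaMalloc((void **)&dz, bytes);
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        int block = 256;                        /* illustrative block size */
        int grid = (n + block - 1) / block;     /* enough blocks to cover all n elements */
        vec_add<<<grid, block>>>(dx, dy, dz, n);

        cudaMemcpy(hz, dz, bytes, cudaMemcpyDeviceToHost);
        printf("z[10] = %.1f (expected %.1f)\n", hz[10], 3.0f * 10);

        cudaFree(dx); cudaFree(dy); cudaFree(dz);
        free(hx); free(hy); free(hz);
        return 0;
    }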


Course outcomes, CO Mapping with POs / PSOs

COURSE      PROGRAM OUTCOMES
OUTCOMES    PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12   PSO1  PSO2  PSO3

CO1          2    -    -    2    -    -    -    -    -    -     -     2      2     -     2
CO2          3    3    3    3    3    -    -    -    -    -     -     2      3     -     3
CO3          3    3    3    3    3    -    -    -    -    -     -     2      3     2     3
CO4          3    3    3    3    3    -    -    -    -    -     -     2      3     -     3
CO5          3    3    3    3    3    -    -    -    -    -     -     2      3     2     3

Correlation levels:
High (H):   3
Medium (M): 2
Low (L):    1

