  • To understand the fundamental principles and engineering trade-offs involved in designing modern parallel computers
  • To develop the programming skills needed to use parallel architectures effectively



  • Ability to design parallel programs that improve performance in parallel hardware environments
  • Ability to design and implement parallel programs in modern environments such as CUDA, OpenMP, etc.


Unit – I

          Introduction: The need for parallelism, Forms of parallelism (SISD, SIMD, MISD, MIMD), Moore's Law and Multi-cores, Fundamentals of Parallel Computers, Communication architecture, Message passing architecture, Data parallel architecture, Dataflow architecture, Systolic architecture, Performance Issues.

Unit – II

          Large Cache Design: Shared vs. Private Caches, Centralized vs. Distributed Shared Caches, Snooping-based Cache Coherence Protocol, Directory-based Cache Coherence Protocol, Uniform Cache Access, Non-Uniform Cache Access, D-NUCA, S-NUCA, Inclusion, Exclusion, Difference between transaction and transactional memory, STM, HTM.


Unit – III

         Graphics Processing Unit: GPUs as Parallel Computers, Architecture of a modern GPU, Evolution of Graphics Pipelines, GPGPUs, Scalable GPUs, Architectural characteristics of Future Systems, Implication of Technology and Architecture for users, Vector addition, Applications of GPU.
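
The vector addition topic in this unit can be previewed with a small sketch. The Python code below emulates, one logical thread at a time, how a CUDA-style kernel maps a block index and thread index to a global element index. The names `vec_add_kernel` and `launch` and the default block size are assumptions made for illustration; this is not CUDA code and runs sequentially.

```python
# Sequential emulation of a CUDA-style vector-addition kernel.
# vec_add_kernel and launch are hypothetical names used only for this sketch.

def vec_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by one logical thread."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(out):                        # guard: the last block may overshoot
        out[i] = a[i] + b[i]


def launch(a, b, block_dim=4):
    """Run the kernel body once per (block, thread) pair, one after another."""
    n = len(a)
    out = [0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            vec_add_kernel(block_idx, thread_idx, block_dim, a, b, out)
    return out
```

On a GPU the two loops in `launch` disappear: every (block, thread) pair runs the kernel body concurrently, which is why the bounds guard is needed when the vector length is not a multiple of the block size.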


Unit – IV

         Introduction to Parallel Programming: Strategies, Mechanism, Performance Theory, Parallel Programming Patterns: Nesting Pattern, Parallel Control Patterns, Parallel Data Management, Map: Scaled Vector, Mandelbrot, Collectives: Reduce, Fusing Map and Reduce, Scan, Fusing Map and Scan, Data Reorganization: Gather, Scatter, Pack, Stencil and Recurrence, Fork-Join, Pipeline.
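
Several of the patterns listed in this unit can be sketched serially to show their semantics before any parallel runtime is involved. The Python functions below are a minimal illustration; the function names are assumptions chosen for this sketch, and a parallel implementation would distribute the same operations across workers.

```python
from functools import reduce
from itertools import accumulate
import operator


def scaled_vector(a, xs):
    """Map: apply the same independent operation to every element."""
    return [a * x for x in xs]


def total(xs):
    """Reduce: combine all elements with an associative operator."""
    return reduce(operator.add, xs, 0)


def prefix_sums(xs):
    """Inclusive scan: element i holds the reduction of xs[0..i]."""
    return list(accumulate(xs, operator.add))


def scaled_total(a, xs):
    """Fused map and reduce: no intermediate list is materialized."""
    return sum(a * x for x in xs)


def gather(xs, indices):
    """Gather: read from the positions named in indices."""
    return [xs[i] for i in indices]
```

Map iterations are independent, so they can run in any order. Reduce and scan rely on the operator being associative, which is what allows a parallel version to regroup the work into a tree.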

Unit – V

        Parallel Programming Languages: Distributed Memory Programming with MPI: Trapezoidal Rule in MPI, I/O Handling, MPI Derived Datatypes, Collective Communication, Shared Memory Programming with Pthreads: Condition Variables, Read-Write Locks, Cache Handling, Shared Memory Programming with OpenMP: Parallel for Directives, Scheduling Loops, Thread Safety, CUDA: Parallel Programming in CUDA C, Thread Management, Constant Memory and Events, Graphics Interoperability, Atomics, Streams.
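
The trapezoidal rule example named in this unit follows a simple decomposition: each worker integrates its own sub-interval, and the partial results are summed. The Python sketch below shows that structure with a thread pool standing in for MPI ranks. The names `trap` and `parallel_trap`, and the requirement that the worker count divide the number of trapezoids, are assumptions of this sketch rather than part of any MPI API.

```python
from concurrent.futures import ThreadPoolExecutor


def trap(f, left, right, count):
    """Serial trapezoidal rule on [left, right] with `count` trapezoids."""
    h = (right - left) / count
    s = (f(left) + f(right)) / 2.0
    for i in range(1, count):
        s += f(left + i * h)
    return s * h


def parallel_trap(f, a, b, n, workers):
    """Split n trapezoids evenly across workers, then sum the partial results."""
    assert n % workers == 0, "sketch assumes workers divides n"
    h = (b - a) / n
    local_n = n // workers

    def local_integral(rank):
        # Each worker derives its own sub-interval from its rank.
        local_a = a + rank * local_n * h
        local_b = local_a + local_n * h
        return trap(f, local_a, local_b, local_n)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_integral, range(workers)))
    return sum(partials)  # plays the role of the final reduction
```

In an MPI program the final `sum` would typically be a collective reduction, and each rank would be a separate process. In standard CPython the threads here illustrate the decomposition rather than deliver a speedup.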



  1. D. E. Culler, J. P. Singh, and A. Gupta, “Parallel Computer Architecture”, Morgan Kaufmann, 2004
  2. Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar, “Multi-Core Cache Hierarchies”, Morgan & Claypool Publishers, 2011
  3. Peter S. Pacheco, “An Introduction to Parallel Programming”, Elsevier, 2011
  4. James R. Larus and Ravi Rajwar, “Transactional Memory”, Morgan & Claypool Publishers, 2007
  5. David B. Kirk, Wen-mei W. Hwu, “Programming Massively Parallel Processors: A Hands-on Approach”, 2010
  6. Barbara Chapman, F. Desprez, Gerhard R. Joubert, Alain Lichnewsky, Frans Peters, “Parallel Computing: From Multicores and GPU's to Petascale”, 2010
  7. Michael McCool, James Reinders, Arch Robison, “Structured Parallel Programming: Patterns for Efficient Computation”, 2012
  8. Jason Sanders, Edward Kandrot, “CUDA by Example: An Introduction to General-Purpose GPU Programming”, 2011