CS403

PARALLEL ARCHITECTURES AND PROGRAMMING

Objectives

  • To understand the fundamental principles and engineering trade-offs involved in designing modern parallel computers
  • To develop the programming skills needed to implement parallel programs effectively on parallel architectures

 

Outcomes

  • Ability to design parallel programs that enhance machine performance in a parallel hardware environment
  • Ability to design and implement parallel programs in modern environments such as CUDA and OpenMP

 

Unit – I

          Introduction: The need for parallelism, Forms of parallelism (SISD, SIMD, MISD, MIMD), Moore's Law and Multi-cores, Fundamentals of Parallel Computers, Communication architecture, Message passing architecture, Data parallel architecture, Dataflow architecture, Systolic architecture, Performance Issues.
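
A minimal sketch of the SIMD form of parallelism listed above: a single instruction stream applied across many data elements at once. The OpenMP simd directive below asks the compiler to vectorize the loop; the function name and signature are illustrative only, not drawn from the syllabus.

    /* SIMD parallelism (Flynn's taxonomy): one instruction, many data elements.
       Compile with an OpenMP 4.0+ compiler, e.g. gcc -fopenmp -O2 */
    #include <stddef.h>

    void saxpy(size_t n, float a, const float *x, float *y) {
        #pragma omp simd              /* request vector (SIMD) instructions */
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* the same operation on every element */
    }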

Unit – II

          Large Cache Design: Shared vs. Private Caches, Centralized vs. Distributed Shared Caches, Snooping-based Cache Coherence Protocol, Directory-based Cache Coherence Protocol, Uniform Cache Access, Non-Uniform Cache Access, D-NUCA, S-NUCA, Inclusion, Exclusion, Difference between Transactions and Transactional Memory, STM, HTM.
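
The STM topic above can be illustrated with GCC's experimental transactional-memory extension (compile with -fgnu-tm); this is a hedged sketch, and the shared counter is a hypothetical example rather than anything prescribed by the unit. Unlike a plain lock, conflicting transactions are detected and retried by the TM runtime.

    /* Software transactional memory sketch; compile with gcc -fgnu-tm.
       The runtime re-executes the block if another transaction conflicts. */
    static long shared_counter = 0;        /* hypothetical shared data */

    void deposit(long amount) {
        __transaction_atomic {
            /* Reads and writes in this block appear atomic to other
               transactions; no explicit lock is acquired. */
            shared_counter += amount;
        }
    }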

 

Unit – III

         Graphics Processing Unit: GPUs as Parallel Computers, Architecture of a Modern GPU, Evolution of Graphics Pipelines, GPGPUs, Scalable GPUs, Architectural Characteristics of Future Systems, Implications of Technology and Architecture for Users, Vector Addition, Applications of GPUs.
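
The vector-addition topic above is the canonical first CUDA C program: each GPU thread computes one output element. A minimal sketch (the 256-thread block size and the helper function are illustrative choices, not from the syllabus):

    /* CUDA C vector addition: one thread per element. Compile with nvcc. */
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                          /* guard the partial last block */
            c[i] = a[i] + b[i];
    }

    void addOnGPU(const float *a, const float *b, float *c, int n) {
        float *da, *db, *dc;
        size_t bytes = n * sizeof(float);
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);   /* 256 threads/block */
        cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
        cudaFree(da); cudaFree(db); cudaFree(dc);
    }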

 

Unit – IV

         Introduction to Parallel Programming: Strategies, Mechanisms, Performance Theory, Parallel Programming Patterns: Nesting Pattern, Parallel Control Patterns, Parallel Data Management, Map: Scaled Vector, Mandelbrot, Collectives: Reduce, Fusing Map and Reduce, Scan, Fusing Map and Scan, Data Reorganization: Gather, Scatter, Pack, Stencil and Recurrence, Fork-Join, Pipeline.
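
The "Fusing Map and Reduce" pattern above can be sketched in a few lines with OpenMP: the map step (an elementwise multiply) and the reduce step (a sum) run in one fused parallel loop, so no intermediate array is ever materialized. Function and variable names are illustrative.

    /* Fused map (x[i]*y[i]) and reduce (+) in one parallel loop.
       Compile with gcc -fopenmp. */
    double dot(int n, const double *x, const double *y) {
        double sum = 0.0;
        /* Each thread accumulates a private partial sum; OpenMP
           combines them when the loop ends (the reduce step). */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += x[i] * y[i];    /* map step, fused with the reduction */
        return sum;
    }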

Unit – V

        Parallel Programming Languages: Distributed Memory Programming with MPI: Trapezoidal Rule in MPI, I/O Handling, MPI Derived Datatypes, Collective Communication; Shared Memory Programming with Pthreads: Condition Variables, Read-Write Locks, Cache Handling; Shared Memory Programming with OpenMP: Parallel for Directives, Scheduling Loops, Thread Safety; CUDA: Parallel Programming in CUDA C, Thread Management, Constant Memory and Events, Graphics Interoperability, Atomics, Streams.
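
The trapezoidal-rule example named above (treated at length in Pacheco's text) splits the integration interval across MPI processes and combines the partial results with a collective MPI_Reduce. A condensed sketch; the integrand, interval, and trapezoid count are fixed here purely for illustration.

    /* MPI trapezoidal rule: each process integrates its own subinterval.
       Compile with mpicc; run with mpiexec -n <p> ./trap */
    #include <mpi.h>
    #include <stdio.h>

    static double f(double x) { return x * x; }    /* illustrative integrand */

    int main(void) {
        int rank, size, n = 1024;                  /* total trapezoid count */
        double a = 0.0, b = 1.0;                   /* interval of integration */
        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double h = (b - a) / n;                    /* trapezoid width */
        int local_n = n / size;                    /* trapezoids per process */
        double local_a = a + rank * local_n * h;   /* this process's subinterval */
        double local_b = local_a + local_n * h;

        double local_sum = (f(local_a) + f(local_b)) / 2.0;
        for (int i = 1; i < local_n; i++)
            local_sum += f(local_a + i * h);
        local_sum *= h;

        /* Collective communication: sum the partial integrals onto rank 0. */
        double total = 0.0;
        MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("Integral estimate = %f\n", total);
        MPI_Finalize();
        return 0;
    }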

 

TEXT BOOKS/REFERENCE BOOKS

  1. D. E. Culler, J. P. Singh, and A. Gupta, “Parallel Computer Architecture: A Hardware/Software Approach”, Morgan Kaufmann, 2004
  2. Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar, “Multi-Core Cache Hierarchies”, Morgan & Claypool Publishers, 2011
  3. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan Kaufmann (Elsevier), 2011
  4. James R. Larus and Ravi Rajwar, “Transactional Memory”, Morgan & Claypool Publishers, 2007
  5. David B. Kirk and Wen-mei W. Hwu, “Programming Massively Parallel Processors: A Hands-on Approach”, Morgan Kaufmann, 2010
  6. Barbara Chapman, F. Desprez, Gerhard R. Joubert, Alain Lichnewsky, and Frans Peters, “Parallel Computing: From Multicores and GPU's to Petascale”, IOS Press, 2010
  7. Michael McCool, James Reinders, and Arch Robison, “Structured Parallel Programming: Patterns for Efficient Computation”, Morgan Kaufmann, 2012
  8. Jason Sanders and Edward Kandrot, “CUDA by Example: An Introduction to General-Purpose GPU Programming”, Addison-Wesley, 2011