CSS 534: Parallel Programming in Grid and Cloud
Autumn 2016

TTh 5:45-7:50pm at UW1-010

Prof. Munehiro Fukuda


Professor:

Munehiro Fukuda <mfukuda@u.washington.edu>, room UW1-331, phone 352-3459, office hours: TTh 5:10-5:40pm and 7:45-8:15pm or by appointment

Course Description:

This course closely examines the programming methodology and middleware used for parallel computing in grid and cloud systems, and develops your programming skills so that you can design and build application-specific parallel software with up-to-date grid and cloud programming environments.

Topics covered include parallel-computing platforms; parallel-programming models (such as those supported by OpenMP, MPI, and paradigm-oriented language tools such as MapReduce); programming patterns (such as task and data parallelism); middleware for job and resource management; and fault tolerance.

For each of these topics, we will cover the background and motivation, programming and design theory, and current implementations in grid and cloud systems. In addition, the first six weeks will advance your parallel-programming knowledge and skills through in-class discussions and hands-on laboratory programming exercises, while the last four weeks will strengthen your system-analysis skills through literature surveys and presentations on grid/cloud systems.

Four programming assignments are given: (1) shared-memory-based programming with a tool such as OpenMP, (2) message-passing-based programming with a tool such as MPI, (3) paradigm-oriented programming with MapReduce, and (4) parallelization of an open problem with the MASS (multi-agent spatial simulation) library.

Prerequisites:

Open to MS in Computer Science and Software Engineering and MS in Cyber-Security Engineering students only

Work Load and Grading:

Course Work         Percentage
Programming 1       15%
Programming 2       15%
Programming 3       15%
Programming 4       20%
Literature Survey   15%
Midterm Exam        10%
Final Exam          10%

Achievements        Approximately Corresponding Numeric Grade
90s                 3.5 - 4.0
80s                 2.5 - 3.4
70s                 1.7 - 2.4
60s or below        0.0 - 1.6

Textbooks/References:

The class mainly follows the original lecture notes. Therefore, the course has no specific textbook, although there are many useful references (marked with "recommended") and manuals to assist your programming and literature-survey work. Note that some of the following references start with an abbreviation that is used to indicate the chapters referred to in our course schedule.
  1. Parallel Programming
  2. Grid and Cloud
  3. OpenMP
  4. MPI
  5. MapReduce

Policies:

Although post-lecture laboratory work may be done in collaboration, programming assignments 1 through 4 must be done independently. Any collaboration on these assignments will result in a severe penalty. You may discuss the problem statement and any clarifications with each other, but all work to be turned in must be done without collaboration.

All homework is due at the beginning of class on the due date. A submission may be postponed only in emergencies such as accidents, sickness, sudden business trips, and family emergencies, in which case you may turn in your homework late with written proof. No make-up exams will be given except under exceptional circumstances. Barring emergencies, I must be informed before the exam.

To request academic accommodations due to a disability, please contact Disability Resources for Students (DRS) in UW1-170 (email: rosal@uw.edu, TDD: 425-352-5307, FAX: 425-352-5114). If you have a documented disability on file with the DRS office, please have your DRS counselor contact me and we can discuss accommodations.

Course Goals:

The overall goal of CSS 534, "Parallel Programming in Grid and Cloud" includes:

Programming Assignments:

Four programming assignments are given. Please read assignment.html to understand the environment you use for assignments and the submission/grading procedures. You will submit only a soft copy to CollectIt for Homework.

Laboratory Work:

For the first five weeks, we are planning to have post-lecture laboratory sessions that allow you to get familiar with tools such as OpenMP, MPI, MapReduce, and MASS so as to start each programming assignment smoothly. For each laboratory session, we will move from the classroom to the UW1-320 Linux laboratory where you may work together or independently on each programming exercise. Each student must independently turn in the source code and execution results to CollectIt for Lab.

Group Discussions:

The first six weeks will also have post-lecture group discussions where you will form a team of 3 or 4 students, discuss the given topics, and present your conclusions in the classroom.

Literature Surveys and Presentations

The last four weeks focus on case studies of grid and cloud systems. You will choose one system, study it through a literature survey, and give a presentation on what you learned. Please refer to CSS534 Survey Work for more details.

Topics covered and tentative 534 schedule:

Note that this is an approximate ordering of topics. Chapters will take about the allotted time and not all sections in all chapters are covered.

Week Date Topics Sessions Readings Assignment
0 Sept 29 System Models
SMP, clusters, hybrid systems, grid, and cloud
  Patterns Ch 2: Background and Jargon of Parallel Computing
CDK4 Ch 2: System Models
 
1 Oct 4 Programming Foundations
Intro to shared memory, message passing, embarrassingly parallel, divide and conquer, pipeline, and synchronization
Discussion 0
(Grid vs Cloud, and how to distribute data)
MPI Ch 1: Introduction  
  Oct 6 Shared Memory Programming 1
Loop parallelism, multi-processes, multi-threads, and OpenMP
Laboratory 1
(OpenMP)
Patterns Appx A: Intro to OpenMP
OpenMP Ch 1: Introduction
OpenMP Ch 3: Writing a First OpenMP Program
OpenMP Ch 4: OpenMP Language Features
Program 1 assigned
2 Oct 11 Shared Memory Programming 2
Memory model, DSM, and performance issues
  Patterns Ch 2: Background and Jargon of Parallel Computing
OpenMP Ch 5: How to Get Good Performance
 
  Oct 13   Discussion 1
(Static/dynamic load balancing)
   
3 Oct 18 Message Passing Programming 1
MPI: Message Passing Interface
  Patterns Appx B: Intro to MPI
MPI Ch 3: Greeting
MPI Ch 4: Numerical Integration
MPI Ch 5: Collective Communication
 
  Oct 20   Laboratory 2
(MPI)
  Program 1 due
Program 2 assigned
4 Oct 25 Message Passing Programming 2
MPI Performance
Hybrid OpenMP and MPI
Discussion 2
(OpenMP vs MPI and Hybrid pros/cons)
MPI Ch 11: Performance
MPI Ch 12: More on Performance
 
  Oct 27 Paradigm-Oriented Programming 1
Task parallelism and MapReduce
Laboratory 3
(MapReduce)
Patterns Ch3.2: Task Decomposition
Patterns Ch4.4: Task Parallelism
MapReduce Ch 2: MapReduce Basics
MapReduce Ch 3: MapReduce Algorithm Design
 
5 Nov 1 Paradigm-Oriented Programming 2
MapReduce applications
Discussion 3
(MapReduce pros/cons and MapReduce-suitable applications)
MapReduce Ch 4: Inverted Indexing for Text Retrieval
MapReduce Ch 5: Graph Algorithms
 
  Nov 3 Paradigm-Oriented Programming 3
Data parallelism, GlobalArrays, (Spark), MASS, (RepastHPC), and thread migration
Laboratory 4
(MASS)
Patterns Ch3.3: Data Decomposition
Patterns Ch4.6: Geometric Decomposition
http://www.emsl.pnl.gov/docs/global/
https://spark.apache.org/
Program 2 due
Program 3 assigned
6 Nov 8 Midterm Exam in class      
  Nov 10   Discussion 4
(Paradigm programming vs DSM, and their computation granularity)
http://repast.sourceforge.net/docs.php
http://depts.washington.edu/dslab/MASS
 
7 Nov 15 Parallel Programming Patterns
Task, Data, and Mixed parallelism
Discussion 5
(Application design with task, data, and mixed parallelism)
Patterns Ch 4: The Algorithm Structure Design Space
Patterns Ch 5: The Supporting Structures Design Space
Patterns Ch 6: The Implementation Mechanisms Design Space
 
  Nov 17 Job Management
Rsh, intra/inter-cluster and centralized/decentralized scheduling, autonomic computing, and cluster virtualization
  Grid-D Ch 5: Schedulers
Grid-2 Ch 18: Resource and Service Management
Program 3 due
Program 4 assigned
8 Nov 22 OpenPBS, Condor, Globus (GRAM and DUROC), Amazon EC2 (XenoServer), StarCluster, and Azure Scheduler/Batch Survey Work Presentation 1 http://www.mcs.anl.gov/research/projects/openpbs/
http://www.cs.wisc.edu/condor/
http://www.globus.org/
http://www.cl.cam.ac.uk/research/srg/netos/xeno/
http://graal.ens-lyon.fr/diet/
http://star.mit.edu/cluster/
https://azure.microsoft.com/en-us/services/scheduler/
https://azure.microsoft.com/en-us/documentation/services/batch/
 
  Nov 24 Holiday (No School)      
9 Nov 29 File Management
Network, distributed, and parallel file systems
  CDK4 Ch 8: DFS
Grid-D Ch 9: Data Management
Grid-2 Ch 22: Data Access, Integration, and Management
Hadoop Ch 3: Hadoop DFS
Cloud Ch 2: Amazon Cloud Computing
 
  Dec 1 NFS, AFS, PVFS, MPI/IO, Hadoop, Amazon-S3, Azure Storage, and Apache Storm Survey Work Presentation 2 http://www.openafs.org/
http://www.parl.clemson.edu/pvfs/
http://www.mcs.anl.gov/projects/romio/
http://hadoop.apache.org/
http://aws.amazon.com/s3/
https://azure.microsoft.com/en-us/documentation/services/storage/
http://storm.apache.org/
 
10 Dec 6 Fault Tolerance
Two-phase commitment, replication, and check-pointing
CDK4 Ch 14: Distributed Transactions
CDK4 Ch 15: Replication
Cloud Ch 4: Ready for the Cloud
Cloud Ch 6: Disaster Recovery
 
  Dec 8 MS Cloud DB, Coda/AFS, Condor-MW, MapReduce/Hadoop, and FT-MPI Survey Work Presentation 3 http://azure.microsoft.com/blog/2012/07/30/fault-tolerance-in-windows-azure-sql-database/
http://www.coda.cs.cmu.edu/
http://www.cs.wisc.edu/condor/mw/
http://hadoop.apache.org/
http://icl.cs.utk.edu/ftmpi/
 
11 Dec 13 Student Final Project Student Presentation   Program 4 due
  Dec 15 Final Exam in class