CSS 534: Parallel Programming in Grid and Cloud
Autumn 2016
TTh 5:45-7:50pm at UW1-010
Prof. Munehiro Fukuda
This course closely examines the programming methodology and middleware used for parallel computing in grid and cloud systems, and develops your programming skills so that you can design and build application-specific parallel software with up-to-date grid and cloud programming environments.
Topics covered include parallel-computing platforms, parallel-programming models (such as those supported by OpenMP and MPI, and paradigm-oriented language tools such as MapReduce), programming patterns (such as task and data parallelism), middleware for job and resource management, and fault tolerance.
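As a first taste of the data-parallelism pattern listed above, the following is a minimal, illustrative sketch of the "partition, compute, reduce" structure. Python is an assumption here for readability only; the course assignments use OpenMP, MPI, MapReduce, and MASS rather than this code.

```python
# Data parallelism in miniature: partition an array among worker threads,
# let each worker compute a partial sum, then combine the partials (a
# reduction). This mirrors the shape of an OpenMP "parallel for" with a
# sum reduction. Note: CPython's GIL serializes CPU-bound threads, so this
# illustrates the pattern, not a real speedup.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker's independent task: sum its own slice of the data.
    return sum(chunk)

data = list(range(1, 101))          # 1 + 2 + ... + 100 = 5050
num_workers = 4
size = len(data) // num_workers
chunks = [data[i * size:(i + 1) * size] for i in range(num_workers)]

with ThreadPoolExecutor(max_workers=num_workers) as pool:
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)               # final reduction over partial results
print(total)                        # -> 5050
```

The same decomposition carries over directly to OpenMP (loop scheduling plus a `reduction` clause) and to MPI (scatter the chunks, reduce the partials).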
For each of these topics, we will cover the background and motivation, programming and design theory, and current implementations in grid and cloud technology. The first six weeks will advance your parallel-programming knowledge and skills through in-class discussions and hands-on laboratory programming exercises, while the last four weeks will strengthen your system-analysis skills through literature surveys and presentations on grid/cloud systems.
Four programming assignments are given: (1) shared-memory programming with a tool such as OpenMP, (2) message-passing programming with a tool such as MPI, (3) paradigm-oriented programming with MapReduce, and (4) parallelization of an open problem with the MASS (multi-agent spatial simulation) library.
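To preview the paradigm behind assignment 3, here is the classic word-count example written as explicit map, shuffle, and reduce phases. This is a single-process Python sketch of the programming model only; Hadoop distributes these same phases across a cluster, and the function names below are illustrative, not Hadoop API calls.

```python
# MapReduce in miniature: word count as explicit map / shuffle / reduce
# phases in one process. A real framework (e.g., Hadoop) runs mappers and
# reducers on different cluster nodes and shuffles the pairs between them.
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs, as a mapper would.
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one word, as a reducer would.
    return key, sum(values)

documents = ["the grid and the cloud", "the cloud"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
print(counts)   # counts["the"] == 3, counts["cloud"] == 2
```

Because each map call and each reduce call is independent, the framework can parallelize both phases freely; only the shuffle requires coordination.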
| Course Work | Percentage |
|---|---|
| Programming 1 | 15% |
| Programming 2 | 15% |
| Programming 3 | 15% |
| Programming 4 | 20% |
| Literature Survey | 15% |
| Midterm Exam | 10% |
| Final Exam | 10% |

| Achievements | Approximately Corresponding Numeric Grade |
|---|---|
| 90s | 3.5 - 4.0 |
| 80s | 2.5 - 3.4 |
| 70s | 1.7 - 2.4 |
| 60s or below | 0.0 - 1.6 |
Although post-lecture laboratory work can be done in collaboration, programming assignments 1 through 4 are to be done independently. Any collaboration on this work will result in a severe penalty. You may discuss the problem statement and any clarifications with each other, but all actual work to be turned in must be done without collaboration.
Homework is due at the beginning of class on the due date. Submission may be postponed only in emergencies such as accidents, sickness, sudden business trips, and family emergencies, in which case you may turn in your homework late with written proof. No make-up exams will be given except under exceptional circumstances; barring emergencies, I must be informed before the exam.
To request academic accommodations due to a disability, please contact Disability Resources for Students (DRS) in UW1-170 (email: rosal@uw.edu, TDD: 425-352-5307, FAX: 425-352-5114). If you have a documented disability on file with the DRS office, please have your DRS counselor contact me and we can discuss accommodations.
| Week | Date | Topics | Sessions | Readings | Assignment |
|---|---|---|---|---|---|
| 0 | Sept 29 | System Models: SMP, clusters, hybrid systems, grid, and cloud | | Patterns Ch 2: Background and Jargon of Parallel Computing; CDK4 Ch 2: System Models | |
| 1 | Oct 4 | Programming Foundations: intro to shared memory, message passing, embarrassingly parallel, divide and conquer, pipeline, and synchronization | Discussion 0 (Grid vs Cloud, and how to distribute data) | MPI Ch 1: Introduction | |
| | Oct 6 | Shared Memory Programming 1: loop parallelism, multi-processes, multi-threads, and OpenMP | Laboratory 1 (OpenMP) | Patterns Appx A: Intro to OpenMP; OpenMP Ch 1: Introduction; OpenMP Ch 3: Writing a First OpenMP Program; OpenMP Ch 4: OpenMP Language Features | Program 1 assigned |
| 2 | Oct 11 | Shared Memory Programming 2: memory model, DSM, and performance issues | | Patterns Ch 2: Background and Jargon of Parallel Computing; OpenMP Ch 5: How to Get Good Performance | |
| | Oct 13 | | Discussion 1 (Static/dynamic load balancing) | | |
| 3 | Oct 18 | Message Passing Programming 1: MPI (Message Passing Interface) | | Patterns Appx B: Intro to MPI; MPI Ch 3: Greeting; MPI Ch 4: Numerical Integration; MPI Ch 5: Collective Communication | |
| | Oct 20 | | Laboratory 2 (MPI) | | Program 1 due; Program 2 assigned |
| 4 | Oct 25 | Message Passing Programming 2: MPI performance, and hybrid OpenMP and MPI | Discussion 2 (OpenMP vs MPI, and hybrid pros/cons) | MPI Ch 11: Performance; MPI Ch 12: More on Performance | |
| | Oct 27 | Paradigm-Oriented Programming 1: task parallelism and MapReduce | Laboratory 3 (MapReduce) | Patterns Ch 3.2: Task Decomposition; Patterns Ch 4.4: Task Parallelism; MapReduce Ch 2: MapReduce Basics; MapReduce Ch 3: MapReduce Algorithm Design | |
| 5 | Nov 1 | Paradigm-Oriented Programming 2: MapReduce applications | Discussion 3 (MapReduce pros/cons, and MapReduce-suitable applications) | MapReduce Ch 4: Inverted Indexing for Text Retrieval; MapReduce Ch 5: Graph Algorithms | |
| | Nov 3 | Paradigm-Oriented Programming 3: data parallelism, GlobalArrays, (Spark), MASS, (RepastHPC), and thread migration | Laboratory 4 (MASS) | Patterns Ch 3.3: Data Decomposition; Patterns Ch 4.6: Geometric Decomposition; http://www.emsl.pnl.gov/docs/global/ https://spark.apache.org/ | Program 2 due; Program 3 assigned |
| 6 | Nov 8 | Midterm Exam in class | | | |
| | Nov 10 | | Discussion 4 (Paradigm programming vs DSM, and their computation granularity) | http://repast.sourceforge.net/docs.php http://depts.washington.edu/dslab/MASS | |
| 7 | Nov 15 | Parallel Programming Patterns: task, data, and mixed parallelism | Discussion 5 (Application design with task, data, and mixed parallelism) | Patterns Ch 4: The Algorithm Structure Design Space; Patterns Ch 5: The Supporting Structures Design Space; Patterns Ch 6: The Implementation Mechanisms Design Space | |
| | Nov 17 | Job Management: rsh, intra/inter-cluster and centralized/decentralized scheduling, autonomic computing, and cluster virtualization | | Grid-D Ch 5: Schedulers; Grid-2 Ch 18: Resource and Service Management | Program 3 due; Program 4 assigned |
| 8 | Nov 22 | OpenPBS, Condor, Globus (GRAM and DUROC), Amazon EC2 (XenoServer), StarCluster, and Azure Scheduler/Batch | Survey Work Presentation 1 | http://www.mcs.anl.gov/research/projects/openpbs/ http://www.cs.wisc.edu/condor/ http://www.globus.org/ http://www.cl.cam.ac.uk/research/srg/netos/xeno/ http://graal.ens-lyon.fr/diet/ http://star.mit.edu/cluster/ https://azure.microsoft.com/en-us/services/scheduler/ https://azure.microsoft.com/en-us/documentation/services/batch/ | |
| | Nov 24 | Holiday (No School) | | | |
| 9 | Nov 29 | File Management: network, distributed, and parallel file systems | | CDK4 Ch 8: DFS; Grid-D Ch 9: Data Management; Grid-2 Ch 22: Data Access, Integration, and Management; Hadoop Ch 3: Hadoop DFS; Cloud Ch 2: Amazon Cloud Computing | |
| | Dec 1 | NFS, AFS, PVFS, MPI-IO, Hadoop, Amazon S3, Azure Storage, and Apache Storm | Survey Work Presentation 2 | http://www.openafs.org/ http://www.parl.clemson.edu/pvfs/ http://www.mcs.anl.gov/projects/romio/ http://hadoop.apache.org/ http://aws.amazon.com/s3/ https://azure.microsoft.com/en-us/documentation/services/storage/ http://storm.apache.org/ | |
| 10 | Dec 6 | Fault Tolerance: two-phase commitment, replication, and check-pointing | | CDK4 Ch 14: Distributed Transactions; CDK4 Ch 15: Replication; Cloud Ch 4: Ready for the Cloud; Cloud Ch 6: Disaster Recovery | |
| | Dec 8 | MS Cloud DB, Coda/AFS, Condor-MW, MapReduce/Hadoop, and FT-MPI | Survey Work Presentation 3 | http://azure.microsoft.com/blog/2012/07/30/fault-tolerance-in-windows-azure-sql-database/ http://www.coda.cs.cmu.edu/ http://www.cs.wisc.edu/condor/mw/ http://hadoop.apache.org/ http://icl.cs.utk.edu/ftmpi/ | |
| 11 | Dec 13 | Student Final Project | Student Presentation | | Program 4 due |
| | Dec 15 | Final Exam in class | | | |