CSS 490: Distributed Computing Systems
(a.k.a. Parallel and Distributed Systems)
Winter 2003

TTh 3:30-5:35pm

Prof. Munehiro Fukuda


Professor:

Munehiro Fukuda <mfukuda@u.washington.edu>, room UW1-331, phone 352-3459, office hours: TTh 2:30-3:15pm and 5:35-6:20pm

Course Description:

This course introduces the concepts and design of distributed computing systems. Topics covered include message passing, remote procedure calls, process management, migration, mobile agents, distributed coordination, distributed shared memory, distributed file systems, fault tolerance, and grid computing. The first five weeks focus on the basic mechanisms and the C++ programming techniques for message passing, process management, and migration. We will use sockets, MPI (Message Passing Interface), SunRPC, and M++, a C++-based mobile agent system the instructor has designed. The last five weeks cover advanced topics: the instructor will give an overview of each topic, and students will review a topic-related research paper in class. The final project is team work in which each team of two students picks a parallelizable application, programs it in both MPI and M++, and compares the programmability and the performance.

Prerequisites:

CSS 343.

Work Load and Grading:

Course Work      Percentage
Programming 1    5%
Programming 2    5%
Paper Review     25%
Final Project    25%
Midterm Exam     20%
Final Exam       20%

Achievements     Approximately Corresponding Numeric Grade
90s              3.5 - 4.0
80s              2.5 - 3.4
70s              1.5 - 2.4
60s              0.7 - 1.4

Textbooks:

About 75% of the lectures cover the following textbook, while the rest focus on advanced topics such as MPI, mobile agents, and some research-oriented topics. Therefore, I do not strictly require you to buy this textbook.
    Distributed Operating Systems: Concepts and Design, Pradeep K. Sinha, Wiley-IEEE Press, 1997.

References:

Some Programming Textbooks:

The following books and manuals are useful for system, network, and MPI programming.

Policies:

Programming assignments 1 and 2 and the paper review are to be done independently. Any collaboration on this work will result in a severe penalty. You may discuss the problem statement and any clarifications with each other, but any actual work to be turned in must be done without collaboration.

The final project may be done by a pair of students, in which case both students must contribute an equal amount of work. For detailed instructions, see the project assignment sheet.

All homework is due at the beginning of class on the due date. Submission may be postponed only in emergencies such as accidents, sickness, sudden business trips, and family emergencies, in which case you may turn in your homework late with written proof. No make-up exams will be given except under exceptional circumstances. Barring emergencies, I must be informed before the exam.

To request academic accommodations due to a disability, please contact Disabled Student Services (DSS) in the Bothell Library Annex Building, Room 106 (email: rlundborg@bothell.washington.edu, TDD: 425-352-3132, FAX: 425-352-5444). If you have a documented disability on file with the DSS office, please have your DSS counselor contact me so that we can discuss accommodations.

Course Goals:

The overall goal of CSS 490, "Distributed Computing Systems", includes:

To strengthen your understanding of fundamental concepts, you are strongly encouraged to solve the problems given on the final page of each set of lecture slides. To review research papers, you must visit the library, search for the papers, and prepare to present your paper review using PowerPoint. Finally, you need to work in the Linux laboratory (UW1-320) to test and evaluate the performance of your distributed programs. Therefore, as with most technical courses, besides ability and motivation, it takes time to learn and master the subject. Expect to spend an additional 10 to 15 hours a week outside of class time on average.

Assignments:

This is a research-flavored course. Each assignment specification gives you only a topic and guidelines for working on the assignment. The answers and the quality of your work depend on your enthusiasm for the assignment. Therefore, there are no specific answer keys.
  1. Program 1: exercise TCP communication and write a simple parallel program.
  2. Program 2: exercise RPC programming and code a program that passes a pointer/STL-based data structure.
  3. Paper review: requires each student to review a notable research project and to present his/her understanding in class.
  4. Final project: requires team work in which each team of two students picks a parallelizable application, programs it in both MPI and M++, and compares the programmability and the performance.
Please read assignment.html to understand the environment you use for assignments and the submission/grading procedures.

Topics covered and tentative 490 winter schedule:

Note that this is an approximate ordering of topics. Each chapter will take approximately the allotted time, and not every section of every chapter is covered.

Week  Date    Topics                                       Chapters      Reading                                   Assignment
1     Jan 7   Fundamentals                                 1             pp1-45
      Jan 9   Invited Talk (by Prof. Kobayashi, Ehime U.)
              Lab Orientation (by Mr. Grimmer, IS)
2     Jan 14  Message Passing                              2 - 3         pp46-139                                  Program 1 assigned
      Jan 16  Message Passing Interface                    3             pp139-166
3     Jan 21  Remote Procedure Call                        4             pp167-230
      Jan 23  Process Management                           8             pp398-420
4     Jan 28  Process Migration                            8             pp381-398                                 Program 1 due
                                                                                                                   Program 2 assigned
      Jan 30                                               8             pp381-398
5     Feb 4   M++                                                        M++ User's Manual
      Feb 6   Paper Review                                               D'Agent                                   Reviewer:
                                                                         IBM Aglets                                Reviewer:
                                                                         Ara                                       Reviewer:
                                                                         Mole                                      Reviewer:
                                                                         Other papers (such as Telescript)         Reviewer:
6     Feb 11  Midterm exam in class                        1 - 4, and 8  pp1-230 and 381-420                       Program 2 due
                                                                                                                   Final project assigned
      Feb 13  Distributed Shared Memory                    5             pp231-281
7     Feb 18  Paper Review                                 5             Ivy                                       Reviewer:
                                                                         Dash                                      Reviewer:
                                                                         Munin                                     Reviewer:
                                                                         Linda                                     Reviewer:
                                                                         Other papers (such as Jini)               Reviewer:
      Feb 20  Synchronization                              6             pp282-346
8     Feb 25  Paper Review                                 6             SPEEDES                                   Reviewer:
                                                                         Time Warp                                 Reviewer:
                                                                         Distributed Snapshot (Mattern's)          Reviewer:
                                                                         Timing Management in HLA                  Reviewer:
                                                                         Other papers (such as Samad's algorithm)  Reviewer:
      Feb 27  Distributed File Systems                     9             pp421-439
9     Mar 4   Paper Review                                 9             Sun NFS                                   Reviewer:
                                                                         AFS                                       Reviewer:
                                                                         XFS                                       Reviewer:
                                                                         Plan 9                                    Reviewer:
                                                                         Other papers (such as LFS)                Reviewer:
      Mar 6   Replication and Fault Tolerance              9             pp440-495
10    Mar 11  Paper Review                                 9             ISIS                                      Reviewer:
                                                                         Gossip                                    Reviewer:
                                                                         Bayou                                     Reviewer:
                                                                         Coda                                      Reviewer:
                                                                         Other papers                              Reviewer:
      Mar 13  Grid Computing                                             NetSolve                                  Reviewer:
                                                                         Legion                                    Reviewer:
11    Mar 18  Final Project Presentation and Wrap Up                                                               Final project due
      Mar 20  Final exam in class                          5, 6, and 9   pp231-346 and pp421-495