CSS 434
Group Discussion

Instructor: Munehiro Fukuda
Discussion dates: see the syllabus


1. Purpose

Group discussions are intended to help you understand other students' paper-review presentations and the class materials. Three or four students form a group, discuss one topic given by the professor, and present their discussion.

2. Topics

No. Discussions Grade Topics
1 Time and Global States 1%
  • Groups 1 and 2:
    1. Consider two distributed snapshot algorithms such as Samadi's and Mattern's algorithms. Discuss their pros and cons.
    2. Solve textbook Q14.14
  • Groups 3 and 4:
    1. Compare Timewarp and SPEEDS in terms of performance, process creation/termination, dynamic memory allocation, and I/O handling.
    2. Solve textbook Q14.15
2 Distributed Shared Memory 1%
  • Groups 1 and 2:
    1. Compare Ivy and Dash in terms of consistency models, shared-data granularity, HW/SW implementation, false sharing, and implementation. What types of applications can benefit from Ivy and Dash, respectively?
    2. Solve slide p24's non-turn-in exercise 1.
  • Groups 3 and 4:
    1. Compare distributed shared memory and message passing (such as MPI) in terms of programmability and performance. Discuss their pros and cons using two types of applications: computer graphics such as 3D ray tracing and spatial simulation such as molecular dynamics.
    2. Solve slide p24's non-turn-in exercise 2.
3 Distributed File Systems 1%
  • Groups 1 and 2:
    Compare NFS and AFS in terms of file-accessing models, file-sharing semantics, modification propagation (write-through or delayed write), server-side/client-side caching, and client/server-initiated validation.
  • Groups 3 and 4:
    Suppose that a user wants to use NFS or AFS for grid-computing middleware where s/he runs a massively parallel application on a large number of remote computers. Each process running on a different computer needs to share data files local to the user's computer. Discuss whether NFS or AFS would work better for grid computing.
4 Fault Tolerance 1%
  • Groups 1 and 2:
    1. Could the gossip architecture be used for a distributed computer game as described below? The players move figures around a common scene. The state of the game is replicated at the players' workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas. (Textbook Q18.11)
    2. Suppose that a user wants to use Gossip for grid-computing middleware where s/he runs a massively parallel application on a large number of remote computers. Each process running on a different computer has a positive sequential identifier, periodically takes an execution snapshot, monitors its logical neighbor with a one-larger identifier, and resumes this neighbor at a new computer if it has crashed for some reason. Discuss how Gossip facilitates this fault tolerance in grid computing.
  • Groups 3 and 4:
    1. The quorum-based replication protocol can address network partition problems. Why didn't Coda use this protocol? Explain the reason.
    2. Suppose that a user wants to use Coda for grid-computing middleware where s/he runs a massively parallel application on a large number of remote (network-detachable) computers. Each process running on a different computer has a positive sequential identifier, periodically takes an execution snapshot, monitors its logical neighbor with a one-larger identifier, and resumes this neighbor at a new computer if it has crashed for some reason. Discuss how Coda facilitates this fault tolerance in grid computing.
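    As a starting point for both groups, the neighbor-monitoring scheme described in question 2 can be sketched roughly as below. This is only an illustrative sketch, not part of Gossip or Coda: the function names, the in-memory snapshot table, and the wrap-around from the largest identifier back to 1 are all assumptions made for the example, since the question does not say which process the largest identifier monitors.

```python
def monitored_neighbor(pid, num_procs):
    """Identifier that process `pid` watches: the one-larger identifier.

    Assumption: the largest identifier wraps around to 1.
    """
    return pid % num_procs + 1


def recovery_round(alive, snapshots, num_procs):
    """Run one monitoring round over identifiers 1..num_procs.

    alive:     set of identifiers of processes that are still running
    snapshots: dict mapping identifier -> latest execution snapshot
               (hypothetical stand-in for whatever storage is used)
    Returns a list of (identifier, snapshot) pairs to resume elsewhere.
    """
    to_resume = []
    for pid in sorted(alive):
        neighbor = monitored_neighbor(pid, num_procs)
        # A crashed neighbor is resumed from its last snapshot, if any.
        if neighbor not in alive and neighbor in snapshots:
            to_resume.append((neighbor, snapshots[neighbor]))
    return to_resume
```

    For example, with four processes where process 3 has crashed, only process 2 detects the failure, and the snapshot of process 3 is what must be reachable from a new computer. Where that snapshot is stored and how it is propagated is the part each group needs to discuss.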
5 Grid Computing 1%
  • Groups 1 and 2: Summarize how Condor, Legion, NetSolve, and Globus facilitate the following two features:
    1. Discoveries of remote computing resources
    2. Job deployment to remote computers
  • Groups 3 and 4: Summarize how Condor, Legion, NetSolve, and Globus facilitate the following two features:
    1. Fault tolerance (i.e., job resumption at a new site)
    2. I/O (i.e., rerouting files and standard I/O data to remote jobs)
For these discussions, you need to survey the following grid-computing middleware systems.
NetSolve
  1. http://icl.cs.utk.edu/netsolve/
  2. Henri Casanova, Jack Dongarra, Chris Johnson, and Michelle Miller, "Section 7.3: Case Study: NetSolve", in Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, July 1998, pages 171-175 (available from the instructor)
Legion
  1. http://legion.virginia.edu/
  2. Dennis Gannon and Andrew Grimshaw, "Section 9.4: The Legion Grid Architecture", in Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, July 1998, pages 222-227 (available from the instructor)
Condor
  1. http://www.cs.wisc.edu/condor
  2. Douglas Thain, Todd Tannenbaum, and Miron Livny, "Condor and the Grid", in Fran Berman, Anthony J.G. Hey, Geoffrey Fox, editors, Grid Computing: Making The Global Infrastructure a Reality, John Wiley, 2003. ISBN: 0-470-85319-0
Globus
  1. http://www.globus.org/
  2. Ian Foster and Carl Kesselman, "Chapter 11: The Globus Toolkit", in Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, July 1998, pages 222-227 (available from the instructor)

3. Discussion and Presentation

Approximately 20 to 25 minutes will be given for each group discussion. The professor will give each group a piece of scratch paper on which to summarize its discussion. Using this paper, a group representative presents the discussion to the other students and receives 0.1 points of extra credit.