See also a list of classical papers in distributed systems by various authors
- Overview Papers
- Andrew S. Tanenbaum and Robbert van Renesse, "Distributed Operating Systems", ACM Computing Surveys, Vol. 17, No. 4, Pages 419-470, December 1985
- E. Levy and A. Silberschatz, "Distributed File Systems: Concepts and Examples", ACM Computing Surveys, Vol. 22, No. 4, Pages 321-374, December 1990
Readings for Chapter 2 Communication
- Remote Procedure Call
- Andrew Birrell and Bruce Nelson, "Implementing Remote Procedure Calls", ACM Transactions on Computer Systems, Vol. 2, No. 1, Pages 39-59, February 1984
- B. Bershad, T. Anderson, E. Lazowska, and H. Levy, "Lightweight Remote Procedure Call", Proceedings of the 12th ACM Symposium on Operating Systems Principles, Operating Systems Review, Vol. 23, No. 5, Pages 102-113, December 1989
- Sun RPC documentation
- Java RMI documentation
Readings for Chapter 3 Processes
- Process and Thread Management
- Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy, "The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors", IEEE Transactions on Computers, Vol. 38, No. 12, Pages 1631-1644, December 1989
- D. L. Black, "Scheduling Support for Concurrency and Parallelism in the Mach Operating System", IEEE Computer, Vol. 23, No. 5, Pages 35-43, May 1990
- Process Migration
- F. Douglis and J. Ousterhout, "Process Migration in the Sprite Operating System: A Status Report"
- M. Theimer, K. Lantz, and D. Cheriton, "Preemptable Remote Execution Facilities for the V-System", Proceedings of the 10th ACM Symposium on Operating Systems Principles, Operating Systems Review, Vol. 19, No. 5, Pages 2-12, December 1985
- "The Worldwide Computer", Scientific American, February 2002. An operating system spanning the Internet would harness the power of millions of the world's networked PCs.
- Distributed Computing
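- M. Litzkow, M. Livny, and M. Mutka, "Condor: A Hunter of Idle Workstations", Proceedings of the 8th IEEE International Conference on Distributed Computing Systems (ICDCS), 1988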
- Jim Basney and Miron Livny, "Deploying a High Throughput Computing Cluster", High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999. More information about Condor is available at its homepage http://www.cs.wisc.edu/condor/
Readings for Chapter 4 Naming
- Butler Lampson, "Designing a Global Name Service", Proc. 4th ACM Symposium on Principles of Distributed Computing, Minaki, Ontario, 1986, pp 1-10
Readings for Chapter 5 Synchronization
- Leslie Lamport, Michael Melliar-Smith, "Byzantine Clock Synchronization", Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing (August, 1984), 68-74.
- Leslie Lamport, "Synchronizing Time Servers", SRC Research Report 18 (June 1987).
- P. Ramanathan, K. G. Shin, and R. W. Butler, "Fault-tolerant clock synchronization in distributed systems", IEEE Computer, vol. 23, pp. 33-42, Oct. 1990
- Mills, D., "Network Time Protocol (Version 3)", RFC 1305, March 1992.
- Mills, D., "Improved Algorithms for Synchronizing Computer Network Clocks", IEEE/ACM Transactions on Networking, IEEE Communications Society, 1994
- Leslie Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM 21, 7 (July 1978), 558-565. Reprinted in several collections, including Distributed Computing: Concepts and Implementations, McEntire et al., ed. IEEE Press, 1984.
- K. Mani Chandy, Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems (TOCS), Volume 3, Issue 1 (February 1985), Pages 63-75
- K. Mani Chandy, Jayadev Misra, "Termination Detection of Diffusing Computations in Communicating Sequential Processes", ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 4, Issue 1 (January 1982), Pages 37-43
- Edsger W. Dijkstra and C. S. Scholten, "Termination Detection for Diffusing Computations", EWD 687a, 1979
- K. Mani Chandy, Jayadev Misra, "A Distributed Algorithm for Detecting Resource Deadlocks in Distributed Systems", Proceedings of the First ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Ottawa, Canada, 1982, Pages 157-164
- K. Mani Chandy, Jayadev Misra, Laura M. Haas, "Distributed Deadlock Detection", ACM Transactions on Computer Systems (TOCS), Volume 1, Issue 2 (May 1983), Pages 144-156
- K. Mani Chandy, Jayadev Misra, "The Drinking Philosophers Problem", ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 6, Issue 4 (October 1984), Pages 632-646
- G. Ricart and A. K. Agrawala, "An Optimal Algorithm for Mutual Exclusion in Computer Networks", In Communications of the ACM, 24(1):9-17, January 1981
- Butler Lampson, "How to Build a Highly Available System Using Consensus", In Distributed Algorithms, ed. Babaoglu and Marzullo, Lecture Notes in Computer Science 1151, Springer, 1996, pp 1-17
- C. A. R. Hoare, "Communicating Sequential Processes", Prentice Hall, 1985
- Edsger W. Dijkstra, "A Short Introduction to the Art of Programming", EWD 316, 1971
- Edsger W. Dijkstra, "The Humble Programmer", EWD 340, 1972
- Leslie Lamport, "A New Solution of Dijkstra's Concurrent Programming Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
- Leslie Lamport, "A New Approach to Proving the Correctness of Multiprocess Programs", ACM Transactions on Programming Languages and Systems 1, 1 (July 1979), 84-97.
- Susan Owicki, Leslie Lamport, "Proving Liveness Properties of Concurrent Programs", ACM Transactions on Programming Languages and Systems 4, 3 (July 1982), 455-495.
- P. A. Bernstein, V. Hadzilacos, and N. Goodman, "Concurrency Control and Recovery in Database Systems", Addison-Wesley, 1987
- Jim Gray, "The Transaction Concept: Virtues and Limitations", Proceedings of 7th VLDB, Cannes, France, 1981, pp. 144-154
Readings for Chapter 6 Consistency and Replication
- C. Gray and D. Cheriton, "Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency", Proceedings of the 12th ACM Symposium on Operating Systems Principles, 1989
- K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and A. J. Demers, "Flexible Update Propagation for Weakly Consistent Replication", Proc. of the 16th ACM Symp. on Op. Syst. Prin. (SOSP-16), Saint-Malo, France, October 5-8, 1997, pp. 288-301.
- A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry, "Epidemic Algorithms for Replicated Database Maintenance", Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (PODC), 1987
- D. Gifford, "Weighted Voting for Replicated Data", Proceedings of the 7th ACM Symposium on Operating Systems Principles, 1979, Pages 150-162
Readings for Chapter 7 Fault Tolerance
- Edsger W. Dijkstra, "Self-stabilization in spite of distributed control", EWD 391, 1973
- Edsger W. Dijkstra, "Self-stabilizing systems in spite of distributed control", EWD 426, 1974
- Jim Gray, Leslie Lamport, "Consensus on Transaction Commit", Microsoft Research Technical Report MSR-TR-2003-96, January 2004, 32 pages
- Jim Gray, "Why Do Computers Stop and What Can Be Done About It?", 6th International Conference on Reliability and Distributed Databases, June 1987
- Jim Gray, "Notes on Database Operating Systems", Operating Systems: An Advanced Course, Bayer et al., eds., Lecture Notes in Computer Science 60, Springer-Verlag, 1978, pp. 393-481.
- Leslie Lamport, Robert Shostak, Marshall Pease, "The Byzantine Generals Problem", ACM Transactions on Programming Languages and Systems 4, 3 (July 1982), 382-401.
- Leslie Lamport, "The Part-Time Parliament", ACM Transactions on Computer Systems 16, 2 (May 1998), 133-169.
- Atomic Multicast
- Kenneth P. Birman and Thomas Joseph, "Exploiting Virtual Synchrony in Distributed Systems", In Proceedings of the 11th ACM Symposium on Operating Systems Principles, Pages 123-138, Austin, Texas, November 1987
- Kenneth Birman, André Schiper, Pat Stephenson, "Lightweight Causal and Atomic Group Multicast", ACM Transactions on Computer Systems (TOCS), Volume 9, Issue 3, Pages 272-314, 1991
Readings for Chapter 8 Security
- Butler Lampson, M. Abadi, M. Burrows, E. Wobber. "Authentication in distributed systems: Theory and practice", ACM Trans. Computer Systems 10, 4 (Nov. 1992), pp 265-310
Readings for Chapter 10 Distributed File Systems
- NFS Version 4: papers, the NFS v4 white paper, and the NFS v4 RFC
- Zebra Striped Network File System
- Serverless Network File Systems