Spring 2020 Calendar and Reading List¶
Note: This material is still a work in progress.
Make sure you don’t miss the Additional Suggested Reading at the bottom of this page.
Week 1 - Introduction¶
No required reading for this week. We will be covering basic terminology and concepts in distributed systems.
Suggested Reading
Note: Some of these readings go into topics that we will cover later in the quarter. As such, you may not get that much out of reading these references at the start of the quarter. Instead, they can be good reference material to re-read later on and see how everything fits together.
Distributed systems theory for the distributed systems engineer
Samuel C. Kendall, Jim Waldo, Ann Wollrath, and Geoff Wyant. 1994. A Note on Distributed Computing. Technical Report. Sun Microsystems, Inc., Mountain View, CA, USA.
Fallacies of Distributed Computing Explained. Arnon Rotem-Gal-Oz.
Pat Helland, Standing on Distributed Shoulders of Giants. ACM Queue, 14(2), 2016
Week 2 - Distributed Time¶
Team A: Questioners; Team B: Answerers; Team C: Observers.
Required reading for Tuesday, April 14
Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Commun.ACM, 21(7):558–565, July 1978
Required reading for Thursday, April 16
C. J. Fidge.Timestamps in Message-Passing Systems that Preserve the Partial Ordering. Proceedings of the 11th Australian Computer Science Conference, 10(1):5666, 1988
Friedemann Mattern. Virtual Time and Global States of Distributed Systems. In Parallel and Distributed Algorithms, pages 215–226. North-Holland, 1989
Suggested papers
Parameswaran Ramanathan, Kang G. Shin, and Ricky W. Butler. Fault-tolerant Clock Synchronization in Distributed Systems. Computer, 23(10):33–42, October 1990
David L. Mills. Improved Algorithms for Synchronizing Computer Network Clocks. SIGCOMM Comput. Commun. Rev., 24(4):317–327, October 1994
Week 3 - Distributed Consensus I¶
Team B: Questioners; Team C: Answerers; Team A: Observers.
Required reading for Tuesday, April 21
Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, July 1982
Required reading for Thursday, April 23
D. Skeen. A Quorum-Based Commit Protocol. Technical Report 82-483. Department of Computer Science, Cornell University. February 1982.
D. Skeen and M. Stonebraker. A Formal Model of Crash Recovery in a Distributed System. IEEE Trans. Softw. Eng., 9(3):219–228, May 1983
Suggested papers
J. Gray. Notes on Database Operating Systems. Operating Systems, an Advanced Course, Bayer et. al. eds., Lecture notes in Computer Science 60, Springer-Verlag, 1978, pp. 393-481.
Butler W. Lampson and Howard E. Sturgis. Crash Recovery in a Distributed Data Storage System, 1979
M. Pease, R. Shostak, and L. Lamport. Reaching Agreement in the Presence of Faults. J.ACM, 27(2):228–234, April 1980
Philip A. Bernstein, Vassco Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1987
Fred B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv., 22(4):299–319, December 1990
Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance and Proactive Recovery. ACM Trans. Comput. Syst., 20(4):398–461, November 2002
Other suggested reading
Consensus Protocols: Two-Phase Commit. Henry Robinson. The Paper Trail.
Consensus Protocols: Three-phase Commit. Henry Robinson. The Paper Trail.
Week 4 - Limits of Distributed Systems¶
Team C: Questioners; Team A: Answerers; Team B: Observers.
Required reading for Tuesday, April 28
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985
Danny Dolev, Cynthia Dwork, and Larry Stockmeyer. On the Minimal Synchronism Needed for Distributed Consensus. J. ACM, 34(1):77–97, January 1987
Required reading for Thursday, April 30
Seth Gilbert and Nancy Lynch. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. SIGACT News, 33(2):51–59, June 2002
E. Brewer, CAP twelve years later: How the “rules” have changed. in Computer, vol. 45, no. 2, pp. 23-29, Feb. 2012.
Suggested papers
Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. J. ACM 35, 2 (April 1988), 288-323.
N. Lynch. A Hundred Impossibility Proofs for Distributed Computing. In Proceedings of the eighth annual ACM Symposium on Principles of distributed computing, PODC ’89, pages 1–28, New York, NY, USA, 1989. ACM
Marcos K. Aguilera. Stumbling over consensus research: Misunderstandings and issues. In Replication. Lecture Notes in Computer Science, vol 5959. Springer, Berlin, Heidelberg, 2010.
Marc’s Blog: Distributed Consensus: Beating Impossibility with Probability One.
Other suggested reading
A Brief Tour of FLP Impossibility. Henry Robinson. The Paper Trail.
Papers We Love - Impossibility of Consensus with One Faulty Process. Henry Robinson.
FLP and CAP aren’t the same thing. Henry Robinson. The Paper Trail.
The CAP FAQ. Henry Robinson.
You Can’t Sacrifice Partition Tolerance. Coda Hale.
Clarifications on the CAP Theorem and Data-Related Errors. Michael Stonebraker.
The Theorem That Will Not Go Away. Henry Robinson. The Paper Trail.
Week 5 - Paxos¶
Team A: Questioners; Team B: Answerers; Team C: Observers.
Required reading for Tuesday, May 5 and May 7
Leslie Lamport. The Part-Time Parliament. ACM Trans. Comput. Syst., 16(2):133–169, May 1998
Leslie Lamport. Paxos Made Simple. ACM SIGACT News, 32(4):18–25, December 2001
Other suggested reading
Consensus Protocols: Paxos. Henry Robinson. The Paper Trail.
Week 6 - Distributed Consensus II¶
Team B: Questioners; Team C: Answerers; Team A: Observers.
Required reading for Tuesday, May 12
Mike Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. In Proceedings of the 7th symposium on Operating systems design and implementation, OSDI ’06, pages 335–350, Berkeley, CA, USA, 2006. USENIX Association
Tushar D. Chandra, Robert Griesemer, and Joshua Redstone. Paxos Made Live: An Engineering Perspective. In Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, PODC ’07, pages 398–407, New York, NY, USA, 2007. ACM
Required reading for Thursday, May 14
Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm, 2014
Other suggested reading
Raft website.
Week 7 - Distributed Hash Tables¶
Team C: Questioners; Team A: Answerers; Team B: Observers.
Required reading for Tuesday, May 19
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput. Commun. Rev., 31(4):149–160, August 2001
Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, Middleware ’01, pages 329–350, London, UK, UK, 2001. Springer-Verlag
Required reading for Thursday, May 21
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s Highly Available Key-Value Store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, SOSP ’07, pages 205–220, New York, NY, USA, 2007. ACM
Week 8 - Distributed Data¶
Team A: Questioners; Team B: Answerers; Team C: Observers.
Required reading for Tuesday May 26
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. SIGOPS Oper. Syst. Rev., 37(5):29–43, October 2003
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI‘04: Sixth Symposium on Operating System Design and Implementation, December, 2004.
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7, OSDI ’06, pages 15–15, Berkeley, CA, USA, 2006. USENIX Association
Required reading for Thursday, May 28
James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google’s globally-distributed database. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pages 251–264, Berkeley, CA, USA, 2012. USENIX Association
Eric Brewer, Spanner, TrueTime & The CAP Theorem. Google, 2017
Suggested papers
Daniel Ford, Francois Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, Sean Quinlan. Availability in Globally Distributed Storage Systems. Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, USENIX (2010)
Avinash Lakshman and Prashant Malik. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev., 44(2):35–40, April 2010
Week 9 - Review¶
Team B: Questioners; Team C: Answerers; Team A: Observers.
Required reading for Tuesday, June 2
Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commun. ACM, 17(11):643–644, November 1974
Leslie Lamport. Solved Problems, Unsolved Problems and Non-problems in Concurrency. SIGOPS Oper. Syst. Rev., 19(4):34–44, October 1985
No class on Thursday, June 4
Additional Suggested Reading¶
Aphyr’s blog is a great source of easy-to-read posts on a number of distributed systems topics. The blog also includes a lot of posts on Aphyr’s projects, so here are some links to specific posts on distributed systems:
Henry Robinson’s The Paper Trail blog has a plethora of posts related to many of the papers we discuss in this class.
Survey of important papers on distributed consensus: A brief history of Consensus, 2PC and Transaction Commit.
Notes on Theory of Distributed Systems James Aspnes, Yale University.