Distributed Systems and Cloud Computing

Prof. Pietro Michiardi

Lecture Notes and Laboratory Material for the Cloud Computing Course at EURECOM

View the Labs on GitHub EURECOM-CLOUDS-LAB

Lecture Notes

List of topics, recommended reading material, and pointers to the PDF version of the slides. For a large fraction of the slides, the complete LaTeX sources are available: feel free to fork them and contribute with pull requests.

Introduction [Slides]

Topics:

Reading list:

  • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, by Luiz André Barroso and Urs Hölzle, Morgan & Claypool
  • Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press

Scalable Algorithm Design [Slides]

Topics:

Reading list:

  • Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer, Morgan & Claypool
  • Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press
  • Upper and Lower Bounds on the Cost of a Map-Reduce Computation, by F. Afrati, et al., PVLDB, 2013
  • Enumerating subgraph instances using map-reduce, by F. Afrati, et al., ICDE, 2013
  • Transitive closure and recursive Datalog implemented on clusters, by F. Afrati and J. Ullman, EDBT, 2012
  • Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms, by J. Lin, arXiv:1304.7544

Hadoop Internals [Slides]

Topics:

Reading list:

  • MapReduce: Simplified Data Processing on Large Clusters, by J. Dean and S. Ghemawat, OSDI, 2004
  • The Google File System, by S. Ghemawat, et al., ACM SOSP, 2003
  • Hadoop: The Definitive Guide, by T. White, O'Reilly, 2012
  • Hadoop Operations, by E. Sammer, O'Reilly, 2012

Spark Internals [Slides]

Topics:

Reading list:

  • Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, O'Reilly [O'Reilly Link]
  • Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, by M. Zaharia, et al., USENIX NSDI, 2012
  • Sparrow: Distributed, Low Latency Scheduling, by K. Ousterhout, et al., ACM SOSP, 2013
  • Introduction to Spark Internals, by M. Zaharia, Video on YouTube [Link]
  • A Deeper Understanding of Spark Internals, by Aaron Davidson, Video on YouTube [Link]

Cluster Schedulers [Slides]

Topics:

Reading list:

  • Hadoop YARN, by A. C. Murthy, et al., Addison Wesley [Amazon Link]
  • Mesos: Flexible Resource Sharing for the Cloud, by B. Hindman, et al., NSDI, 2011
  • Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints, by A. Ghodsi, et al., EuroSys, 2013
  • Omega: Flexible, Scalable Schedulers for Large Compute Clusters, by M. Schwarzkopf, et al., EuroSys, 2013
  • Large-Scale Cluster Management at Google with Borg, by A. Verma, et al., EuroSys, 2015

Relational Algebra [Slides]

Topics:

Reading list:

  • Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press

Spark SQL [Slides]

Topics:

Reading list:

  • Spark SQL: Relational Data Processing in Spark, by M. Armbrust, et al., SIGMOD, 2015

Distributed Storage Systems [Slides]

Older but more complete versions of the slides are available here:

[Part 1]: this slide deck was originally created by Prof. Marko Vukolić (now at IBM Research Zurich)
[Part 2]: this slide deck was originally created by Prof. Marko Vukolić (now at IBM Research Zurich)
[HBase]

Topics:

Reading list:

  • Seth Gilbert, Nancy A. Lynch: Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2): 51-59 (2002)
  • DeCandia et al. Dynamo: Amazon's highly available key-value store. SOSP 2007: 205-220 (2007)
  • Eric A. Brewer: Pushing the CAP: Strategies for Consistency and Availability. IEEE Computer 45(2): 23-29 (2012)
  • Seth Gilbert, Nancy A. Lynch: Perspectives on the CAP Theorem. IEEE Computer 45(2): 30-36 (2012)
  • Marko Vukolić: Quorum Systems with Applications to Storage and Consensus. Morgan & Claypool (2012)
  • Ion Stoica et al: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11(1): 17-32 (2003)
  • Avinash Lakshman, Prashant Malik: Cassandra: a decentralized structured storage system. Operating Systems Review 44(2): 35-40 (2010)
  • Apache Cassandra 1.2 Documentation. Datastax. http://www.datastax.com/docs/1.2/index
  • Eben Hewitt: Cassandra: The Definitive Guide. O’Reilly. (2010) http://bit.ly/JHwwR6
  • Edward Capriolo: Cassandra High Performance Cookbook. Packt Publishing. (2011)

Coordinating Distributed Systems [Slides]

Topics:

Reading list:

  • Patrick Hunt, Mahadev Konar, Flavio P. Junqueira and Benjamin Reed: ZooKeeper: Wait-free coordination for Internet-scale systems. In Proc. USENIX ATC (2010)
  • ZooKeeper 3.4 Documentation. http://zookeeper.apache.org/doc/trunk/index.html
  • Flavio Paiva Junqueira, Benjamin C. Reed, Marco Serafini: Zab: High-performance broadcast for primary-backup systems. DSN 2011: 245-256
  • Michael Burrows: The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006: 335-350
  • Atul Adya, John Dunagan, Alec Wolman: Centrifuge: Integrated Lease Management and Partitioning for Cloud Services. NSDI 2010: 1-16

Selected Topics in Cloud Computing [Slides]

Topics:

Reading list:

  • J. Weinman. Cloudonomics: The Business Value of Cloud Computing, Wiley, 2012
  • L.A. Barroso, J. Clidaras and U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, 2nd ed. July 2013