Prof. Pietro Michiardi
Lecture Notes and Laboratory Material for the Cloud Computing Course at EURECOM
List of topics, recommended reading material, and pointers to the PDF version of the slides. The complete LaTeX sources are available for a large fraction of the slides: feel free to fork and contribute with pull requests.
Reading list:
- The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, by Luiz André Barroso and Urs Hölzle, Morgan & Claypool
- Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press
Reading list:
- Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer, Morgan & Claypool
- Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press
- Upper and Lower Bounds on the Cost of a Map-Reduce Computation, by F. Afrati, et al., PVLDB, 2013
- Enumerating subgraph instances using map-reduce, by F. Afrati, et al., ICDE, 2013
- Transitive closure and recursive Datalog implemented on clusters, by F. Afrati and J. Ullman, EDBT, 2012
- Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms, by J. Lin, arXiv:1304.7544
Reading list:
- MapReduce: Simplified Data Processing on Large Clusters, by J. Dean and S. Ghemawat, OSDI, 2004
- The Google File System, by S. Ghemawat, et al., ACM SOSP, 2003
- Hadoop: The Definitive Guide, by T. White, O'Reilly, 2012
- Hadoop Operations, by E. Sammer, O'Reilly, 2012
Reading list:
- Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, O'Reilly [O'Reilly Link]
- Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, by M. Zaharia, et al., USENIX NSDI, 2012
- Sparrow: Distributed, Low Latency Scheduling, by K. Ousterhout, et al., ACM SOSP, 2013
- Introduction to Spark Internals, by M. Zaharia, Video on YouTube [Link]
- A Deeper Understanding of Spark Internals, by Aaron Davidson, Video on YouTube [Link]
Reading list:
- Hadoop YARN, by A. C. Murthy, et al., Addison Wesley [Amazon Link]
- Mesos: Flexible Resource Sharing for the Cloud, by B. Hindman, et al., NSDI, 2011
- Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints, by A. Ghodsi, et al., EuroSys, 2013
- Omega: Flexible, Scalable Schedulers for Large Compute Clusters, by M. Schwarzkopf, et al., EuroSys, 2013
- Large-Scale Cluster Management at Google with Borg, by A. Verma, et al., EuroSys, 2015
Reading list:
- Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press
Reading list:
- Spark SQL: Relational Data Processing in Spark, by M. Armbrust, et al., SIGMOD, 2015
[Part 1]: this slide deck was originally created by Prof. Marko Vukolic (now at IBM Research Zurich)
[Part 2]: this slide deck was originally created by Prof. Marko Vukolic (now at IBM Research Zurich)
[HBase]
Reading list:
- Seth Gilbert, Nancy A. Lynch: Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2): 51-59 (2002)
- DeCandia et al. Dynamo: Amazon's highly available key-value store. SOSP 2007: 205-220 (2007)
- Eric A. Brewer: Pushing the CAP: Strategies for Consistency and Availability. IEEE Computer 45(2): 23-29 (2012)
- Seth Gilbert, Nancy A. Lynch: Perspectives on the CAP Theorem. IEEE Computer 45(2): 30-36 (2012)
- Marko Vukolić: Quorum Systems with Applications to Storage and Consensus. Morgan & Claypool (2012)
- Ion Stoica et al: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11(1): 17-32 (2003)
- Avinash Lakshman, Prashant Malik: Cassandra: a decentralized structured storage system. Operating Systems Review 44(2): 35-40 (2010)
- Apache Cassandra 1.2 Documentation. Datastax. http://www.datastax.com/docs/1.2/index
- Eben Hewitt: Cassandra: The Definitive Guide. O’Reilly. (2010) http://bit.ly/JHwwR6
- Edward Capriolo: Cassandra High Performance Cookbook. Packt Publishing. (2011)
Reading list:
- Patrick Hunt, Mahadev Konar, Flavio P. Junqueira and Benjamin Reed: ZooKeeper: Wait-free coordination for Internet-scale systems. In Proc. USENIX ATC (2010)
- ZooKeeper 3.4 Documentation. http://zookeeper.apache.org/doc/trunk/index.html
- Flavio Paiva Junqueira, Benjamin C. Reed, Marco Serafini: Zab: High-performance broadcast for primary-backup systems. DSN 2011: 245-256
- Michael Burrows: The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006: 335-350
- Atul Adya, John Dunagan, Alec Wolman: Centrifuge: Integrated Lease Management and Partitioning for Cloud Services. NSDI 2010: 1-16
Reading list:
- J. Weinman. Cloudonomics: The Business Value of Cloud Computing, Wiley, 2012
- L.A. Barroso, J. Clidaras and U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, 2nd ed. July 2013