- MIT 6.824 Distributed Systems Lectures
- CSE138 Distributed Systems Lectures
- Princeton COS418 Distributed Systems Lectures
- Stanford CS244b Distributed Systems Lectures
- Illinois CS425 Distributed Systems Lectures
- University of Washington CSE452 Distributed Systems Lectures
- University of Washington CSEP552 PMP Distributed Systems Lectures
- Distributed Systems Course from Chris Colohan
- System Design Interview from Mikhail Smarshchok
- System Design from Gaurav Sen
- System Design for Tech Interviews
- Distributed Systems in One Lesson by Tim Berglund
- NGINX Glossary
- Non-Abstract Large Scale Design Workbook
- Latency Numbers Everyone Should Know
- Google SRE Classroom: Distributed PubSub Handout
- Google SRE Classroom: Distributed PubSub Slides
- The Google File System
- Large-scale cluster management at Google with Borg
- Spanner: Google’s Globally Distributed Database
- Spanner: Becoming a SQL System
- Spanner, TrueTime and the CAP Theorem
- Bigtable: A Distributed Storage System for Structured Data
- MapReduce: Simplified Data Processing on Large Clusters
- MapReduce/Bigtable for Distributed Optimization
- The Chubby lock service for loosely-coupled distributed systems
- Web Search for a Planet: The Google Cluster Architecture
- Raft: In Search of an Understandable Consensus Algorithm
- Raft: In Search of an Understandable Consensus Algorithm (Extended Version)
- Cassandra - A Decentralized Structured Storage System
- Kafka: a Distributed Messaging System for Log Processing
- Dynamo: Amazon’s Highly Available Key-value Store
- Practical Byzantine Fault Tolerance
- ZooKeeper: Wait-free coordination for Internet-scale systems
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- No compromises: distributed transactions with consistency, availability, and performance
- Scaling Memcache at Facebook
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
- Frangipani: A Scalable Distributed File System
- The design of a practical system for fault-tolerant virtual machines
- Object Storage on CRAQ
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
- Principles of Computer System Design: An Introduction
- Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
- Certificate Transparency
- Transparent Logs for Skeptical Clients
- Bitcoin: A Peer-to-Peer Electronic Cash System
- Blockstack: A New Internet for Decentralized Applications
- Experiences with a Distributed, Scalable, Methodological File System: AnalogicFS
- The Go Memory Model
- Storm @Twitter
- The Hadoop Distributed File System
- Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- Mastering Chaos - A Netflix Guide to Microservices
- Ephemeral Volatile Caching in the cloud
- The Netflix Tech Blog
- ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka
- Scaling Instagram Infrastructure
- How Complex Systems Fail
- Building Software Systems at Google and Lessons Learned - DOCUMENT
- Building Software Systems at Google and Lessons Learned - VIDEO
- Scalable Web Architecture and Distributed Systems
- The Datacenter as a Computer
- Crack the System Design Interview
- System Design Cheatsheet
- System Design Preparation
- System Design Interview
- Distributed Log-Processing Design Workshop
- Distributed Systems Reasoning
- SRE Classroom
- Zoom System Design
- System Design Template
- System Design
- System Design Primer
- Introduction to architecting systems for scale
- The Hidden Dividends of Microservices
- Building Secure & Reliable Systems
- The Site Reliability Workbook
- Site Reliability Engineering
- Thrift: Scalable Cross-Language Services Implementation
- Cap’n Proto
- JSON-RPC
- Impossibility of Distributed Consensus with One Faulty Process
- Paxos Made Live - An Engineering Perspective (2006 Invited Talk)
- Distributing Software in a Massively Parallel Environment
- Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams
- Maglev: A Fast and Reliable Software Network Load Balancer
- Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
- BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing
- Linearizability versus Serializability
- Consistency Models
- A Fast, Minimal Memory, Consistent Hash Algorithm
- Semantics of Caching with SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm
- How to beat the CAP theorem
- Questioning the Lambda Architecture
- A database clustering system for horizontal scaling of MySQL
- Massively Scaling MySQL Using Vitess
- Vitess: Scaling MySQL at YouTube Using Go