How to Improve Design Skills

Jeffery Yuan

April 26, 2019

Agenda

How to Design
System Design Principles
Learning from Open Source
Learning from Existing Products
System Design Practices

How to Design

Take time to think about your design
- Minimize upfront design (YAGNI)
- That doesn’t mean skipping the time needed to design the component
Related components
Impact on other components
What are the alternatives?
Welcome different approaches and discussion

How to Design Cont.

Estimation
- Back-of-the-envelope calculation
Estimate data size and QPS
Take time to design the data schema
- It is difficult to change after deploying to production
Better user experience
- Think from the client/user perspective
- How they use it, what they would like to know
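
A back-of-the-envelope estimate of QPS and data size can be sketched in a few lines. This is only an illustration: the function name, the peak factor of 2, and the traffic numbers are assumptions, and it assumes every request writes exactly one record.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_requests, bytes_per_record, retention_days, peak_factor=2.0):
    """Return (average QPS, peak QPS, total storage in GB).

    Assumes one stored record per request and a flat peak multiplier;
    both are simplifications for a first-pass estimate.
    """
    avg_qps = daily_requests / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    storage_gb = daily_requests * bytes_per_record * retention_days / 1e9
    return avg_qps, peak_qps, storage_gb


# Hypothetical workload: 86.4M requests/day, 500-byte records, kept one year.
avg_qps, peak_qps, storage_gb = estimate(
    daily_requests=86_400_000, bytes_per_record=500, retention_days=365
)
print(f"avg {avg_qps:.0f} QPS, peak {peak_qps:.0f} QPS, {storage_gb:.0f} GB")
```

Numbers like these are rough by design; their purpose is to decide early whether one machine, one table, or one index is plausibly enough.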

Reflection – Lessons Learned

What mistakes we made
- Where to store data: DynamoDB or not?
- The key for the Solr schema
Why they happened:
- Did not consider near-future requirements
- Made decisions carelessly

Reflection – Lessons Learned Cont.

Better client library
- Contains only the libraries and code that the client needs
Package shared configuration in the library

System Design Principles

Idempotent
Policy to expire/archive data - Less data
Optimize data for read
- Denormalization
Read Heavy vs Write Heavy
Design to Be Disabled - feature toggle
Isolate Faults - Circuit breaker
Throttling - Rate limit
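
The fault-isolation idea above can be sketched as a minimal circuit breaker. This is an illustrative sketch, not a production implementation: the class name, thresholds, and the injectable `clock` parameter are assumptions made for the example, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of hitting the broken dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back, for example to a cached or default response.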

System Design Principles Cont.

Stateless
Asynchronous
- Back pressure with exponential backoff
Message queues
Cache
Visibility – monitoring
Separation of concerns
- Separate read and write
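
Exponential backoff, mentioned above as a way for clients to ease pressure on an overloaded service, can be sketched as follows. The function names, the default base and cap, and the use of full jitter are assumptions chosen for the example rather than anything prescribed by the slides.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * 2 ** attempt)


def call_with_retry(func, max_attempts=5, base=0.1, cap=5.0,
                    sleep=time.sleep, rng=random.random):
    """Call func(), retrying on any exception with jittered exponential backoff.

    `sleep` and `rng` are injectable so the behaviour can be tested
    without real waiting or real randomness.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random fraction of the capped delay so
            # many clients do not retry in lockstep.
            sleep(backoff_delay(attempt, base, cap) * rng())
```

In a real system the retry would normally be limited to errors known to be transient, and the operation being retried should be idempotent.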

System Design Principles Cont.

CAP
Graceful Degradation
Be Robust - Hide errors as much as possible
Be conservative in what you send, be liberal in what you accept
Make your apps do something reasonable even if not all is right

Learning from Open Source

What makes them popular
When to use them, and when not to

Cassandra

LSM (Log-Structured Merge) Trees
- append-only
SSTable
MemTable - in-memory write buffer, flushed to disk as an SSTable
How C* handles deletes: Tombstones (grace period)
Merkle trees
Bloom Filter
Index
CommitLog
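
To make the Bloom filter item concrete, here is a minimal sketch of the data structure. It illustrates the general technique only and is not Cassandra's implementation; the class name, the sizes, and the use of SHA-256 to derive bit positions are assumptions for the example.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter()
bf.add("row-1")
print(bf.might_contain("row-1"))
```

A storage engine can keep one such filter per on-disk file and skip reading any file whose filter reports the key as definitely absent.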

Cassandra Cont.

Serialize cache data (row cache, key cache) to avoid a cold cache after restart
Session Coordinator
Gossip protocol
Seed nodes
Consistent Hashing
Eventual Consistency (tunable)
- W+R > N: read and write replica sets overlap, so reads see the latest write
Local Index (vs Global Index)
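
Consistent hashing, listed above, can be sketched as a ring of tokens with virtual nodes. This is a simplified illustration and not Cassandra's partitioner: the class name, the SHA-256 based token function, and the default of 64 virtual nodes are assumptions for the example.

```python
import bisect
import hashlib


def _token(value):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several positions so load spreads more evenly.
        for i in range(self.vnodes):
            token = _token(f"{node}#{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def remove_node(self, node):
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._tokens:
            raise ValueError("ring is empty")
        # First token clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]
```

The property that matters is that removing or adding a node only moves the keys adjacent to that node's tokens; keys owned by other nodes stay where they are.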

Kafka

Why it is fast
Sequential read/write vs random read/write
Memory Mapped File
Zero copy
Batched data (compressed)
Partition: ordered, immutable, replicated
Consumer group

Database

Sharding
Replication
Master/Slave, Multi-master

Learning from Existing Products

Twitter/FB timeline
Pull/Push/Mixed Model
FB Haystack/Photo storage

System Design Practices

URL shortener
- read heavy
- able to disable write functions
Design key-value store
Crawler
- Re-crawling
- Next crawl at cur+2t if the page is unchanged, or cur+t/2 if it changed (t = current interval)
Design search engine
- In-memory version: Data structure
- Distributed: Solr Cloud internal design
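
The re-crawling rule for the crawler can be sketched as an adaptive interval. The reading used here, doubling the interval when a page is unchanged and halving it when it changed, is an assumption about the slide's intent, as are the function name and the clamping bounds.

```python
def next_crawl(now, interval, changed,
               min_interval=60.0, max_interval=30 * 24 * 3600.0):
    """Return (next crawl time, new interval), all in seconds.

    Assumed policy: pages that changed are revisited sooner (interval / 2),
    pages that did not are revisited later (interval * 2). The bounds keep
    the interval from collapsing to zero or growing without limit.
    """
    new_interval = interval / 2 if changed else interval * 2
    new_interval = max(min_interval, min(max_interval, new_interval))
    return now + new_interval, new_interval


# A page crawled hourly that did not change is next visited in two hours.
print(next_crawl(now=1000.0, interval=3600.0, changed=False))
```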

System Design Practices Cont.

Design score/rank system for social game
Search nearby places: GeoHash
Design Chat app
Design logging collection and analysis system
Design shopping cart
- guest cart
Design Hit Counter
Design rate limiter
Design flash sale (Miao Sha)
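
For the rate limiter exercise, one common starting point is a token bucket. The sketch below is a single-process illustration only: the class name, parameters, and the injectable `clock` are assumptions, and a distributed limiter would need shared state that is not shown here.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start full
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the time elapsed since the last call, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One bucket per client or API key gives per-caller limits; rejected calls would typically be answered with a "too many requests" response.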

Resources

Designing Data-Intensive Applications
Scalability Rules: Principles for Scaling Web Sites
The Art of Scalability
