From 19808bc67ee237c3c5c295032a6f9c7480dd260a Mon Sep 17 00:00:00 2001
From: Manish R Jain <manishrjain@gmail.com>
Date: Wed, 7 Oct 2015 12:42:24 +1100
Subject: [PATCH] Understanding Spanner

---
 docs/spanner.md | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 docs/spanner.md

diff --git a/docs/spanner.md b/docs/spanner.md
new file mode 100644
index 00000000..f7d124ec
--- /dev/null
+++ b/docs/spanner.md
@@ -0,0 +1,95 @@
+# Notes from Spanner
+
+- Globally meaningful commit timestamps for transactions.
+- In correct serialization order.
+- If it can't assign monotonically increasing timestamps, it slows down the system.
+I'd guess that means writes, because read-only operations don't get any txns.
+- Assigns timestamps to data.
+
+### Architecture
+
+- Zonemaster assigns data to spanservers (100 to several thousand per zone).
+- Spanserver serves data to clients via 100 - 1K
+tablets.
+- Spanserver also has a lock table for 2 phase locking.
+Maps *range of keys --> Lock states*.
+- Each tablet has a Paxos state machine, also stores logs.
+- Each Paxos state machine contains metadata + logs.
+- Hence, logs are stored twice.
+- Other tablets would contain the same data as replicas. Leader via Paxos.
+- Each leader would contain a transaction manager.
+- This leader + replicas config is called a group.
+- Each group would participate with other groups in case of a transaction
+involving multiple ranges of keys.
+- **In case of Dgraph, every txn would involve other groups**
+- Each txn manager would coordinate with other groups' txn managers to perform
+two-phase commit.
+- One Participant Leader, other Participant slaves.
+- Confused about coordinator leader + slaves.
+- Txn manager acquires locks.
+
+### Reads
+- Ts is system-chosen without locking, so no incoming writes are blocked.
+
+### 2 Phase Locking
+- Read and write locks
+- Expanding phase: Locks are only acquired
+- Shrinking phase: Locks are only released
+- S2PL **Strict 2 Phase Locking**: Txn releases its write locks only after it
+has ended: committed or aborted.
+- SS2PL **Strong S2PL**: Release both write and read locks only after txn has ended.
+- Aka, *release all locks only after txn end*
+
+### Deadlock Prevention [link](http://www.cs.colostate.edu/~cs551/CourseNotes/Deadlock/WaitWoundDie.html)
+- **Wait-Die**: If I am older than the lock holder, wait. If I am younger, die.
+A younger process may die again and again.
+- **Wound-Wait**: If I am older than the lock holder, preempt the holder. If I am younger, wait.
+Better than *Wait-Die*.
+- To avoid **starvation**, keep a txn's original ts when it restarts instead of assigning a new one.
+
+### Read Write Transactions (Theory)
+- 2 Phase Locking
+- Wound-Wait
+- Spanner ts = ts of Paxos write of txn commit record.
+- Within each Paxos group, writes get ts in monotonically increasing order, even across leaders.
+- Single leader replica can easily assign monotonically increasing ts (duh!)
+- Across leaders, a leader must only assign ts within the interval of its leader lease (??)
+- When ts s is assigned, smax is advanced to s, to preserve disjointness.
+
+```
+e_i^start, e_i^commit -> start and commit events of txn Ti
+s_i = commit ts of txn Ti.
+t_abs(e_2^start) > t_abs(e_1^commit) => s_2 > s_1
+```
+
+- **START**: Coordinator leader assigns a commit ts `s_i` >= TT.now().latest,
+computed after `e_i^server` = arrival event for commit request at self.
+- **Commit Wait**: Client can't see any data committed by Ti, until TT.after(`s_i`).
+```
+s_1 < t_abs(e_1^commit) // commit wait
+t_abs(e_1^commit) < t_abs(e_2^start) // scenario
+t_abs(e_2^start) <= t_abs(e_2^server) // causality
+t_abs(e_2^server) <= s_2 // start
+=> s_1 < s_2 *oh yeah*
+```
+
+
+### Read Write Transactions
+- Reads come back with timestamps. Writes are buffered in client.
+- Client sends keep-alive messages to participant leaders.
+- Begin 2 phase commit after completing all reads, and buffering all writes.
+- Choose a coordinator group and send commit message to each participant leader.
+- Client drives 2 phase commit.
+- Each non-coordinator participant leader acquires write locks.
+- Chooses a prepare ts > any ts it previously assigned.
+- Logs a prepare record via Paxos.
+- Each participant notifies coordinator of the prepare ts.
+- Coordinator leader also acquires write locks.
+- Commit ts >= all prepare ts of participant leaders.
+- Commit ts > TT.now().latest at the time commit message was received.
+- Commit ts > any previously assigned txn ts.
+- Log commit record via Paxos.
+- Wait until TT.after(s), with expected wait of at least 2e, e = margin of error.
+- Send commit ts to client and all participant leaders.
+- Each leader logs txn outcome via Paxos.
+- All apply at the same timestamp, and release locks.
-- 
GitLab
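
The commit-ts rules and the commit wait from the Read Write Transactions notes can be sketched in a few lines of Python. This is a minimal sketch, not Spanner's implementation: `FakeTrueTime`, `choose_commit_ts` and `commit_wait` are hypothetical names, timestamps are assumed to be integer microseconds, and the error bound e is assumed to be a fixed constant.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int  # microseconds
    latest: int    # microseconds


class FakeTrueTime:
    """Hypothetical stand-in for TrueTime: a local clock plus a fixed error bound e."""

    def __init__(self, epsilon_us=4000, clock_us=None):
        self.epsilon_us = epsilon_us
        if clock_us is None:
            # Assumption: any monotonic microsecond clock is good enough for a sketch.
            clock_us = lambda: time.monotonic_ns() // 1000
        self.clock_us = clock_us

    def now(self):
        t = self.clock_us()
        return TTInterval(t - self.epsilon_us, t + self.epsilon_us)

    def after(self, ts):
        # True only once ts has definitely passed, whatever the clock error.
        return self.now().earliest > ts


def choose_commit_ts(prepare_ts, now_latest_at_receipt, last_assigned_ts):
    """Smallest integer ts that satisfies the three commit-ts rules in the notes."""
    # Strictly greater than TT.now().latest at receipt and than any earlier ts.
    s = max(now_latest_at_receipt, last_assigned_ts) + 1
    # Greater than or equal to every participant's prepare ts.
    return max([s, *prepare_ts])


def commit_wait(tt, s, sleep_us=None):
    """Block until tt.after(s) holds; returns the number of polls taken."""
    if sleep_us is None:
        sleep_us = lambda us: time.sleep(us / 1_000_000)
    polls = 0
    while not tt.after(s):
        sleep_us(tt.epsilon_us // 4 or 1)
        polls += 1
    return polls
```

With a fixed e, a commit ts chosen just above `now().latest` sits about e ahead of the local clock, and `after(s)` only turns true once the clock is more than e past `s`, which is where the roughly 2e wait in the notes comes from. Passing `clock_us` and `sleep_us` as parameters is only a convenience so the sketch can be driven by a fake clock.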