diff --git a/docs/mvp.md b/docs/mvp.md
new file mode 100644
index 0000000000000000000000000000000000000000..c7eed5781af5c76913c56d12ed4cde11d8bc2c28
--- /dev/null
+++ b/docs/mvp.md
@@ -0,0 +1,59 @@
+## Overview
+
+Dgraph is a distributed graph database, meant to tackle terabytes of data,
+be deployed in production, and serve arbitrarily complex user queries in real
+time, without predefined indexes. The aim of the system is to run complicated
+joins while minimizing the network calls required, and hence keep end-to-end
+latency low.
+
+# MVP Design
+This is an MVP design doc. It covers only the part of the functionality
+that can be pushed out of the door within a couple of months. This version
+will not enforce strong consistency, and might not be as distributed. Also,
+shard movement from dead machines might not make the cut in this version.
+
+## Data Storage Format
+
+```go
+type DirectedEdge struct {
+  Entity    string
+  Attribute string
+  Value     interface{} // Store anything here
+  ValueId   string
+
+  Source    string    // Optional. Used to store authorship, or web source
+  Timestamp time.Time // Creation time. Useful for snapshotting.
+}
+```
+
+## Technologies Used
+- Use [RocksDB](http://rocksdb.org/) for storing original data and posting lists.
+- Use [Cap'n Proto](https://capnproto.org/) for in-memory and on-disk representation.
+- Use [RAFT via CoreOS](https://github.com/coreos/etcd/tree/master/raft) for consensus.
+- Use [tcp](http://golang.org/pkg/net/) for inter-machine communication.
+Ref: [experiment](https://github.com/dgraph-io/experiments/tree/master/vrpc)
+
+## Concepts / Technologies Skipped
+- Strong consistency along the lines of Spanner / CockroachDB. For this version,
+we'll skip even eventual consistency. But note that the long-term plan is
+to make this a proper database, supporting strongly consistent writes.
+- [MultiRaft by CockroachDB](http://www.cockroachlabs.com/blog/scaling-raft/) would
+be better aligned to support all the shards handled by one server. But it's also
+more complex, and will have to wait for later versions.
+- No automatic shard movement from one server to another.
+- No distributed Posting Lists. One complete PL = one shard.
+
+## Terminology
+
+Term | Definition | Link
+:---: | --- | ---
+Entity | Item being described | [link](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
+Attribute | A conceptual id (not necessarily a UUID) from a finite set of properties | [link](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
+Value | Value of the attribute |
+ValueId | If the value is the id of another entity, `ValueId` is used to store that id |
+Posting List | A map with key = (Entity, Attribute) and value = (list of Values). This is how the underlying data gets stored. |
+Shard | Most granular chunk of data that would be served by one server. In the MVP, one shard = one complete posting list. |
+Server | Machine connected to the network, with local RAM and disk. Each server can serve multiple shards. |
+Raft | Each shard would have multiple copies on different servers to distribute query load and improve availability. RAFT is the consensus algorithm used to determine the leader among all copies of a shard. | [link](https://github.com/coreos/etcd/tree/master/raft)
+Replica | A non-leading copy of the shard after the RAFT election; the leading copy is referred to as the shard. |
+
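To make the posting-list terminology above concrete, here is a minimal sketch (not part of this change) of how a `DirectedEdge` could be folded into a posting list keyed by `(Entity, Attribute)`. The `postingKey` helper and the in-memory map are hypothetical stand-ins for the RocksDB-backed storage mentioned under Technologies Used.

```go
package main

import (
	"fmt"
	"time"
)

// DirectedEdge mirrors the struct from the Data Storage Format section.
type DirectedEdge struct {
	Entity    string
	Attribute string
	Value     interface{}
	ValueId   string
	Source    string
	Timestamp time.Time
}

// postingKey is a hypothetical key encoding: one posting list (and, in the
// MVP, one shard) per (Entity, Attribute) pair.
func postingKey(entity, attribute string) string {
	return entity + "|" + attribute
}

// postingLists is an in-memory stand-in for the RocksDB-backed store.
var postingLists = map[string][]interface{}{}

// addEdge appends the edge's ValueId (when the value is another entity)
// or its literal Value to the posting list for (Entity, Attribute).
func addEdge(e DirectedEdge) {
	key := postingKey(e.Entity, e.Attribute)
	if e.ValueId != "" {
		postingLists[key] = append(postingLists[key], e.ValueId)
		return
	}
	postingLists[key] = append(postingLists[key], e.Value)
}

func main() {
	addEdge(DirectedEdge{Entity: "alice", Attribute: "follows", ValueId: "bob", Timestamp: time.Now()})
	addEdge(DirectedEdge{Entity: "alice", Attribute: "age", Value: 26, Timestamp: time.Now()})

	fmt.Println(postingLists["alice|follows"]) // [bob]
	fmt.Println(postingLists["alice|age"])     // [26]
}
```

Since one complete posting list is one shard in the MVP, a key such as `alice|follows` also marks the unit of data that a single server would own.
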
diff --git a/docs/spanner.md b/docs/spanner.md
index f7d124ec074ee8bf9c3bffeedf256815365b48d0..4f3020dcea1002d85e5511ce367de8ddc4ef7cb8 100644
--- a/docs/spanner.md
+++ b/docs/spanner.md
@@ -24,9 +24,11 @@ involving multiple range of keys.
 - **In case of Dgraph, every txn would involve other groups**
 - Each txn manager would co-ordinate with other txn managers, by participating
 in paxos leader election.
-- One Participant Leader, other Participant slaves.
-- Confused about coordinator leader + slaves.
 - Txn manager acquires locks.
+- In one Paxos group, one copy is the Participant Leader; the others are Participant slaves.
+- Among the participating Paxos groups, one would be the coordinator.
+- Participant leader of that group = coordinator leader.
+- Participant slaves of that group = coordinator slaves.
 
 ### Reads
 - Ts is system-chosen without locking, so no incoming writes are blocked.
@@ -73,7 +75,6 @@ tabs(e2,server) < s2 // start
 => s1 < s2
 *oh yeah*
 ```
-
 ### Read Write Transactions
 - Reads come back with timestamps. Writes are buffered in client.
 - Client sends keep-alive messages to participant leaders.
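The participant/coordinator roles spelled out in the added bullets can be illustrated with a small sketch (illustrative only; the type and function names are hypothetical, not from the Spanner paper or Dgraph code): each participating Paxos group has one participant leader and some participant slaves, and whichever group is chosen as coordinator contributes the coordinator leader and coordinator slaves.

```go
package main

import "fmt"

// PaxosGroup is a hypothetical stand-in for one Paxos group touched by a
// transaction: one participant leader plus its participant slaves.
type PaxosGroup struct {
	Name   string
	Leader string   // participant leader
	Slaves []string // participant slaves
}

// pickCoordinator designates one participating group as the coordinator.
// Its participant leader doubles as the coordinator leader, and its
// participant slaves double as the coordinator slaves. The selection policy
// here (first group) is arbitrary and only for illustration.
func pickCoordinator(groups []PaxosGroup) (coordLeader string, coordSlaves []string) {
	coord := groups[0]
	return coord.Leader, coord.Slaves
}

func main() {
	groups := []PaxosGroup{
		{Name: "group-a", Leader: "a1", Slaves: []string{"a2", "a3"}},
		{Name: "group-b", Leader: "b1", Slaves: []string{"b2", "b3"}},
	}
	leader, slaves := pickCoordinator(groups)
	fmt.Println("coordinator leader:", leader) // a1
	fmt.Println("coordinator slaves:", slaves) // [a2 a3]
}
```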