- No automatic shard movement from one server to another.
- No distributed Posting Lists. One complete PL = one shard.
- No graph languages, like Facebook's GraphQL. For this version, just use some internal
lingo as the mode of communication.
## Terminology
Term | Definition | Link
-----|------------|-----
Entity | Item being described | [link](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
Attribute | A conceptual id (not necessarily a UUID) from a finite set of properties | [link](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
Value | Value of the attribute |
ValueId | If value is an id of another entity, `ValueId` is used to store that |
Posting List | All the information related to one Attribute. A map with key = (Attribute, Entity) and value = (list of Values). This is how underlying data gets stored. |
Shard | Most granular data chunk that would be served by one server. In MVP, one shard = one complete posting list. In later versions, one shard might be a chunk of a posting list. |
Server | Machine connected to network, with local RAM and disk. Each server can serve multiple shards. |
Raft | Each shard would have multiple copies on different servers to distribute query load and improve availability. RAFT is the consensus algorithm used to determine the leader among all copies of a shard. | [link](https://github.com/coreos/etcd/tree/master/raft)
Replica | A non-leading copy of the shard after RAFT election; the leading copy is referred to as the shard. |
## Simple design
- Have a posting list per Attribute.
- Store it in RocksDB as (Attribute, Entity) -> (List of Values); see the key-layout sketch after this list.
- One complete posting list = one shard.
- One mutex lock per shard.
- One single server, serving all shards.
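A minimal sketch of what this could look like in Go, assuming a hypothetical `mvp` package; the key layout, the `postingKey` helper, and the `Shard` struct are illustrative assumptions, not the actual implementation:

```go
package mvp

import (
	"fmt"
	"sync"
)

// postingKey is an assumed RocksDB key layout: one posting list per
// Attribute, keyed by (Attribute, Entity).
func postingKey(attr string, entity uint64) []byte {
	return []byte(fmt.Sprintf("%s|%d", attr, entity))
}

// Shard guards one complete posting list with a single mutex, matching
// "one complete posting list = one shard" and "one mutex lock per shard".
type Shard struct {
	sync.RWMutex
	Attr string
	// In-memory view of Entity -> list of Values; persisted in RocksDB.
	lists map[uint64][][]byte
}
```

Putting the attribute first in the key keeps all entries of one posting list adjacent in RocksDB's sorted key space, which lines up with the one-posting-list-per-Attribute layout.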
## Write
- Convert the query into individual instructions with posting lists.
- Acquire write locks over all the posting lists (see the sketch after this list).
- Do the reads.
- Do the writes.
- Release write locks.
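A hedged sketch of these steps, reusing the hypothetical `Shard` type from the Simple design section; the sorted lock order and the `applyMutation` / `mutate` names are assumptions added here to make the example concrete:

```go
package mvp

import "sort"

// applyMutation sketches the write path: lock every posting list touched by
// the mutation, do the reads and writes, then release the locks. Locking in
// a fixed (sorted) order is an added assumption to avoid deadlocks between
// concurrent writers; it assumes each shard appears at most once in the slice.
func applyMutation(shards []*Shard, mutate func() error) error {
	sort.Slice(shards, func(i, j int) bool { return shards[i].Attr < shards[j].Attr })
	for _, s := range shards {
		s.Lock()
	}
	defer func() {
		for _, s := range shards {
			s.Unlock()
		}
	}()
	return mutate() // reads and writes happen while all locks are held
}
```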
## Read
- Figure out which posting list the data needs to be read from.
- Run the seeks, then repeat the step above (a read-path sketch follows this list).
- Consolidate and return.
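A rough sketch of one seek round against a single posting list, again using the hypothetical `Shard` type; the `readEntities` helper is illustrative only:

```go
package mvp

// readEntities sketches the read path: pick the posting list (shard) for the
// attribute in question, seek the entities we need, and consolidate the
// results. If the values are ValueIds, they can feed another round of seeks
// on a different attribute ("repeat the step above").
func readEntities(s *Shard, entities []uint64) map[uint64][][]byte {
	s.RLock()
	defer s.RUnlock()
	out := make(map[uint64][][]byte, len(entities))
	for _, e := range entities {
		if vals, ok := s.lists[e]; ok {
			out[e] = vals
		}
	}
	return out
}
```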
## Sorting
- Would need to be done via bucketing of values (see the sketch after this list).
- Sort attributes would have to be pre-defined to ensure correct posting
lists get generated.
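One possible interpretation of value bucketing, sketched under the assumption that each pre-defined sort attribute gets extra posting lists keyed by a coarse value bucket; the key layout and `bucketWidth` parameter are purely illustrative:

```go
package mvp

import "fmt"

// sortBucketKey sketches one way "bucketing of values" could work for the
// pre-defined sort attributes: each value maps to a coarse bucket, and a
// posting list is generated per (attribute, bucket). Reading buckets in
// order then only requires sorting within each bucket.
func sortBucketKey(attr string, value int64, bucketWidth int64) []byte {
	bucket := value / bucketWidth
	return []byte(fmt.Sprintf("%s|bucket=%d", attr, bucket))
}
```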
## Goal
Building this MVP would provide us with a working graph database, and
cement some of the basic concepts involved:
- Data storage and retrieval mechanisms (RocksDB, CapnProto).
- List intersections (sketched after this list).
- List alterations to insert new items.
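For example, intersecting the sorted entity id lists produced by seeks over two posting lists; a minimal sketch:

```go
package mvp

// intersectSorted sketches the kind of list intersection the MVP exercises:
// merging two sorted lists of entity ids (e.g. the results of seeks on two
// different posting lists), keeping only the ids present in both.
func intersectSorted(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}
```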
This design doc is deliberately kept simple and stupid.
At this stage, it's more important to pick a few concepts,
make progress on them and improve learning, instead of a multi-front
approach on a big and complicated design doc, which almost always changes
as we learn more. This way, we reduce the number of concepts we're working
on at the same time, and build up to the more complicated concepts that
are required in a full-fledged, strongly consistent and highly performant