Let's load up data first. If you have RDF data, you can use that.
Or, there's [Freebase film RDF data here](https://github.com/dgraph-io/benchmarks).
To use this film RDF data, first install [Git LFS](https://git-lfs.github.com/). I've found the Linux download to be the easiest way to install it.
Once installed, clone the repository:
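With Git LFS set up, the clone is standard:

```sh
# One-time setup: install the Git LFS hooks for your user.
git lfs install

# Clone the benchmarks repo; LFS fetches the large data files automatically.
git clone https://github.com/dgraph-io/benchmarks.git
```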
Bulk data loading happens in two passes.
### First Pass: UID Assignment
We first find all the entities in the data and allocate UIDs for them.
You can run this either as a single instance or across multiple instances.
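What this pass looks like on the command line depends on your build; the sketch below is hypothetical, modeled on early Dgraph loaders, and every flag name should be checked against `--help` for your binary:

```sh
# Hypothetical flags, shown only to illustrate the idea: run the
# UID-assignment pass as instance 0 of 2 over the gzipped RDF files.
./dgraphloader \
  --numInstances 2 --instanceIdx 0 \
  --rdfgzips benchmarks/data/rdf-films.gz,benchmarks/data/names.gz \
  --uids ~/dgraph/uids/u0
```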
This runs the Dgraph server at port 8080. If you want to run it on some other port, you can change that with the `--port` flag.
Note that `stw_ram_mb` is based on the memory usage as perceived by Go. It currently doesn't take into account the memory used by RocksDB, so the actual usage is higher.
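Assuming `stw_ram_mb` is exposed as a flag alongside `--port`, a tuned server invocation might look like this sketch (the binary name and values are illustrative):

```sh
# Illustrative only: assumes a dgraph binary accepting --port and --stw_ram_mb.
# 4096 MB is an arbitrary example value for the stop-the-world RAM threshold.
./dgraph --port 9090 --stw_ram_mb 4096
```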
## Server
Now that the data is loaded, you can run Dgraph servers. To serve the three shards above, you can follow the [same steps as here](#multiple-distributed-instances).
Now you can run GraphQL queries over the Freebase film data, like so:
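Here's a sketch of one such query, assuming the Freebase predicate names shipped with the benchmark dataset and a `_xid_` root argument as in early Dgraph builds; `m.06pj8` is Steven Spielberg's Freebase mid:

```
{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film {
      type.object.name.en
      film.film.initial_release_date
      film.film.country
      film.film.genre {
        type.object.name.en
      }
      film.film.starring {
        film.performance.character {
          type.object.name.en
        }
        film.performance.actor {
          type.object.name.en
          film.director.film {
            type.object.name.en
          }
        }
      }
    }
  }
}
```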
This query would find all movies directed by Steven Spielberg; their names, initial release dates, countries, genres, and casts, i.e. characters and the actors playing those characters; and all the movies directed by those actors, if any.
The support for GraphQL is [very limited right now](https://github.com/dgraph-io/dgraph/issues/1). In particular, mutations, fragments, etc. via GraphQL aren't supported.
You can conveniently browse the [Freebase film schema here](http://www.freebase.com/film/film?schema=&lang=en).
There are also some schema pointers in the [README](https://github.com/dgraph-io/benchmarks/blob/master/data/README.md).
#### Query Performance
With the [data loaded above](#loading-performance) on the same hardware,
it took **218ms to run** the pretty complicated query above the first time after server start.
Note that the JSON conversion step has a bit more overhead than is captured here.
```json
{
"server_latency":{
"json":"57.937316ms",
"parsing":"1.329821ms",
"processing":"187.590137ms",
"total":"246.859704ms"
"json":"37.864027ms",
"parsing":"1.141712ms",
"processing":"163.136465ms",
"total":"202.144938ms"
}
}
```
Consecutive runs of the same query took much less time (80 to 100ms), because the posting lists were already available in memory.