Let's load up data first. If you have RDF data, you can use that.
Or, there's [Freebase film RDF data here](https://github.com/dgraph-io/benchmarks).
To use the Film RDF data mentioned above, first install [Git LFS](https://git-lfs.github.com/). I've found the Linux download to be the easiest way to install.
Bulk data loading happens in 2 passes.
Once installed, clone the repository:
### First Pass: UID Assignment
We first find all the entities in the data, and allocate UIDs for them.
You can run this either as a single instance, or over multiple instances.
using a `2G tmpfs` as the dgraph directory for output, with `stw_ram_mb=8196` flag set.
using SSD persistent disk. Instance 2 took a bit longer, and finished in 15 mins. The total output including uids was 1.3GB.
The final output was 1.3GB.
Note that `stw_ram_mb` is based on the memory usage perceived by Golang. It currently doesn't take into account the memory usage by RocksDB, so the actual usage is higher.
## Querying
Once data is loaded, point the dgraph server to the postings and mutations directory.
```
$ cd $GOPATH/src/github.com/dgraph-io/dgraph/server
```
This runs the dgraph server on port 8080. To run it on a different port, set the `--port` flag.
## Server
Now that the data is loaded, you can run DGraph servers. To serve the 3 shards above, you can follow the [same steps as here](#multiple-distributed-instances).
Now you can run GraphQL queries over Freebase film data like so:
This query finds all movies directed by Steven Spielberg, with their names, initial release dates, countries, genres, and cast (i.e. the characters and the actors playing them), plus all the movies directed by those actors, if any.
The support for GraphQL is [very limited right now](https://github.com/dgraph-io/dgraph/issues/1). In particular, mutations, fragments etc. via GraphQL aren't supported.
You can conveniently browse [Freebase film schema here](http://www.freebase.com/film/film?schema=&lang=en).
There are also some schema pointers in the [README](https://github.com/dgraph-io/benchmarks/blob/master/data/README.md).
#### Query Performance
With the [data loaded above](#loading-performance) on the same hardware,
it took **218ms to run** the pretty complicated query above the first time after the server started.
Note that the JSON conversion step has a bit more overhead than captured here.
```json
{
  "server_latency": {
    "json": "37.864027ms",
    "parsing": "1.141712ms",
    "processing": "163.136465ms",
    "total": "202.144938ms"
  }
}
```
Consecutive runs of the same query took much less time (80 to 100ms), because the posting lists were already available in memory.