Unverified Commit 22adad5a authored by Ashwin Ramesh

Merge branch 'release/v0.4.0'

parents 1df0d283 ba1b35ed
......@@ -26,9 +26,9 @@ Dgraph supports [GraphQL](http://graphql.org/) as query language, and responds i
The README is divided into these sections:
- [Current Status](#current-status)
- [Quick Testing](#quick-testing)
- [Installation](#installation)
- [Usage](#usage)
- [Queries and Mutations](#queries-and-mutations)
- [Installation: Moved to wiki](https://wiki.dgraph.io/index.php?title=Beginners%27_guide)
- [Data Loading: Moved to Wiki](https://wiki.dgraph.io/index.php?title=Beginners%27_guide#Data_Loading)
- [Queries and Mutations: Moved to wiki](https://wiki.dgraph.io/index.php?title=Beginners%27_guide#Queries_and_Mutations)
- [Contact](#contact)
## Current Status
......@@ -58,7 +58,6 @@ when you encounter bugs and to direct the development of Dgraph.
There's an instance of Dgraph running at http://dgraph.xyz, that you can query without installing Dgraph.
This instance contains 21M facts from [Freebase Film Data](http://www.freebase.com/film).
See [Queries and Mutations below](#queries-and-mutations) for sample queries.
`curl dgraph.xyz/query -XPOST -d '{}'`
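The same request can be issued from Go. The following is a minimal sketch, not part of the Dgraph codebase: `newQueryRequest` and `runQuery` are hypothetical helper names, and the only facts taken from this README are the `/query` path and the use of HTTP POST with the query text as the body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// newQueryRequest mirrors the curl command: an HTTP POST to <base>/query
// with the query text as the request body. Unlike `curl -d`, this sketch
// does not set a Content-Type header.
func newQueryRequest(base, query string) (*http.Request, error) {
	url := strings.TrimRight(base, "/") + "/query"
	return http.NewRequest(http.MethodPost, url, strings.NewReader(query))
}

// runQuery sends the request and returns the raw response body.
func runQuery(base, query string) (string, error) {
	req, err := newQueryRequest(base, query)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// The host comes from the README; it may not be reachable.
	out, err := runQuery("http://dgraph.xyz", "{}")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println(out)
}
```

Pointing `runQuery` at `http://localhost:8080` instead would target a locally running instance.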
......@@ -119,215 +118,9 @@ You can hit any of the 3 processes, they'll produce the same results.
`curl localhost:8080/query -XPOST -d '{}'`
## Installation
Best way to do this is to refer to [Dockerfile](Dockerfile), which has the most complete
instructions on getting the right setup.
All the instructions below are based on a Debian/Ubuntu system.
### Install Go 1.6
Download and install [Go 1.6 from here](https://golang.org/dl/).
### Install RocksDB
Dgraph depends on [RocksDB](https://github.com/facebook/rocksdb) for storing posting lists.
```
# First install dependencies.
# For Ubuntu, follow the ones below. For others, refer to INSTALL file in rocksdb.
$ sudo apt-get update && sudo apt-get install libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev
$ git clone https://github.com/facebook/rocksdb.git
$ cd rocksdb
$ git checkout v4.2
$ make shared_lib
$ sudo make install
```
This installs the RocksDB library in `/usr/local/lib`. Make sure that your `LD_LIBRARY_PATH` points to it.
```
# In ~/.bashrc
export LD_LIBRARY_PATH="/usr/local/lib"
```
### Install Dgraph
Now get the [Dgraph](https://github.com/dgraph-io/dgraph) code. Dgraph uses `govendor` to fix dependency versions. Version information for these dependencies is kept in the `vendor.json` file under the `github.com/dgraph-io/dgraph/vendor` directory.
```
go get -u github.com/kardianos/govendor
# cd to dgraph codebase root directory e.g. $GOPATH/src/github.com/dgraph-io/dgraph
govendor sync
# Optional
go test github.com/dgraph-io/dgraph/...
```
See [govendor](https://github.com/kardianos/govendor) for more information.
## Usage
### Distributed Bulk Data Loading
Let's load up data first. If you have RDF data, you can use that.
Or, there's [Freebase film rdf data here](https://github.com/dgraph-io/benchmarks).
Bulk data loading happens in 2 passes.
#### First Pass: UID Assignment
We first find all the entities in the data, and allocate UIDs for them.
You can run this either as a single instance, or over multiple instances.
Here we set number of instances to 2.
```
$ cd $GOPATH/src/github.com/dgraph-io/dgraph/dgraph/dgraphassigner
# Run instance 0.
$ go build . && ./dgraphassigner --numInstances 2 --instanceIdx 0 --rdfgzips $BENCHMARK_REPO/data/rdf-films.gz,$BENCHMARK_REPO/data/names.gz --uids ~/dgraph/uids/u0
# And either later, or on another server, run instance 1.
$ go build . && ./dgraphassigner --numInstances 2 --instanceIdx 1 --rdfgzips $BENCHMARK_REPO/data/rdf-films.gz,$BENCHMARK_REPO/data/names.gz --uids ~/dgraph/uids/u1
```
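The README does not say how `dgraphassigner` divides entities between instances. As an illustration only, here is one common way to split work by `--numInstances` and `--instanceIdx`: hash each entity identifier and keep the ones whose hash modulo the instance count equals this instance's index. The function names and the choice of FNV-1a are assumptions, not Dgraph's actual scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerOf maps an entity identifier to an instance index in [0, numInstances).
// Hypothetical scheme: FNV-1a hash modulo the instance count.
func ownerOf(entity string, numInstances uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(entity)) // Write on a hash.Hash never returns an error
	return h.Sum32() % numInstances
}

// handles reports whether the instance with the given index should
// process the entity.
func handles(entity string, numInstances, instanceIdx uint32) bool {
	return ownerOf(entity, numInstances) == instanceIdx
}

func main() {
	for _, e := range []string{"m.06pj8", "alice", "greg"} {
		fmt.Printf("%s -> instance %d of 2\n", e, ownerOf(e, 2))
	}
}
```

With a scheme like this every instance can read the same input files and independently decide which entities are its own, which is consistent with each command above receiving the full `--rdfgzips` list.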
Once the shards are generated, you need to merge them before the second pass. If you ran this as a single instance, merging isn't required.
```
$ cd $GOPATH/src/github.com/dgraph-io/dgraph/tools/merge
$ go build . && ./merge --stores ~/dgraph/uids --dest ~/dgraph/uasync.final
```
The above command would iterate over all the directories in `~/dgraph/uids`, and merge their data into one `~/dgraph/uasync.final`.
Note that this merge step is important if you're generating multiple uid instances, because all the loader instances need to have access to the global uids list.
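Conceptually, the merge combines the per-instance identifier tables into one global table. The sketch below models each shard as an in-memory map from external id to UID; the real `merge` tool works on on-disk stores, so the types and the `mergeUIDs` name here are purely illustrative assumptions.

```go
package main

import "fmt"

// mergeUIDs combines per-instance xid -> uid maps into one global map.
// It returns an error if two shards disagree about the uid of an xid.
func mergeUIDs(shards ...map[string]uint64) (map[string]uint64, error) {
	merged := make(map[string]uint64)
	for i, shard := range shards {
		for xid, uid := range shard {
			if prev, ok := merged[xid]; ok && prev != uid {
				return nil, fmt.Errorf("shard %d: xid %q has uid %d, already saw %d", i, xid, uid, prev)
			}
			merged[xid] = uid
		}
	}
	return merged, nil
}

func main() {
	u0 := map[string]uint64{"alice": 1} // stands in for ~/dgraph/uids/u0
	u1 := map[string]uint64{"greg": 2}  // stands in for ~/dgraph/uids/u1
	all, err := mergeUIDs(u0, u1)
	fmt.Println(len(all), err)
}
```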
#### Second Pass: Data Loader
Now that we have assigned UIDs for all the entities, the data is ready to be loaded.
Let's do this step with 3 instances.
```
$ cd $GOPATH/src/github.com/dgraph-io/dgraph/dgraph/dgraphloader
$ go build . && ./dgraphloader --numInstances 3 --instanceIdx 0 --rdfgzips $BENCHMARK_REPO/data/names.gz,$BENCHMARK_REPO/data/rdf-films.gz --uids ~/dgraph/uasync.final --postings ~/dgraph/p0
$ go build . && ./dgraphloader --numInstances 3 --instanceIdx 1 --rdfgzips $BENCHMARK_REPO/data/names.gz,$BENCHMARK_REPO/data/rdf-films.gz --uids ~/dgraph/uasync.final --postings ~/dgraph/p1
$ go build . && ./dgraphloader --numInstances 3 --instanceIdx 2 --rdfgzips $BENCHMARK_REPO/data/names.gz,$BENCHMARK_REPO/data/rdf-films.gz --uids ~/dgraph/uasync.final --postings ~/dgraph/p2
```
You can run these over multiple machines, or just one after another.
#### Loader performance
Loader is typically memory bound. Every mutation loads a posting list in memory, where mutations
are applied in layers above posting lists.
While loader doesn't write to disk every time a mutation happens, it does periodically
merge all the mutations to posting lists, and writes them to rocksdb which persists them.
There're 2 types of merging going on: Gentle merge, and Aggressive merge.
Gentle merging picks up N% of `dirty` posting lists, where N is currently 7, and merges them. This happens every 5 seconds.
Aggressive merging happens when the memory usage goes above `stw_ram_mb`.
When that happens, the loader would *stop the world*, start the merge process, and evict all posting lists from memory.
The more memory the loader has to work with, the less often aggressive merging is needed, and the faster the loading.
As a reference point, for instance 0 and 1, it took **11 minutes each to load 21M RDFs** from `rdf-films.gz` and `names.gz`
(from [benchmarks repository](https://github.com/dgraph-io/benchmarks/tree/master/data)) on
[n1-standard-4 GCE instance](https://cloud.google.com/compute/docs/machine-types)
using SSD persistent disk. Instance 2 took a bit longer, and finished in 15 mins. The total output including uids was 1.3GB.
Note that `stw_ram_mb` is based on the memory usage perceived by Golang. It currently doesn't take into account the memory usage by RocksDB. So, the actual usage is higher.
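The two merge triggers described above can be summarised in a few lines of Go. This is a simplified model, not the loader's code: the function names are hypothetical, rounding the gentle batch up is an assumption, and `runtime.MemStats.Alloc` is only one possible reading of "memory usage perceived by Golang".

```go
package main

import (
	"fmt"
	"runtime"
)

// gentleBatch returns how many dirty posting lists a gentle merge would
// pick when it takes percent% of them. Rounding up is an assumption made
// here so that a non-empty dirty set always makes progress.
func gentleBatch(dirty, percent int) int {
	if dirty <= 0 || percent <= 0 {
		return 0
	}
	return (dirty*percent + 99) / 100
}

// needsAggressiveMerge reports whether the observed memory usage exceeds
// the stw_ram_mb threshold (given in megabytes).
func needsAggressiveMerge(usedBytes, stwRAMMB uint64) bool {
	return usedBytes > stwRAMMB<<20
}

// goHeapBytes returns the heap memory currently allocated as seen by the
// Go runtime. Memory held by RocksDB (C++) is invisible to this number.
func goHeapBytes() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Alloc
}

func main() {
	fmt.Println("gentle batch for 1000 dirty lists:", gentleBatch(1000, 7))
	fmt.Println("aggressive merge needed:", needsAggressiveMerge(goHeapBytes(), 4096))
}
```

Because `goHeapBytes` cannot see RocksDB allocations, a threshold chosen this way should leave headroom below the machine's physical memory.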
### Server
Now that the data is loaded, you can run the Dgraph servers. To serve the 3 shards above, you can follow the [same steps as here](#multiple-distributed-instances).
Now you can run GraphQL queries over freebase film data like so:
```
curl localhost:8080/query -XPOST -d '{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film {
      type.object.name.en
      film.film.starring {
        film.performance.character {
          type.object.name.en
        }
        film.performance.actor {
          type.object.name.en
          film.director.film {
            type.object.name.en
          }
        }
      }
      film.film.initial_release_date
      film.film.country
      film.film.genre {
        type.object.name.en
      }
    }
  }
}' > output.json
```
This query would find all movies directed by Steven Spielberg, their names, initial release dates, countries, genres, and the cast of these movies, i.e. characters and actors playing those characters; and all the movies directed by these actors, if any.
The support for GraphQL is [very limited right now](https://github.com/dgraph-io/dgraph/issues/1).
You can conveniently browse [Freebase film schema here](http://www.freebase.com/film/film?schema=&lang=en).
There're also some schema pointers in [README](https://github.com/dgraph-io/benchmarks/blob/master/data/README.md).
#### Query Performance
With the [data loaded above](#loader-performance) on the same hardware,
it took **218ms to run** the fairly complex query above the first time after the server started.
Note that the json conversion step has a bit more overhead than captured here.
```json
{
  "server_latency": {
    "json": "37.864027ms",
    "parsing": "1.141712ms",
    "processing": "163.136465ms",
    "total": "202.144938ms"
  }
}
```
Consecutive runs of the same query took much less time (80 to 100ms), because the posting lists were already in memory.
```json
{
  "server_latency": {
    "json": "38.3306ms",
    "parsing": "506.708µs",
    "processing": "32.239213ms",
    "total": "71.079022ms"
  }
}
```
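The latency values are strings in Go's duration format, so a client can turn them into numbers with `time.ParseDuration`. The sketch below assumes only the `server_latency` object shape shown above; `parseLatency` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// parseLatency decodes {"server_latency": {...}} into durations keyed by
// field name ("json", "parsing", "processing", "total").
func parseLatency(data []byte) (map[string]time.Duration, error) {
	var doc struct {
		ServerLatency map[string]string `json:"server_latency"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	out := make(map[string]time.Duration, len(doc.ServerLatency))
	for name, raw := range doc.ServerLatency {
		d, err := time.ParseDuration(raw)
		if err != nil {
			return nil, fmt.Errorf("field %q: %v", name, err)
		}
		out[name] = d
	}
	return out, nil
}

func main() {
	first := []byte(`{"server_latency": {"json": "37.864027ms", "parsing": "1.141712ms", "processing": "163.136465ms", "total": "202.144938ms"}}`)
	lat, err := parseLatency(first)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	parts := lat["json"] + lat["parsing"] + lat["processing"]
	fmt.Println("sum of parts:", parts, "reported total:", lat["total"])
}
```

In both samples the three components add up to a few microseconds less than the reported total, so the total appears to include a small amount of time not attributed to any single stage.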
## Queries and Mutations
You can see a list of [sample queries here](https://discuss.dgraph.io/t/list-of-test-queries/22).
Dgraph also supports mutations via GraphQL syntax.
Because GraphQL mutations don't contain complete data, the mutation syntax uses [RDF NQuad format](https://www.w3.org/TR/n-quads/).
```
mutation {
  set {
    <subject> <predicate> <objectid> .
    <subject> <predicate> "Object Value" .
    <subject> <predicate> "объект"@ru .
    _uid_:0xabcdef <predicate> <objectid> .
  }
}
```
You can batch multiple NQuads in a single GraphQL query.
Dgraph would assume that any data in `<>` is an external id (XID),
and it would retrieve or assign unique internal ids (UID) automatically for these.
You can also directly specify the UID like so: `_uid_: 0xhexval` or `_uid_: intval`.
Note that a `delete` operation isn't supported yet.
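A client would typically assemble such a mutation from structured data. The following sketch shows one way to do that; the `nquad` type and `buildMutation` function are hypothetical, the `_uid_:` subject form is not covered, and literal values are quoted with Go's `%q`, which is a simplification of the N-Quads escaping rules.

```go
package main

import (
	"fmt"
	"strings"
)

// nquad holds one subject-predicate-object statement.
// Exactly one of ObjectID or Value is expected to be set.
type nquad struct {
	Subject, Predicate string
	ObjectID           string // rendered as <ObjectID>
	Value              string // rendered as a quoted literal
	Lang               string // optional language tag for Value, e.g. "ru"
}

// line renders the statement in the form used in the mutation block above.
func (n nquad) line() string {
	obj := "<" + n.ObjectID + ">"
	if n.ObjectID == "" {
		obj = fmt.Sprintf("%q", n.Value) // Go quoting, not full N-Quads escaping
		if n.Lang != "" {
			obj += "@" + n.Lang
		}
	}
	return fmt.Sprintf("<%s> <%s> %s .", n.Subject, n.Predicate, obj)
}

// buildMutation wraps the statements in mutation { set { ... } }.
func buildMutation(quads []nquad) string {
	var b strings.Builder
	b.WriteString("mutation {\n  set {\n")
	for _, q := range quads {
		b.WriteString("    " + q.line() + "\n")
	}
	b.WriteString("  }\n}")
	return b.String()
}

func main() {
	fmt.Println(buildMutation([]nquad{
		{Subject: "alice", Predicate: "follows", ObjectID: "greg"},
		{Subject: "alice", Predicate: "name", Value: "Alice"},
	}))
}
```

The resulting string can be sent as the body of a POST to `/query`, the same way the curl examples send a query.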
In addition, you could couple a mutation with a follow up query, in a single GraphQL query like so.
```
mutation {
  set {
    <alice> <follows> <greg> .
  }
}
query {
  me(_xid_: alice) {
    follows
  }
}
```
The query portion is executed after the mutation, so this would return `greg` as one of the results.
## Contributing to Dgraph
- See a list of issues [that we need help with](https://github.com/dgraph-io/dgraph/issues?q=is%3Aissue+is%3Aopen+label%3Ahelp_wanted).
- Please see [contributing to Dgraph](https://discuss.dgraph.io/t/contributing-to-dgraph/20) for guidelines on contributions.
- Please see [contributing to Dgraph](https://wiki.dgraph.io/index.php?title=Contributing_to_Dgraph) for guidelines on contributions.
- *Alpha Program*: If you want to contribute to Dgraph on a continuous basis and need some Bitcoins to pay for healthy food, talk to us.
## Contact
......
/example
/*
* Copyright 2016 DGraph Labs, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package main

import (
	"flag"
	"fmt"

	"golang.org/x/net/context"
	"google.golang.org/grpc"

	"github.com/dgraph-io/dgraph/query/graph"
	"github.com/dgraph-io/dgraph/x"
)

var glog = x.Log("client")

// ip holds the host:port address of the Dgraph gRPC server.
var ip = flag.String("ip", "127.0.0.1:8081", "Address (host:port) of the server")

// q holds the query text sent to the server.
var q = flag.String("query", "", "Query sent to the server")

func main() {
	flag.Parse()
	// TODO(pawan): Pick address for server from config
	conn, err := grpc.Dial(*ip, grpc.WithInsecure())
	if err != nil {
		x.Err(glog, err).Fatal("DialTCPConnection")
	}
	defer conn.Close()

	c := graph.NewDgraphClient(conn)
	resp, err := c.Query(context.Background(), &graph.Request{Query: *q})
	if err != nil {
		x.Err(glog, err).Fatal("Error in getting response from server")
	}
	// TODO(pawan): Remove this later
	fmt.Printf("Subgraph %+v", resp.N)
}