diff --git a/README.md b/README.md
index d1c260ea8dda69df94aac65909423fd0fdb03601..9679f4f6e5e6b726d71b31d6a9f4b0edfe50eab6 100644
--- a/README.md
+++ b/README.md
@@ -26,9 +26,9 @@ Dgraph supports [GraphQL](http://graphql.org/) as query language, and responds i
 The README is divided into these sections:
 - [Current Status](#current-status)
 - [Quick Testing](#quick-testing)
-- [Installation](#installation)
-- [Usage](#usage)
-- [Queries and Mutations](#queries-and-mutations)
+- [Installation: Moved to wiki](https://wiki.dgraph.io/index.php?title=Beginners%27_guide)
+- [Data Loading: Moved to wiki](https://wiki.dgraph.io/index.php?title=Beginners%27_guide#Data_Loading)
+- [Queries and Mutations: Moved to wiki](https://wiki.dgraph.io/index.php?title=Beginners%27_guide#Queries_and_Mutations)
 - [Contact](#contact)
 
 ## Current Status
@@ -58,7 +58,6 @@ when you encounter bugs and to direct the development of Dgraph.
 There's an instance of Dgraph running at http://dgraph.xyz, that you can query without
 installing Dgraph. This instance contains 21M facts from
 [Freebase Film Data](http://www.freebase.com/film).
-See [Queries and Mutations below](#queries-and-mutations) for sample queries.
 
 `curl dgraph.xyz/query -XPOST -d '{}'`
 
@@ -119,215 +118,9 @@ You can hit any of the 3 processes, they'll produce the same results.
 
 `curl localhost:8080/query -XPOST -d '{}'`
 
-## Installation
-Best way to do this is to refer to [Dockerfile](Dockerfile), which has the most complete
-instructions on getting the right setup.
-All the instructions below are based on a Debian/Ubuntu system.
-
-### Install Go 1.6
-Download and install [Go 1.6 from here](https://golang.org/dl/).
-
-### Install RocksDB
-Dgraph depends on [RocksDB](https://github.com/facebook/rocksdb) for storing posting lists.
-
-```
-# First install dependencies.
-# For Ubuntu, follow the ones below. For others, refer to INSTALL file in rocksdb.
-$ sudo apt-get update && apt-get install libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev
-$ git clone https://github.com/facebook/rocksdb.git
-$ cd rocksdb
-$ git checkout v4.2
-$ make shared_lib
-$ sudo make install
-```
-
-This would install RocksDB library in `/usr/local/lib`. Make sure that your `LD_LIBRARY_PATH` is correctly pointing to it.
-
-```
-# In ~/.bashrc
-export LD_LIBRARY_PATH="/usr/local/lib"
-```
-
-### Install Dgraph
-Now get [Dgraph](https://github.com/dgraph-io/dgraph) code. Dgraph uses `govendor` to fix dependency versions. Version information for these dependencies is included in the `github.com/dgraph-io/dgraph/vendor` directory under the `vendor.json` file.
-
-```
-go get -u github.com/kardianos/govendor
-# cd to dgraph codebase root directory e.g. $GOPATH/src/github.com/dgraph-io/dgraph
-govendor sync
-
-# Optional
-go test github.com/dgraph-io/dgraph/...
-
-```
-See [govendor](https://github.com/kardianos/govendor) for more information.
-
-## Usage
-
-### Distributed Bulk Data Loading
-Let's load up data first. If you have RDF data, you can use that.
-Or, there's [Freebase film rdf data here](https://github.com/dgraph-io/benchmarks).
-
-Bulk data loading happens in 2 passes.
-
-#### First Pass: UID Assignment
-We first find all the entities in the data, and allocate UIDs for them.
-You can run this either as a single instance, or over multiple instances.
-
-Here we set number of instances to 2.
-```
-$ cd $GOPATH/src/github.com/dgraph-io/dgraph/dgraph/dgraphassigner
-
-# Run instance 0.
-$ go build . && ./dgraphassigner --numInstances 2 --instanceIdx 0 --rdfgzips $BENCHMARK_REPO/data/rdf-films.gz,$BENCHMARK_REPO/data/names.gz --uids ~/dgraph/uids/u0
-
-# And either later, or on another server, run instance 1.
-$ go build . && ./dgraphassigner --numInstances 2 --instanceIdx 1 --rdfgzips $BENCHMARK_REPO/data/rdf-films.gz,$BENCHMARK_REPO/data/names.gz --uids ~/dgraph/uids/u1
-```
-
-Once the shards are generated, you need to merge them before the second pass. If you ran this as a single instance, merging isn't required.
-```
-$ cd $GOPATH/src/github.com/dgraph-io/dgraph/tools/merge
-$ go build . && ./merge --stores ~/dgraph/uids --dest ~/dgraph/uasync.final
-```
-The above command would iterate over all the directories in `~/dgraph/uids`, and merge their data into one `~/dgraph/uasync.final`.
-Note that this merge step is important if you're generating multiple uid intances, because all the loader instances need to have access to global uids list.
-
-#### Second Pass: Data Loader
-Now that we have assigned UIDs for all the entities, the data is ready to be loaded.
-
-Let's do this step with 3 instances.
-```
-$ cd $GOPATH/src/github.com/dgraph-io/dgraph/dgraph/dgraphloader
-$ go build . && ./dgraphloader --numInstances 3 --instanceIdx 0 --rdfgzips $BENCHMARK_REPO/data/names.gz,$BENCHMARK_REPO/data/rdf-films.gz --uids ~/dgraph/uasync.final --postings ~/dgraph/p0
-$ go build . && ./dgraphloader --numInstances 3 --instanceIdx 1 --rdfgzips $BENCHMARK_REPO/data/names.gz,$BENCHMARK_REPO/data/rdf-films.gz --uids ~/dgraph/uasync.final --postings ~/dgraph/p1
-$ go build . && ./dgraphloader --numInstances 3 --instanceIdx 2 --rdfgzips $BENCHMARK_REPO/data/names.gz,$BENCHMARK_REPO/data/rdf-films.gz --uids ~/dgraph/uasync.final --postings ~/dgraph/p2
-```
-You can run these over multiple machines, or just one after another.
-
-#### Loader performance
-Loader is typically memory bound. Every mutation loads a posting list in memory, where mutations
-are applied in layers above posting lists.
-While loader doesn't write to disk every time a mutation happens, it does periodically
-merge all the mutations to posting lists, and writes them to rocksdb which persists them.
-
-There're 2 types of merging going on: Gentle merge, and Aggressive merge.
-Gentle merging picks up N% of `dirty` posting lists, where N is currently 7, and merges them. This happens every 5 seconds.
-
-Aggressive merging happens when the memory usage goes above `stw_ram_mb`.
-When that happens, the loader would *stop the world*, start the merge process, and evict all posting lists from memory.
-The more memory is available for loader to work with, the less frequently aggressive merging needs to be done, the faster the loading.
-
-As a reference point, for instance 0 and 1, it took **11 minutes each to load 21M RDFs** from `rdf-films.gz` and `names.gz`
-(from [benchmarks repository](https://github.com/dgraph-io/benchmarks/tree/master/data)) on
-[n1-standard-4 GCE instance](https://cloud.google.com/compute/docs/machine-types)
-using SSD persistent disk. Instance 2 took a bit longer, and finished in 15 mins. The total output including uids was 1.3GB.
-
-Note that `stw_ram_mb` is based on the memory usage perceived by Golang. It currently doesn't take into account the memory usage by RocksDB. So, the actual usage is higher.
-
-### Server
-Now that the data is loaded, you can run the Dgraph servers. To serve the 3 shards above, you can follow the [same steps as here](#multiple-distributed-instances).
-Now you can run GraphQL queries over freebase film data like so:
-```
-curl localhost:8080/query -XPOST -d '{
-  me(_xid_: m.06pj8) {
-    type.object.name.en
-    film.director.film {
-      type.object.name.en
-      film.film.starring {
-        film.performance.character {
-          type.object.name.en
-        }
-        film.performance.actor {
-          type.object.name.en
-          film.director.film {
-            type.object.name.en
-          }
-        }
-      }
-      film.film.initial_release_date
-      film.film.country
-      film.film.genre {
-        type.object.name.en
-      }
-    }
-  }
-}' > output.json
-```
-This query would find all movies directed by Steven Spielberg, their names, initial release dates, countries, genres, and the cast of these movies, i.e. characteres and actors playing those characters; and all the movies directed by these actors, if any.
-
-The support for GraphQL is [very limited right now](https://github.com/dgraph-io/dgraph/issues/1).
-You can conveniently browse [Freebase film schema here](http://www.freebase.com/film/film?schema=&lang=en).
-There're also some schema pointers in [README](https://github.com/dgraph-io/benchmarks/blob/master/data/README.md).
-
-#### Query Performance
-With the [data loaded above](#loading-performance) on the same hardware,
-it took **218ms to run** the pretty complicated query above the first time after server run.
-Note that the json conversion step has a bit more overhead than captured here.
-```json
-{
-  "server_latency": {
-    "json": "37.864027ms",
-    "parsing": "1.141712ms",
-    "processing": "163.136465ms",
-    "total": "202.144938ms"
-  }
-}
-```
-
-Consecutive runs of the same query took much lesser time (80 to 100ms), due to posting lists being available in memory.
-```json
-{
-  "server_latency": {
-    "json": "38.3306ms",
-    "parsing": "506.708µs",
-    "processing": "32.239213ms",
-    "total": "71.079022ms"
-  }
-}
-```
-
-## Queries and Mutations
-You can see a list of [sample queries here](https://discuss.dgraph.io/t/list-of-test-queries/22).
-Dgraph also supports mutations via GraphQL syntax.
-Because GraphQL mutations don't contain complete data, the mutation syntax uses [RDF NQuad format](https://www.w3.org/TR/n-quads/).
-```
-mutation {
-  set {
-    <subject> <predicate> <objectid> .
-    <subject> <predicate> "Object Value" .
-    <subject> <predicate> "объект"@ru .
-    _uid_:0xabcdef <predicate> <objectid> .
-  }
-}
-```
-
-You can batch multiple NQuads in a single GraphQL query.
-Dgraph would assume that any data in `<>` is an external id (XID),
-and it would retrieve or assign unique internal ids (UID) automatically for these.
-You can also directly specify the UID like so: `_uid_: 0xhexval` or `_uid_: intval`.
-
-Note that a `delete` operation isn't supported yet.
-
-In addition, you could couple a mutation with a follow up query, in a single GraphQL query like so.
-```
-mutation {
-  set {
-    <alice> <follows> <greg> .
-  }
-}
-query {
-  me(_xid_: alice) {
-    follows
-  }
-}
-```
-The query portion is executed after the mutation, so this would return `greg` as one of the results.
-
 ## Contributing to Dgraph
 - See a list of issues [that we need help with](https://github.com/dgraph-io/dgraph/issues?q=is%3Aissue+is%3Aopen+label%3Ahelp_wanted).
-- Please see [contributing to Dgraph](https://discuss.dgraph.io/t/contributing-to-dgraph/20) for guidelines on contributions.
+- Please see [contributing to Dgraph](https://wiki.dgraph.io/index.php?title=Contributing_to_Dgraph) for guidelines on contributions.
 - *Alpha Program*: If you want to contribute to Dgraph on a continuous basis and need some Bitcoins to pay for healthy food, talk to us.
 
 ## Contact
diff --git a/client/dgraphclient-go/.gitignore b/client/dgraphclient-go/.gitignore
deleted file mode 100644
index 6f30a3aba9e8c58986b54346b5cdfc18d5bb4bf7..0000000000000000000000000000000000000000
--- a/client/dgraphclient-go/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-/example
diff --git a/client/dgraphclient-go/main.go b/client/dgraphclient-go/main.go
deleted file mode 100644
index afebf93a58d253a2e19bc72cd3fd6fcda24f9a34..0000000000000000000000000000000000000000
--- a/client/dgraphclient-go/main.go
+++ /dev/null
@@ -1,53 +0,0 @@
-/*
- * Copyright 2016 DGraph Labs, Inc.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package main
-
-import (
-	"flag"
-	"fmt"
-
-	"golang.org/x/net/context"
-	"google.golang.org/grpc"
-
-	"github.com/dgraph-io/dgraph/query/graph"
-	"github.com/dgraph-io/dgraph/x"
-)
-
-var glog = x.Log("client")
-var ip = flag.String("ip", "127.0.0.1:8081", "Port to communicate with server")
-var q = flag.String("query", "", "Query sent to the server")
-
-func main() {
-	flag.Parse()
-	// TODO(pawan): Pick address for server from config
-	conn, err := grpc.Dial(*ip, grpc.WithInsecure())
-	if err != nil {
-		x.Err(glog, err).Fatal("DialTCPConnection")
-	}
-	defer conn.Close()
-
-	c := graph.NewDgraphClient(conn)
-
-	resp, err := c.Query(context.Background(), &graph.Request{Query: *q})
-	if err != nil {
-		x.Err(glog, err).Fatal("Error in getting response from server")
-	}
-
-	// TODO(pawan): Remove this later
-	fmt.Printf("Subgraph %+v", resp.N)
-
-}
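
For reference while the Queries and Mutations docs live on the wiki: the NQuad mutation syntax shown in the removed README section can also be composed programmatically before POSTing it to a server. A minimal Python sketch — the helper name `nquad_mutation` is ours for illustration, not a Dgraph API; it only formats the request body that the removed curl examples send:

```python
def nquad_mutation(triples):
    """Format (subject, predicate, object) triples as a GraphQL mutation
    using the RDF NQuad syntax from the removed README section."""
    def term(t):
        # Already-quoted literals and explicit _uid_ references pass through;
        # bare names are wrapped in <> so Dgraph treats them as external ids (XIDs).
        if t.startswith('"') or t.startswith('_uid_:'):
            return t
        return '<%s>' % t

    lines = ['    %s %s %s .' % (term(s), term(p), term(o)) for s, p, o in triples]
    return 'mutation {\n  set {\n%s\n  }\n}' % '\n'.join(lines)

body = nquad_mutation([('alice', 'follows', 'greg')])
# The body can then be POSTed to a running server, e.g.:
#   curl localhost:8080/query -XPOST -d "$body"
print(body)
```

This mirrors the `<alice> <follows> <greg> .` example above; since `delete` isn't supported yet, only `set` blocks are emitted.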