Quick Start
Overview
This example-driven tutorial presents 5 steps to get started with Blue Brain Nexus by building and querying a simple knowledge graph. The goal is to walk through the Blue Brain Nexus capabilities that enable:
- The creation of a project as a protected data space to work in
- Easy ingestion of a dataset
- Querying a dataset to retrieve various information
- Sharing a dataset by making it public
To do so, we will work with a small version of the Global Research Identifier Database (GRID) dataset, which contains a set of:
- institutes (institutes.csv)
- their acronyms (acronyms.csv)
- their addresses (addresses.csv)
- their URLs (links.csv)
- and their relationships (relationships.csv)
An overview of this dataset can be found here.
- We will use the Blue Brain Nexus CLI, a Python client, throughout this quick start tutorial.
- This tutorial assumes you've installed and configured the CLI. If not, please follow the setup instructions.
Let’s get started.
Create a project
Projects in Blue Brain Nexus are spaces where data can be:
- managed: created, updated, deprecated, validated, secured;
- accessed: directly by ids or through various search interfaces;
- shared: through fine-grained Access Control Lists (ACLs).
A project is always created within an organization, just like a git repository is created within a GitHub organization. Organizations can be understood as accounts hosting multiple projects.
Select an organization
A public organization named demo is already created for the purpose of this tutorial. All projects will be created under this organization.
The following command lists the organizations you have access to. The demo organization should appear in the output, marked as not deprecated (Deprecated: False).
Command:
nexus orgs list
Output:
+------+---------------+-------------------------------------------+------------+
| Name | Description   | Id                                        | Deprecated |
+------+---------------+-------------------------------------------+------------+
| demo | Nexus sandbox | https://sandbox.bluebrainnexus.io/v1/demo | False      |
+------+---------------+-------------------------------------------+------------+
Let's select the demo organization.
Command:
nexus orgs select demo
Output:
demo organization selected.
Create a project
A project is created with a label and within an organization. The label must be made of alphanumeric characters, hyphens (-) or underscores (_), and be between 3 and 32 characters long (it must match the regular expression [a-zA-Z0-9-_]{3,32}).
Pick a label (hereafter referred to as $PROJECTLABEL) and create a project using the following command. Using your GitHub username is recommended to avoid project label collisions within an organization.
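The label rule above can be checked locally before calling the CLI. The sketch below uses hypothetical helpers (`is_valid_label`, `project_id`) that are not part of the Nexus CLI: it applies the documented regular expression with `re.fullmatch` and builds the project id in the form printed by `nexus projects create`, assuming the sandbox URL and the `demo` organization used in this tutorial.

```python
import re

# Documented rule [a-zA-Z0-9-_]{3,32}, written with the hyphen last in the
# character class; it matches exactly the same set of characters.
LABEL_PATTERN = re.compile(r"[a-zA-Z0-9_-]{3,32}")

# Sandbox base URL and organization used throughout this tutorial.
BASE_URL = "https://sandbox.bluebrainnexus.io/v1"
ORG = "demo"


def is_valid_label(label: str) -> bool:
    """Return True only if the whole label satisfies the documented pattern."""
    # fullmatch (not match/search): every character must be allowed.
    return LABEL_PATTERN.fullmatch(label) is not None


def project_id(label: str, org: str = ORG) -> str:
    """Build a project id in the form shown by `nexus projects create`."""
    if not is_valid_label(label):
        raise ValueError(f"invalid project label: {label!r}")
    return f"{BASE_URL}/projects/{org}/{label}"


if __name__ == "__main__":
    for candidate in ["my-github-user", "ab", "has space", "x" * 33]:
        print(candidate, is_valid_label(candidate))
    print(project_id("my-github-user"))
```

Using `fullmatch` matters here: `re.search` would accept any string that merely contains 3 valid characters somewhere.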
Command:
nexus projects create $PROJECTLABEL && nexus projects list
Output:
Project created (id: https://sandbox.bluebrainnexus.io/v1/projects/demo/$PROJECTLABEL)
+---------------+-------------+------------------------------------------------------------------+------------+
| Label         | Description | Id                                                               | Deprecated |
+---------------+-------------+------------------------------------------------------------------+------------+
| $PROJECTLABEL |             | https://sandbox.bluebrainnexus.io/v1/projects/demo/$PROJECTLABEL | False      |
+---------------+-------------+------------------------------------------------------------------+------------+
By default, created projects are private, meaning that only the project creator (you) has read and write access to them. We'll see below how to make a project public.
The output of the previous command shows the list of projects you have read access to. The project you just created should be the only one listed at this point. Let's select it.
Command:
nexus projects select $PROJECTLABEL && nexus projects list
Output:
$PROJECTLABEL project selected
+---------------+-------------+------------------------------------------------------------------+------------+
| Label         | Description | Id                                                               | Deprecated |
+---------------+-------------+------------------------------------------------------------------+------------+
| $PROJECTLABEL |             | https://sandbox.bluebrainnexus.io/v1/projects/demo/$PROJECTLABEL | False      |
+---------------+-------------+------------------------------------------------------------------+------------+
We are all set to bring some data into the project we just created.
Ingest data
Load the dataset
Let's first list the files that make up the small version of the GRID dataset.
Command:
cd getting-started/dataset/grid-small && ls
Output:
acronyms.csv addresses.csv institutes.csv links.csv relationships.csv
The data to be ingested comes in 5 CSV files (see the output of the above command), each containing a partial description of the organizations. A single command loads the organizations from the institutes.csv file and merges them with the rows of all the other CSV files that share the same grid_id:
nexus resources create --file institutes.csv --type Organization --format csv \
--idcolumn grid_id --idnamespace http://www.grid.ac/institutes/ \
--mergewith links.csv --mergewith addresses.csv --mergewith relationships.csv --mergewith acronyms.csv \
--mergeon grid_id \
--max-connections 4
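Conceptually, `--mergeon grid_id` attaches the rows of the other files to each institute row with the same `grid_id`, and `--idcolumn`/`--idnamespace` turn that `grid_id` into the resource `@id` (for example, `grid.5333.6` becomes `http://www.grid.ac/institutes/grid.5333.6`). The sketch below illustrates that idea with Python's `csv` module on tiny in-memory samples. It is not the CLI's implementation, and the sample rows and the split of columns between files are hypothetical.

```python
import csv
import io

# Hypothetical, heavily truncated samples of three of the GRID files.
INSTITUTES = """grid_id,name,established
grid.5333.6,École Polytechnique Fédérale de Lausanne,1853
"""
ACRONYMS = """grid_id,acronym
grid.5333.6,EPFL
"""
ADDRESSES = """grid_id,city,country
grid.5333.6,Lausanne,Switzerland
"""

ID_NAMESPACE = "http://www.grid.ac/institutes/"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on(base_rows, other_tables, key="grid_id"):
    """Left-join every table in other_tables onto base_rows using `key`."""
    # Index each secondary table by key. If a key repeats, this sketch lets
    # the last row win; how the CLI handles that case is not covered here.
    indexes = [{row[key]: row for row in rows} for rows in other_tables]
    merged = []
    for row in base_rows:
        record = dict(row)
        for index in indexes:
            record.update(index.get(row[key], {}))
        merged.append(record)
    return merged


def to_resource(record, id_column="grid_id", namespace=ID_NAMESPACE):
    """Derive a resource with an @id built from the namespace and id column."""
    # csv yields strings only; type inference (e.g. established -> 1853)
    # is out of scope for this sketch.
    resource = {"@id": namespace + record[id_column], "@type": "Organization"}
    resource.update(record)
    return resource


if __name__ == "__main__":
    rows = merge_on(read_rows(INSTITUTES), [read_rows(ACRONYMS), read_rows(ADDRESSES)])
    for resource in map(to_resource, rows):
        print(resource["@id"], resource["acronym"], resource["city"])
```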
Access data
View data in Nexus Web
Nexus is deployed with a developer-oriented web application for browsing the organizations, projects, data and schemas you have access to. Go to https://sandbox.bluebrainnexus.io/web/demo to browse the data you just loaded.
List data
The simplest way to access data within Nexus is to list it. The following command lists 5 resources:
Command:
nexus resources list --size 5
The full payloads of the resources are not retrieved when listing them: only their identifier, their type and the metadata added by Nexus are. The result list can, however, be scrolled through, and each resource can be fetched by its identifier.
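Scrolling through a listing usually means requesting it page by page. The sketch below generates one listing URL per page. It assumes, without confirmation from this tutorial, that the listing endpoint is `/v1/resources/{org}/{project}` and that it accepts offset-style `from` and `size` query parameters (the CLI's `--size` option suggests the latter). The project label `myproject` and the total of 12 are placeholders; in practice the total would have to come from the listing response.

```python
from urllib.parse import urlencode

BASE_URL = "https://sandbox.bluebrainnexus.io/v1"


def page_urls(org, project, total, size=5):
    """Yield one listing URL per page of `size` resources.

    Assumption: the listing endpoint accepts `from` (offset) and `size`.
    """
    for offset in range(0, total, size):
        query = urlencode({"from": offset, "size": size})
        yield f"{BASE_URL}/resources/{org}/{project}?{query}"


if __name__ == "__main__":
    # "myproject" stands in for $PROJECTLABEL; 12 is an arbitrary total.
    for url in page_urls("demo", "myproject", total=12):
        print(url)
```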
Let's fetch the EPFL organization, identified by http://www.grid.ac/institutes/grid.5333.6:
Command:
nexus resources fetch http://www.grid.ac/institutes/grid.5333.6
Output:
{
  "@context": [
    {
      "@base": "https://sandbox.bluebrainnexus.io/v1/resources/demo/$PROJECTLABEL/_/",
      "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/demo/$PROJECTLABEL/"
    },
    "https://bluebrain.github.io/nexus/contexts/resource.json"
  ],
  "@id": "http://www.grid.ac/institutes/grid.5333.6",
  "@type": "Organization",
  "acronym": "EPFL",
  "city": "Lausanne",
  "country": "Switzerland",
  "country_code": "CH",
  "email_address": "",
  "established": 1853,
  "geonames_city_id": 2659994,
  "grid_id": "grid.5333.6",
  "lat": 46.519082,
  "line_1": "",
  "line_2": "",
  "line_3": "",
  "link": "http://www.epfl.ch/index.en.html",
  "lng": 6.566747,
  "name": "École Polytechnique Fédérale de Lausanne",
  "postcode": "",
  "primary": false,
  "related_grid_id": "grid.482253.a",
  "relationship_type": "Related",
  "state": "",
  "state_code": "",
  "wikipedia_url": "http://en.wikipedia.org/wiki/%C3%89cole_Polytechnique_F%C3%A9d%C3%A9rale_de_Lausanne",
  "_self": "https://sandbox.bluebrainnexus.io/v1/resources/demo/$PROJECTLABEL/_/http%3A%2F%2Fwww.grid.ac%2Finstitutes%2Fgrid.5333.6",
  "_constrainedBy": "https://bluebrain.github.io/nexus/schemas/unconstrained.json",
  "_project": "https://sandbox.bluebrainnexus.io/v1/projects/demo/$PROJECTLABEL",
  "_rev": 1,
  "_deprecated": false,
  "_createdAt": "2019-06-04T08:42:26.433Z",
  "_createdBy": "https://sandbox.bluebrainnexus.io/v1/realms/github/users/mfsy",
  "_updatedAt": "2019-06-04T08:42:26.433Z",
  "_updatedBy": "https://sandbox.bluebrainnexus.io/v1/realms/github/users/mfsy"
}
Whenever a resource is created, Nexus injects some useful metadata. The table below details some of them:
Metadata | Description | Value Type |
---|---|---|
@id | The resource identifier. It is generated by Nexus unless the user provides one. | URI |
@type | The type of the resource if provided by the user. | URI |
_self | The resource address within Nexus. It contains the resource management details such as the organization, the project and the schema. | URI |
_createdAt | The resource creation date. | DateTime |
_createdBy | The resource creator. | URI |
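The `_self` value of the EPFL resource fetched above can be reproduced from its parts: the project, the `_` schema segment seen in that output, and the `@id` percent-encoded as a single path segment. A minimal sketch with `urllib.parse.quote`, assuming that layout holds for the resources of this tutorial; `self_address` is a hypothetical helper and `myproject` stands in for $PROJECTLABEL.

```python
from urllib.parse import quote, unquote

BASE_URL = "https://sandbox.bluebrainnexus.io/v1"


def self_address(org, project, resource_id, schema="_"):
    """Build a resource address in the form seen in the `_self` field.

    The @id becomes one path segment, so every reserved character,
    including "/" and ":", is percent-encoded (safe="").
    """
    return f"{BASE_URL}/resources/{org}/{project}/{schema}/{quote(resource_id, safe='')}"


if __name__ == "__main__":
    epfl = "http://www.grid.ac/institutes/grid.5333.6"
    address = self_address("demo", "myproject", epfl)
    print(address)
    # Decoding the last path segment gives the original @id back.
    assert unquote(address.rsplit("/", 1)[1]) == epfl
```

Passing `safe=""` is the important detail: with the default (`safe="/"`), the slashes of the `@id` would stay unencoded and be read as extra path segments.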
Note that Nexus uses JSON-LD as its data exchange format.
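Because the payloads are JSON-LD, the short values in the fetched EPFL resource stand for full IRIs defined by its `@context`: a term such as `Organization` or `name` is appended to `@vocab`, and a relative `@id` is resolved against `@base`. This is why the Elasticsearch query in this tutorial filters on the full `.../vocabs/demo/$PROJECTLABEL/Organization` IRI. The sketch below mimics only these two rules with plain string operations for a hypothetical project label `myproject`; a real JSON-LD processor (for example the `pyld` library) implements the complete expansion algorithm.

```python
from urllib.parse import urljoin

# Context values copied from the fetched resource, with $PROJECTLABEL
# replaced by the hypothetical label "myproject".
CONTEXT = {
    "@base": "https://sandbox.bluebrainnexus.io/v1/resources/demo/myproject/_/",
    "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/demo/myproject/",
}


def expand_term(term, context=CONTEXT):
    """Expand a type or property term that is not already an absolute IRI."""
    if ":" in term:  # crude check: anything containing a colon is kept as-is
        return term
    return context["@vocab"] + term


def expand_id(value, context=CONTEXT):
    """Resolve an @id against @base; absolute IRIs come back unchanged."""
    return urljoin(context["@base"], value)


if __name__ == "__main__":
    print(expand_term("Organization"))
    print(expand_term("name"))
    print(expand_id("my-resource"))
    print(expand_id("http://www.grid.ac/institutes/grid.5333.6"))
```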
Filters are available to list specific resources. For example, resources of type Organization can be listed by running the following command:
Command:
nexus resources list --type Organization --size 5
Output:
A table with the columns Id, Type, Revision and Deprecated, listing up to 5 resources of type https://sandbox.bluebrainnexus.io/v1/vocabs/demo/$PROJECTLABEL/Organization.
Query data
Listing is usually not enough to select a specific subset of data. Data ingested within each project can be searched through two complementary search interfaces called views.
View | Description |
---|---|
ElasticSearchView | Exposes data in Elasticsearch, a document-oriented search engine, and provides access to it using the Elasticsearch query language. |
SparqlView | Exposes data as a graph and allows navigating and exploring the data using the W3C SPARQL query language. |
Query data using the ElasticSearchView
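The ElasticSearchView is available at the address https://sandbox.bluebrainnexus.io/v1/views/demo/$PROJECTLABEL/documents/_search.
The query below selects 5 organizations sorted by creation date in descending order. Since the JSON body is wrapped in single quotes, the shell does not expand $PROJECTLABEL: replace it with your project label before running the command.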
Select query:
nexus views query-es --data \
'{
  "size": 5,
  "sort": [ { "_createdAt": { "order": "desc" } } ],
  "query": {
    "terms": { "@type": ["https://sandbox.bluebrainnexus.io/v1/vocabs/demo/$PROJECTLABEL/Organization"] }
  }
}'
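The same query body can also be built from a script. This is a minimal sketch using only the standard library, under stated assumptions: that the view address above accepts the query as a JSON body in an HTTP POST, and that the access token is sent as an OAuth 2.0 bearer token. The helpers, the label `myproject` and the token are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://sandbox.bluebrainnexus.io/v1"


def organizations_query(project, size=5):
    """Elasticsearch query: `size` organizations, most recent _createdAt first."""
    org_type = f"{BASE_URL}/vocabs/demo/{project}/Organization"
    return {
        "size": size,
        "sort": [{"_createdAt": {"order": "desc"}}],
        "query": {"terms": {"@type": [org_type]}},
    }


def search_request(project, token, query):
    """Build (but do not send) a POST to the project's documents view."""
    url = f"{BASE_URL}/views/demo/{project}/documents/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption: bearer token auth
        },
        method="POST",
    )


if __name__ == "__main__":
    req = search_request("myproject", "MY_TOKEN", organizations_query("myproject"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it; omitted on purpose.
```

Building the IRI in code also sidesteps the quoting issue of the shell version, since the label is an ordinary function argument.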
Query data using the SparqlView
The SparqlView is available at the address https://sandbox.bluebrainnexus.io/v1/views/demo/$PROJECTLABEL/graph/sparql.
The query below selects 5 organizations sorted by creation date in descending order (replace $PROJECTLABEL in the PREFIX declaration with your project label).
Select query:
nexus views query-sparql --data \
'PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/demo/$PROJECTLABEL/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
SELECT ?org ?name ?createdAt
WHERE {
  ?org a vocab:Organization .
  ?org vocab:name ?name .
  ?org nxv:createdAt ?createdAt
}
ORDER BY DESC(?createdAt)
LIMIT 5'
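SPARQL SELECT results are commonly returned in the W3C SPARQL 1.1 Query Results JSON Format, where each solution maps variable names to `{type, value}` objects. Assuming the SparqlView answers in that format, the sketch below flattens the bindings into plain rows for the `?org`, `?name` and `?createdAt` variables of the query above. The sample response is hypothetical, built from the EPFL record fetched earlier.

```python
import json

# Hypothetical response in the SPARQL 1.1 Query Results JSON Format,
# built from the EPFL resource fetched earlier in this tutorial.
SAMPLE = """
{
  "head": {"vars": ["org", "name", "createdAt"]},
  "results": {"bindings": [
    {
      "org": {"type": "uri", "value": "http://www.grid.ac/institutes/grid.5333.6"},
      "name": {"type": "literal", "value": "École Polytechnique Fédérale de Lausanne"},
      "createdAt": {"type": "literal", "value": "2019-06-04T08:42:26.433Z",
                    "datatype": "http://www.w3.org/2001/XMLSchema#dateTime"}
    }
  ]}
}
"""


def rows(response):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    variables = response["head"]["vars"]
    return [
        # A variable left unbound in a solution is absent; map it to None.
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in response["results"]["bindings"]
    ]


if __name__ == "__main__":
    for row in rows(json.loads(SAMPLE)):
        print(row["org"], row["name"], row["createdAt"])
```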
Share data
Making a dataset public means granting read permissions to the “anonymous” user.
$ nexus acls make-public
To check that the dataset is now public:
- Ask the person next to you to list resources in your project.
- Or create and select another profile named public-tutorial (following the instructions in the Set up). You should see that public-tutorial is selected and that its Token column is None.
Output:
Selected profile: public-tutorial
+-----------------+----------+--------------------------------------+-----------------+
| Profile         | Selected | URL                                  | Token           |
+-----------------+----------+--------------------------------------+-----------------+
| tutorial        |          | https://sandbox.bluebrainnexus.io/v1 | Expiry: 2019... |
| public-tutorial | Yes      | https://sandbox.bluebrainnexus.io/v1 | None            |
+-----------------+----------+--------------------------------------+-----------------+
- Resources in your project should then be listed by the following command, even though you are not authenticated.
Command:
nexus resources list --size 5 -o demo -p $PROJECTLABEL
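The same check can be scripted without the CLI: a request that carries no `Authorization` header is made as the anonymous user, so it should succeed once the project is public. A sketch, assuming the listing lives at `/v1/resources/{org}/{project}` and accepts a `size` query parameter mirroring `--size`; `anonymous_listing_request` is a hypothetical helper, `myproject` is a placeholder, and the request is only built, since sending it requires network access.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://sandbox.bluebrainnexus.io/v1"


def anonymous_listing_request(org, project, size=5):
    """Build a GET for the resource listing with no credentials attached."""
    url = f"{BASE_URL}/resources/{org}/{project}?{urlencode({'size': size})}"
    # No Authorization header is set, so the caller is anonymous.
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    req = anonymous_listing_request("demo", "myproject")
    print(req.get_method(), req.full_url)
    print("Authorization" in req.headers)  # False
    # To try it against the sandbox (requires network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```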