Query an RDF graph in Cypher
The examples on this page use the below database, whose schema and data import commands are given here:
RDFGraphs: Mapping of triples to Property Graph tables
Once you have imported your triples, you can query the triples with Cypher, Kùzu’s query language.
However, Cypher is not a query language that was designed for RDF. It assumes an underlying property graph data model.
When you create an RDFGraph, Kùzu internally creates 2 node and 2 relationship tables. When you then ingest your triples using the COPY FROM command, Kùzu maps the data in these triples into these 4 tables. That is, RDFGraphs are a virtual layer that wraps and gives a common name to these 4 tables.
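As a rough sketch of the two steps just described, creating the RDFGraph and loading triples might look like the following. The RDFGraph name UniKG matches the table names used on this page; the file name uni.ttl is an assumed placeholder for your own Turtle file.

```cypher
// Create the RDFGraph; Kùzu creates the UniKG_r, UniKG_l, UniKG_rt and UniKG_lt tables behind it.
CREATE RDFGraph UniKG;

// Bulk-load triples into the RDFGraph. "uni.ttl" is a hypothetical path.
COPY UniKG FROM "uni.ttl";
```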
To query your triples with Cypher, it is important to first understand this mapping.
The specifics of the mapping are as follows:
- Resources Node Table: UniKG_r(iri STRING, PRIMARY KEY (iri)). Stores the Resources (hence the _r suffix) in the triples. Each unique IRI that appears in the subject, predicate, or object of triples is mapped to a separate UniKG_r node. Note that even IRIs that appear only as predicates and never as objects or subjects in any triple are mapped to a UniKG_r resource node (e.g., rdf:type in the example database). Resource nodes have a single property, iri, which stores the IRI of the resource as a string.
- Literals Node Table: UniKG_l(id SERIAL, val VARIANT, lang STRING, PRIMARY KEY (id)). Stores the Literals (hence the _l suffix) in the triples. Each unique literal that appears in the triples is mapped to a separate UniKG_l node. Literals have two properties: val, which stores the value of the literal as a VARIANT data type, and lang, which stores the optional language tag as a STRING. There is a third property, id, of type SERIAL, which can be ignored. It is there to provide a primary key for the table.
- Resource-to-Resource Triples Relationship Table: UniKG_rt(FROM UniKG_r, TO UniKG_r, iri STRING). Stores the triples between UniKG_r resources and UniKG_r resources. The _rt suffix stands for "resource triples", i.e., triples whose objects are resources. The FROM and TO columns store the subject and object resources in the triple. The iri property stores the IRI of the predicate of the triple.
- Resource-to-Literal Triples Relationship Table: UniKG_lt(FROM UniKG_r, TO UniKG_l, iri STRING). Stores the triples between UniKG_r resources and UniKG_l literals. The _lt suffix stands for "literal triples", i.e., triples whose objects are literals. The FROM and TO columns store the subject resource and the object literal in the triple. The iri property stores the IRI of the predicate of the triple.
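One way to get a feel for this mapping is to scan the two node tables directly. The following is a minimal sketch, assuming an RDFGraph named UniKG has already been created and populated; the property names come from the table schemas listed above.

```cypher
// One row per resource, including IRIs that appear only as predicates.
MATCH (r:UniKG_r) RETURN r.iri;

// One row per literal node; lang holds the optional language tag.
MATCH (l:UniKG_l) RETURN l.id, l.val, l.lang;
```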
The contents of these mapped tables are shown below:
[Tables: UniKG_r | UniKG_l | UniKG_rt | UniKG_lt]
For example, the (0, rdf:type, 2) tuple in the UniKG_rt table corresponds to the (kz:Waterloo, rdf:type, kz:city) triple, while the (0, kz:population, 1) tuple in the UniKG_lt table corresponds to the (kz:Waterloo, kz:population, 150000) triple.
Altering the schemas of the base tables of RDFGraphs
You cannot alter the schemas of any of the node and relationship tables of RDFGraphs. That is, the schemas of the UniKG_r, UniKG_l, UniKG_rt, and UniKG_lt tables are immutable. However, as discussed below, you can add or delete records in these tables as if they were regular tables, subject to several restrictions.
Mapping of RDF literals as separate nodes
Storing RDF Literals as separate "Literal nodes" may look unintuitive at first. The natural alternative would be to store literals as node properties of the resources they are associated with. However, storing them as separate nodes has the advantage that you can query both types of triples, those between resources and resources as well as those between resources and literals, homogeneously with a single relationship pattern. As will be discussed momentarily, you can for example use the MATCH (s)-[p:UniKG]-(o) pattern to match all triples. If literals were stored as node properties, you would need to use the previous pattern for triples between resources and resources, and a different pattern, MATCH (s:UniKG_r), followed by inspecting the properties of the matched resources, for triples between resources and literals.
Physical storage of UniKG_rt and UniKG_lt relationship tables
For the UniKG_rt and UniKG_lt tables, which store the predicates of triples, Kùzu stores the string iri property internally as an integer holding the system-level id of the resource that corresponds to the IRI of the predicate. This is an internal optimization that is not visible to users, but it both saves space and improves query performance. Recall that each IRI that appears in your dataset is mapped to a separate resource node even if it is not the subject or object of a triple, such as rdf:type in our example. Consider a triple where rdf:type appears as a predicate, such as <kz:Waterloo, rdf:type, kz:city>. This will be stored in the UniKG_rt relationship table as three integers, (0, 1, 2), where 0, 1, and 2 are respectively the system-level internal ids of the resources kz:Waterloo, rdf:type, and kz:city. However, you can still query a "virtual" iri property of the UniKG_rt relationship table, e.g., MATCH (s)-[p:UniKG_rt]->(o) RETURN p.iri will return, among other tuples, the http://www.w3.org/1999/02/22-rdf-syntax-ns#type tuple.
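Because the virtual iri property behaves like an ordinary string property in queries, it can also be used in a filter. The following is a small sketch, assuming the UniKG example graph, that keeps only the resource-to-resource triples whose predicate is rdf:type:

```cypher
// Filter on the virtual iri property of the relationship.
MATCH (s)-[p:UniKG_rt]->(o)
WHERE p.iri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RETURN s.iri, o.iri;
```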
Querying triples in RDFGraphs
Given the mapping of RDF triples into node and relationship tables, these tables can be queried using Cypher just like any other node and relationship table in Kùzu. For example, you can query all the triples between one resource and another using the following query:
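A query along these lines would do that. It is a sketch based on the table names described above rather than a reproduction of an official listing, and the choice of returned columns is illustrative:

```cypher
// Both endpoints of a UniKG_rt relationship are resource nodes.
MATCH (s:UniKG_r)-[p:UniKG_rt]->(o:UniKG_r)
RETURN s.iri, p.iri, o.iri;
```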
Using RDFGraph name to query both relationship tables
We have also added syntactic sugar to make it easier to query the triples. Specifically, the RDFGraph name, which is the prefix of all 4 tables, can be used to refer to both relationship tables. That is, the RDFGraph name acts as a rel table group, which is syntactic sugar that uses a common name to refer to multiple relationship tables. In our example, the RDFGraph's name is UniKG, and instead of using UniKG_rt and UniKG_lt, you can use UniKG as a relationship name to query both relationship tables as follows:
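A minimal sketch of such a query, assuming the UniKG example graph, is shown here. The object variable o is left unlabeled so that it can bind to either a resource or a literal, and both o.iri and o.val are returned for that reason:

```cypher
// UniKG stands for both UniKG_rt and UniKG_lt.
MATCH (s)-[p:UniKG]->(o)
RETURN s.iri, p.iri, o.iri, o.val;
```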
Note that for the triples whose objects are literals, the o.iri field is null and o.val is non-null. Similarly, for the triples whose objects are resources, o.val is null and o.iri is non-null. [p:UniKG] is simply syntactic sugar for the multi-label relationship pattern [p:UniKG_rt|UniKG_lt]. That is, the above query is equivalent to the following query:
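Written out with the explicit multi-label pattern, a query over both relationship tables could look like this sketch. The RETURN list is illustrative and assumes the UniKG example graph:

```cypher
// Explicit form of [p:UniKG]: match either relationship table.
MATCH (s)-[p:UniKG_rt|UniKG_lt]->(o)
RETURN s.iri, p.iri, o.iri, o.val;
```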
We do not have a syntactic sugar option for querying both the resource and literal node tables. However, you can simply omit the label of the nodes, as done in the above query. There, the variable o does not have a label, and Kùzu resolves it to the 2 labels (o:UniKG_r:UniKG_l), which is the syntax for representing multi-label node variables in Cypher.
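If you prefer to spell the node labels out, the multi-label syntax mentioned above can be written explicitly. This is a sketch assuming the UniKG example graph. The subject is always a resource, so only the object needs both labels:

```cypher
// o may be a resource (UniKG_r) or a literal (UniKG_l).
MATCH (s:UniKG_r)-[p:UniKG]->(o:UniKG_r:UniKG_l)
RETURN s.iri, p.iri, o.iri, o.val;
```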
Use of namespace prefixes in queries
Writing out IRI namespaces, which are prefix strings such as "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "http://kuzu.io/rdf-ex#" or "http://xmlns.com/foaf/0.1/", can be verbose. In SPARQL, which is the standard query language for RDF, you can define a variable with the "PREFIX" keyword, such as "PREFIX kz: <http://kuzu.io/rdf-ex#>", and then use the defined variable as a shorthand, as in kz:student instead of http://kuzu.io/rdf-ex#student. In Kùzu, you can use the WITH clause at the beginning of your queries to define aliases. For example:
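The sketch below illustrates the idea with the kz and rdf aliases. The namespace strings are the ones quoted above; the specific filter (triples of kz:Waterloo with predicate rdf:type) is an illustrative choice based on the running example:

```cypher
// Define namespace aliases once, then build full IRIs by string concatenation.
WITH "http://kuzu.io/rdf-ex#" AS kz,
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#" AS rdf
MATCH (s:UniKG_r)-[p:UniKG]->(o)
WHERE s.iri = kz + "Waterloo" AND p.iri = rdf + "type"
RETURN s.iri, p.iri, o.iri;
```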
Note that if the above query instead projected every variable in scope with RETURN *, the "kz" and "rdf" aliases, which are also in scope, would be returned as columns in the output as well.
Querying of regular node and relationship tables and RDFGraphs
Since RDFGraphs are simply a set of node and relationship tables, you can link the node tables to other node tables in your database. This can especially be useful if you would like to enrich some of the resources with additional information. Suppose you had another source of information about the phone numbers of students at universities and you stored those in a separate Student node table. Let’s suppose that the Student node table has the following schema and 3 records:
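A Student table of this kind could be set up roughly as follows. This is only a sketch: the column names and the phone values are hypothetical placeholders, and only the two students named later on this page are shown (the third record is not reproduced here).

```cypher
// Hypothetical schema: a name key and a phone number.
CREATE NODE TABLE Student(name STRING, phone STRING, PRIMARY KEY (name));

// Placeholder records; the phone values are made up for illustration.
CREATE (:Student {name: "Adam", phone: "000-0001"});
CREATE (:Student {name: "Karissa", phone: "000-0002"});
```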
You can now link the students in the Student node table to the students in the UniKG RDFGraph, specifically the Resource nodes that represent the students Adam and Karissa. Let us first create a relationship table SameStudent that links the two node tables:
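A sketch of that statement is given here. The direction of the relationship (from Student to UniKG_r) is an assumption made for illustration:

```cypher
// Links a regular node table to the Resource node table of the RDFGraph.
CREATE REL TABLE SameStudent(FROM Student TO UniKG_r);
```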
Let us now link the Student node record with name Adam to the resource node with iri kz:Adam. Similarly, let us link the Student node record with name Karissa to the resource node with iri kz:Karissa. We can do this as follows:
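One way to write these two links is sketched below. It assumes a Student table keyed by a name property and a SameStudent relationship table pointing from Student to UniKG_r, and it expands kz: to the http://kuzu.io/rdf-ex# namespace:

```cypher
// Link the Adam record to the kz:Adam resource.
MATCH (a:Student {name: "Adam"}), (r:UniKG_r {iri: "http://kuzu.io/rdf-ex#Adam"})
CREATE (a)-[:SameStudent]->(r);

// Link the Karissa record to the kz:Karissa resource.
MATCH (a:Student {name: "Karissa"}), (r:UniKG_r {iri: "http://kuzu.io/rdf-ex#Karissa"})
CREATE (a)-[:SameStudent]->(r);
```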
Now, we can query the RDFGraph and the Student node table together as follows:
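A combined query might look like the following sketch. It assumes the SameStudent relationship points from Student to UniKG_r and that Student has a name property. The variable names a, s, p and o are chosen to match the explanation on this page:

```cypher
// Hop from a Student record to its resource node, then to all of that resource's triples.
MATCH (a:Student)-[:SameStudent]->(s)-[p:UniKG]->(o)
RETURN a.name, s.iri, p.iri, o.iri, o.val;
```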
Above, a is a node record from the Student node table, s is a resource node from the UniKG_r node table, p is a relationship record from either the UniKG_rt or the UniKG_lt relationship table, and o is either a resource or a literal record from the UniKG_r or UniKG_l node tables.
Modifying RDFGraphs using CREATE, SET, MERGE and DELETE
Similar to how you can query the base 4 tables in RDFGraphs, you can also manipulate the base tables of RDFGraphs through the regular CREATE, MERGE, DELETE and DETACH DELETE statements of Cypher with some restrictions:
- Restriction 1: SET operations, including those used after MERGE, such as ON MATCH/ON CREATE SET, are not supported. For example, you cannot change the iri property of a Resource node or the val property of a Literal node.
- Restriction 2: DELETE operations on Resource node tables are not allowed.
In short, we support inserting and deleting records in the relationship tables, inserting records into Resource node tables, and inserting and deleting records in Literal node tables. We provide a few examples below and discuss some of the restrictions. For details please see the documentation of the respective clauses.
Here is an example of how you can create a new resource node in the UniKG_r node table.
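A minimal sketch of such a statement follows. The IRI http://kuzu.io/rdf-ex#Nour is a hypothetical new resource chosen for illustration:

```cypher
// iri is the primary key, so this fails if the IRI already exists.
CREATE (:UniKG_r {iri: "http://kuzu.io/rdf-ex#Nour"});
```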
Here is an example of how you can create a new triple in the UniKG_rt relationship table:
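The sketch below shows one possible shape for this, using two CREATE statements. The predicate IRI http://kuzu.io/rdf-ex#lastName is the one discussed on this page; the subject kz:Adam and the object resource kz:Smith are illustrative assumptions.

```cypher
// First CREATE: a hypothetical object resource.
CREATE (:UniKG_r {iri: "http://kuzu.io/rdf-ex#Smith"});

// Second CREATE: the triple <kz:Adam, kz:lastName, kz:Smith> in UniKG_rt.
MATCH (s:UniKG_r {iri: "http://kuzu.io/rdf-ex#Adam"}),
      (o:UniKG_r {iri: "http://kuzu.io/rdf-ex#Smith"})
CREATE (s)-[:UniKG_rt {iri: "http://kuzu.io/rdf-ex#lastName"}]->(o);
```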
Note that the second CREATE statement creates a new UniKG_r resource node with IRI http://kuzu.io/rdf-ex#lastName, which was not present in our example RDFGraph before. Recall that every unique IRI that appears in an RDFGraph, whether as a subject, predicate or object, gets a corresponding node in the UniKG_r node table (see item 1. in the section above describing the mapping of triples to node and relationship tables).
Finally, here is an example of how you can delete the literal node with val 150000 and all of its relationships/triples.
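A sketch of such a deletion is shown below. Since val is of type VARIANT, the direct comparison with an integer is an assumption. An explicit cast may be needed depending on your Kùzu version.

```cypher
// DETACH DELETE removes the literal node l together with the relationships attached to it.
MATCH (l:UniKG_l)
WHERE l.val = 150000
DETACH DELETE l;
```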
Only the literal node in the <kz:Waterloo, kz:population, 150000> triple will match l. So l and the triple <kz:Waterloo, kz:population, 150000> will be deleted.
Restrictions for deleting resource nodes
As listed among the above restrictions, the Resource node table is append-only, i.e., you cannot delete resource nodes. The reason for this restriction is that deleting a resource node correctly requires deleting all the relationships and triples that "refer" to it. Recall that every IRI in an RDF dataset is modeled as a resource node, including those IRIs that appear as the predicates of some triples (see item 1. in the section above describing the mapping of triples to node and relationship tables). For example, in our running example, we have the following triple: <kz:Waterloo, rdf:type, kz:City>. The IRI rdf:type is a resource node in the UniKG_r node table. Specifically, it is the 2nd row of the UniKG_r table shown above, where we describe the mapping of triples to node and relationship tables. To correctly delete this node, we would have to delete all triples/relationships in the UniKG_rt and UniKG_lt relationship tables that have rdf:type as their iri. For example, there are 4 such relationships in UniKG_rt. This is a non-trivial operation and we have not yet implemented it in Kùzu.
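To see how many triples would be affected by removing a predicate resource such as rdf:type, you can count the relationships that carry its IRI. This is a sketch assuming the UniKG example graph:

```cypher
// Count the triples, in either relationship table, whose predicate is rdf:type.
MATCH (s)-[p:UniKG]->(o)
WHERE p.iri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RETURN COUNT(*);
```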
Malformed IRI behavior in CREATE statements vs. Turtle files
Kùzu does not require that the values stored in the iri property of the Resource node table be valid IRIs according to the official IRI standard. From Kùzu's perspective they can be arbitrary strings. They only need to be unique, because iri is the primary key of the Resource node table. For example, you can insert a Resource node with the string <http://full IRI/#ex>, which is not a valid IRI for two reasons: first, it starts with an angle bracket, and second, it contains a space character. Nevertheless, the insertion succeeds, and the iri that is stored is the string "<http://full IRI/#ex>". In contrast, when doing bulk data ingestion from Turtle files, triples with malformed IRIs will be ignored and not inserted into Kùzu. That is a side effect of the parser Serd that Kùzu uses, which skips such triples (in fact it may skip an entire "chunk" of triples in the Turtle file; see the documentation on this behavior here).
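The CREATE side of this behavior can be sketched as follows, using the malformed string from the paragraph above. The follow-up MATCH is only there to show that the value is stored verbatim:

```cypher
// Accepted by CREATE even though it is not a valid IRI: it is stored as a plain string.
CREATE (:UniKG_r {iri: "<http://full IRI/#ex>"});

// Look the node up again by the exact same string.
MATCH (r:UniKG_r)
WHERE r.iri = "<http://full IRI/#ex>"
RETURN r.iri;
```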
Using blank node IDs in CREATE statements
Kùzu has the convention that during bulk data import from Turtle or N-Triples files, blank nodes are replaced with specific IRIs of the form _:iopt-label or _:ibj, where i and j are integers. The common prefix of these IRIs is _:i. If you use IRIs of this form, say _:i, in your CREATE statements for Resource nodes, Kùzu will interpret them simply as strings and will not do anything special. For example, if you provide a CREATE statement that inserts a Resource node with IRI _:7b4 and _:7b4 already exists, Kùzu will not create a new Resource node and will instead raise an error.
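In other words, a blank-node-style IRI is handled like any other primary key value. A sketch, using the _:7b4 identifier from the paragraph above:

```cypher
// "_:7b4" is treated as an ordinary string key.
// If a resource with this iri already exists, the statement is expected to fail
// with a primary key violation rather than create a fresh blank node.
CREATE (:UniKG_r {iri: "_:7b4"});
```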
Further, you cannot have predicates whose IRIs are of the form _: (as in Turtle files). Kùzu will raise an error on CREATE statements that try to create a relationship record in the _rt or _lt relationship tables with a predicate whose IRI is of the form _:.
Duplicate triples
Some RDF stores do not allow duplicate triples to be inserted into a database. In Kùzu, because each triple is a relationship record, and Kùzu supports multiple relationships between the same pair of nodes, it is possible to insert duplicate triples into Kùzu RDFGraphs.
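As an illustration, running the same relationship-creating statement twice would leave two identical triples in the RDFGraph. The sketch below assumes the kz:Adam and kz:Karissa resources exist; the predicate kz:knows is a hypothetical IRI used only for this example.

```cypher
// Run this statement twice: each run adds another <kz:Adam, kz:knows, kz:Karissa> relationship.
MATCH (s:UniKG_r {iri: "http://kuzu.io/rdf-ex#Adam"}),
      (o:UniKG_r {iri: "http://kuzu.io/rdf-ex#Karissa"})
CREATE (s)-[:UniKG_rt {iri: "http://kuzu.io/rdf-ex#knows"}]->(o);

// Count the copies of that triple.
MATCH (s:UniKG_r {iri: "http://kuzu.io/rdf-ex#Adam"})-[p:UniKG_rt]->(o:UniKG_r {iri: "http://kuzu.io/rdf-ex#Karissa"})
WHERE p.iri = "http://kuzu.io/rdf-ex#knows"
RETURN COUNT(*);
```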