MATCH is the clause where you define a “graph pattern”, i.e., a join of node or relationship records,
to find in the database. MATCH is often accompanied by WHERE (equivalent to SQL’s WHERE clause)
to define more predicates on the patterns that are matched.
We will use the example database for demonstration, whose schema and data import commands are given here.
The query below matches variable “a” to nodes with label User and returns “a”. Returning “a”
is a shortcut in openCypher for returning all properties of each matched node, together with its label and internal ID.
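As a minimal sketch, assuming the example database defines a User node label, such a query could be written as:

```cypher
// Bind "a" to every node labeled User and return the whole node
// (all properties plus its label and internal ID).
MATCH (a:User)
RETURN a;
```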
The query below matches variable “a” to nodes with label User or label City. “RETURN a” returns all properties of the node together with its label and internal ID. Properties that do not exist in a label are returned as NULL (e.g., “population” does not exist in “User”). Properties that exist in multiple labels are expected to have the same data type (e.g., “name” has the STRING data type in both “User” and “City”).
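A sketch of a multi-label node match follows. It assumes that listing several labels as `:User:City` is read as "User or City", as the paragraph above describes; that reading is taken from the text here and is not standard across all Cypher implementations.

```cypher
// Bind "a" to nodes labeled User OR City (assumed semantics of :User:City here).
// Properties missing from one label, such as population on User, come back as NULL.
MATCH (a:User:City)
RETURN a;
```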
Similar to binding variables to node records, you can bind variables to relationship records and return them. You can specify the direction of a relationship with <- or ->. The following query finds all “a” Users that follow a “b” User through an outgoing relationship from “a”, and returns the name of “a”, the relationship “e”, and the name of “b”, where “e” matches the relationship from “a” to “b”.
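One way to express this, assuming a Follows relationship label between User nodes and a name property on User:

```cypher
// "e" binds to each Follows relationship going out of "a" and into "b".
MATCH (a:User)-[e:Follows]->(b:User)
RETURN a.name, e, b.name;
```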
Output:
The following query matches the same relationships through an incoming relationship to “a” (so “a” and “b” are swapped in the output):
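A sketch of the incoming-direction variant, under the same assumptions about the User and Follows labels:

```cypher
// The arrow now points at "a", so "b" is the follower and "a" is the one being followed.
MATCH (a:User)<-[e:Follows]-(b:User)
RETURN a.name, e, b.name;
```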
Similar to matching nodes with multiple labels, you can bind variables to relationships with multiple labels. The query below finds all “a” Users that Follows a “b” User or LivesIn a “b” City.
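The sketch below uses the `|` separator to list alternative relationship labels. The exact spelling (`:Follows|:LivesIn` versus `:Follows|LivesIn`) is an assumption and may differ between versions.

```cypher
// "e" may be a Follows or a LivesIn relationship; "b" may be a User or a City.
MATCH (a:User)-[e:Follows|:LivesIn]->(b:User:City)
RETURN a.name, e, b.name;
```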
Similar to matching nodes with any label, you can bind variables to relationships with any label by not specifying a label. The query below finds all relationships in the database.
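A minimal sketch of a label-free relationship match:

```cypher
// No labels on the nodes or on the relationship, so every relationship matches.
MATCH ()-[e]->()
RETURN e;
```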
You can match a relationship in both directions by not specifying a relationship direction (i.e., -). The following query finds all “b” Users who either follow or are followed by “Karissa”.
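A sketch of the undirected match, assuming “Karissa” is stored in the name property of a User node:

```cypher
// No arrowhead: the Follows relationship may point in either direction.
MATCH (a:User)-[e:Follows]-(b:User)
WHERE a.name = 'Karissa'
RETURN b.name;
```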
You can also omit binding a variable to a node or relationship in your graph patterns if
you will not use it elsewhere in your query (e.g., in WHERE or RETURN). For example, below, we query for 2-hop paths searching for the cities of the Users that “a” Follows.
Because we do not need to return the Users that “a” follows or the properties
of the Follows and LivesIn edges that form these 2-hop paths, we can omit giving variable names to them.
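Such a 2-hop query could look like the sketch below. Filtering on “Adam” and returning the city's population are illustrative assumptions, not requirements of the pattern.

```cypher
// The intermediate User and both relationships are anonymous
// because nothing later in the query refers to them.
MATCH (a:User)-[:Follows]->(:User)-[:LivesIn]->(c:City)
WHERE a.name = 'Adam'
RETURN a.name, c.name, c.population;
```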
Although paths can be matched in a single pattern, some patterns, in particular
cyclic patterns, require specifying multiple patterns/paths that form the pattern.
These multiple paths are separated by a comma. The following is a (directed) triangle
query and returns the only triangle in the database between Adam, Karissa, and Zhang.
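A sketch of a directed triangle query, assuming all three edges are Follows relationships between User nodes:

```cypher
// First path: a -> b -> c. Second path closes the triangle with a -> c.
// "a" and "c" are labeled on the first path only.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User), (a)-[:Follows]->(c)
RETURN a.name, b.name, c.name;
```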
Output:
Note that in the query node variables a and c appear twice, once on each of the 2 paths
in the query. In such cases, their labels need to be specified only the first time they appear
in the pattern. In the above query a and c’s labels are defined on the first/left path,
so you don’t have to specify them on the right path (though you still can).
The WHERE clause is the main clause to specify arbitrary predicates on the nodes and relationships in your patterns (e.g., a.age < b.age, where “a” and “b” bind to User nodes).
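A minimal sketch of such a predicate, assuming User nodes carry the name and age properties used elsewhere in this section:

```cypher
// Keep only the Follows pairs in which the follower is younger than the followed user.
MATCH (a:User)-[:Follows]->(b:User)
WHERE a.age < b.age
RETURN a.name, b.name;
```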
As syntactic sugar, openCypher allows equality predicates to be matched on
nodes and edges using the {prop1 : value1, prop2 : value2, ...} syntax. For example:
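The sketch below illustrates the inline form; the specific values (since 2020, the name “Zhang”) are chosen for illustration only.

```cypher
// Inline equality predicates on a relationship and on a node.
MATCH (a:User)-[e:Follows {since: 2020}]->(b:User {name: 'Zhang'})
RETURN a.name, e.since, b.name;

// The same constraints written with an explicit WHERE clause.
MATCH (a:User)-[e:Follows]->(b:User)
WHERE e.since = 2020 AND b.name = 'Zhang'
RETURN a.name, e.since, b.name;
```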
You can also find paths that are of variable length between node records. Variable-length relationships
are sometimes known as recursive patterns or recursive joins. Specifically, you can find variable-hop
connections between nodes by specifying a hop range in the relationship pattern,
e.g., -[:Label*min..max]->, where min and max specify the minimum and the maximum number of hops.
The following query finds all Users that “Adam” follows within 1 to 2 hops and returns their names as well as the length of each path.
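A sketch of this query follows. It assumes that a `length` function can be applied to the variable-length relationship variable “e”.

```cypher
// *1..2 matches Follows chains of one or two hops starting at Adam.
MATCH (a:User)-[e:Follows*1..2]->(b:User)
WHERE a.name = 'Adam'
RETURN b.name, length(e) AS length;
```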
Output:
Karissa is found through Adam -> Karissa
Zhang is found through Adam -> Zhang and Adam -> Karissa -> Zhang
Noura is found through Adam -> Zhang -> Noura
Similar to matching single relationships, you can match undirected variable-length relationships or variable-length relationships with multiple labels.
The following query finds all nodes other than “Noura” that connect to “Noura” in either direction through any relationship in exactly 2 hops.
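One possible formulation is sketched below. It assumes that an unlabeled variable-length pattern such as `[e*2..2]` is accepted and that every candidate “b” node has a name property.

```cypher
// No arrowhead and no relationship label: any relationship, either direction, exactly 2 hops.
MATCH (a:User)-[e*2..2]-(b)
WHERE a.name = 'Noura' AND b.name <> 'Noura'
RETURN b.name, length(e) AS length;
```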
Output:
Adam is found through Noura <- User -> Adam
Karissa is found through Noura <- User -> Karissa
Kitchener is found through Noura <- User -> Kitchener
We also support running predicates on recursive patterns to constrain the relationships being traversed.
The following query finds the names of the Users reachable from Adam within 1 to 2 Follows hops, together with the number of such paths, where the traversed users are older than 45 and the traversed relationships have a since value before 2022.
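A sketch of a filtered recursive pattern follows. The `(r, n | WHERE ...)` form is an assumption based on the list-comprehension-style grammar this section refers to, with “r” standing for the first variable and “n” for the second. Check the exact syntax against the version you are running.

```cypher
// "r" and "n" are the two filter variables of the recursive pattern.
// Only paths whose traversed elements satisfy the predicate are kept.
MATCH p = (a:User)-[:Follows*1..2 (r, n | WHERE r.since < 2022 AND n.age > 45)]->(b:User)
WHERE a.name = 'Adam'
RETURN DISTINCT b.name, COUNT(*);
```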
Output:
Our filter grammar is similar to that used by Memgraph, for example, in Cypher list comprehensions.
The first variable represents intermediate relationships and the second one represents intermediate nodes.
You can project a subset of properties for the intermediate nodes and relationships that bind within a variable length
relationship. You can define the projection list of intermediate nodes and relationships within two curly brackets {}{} at
the end of the variable length pattern. The first {} is used for projecting relationship properties and the second {} for
node properties. Currently, Kùzu only supports projecting properties directly, not expressions over
those properties. Projecting properties of intermediate nodes and relationships can improve both performance and memory footprint.
Below is an example that projects only the since property of the intermediate relationship and the name property of the
intermediate nodes that will bind to the variable length relationship pattern of e. Readers can assume
that there are other properties than since on the Follow relationship table for our purposes (in our running example, the User nodes already have a second property age, which will be removed from the output as shown below).
Returns:
As can be seen in the output, the nodes that bind to e contain only the name property and the relationships that
bind to e contain only the since property.
On top of variable-length relationships, you can search for a single shortest path by specifying the SHORTEST keyword in the relationship pattern, e.g., -[:Label* SHORTEST 1..max].
The following query finds a shortest path between Adam and any city and returns the city name as well as the length of the path.
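A sketch of such a shortest-path query follows. The upper bound of 4 hops is an arbitrary illustrative choice, and `length(e)` is again assumed to work on the recursive relationship variable.

```cypher
// No relationship label: the path may mix Follows and LivesIn edges.
// SHORTEST keeps a single shortest path per (a, b) pair.
MATCH (a:User)-[e* SHORTEST 1..4]->(b:City)
WHERE a.name = 'Adam'
RETURN b.name, length(e) AS length;
```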
Internally, a PATH is processed as a STRUCT{LIST[NODE], LIST[REL]}; see the PATH data type for details. You can access the nodes and relationships within a path through the nodes(p) and rels(p) function calls.
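As a brief sketch, assuming a path can be bound to a variable with the `p = ...` form, its parts could be extracted like this:

```cypher
// Bind the whole matched pattern to "p", then split it into its node and relationship lists.
MATCH p = (a:User)-[:Follows]->(b:User)
WHERE a.name = 'Adam'
RETURN nodes(p), rels(p);
```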