Cypher tutorial
If you’re new to Cypher and just getting started with Kuzu, you’re in the right place! This tutorial will guide you through the basics of Cypher, including how to create nodes and relationships, and how to scan, copy and query your data that’s in a Kuzu database.
See the link below for the example dataset used in this tutorial.
The dataset is a social network dataset of users and posts. Download the dataset and unzip it in your current working directory.
Working with the Kuzu CLI
Kuzu’s command line interface (CLI) is a great way to get started with Kuzu. It allows you to interact with your database in a terminal and quickly test out your ideas in Cypher.
Let’s start by first opening up the Kuzu CLI shell. Install the Kuzu CLI by running as per the install docs and then run the following command to start the CLI.
# Open an in-memory databasekuzu
Define a schema
The first step in getting your data into Kuzu is creating node and relationship tables. This step is called schema definition. Follow the steps in the example database to create the node and relationship tables.
Ingest the data
The COPY FROM
command is used to ingest data from various file formats into Kuzu. In this case,
the data is stored in CSV format.
COPY User FROM '/Users/prrao/code/kuzu-debug/tutorial_data/node/user.csv';COPY Post FROM '/Users/prrao/code/kuzu-debug/tutorial_data/node/post.csv';COPY LIKES FROM '/Users/prrao/code/kuzu-debug/tutorial_data/relation/LIKES.csv';COPY FOLLOWS FROM '/Users/prrao/code/kuzu-debug/tutorial_data/relation/FOLLOWS.csv';COPY POSTED FROM '/Users/prrao/code/kuzu-debug/tutorial_data/relation/POSTS.csv';
You should see messages in your terminal indicating how many tuples were copied into each table. An example is shown below.
┌───────────────────────────────────────────────┐│ result ││ STRING │├───────────────────────────────────────────────┤│ 20 tuples have been copied to the User table. │└───────────────────────────────────────────────┘
MATCH
In Kuzu, a graph’s nodes and relationships are stored in tables. The MATCH
clause is used to
find nodes that match the pattern specified in the clause. All entities in the pattern are
returned via the RETURN
clause.
Match nodes
Let’s say we want to match only User
nodes in the database. We can do this by specifying the label
in the MATCH
clause.
MATCH (a:User) RETURN a.* LIMIT 3;
Note the a.*
in the RETURN
clause. This is a wildcard that returns all properties of the User
node, regardless of how many properties there are.
┌──────────┬────────────────┬─────────────────────────┐│ a.userID │ a.username │ a.account_creation_date ││ INT64 │ STRING │ DATE │├──────────┼────────────────┼─────────────────────────┤│ 1 │ epicwolf202 │ 2022-09-09 ││ 2 │ silentninja637 │ 2023-01-27 ││ 3 │ stormcat597 │ 2023-02-25 │└──────────┴────────────────┴─────────────────────────┘
You can also match nodes across multiple node tables, as shown below. However, it only makes sense to do this when there exists a common property between the two tables.
MATCH (a:User:Post) RETURN a.userID LIMIT 3;
Match a relationship pattern
You can match a relationship pattern by specifying the relationship in the MATCH
clause.
MATCH (a:User)-[r:LIKES]->(b:Post) RETURN a.* LIMIT 3;
┌──────────┬────────────────┬─────────────────────────┐│ a.userID │ a.username │ a.account_creation_date ││ INT64 │ STRING │ DATE │├──────────┼────────────────┼─────────────────────────┤│ 16 │ fastqueen400 │ 2022-07-26 ││ 2 │ silentninja637 │ 2023-01-27 ││ 4 │ brightking765 │ 2023-05-11 │└──────────┴────────────────┴─────────────────────────┘
Match on all nodes and relationships
If you want to match on an arbitrary node or relationship in the database, you can use the MATCH
clause without any label
in the pattern.
MATCH (a)-[b]->(c) RETURN * LIMIT 3;
┌──────────────────────────────────┬──────────────────────────────────┬──────────────────────────────────┐│ a │ c │ b ││ NODE │ NODE │ REL │├──────────────────────────────────┼──────────────────────────────────┼──────────────────────────────────┤│ {_ID: 0:14, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:14)-{_LABEL: FOLLOWS, _ID:... ││ {_ID: 0:15, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:15)-{_LABEL: FOLLOWS, _ID:... ││ {_ID: 0:17, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:17)-{_LABEL: FOLLOWS, _ID:... │└──────────────────────────────────┴──────────────────────────────────┴──────────────────────────────────┘
Match multiple patterns
You can combine multiple match clauses that each specify a particular pattern.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User), (a)-[:Follows]->(c)RETURN a.username AS user1, b.username AS user2, c.username AS user3LIMIT 5;
The above query is the same as having written the following two match clauses one after the other.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User)MATCH (a)-[:Follows]->(c)RETURN *;
Instead of repeating the match clause, you can comma-separate the clauses as shown above. The following result is returned.
┌────────────────┬───────────────┬───────────────┐│ user1 │ user2 │ user3 ││ STRING │ STRING │ STRING │├────────────────┼───────────────┼───────────────┤│ silentninja637 │ stormninja678 │ darkdog878 ││ silentninja637 │ darkdog878 │ stormninja678 ││ silentninja637 │ stormcat597 │ darkdog878 │└────────────────┴───────────────┴───────────────┘
Match variable-length relationships
One of the most powerful features of Cypher is the ability to match variable-length relationships.
This is done using the Kleene star operator *
. The following query aims to find all users that
are two hops (i.e., paths of length 2) away from a particular user.
MATCH (a:User)-[:Follows*1..2]->(b:User)RETURN a.username, b.usernameLIMIT 3;
┌─────────────┬─────────────┐│ a.username │ b.username ││ STRING │ STRING │├─────────────┼─────────────┤│ epicwolf202 │ darkdog878 ││ epicwolf202 │ coolking201 ││ epicwolf202 │ epicking81 │└─────────────┴─────────────┘
The above result is telling us that the user in the column b
is followed by the user in the column a
.
We can verify if this is true by manually writing the following query without the *
operator.
MATCH (a:User)-[:Follows]->(x:User)-[:Follows]->(b:User)WHERE a.username = 'epicwolf202'RETURN b.username;
┌─────────────┬─────────────┐│ a.username │ b.username ││ STRING │ STRING │├─────────────┼─────────────┤│ epicwolf202 │ epicking81 ││ epicwolf202 │ coolking201 ││ epicwolf202 │ darkdog878 │└─────────────┴─────────────┘
Indeed, the same result is returned.
OPTIONAL MATCH
OPTIONAL MATCH
is clause used to define a pattern to find in the database. The difference from a regular
MATCH
is that if the system cannot match a pattern defined by OPTIONAL MATCH
, it will set the values in
the variables defined only in the OPTIONAL MATCH
, to NULL.
Depending on what the end goal is, returning nulls may or may not be acceptable, so use a conventional
MATCH
if no nulls are desired.
MATCH (u1:User)OPTIONAL MATCH (u2)-[:Follows]->(u1:User)RETURN u1.username, u2.usernameLIMIT 3;
┌─────────────┬───────────────┐│ u1.username │ u2.username ││ STRING │ STRING │├─────────────┼───────────────┤│ epicwolf202 │ smartdragon25 ││ epicwolf202 │ fastqueen400 ││ epicwolf202 │ mysticwolf198 │└─────────────┴───────────────┘
WHERE
The WHERE
clause allows you to specify predicates/constraints on a part of your query. The query
below shows how to filter the results to only include users whose account was created before a
particular date.
MATCH (a:User)WHERE a.account_creation_date < DATE('2023-02-01')RETURN a.username, a.account_creation_dateLIMIT 3;
The date format in the WHERE
predicate is specified in the format YYYY-MM-DD
, as a string, and
transformed into a date object that can be compared with the account_creation_date
property of the
User
node.
┌────────────────┬─────────────────────────┐│ a.username │ a.account_creation_date ││ STRING │ DATE │├────────────────┼─────────────────────────┤│ epicwolf202 │ 2022-09-09 ││ silentninja637 │ 2023-01-27 ││ fastgirl798 │ 2021-06-11 │└────────────────┴─────────────────────────┘
WHERE EXISTS subquery
You can specify a subquery in a WHERE
clause with the EXISTS
keyword. The following query returns all users
who have at least one post.
MATCH (u:User)WHERE EXISTS { MATCH (u)-[:POSTED]->()}RETURN u.usernameLIMIT 3;
┌────────────────┐│ u.username ││ STRING │├────────────────┤│ epicwolf202 ││ silentninja637 ││ stormcat597 │└────────────────┘
Grouping and Aggregation
Cypher does not have an explicit GROUP BY
clause. Instead, you can simply apply an aggregation function
in the RETURN
clause and group by the specified property. The following query returns the total number of
posts.
MATCH (p:Post)RETURN COUNT(p) AS num_posts;
┌───────────┐│ num_posts ││ INT64 │├───────────┤│ 50 │└───────────┘
The following example shows how to group by the userID
property and return the number of posts for each user.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.userID, COUNT(p) AS num_postsLIMIT 3;
┌──────────┬───────────┐│ u.userID │ num_posts ││ INT64 │ INT64 │├──────────┼───────────┤│ 15 │ 4 ││ 8 │ 2 ││ 17 │ 3 │└──────────┴───────────┘
ORDER BY
The ORDER BY
clause is used to sort the results of a query. The following query returns all users
sorted in descending order of the number of posts they have.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.username, COUNT(p) AS num_postsORDER BY num_posts DESCLIMIT 3;
┌───────────────┬───────────┐│ u.username │ num_posts ││ STRING │ INT64 │├───────────────┼───────────┤│ stormqueen831 │ 5 ││ cooldragon866 │ 5 ││ mysticcat651 │ 4 │└───────────────┴───────────┘
WITH
The WITH
clause is used to chain the results of one query to another. The example below shows how to
find the top 3 users with the most posts and then find the users who follow them.
MATCH (u:User)-[:POSTED]->(p:Post)WITH u, COUNT(p) AS num_postsORDER BY num_posts DESCLIMIT 3WITH uMATCH (u)-[:Follows]->(u2:User)RETURN u.username AS most_followed_user, u2.username AS followerLIMIT 8;
Note the use of the AS
keyword to rename the columns in the result.
┌───────────────┬────────────────┐│ u.username │ u2.username ││ STRING │ STRING │├───────────────┼────────────────┤│ cooldragon866 │ coolwolf752 ││ cooldragon866 │ stormninja678 ││ cooldragon866 │ brightninja683 ││ cooldragon866 │ mysticcat651 ││ cooldragon866 │ mysticwolf198 ││ stormqueen831 │ darkdog878 ││ stormqueen831 │ smartdragon25 ││ stormqueen831 │ silentninja637 │└───────────────┴────────────────┘
UNWIND
The UNWIND
clause is used to unnest (i.e., explode) a list and return the elements as separate rows.
Consider that you have a list of user IDs and you want to find their account creation dates.
UNWIND ["epicwolf202", "cooldragon866", "stormcat597"] AS userMATCH (u:User {username: user})RETURN u.username, u.account_creation_date;
┌───────────────┬─────────────────────────┐│ u.username │ u.account_creation_date ││ STRING │ DATE │├───────────────┼─────────────────────────┤│ epicwolf202 │ 2022-09-09 ││ cooldragon866 │ 2020-10-30 ││ stormcat597 │ 2023-02-25 │└───────────────┴─────────────────────────┘
CASE
Cypher supports the CASE
expression to handle conditional logic. The following query returns
the number of posts for each user, but only if the post was created after a certain date.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.username, COUNT(CASE WHEN p.creation_date > DATE('2023-04-01') THEN p ELSE NULL END) AS num_posts, p.creation_dateORDER BY num_posts DESCLIMIT 3;
┌────────────────┬───────────┬─────────────────┐│ u.username │ num_posts │ p.creation_date ││ STRING │ INT64 │ DATE │├────────────────┼───────────┼─────────────────┤│ brightninja683 │ 1 │ 2023-07-04 ││ cooldragon866 │ 1 │ 2023-05-20 ││ silentninja637 │ 1 │ 2023-05-16 │└────────────────┴───────────┴─────────────────┘
This was a basic introduction to the kinds of queries you can ask in Cypher. There are many other clauses in Cypher that were not covered in this tutorial. Check out the Cypher documentation section of the docs for more information!