
Create your first graph

Kùzu implements a structured property graph model and requires a pre-defined schema.

  • Schema definition involves declaring node and relationship tables and their associated properties.
  • Each property is strongly typed: its type must be declared explicitly.
  • Each node table must define a primary key.
  • Relationship tables do not require a primary key.

Persistence

Kùzu supports both on-disk and in-memory modes of operation. The mode is determined when you create the database, as explained below.

On-disk database

If you specify a database path when you create your database (for example, ./demo_db), Kùzu opens in on-disk mode. In this mode, Kùzu persists all data to disk at the specified path. All transactions are logged in the Write-Ahead Log (WAL), and the logged changes are merged into the database files during checkpoints.

In-memory database

If you create your database with no path, with an empty string "", or with the explicit path :memory:, Kùzu opens in in-memory mode. In this mode, nothing is written to the WAL and no data is persisted to disk. All data is lost when the process exits.

Quick start

To create your first graph, make sure you have installed the Kùzu CLI or your preferred client API as described in the Installation section. The example below uses a graph schema with two node types, User and City, and two relationship types, Follows and LivesIn. The dataset in CSV format can be found here.

Because Kùzu is an embedded database, there are no servers to set up. You import the kuzu module in your client library of choice and start interacting with the database directly. The examples below demonstrate how to create a graph schema and insert data into an on-disk database.

You can do the same using an in-memory database by omitting the database path, specifying an empty string "", or specifying :memory: in your client API of choice.

main.py
```python
import kuzu


def main() -> None:
    # Create an empty on-disk database and connect to it
    db = kuzu.Database("./demo_db")
    conn = kuzu.Connection(db)

    # Create schema
    # (these statements fail if ./demo_db already contains the tables, e.g. on a second run)
    conn.execute("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))")
    conn.execute("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))")
    conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
    conn.execute("CREATE REL TABLE LivesIn(FROM User TO City)")

    # Insert data
    conn.execute('COPY User FROM "./data/user.csv"')
    conn.execute('COPY City FROM "./data/city.csv"')
    conn.execute('COPY Follows FROM "./data/follows.csv"')
    conn.execute('COPY LivesIn FROM "./data/lives-in.csv"')

    # Execute Cypher query
    response = conn.execute(
        """
        MATCH (a:User)-[f:Follows]->(b:User)
        RETURN a.name, b.name, f.since;
        """
    )
    # Print the result one row at a time
    while response.has_next():
        print(response.get_next())


if __name__ == "__main__":
    main()
```

Result:

```
['Adam', 'Karissa', 2020]
['Adam', 'Zhang', 2020]
['Karissa', 'Zhang', 2021]
['Zhang', 'Noura', 2022]
```

In the approach shown above, each call to get_next() returns one row of the result as a Python list. The sections below show other output options for Python.

Pandas

You can also convert the results of a Cypher query into a Pandas DataFrame for downstream tasks. This assumes that pandas is installed in your environment.

```python
# pip install pandas
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_df())
```

```
    a.name   b.name  f.since
0     Adam  Karissa     2020
1     Adam    Zhang     2020
2  Karissa    Zhang     2021
3    Zhang    Noura     2022
```

Polars

Polars is another popular DataFrame library for Python, and you can process the results of a Cypher query in much the same way as with Pandas. This assumes that polars is installed in your environment, along with pyarrow, which get_as_pl() relies on.

```python
# pip install polars
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_pl())
```

```
shape: (4, 3)
┌─────────┬─────────┬─────────┐
│ a.name  ┆ b.name  ┆ f.since │
│ ---     ┆ ---     ┆ ---     │
│ str     ┆ str     ┆ i64     │
╞═════════╪═════════╪═════════╡
│ Adam    ┆ Karissa ┆ 2020    │
│ Adam    ┆ Zhang   ┆ 2020    │
│ Karissa ┆ Zhang   ┆ 2021    │
│ Zhang   ┆ Noura   ┆ 2022    │
└─────────┴─────────┴─────────┘
```

Arrow Table

You can also use the PyArrow library to work with Arrow Tables in Python. This assumes that pyarrow is installed in your environment. This approach is useful when you need to interoperate with other systems that use Arrow as a backend. In fact, the get_as_pl() method shown above for Polars materializes a pyarrow.Table under the hood.

```python
# pip install pyarrow
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_arrow())
```

```
pyarrow.Table
a.name: string
b.name: string
f.since: int64
----
a.name: [["Adam","Adam","Karissa","Zhang"]]
b.name: [["Karissa","Zhang","Zhang","Noura"]]
f.since: [[2020,2020,2021,2022]]
```