
Create your first graph

Kuzu implements a structured property graph model and requires a pre-defined schema.

  • Schema definition involves node and relationship tables and their associated properties.
  • Each property is strongly typed, and its type must be declared explicitly.
  • For node tables, a primary key must be defined.
  • For relationship tables, no primary key is required.
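The rules above can be made concrete with a small, hypothetical schema description that renders Kuzu DDL statements. `NodeTable`, `RelTable` and `ddl()` are illustrative names, not part of the Kuzu API. The sketch assumes the rules exactly as listed: every property has an explicit type, a node table must name one of its properties as the primary key, and a relationship table declares only its FROM/TO node tables and optional properties.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeTable:
    """Hypothetical description of a Kuzu node table (not a Kuzu API)."""
    name: str
    properties: dict[str, str]  # property name -> explicitly declared type, e.g. "INT64"
    primary_key: str | None = None

    def ddl(self) -> str:
        # Node tables need a primary key, and it must be one of the declared properties.
        if self.primary_key is None or self.primary_key not in self.properties:
            raise ValueError(f"node table {self.name} needs a primary key among its properties")
        cols = []
        for prop, typ in self.properties.items():
            suffix = " PRIMARY KEY" if prop == self.primary_key else ""
            cols.append(f"{prop} {typ}{suffix}")
        return f"CREATE NODE TABLE {self.name}({', '.join(cols)})"


@dataclass
class RelTable:
    """Hypothetical description of a Kuzu relationship table: FROM/TO, no primary key."""
    name: str
    src: str
    dst: str
    properties: dict[str, str] = field(default_factory=dict)

    def ddl(self) -> str:
        parts = [f"FROM {self.src} TO {self.dst}"]
        parts += [f"{prop} {typ}" for prop, typ in self.properties.items()]
        return f"CREATE REL TABLE {self.name}({', '.join(parts)})"


if __name__ == "__main__":
    print(NodeTable("User", {"name": "STRING", "age": "INT64"}, primary_key="name").ddl())
    print(RelTable("Follows", "User", "User", {"since": "INT64"}).ddl())
    print(RelTable("LivesIn", "User", "City").ddl())
```

The three printed statements match the `CREATE` statements used in the quick start below. Omitting `primary_key` for a node table raises `ValueError`, which mirrors the rule above.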

Persistence

Kuzu supports both on-disk and in-memory modes of operation. The mode is determined at the time of creating the database, as explained below.

On-disk database

If you pass a database path such as example.kuzu when initializing the database, Kuzu operates in on-disk mode. In this mode, Kuzu persists all data at the given path. All transactions are logged to a write-ahead log (WAL), and updates are periodically merged into the database files during checkpoints.
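The WAL-plus-checkpoint idea described above can be illustrated with a toy key-value store. This is a hypothetical sketch of the general technique that assumes a JSON data file and a one-record-per-line log. It is not Kuzu's storage format. A real WAL would also fsync the log before acknowledging a commit.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class ToyWALStore:
    """Toy store illustrating write-ahead logging and checkpoints (NOT Kuzu's format)."""

    def __init__(self, directory: Path) -> None:
        self.data_file = directory / "data.json"  # state as of the last checkpoint
        self.wal_file = directory / "wal.log"     # one JSON record per committed update
        self.state = json.loads(self.data_file.read_text()) if self.data_file.exists() else {}
        # Recovery: replay updates that were committed after the last checkpoint.
        if self.wal_file.exists():
            for line in self.wal_file.read_text().splitlines():
                key, value = json.loads(line)
                self.state[key] = value

    def commit(self, key: str, value: object) -> None:
        # Append to the log first, then apply in memory (a real WAL would also fsync here).
        with self.wal_file.open("a") as wal:
            wal.write(json.dumps([key, value]) + "\n")
        self.state[key] = value

    def checkpoint(self) -> None:
        # Merge the logged updates into the data file, then truncate the log.
        self.data_file.write_text(json.dumps(self.state))
        self.wal_file.write_text("")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ToyWALStore(Path(tmp))
        store.commit("x", 1)                 # logged to wal.log, not yet in data.json
        print(ToyWALStore(Path(tmp)).state)  # "restart" before a checkpoint: recovered from the WAL
        store.checkpoint()                   # merged into data.json; wal.log is now empty
```

The design point is that a commit only has to append to the log. The more expensive rewrite of the main data files is deferred to checkpoints, and recovery replays whatever the log holds beyond the last checkpoint.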

In-memory database

If you omit the database path, or set it to "" or ":memory:", Kuzu operates in in-memory mode. In this mode, nothing is written to the WAL and no data is persisted to disk, so all data is lost when the process exits.
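The mode-selection rule can be summarized as a tiny function. `storage_mode` is a hypothetical helper for illustration only and is not part of the `kuzu` package. In the Python client you simply pass the path to `kuzu.Database`, for example `kuzu.Database(":memory:")` for a throwaway database.

```python
from __future__ import annotations

# Path values that, per the rule above, select in-memory mode.
IN_MEMORY_PATHS = {"", ":memory:"}


def storage_mode(database_path: str | None) -> str:
    """Hypothetical helper: which mode Kuzu uses for a given database path."""
    if database_path is None or database_path in IN_MEMORY_PATHS:
        return "in-memory"  # no WAL writes; data is lost when the process exits
    return "on-disk"        # data persisted at the path; transactions logged to the WAL


if __name__ == "__main__":
    for path in ["example.kuzu", ":memory:", "", None]:
        print(repr(path), "->", storage_mode(path))
```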

Quick start

Make sure you have installed Kuzu, either the CLI or your preferred client API. Then download the example CSV files from the Kuzu GitHub repo:

mkdir ./data/
curl -L -o ./data/city.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/city.csv
curl -L -o ./data/user.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/user.csv
curl -L -o ./data/follows.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/follows.csv
curl -L -o ./data/lives-in.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/lives-in.csv
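If you prefer to stay in Python, or `curl` is not available, the same four files can be fetched with the standard library. This is a sketch: `demo_csv_urls` and `download_demo_csvs` are hypothetical helper names, and the URLs are the ones used in the commands above.

```python
from __future__ import annotations

import urllib.request
from pathlib import Path

BASE_URL = "https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv"
FILES = ["city.csv", "user.csv", "follows.csv", "lives-in.csv"]


def demo_csv_urls(base_url: str = BASE_URL) -> dict[str, str]:
    """Map each demo CSV filename to its download URL."""
    return {name: f"{base_url}/{name}" for name in FILES}


def download_demo_csvs(dest: str = "./data") -> None:
    """Download the demo CSVs into dest, like the mkdir + curl commands above."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`: safe to re-run
    for name, url in demo_csv_urls().items():
        # urllib follows HTTP redirects by default, similar to `curl -L`.
        urllib.request.urlretrieve(url, out_dir / name)
```

Call `download_demo_csvs()` once before running `main.py`, so the `COPY` statements find their files under `./data/`.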

In this example, we will create a graph with two node types, User and City, and two relationship types, Follows and LivesIn.

Because Kuzu is an embedded database, there are no servers to set up. You can simply import the kuzu module in your code and run queries on the database. The Python example below demonstrates how to create a graph schema and import data into an on-disk Kuzu database.

main.py
import kuzu


def main():
    # Create an empty on-disk database and connect to it
    db = kuzu.Database("example.kuzu")
    conn = kuzu.Connection(db)

    # Create schema: two node tables and two relationship tables
    conn.execute("CREATE NODE TABLE User(name STRING PRIMARY KEY, age INT64)")
    conn.execute("CREATE NODE TABLE City(name STRING PRIMARY KEY, population INT64)")
    conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
    conn.execute("CREATE REL TABLE LivesIn(FROM User TO City)")

    # Insert data from the downloaded CSV files (node tables first, then relationships)
    conn.execute('COPY User FROM "./data/user.csv"')
    conn.execute('COPY City FROM "./data/city.csv"')
    conn.execute('COPY Follows FROM "./data/follows.csv"')
    conn.execute('COPY LivesIn FROM "./data/lives-in.csv"')

    # Execute a Cypher query and print each result row
    response = conn.execute(
        """
        MATCH (a:User)-[f:Follows]->(b:User)
        RETURN a.name, b.name, f.since;
        """
    )
    for row in response:
        print(row)


if __name__ == "__main__":
    main()
Output:
['Adam', 'Karissa', 2020]
['Adam', 'Zhang', 2020]
['Karissa', 'Zhang', 2021]
['Zhang', 'Noura', 2022]

Iterating over the query result as shown above yields each row as a list of values. See below for more output options in Python.

Output as a dictionary

You can also get the results of a Cypher query as a dictionary.

response = conn.execute(
"""
MATCH (a:User)-[f:Follows]->(b:User)
RETURN a.name, b.name, f.since;
"""
)
for row in response.rows_as_dict():
    print(row)
Output:
{'a.name': 'Adam', 'b.name': 'Karissa', 'f.since': 2020}
{'a.name': 'Adam', 'b.name': 'Zhang', 'f.since': 2020}
{'a.name': 'Karissa', 'b.name': 'Zhang', 'f.since': 2021}
{'a.name': 'Zhang', 'b.name': 'Noura', 'f.since': 2022}
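Each dictionary pairs the column names from the `RETURN` clause (`a.name`, `b.name`, `f.since`) with the values of one row. The sketch below reproduces that pairing in plain Python on the rows printed earlier. `rows_to_dicts` is a hypothetical helper, not how Kuzu implements `rows_as_dict()`. With a live result, the column names should be available from the query result's `get_column_names()` method.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from typing import Any


def rows_to_dicts(columns: list[str], rows: Iterable[list[Any]]) -> Iterator[dict[str, Any]]:
    """Pair each row's values with the column names, in order (hypothetical helper)."""
    for row in rows:
        yield dict(zip(columns, row))


# The rows returned by the Follows query in the quick start above.
columns = ["a.name", "b.name", "f.since"]
rows = [
    ["Adam", "Karissa", 2020],
    ["Adam", "Zhang", 2020],
    ["Karissa", "Zhang", 2021],
    ["Zhang", "Noura", 2022],
]

if __name__ == "__main__":
    for d in rows_to_dicts(columns, rows):
        print(d)  # same dictionaries as the rows_as_dict() output above
```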

Pandas

You can also convert the results of a Cypher query into a Pandas DataFrame for downstream tasks. This assumes that pandas is installed in your environment.

# pip install pandas
response = conn.execute(
"""
MATCH (a:User)-[f:Follows]->(b:User)
RETURN a.name, b.name, f.since;
"""
)
print(response.get_as_df())
Output:
    a.name   b.name  f.since
0     Adam  Karissa     2020
1     Adam    Zhang     2020
2  Karissa    Zhang     2021
3    Zhang    Noura     2022

Polars

Polars is another popular DataFrame library for Python, and you can process the results of a Cypher query in much the same way as with Pandas. This assumes that polars is installed in your environment.

# pip install polars
response = conn.execute(
"""
MATCH (a:User)-[f:Follows]->(b:User)
RETURN a.name, b.name, f.since;
"""
)
print(response.get_as_pl())
Output:
shape: (4, 3)
┌─────────┬─────────┬─────────┐
│ a.name  ┆ b.name  ┆ f.since │
│ ---     ┆ ---     ┆ ---     │
│ str     ┆ str     ┆ i64     │
╞═════════╪═════════╪═════════╡
│ Adam    ┆ Karissa ┆ 2020    │
│ Adam    ┆ Zhang   ┆ 2020    │
│ Karissa ┆ Zhang   ┆ 2021    │
│ Zhang   ┆ Noura   ┆ 2022    │
└─────────┴─────────┴─────────┘

Arrow Table

You can also use the PyArrow library to work with Arrow Tables in Python. This assumes that pyarrow is installed in your environment. This approach is useful when you need to interoperate with other systems that use Arrow as a backend. In fact, the get_as_pl() method shown above for Polars materializes a pyarrow.Table under the hood.

# pip install pyarrow
response = conn.execute(
"""
MATCH (a:User)-[f:Follows]->(b:User)
RETURN a.name, b.name, f.since;
"""
)
print(response.get_as_arrow())
Output:
pyarrow.Table
a.name: string
b.name: string
f.since: int64
----
a.name: [["Adam","Adam","Karissa","Zhang"]]
b.name: [["Karissa","Zhang","Zhang","Noura"]]
f.since: [[2020,2020,2021,2022]]
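Arrow, like the DataFrame libraries above, stores results column by column. That is why the pyarrow printout lists all values of `a.name` together, then all values of `b.name`, and so on. The doubled brackets show that each column is a ChunkedArray holding a single chunk. The stdlib sketch below performs the same row-to-column pivot on the rows shown earlier. It ignores chunking, and `rows_to_columns` is a hypothetical helper, not how Kuzu builds its Arrow tables.

```python
from __future__ import annotations

from typing import Any


def rows_to_columns(columns: list[str], rows: list[list[Any]]) -> dict[str, list[Any]]:
    """Transpose row-oriented results into one list per column (hypothetical helper)."""
    table: dict[str, list[Any]] = {name: [] for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            table[name].append(value)
    return table


# The rows returned by the Follows query above, in row-oriented form.
columns = ["a.name", "b.name", "f.since"]
rows = [
    ["Adam", "Karissa", 2020],
    ["Adam", "Zhang", 2020],
    ["Karissa", "Zhang", 2021],
    ["Zhang", "Noura", 2022],
]

if __name__ == "__main__":
    for name, values in rows_to_columns(columns, rows).items():
        print(f"{name}: {values}")  # one list per column, like the Arrow printout
```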