Create your first graph

Kuzu implements a structured property graph model and requires a pre-defined schema.

Defining a schema means declaring node and relationship tables along with their properties:
Each property is strongly typed, so its type must be declared explicitly.
Each node table must define a primary key.
Relationship tables do not require a primary key.
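The rules above can be sketched as two small string builders. This is only an illustration: node_table_ddl and rel_table_ddl are hypothetical helpers, not part of Kuzu's API, and the statements they produce simply follow the CREATE NODE TABLE and CREATE REL TABLE syntax used in the quick start of this guide.

```python
from typing import Dict, Optional


def node_table_ddl(name: str, properties: Dict[str, str], primary_key: str) -> str:
    """Build a CREATE NODE TABLE statement; a primary key is mandatory."""
    if primary_key not in properties:
        raise ValueError(f"primary key {primary_key!r} is not a declared property")
    # Every property carries an explicit type, e.g. "age INT64".
    columns = ", ".join(f"{key} {dtype}" for key, dtype in properties.items())
    return f"CREATE NODE TABLE {name}({columns}, PRIMARY KEY ({primary_key}))"


def rel_table_ddl(
    name: str, src: str, dst: str, properties: Optional[Dict[str, str]] = None
) -> str:
    """Build a CREATE REL TABLE statement; no primary key is declared."""
    parts = [f"FROM {src} TO {dst}"]
    parts += [f"{key} {dtype}" for key, dtype in (properties or {}).items()]
    return f"CREATE REL TABLE {name}({', '.join(parts)})"


user_ddl = node_table_ddl("User", {"name": "STRING", "age": "INT64"}, "name")
follows_ddl = rel_table_ddl("Follows", "User", "User", {"since": "INT64"})
lives_in_ddl = rel_table_ddl("LivesIn", "User", "City")

for statement in (user_ddl, follows_ddl, lives_in_ddl):
    print(statement)
```

The node helper refuses to build a statement without a valid primary key, while the relationship helper has no such parameter at all, which mirrors the difference between the two table kinds.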

Persistence

Kuzu supports both on-disk and in-memory modes of operation. The mode is determined when the database is created, as explained below.

On-disk database

If you specify a database path when creating your database, for example ./demo_db, Kuzu opens it in on-disk mode. In this mode, Kuzu persists all data to disk at the specified path. All transactions are logged in the write-ahead log (WAL), and the logged changes are merged into the database files during checkpoints.

In-memory database

If you omit the database path when creating your database, specify it as an empty string "", or explicitly specify the path as :memory:, Kuzu opens the database in in-memory mode. In this mode, nothing is written to the WAL and no data is persisted to disk. All data is lost when the process finishes.
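The path rules for the two modes can be summarized in a few lines of Python. The storage_mode function below is a hypothetical helper written for this guide, not something Kuzu provides. It only restates the rules described above and does not open a database.

```python
from typing import Optional

# Path values that select in-memory mode, per the description above.
IN_MEMORY_MARKERS = ("", ":memory:")


def storage_mode(database_path: Optional[str]) -> str:
    """Return 'in-memory' or 'on-disk' for a given database path."""
    if database_path is None or database_path in IN_MEMORY_MARKERS:
        return "in-memory"
    return "on-disk"


for path in (None, "", ":memory:", "./demo_db"):
    print(repr(path), "->", storage_mode(path))
```

A check like this can be handy in application code that wants to warn users before they load data into a database that will not survive the process.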

Quick start

To create your first graph, ensure that you have installed the Kuzu CLI or your preferred client API as per the instructions in the Installation section. The example below uses a graph schema with two node types, User and City, and two relationship types, Follows and LivesIn. The dataset in CSV format can be found here.
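If the dataset is not at hand, the Follows file can be approximated from the query output shown later in this guide. The sketch below is an assumption-laden stand-in rather than the official dataset: it assumes a headerless CSV with the columns in the order source user, destination user, since. The other three files are not reconstructed here because their full contents do not appear in this guide.

```python
import csv
import io

# Rows copied from the Follows query output shown in this guide:
# (source user, destination user, since).
FOLLOWS_ROWS = [
    ("Adam", "Karissa", 2020),
    ("Adam", "Zhang", 2020),
    ("Karissa", "Zhang", 2021),
    ("Zhang", "Noura", 2022),
]


def follows_csv_text() -> str:
    """Render the rows as headerless CSV text (assumed file layout)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(FOLLOWS_ROWS)
    return buffer.getvalue()


print(follows_csv_text(), end="")
```

Writing the returned text to ./data/follows.csv would place it where the COPY statements in the examples look for it.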

Because Kuzu is an embedded database, there are no servers to set up. You can simply import the kuzu module in your preferred client library and begin interacting with the database. The examples below demonstrate how to create a graph schema and insert data into an on-disk database.

You can do the same using an in-memory database by omitting the database path, specifying an empty string "", or specifying :memory: in your client API of choice.

import kuzu

def main() -> None:
    # Create an empty on-disk database and connect to it
    db = kuzu.Database("./demo_db")
    conn = kuzu.Connection(db)

    # Create schema
    conn.execute("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))")
    conn.execute("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))")
    conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
    conn.execute("CREATE REL TABLE LivesIn(FROM User TO City)")

    # Insert data
    conn.execute('COPY User FROM "./data/user.csv"')
    conn.execute('COPY City FROM "./data/city.csv"')
    conn.execute('COPY Follows FROM "./data/follows.csv"')
    conn.execute('COPY LivesIn FROM "./data/lives-in.csv"')

    # Execute Cypher query
    response = conn.execute(
        """
        MATCH (a:User)-[f:Follows]->(b:User)
        RETURN a.name, b.name, f.since;
        """
    )
    while response.has_next():
        print(response.get_next())


if __name__ == "__main__":
    main()

Result:

['Adam', 'Karissa', 2020]
['Adam', 'Zhang', 2020]
['Karissa', 'Zhang', 2021]
['Zhang', 'Noura', 2022]

The approach shown above prints each result row as a Python list. See below for more output options for Python.
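When a DataFrame is more than you need, the row lists can be turned into dictionaries keyed by column name with plain Python. The sketch below works on the rows printed above. The rows_to_records helper is hypothetical, and the column names are typed by hand to match the RETURN clause rather than read from the query result.

```python
from typing import Any, Dict, List, Sequence


def rows_to_records(
    columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Pair each value in a row with its column name."""
    return [dict(zip(columns, row)) for row in rows]


# Column names in the order used by the RETURN clause of the query.
columns = ["a.name", "b.name", "f.since"]

# The rows printed by the example above.
rows = [
    ["Adam", "Karissa", 2020],
    ["Adam", "Zhang", 2020],
    ["Karissa", "Zhang", 2021],
    ["Zhang", "Noura", 2022],
]

records = rows_to_records(columns, rows)
for record in records:
    print(record)
```

In a live session, the rows would come from repeated calls to response.get_next() instead of a hard-coded list.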

Pandas

You can also pass the results of a Cypher query to a Pandas DataFrame for downstream tasks. This assumes that pandas is installed in your environment.

# pip install pandas
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_df())

    a.name   b.name  f.since
0     Adam  Karissa     2020
1     Adam    Zhang     2020
2  Karissa    Zhang     2021
3    Zhang    Noura     2022

Polars

Polars is another popular DataFrame library for Python users, and you can process the results of a Cypher query in much the same way as with Pandas. This assumes that polars is installed in your environment.

# pip install polars
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_pl())

shape: (4, 3)
┌─────────┬─────────┬─────────┐
│ a.name  ┆ b.name  ┆ f.since │
│ ---     ┆ ---     ┆ ---     │
│ str     ┆ str     ┆ i64     │
╞═════════╪═════════╪═════════╡
│ Adam    ┆ Karissa ┆ 2020    │
│ Adam    ┆ Zhang   ┆ 2020    │
│ Karissa ┆ Zhang   ┆ 2021    │
│ Zhang   ┆ Noura   ┆ 2022    │
└─────────┴─────────┴─────────┘

Arrow Table

You can also use the PyArrow library to work with Arrow Tables in Python. This assumes that pyarrow is installed in your environment. This approach is useful when you need to interoperate with other systems that use Arrow as a backend. In fact, the get_as_pl() method shown above for Polars materializes a pyarrow.Table under the hood.

# pip install pyarrow
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_arrow())

pyarrow.Table
a.name: string
b.name: string
f.since: int64
----
a.name: [["Adam","Adam","Karissa","Zhang"]]
b.name: [["Karissa","Zhang","Zhang","Noura"]]
f.since: [[2020,2020,2021,2022]]

const kuzu = require("kuzu");

(async () => {
  // Create an empty on-disk database and connect to it
  const db = new kuzu.Database("./demo_db");
  const conn = new kuzu.Connection(db);

  // Create the tables
  await conn.query("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))");
  await conn.query("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))");
  await conn.query("CREATE REL TABLE Follows(FROM User TO User, since INT64)");
  await conn.query("CREATE REL TABLE LivesIn(FROM User TO City)");

  // Load the data
  await conn.query('COPY User FROM "./data/user.csv"');
  await conn.query('COPY City FROM "./data/city.csv"');
  await conn.query('COPY Follows FROM "./data/follows.csv"');
  await conn.query('COPY LivesIn FROM "./data/lives-in.csv"');

  const queryResult = await conn.query("MATCH (a:User)-[f:Follows]->(b:User) RETURN a.name, f.since, b.name;");

  // Get all rows from the query result
  const rows = await queryResult.getAll();

  // Print the rows
  for (const row of rows) {
    console.log(row);
  }
})();

Result:

{ "a.name": "Adam", "f.since": 2020, "b.name": "Karissa" }
{ "a.name": "Adam", "f.since": 2020, "b.name": "Zhang" }
{ "a.name": "Karissa", "f.since": 2021, "b.name": "Zhang" }
{ "a.name": "Zhang", "f.since": 2022, "b.name": "Noura" }

Kuzu’s Java client library is available on Maven Central. You can add the following snippet to your pom.xml to get it installed:

<dependency>
    <groupId>com.kuzudb</groupId>
    <artifactId>kuzu</artifactId>
    <version>0.10.0</version>
</dependency>

Alternatively, if you are using Gradle, you can add the following snippet to your build.gradle file to include Kuzu’s Java client library:

For Groovy DSL:

implementation 'com.kuzudb:kuzu:0.10.0'

For Kotlin DSL:

implementation("com.kuzudb:kuzu:0.10.0")

Below is an example Gradle project structure for a simple Java application that creates a graph schema and inserts data into the database for the given example.

├── build.gradle
└── src/main
    ├── java
    │   └── Main.java
    └── resources
        ├── user.csv
        ├── city.csv
        ├── follows.csv
        └── lives-in.csv

The minimal build.gradle contains the following configurations:

plugins {
    id 'java'
    id 'application'
}
application {
    mainClass = "Main"
}
repositories {
    mavenCentral()
}
dependencies {
    implementation 'com.kuzudb:kuzu:0.10.0'
}

The Main.java contains the following code:

import com.kuzudb.*;

public class Main {
    public static void main(String[] args) throws ObjectRefDestroyedException {
        // Create an in-memory database and connect to it
        Database db = new Database(":memory:");
        Connection conn = new Connection(db);
        // Create tables.
        conn.query("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))");
        conn.query("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))");
        conn.query("CREATE REL TABLE Follows(FROM User TO User, since INT64)");
        conn.query("CREATE REL TABLE LivesIn(FROM User TO City)");
        // Load data.
        conn.query("COPY User FROM 'src/main/resources/user.csv'");
        conn.query("COPY City FROM 'src/main/resources/city.csv'");
        conn.query("COPY Follows FROM 'src/main/resources/follows.csv'");
        conn.query("COPY LivesIn FROM 'src/main/resources/lives-in.csv'");

        // Execute a simple query.
        QueryResult result =
                conn.query("MATCH (a:User)-[f:Follows]->(b:User) RETURN a.name, f.since, b.name;");
        while (result.hasNext()) {
            FlatTuple row = result.getNext();
            System.out.print(row);
        }
    }
}

To execute the example, navigate to the project root directory and run the following command:

gradle run

Result:

Adam|2020|Karissa
Adam|2020|Zhang
Karissa|2021|Zhang
Zhang|2022|Noura

When installing the kuzu crate via Cargo, it will by default build and statically link Kuzu’s C++ library from source. You can also link against the dynamic release libraries (see the Rust crate docs for details).

The main.rs file contains the following code:

use kuzu::{Connection, Database, Error, SystemConfig};

fn main() -> Result<(), Error> {
    // Create an empty on-disk database and connect to it
    let db = Database::new("./demo_db", SystemConfig::default())?;
    let conn = Connection::new(&db)?;

    // Create the tables
    conn.query("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))")?;
    conn.query("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))")?;
    conn.query("CREATE REL TABLE Follows(FROM User TO User, since INT64)")?;
    conn.query("CREATE REL TABLE LivesIn(FROM User TO City)")?;

    // Load the data
    conn.query("COPY User FROM './data/user.csv'")?;
    conn.query("COPY City FROM './data/city.csv'")?;
    conn.query("COPY Follows FROM './data/follows.csv'")?;
    conn.query("COPY LivesIn FROM './data/lives-in.csv'")?;

    let query_result =
        conn.query("MATCH (a:User)-[f:Follows]->(b:User) RETURN a.name, f.since, b.name;")?;

    // Print the rows
    for row in query_result {
        println!("{}, {}, {}", row[0], row[1], row[2]);
    }
    Ok(())
}

Result:

Adam, 2020, Karissa
Adam, 2020, Zhang
Karissa, 2021, Zhang
Zhang, 2022, Noura

package main

import (
    "fmt"
    "os"

    "github.com/kuzudb/go-kuzu"
)

func main() {
    // Create an empty on-disk database and connect to it
    dbPath := "demo_db"
    os.RemoveAll(dbPath)

    systemConfig := kuzu.DefaultSystemConfig()
    systemConfig.BufferPoolSize = 1024 * 1024 * 1024
    db, err := kuzu.OpenDatabase(dbPath, systemConfig)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    conn, err := kuzu.OpenConnection(db)
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // Create schema and load data
    queries := []string{
        "CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))",
        "CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))",
        "CREATE REL TABLE Follows(FROM User TO User, since INT64)",
        "CREATE REL TABLE LivesIn(FROM User TO City)",
        "COPY User FROM \"../dataset/demo-db/user.csv\"",
        "COPY City FROM \"../dataset/demo-db/city.csv\"",
        "COPY Follows FROM \"../dataset/demo-db/follows.csv\"",
        "COPY LivesIn FROM \"../dataset/demo-db/lives-in.csv\"",
    }
    for _, query := range queries {
        queryResult, err := conn.Query(query)
        if err != nil {
            panic(err)
        }
        // Close each result right away instead of deferring inside the loop.
        queryResult.Close()
    }

    // Execute Cypher query
    query := "MATCH (a:User)-[e:Follows]->(b:User) RETURN a.name, e.since, b.name"
    result, err := conn.Query(query)
    if err != nil {
        panic(err)
    }
    defer result.Close()
    for result.HasNext() {
        tuple, err := result.Next()
        if err != nil {
            panic(err)
        }
        // The result is a tuple, which can be converted to a slice.
        slice, err := tuple.GetAsSlice()
        if err != nil {
            panic(err)
        }
        fmt.Println(slice)
        // Release the tuple before fetching the next one.
        tuple.Close()
    }
}

Result:

[Adam 2020 Karissa]
[Adam 2020 Zhang]
[Karissa 2021 Zhang]
[Zhang 2022 Noura]

The Kuzu C++ client is distributed as so/dylib/dll+lib library files along with a header file (kuzu.hpp). Once you’ve downloaded and extracted the C++ files into a directory, it’s ready to use without any additional installation. You just need to specify the library search path for the linker.

In the following example, we assume that the so/dylib/dll+lib, the header file, the CSV files, and the cpp code file are all under the same directory as follows:

├── include
│   ├── kuzu.hpp
│   └── ......
├── libkuzu.so / libkuzu.dylib / kuzu_shared.dll + kuzu_shared.lib
├── main.cpp
├── user.csv
├── city.csv
├── follows.csv
└── lives-in.csv

The main.cpp file contains the following code:

#include <iostream>

#include "include/kuzu.hpp"

using namespace kuzu::main;
using namespace std;

int main() {
    // Create an empty on-disk database and connect to it
    SystemConfig systemConfig;
    auto database = make_unique<Database>("test", systemConfig);

    // Connect to the database.
    auto connection = make_unique<Connection>(database.get());

    // Create the schema.
    connection->query("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))");
    connection->query("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))");
    connection->query("CREATE REL TABLE Follows(FROM User TO User, since INT64)");
    connection->query("CREATE REL TABLE LivesIn(FROM User TO City)");

    // Load data.
    connection->query("COPY User FROM \"user.csv\"");
    connection->query("COPY City FROM \"city.csv\"");
    connection->query("COPY Follows FROM \"follows.csv\"");
    connection->query("COPY LivesIn FROM \"lives-in.csv\"");

    // Execute a simple query.
    auto result =
        connection->query("MATCH (a:User)-[f:Follows]->(b:User) RETURN a.name, f.since, b.name;");

    // Output query result.
    while (result->hasNext()) {
        auto row = result->getNext();
        std::cout << row->getValue(0)->getValue<string>() << " "
                  << row->getValue(1)->getValue<int64_t>() << " "
                  << row->getValue(2)->getValue<string>() << std::endl;
    }
    return 0;
}

Compile and run main.cpp. Because libkuzu is not installed as a system library, we need to override the linker search path to compile the C++ code and run the compiled program. On Linux:

env LIBRARY_PATH=. LD_LIBRARY_PATH=. g++ main.cpp -std=c++20 -lkuzu -lpthread
env LIBRARY_PATH=. LD_LIBRARY_PATH=. ./a.out

On macOS:

env DYLD_LIBRARY_PATH=. LIBRARY_PATH=. clang++ main.cpp -std=c++20 -lkuzu
env DYLD_LIBRARY_PATH=. LIBRARY_PATH=. ./a.out

On Windows, the library file is passed to the compiler directly and the current directory is used automatically when searching for kuzu_shared.dll at runtime:

cl /std:c++20 /EHsc main.cpp kuzu_shared.lib
.\main.exe

Result:

Adam 2020 Karissa
Adam 2020 Zhang
Karissa 2021 Zhang
Zhang 2022 Noura

The Kuzu C API shares the same so/dylib library files with the C++ API and can be used by including the C header file (kuzu.h).

In this example, we assume that the so/dylib, the header file, the CSV files, and the C code file are all under the same directory:

├── include
│   ├── kuzu.h
│   └── ......
├── libkuzu.so / libkuzu.dylib
├── main.c
├── user.csv
├── city.csv
├── follows.csv
└── lives-in.csv

The file main.c contains the following code:

#include <stdio.h>
#include <stdlib.h>

#include "include/kuzu.h"

int main()
{
    // Create an empty on-disk database and connect to it
    kuzu_database db;
    kuzu_database_init("test", kuzu_default_system_config(), &db);

    // Connect to the database.
    kuzu_connection conn;
    kuzu_connection_init(&db, &conn);

    // Create the schema.
    kuzu_query_result result;
    kuzu_connection_query(&conn, "CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))", &result);
    kuzu_query_result_destroy(&result);
    kuzu_connection_query(&conn, "CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))", &result);
    kuzu_query_result_destroy(&result);
    kuzu_connection_query(&conn, "CREATE REL TABLE Follows(FROM User TO User, since INT64)", &result);
    kuzu_query_result_destroy(&result);
    kuzu_connection_query(&conn, "CREATE REL TABLE LivesIn(FROM User TO City)", &result);
    kuzu_query_result_destroy(&result);

    // Load data.
    kuzu_connection_query(&conn, "COPY User FROM \"user.csv\"", &result);
    kuzu_query_result_destroy(&result);
    kuzu_connection_query(&conn, "COPY City FROM \"city.csv\"", &result);
    kuzu_query_result_destroy(&result);
    kuzu_connection_query(&conn, "COPY Follows FROM \"follows.csv\"", &result);
    kuzu_query_result_destroy(&result);
    kuzu_connection_query(&conn, "COPY LivesIn FROM \"lives-in.csv\"", &result);
    kuzu_query_result_destroy(&result);

    // Execute a simple query.
    kuzu_connection_query(&conn, "MATCH (a:User)-[f:Follows]->(b:User) RETURN a.name, f.since, b.name;", &result);

    // Output query result.
    kuzu_flat_tuple tuple;
    kuzu_value value;
    while (kuzu_query_result_has_next(&result))
    {
        kuzu_query_result_get_next(&result, &tuple);

        kuzu_flat_tuple_get_value(&tuple, 0, &value);
        char *name = NULL;
        kuzu_value_get_string(&value, &name);
        kuzu_value_destroy(&value);

        kuzu_flat_tuple_get_value(&tuple, 1, &value);
        int64_t since = 0;
        kuzu_value_get_int64(&value, &since);
        kuzu_value_destroy(&value);

        kuzu_flat_tuple_get_value(&tuple, 2, &value);
        char *name2 = NULL;
        kuzu_value_get_string(&value, &name2);
        kuzu_value_destroy(&value);


        printf("%s follows %s since %lld\n", name, name2, (long long)since);
        free(name);
        free(name2);
    }
    kuzu_value_destroy(&value);
    kuzu_flat_tuple_destroy(&tuple);
    kuzu_query_result_destroy(&result);
    kuzu_connection_destroy(&conn);
    kuzu_database_destroy(&db);
    return 0;
}

Compile and run main.c. Because libkuzu is not installed as a system library, we need to override the linker search path to compile the C code and run the compiled program.

On Linux:

env LIBRARY_PATH=. LD_LIBRARY_PATH=. gcc main.c -lkuzu
env LIBRARY_PATH=. LD_LIBRARY_PATH=. ./a.out

On macOS:

env DYLD_LIBRARY_PATH=. LIBRARY_PATH=. clang main.c -lkuzu
env DYLD_LIBRARY_PATH=. LIBRARY_PATH=. ./a.out

On Windows, the library file is passed to the compiler directly and the current directory is used automatically when searching for kuzu_shared.dll at runtime:

cl main.c kuzu_shared.lib
.\main.exe

Result:

Adam follows Karissa since 2020
Adam follows Zhang since 2020
Karissa follows Zhang since 2021
Zhang follows Noura since 2022

When using the Kuzu CLI’s shell, you can create an on-disk database by specifying a path after the kuzu command in the terminal.

kuzu ./demo_db

Opened the database at path: ./demo_db in read-write mode.
Enter ":help" for usage hints.
kuzu>

You can create an in-memory database by omitting the path and running kuzu on its own. This creates a database in memory that is lost when the process finishes.

Opened the database under in-memory mode.
Enter ":help" for usage hints.
kuzu>

Proceed to enter the following Cypher statements, separated by semicolons. Note that you must end each statement with a semicolon in the shell. Otherwise, it will not be parsed correctly and will fail to execute.

// Create schema
CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name));
CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name));
CREATE REL TABLE Follows(FROM User TO User, since INT64);
CREATE REL TABLE LivesIn(FROM User TO City);

// Insert data
COPY User FROM "./data/user.csv";
COPY City FROM "./data/city.csv";
COPY Follows FROM "./data/follows.csv";
COPY LivesIn FROM "./data/lives-in.csv";

// Execute Cypher query
MATCH (a:User)-[f:Follows]->(b:User)
RETURN a.name, b.name, f.since;

The following result is obtained:

┌─────────┬─────────┬─────────┐
│ a.name  │ b.name  │ f.since │
│ STRING  │ STRING  │ INT64   │
├─────────┼─────────┼─────────┤
│ Adam    │ Karissa │ 2020    │
│ Adam    │ Zhang   │ 2020    │
│ Karissa │ Zhang   │ 2021    │
│ Zhang   │ Noura   │ 2022    │
└─────────┴─────────┴─────────┘
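When statements for the shell are generated by a script, the semicolon requirement can be enforced before they are pasted in. The sketch below is one way to do that. The to_shell_script function is a hypothetical helper, not part of the Kuzu CLI, and it only normalizes the trailing semicolon without validating the Cypher itself.

```python
from typing import Iterable


def to_shell_script(statements: Iterable[str]) -> str:
    """Join statements so that each one ends with exactly one semicolon."""
    terminated = [
        stmt.strip().rstrip(";").rstrip() + ";"
        for stmt in statements
        if stmt.strip()  # skip blank entries
    ]
    return "\n".join(terminated)


script = to_shell_script(
    [
        "CREATE REL TABLE LivesIn(FROM User TO City)",
        "MATCH (a:User) RETURN a.name;",
    ]
)
print(script)
```

Statements that already end with a semicolon are left with a single terminator, so the helper is safe to apply to mixed input.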