Skip to content
Blog

Python API

Kùzu provides a Python package that you can install via PyPI. A full list of the available functions and classes can be found in the Python API documentation, linked below.

Some useful features of the Python API are explained in the following sections.

DataFrames and Arrow Tables

In Python, Kùzu supports the use of Pandas and Polars DataFrames, as well as PyArrow Tables. This allows you to leverage the data manipulation capabilities of these libraries in your graph workflows.

Output query results

You can output the results of a Cypher query to a Pandas DataFrame, Polars DataFrame, or PyArrow Table. The following examples show how to output query results to each of these data structures.

You can output the results of a Cypher query to a Pandas DataFrame using the get_as_df() method:

import kuzu
import pandas as pd
db = kuzu.Database("tmp")
conn = kuzu.Connection(db)
conn.execute("CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name))")
conn.execute("CREATE (a:Person {name: 'Adam', age: 30})")
conn.execute("CREATE (a:Person {name: 'Karissa', age: 40})")
conn.execute("CREATE (a:Person {name: 'Zhang', age: 50})")
result = conn.execute("MATCH (p:Person) RETURN p.*")
print(result.get_as_df())

You can return all the columns of a node table by using the * wildcard in the RETURN clause.

p.name p.age
0 Adam 30
1 Karissa 40
2 Zhang 50

Return specific columns by name and optionally, alias them, as follows:

result = conn.execute("MATCH (p:Person) RETURN p.name AS name")
print(result.get_as_df())

This will return only the name column.

name
0 Adam
1 Karissa
2 Zhang

LOAD FROM

You can scan a Pandas DataFrame, Polars DataFrame, or PyArrow Table in Kùzu using the LOAD FROM clause. Scanning a DataFrame or Table does not copy the data into Kùzu, it only reads the data.

import kuzu
import pandas as pd
db = kuzu.Database("tmp")
conn = kuzu.Connection(db)
df = pd.DataFrame({
"name": ["Adam", "Karissa", "Zhang"],
"age": [30, 40, 50]
})
result = conn.execute("LOAD FROM df RETURN *")
print(result.get_as_df())

Using the get_as_df() method on your query result returns the result as a Pandas DataFrame.

name age
0 Adam 30
1 Karissa 40
2 Zhang 50

COPY FROM

Copy from a Pandas DataFrame into a Kùzu table using the COPY FROM command:

import kuzu
import pandas as pd
db = kuzu.Database("tmp")
conn = kuzu.Connection(db)
conn.execute("CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name))")
df = pd.DataFrame({
"name": ["Adam", "Karissa", "Zhang"],
"age": [30, 40, 50]
})
conn.execute("COPY Person FROM df")
result = conn.execute("MATCH (p:Person) RETURN p.*")
print(result.get_as_df())

Using the get_as_df() method on your query result returns the result as a Pandas DataFrame.

p.name p.age
0 Adam 30
1 Karissa 40
2 Zhang 50

Type notation

This section summarizes the type notation used in Kùzu’s Python API. Below is a table from Python types to a Kùzu LogicalTypeID, which will be used to infer types via Python type annotations.

Python typeKùzu LogicalTypeID
boolBOOL
intINT64
floatDOUBLE
strSTRING
datetimeTIMESTAMP
dateDATE
timedeltaINTERVAL
uuidUUID
listLIST
dictMAP

Nested types

When defining a UDF, you can also specify nested types, though in this case, there are some differences from the example shown above.

If the parameter is a nested type, you must also provide the children’s type information. As such, with nested types, it’s not valid to use kuzu.Type. Instead, a string representation of the type should be given.

  • A list of INT64 would be "INT64[]"
  • A map from a STRING to a DOUBLE would be "MAP(STRING, DOUBLE)".

Note that it’s also valid to define child types through Python’s type annotations, e.g. list[int], or dict(str, float). It is also valid to use string representations to denote non-nested types.

UDF

Kùzu’s Python API also supports the registration of User Defined Functions (UDFs), allowing you to define custom functions in Python and use them in your Cypher queries. This can allow you to quickly extend Kùzu with new functions you need in your Python applications.

An example of using the UDF API is shown below. We will define a Python UDF that calculates the difference between two numbers, and then apply it in a Cypher query.

Register the UDF

import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
# define your function
def difference(a, b):
return a - b
# define the expected type of your parameters
parameters = [kuzu.Type.INT64, kuzu.Type.INT64]
# define expected type of the returned value
return_type = kuzu.Type.INT64
# register the UDF
conn.create_function("difference", difference, parameters, return_type)

Note that in the example above, we explicitly declared the expected types of the parameters and the return value in Kùzu, prior to registering the UDF.

Alternatively, you can simply use Python type annotations to denote the type signature of the parameters and return value.

def difference(a : int, b : int) -> int:
return abs(a - b)
conn.create_function("difference", difference)

Additional parameters

The UDF API’s create_function provides the following additional parameters:

  • name: str : The name of the function to be invoked in cypher.
  • udf: Callable[[...], Any] : The function to be executed.
  • params_type: Optional[list[Type | str]] : A list whose elements can either be kuzu.Type or str. kuzu.Type can be used to denote nonnested parameter types, while str can be used to denote both nested and nonnested parameter types. Details on how to denote types are in the type notation section.
  • return_type: Optional[Type | str] : Either a kuzu.Type enum or str. Details on how to denote types are in the type notation section.
  • default_null_handling: Optional[bool] : True by default. When true, if any one of the inputs is null, function execution is skipped and the output is resolved to null.
  • catch_exceptions: Optional[bool] : False by default. When true, if the UDF raises an exception, the output is resolved to null. Otherwise the Exception is rethrown.

Apply the UDF

Once the UDF is registered, you can apply it in a Cypher query. First, let’s create some data.

# create a table
conn.execute("CREATE NODE TABLE IF NOT EXISTS Item (id INT64, a INT64, b INT64, c INT64, PRIMARY KEY(id))")
# insert some data
conn.execute("CREATE (i:Item {id: 1}) SET i.a = 134, i.b = 123")
conn.execute("CREATE (i:Item {id: 2}) SET i.a = 44, i.b = 29")
conn.execute("CREATE (i:Item {id: 3}) SET i.a = 32, i.b = 68")

We’re now ready to apply the UDF in a Cypher query:

# apply the UDF and print the results
result = conn.execute("MATCH (i:Item) RETURN i.a AS a, i.b AS b, difference (i.a, i.b) AS difference")
print(result.get_as_df())

The output should be:

a b difference
0 134 123 11
1 44 29 15
2 32 68 -36

Remove UDF

In case you want to remove the UDF, you can call the remove_function method on the connection object.

# Use existing connection object
conn.remove_function(difference)