DuckDB extension
The DuckDB extension allows Kùzu to directly scan data from DuckDB databases.
Currently, this is done by LOAD FROM
statements.
This allows users to not only view their DuckDB tables in Kùzu, but also facilitates seamless
migration of data from DuckDB to Kùzu for deeper graph analysis. Currently, the extension is read-only
from DuckDB and does not support write operations.
Usage
duckdb
is an official extension developed and maintained by Kùzu.
It can be installed and loaded by running the following commands using the CLI or your preferred language
client API:
Direct scan from DuckDB
This section shows how to directly scan from DuckDB tables to Kùzu using the LOAD FROM
statement.
Create a sample DuckDB database
To illustrate the usage of this extension, we create a sample DuckDB database of university students in Python.
Attach DuckDB instance in Kùzu
DB_PATH
: Path to the DuckDB database instance (can be relative or absolute path)alias
: Database alias to use in Kùzu - If not provided, the database name from DuckDB will be used. When attaching multiple databases, it’s recommended to use aliasing.
The below example shows how the university.db
DuckDB database can be attached to Kùzu using
the alias uw
:
Scan from DuckDB tables
Finally, we can utilize the LOAD FROM
statement to scan the person
table.
Result:
Detach DuckDB instance
To detach a DuckDB instance, use DETACH [ALIAS]
as follows:
Data migration from DuckDB tables
One important use case of the DuckDB extension is to facilitate seamless data transfer from DuckDB to Kùzu.
In this example, we continue using the university.db
database created in the last step, but this time,
we copy the data and persist it to Kùzu. This is done with the COPY FROM
query results feature.
Create a Person
table in Kùzu
We first create a Person
table in Kùzu which has the same schema as the one defined in DuckDB.
Use COPY FROM
to migrate data
We can reference the created alias uw
to copy data from the DuckDB table to the Kùzu table.
Query the data in Kùzu
Finally, we can verify the data in the Person
table in Kùzu.
Result:
Clear attached database schema cache
To avoid redundantly retrieving schema information from attached databases, Kùzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached databases (DuckDB or some other attachad database), such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh cached
schema information in such cases.
Note: this call function will clear cache of all attached databases.
USE
statement
The USE
statement for attached databases sets a default database name to use for future operations.
This can be used when reading from an attached database to avoid specifying the full database name
as a prefix to the table name.
Consider the same attached database as above:
Instead of defining the DUckDB database name for each subsequent clause like this:
You can do: