DuckDB extension
The duckdb
extension allows you to attach DuckDB databases to Kuzu.
Usage
INSTALL duckdb;LOAD duckdb;
Example DuckDB database
First, create a sample DuckDB database of university students in Python.
import duckdb
conn = duckdb.connect('university.db')
conn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);")conn.execute("INSERT INTO person VALUES ('Alice', 30);")conn.execute("INSERT INTO person VALUES ('Bob', 27);")conn.execute("INSERT INTO person VALUES ('Carol', 19);")conn.execute("INSERT INTO person VALUES ('Dan', 25);")
Attach a DuckDB database
To attach a DuckDB database, use the ATTACH
statement:
ATTACH [DB_PATH] [AS alias] (dbtype duckdb, skip_unsupported_table = OPTION, schema = SCHEMA_NAME)
DB_PATH
: Path to the DuckDB database instance (can be a relative, absolute, https or S3 path)alias
: Database alias to use in Kuzu. If not provided, the database name from DuckDB will be used. When attaching multiple databases, we recommend using aliases.skip_unsupported_table
: A boolean flag to indicate whether Kuzu should exit with an error or skip tables with unsupported data types. Defaults toFalse
.schema
: The name of the schema in DuckDB to attach. Kuzu attaches to themain
schema of DuckDB by default.
For example, the university.db
DuckDB database can be attached to Kuzu using the alias uw
:
ATTACH 'university.db' AS uw (dbtype duckdb);
Attach a DuckDB database on S3
You can also attach a remote DuckDB database hosted on S3.
First, configure the S3 connection.
Then, attach the database using an S3 URL:
SET S3_URL_STYLE='VHOST';ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS station (dbtype duckdb);LOAD FROM station.stations RETURN count(*);
┌──────────────┐│ COUNT_STAR() ││ INT64 │├──────────────┤│ 578 │└──────────────┘
DuckDB to Kuzu type mapping
The following table shows the mapping from DuckDB’s data types to Kuzu’s data types:
Data type in DuckDB | Corresponding data type in Kuzu |
---|---|
BIGINT | INT64 |
BIT | Unsupported |
BLOB | BLOB |
BOOLEAN | BOOL |
DATE | DATE |
DECIMAL(prec, scale) | DECIMAL(prec, scale) |
DOUBLE | DOUBLE |
FLOAT | FLOAT |
HUGEINT | INT128 |
INTEGER | INT32 |
INTERVAL | INTERVAL |
SMALLINT | INT16 |
TIME | Unsupported |
TIMESTAMP WITH TIME ZONE | Unsupported |
TIMESTAMP | TIMESTAMP |
TINYINT | INT8 |
UBIGINT | UINT64 |
UHUGEINT | Unsupported |
UINTEGER | UINT32 |
USMALLINT | UINT16 |
UTINYINT | UINT8 |
UUID | UUID |
VARCHAR | STRING |
ENUM | Unsupported |
ARRAY | ARRAY |
LIST | LIST |
MAP | MAP |
STRUCT | STRUCT |
UNION | UNION |
Scan a DuckDB table
You can use the LOAD FROM
statement to scan the person
table from the attached uw
database.
LOAD FROM uw.personRETURN *;
---------------| name | age |---------------| Alice | 30 |---------------| Bob | 27 |---------------| Carol | 19 |---------------| Dan | 25 |---------------
Use a default database name
You can set a default database name to avoid having to specify the full database name as a prefix in every query.
The above example can be simplified as:
ATTACH 'university.db' AS uw (dbtype duckdb);USE uw;LOAD FROM personRETURN *;
Copy data from DuckDB tables
You can use the COPY FROM
statement to import data from a DuckDB table into Kuzu.
First, create a Person
table in Kuzu, with the same schema as the one defined in DuckDB:
CREATE NODE TABLE Person (name STRING PRIMARY KEY, age INT32);
When the schemas are the same, you can copy the data from the external DBMS table to the Kuzu table simply as:
COPY Person FROM uw.person;
If the schemas are not the same, e.g., Person
contains only name
property while the external uw.person
contains
name
and age
, you can still use COPY FROM
but with a subquery as:
COPY Person FROM (LOAD FROM uw.person RETURN name);
You can verify that the data was successfully imported:
MATCH (p:Person) RETURN p.*;
------------------| p.name | p.age |------------------| Alice | 30 |------------------| Bob | 27 |------------------| Carol | 19 |------------------| Dan | 25 |------------------
Clear the schema cache
To avoid redundantly retrieving schema information from attached databases, Kuzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh cached
schema information in such cases.
CALL clear_attached_db_cache();
Detach a database
To detach a database, use DETACH [ALIAS]
:
DETACH uw;