Skip to content
Blog

DuckDB extension

The duckdb extension allows you to attach DuckDB databases to Kuzu.

Usage

INSTALL duckdb;
LOAD duckdb;

Example DuckDB database

First, create a sample DuckDB database of university students in Python.

import duckdb
conn = duckdb.connect('university.db')
conn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);")
conn.execute("INSERT INTO person VALUES ('Alice', 30);")
conn.execute("INSERT INTO person VALUES ('Bob', 27);")
conn.execute("INSERT INTO person VALUES ('Carol', 19);")
conn.execute("INSERT INTO person VALUES ('Dan', 25);")

Attach a DuckDB database

To attach a DuckDB database, use the ATTACH statement:

ATTACH
[DB_PATH] [AS alias]
(dbtype duckdb, skip_unsupported_table = OPTION, schema = SCHEMA_NAME)
  • DB_PATH: Path to the DuckDB database instance (can be a relative, absolute, https or S3 path)
  • alias: Database alias to use in Kuzu. If not provided, the database name from DuckDB will be used. When attaching multiple databases, we recommend using aliases.
  • skip_unsupported_table: A boolean flag to indicate whether Kuzu should exit with an error or skip tables with unsupported data types. Defaults to False.
  • schema: The name of the schema in DuckDB to attach. Kuzu attaches to the main schema of DuckDB by default.

For example, the university.db DuckDB database can be attached to Kuzu using the alias uw:

ATTACH 'university.db' AS uw (dbtype duckdb);

Attach a DuckDB database on S3

You can also attach a remote DuckDB database hosted on S3.

First, configure the S3 connection.

Then, attach the database using an S3 URL:

SET S3_URL_STYLE='VHOST';
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS station (dbtype duckdb);
LOAD FROM station.stations RETURN count(*);
┌──────────────┐
│ COUNT_STAR() │
│ INT64 │
├──────────────┤
│ 578 │
└──────────────┘

DuckDB to Kuzu type mapping

The following table shows the mapping from DuckDB’s data types to Kuzu’s data types:

Data type in DuckDBCorresponding data type in Kuzu
BIGINTINT64
BITUnsupported
BLOBBLOB
BOOLEANBOOL
DATEDATE
DECIMAL(prec, scale)DECIMAL(prec, scale)
DOUBLEDOUBLE
FLOATFLOAT
HUGEINTINT128
INTEGERINT32
INTERVALINTERVAL
SMALLINTINT16
TIMEUnsupported
TIMESTAMP WITH TIME ZONEUnsupported
TIMESTAMPTIMESTAMP
TINYINTINT8
UBIGINTUINT64
UHUGEINTUnsupported
UINTEGERUINT32
USMALLINTUINT16
UTINYINTUINT8
UUIDUUID
VARCHARSTRING
ENUMUnsupported
ARRAYARRAY
LISTLIST
MAPMAP
STRUCTSTRUCT
UNIONUNION

Scan a DuckDB table

You can use the LOAD FROM statement to scan the person table from the attached uw database.

LOAD FROM uw.person
RETURN *;
---------------
| name | age |
---------------
| Alice | 30 |
---------------
| Bob | 27 |
---------------
| Carol | 19 |
---------------
| Dan | 25 |
---------------

Use a default database name

You can set a default database name to avoid having to specify the full database name as a prefix in every query.

The above example can be simplified as:

ATTACH 'university.db' AS uw (dbtype duckdb);
USE uw;
LOAD FROM person
RETURN *;

Copy data from DuckDB tables

You can use the COPY FROM statement to import data from a DuckDB table into Kuzu.

First, create a Person table in Kuzu, with the same schema as the one defined in DuckDB:

CREATE NODE TABLE Person (name STRING PRIMARY KEY, age INT32);

When the schemas are the same, you can copy the data from the external DBMS table to the Kuzu table simply as:

COPY Person FROM uw.person;

If the schemas are not the same, e.g., Person contains only name property while the external uw.person contains name and age, you can still use COPY FROM but with a subquery as:

COPY Person FROM (LOAD FROM uw.person RETURN name);

You can verify that the data was successfully imported:

MATCH (p:Person) RETURN p.*;
------------------
| p.name | p.age |
------------------
| Alice | 30 |
------------------
| Bob | 27 |
------------------
| Carol | 19 |
------------------
| Dan | 25 |
------------------

Clear the schema cache

To avoid redundantly retrieving schema information from attached databases, Kuzu maintains a schema cache including table names and their respective columns and types. Should modifications occur in the schema via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached schema data may become obsolete. You can use the clear_attached_db_cache() function to refresh cached schema information in such cases.

CALL clear_attached_db_cache();

Detach a database

To detach a database, use DETACH [ALIAS]:

DETACH uw;