Attaching to DuckDB databases
The DuckDB extension can be installed and loaded by running the following commands using the Kuzu CLI or your preferred language client API:
INSTALL duckdb;LOAD duckdb;
1. Create a DuckDB database
To illustrate the usage of this extension, we create a sample DuckDB database of university students in Python.
import duckdb
conn = duckdb.connect('university.db')
# Insert data to person tableconn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);")conn.execute("INSERT INTO person values ('Alice', 30);")conn.execute("INSERT INTO person values ('Bob', 27);")conn.execute("INSERT INTO person values ('Carol', 19);")conn.execute("INSERT INTO person values ('Dan', 25);")
2. Attach database
ATTACH [DB_PATH] AS [alias] (dbtype duckdb, skip_unsupported_table = OPTION, schema = SCHEMA_NAME)
DB_PATH
: Path to the DuckDB database instance (can be a relative/absolute/https/s3 path)alias
: Database alias to use in Kuzu - If not provided, the database name from DuckDB will be used. When attaching multiple databases, it’s recommended to use aliasing.skip_unsupported_table
: Whether Kuzu should throw an exception or skip when encountering a table with unsupported datatype. Supported options are: TRUE/FALSE. Default is FALSE.schema
: The name of the schema in duckdb to attach. Kuzu attaches to themain
schema of duckdb by default.
The below example shows how the university.db
DuckDB database can be attached to Kuzu using
the alias uw
:
ATTACH 'university.db' AS uw (dbtype duckdb);
Note: To attach a remote DuckDB database hosted on S3, users may have to configure the following options:
Option name | Description |
---|---|
s3_access_key_id | S3 access key id |
s3_secret_access_key | S3 secret access key |
s3_endpoint | S3 endpoint |
s3_url_style | Uses S3 url style (should either be vhost or path) |
s3_region | S3 region |
Or through environment variables:
Setting | System environment variable |
---|---|
S3 key ID | S3_ACCESS_KEY_ID |
S3 secret key | S3_SECRET_ACCESS_KEY |
S3 endpoint | S3_ENDPOINT |
S3 region | S3_REGION |
S3 url style | S3_URL_STYLE |
SET S3_URL_STYLE='VHOST';ATTACH 's3://duckdb-blobs/databases/stations.duckdb' as station (dbtype duckdb);LOAD FROM station.stations RETURN count(*);
Result:
┌──────────────┐│ COUNT_STAR() ││ INT64 │├──────────────┤│ 578 │└──────────────┘
3. Data type mapping from DuckDB to Kuzu
The table below shows the mapping from duckdb’s type to Kuzu’s type:
Data type in DuckDB | Corresponding data type in Kuzu |
---|---|
BIGINT | INT64 |
BIT | UNSUPPORTED |
BLOB | BLOB |
BOOLEAN | BOOL |
DATE | DATE |
DECIMAL(prec, scale) | DECIMAL(prec, scale) |
DOUBLE | DOUBLE |
FLOAT | FLOAT |
HUGEINT | INT128 |
INTEGER | INT32 |
INTERVAL | INTERVAL |
SMALLINT | INT16 |
TIME | UNSUPPORTED |
TIMESTAMP WITH TIME ZONE | UNSUPPORTED |
TIMESTAMP | TIMESTAMP |
TINYINT | INT8 |
UBIGINT | UINT64 |
UHUGEINT | UNSUPPORTED |
UINTEGER | UINT32 |
USMALLINT | UINT16 |
UTINYINT | UINT8 |
UUID | UUID |
VARCHAR | STRING |
ENUM | UNSUPPORTED |
ARRAY | ARRAY |
LIST | LIST |
MAP | MAP |
STRUCT | STRUCT |
UNION | UNION |
4. Scan from DuckDB tables
Finally, we can utilize the LOAD FROM
statement to scan the person
table. Note that you need to prefix the
external person
table with the database alias (in our example uw
). See the USE
statement which allows you to
skip this prefixing for a specific default database.
LOAD FROM uw.personRETURN *
Result:
---------------| name | age |---------------| Alice | 30 |---------------| Bob | 27 |---------------| Carol | 19 |---------------| Dan | 25 |---------------
5. USE: Reference database without alias
You can use the USE
statement for attached databases to use a default database name for future operations.
This can be used when reading from an attached database to avoid specifying the full database name
as a prefix to the table name.
Consider the same attached database as above:
ATTACH 'university.db' AS uw (dbtype duckdb);
Instead of defining the database name for each subsequent clause like this:
LOAD FROM uw.personRETURN *
You can do:
USE uw;LOAD FROM personRETURN *
6. Copy data from DuckDB tables
One important use case of the external RDBMS extensions is to facilitate seamless data transfer from the external RDBMS to Kuzu.
In this example, we continue using the university.db
database created in the last step, but this time,
we copy the data and persist it to Kuzu. This is done with the COPY FROM
statement. Here is an example:
We first create a Person
table in Kuzu. In this example we will make Person
have the same schema as the one defined in the external RDBMS.
CREATE NODE TABLE Person (name STRING, age INT32, PRIMARY KEY(name));
When the schemas are the same, we can copy the data from the external DBMS table to the Kuzu table simply as follows.
COPY Person FROM uw.person;
If the schemas are not the same, e.g., Person
contains only name
property while the external uw.person
contains
name
and age
, we can still use COPY FROM
but with a subquery as follows:
COPY Person FROM (LOAD FROM uw.person RETURN name);
7. Query the data in Kuzu
Finally, we can verify the data in the Person
table in Kuzu.
MATCH (p:Person) RETURN p.*;
Result:
------------------| p.name | p.age |------------------| Alice | 30 |------------------| Bob | 27 |------------------| Carol | 19 |------------------| Dan | 25 |------------------
8. Clear attached database schema cache
To avoid redundantly retrieving schema information from attached databases, Kuzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh cached
schema information in such cases.
CALL clear_attached_db_cache()
Note: If you have attached to databases from different RDBMSs, say Postgres, DuckDB, and Sqlite, this call will clear the cache for all of them.
9. Detach database
To detach a database, use DETACH [ALIAS]
as follows:
DETACH uw