Skip to content
Blog

Attaching to external RDBMSs

Kùzu supports directly scanning from a variety of relational databases using the LOAD FROM statement. Because RDBMSs are a common source of structured data in enterprises, the purpose behind this set of Kùzu extensions is currently twofold: (1) to allow users to scan data from external RDBMSs without having to copy it to a graph database; and (2) if data will be copied over from an RDBMS, this feature also simplifies the pipeline for copying data from an external RDBMS into Kùzu tables.

The currently available relational database extensions, as well as their minimum required versions, are shown below:

Extension NameDescriptionMinimum Version
duckdbScan from an attached DuckDB database0.10.0
postgresScan from an attached PostgreSQL database14.0
sqliteScan from an attached SQLite database3.3.0

Usage

Click through each tab in this section to see the example usage for your extension of choice.

The DuckDB extension can be installed and loaded by running the following commands using the Kùzu CLI or your preferred language client API:

INSTALL duckdb;
LOAD EXTENSION duckdb;

1. Create a DuckDB database

To illustrate the usage of this extension, we create a sample DuckDB database of university students in Python.

import duckdb
conn = duckdb.connect('university.db')
# Insert data to person table
conn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);")
conn.execute("INSERT INTO person values ('Alice', 30);")
conn.execute("INSERT INTO person values ('Bob', 27);")
conn.execute("INSERT INTO person values ('Carol', 19);")
conn.execute("INSERT INTO person values ('Dan', 25);")

2. Attach database

ATTACH [DB_PATH] AS [alias] (dbtype duckdb, skip_unsupported_table = OPTION)
  • DB_PATH: Path to the DuckDB database instance (can be a relative/absolute/https/s3 path)
  • alias: Database alias to use in Kùzu - If not provided, the database name from DuckDB will be used. When attaching multiple databases, it’s recommended to use aliasing.
  • skip_unsupported_table: Whether kuzu should throw an exception or skip when encountering a table with unsupported datatype. Supported options are: TRUE/FALSE. Default is FALSE.

The below example shows how the university.db DuckDB database can be attached to Kùzu using the alias uw:

ATTACH 'university.db' AS uw (dbtype duckdb);

Note: To attach a remote DuckDB database hosted on S3, users may have to configure the following options:

Option nameDescription
s3_access_key_idS3 access key id
s3_secret_access_keyS3 secret access key
s3_endpointS3 endpoint
s3_url_styleUses S3 url style (should either be vhost or path)
s3_regionS3 region

Or through environment variables:

SettingSystem environment variable
S3 key IDS3_ACCESS_KEY_ID
S3 secret keyS3_SECRET_ACCESS_KEY
S3 endpointS3_ENDPOINT
S3 regionS3_REGION
S3 url styleS3_URL_STYLE
SET S3_URL_STYLE='VHOST';
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' as station (dbtype duckdb);
LOAD FROM station.stations RETURN count(*);

Result:

┌──────────────┐
│ COUNT_STAR() │
│ INT64 │
├──────────────┤
│ 578 │
└──────────────┘

3. Scan from DuckDB tables

Finally, we can utilize the LOAD FROM statement to scan the person table. Note that you need to prefix the external person table with the database alias (in our example uw). See the USE statement which allows you to skip this prefixing for a specific default database.

LOAD FROM uw.person
RETURN *

Result:

---------------
| name | age |
---------------
| Alice | 30 |
---------------
| Bob | 27 |
---------------
| Carol | 19 |
---------------
| Dan | 25 |
---------------

4. USE: Reference database without alias

You can use the USE statement for attached databases to use a default database name for future operations. This can be used when reading from an attached database to avoid specifying the full database name as a prefix to the table name.

Consider the same attached database as above:

ATTACH 'university.db' AS uw (dbtype duckdb);

Instead of defining the database name for each subsequent clause like this:

LOAD FROM uw.person
RETURN *

You can do:

USE uw;
LOAD FROM person
RETURN *

5. Copy data from DuckDB tables

One important use case of the external RDBMS extensions is to facilitate seamless data transfer from the external RDBMS to Kùzu. In this example, we continue using the university.db database created in the last step, but this time, we copy the data and persist it to Kùzu. This is done with the COPY FROM statement. Here is an example:

We first create a Person table in Kùzu. In this example we will make Person have the same schema as the one defined in the external RDBMS.

CREATE NODE TABLE Person (name STRING, age INT32, PRIMARY KEY(name));

When the schemas are the same, we can copy the data from the external DBMS table to the Kùzu table simply as follows.

COPY Person FROM uw.person;

If the schemas are not the same, e.g., Person contains only name property while the external uw.person contains name and age, we can still use COPY FROM but with a subquery as follows:

COPY Person FROM (LOAD FROM uw.person RETURN name);

6. Query the data in Kùzu

Finally, we can verify the data in the Person table in Kùzu.

MATCH (p:Person) RETURN p.*;

Result:

------------------
| p.name | p.age |
------------------
| Alice | 30 |
------------------
| Bob | 27 |
------------------
| Carol | 19 |
------------------
| Dan | 25 |
------------------

7. Clear attached database schema cache

To avoid redundantly retrieving schema information from attached databases, Kùzu maintains a schema cache including table names and their respective columns and types. Should modifications occur in the schema via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached schema data may become obsolete. You can use the clear_attached_db_cache() function to refresh cached schema information in such cases.

CALL clear_attached_db_cache() RETURN *

Note: If you have attached to databases from different RDBMSs, say Postgres, DuckDB, and Sqlite, this call will clear the cache for all of them.

8. Detach database

To detach a database, use DETACH [ALIAS] as follows:

DETACH uw