Attaching to external RDBMSs
Kùzu supports directly scanning from a variety of relational databases using the LOAD FROM
statement.
Because RDBMSs are a common source of structured data in enterprises, the purpose behind this set of Kùzu
extensions is currently twofold: (1) to allow users to scan data from external RDBMSs without having to copy it
to a graph database; and (2) if data will be copied over from an RDBMS, this feature also simplifies the pipeline
for copying data from an external RDBMS into Kùzu tables.
The currently available relational database extensions, as well as their minimum required versions, are shown below:
Extension Name | Description | Minimum Version |
---|---|---|
duckdb | Scan from an attached DuckDB database | 0.10.0 |
postgres | Scan from an attached PostgreSQL database | 14.0 |
sqlite | Scan from an attached SQLite database | 3.3.0 |
Usage
Click through each tab in this section to see the example usage for your extension of choice.
The DuckDB extension can be installed and loaded by running the following commands using the Kùzu CLI or your preferred language client API:
1. Create a DuckDB database
To illustrate the usage of this extension, we create a sample DuckDB database of university students in Python.
2. Attach database
DB_PATH
: Path to the DuckDB database instance (can be a relative/absolute/https/s3 path)alias
: Database alias to use in Kùzu - If not provided, the database name from DuckDB will be used. When attaching multiple databases, it’s recommended to use aliasing.skip_unsupported_table
: Whether kuzu should throw an exception or skip when encountering a table with unsupported datatype. Supported options are: TRUE/FALSE. Default is FALSE.
The below example shows how the university.db
DuckDB database can be attached to Kùzu using
the alias uw
:
Note: To attach a remote DuckDB database hosted on S3, users may have to configure the following options:
Option name | Description |
---|---|
s3_access_key_id | S3 access key id |
s3_secret_access_key | S3 secret access key |
s3_endpoint | S3 endpoint |
s3_url_style | Uses S3 url style (should either be vhost or path) |
s3_region | S3 region |
Or through environment variables:
Setting | System environment variable |
---|---|
S3 key ID | S3_ACCESS_KEY_ID |
S3 secret key | S3_SECRET_ACCESS_KEY |
S3 endpoint | S3_ENDPOINT |
S3 region | S3_REGION |
S3 url style | S3_URL_STYLE |
Result:
3. Scan from DuckDB tables
Finally, we can utilize the LOAD FROM
statement to scan the person
table. Note that you need to prefix the
external person
table with the database alias (in our example uw
). See the USE
statement which allows you to
skip this prefixing for a specific default database.
Result:
4. USE: Reference database without alias
You can use the USE
statement for attached databases to use a default database name for future operations.
This can be used when reading from an attached database to avoid specifying the full database name
as a prefix to the table name.
Consider the same attached database as above:
Instead of defining the database name for each subsequent clause like this:
You can do:
5. Copy data from DuckDB tables
One important use case of the external RDBMS extensions is to facilitate seamless data transfer from the external RDBMS to Kùzu.
In this example, we continue using the university.db
database created in the last step, but this time,
we copy the data and persist it to Kùzu. This is done with the COPY FROM
statement. Here is an example:
We first create a Person
table in Kùzu. In this example we will make Person
have the same schema as the one defined in the external RDBMS.
When the schemas are the same, we can copy the data from the external DBMS table to the Kùzu table simply as follows.
If the schemas are not the same, e.g., Person
contains only name
property while the external uw.person
contains
name
and age
, we can still use COPY FROM
but with a subquery as follows:
6. Query the data in Kùzu
Finally, we can verify the data in the Person
table in Kùzu.
Result:
7. Clear attached database schema cache
To avoid redundantly retrieving schema information from attached databases, Kùzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh cached
schema information in such cases.
Note: If you have attached to databases from different RDBMSs, say Postgres, DuckDB, and Sqlite, this call will clear the cache for all of them.
8. Detach database
To detach a database, use DETACH [ALIAS]
as follows:
The PostgreSQL extension can be installed and loaded by running the following commands using the Kùzu CLI or your preferred language client API:
Set up a PostgreSQL server via Docker
During local development, it’s convenient to set up a PostgreSQL server using Docker. Run the following command to start a PostgreSQL server locally:
Note that the storage volume for this database is not persistent and will be deleted once the container is stopped. Moreover, the password in this case is provided via plain text, which is not recommended in a real use case, so the below example is for testing purposes only.
1. Create a sample PostgreSQL database
To illustrate the usage of the extension, we create a sample Postgres database of university
students. We will use asyncpg,
an asynchronous PostgreSQL client library for Python, to create the database and insert some sample data
via a Python script.
This step assumes you have run pip install asyncpg
to install the library, and are using Python 3.10+.
2. Attach database
skip_unsupported_table
: Whether kuzu should throw an exception or skip when encountering a table with unsupported datatype. Supported options are: TRUE or FALSE. Default is FALSE.
The below example shows how the university
PostgreSQL database can be attached to Kùzu using
the alias uw
:
The ATTACH
statement requires the following parameters:
PG_CONNECTION_STRING
: PostgreSQL connection string with the necessary parametersalias
: Database alias to use in Kùzu - If not provided, the database name from PostgreSQL will be used. When attaching multiple databases, it’s recommended to use aliasing.
The below table lists some common connection string parameters:
Parameter | Description | Default |
---|---|---|
dbname | Database name | [user defined] |
host | Host IP address | localhost |
user | Postgres username | postgres |
password | Postgres password | [empty] |
port | Port number | 5432 |
3. Scan from PostgreSQL tables
Finally, we can utilize the LOAD FROM
statement to scan the Person
table.
Result:
4. USE: Reference database without alias
You can use the USE
statement for attached databases to use a default database name for future operations.
This can be used when reading from an attached database to avoid specifying the full database name
as a prefix to the table name.
Consider the same attached database as above:
Instead of defining the database name for each subsequent clause like this:
You can do:
5. Copy data from PostgreSQL tables
One important use case of the external RDBMS extensions is to facilitate seamless data transfer from the external RDBMS to Kùzu.
In this example, we continue using the university.db
database created in the last step, but this time,
we copy the data and persist it to Kùzu. This is done with the COPY FROM
statement. Here is an example:
We first create a Person
table in Kùzu. In this example we will make Person
have the same schema as the one defined in the external RDBMS.
When the schemas are the same, we can copy the data from the external DBMS table to the Kùzu table simply as follows.
If the schemas are not the same, e.g., Person
contains only name
property while the external uw.person
contains
name
and age
, we can still use COPY FROM
but with a subquery as follows:
6. Query the data in Kùzu
Finally, we can verify the data in the Person
table in Kùzu.
Result:
7. Clear attached database schema cache
To avoid redundantly retrieving schema information from attached databases, Kùzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh cached
schema information in such cases.
Note: If you have attached to databases from different RDBMSs, say Postgres, DuckDB, and Sqlite, this call will clear the cache for all of them.
8. Detach database
To detach a database, use DETACH [ALIAS]
as follows:
The SQLite extension can be installed and loaded by running the following commands using the Kùzu CLI or your preferred language client API:
1. Create a SQLite database
To illustrate the usage of this extension, we create a sample SQLite database of university students in Python.
2. Attach database
DB_PATH
: Path to the SQLite database instance (can be relative or absolute path)alias
: Database alias to use in Kùzu - If not provided, the database name from DuckDB will be used. When attaching multiple databases, it’s recommended to use aliasing.skip_unsupported_table
: Whether kuzu should throw an exception or skip when encountering a table with unsupported datatype. Supported options:TRUE
orFALSE
.
The below example shows how the university.db
SQLite database can be attached to Kùzu using
the alias uw
:
3. Scan from SQLite tables
Finally, we can utilize the LOAD FROM
statement to scan the person
table.
Result:
4. USE: Reference database without alias
You can use the USE
statement for attached databases to use a default database name for future operations.
This can be used when reading from an attached database to avoid specifying the full database name
as a prefix to the table name.
Consider the same attached database as above:
Instead of defining the database name for each subsequent clause like this:
You can do:
5. Copy data from SQLite tables
One important use case of the external RDBMS extensions is to facilitate seamless data transfer from the external RDBMS to Kùzu.
In this example, we continue using the university.db
database created in the last step, but this time,
we copy the data and persist it to Kùzu. This is done with the COPY FROM
statement. Here is an example:
We first create a Person
table in Kùzu. In this example we will make Person
have the same schema as the one defined in the external RDBMS.
When the schemas are the same, we can copy the data from the external DBMS table to the Kùzu table simply as follows.
If the schemas are not the same, e.g., Person
contains only name
property while the external uw.person
contains
name
and age
, we can still use COPY FROM
but with a subquery as follows:
6. Query the data in Kùzu
Finally, we can verify the data in the Person
table in Kùzu.
Result:
7. Clear attached database schema cache
To avoid redundantly retrieving schema information from attached databases, Kùzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh cached
schema information in such cases.
Note: If you have attached to databases from different RDBMSs, say Postgres, DuckDB, and Sqlite, this call will clear the cache for all of them.
8. Detach database
To detach a database, use DETACH [ALIAS]
as follows: