PostgreSQL extension
The postgres
extension allows you to attach PostgreSQL databases to Kuzu.
Usage
INSTALL postgres;LOAD postgres;
Set up a PostgreSQL server via Docker
During local development, it’s convenient to set up a PostgreSQL server using Docker. Run the following command to start a PostgreSQL server locally:
docker run --rm --name kuzu-postgres \ -e POSTGRES_PASSWORD=testpassword \ -p 5432:5432 \ postgres:latest
Example PostgreSQL database
To illustrate the usage of the extension, we create a sample PostgreSQL database of university students. We will use asyncpg, an asynchronous PostgreSQL client library for Python, to create the database and insert some sample data via a Python script.
pip install asyncpg
import asyncioimport asyncpg
async def main(): conn = await asyncpg.connect('postgresql://postgres:testpassword@localhost:5432/postgres') # Create and insert data to a new table try: await conn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);") await conn.execute("INSERT INTO person (name, age) VALUES ('Alice', 30)") await conn.execute("INSERT INTO person (name, age) VALUES ('Bob', 27)") await conn.execute("INSERT INTO person (name, age) VALUES ('Carol', 19)") await conn.execute("INSERT INTO person (name, age) VALUES ('Dan', 25)") except asyncpg.exceptions.DuplicateTableError: print(f"Table already exists, skipping creation and insertion...") # Check results print(await conn.fetch("SELECT * FROM person"))
asyncio.run(main())
Attach a database
ATTACH <PG_CONNECTION_STRING> AS <alias> (dbtype postgres, skip_unsupported_table = <OPTION>, schema = <SCHEMA_NAME>)
PG_CONNECTION_STRING
: PostgreSQL connection string with the necessary parametersalias
: Database alias to use in Kuzu. If not provided, the database name from PostgreSQL will be used. When attaching multiple databases, we recommend using aliases.skip_unsupported_table
: A boolean flag to indicate whether Kuzu should exit with an error or skip tables with unsupported datatypes. Defaults toFalse
.schema
: The name of the schema in PostgreSQL to attach. Kuzu attaches to thepublic
schema of PostgreSQL by default.
PG_CONNECTION_STRING
accepts the following parameters:
dbname
: Database namehost
: Host IP address. Defaults tolocalhost
.user
: PostgreSQL username. Defaults topostgres
.password
: PostgreSQL password. Defaults to an empty string.port
: Port number. Defaults to5432
.
For example, the university
PostgreSQL database can be attached to Kuzu as:
ATTACH 'dbname=university user=postgres host=localhost password=testpassword port=5432' AS uw (dbtype postgres);
PostgreSQL to Kuzu type mapping
The following table shows the mapping from PostgreSQL’s data types to Kuzu’s data types:
PostgreSQL Data Type | Corresponding Data Type in Kuzu |
---|---|
bigint (int8) | INT64 |
bigserial (serial8) | INT64 |
bit [ (n) ] | STRING |
bit varying [ (n) ] (varbit [ (n) ]) | STRING |
boolean (bool) | BOOL |
box | DOUBLE[] |
bytea | BLOB |
character [ (n) ] (char [ (n) ]) | STRING |
character varying [ (n) ] (varchar [ (n)]) | STRING |
cidr | STRING |
circle | DOUBLE[] |
date | DATE |
double precision (float8) | DOUBLE |
inet | STRING |
integer (int, int4) | INT32 |
interval [ fields ] [ (p) ] | INTERVAL |
json | JSON |
line | DOUBLE[] |
lseg | DOUBLE[] |
macaddr | STRING |
macaddr8 | STRING |
money | STRING |
numeric [ (p, s) ] (decimal [ (p, s) ]) | DECIMAL |
path | DOUBLE[] |
pg_lsn | STRING |
pg_snapshot | STRING |
point | STRUCT(x DOUBLE, y DOUBLE) |
polygon | DOUBLE[] |
real (float4) | FLOAT |
smallint (int2) | INT16 |
smallserial (serial2) | INT16 |
serial (serial4) | INT32 |
text | STRING |
time [ (p) ] [ without time zone ] | Unsupported |
time [ (p) ] with time zone (timetz) | Unsupported |
timestamp [ (p) ] [ without time zone ] | TIMESTAMP |
timestamp [ (p) ] with time zone (timestamptz) | Unsupported |
tsquery | STRING |
tsvector | STRING |
txid_snapshot | STRING |
uuid | UUID |
xml | STRING |
Scan PostgreSQL tables with Cypher
You can use the LOAD FROM
statement to scan the person
table from the attached uw
database.
LOAD FROM uw.personRETURN *
┌────────┬───────┐│ name │ age ││ STRING │ INT64 │├────────┼───────┤│ Alice │ 30 ││ Bob │ 27 ││ Carol │ 19 ││ Dan │ 25 │└────────┴───────┘
Scan PostgreSQL tables with SQL
You can use the SQL_QUERY
function to execute arbitrary read-only SQL queries on any attached PostgreSQL database
and retrieve its results in Kuzu.
For example, the following query filters out all people who are younger than 20:
CALL SQL_QUERY('uw', 'SELECT * FROM person WHERE age >= 20')RETURN *;
┌────────┬───────┐│ name │ age ││ STRING │ INT64 │├────────┼───────┤│ Alice │ 30 ││ Bob │ 27 ││ Dan │ 25 │└────────┴───────┘
Using a database without an alias
You can set a default database name to avoid having to specify the full database name as a prefix in every query.
The above example can be simplified as:
ATTACH 'university' AS uw (dbtype postgres);USE uw;LOAD FROM personRETURN *;
Copy data from PostgreSQL tables
You can use the COPY FROM
statement to import data from a PostgreSQL table into Kuzu.
First, create a Person
table in Kuzu, with the same schema as the one defined in PostgreSQL:
CREATE NODE TABLE Person (name STRING PRIMARY KEY, age INT32);
When the schemas are the same, you can copy the data from the external DBMS table to the Kuzu table as:
COPY Person FROM uw.person;
If the schemas are not the same, e.g., Person
contains only name
property while the external uw.person
contains
name
and age
, you can still use COPY FROM
but with a subquery as:
COPY Person FROM (LOAD FROM uw.person RETURN name);
Alternatively, you can use the SQL_QUERY
function:
COPY Person FROM SQL_QUERY('uw', 'SELECT name FROM person');
You can verify that the data was successfully imported:
MATCH (p:Person) RETURN p.*;
┌────────┬───────┐│ name │ age ││ STRING │ INT64 │├────────┼───────┤│ Alice │ 30 ││ Bob │ 27 ││ Carol │ 19 ││ Dan │ 25 │└────────┴───────┘
Clear the schema cache
To avoid redundantly retrieving schema information from attached databases, Kuzu maintains a schema cache
including table names and their respective columns and types. Should modifications occur in the schema
via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached
schema data may become obsolete. You can use the clear_attached_db_cache()
function to refresh the cached
schema information in such cases.
CALL clear_attached_db_cache();
Detach a database
To detach a database, use DETACH [ALIAS]
as follows:
DETACH uw;