Skip to content
Blog

PostgreSQL extension

The postgres extension allows you to attach PostgreSQL databases to Kuzu.

Usage

INSTALL postgres;
LOAD postgres;

Set up a PostgreSQL server via Docker

During local development, it’s convenient to set up a PostgreSQL server using Docker. Run the following command to start a PostgreSQL server locally:

Terminal window
docker run --rm --name kuzu-postgres \
-e POSTGRES_PASSWORD=testpassword \
-p 5432:5432 \
postgres:latest

Example PostgreSQL database

To illustrate the usage of the extension, we create a sample PostgreSQL database of university students. We will use asyncpg, an asynchronous PostgreSQL client library for Python, to create the database and insert some sample data via a Python script.

Terminal window
pip install asyncpg
import asyncio
import asyncpg
async def main():
conn = await asyncpg.connect('postgresql://postgres:testpassword@localhost:5432/postgres')
# Create and insert data to a new table
try:
await conn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);")
await conn.execute("INSERT INTO person (name, age) VALUES ('Alice', 30)")
await conn.execute("INSERT INTO person (name, age) VALUES ('Bob', 27)")
await conn.execute("INSERT INTO person (name, age) VALUES ('Carol', 19)")
await conn.execute("INSERT INTO person (name, age) VALUES ('Dan', 25)")
except asyncpg.exceptions.DuplicateTableError:
print(f"Table already exists, skipping creation and insertion...")
# Check results
print(await conn.fetch("SELECT * FROM person"))
asyncio.run(main())

Attach a database

ATTACH
<PG_CONNECTION_STRING> AS <alias>
(dbtype postgres, skip_unsupported_table = <OPTION>, schema = <SCHEMA_NAME>)
  • PG_CONNECTION_STRING: PostgreSQL connection string with the necessary parameters
  • alias: Database alias to use in Kuzu. If not provided, the database name from PostgreSQL will be used. When attaching multiple databases, we recommend using aliases.
  • skip_unsupported_table: A boolean flag to indicate whether Kuzu should exit with an error or skip tables with unsupported datatypes. Defaults to False.
  • schema: The name of the schema in PostgreSQL to attach. Kuzu attaches to the public schema of PostgreSQL by default.

PG_CONNECTION_STRING accepts the following parameters:

  • dbname: Database name
  • host: Host IP address. Defaults to localhost.
  • user: PostgreSQL username. Defaults to postgres.
  • password: PostgreSQL password. Defaults to an empty string.
  • port: Port number. Defaults to 5432.

For example, the university PostgreSQL database can be attached to Kuzu as:

ATTACH
'dbname=university user=postgres host=localhost password=testpassword port=5432' AS uw
(dbtype postgres);

PostgreSQL to Kuzu type mapping

The following table shows the mapping from PostgreSQL’s data types to Kuzu’s data types:

PostgreSQL Data TypeCorresponding Data Type in Kuzu
bigint (int8)INT64
bigserial (serial8)INT64
bit [ (n) ]STRING
bit varying [ (n) ] (varbit [ (n) ])STRING
boolean (bool)BOOL
boxDOUBLE[]
byteaBLOB
character [ (n) ] (char [ (n) ])STRING
character varying [ (n) ] (varchar [ (n)])STRING
cidrSTRING
circleDOUBLE[]
dateDATE
double precision (float8)DOUBLE
inetSTRING
integer (int, int4)INT32
interval [ fields ] [ (p) ]INTERVAL
jsonJSON
lineDOUBLE[]
lsegDOUBLE[]
macaddrSTRING
macaddr8STRING
moneySTRING
numeric [ (p, s) ] (decimal [ (p, s) ])DECIMAL
pathDOUBLE[]
pg_lsnSTRING
pg_snapshotSTRING
pointSTRUCT(x DOUBLE, y DOUBLE)
polygonDOUBLE[]
real (float4)FLOAT
smallint (int2)INT16
smallserial (serial2)INT16
serial (serial4)INT32
textSTRING
time [ (p) ] [ without time zone ]Unsupported
time [ (p) ] with time zone (timetz)Unsupported
timestamp [ (p) ] [ without time zone ]TIMESTAMP
timestamp [ (p) ] with time zone (timestamptz)Unsupported
tsquerySTRING
tsvectorSTRING
txid_snapshotSTRING
uuidUUID
xmlSTRING

Scan PostgreSQL tables with Cypher

You can use the LOAD FROM statement to scan the person table from the attached uw database.

LOAD FROM uw.person
RETURN *
┌────────┬───────┐
│ name │ age │
│ STRING │ INT64 │
├────────┼───────┤
│ Alice │ 30 │
│ Bob │ 27 │
│ Carol │ 19 │
│ Dan │ 25 │
└────────┴───────┘

Scan PostgreSQL tables with SQL

You can use the SQL_QUERY function to execute arbitrary read-only SQL queries on any attached PostgreSQL database and retrieve its results in Kuzu.

For example, the following query filters out all people who are younger than 20:

CALL SQL_QUERY('uw', 'SELECT * FROM person WHERE age >= 20')
RETURN *;
┌────────┬───────┐
│ name │ age │
│ STRING │ INT64 │
├────────┼───────┤
│ Alice │ 30 │
│ Bob │ 27 │
│ Dan │ 25 │
└────────┴───────┘

Using a database without an alias

You can set a default database name to avoid having to specify the full database name as a prefix in every query.

The above example can be simplified as:

ATTACH 'university' AS uw (dbtype postgres);
USE uw;
LOAD FROM person
RETURN *;

Copy data from PostgreSQL tables

You can use the COPY FROM statement to import data from a PostgreSQL table into Kuzu.

First, create a Person table in Kuzu, with the same schema as the one defined in PostgreSQL:

CREATE NODE TABLE Person (name STRING PRIMARY KEY, age INT32);

When the schemas are the same, you can copy the data from the external DBMS table to the Kuzu table as:

COPY Person FROM uw.person;

If the schemas are not the same, e.g., Person contains only name property while the external uw.person contains name and age, you can still use COPY FROM but with a subquery as:

COPY Person FROM (LOAD FROM uw.person RETURN name);

Alternatively, you can use the SQL_QUERY function:

COPY Person FROM SQL_QUERY('uw', 'SELECT name FROM person');

You can verify that the data was successfully imported:

MATCH (p:Person) RETURN p.*;
┌────────┬───────┐
│ name │ age │
│ STRING │ INT64 │
├────────┼───────┤
│ Alice │ 30 │
│ Bob │ 27 │
│ Carol │ 19 │
│ Dan │ 25 │
└────────┴───────┘

Clear the schema cache

To avoid redundantly retrieving schema information from attached databases, Kuzu maintains a schema cache including table names and their respective columns and types. Should modifications occur in the schema via an alternate connection to attached RDBMSs, such as creation or deletion of tables, the cached schema data may become obsolete. You can use the clear_attached_db_cache() function to refresh the cached schema information in such cases.

CALL clear_attached_db_cache();

Detach a database

To detach a database, use DETACH [ALIAS] as follows:

DETACH uw;