Delta Lake

Usage

The delta extension adds support for scanning and copying from Delta Lake, an open-source storage framework for building a format-agnostic Lakehouse architecture. With this extension, you can read Delta tables from within Kùzu using the LOAD FROM and COPY FROM clauses.

The Delta functionality is not available by default, so you first need to install and load the DELTA extension by running the following commands:

INSTALL DELTA;
LOAD EXTENSION DELTA;

Example dataset

Let’s look at an example dataset to demonstrate how the Delta extension can be used. First, let’s create a Delta table containing student information using Python and save it in the '/tmp/student' directory. Before running the script, make sure the deltalake Python package is installed (we will also use Pandas).

pip install deltalake pandas

# create_delta_table.py
import pandas as pd
from deltalake import write_deltalake

# Student records to store in the Delta table
student = {
    "name": ["Alice", "Bob", "Carol"],
    "ID": [0, 3, 7],
}

# Write the records as a Delta table under /tmp/student
write_deltalake("/tmp/student", pd.DataFrame.from_dict(student))

In the following sections, we will first scan the Delta table to query its contents in Cypher, and then proceed to copy the data and construct a node table.

Scan the Delta table

LOAD FROM is a Cypher clause that scans a file or object element by element, but doesn’t actually move the data into a Kùzu table.

To scan the Delta table created above, you can do the following:

LOAD FROM '/tmp/student' (file_format='delta') RETURN *;
┌────────┬───────┐
│ name   │ ID    │
│ STRING │ INT64 │
├────────┼───────┤
│ Alice  │ 0     │
│ Bob    │ 3     │
│ Carol  │ 7     │
└────────┴───────┘
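The same scan can be driven from a Python script. The following is a minimal sketch, not the extension's own API: the helper names `delta_scan_query` and `scan_delta` are hypothetical, and `conn` is assumed to behave like a Kùzu Python connection whose `execute` returns a result exposing `has_next()` and `get_next()`.

```python
def delta_scan_query(path: str) -> str:
    """Build the LOAD FROM statement shown above for a Delta table path."""
    if "'" in path:
        # Keep the sketch simple: refuse paths that would break the quoting.
        raise ValueError("path must not contain single quotes")
    return f"LOAD FROM '{path}' (file_format='delta') RETURN *;"


def scan_delta(conn, path: str) -> list:
    """Install and load the extension, scan the table, and collect the rows.

    Assumption: `conn.execute` returns an object with `has_next()` and
    `get_next()`, as a Kùzu Python query result does.
    """
    conn.execute("INSTALL DELTA;")
    conn.execute("LOAD EXTENSION DELTA;")
    result = conn.execute(delta_scan_query(path))
    rows = []
    while result.has_next():
        rows.append(result.get_next())
    return rows
```

Passing a connection created with the Kùzu Python client and the path '/tmp/student' should return the three student rows from the example dataset.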

Copy the Delta table into a node table

You can then use a COPY FROM statement to directly copy the contents of the Delta table into a Kùzu node table.

CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
COPY student FROM '/tmp/student' (file_format='delta');

As with LOAD FROM, the file_format parameter is mandatory in the COPY FROM clause.

// First, create the node table
CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
┌─────────────────────────────────┐
│ result                          │
│ STRING                          │
├─────────────────────────────────┤
│ Table student has been created. │
└─────────────────────────────────┘
COPY student FROM '/tmp/student' (file_format='delta');
┌─────────────────────────────────────────────────┐
│ result                                          │
│ STRING                                          │
├─────────────────────────────────────────────────┤
│ 3 tuples have been copied to the student table. │
└─────────────────────────────────────────────────┘
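When several Delta tables need to be imported, the CREATE NODE TABLE and COPY FROM pair can be generated from a column mapping. This is a small sketch under stated assumptions: `copy_delta_statements` is a hypothetical helper, and it only builds the two statements shown above without executing them.

```python
def copy_delta_statements(table: str, columns: dict, primary_key: str, path: str) -> list:
    """Return the CREATE NODE TABLE and COPY FROM statements for one Delta table.

    `columns` maps column names to Kùzu type names, in declaration order.
    """
    if primary_key not in columns:
        raise ValueError(f"primary key {primary_key!r} is not a declared column")
    if "'" in path:
        # Keep the sketch simple: refuse paths that would break the quoting.
        raise ValueError("path must not contain single quotes")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    create = f"CREATE NODE TABLE {table} ({cols}, PRIMARY KEY({primary_key}));"
    copy = f"COPY {table} FROM '{path}' (file_format='delta');"
    return [create, copy]


statements = copy_delta_statements(
    "student", {"name": "STRING", "ID": "INT64"}, "ID", "/tmp/student"
)
```

Each returned statement can then be passed to the connection's `execute` method in order, since the node table must exist before the copy runs.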

Access Delta tables hosted on S3

Kùzu also supports scanning and copying a Delta table hosted on S3 in the same way as from a local file system. Before reading from S3, you have to configure the connection using the CALL statement.

Supported options

Option name          | Description
s3_access_key_id     | S3 access key ID
s3_secret_access_key | S3 secret access key
s3_endpoint          | S3 endpoint
s3_url_style         | S3 URL style (either vhost or path)
s3_region            | S3 region
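The options above could be turned into CALL statements programmatically. The sketch below assumes the statement takes the form `CALL option='value';`, which this page does not spell out, so treat that syntax as an assumption to check against the Kùzu documentation. The helper name `s3_config_statements` and the example values are hypothetical.

```python
# Option names taken from the table above.
S3_OPTIONS = (
    "s3_access_key_id",
    "s3_secret_access_key",
    "s3_endpoint",
    "s3_url_style",
    "s3_region",
)


def s3_config_statements(**options: str) -> list:
    """Build one CALL statement per supplied S3 option.

    Assumption: options are set with `CALL <option>='<value>';`.
    """
    statements = []
    for name, value in options.items():
        if name not in S3_OPTIONS:
            raise ValueError(f"unsupported option: {name}")
        if name == "s3_url_style" and value not in ("vhost", "path"):
            raise ValueError("s3_url_style must be 'vhost' or 'path'")
        if "'" in value:
            raise ValueError("values must not contain single quotes")
        statements.append(f"CALL {name}='{value}';")
    return statements


# Placeholder values for illustration only.
config = s3_config_statements(s3_region="us-east-1", s3_url_style="path")
```

Validating the option names up front keeps a typo from reaching the database as an unknown setting.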

Requirements on the S3 server API

Feature            | Required S3 API features
Public file reads  | HTTP Range request
Private file reads | Secret key authentication
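To check whether an S3-compatible endpoint honors HTTP Range requests, one option is to ask for a single byte and look for a 206 Partial Content status. This is an illustrative probe using only the Python standard library, not something the extension itself exposes; the function names are hypothetical.

```python
import urllib.request


def range_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Build a GET request for an inclusive byte range of `url`."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    request = urllib.request.Request(url)
    # HTTP byte ranges are inclusive on both ends.
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


def server_honors_range(url: str) -> bool:
    """Return True if the server answers a one-byte range with 206."""
    with urllib.request.urlopen(range_request(url, 0, 0)) as response:
        return response.status == 206
```

A server that ignores the header typically replies with 200 and the full object instead, in which case `server_honors_range` returns False.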

Scan Delta table from S3

Reading or scanning a Delta table that’s on S3 is as simple as reading from regular files:

LOAD FROM 's3://kuzu-sample/sample-delta' (file_format='delta')
RETURN *;

Copy Delta table hosted on S3 into a local node table

Copying from Delta tables on S3 is also as simple as copying from regular files:

CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY(ID));
COPY student FROM 's3://kuzu-sample/student-delta' (file_format='delta');

Limitations

When using the Delta Lake extension in Kùzu, keep the following limitations in mind.

  • Writing (i.e., exporting to) Delta files from Kùzu is currently not supported.