Delta Lake
Usage
The delta extension adds support for scanning and copying from the Delta Lake open-source storage format.
Delta Lake is an open-source storage framework that enables building a format-agnostic Lakehouse architecture.
Using this extension, you can interact with Delta tables from within Kùzu using the LOAD FROM and COPY FROM clauses.
The Delta functionality is not available by default, so you first need to install the DELTA extension by running the following commands:
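As a minimal sketch, the setup can be driven from Kùzu's Python API. It assumes the extension is installed with `INSTALL delta` and loaded with `LOAD EXTENSION delta`, and that `conn` is an open `kuzu.Connection`; the helper name `enable_delta` is hypothetical.

```python
# Cypher statements that install and then load the Delta extension.
# The exact statement text is an assumption based on Kùzu's usual
# extension workflow.
SETUP_STATEMENTS = [
    "INSTALL delta;",
    "LOAD EXTENSION delta;",
]


def enable_delta(conn):
    """Run each setup statement, in order, on an open kuzu.Connection."""
    for statement in SETUP_STATEMENTS:
        conn.execute(statement)
```

With the kuzu package installed, creating a connection via `kuzu.Connection(kuzu.Database("example.kuzu"))` (the path is a placeholder) and passing it to `enable_delta` would run both statements.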
Example dataset
Let’s look at an example dataset to demonstrate how the Delta extension can be used.
First, let’s create a Delta table containing student information using Python and save it in the '/tmp/student' directory:
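One way such a script might look is sketched below. The column names (`name`, `ID`) and the three records are illustrative assumptions rather than a fixed dataset; `write_deltalake` comes from the deltalake package and is given a Pandas DataFrame.

```python
# Illustrative student records; column names and values are assumptions.
STUDENTS = {
    "name": ["Alice", "Bob", "Carol"],
    "ID": [0, 3, 7],
}


def write_student_table(path="/tmp/student"):
    """Write STUDENTS as a Delta table rooted at `path`."""
    # Imported here so the third-party packages are only required
    # at the moment the table is actually written.
    import pandas as pd
    from deltalake import write_deltalake

    # With default settings this raises if a table already exists at `path`.
    write_deltalake(path, pd.DataFrame(STUDENTS))
```

Calling `write_student_table()` once creates the table. To rerun the script, remove the directory first or pass a different path.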
Before running the script, make sure the deltalake Python package is installed (the script also uses Pandas).
In the following sections, we will first scan the Delta table to query its contents in Cypher, and then proceed to copy the data and construct a node table.
Scan the Delta table
LOAD FROM is a Cypher clause that scans a file or object element by element, but doesn’t actually move the data into a Kùzu table.
To scan the Delta table created above, you can do the following:
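A minimal sketch of such a scan, issued through Kùzu's Python API, is shown here. It assumes the Delta extension is already loaded on `conn` and that the table lives at '/tmp/student'; the helper name `scan_rows` is hypothetical.

```python
# LOAD FROM query over the local Delta table; file_format selects the reader.
SCAN_QUERY = "LOAD FROM '/tmp/student' (file_format='delta') RETURN *;"


def scan_rows(conn, query=SCAN_QUERY):
    """Execute the scan and collect every returned row into a list."""
    result = conn.execute(query)
    rows = []
    while result.has_next():
        rows.append(result.get_next())
    return rows
```

Because LOAD FROM only scans, running this leaves the database's own tables unchanged.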
Copy the Delta table into a node table
You can then use a COPY FROM statement to directly copy the contents of the Delta table into a Kùzu node table.
As with LOAD FROM, the file_format parameter is mandatory in the COPY FROM clause.
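The copy could be sketched as follows, assuming the Delta table has a string column `name` and an integer column `ID` (an assumed schema) and that the target node table has to be created before copying into it. The helper name `copy_student_table` is hypothetical.

```python
COPY_STATEMENTS = [
    # Assumed schema: a string name and an integer ID used as primary key.
    "CREATE NODE TABLE student (name STRING, ID INT64, PRIMARY KEY (ID));",
    # file_format is required so Kùzu reads the directory as a Delta table.
    "COPY student FROM '/tmp/student' (file_format='delta');",
]


def copy_student_table(conn):
    """Create the node table, then bulk-copy the Delta table into it."""
    for statement in COPY_STATEMENTS:
        conn.execute(statement)
```

After this runs, a query such as `MATCH (s:student) RETURN s.name, s.ID;` would read the copied rows from the node table rather than from the Delta files.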
Access Delta tables hosted on S3
Kùzu also supports scanning and copying a Delta table hosted on S3 in the same way as from a local file system. Before reading from S3, you have to configure the connection using the CALL statement.
Supported options
Option name | Description |
---|---|
s3_access_key_id | S3 access key id |
s3_secret_access_key | S3 secret access key |
s3_endpoint | S3 endpoint |
s3_url_style | S3 URL style (either vhost or path) |
s3_region | S3 region |
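The options in the table above are set one at a time with CALL. The sketch below builds those statements from a dictionary, assuming the form `CALL option='value';`. The helper name `s3_call_statements` is hypothetical, and the validation is a convenience of this example rather than something Kùzu requires.

```python
# Option names taken from the table of supported options.
SUPPORTED_OPTIONS = (
    "s3_access_key_id",
    "s3_secret_access_key",
    "s3_endpoint",
    "s3_url_style",
    "s3_region",
)


def s3_call_statements(options):
    """Turn {option: value} into one CALL statement per option."""
    statements = []
    for name, value in options.items():
        if name not in SUPPORTED_OPTIONS:
            raise ValueError(f"unsupported S3 option: {name}")
        if name == "s3_url_style" and value not in ("vhost", "path"):
            raise ValueError("s3_url_style must be 'vhost' or 'path'")
        if "'" in value:
            # Keep the generated statement well-formed.
            raise ValueError(f"{name} must not contain a single quote")
        statements.append(f"CALL {name}='{value}';")
    return statements
```

Each returned string can be passed to `conn.execute`. Reading the secret values from environment variables instead of hard-coding them keeps credentials out of source files.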
Requirements on the S3 server API
Feature | Required S3 API features |
---|---|
Public file reads | HTTP Range requests |
Private file reads | Secret key authentication |
Scan Delta table from S3
Reading or scanning a Delta table that’s on S3 is as simple as reading from regular files:
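As a sketch, the only change from a local scan is the path, which becomes an `s3://` URI. The bucket and prefix below are placeholders, and the helper name `delta_scan_query` is hypothetical.

```python
def delta_scan_query(uri):
    """Build a LOAD FROM query for the Delta table located at `uri`."""
    return f"LOAD FROM '{uri}' (file_format='delta') RETURN *;"


# Hypothetical bucket and prefix, used only for illustration.
S3_SCAN_QUERY = delta_scan_query("s3://example-bucket/student")
```

Passing `S3_SCAN_QUERY` to `conn.execute` would scan the remote table, provided the S3 connection options have been configured on that connection.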
Copy Delta table hosted on S3 into a local node table
Copying from Delta tables on S3 is also as simple as copying from regular files:
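A corresponding sketch for the copy, assuming a `student` node table with a matching schema already exists in the local database. The bucket name is a placeholder and `delta_copy_statement` is a hypothetical helper.

```python
def delta_copy_statement(table, uri):
    """Build a COPY FROM statement that loads the Delta table at `uri`."""
    return f"COPY {table} FROM '{uri}' (file_format='delta');"


# Hypothetical bucket and prefix, used only for illustration.
S3_COPY_STATEMENT = delta_copy_statement("student", "s3://example-bucket/student")
```

Executing `S3_COPY_STATEMENT` on a configured connection would pull the rows from S3 into the local node table.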
Limitations
When using the Delta Lake extension in Kùzu, keep the following limitations in mind.
- Writing (i.e., exporting to) Delta files from Kùzu is currently not supported.