Skip to content
Blog

Import data

There are multiple ways to import data in Kùzu. The only prerequisite for inserting data into a database is that you first create a graph schema, i.e., the structure of your node and relationship tables.

For small graphs (a few thousand nodes), the CREATE and MERGE Cypher clauses can be used to insert nodes and relationships. These are similar to SQL’s INSERT statements, but bear in mind that they are slower than the bulk import options shown below. The CREATE/MERGE clauses are intended to do small additions or updates on a sporadic basis.

In general, the recommended approach is to use COPY FROM (rather than creating or merging nodes one by one), for larger graphs of millions of nodes and beyond. This is the fastest way to bulk insert data into Kùzu.

COPY FROM CSV

The COPY FROM command is used to bulk import data from a CSV file into a node or relationship table. See the linked card below for more information and examples.

COPY FROM Parquet

Similar to CSV, the COPY FROM command is used to bulk import data from a Parquet file into a node or relationship table. See the linked card below for more information and examples.

COPY FROM NumPy

Importing from NumPy is a specific use case that allows you to import numeric data from a NumPy file into a node table.

COPY FROM subquery results

You can copy data from a subquery’s results into a node or relationship table. This is useful when you need to modify existing data and re-insert it into the database, or if youw want to copy data from a LOAD FROM scan operation on another data format.

COPY FROM DataFrames

You can copy data from a Pandas or Polars DataFrame into a node or relationship table. This is useful when you are already doing your data transformations in either Pandas or Polars DataFrames and want to directly insert results from their columns into a Kùzu table.

COPY FROM JSON

Using Kùzu’s JSON extension, you can copy data directly from JSON files that store nested data. This is useful when you have semi-structured data that you want to analyze as a graph.

COPY FROM a partial subset

In certain cases, you may only want to partially fill your Kùzu table using the data from your input source. This is common in cases where you want to create extra columns during DDL that will hold a default value, and only a subset of the Kùzu table’s columns are filled during the COPY operation.

This is better illustrated with an example. Consider a case where you have a CSV file person.csv that has 3 columns: id, name and age. Assume you have a data model where you want to attach and addtional property to a Person node, i.e., the address they live in, initially populated as nulls but to be filled in with appropriate values at a later time.

The DDL for the Person node table is as follows:

CREATE NODE TABLE Person(id INT64, name STRING, age INT64, address STRING, PRIMARY KEY(id));

However, the given CSV file to COPY FROM has only 3 columns — the address information will be obtained from another source.

In such a case, you can explicitly tell the COPY pipeline to partially fill the node table and only map the named columns from the CSV file as follows:

COPY Person(id, name, age) FROM 'person.csv'

This will only copy the id, name and age columns from the CSV file, while filling the fourth column, address with null values. These values can be populated at a later time via MERGE.