Skip to content
Blog

Import data from CSV files

You can bulk import data to node and relationship tables from CSV files using the COPY FROM command. It is highly recommended to use COPY FROM if you are creating large databases. You can use COPY FROM to import data into an empty table or to append data to an existing table.

There are a set of CSV configurations you can set during COPY FROM (as well as LOAD FROM). We will cover these parameters below in the CSV Configurations section. This page first covers examples of loading into node and relationship tables. Please see the section about Ignoring Erroneous Rows on how to skip erroneous CSV rows during the import.

Import to node table

Create a node table User as follows:

CREATE NODE TABLE User(name STRING, age INT64, reg_date DATE, PRIMARY KEY (name))

The CSV file user.csv contains the following fields:

name,age,reg_date
Adam,30,2020-06-22
Karissa,40,2019-05-12
...

The following statement will load user.csv into User table.

COPY User FROM "user.csv" (header=true);

Import to relationship table

When loading into a relationship table, Kùzu assumes the first two columns in the file are:

  • FROM Node Column: The primary key of the FROM nodes.
  • TO Node Column: The primary key of the TO nodes.

The rest of the columns correspond to relationship properties.

Create a relationship table Follows using the following Cypher query:

CREATE REL TABLE Follows(FROM User TO User, since DATE)

This reads data from the below CSV file follows.csv:

Adam,Karissa,2010-01-30
Karissa,Michelle,2014-01-30
...

The following statement loads the follows.csv file into a Follows table.

COPY Follows FROM "follows.csv";

Note that the header wasn’t present in the CSV file, hence the header parameter is not set.

To skip the first 3 lines of the CSV file, you can use the SKIP parameter as follows:

COPY Follows FROM "follows.csv" (SKIP=3);

Import multiple files to a single table

It is common practice to divide a large CSV file into several smaller files for cleaner data management. Kùzu can read multiple files with the same structure, consolidating their data into a single node or relationship table. You can specify that multiple files are loaded in the following ways:

Glob pattern

This is similar to the Unix glob pattern, where you specify file paths that match a given pattern. The following wildcard characters are supported:

WildcardDescription
*match any number of any characters (including none)
?match any single character
[abc]match any one of the characters enclosed within the brackets
[a-z]match any one of the characters within the range
COPY User FROM "User*.csv"

List of files

Alternatively, you can just specify a list of files to be loaded.

COPY User FROM ["User0.csv", "User0.csv", "User2.csv"]

CSV Configurations

There are a set of configurations that can be set when importing CSV files, such as whether the CSV file has a header that should be skipped during loading or what the delimiter character between the columns of the CSV is. See below for the list of all supported configurations. These configurations can be manually set by specifying parameters inside ( ) at the end of the COPY FROM clause. Several of the supported configurations, such as the header and delimiter characters, are automatically detected if they are not manually specified at the end of COPY FROM clause. See the Auto Detecting Configurations section for more details how Kùzu automatically detects these configurations.

The following table shows the configuration parameters supported:

ParameterDescriptionDefault Value
HEADERWhether the first line of the CSV file is the header. Can be true or false.false
DELIM or DELIMITER Character that separates different columns in a lines.,
QUOTECharacter to start a string quote."
ESCAPECharacter within string quotes to escape QUOTE and other characters, e.g., a line break.
See the important note below about line breaks lines below.
\
SKIPNumber of rows to skip from the input file0
PARALLELRead CSV files in parallel or nottrue
IGNORE_ERRORSSkips malformed rows in CSV files if set to true. Use SHOW_WARNINGS function to view information about malformed rows. Also see CLEAR_WARNINGS function. See more on Warnings table to inspect skipped rows.false
auto_detectTurn ON/OFF the auto detection of configurations (more details below)true
sample_sizeThe number of sample CSV lines to use when auto detecting CSV configurations (more details below)256

For example, the query below specifies that the CSV delimiter is | and also that the header row exists.

COPY User FROM "user.csv" (HEADER=true, DELIM="|");

Note on Boolean options: Any option that is a Boolean can be enabled or disabled in multiple ways. You can write true, or 1 to enable the option (e.g., (HEADER=true) or (HEADER=1)) , and false or 0 to disable it (e.g., (HEADER=false) or (HEADER=0)). The Boolean value can also be omitted (e.g., by only passing (HEADER)), in which case true is assumed. Finally, the assignment operator = can also be omitted and replaced with space (e.g., (HEADER true) is equivalent to (HEADER=true)).

Auto Detecting Configurations

If any of the following configuration options are not manually specified at the end of the COPY FROM statement, by default Kùzu will try to automatically detect them:

  • HEADER
  • DELIM
  • QUOTE
  • ESCAPE

If you specify a subset of these manually but not the others, then only those that have not been specified will be automatically detected. You can turn off auto-detection by setting (auto_detect=false) as a parameter, in which case Kùzu will default to using the default values for any of the unspecified configurations. For example, consider the example from above again:

COPY User FROM "user.csv" (HEADER=true, DELIM="|");

In this case (which is equivalent to COPY User FROM "user.csv" (HEADER=true, DELIM="|", auto_detect=true)), Kùzu will try to automatically detect the QUOTE and ESCAPE characters. It will not try to automatically detect if the first line is a header line or the DELIM character, since those configurations are manually specified in the query. If instead the query was:

COPY User FROM "user.csv" (HEADER=true, DELIM="|", auto_detect=false);

Then, Kùzu will use the default values of QUOTE and ESCAPE, which are " and " respectively (and use the manually specified configurations for HEADER and DELIM).

sample_size: By default, Kùzu will use the first 256 lines of the CSV file to auto-detect unspecified configurations. If you want to use a different number of lines, you can specify the sample_size parameter.

For interested users, below are more details of how Kùzu automatically tries to detect these configurations.

HEADER auto detection parses the first line of the CSV into columns and checks if each column can be cast to the data type of the target column in the node or rel table that is being copied into. If so, the line is assumed to be a valid “data” line and inserted as a record into the target table. Otherwise, it is assumed to be a header line and skipped.

DELIM, QUOTE, ESCAPE auto detection Kùzu uses the first sample_size lines to auto detect any configuration that has not been manually specified. The possible configurations for different configurations are:

  • DELIM: ,, |, ;, \t.
  • QUOTE: ", ' and (no quote character)
  • ESCAPE: ", ', \ and (no escape character)

For the unspecified configurations, Kùzu considers parsing the samples lines it scans (see the sample_size parameter) for each possible configuration combination and then picks the configuration combination that successfully parses the most lines and with the most consistent number of columns in each row.

Ignore erroneous rows

See the Ignore erroneous rows section for more details.