Scan data from various sources

Cypher supports the LOAD FROM clause to scan data from various sources. Scanning reads data from a source without inserting it into the database. To insert data into the database, you “copy” it instead; see the Import data section for details.

Scanning data using LOAD FROM is useful to inspect a subset of your data, understand its structure, and perform transformations like rearranging columns.

General usage

Say you have a user.csv file that looks like this:

user.csv
name,age
Adam,30
Karissa,40
Zhang,50
Noura,25

You can scan the file and count the number of rows:

LOAD FROM "user.csv" (header = true)
RETURN COUNT(*);

Because header = true, the header line is not counted, so the result is the four data rows.

┌──────────────┐
│ COUNT_STAR() │
│ INT64        │
├──────────────┤
│ 4            │
└──────────────┘

You can also apply filter predicates via the WHERE clause, like this:

LOAD FROM "user.csv" (header = true)
WHERE age > 25
RETURN COUNT(*);

The above query counts only the rows where the age column is greater than 25.

┌──────────────┐
│ COUNT_STAR() │
│ INT64        │
├──────────────┤
│ 3            │
└──────────────┘

Note that when scanning from CSV files, Kuzu attempts to infer each column's type and auto-cast the values where possible. For example, the age column is cast to INT64.
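If the inferred types are not what you want, the schema can be stated explicitly instead of relying on auto-casting. The query below is a minimal sketch that assumes your Kuzu version supports the LOAD WITH HEADERS form; the column names and types are chosen to mirror the sample user.csv above.

```cypher
// Assumption: LOAD WITH HEADERS is available in your Kuzu version.
// Declare the column names and types up front rather than inferring them.
LOAD WITH HEADERS (name STRING, age INT64) FROM "user.csv" (header = true)
WHERE age > 25
RETURN name, age;
```

Declaring the schema this way can make a scan more predictable when a column's contents are ambiguous, for example numeric IDs that you would rather treat as strings.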

You can reorder the columns by simply returning them in the order you want. The LIMIT keyword can be used to limit the number of rows returned. The example below returns the first two rows, with the age and name columns in the order specified.

LOAD FROM "user.csv" (header = true)
RETURN age, name
LIMIT 2;

┌───────┬─────────┐
│ age   │ name    │
│ INT64 │ STRING  │
├───────┼─────────┤
│ 30    │ Adam    │
│ 40    │ Karissa │
└───────┴─────────┘
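Filtering, projection, and limiting can be combined in a single scan. The sketch below assumes that ORDER BY behaves after LOAD FROM as it does in other Cypher queries; it is meant as an illustration of composing clauses, not as a reference for every supported combination.

```cypher
// Keep only users older than 25, then return the two oldest.
// Assumption: ORDER BY is applied to the scanned rows before LIMIT.
LOAD FROM "user.csv" (header = true)
WHERE age > 25
RETURN name, age
ORDER BY age DESC
LIMIT 2;
```

With the sample file above, this should return Zhang and Karissa, in that order.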

More features

LOAD FROM is a general-purpose clause in Cypher for scanning data from various data sources, including files and in-memory DataFrames. See the detailed documentation below for how to use the LOAD FROM clause with each source.