Import NumPy

The .npy format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk.

The primary use case for bulk loading NumPy files is to load large node features or vectors that are stored in .npy format. You can use the COPY FROM statement to import a set of *.npy files into a node table.

Import to node table

Consider a Paper table with an id column, a feature column that is an embedding (vector) with 768 dimensions, a year column and a label column as ground truth. We first define the schema with the following statement:

CREATE NODE TABLE Paper(id INT64, feat FLOAT[768], year INT64, label DOUBLE, PRIMARY KEY(id));

The raw data is stored in .npy format where each column is represented as a NumPy array on disk. The files are specified below:

"node_id.npy", "node_feat_f32.npy", "node_year.npy", "node_label.npy"
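For readers who want to try this without real data, the following is a minimal sketch of how such files could be produced with NumPy. The helper name write_paper_columns, the random values, and the dtype choices (int64 for INT64, float32 for FLOAT, float64 for DOUBLE) are assumptions made for illustration; only the file names and the 768-dimension feature width come from the text above.

```python
import os
import tempfile

import numpy as np


def write_paper_columns(out_dir, num_nodes=4, dim=768, seed=0):
    """Write one .npy file per Paper column and return the paths in DDL order.

    Hypothetical helper: the values are random placeholders, and the dtypes
    are assumed to match the column types in the CREATE NODE TABLE statement.
    """
    rng = np.random.default_rng(seed)
    columns = {
        # id INT64: one integer per node
        "node_id.npy": np.arange(num_nodes, dtype=np.int64),
        # feat FLOAT[768]: a 2-D array with one 768-wide row per node
        "node_feat_f32.npy": rng.random((num_nodes, dim), dtype=np.float32),
        # year INT64
        "node_year.npy": rng.integers(2000, 2024, size=num_nodes, dtype=np.int64),
        # label DOUBLE
        "node_label.npy": rng.integers(0, 10, size=num_nodes).astype(np.float64),
    }
    paths = []
    for name, array in columns.items():
        path = os.path.join(out_dir, name)
        np.save(path, array)  # each file holds exactly one array
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_paper_columns(tmp):
            arr = np.load(path)
            print(os.path.basename(path), arr.dtype, arr.shape)
```

Every array has the same first dimension (one entry per node), so each file lines up with exactly one column of the table.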

We can copy the files with the following statement:

COPY Paper FROM ("node_id.npy", "node_feat_f32.npy", "node_year.npy", "node_label.npy") BY COLUMN;

The number of *.npy files must equal the number of columns in the table, and the files must be listed in the same order as the columns are defined in the DDL.
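These constraints can be checked on the client side before the statement is issued. The sketch below is one possible pre-flight check in Python. The helper build_copy_statement is hypothetical and not part of any library, and the row-count comparison is an extra sanity check assumed here rather than something stated above. It only assembles the COPY statement text and does not execute it.

```python
import os
import tempfile

import numpy as np


def build_copy_statement(table, npy_paths, num_columns):
    """Return a COPY ... BY COLUMN statement after basic sanity checks.

    Hypothetical helper. npy_paths must already be in DDL column order;
    this function cannot verify the order, only the file and row counts.
    """
    if len(npy_paths) != num_columns:
        raise ValueError(
            f"expected {num_columns} .npy files, got {len(npy_paths)}"
        )
    # mmap_mode="r" reads the header and maps the data lazily, so large
    # feature files are not loaded into memory just to inspect their shape.
    row_counts = [np.load(p, mmap_mode="r").shape[0] for p in npy_paths]
    if len(set(row_counts)) > 1:
        raise ValueError(f"row counts differ across files: {row_counts}")
    file_list = ", ".join(f'"{p}"' for p in npy_paths)
    return f"COPY {table} FROM ({file_list}) BY COLUMN;"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Tiny placeholder arrays standing in for the real column files.
        arrays = [
            ("node_id.npy", np.arange(2, dtype=np.int64)),
            ("node_feat_f32.npy", np.zeros((2, 768), dtype=np.float32)),
            ("node_year.npy", np.array([2020, 2021], dtype=np.int64)),
            ("node_label.npy", np.array([0.0, 1.0], dtype=np.float64)),
        ]
        paths = []
        for name, array in arrays:
            path = os.path.join(tmp, name)
            np.save(path, array)
            paths.append(path)
        print(build_copy_statement("Paper", paths, num_columns=4))
```

The resulting string can then be passed to whichever client is used to run statements against the database.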

Ignore erroneous rows

See the Ignore erroneous rows section for more details.