Copy from subquery
You can bulk import the results of a subquery like MATCH ....
by attaching
that query as a subquery of a COPY FROM
statement. Because the COPY FROM
command is part of
the Data Definition Language (DDL) in Kùzu, it follows SQL’s semantics and hence, a subquery
is passed within parentheses ()
that contains a RETURN
clause.
Copy using a subquery is useful when you need to transform data
before inserting it into the database, or if you want to copy data from a LOAD FROM
scan operation
on a data structure that’s already in memory, such as a DataFrame.
Create node/relationship tables
Consider that we have a database of Person
, Product
and HasReward
relationships (where a
person is eligible for a product reward if they have made a certain number of purchases).
CREATE NODE TABLE Person(name STRING, num_purchases INT64, PRIMARY KEY (name));CREATE NODE TABLE Product(name STRING, price DOUBLE, PRIMARY KEY (name));CREATE REL TABLE HasReward(FROM Person TO Product);
COPY FROM
a MATCH
subquery
Now, let’s say we want to reward a particular product (e.g., a gift voucher
) to all the Person
s
who have made more than 10 purchases. We can do this with a MATCH
subquery as follows:
COPY HasReward FROM ( MATCH (p:Person) WHERE p.num_purchases > 10 RETURN p.name, "gift voucher")
COPY FROM
a LOAD FROM
scan subquery
An alternate use case for this feature would be when you want to directly scan data from an existing
object, such as a Pandas DataFrame using LOAD FROM
and use its results as input to the COPY FROM
command. This can be combined with predicate filters as follows:
import kuzuimport pandas as pd
db = kuzu.Database("tmp")conn = kuzu.Connection(db)
df = pd.DataFrame({ "name": ["Adam", "Karissa", "Zhang", "Noura"], "age": [30, 40, 50, 25]})
conn.execute("CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name))")
# Apply a predicate filter while scanning the DataFrame# Pass the results of the scan to the COPY FROM commandconn.execute("COPY Person FROM (LOAD FROM df WHERE age < 30 RETURN *)")
You can similarly use this approach to subset your data, for example, read only a part of your DataFrame, Parquet or CSV file, and then copy that subset into Kùzu.
# Load specific columns onlyconn.execute("COPY Person FROM (LOAD FROM df RETURN name)")