Merge
For small graphs (a few thousand nodes), the MERGE
Cypher clauses
can be used to insert nodes and
relationships to existing tables in Kùzu. These are similar to SQL’s INSERT
statements, but bear in
mind that they are slower than COPY FROM
, which is optimized for bulk inserts. It’s generally
recommended that the MERGE
clause is only used to do small additions or updates on a sporadic basis.
If you want to do bulk inserts, see the COPY FROM page for instructions for your input format.
MERGE
single nodes or relationships
Say you have an existing node table User
with name
and age
properties and you want to insert a new user.
MATCH (a:User) RETURN a
Initially, there is just one user in the database:
┌────────┬───────┐│ a.name │ a.age ││ STRING │ INT64 │├────────┼───────┤│ Alice │ 25 │└────────┴───────┘
Now you want to insert a new user with the name Bob
and age 30
that was obtained from an
external source.
MERGE (u:User {name: 'Bob', age: 30})
The result will be:
┌────────┬───────┐│ a.name │ a.age ││ STRING │ INT64 │├────────┼───────┤│ Alice │ 25 ││ Bob │ 30 │└────────┴───────┘
The same approach can be used to import relationships — just ensure that the node tables and their data with the primary key values from your source data exist before trying to merge relationships.
MERGE
from DataFrames
If you have a DataFrame whose entire data you want to MERGE
into an existing node or relationship table,
you can use the LOAD FROM
clause in combination with the MERGE
clause.
Let’s see this in action with an example.
import kuzuimport pandas as pd
db = kuzu.Database('test_db')conn = kuzu.Connection(db)
df = pd.DataFrame({ 'name': ['Karissa', 'Rhea', 'James'], 'current_city': ['Seattle', 'New York', 'San Francisco'], 'item': ['smartphone', 'laptop', 'headphones']})
# Create tablesconn.execute("CREATE NODE TABLE Person (name STRING, current_city STRING, PRIMARY KEY (name))")conn.execute("CREATE NODE TABLE Item (name STRING, PRIMARY KEY (name))")conn.execute("CREATE REL TABLE Purchased (FROM Person TO Item)")# Copy data into tablesconn.execute("COPY Person FROM (LOAD FROM df RETURN name, current_city)")conn.execute("COPY Item FROM (LOAD FROM df RETURN item)")conn.execute("COPY Purchased FROM (LOAD FROM df RETURN name, item)")# Query datares = conn.execute("MATCH (p:Person)-[:PURCHASED]->(i:Item) RETURN p.*, i.*")print(res.get_as_df())
Initially, there are 3 rows in the Person
and Item
tables:
p.name p.current_city i.name0 Karissa Seattle smartphone1 Rhea New York laptop2 James San Francisco headphones
Now, say you obtain updated information about the users Karissa and Rhea who purchased new items.
Also, say the user Karissa has moved to a new city, so her current_city
is now Boston
.
df = pd.DataFrame({ 'name': ['Karissa', 'Rhea'], 'current_city': ['Boston', 'New York'], 'item': ['headphones', 'smartphone']})
You can avoid using a for loop by using the LOAD FROM
clause in combination with the MERGE
clause
to merge the rows in the DataFrame into the database, all at once.
conn.execute( """ LOAD FROM df MERGE (p:Person {name: name}) MERGE (i:Item {name: item}) MERGE (p)-[:PURCHASED]->(i) ON MATCH SET p.current_city = current_city ON CREATE SET p.current_city = current_city """)
# Query datares = conn.execute("MATCH (p:Person)-[:PURCHASED]->(i:Item) RETURN p.*, i.*")print(res.get_as_df())
The following steps are performed:
- The
LOAD FROM
clause loads the data from the DataFrame into the database. - The first two
MERGE
clauses merge the data into thePerson
andItem
tables. - The third
MERGE
clause merges the data into thePurchased
relationship table. - The
ON MATCH SET
clause updates thecurrent_city
property for the user Karissa if she is already in the database. - The
ON CREATE SET
clause sets thecurrent_city
property for the user Karissa if she is created (i.e., if she is not already in the database).
The resulting data looks like this:
p.name p.current_city i.name0 Karissa Boston smartphone1 Karissa Boston headphones2 Rhea New York laptop3 Rhea New York smartphone4 James San Francisco headphones