Run graph algorithms
One of the overarching goals of Kùzu is to function as the go-to graph database for data science
use cases. NetworkX is a popular Python library for graph algorithms and data science. In this
section, we demonstrate how easily Kùzu exports subgraphs to the NetworkX format using the
`get_as_networkx()` function in the Python API. In addition, the following two capabilities are
demonstrated:
- Graph Visualization: We visualize subgraphs of interest via Kùzu Explorer.
- PageRank: We compute PageRank on an extracted subgraph, store these values back in Kùzu’s node tables and query them.
The dataset we will use for this exercise is the MovieLens dataset, available here. The small version of the dataset is used, which contains 610 user nodes, 9724 movie nodes, 100863 rates edges, and 3684 tags edges. The schema of the dataset is shown below.
You can download the dataset locally via wget.
Place the CSV files in a directory named `movie_data`, located in the same directory in which you
want the database to be stored.
Insert data into Kùzu
The data is copied to a Kùzu database via the Python API as follows:
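A minimal sketch of such a load step is given here. It assumes `conn` is an open `kuzu.Connection`. The table names (`Movie`, `User`, `Rating`, `Tags`), the column lists, and the CSV file names are illustrative assumptions and need to be adjusted to match the files in `movie_data`.

```python
# Assumed schema: table names, columns and types are illustrative and
# must match the header row of each CSV file.
SCHEMA = [
    "CREATE NODE TABLE Movie (movieId INT64, title STRING, genres STRING, PRIMARY KEY (movieId))",
    "CREATE NODE TABLE User (userId INT64, PRIMARY KEY (userId))",
    "CREATE REL TABLE Rating (FROM User TO Movie, rating DOUBLE, timestamp INT64)",
    "CREATE REL TABLE Tags (FROM User TO Movie, tag STRING, timestamp INT64)",
]

# Assumed file names. Node tables come first because relationships can
# only be copied once their endpoint nodes exist.
CSV_FILES = [
    ("Movie", "movies.csv"),
    ("User", "users.csv"),
    ("Rating", "ratings.csv"),
    ("Tags", "tags.csv"),
]


def load_movielens(conn, data_dir="movie_data"):
    """Create the tables, then bulk-load each CSV with COPY FROM."""
    for statement in SCHEMA:
        conn.execute(statement)
    for table, filename in CSV_FILES:
        conn.execute(f'COPY {table} FROM "{data_dir}/{filename}" (HEADER=true)')


# Usage, assuming the kuzu package is installed:
#   import kuzu
#   conn = kuzu.Connection(kuzu.Database("ml-small_db"))
#   load_movielens(conn)
```

Passing the connection in as an argument keeps the function reusable for both an on-disk database and a throwaway test database.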
Visualize subgraphs in Kùzu Explorer
You can visualize the data in Kùzu Explorer as shown in the previous section. An example is shown below.
Export subgraph to NetworkX
You can extract only the subgraph between users and movies (ignoring tags) and convert it to a
NetworkX graph `G`. This assumes that the `networkx` package is installed via pip.
We output an undirected graph because edge direction is not needed for this PageRank computation: treating each rating as a two-way link lets rank flow between users and movies.
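The export step can be sketched as follows, assuming `conn` is an open `kuzu.Connection` and that the rating relationship table is named `Rating` (an assumed name; use whatever your schema defines). The call to `get_as_networkx()` is the function named above, with `directed=False` requesting an undirected graph.

```python
def export_user_movie_graph(conn):
    """Return the user-movie rating subgraph as an undirected NetworkX graph."""
    # `Rating` is an assumed relationship table name; tag edges are left out
    # simply by not matching them.
    result = conn.execute("MATCH (u:User)-[r:Rating]->(m:Movie) RETURN u, r, m")
    # directed=False asks for an undirected graph instead of a directed one.
    return result.get_as_networkx(directed=False)


# Usage:
#   G = export_user_movie_graph(conn)
#   print(G.number_of_nodes(), G.number_of_edges())
```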
Compute PageRank
We can compute the PageRank of the subgraph `G` using NetworkX’s `pagerank` function.
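With NetworkX the computation is a single call, `pageranks = nx.pagerank(G)`, which returns a dictionary mapping each node to its score. To give some intuition for what that score represents, here is a small pure-Python power iteration on a toy undirected graph. It is a sketch for illustration only, not NetworkX's implementation; the node names and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank(adjacency, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank on an undirected graph given as {node: [neighbors]}."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank sitting on isolated nodes is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {}
        for v in nodes:
            # Each neighbor u passes on an equal share of its own rank.
            received = sum(rank[u] / len(adjacency[u]) for u in adjacency[v])
            new_rank[v] = (1 - damping) / n + damping * (received + dangling / n)
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy user-movie graph (hypothetical IDs): three users rated Movie_10,
# and one of them also rated Movie_20.
toy_graph = {
    "User_1": ["Movie_10"],
    "User_2": ["Movie_10"],
    "User_3": ["Movie_10", "Movie_20"],
    "Movie_10": ["User_1", "User_2", "User_3"],
    "Movie_20": ["User_3"],
}
scores = pagerank(toy_graph)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

In this toy graph the most-rated movie receives the highest score, and the scores sum to 1.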
The movie nodes’ PageRanks along with their IDs can then be put into a Pandas DataFrame as follows:
The PageRanks for the user nodes can be stored in a Pandas DataFrame the same way:
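One way to do this for both node types is sketched below. It assumes `pageranks` is the dictionary returned by `nx.pagerank(G)` and that the exported node names have the form `Movie_<id>` and `User_<id>`; check `list(G.nodes)[:5]` to confirm the naming in your graph. The helper names and the column names `id` and `pagerank` are hypothetical.

```python
def pagerank_rows(pageranks, label):
    """Return (id, pagerank) pairs for nodes named '<label>_<id>'."""
    prefix = f"{label}_"
    return [
        (int(node[len(prefix):]), score)
        for node, score in pageranks.items()
        if node.startswith(prefix)
    ]


def pagerank_frame(pageranks, label):
    """Build a DataFrame with columns `id` and `pagerank` for one node label."""
    import pandas as pd  # imported here so pagerank_rows has no dependencies

    return pd.DataFrame(pagerank_rows(pageranks, label), columns=["id", "pagerank"])


# Usage:
#   movie_df = pagerank_frame(pageranks, "Movie")
#   user_df = pagerank_frame(pageranks, "User")
```

Stripping the label prefix and casting to `int` matters because the IDs must match the integer primary keys of the node tables when the values are written back.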
Write PageRank values back to Kùzu
To write the values back to Kùzu, first update the node table schemas to include a new property, `pagerank`.
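A sketch of the schema update, assuming `conn` is an open `kuzu.Connection` and that the node tables are named `Movie` and `User` (assumed names). The `DOUBLE` type and the default of `0.0` are choices made for this example.

```python
def add_pagerank_property(conn, tables=("Movie", "User")):
    """Add a DOUBLE property `pagerank`, defaulting to 0.0, to each node table."""
    for table in tables:
        conn.execute(f"ALTER TABLE {table} ADD pagerank DOUBLE DEFAULT 0.0")


# Usage:
#   add_pagerank_property(conn)
```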
An important feature of Kùzu is its ability to natively scan Pandas DataFrames in a zero-copy manner. This allows for efficient data transfer between your data in Python and Kùzu. The following code snippet shows how this is done for the movie nodes.
The same can be done for the user nodes.
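The write-back for the movie nodes might look like the sketch below. It assumes a DataFrame with columns `id` and `pagerank`, a `Movie` table keyed on `movieId` (assumed names), and that Kùzu resolves the name `movie_df` inside the query to the DataFrame variable of that name in the calling Python scope.

```python
def write_movie_pageranks(conn, movie_df):
    """Copy the `pagerank` column of `movie_df` onto the matching Movie nodes."""
    # `movie_df` in the query refers to the DataFrame argument above;
    # `id` and `pagerank` are its column names.
    conn.execute(
        """
        LOAD FROM movie_df
        MERGE (m:Movie {movieId: id})
        ON MATCH SET m.pagerank = pagerank
        """
    )


# Usage:
#   write_movie_pageranks(conn, movie_df)
```

For the user nodes, the same pattern applies with the `User` table, its key property, and the user DataFrame substituted in the query.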
Query PageRank values in Kùzu
You can run a query to print the 20 movies with the highest PageRank to check that the upload worked:
And similarly, for the user nodes:
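Both queries can be expressed with one small helper, sketched here under the assumption that `conn` is an open `kuzu.Connection` and that the tables expose the properties `title` (movies) and `userId` (users). The helper name and its parameters are hypothetical.

```python
def top_by_pagerank(conn, label, columns, limit=20):
    """Return the `limit` highest-ranked nodes of `label` as a DataFrame."""
    returned = ", ".join(f"n.{column}" for column in columns)
    query = (
        f"MATCH (n:{label}) RETURN {returned}, n.pagerank "
        f"ORDER BY n.pagerank DESC LIMIT {limit}"
    )
    return conn.execute(query).get_as_df()


# Usage:
#   print(top_by_pagerank(conn, "Movie", ["title"]))
#   print(top_by_pagerank(conn, "User", ["userId"]))
```

If every returned row shows the default value, the write-back step did not match any nodes, which usually points to a mismatch between the DataFrame IDs and the table's primary key.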
Further work
You’ve now seen how to use NetworkX to run algorithms on a Kùzu graph, and move data back and forth between Kùzu and Python.
You can perform many other computations in NetworkX and store their results in Kùzu. See the tutorial notebook on Google Colab to try it for yourself!