Data types

Kùzu supports a set of primitive and nested data types. They are used both for node and relationship properties and for the results of expressions. This section lists all built-in data types.

INT8

Size | Description
1 byte | signed one-byte integer

INT16

Size | Description
2 bytes | signed two-byte integer

INT32

Size | Description | Aliases
4 bytes | signed four-byte integer | INT

INT64

Size | Description | Aliases
8 bytes | signed eight-byte integer | SERIAL

INT128

Size | Description
16 bytes | signed sixteen-byte integer

UINT8

Size | Description
1 byte | unsigned one-byte integer

UINT16

Size | Description
2 bytes | unsigned two-byte integer

UINT32

Size | Description
4 bytes | unsigned four-byte integer

UINT64

Size | Description
8 bytes | unsigned eight-byte integer
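
The byte sizes in the integer tables above determine the range of values each type can hold. The following Python sketch computes those ranges; it assumes the usual two's complement representation for signed integers, which this page does not state explicitly, and the helper name int_range is made up for illustration.

```python
def int_range(num_bytes, signed):
    """Return the (min, max) values representable in num_bytes bytes.

    Assumes two's complement for signed integers (an assumption, not
    something stated on this page).
    """
    bits = 8 * num_bytes
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# Type names and sizes are taken from the tables above.
SIGNED = {"INT8": 1, "INT16": 2, "INT32": 4, "INT64": 8, "INT128": 16}
UNSIGNED = {"UINT8": 1, "UINT16": 2, "UINT32": 4, "UINT64": 8}

for name, size in SIGNED.items():
    print(name, int_range(size, signed=True))
for name, size in UNSIGNED.items():
    print(name, int_range(size, signed=False))
```

For example, int_range(1, signed=True) gives (-128, 127), while int_range(1, signed=False) gives (0, 255).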

FLOAT

Size | Description | Aliases
4 bytes | single precision floating-point number | REAL, FLOAT4

DOUBLE

Size | Description | Aliases
8 bytes | double precision floating-point number | FLOAT8
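
To illustrate the practical difference between single and double precision, the Python sketch below round-trips a value through 4-byte and 8-byte encodings using the standard struct module. It assumes IEEE 754 binary32 and binary64 layouts, which is what struct uses; whether Kùzu stores FLOAT and DOUBLE in exactly this layout is an assumption here.

```python
import struct


def round_trip_float32(x):
    """Encode x as a 4-byte IEEE 754 float and decode it again."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_trip_float64(x):
    """Encode x as an 8-byte IEEE 754 double and decode it again."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


value = 0.1
print(struct.calcsize("<f"), round_trip_float32(value))  # 4 bytes, precision is lost
print(struct.calcsize("<d"), round_trip_float64(value))  # 8 bytes, value is preserved
```

The 4-byte round trip returns a slightly different number than the one that went in, which is why DOUBLE is usually the safer choice when precision matters more than storage.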

BOOLEAN

Size | Description
1 byte | true/false

STRUCT

Size | Description
fixed | a dictionary or map where keys are of type STRING

UUID

Size | Description
16 bytes | signed sixteen-byte integer

The UUID data type stores Universally Unique Identifiers (UUIDs) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. Kùzu follows PostgreSQL’s implementation for the UUID format.

Example:

RETURN UUID('A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11') as result;

Output:

---------------------------------------------
| result |
---------------------------------------------
| a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 |
---------------------------------------------
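
The same normalization (uppercase input, lowercase canonical output, 16 bytes of storage) can be reproduced outside the database. The sketch below uses Python's standard uuid module purely as an illustration; it is not Kùzu's implementation.

```python
import uuid

# Same literal as in the example above.
u = uuid.UUID("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11")

canonical = str(u)  # canonical text form is lowercase
print(canonical)    # a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
print(len(u.bytes)) # 16
```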

STRING

Size | Description
variable | variable-length character string

The STRING data type supports UTF-8 encoding.

Example:

RETURN 'Зарегистрируйтесь, σπαθιοῦ, Yen [jɛn], kΩ' AS str;

Output:

---------------------------------------------
| str |
---------------------------------------------
| Зарегистрируйтесь, σπαθιοῦ, Yen [jɛn], kΩ |
---------------------------------------------
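
One consequence of UTF-8 is that the number of characters in a string and the number of bytes needed to store it can differ. The Python sketch below shows this with the string from the example above; it illustrates the encoding itself and makes no claim about how Kùzu lays out strings internally.

```python
text = "Зарегистрируйтесь, σπαθιοῦ, Yen [jɛn], kΩ"

encoded = text.encode("utf-8")

# Non-ASCII characters take more than one byte each in UTF-8,
# so the byte length exceeds the character count.
print(len(text), len(encoded))

# Decoding the bytes restores the original string exactly.
assert encoded.decode("utf-8") == text
```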

NULL

Size | Description
fixed | special value to represent unknown data

NULLs are special values to represent unknown data. Every node/relationship property or result of any expression can be NULL in addition to the non-NULL domain of values they can take. For example, boolean expressions can be true, false or NULL.

The keyword NULL (in any case variation, such as Null or null) can be used to specify a null literal. Some examples of comparisons using NULL are shown below.

Compare a value with NULL:

RETURN 3 = null;

Output:

------------
| 3 = null |
------------
| |
------------

Compare NULL with NULL:

RETURN null = null;

Output:

---------------
| null = null |
---------------
| |
---------------

Both comparisons evaluate to NULL. Kùzu’s CLI returns an empty cell to indicate nulls.
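
The behaviour of the two comparisons above can be modelled in a few lines of Python, using None to stand in for NULL. This is a minimal sketch of the rule "a comparison with an unknown value is itself unknown"; the function name null_aware_equals is hypothetical and not part of Kùzu.

```python
def null_aware_equals(a, b):
    """Equality where None plays the role of NULL.

    If either operand is unknown, the result is unknown (None),
    including when both operands are None.
    """
    if a is None or b is None:
        return None
    return a == b


print(null_aware_equals(3, None))     # None, like RETURN 3 = null
print(null_aware_equals(None, None))  # None, like RETURN null = null
print(null_aware_equals(3, 3))        # True
```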

DATE

Size | Description
4 bytes | year, month, day

DATE is specified in ISO-8601 format (YYYY-MM-DD).

Example:

RETURN date('2022-06-06') as x;

Output:

--------------
| x |
--------------
| 2022-06-06 |
--------------

TIMESTAMP

Size | Description
4 bytes | combination of time and date

TIMESTAMP combines a date and a time (hour, minute, second, fraction of a second) and is formatted according to the ISO-8601 format (YYYY-MM-DD hh:mm:ss[.zzzzzz][+-TT[:tt]]), which specifies the date (YYYY-MM-DD), the time (hh:mm:ss[.zzzzzz]) and a time offset ([+-TT[:tt]]). Only the date part is mandatory. If a time is specified, the fractional-second part [.zzzzzz] and the time offset are optional.

Example:

RETURN timestamp("1970-01-01 00:00:00.004666-10") as x;

Output:

------------------------------
| x |
------------------------------
| 1970-01-01 10:00:00.004666 |
------------------------------
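
In the output above, the -10 offset has been applied: midnight at UTC-10 is shown as 10:00 with no offset. The Python sketch below reproduces that arithmetic with the standard datetime module, on the assumption that the displayed value is the UTC equivalent. The offset is written in the full -10:00 form because older Python versions do not parse the short -10 form.

```python
from datetime import datetime, timezone

# Same instant as in the example above, with the offset spelled out as -10:00.
raw = "1970-01-01 00:00:00.004666-10:00"

parsed = datetime.fromisoformat(raw)

# Convert to UTC, then drop the timezone marker for display.
as_utc = parsed.astimezone(timezone.utc).replace(tzinfo=None)

print(as_utc.isoformat(sep=" "))  # 1970-01-01 10:00:00.004666
```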

INTERVAL

Size | Description | Aliases
4 bytes | date/time difference | DURATION

INTERVAL consists of multiple date parts and represents the total time length of these date parts. Kùzu follows DuckDB’s implementation for the interval format.

Example:

RETURN interval("1 year 2 days") as x;

Output:

-----------------
| x |
-----------------
| 1 year 2 days |
-----------------

BLOB

Size | Description | Aliases
variable | arbitrary binary object | BYTEA

BLOB (Binary Large OBject) allows storage of an arbitrary binary object of up to 4KB in size in Kùzu. The database processes it as binary data because it has no knowledge of what the underlying data represents (e.g. image, video).

Below is an example of how to create a blob object with 4 bytes (188, 189, 186, 170):

RETURN BLOB('\\xBC\\xBD\\xBA\\xAA') as result;

Output:

---------------------------------------------
| result |
---------------------------------------------
| \xBC\xBD\xBA\xAA |
---------------------------------------------
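
To see how the \xNN escapes map to the byte values 188, 189, 186 and 170, the following Python sketch decodes the same literal. The helper parse_hex_escapes is hypothetical and only handles \xNN escapes; it is an illustration, not Kùzu's parser.

```python
import re


def parse_hex_escapes(literal):
    """Turn a string made of \\xNN escapes into raw bytes."""
    pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", literal)
    return bytes(int(pair, 16) for pair in pairs)


# Raw string, so the backslashes are kept literally.
blob = parse_hex_escapes(r"\xBC\xBD\xBA\xAA")

print(list(blob))  # [188, 189, 186, 170]
print(len(blob))   # 4
```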

SERIAL

SERIAL is a logical data type, typically used to create an auto-incrementing column of unique identifiers (similar to AUTO_INCREMENT in some other databases).

Here’s how to use SERIAL on a primary key column when loading a CSV file, person.csv, that has the following values:

Alice
Bob
Carol

CREATE NODE TABLE Person(ID SERIAL, name STRING, PRIMARY KEY(ID));
COPY Person FROM "person.csv";
MATCH (a:Person) RETURN a;

Output:

-------------------------------------------
| a |
-------------------------------------------
| (label:Person, 3:0, {ID:0, name:Alice}) |
-------------------------------------------
| (label:Person, 3:1, {ID:1, name:Bob}) |
-------------------------------------------
| (label:Person, 3:2, {ID:2, name:Carol}) |
-------------------------------------------
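
The ID values 0, 1 and 2 in the output above follow the order of the rows in the CSV file. The Python sketch below mimics that assignment with a simple counter; it assumes IDs start at 0 and increase by 1 per row, as the output suggests, and is not a description of Kùzu's internals.

```python
import csv
import io
import itertools

# Contents of person.csv from the example above.
csv_text = "Alice\nBob\nCarol\n"

# A counter starting at 0 plays the role of the SERIAL column.
serial = itertools.count(start=0)

rows = [
    {"ID": next(serial), "name": record[0]}
    for record in csv.reader(io.StringIO(csv_text))
]

for row in rows:
    print(row)
```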

NODE

Size | Description
fixed | represents a node in a graph

NODE is a logical data type. Internally, NODE is processed as a STRUCT type. A NODE always contains an internal ID field with key _ID and a label field with key _LABEL. The remaining fields are node properties.

Here’s how to return a NODE column:

MATCH (a:User)
RETURN a;

Output:

----------------------------------------------------
| a |
----------------------------------------------------
| {_ID: 0:0, _LABEL: User, name: Adam, age: 30} |
----------------------------------------------------
| {_ID: 0:1, _LABEL: User, name: Karissa, age: 40} |
----------------------------------------------------
| {_ID: 0:2, _LABEL: User, name: Zhang, age: 50} |
----------------------------------------------------
| {_ID: 0:3, _LABEL: User, name: Noura, age: 25} |
----------------------------------------------------

REL

Size | Description
fixed | represents a relationship in a graph

REL is a logical data type. Internally, REL is processed as a STRUCT type. A REL always contains a source ID field with key _SRC, a destination ID field with key _DST, an internal ID field with key _ID and a label field with key _LABEL. The remaining fields are relationship properties.

Here’s how to return a REL column:

MATCH (a:User)-[e:Follows]->(b:User)
RETURN e;

Output:

---------------------------------------------------------
| e |
---------------------------------------------------------
| (0:0)-{_LABEL: Follows, _ID: 2:0, since: 2020}->(0:1) |
---------------------------------------------------------
| (0:0)-{_LABEL: Follows, _ID: 2:1, since: 2020}->(0:2) |
---------------------------------------------------------
| (0:1)-{_LABEL: Follows, _ID: 2:2, since: 2021}->(0:2) |
---------------------------------------------------------
| (0:2)-{_LABEL: Follows, _ID: 2:3, since: 2022}->(0:3) |
---------------------------------------------------------

LIST and ARRAY

Kùzu supports two list-like data types: (i) variable-length lists, simply called LIST, and (ii) fixed-length lists, called ARRAY.

VARIANT

Variant is a data type that can store values of various data types (similar to the sql_variant data type of SQL Server). Currently, it can only be used to store RDF literals in RDFGraphs; that is, you cannot create a regular node or relationship table that holds a column of type VARIANT. When working with RDFGraphs, the val column of the Literals node table stores RDF literal values. RDF literals can have different data types, and Kùzu’s Variant data type can store all of them in a single column. For example, consider the following triples in a Turtle file:

@prefix kz: <http://kuzu.io/rdf-ex#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
kz:Waterloo a kz:City ;
    kz:name "Waterloo" ;
    kz:population 10000 ;
    kz:altitude1 "329.0"^^xsd:decimal .

Suppose that you insert these into an RDFGraph named UniKG. You will get the following values in the val column of the Literals node table UniKG_l:

MATCH (a:UniKG_r)-[p:UniKG_lt]->(o:UniKG_l)
RETURN a.iri, p.iri, o.val;

Output:

-------------------------------------------------------------------------------------------------
| a.iri | p.iri | o.val |
-------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#altitude1 | 329.000000 |
-------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#population | 10000 |
-------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#name | Waterloo |
-------------------------------------------------------------------------------------------------

In the output above, the data types of the values in o.val are as follows (data types are not rendered in the Kùzu CLI’s output):

  • 329.000000 is a double
  • 10000 is an integer
  • "Waterloo" is a string

These different types are stored under the same column val of the Literals node table. The following Kùzu data types can be stored in a Variant column. You can use the cast function to cast a value to a specific data type before storing it in a Variant column (as will be demonstrated in the CREATE statement examples momentarily).

Kùzu Data Type | CAST Function Example
INT8 | cast(2, "INT8")
INT16 | cast(2, "INT16")
INT32 | cast(2, "INT32")
INT64 | cast(2, "INT64")
UINT8 | cast(2, "UINT8")
UINT16 | cast(2, "UINT16")
UINT32 | cast(2, "UINT32")
UINT64 | cast(2, "UINT64")
DOUBLE | cast(4.4, "DOUBLE")
FLOAT | cast(4.4, "FLOAT")
BLOB | cast("\xB2", "BLOB")
BOOL | cast("true", "BOOL")
STRING | cast(123, "STRING")
DATE | cast("2024-01-01", "DATE")
TIMESTAMP | cast("2024-01-01 11:25:30Z+00:00", "TIMESTAMP")
INTERVAL | cast("1 year", "INTERVAL")

For example, the code below adds new triples into an RDFGraph with values of type DATE and DOUBLE, respectively:

CREATE (a:UniKG_r {iri:"http://kuzu.io/rdf-ex#foo"})-[p:UniKG_lt {iri:"http://kuzu.io/rdf-ex#datepredicate"}]->(o:UniKG_l {val:cast("2024-01-01", "DATE")});
CREATE (a:UniKG_r {iri:"http://kuzu.io/rdf-ex#foo"})-[p:UniKG_lt {iri:"http://kuzu.io/rdf-ex#doublepredicate"}]->(o:UniKG_l {val:4.4});

Above, the DATE value needs to be cast explicitly, as in cast("2024-01-01", "DATE"), while 4.4, which is of type DOUBLE, can be provided as is. This is because DATE is not an automatically inferred data type. The above two CREATE statements will create the following two triples:

----------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#foo | http://kuzu.io/rdf-ex#doublepredicate | 4.400000 |
----------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#foo | http://kuzu.io/rdf-ex#datepredicate | 2024-01-01 |
----------------------------------------------------------------------------------
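
The idea of one column holding differently typed values, some inferred and some explicitly cast, can be sketched in Python as a list of tagged values. Everything in this sketch is hypothetical: the Variant class, the infer_variant and cast_variant helpers, and the choice of INT64 as the inferred integer type are assumptions made for illustration, not Kùzu's API or storage format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class Variant:
    """Hypothetical tagged value: a type name plus the payload."""
    type_name: str
    value: Any


def infer_variant(value):
    """Tag a value whose type can be inferred from the literal itself."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return Variant("BOOL", value)
    if isinstance(value, int):
        return Variant("INT64", value)  # assumed default integer type
    if isinstance(value, float):
        return Variant("DOUBLE", value)
    if isinstance(value, str):
        return Variant("STRING", value)
    raise TypeError(f"cannot infer a variant type for {value!r}")


def cast_variant(text, type_name):
    """Explicit cast for types that are not inferred; only DATE is sketched."""
    if type_name == "DATE":
        return Variant("DATE", date.fromisoformat(text))
    raise ValueError(f"unsupported cast target: {type_name}")


# One "column" holding values of four different types.
val_column = [
    infer_variant(329.0),
    infer_variant(10000),
    infer_variant("Waterloo"),
    cast_variant("2024-01-01", "DATE"),  # a plain string would stay a STRING
]

for v in val_column:
    print(v.type_name, v.value)
```

Without the explicit cast, "2024-01-01" would be tagged as a STRING in this sketch, which mirrors why the CREATE statement above needs cast(..., "DATE").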

PATH

PATH is a logical data type. Internally, PATH is processed as a STRUCT type, more specifically, a STRUCT{LIST[NODE], LIST[REL]}. A PATH always contains a nodes field with the key _NODES and a relationships field with the key _RELS.

Return a PATH column

MATCH p = (a:User)-[:Follows]->(b:User)
WHERE a.name = 'Adam' AND b.name = 'Karissa'
RETURN p;

Output:

{_NODES: [{_ID: 0:0, _LABEL: User, name: Adam, age: 30},{_ID: 0:1, _LABEL: User, name: Karissa, age: 40}], _RELS: [(0:0)-{_LABEL: Follows, _ID: 2:0, since: 2020}->(0:1)]}

Access all nodes on a path

MATCH p = (a:User)-[:Follows]->(b:User)
WHERE a.name = 'Adam' AND b.name = 'Karissa'
RETURN nodes(p);

Output:

[{_ID: 0:0, _LABEL: User, name: Adam, age: 30},{_ID: 0:1, _LABEL: User, name: Karissa, age: 40}]

Access all rels on a path

MATCH p = (a:User)-[:Follows]->(b:User)
WHERE a.name = 'Adam' AND b.name = 'Karissa'
RETURN rels(p);

Output:

[(0:0)-{_LABEL: Follows, _ID: 2:0, since: 2020}->(0:1)]
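
The STRUCT{LIST[NODE], LIST[REL]} shape described above can be pictured as a nested dictionary. The Python sketch below rebuilds the path from the examples with plain dicts and lists and defines nodes and rels helpers that mirror the Cypher functions; it is only a model of the structure, with internal IDs written as strings for simplicity.

```python
# The path from the examples above, modelled with plain dicts and lists.
path = {
    "_NODES": [
        {"_ID": "0:0", "_LABEL": "User", "name": "Adam", "age": 30},
        {"_ID": "0:1", "_LABEL": "User", "name": "Karissa", "age": 40},
    ],
    "_RELS": [
        {"_SRC": "0:0", "_DST": "0:1", "_LABEL": "Follows", "_ID": "2:0", "since": 2020},
    ],
}


def nodes(p):
    """Return the list stored under the _NODES key."""
    return p["_NODES"]


def rels(p):
    """Return the list stored under the _RELS key."""
    return p["_RELS"]


print([n["name"] for n in nodes(path)])  # ['Adam', 'Karissa']

# Each relationship connects two nodes on the path via _SRC and _DST.
first_rel = rels(path)[0]
print(first_rel["_SRC"], "->", first_rel["_DST"])  # 0:0 -> 0:1
```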