Data types

Kùzu supports a set of primitive and nested data types. They are used for node and relationship properties, and they also describe the output types of expressions. This section lists all built-in data types.

INT8

| Size | Description |
|------|-------------|
| 1 byte | signed one-byte integer |

INT16

| Size | Description |
|------|-------------|
| 2 bytes | signed two-byte integer |

INT32

| Size | Description | Aliases |
|------|-------------|---------|
| 4 bytes | signed four-byte integer | INT |

INT64

| Size | Description | Aliases |
|------|-------------|---------|
| 8 bytes | signed eight-byte integer | SERIAL |

INT128

| Size | Description |
|------|-------------|
| 16 bytes | signed sixteen-byte integer |

UINT8

| Size | Description |
|------|-------------|
| 1 byte | unsigned one-byte integer |

UINT16

| Size | Description |
|------|-------------|
| 2 bytes | unsigned two-byte integer |

UINT32

| Size | Description |
|------|-------------|
| 4 bytes | unsigned four-byte integer |

UINT64

| Size | Description |
|------|-------------|
| 8 bytes | unsigned eight-byte integer |

FLOAT

| Size | Description | Aliases |
|------|-------------|---------|
| 4 bytes | single precision floating-point number | REAL, FLOAT4 |

DOUBLE

| Size | Description | Aliases |
|------|-------------|---------|
| 8 bytes | double precision floating-point number | FLOAT8 |

DECIMAL

| Size | Description |
|------|-------------|
| variable | arbitrary fixed precision decimal number |

For numbers where exact precision is required, the DECIMAL data type can be used. The DECIMAL type is specified as DECIMAL(precision, scale), where precision is the total number of digits and scale is the number of digits to the right of the decimal point.

Internally, a decimal is represented as an integer whose width depends on the specified precision:

| Precision | Internal | Size (bytes) |
|-----------|----------|--------------|
| 1-4 | INT16 | 2 |
| 5-9 | INT32 | 4 |
| 10-18 | INT64 | 8 |
| 19-38 | INT128 | 16 |
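
The precision table above can be read as a simple lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the idea that the stored integer is the value multiplied by 10^scale is an assumption based on the statement that decimals are represented as integers.

```python
from decimal import Decimal
from typing import Tuple


def decimal_storage(precision: int) -> Tuple[str, int]:
    """Return (internal integer type, size in bytes) for a DECIMAL precision."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 4:
        return ("INT16", 2)
    if precision <= 9:
        return ("INT32", 4)
    if precision <= 18:
        return ("INT64", 8)
    return ("INT128", 16)


def to_unscaled(text: str, scale: int) -> int:
    """Assumed encoding: shift the decimal point right by `scale` digits.

    Any digits beyond `scale` are truncated by int() in this sketch.
    """
    return int(Decimal(text).scaleb(scale))


print(decimal_storage(5))        # type and size for DECIMAL(5, 2)
print(to_unscaled("127.3", 2))   # integer that would represent 127.30
```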

You can explicitly cast a number (either integer or float) to a DECIMAL as follows:

RETURN CAST(127.3, "DECIMAL(5, 2)") AS result;

Output:

┌───────────────┐
│ result │
│ DECIMAL(5, 2) │
├───────────────┤
│ 127.30 │
└───────────────┘

Note that if you attempt to cast with a precision or scale that is too small, an overflow exception will be raised:

RETURN CAST(127.3, "DECIMAL(4, 2)");
Error: Overflow exception: To Decimal Cast Failed: 127.300000 is not in DECIMAL(4, 2) range
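
To see why the second cast overflows, note that DECIMAL(precision, scale) leaves precision - scale digits for the integer part. The Python sketch below reproduces that range check with the standard decimal module; `fits_decimal` is a hypothetical helper, not part of Kùzu, and the rounding it applies may differ from Kùzu's own.

```python
from decimal import Decimal


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Check whether `value` fits in DECIMAL(precision, scale)."""
    # Round to `scale` fractional digits, e.g. 127.3 -> 127.30 for scale 2.
    quantized = Decimal(value).quantize(Decimal(1).scaleb(-scale))
    # Only precision - scale digits remain for the integer part.
    limit = Decimal(10) ** (precision - scale)
    return abs(quantized) < limit


print(fits_decimal("127.3", 5, 2))  # three integer digits available
print(fits_decimal("127.3", 4, 2))  # only two integer digits available
```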

BOOLEAN

| Size | Description |
|------|-------------|
| 1 byte | true/false |

UUID

| Size | Description |
|------|-------------|
| 16 bytes | signed sixteen-byte integer |

The UUID data type stores Universally Unique Identifiers (UUIDs) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. Kùzu follows PostgreSQL’s implementation for the UUID format.

Example:

RETURN UUID('A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11') as result;

Output:

┌──────────────────────────────────────┐
│ result │
│ UUID │
├──────────────────────────────────────┤
│ a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 │
└──────────────────────────────────────┘
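
The relationship between the textual form and the 16-byte representation can be checked outside Kùzu. This Python sketch uses the standard uuid module, which also follows RFC 4122; it only illustrates the format and says nothing about Kùzu's internal layout.

```python
import uuid

u = uuid.UUID("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11")

canonical = str(u)   # lowercase, hyphenated text form
raw = u.bytes        # the same value as 16 raw bytes
as_int = u.int       # the same value as a 128-bit integer

print(canonical)
print(len(raw), as_int.bit_length())
```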

STRING

| Size | Description |
|------|-------------|
| variable | variable-length character string |

The STRING data type supports UTF-8 encoding.

Example:

RETURN 'Зарегистрируйтесь, σπαθιοῦ, Yen [jɛn], kΩ' AS str;

Output:

┌───────────────────────────────────────────┐
│ str │
│ STRING │
├───────────────────────────────────────────┤
│ Зарегистрируйтесь, σπ... │
└───────────────────────────────────────────┘
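
Because UTF-8 is a variable-width encoding, the number of characters in a string and the number of bytes needed to store it can differ. The Python sketch below shows this for the example string; it is a general property of UTF-8 and not a statement about how Kùzu measures string length.

```python
text = "Зарегистрируйтесь, σπαθιοῦ, Yen [jɛn], kΩ"

encoded = text.encode("utf-8")

# Non-ASCII characters take more than one byte each in UTF-8.
print(len(text), "characters")
print(len(encoded), "bytes")

# Decoding the bytes gives back the original string unchanged.
assert encoded.decode("utf-8") == text
```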

NULL

| Size | Description |
|------|-------------|
| fixed | special value to represent unknown data |

NULLs are special values to represent unknown data. Every node/relationship property or result of any expression can be NULL in addition to the non-NULL domain of values they can take. For example, boolean expressions can be true, false or NULL.

The keyword NULL (in any of its case variations, such as Null or null) specifies a null literal. Some examples of comparisons using NULL are shown below.

Compare a value with NULL:

RETURN 3 = null;

Output:

┌────────────┐
│ EQUALS(3,) │
│ BOOL │
├────────────┤
│ │
└────────────┘

Compare NULL with NULL:

RETURN null = null;

Output:

┌───────────┐
│ EQUALS(,) │
│ BOOL │
├───────────┤
│ │
└───────────┘

Kùzu’s CLI returns an empty cell to indicate nulls.
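
Both comparisons above return NULL rather than true or false, because comparing against an unknown value gives an unknown result. A minimal Python sketch of that rule, using None to stand in for NULL (the function name is hypothetical):

```python
from typing import Optional


def null_aware_equals(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """Equality that propagates NULL (None) instead of returning a boolean."""
    if a is None or b is None:
        return None  # unknown compared with anything is unknown
    return a == b


print(null_aware_equals(3, None))     # mirrors RETURN 3 = null
print(null_aware_equals(None, None))  # mirrors RETURN null = null
print(null_aware_equals(3, 3))
```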

DATE

| Size | Description |
|------|-------------|
| 4 bytes | year, month, day |

DATE is specified in ISO-8601 format (YYYY-MM-DD).

Example:

RETURN date('2022-06-06') as x;

Output:

┌────────────┐
│ x │
│ DATE │
├────────────┤
│ 2022-06-06 │
└────────────┘

TIMESTAMP

| Size | Description |
|------|-------------|
| 4 bytes | combination of time and date |

TIMESTAMP combines a date and a time (hour, minute, second, fraction of a second) and is formatted according to the ISO-8601 format (YYYY-MM-DD hh:mm:ss[.zzzzzz][+-TT[:tt]]), which specifies the date (YYYY-MM-DD), time (hh:mm:ss[.zzzzzz]) and a time offset [+-TT[:tt]]. Only the date part is mandatory. If a time is specified, the fractional-second part [.zzzzzz] and the time offset are optional.

Example:

RETURN timestamp("1970-01-01 00:00:00.004666-10") as x;

Output:

┌────────────────────────────┐
│ x │
│ TIMESTAMP │
├────────────────────────────┤
│ 1970-01-01 10:00:00.004666 │
└────────────────────────────┘
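
In the output above, the -10 offset has been applied and the value is shown as 10:00:00 on the same day, which is the equivalent UTC time. The Python sketch below reproduces that arithmetic with the standard datetime module. The offset is written as -10:00 because older Python versions do not accept the shorter -10 form.

```python
from datetime import datetime, timezone

# Midnight plus 4666 microseconds, at an offset of ten hours behind UTC.
local = datetime.fromisoformat("1970-01-01 00:00:00.004666-10:00")

# Shift to UTC and drop the offset, leaving a plain date and time.
utc = local.astimezone(timezone.utc).replace(tzinfo=None)

print(utc.isoformat(sep=" "))
```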

INTERVAL

| Size | Description | Aliases |
|------|-------------|---------|
| 4 bytes | date/time difference | DURATION |

INTERVAL consists of multiple date parts and represents the total time length of these date parts. Kùzu follows DuckDB’s implementation for the interval format.

Example:

RETURN interval("1 year 2 days") as x;

Output:

┌───────────────┐
│ x │
│ INTERVAL │
├───────────────┤
│ 1 year 2 days │
└───────────────┘

STRUCT

A STRUCT is a mapping of key-value pairs where the keys are of the type STRING. STRUCT is a fixed-size data type so values with the same STRUCT type must contain the same set of key-value pairs. You can think of a STRUCT column as a nested single column over multiple other columns.

| Data Type | DDL definition |
|-----------|----------------|
| STRUCT | STRUCT(a INT64, b INT64) |

To construct a STRUCT, provide a mapping of keys to values as follows:

RETURN {first: 'Adam', last: 'Smith'};

Output:

┌───────────────────────────────────┐
│ STRUCT_PACK(first,last) │
│ STRUCT(first STRING, last STRING) │
├───────────────────────────────────┤
│ {first: Adam, last: Smith} │
└───────────────────────────────────┘

You can extract a value from a STRUCT using the dot notation:

WITH {first: 'Adam', last: 'Smith'} AS full_name
RETURN full_name.first AS first_name;

Output:

┌────────────┐
│ first_name │
│ STRING │
├────────────┤
│ Adam │
└────────────┘

Alternatively, you can use the struct_extract() function:

WITH {first:'Adam', last: 'Smith'} AS full_name
RETURN struct_extract(full_name, 'first') AS first_name;

Functions that work on STRUCTs are documented in the function reference.

MAP

A MAP is a dictionary of key-value pairs where all keys have the same type and all values have the same type. MAP is similar to STRUCT in that it is an ordered list of mappings. However, MAP does not need to have the same keys present for each row, and is thus more suitable when the schema of an entity is unknown beforehand or when the schema varies per row.

MAPs must have a single type for all keys, and a single type for all values. Additionally, keys of a MAP do not need to be STRINGs like they do in a STRUCT.

| Data Type | DDL definition |
|-----------|----------------|
| MAP | MAP(STRING, INT64) |

To construct a MAP, provide a list of keys and a list of values. The two lists must have the same length.

Example:

RETURN map([1, 2], ['a', 'b']) AS m;

Output:

┌────────────────────┐
│ m │
│ MAP(INT64, STRING) │
├────────────────────┤
│ {1=a, 2=b} │
└────────────────────┘
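
The pairing of the two lists is positional: the first key goes with the first value, and so on. A minimal Python sketch of the same construction, with a hypothetical `make_map` helper that enforces the equal-length requirement:

```python
def make_map(keys, values):
    """Pair keys with values by position, like map([1, 2], ['a', 'b'])."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


m = make_map([1, 2], ["a", "b"])
print(m)
```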

Functions that work on MAPs are documented in the function reference.

UNION

Similar to C++ std::variant, UNION is a nested data type that can hold one of several alternative values with different types. The value under the key "tag" identifies the alternative currently held by the UNION.

Internally, UNIONs are implemented as STRUCTs with "tag" as one of the keys.

| Data Type | DDL definition |
|-----------|----------------|
| UNION | UNION(price FLOAT, note STRING) |

Consider the following CSV file:

demo.csv
1
aa

Example:

CREATE NODE TABLE demo(a SERIAL, b UNION(num INT64, str STRING), PRIMARY KEY(a));
COPY demo from "demo.csv";
MATCH (d:demo) RETURN d.b;

Output:

┌──────────────────────────────┐
│ d.b │
│ UNION(num INT64, str STRING) │
├──────────────────────────────┤
│ 1 │
│ aa │
└──────────────────────────────┘
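
The tagged-value idea behind UNION can be sketched in Python. Everything here is illustrative: the `TaggedUnion` class and `parse_field` function are hypothetical, and trying the INT64 member before falling back to STRING is an assumption made for this sketch, not a description of how Kùzu's COPY chooses a member.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TaggedUnion:
    tag: str                 # name of the member currently held
    value: Union[int, str]   # the held value


def parse_field(raw: str) -> TaggedUnion:
    """Assumed rule: use the 'num' member if the text is an integer."""
    try:
        return TaggedUnion("num", int(raw))
    except ValueError:
        return TaggedUnion("str", raw)


for line in ["1", "aa"]:     # the two rows of demo.csv
    print(parse_field(line))
```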

Functions that work on UNION data types are documented in the function reference.

BLOB

| Size | Description | Aliases |
|------|-------------|---------|
| variable | arbitrary binary object | BYTEA |

BLOB (Binary Large OBject) allows storage of an arbitrary binary object of up to 4KB in size in Kùzu. The database processes it as binary data because it has no knowledge of what the underlying data represents (e.g. image, video).

Below is an example of how to create a blob object with 4 bytes (188, 189, 186, 170):

RETURN BLOB('\\xBC\\xBD\\xBA\\xAA') as result;

Output:

┌──────────────────┐
│ result │
│ BLOB │
├──────────────────┤
│ \xBC\xBD\xBA\xAA │
└──────────────────┘
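
Each \xHH escape in the literal stands for one byte written in hexadecimal. The Python sketch below decodes the same escapes to confirm the four byte values listed above; `parse_blob_literal` is a hypothetical helper that only handles strings made up entirely of such escapes.

```python
import re


def parse_blob_literal(literal: str) -> bytes:
    # Collect the two hex digits that follow each backslash-x escape.
    pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", literal)
    return bytes(int(pair, 16) for pair in pairs)


blob = parse_blob_literal(r"\xBC\xBD\xBA\xAA")
print(len(blob), list(blob))
```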

SERIAL

SERIAL is a logical data type used for creating an auto-incrementing sequence of numbers, typically used as a unique column identifier, similar to the AUTO_INCREMENT feature supported by some other databases.

Using SERIAL as primary key column in node tables

person.csv
Alice
Bob
Carol
Dan

CREATE NODE TABLE Person(id SERIAL, name STRING, PRIMARY KEY(id));
COPY Person FROM 'person.csv';
MATCH (a:Person) RETURN a.*;

Output:

┌────────┬────────┐
│ a.id │ a.name │
│ SERIAL │ STRING │
├────────┼────────┤
│ 0 │ Alice │
│ 1 │ Bob │
│ 2 │ Carol │
│ 3 │ Dan │
└────────┴────────┘
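
The numbering in the output above starts at 0 and increases by one per row. A minimal Python sketch of that behaviour, using a counter to stand in for the SERIAL column:

```python
from itertools import count

names = ["Alice", "Bob", "Carol", "Dan"]   # rows of person.csv

serial = count(start=0)                     # 0, 1, 2, ...
rows = [(next(serial), name) for name in names]

for row in rows:
    print(row)
```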

Using SERIAL for properties in relationship tables

You can create relationship tables that have a SERIAL property column. For example, consider a scenario where you want to auto-generate a unique transaction ID for each transfer between users.

CREATE REL TABLE Transfer (from User to User, trx_id SERIAL);

NODE

| Size | Description |
|------|-------------|
| fixed | represents a node in a graph |

NODE is a logical data type. Internally, NODE is processed as a STRUCT type. A NODE always contains an internal ID field with key _ID and a label field with key _LABEL. The remaining fields are node properties.

Here’s how to return a NODE column for a file person.csv:

CREATE NODE TABLE Person(id SERIAL, name STRING, age INT64, PRIMARY KEY(id));
COPY Person FROM 'person.csv';
MATCH (a:Person) RETURN a;

Output:

┌─────────────────────────────────────────────────────────┐
│ a │
│ NODE │
├─────────────────────────────────────────────────────────┤
│ {_ID: 0:0, _LABEL: Person, id: 0, name: Alice, age: 30} │
│ {_ID: 0:1, _LABEL: Person, id: 1, name: Bob, age: 20} │
│ {_ID: 0:2, _LABEL: Person, id: 2, name: Carol, age: 25} │
│ {_ID: 0:3, _LABEL: Person, id: 3, name: Dan, age: 28} │
└─────────────────────────────────────────────────────────┘

REL

| Size | Description |
|------|-------------|
| fixed | represents a relationship in a graph |

REL is a logical type that represents a relationship (i.e., an edge). Internally, REL is processed as a STRUCT type. A REL always contains a source ID field with key _SRC, a destination ID field with key _DST, an internal ID field with key _ID and a label field with key _LABEL. The remaining fields are relationship properties.

Here’s how to return a relationship column that’s of type REL:

MATCH (a:Person)-[r:Follows]->(b:Person)
RETURN r;

Output:

┌───────────────────────────────────────────────┐
│ r │
│ REL │
├───────────────────────────────────────────────┤
│ (0:0)-{_LABEL: Follows, _ID: 1:0, since: 2... │
│ (0:1)-{_LABEL: Follows, _ID: 1:1, since: 2... │
│ (0:2)-{_LABEL: Follows, _ID: 1:2, since: 2... │
│ (0:3)-{_LABEL: Follows, _ID: 1:3, since: 2... │
└───────────────────────────────────────────────┘

RECURSIVE_REL

RECURSIVE_REL is a logical type that represents recursive relationships, i.e., paths of arbitrary length. Internally, RECURSIVE_REL is processed as a STRUCT type, more specifically, a STRUCT{LIST[NODE], LIST[REL]}. A RECURSIVE_REL always contains a nodes field with the key _NODES and a relationships field with the key _RELS.

Return a column that’s of type RECURSIVE_REL

MATCH p = (a:User)-[:Follows]->(b:User)
WHERE a.name = 'Adam' AND b.name = 'Karissa'
RETURN p;

Output:

{_NODES: [{_ID: 0:0, _LABEL: User, name: Adam, age: 30},{_ID: 0:1, _LABEL: User, name: Karissa, age: 40}], _RELS: [(0:0)-{_LABEL: Follows, _ID: 2:0, since: 2020}->(0:1)]}

Access all nodes on a recursive relationship

MATCH p = (a:Person)-[:Follows]->(b:Person)
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN nodes(p);

Output:

┌─────────────────────────────────────────────────────────────────────────────────┐
│ NODES(p) │
│ NODE[] │
├─────────────────────────────────────────────────────────────────────────────────┤
│ [{_ID: 0:0, _LABEL: Person, name: Alice},{_ID: 0:1, _LABEL: Person, name: Bob}] │
└─────────────────────────────────────────────────────────────────────────────────┘

Access all relationships on a recursive relationship

MATCH p = (a:Person)-[:Follows]->(b:Person)
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN rels(p);

Output:

┌─────────────────────────────────────────────────────────┐
│ RELS(p) │
│ REL[] │
├─────────────────────────────────────────────────────────┤
│ [(0:0)-{_LABEL: Follows, _ID: 1:0, since: 2024}->(0:1)] │
└─────────────────────────────────────────────────────────┘
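
The STRUCT{LIST[NODE], LIST[REL]} shape described above can be mirrored with plain Python dictionaries. This is only a sketch of the data layout: the `nodes` and `rels` helpers are hypothetical stand-ins for the Cypher functions of the same name, and the values are copied from the example output.

```python
path = {
    "_NODES": [
        {"_ID": "0:0", "_LABEL": "Person", "name": "Alice"},
        {"_ID": "0:1", "_LABEL": "Person", "name": "Bob"},
    ],
    "_RELS": [
        {"_SRC": "0:0", "_DST": "0:1", "_LABEL": "Follows",
         "_ID": "1:0", "since": 2024},
    ],
}


def nodes(p):
    """Return the list stored under the _NODES key."""
    return p["_NODES"]


def rels(p):
    """Return the list stored under the _RELS key."""
    return p["_RELS"]


print([n["name"] for n in nodes(path)])
print([(r["_SRC"], r["_DST"]) for r in rels(path)])
```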

LIST and ARRAY

Kùzu supports two list-like data types: (i) variable-length lists, simply called LIST, and (ii) fixed-length lists, called ARRAY.

VARIANT

VARIANT is a data type that can store values of various data types (similar to the sql_variant data type of SQL Server). Currently, it can only be used to store RDF literals in RDFGraphs; that is, you cannot create a regular node or relationship table that holds a column of type VARIANT. When working with RDFGraphs, the val column of the Literals node table stores RDF literal values. Because RDF literals can have different data types, this column is of type VARIANT. For example, consider the following triples in a Turtle file:

example.ttl
@prefix kz: <http://kuzu.io/rdf-ex#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
kz:Waterloo a kz:City ;
kz:name "Waterloo" ;
kz:population 10000 ;
kz:altitude1 "329.0"^^xsd:decimal .

Suppose that you insert these into an RDFGraph named UniKG. You will get the following values in the val column of the Literals node table UniKG_l:

MATCH (a:UniKG_r)-[p:UniKG_lt]->(o:UniKG_l)
RETURN a.iri, p.iri, o.val;

Output:

┌────────────────────────────────┬──────────────────────────────────┬─────────────┐
│ a.iri │ p.iri │ o.val │
│ STRING │ STRING │ RDF_VARIANT │
├────────────────────────────────┼──────────────────────────────────┼─────────────┤
│ http://kuzu.io/rdf-ex#Waterloo │ http://kuzu.io/rdf-ex#altitude1 │ 329.000000 │
│ http://kuzu.io/rdf-ex#Waterloo │ http://kuzu.io/rdf-ex#population │ 10000 │
│ http://kuzu.io/rdf-ex#Waterloo │ http://kuzu.io/rdf-ex#name │ Waterloo │
└────────────────────────────────┴──────────────────────────────────┴─────────────┘

In the output above the data types of the values in o.val are as follows:

  • 329.000000 is a double
  • 10000 is an integer
  • "Waterloo" is a string

These different types are stored under the same column val of the Literals node table. The following Kùzu data types can be stored in a VARIANT column. You can use the CAST function to cast a value to a specific data type before storing it in a VARIANT column (as will be demonstrated in the CREATE statement examples momentarily).

| Kùzu Data Type | CAST Function Example |
|----------------|-----------------------|
| INT8 | CAST(2, "INT8") |
| INT16 | CAST(2, "INT16") |
| INT32 | CAST(2, "INT32") |
| INT64 | CAST(2, "INT64") |
| UINT8 | CAST(2, "UINT8") |
| UINT16 | CAST(2, "UINT16") |
| UINT32 | CAST(2, "UINT32") |
| UINT64 | CAST(2, "UINT64") |
| DOUBLE | CAST(4.4, "DOUBLE") |
| FLOAT | CAST(4.4, "FLOAT") |
| BLOB | CAST("\\xB2", "BLOB") |
| BOOL | CAST("true", "BOOL") |
| STRING | CAST(123, "STRING") |
| DATE | CAST("2024-01-01", "DATE") |
| TIMESTAMP | CAST("2024-01-01 11:25:30Z+00:00", "TIMESTAMP") |
| INTERVAL | CAST("1 year", "INTERVAL") |

For example, the code below adds new triples into an RDFGraph with types DATE and DOUBLE, respectively:

CREATE (a:UniKG_r {iri:"http://kuzu.io/rdf-ex#foo"})-[p:UniKG_lt {iri:"http://kuzu.io/rdf-ex#datepredicate"}]->(o:UniKG_l {val:CAST("2024-01-01", "DATE")});
CREATE (a:UniKG_r {iri:"http://kuzu.io/rdf-ex#foo"})-[p:UniKG_lt {iri:"http://kuzu.io/rdf-ex#doublepredicate"}]->(o:UniKG_l {val:4.4});

The DATE type needs to be cast explicitly, as in CAST("2024-01-01", "DATE"), while 4.4, which is of type DOUBLE, can be provided as is. This is because DATE is not an automatically inferred data type. The above two CREATE statements will create the following two triples:

┌────────────────────────────────┬───────────────────────────────────────┬─────────────┐
│ a.iri │ p.iri │ o.val │
│ STRING │ STRING │ RDF_VARIANT │
├────────────────────────────────┼───────────────────────────────────────┼─────────────┤
│ http://kuzu.io/rdf-ex#foo │ http://kuzu.io/rdf-ex#doublepredicate │ 4.400000 │
│ http://kuzu.io/rdf-ex#foo │ http://kuzu.io/rdf-ex#datepredicate │ 2024-01-01 │
└────────────────────────────────┴───────────────────────────────────────┴─────────────┘