Data types
Kùzu supports a set of primitive and nested data types. They are used for node and relationship properties and for the values that expressions produce. This section lists all built-in data types.
INT8
Size | Description |
---|---|
1 byte | signed one-byte integer |
INT16
Size | Description |
---|---|
2 bytes | signed two-byte integer |
INT32
Size | Description | Aliases |
---|---|---|
4 bytes | signed four-byte integer | INT |
INT64
Size | Description | Aliases |
---|---|---|
8 bytes | signed eight-byte integer | SERIAL |
INT128
Size | Description |
---|---|
16 bytes | signed sixteen-byte integer |
UINT8
Size | Description |
---|---|
1 byte | unsigned one-byte integer |
UINT16
Size | Description |
---|---|
2 bytes | unsigned two-byte integer |
UINT32
Size | Description |
---|---|
4 bytes | unsigned four-byte integer |
UINT64
Size | Description |
---|---|
8 bytes | unsigned eight-byte integer |
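The value range of each integer type follows from its size. The sketch below is plain Python rather than Kùzu code, the helper name `int_range` is hypothetical, and it assumes two's-complement representation for the signed types:

```python
from typing import Tuple


def int_range(size_bytes: int, signed: bool) -> Tuple[int, int]:
    """Return (min, max) for an integer stored in size_bytes bytes."""
    bits = 8 * size_bytes
    if signed:
        # Assumption: two's complement, so one bit carries the sign.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# Sizes taken from the tables above.
SIZES = {"INT8": 1, "INT16": 2, "INT32": 4, "INT64": 8, "INT128": 16,
         "UINT8": 1, "UINT16": 2, "UINT32": 4, "UINT64": 8}

for name, size in SIZES.items():
    low, high = int_range(size, signed=not name.startswith("U"))
    print(f"{name}: {low} to {high}")
```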
FLOAT
Size | Description | Aliases |
---|---|---|
4 bytes | single precision floating-point number | REAL, FLOAT4 |
DOUBLE
Size | Description | Aliases |
---|---|---|
8 bytes | double precision floating-point number | FLOAT8 |
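The practical difference between the two floating-point sizes is precision. As a rough illustration in Python (not Kùzu code), assuming both types use the IEEE 754 binary formats, the same literal can be round-tripped through a 4-byte and an 8-byte encoding:

```python
import struct

value = 0.1

# Round-trip through a 4-byte (single precision) encoding.
as_float = struct.unpack("<f", struct.pack("<f", value))[0]
# Round-trip through an 8-byte (double precision) encoding.
as_double = struct.unpack("<d", struct.pack("<d", value))[0]

print(struct.calcsize("<f"), as_float)
print(struct.calcsize("<d"), as_double)
```

The single-precision round trip returns a value close to, but not equal to, the original, while the double-precision round trip returns it unchanged.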
BOOLEAN
Size | Description |
---|---|
1 byte | true/false |
STRUCT
Size | Description |
---|---|
fixed | a dictionary or map where keys are of type STRING |
UUID
Size | Description |
---|---|
16 bytes | 128-bit universally unique identifier |
The data type `UUID` stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. Kùzu follows PostgreSQL’s implementation for the `UUID` format.
Example:
Output:
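To make the 16-byte size concrete, here is a small sketch using Python's standard `uuid` module, which also follows RFC 4122. It is not Kùzu code, and the sample value is chosen purely for illustration:

```python
import uuid

# Sample value chosen for illustration only.
u = uuid.UUID("a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11")

print(len(u.bytes))  # number of bytes in the binary form
print(str(u))        # canonical 8-4-4-4-12 hexadecimal text form

# Upper-case hexadecimal input parses to the same value.
same = uuid.UUID("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11") == u
print(same)
```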
STRING
Size | Description |
---|---|
variable | variable-length character string |
The `STRING` data type supports UTF-8 encoding.
Example:
Output:
NULL
Size | Description |
---|---|
fixed | special value to represent unknown data |
`NULL`s are special values that represent unknown data. Every node or relationship property, and the result of any expression, can be `NULL` in addition to the non-`NULL` domain of values it can take. For example, boolean expressions can be true, false or `NULL`.
The `NULL` keyword (in any of its case variations, such as `Null` or `null`) can be used to specify a null literal. Some examples of comparisons using `NULL` are shown below.
Compare a value with `NULL`:
Output:
Compare `NULL` with `NULL`:
Output:
Kùzu’s CLI returns an empty cell to indicate nulls.
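The behaviour of `NULL` in comparisons can be modelled outside the database. The following Python sketch uses `None` as a stand-in for `NULL` and assumes the usual SQL-style three-valued logic; the helper names `eq` and `and_` are hypothetical and are not part of Kùzu:

```python
from typing import Optional


def eq(a, b) -> Optional[bool]:
    """Equality where None stands in for NULL: any NULL operand gives NULL."""
    if a is None or b is None:
        return None
    return a == b


def and_(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: false dominates, otherwise NULL propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


print(eq(3, None))        # None
print(eq(None, None))     # None
print(and_(False, None))  # False
print(and_(True, None))   # None
```

Note that comparing `NULL` with `NULL` does not yield true in this model: two unknown values are not known to be equal.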
DATE
Size | Description |
---|---|
4 bytes | year, month, day |
`DATE` is specified in ISO-8601 format (`YYYY-MM-DD`).
Example:
Output:
TIMESTAMP
Size | Description |
---|---|
8 bytes | combination of time and date |
`TIMESTAMP` combines a date and a time (hour, minute, second, fractional seconds) and is formatted according to the ISO-8601 format (`YYYY-MM-DD hh:mm:ss[.zzzzzz][+-TT[:tt]]`), which specifies the date (`YYYY-MM-DD`), the time (`hh:mm:ss[.zzzzzz]`) and a time offset (`[+-TT[:tt]]`). Only the date part is mandatory. If a time is specified, the fractional-second part `[.zzzzzz]` and the time offset are optional.
Example:
Output:
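The optional parts of the format can be explored with Python's standard `datetime` parser, which accepts a similar ISO-8601 layout. This is only an approximation of the format described above, not Kùzu's own parser, and the sample values are illustrative:

```python
from datetime import datetime

# Date only: Python fills in midnight for the missing time part.
date_only = datetime.fromisoformat("2022-06-06")
print(date_only)

# Date and time, with no fractional seconds and no offset.
no_offset = datetime.fromisoformat("2022-06-06 11:25:30")
print(no_offset)

# Date, time, six fractional digits and an hour:minute offset.
ts = datetime.fromisoformat("2022-06-06 11:25:30.004666-10:00")
print(ts.microsecond, ts.utcoffset())
```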
INTERVAL
Size | Description | Aliases |
---|---|---|
16 bytes | date/time difference | DURATION |
`INTERVAL` consists of multiple date parts and represents the total time length of these date parts. Kùzu follows DuckDB’s implementation for the interval format.
Example:
Output:
BLOB
Size | Description | Aliases |
---|---|---|
variable | arbitrary binary object | BYTEA |
`BLOB` (Binary Large OBject) allows storage of an arbitrary binary object of up to 4KB in size in Kùzu. The database processes it as binary data because it has no knowledge of what the underlying data represents (e.g. image, video).
Below is an example of how to create a blob object with 4 bytes (188, 189, 186, 170):
Output:
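The four byte values above correspond to the hexadecimal escapes `\xBC\xBD\xBA\xAA`. The Python sketch below decodes such a string into raw bytes. It is an illustration under stated assumptions rather than Kùzu's implementation: the helper `parse_escaped_blob` is hypothetical, it handles only `\xHH` escapes, and it takes 4KB to mean 4096 bytes:

```python
MAX_BLOB_BYTES = 4 * 1024  # assumption: the 4KB limit means 4096 bytes


def parse_escaped_blob(text: str) -> bytes:
    """Decode a string made only of \\xHH escapes into raw bytes."""
    if len(text) % 4 != 0:
        raise ValueError("expected a sequence of \\xHH escapes")
    out = bytearray()
    for i in range(0, len(text), 4):
        chunk = text[i:i + 4]
        if not chunk.startswith("\\x"):
            raise ValueError(f"bad escape: {chunk!r}")
        # The two characters after \x are hexadecimal digits.
        out.append(int(chunk[2:], 16))
    if len(out) > MAX_BLOB_BYTES:
        raise ValueError("blob exceeds 4KB")
    return bytes(out)


blob = parse_escaped_blob(r"\xBC\xBD\xBA\xAA")
print(list(blob))  # [188, 189, 186, 170]
```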
SERIAL
`SERIAL` is a logical data type that is usually used to create a column of unique, incrementing identifiers (similar to `AUTO_INCREMENT` supported by some other databases).
Here’s how to use `SERIAL` on a primary key column for a CSV file that has the following values:
Output:
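Conceptually, a `SERIAL` column assigns the next number in a sequence to each row as it is loaded. The Python sketch below mimics that idea for a CSV file with no ID column. The CSV content is hypothetical, and starting at 0 with a step of 1 is an assumption, not something stated in this section:

```python
import csv
import io
from itertools import count

# Hypothetical CSV content with a single name column and no ID column.
csv_text = "Alice\nBob\nCarol\n"

# Assumption: numbering starts at 0 and increases by 1 per row.
serial = count(start=0)

rows = [(next(serial), name) for (name,) in csv.reader(io.StringIO(csv_text))]
print(rows)
```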
NODE
Size | Description |
---|---|
fixed | represents a node in a graph |
`NODE` is a logical data type. Internally, `NODE` is processed as a `STRUCT` type. A `NODE` always contains an internal ID field with key `_ID` and a label field with key `_LABEL`. The remaining fields are node properties.
Here’s how to return a `NODE` column:
Output:
REL
Size | Description |
---|---|
fixed | represents a relationship in a graph |
`REL` is a logical data type. Internally, `REL` is processed as a `STRUCT` type. A `REL` always contains a source ID field with key `_SRC`, a destination ID field with key `_DST`, an internal ID field with key `_ID` and a label field with key `_LABEL`. The remaining fields are relationship properties.
Here’s how to return a `REL` column:
Output:
LIST and ARRAY
Kùzu supports two list-like data types: (i) variable-length lists, simply called `LIST`, and (ii) fixed-length lists, called `ARRAY`.
VARIANT
`VARIANT` is a data type that can store values of various data types (similar to the `sql_variant` data type of SQL Server). Currently it can only be used to store RDF literals in RDFGraphs. That is, you cannot create a regular node or relationship table that holds a column of type `VARIANT`.
When working with RDFGraphs, the Literals node table’s `val` column stores RDF literal values. Both RDF literals and Kùzu’s `VARIANT` data type can hold values of different data types.
For example, consider the following triples in a Turtle file:
Suppose that you insert these into an RDFGraph named `UniKG`. You will get the following values in the `val` column of the Literals node table `UniKG_l`:
Output:
In the output above, the data types of the values in `o.val` are as follows (data types are not rendered in the Kùzu CLI’s output):
- 329.000000 is a double
- 10000 is an integer
- "Waterloo" is a string
These different types are stored under the same column `val` of the Literals node table.
The following Kùzu data types can be stored in a `VARIANT` column. You can use the `cast` function to cast a value to a specific data type before storing it in a `VARIANT` column (as will be demonstrated in the `CREATE` statement examples momentarily).
Kùzu Data Type | CAST Function Example |
---|---|
INT8 | `cast(2, "INT8")` |
INT16 | `cast(2, "INT16")` |
INT32 | `cast(2, "INT32")` |
INT64 | `cast(2, "INT64")` |
UINT8 | `cast(2, "UINT8")` |
UINT16 | `cast(2, "UINT16")` |
UINT32 | `cast(2, "UINT32")` |
UINT64 | `cast(2, "UINT64")` |
DOUBLE | `cast(4.4, "DOUBLE")` |
FLOAT | `cast(4.4, "FLOAT")` |
BLOB | `cast("\xB2", "BLOB")` |
BOOL | `cast("true", "BOOL")` |
STRING | `cast(123, "STRING")` |
DATE | `cast("2024-01-01", "DATE")` |
TIMESTAMP | `cast("2024-01-01 11:25:30Z+00:00", "TIMESTAMP")` |
INTERVAL | `cast("1 year", "INTERVAL")` |
For example, the code below adds new triples to an RDFGraph, with values of type DATE and DOUBLE, respectively:
Above, the DATE value needs to be cast explicitly, as in `cast("2024-01-01", "DATE")`, while 4.4, which is of type DOUBLE, can be provided as is. This is because DATE is not an automatically inferred data type. The two `CREATE` statements above create the following two triples:
PATH
`PATH` is a logical data type. Internally, `PATH` is processed as a `STRUCT` type, more specifically, a `STRUCT{LIST[NODE], LIST[REL]}`. A `PATH` always contains a nodes field with the key `_NODES` and a relationships field with the key `_RELS`.
Return a `PATH` column
Output:
Access all nodes on a path
Output:
Access all rels on a path
Output:
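Because `NODE`, `REL` and `PATH` are all processed as `STRUCT`s, their shape can be pictured as nested dictionaries. The Python sketch below is a mental model only, not Kùzu's API. The underscore-prefixed keys come from the descriptions above, while the ID strings, labels and property values are hypothetical:

```python
# Hypothetical values; only the underscore-prefixed keys come from the text.
alice = {"_ID": "0:0", "_LABEL": "User", "name": "Alice"}
bob = {"_ID": "0:1", "_LABEL": "User", "name": "Bob"}

# A relationship carries source and destination IDs plus its own ID and label.
follows = {"_SRC": alice["_ID"], "_DST": bob["_ID"],
           "_ID": "1:0", "_LABEL": "Follows", "since": 2020}

# A path is a struct of two lists: its nodes and its relationships.
path = {"_NODES": [alice, bob], "_RELS": [follows]}

node_names = [n["name"] for n in path["_NODES"]]
rel_labels = [r["_LABEL"] for r in path["_RELS"]]
print(node_names)
print(rel_labels)
```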