Data types
Kùzu supports a set of primitive and nested data types both for node and relationship properties as well as for forming expressions whose outputs are specified using these data types. This section shows all built-in data types.
INT8
Size | Description |
---|---|
1 byte | signed one-byte integer |
INT16
Size | Description |
---|---|
2 bytes | signed two-byte integer |
INT32
Size | Description | Aliases |
---|---|---|
4 bytes | signed four-byte integer | INT |
INT64
Size | Description | Aliases |
---|---|---|
8 bytes | signed eight-byte integer | SERIAL |
INT128
Size | Description |
---|---|
16 bytes | signed sixteen-byte integer |
UINT8
Size | Description |
---|---|
1 byte | unsigned one-byte integer |
UINT16
Size | Description |
---|---|
2 bytes | unsigned two-byte integer |
UINT32
Size | Description |
---|---|
4 bytes | unsigned four-byte integer |
UINT64
Size | Description |
---|---|
8 bytes | unsigned eight-byte integer |
FLOAT
Size | Description | Aliases |
---|---|---|
4 bytes | single precision floating-point number | REAL, FLOAT4 |
DOUBLE
Size | Description | Aliases |
---|---|---|
8 bytes | double precision floating-point number | FLOAT8 |
DECIMAL
Size | Description |
---|---|
variable | arbitrary fixed precision decimal number |
For numbers where exact precision is required, the DECIMAL
data type can be used. The DECIMAL
type is
specified as DECIMAL(precision, scale)
, where precision
is the total number of digits and
scale
is the number of digits to the right of the decimal point.
Internally, decimals are represented as integers depending on their specified width.
Precision | Internal | Size (bytes) |
---|---|---|
1-4 | INT16 | 2 |
5-9 | INT32 | 4 |
10-18 | INT64 | 8 |
19-38 | INT128 | 16 |
You can explicitly cast a number (either integer or float) to a DECIMAL
as follows:
Output:
Note that if you attempt to cast with a precision or scale that is too small, an overflow exception will be raised:
BOOLEAN
Size | Description |
---|---|
1 byte | true/false |
UUID
Size | Description |
---|---|
16 bytes | signed sixteen-byte integer |
The data type UUID
stores Universally Unique Identifiers (UUID) as defined by RFC 4122,
ISO/IEC 9834-8:2005, and related standards. Kuzu follows PostgreSQL’s implementation
for the UUID
format.
Example:
Output:
STRING
Size | Description |
---|---|
variable | variable-length character string |
STRING
data type supports UTF-8 encoding.
Example:
Output:
NULL
Size | Description |
---|---|
fixed | special value to represent unknown data |
NULL
s are special values to represent unknown data. Every node/relationship property or result of
any expression can be NULL
in addition to the non-NULL
domain of values they can take. For
example, boolean expressions can be true, false or NULL
.
The NULL
(in any of its case variations, such as Null
or null
) can be
used to specify a null literal. Some examples of comparisons using NULL
are shown below.
Compare a value with NULL
:
Output:
Compare NULL
with NULL
:
Output:
Kùzu’s CLI returns an empty cell to indicate nulls.
DATE
Size | Description |
---|---|
4 bytes | year, month, day |
DATE
is specified in ISO-8601 format (YYYY-MM-DD
).
Example:
Output:
TIMESTAMP
Size | Description |
---|---|
4 bytes | combination of time and date |
TIMESTAMP
combines date and a time (hour, minute, second, millisecond) and is formatted
according to the ISO-8601 format (YYYY-MM-DD hh:mm:ss[.zzzzzz][+-TT[:tt]]
),
which specifies the date (YYYY-MM-DD
), time (hh:mm:ss[.zzzzzz]
) and a time offset
[+-TT[:tt]]
. Only the Date part is mandatory. If time is specified, then the millisecond
[.zzzzzz]
part and the time offset are optional.
Example:
Output:
INTERVAL
Size | Description | Aliases |
---|---|---|
4 bytes | date/time difference | DURATION |
INTERVAL
consists of multiple date parts and represents the total time length of these date parts.
Kùzu follows DuckDB’s implementation for the
interval format.
Example:
Output:
STRUCT
A STRUCT
is a mapping of key-value pairs where the keys are of the type STRING
. STRUCT
is a
fixed-size data type so values with the same STRUCT
type must contain the same set of key-value pairs.
You can think of a STRUCT
column as a nested single column over multiple other columns.
Data Type | DDL definition |
---|---|
STRUCT | STRUCT(a INT64, b INT64) |
To construct a STRUCT
, provide a mapping of keys to values as follows:
Output:
You can extract a value from a STRUCT
using the dot notation:
Output:
Alternatively you can use the struct_extract()
function
Functions that work on STRUCT
s can be found here.
MAP
A MAP
is a dictionary of key-value pairs where all keys have the same type and all values have the
same type. MAP
is similar to STRUCT
in that it is an ordered list of mappings. However, MAP
does
not need to have the same keys present for each row, and is thus more suitable when the schema of an entity
is unknown beforehand or when the schema varies per row.
MAP
s must have a single type for all keys, and a single type for all values. Additionally, keys of
a MAP
do not need to be STRING
s like they do in a STRUCT
.
Data Type | DDL definition |
---|---|
MAP | MAP(STRING, INT64) |
To construct a MAP
, provide a list of keys and a list of values. The keys and values must be of the same length.
Example:
Output:
Functions that work on map objects can be found here.
UNION
Similar to C++ std::variant
, UNION
is a nested data type that is capable of holding multiple
alternative values with different types. The value under key "tag"
is considered as the value being
currently hold by the UNION
.
Internally, UNION
are implemented as STRUCT
with "tag"
as one of its keys.
Data Type | DDL definition |
---|---|
UNION | UNION(price FLOAT, note STRING) |
Consider the following CSV file:
Example
Functions that work on UNION
data types can be found here.
BLOB
Size | Description | Aliases |
---|---|---|
variable | arbitrary binary object | BYTEA |
BLOB
(Binary Large OBject) allows storage of an arbitrary binary object with up to
4KB in size in Kùzu. The database processes it as binary data because it has no knowledge as to what
the underlying data represents (e.g. image, video).
Below is an example of how to create a blob object with 3 bytes (188, 189, 186, 170):
Output:
SERIAL
SERIAL
is a logical data type used for creating an auto-incrementing sequence of numbers, typically
used as a unique column identifier, similar to AUTO_INCREMENT
feature supported
by some other databases.
Using SERIAL
as primary key column in node tables
Output:
Using SERIAL
for properties in relationship tables
You can create relationship tables that have a SERIAL
property column. For example, consider a
scenario where you want to auto-generate a unique transaction ID for each transfer between users.
NODE
Size | Description |
---|---|
fixed | represents a node in a graph |
NODE
is a logical data type. Internally, NODE
is processed as STRUCT
type. A NODE
always contains
an internal ID field with key _ID
and a label field with key _LABEL
. The rest fields are node properties.
Here’s how to return NODE
column for a file person.csv
:
Output:
REL
Size | Description |
---|---|
fixed | represents a relationship in a graph |
REL
is a logical type that represents a relationship (i.e., an edge). Internally, REL
is processed as STRUCT
type. A REL
always contains a
src ID field with key _SRC
, a dst ID field with key _DST
, an internal ID field with key _ID
and a label field with key _LABEL
. The rest fields are rel properties.
Here’s how to return a relationship column that’s of type REL
:
Output:
RECURSIVE_REL
RECURSIVE_REL
is a logical type that represents recursive relationships. i.e., paths of arbitrary lengths. Internally, RECURSIVE_REL
is processed as STRUCT
type, more specifically, a
STRUCT{LIST[NODE], LIST[REL]}
. A RECURSIVE_REL
always contains a nodes field with the key _NODES
and a
relationships field with the key _RELS
.
Return a column that’s of type RECURSIVE_REL
Output:
Access all nodes on a recursive relationship
Output:
Access all relationships on a recursive relationship
Output:
LIST and ARRAY
Kùzu supports two list-like data types: (i) variable-length lists, simply called LIST
, and
(ii) fixed-length lists, called ARRAY
. Click on the card below to learn more about them.
VARIANT
VARIANT
is a data type that can store values of various data types (similar to the sql_variant
data type of SQLServer).
Currently it can only be used to store RDF literals in RDFGraphs.
That is, you cannot create a regular node or relationship table that holds a column of type VARIANT.
When working with RDFGraphs, the Literals node table’s
val
column stores RDF literal values. RDF literals, and Kùzu’s Variant data type can store values of different data types.
For example, consider the following triples in a Turtle file:
Suppose that you insert these into an RDFGraph named UniKG
. You will get the following values in the val
column
of the Literals node table UniKG_l
:
Output:
In the output above the data types of the values in o.val
are as follows:
329.000000
is a double10000
is an integer"Waterloo"
is a string
These different types are stored under the same column val
of the Literals
node table.
The following Kùzu data types can be stored in a VARIANT
column. You can use the CAST
function to cast a value to a
specific data type before storing it in a VARIANT
column (as will be demonstrated in the CREATE
statement
examples momentarily).
Kùzu Data Type | CAST Function Example |
---|---|
INT8 | CAST(2, "INT8") |
INT16 | CAST(2, "INT16") |
INT32 | CAST(2, "INT32") |
INT64 | CAST(2, "INT64") |
UINT8 | CAST(2, "UINT8") |
UINT16 | CAST(2, "UINT16") |
UINT32 | CAST(2, "UINT32") |
UINT64 | CAST(2, "UINT64") |
DOUBLE | CAST(4.4, "DOUBLE") |
FLOAT | CAST(4.4, "FLOAT") |
BLOB | CAST("\\xB2", "BLOB") |
BOOL | CAST("true", "BOOL") |
STRING | CAST(123, "STRING") |
DATE | CAST("2024-01-01", "DATE") |
TIMESTAMP | CAST("2024-01-01 11:25:30Z+00:00", "TIMESTAMP") |
INTERVAL | CAST("1 year", "INTERVAL") |
For example, the below code adds new triples into an RDFGraph with type DATE
and FLOAT
, respectively:
The DATE
type needs to be cast explicitly as in CAST("2024-01-01", "DATE")
while 4.4
, which is of type DOUBLE
,
can be provided as is. This is because DATE
is not an automatically inferred data type. The above two CREATE
statements will create
the following two triples: