Testing framework
Introduction
Testing is a crucial part of Kùzu to ensure the correct functioning of the system. Our general principle for testing is to avoid testing components individually — instead we route all tests, when possible, end-to-end (e2e) via Cypher statements.
To use the e2e testing framework, developers need to create a `.test` file, which should be placed in the `test/test_files` directory. Each test file comprises two key sections: the test header and the test body. In the header section, you must specify the dataset to be used and other optional parameters such as `BUFFER_POOL_SIZE`.
Here is a basic example of a test:
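A minimal sketch of such a file (the buffer pool size and the expected count are illustrative):

```
-DATASET CSV tinysnb
-BUFFER_POOL_SIZE 67108864
--

-CASE BasicTest
-STATEMENT MATCH (p:person) RETURN COUNT(*);
---- 1
8
```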
The first three lines represent the header, separated from the body by `--`. The testing framework will parse the file and register a GTest programmatically. All e2e tests are registered with the prefix `e2e_test_`, which distinguishes them from other internal tests. For example, an e2e test named `BasicTest` will be registered as a GTest named `e2e_test_BasicTest`.
When it comes to the test case name, the provided example above would be equivalent to:
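For instance, if the example above were saved as `test/test_files/common/basic.test` (a hypothetical path), its registered name would look roughly like:

```
e2e_test_common~basic.BasicTest
```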
For the main source code tests, the test group name will be the relative path of the file under the `test/test_files` directory, delimited by `~`, followed by a dot and the test case name.

For the extension code tests, the test group name will be the relative path of the file under the `extension/name_of_extension/test/test_files` directory, delimited by `~`, followed by a dot and the test case name.
The testing framework will test each logical plan created from the prepared statements and assert the result.
Running the tests
Our primary tool for generating the test list and executing it is `ctest`. Use the command `make test` to build and run all tests. By default, the tests will run concurrently on 10 jobs, but it is also possible to change the number of parallel jobs by running `make test TEST_JOBS=X`, where `X` is the desired number of jobs to run in parallel.
Running a specific group or test case
There are two ways to run a specific e2e test or group of tests:
1. Using ctest and specifying the name of the test
Example:
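A sketch, run from the build directory and reusing the hypothetical registered name above (`ctest -R` filters tests with a regular expression):

```bash
cd build/relwithdebinfo
# Run a single test case by its registered name.
ctest -R e2e_test_common~basic.BasicTest
# Run a whole group, with verbose output.
ctest -R "e2e_test_common~" -V
```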
To switch between main tests and extension tests, pass `E2E_TEST_FILES_DIRECTORY=extension` as an environment variable when calling `ctest`.
Example:
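A sketch (the extension test group name is hypothetical):

```bash
E2E_TEST_FILES_DIRECTORY=extension ctest -R "e2e_test_json~"
```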
2. Running directly from the `e2e_test` binary
The test binaries are available in the `build/relwithdebinfo/test/runner` folder (or `build/debug/test/runner` / `build/release/test/runner`, depending on the build type). To run any of the main tests, run `e2e_test` with the relative path of a file inside `test_files`:
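For example (the build type and test file path are illustrative):

```bash
cd build/relwithdebinfo/test/runner
./e2e_test common/basic.test
```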
To run any of the extension tests, run `e2e_test` with the environment variable `E2E_TEST_FILES_DIRECTORY=extension` set and specify the relative path of a file inside `extension`:
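For example (the relative path under `extension` is illustrative and may differ depending on the extension's layout):

```bash
cd build/relwithdebinfo/test/runner
E2E_TEST_FILES_DIRECTORY=extension ./e2e_test json/basic.test
```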
Test file header
The `.test` file header contains one required parameter, `-DATASET`, which specifies the dataset to be used. If no dataset is required, use the keyword `empty`.
Specifying the Dataset
| Property | Description |
| --- | --- |
| `-DATASET [type] [dataset name]` | Type: `CSV`, `PARQUET`, `NPY`, `KUZU`, or `empty`. Dataset name: the name of the directory inside `dataset/`, e.g. `tinysnb`. |
The `KUZU` dataset type is a Kùzu database directory.
Examples:
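For instance (any dataset directory other than `tinysnb` shown here is hypothetical):

```
-DATASET CSV tinysnb
-DATASET PARQUET demo-parquet
-DATASET KUZU exported-tinysnb
-DATASET NPY npy-20k
```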
Converting CSV to Parquet
It is also possible to convert a CSV dataset to the Parquet file format using `CSV_TO_PARQUET(dataset path)`. This is especially useful to ensure the expected result remains the same for both the CSV and Parquet file formats without storing the same dataset in the codebase twice.
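A sketch of such a header line; whether the `PARQUET` type keyword accompanies the conversion, and whether the argument is a dataset name or a fuller path, may differ:

```
-DATASET PARQUET CSV_TO_PARQUET(tinysnb)
--
```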
Other properties
Other optional parameters include `-BUFFER_POOL_SIZE`, `-CHECKPOINT_WAIT_TIMEOUT`, and `-SKIP`. By including `-SKIP` in the header, the entire suite will be deactivated, but the tests will still be displayed as disabled when running through `ctest`.
Test case
The following example illustrates a basic structure of how to create a test case:
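A minimal sketch (queries and expected values are illustrative):

```
-CASE MyTest
-LOG CountPersonNodes
-STATEMENT MATCH (p:person) RETURN COUNT(*);
---- 1
8

-LOG InsertPersonNode
-STATEMENT CREATE (:person {ID: 100});
---- ok
```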
In the example above:
- `-CASE` is the name of the test case, analogous to `TEST_F(Test, MyTest)` in C++.
- `-LOG` is optional and is only used for display purposes when running in verbose mode.
- `-STATEMENT` is followed by four dashes `----` alongside the expected result (error, success, hash, or the number of tuples).
- When specifying a number after the dashes, the same number of tuples must appear on the following lines.
- If the subsequent lines contain additional statements to validate, they are incorporated into the same test case unless a new `-CASE` is written.
Results
There are several ways to specify the expected result:
| Result | Description |
| --- | --- |
| `---- error` | The following lines must be the expected error message. |
| `---- error(regex)` | The following lines must be a regex pattern matching the expected error message. |
| `---- ok` | Does not require any additional information below the line. |
| `---- hash` | A single line must follow, containing the number of values in the query result and the MD5 hash of the query result. |
| `---- [number of expected tuples]` `[expected tuple 1]` `[expected tuple 2]` | The first line after `----` contains the number of tuples, and the following lines must exactly match the query results. |
Query results can also be stored in a file. By using `<FILE>:`, the testing framework reads the results from the file and compares them to the actual query result. The file must be created inside `test/answers/<name-of-the-file.txt>`.
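A sketch, assuming a hypothetical answer file `test/answers/person-names.txt`; whether the tuple count still precedes the `<FILE>:` line may differ:

```
-STATEMENT MATCH (p:person) RETURN p.fName;
---- 8
<FILE>:person-names.txt
```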
Hash details
When hashing an expected output, it's best to add the `-CHECK_ORDER` flag. If you don't want to check the order of the expected output, you have to sort the expected output by line (with string comparison) before creating the hash.
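A sketch following the description above; the hash value is a placeholder and the exact formatting of the line after `---- hash` may differ:

```
-CHECK_ORDER
-STATEMENT MATCH (p:person) RETURN p.ID;
---- hash
8 c921270a93d97313e7a2bdb7e9c3a8ba
```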
Additional properties
It is also possible to use the following additional properties inside each test case:
| Property | Parameter | Description |
| --- | --- | --- |
| `-LOG` | any string | Defines a name for each block, for informational purposes. |
| `-SKIP` | none | Registers the test but skips the whole test case. When a test is skipped, it is displayed as disabled in the test run. |
| `-PARALLELISM` | integer | Default: 4. The number of threads that will be set by `connection.setMaxNumThreadForExec()`. |
| `-CHECK_ORDER` | none | By default, the query results and expected results are sorted before the comparison is asserted; with `-CHECK_ORDER`, results are compared in the order in which they are returned. |
| `-CHECK_COLUMN_NAMES` | none | Includes the column names as the first row of the query result. Requires the number of expected tuples to be increased by 1. |
| `-RELOADDB` | none | Reloads the database from the file system. |
| `-REMOVE_FILE` | file path | Deletes the file at the given path. |
| `-IMPORT_DATABASE` | directory path | Closes the current database and opens a new database in the given directory. |
| `-CHECK_PRECISION` | none | Checks floating-point columns using machine-epsilon precision. Requires `-CHECK_ORDER` to be enabled. |
Defining variables
A variable can be defined and re-used inside a statement, results or error message:
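A minimal sketch, assuming a plain `-DEFINE [var] [value]` form and `${VAR}` substitution (the same style as the pre-defined variables listed further below); the values are illustrative:

```
-CASE DefinedVariableExample
-DEFINE EXPECTED_COUNT 8
-STATEMENT MATCH (p:person) RETURN COUNT(*);
---- 1
${EXPECTED_COUNT}
```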
A more practical example is using functions alongside `-DEFINE`. The framework currently supports the following functions:
| Function | Description | Example |
| --- | --- | --- |
| `-DEFINE [var] ARANGE [start] [end]` | Generates a list of numbers from start to end. | `-DEFINE STRING_OVERFLOW ARANGE 0 5` generates `STRING_OVERFLOW = [0,1,2,3,4,5]` |
| `-DEFINE [var] REPEAT [count] "[text]"` | Repeats the text multiple times. | `-DEFINE MY_STR REPEAT 3 "MyString"` generates `MY_STR = "MyStringMyStringMyString"` |
Pre-defined variables
The following variables are available to use inside the statements:
| Variable | Description |
| --- | --- |
| `${KUZU_ROOT_DIRECTORY}` | Kùzu directory path. |
| `${DATABASE_PATH}` | When a test case runs, a temporary database path is created and cleaned up after the suite finishes. This variable represents that temporary database path for the running test case. |
Multiple queries
A statement can contain multiple queries, each separated by semi-colons, as per normal usage. The statement would then have multiple results, in the order of the queries.
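A sketch, assuming each query's result follows in order with its own `----` marker; the queries and values are illustrative:

```
-STATEMENT CREATE (:person {ID: 101}); MATCH (p:person) RETURN COUNT(*);
---- ok
---- 1
9
```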
Defining statement blocks
A statement block can be defined and re-used throughout the test file. `-DEFINE_STATEMENT_BLOCK` defines a block that can be re-used by calling `-INSERT_STATEMENT_BLOCK` in any part of the test case body. This is useful for performing checks without having to re-write the same statements again.
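A sketch; the square-bracket delimiters around the block body are an assumption, and the queries are illustrative:

```
-DEFINE_STATEMENT_BLOCK CHECK_PERSON_COUNT [
-STATEMENT MATCH (p:person) RETURN COUNT(*);
---- 1
8
]

-CASE StatementBlockExample
-INSERT_STATEMENT_BLOCK CHECK_PERSON_COUNT
-STATEMENT MATCH (p:person) WHERE p.ID = 0 RETURN COUNT(*);
---- 1
1
-INSERT_STATEMENT_BLOCK CHECK_PERSON_COUNT
```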
Multiple connections
The following example illustrates how to use multiple connections:
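A sketch; the connection names follow the required `conn` prefix, and the square-bracket placement of the connection name after `-STATEMENT` is an assumption, as are the data values:

```
-CASE MultipleConnectionsExample
-CREATE CONNECTION conn_read
-CREATE CONNECTION conn_write
-STATEMENT [conn_write] CREATE (:person {ID: 101});
---- ok
-STATEMENT [conn_read] MATCH (p:person) RETURN COUNT(*);
---- 1
9
```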
In the example above:
- `-CREATE CONNECTION conn.*` initiates a connection to the database. The connection name must start with the prefix `conn`, e.g. `conn_write`, `conn_read`.
- `-STATEMENT` is followed by a connection name that was defined in the `-CREATE CONNECTION` statement. If a connection name is not explicitly mentioned in a statement, the testing framework falls back to the default connection.
Batch statements
You can use `-BATCH_STATEMENTS` to test a batch of query statements from a file:
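A sketch, assuming a hypothetical statements file and an `ok` expectation for the batch:

```
-CASE BatchStatementsExample
-BATCH_STATEMENTS <FILE>:batch_create.cypher
---- ok
```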
In the example above:
- `-BATCH_STATEMENTS` is followed by `<FILE>:`, indicating that you're specifying a file to be used.
- The file must be created inside `test/statements/<name-of-the-file.cypher>`. The testing framework reads the query statements from the file and executes each one.
Random Split Files to Copy
You can use `-MULTI_COPY_RANDOM` to randomly split a copyable source into multiple CSV files, each of which is then copied into a specified table.
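A sketch; the table schema, source path, and delimiter are illustrative, and whether the directive takes its own result marker is not shown:

```
-CASE MultiCopyRandomExample
-STATEMENT CREATE NODE TABLE test(id INT64, name STRING, PRIMARY KEY(id));
---- ok
-MULTI_COPY_RANDOM 5 test "${KUZU_ROOT_DIRECTORY}/dataset/copy-test/test.csv"(delim='|')
```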
In the example above:
- `-MULTI_COPY_RANDOM` is followed by an integer (here `5`) indicating the number of fragments the source should be split into. The next parameter is the table to copy into (here `test`). The last parameter is the exact string that would be used if the source appeared in a `COPY FROM` statement, including the quotes and any specifications (such as `delim`) that follow.
Optionally, a seed can be specified after the table name to split the source deterministically or to reproduce a previous result. The seed consists of two unsigned 64-bit integers. So, formally, the syntax of `-MULTI_COPY_RANDOM` is:
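Reconstructed from the description above:

```
-MULTI_COPY_RANDOM <number of splits> <table name> [<seed part 1> <seed part 2>] <COPY FROM source string>
```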
Examples
Full example with comments
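A condensed sketch combining the features described above, assuming `#` starts a comment line; the dataset, queries, and expected values are illustrative:

```
# Header: the dataset plus optional settings, terminated by --.
-DATASET CSV tinysnb
--

# Each -CASE starts a new test case.
-CASE FullExample
# -LOG is only shown when running in verbose mode.
-LOG CountPersons
-STATEMENT MATCH (p:person) RETURN COUNT(*);
---- 1
8

# ok: the statement is only expected to succeed; no result is checked.
-STATEMENT CREATE (:person {ID: 200});
---- ok
```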
Sample test log
| File | Description |
| --- | --- |
| `set.test` | `ARANGE` example |
| `copy_long_string.test` | `REPEAT` example |
| `copy_multiple_files.test` | Using statement blocks |
| `catalog.test` | Dealing with exceptions |