Skip to main content

ARRAY Documentation

Type Description​

The ARRAY<T> type is used to represent an ordered collection of elements, where each element has the same data type. For example, an array of integers can be represented as [1, 2, 3], and an array of strings as ["a", "b", "c"].

  • ARRAY<T> represents an array composed of elements of type T, where T is nullable. Supported types for T include: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, STRUCT, MAP, VARIANT, JSONB, ARRAY<T>.
    • Note: Among the above T types, JSONB and VARIANT are only supported in the computation layer of Doris and do not support using ARRAY<JSONB> and ARRAY<VARIANT> in table creation in Doris.

Type Constraints​

  • The maximum nesting depth supported by ARRAY<T> type is 9.
  • Conversion between ARRAY<T> types depends on whether T can be converted. Array<T> type cannot be converted to other types.
    • For example: ARRAY<INT> can be converted to ARRAY<BIGINT> because INT and BIGINT can be converted.
    • Variant type can be converted to Array<T> type.
    • String type can be converted to ARRAY<T> type (through parsing, returning NULL if parsing fails).
  • In the AGGREGATE table model, ARRAY<T> type only supports REPLACE and REPLACE_IF_NOT_NULL. In any table model, it cannot be used as a KEY column, nor as a partition or bucket column.
  • Columns of ARRAY<T> type support ORDER BY and GROUP BY operations.
    • T types that support ORDER BY and GROUP BY include: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6.
  • Columns of ARRAY<T> type do not support being used as JOIN KEY and do not support being used in DELETE statements.

Constant Construction​

  • Use the ARRAY() function to construct a value of type ARRAY<T>, where T is the common type of the parameters.

    -- [1, 2, 3] T is INT
    SELECT ARRAY(1, 2, 3);

    -- ["1", "2", "abc"] , T is STRING
    SELECT ARRAY(1, 2, 'abc');
  • Use [] to construct a value of type ARRAY<T>, where T is the common type of the parameters.

     -- ["abc", "def", "efg"] T is STRING
    SELECT ["abc", "def", "efg"];

    -- ["1", "2", "abc"] , T is STRING
    SELECT [1, 2, 'abc'];

Modifying Type​

  • Modification is only allowed when the element type inside ARRAY is VARCHAR.

    • Only allows changing the parameter of VARCHAR from smaller to larger, not the other way around.
    CREATE TABLE `array_table` (
    `k` INT NOT NULL,
    `array_column` ARRAY<VARCHAR(10)>
    ) ENGINE=OLAP
    DUPLICATE KEY(`k`)
    DISTRIBUTED BY HASH(`k`) BUCKETS 1
    PROPERTIES (
    "replication_num" = "1"
    );

    ALTER TABLE array_table MODIFY COLUMN array_column ARRAY<VARCHAR(20)>;
  • The default value for columns of type ARRAY<T> can only be specified as NULL, and once specified, it cannot be modified.

Element Access​

  • Use [k] to access the k-th element of ARRAY<T>, where k starts from 1. If out of bounds, returns NULL.

    SELECT [1, 2, 3][1];
    +--------------+
    | [1, 2, 3][1] |
    +--------------+
    | 1 |
    +--------------+

    SELECT ARRAY(1, 2, 3)[2];
    +-------------------+
    | ARRAY(1, 2, 3)[2] |
    +-------------------+
    | 2 |
    +-------------------+

    SELECT [[1,2,3],[2,3,4]][1][3];
    +-------------------------+
    | [[1,2,3],[2,3,4]][1][3] |
    +-------------------------+
    | 3 |
    +-------------------------+
  • Use ELEMENT_AT(ARRAY, k) to access the k-th element of ARRAY<T>, where k starts from 1. If out of bounds, returns NULL.

    SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2);
    +--------------------------------+
    | ELEMENT_AT(ARRAY(1, 2, 3) , 2) |
    +--------------------------------+
    | 2 |
    +--------------------------------+

    SELECT ELEMENT_AT([1, 2, 3] , 3);
    +---------------------------+
    | ELEMENT_AT([1, 2, 3] , 3) |
    +---------------------------+
    | 3 |
    +---------------------------+

    SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3);
    +-------------------------------------------------------+
    | ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) |
    +-------------------------------------------------------+
    | ["3"] |
    +-------------------------------------------------------+

Query Acceleration​

  • Columns of type ARRAY<T> in Doris tables support adding inverted indexes to accelerate computations involving ARRAY functions on this column.
    • T types supported by inverted indexes: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6.
    • Accelerated ARRAY functions: ARRAY_CONTAINS, ARRAYS_OVERLAP, but when the function parameters include NULL, it falls back to regular vectorized computation.

Examples​

  • Multidimensional Arrays

    -- Create table
    CREATE TABLE IF NOT EXISTS array_table (
    id INT,
    two_dim_array ARRAY<ARRAY<INT>>,
    three_dim_array ARRAY<ARRAY<ARRAY<STRING>>>
    ) ENGINE=OLAP
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 1
    PROPERTIES (
    "replication_num" = "1"
    );

    -- Insert
    INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]);

    INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx'))));

    -- Query
    SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM array_table ORDER BY id;
    +---------------------+--------------------------+
    | two_dim_array[1][2] | three_dim_array[1][1][2] |
    +---------------------+--------------------------+
    | 2 | cd |
    | 2 | cd |
    +---------------------+--------------------------+
  • Nested Complex Types

    -- Create table
    CREATE TABLE IF NOT EXISTS array_map_table (
    id INT,
    array_map ARRAY<MAP<STRING, INT>>
    ) ENGINE=OLAP
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 1
    PROPERTIES (
    "replication_num" = "1"
    );

    -- Insert
    INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2)));
    INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2)));

    -- Query
    SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id;
    +--------------+--------------+
    | array_map[1] | array_map[2] |
    +--------------+--------------+
    | {"key1":1} | {"key2":2} |
    | {"key1":1} | {"key2":2} |
    +--------------+--------------+

    -- Create table
    CREATE TABLE IF NOT EXISTS array_table (
    id INT,
    array_struct ARRAY<STRUCT<id: INT, name: STRING>>,
    ) ENGINE=OLAP
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 1
    PROPERTIES (
    "replication_num" = "1"
    );

    INSERT INTO array_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));
    INSERT INTO array_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));

    SELECT array_struct[1], array_struct[2] FROM array_table ORDER BY id;
    +-------------------------+-------------------------+
    | array_struct[1] | array_struct[2] |
    +-------------------------+-------------------------+
    | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
    | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
    +-------------------------+-------------------------+
  • Modifying Type

    -- Create table
    CREATE TABLE array_table (
    id INT,
    array_varchar ARRAY<VARCHAR(10)>
    ) ENGINE=OLAP
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 1
    PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
    );

    -- Modify ARRAY type
    ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY<VARCHAR(20)>;

    -- Check column type
    DESC array_table;
    +---------------+--------------------+------+-------+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +---------------+--------------------+------+-------+---------+-------+
    | id | int | Yes | true | NULL | |
    | array_varchar | array<varchar(20)> | Yes | false | NULL | NONE |
    +---------------+--------------------+------+-------+---------+-------+
  • Inverted Index

    -- Create table statement
    CREATE TABLE `array_table` (
    `k` int NOT NULL,
    `array_column` ARRAY<INT>,
    INDEX idx_array_column (array_column) USING INVERTED
    ) ENGINE=OLAP
    DUPLICATE KEY(`k`)
    DISTRIBUTED BY HASH(`k`) BUCKETS 1
    PROPERTIES (
    "replication_num" = "1"
    );

    -- Insert
    INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]);

    -- The inverted index accelerates the execution of the ARRAY_CONTAINS function
    SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5);
    +------+--------------+
    | k | array_column |
    +------+--------------+
    | 2 | [4, 5, 6] |
    +------+--------------+

    -- The inverted index accelerates the execution of the ARRAYS_OVERLAP function
    SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]);
    +------+--------------+
    | k | array_column |
    +------+--------------+
    | 2 | [4, 5, 6] |
    | 3 | [7, 8, 9] |
    +------+--------------+