Auto-Increment Column

In Doris, an auto-increment column (Auto Increment Column) is a feature that automatically generates unique numeric values. It is commonly used to produce a unique identifier for each row, such as a primary key. Whenever a new record is inserted, the auto-increment column is automatically assigned an incrementing value, eliminating the tedious task of specifying numbers manually. Using a Doris auto-increment column ensures data uniqueness and consistency, simplifies the data insertion process, reduces human error, and improves the efficiency of data management. This makes the auto-increment column an ideal choice for scenarios that require unique identifiers, such as user IDs.

Core Features

Write Behavior

For a table with an auto-increment column, Doris handles data writes as follows:

Write scenario	Handling
The written data does not include the auto-increment column	Doris automatically generates and fills in a unique value for the column
The written data includes the auto-increment column with a null value	Doris replaces the null value with a system-generated unique value
The written data includes the auto-increment column with a non-null value	The user-provided value is preserved unchanged

Important

A non-null value provided by the user may break the uniqueness of the auto-increment column.

Uniqueness

Doris guarantees that values generated in an auto-increment column are table-level unique. However:

Guaranteed uniqueness: This applies only to system-generated values.
User-provided values: Doris does not validate or enforce the uniqueness of values that the user specifies in an auto-increment column, which may lead to duplicate entries.

Density

The auto-increment values that Doris generates are usually dense, but with a few considerations:

Potential gaps: Gaps may appear due to performance optimizations. Each backend node (BE) pre-allocates a block of unique values for efficiency, and the blocks across nodes do not overlap.
Non-chronological values: Doris does not guarantee that values generated by later writes are greater than those generated by earlier writes.

Note

Auto-increment values cannot be used to infer the chronological order of writes.

Creating an Auto-Increment Column

Syntax

To use an auto-increment column, add the AUTO_INCREMENT attribute to the corresponding column when running CREATE-TABLE:

If no starting value is specified, the default starting value is 1.
You can manually specify the starting value of the auto-increment column at table creation time with AUTO_INCREMENT(start_value).

Constraints and Limitations

Only Duplicate model tables and Unique model tables can contain an auto-increment column.
A table can contain at most one auto-increment column.
The auto-increment column must be of type BIGINT and must be NOT NULL.
A manually specified starting value for the auto-increment column must be greater than or equal to 0.

Table Creation Examples

Example 1: Duplicate model table where the key column is the auto-increment column

CREATE TABLE `demo`.`tbl` (
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `value` BIGINT NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Example 2: Duplicate model table where the key column is the auto-increment column with a starting value of 100

CREATE TABLE `demo`.`tbl` (
    `id` BIGINT NOT NULL AUTO_INCREMENT(100),
    `value` BIGINT NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Example 3: Duplicate model table where the value column is the auto-increment column

CREATE TABLE `demo`.`tbl` (
    `uid` BIGINT NOT NULL,
    `name` BIGINT NOT NULL,
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `value` BIGINT NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`uid`, `name`)
DISTRIBUTED BY HASH(`uid`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Example 4: Unique model table where the key column is the auto-increment column

CREATE TABLE `demo`.`tbl` (
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `name` varchar(65533) NOT NULL,
    `value` int(11) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Example 5: Unique model table where the value column is the auto-increment column

CREATE TABLE `demo`.`tbl` (
    `text` varchar(65533) NOT NULL,
    `id` BIGINT NOT NULL AUTO_INCREMENT
) ENGINE=OLAP
UNIQUE KEY(`text`)
DISTRIBUTED BY HASH(`text`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Data Ingestion

Regular Ingestion

Take the following table as an example:

CREATE TABLE `demo`.`tbl` (
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `name` varchar(65533) NOT NULL,
    `value` int(11) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Scenario 1: INSERT INTO without specifying the auto-increment column

When you load data with an insert into statement without specifying the auto-increment column id, Doris automatically fills in generated values for the id column:

insert into tbl(name, value) values("Bob", 10), ("Alice", 20), ("Jack", 30);

select * from tbl order by id;
+------+-------+-------+
| id   | name  | value |
+------+-------+-------+
|    1 | Bob   |    10 |
|    2 | Alice |    20 |
|    3 | Jack  |    30 |
+------+-------+-------+

Scenario 2: Stream Load without specifying the auto-increment column

Similarly, when you use stream load to import the file test.csv without specifying the auto-increment column id, Doris automatically fills in generated values for the id column.

Contents of test.csv:

Tom,40
John,50

Run the load command:

curl --location-trusted -u user:passwd \
    -H "columns:name,value" \
    -H "column_separator:," \
    -T ./test.csv \
    http://{host}:{port}/api/{db}/tbl/_stream_load

Query the result:

select * from tbl order by id;
+------+-------+-------+
| id   | name  | value |
+------+-------+-------+
|    1 | Bob   |    10 |
|    2 | Alice |    20 |
|    3 | Jack  |    30 |
|    4 | Tom   |    40 |
|    5 | John  |    50 |
+------+-------+-------+

Scenario 3: INSERT INTO specifying the auto-increment column with NULL values

When you specify the auto-increment column id during an insert into, any null values in that column are replaced with generated values:

insert into tbl(id, name, value) values(null, "Doris", 60), (null, "Nereids", 70);

select * from tbl order by id;
+------+---------+-------+
| id   | name    | value |
+------+---------+-------+
|    1 | Bob     |    10 |
|    2 | Alice   |    20 |
|    3 | Jack    |    30 |
|    4 | Tom     |    40 |
|    5 | John    |    50 |
|    6 | Doris   |    60 |
|    7 | Nereids |    70 |
+------+---------+-------+

Partial Column Update

When performing partial column updates on a merge-on-write Unique table that contains an auto-increment column, the behavior depends on whether the auto-increment column is a key column:

Auto-increment column position	Update behavior
Key column	The user must explicitly specify the key column, and the target columns of the partial update must include the auto-increment column. The load behavior is the same as a regular partial column update.
Non-key column	When the auto-increment column is not specified, its value is filled in from the existing row in the table. When it is specified, `null` values are replaced with generated values, non-`null` values are preserved, and the data is then inserted with partial column update semantics.

Example 1: Auto-increment column as a key column

CREATE TABLE `demo`.`tbl2` (
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `name` varchar(65533) NOT NULL,
    `value` int(11) NOT NULL DEFAULT "0"
) ENGINE=OLAP
UNIQUE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3",
    "enable_unique_key_merge_on_write" = "true"
);

insert into tbl2(id, name, value) values(1, "Bob", 10), (2, "Alice", 20), (3, "Jack", 30);

select * from tbl2 order by id;
+------+-------+-------+
| id   | name  | value |
+------+-------+-------+
|    1 | Bob   |    10 |
|    2 | Alice |    20 |
|    3 | Jack  |    30 |
+------+-------+-------+

set enable_unique_key_partial_update=true;
set enable_insert_strict=false;
insert into tbl2(id, name) values(1, "modified"), (4, "added");

select * from tbl2 order by id;
+------+----------+-------+
| id   | name     | value |
+------+----------+-------+
|    1 | modified |    10 |
|    2 | Alice    |    20 |
|    3 | Jack     |    30 |
|    4 | added    |     0 |
+------+----------+-------+

Example 2: Auto-increment column as a non-key column

CREATE TABLE `demo`.`tbl3` (
    `id` BIGINT NOT NULL,
    `name` varchar(100) NOT NULL,
    `score` BIGINT NOT NULL,
    `aid` BIGINT NOT NULL AUTO_INCREMENT
) ENGINE=OLAP
UNIQUE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3",
    "enable_unique_key_merge_on_write" = "true"
);

insert into tbl3(id, name, score) values(1, "Doris", 100), (2, "Nereids", 200), (3, "Bob", 300);

select * from tbl3 order by id;
+------+---------+-------+------+
| id   | name    | score | aid  |
+------+---------+-------+------+
|    1 | Doris   |   100 |    0 |
|    2 | Nereids |   200 |    1 |
|    3 | Bob     |   300 |    2 |
+------+---------+-------+------+

set enable_unique_key_partial_update=true;
set enable_insert_strict=false;
insert into tbl3(id, score) values(1, 999), (2, 888);

select * from tbl3 order by id;
+------+---------+-------+------+
| id   | name    | score | aid  |
+------+---------+-------+------+
|    1 | Doris   |   999 |    0 |
|    2 | Nereids |   888 |    1 |
|    3 | Bob     |   300 |    2 |
+------+---------+-------+------+

insert into tbl3(id, aid) values(1, 1000), (3, 500);

select * from tbl3 order by id;
+------+---------+-------+------+
| id   | name    | score | aid  |
+------+---------+-------+------+
|    1 | Doris   |   999 | 1000 |
|    2 | Nereids |   888 |    1 |
|    3 | Bob     |   300 |  500 |
+------+---------+-------+------+

Typical Use Cases

Use Case 1: Dictionary Encoding

Applicable business: In user profiling scenarios that use bitmaps for cohort analysis, you need to build a user dictionary in which each user maps to a unique integer dictionary value. Dense dictionary values yield better bitmap performance.

The following steps illustrate the procedure with an offline UV/PV analysis scenario.

Step 1: Create the user behavior detail table

Suppose the following user behavior table stores the detail data:

CREATE TABLE `demo`.`dwd_dup_tbl` (
    `user_id` varchar(50) NOT NULL,
    `dim1` varchar(50) NOT NULL,
    `dim2` varchar(50) NOT NULL,
    `dim3` varchar(50) NOT NULL,
    `dim4` varchar(50) NOT NULL,
    `dim5` varchar(50) NOT NULL,
    `visit_time` DATE NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`user_id`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 32
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Step 2: Create the dictionary table using an auto-increment column

CREATE TABLE `demo`.`dictionary_tbl` (
    `user_id` varchar(50) NOT NULL,
    `aid` BIGINT NOT NULL AUTO_INCREMENT
) ENGINE=OLAP
UNIQUE KEY(`user_id`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 32
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3",
    "enable_unique_key_merge_on_write" = "true"
);

Step 3: Load data to build the encoding mapping

Load the user_id values from the existing data into the dictionary table to build the mapping from user_id to integer values:

insert into dictionary_tbl(user_id)
select user_id from dwd_dup_tbl group by user_id;

Or, use the following approach to load only the user_id values from incremental data into the dictionary table:

insert into dictionary_tbl(user_id)
select dwd_dup_tbl.user_id from dwd_dup_tbl left join dictionary_tbl
on dwd_dup_tbl.user_id = dictionary_tbl.user_id
where dwd_dup_tbl.visit_time > '2023-12-10' and dictionary_tbl.user_id is NULL;

tip

In real-world scenarios, you can also use the Flink Connector to write data into Doris.

Step 4: Create the aggregate result table

Suppose dim1, dim3, and dim5 are the statistical dimensions of interest. Create the following aggregate table to store the aggregation results:

CREATE TABLE `demo`.`dws_agg_tbl` (
    `dim1` varchar(50) NOT NULL,
    `dim3` varchar(50) NOT NULL,
    `dim5` varchar(50) NOT NULL,
    `user_id_bitmap` BITMAP BITMAP_UNION NOT NULL,
    `pv` BIGINT SUM NOT NULL
) ENGINE=OLAP
AGGREGATE KEY(`dim1`,`dim3`,`dim5`)
DISTRIBUTED BY HASH(`dim1`) BUCKETS 32
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Step 5: Aggregate the data and run queries

Aggregate the data and store the result in the aggregate result table:

insert into dws_agg_tbl
select dwd_dup_tbl.dim1, dwd_dup_tbl.dim3, dwd_dup_tbl.dim5, BITMAP_UNION(TO_BITMAP(dictionary_tbl.aid)), COUNT(1)
from dwd_dup_tbl INNER JOIN dictionary_tbl on dwd_dup_tbl.user_id = dictionary_tbl.user_id
group by dwd_dup_tbl.dim1, dwd_dup_tbl.dim3, dwd_dup_tbl.dim5;

Run UV and PV queries:

select dim1, dim3, dim5, bitmap_count(user_id_bitmap) as uv, pv from dws_agg_tbl;

Use Case 2: Efficient Pagination

Background: When displaying data on a page, pagination is often required. Traditional pagination typically uses limit, offset, and order by in SQL. However, with deep pagination (when offset is very large), even if only a few rows are actually needed, this approach still reads all the data into memory, performs a full sort, and then continues processing. This is inefficient.

Optimization idea: You can give each row a unique value through an auto-increment column. During a query, record the maximum value max_value of the unique_value column from the previous page, then use where unique_value > max_value limit rows_per_page to filter out a large amount of data early via predicate pushdown, achieving more efficient pagination.

Traditional pagination (inefficient)

Suppose the following business table needs to be paginated:

CREATE TABLE `demo`.`records_tbl` (
    `user_id` int(11) NOT NULL COMMENT "",
    `name` varchar(26) NOT NULL COMMENT "",
    `address` varchar(41) NOT NULL COMMENT "",
    `city` varchar(11) NOT NULL COMMENT "",
    `nation` varchar(16) NOT NULL COMMENT "",
    `region` varchar(13) NOT NULL COMMENT "",
    `phone` varchar(16) NOT NULL COMMENT "",
    `mktsegment` varchar(11) NOT NULL COMMENT ""
) DUPLICATE KEY (`user_id`, `name`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Suppose each page displays 100 rows:

Get page 1:

select * from records_tbl order by user_id, name limit 100;

Get page 2:

select * from records_tbl order by user_id, name limit 100 offset 100;

Auto-increment column pagination (efficient)

Add an auto-increment column to the table to give each row a unique identifier:

CREATE TABLE `demo`.`records_tbl2` (
    `user_id` int(11) NOT NULL COMMENT "",
    `name` varchar(26) NOT NULL COMMENT "",
    `address` varchar(41) NOT NULL COMMENT "",
    `city` varchar(11) NOT NULL COMMENT "",
    `nation` varchar(16) NOT NULL COMMENT "",
    `region` varchar(13) NOT NULL COMMENT "",
    `phone` varchar(16) NOT NULL COMMENT "",
    `mktsegment` varchar(11) NOT NULL COMMENT "",
    `unique_value` BIGINT NOT NULL AUTO_INCREMENT
) DUPLICATE KEY (`user_id`, `name`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

Each page displays 100 rows:

Query page 1:

select * from records_tbl2 order by unique_value limit 100;

Query page 2: Have your program record the maximum unique_value returned by the previous query (assume it is 99), then run:
```
select * from records_tbl2 where unique_value > 99 order by unique_value limit 100;
```

Jump directly to a later page: When you want to query a later page directly, it is not convenient to obtain the maximum unique_value from the previous page. For example, to retrieve page 101 directly, run the following query:

select user_id, name, address, city, nation, region, phone, mktsegment
from records_tbl2, (select unique_value as max_value from records_tbl2 order by unique_value limit 1 offset 9999) as previous_data
where records_tbl2.unique_value > previous_data.max_value
order by unique_value limit 100;

Core Features​

Write Behavior​

Uniqueness​

Density​

Creating an Auto-Increment Column​

Syntax​

Constraints and Limitations​

Table Creation Examples​

Data Ingestion​

Regular Ingestion​

Scenario 1: INSERT INTO without specifying the auto-increment column​

Scenario 2: Stream Load without specifying the auto-increment column​

Scenario 3: INSERT INTO specifying the auto-increment column with NULL values​

Partial Column Update​

Example 1: Auto-increment column as a key column​

Example 2: Auto-increment column as a non-key column​

Typical Use Cases​

Use Case 1: Dictionary Encoding​

Step 1: Create the user behavior detail table​

Step 2: Create the dictionary table using an auto-increment column​

Step 3: Load data to build the encoding mapping​

Step 4: Create the aggregate result table​

Step 5: Aggregate the data and run queries​

Use Case 2: Efficient Pagination​

Traditional pagination (inefficient)​

Auto-increment column pagination (efficient)​