Unique Key Model
When data updates are required, use the Unique Key Model. It guarantees the uniqueness of the Key columns so that new data overwrites existing records with matching keys, ensuring that only the most up-to-date records are maintained. This model is ideal for update scenarios, enabling unique-key-level updates during data insertion. The Unique Key Model has the following characteristics:
-
Unique Key UPSERT: During insertion, records with duplicate keys are updated, while new keys are inserted.
-
Automatic Deduplication: The model ensures key uniqueness and automatically deduplicates data based on the unique key.
-
Optimized for High-frequency Updates: It efficiently handles high-frequency updates while balancing update and query performance.
Use Casesβ
-
High-frequency Data Updates: In upstream OLTP databases, where dimension tables are frequently updated, the Unique Key Model can efficiently synchronize the upstream updated records and perform efficient UPSERT operations.
-
Efficient Data Deduplication: In scenarios such as advertising campaigns or customer relationship management systems, where deduplication is required based on user IDs, the Unique Key Model ensures efficient deduplication.
-
Partial Columns Updates: In scenarios such as in user profiling where dynamic tags change frequently, or in order consumption scenarios where the transaction status needs to be updated. The Unique Key Model's partial column update capability allows for changes to specific columns.
Implementation Methodsβ
In Doris, the Unique Key Model has two implementation methods:
-
Merge-on-write: Starting from version 1.2, the default implementation of the Unique Key Model in Doris is the merge-on-write mode. In this mode, data is immediately merged for the same Key upon writing, ensuring that the data storage state after each write is the final merged result of the unique key, and only the latest result is stored. Merge-on-write provides a good balance between query and write performance, avoiding the need to merge multiple versions of data during queries and ensuring predicate pushdown to the storage layer. The merge-on-write model is recommended for most scenarios.
-
Merge-on-read: Prior to version 1.2, Doris's Unique Key Model defaulted to merge-on-read mode. In this mode, data is not merged upon writing but is appended incrementally, retaining multiple versions within Doris. During queries or Compaction, data is merged by the same Key version. Merge-on-read is suitable for write-heavy and read-light scenarios, but during queries, multiple versions must be merged, and predicates cannot be pushed down, which may affect query speed.
In Doris, there are two types of update semantics for the Unique Key Model:
-
Full Row Upsert: The default update semantic for the Unique Key Model is full row UPSERT, i.e., UPDATE OR INSERT. If the Key of the row exists, it will be updated; if it does not exist, new data will be inserted. In the full row UPSERT semantic, even if the user inserts data into specific columns using
INSERT INTO
, Doris will fill in the missing columns with NULL values or default values during the planner stage. -
Partial Column Upsert: If users want to update specific fields, they need to use the merge-on-write implementation and enable partial column updates support via specific parameters. Please refer to the documentation on Partial Column Updates.
Merge-on-writeβ
Creating a Merge-on-write Tableβ
To create a Unique Key table, use the UNIQUE KEY
keyword. Enable merge-on-write mode by setting the enable_unique_key_merge_on_write
attribute (default since Doris 2.1):
CREATE TABLE IF NOT EXISTS example_tbl_unique
(
user_id LARGEINT NOT NULL,
user_name VARCHAR(50) NOT NULL,
city VARCHAR(20),
age SMALLINT,
sex TINYINT
)
UNIQUE KEY(user_id, user_name)
DISTRIBUTED BY HASH(user_id) BUCKETS 10
PROPERTIES (
"enable_unique_key_merge_on_write" = "true"
);
Merge-on-readβ
Creating a Merge-on-read Tableβ
When creating a table, the UNIQUE KEY
keyword can be used to specify a Unique Key table. The merge-on-read mode can be enabled by explicitly disabling the enable_unique_key_merge_on_write
attribute. Before Doris version 2.1, the merge-on-read mode was enabled by default:
CREATE TABLE IF NOT EXISTS example_tbl_unique
(
user_id LARGEINT NOT NULL,
username VARCHAR(50) NOT NULL,
city VARCHAR(20),
age SMALLINT,
sex TINYINT
)
UNIQUE KEY(user_id, username)
DISTRIBUTED BY HASH(user_id) BUCKETS 10
PROPERTIES (
"enable_unique_key_merge_on_write" = "false"
);
Data Insertion and Storageβ
In a Unique Key table, the Key columns serve both for sorting and deduplication. New insertions overwrite existing records with matching keys.
As shown in the example, there were 4 rows of data in the original table. After inserting 2 new rows, the newly inserted rows are updated based on the unique key:
-- insert into raw data
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 26, 1),
(102, 'Jason', 'BJ', 27, 1),
(103, 'Juice', 'SH', 20, 2),
(104, 'Olivia', 'SZ', 22, 2);
-- insert into data to update by key
INSERT INTO example_tbl_unique VALUES
(101, 'Tom', 'BJ', 27, 1),
(102, 'Jason', 'SH', 28, 1);
-- check updated data
SELECT * FROM example_tbl_unique;
+---------+----------+------+------+------+
| user_id | username | city | age | sex |
+---------+----------+------+------+------+
| 101 | Tom | BJ | 27 | 1 |
| 102 | Jason | SH | 28 | 1 |
| 104 | Olivia | SZ | 22 | 2 |
| 103 | Juice | SH | 20 | 2 |
+---------+----------+------+------+------+
Notesβ
-
The implementation mode for Unique Key tables is fixed at creation and cannot be changed via schema alterations.
-
Under full row UPSERT semantics, if specific columns are omitted during insertion, Doris fills them with NULL or default values during planning.
-
For partial column upsert, enable merge-on-write mode with the appropriate parameters. Refer to Partial Column Updates for guidance.