Last updated on May 17, 2026

Data Update Overview

This article introduces the data update and deletion capabilities of Apache Doris, helping you answer the following questions:

Which table model should you choose for your business scenario?
When should you use ingest-based updates (UPSERT, partial column updates), and when should you use DML updates (UPDATE, INSERT INTO SELECT)?
What is the implementation mechanism behind the Unique Key model? (Merge-on-Write, Sequence column, delete marker, partial column update)
How are typical scenarios such as CDC synchronization and real-time wide tables implemented?

Reading suggestions:

For a quick selection, go directly to 1. Choosing the Right Update Method.
To understand the principles or troubleshoot performance issues, see 2. Unique Key Model Implementation Mechanism.
To reference typical business practices, see 3. Typical Business Scenarios.
To learn about best practices and limitations, see 4. Usage Recommendations.

1. Choosing the Right Update Method

1.1 Comparison of Table Models and Update Capabilities

Doris provides three table models, with significant differences in support for updates and deletions. When making a selection, first confirm whether your business requires row-level updates or partial column updates.

Table Model	Data Organization	Supported Update/Delete Methods	Typical Use Cases
Unique Key Model	Each row is identified by a unique primary key, deduplicated on write	UPSERT, partial column update, `UPDATE`, marked deletion, `DELETE`	Order status changes, user tag updates, CDC synchronization
Aggregate Key Model	Value columns with the same Key are merged by aggregation function (SUM/MAX/MIN/REPLACE)	Updated through ingestion based on aggregation semantics; `DELETE` only supports Key column conditions	Real-time reports, click counts, and other aggregation scenarios
Duplicate Key Model	Append-only writes, no deduplication or aggregation	Only supports `DELETE`	Logs, behavior tracking, and other append-only scenarios

Selection conclusion: Choose the Unique Key model when row-level updates or partial column updates are required.

1.2 Update Path Selection

The Unique Key model supports two update paths, which can be chosen based on data scale and business frequency:

Path	Applicable Scenarios	Recommended Write Method
Ingest-based update (UPSERT)	High-frequency, large-batch updates; CDC synchronization; multi-source wide table assembly	Stream Load, Routine Load, Broker Load, `INSERT INTO`
DML update (`UPDATE`)	Low-frequency, batch updates; conditional data refresh; cross-table associated updates	`UPDATE`, `INSERT INTO ... SELECT ...`

1.2.1 Update via Ingestion (UPSERT)

All ingestion methods (Stream Load, Broker Load, Routine Load, INSERT INTO) process data in the Unique Key model with UPSERT semantics by default:

If the primary key already exists: the old row is overwritten by the new row.
If the primary key does not exist: a new row is inserted.

Upsert

The behavior of ingest-based updates differs across table models. For details, see:

Update via Ingestion in the Unique Key Model: Implement full-row Upsert and partial column updates through Stream Load, Routine Load, and other methods.
Update via Ingestion in the Aggregate Model: The write semantics and available update methods for the Aggregate model.

1.2.2 Update via the UPDATE Statement

Doris supports the standard SQL UPDATE, which can update data in Unique Key model tables based on WHERE conditions and supports cross-table associated updates.

-- Simple update
UPDATE user_profiles SET age = age + 1 WHERE user_id = 1;

-- Cross-table associated update
UPDATE sales_records t1
SET t1.user_name = t2.name
FROM user_profiles t2
WHERE t1.user_id = t2.user_id;

The execution process of UPDATE is: first scan the data that meets the conditions, then write the updated rows back. This is suitable for low-frequency, batch update scenarios.

Note

High-concurrency UPDATE operations on data with the same primary key are not recommended. Concurrent UPDATE operations cannot guarantee data isolation when they involve the same primary key.

For the syntax, typical usage, and limitations of the UPDATE statement, see: Updating Data Using the UPDATE Command.

1.2.3 Update via INSERT INTO SELECT

Since the Unique Key model uses UPSERT semantics by default, using INSERT INTO ... SELECT ... can also achieve the effect of UPDATE, which is suitable for batch write-back from other tables.

1.3 Data Deletion Paths

Doris provides two data deletion paths, with different implementation mechanisms across table models:

Deletion Path	Supported Table Models	Description
Marked deletion via ingestion	Unique Key Model	Writes a `__DORIS_DELETE_SIGN__ = 1` marker, with background Compaction performing physical cleanup
DML deletion (`DELETE` / `TRUNCATE`)	All models	Delete data by condition or empty a table/partition

For complete information, see Data Deletion.

2. Unique Key Model Implementation Mechanism

2.1 Merge-on-Write and Merge-on-Read

The Unique Key model has two data merging strategies. Since Doris 2.1, Merge-on-Write is the default implementation.

Dimension	Merge-on-Write (MoW)	Merge-on-Read (MoR, legacy)
Behavior on write	Deduplicates and merges on write, keeping only one latest record per primary key in storage	Retains multiple versions on write
Query performance	Close to that of a Duplicate Key table without updates	Real-time merge during queries, taking approximately 3-10 times longer than MoW
Write performance	Has merging overhead, slightly lower than MoR (about 10-20% for small batches, about 30-50% for large batches)	Close to a Duplicate Key table
Resource consumption	Writes and background Compaction consume more CPU/memory	Queries consume more CPU/memory
Applicable scenarios	Read-heavy with infrequent writes (recommended)	Write-heavy with infrequent reads (no longer recommended)

Newly created tables use MoW by default, with no additional configuration required.

2.2 Sequence Column and Out-of-Order Data

In a distributed system, data may arrive out of order. For example, an order status changes successively to "Paid" and "Shipped", but due to network latency, the "Shipped" message may arrive at Doris before the "Paid" message.

The Sequence column mechanism solves this problem: when creating the table, specify a column (typically a timestamp or version number) as the Sequence column. When writing data with the same primary key, Doris always retains the row with the largest Sequence value.

CREATE TABLE order_status (
    order_id BIGINT,
    status_name STRING,
    update_time DATETIME
)
UNIQUE KEY(order_id)
DISTRIBUTED BY HASH(order_id)
PROPERTIES (
    "function_column.sequence_col" = "update_time" -- Specify update_time as the Sequence column
);

-- 1. Write the "Shipped" record (larger update_time)
-- {"order_id": 1001, "status_name": "Shipped", "update_time": "2023-10-26 12:00:00"}

-- 2. Write the "Paid" record (smaller update_time, arrives later)
-- {"order_id": 1001, "status_name": "Paid", "update_time": "2023-10-26 11:00:00"}

-- Final result: the record with the largest update_time is retained
-- order_id: 1001, status_name: "Shipped", update_time: "2023-10-26 12:00:00"

For more information:

For concurrency control capabilities such as the Sequence column, MVCC version management, and UPDATE concurrency parameters, see: Concurrency Control of Updates in the Unique Key Model.
When multiple data streams update different columns of the same wide table simultaneously, you can use Sequence Mapping to control the version order of each column independently. For details, see: Multi-Stream Updates in the Unique Key Model.

2.3 Workflow of the Delete Marker

__DORIS_DELETE_SIGN__ uses a "logical marker, background cleanup" approach, divided into three stages:

Execute deletion: When data is deleted via ingestion or the DELETE statement, Doris does not immediately remove the data from physical files. Instead, it writes a new record with __DORIS_DELETE_SIGN__ marked as 1.
Query filtering: During queries, Doris automatically appends the filter condition WHERE __DORIS_DELETE_SIGN__ = 0 to hide rows marked for deletion from the results.
Background Compaction: When the Compaction process detects that a primary key has both a normal record and a delete marker record, it physically removes both records during the merge process, freeing storage space.

Delete Sign

2.4 Partial Column Update

Starting from version 2.0, the Unique Key model (MoW) supports partial column updates: during ingestion, you only need to provide the primary key and the columns to be updated. Columns that are not provided retain their original values.

How to enable:

Ingestion Method	Configuration to Enable
`INSERT INTO`	Set the session variable `enable_unique_key_partial_update = true`
Stream Load and other ingestion methods	Set the `partial_columns` parameter to `true`

Merge-on-Write must be enabled when creating the table:

CREATE TABLE user_profiles (
    user_id BIGINT,
    name STRING,
    age INT,
    last_login DATETIME
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id)
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true"
);

-- Initial data
-- user_id: 1, name: 'Alice', age: 30, last_login: '2023-10-01 10:00:00'

-- Partial update via Stream Load, writing only age and last_login
-- {"user_id": 1, "age": 31, "last_login": "2023-10-26 18:00:00"}

-- Data after update
-- user_id: 1, name: 'Alice', age: 31, last_login: '2023-10-26 18:00:00'

In terms of implementation, partial column updates in the Unique Key model are not in-place updates. Instead, during ingestion, existing columns are read, missing fields are filled in, and the entire row is rewritten. This results in read amplification and write amplification: for example, when updating 10 fields of a 100-column wide table with similar field sizes, a 1 MB effective update triggers approximately 9 MB of data reads and 10 MB of data writes.

Performance recommendations:

Recommendation	Description
Use SSD	Partial column updates generate a large amount of random IO. The bottleneck on mechanical disks is significant, so SSDs (NVMe preferred) are recommended
Enable row store for wide tables	For very wide tables, enabling row store is recommended, so a single IO can read the entire row. In column store mode, each missing field requires a separate IO

For complete usage instructions (Stream Load, INSERT INTO, Flink Connector, etc.), see: Column Update.

3. Typical Business Scenarios

3.1 Real-Time CDC Synchronization

Tools such as Flink CDC capture the Binlog of upstream databases (MySQL, PostgreSQL, Oracle, etc.) and write it to Doris Unique Key model tables.

Flink CDC

Key capabilities:

Whole-database synchronization: Flink Doris Connector internally integrates Flink CDC, enabling end-to-end whole-database synchronization without manual table creation or field mapping.
Consistency guarantee: The following capabilities are used in combination to align with the upstream database state:
- The Unique Key model UPSERT handles INSERT / UPDATE.
- __DORIS_DELETE_SIGN__ handles DELETE.
- The Sequence column (such as the Binlog timestamp) handles out-of-order data.

3.2 Real-Time Wide Table Assembly

Using partial column updates, you can complete multi-source wide table assembly directly within Doris without performing real-time joins in Flink:

Create a Unique Key model wide table.
Different data sources (basic information, behavior logs, transaction data, etc.) write to it in real time via Stream Load / Routine Load.
Each data stream writes only the columns it is responsible for, for example:
- The user behavior stream updates page_view_count and last_login_time.
- The transaction stream updates total_orders and total_amount.

Each stream writes only the changed columns, which reduces IO overhead and avoids the resource consumption of real-time joins. For version control in multi-stream concurrent scenarios, see: Multi-Stream Updates in the Unique Key Model.

4. Usage Recommendations

4.1 General Recommendations

No.	Recommendation	Description
1	Prefer ingest-based updates	For high-frequency, large-volume updates, prefer Stream Load and Routine Load over `UPDATE` DML
2	Batch writes	Avoid high-frequency single-row `INSERT` (> 100 TPS), as each INSERT has transaction overhead. You can enable Group Commit to merge small batch commits
3	Use high-frequency DELETE on Duplicate/Aggregate models with caution	Predicate accumulation affects subsequent query performance
4	Use TRUNCATE PARTITION to delete an entire partition	When deleting an entire partition, `TRUNCATE PARTITION` is much more efficient than `DELETE`
5	Execute UPDATE serially	Avoid concurrent execution of `UPDATE` tasks that may operate on the same primary key

4.2 Unique Key Model in the Compute-Storage Decoupled Architecture

Doris 3.0 introduces the compute-storage decoupled architecture. In this architecture, BE is stateless, and Merge-on-Write must maintain global state through the Meta Service to resolve write-write conflicts among ingestion, Compaction, and Schema Change. The Unique Key model MoW relies on a distributed table lock based on the Meta Service to ensure write consistency.

Distribute Lock

High-frequency ingestion and Compaction can cause table lock contention. When using this architecture, the following are recommended:

Control single-table ingestion frequency: It is recommended that the ingestion frequency of a single Unique Key table does not exceed 60 times per second. This can be reduced by batching or adjusting concurrency.
Design partitions and buckets reasonably:
- Partitions: Use time partitions (by day or by hour) so that each ingestion only updates a small number of partitions.
- Buckets: The number of buckets (number of Tablets) should match the data volume, typically between 8 and 64. Too many Tablets aggravate lock contention.
Adjust Compaction strategy: When the write pressure is high, reduce the Compaction frequency to lessen lock conflicts with ingestion tasks.
Use a recent version: Version 3.1 has made significant optimizations to the distributed table lock implementation. Using the latest stable version is recommended.

1. Choosing the Right Update Method​

1.1 Comparison of Table Models and Update Capabilities​

1.2 Update Path Selection​

1.2.1 Update via Ingestion (UPSERT)​

1.2.2 Update via the UPDATE Statement​

1.2.3 Update via INSERT INTO SELECT​

1.3 Data Deletion Paths​

2. Unique Key Model Implementation Mechanism​

2.1 Merge-on-Write and Merge-on-Read​

2.2 Sequence Column and Out-of-Order Data​

2.3 Workflow of the Delete Marker​

2.4 Partial Column Update​

3. Typical Business Scenarios​

3.1 Real-Time CDC Synchronization​

3.2 Real-Time Wide Table Assembly​

4. Usage Recommendations​

4.1 General Recommendations​

4.2 Unique Key Model in the Compute-Storage Decoupled Architecture​