Updating Data with UPDATE Command
This document explains how to use the UPDATE
command to modify data in Doris. The UPDATE
command is only applicable to tables with a Unique data model.
Applicable Scenariosβ
-
Small-scale data updates: Ideal for scenarios where a small amount of data needs to be corrected, such as fixing erroneous fields in certain records or updating the status of specific fields (e.g., order status updates).
-
ETL batch processing of certain fields: Suitable for large-scale updates of a specific field, commonly seen in ETL processing scenarios. Note: Large-scale data updates should be infrequent.
How It Worksβ
The query engine uses its own filtering logic to identify the rows that need to be updated. Then, using the Unique model's Value column logic, it replaces old data with new data. The rows to be updated are modified and reinserted into the table to achieve row-level updates.
Synchronizationβ
The UPDATE
syntax in Doris is synchronous, meaning that once the UPDATE
statement is successfully executed, the update operation is completed and the data is immediately visible.
Performanceβ
The performance of the UPDATE
statement is closely related to the number of rows to be updated and the efficiency of the query conditions.
-
Number of rows to be updated: The more rows that need updating, the slower the
UPDATE
statement will be. For small-scale updates, Doris supports a frequency similar toINSERT INTO
statements. For large-scale updates, due to the long execution time, it is only suitable for infrequent calls. -
Efficiency of query conditions: The
UPDATE
implementation first reads the rows that meet the query conditions. Therefore, if the query conditions are efficient, theUPDATE
speed will be fast. It is best if the condition columns can hit the index or partition bucket pruning, so Doris does not need to scan the entire table and can quickly locate the rows that need updating, thereby improving update efficiency. It is strongly recommended not to include value columns in the condition columns.
Usage Exampleβ
Assume there is a transaction details table with the following structure in a financial risk control scenario:
CREATE TABLE transaction_details (
transaction_id BIGINT NOT NULL, -- Unique transaction ID
user_id BIGINT NOT NULL, -- User ID
transaction_date DATE NOT NULL, -- Transaction date
transaction_time DATETIME NOT NULL, -- Transaction time
transaction_amount DECIMAL(18, 2), -- Transaction amount
transaction_device STRING, -- Transaction device
transaction_region STRING, -- Transaction region
average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months
recent_transaction_count INT, -- Number of transactions in the last 7 days
has_dispute_history BOOLEAN, -- Whether there is a dispute history
risk_level STRING -- Risk level
)
UNIQUE KEY(transaction_id)
DISTRIBUTED BY HASH(transaction_id) BUCKETS 16
PROPERTIES (
"replication_num" = "3", -- Number of replicas, default is 3
"enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge updates
);
The following transaction data exists:
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL |
| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL |
| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL |
| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL |
| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
Update the risk level of all daily transaction records according to the following risk control rules:
- Transactions with a dispute history have a risk level of high.
- Transactions in high-risk regions have a risk level of high.
- Transactions with abnormal amounts (exceeding 5 times the daily average) have a risk level of high.
- Frequent transactions in the last 7 days: a. Transactions > 50 times have a risk level of high. b. Transactions between 20 and 50 times have a risk level of medium.
- Transactions during non-working hours (2 AM to 4 AM) have a risk level of medium.
- The default risk level is low.
UPDATE transaction_details
SET risk_level = CASE
-- Transactions with a dispute history or in high-risk regions
WHEN has_dispute_history = TRUE THEN 'high'
WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high'
-- Abnormal transaction amount
WHEN transaction_amount > 5 * average_daily_amount THEN 'high'
-- High transaction frequency in the last 7 days
WHEN recent_transaction_count > 50 THEN 'high'
WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium'
-- Transactions during non-working hours
WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium'
-- Default risk level
ELSE 'low'
END
WHERE transaction_date = '2024-11-24';
The updated data is as follows:
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low |
| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium |
| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium |
| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high |
| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
More Helpβ
For more detailed syntax on data updates, please refer to the UPDATE command manual. You can also enter HELP UPDATE
in the MySQL client command line for more help.