Table Model Overview
When creating a table in Doris, it is necessary to specify the table model to define how data is stored and managed. Doris provides three table models: the Duplicate Key Model, Unique Key Model and Aggregate Key Model, which cater to different application scenarios. Each model has corresponding mechanisms for data deduplication, aggregation, and updates. Choosing the appropriate table model helps achieve business objectives while ensuring flexibility and efficiency in data processing.
Table Model Classificationβ
Doris supports three types of table models:
-
Duplicate Key Model: Allows the specified Key columns to be duplicated, and Doris's storage layer retains all written data. This model is suitable for situations where all original data records must be preserved.
-
Unique Key Model: Ensures that each row has a unique Key value, and guarantees that there are no duplicate rows for a given Key column. The Doris storage layer retains only the latest written data for each key, making this model suitable for scenarios that involve data updates.
-
Aggregate Key Model: Allows data to be aggregated based on the Key columns. The Doris storage layer retains aggregated data, reducing storage space and improving query performance. This model is typically used in situations where summary or aggregated information (such as totals or averages) is required.
After creating the table, the properties of the table model are confirmed and cannot be modified. Choosing the right model for the business is crucial:
-
Duplicate Key Model is suitable for ad-hoc queries with any dimensions. Although it cannot leverage the benefits of pre-aggregation, it is not constrained by aggregation models and can take advantage of the columnar storage model (only reading relevant columns without needing to read all key columns).
-
Unique Key Model is designed for scenarios where a unique key constraint is needed, ensuring the uniqueness of the key. However, it cannot utilize the query benefits brought by pre-aggregations such as ROLLUP.
-
Aggregate Key Model can greatly reduce the data and computation required for aggregation queries through pre-aggregation, making it ideal for fixed-schema reporting queries. However, this model is not friendly to
count(*)
queries. Also, because the aggregation method for the Value columns is fixed, when performing other types of aggregation queries, semantic correctness must be considered. -
Update partial columns, please refer to the documentation for Partial Column Updates in Unique Key Model and Partial Column Updates in Aggregate Model for relevant usage advice.
Sort Keyβ
In Doris, data is stored in a columnar format, and a table can be divided into Key columns and Value columns. The Key columns are used for grouping and sorting, while the Value columns are used for aggregation. Key columns can consist of one or more fields, and when creating a table, data is sorted and stored according to the columns of Aggregate Key, Unique Key, and Duplicate Key models.
Different table models require the specification of Key columns during table creation, each with a different significance: for the Duplicate Key model, the Key columns represent sorting, without any uniqueness constraints. In the Aggregate Key and Unique Key models, aggregation is performed based on the Key columns, which not only have sorting capabilities but also enforce uniqueness constraints.
Proper use of the Sort Key can provide the following benefits:
-
Accelerated Query Performance: Sort keys help reduce the amount of data that needs to be scanned. For range queries or filtering queries, the sort key can directly locate the data. For queries that require sorting, the sort key can also accelerate the sorting process.
-
Data Compression Optimization: Storing data in an ordered fashion based on the sort key improves compression efficiency, as similar data will be grouped together, significantly increasing the compression ratio and reducing storage space.
-
Reduced Deduplication Costs: When using the Unique Key Model, the sort key allows Doris to perform deduplication more efficiently, ensuring data uniqueness.
When selecting a sort key, the following recommendations can be followed:
-
The Key columns must come before all Value columns.
-
Preferably choose integer types. This is because integer types are much more efficient in computation and lookup than strings.
-
For selecting different lengths of integer types, follow the principle of choosing what is sufficient.
-
For the length of
VARCHAR
andSTRING
types, follow the principle of choosing enough...
Table Model Comparisonβ
Duplicate Key Model | Unique Key Model | Aggregate Key Model | |
---|---|---|---|
Key Column Uniqueness | Not Supported, Key columns can be duplicated | Supported | Supported |
Synchronous Materialized View | Supported | Supported | Supported |
Asynchronous Materialized View | Supported | Supported | Supported |
UPDATE Statement | Not Supported | Supported | Not Supported |
DELETE Statement | Partially Supported | Supported | Not Supported |
Full Row Update on Import | Not Supported | Supported | Not Supported |
Partial Column Update on Import | Not Supported | Supported | Partially Supported |