Best Practices for Cache Optimization in Read-Write Splitting Scenarios

When using Apache Doris's storage-compute separation architecture, especially in scenarios where multiple Compute Groups are deployed to implement read-write splitting, query performance is highly dependent on the File Cache hit rate. When a read-only compute group experiences a cache miss, it needs to pull data from remote object storage, leading to a significant increase in query latency.

This document aims to detail how to effectively reduce cache miss issues caused by common scenarios such as Compaction, Data Ingestion, and Schema Change through cache warm-up and related configurations, thereby ensuring the query performance stability of the read-only cluster.

Core Issue: Cache Invalidation Caused by New Data Versions (Rowsets)

In Doris, both background processes like Compaction / Schema Change and foreground data ingestion will generate new sets of data files (Rowsets). On the nodes of the write-only compute group responsible for writes, this data is written to the local File Cache by default, so its query performance is not affected.

However, for a read-only compute group, when it synchronizes metadata and becomes aware of these new Rowsets, its local cache does not contain this new data. If a query then needs to access these new Rowsets, it will trigger a cache miss, leading to a performance degradation.

To solve this problem, the core idea is: to load data into the read-only compute group's cache in advance or intelligently before it is queried.

I. Overview of Cache Warm-up Mechanisms

Cache warm-up is the process of proactively loading data from remote storage into the File Cache of BE nodes. Doris provides the following three main warm-up methods:

1. Proactive Incremental Warm-up (Recommended)

This is a more intelligent and automated mechanism. It establishes a warm-up relationship between the write compute group and the read-only compute group. When a write/Compaction event generates a new Rowset, it actively notifies and triggers the associated read-only compute group to perform an asynchronous cache warm-up.

Applicable Scenarios:

Most scenarios.
Requires user permission to configure warm-up relationships.

[Documentation Link]: For detailed information on how to configure and use proactive incremental warm-up, please refer to the official documentation FileCache Proactive Incremental Warm-up.

2. Read-Only Compute Group Automatic Warm-up

This is a lightweight, automatic warm-up strategy. By enabling a configuration on the BE nodes of the read-only compute group, it automatically triggers an asynchronous warm-up task when it perceives a new Rowset.

Applicable Scenarios:

Most scenarios.
Requires user permission to configure warm-up relationships.

Core Configuration: In the be.conf of the read-only compute group, set:

enable_warmup_immediately_on_new_rowset = true

II. Optimizing the Impact of Compaction / Schema Change on Query Performance

Background Compaction merges old Rowsets and generates new ones. If the new Rowsets are not warmed up, the query performance of the read-only compute group will fluctuate due to cache misses. The following are two recommended solutions.

Solution 1: Proactive Incremental Warm-up + Delayed Commit (Recommended)

This solution can fundamentally prevent the read-only compute group from querying new Rowsets generated by Compaction / Schema Change that have not yet been cached.

How it Works:

First, configure the proactive incremental warm-up relationship between the write compute group and the read-only compute group.
On the BE nodes of the write compute group, enable the delayed commit feature for Compaction / Schema Change.

Core Configuration (Write Compute Group be.conf):

enable_compaction_delay_commit_for_warm_up = true

Workflow:
1. A Compaction / Schema Change task completes on the write compute group and generates a new Rowset.
2. At this point, the Rowset is not immediately committed (i.e., it is not visible to the read-only compute group).
3. The system triggers the associated read-only compute group to warm up the cache for this new Rowset.
4. Only after all associated read-only compute groups have completed the warm-up, the new Rowset is finally committed and becomes visible to all compute groups.

Advantages:

Seamless Switching: For the read-only compute group, all visible data post-Compaction is already in the cache, so query performance does not fluctuate.
High Stability: This is the most robust solution for ensuring query performance in read-write splitting scenarios.

Solution 2: Read-Only Compute Group Automatic Warm-up + Query Awareness

This solution uses intelligent selection at the query layer to try to skip new Rowsets that have not yet been warmed up (Note: For Unique Key MoW tables, Rowsets from compaction cannot be skipped to ensure correctness).

How it Works:

On the BE nodes of the read-only compute group, enable automatic warm-up.

Core Configuration (Read-Only Compute Group be.conf):

enable_warmup_immediately_on_new_rowset = true

During a query, enable the "prefer cached" Rowset selection strategy via a session variable or user property.

Set in the query session:

SET enable_prefer_cached_rowset = true;

Or set as a user property:

SET property for "jack" enable_prefer_cached_rowset = true;

Workflow:
1. When the read-only compute group perceives a new Rowset from Compaction, it asynchronously triggers a warm-up task.
2. With enable_prefer_cached_rowset enabled, the query planner, when selecting Rowsets to read, will prioritize those that are already warmed up.
3. It will automatically ignore new Rowsets that are still being warmed up, provided that this does not affect data consistency (i.e., the old Rowsets before the merge are still accessible).

Advantages:

Relatively simple to configure, without needing to set up cross-compute-group warm-up relationships.
Effectively reduces performance impact in most cases.

Note:

This solution is a "best-effort" strategy. If the old Rowsets corresponding to a new Rowset have already been cleaned up, or if the query must access the latest data version, the query will still have to wait for the warm-up to complete or access the cold data directly.

III. Optimizing the Impact of Data Ingestion on Query Performance

High-frequency data ingestion (like INSERT INTO, Stream Load) continuously produces new small files (Rowsets), which also causes cache miss problems for the read-only compute group. If your business can tolerate data latency of seconds or even sub-seconds, you can adopt the following combined strategy to trade a tiny amount of "freshness" for a huge performance gain.

How it Works: This strategy combines automatic warm-up with a query freshness tolerance setting, allowing the query planner to intelligently skip the latest data that has not been warmed up within a specified time window.

Implementation Steps:

Enable a Warm-up Mechanism:
1. Enable either Proactive Incremental Warm-up or Read-Only Compute Group Automatic Warm-up(enable_warmup_immediately_on_new_rowset=true) on the read-only compute group. This is the prerequisite for data to be loaded into the cache asynchronously.
Set Query Freshness Tolerance:
1. In the query session of the read-only compute group, set the query_freshness_tolerance_ms variable.
2. Set in the query session:
```
-- Set a tolerance for 1000 milliseconds (1 second) of data latency
SET query_freshness_tolerance_ms = 1000;
```
  Or set as a user property:
```
SET property for "jack" query_freshness_tolerance_ms = 1000;
```

Workflow:

When a query starts, it checks the Rowsets it needs to access.
If a Rowset was generated within the last 1000ms and is not yet warmed up, the query planner will automatically skip it and access older, but already cached, data instead.
This way, the vast majority of queries will hit the cache, avoiding the performance degradation caused by reading the latest, cold data from recent writes.

Fallback Mechanism:

If the warm-up process for a Rowset is very slow and exceeds the time set by query_freshness_tolerance_ms (e.g., still not finished after 1000ms), the query will no longer skip it to ensure eventual data visibility. It will fall back to the default behavior: read the cold data directly.

Advantages:

Significant Performance Improvement: Effectively eliminates query performance spikes in high-throughput write scenarios.
High Flexibility: Users can make a flexible trade-off between data freshness and query performance based on their business needs.

Summary and Recommendations

Solution	Applicable Scenarios	Expected Effect (Impact of various write operations on cache hit rate)
Active incremental pre-warming + delayed commit + configurable data freshness tolerance (optional)	Suitable for scenarios with extremely high query latency requirements; requires users to have permission to configure pre-warming relationships	Compaction: None Heavyweight schema change: None Newly written data: Depends on freshness tolerance
Read-only compute group with automatic pre-warming + prefer cached data + configurable data freshness tolerance (optional)	Users have no permission to configure pre-warming relationships If freshness tolerance is not configured, ineffective for MOW primary key tables	Compaction: None Heavyweight schema change: Cache miss Newly written data: Depends on freshness tolerance

By reasonably applying the above cache warm-up strategies and related configurations, you can effectively manage the cache behavior of Apache Doris in a read-write splitting architecture, minimize performance loss due to cache misses, and ensure the stability and efficiency of your read-only query services.

Core Issue: Cache Invalidation Caused by New Data Versions (Rowsets)​

I. Overview of Cache Warm-up Mechanisms​

1. Proactive Incremental Warm-up (Recommended)​

2. Read-Only Compute Group Automatic Warm-up​

II. Optimizing the Impact of Compaction / Schema Change on Query Performance​

Solution 1: Proactive Incremental Warm-up + Delayed Commit (Recommended)​

Solution 2: Read-Only Compute Group Automatic Warm-up + Query Awareness​

III. Optimizing the Impact of Data Ingestion on Query Performance​

Summary and Recommendations​

Core Issue: Cache Invalidation Caused by New Data Versions (Rowsets)

I. Overview of Cache Warm-up Mechanisms

1. Proactive Incremental Warm-up (Recommended)

2. Read-Only Compute Group Automatic Warm-up

II. Optimizing the Impact of Compaction / Schema Change on Query Performance

Solution 1: Proactive Incremental Warm-up + Delayed Commit (Recommended)

Solution 2: Read-Only Compute Group Automatic Warm-up + Query Awareness

III. Optimizing the Impact of Data Ingestion on Query Performance

Summary and Recommendations