Read/Write Separation and Primary/Standby Cluster File Cache Warm-Up Configuration Guide

Background and Applicable Scenarios

To address cache cold-start problems in cross-availability-zone (AZ) high-availability failover and read/write separation scenarios, Doris introduces the File Cache active incremental warm-up mechanism. This mechanism ensures that the cache data of a target cluster stays highly consistent with the source cluster, thereby improving query performance, reducing jitter, and accelerating failover response.

This feature applies to the following two typical scenarios:

Scenario | Description | Core Requirement
Primary/standby cluster high availability | The standby cluster continuously syncs hot data from the primary cluster and takes over the load quickly when the primary cluster fails. | Minimize failover latency
Read/write separation | Newly written data in the write cluster is promptly warmed up to the read cluster to avoid queries hitting cold cache. | Reduce read cluster query jitter

Version Information

The File Cache active incremental warm-up feature was introduced in Apache Doris 3.1.0.


Feature Overview

File Cache active warm-up supports the following two types of cache synchronization:

  1. Event-triggered warm-up: Automatically triggers synchronization after write operations such as Load, Compaction, and Schema Change complete, reducing query jitter.
  2. Periodic hot-data sync: Continuously syncs hot-query data to the target cluster through periodic scanning, ensuring stable performance of the standby cluster during primary/standby switchover.

Sync Mode Description

The applicable scenarios for the three sync modes are as follows:

Mode | Parameter Value | Applicable Scenario
One-time sync | ONCE | Manually triggered; suitable for initial warm-up when a new cluster comes online
Periodic sync | PERIODIC | Scheduled sync of hot data; suitable for continuous warm-keeping scenarios
Event-driven sync | EVENT_DRIVEN | Automatically triggered after Load, Compaction, or Schema Change operations

Creating Warm-Up Jobs

One-Time Sync

Suitable for manually triggering initial warm-up when a new cluster comes online:

WARM UP COMPUTE GROUP <target_cluster> WITH COMPUTE GROUP <source_cluster>;

Periodic Sync

Suitable for continuously maintaining hot-data synchronization:

WARM UP COMPUTE GROUP <target_cluster> WITH COMPUTE GROUP <source_cluster>
PROPERTIES (
    "sync_mode" = "periodic",
    "sync_interval_sec" = "600"
);

  • sync_interval_sec: The sync interval in seconds, calculated from the last start time. The default value is 600 seconds.
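The phrase "calculated from the last start time" can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the next round becomes due exactly sync_interval_sec seconds after the previous round started; next_trigger_time is a hypothetical helper written for illustration, not a Doris API.

```python
from datetime import datetime, timedelta


def next_trigger_time(last_start: datetime, sync_interval_sec: int = 600) -> datetime:
    """Earliest time the next round may start (hypothetical helper, not a Doris API).

    Assumption: the interval is measured from the previous start,
    not from the previous finish.
    """
    return last_start + timedelta(seconds=sync_interval_sec)


last_start = datetime(2025, 1, 1, 12, 0, 0)
# A round that starts at 12:00:00 and runs for 4 minutes is due again
# at 12:10:00, not at 12:14:00.
print(next_trigger_time(last_start))  # 2025-01-01 12:10:00
```

Under this assumption, a long-running round shortens the idle gap before the next one, which is worth keeping in mind when choosing the interval.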

Event-Driven Sync

Suitable for read/write separation scenarios, where new data is automatically warmed up to the read cluster after a write operation completes:

WARM UP COMPUTE GROUP <target_cluster> WITH COMPUTE GROUP <source_cluster>
PROPERTIES (
    "sync_mode" = "event_driven",
    "sync_event" = "load"
);

  • sync_event: The type of event that triggers synchronization. Accepted values include load, compaction, and schema_change.

Managing Warm-Up Jobs

Viewing Job List

-- View all warm-up jobs
SHOW WARM UP JOB;

-- View a specific job
SHOW WARM UP JOB WHERE ID = 12345;

Field descriptions for query results:

Field | Description
JobId | Unique ID of the sync job
ComputeGroup | Name of the target Compute Group
SrcComputeGroup | Name of the source Compute Group
Type | Sync type: CLUSTER (cluster-level) / TABLE (table-level)
SyncMode | Sync mode: ONCE / PERIODIC(interval_sec) / EVENT_DRIVEN(event)
Status | Job status: PENDING / RUNNING / FINISHED / CANCELLED / DELETED
CreateTime | Time when the job was created
StartTime | Time of the last start
FinishTime | Time of the last completion
FinishBatch | Number of batches completed
AllBatch | Total number of batches to sync
ErrMsg | Error message (empty if no error)

Canceling a Job

CANCEL WARM UP JOB WHERE id = 12345;

Note

The current version does not support using ALTER to modify an existing job configuration. To change parameters, cancel the job first and then create a new one.


How It Works

Periodic Sync Execution Flow

  1. FE registers the job and records the sync_interval_sec configuration.
  2. FE periodically checks whether the trigger time has been reached, calculated from the last start time.
  3. When the trigger time is reached, FE triggers the sync job. A new round does not start while the previous round is still running, so executions do not overlap.
  4. After sync completes, the status is recorded and the system waits for the next cycle.
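The flow above can be sketched as a small simulation. This is a toy model only, not FE code: the PeriodicJob class, its method names, and the assumption that the first round starts at the first check are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodicJob:
    """Toy model of one periodic warm-up job (illustrative only, not FE code)."""
    sync_interval_sec: int
    last_start: Optional[float] = None  # time of the last start, in seconds
    running: bool = False

    def should_trigger(self, now: float) -> bool:
        # Never start a second round while one is still running.
        if self.running:
            return False
        # Assumption: the first round starts at the first check.
        if self.last_start is None:
            return True
        # Later rounds wait for the interval, measured from the last start.
        return now - self.last_start >= self.sync_interval_sec

    def start(self, now: float) -> None:
        self.last_start = now
        self.running = True

    def finish(self) -> None:
        self.running = False


job = PeriodicJob(sync_interval_sec=600)
events = [
    (0, "tick"),       # first round starts
    (100, "finish"),   # first round ends
    (300, "tick"),     # too early: only 300 s since the last start
    (600, "tick"),     # due: second round starts
    (1200, "tick"),    # due by the clock, but the second round is still running
    (1250, "finish"),  # second round ends
    (1300, "tick"),    # third round starts
]
started = []
for now, kind in events:
    if kind == "finish":
        job.finish()
    elif job.should_trigger(now):
        job.start(now)
        started.append(now)
print(started)  # [0, 600, 1300]
```

The check at t=1200 is skipped even though the interval has elapsed, which is the overlap avoidance described in step 3.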

Event-Driven Sync Execution Flow

  1. The user creates an event-driven job. FE registers the job and pushes the configuration to the source cluster BE.
  2. The source BE automatically triggers the warm-up logic after events such as Load and Compaction complete.
  3. The source BE initiates a sync request to the target BE at the Rowset granularity.
  4. After sync completes, the target BE reports the execution status to FE.

Scheduling and Storage Mechanism

  • Sync relationships are persistently stored by FE as CloudWarmUpJob objects, supporting concurrent management of multiple jobs.
  • Multiple jobs in PENDING status are allowed for the same target cluster, but only one job is allowed to be in RUNNING status at a time; the remaining jobs queue and wait.
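The "one RUNNING job per target cluster" rule can be pictured as a simple queue. The sketch below is a toy model under stated assumptions: the TargetQueue class is hypothetical, and first-in-first-out promotion of PENDING jobs is assumed here because the text does not specify the order.

```python
from collections import deque


class TargetQueue:
    """Toy scheduler for one target cluster (illustrative; not the FE implementation)."""

    def __init__(self):
        self.pending = deque()  # job IDs waiting in PENDING
        self.running = None     # at most one job ID in RUNNING

    def submit(self, job_id):
        self.pending.append(job_id)
        self._schedule()

    def finish_running(self):
        self.running = None
        self._schedule()

    def _schedule(self):
        # Promote the oldest PENDING job only when nothing is RUNNING
        # (FIFO order is an assumption of this sketch).
        if self.running is None and self.pending:
            self.running = self.pending.popleft()

    def status(self):
        result = {}
        if self.running is not None:
            result[self.running] = "RUNNING"
        for job_id in self.pending:
            result[job_id] = "PENDING"
        return result


q = TargetQueue()
for job_id in (101, 102, 103):
    q.submit(job_id)
print(q.status())  # {101: 'RUNNING', 102: 'PENDING', 103: 'PENDING'}

q.finish_running()
print(q.status())  # {102: 'RUNNING', 103: 'PENDING'}
```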
  • Sync relationships can be managed by Compute Group name, compatible with cluster renaming and migration operations.

Metrics Monitoring

Periodic Jobs - FE-Side Metrics

Metric Name | Meaning
file_cache_warm_up_job_exec_count | Number of scheduled executions
file_cache_warm_up_job_requested_tablets | Total number of tablets submitted
file_cache_warm_up_job_finished_tablets | Number of tablets that completed sync
file_cache_warm_up_job_latest_start_time | Time of the most recent job start
file_cache_warm_up_job_last_finish_time | Time of the most recent job completion

Periodic Jobs - BE-Side Metrics

Metric Name | Meaning
file_cache_once_or_periodic_warm_up_submitted_segment_size | Size of submitted segment data
file_cache_once_or_periodic_warm_up_finished_segment_size | Size of completed segment data
file_cache_once_or_periodic_warm_up_submitted_index_num | Number of submitted indexes
file_cache_once_or_periodic_warm_up_finished_index_num | Number of completed indexes

Event-Driven Jobs - Source BE Metrics

Metric Name | Meaning
file_cache_event_driven_warm_up_requested_segment_size | Size of segment data requested for sync
file_cache_event_driven_warm_up_requested_index_num | Number of indexes requested for sync
file_cache_warm_up_rowset_last_call_unix_ts | Timestamp of the last sync request initiated

Event-Driven Jobs - Target BE Metrics

Metric Name | Meaning
file_cache_event_driven_warm_up_submitted_segment_num | Number of segments received
file_cache_event_driven_warm_up_finished_segment_num | Number of segments that completed warm-up
file_cache_warm_up_rowset_last_handle_unix_ts | Timestamp of the last sync request handled

FAQ

Q: Does a sync failure in one round cancel the entire job?

No. A sync failure in the current round only skips the current execution; the job status remains unchanged and subsequent cycles continue to attempt execution.

Q: What happens when a periodic job execution times out?

The system skips the current round after a timeout. The job itself is not deleted, and the next cycle triggers normally.

Q: Can multiple source clusters sync to the same target cluster?

Yes. For example, cluster A and cluster C can both be configured to sync to cluster B simultaneously (A -> B and C -> B coexist).

Q: How do you verify that a warm-up job has taken effect?

You can verify using the following methods:

  1. Run SHOW WARM UP JOB WHERE ID = <job_id> to check whether Status is RUNNING or FINISHED.
  2. Compare FinishBatch with AllBatch to confirm sync progress.
  3. Observe the BE-side metrics of the target cluster and confirm that the finished counters continue to grow (for event-driven jobs, file_cache_event_driven_warm_up_finished_segment_num).
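For step 2, progress can be expressed as a fraction of FinishBatch over AllBatch. The following is a minimal sketch, assuming one row of SHOW WARM UP JOB output is already available as a dictionary keyed by field name; how the row is fetched is left out, warm_up_progress is a hypothetical helper, and the sample row is made up for illustration.

```python
def warm_up_progress(row: dict) -> float:
    """Completed fraction in [0, 1] for one SHOW WARM UP JOB row (hypothetical helper)."""
    finished = int(row["FinishBatch"])
    total = int(row["AllBatch"])
    # Assumption: a job with no batches planned yet is reported as 0% done.
    if total <= 0:
        return 0.0
    return min(finished / total, 1.0)


# Made-up sample row; field names follow the table in "Viewing Job List".
row = {
    "JobId": "12345",
    "Status": "RUNNING",
    "FinishBatch": "3",
    "AllBatch": "12",
    "ErrMsg": "",
}
print(f"{warm_up_progress(row):.0%}")  # 25%
```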

Q: How do you modify the configuration of a sync job (such as adjusting the sync interval)?

Direct modification is not supported in the current version. You must first run CANCEL WARM UP JOB WHERE id = <job_id> to cancel the old job, then create a new job.
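The cancel-then-recreate sequence can be scripted. The sketch below only builds the two statements as strings, using the syntax shown earlier in this guide; recreate_periodic_job_sql and the compute group names read_cg and write_cg are hypothetical, and sending the statements to FE is left to whatever SQL client you already use.

```python
from typing import List


def recreate_periodic_job_sql(job_id: int, target: str, source: str,
                              sync_interval_sec: int) -> List[str]:
    """Build the statements that replace a periodic job (hypothetical helper).

    Names are inserted verbatim; validate them before real use.
    """
    cancel = f"CANCEL WARM UP JOB WHERE id = {int(job_id)};"
    create = (
        f"WARM UP COMPUTE GROUP {target} WITH COMPUTE GROUP {source}\n"
        "PROPERTIES (\n"
        '    "sync_mode" = "periodic",\n'
        f'    "sync_interval_sec" = "{int(sync_interval_sec)}"\n'
        ");"
    )
    # Order matters: cancel the old job first, then create the new one.
    return [cancel, create]


# Example: change job 12345 to a 300-second interval (names are placeholders).
for statement in recreate_periodic_job_sql(12345, "read_cg", "write_cg", 300):
    print(statement)
```

The new job receives a new JobId, so any monitoring keyed on the old ID needs to be updated after the switch.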