Skip to main content

SQL Cache Query Cache Guide

SQL Cache is the query result caching mechanism provided by Doris. It caches query results keyed by metadata such as the SQL text and data version, so that subsequent identical queries are returned directly from the cache. This significantly reduces the overhead of repeated computation.

Pre-reading checklist

Before reading this guide, confirm that you understand or have prepared the following:

  • Whether your query scenario is a good fit for caching (for example, T+1 offline analytics or data with low update frequency)
  • Whether your current Doris version supports viewing SQL Cache via explain plan (2.1.3+ recommended)
  • Whether you understand that SQL Cache only supports OlapTable internal tables and Hive external tables
  • Whether you understand that nondeterministic functions (such as now() and random()) affect the cache hit rate
  • Whether you have permission to modify FE/BE configuration (used for memory control and the global switch)

Before deciding whether to enable SQL Cache, check item by item that your queries meet the following conditions:

  • The SQL statements are executed repeatedly at high frequency
  • The data is updated infrequently (high cache hit rate)
  • The queries do not contain random functions (such as random())
  • The metric fields are consistent across each query (no dynamic addition or removal)

1. Concept introduction

SQL Cache is suitable for query scenarios where the data is updated infrequently. It avoids repeated computation by caching query results.

Key factors for cache hits

SQL Cache uniquely identifies a piece of cached data based on the combination of the following factors:

Key factorDescription
SQL textAn exactly identical SQL string
View definitionThe DDL definition of any views involved
Table and partition versionsWhether the data has been changed
User variables and their valuesThe current values of variables referenced in the SQL
Nondeterministic functions and their resultsThe computed results of functions such as now() and random()
Row policy definitionRow Policy configuration
Data masking definitionData Masking configuration

If any factor changes (such as SQL rewrites, different query fields or conditions, or version changes caused by data updates), the cache will not be hit. For multi-table Join queries, an update to any one of the tables changes the partition ID or version number, which makes the cache unable to hit.

Applicable scenarios

  • Strongly recommended: T+1 update scenarios. The data is updated overnight; the first query fetches the result from BE and writes it to the cache, and subsequent identical queries are returned directly from the cache.
  • Optional: Real-time updated data. You can still enable SQL Cache, but the hit rate will be low.
  • Supported scope: Currently supports OlapTable internal tables and Hive external tables.

2. Limitations

Impact of nondeterministic functions

Definition: A nondeterministic function is a function whose result has no fixed relationship with its input parameters.

FunctionBehaviorCan it use the cache
now()Returns the current second-level time, changes once per secondReusable within the same second
date(now())Converts second-level time to day-level granularityReusable within the same day (recommended)
random()Returns a different result on every callAlmost never hits the cache

Optimization tip: Convert fine-grained time to coarse-grained time. For example, use select * from tbl where dt = date(now()) instead of select * from tbl where dt = now(), so that all queries within the same day can hit the cache. Avoid using strongly nondeterministic functions such as random() in queries.

Impact of metric field expansion

SQL Cache caches results strictly by the queried fields. It does not support satisfying a "more metrics" query with a "fewer metrics" cache.

ScenarioBehaviorImpact
The result of 2 metrics has been cached, and a new query requests 3 metricsThe cache cannot be reused, and the query must be re-executedThe hit rate drops, and the new result is written to a new cache entry

Optimization tip: Keep the metric fields of business-side reports and query templates stable. If you need to add metrics, modify the query template instead of expanding it ad hoc, so that the cache is not made unusable.

3. Implementation principles

FE implementation

The processing flow after FE receives a query request:

  1. Metadata lookup: Look up metadata in memory using the SQL string as the key (the metadata contains the table/partition versions).
  2. Version comparison: If the metadata is unchanged, the data has not been changed and the cache can be reused.
  3. Skip parsing: Skip the SQL parsing and optimization process, and locate the corresponding BE based on consistent hashing.
  4. Result return:
    • Cache hit on BE: Return the result directly to the client.
    • Cache miss: Execute the full SQL parsing, optimization, and computation flow.
  5. Result write-back: After BE finishes the computation, FE stores the result on the corresponding BE and records the metadata in its own memory for reuse by the next query.

Special optimization: If the SQL optimization phase determines that the result contains only 0 or 1 row, FE saves the result directly in its own memory to accelerate subsequent identical queries.

BE implementation

  • A BE is selected via consistent hashing to store the result. The result is stored in BE memory in a HashMap structure.
  • When reading from or writing to the cache, a digest of metadata such as the SQL string is used as the key to quickly retrieve the result data.

4. Quick start

Step 1: Enable or disable SQL Cache

Purpose: Enable SQL Cache at the session or global level (disabled by default).

-- Turn on SQL Cache in the current session; it is off by default
set enable_sql_cache=true;
-- Turn off SQL Cache in the current session
set enable_sql_cache=false;

-- Turn on SQL Cache globally; it is off by default
set global enable_sql_cache=true;
-- Turn off SQL Cache globally
set global enable_sql_cache=false;

Notes: Session-level configuration only takes effect for the current session. Global configuration takes effect for all newly created sessions.

Step 2: Check whether a query hits the SQL Cache

Method A: Use explain plan (applicable to Doris 2.1.3+)

Purpose: Determine whether the cache is hit using the query plan. Command: Run explain plan <your SQL>. Notes: When a LogicalSqlCache or PhysicalSqlCache node appears in the query plan tree, the query has hit the SQL Cache.

> explain plan select * from t2;

+------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 28ms) ========== |
| LogicalSqlCache[1] ( queryId=711dea740e4746e6-8bc11afe08f6542c ) |
| +--PhysicalResultSink[39] ( outputExprs=[id#0, name#1] ) |
| +--PhysicalOlapScan[t2]@0 ( stats=12 ) |
| |
| ========== ANALYZED PLAN ========== |
| LogicalSqlCache[1] ( queryId=711dea740e4746e6-8bc11afe08f6542c ) |
| +--PhysicalResultSink[39] ( outputExprs=[id#0, name#1] ) |
| +--PhysicalOlapScan[t2]@0 ( stats=12 ) |
| |
| ========== REWRITTEN PLAN ========== |
| LogicalSqlCache[1] ( queryId=711dea740e4746e6-8bc11afe08f6542c ) |
| +--PhysicalResultSink[39] ( outputExprs=[id#0, name#1] ) |
| +--PhysicalOlapScan[t2]@0 ( stats=12 ) |
| |
| ========== OPTIMIZED PLAN ========== |
| PhysicalSqlCache[3] ( queryId=711dea740e4746e6-8bc11afe08f6542c, backend=192.168.126.3:9051, rowCount=12 ) |
| +--PhysicalResultSink[39] ( outputExprs=[id#0, name#1] ) |
| +--PhysicalOlapScan[t2]@0 ( stats=12 ) |
+------------------------------------------------------------------------------------------------------------+

Method B: View the Profile (applicable to versions before Doris 2.1.3)

Purpose: Confirm cache hits in versions that do not support explain plan. Command: Enable Profile and view the Execution Summary. Notes: If the Is Cached: field shows Yes, the query has hit the SQL Cache.

Execution  Summary:
- Parse SQL Time: 18ms
- Nereids Analysis Time: N/A
- Nereids Rewrite Time: N/A
- Nereids Optimize Time: N/A
- Nereids Translate Time: N/A
- Workload Group: normal
- Analysis Time: N/A
- Wait and Fetch Result Time: N/A
- Fetch Result Time: 0ms
- Write Result Time: 0ms
- Doris Version: 915138e801
- Is Nereids: Yes
- Is Cached: Yes
- Total Instances Num: 0
- Instances Num Per BE:
- Parallel Fragment Exec Instance Num: 1
- Trace ID:
- Transaction Commit Time: N/A
- Nereids Distribute Time: N/A

5. Metric monitoring

FE monitoring metrics

Endpoint: http://${FE_IP}:${FE_HTTP_PORT}/metrics Notes: Metric statistics are monotonically increasing. After an FE restart, the count starts again from 0.

# 1 SQL has been written to the cache
doris_fe_cache_added{type="sql"} 1

# The SQL Cache has been hit 2 times
doris_fe_cache_hit{type="sql"} 2

BE monitoring metrics

Endpoint: http://${BE_IP}:${BE_HTTP_PORT}/metrics Notes: Different caches may be stored on different BEs. You must collect metrics from all BEs to get the complete picture.

# There are currently 1205 caches in the memory of this BE
doris_be_query_cache_sql_total_count 1205

# All caches currently take up about 44KB of BE memory
doris_be_query_cache_memory_total_byte 44101

6. Memory control

FE memory control

The cache metadata in FE uses weak references: when FE is short on memory, the least recently used metadata is automatically released. The following parameters are also supported for limiting memory usage:

ParameterDefault valueDescription
sql_cache_manage_num100Upper bound on the number of metadata entries; the least recently used entry is automatically released when exceeded
expire_sql_cache_in_fe_second300Metadata expiration time (seconds); automatically released if not accessed within this time
cache_result_max_row_count3000Upper bound on the number of result rows; if exceeded, no SQL Cache is created
cache_result_max_data_size31457280 (30MB)Upper bound on the result size (in bytes); if exceeded, no SQL Cache is created

Configuration commands (take effect in real time, must be configured on every FE; for persistence, write them into fe.conf):

-- Store at most 100 cache metadata entries
ADMIN SET FRONTEND CONFIG ('sql_cache_manage_num'='100');

-- Automatically release a cache metadata entry that has not been accessed for 300 seconds
ADMIN SET FRONTEND CONFIG ('expire_sql_cache_in_fe_second'='300');

-- Do not create SQL Cache when the result has more than 3000 rows by default
ADMIN SET FRONTEND CONFIG ('cache_result_max_row_count'='3000');

-- Do not create SQL Cache when the result is larger than 30MB by default
ADMIN SET FRONTEND CONFIG ('cache_result_max_data_size'='31457280');

BE memory control

ParameterDefault value (example)Description
query_cache_max_size_mb256The stable upper limit of memory used by the cache
query_cache_elasticity_size_mb128Elastic expansion space; eviction is triggered when usage exceeds max + elasticity, until usage drops below max

Configuration file: be.conf (BE must be restarted after modification).

-- When the cache memory usage exceeds query_cache_max_size_mb + query_cache_elasticity_size_mb,
-- release the least recently used cache entries until the memory usage drops below query_cache_max_size_mb.
query_cache_max_size_mb = 256
query_cache_elasticity_size_mb = 128

7. Troubleshooting: investigating cache invalidation

The following table summarizes common causes of cache misses or invalidation, along with the corresponding investigation directions:

No.Cause of invalidationTypical operationInvestigation tip
1Table or view structure changedrop table, replace table, alter table, alter viewCheck recent DDL history of tables and views
2Table data changeinsert, delete, update, truncateCheck ingestion and change logs to confirm whether the data version has changed
3User permissions revokedrevokeCheck permission changes for the query account
4Use of nondeterministic functionsselect random(), select now(), etc.Switch to coarse-grained functions or constant parameters
5Variable value changeselect * from tbl where dt = @dt_varCheck whether the session variable values are consistent
6Row Policy / Data Masking changeRow policy or masking policy was adjustedCheck the recent change records of the policy
7Result row count exceeds the limitExceeds cache_result_max_row_count (3000 rows by default)Adjust the threshold or narrow the result set
8Result size exceeds the limitExceeds cache_result_max_data_size (30MB by default)Adjust the threshold or reduce the returned fields

8. FAQ

Q1: Is SQL Cache enabled by default? A: It is disabled by default. Enable it at the session level with set enable_sql_cache=true, or globally with set global enable_sql_cache=true.

Q2: Which table types does SQL Cache support? A: It currently supports OlapTable internal tables and Hive external tables.

Q3: Can SQL Cache be used for real-time updated data? A: Yes, but every data update changes the partition version, which invalidates the cache. The hit rate is low. SQL Cache is more suitable for T+1 offline analytics scenarios.

Q4: How is the cache invalidated for multi-table Join queries? A: As soon as any one of the tables in the Join has a data change, the partition ID or version number changes, and the entire query cache cannot be hit.

Q: Two metrics were cached previously. Can the cache be reused for a query that now requests three metrics? A: No. SQL Cache caches results strictly by the queried fields. The cache of fewer metrics cannot satisfy a query request for more metrics, and the query must be re-executed.

Q5: Does the now() function completely invalidate the cache? A: No, it does not completely invalidate the cache. now() returns a second-level time, and identical queries within the same second can reuse the cache. To extend the cache hit window, use date(now()) to convert it to day-level granularity.

Q6: Where is the cached data stored? A: The vast majority of results are stored in BE memory (in a HashMap structure). When the result contains only 0 or 1 row, FE stores it directly in its own memory.

Q7: Is the cache still available after an FE restart? A: The metadata in FE is lost, and the monitoring metrics are also reset to 0. Results in BE memory are also invalidated after a restart.

9. Comparison and further reading

DimensionSQL CachePartition Cache (where applicable)
Cache granularityThe result set of an entire SQLCaches intermediate results by partition
Applicable update patternT+1, low-frequency updatesSome partitions are updated frequently while others remain stable
Hit conditionThe SQL text and all dependent metadata are unchangedThe versions of the involved partitions are unchanged
Invalidation granularityA change in any dependency invalidates the entire cacheOnly partitions that miss the cache need to be recomputed

Note: The Partition Cache column in this table is for comparison reference only. Actual availability depends on the features of the current Doris version.