Skip to main content

Condition Cache

Introduction

In large-scale analytical workloads, queries often include repeated filtering conditions (Conditions), for example:

SELECT * FROM orders WHERE region = 'ASIA';
SELECT count(*) FROM orders WHERE region = 'ASIA';

Such queries repeatedly execute the same filtering logic on identical data segments, leading to redundant CPU and I/O overhead.

To address this, Apache Doris introduces the Condition Cache mechanism. It caches the filtering results of specific conditions on a given segment, allowing subsequent queries to reuse those results directly, thereby reducing unnecessary scans and filtering operations and significantly lowering query latency.

Working Principle

The core concept of the Condition Cache is:

  • The same filtering condition produces the same result on the same data segment.
  • Doris generates a 64-bit digest from the combination of “condition expression + key range,” which serves as a unique cache identifier.
  • Each segment can then look up existing filtering results in the cache using this digest.

Cached results are stored as compressed bit vectors (std::vector<bool>):

  • 0 indicates that the row range does not meet the condition and can be skipped directly;
  • 1 indicates that the range may contain matching data and needs further scanning.

Through this mechanism, Doris can quickly eliminate irrelevant data blocks at a coarse granularity, performing fine-grained filtering only when necessary.

Applicable Scenarios

Condition Cache is most effective in the following cases:

  • Repeated conditions: Identical or similar filter conditions are frequently used.
  • Relatively stable data: Data inside a segment is typically immutable (new segments are generated after INSERT/Compaction, naturally invalidating old caches).
  • High selectivity: When filters leave only a small subset of rows, it maximizes scan reduction.

Condition Cache will not be used in the following situations:

  • Queries containing delete predicates (to ensure correctness, caching is disabled).
  • TopN runtime filters generated at runtime (currently unsupported).

Configuration and Management

Enable or Disable

SET enable_condition_cache = true;

Memory Management

  • Condition Cache uses an LRU policy for cache eviction.
  • When exceeding condition_cache_limit, the least recently used entries are automatically cleared.

You can modify the memory limit in be.conf:

condition_cache_limit = 1024  # Unit: MB
  • After segment compaction, old cache entries are naturally invalidated through LRU eviction.

Cache Statistics

Doris provides comprehensive metrics to help users monitor the effectiveness of Condition Cache:

  • Profile-level metrics (visible in query execution plans)
    • ConditionCacheSegmentHit: Number of segments that hit the cache
    • ConditionCacheFilteredRows: Number of rows skipped directly by cached results
  • System metrics (viewable via the monitoring system or /metrics)
    • condition_cache_search_count: Total cache lookup count
    • condition_cache_hit_count: Number of successful cache hits

These metrics help evaluate the cache’s benefit and hit ratio.

Usage Example

Typical Scenario

Consider the following query:

SELECT order_id, amount
FROM orders
WHERE region = 'ASIA' AND order_date >= '2023-01-01';
  • First execution: The query performs a full scan and evaluates the filter; the Condition Cache stores the result in the LRU cache.
  • Subsequent identical queries: They reuse the cached results, skipping most irrelevant row ranges and scanning only potential matches.

When multiple queries share the same filtering condition (e.g., region = 'ASIA' AND order_date >= '2023-01-01'), they can reuse each other’s Condition Cache entries, reducing overall workload.

Notes

  • Cache is not persistent: The Condition Cache is cleared upon Doris restart.
  • Delete operations disable caching: Segments with delete markers require strict consistency and thus do not use the cache.

Summary

Condition Cache is an optimization mechanism in Doris designed for repeated conditional queries. Its advantages include:

  • Avoiding redundant computation and reducing CPU/I/O overhead
  • Automatically and transparently effective without user intervention
  • Lightweight in memory consumption and highly efficient when hit and filter rates are high

By leveraging the Condition Cache effectively, users can achieve significantly faster response times in high-frequency OLAP query scenarios.