Overview
Apache Doris 4.0.6 is a maintenance release in the 4.0 series. It focuses on stability and operational experience. All 4.0.x users are advised to upgrade, and users running the compute-storage decoupled (cloud) mode benefit the most.
Highlights of this release include:
- Compute-storage decoupling enhancements: support read-write separation for Compaction, optimize Tablet scheduling and FE-side memory usage, and reduce RPC pressure on the Meta Service.
- Operational observability: add Compaction task tracking (System Table and HTTP API), a dynamic
enable_recyclerswitch, online adjustment of the memtable flush thread pool, and several brpc / bvar metrics. - Data lineage: introduce a Lineage SPI collection framework.
- Many correctness and stability fixes: cover incorrect query results, load data loss (such as Broker Load loading only the first file, and PostgreSQL DML being silently dropped), crashes, hangs, and resource leaks.
Before upgrading, read the Behavior Changes section below, which contains several adjustments to default values and privileges.
Behavior Changes
- In compute-storage decoupled (cloud) mode,
enable_strict_consistency_dmlis now disabled by default (#61891). If your workload relies on strict-consistency DML, explicitly enable this configuration after upgrading. - The default trigger threshold for Time Series Compaction is changed from 2000 to 1000 (#61979), so Compaction triggers more aggressively.
- The default value of
segments_key_bounds_truncation_thresholdis changed to 36 (#61984). - Cluster Snapshot commands now require root privileges (#60239).
- JSONB and Variant columns are no longer allowed as Distribution Columns (#63211). Such previously-incorrect usage is now rejected at table-creation time.
New Features & Improvements
Functions & Types
- Add the
mmhash3_u64_v2hash function (#61846). - Add the
json_object_flattenscalar function for flattening nested JSON (#62825). - Add a Lambda expression (functor) version of
array_sortthat supports custom sorting logic (#57828). - The
max/minaggregate functions now support the Array type, andmax_by/min_bysupport some complex types (#58490, #58736).
Query Optimization
- Limit the cost of not-null inference to reduce optimizer planning overhead for complex queries (#63318).
- Scale the
num_nullscolumn statistic proportionally after partition pruning to improve row-count estimation accuracy (#62265).
Storage & Compaction
- Add Compaction task tracking: query current Compaction tasks through a System Table and an HTTP API (#61696).
- Support adjusting the memtable flush thread pool size online, without restarting BE (#60423).
- Support adaptive Write Buffer sizing (#61810).
- Release unused memory earlier to reduce memory usage during load and write (#62185).
Compute-Storage Decoupling (Cloud Mode)
- Support read-write separation for Compaction (#60310).
- Schedule recently-active Tablets first to improve cache hit rate and scheduling efficiency (#59539, #57200, #61562).
- Reduce
get_tablet_statsRPC calls to the Meta Service to lower metadata-service pressure (#60543). - Optimize the memory usage of CloudTabletStatMgr / CloudTabletRebalancer (#59776, #61318).
- Add the
enable_recyclerconfiguration to dynamically skip the Recycler (#63286). - Support hot-reloading of File Cache microbench configuration (#58922).
Lakehouse
- Iceberg supports IAM Role authentication for REST and S3 Tables (#60498).
- Paimon supports Jindo OSS and Token propagation (#62106).
- Support disabling View operations for the Iceberg REST Catalog (#63319).
Data Lineage
- Introduce a Lineage SPI framework to support data lineage collection (#61004).
Observability & Operations
- Add the
--drop_backendsparameter tostart_fe.sh(#63306). - Add a transaction write-amplification brpc metric for Sub Txn Load (#63545).
- Add bvar metrics for the Recycler Operation Log (#60520).
- Statistics collection now automatically skips overly-long string columns to reduce collection overhead (#62686).
- Remove the class histogram trace from the JDK17 startup parameters to avoid printing large amounts of logs during Full GC (#62422).
Security & Authentication
- Hide Token and authentication information in BE Info during Stream Load (#60656, #59743).
- Improve the robustness and diagnostics of LDAP authentication (#61673).
- Upgrade dependencies with known security vulnerabilities (#62274).
Important Bug Fixes
Query Result Correctness
- Fix
INTERSECT/EXCEPTlosing NULL rows during predicate pushdown (#62299). - Fix
count(null)being incorrectly treated ascount(*)(#62548). - Fix a NoSuchElementException when
countcontains aMATCH_ALLexpression (#62172). - Fix incorrect results caused by losing a DateTimeV2 narrowing Cast during IN predicate simplification (#63343).
- Fix loss of the
IN_LISTRuntime Filter predicate in Key Range scenarios (#62115). - Fix materialized view rewriting merging partitions that the query does not need, which could cause incorrect results or performance issues (#63081).
- Fix the
no_use_cbo_ruleHint being silently ignored (#62358). - Fix FragmentMgr incorrectly canceling queries on a compute-storage decoupled Virtual Cluster (#62135).
Views
- Fix
ALTER VIEWdefinitions not being synchronized to Follower FEs when a COMMENT is specified (#61670). - Fix incorrect query results caused by a View definition losing Variant subfields (#62907).
- Fix View columns losing colUniqueId under Lazy Materialization (#62533).
- Fix illegal Alias rewriting in View definitions (#63353).
Functions & Types
- Fix several incorrect results related to TIMESTAMPTZ and Daylight Saving Time (DST): LEAD/LAG not preserving the type, spring-forward gap handling, DST fold branch selection, switching time-difference calculation to UTC semantics, and TopN Runtime Predicate support (#62779, #62945, #63034, #63161, #63220).
- Fix incorrect results for the
allow_zero_datefunction (#61900). - Fix incorrect results when casting a scientific-notation string to Decimal (#63119).
- Fix
from_olap_stringthrowing an exception when it fails to parse a datetime; it now returns NULL instead (#63035). - Fix incorrect integer inference and type resolution for user variables (#62524).
Variant Type
- Fix Compaction failure when a Variant column has uid=0 in a keyless table (#62656).
- Fix Variant subcolumns being lost after a partial-column update in the Row Store (#62067).
- Fix a normalization issue when reading older-version single-segment dot-key subcolumns (#62409).
- Keep the first occurrence when a Variant JSON Path is duplicated (#63697).
- Optimize VariantStatsCaculator construction by skipping the full footer scan to improve performance (#62819).
Load & Streaming
Data Loss / Integrity
- Fix Broker Load loading only the first file when multiple file paths are specified (#62969).
- Fix PostgreSQL DML being silently dropped after a job restart (#61481).
- Fix loss of S3 Offset and job statistics after an FE Checkpoint restart (#62449).
- Fix incorrect Java type restoration of Split boundaries when reading FE-persisted CDC Offsets (#63219).
Hangs / Leaks
- Fix a VNodeChannel
close_waithang (#58024). - Fix a PostgreSQL Replication Slot leak caused by canceling a streaming job during pause/resume (#62010).
- Fix a memory leak caused by an InsertLoadJob being stuck in PENDING (#62890).
Restart & Metadata
- Fix streaming job properties not being parsed after an FE restart (#62298).
- Fix a StreamingInsertJob NPE during EditLog Replay (#62416).
- Fix Broker Load storage properties not being correctly rebuilt after Gson Replay (#63094).
- Fix loss of INSERT job statistics in
SHOW LOADafter an FE restart (#62331).
Routine Load / Parsing
- Fix an NPE in Routine Load Kafka Meta requests (#63180).
- Fix an IllegalMonitorStateException in Routine Load when the coordinating BE restarts (#62892).
- Fix incorrect column parsing when a UTF-8 BOM CSV uses enclose (#62092).
- Fix an NPE and load-availability issues when Group Commit selects a Backend (#60652, #61555).
- Fix conflict validation between load properties and table properties for Broker Load / Routine Load (#58054).
- Fix
to_load_error_http_pathreturning an incorrect URL in HTTPS mode (#61785).
Storage & Compaction
- Fix BE crashes caused by Schema Cache concurrency issues related to OlapScanner (#61510, #62327).
- Fix an IOContext use-after-free crash (#59947).
- Fix reading older DecimalV2 Segments that are missing precision / frac (#63569).
- Fix blocking during async close polling of Packed Files (#62938).
- Fix an OpenMP concurrency-budget issue when building ANN vector indexes (#61313).
Compute-Storage Decoupling (Cloud Mode)
- Fix several correctness issues during Schema Change: delete local Rowsets before
add_rowsets, fill Version holes before running, normalize the Rowset graph before Delete Bitmap Capture, and add anSC_COMPACTION_CONFLICTerror code to retry failures across V1 Compaction (#62256, #63443, #63981, #62272). - Fix several Cache Warmup issues: lack of Packed File support, no retry after errors, NPEs on cancellation/expiration, and cache cleanup (#60375, #62886, #62805, #62839, #62941, #62931).
- Fix a
balanced_tablets_shardsmemory leak and a Warmup inflight counting issue (#59093, #60480). - Fix Recycler OOM caused by too many queued deletion tasks (#59331).
- Fix a race condition between the first dynamic-partition initialization and a concurrent
CREATE MATERIALIZED VIEW(#62755). - Fix exposing the internal
KV_TXN_MAYBE_COMMITTEDstate to clients (#62244). - Fix Meta Service logs not being printed in cloud mode (#61766).
- Fix
SHOW PROCnot showing the Cached Version of partitions (#60807).
Lakehouse & External Data Sources
- Fix loss of predicate filtering when Native and JNI Readers are mixed in FileScanner (#61802).
- Fix a timezone offset issue for the Hive DATE type in external Readers (#61330).
- Fix loss of Query TVF column aliases across JDBC Catalogs (#61939).
- Fix incompatibility between
SHOW PARTITIONSand the partitions TVF for external Catalogs (#62134). - Fix an out-of-bounds crash when generating partition columns for Iceberg / Paimon tables (#62177).
- Fix several Parquet read/write issues: int96 Timestamp writing, Page V2 encoding, and conditional checks (#61832, #63779, #63305, #63509).
- Fix preloading Jindo for the Paimon Scanner (#62351).
- Fix a TVF returning an error when the Thrift message exceeds the size limit (#61788).
File Cache & IO
- Fix a SIGSEGV crash caused by background LRU updates during cache cleanup (#60533).
- Fix temporary TTL expiration time being incorrectly anchored to the Tablet creation time (#62287).
- Fix the file handle cache Key not including the HDFS connection, which could cause connection mismatches (#63516).
Metadata & Job System
- Fix
CANCEL ALTERwith an empty job list being mistakenly treated as canceling all Rollup jobs (#62712). - Fix a sessionVariables null pointer after an upgrade (#61959).
- Fix
RestoreCommandthrowing an UnsupportedOperationException (#61890). - Fix Binlog files still being linked when Binlog is not enabled (#61949).
- Fix auto-partition tables with
retention_countnot being correctly registered with the scheduler after a restart (#61954). - Fix a Host mismatch when starting FE in
metadata_failure_recoverymode (#62748). - Fix being unable to run
SHOW TABLETwhen no database is selected (#63280). - Fix inconsistent column ordering in
SHOW BACKENDS(#62207). - Fix system tables returning unknown statistics (#62913).
- Fix Audit Log issues: UnboundAlias digest calculation, and internal query failures being mislabeled as ERR (#62160, #62997).
- Fix fe.out not being generated when using
start_fe.sh --console(#61807). - Fix the error message for
INSERT OVERWRITE(#62555).
Security & Authentication
- Fix Ranger column-level privileges being bypassed when combined with CTEs (#61741).
- Fix client IP authentication for Arrow Flight (#63506).
- Redact sensitive Headers in Stream Load logs (#62108).
Stability & Misc
- Fix BE crashes caused by Runtime Filters with a shared Hash table (#63257).
- Fix a core dump caused by the inline static empty message in arrow::Status (#63191).
- Fix Arrow UTF8 / String size limit issues (#63137).
- Fix PartitionRebalancer generating illegal migrations to BEs that lack the required storage medium (#62206).
- Fix
jetty_server_max_http_header_sizenot taking effect under Jetty 12 (#61197). - Fix Prepared Statement QPS metrics not being counted when Audit Log is disabled (#61621).
- Fix the partition-near-limit metric by changing it from a Counter to a Gauge (#61845).
- Fix loading of the JNI log4j2 configuration (#63063).
- Track IO-layer read buffers through MemTrackerLimiter to improve memory observability (#62288).
These notes focus on user-visible and operations-visible changes, and omit some purely-internal refactoring and fixes with no external symptoms. For the complete commit history, refer to the PR list for 4.0.6 in the apache/doris repository on GitHub.