Release 3.0.4

Dear community members, Apache Doris 3.0.4 was officially released on February 02, 2025. This version further enhances the performance and stability of the system.

Quick Download: https://doris.apache.org/download/

GitHub Release: https://github.com/apache/doris/releases

Behavior Changes

In the Audit log, the force flag is retained for drop table and drop database statements. #43227
When exporting data to Parquet/ORC formats, the bitmap, quantile_state, and hll types are exported in Binary format. Additionally, support has been added for exporting jsonb and variant types, which are exported as string. #44041
- For more information, please refer to documentation: Export Overview - Apache Doris
The Hudi JNI Scanner has been switched from the Spark API to the Hadoop API to enhance compatibility. Users can choose between the two implementations by setting the session variable hudi_jni_scanner, for example set hudi_jni_scanner=spark or set hudi_jni_scanner=hadoop. #44396
The use of auto bucket in Colocate tables is prohibited. #44396
A Paimon cache has been added to the Catalog, so queries no longer read Paimon data in real time. #44911
The default value of max_broker_concurrency has been increased to improve performance for large-scale data imports with Broker Load. #44929
The default value of the storage medium for Auto Partition partitions has been changed to the attribute value of the current table's storage medium, rather than using the system default value. #45955
Column updates are prohibited during Schema Change execution for Key columns. #46347
For Key columns containing auto-increment columns, support has been added to allow column updates without providing the auto-increment column. #44528
The FE ID generator strategy has been switched to a time-based approach, and IDs no longer start from 10000. #44790
In the compute-storage separation mode, the default stale rowset recycling delay for Compaction has been reduced to 1800 seconds to decrease the recycling interval. This may cause large queries to fail in extreme scenarios, and adjustments can be made as needed. #45460
The show cache hotspot statement has been disabled in compute-storage separation mode. Query the system tables directly instead. #47332
Deleting the system-created admin user is prohibited. #44751
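
The Hudi JNI Scanner switch listed above can be tried per session. The following is a minimal sketch, assuming the variable accepts the quoted string values spark and hadoop as the note describes; the exact quoting is an assumption, not something this release note states.

```sql
-- Use the Hadoop API based scanner (the implementation this release moves to).
SET hudi_jni_scanner = 'hadoop';

-- Fall back to the Spark API based scanner for the current session.
SET hudi_jni_scanner = 'spark';

-- Check the value currently in effect.
SHOW VARIABLES LIKE 'hudi_jni_scanner';
```

Because this is a session variable, the setting applies only to the connection that issues it.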

Improvements

Storage

Mitigated frequent timeouts of Routine Load tasks caused by a small max_batch_interval setting. #46292
Improved performance for Broker Load when importing multiple compressed files. #43975
Increased the default value of webserver_num_workers to enhance Stream Load performance. #46593
Reduced load imbalance among Routine Load import tasks during BE node scaling. #44798
Improved the use of Routine Load thread pools to prevent timeouts from affecting queries. #45039

Compute-Storage Separation

Enhanced the stability and observability of the Meta-service. #44036, #45617, #45255, #45068
Optimized File Cache by adding an early eviction strategy, reducing lock time, and improving query performance. #47473, #45678, #47472
Improved initialization checks and queue transitions for File Cache to enhance stability. #44004, #44429, #45057, #47229
Increased the speed of HDFS data recycling. #46393
Improved performance when the FE acquires compute groups during ultra-high-frequency imports. #47203
Improved several import-related parameters for primary key tables in compute-storage separation to enhance the stability of real-time high-concurrency imports. #47295, #46750, #46365

Lakehouse

Supported reading Hive tables in JSON format. #43469
- For more information, please refer to documentation: Text/CSV/JSON - Apache Doris
Introduced the session variable enable_text_validate_utf8 to skip UTF-8 encoding checks for CSV formats. #45537
- For more information, please refer to documentation: Text/CSV/JSON - Apache Doris
Updated the Hudi version to 0.15 and optimized query planning performance for Hudi tables.
Improved read performance for MaxCompute partitioned tables. #45148
Optimized performance for Parquet file delayed materialization under high filter rates. #46183
Supported delayed materialization for complex Parquet types. #44098
Optimized predicate pushdown logic for ORC types, supporting more predicate conditions for index filtering. #43255
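
As an illustration of the enable_text_validate_utf8 session variable introduced above, here is a hedged sketch. It assumes that setting the variable to false is what skips the check, and the catalog, database, and table names are hypothetical placeholders.

```sql
-- Assumption: false disables the UTF-8 check; confirm against the linked documentation.
SET enable_text_validate_utf8 = false;

-- hive_ctl.example_db.example_csv_tbl is a hypothetical Hive table stored as text/CSV.
SELECT * FROM hive_ctl.example_db.example_csv_tbl LIMIT 10;
```

Skipping validation trades safety for speed, so it is best suited to files already known to be valid UTF-8.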

Asynchronous Materialized Views

Supported more scenarios for aggregate roll-up rewriting. #44412

Query Optimizer

Improved partition pruning performance. #46261
Added rules to eliminate group by keys based on data characteristics. #43391
Adaptively adjusted the wait time for Runtime Filters based on the target table size. #42640
Improved the ability to push down aggregations in joins to fit more scenarios. #43856, #43380
Improved Limit pushdown for aggregations to fit more scenarios. #44042

Others

Optimized startup scripts for FE, BE, and MS processes to provide clearer output. #45610, #45490, #45883
The case sensitivity of table names in show tables now matches MySQL behavior. #46030
show index now supports arbitrary target table types. #45861
information_schema.columns now supports displaying default values. #44849
information_schema.views now supports displaying view definitions. #45857
Supported the MySQL protocol COM_RESET_CONNECTION command. #44747
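
The two information_schema improvements above can be checked with queries along these lines. This is a sketch that assumes the MySQL-compatible column names COLUMN_DEFAULT and VIEW_DEFINITION; example_db and example_tbl are hypothetical names.

```sql
-- Column defaults for a hypothetical table example_db.example_tbl.
SELECT COLUMN_NAME, COLUMN_DEFAULT
FROM information_schema.columns
WHERE TABLE_SCHEMA = 'example_db'
  AND TABLE_NAME = 'example_tbl';

-- View definitions for all views in the hypothetical database example_db.
SELECT TABLE_NAME, VIEW_DEFINITION
FROM information_schema.views
WHERE TABLE_SCHEMA = 'example_db';
```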

Bug Fixes

Storage

Fixed potential memory errors during the import process for aggregate table models. #46997
Resolved the issue of Routine Load offset loss during FE master node restarts in compute-storage separation mode. #46566
Fixed memory leaks in FE Observer nodes during batch import scenarios in compute-storage separation mode. #47244
Resolved the issue of Cumulative Point rollback during Full Compaction with Order Data Compaction. #44359
Fixed the issue where Delete operations could temporarily prevent Tablet Compaction scheduling. #43466
Resolved incorrect Tablet states after Schema Change in multi-compute-cluster scenarios. #45821
Fixed the potential NPE error when performing Column Rename Schema Change on primary key tables with sequence_type. #46906
Data Correctness: Fixed correctness issues for primary key tables when importing partial column updates containing DELETE SIGN columns. #46194
Resolved potential memory leaks in FE when Publish tasks for primary key tables were continuously stuck. #44846

Compute-Storage Separation

Fixed the issue where File Cache size could exceed the table data size. #46561, #46390
Resolved upload failures at the 5MB boundary during data uploads. #47333
Enhanced robustness by adding more parameter checks for several alter operations in Storage Vault. #45155, #45156, #46625, #47078, #45685, #46779
Resolved issues with data recycling failures or slow recycling due to improper Storage Vault configurations. #46798, #47536, #47475, #47324, #45072
Fixed the issue where data recycling could stall, preventing timely recycling. #45760
Resolved incorrect retries for MTTM-230 errors in compute-storage separation mode. #47370, #47326
Fixed the issue where Group Commit WAL was not fully replayed during BE decommissioning in compute-storage separation mode. #47187
Resolved the issue where Tablet Meta exceeding 2GB rendered MS unavailable. #44780
Data Correctness: Fixed two duplicate Key issues in primary key tables in compute-storage separation mode. #46039, #44975
Resolved the issue where Base Compaction could continuously fail due to large Delete Bitmaps in primary key tables during high-frequency real-time imports. #46969
Corrected the retry logic for Schema Change in primary key tables in compute-storage separation mode to enhance robustness. #46748

Lakehouse

Hive

Fixed the issue where Hive views created by Spark could not be queried. #43553
Resolved the issue where certain Hive Transaction tables could not be read correctly. #45753
Fixed the issue where partition pruning failed for Hive tables with special characters in partitions. #42906

Iceberg

Fixed the issue where Iceberg tables could not be created in Kerberos authentication environments. #43445
Resolved the issue where count(*) queries were inaccurate for Iceberg tables with dangling deletes. #44039
Fixed the issue where query errors occurred due to mismatched column names in Iceberg tables. #44470
Resolved the issue where Iceberg tables could not be read after partition modifications. #45367

Paimon

Fixed the issue where Paimon Catalog could not access Alibaba Cloud OSS-HDFS. #42585

Hudi

Fixed the issue where partition pruning failed for Hudi tables in certain scenarios. #44669

JDBC

Fixed the issue where tables could not be retrieved using JDBC Catalog after enabling case-insensitive table names.

MaxCompute

Fixed the issue where partition pruning failed for MaxCompute tables in certain scenarios. #44508

Others

Fixed the issue where export tasks caused memory leaks in FE. #44019
Resolved the issue where S3 object storage could not be accessed via HTTPS protocol. #44242
Fixed the issue where Kerberos authentication tickets could not be automatically refreshed. #44916
Resolved the issue where reading Hadoop Block compressed format files failed. #45289
When querying ORC format data, CHAR type predicates are no longer pushed down to avoid potential result errors. #45484

Asynchronous Materialized Views

Fixed the issue where transparent query rewriting could lead to planning or result errors in extreme scenarios. #44575, #45744
Resolved the issue where multiple build tasks could be generated during asynchronous materialized view scheduling in extreme scenarios. #46020, #46280

Query Optimizer

Fixed the issue where some expression rewrites could produce incorrect expressions. #44770, #44920, #45922, #45596
Resolved occasional incorrect results from SQL Cache. #44782, #44631, #46443, #47266
Fixed the issue where limit pushdown for aggregation operators could produce incorrect results in some scenarios. #45369
Resolved the issue where delayed materialization optimization could produce incorrect execution plans in some scenarios. #45693, #46551

Query Execution

Fixed the issue where regular expressions and like functions produced incorrect results with special characters. #44547
Resolved the issue where SQL Cache results could be incorrect when switching databases. #44782
Fixed a series of Arrow Flight-related issues. #45023, #43929
Resolved the issue where results were incorrect when the Hash table for HashJoin exceeded 4GB in some cases. #46461
Fixed the overflow issue of the convert_to function with Chinese characters. #46405
Resolved the issue where results could be incorrect in extreme scenarios when group by was used with Limit. #47844
Fixed the issue where results could be incorrect when accessing certain system tables. #47498
Resolved the issue where the percentile function could cause system crashes. #47068
Fixed the performance degradation issue for single-table queries with Limit. #46090
Resolved the issue where StDistanceSphere and StAngleSphere functions caused system crashes. #45508
Fixed the issue where map_agg results were incorrect. #40454

Semi-structured Data Management

BloomFilter Index

Fixed the exception caused by large parameters in BloomFilter Index. #45780
Resolved the issue of high memory usage during BloomFilter Index writes. #45833
Fixed the issue where BloomFilter Index was not correctly deleted when columns were dropped. #44361, #43378

Inverted Index

Fixed the occasional crash during inverted index construction. #43246
Resolved the issue where words with zero occurrences occupied space during inverted index merging. #43113
Prevented abnormal large values in Index Size statistics. #46549
Fixed the issue with inverted indexes for VARIANT type fields. #43375
Optimized local cache locality for inverted indexes to improve cache hit rates. #46518
Added the metric NumInvertedIndexRemoteIOTotal to query profiles for remote storage reads of inverted indexes. #45675, #44863

Others

Fixed a crash in the ipv6_cidr_to_range function when it processes special NULL data. #44700

Permissions

When granting CREATE_PRIV, the existence of the corresponding resource is no longer checked. #45125
Fixed the issue where queries on views with permissions could fail due to missing permissions for referenced tables in extreme scenarios. #44621
Resolved the issue where permission checks for use db did not distinguish between internal and external Catalogs. #45720
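
The CREATE_PRIV change above means a grant can be issued before the target object exists. The sketch below assumes the "resource" in question is a table; the database, table, and user names are hypothetical.

```sql
-- example_db.new_tbl does not need to exist yet when the privilege is granted.
GRANT CREATE_PRIV ON example_db.new_tbl TO 'example_user'@'%';

-- Inspect the privileges held by the hypothetical user.
SHOW GRANTS FOR 'example_user'@'%';
```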
