Release 3.0.4
Dear community members, Apache Doris 3.0.4 was officially released on February 02, 2025. This version further enhances the performance and stability of the system.
Quick Download: https://doris.apache.org/download/
GitHub Release: https://github.com/apache/doris/releases
Behavior Changes
- In the Audit log, the `force` flag is retained for `drop table` and `drop database` statements. #43227
- When exporting data to Parquet/ORC formats, the `bitmap`, `quantile_state`, and `hll` types are exported in Binary format. Additionally, support has been added for exporting `jsonb` and `variant` types, which are exported as `string`. #44041
  - For more information, please refer to documentation: Export Overview - Apache Doris
- The Hudi JNI Scanner has been switched from the Spark API to the Hadoop API to enhance compatibility. Users can switch between the two implementations by setting the session variable `set hudi_jni_scanner=spark/hadoop`. #44396
- The use of `auto bucket` in Colocate tables is prohibited. #44396
- Paimon cache has been added to the Catalog, so data is no longer queried in real time. #44911
- The default value of `max_broker_concurrency` has been increased to improve performance for large-scale data imports with Broker Load. #44929
- The default `storage medium` for partitions created by Auto Partition is now taken from the current table's `storage medium` attribute rather than the system default value. #45955
- Column updates are prohibited during Schema Change execution for Key columns. #46347
- For Key columns containing auto-increment columns, column updates are now allowed without providing the auto-increment column. #44528
- The FE ID generator strategy has been switched to a time-based approach, and IDs no longer start from 10000. #44790
- In the compute-storage separation mode, the default stale rowset recycling delay for Compaction has been reduced to 1800 seconds to decrease the recycling interval. In extreme scenarios this may cause large queries to fail; adjust the value as needed. #45460
- The `show cache hotspot` statement has been disabled in compute-storage separation mode; query the system tables directly instead. #47332
- Deleting the system-created `admin` user is prohibited. #44751
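The session variable named in the Hudi JNI Scanner item above is set per session. The following is a minimal sketch, assuming the variable accepts the string values `spark` and `hadoop` as that item suggests. Check the accepted values on your own cluster before relying on it.

```sql
-- Assumption: 'hadoop' and 'spark' are the two accepted values,
-- as implied by "set hudi_jni_scanner=spark/hadoop" in the release note.

-- Use the Hadoop API based scanner for the current session.
SET hudi_jni_scanner = 'hadoop';

-- Switch back to the Spark API based scanner if compatibility issues appear.
SET hudi_jni_scanner = 'spark';

-- Inspect the value currently in effect.
SHOW VARIABLES LIKE 'hudi_jni_scanner';
```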
Improvements
Storage
- Fixed frequent Routine Load task timeouts caused by a small `max_match_interval` setting. #46292
- Improved performance for Broker Load when importing multiple compressed files. #43975
- Increased the default value of `webserver_num_workers` to enhance Stream Load performance. #46593
- Fixed the load imbalance of Routine Load import tasks during BE node scaling. #44798
- Improved the use of Routine Load thread pools to prevent timeouts from affecting queries. #45039
Compute-Storage Separation
- Enhanced the stability and observability of the Meta-service. #44036, #45617, #45255, #45068
- Optimized File Cache by adding an early eviction strategy, reducing lock time, and improving query performance. #47473, #45678, #47472
- Improved initialization checks and queue transitions for File Cache to enhance stability. #44004, #44429, #45057, #47229
- Increased the speed of HDFS data recycling. #46393
- Optimized performance issues when the FE acquires compute groups during ultra-high-frequency imports. #47203
- Improved several import-related parameters for primary key tables in compute-storage separation to enhance the stability of real-time high-concurrency imports. #47295, #46750, #46365
Lakehouse
- Supported reading Hive tables in JSON format. #43469
  - For more information, please refer to documentation: Text/CSV/JSON - Apache Doris
- Introduced the session variable `enable_text_validate_utf8` to skip UTF-8 encoding checks for CSV formats. #45537
  - For more information, please refer to documentation: Text/CSV/JSON - Apache Doris
- Updated the Hudi version to 0.15 and optimized query planning performance for Hudi tables.
- Improved read performance for MaxCompute partitioned tables. #45148
- Optimized performance for Parquet file delayed materialization under high filter rates. #46183
- Supported delayed materialization for complex Parquet types. #44098
- Optimized predicate pushdown logic for ORC types, supporting more predicate conditions for index filtering. #43255
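The `enable_text_validate_utf8` session variable mentioned in this list can be toggled around a query. The sketch below assumes that setting it to `false` is what skips the UTF-8 check. The catalog, database, and table names are hypothetical placeholders.

```sql
-- Assumption: false disables UTF-8 validation for text/CSV reads in this session.
SET enable_text_validate_utf8 = false;

-- Hypothetical Hive CSV table; replace with a real catalog.database.table.
SELECT * FROM hive_catalog.example_db.example_csv_table LIMIT 10;

-- Turn validation back on for the rest of the session.
SET enable_text_validate_utf8 = true;
```

Skipping validation avoids the per-row encoding check, so it is best reserved for data that is already known to be valid UTF-8 or where invalid bytes are acceptable.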
Asynchronous Materialized Views
- Supported more scenarios for aggregate roll-up rewriting. #44412
Query Optimizer
- Improved partition pruning performance. #46261
- Added rules to eliminate `group by` keys based on data characteristics. #43391
- Adaptively adjusted the wait time for Runtime Filters based on the target table size. #42640
- Improved the ability to push down aggregations in joins to fit more scenarios. #43856, #43380
- Improved Limit pushdown for aggregations to fit more scenarios. #44042
Others
- Optimized startup scripts for FE, BE, and MS processes to provide clearer output. #45610, #45490, #45883
- The case sensitivity of table names in `show tables` now matches MySQL behavior. #46030
- `show index` now supports arbitrary target table types. #45861
- `information_schema.columns` now supports displaying default values. #44849
- `information_schema.views` now supports displaying view definitions. #45857
- Supported the MySQL protocol `COM_RESET_CONNECTION` command. #44747
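The two `information_schema` changes above can be checked with ordinary queries. This is a minimal sketch that assumes the MySQL-compatible column names `COLUMN_DEFAULT` and `VIEW_DEFINITION`. The names `example_db` and `example_table` are hypothetical.

```sql
-- Default values of each column in a (hypothetical) table.
SELECT COLUMN_NAME, COLUMN_DEFAULT
FROM information_schema.columns
WHERE TABLE_SCHEMA = 'example_db'
  AND TABLE_NAME = 'example_table';

-- Definitions of all views in a (hypothetical) database.
SELECT TABLE_NAME, VIEW_DEFINITION
FROM information_schema.views
WHERE TABLE_SCHEMA = 'example_db';
```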
Bug Fixes
Storage
- Fixed potential memory errors during the import process for aggregate table models. #46997
- Resolved the issue of Routine Load offset loss during FE master node restarts in compute-storage separation mode. #46566
- Fixed memory leaks in FE Observer nodes during batch import scenarios in compute-storage separation mode. #47244
- Resolved the issue of Cumulative Point rollback during Full Compaction with Order Data Compaction. #44359
- Fixed the issue where Delete operations could temporarily prevent Tablet Compaction scheduling. #43466
- Resolved incorrect Tablet states after Schema Change in multi-compute-cluster scenarios. #45821
- Fixed the potential NPE error when performing Column Rename Schema Change on primary key tables with `sequence_type`. #46906
- Data Correctness: Fixed correctness issues for primary key tables when importing partial column updates containing DELETE SIGN columns. #46194
- Resolved potential memory leaks in FE when Publish tasks for primary key tables were continuously stuck. #44846
Compute-Storage Decoupled
- Fixed the issue where File Cache size could exceed the table data size. #46561, #46390
- Resolved upload failures at the 5MB boundary during data uploads. #47333
- Enhanced robustness by adding more parameter checks for several `alter` operations in Storage Vault. #45155, #45156, #46625, #47078, #45685, #46779
- Resolved issues with data recycling failures or slow recycling due to improper Storage Vault configurations. #46798, #47536, #47475, #47324, #45072
- Fixed the issue where data recycling could stall, preventing timely recycling. #45760
- Resolved incorrect retries for MTTM-230 errors in compute-storage separation mode. #47370, #47326
- Fixed the issue where Group Commit WAL was not fully replayed during BE decommissioning in compute-storage separation mode. #47187
- Resolved the issue where Tablet Meta exceeding 2GB rendered MS unavailable. #44780
- Data Correctness: Fixed two duplicate Key issues in primary key tables in compute-storage separation mode. #46039, #44975
- Resolved the issue where Base Compaction could continuously fail due to large Delete Bitmaps in primary key tables during high-frequency real-time imports. #46969
- Corrected the retry logic for Schema Change in primary key tables in compute-storage separation mode to enhance robustness. #46748
Lakehouse
Hive
- Fixed the issue where Hive views created by Spark could not be queried. #43553
- Resolved the issue where certain Hive Transaction tables could not be read correctly. #45753
- Fixed the issue where partition pruning failed for Hive tables with special characters in partitions. #42906
Iceberg
- Fixed the issue where Iceberg tables could not be created in Kerberos authentication environments. #43445
- Resolved the issue where `count(*)` queries were inaccurate for Iceberg tables with dangling deletes. #44039
- Fixed the issue where query errors occurred due to mismatched column names in Iceberg tables. #44470
- Resolved the issue where Iceberg tables could not be read after partition modifications. #45367
Paimon
- Fixed the issue where Paimon Catalog could not access Alibaba Cloud OSS-HDFS. #42585
Hudi
- Fixed the issue where partition pruning failed for Hudi tables in certain scenarios. #44669
JDBC
- Fixed the issue where tables could not be retrieved using JDBC Catalog after enabling case-insensitive table names.
MaxCompute
- Fixed the issue where partition pruning failed for MaxCompute tables in certain scenarios. #44508
Others
- Fixed the issue where export tasks caused memory leaks in FE. #44019
- Resolved the issue where S3 object storage could not be accessed via HTTPS protocol. #44242
- Fixed the issue where Kerberos authentication tickets could not be automatically refreshed. #44916
- Resolved the issue where reading Hadoop Block compressed format files failed. #45289
- When querying ORC format data, CHAR type predicates are no longer pushed down to avoid potential result errors. #45484
Asynchronous Materialized Views
- Fixed the issue where transparent query rewriting could lead to planning or result errors in extreme scenarios. #44575, #45744
- Resolved the issue where multiple build tasks could be generated during asynchronous materialized view scheduling in extreme scenarios. #46020, #46280
Query Optimizer
- Fixed the issue where some expression rewrites could produce incorrect expressions. #44770, #44920, #45922, #45596
- Resolved occasional incorrect results from SQL Cache. #44782, #44631, #46443, #47266
- Fixed the issue where limit pushdown for aggregation operators could produce incorrect results in some scenarios. #45369
- Resolved the issue where delayed materialization optimization could produce incorrect execution plans in some scenarios. #45693, #46551
Query Execution
- Fixed the issue where regular expressions and `like` functions produced incorrect results with special characters. #44547
- Resolved the issue where SQL Cache results could be incorrect when switching databases. #44782
- Fixed a series of Arrow Flight-related issues. #45023, #43929
- Resolved the issue where results were incorrect when the Hash table for HashJoin exceeded 4GB in some cases. #46461
- Fixed the overflow issue of the `convert_to` function with Chinese characters. #46405
- Resolved the issue where results could be incorrect in extreme scenarios when `group by` was used with Limit. #47844
- Fixed the issue where results could be incorrect when accessing certain system tables. #47498
- Resolved the issue where the `percentile` function could cause system crashes. #47068
- Fixed the performance degradation issue for single-table queries with Limit. #46090
- Resolved the issue where `StDistanceSphere` and `StAngleSphere` functions caused system crashes. #45508
- Fixed the issue where `map_agg` results were incorrect. #40454
Semi-structured Data Management
BloomFilter Index
- Fixed the exception caused by large parameters in BloomFilter Index. #45780
- Resolved the issue of high memory usage during BloomFilter Index writes. #45833
- Fixed the issue where BloomFilter Index was not correctly deleted when columns were dropped. #44361, #43378
Inverted Index
- Fixed the occasional crash during inverted index construction. #43246
- Resolved the issue where words with zero occurrences occupied space during inverted index merging. #43113
- Prevented abnormal large values in Index Size statistics. #46549
- Fixed the issue with inverted indexes for VARIANT type fields. #43375
- Optimized local cache locality for inverted indexes to improve cache hit rates. #46518
- Added the metric `NumInvertedIndexRemoteIOTotal` to query profiles for remote storage reads of inverted indexes. #45675, #44863
Others
- Fixed the crash issue of the `ipv6_cidr_to_range` function with special NULL data. #44700
Permissions
- When granting `CREATE_PRIV`, the existence of the corresponding resource is no longer checked. #45125
- Fixed the issue where queries on views with permissions could fail due to missing permissions for referenced tables in extreme scenarios. #44621
- Resolved the issue where permission checks for `use db` did not distinguish between internal and external Catalogs. #45720