Apache Doris 2.1.8 just released
Dear Community, Apache Doris version 2.1.8 was officially released on January 24, 2025. This version introduces continuous upgrades and enhancements in several key areas, including Lakehouse, Asynchronous Materialized Views, Query Optimizer and Execution Engine, Storage Management, and more.
Behavior Changes
- When querying a data source with case-insensitive table names (such as Hive) through an External Catalog, previous versions let you use any case for the table name. In version 2.1.8, Doris's own table name case sensitivity policy is strictly followed.
- Add the environment variable SKIP_CHECK_ULIMIT to skip the ulimit value verification check within the BE process. This is only available to applications in the Docker quick start scenario. #45267
- Add the enable_cooldown_replica_affinity session variable to control the selection of replica affinity for queries under cold-hot separation.
- In FE, add the configurations restore_job_compressed_serialization and backup_job_compressed_serialization to solve the OOM problem of FE during backup and restore operations when the number of db tablets is extremely large. By default, these configurations are disabled, and once enabled, they cannot be downgraded.
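The first behavior change above is the one most likely to break existing queries, so here is a minimal sketch of the difference between case-insensitive and case-sensitive name lookup. The function `resolve_table` and its `case_sensitive` flag are hypothetical and only model the idea; they are not Doris APIs, and the actual policy depends on how your Doris cluster is configured.

```python
def resolve_table(requested, catalog_tables, case_sensitive):
    """Return the stored table name matching `requested`, or None if there is no match.

    Hypothetical model of name lookup, not Doris code.
    """
    if case_sensitive:
        # Strict policy: the name must match exactly as stored.
        return requested if requested in catalog_tables else None
    # Lenient policy: compare names after lowercasing both sides.
    lowered = {name.lower(): name for name in catalog_tables}
    return lowered.get(requested.lower())


hive_tables = ["orders"]

# Lenient lookup: any case finds the table.
print(resolve_table("ORDERS", hive_tables, case_sensitive=False))
# Strict lookup: only the exact stored name matches.
print(resolve_table("ORDERS", hive_tables, case_sensitive=True))
print(resolve_table("orders", hive_tables, case_sensitive=True))
```

If your cluster follows a case-sensitive policy, queries that relied on mixed-case names for Hive tables may need to be updated to the stored name after upgrading.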
New Features
- The Arrow Flight protocol supports accessing BE through a load-balancing device. #43281
- Lambda expressions now support capturing external columns. #45186
Improvements
Lakehouse
- Update the Hudi version to 0.15 and optimize the query planning performance of Hudi tables.
- Optimize the read performance of MaxCompute partitioned tables. #45148
- Support the session variable enable_text_validate_utf8, which can ignore the UTF8 encoding detection in CSV format. #45537
- Optimize the performance of Parquet file lazy materialization under high-filtering-rate conditions. #46183
Asynchronous Materialized Views
- Support manually refreshing partitions that do not exist in an asynchronous materialized view. #45290
- Optimize the performance of transparent rewrite planning. #44786
Query Optimizer
- Improve the adaptive ability of runtime filters. #42640
- Add the ability to generate original column filter conditions from filter conditions on max/min aggregate function columns. #39252
- Add the ability to extract single-side filter conditions from join predicates. #38479
- Optimize the ability of predicate derivation on set operators to better generate filter predicates #39450.
- Optimize the exception handling ability of statistic information collection and usage to avoid generating unexpected execution plans when collection exceptions occur. #43009 #43776 #43865 #42104 #42399 #41729
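To illustrate the max/min item above, the following sketch shows why a filter on a max aggregate can also be applied to the original column before aggregation. It is a conceptual model in plain Python, not the Doris optimizer; the function names are made up for this example. The idea: for a condition like max(value) > threshold, rows with value <= threshold can never change which groups qualify or what their maximum is, so they can be dropped early.

```python
def group_max(rows):
    """Return {group_key: max(value)} for an iterable of (key, value) rows."""
    result = {}
    for key, value in rows:
        if key not in result or value > result[key]:
            result[key] = value
    return result


def having_max_gt(rows, threshold):
    """Baseline plan: aggregate all rows, then keep groups with max(value) > threshold."""
    return {k: m for k, m in group_max(rows).items() if m > threshold}


def having_max_gt_prefiltered(rows, threshold):
    """Rewritten plan: drop rows with value <= threshold first, then aggregate.

    The aggregate-level filter is kept for clarity, although after the
    row-level filter it no longer removes any group.
    """
    filtered = [(k, v) for k, v in rows if v > threshold]
    return {k: m for k, m in group_max(filtered).items() if m > threshold}


rows = [("a", 5), ("a", 12), ("b", 3), ("b", 9), ("c", 20), ("c", 1)]

# Both plans return the same groups and maxima; the second aggregates fewer rows.
assert having_max_gt(rows, 10) == having_max_gt_prefiltered(rows, 10)
print(having_max_gt_prefiltered(rows, 10))  # {'a': 12, 'c': 20}
```

Note that this rewrite is only safe when no other aggregate in the query depends on the discarded rows. A count(*) in the same query, for example, would change if rows were filtered early.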
Query Execution Engine
- Optimize the execution of queries with limit to end faster and avoid unnecessary data scanning. #44255
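The general idea behind ending limit queries early can be sketched with lazy iteration: once enough rows have been produced, the scan is simply not pulled any further. This is a generic illustration using Python generators, not a description of how the Doris execution engine implements it; `scan` and `query_with_limit` are names invented for this sketch.

```python
from itertools import islice


def scan(table, counter):
    """Lazily yield rows from `table`, counting how many rows were actually read."""
    for row in table:
        counter["rows_read"] += 1
        yield row


def query_with_limit(table, predicate, limit, counter):
    """Filter rows lazily and stop pulling from the scan once `limit` rows are produced."""
    matching = (row for row in scan(table, counter) if predicate(row))
    return list(islice(matching, limit))


table = range(1_000_000)
counter = {"rows_read": 0}
result = query_with_limit(table, lambda r: r % 2 == 0, 3, counter)

print(result)                # [0, 2, 4]
print(counter["rows_read"])  # 5
```

Only five of the one million rows are read here, because the consumer stops asking for rows as soon as the limit is satisfied.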
Storage Management
- CCR supports more comprehensive operations, such as rename table, rename column, modify comment, drop view, drop rollup, etc.
- Improve the accuracy of the broker load import progress and the performance when importing multiple compressed files.
- Improve the routine load timeout strategy and thread-pool usage to prevent routine load timeout failures and impacts on queries.
Others
- The Docker quick-start image supports starting without setting environment parameters. Add the environment variable SKIP_CHECK_ULIMIT to skip the swap, max_map_count, and ulimit-related verification checks in the start_be.sh script and within the BE process. This is only applicable to applications in the Docker quick-start scenario. #45269
- Add the new LDAP configuration ldap_group_filter for custom group filtering. #43292
- Optimize performance when using Ranger. #41207
- Fix the inaccurate statistics of scan bytes in the audit log. #45167
- The default values of columns are now correctly displayed in the COLUMNS system table. #44849
- The definition of views is now correctly displayed in the VIEWS system table. #45857
- The admin user can no longer be deleted. #44751
Bug Fixes
Lakehouse
Hive
- Fix the problem of being unable to query Hive views created by Spark. #43553
- Fix the problem of being unable to correctly read some Hive Transaction tables. #45753
- Fix the problem of incorrect partition pruning when Hive table partitions contain special characters. #42906
Iceberg
- Fix the problem of being unable to create Iceberg tables in a Kerberos-authenticated environment. #43445
- Fix the problem of inaccurate count(*) queries when there are dangling deletes in Iceberg tables in some cases. #44039
- Fix the problem of query errors due to column name mismatches in Iceberg tables in some cases. #44470
- Fix the problem of being unable to read Iceberg tables when their partitions are modified in some cases. #45367
Paimon
- Fix the problem that the Paimon Catalog cannot access Alibaba Cloud OSS-HDFS. #42585
Hudi
- Fix the problem of ineffective partition pruning in Hudi tables in some cases. #44669
JDBC
- Fix the problem of being unable to obtain tables using the JDBC Catalog after enabling the case-insensitive table name feature in some cases.
MaxCompute
- Fix the problem of ineffective partition pruning in MaxCompute tables in some cases. #44508
Others
- Fix the problem of FE memory leaks caused by EXPORT tasks in some cases. #44019
- Fix the problem of being unable to access S3 object storage using the https protocol in some cases. #44242
- Fix the problem of the inability to automatically refresh Kerberos authentication tickets in some cases. #44916
- Fix the problem of errors when reading Hadoop Block compressed format files in some cases. #45289
- When querying ORC-formatted data, no longer push down CHAR-type predicates to avoid possible result errors. #45484
Asynchronous Materialized Views
- Fix the problem that when there is a CTE in the materialized view definition, it cannot be refreshed #44857.
- Fix the problem that when columns are added to the base table, the asynchronous materialized view cannot hit the transparent rewrite. #44867
- Fix the problem that when the same filter predicate is included in different positions in a query, the transparent rewrite fails. #44575
- Fix the problem that when column aliases are used in filter predicates or join predicates, the transparent rewrite cannot be performed. #44779
Inverted Index
- Fix the problem of abnormal handling of inverted index compaction. #45773
- Fix the problem that inverted index construction fails due to lock-waiting timeout. #43589
- Fix the problem of inverted index write crashes in abnormal situations. #46075
- Fix the null-pointer problem of the match function with special parameters. #45774
- Fix problems related to the variant inverted index and disable the use of the index v1 format for variants. #43971 #45179
- Fix the problem of crashes when setting gram_size = 65535 for the ngram bloomfilter index. #43654
- Fix the problem of incorrect calculation of DATE and DATETIME for the bloomfilter index. #43622
- Fix the problem that dropping a column does not automatically drop the bloomfilter index. #44478
- Reduce the memory footprint when writing the bloomfilter index. #46047
Semi-Structured Data
- Optimize memory usage and reduce the memory consumption of the variant data type. #43349 #44585 #45734
- Optimize the performance of variant schema copy. #45731
- Do not use variant as a key when automatically inferring tablet keys. #44736
- Fix the problem of changing variant from NOT NULL to NULL. #45734
- Fix the problem of incorrect type inference of lambda functions. #45798
- Fix the coredump problem at the boundary conditions of the ipv6_cidr_to_range function. #46252
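For readers unfamiliar with what a CIDR-to-range computation involves, here is a minimal sketch using Python's standard ipaddress module. It shows the computation in general, including the two extreme prefix lengths (0 and 128), which are natural boundary conditions. The release notes do not say which boundary triggered the coredump, and the helper `ipv6_cidr_range` below is an illustrative stand-in, not the Doris function or its signature.

```python
import ipaddress


def ipv6_cidr_range(address, prefix_len):
    """Return (lowest, highest) IPv6 address of the CIDR block containing `address`."""
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix length must be between 0 and 128")
    # strict=False masks off host bits instead of rejecting them.
    network = ipaddress.IPv6Network((address, prefix_len), strict=False)
    return str(network.network_address), str(network.broadcast_address)


# A typical prefix.
print(ipv6_cidr_range("2001:db8::1", 32))
# Boundary: prefix 0 covers the entire IPv6 address space.
print(ipv6_cidr_range("2001:db8::1", 0))
# Boundary: prefix 128 covers exactly one address.
print(ipv6_cidr_range("2001:db8::1", 128))
```

Rejecting out-of-range prefix lengths up front, as the sketch does, is one common way to keep such boundary inputs from reaching the bit-masking logic.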
Query Optimizer
- Fix the potential deadlock problem caused by mutual exclusion of table read locks and optimize the lock-using logic. #45045 #43376 #44164 #44967 #45995
- Fix the problem that the SQL Cache function incorrectly uses constant folding, resulting in incorrect results when using functions containing time formats. #44631
- Fix the problem of incorrect optimization of comparison expressions in edge cases, which may lead to incorrect results. #44054 #44725 #44922 #45735 #45868
- Fix the problem of incorrect audit logs for high-concurrent point queries. #43345 #44588
- Fix the problem of continuous error reporting after an exception occurs in high-concurrent point queries. #44582
- Fix the problem of incorrectly prepared statements for some fields. #45732
Query Execution Engine
- Fix the problem of incorrect results of regular expressions and like functions for special characters. #44547
- Fix the problem that the SQL Cache may have incorrect results when switching databases. #44782
- Fix the problem of incorrect results of the cut_ipv6 function. #43921
- Fix the problem of casting from numeric types to bool types. #46275
- Fix a series of problems related to Arrow Flight. #45661 #45023 #43960 #43929
- Fix the problem of incorrect results in some cases when the hash table of hash join exceeds 4G. #46461
- Fix the overflow problem of the convert_to function for Chinese characters. #46505
Storage Management
- Fix the problem that high-concurrent DDL may cause FE startup failure.
- Fix the problem that auto-increment columns may have duplicate values.
- Fix the problem that routine load cannot use the newly expanded BE during expansion.
Permission Management
- Fix the problem of frequent access to the Ranger service when using Ranger as the authentication plugin #45645.
Others
- Fix the potential memory leak problem when enable_jvm_monitor=true is enabled on the BE side. #44311