Alternative to ClickHouse

Apache Doris and ClickHouse are both leading real-time analytical databases with columnar storage and fast query capabilities. Apache Doris offers advantages over ClickHouse in three critical areas:

  • Join performance: up to 10x faster join queries through its MPP architecture and Cost-Based Optimizer.
  • Infrastructure cost: compute-storage separation lets compute and storage resources scale independently.
  • Real-time updates: the Merge-on-Write engine maintains query speed during high-frequency data modifications.

tencent-music

“Tencent Music's data platform has migrated from ClickHouse to Apache Doris, improving data timeliness and reducing maintenance costs. Doris' flexible ingestion methods and robust consistency protocol ensure high availability and reliability.”

Highlight:

  • Massive boost in multi-table join performance.
  • Easy scaling and maintenance.
  • Efficient data processing and real-time updates.

“Apache Doris has faster query response times than ClickHouse in the vast majority of scenarios, especially in complex join scenarios, where its performance is significantly superior to that of ClickHouse.”

Highlight:

  • Core business queries run 2-3x faster.
  • Complex join queries run 2-10x faster.
  • Runs all queries that hit OOM in ClickHouse.

“By replacing ClickHouse with Doris, Kwai successfully upgraded to a lakehouse architecture, simplifying the data pipeline and eliminating the need for data import, as Doris can directly access data lake data.”

Highlight:

  • Direct querying of data lake data.
  • Improved query performance.
  • Flexible data governance with materialized views.

Apache Doris vs. ClickHouse

Apache Doris | ClickHouse
Architecture & SQL
Apache Doris:
  • Based on MPP architecture
  • Standard SQL support, MySQL-compatible
ClickHouse:
  • Uses Scatter-Gather architecture
  • SQL-like capabilities but with non-standard SQL
Join Query Performance
Apache Doris:
  • 2-10x faster joins with true distributed join execution across nodes
  • Advanced Cost-Based Optimizer (CBO) automatically selects optimal join strategies (broadcast, shuffle, colocate)
  • Colocate Join eliminates network shuffle for pre-partitioned tables
  • Runtime Filter pushdown reduces data scanning by up to 90%
  • Transparent query acceleration - queries on base tables are automatically rewritten to use materialized views
  • Handles complex TPC-DS queries that cause OOM in ClickHouse
ClickHouse:
  • Limited join capability - relies on subqueries and denormalization
  • No Cost-Based Optimizer; requires manual query tuning
  • Scatter-Gather architecture not designed for distributed joins
  • ~50% of TPC-DS queries fail due to unsupported correlated subqueries
  • No automatic query rewriting - must explicitly query materialized views; cannot accelerate queries on base tables
  • Frequent OOM errors on large multi-table queries
Real-time Updates
Apache Doris:
  • 34x faster query performance than ClickHouse for real-time update workloads
  • Merge-on-Write (MoW) engine with delete bitmap ensures query performance remains constant regardless of update frequency
  • Strongly consistent primary key model - updates are immediately visible with no stale reads
  • Supports high-throughput UPSERT operations without query performance degradation
  • Partial column updates minimize write amplification
ClickHouse:
  • ReplacingMergeTree only supports eventual consistency - stale data visible until background merge
  • Using the FINAL keyword for consistent reads causes 2-10x query slowdown
  • High update frequency leads to excessive merge overhead and query latency spikes
Transaction Support
Apache Doris:
  • Full ACID transaction support for data ingestion
  • Atomic batch imports - all data loads succeed or fail together
  • Two-phase commit ensures data consistency across distributed nodes
ClickHouse:
  • No transaction support
  • Partial data may be visible during failed imports
  • Requires application-level handling for data consistency
Query Concurrency
Apache Doris:
  • 10x higher concurrency - supports thousands of concurrent queries
  • Efficient memory management prevents OOM under high load
  • Query queue management with workload isolation
ClickHouse:
  • Limited concurrent query support (typically <100)
  • Memory-intensive queries cause cluster instability
  • No built-in workload management
Data API
Apache Doris:
  • Offers high-throughput read APIs based on Arrow Flight, easing integration with other engines such as data science and AI tools
ClickHouse:
  • Reads data only through the less efficient JDBC API
Building Open Lakehouse
Apache Doris:
  • Serves as a Lakehouse SQL engine, supporting queries on Hive, Hudi, Iceberg, and Parquet data lake formats
ClickHouse:
  • Limited Lakehouse integration capabilities
Operations & Maintenance
Apache Doris:
  • Supports automatic scaling in, scaling out, and replica balancing
ClickHouse:
  • Requires manual rebalancing during scaling operations
Performance
Apache Doris:
  • In the wide-table ClickBench benchmark, Doris ranked first or second in October 2022 and October 2024, outperforming ClickHouse
  • In large TPC-H and TPC-DS tests, Doris achieved leading performance
ClickHouse:
  • On ClickBench, ClickHouse and Doris have taken turns leading
  • Hits OOM (Out of Memory) errors on many queries in large TPC-H and TPC-DS tests
Cost Efficiency (Storage-Compute Separation)
Apache Doris:
  • Up to 70% cost reduction by independently scaling compute and storage
  • Cold data stored on low-cost object storage (S3, HDFS, OSS) while hot data uses local SSD
  • Elastic compute scaling - add/remove nodes without data rebalancing
  • Multi-tier storage with automatic data temperature management
  • Pay only for the compute resources you need at any given time
  • Available as open-source feature since version 3.0
ClickHouse:
  • Tightly coupled storage and compute - scaling requires both
  • Storage-compute separation only in proprietary ClickHouse Cloud
  • Scaling requires expensive data rebalancing across nodes
  • Must over-provision compute to handle peak loads
  • Higher total cost of ownership for variable workloads
Open Source
Apache Doris:
  • Fully open source under the Apache Software Foundation; license and governance are community-driven and cannot be changed by any single entity.
ClickHouse:
  • Open source, but controlled by a commercial company.
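
The Real-time Updates comparison contrasts resolving duplicate keys at write time (Merge-on-Write with a delete bitmap) with resolving them at read time. The following is a minimal toy sketch of that idea in Python, not Doris's or ClickHouse's actual implementation. The class name MergeOnWriteTable, the helper merge_on_read_scan, and the use of a Python set as the "bitmap" are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]  # (primary key, value)


class MergeOnWriteTable:
    """Toy primary-key table that resolves duplicate keys when data is written."""

    def __init__(self) -> None:
        self.rows: List[Row] = []            # append-only row storage
        self.key_index: Dict[str, int] = {}  # primary key -> position of the live row
        self.delete_bitmap: Set[int] = set() # positions of superseded rows (hypothetical stand-in for a bitmap)

    def upsert(self, key: str, value: int) -> None:
        # The write path pays the cost: look up the old version and mark it deleted.
        old_pos = self.key_index.get(key)
        if old_pos is not None:
            self.delete_bitmap.add(old_pos)
        self.key_index[key] = len(self.rows)
        self.rows.append((key, value))

    def scan(self) -> List[Row]:
        # The read path only skips marked positions; no per-key merging is needed.
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.delete_bitmap]


def merge_on_read_scan(rows: List[Row]) -> List[Row]:
    """Contrast case: every read must deduplicate by key, keeping the last version."""
    latest: Dict[str, int] = {}
    for key, value in rows:
        latest[key] = value
    return list(latest.items())


table = MergeOnWriteTable()
table.upsert("a", 1)
table.upsert("b", 2)
table.upsert("a", 3)  # supersedes the row at position 0
print(table.scan())
```

In this sketch the scan cost does not grow with the number of updates per key, because superseded rows are already marked. The read-time variant repeats the deduplication work on every query, which is loosely analogous to the cost the table attributes to consistent reads on ClickHouse.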

Performance Comparison

ClickBench Benchmark

ClickBench is a benchmarking tool created and maintained by the ClickHouse team to evaluate the performance of analytical databases.

It focuses on testing the performance of large, flat tables rather than complex multi-table joins. It uses real-world data from a major web analytics platform, covering typical scenarios such as clickstream analysis and structured logs.

The benchmark consists of a set of queries that test aggregation operations and single-table performance, without involving complex joins. This makes it especially useful for evaluating databases optimized for real-time analytics and large-scale data processing.

Figure: ClickBench benchmark results

SSB-Flat SF100 Benchmark

SSB-Flat SF100 is a benchmark designed to test the performance of analytical databases in handling large, wide tables.

It is derived from the Star Schema Benchmark (SSB) but flattens the star schema into a single wide table to focus on the performance of single-table queries.

The SF100 indicates that the data scale is 100 times the base size, making it a significant test for evaluating query performance and system scalability.

TPC-H SF100 Benchmark

The TPC-H benchmark with a scale factor of 100 (SF100) is a widely used standard for evaluating database performance. It includes a set of complex SQL queries designed to simulate real-world business intelligence workloads.

The SF100 indicates that the data size is 100 times the base size, making it a large-scale test to measure query performance and system scalability.

Note: ClickHouse failed to execute 7 of the 22 queries, so the total execution time covers all 22 queries for Doris but only the 15 successful queries for ClickHouse.
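
Because the two totals cover different query sets, one way to put them on an equal footing is to sum only the queries that both systems completed. The sketch below shows that calculation under stated assumptions: the function name total_on_common_queries is hypothetical, a failed query is represented as None, and the timings in the example are made-up placeholders, not benchmark results.

```python
from typing import Dict, List, Optional, Tuple

Timings = Dict[str, Optional[float]]  # query name -> seconds, or None if the query failed


def total_on_common_queries(a: Timings, b: Timings) -> Tuple[float, float, List[str]]:
    """Sum each system's time over the queries that both systems completed."""
    common = sorted(q for q in a
                    if a[q] is not None and b.get(q) is not None)
    total_a = sum(a[q] for q in common)
    total_b = sum(b[q] for q in common)
    return total_a, total_b, common


# Placeholder numbers for illustration only; they are NOT measured results.
system_a = {"q1": 1.0, "q2": 2.0, "q3": 3.0}
system_b = {"q1": 2.0, "q2": None, "q3": 4.0}  # q2 failed on system B
print(total_on_common_queries(system_a, system_b))
```

Reporting the list of common queries alongside the totals makes it clear which queries were excluded from the comparison.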

TPC-DS 1TB Benchmark

TPC-DS 1TB is a widely recognized benchmark for evaluating the performance of data warehouses and analytical databases. It involves a dataset of approximately 1TB in size, containing around 6.35 billion records spread across 24 tables.

The benchmark includes 99 complex queries designed to test various aspects of database performance, such as joins, aggregations, and subqueries.

TPC-DS uses a snowflake schema that models real-world scenarios such as web, catalog, and store sales. The 1TB scale is a moderate size for a data warehouse, but it remains challenging because of the complexity of the queries and the large number of records.

Note: TPC-DS makes heavy use of correlated subqueries, which ClickHouse did not support at the time of testing (September 2024). As a result, about 50% of the benchmark queries fail with errors.
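
For readers unfamiliar with the term, a correlated subquery is an inner query that refers to the current row of the outer query, so it is logically re-evaluated for each outer row. The toy Python sketch below illustrates the concept and the usual rewrite into a grouped aggregate plus lookup. The table, column names, and function names are hypothetical, and the code does not represent how either database executes such queries.

```python
from collections import defaultdict
from typing import Dict, List

Sale = Dict[str, object]  # e.g. {"store": "A", "amount": 30}


def correlated(rows: List[Sale]) -> List[Sale]:
    """Mimics: SELECT * FROM sales s
               WHERE s.amount > (SELECT AVG(amount) FROM sales i WHERE i.store = s.store)
    The inner average is recomputed for every outer row."""
    result = []
    for outer in rows:
        same_store = [r["amount"] for r in rows if r["store"] == outer["store"]]
        if outer["amount"] > sum(same_store) / len(same_store):
            result.append(outer)
    return result


def decorrelated(rows: List[Sale]) -> List[Sale]:
    """Equivalent rewrite: compute per-store averages once, then look them up per row."""
    totals = defaultdict(lambda: [0.0, 0])  # store -> [sum of amounts, row count]
    for r in rows:
        totals[r["store"]][0] += r["amount"]
        totals[r["store"]][1] += 1
    avg = {store: total / count for store, (total, count) in totals.items()}
    return [r for r in rows if r["amount"] > avg[r["store"]]]


sales = [
    {"store": "A", "amount": 10}, {"store": "A", "amount": 30},
    {"store": "B", "amount": 5}, {"store": "B", "amount": 5}, {"store": "B", "amount": 20},
]
print(decorrelated(sales))
```

An engine that cannot express or rewrite the first form has to reject the query, which is the failure mode the note above describes.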

More Migration Stories