
TPC-DS Benchmark

TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is a benchmark test that focuses on decision support and aims to evaluate the performance of data warehousing and analytics systems. It was developed by the Transaction Processing Performance Council (TPC) organization to compare the capabilities of different systems in handling complex queries and large-scale data analysis.

The design goal of TPC-DS is to simulate complex decision support workloads in the real world. It tests the performance of systems through a series of complex queries and data operations, including joins, aggregations, sorting, filtering, subqueries, and more. These query patterns cover various scenarios ranging from simple to complex, such as report generation, data mining, and OLAP (Online Analytical Processing).

This document mainly introduces the performance of Doris on the TPC-DS 1000G test set.

We ran the 99 queries of the TPC-DS standard test data set against Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode) and Apache Doris 2.1.7-rc03. In compute-storage coupled mode, the performance of version 3.x is on par with that of version 2.1.x (251,620 ms versus 261,320 ms in total).

TPCDS_1000G

1. Hardware Environment

| Hardware | Configuration Instructions |
|---|---|
| Number of Machines | 4 Aliyun Virtual Machines (1 FE, 3 BEs) |
| CPU | Intel Xeon (Ice Lake) Platinum 8369B 32C |
| Memory | 128G |
| Disk | Enterprise SSD (PL0) |

2. Software Environment​

  • Doris deployed with 3 BEs and 1 FE
  • Kernel Version: Linux version 5.15.0-101-generic
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode), Apache Doris 2.1.7-rc03
  • JDK: openjdk version "17.0.2"

3. Test Data Volume

The TPC-DS 1000G data generated for the test was imported into both Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode) and Apache Doris 2.1.7-rc03. The tables and their row counts are listed below.

| TPC-DS Table Name | Rows |
|---|---|
| customer_demographics | 1,920,800 |
| reason | 65 |
| warehouse | 20 |
| date_dim | 73,049 |
| catalog_sales | 1,439,980,416 |
| call_center | 42 |
| inventory | 783,000,000 |
| catalog_returns | 143,996,756 |
| household_demographics | 7,200 |
| customer_address | 6,000,000 |
| income_band | 20 |
| catalog_page | 30,000 |
| item | 300,000 |
| web_returns | 71,997,522 |
| web_site | 54 |
| promotion | 1,500 |
| web_sales | 720,000,376 |
| store | 1,002 |
| web_page | 3,000 |
| time_dim | 86,400 |
| store_returns | 287,999,764 |
| store_sales | 2,879,987,999 |
| ship_mode | 20 |
| customer | 12,000,000 |

4. Test SQL

The 99 TPC-DS test query statements: TPC-DS-Query-SQL

5. Test Results

This section compares Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode) with Apache Doris 2.1.7-rc03, using query time (ms) as the main performance indicator. The test results are as follows:

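| Query | Apache Doris 3.0.3-rc03 Compute-Storage Coupled Mode (ms) | Apache Doris 2.1.7-rc03 (ms) |
|---|---|---|
| query01 | 580 | 630 |
| query02 | 5540 | 4930 |
| query03 | 350 | 360 |
| query04 | 10790 | 11070 |
| query05 | 710 | 620 |
| query06 | 230 | 220 |
| query07 | 590 | 550 |
| query08 | 350 | 330 |
| query09 | 7520 | 6830 |
| query10 | 390 | 370 |
| query11 | 6560 | 6960 |
| query12 | 120 | 100 |
| query13 | 780 | 790 |
| query14 | 13200 | 13470 |
| query15 | 400 | 510 |
| query16 | 410 | 520 |
| query17 | 1300 | 1310 |
| query18 | 650 | 560 |
| query19 | 250 | 200 |
| query20 | 110 | 100 |
| query21 | 110 | 80 |
| query22 | 1570 | 2300 |
| query23 | 37180 | 38240 |
| query24 | 7470 | 8340 |
| query25 | 920 | 780 |
| query26 | 200 | 200 |
| query27 | 550 | 530 |
| query28 | 7300 | 5940 |
| query29 | 920 | 940 |
| query30 | 300 | 270 |
| query31 | 2000 | 1890 |
| query32 | 70 | 60 |
| query33 | 400 | 350 |
| query34 | 760 | 750 |
| query35 | 1290 | 1370 |
| query36 | 460 | 530 |
| query37 | 80 | 60 |
| query38 | 5450 | 7520 |
| query39 | 760 | 560 |
| query40 | 140 | 150 |
| query41 | 50 | 50 |
| query42 | 110 | 100 |
| query43 | 1170 | 1150 |
| query44 | 2120 | 2020 |
| query45 | 280 | 430 |
| query46 | 1390 | 1250 |
| query47 | 2160 | 2660 |
| query48 | 660 | 630 |
| query49 | 810 | 730 |
| query50 | 1570 | 1640 |
| query51 | 6030 | 6430 |
| query52 | 120 | 110 |
| query53 | 280 | 250 |
| query54 | 1540 | 1280 |
| query55 | 130 | 110 |
| query56 | 300 | 290 |
| query57 | 1240 | 1480 |
| query58 | 260 | 240 |
| query59 | 10120 | 7760 |
| query60 | 370 | 380 |
| query61 | 560 | 540 |
| query62 | 920 | 740 |
| query63 | 230 | 210 |
| query64 | 1660 | 5790 |
| query65 | 4800 | 4900 |
| query66 | 400 | 480 |
| query67 | 24190 | 27320 |
| query68 | 1400 | 1600 |
| query69 | 1170 | 380 |
| query70 | 3160 | 3480 |
| query71 | 440 | 460 |
| query72 | 4090 | 3160 |
| query73 | 660 | 660 |
| query74 | 5720 | 5990 |
| query75 | 4560 | 4610 |
| query76 | 1800 | 1590 |
| query77 | 330 | 300 |
| query78 | 16300 | 17970 |
| query79 | 3160 | 3040 |
| query80 | 590 | 570 |
| query81 | 540 | 460 |
| query82 | 320 | 270 |
| query83 | 230 | 220 |
| query84 | 130 | 130 |
| query85 | 780 | 520 |
| query86 | 660 | 760 |
| query87 | 6200 | 8000 |
| query88 | 5620 | 5560 |
| query89 | 400 | 430 |
| query90 | 150 | 150 |
| query91 | 160 | 150 |
| query92 | 50 | 40 |
| query93 | 2380 | 2440 |
| query94 | 290 | 340 |
| query95 | 410 | 350 |
| query96 | 680 | 660 |
| query97 | 4870 | 5020 |
| query98 | 200 | 190 |
| query99 | 1940 | 1560 |
| Total | 251620 | 261320 |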
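
To put the two totals in proportion, the relative difference can be computed with a small helper. This is only an illustrative sketch: the function name pct_reduction is hypothetical, and the inputs are the two values from the Total row (in milliseconds).

```shell
#!/bin/sh
# Relative reduction in total query time, in percent, of `new` versus `old`.
# pct_reduction is a hypothetical helper name, not part of tpcds-tools.
pct_reduction() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", (old - new) * 100 / old }'
}

# Totals from the results: 3.0.3-rc03 (251620 ms) vs. 2.1.7-rc03 (261320 ms)
pct_reduction 251620 261320
```

For these totals the helper prints 3.71, that is, version 3.0.3-rc03 needs roughly 3.7% less total time than 2.1.7-rc03 across the 99 queries. The same helper can be applied to any single row to compare an individual query.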

6. Environment Preparation

Please refer to the official documentation to install and deploy Doris and obtain a working Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).

7. Data Preparation

7.1 Download and Install TPC-DS Data Generation Tool

Execute the following script to download and compile the tpcds-tools tool.

sh bin/build-tpcds-dbgen.sh

7.2 Generating the TPC-DS Test Set

Execute the following script to generate the TPC-DS dataset:

sh bin/gen-tpcds-data.sh -s 1000

Note 1: Check the script help via sh bin/gen-tpcds-data.sh -h.

Note 2: The data is generated under the tpcds-data/ directory with the suffix .dat. The total file size is about 1000GB, and generation may take a few minutes to an hour.

Note 3: A standard test data set of 100G is generated by default.
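
Before loading, it can be useful to confirm that generation actually produced data files. The following is a minimal sketch that assumes the output directory is tpcds-data/ as described in Note 2; the helper name count_dat_files is hypothetical and not part of tpcds-tools.

```shell
#!/bin/sh
# Count the generated .dat files in a directory (default: tpcds-data).
# count_dat_files is a hypothetical helper name used for illustration.
count_dat_files() {
    dir="${1:-tpcds-data}"
    find "$dir" -type f -name '*.dat' | wc -l | tr -d ' '
}

# Only report when the directory exists, so the sketch is safe to run anywhere.
if [ -d tpcds-data ]; then
    echo "dat files: $(count_dat_files tpcds-data)"
    du -sh tpcds-data    # total size on disk; about 1000GB is expected for -s 1000
fi
```

A count of zero, or a size far below the expected volume, suggests that generation was interrupted or wrote to a different directory.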

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File

Before running the import script, write the FE's IP, ports, and other connection information into the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/tpcds-tools/conf/.

The file contains the FE's IP, HTTP port, query port, user name, password, and the name of the database into which the data will be imported:

# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where the TPC-DS tables are located
export DB='tpcds'
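
A quick connectivity check against the FE can catch a wrong host or port before the table-creation script runs. The sketch below assumes a MySQL command-line client is installed (Doris accepts MySQL-protocol connections on the query port) and reuses the sample values from the configuration above; the helper name doris_mysql_cmd is hypothetical.

```shell
#!/bin/sh
# Sample values copied from the configuration above. In practice, source the file:
#   . "${DORIS_HOME}/tools/tpcds-tools/conf/doris-cluster.conf"
FE_HOST='127.0.0.1'
FE_QUERY_PORT=9030
USER='root'
PASSWORD=''

# Print the mysql client command line implied by the settings.
# doris_mysql_cmd is a hypothetical helper; -p is added only if PASSWORD is non-empty.
doris_mysql_cmd() {
    echo "mysql -h${FE_HOST} -P${FE_QUERY_PORT} -u${USER}${PASSWORD:+ -p${PASSWORD}}"
}

doris_mysql_cmd
# To actually test the connection (requires a running cluster):
#   $(doris_mysql_cmd) -e 'SHOW BACKENDS;'
```

With the sample values, the helper prints mysql -h127.0.0.1 -P9030 -uroot. Running the commented command should list the BE nodes if the FE is reachable.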

7.3.2 Execute the Following Script to Generate and Create TPC-DS Tables

sh bin/create-tpcds-tables.sh -s 1000

Or copy the table creation statements in create-tpcds-tables and execute them in Doris.

7.4 Import Data

Please perform data import with the following command:

sh bin/load-tpcds-data.sh
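
After the load finishes, row counts can be compared against the figures in section 3 (for example, 65 rows in reason and 20 in warehouse). The following sketch only generates the COUNT(*) statements; the helper name count_sql is hypothetical, and piping the output into a MySQL client is an assumption about how you connect to Doris.

```shell
#!/bin/sh
# Emit one COUNT(*) statement per table name passed as an argument.
# count_sql is a hypothetical helper, not part of tpcds-tools.
count_sql() {
    for t in "$@"; do
        printf "SELECT '%s' AS table_name, COUNT(*) AS row_count FROM %s;\n" "$t" "$t"
    done
}

# A few of the tables from section 3; extend the list as needed.
count_sql reason warehouse store_sales
# To run the statements (requires a running cluster and the sample settings):
#   count_sql reason warehouse store_sales | mysql -h127.0.0.1 -P9030 -uroot -Dtpcds
```

Any mismatch with the expected row counts indicates an incomplete or failed import for that table.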

7.5 Query Test

7.5.1 Executing Query Scripts

Execute the test SQL from section 4, or run the following command:

sh bin/run-tpcds-queries.sh -s 1000

7.5.2 Single SQL Execution

You can also retrieve the latest SQL from the code repository, which hosts the latest TPC-DS test query statements.
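
When running a single statement by hand, a rough client-side timing can be taken with a small wrapper. This is a sketch under stated assumptions: it relies on GNU date supporting %N (nanoseconds), the helper name time_ms is hypothetical, and the measured time includes client start-up and connection overhead, so it is not directly comparable to the figures in section 5.

```shell
#!/bin/sh
# Run a command and print its wall-clock time in milliseconds.
# time_ms is a hypothetical helper; it assumes GNU date (%N = nanoseconds).
time_ms() {
    start=$(date +%s%N)
    # Discard the command's own output so only the elapsed time is printed.
    "$@" >/dev/null 2>&1 || echo "command failed" >&2
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example against a running cluster, using the sample connection settings:
#   time_ms mysql -h127.0.0.1 -P9030 -uroot -Dtpcds -e 'SELECT COUNT(*) FROM store_sales;'
```

Running a query several times and taking the best or median time helps separate cold-cache effects from steady-state performance.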