TPC-DS Benchmark

TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is a decision support benchmark for evaluating the performance of data warehousing and analytics systems. It was developed by the Transaction Processing Performance Council (TPC) to compare how different systems handle complex queries and large-scale data analysis.

The design goal of TPC-DS is to simulate complex real-world decision support workloads. It tests system performance through a series of complex queries and data operations, including joins, aggregations, sorting, filtering, and subqueries. These query patterns range from simple to complex and cover scenarios such as report generation, data mining, and OLAP (Online Analytical Processing).

This document reports the performance of Doris on the TPC-DS 1000G test set.

We tested all 99 queries on the TPC-DS standard test dataset using Apache Doris 2.0.6.

1. Hardware Environment

| Hardware | Configuration Instructions |
| --- | --- |
| Number of Machines | 4 Tencent Cloud Virtual Machines (1 FE, 3 BEs) |
| CPU | AMD EPYC™ Milan (2.55GHz/3.5GHz) 48C |
| Memory | 192G |
| Network | 21Gbps |
| Disk | ESSD Cloud Hard Disk |

2. Software Environment

  • Doris deployment: 3 BEs and 1 FE
  • Kernel Version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 2.0.6.
  • JDK: openjdk version "1.8.0_131"

3. Test Data Volume

The TPC-DS 1000G data generated for this test was imported into Apache Doris 2.0.6. The tables and their row counts are listed below.

| TPC-DS Table Name | Rows |
| --- | --- |
| customer_demographics | 1,920,800 |
| reason | 65 |
| warehouse | 20 |
| date_dim | 73,049 |
| catalog_sales | 1,439,980,416 |
| call_center | 42 |
| inventory | 783,000,000 |
| catalog_returns | 143,996,756 |
| household_demographics | 7,200 |
| customer_address | 6,000,000 |
| income_band | 20 |
| catalog_page | 30,000 |
| item | 300,000 |
| web_returns | 71,997,522 |
| web_site | 54 |
| promotion | 1,500 |
| web_sales | 720,000,376 |
| store | 1,002 |
| web_page | 3,000 |
| time_dim | 86,400 |
| store_returns | 287,999,764 |
| store_sales | 2,879,987,999 |
| ship_mode | 20 |
| customer | 12,000,000 |
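
After loading, the row counts above can be compared with what the cluster reports. The following is a minimal sketch, not part of tpcds-tools: `gen_count_sql` is a hypothetical helper that only prints one `COUNT(*)` statement per table name. Piping its output to a `mysql` client is an assumption that relies on Doris accepting MySQL-protocol connections on the FE query port.

```shell
# Hypothetical helper (not part of tpcds-tools):
# print one COUNT(*) statement for each table name passed as an argument.
gen_count_sql() {
    for tbl in "$@"; do
        printf "SELECT '%s' AS table_name, COUNT(*) AS row_count FROM %s;\n" "$tbl" "$tbl"
    done
}

# Print the statements for three of the small dimension tables.
gen_count_sql reason warehouse income_band

# Against a cluster (assumes a mysql client on PATH and the variables
# from doris-cluster.conf already loaded into the shell):
#   gen_count_sql store_sales customer | mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" "$DB"
```

Each reported `row_count` should match the corresponding row of the table above.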

4. Test SQL

The 99 TPC-DS test query statements: TPC-DS-Query-SQL

5. Test Results

The test was run on Apache Doris 2.0.6, with query time (ms) as the main performance indicator. The results are as follows:

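| Query | Apache Doris 2.0.6 (ms) |
| --- | --- |
| query1 | 914 |
| query2 | 4669 |
| query3 | 285 |
| query4 | 35148 |
| query5 | 22979 |
| query6 | 1351 |
| query7 | 517 |
| query8 | 591 |
| query9 | 5430 |
| query10 | 3341 |
| query11 | 23300 |
| query12 | 105 |
| query13 | 1719 |
| query14 | 33254 |
| query15 | 1414 |
| query16 | 402 |
| query17 | 2371 |
| query18 | 760 |
| query19 | 308 |
| query20 | 117 |
| query21 | 94 |
| query22 | 2481 |
| query23 | 77381 |
| query24 | 23910 |
| query25 | 1021 |
| query26 | 213 |
| query27 | 544 |
| query28 | 4593 |
| query29 | 1024 |
| query30 | 682 |
| query31 | 2252 |
| query32 | 68 |
| query33 | 539 |
| query34 | 638 |
| query35 | 10505 |
| query36 | 441 |
| query37 | 86 |
| query38 | 8379 |
| query39 | 898 |
| query40 | 190 |
| query41 | 30 |
| query42 | 113 |
| query43 | 1332 |
| query44 | 1520 |
| query45 | 1306 |
| query46 | 2167 |
| query47 | 3859 |
| query48 | 1419 |
| query49 | 725 |
| query50 | 1299 |
| query51 | 4954 |
| query52 | 123 |
| query53 | 391 |
| query54 | 8212 |
| query55 | 124 |
| query56 | 434 |
| query57 | 2494 |
| query58 | 666 |
| query59 | 7432 |
| query60 | 481 |
| query61 | 536 |
| query62 | 1082 |
| query63 | 303 |
| query64 | 4968 |
| query65 | 5971 |
| query66 | 603 |
| query67 | 34052 |
| query68 | 1428 |
| query69 | 808 |
| query70 | 4462 |
| query71 | 1006 |
| query72 | 4717 |
| query73 | 558 |
| query74 | 14127 |
| query75 | 6312 |
| query76 | 1870 |
| query77 | 496 |
| query78 | 23091 |
| query79 | 4090 |
| query80 | 1559 |
| query81 | 960 |
| query82 | 221 |
| query83 | 415 |
| query84 | 131 |
| query85 | 444 |
| query86 | 931 |
| query87 | 8554 |
| query88 | 5202 |
| query89 | 480 |
| query90 | 322 |
| query91 | 159 |
| query92 | 59 |
| query93 | 1618 |
| query94 | 297 |
| query95 | 27354 |
| query96 | 847 |
| query97 | 11528 |
| query98 | 287 |
| query99 | 2147 |
| Total | 487990 |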
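
The Total row is the sum of the per-query times. When reproducing the test, a similar summary can be derived from your own measurements. The sketch below assumes the results are available as plain text with one `<query> <ms>` pair per line, which is an illustrative format and not the output format of the tpcds-tools scripts. It uses only the first five rows of the table above as sample data, and `summarize` is a hypothetical helper name.

```shell
# Sample input: the first five rows of the results table, "<query> <ms>" per line.
results='query1 914
query2 4669
query3 285
query4 35148
query5 22979'

# Hypothetical helper: sum the second column and track the slowest query.
summarize() {
    awk '{ total += $2; if ($2 > max) { max = $2; slowest = $1 } }
         END { printf "total_ms=%d slowest=%s (%d ms)\n", total, slowest, max }'
}

printf '%s\n' "$results" | summarize
# prints: total_ms=63995 slowest=query4 (35148 ms)
```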

6. Environmental Preparation

Refer to the official documentation to install and deploy Doris and obtain a working Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs is recommended).

7. Data Preparation

7.1 Download and Install TPC-DS Data Generation Tool

Execute the following script to download and compile the TPC-DS data generation tool (tpcds-tools).

sh bin/build-tpcds-dbgen.sh

7.2 Generating the TPC-DS Test Set

Execute the following script to generate the TPC-DS dataset:

sh bin/gen-tpcds-data.sh -s 1000

Note 1: Check the script help via sh bin/gen-tpcds-data.sh -h.

Note 2: The data is generated under the tpcds-data/ directory with the .dat suffix. The total file size is about 1000GB, and generation may take from a few minutes to an hour.

Note 3: A standard 100G test data set is generated by default.
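
Before importing, it can be useful to confirm that generation actually produced the .dat files and roughly the expected volume. The following is a minimal sketch, not part of tpcds-tools; `summarize_dat` is a hypothetical helper that counts the .dat files under a directory and reports the directory size in KiB.

```shell
# Hypothetical helper (not part of tpcds-tools):
# print the number of .dat files under a directory and the directory size in KiB.
summarize_dat() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "directory not found: $dir" >&2
        return 1
    fi
    # Count regular files ending in .dat, at any depth.
    count=$(find "$dir" -type f -name '*.dat' | wc -l | tr -d ' ')
    # du -sk reports the total size of the directory in 1024-byte units.
    size_kb=$(du -sk "$dir" | awk '{ print $1 }')
    echo "files=$count size_kb=$size_kb"
}

# Usage (run from the directory that contains tpcds-data/):
#   summarize_dat tpcds-data
```

For a scale factor of 1000, the reported size should be on the order of the 1000GB mentioned in Note 2.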

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File

Before running the import script, write the FE's IP, ports, and other connection information into the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/tpcds-tools/conf/.

The file contains the FE's IP, HTTP port, query port, user name, password, and the name of the DB into which the data will be imported:

# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where the TPC-DS tables are located
export DB='tpcds'
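
A quick sanity check of this file can catch a missing variable before the longer-running scripts start. The following is a minimal sketch, not part of tpcds-tools; `check_conf` is a hypothetical helper. It treats an empty value as valid, because PASSWORD is empty in the example above, and only reports variables that are not defined at all.

```shell
# Hypothetical helper (not part of tpcds-tools):
# report every doris-cluster.conf variable that is unset.
# An empty value (such as PASSWORD='') is accepted.
check_conf() {
    missing=0
    for name in FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER PASSWORD DB; do
        # ${VAR+x} expands to "x" when VAR is set (even if empty), else to nothing.
        eval "is_set=\${$name+x}"
        if [ -z "$is_set" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: load the file into the current shell, then check it.
#   . "${DORIS_HOME}/tools/tpcds-tools/conf/doris-cluster.conf" && check_conf
```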

7.3.2 Execute the Following Script to Generate and Create the TPC-DS Tables

sh bin/create-tpcds-tables.sh -s 1000

Alternatively, copy the table creation statements from create-tpcds-tables.sql and execute them in Doris.

7.4 Import Data

Import the data with the following command:

sh bin/load-tpcds-data.sh

7.5 Query Test

7.5.1 Executing Query Scripts

Execute the test SQL above, or run the following command:

sh bin/run-tpcds-queries.sh -s 1000

7.5.2 Single SQL Execution

You can also retrieve the latest SQL from the code repository, which hosts the latest TPC-DS test query statements, and execute individual queries yourself.
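
When running a single query by hand, its elapsed time can be measured with a small wrapper. The following is a minimal sketch, not part of tpcds-tools: `time_ms` is a hypothetical helper, it assumes GNU `date` (for the `%N` nanosecond field, as shipped with Ubuntu), and the file name `query1.sql` in the usage comment is only a placeholder. It measures client-side wall-clock time, including connection overhead, so the figures are not directly comparable to the results table.

```shell
# Hypothetical helper (not part of tpcds-tools):
# run a command and print its wall-clock duration in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
time_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Demonstration with a trivial command.
time_ms true

# Against a cluster (assumes a mysql client on PATH, the variables from
# doris-cluster.conf loaded into the shell, and an empty password;
# query1.sql is a placeholder file name):
#   time_ms mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" "$DB" < query1.sql
```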