TPC-DS Benchmark

TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is a decision support benchmark that evaluates the performance of data warehousing and analytics systems. It was developed by the Transaction Processing Performance Council (TPC) to compare how well different systems handle complex queries and large-scale data analysis.

The design goal of TPC-DS is to simulate complex decision support workloads in the real world. It tests the performance of systems through a series of complex queries and data operations, including joins, aggregations, sorting, filtering, subqueries, and more. These query patterns cover various scenarios ranging from simple to complex, such as report generation, data mining, and OLAP (Online Analytical Processing).

This document presents the performance of Apache Doris on the TPC-DS SF1000 test set.

We ran the 99 queries of the TPC-DS standard test data set against several Apache Doris versions and compared the results.

1. Hardware Environment

Hardware | Configuration Instructions
Number of Machines | 4 Aliyun g9i Virtual Machines (1 FE, 3 BEs)
CPU | Intel® Xeon® Granite Rapids 32C
Memory | 128G
Disk | Enterprise SSD (PL0)

2. Software Environment

  • Doris: deployed with 3 BEs and 1 FE
  • Kernel Version: Linux version 5.15.0-101-generic
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • JDK: openjdk 17.0.2

3. Test Data Volume

The TPC-DS SF1000 data generated for the test was imported into Apache Doris. The following table lists each table and its row count.

TPC-DS Table Name | Rows
customer_demographics | 1,920,800
reason | 65
warehouse | 20
date_dim | 73,049
catalog_sales | 1,439,980,416
call_center | 42
inventory | 783,000,000
catalog_returns | 143,996,756
household_demographics | 7,200
customer_address | 6,000,000
income_band | 20
catalog_page | 30,000
item | 300,000
web_returns | 71,997,522
web_site | 54
promotion | 1,500
web_sales | 720,000,376
store | 1,002
web_page | 3,000
time_dim | 86,400
store_returns | 287,999,764
store_sales | 2,879,987,999
ship_mode | 20
customer | 12,000,000

4. Test SQL

TPC-DS 99 test query statements: TPC-DS-Query-SQL

5. Test Results

In the test, we use query time (ms) as the main performance indicator; lower is better. The test results are as follows:

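Query | Doris 2.1.11 (ms) | Doris 3.1.4 (ms) | Doris 4.0.5 (ms) | Doris 4.1.0 (ms)
Total | 185200 | 190159 | 190031 | 159562
query01 | 420 | 491 | 541 | 459
query02 | 2970 | 3058 | 2510 | 589
query03 | 260 | 311 | 397 | 150
query04 | 8000 | 7782 | 7245 | 6046
query05 | 310 | 475 | 786 | 454
query06 | 180 | 245 | 352 | 313
query07 | 310 | 383 | 347 | 390
query08 | 240 | 381 | 365 | 408
query09 | 4670 | 4947 | 4721 | 4158
query10 | 200 | 243 | 328 | 261
query11 | 4600 | 5159 | 4555 | 3815
query12 | 70 | 156 | 127 | 121
query13 | 410 | 435 | 471 | 481
query14_1 | 6230 | 6353 | 6337 | 5365
query14_2 | 5880 | 6276 | 5876 | 5048
query15 | 300 | 291 | 348 | 265
query16 | 390 | 349 | 275 | 245
query17 | 670 | 745 | 838 | 1139
query18 | 410 | 607 | 636 | 682
query19 | 150 | 210 | 295 | 247
query20 | 120 | 160 | 141 | 134
query21 | 50 | 100 | 111 | 87
query22 | 1160 | 936 | 948 | 802
query23_1 | 13670 | 14627 | 12838 | 10419
query23_2 | 13480 | 14103 | 12633 | 10303
query24_1 | 2360 | 2677 | 2776 | 2774
query24_2 | 2320 | 2634 | 2453 | 2616
query25 | 400 | 646 | 671 | 739
query26 | 150 | 212 | 183 | 184
query27 | 300 | 396 | 390 | 327
query28 | 4170 | 4664 | 4260 | 3598
query29 | 520 | 640 | 727 | 721
query30 | 190 | 242 | 236 | 240
query31 | 1150 | 1244 | 1070 | 1283
query32 | 40 | 77 | 114 | 92
query33 | 200 | 310 | 304 | 268
query34 | 370 | 478 | 478 | 286
query35 | 880 | 893 | 842 | 813
query36 | 340 | 357 | 337 | 333
query37 | 100 | 166 | 204 | 81
query38 | 5200 | 2511 | 6593 | 5704
query39_1 | 200 | 284 | 299 | 213
query39_2 | 160 | 220 | 209 | 157
query40 | 100 | 133 | 162 | 140
query41 | 50 | 86 | 118 | 89
query42 | 50 | 90 | 111 | 86
query43 | 690 | 708 | 596 | 326
query44 | 1330 | 1455 | 1344 | 1010
query45 | 300 | 205 | 204 | 196
query46 | 480 | 570 | 698 | 443
query47 | 2770 | 2709 | 2693 | 2123
query48 | 260 | 362 | 362 | 311
query49 | 360 | 511 | 599 | 490
query50 | 490 | 589 | 797 | 330
query51 | 6590 | 6901 | 3266 | 4243
query52 | 60 | 87 | 123 | 91
query53 | 200 | 272 | 270 | 276
query54 | 870 | 1083 | 1143 | 244
query55 | 50 | 78 | 96 | 84
query56 | 150 | 245 | 293 | 258
query57 | 1580 | 1553 | 1592 | 1180
query58 | 150 | 226 | 245 | 246
query59 | 3960 | 4047 | 3475 | 1648
query60 | 200 | 263 | 318 | 296
query61 | 200 | 294 | 329 | 299
query62 | 590 | 694 | 758 | 421
query63 | 180 | 226 | 287 | 232
query64 | 3220 | 2101 | 2687 | 2679
query65 | 3270 | 3472 | 3308 | 3101
query66 | 350 | 381 | 359 | 328
query67 | 27490 | 26838 | 26040 | 22313
query68 | 390 | 421 | 698 | 270
query69 | 180 | 272 | 742 | 700
query70 | 2350 | 2167 | 2117 | 2158
query71 | 510 | 847 | 811 | 754
query72 | 2160 | 2393 | 3269 | 2215
query73 | 290 | 331 | 391 | 122
query74 | 3990 | 4117 | 3918 | 3183
query75 | 3150 | 3450 | 3099 | 3115
query76 | 1110 | 1122 | 1224 | 969
query77 | 180 | 233 | 288 | 219
query78 | 10450 | 11343 | 10591 | 9480
query79 | 1580 | 1923 | 2008 | 1336
query80 | 330 | 411 | 579 | 463
query81 | 320 | 365 | 406 | 348
query82 | 210 | 259 | 427 | 154
query83 | 140 | 161 | 176 | 181
query84 | 90 | 120 | 187 | 145
query85 | 300 | 537 | 770 | 769
query86 | 660 | 652 | 698 | 726
query87 | 5280 | 3039 | 6885 | 6258
query88 | 3670 | 3786 | 4114 | 3209
query89 | 330 | 359 | 410 | 437
query90 | 130 | 149 | 188 | 128
query91 | 100 | 118 | 204 | 183
query92 | 30 | 54 | 70 | 86
query93 | 1090 | 1174 | 1247 | 973
query94 | 250 | 240 | 344 | 166
query95 | 260 | 330 | 374 | 207
query96 | 440 | 475 | 581 | 345
query97 | 3630 | 3785 | 2753 | 2738
query98 | 240 | 453 | 410 | 379
query99 | 1170 | 1420 | 1612 | 853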
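
To put the totals in perspective, the relative change between two versions can be computed directly from the Total row. The following is a minimal sketch that uses awk for the floating-point arithmetic; the variable names are illustrative only, and the two values are copied from the Total row above.

```shell
# Totals (ms) copied from the "Total" row of the results table.
total_2_1_11=185200
total_4_1_0=159562

# Percentage reduction in total query time, rounded to one decimal place.
reduction=$(awk -v old="$total_2_1_11" -v new="$total_4_1_0" \
  'BEGIN { printf "%.1f", (old - new) / old * 100 }')

echo "Doris 4.1.0 total time is ${reduction}% lower than Doris 2.1.11"
```

The same calculation works for any pair of columns or for a single query row by substituting the two values.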

6. Environmental Preparation

Refer to the official documentation to install and deploy Doris and obtain a working cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).

7. Data Preparation

7.1 Download and Install TPC-DS Data Generation Tool

Execute the following script to download and compile the TPC-DS data generation tool:

sh bin/build-tpcds-dbgen.sh

7.2 Generating the TPC-DS Test Set

Execute the following script to generate the TPC-DS dataset:

sh bin/gen-tpcds-data.sh -s 1000

Note 1: To view the script's help, run sh bin/gen-tpcds-data.sh -h.

Note 2: The data is generated under the tpcds-data/ directory with the suffix .dat. The total file size is about 1000GB, and generation may take from a few minutes to an hour.

Note 3: Without the -s option, the script generates a standard SF100 test data set by default.
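
Before moving on, it can help to confirm that generation actually produced files of the expected size. The following is a minimal sketch, assuming the tpcds-data/ directory and .dat suffix from Note 2; the DATA_DIR variable and the count_dat_files helper are hypothetical names introduced here, not part of tpcds-tools.

```shell
# Directory named in Note 2; override DATA_DIR if your layout differs.
DATA_DIR="${DATA_DIR:-tpcds-data}"

# Hypothetical helper: count .dat files under a directory.
# Prints 0 when the directory does not exist.
count_dat_files() {
  find "$1" -type f -name '*.dat' 2>/dev/null | wc -l | tr -d ' '
}

if [ -d "$DATA_DIR" ]; then
  echo "generated .dat files: $(count_dat_files "$DATA_DIR")"
  # Total size on disk; roughly 1000GB is expected for SF1000.
  du -sh "$DATA_DIR"
else
  echo "no $DATA_DIR directory found; run the generation script first"
fi
```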

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File

Before running the table creation and import scripts, write the FE's IP, ports, and other connection information in the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/tpcds-tools/conf/.

The file contains the FE's IP, HTTP port, query port, user name, password, and the name of the database to import the data into:

# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where the TPC-DS tables are located
export DB='tpcds'
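
Because the file consists of plain export statements, it can be sourced by a shell to check that nothing was left out before running the later scripts. The following is a minimal sketch; check_doris_conf is a hypothetical helper introduced here for illustration and is not part of tpcds-tools. PASSWORD is deliberately not checked because an empty password is valid in the example above.

```shell
# Hypothetical helper: report which required settings are missing.
check_doris_conf() {
  conf_file=$1
  [ -f "$conf_file" ] || { echo "missing file: $conf_file"; return 1; }
  # A bare file name would be looked up in PATH by ".", so force a path.
  case "$conf_file" in
    */*) ;;
    *) conf_file="./$conf_file" ;;
  esac
  # Source in a subshell so the exported values do not leak into the caller.
  (
    . "$conf_file"
    status=0
    for name in FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER DB; do
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
        echo "unset: $name"
        status=1
      fi
    done
    exit "$status"
  )
}

# Example call (path taken from the section above):
# check_doris_conf "${DORIS_HOME}/tools/tpcds-tools/conf/doris-cluster.conf"
```

Note that USER is usually already set by the login shell, so the check on it mainly guards against an explicitly empty value.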

7.3.2 Execute the Following Script to Generate and Create TPC-DS Table

sh bin/create-tpcds-tables.sh -s 1000

Alternatively, copy the table creation statements in create-tpcds-tables and execute them in Doris.

7.4 Import Data

Please perform data import with the following command:

sh bin/load-tpcds-data.sh
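
After the import finishes, one way to sanity-check it is to compare table row counts against the SF1000 values listed in section 3. The following is a minimal sketch; expected_rows and check_rows are hypothetical helpers introduced here, only a subset of tables is included, and the commented wiring assumes a mysql command-line client is installed and doris-cluster.conf has been sourced.

```shell
# Expected SF1000 row counts, copied from the table in section 3 (subset).
expected_rows() {
  case "$1" in
    store_sales)   echo 2879987999 ;;
    catalog_sales) echo 1439980416 ;;
    web_sales)     echo 720000376 ;;
    inventory)     echo 783000000 ;;
    customer)      echo 12000000 ;;
    *) return 1 ;;
  esac
}

# Compare an observed row count with the expected one.
# Returns 0 on match, 1 on mismatch, 2 for a table not in the list.
check_rows() {
  table=$1
  actual=$2
  expected=$(expected_rows "$table") || { echo "UNKNOWN $table"; return 2; }
  if [ "$actual" = "$expected" ]; then
    echo "OK $table $actual"
  else
    echo "MISMATCH $table expected=$expected actual=$actual"
    return 1
  fi
}

# Example wiring (assumption: mysql client available, conf file sourced):
# actual=$(mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" -D"$DB" -N \
#   -e "SELECT COUNT(*) FROM store_sales")
# check_rows store_sales "$actual"
```

Keeping the comparison separate from the query makes it easy to reuse the check with counts obtained by any other client.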

8. Query Test

8.1 Executing Query Scripts

Execute the test SQL from section 4 manually, or run the following command:

sh bin/run-tpcds-queries.sh -s 1000

8.2 Single SQL Execution

You can also retrieve the latest TPC-DS test query statements from the code repository and execute them individually.