Doris Catalog

Doris Catalog lets users access data in other Doris clusters over the HTTP and Arrow Flight protocols.

This document describes how to configure a connection to a remote Doris cluster and how to query it.

Note

This is an experimental feature, supported since version 4.0.2.

Use Cases

| Scenario | Description |
| --- | --- |
| Federated Query | Run join queries across multiple independent Doris clusters, using predicate pushdown and the Arrow Flight protocol. |

Configuring Catalog

Syntax

CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type' = 'doris', -- required
'fe_http_hosts' = 'http://<fe-host1>:<fe-http-port>,<fe-host2>:<fe-http-port>', -- required
'fe_arrow_hosts' = '<fe-host1>:<fe-arrow-flight-port>,<fe-host2>:<fe-arrow-flight-port>', -- required
'user' = '', -- required
'password' = '', -- required
{QueryProperties},
{HttpClientProperties},
{CommonProperties}
);
  • fe_http_hosts

    List of remote Doris cluster FE HTTP service endpoints.

  • fe_arrow_hosts

    List of remote Doris cluster FE Arrow Flight service endpoints.

  • {QueryProperties}

    Optional properties

    | Parameter Name | Description | Default Value |
    | --- | --- | --- |
    | enable_parallel_result_sink | When enabled, local Doris BE nodes pull data from remote Doris cluster BE nodes in parallel. | true |
    | query_retry_count | Maximum number of retries for failed query requests to remote Doris. Does not cover failures that occur during asynchronous execution after the request is accepted. | 3 |
    | query_timeout_sec | Timeout, in seconds, for sending queries to remote Doris. Does not include asynchronous execution time after the request is accepted. | 15 |
    | compatible | Attempts metadata format compatibility when the remote Doris cluster runs a lower version than the local cluster. Not needed when both clusters run the same version. | false |
  • {HttpClientProperties}

    The HttpClientProperties section configures the HTTP client that sends requests to synchronize metadata from the remote cluster. All of these parameters are optional.

    | Parameter Name | Description | Default Value |
    | --- | --- | --- |
    | metadata_http_ssl_enabled | Whether to enable SSL/TLS encrypted communication for HTTP metadata synchronization. | false |
    | metadata_sync_retry_count | Maximum number of retries for failed HTTP requests. | 3 |
    | metadata_max_idle_connections | Maximum number of idle connections kept by the HTTP metadata synchronization client. | 5 |
    | metadata_keep_alive_duration_sec | Keep-alive duration, in seconds, for idle connections of the HTTP metadata synchronization client. | 300 |
    | metadata_connect_timeout_sec | TCP connection timeout, in seconds, for the HTTP metadata synchronization client. | 10 |
    | metadata_read_timeout_sec | Socket read timeout, in seconds, for the HTTP metadata synchronization client. | 10 |
    | metadata_write_timeout_sec | Socket write timeout, in seconds, for the HTTP metadata synchronization client. | 10 |
    | metadata_call_timeout_sec | Total request timeout, in seconds, for the HTTP metadata synchronization client. | 10 |
  • {CommonProperties}

    The CommonProperties section holds common catalog properties. See the Common Properties section of the Data Catalog Overview.
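
As a concrete illustration of the syntax above, the following sketch registers a remote cluster under the name doris_ctl, the catalog name used in the query examples in this document. The host name, ports, user, and password are placeholders rather than values from this document: 8030 is the usual FE HTTP port, and the Arrow Flight port (8070 here) is purely an assumption that must match whatever the remote FE is actually configured with. The two optional overrides are arbitrary values, included only to show where QueryProperties and HttpClientProperties go.

```sql
-- Hypothetical endpoints and credentials; replace them with your own.
CREATE CATALOG IF NOT EXISTS doris_ctl PROPERTIES (
    'type' = 'doris',
    'fe_http_hosts' = 'http://fe1.example.com:8030',   -- remote FE HTTP endpoint (assumed host and port)
    'fe_arrow_hosts' = 'fe1.example.com:8070',         -- remote FE Arrow Flight endpoint (assumed port)
    'user' = 'remote_reader',                          -- assumed account on the remote cluster
    'password' = '<password>',
    'query_timeout_sec' = '30',                        -- QueryProperties override (default 15)
    'metadata_connect_timeout_sec' = '5'               -- HttpClientProperties override (default 10)
);

-- Quick checks that the catalog is registered and its metadata is reachable.
SHOW CATALOGS;
SHOW DATABASES FROM doris_ctl;
```

If the SHOW DATABASES statement fails or hangs, the HTTP endpoint and the metadata timeout parameters are probably the first things to check, since metadata is synchronized over HTTP.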

Column Type Mapping

Column types of tables in a Doris Catalog are exactly the same as local Doris types, so no type conversion is involved.

Query Operations

Basic Queries

After configuring the Catalog, you can query table data in the Catalog in the following ways:

-- 1. Switch to the catalog, select a database, then query
SWITCH doris_ctl;
USE doris_db;
SELECT * FROM doris_tbl LIMIT 10;

-- 2. Select the catalog and database in one step
USE doris_ctl.doris_db;
SELECT * FROM doris_tbl LIMIT 10;

-- 3. Query with the fully qualified table name
SELECT * FROM doris_ctl.doris_db.doris_tbl LIMIT 10;
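
The federated query use case can be sketched in the same way, by joining a table in the local cluster with a table in the remote catalog. The local database, the table orders, and all column names below are hypothetical and exist only for illustration. internal is the name of the built-in catalog of the local cluster.

```sql
-- Hypothetical schema: local_db.orders, order_id, amount, region_id, and region_name are assumptions.
SELECT o.order_id,
       o.amount,
       r.region_name
FROM internal.local_db.orders AS o          -- table stored in the local cluster
JOIN doris_ctl.doris_db.doris_tbl AS r      -- table read from the remote cluster
  ON o.region_id = r.region_id
WHERE r.region_name = 'EMEA';               -- a filter on the remote table is a pushdown candidate
```

Fully qualified names keep the statement independent of the current catalog and database, which tends to matter once more than one catalog appears in a single query.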

Query Optimization

When querying a Doris Catalog, Doris tries to push down predicates and function conditions to the remote cluster by including them in the SQL it generates. Run EXPLAIN on the query to view the generated SQL, which appears in the QUERY field of the remote scan node:

...
| 0:VREMOTE_DORIS_SCAN_NODE(68) |
| TABLE: test.test_time |
| QUERY: SELECT /*+ SET_VAR(enable_parallel_result_sink=true) */ `timestamp` FROM test.test_time WHERE (timestamp > '2025-11-03 00:00:00.000') |
| PREDICATES: (timestamp[#0] > '2025-11-03 00:00:00.000') |
...
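
A statement of roughly the following shape would be expected to produce a plan like the one above. The catalog name doris_ctl is an assumption, because the plan output shows only the database and table (test.test_time).

```sql
-- Assumes the remote cluster is registered as doris_ctl.
EXPLAIN
SELECT `timestamp`
FROM doris_ctl.test.test_time
WHERE `timestamp` > '2025-11-03 00:00:00';
```

If the WHERE condition appears inside the QUERY field, it was pushed down and is evaluated by the remote cluster. The SET_VAR hint in the generated SQL reflects the enable_parallel_result_sink catalog property.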