JDBC Catalog Developer Guide

1. Overview

Apache Doris's JdbcCatalog provides access to external databases through the JDBC protocol.

This document provides detailed instructions on how to add support for a new data source type to JdbcCatalog, using the addition of NewDB data source as an example.

This document is for version 3.0+.

2. Architecture Overview

2.1 Overall Architecture

JdbcCatalog adopts a Frontend (FE) and Backend (BE) separated architecture:

Frontend (FE): Responsible for metadata management, SQL parsing, query planning and optimization.
Backend (BE): Responsible for data scanning, type conversion and execution.

2.2 Core Components

Frontend Core Components

JdbcResource: Defines JDBC connection resources and parameters.
JdbcExternalCatalog: Manages the entire Catalog, creates and manages JdbcClient.
JdbcExternalDatabase: Manages database-level metadata.
JdbcExternalTable: Manages table-level metadata and Schema.
JdbcClient: Abstract base class that handles metadata operations (getting database list, table list, column information, etc.).
JdbcScanNode: Scan node in query plan, responsible for generating query SQL.
JdbcTableSink: Sink node in write plan.

Backend Core Components

vjdbc_connector.cpp: C++ connector that calls Java executor through JNI. (Developers do not need to modify this file)
BaseJdbcExecutor: Java executor abstract base class that handles data read/write operations.
JdbcExecutorFactory: Factory class that creates corresponding executors based on data source type.

3. Development Steps for Adding New Data Source

Step 1: Update Thrift Definition

Modify Thrift file

In the gensrc/thrift//Types.thrift file, add new type to the TOdbcTableType enum:
Generate code

Execute the script in the Doris root directory to make the new enum values effective in Java and C++ code:

Step 2: Define Core Metadata in Frontend

Modify JdbcResource.java

Add constants for NewDB's URL prefix and type name, and add recognition logic in the parseDbType method.
Modify JdbcTable.java

Map NewDB type string to Thrift enum and define its SQL identifier reference style.

Step 3: Implement Frontend Metadata Client (JdbcClient)

Core Logic for Metadata Interaction

This step is the core of metadata interaction with external data sources. You need to create a JdbcNewDBClient.java file in the doris/fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/ directory and override the following key methods:

// doris/fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/JdbcNewDBClient.java

public class JdbcNewDBClient extends JdbcClient {

    public JdbcNewDBClient(JdbcClientConfig jdbcClientConfig) {
        super(jdbcClientConfig);
    }

    /**
     * [Must Override] Get database (or Schema) list.
     * 
     * @return List of database names.
     * @purpose This is the underlying implementation of the `SHOW DATABASES` command.
     * @implementation
     * 1. Get a JDBC Connection from the connection pool.
     * 2. Use `connection.getMetaData().getCatalogs()` or `getSchemas()` to get the list.
     * 3. Iterate through ResultSet and add database names to List<String>.
     * 4. Call `filterDatabaseNames()` to filter out system databases and non-compliant databases.
     */
    @Override
    public List<String> getDatabaseNameList() {
        // Implement logic to get all NewDB database names
    }

    /**
     * This is the underlying implementation of the `SHOW TABLES` command.
     * 
     * @param remoteDbName Database name.
     * @return List of table names.
     */
    @Override
    public List<String> getTablesNameList(String remoteDbName) {
        // Usually no need to override this method directly, but override processTable() that it calls internally
        return super.getTablesNameList(remoteDbName);
    }

    /**
     * [Must Override] Get table metadata.
     * 
     * @purpose Called by getTablesNameList() and isTableExist() methods,
     *          performs metadata lookup through `DatabaseMetaData.getTables()`.
     * @implementation
     * 1. Get JDBC Connection and DatabaseMetaData.
     * 2. Call `databaseMetaData.getTables(catalog, schemaPattern, tableNamePattern, types)`.
     * 3. The key is to pass correct values for `catalog` and `schemaPattern` parameters, which depends on NewDB's JDBC driver implementation.
     *    - If NewDB uses Catalog, pass `remoteDbName` for catalog and `null` for schemaPattern.
     *    - If NewDB uses Schema, pass `null` for catalog and `remoteDbName` for schemaPattern.
     */
    @Override
    protected void processTable(String remoteDbName, String remoteTableName, String[] tableTypes,
            Consumer<ResultSet> resultSetConsumer) {
        // Implement logic to call DatabaseMetaData.getTables()
    }

    /**
     * [Must Override] **Core Method** - Define mapping from NewDB types to Doris types.
     * 
     * @param fieldSchema Contains column information obtained from JDBC Driver, such as type name (getDataTypeName),
     *                    precision (getColumnSize), scale (getDecimalDigits), etc.
     * @return Corresponding Doris `Type`.
     */
    @Override
    public Type jdbcTypeToDoris(JdbcFieldSchema fieldSchema) {
        // Return Doris Type based on information in fieldSchema
        // For example:
        // String newdbType = fieldSchema.getDataTypeName().toUpperCase();
        // switch (newdbType) {
        //     case "VARCHAR": return ScalarType.createStringType();
        //     case "INTEGER": return Type.INT;
        //     case "DECIMAL":
        //         return ScalarType.createDecimalV3Type(fieldSchema.getColumnSize(), fieldSchema.getDecimalDigits());
        //     // ... other type mappings
        //     default: return Type.UNSUPPORTED;
        // }
    }
    
    /**
     * [Recommended Override] Define system databases to filter out from `getDatabaseNameList` results.
     */
    @Override
    protected Set<String> getFilterInternalDatabases() {
        // return ImmutableSet.of("information_schema", "sys", "performance_schema", "newdb_system_db");
    }

    /**
     * [Recommended Override] Define the query statement used for connection testing.
     * Default is "SELECT 1", but some databases (like Oracle) need "SELECT 1 FROM DUAL".
     */
    @Override
    public String getTestQuery() {
        // return "SELECT 1"; // or simple query supported by NewDB
    }

    /**
     * [Optional Override] Define what value the catalog parameter should be when calling `DatabaseMetaData` related methods.
     * Default returns `connection.getCatalog()`. Should return `null` if NewDB doesn't support Catalog.
     */
    @Override
    protected String getCatalogName(Connection conn) throws SQLException {
        // return conn.getCatalog(); // or return null;
    }

    /**
     * [Optional Override] Get column definitions.
     * Default implementation is usually sufficient, but for some databases (like MySQL), you might need to execute a SHOW or SELECT query
     * to get more precise type information (e.g., distinguish between `TINYINT(1)` and `TINYINT`).
     */
    @Override
    public List<JdbcFieldSchema> getJdbcColumnsInfo(String remoteDbName, String remoteTableName) {
        return super.getJdbcColumnsInfo(remoteDbName, remoteTableName);
    }
}

Register `JdbcNewDBClient` in Factory Class

Modify doris/fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/JdbcClient.java:

Step 4: Adapt Frontend Query Planning (JdbcScanNode)

In the getJdbcQueryStr() method, add specific LIMIT clause generation logic for NewDB.

// doris/fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/source/JdbcScanNode.java

private String getJdbcQueryStr() {
    // ...
    // MSSQL use select top to do top n
    if (shouldPushDownLimit() && jdbcType == TOdbcTableType.SQLSERVER) {
        sql.append("TOP " + limit + " ");
    }
    // ...
    sql.append(Joiner.on(", ").join(columns));
    sql.append(" FROM ").append(tableName);
    // ...
    // Other DataBase use limit do top n
    if (shouldPushDownLimit()
            && (jdbcType == TOdbcTableType.MYSQL
            || jdbcType == TOdbcTableType.POSTGRESQL
            /* Other databases using standard LIMIT */
            // Add here if NewDB also uses standard LIMIT
            )) {
        sql.append(" LIMIT ").append(limit);
    }
    if (shouldPushDownLimit() && jdbcType == TOdbcTableType.NEWDB) {
        // Example: If NewDB has special LIMIT syntax, e.g., 'FETCH FIRST n ROWS ONLY'
        // Handle independently here
        sql.append(" FETCH FIRST ").append(limit).append(" ROWS ONLY");
    }
    return sql.toString();
}

Step 5: Implement Backend Data Executor (JdbcExecutor)

Core Logic for Data Reading

This step is the core of actually executing JDBC data read/write through JNI on the BE side. You need to create a NewDBJdbcExecutor.java file in the doris/fe/be-java-extensions/jdbc-scanner/src/main/java/org/apache/doris/jdbc/ directory and override the following key methods:

// doris/fe/be-java-extensions/jdbc-scanner/src/main/java/org/apache/doris/jdbc/NewDBJdbcExecutor.java

public class NewDBJdbcExecutor extends BaseJdbcExecutor {

    public NewDBJdbcExecutor(byte[] thriftParams) throws Exception {
        super(thriftParams);
    }

    /**
     * [Override as needed] Initialize Java-side cache for storing a batch of data.
     *
     * @purpose This is a key performance optimization. It pre-allocates memory for the entire batch of data before starting to iterate through the ResultSet.
     *          This avoids repeated object creation in loops, significantly improving data reading performance.
     * @implementation
     * 1. Iterate through each column.
     * 2. For most standard types, best practice is to call `outputTable.getColumn(i).newObjectContainerArray(batchSizeNum)`.
     *    This creates a type-safe array container based on Doris target type (like Integer[], BigDecimal[], Long[], etc.).
     * 3. For types needing special handling (e.g., a binary object returned by JDBC Driver but we only want to treat as String),
     *    allocate a more generic container like `Object[]`.
     */
    @Override
    protected void initializeBlock(int columnCount, String[] replaceStringList, int batchSizeNum,
            VectorTable outputTable) {
        for (int i = 0; i < columnCount; ++i) {
            if (outputTable.getColumnType(i).getType() == Type.STRING) {
                block.add(new Object[batchSizeNum]);
            } else {
                block.add(outputTable.getColumn(i).newObjectContainerArray(batchSizeNum));
            }
        }
    }

    /**
     * [Must Override] **Core Method** - Get single column data from JDBC ResultSet.
     * 
     * @param columnIndex Column index (starting from 0).
     * @param type Expected column type on Doris side.
     * @return Java object retrieved and converted from ResultSet.
     * @implementation
     * 1. Use `resultSet.getObject(columnIndex + 1, TargetClass.class)` to get value, this is the safest way.
     * 2. `TargetClass.class` should match Doris `type`, e.g., `Type.INT` corresponds to `Integer.class`.
     * 3. Must handle `NULL` values correctly through `resultSet.wasNull()`.
     */
    @Override
    protected Object getColumnValue(int columnIndex, ColumnType type, String[] replaceStringList) throws SQLException {
        switch (type.getType()) {
            case INT:
                return resultSet.getObject(columnIndex + 1, Integer.class);
            case DATETIME:
                return resultSet.getObject(columnIndex + 1, LocalDateTime.class);
            // ... other types
            default:
                 return resultSet.getObject(columnIndex + 1);
        }
    }
    
    /**
     * [Must Override] Provide output converter for specific types.
     * 
     * @purpose Perform final processing on values obtained by `getColumnValue` before passing to C++ layer.
     *          Commonly used for scenarios requiring specific formatting.
     * @implementation
     *   A common example is `java.sql.Time` type. To preserve its microsecond precision, `getColumnValue` 
     *   gets the `Time` object, while the converter formats it to the `HH:mm:ss.SSSSSS` string required by Doris.
     *   Another example is converting `byte[]` type to hexadecimal display string.
     */
    @Override
    protected ColumnValueConverter getOutputConverter(ColumnType columnType, String replaceString) {
        if (columnType.getType() == ColumnType.Type.STRING) {
            // Example: Time type handling in MySQL Executor
            return createConverter(input -> {
                if (input instanceof java.sql.Time) {
                    return timeToString((java.sql.Time) input); // timeToString is a method in BaseJdbcExecutor
                }
                return input.toString();
            }, String.class);
        }
        return null; // Other types usually don't need conversion
    }

    /**
     * [Optional Override] Initialize PreparedStatement.
     * 
     * @purpose If optimization of certain write/read sizes and methods is needed
     * @implementation
     * 1. Must handle read operations (READ) and write operations (WRITE) separately.
     * 2. For read operations, control how many rows the JDBC Driver fetches from database at once through `stmt.setFetchSize()`.
     *    This can avoid OOM. Different Drivers behave differently with this parameter (e.g., MySQL setting to `Integer.MIN_VALUE` enables streaming read).
     */
    @Override
    protected void initializeStatement(Connection conn, JdbcDataSourceConfig config, String sql) throws SQLException {
        // if (config.getOp() == TJdbcOperation.READ) {
        //     stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        //     stmt.setFetchSize(1024); // Set appropriate fetch size for NewDB
        // } else {
        //     preparedStatement = conn.prepareStatement(sql);
        // }
    }
    
    /**
     * [Optional Override] Interrupt JDBC connection when query is cancelled.
     * 
     * @purpose If certain data source Driver has special interruption method, need to override here
     * @implementation Call `connection.abort()` or other interruption methods provided by Driver.
     */
    @Override
    protected void abortReadConnection(Connection connection, ResultSet resultSet) throws SQLException {
        // connection.abort(MoreExecutors.directExecutor());
    }

    /**
     * [Override as needed] Set validation query for Hikari connection pool, default is SELECT 1, override if data source has special syntax
     * 
     * @purpose Ensure connections obtained from pool are valid.
     */
    @Override
    protected void setValidationQuery(HikariDataSource ds) {
        // ds.setConnectionTestQuery("SELECT 1"); // NewDB validation query
    }

    /**
     * [Optional Override] Set JDBC Driver specific system properties.
     * 
     * @purpose Some Drivers need to enable or disable certain features through `System.setProperty()`.
     */
    @Override
    protected void setJdbcDriverSystemProperties() {
        // System.setProperty("newdb.driver.property", "true");
    }
}

Register `NewDBJdbcExecutor` in Factory Class

Modify doris/fe/be-java-extensions/jdbc-scanner/src/main/java/org/apache/doris/jdbc/JdbcExecutorFactory.java:

Step 6: Add Regression Tests

Refer to doris/regression-test/suites/external_table_p0/jdbc/type_test/select/test_mysql_all_types_select.groovy to create a simple test case.

Create test_newdb_select.groovy

suite("test_newdb_select", "p0,external,newdb") {
    String enabled = context.config.otherConfigs.get("enableJdbcTest")
    if (enabled != null && enabled.equalsIgnoreCase("true")) {
        // 1. Define NewDB connection information
        def newdb_port = context.config.otherConfigs.get("newdb_port")
        def driver_url = "http://your_repo/newdb-driver.jar"

        // 2. Create Catalog
        sql """create catalog newdb_catalog properties(
            "type"="jdbc",
            "user"="root",
            "password"="123456",
            "jdbc_url" = "jdbc:newdb://\${context.config.otherConfigs.get("externalEnvIp")}:\${newdb_port}/your_db",
            "driver_url" = "\${driver_url}",
            "driver_class" = "com.newdb.jdbc.Driver"
        );"""

        // 3. Execute tests
        sql """use newdb_catalog.your_db"""
        qt_select """select * from your_table order by 1 limit 10;"""

        // 4. Clean up environment
        sql """drop catalog newdb_catalog"""
    }
}

4. Development Considerations

4.1 Data Type Mapping

In the jdbcTypeToDoris method, careful handling is needed for:

Precision mapping: DECIMAL types need correct precision and scale mapping.
Time types: Pay attention to timezone and precision handling.
Special types: Such as some binary special types

4.2 Data Source Specific Data Reading

In subclasses of BaseJdbcExecutor, focus on:

getColumnValue: How to get data
getOutputConverter: Data conversion function

4.3 Error Handling

SQL exception conversion: Convert data source specific exceptions to JdbcClientException.
Driver compatibility: Handle compatibility issues with different driver versions.

4.4 Predicate and Function Pushdown (Optional Optimization)

To improve query performance, Doris will push down WHERE conditions and some functions to external data sources for execution whenever possible.

Predicate pushdown: In most scenarios, Doris handles this automatically. But for special syntax (like special date functions), adaptation might be needed in the conjunctExprToString() method of JdbcScanNode.java.
Function pushdown: You can define the list of functions that NewDB supports for pushdown and function name replacement rules in JdbcFunctionPushDownRule.java for better performance.

5. Deployment Configuration

5.1 Driver Deployment

# Put NewDB JDBC driver in the specified directory
cp newdb-jdbc-driver.jar $DORIS_HOME/plugins/jdbc_drivers/

5.2 Create Catalog

CREATE CATALOG newdb_catalog PROPERTIES (
    "type" = "jdbc",
    "user" = "newdb_user",
    "password" = "newdb_password",
    "jdbc_url" = "jdbc:newdb://host:port/database",
    "driver_url" = "newdb-jdbc-driver.jar",
    "driver_class" = "com.newdb.jdbc.Driver"
);

-- Use Catalog
USE newdb_catalog.database_name;
SELECT * FROM table_name LIMIT 10;

1. Overview​

2. Architecture Overview​

2.1 Overall Architecture​

2.2 Core Components​

Frontend Core Components​

Backend Core Components​

3. Development Steps for Adding New Data Source​

Step 1: Update Thrift Definition​

Step 2: Define Core Metadata in Frontend​

Step 3: Implement Frontend Metadata Client (JdbcClient)​

Core Logic for Metadata Interaction​

Register JdbcNewDBClient in Factory Class​

Step 4: Adapt Frontend Query Planning (JdbcScanNode)​

Step 5: Implement Backend Data Executor (JdbcExecutor)​

Core Logic for Data Reading​

Register NewDBJdbcExecutor in Factory Class​

Step 6: Add Regression Tests​

4. Development Considerations​

4.1 Data Type Mapping​

4.2 Data Source Specific Data Reading​

4.3 Error Handling​

4.4 Predicate and Function Pushdown (Optional Optimization)​

5. Deployment Configuration​

5.1 Driver Deployment​

5.2 Create Catalog​