Loading Overview

Apache Doris offers various methods for importing and integrating data, allowing you to import data from diverse sources into the database. These methods can be categorized into four types:

  • Real-Time Writing: Data is written into Doris tables in real time via HTTP or JDBC, suitable for scenarios requiring immediate analysis and querying.

    • For small, infrequent writes (for example, once every 5 minutes), you can use JDBC INSERT.

    • For higher concurrency or frequency (more than 20 concurrent writes or multiple writes per minute), you can enable Group Commit and use JDBC INSERT or Stream Load.

    • For high throughput, you can use Stream Load via HTTP.

  • Streaming Synchronization: Real-time data streams (e.g., Flink, Kafka, transactional databases) are imported into Doris tables, ideal for real-time analysis and querying.

    • You can use Flink Doris Connector to write Flink’s real-time data streams into Doris.

    • You can use Routine Load or Doris Kafka Connector for Kafka’s real-time data streams. Routine Load pulls data from Kafka to Doris and supports CSV and JSON formats, while Kafka Connector writes data to Doris, supporting Avro, JSON, CSV, and Protobuf formats.

    • You can use Flink CDC or DataX to write transactional database CDC data streams into Doris.

  • Batch Import: Data is batch-loaded from external storage systems (e.g., Object Storage, HDFS, local files, NAS) into Doris tables, suitable for non-real-time data import needs.

    • You can use Broker Load to write files from Object Storage and HDFS into Doris.

    • You can use INSERT INTO SELECT to synchronously load files from Object Storage, HDFS, and NAS into Doris, or run the same operation asynchronously using a JOB.

    • You can use Stream Load or Doris Streamloader to write local files into Doris.

  • External Data Source Integration: Query and partially import data from external sources (e.g., Hive, JDBC, Iceberg) into Doris tables.

    • You can create a Catalog to read data from external sources and use INSERT INTO SELECT to synchronize this data into Doris, with asynchronous writing via JOB.

    • You can use X2Doris to migrate data from other AP systems into Doris.
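
As a rough illustration of the Real-Time Writing guidance above, the following Python sketch encodes the stated thresholds (more than 20 concurrent writes, or multiple writes per minute) as a small decision helper. The function name, its parameters, and the order in which the rules are checked are assumptions made for this example; they are not part of Doris.

```python
def suggest_real_time_method(concurrent_writers, writes_per_minute, high_throughput=False):
    """Suggest a real-time write path from the rules of thumb in the list above.

    Hypothetical helper: the names and the rule precedence are illustrative
    assumptions, not a Doris API.
    """
    # High-throughput workloads are pointed at Stream Load over HTTP.
    if high_throughput:
        return "Stream Load via HTTP"
    # More than 20 concurrent writes, or multiple writes per minute:
    # enable Group Commit and write with JDBC INSERT or Stream Load.
    if concurrent_writers > 20 or writes_per_minute > 1:
        return "Group Commit with JDBC INSERT or Stream Load"
    # Small, infrequent writes (for example, once every 5 minutes).
    return "JDBC INSERT"


if __name__ == "__main__":
    # One writer, one write every 5 minutes (0.2 writes per minute).
    print(suggest_real_time_method(concurrent_writers=1, writes_per_minute=0.2))
```

The thresholds are taken directly from the bullets; treat them as starting points and validate against your own workload.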

Each import method in Doris is an implicit transaction by default. For more information on transactions, refer to Transactions.

Quick Overview of Import Methods

The Doris import process covers data sources, data formats, import methods, error handling, data transformation, and transactions. The table below summarizes the scenarios each import method suits and the file formats it supports.

Import Method | Use Case | Supported File Formats | Import Mode
Stream Load | Importing local files or pushing data from applications via HTTP. | csv, json, parquet, orc | Synchronous
Broker Load | Importing from object storage, HDFS, etc. | csv, json, parquet, orc | Asynchronous
INSERT INTO VALUES | Writing data via JDBC. | SQL | Synchronous
INSERT INTO SELECT | Importing from an external source, such as a table in a catalog or files in Object Storage or HDFS. | SQL | Synchronous, Asynchronous via Job
Routine Load | Real-time import from Kafka. | csv, json | Asynchronous
MySQL Load | Importing from local files. | csv | Synchronous
Group Commit | Writing with high frequency. | Depending on the import method used | -
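
To make the Stream Load row more concrete, here is a minimal Python sketch that assembles the URL and headers for a Stream Load HTTP request without sending it. The host, port, database, table, and credentials are placeholder assumptions, and the helper name `build_stream_load_request` is hypothetical. The sketch assumes the usual Stream Load endpoint shape (`/api/<db>/<table>/_stream_load`) and the `format` and `label` headers; check the Stream Load reference for your Doris version before relying on it.

```python
import base64
import uuid

# File formats listed for Stream Load in the table above.
SUPPORTED_FORMATS = {"csv", "json", "parquet", "orc"}


def build_stream_load_request(fe_host, http_port, database, table,
                              user, password, file_format="csv", label=None):
    """Return (url, headers) for a Stream Load PUT request.

    Hypothetical helper for illustration; it only builds the request
    and does not contact any server.
    """
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported Stream Load format: {file_format}")

    url = f"http://{fe_host}:{http_port}/api/{database}/{table}/_stream_load"

    # HTTP Basic authentication with the Doris user name and password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "format": file_format,
        # A label identifies the load; generate a unique one if none is given.
        "label": label or f"load_{uuid.uuid4().hex}",
    }
    return url, headers


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints or credentials.
    url, headers = build_stream_load_request(
        "fe.example.internal", 8030, "demo_db", "events",
        user="demo_user", password="demo_password", file_format="json",
    )
    print(url)
    print(headers["format"], headers["label"])
```

The returned URL and headers can then be handed to an HTTP client of your choice to issue the PUT with the file contents as the request body.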