
Loading Overview

Apache Doris offers various methods for importing and integrating data, allowing you to import data from diverse sources into the database. These methods can be categorized into four types:

  • Real-Time Writing: Data is written into Doris tables in real time via HTTP or JDBC, suitable for scenarios requiring immediate analysis and querying.

    • For small, infrequent writes (about once every 5 minutes), you can use JDBC INSERT.

    • For higher concurrency or frequency (more than 20 concurrent writes or multiple writes per minute), you can enable Group Commit and use JDBC INSERT or Stream Load.

    • For high throughput, you can use Stream Load via HTTP.

  • Streaming Synchronization: Real-time data streams (e.g., Flink, Kafka, transactional databases) are imported into Doris tables, ideal for real-time analysis and querying.

    • You can use Flink Doris Connector to write Flink’s real-time data streams into Doris.

    • You can use Routine Load or Doris Kafka Connector for Kafka’s real-time data streams. Routine Load pulls data from Kafka to Doris and supports CSV and JSON formats, while Kafka Connector writes data to Doris, supporting Avro, JSON, CSV, and Protobuf formats.

    • You can use Flink CDC or DataX to write transactional database CDC data streams into Doris.

  • Batch Import: Data is batch-loaded from external storage systems (e.g., S3, HDFS, local files, NAS) into Doris tables, suitable for non-real-time data import needs.

  • External Data Source Integration: Query and partially import data from external sources (e.g., Hive, JDBC, Iceberg) into Doris tables.

    • You can create a Catalog to read data from external sources, then use INSERT INTO SELECT to synchronize that data into Doris. The write can also run asynchronously via a JOB.

    • You can use X2Doris to migrate data from other AP systems into Doris.
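The rules of thumb for Real-Time Writing above can be sketched as a small decision helper. This is only an illustration: the function name and parameters are hypothetical, "multiple writes per minute" is read here as more than one write per minute, and workloads that fall between the stated thresholds default to JDBC INSERT as an assumption.

```python
def choose_realtime_write_method(concurrent_writers: int,
                                 writes_per_minute: float,
                                 high_throughput: bool = False) -> str:
    """Pick a real-time write path from the rules of thumb in the list above.

    Hypothetical helper for illustration; it is not part of Doris.
    """
    # High-throughput pushes go to Stream Load over HTTP.
    if high_throughput:
        return "Stream Load"
    # More than 20 concurrent writes, or multiple writes per minute
    # (assumed to mean > 1): enable Group Commit, then keep using
    # JDBC INSERT or Stream Load.
    if concurrent_writers > 20 or writes_per_minute > 1:
        return "Group Commit + JDBC INSERT or Stream Load"
    # Small, infrequent writes (about once every 5 minutes).
    return "JDBC INSERT"


# One writer, one write every 5 minutes (0.2 writes per minute).
print(choose_realtime_write_method(1, 0.2))   # JDBC INSERT
# Fifty concurrent writers.
print(choose_realtime_write_method(50, 0.2))  # Group Commit + JDBC INSERT or Stream Load
```

The thresholds are guidance rather than hard limits, so treat the result as a starting point and measure against your own workload.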

Each import method in Doris is an implicit transaction by default. For more information on transactions, refer to Transactions.
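To make the HTTP path concrete, here is a minimal sketch of how a single Stream Load request might be assembled, with each request treated as one import. The host, database, table, and label values are placeholders, and the port (8030) and credentials (root with an empty password) are assumed defaults that may differ in your deployment. The sketch only builds the request and does not send it; in practice the FE may redirect the request to a BE, and the client has to follow that redirect while keeping its credentials.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host: str, db: str, table: str,
                              payload: bytes, label: str,
                              user: str = "root", password: str = "",
                              http_port: int = 8030) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT request for Stream Load.

    All argument values are supplied by the caller; the defaults are assumptions.
    """
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    # HTTP Basic authentication token for the Doris user.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # A label names this import so that a retry can be recognized.
        "label": label,
        # The payload below is CSV with comma-separated columns.
        "format": "csv",
        "column_separator": ",",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


# Placeholder host, database, table, and label for illustration only.
req = build_stream_load_request("fe.example.com", "demo_db", "events",
                                b"1,a\n2,b\n", label="batch-0001")
print(req.get_method(), req.full_url)
```

Note that `urllib.request.Request` normalizes header names (for example, `label` is stored as `Label`); HTTP header names are case-insensitive, so this does not change the request's meaning.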

Quick Overview of Import Methods

Loading data into Doris involves several aspects: data sources, data formats, import methods, error handling, data transformation, and transactions. The table below summarizes the scenarios each import method suits and the file formats it supports.

| Import Method | Use Case | Supported File Formats | Single Import Volume | Import Mode |
|---|---|---|---|---|
| Stream Load | Importing local files or pushing data from applications via HTTP. | csv, json, parquet, orc | Less than 10GB | Synchronous |
| Broker Load | Importing from object storage, HDFS, etc. | csv, json, parquet, orc | Tens of GB to hundreds of GB | Asynchronous |
| INSERT INTO VALUES | Writing data via JDBC. | SQL | Simple testing | Synchronous |
| INSERT INTO SELECT | Importing from an external source, such as a table in a catalog or files in S3. | SQL | Depending on memory size | Synchronous, Asynchronous via Job |
| Routine Load | Real-time import from Kafka. | csv, json | Micro-batch import MB to GB | Asynchronous |
| MySQL Load | Importing from local files. | csv | Less than 1GB | Synchronous |
| Group Commit | Writing with high frequency. | Depending on the import method used | Micro-batch import KB-