Supported data sources
Doris provides several data import methods; choose the one that matches your data source.
| Data Source | Import Method |
|---|---|
| Object storage (S3), HDFS | Import data using Broker |
| Local file | Import local data |
| Kafka | Subscribe to Kafka data |
| MySQL, PostgreSQL, Oracle, SQL Server | Sync data via external table |
| JDBC | Sync data using JDBC |
| JSON format data | JSON format data import |
| MySQL Binlog | Binlog Load |
Import methods
| Import Method | Description |
|---|---|
| Spark Load | Import external data via Spark |
| Broker Load | Import data from external storage via Broker |
| Stream Load | Stream import data (local files and in-memory data) |
| Routine Load | Import Kafka data |
| Binlog Load | Import data by collecting MySQL Binlog |
| Insert Into | Import data from external tables through INSERT |
| S3 Load | Import data from object storage over the S3 protocol |
Supported data formats
Different import methods support slightly different data formats.
| Import Method | Supported Formats |
|---|---|
| Broker Load | Parquet, ORC, CSV, gzip |
| Stream Load | CSV, gzip, JSON |
| Routine Load | CSV, JSON |
The import methods in Apache Doris share the following common features, which are introduced here to help you make better use of data import.
Import atomicity guarantees
Each import job in Doris, whether it is a batch import using Broker Load or a single import using an INSERT statement, is a complete transaction. The import transaction ensures that the data in a batch takes effect atomically, so no partial data is ever written.
Each import job also has a Label. The Label is unique within a database and uniquely identifies the import job. A Label can be specified by the user; for some import methods, the system generates one automatically.
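As an illustration of a user-specified Label, the sketch below builds (but does not send) a Stream Load request that carries the Label in an HTTP header. The host name, database, table, and Label value are hypothetical, and the FE HTTP port 8030 and empty root password are assumed defaults; adjust all of them to your cluster.

```python
import base64
import urllib.request


def build_stream_load_request(host, db, table, label, payload,
                              user="root", password="", port=8030):
    """Build, without sending, an HTTP PUT for a Stream Load with an explicit Label."""
    url = f"http://{host}:{port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,           # user-specified Label, unique within the database
        "format": "csv",
        "column_separator": ",",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


# All values below are placeholders chosen for illustration only.
req = build_stream_load_request(
    host="doris-fe.example.com",
    db="example_db",
    table="orders",
    label="orders_20240101_001",
    payload=b"1,apple\n2,pear\n",
)
print(req.get_method(), req.full_url)
print(req.get_header("Label"))  # urllib stores header names capitalized
```

Only request construction is shown; submitting the request and interpreting the response depend on your HTTP client and cluster setup.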
A Label ensures that the corresponding import job can succeed only once. If a Label that has already been imported successfully is used again, the job is rejected with the error "Label already used". This mechanism gives Doris At-Most-Once semantics. Combined with At-Least-Once semantics in the upstream system, it achieves Exactly-Once semantics for imported data.
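The interplay between Labels and upstream retries can be sketched with a toy model. The code below is not a Doris client: `ToyDatabase`, `LabelAlreadyUsed`, `label_for`, and `deliver_at_least_once` are hypothetical names that only mimic the behavior described above. The key assumption is that the upstream derives the Label deterministically from the batch identity, so every redelivery of the same batch reuses the same Label.

```python
import hashlib


class LabelAlreadyUsed(Exception):
    """Stands in for the rejection returned when a Label is reused."""


class ToyDatabase:
    """In-memory stand-in for one database; illustrative only."""

    def __init__(self):
        self.used_labels = set()
        self.rows = []

    def load(self, label, batch):
        # At-most-once: a Label that already succeeded is rejected.
        if label in self.used_labels:
            raise LabelAlreadyUsed(label)
        # The rows and the Label take effect together.
        self.rows.extend(batch)
        self.used_labels.add(label)


def label_for(source, batch_id):
    """Deterministic Label: the same upstream batch always gets the same Label."""
    digest = hashlib.sha256(f"{source}:{batch_id}".encode()).hexdigest()[:16]
    return f"{source}_{digest}"


def deliver_at_least_once(db, source, batch_id, batch, redeliveries=2):
    """Send the batch 1 + redeliveries times, as a retrying upstream might."""
    label = label_for(source, batch_id)
    for _ in range(1 + redeliveries):
        try:
            db.load(label, batch)
        except LabelAlreadyUsed:
            pass  # an earlier delivery already committed this batch
    return label


db = ToyDatabase()
batch = [(1, "apple"), (2, "pear")]
deliver_at_least_once(db, "kafka_orders", batch_id=42, batch=batch)
print(len(db.rows))  # 2: the batch was applied once despite three deliveries
```

Treating the rejection as success is what turns repeated deliveries into a single effective import; a Label generated randomly on each retry would defeat the mechanism.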
For best practices on atomicity guarantees, see Importing Transactions and Atomicity.
Synchronous and asynchronous imports
Import methods are either synchronous or asynchronous. For a synchronous import, the returned result indicates whether the import succeeded or failed. For an asynchronous import, a successful return means only that the job was submitted successfully, not that the data was imported. You need to use the corresponding command to check the running status of the import job.
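For an asynchronous import, checking the status usually means polling until the job reaches a terminal state. The following is a minimal sketch of such a loop. `wait_for_load` and `fetch_state` are hypothetical names: `fetch_state` is a callable you supply, for example one that runs a SHOW LOAD query filtered by Label through a MySQL-protocol client and returns the job state. The terminal state names `FINISHED` and `CANCELLED` are assumed here; verify them against the import method you use.

```python
import time

# Assumed terminal states of an asynchronous import job.
TERMINAL_STATES = {"FINISHED", "CANCELLED"}


def wait_for_load(fetch_state, label, poll_seconds=5.0,
                  timeout_seconds=3600.0, sleep=time.sleep):
    """Poll fetch_state(label) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state(label)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import job {label!r} still in state {state!r}")
        sleep(poll_seconds)


# Demo with a fake status source instead of a real cluster.
states = iter(["PENDING", "LOADING", "FINISHED"])
result = wait_for_load(
    lambda label: next(states),
    "orders_20240101_001",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Injecting `fetch_state` and `sleep` keeps the loop independent of any particular database driver and makes it easy to exercise without a running cluster.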