Data Export Overview

The data export feature writes query result sets or Apache Doris table data to a specified storage system in a specified file format. It is commonly used for result set downloads and cross-system data exchange.

Data Export vs. Data Backup

Both data export and data backup can output data from Apache Doris to external storage, but they target different scenarios. The following table compares the core differences between the two:

| Comparison Dimension | Data Export | Data Backup |
|---|---|---|
| Final storage location | HDFS, object storage, local file system | HDFS, object storage |
| Data format | Open formats such as Parquet, ORC, CSV | Apache Doris internal storage format |
| Execution speed | Medium (requires reading data and converting to the target format) | Fast (uploads Doris data files directly, no parsing or conversion required) |
| Flexibility | Allows flexible export scope definition through SQL | Supports only table-level full backup |
| Typical use scenarios | Result set download, data exchange between different systems | Data backup, data migration between Apache Doris clusters |

Choosing an Export Method

Apache Doris provides the following three data export methods, each suited to different export needs:

  • SELECT INTO OUTFILE: supports exporting any SQL result set.
  • EXPORT: supports exporting partial or full data at the table level.
  • MySQL DUMP: data export compatible with MySQL Dump commands.

Capability Comparison of the Three Methods

The following table compares the three export methods across execution mode, SQL capabilities, concurrency, and supported formats, helping you choose quickly:

| Comparison Dimension | SELECT INTO OUTFILE | EXPORT | MySQL DUMP |
|---|---|---|---|
| Synchronous / Asynchronous | Synchronous | Asynchronous (check progress with SHOW EXPORT after submission) | Synchronous |
| Supports arbitrary SQL | Supported | Not supported | Not supported |
| Export specified partitions | Supported | Supported | Not supported |
| Export specified tablets | Supported | Not supported | Not supported |
| Concurrent export | Supported, high concurrency (limited by single-node operators such as ORDER BY) | Supported, high concurrency (Tablet-level concurrency) | Not supported, single-threaded export |
| Supported export formats | Parquet, ORC, CSV | Parquet, ORC, CSV | MySQL Dump proprietary format |
| Supports external tables | Supported | Partially supported | Not supported |
| Supports views | Supported | Supported | Supported |
| Supported export locations | S3, HDFS | S3, HDFS | LOCAL |
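
The comparison above can also be encoded as a small lookup for choosing a method programmatically. The following Python sketch is illustrative only: `CAPABILITIES` and `choose_methods` are hypothetical names that are not part of Apache Doris, the flags are transcribed from the table, and "Partially supported" is conservatively recorded as not supported.

```python
# Capability flags transcribed from the comparison table.
# Assumption: "Partially supported" (EXPORT on external tables) is stored as False.
CAPABILITIES = {
    "SELECT INTO OUTFILE": {
        "asynchronous": False,
        "arbitrary_sql": True,
        "partitions": True,
        "tablets": True,
        "concurrent": True,
        "external_tables": True,
        "formats": {"parquet", "orc", "csv"},
    },
    "EXPORT": {
        "asynchronous": True,
        "arbitrary_sql": False,
        "partitions": True,
        "tablets": False,
        "concurrent": True,
        "external_tables": False,
        "formats": {"parquet", "orc", "csv"},
    },
    "MySQL DUMP": {
        "asynchronous": False,
        "arbitrary_sql": False,
        "partitions": False,
        "tablets": False,
        "concurrent": False,
        "external_tables": False,
        "formats": {"mysqldump"},  # placeholder label for the dump's own format
    },
}


def choose_methods(file_format=None, **required):
    """Return the export methods whose flags equal every required value."""
    matches = []
    for name, caps in CAPABILITIES.items():
        if file_format is not None and file_format.lower() not in caps["formats"]:
            continue
        if all(caps[flag] == wanted for flag, wanted in required.items()):
            matches.append(name)
    return matches


print(choose_methods(file_format="parquet", arbitrary_sql=True))
print(choose_methods(asynchronous=True))
print(choose_methods(file_format="csv", partitions=True))
```

A requirement such as `arbitrary_sql=True` narrows the result to SELECT INTO OUTFILE, while `asynchronous=True` leaves only EXPORT, which matches the scenarios described in this document.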

Applicable Scenarios

SELECT INTO OUTFILE

Suitable for the following scenarios:

  • The exported data needs to go through complex computation logic, such as filtering, aggregation, or joins.
  • The export can run as a synchronous task.

For detailed usage, see SELECT INTO OUTFILE.

EXPORT

Suitable for the following scenarios:

  • Single-table export of large data volumes that requires only simple filter conditions.
  • The export needs to be submitted as an asynchronous task.

For detailed usage, see EXPORT (asynchronous export).

MySQL DUMP

Suitable for the following scenarios:

  • Compatibility with the MySQL ecosystem is required, and both table schemas and data need to be exported.
  • The export is used only for development, testing, or other cases with very small data volumes.

For detailed usage, see MySQL Dump.

Column Type Mapping for Exported Files

The Parquet and ORC file formats have their own data type definitions, and Apache Doris automatically converts internal data types to the corresponding Parquet/ORC types during export. The CSV format has no type definitions; all data is output as text.

The mappings between Apache Doris data types and the ORC and Parquet formats are listed below.

ORC Type Mapping

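| Doris Type | ORC Type |
|---|---|
| boolean | boolean |
| tinyint | tinyint |
| smallint | smallint |
| int | int |
| bigint | bigint |
| largeint | string |
| date | string |
| datev2 | string |
| datetime | string |
| datetimev2 | timestamp |
| float | float |
| double | double |
| char / varchar / string | string |
| decimal | decimal |
| struct | struct |
| map | map |
| array | array |
| json | string |
| variant | string |
| bitmap | binary |
| quantile_state | binary |
| hll | binary |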
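
Several non-string Doris types (for example largeint, date, and datetime) are written to ORC as strings, so downstream readers may need to parse them back. The Python sketch below shows one way to check a table schema against the ORC mapping before exporting. It is an illustration only: `DORIS_TO_ORC`, `base_type`, `orc_type_for`, and `columns_exported_as_text` are hypothetical helper names, and the sample schema is invented for the example.

```python
# ORC mapping transcribed from the table above (type names lowercased).
DORIS_TO_ORC = {
    "boolean": "boolean", "tinyint": "tinyint", "smallint": "smallint",
    "int": "int", "bigint": "bigint", "largeint": "string",
    "date": "string", "datev2": "string", "datetime": "string",
    "datetimev2": "timestamp", "float": "float", "double": "double",
    "char": "string", "varchar": "string", "string": "string",
    "decimal": "decimal", "struct": "struct", "map": "map", "array": "array",
    "json": "string", "variant": "string",
    "bitmap": "binary", "quantile_state": "binary", "hll": "binary",
}

# Doris types that are already text, so a string in ORC loses nothing.
STRING_TYPES = {"char", "varchar", "string"}


def base_type(doris_type):
    """Strip parameters such as (10, 2) or <INT> and lowercase the name."""
    name = doris_type.strip().lower()
    for separator in ("(", "<"):
        name = name.split(separator, 1)[0]
    return name.strip()


def orc_type_for(doris_type):
    """Look up the ORC type listed for a Doris column type."""
    base = base_type(doris_type)
    if base not in DORIS_TO_ORC:
        raise ValueError(f"no ORC mapping listed for Doris type: {doris_type!r}")
    return DORIS_TO_ORC[base]


def columns_exported_as_text(schema):
    """Return columns that are not text in Doris but become ORC strings."""
    return [
        column
        for column, doris_type in schema.items()
        if orc_type_for(doris_type) == "string"
        and base_type(doris_type) not in STRING_TYPES
    ]


# Hypothetical schema used only for this example.
schema = {
    "id": "BIGINT",
    "name": "VARCHAR(32)",
    "created_at": "DATETIME",
    "total": "LARGEINT",
    "updated_at": "DATETIMEV2(3)",
    "tags": "ARRAY<STRING>",
}
print(columns_exported_as_text(schema))
```

For this sample schema the check flags `created_at` and `total`, while `updated_at` keeps a native ORC timestamp because it uses datetimev2.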

Parquet Type Mapping

When Apache Doris exports to the Parquet file format, it first converts the in-memory Doris data to the Arrow in-memory format, and then Arrow writes it out to the Parquet file. The mappings are as follows:

| Doris Type | Arrow Type | Parquet Physical Type | Parquet Logical Type |
|---|---|---|---|
| boolean | boolean | BOOLEAN | |
| tinyint | int8 | INT32 | INT_8 |
| smallint | int16 | INT32 | INT_16 |
| int | int32 | INT32 | INT_32 |
| bigint | int64 | INT64 | INT_64 |
| largeint | utf8 | BYTE_ARRAY | UTF8 |
| date | utf8 | BYTE_ARRAY | UTF8 |
| datev2 | date32 | INT32 | DATE |
| datetime | utf8 | BYTE_ARRAY | UTF8 |
| datetimev2 | timestamp | INT96 / INT64 | TIMESTAMP(MICROS/MILLIS/SECONDS) |
| float | float32 | FLOAT | |
| double | float64 | DOUBLE | |
| char / varchar / string | utf8 | BYTE_ARRAY | UTF8 |
| decimal | decimal128 | FIXED_LEN_BYTE_ARRAY | DECIMAL(precision, scale) |
| struct | struct | | Parquet Group |
| map | map | | Parquet Map |
| array | list | | Parquet List |
| json | utf8 | BYTE_ARRAY | UTF8 |
| variant | utf8 | BYTE_ARRAY | UTF8 |
| bitmap | binary | BYTE_ARRAY | |
| quantile_state | binary | BYTE_ARRAY | |
| hll | binary | BYTE_ARRAY | |
Note: In versions 2.1.11 and 3.0.7, the parquet.enable_int96_timestamps property is supported. It specifies whether the Doris datetimev2 type is stored as INT96 or INT64 in Parquet, and the default is INT96. INT96 has been deprecated in the Parquet standard and is used only for compatibility with legacy systems (such as Hive versions before 4.0).
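
As a rough illustration of where such a property might be set, the Python sketch below assembles a SELECT INTO OUTFILE statement as text. It rests on several assumptions: `build_outfile_statement` is a hypothetical helper, the table name and path are placeholders, the storage connection properties are omitted, and passing `parquet.enable_int96_timestamps` through the statement's PROPERTIES clause is assumed rather than confirmed here. Check the SELECT INTO OUTFILE reference for your version before relying on it.

```python
def build_outfile_statement(query, path_prefix, file_format, properties):
    """Assemble a SELECT ... INTO OUTFILE statement as a string (sketch only)."""

    def quote(value):
        # Wrap in double quotes, escaping backslashes and embedded quotes.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return '"' + text + '"'

    lines = [
        query.rstrip().rstrip(";"),
        f"INTO OUTFILE {quote(path_prefix)}",
        f"FORMAT AS {file_format.upper()}",
    ]
    if properties:
        body = ",\n".join(
            f"    {quote(key)} = {quote(value)}" for key, value in properties.items()
        )
        lines.append("PROPERTIES (\n" + body + "\n)")
    return "\n".join(lines) + ";"


# Placeholder table and path; storage credentials are intentionally left out.
statement = build_outfile_statement(
    query="SELECT id, updated_at FROM demo_tbl",
    path_prefix="s3://my-bucket/export/result_",
    file_format="parquet",
    # Assumption: "false" requests INT64 instead of the default INT96.
    properties={"parquet.enable_int96_timestamps": "false"},
)
print(statement)
```

Writing INT64 timestamps is generally the better choice when the consuming system does not depend on the legacy INT96 encoding.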