
Importing Data from MinIO

MinIO is an S3-compatible object storage system. Doris provides two methods for importing files from MinIO. Choose between them based on data volume and latency requirements:

Import method | Execution mode | Applicable scenario | Documentation reference
S3 Load | Asynchronous | Large-batch data imports; tasks that need to run in the background | Broker Load Manual
TVF (Table Value Function) | Synchronous | Small-batch or ad-hoc imports; used with INSERT INTO ... SELECT | Examples in this document

Prerequisites

Before importing MinIO data with either method, make sure that:

  • A Doris cluster is deployed and its nodes can reach the MinIO service over the network.
  • You have obtained the MinIO endpoint, region, access key, and secret key.
  • The CSV/Parquet files to be imported have been uploaded to a MinIO bucket.
Important: MinIO Connection Configuration Notes

When using S3 Load or TVF to import MinIO data, note the following two points:

  • Endpoint protocol prefix: If MinIO is deployed on a local network without TLS enabled, you need to explicitly add http:// to the endpoint, for example "s3.endpoint" = "http://localhost:9000".
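  • Path access style: The S3 SDK addresses buckets in virtual-hosted style by default (the bucket name is part of the host name), but MinIO does not enable this access mode by default. Add "use_path_style" = "true" to force path style (the bucket name is part of the URL path).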
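To see why use_path_style matters, the small Python sketch below builds the URL an S3 client would request in each addressing style. The helper object_url is hypothetical and not part of Doris; it assumes an endpoint that already carries its http:// or https:// prefix.

```python
from urllib.parse import urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Return the URL an S3 client would request for bucket/key (illustration only)."""
    parts = urlsplit(endpoint)  # expects e.g. "http://localhost:9000"
    if path_style:
        # Path style: host unchanged, bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: bucket becomes part of the host name.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


if __name__ == "__main__":
    ep = "http://localhost:9000"
    # http://localhost:9000/your_bucket_name/s3load_example.csv
    print(object_url(ep, "your_bucket_name", "s3load_example.csv", path_style=True))
    # http://your_bucket_name.localhost:9000/s3load_example.csv
    print(object_url(ep, "your_bucket_name", "s3load_example.csv", path_style=False))
```

With virtual-hosted style the client must resolve a host name such as your_bucket_name.localhost, which a default MinIO deployment does not serve, so requests fail until path style is forced.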

Method 1: Import Using S3 Load (Asynchronous)

S3 Load is suitable for importing files from MinIO into Doris as an asynchronous task. For detailed steps, refer to the Broker Load Manual.

Step 1: Prepare the Data

Create a CSV file s3load_example.csv and upload it to MinIO with the following content:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
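If you prefer to generate the sample file rather than type it, a minimal Python sketch using only the standard library csv module is shown below. It writes the same ten rows without a header line, because the LOAD statement maps columns by position. Uploading is left to your usual tool; for example, the MinIO client command mc cp s3load_example.csv <alias>/your_bucket_name/ would work, where <alias> is whatever alias you configured for your MinIO server.

```python
import csv

# The sample rows from Step 1: (user_id, name, age).
ROWS = [
    (1, "Emily", 25), (2, "Benjamin", 35), (3, "Olivia", 28),
    (4, "Alexander", 60), (5, "Ava", 17), (6, "William", 69),
    (7, "Sophia", 32), (8, "James", 64), (9, "Emma", 37), (10, "Liam", 64),
]


def write_sample(path: str = "s3load_example.csv") -> None:
    """Write ROWS as comma-separated lines with no header row."""
    with open(path, "w", newline="") as f:
        # "\n" line endings so every line looks exactly like the listing above.
        csv.writer(f, lineterminator="\n").writerows(ROWS)


if __name__ == "__main__":
    write_sample()
```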

Step 2: Create a Table in Doris

CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Import Data Using S3 Load

Execute the following SQL to submit an S3 Load task:

LOAD LABEL s3_load_2022_04_01
(
DATA INFILE("s3://your_bucket_name/s3load_example.csv")
INTO TABLE test_s3load
COLUMNS TERMINATED BY ","
FORMAT AS "CSV"
(user_id, name, age)
)
WITH S3
(
"provider" = "S3",
"s3.endpoint" = "play.min.io:9000",
"s3.region" = "us-east-1",
"s3.access_key" = "myminioadmin",
"s3.secret_key" = "minio-secret-key-change-me",
"use_path_style" = "true"
)
PROPERTIES
(
"timeout" = "3600"
);
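The label s3_load_2022_04_01 is fixed, and Doris requires load labels to be unique within a database, so submitting the same statement again after it has succeeded is rejected. One way to avoid collisions when scripting the load is to append a timestamp. The helper make_label below is a hypothetical naming scheme, not a Doris feature.

```python
from datetime import datetime, timezone
from typing import Optional


def make_label(prefix: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp to prefix, e.g. 's3_load_20220401_093000'."""
    now = now or datetime.now(timezone.utc)
    # Letters, digits and underscores only, matching the style of the label above.
    return f"{prefix}_{now:%Y%m%d_%H%M%S}"


if __name__ == "__main__":
    print(f"LOAD LABEL {make_label('s3_load')}")
```

Second-level resolution is enough for manual runs; a scheduler that may submit several loads per second would need an extra component such as a sequence number.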

Step 4: Verify the Imported Data

Run a query to verify whether the data has been imported successfully:

SELECT * FROM test_s3load;

Expected output (row order may differ, because the query has no ORDER BY):

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name      | age  |
+---------+-----------+------+
|       5 | Ava       |   17 |
|      10 | Liam      |   64 |
|       7 | Sophia    |   32 |
|       9 | Emma      |   37 |
|       1 | Emily     |   25 |
|       4 | Alexander |   60 |
|       2 | Benjamin  |   35 |
|       3 | Olivia    |   28 |
|       6 | William   |   69 |
|       8 | James     |   64 |
+---------+-----------+------+
10 rows in set (0.04 sec)
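Because SELECT without ORDER BY returns rows in arbitrary order, a row-by-row comparison against the source file can fail even when the import is correct. The Python sketch below compares the result as a multiset instead. result_rows is assumed to be the list of tuples returned by whatever MySQL-protocol client you use to query Doris (for example, a cursor's fetchall()); import_matches is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Contents of s3load_example.csv from Step 1.
SOURCE_CSV = """\
1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
"""


def expected_rows(text: str) -> Counter:
    """Parse the CSV into (user_id, name, age) tuples, counted by occurrence."""
    reader = csv.reader(io.StringIO(text))
    return Counter((int(uid), name, int(age)) for uid, name, age in reader)


def import_matches(result_rows) -> bool:
    """True if the query result holds exactly the source rows, in any order."""
    return Counter(tuple(r) for r in result_rows) == expected_rows(SOURCE_CSV)
```

Counting occurrences rather than using a plain set also catches duplicated data, such as the 20 rows you would see after loading the same file twice into this DUPLICATE KEY table.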

Method 2: Import Using TVF (Synchronous)

The TVF (Table Value Function) method uses the S3() function to expose MinIO files as a virtual table; combined with INSERT INTO ... SELECT, it completes the import synchronously. It is suitable for small-batch or ad-hoc scenarios.

Step 1: Prepare the Data

Create a CSV file s3load_example.csv and upload it to MinIO with the following content:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

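Step 2: Create a Table in Doris

If you already completed Method 1, the table already exists: skip the CREATE TABLE statement below and run TRUNCATE TABLE test_s3load; instead, so that the verification in Step 4 returns the same 10 rows.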

CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Import Data Using TVF

Execute the following SQL to import the data synchronously:

INSERT INTO test_s3load
SELECT * FROM S3
(
"uri" = "s3://your_bucket_name/s3load_example.csv",
"format" = "csv",
"provider" = "S3",
"s3.endpoint" = "play.min.io:9000",
"s3.region" = "us-east-1",
"s3.access_key" = "myminioadmin",
"s3.secret_key" = "minio-secret-key-change-me",
"column_separator" = ",",
"csv_schema" = "user_id:int;name:string;age:int",
"use_path_style" = "true"
);
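When the property list is generated by a script, for example to reuse one column definition for several files, helpers along the following lines can render csv_schema and the quoted key/value pairs. Both functions are hypothetical illustrations of the syntax used above, not Doris APIs; they reject values containing double quotes instead of attempting to escape them.

```python
def csv_schema(columns) -> str:
    """Render [(name, type), ...] as the 'name:type;...' string used by csv_schema."""
    return ";".join(f"{name}:{typ}" for name, typ in columns)


def render_properties(props: dict) -> str:
    """Render a dict as '"key" = "value"' pairs, one per line, comma-separated."""
    for key, value in props.items():
        if '"' in key or '"' in str(value):
            raise ValueError(f"double quote not supported in {key!r}")
    return ",\n".join(f'"{k}" = "{v}"' for k, v in props.items())


if __name__ == "__main__":
    schema = csv_schema([("user_id", "int"), ("name", "string"), ("age", "int")])
    body = render_properties({"format": "csv", "column_separator": ",", "csv_schema": schema})
    print(f"INSERT INTO test_s3load\nSELECT * FROM S3\n(\n{body}\n);")
```

The printed statement omits the connection properties; in practice you would add the uri and the MinIO settings from the example above to the same dictionary.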

Step 4: Verify the Imported Data

Run a query to verify whether the data has been imported successfully:

SELECT * FROM test_s3load;

Expected output (row order may differ, because the query has no ORDER BY):

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name      | age  |
+---------+-----------+------+
|       5 | Ava       |   17 |
|      10 | Liam      |   64 |
|       7 | Sophia    |   32 |
|       9 | Emma      |   37 |
|       1 | Emily     |   25 |
|       4 | Alexander |   60 |
|       2 | Benjamin  |   35 |
|       3 | Olivia    |   28 |
|       6 | William   |   69 |
|       8 | James     |   64 |
+---------+-----------+------+
10 rows in set (0.04 sec)

Key Parameters

The following parameters must be configured correctly for both S3 Load and TVF:

Parameter | Description | Example value
provider | Object storage provider. Set to S3 when using MinIO. | S3
s3.endpoint | MinIO service address, including the protocol prefix: http:// when TLS is not enabled, https:// when it is. | http://localhost:9000
s3.region | Region used to sign requests. Set it to the region configured on the MinIO server; MinIO uses us-east-1 unless another region is configured. | us-east-1
s3.access_key | MinIO access key ID. | myminioadmin
s3.secret_key | MinIO secret access key. | minio-secret-key-change-me
use_path_style | Whether to use path-style access. Set to true for MinIO, which does not enable virtual-hosted style by default. | true
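The rules in this table are easy to check before a statement is submitted. The following Python sketch, check_minio_props, is a hypothetical pre-flight check (not part of Doris) that takes the properties as a dictionary and returns a list of problems, following the table above.

```python
from urllib.parse import urlsplit

REQUIRED = ("provider", "s3.endpoint", "s3.region", "s3.access_key", "s3.secret_key")


def check_minio_props(props: dict) -> list:
    """Return human-readable problems; an empty list means the table's rules hold."""
    problems = [f"missing {k}" for k in REQUIRED if not props.get(k)]
    # Default to "S3" so a missing provider is reported only once, above.
    if props.get("provider", "S3") != "S3":
        problems.append('provider should be "S3" for MinIO')
    endpoint = props.get("s3.endpoint", "")
    if endpoint and urlsplit(endpoint).scheme not in ("http", "https"):
        problems.append("s3.endpoint has no http:// or https:// prefix")
    if props.get("use_path_style") != "true":
        problems.append('use_path_style should be "true"')
    return problems
```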

FAQ

Q1: How do I choose between S3 Load and TVF?

  • S3 Load: Executes asynchronously. Suitable for large-batch data imports. After submission, Doris schedules and runs the task in the background, and you can query the task status with SHOW LOAD.
  • TVF: Executes synchronously. Suitable for small-batch imports, ad-hoc analysis, and INSERT INTO ... SELECT pipelines. The statement returns only after the import completes, so no status polling is needed.
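With S3 Load, a script has to wait for the background task itself. A minimal polling sketch is shown below. fetch_state is an assumed callable you supply, for example one that runs SHOW LOAD WHERE LABEL = "s3_load_2022_04_01" and returns the State column; FINISHED and CANCELLED are treated as terminal states, and the default timeout mirrors the "timeout" = "3600" property used in Method 1.

```python
import time
from typing import Callable

TERMINAL_STATES = {"FINISHED", "CANCELLED"}


def wait_for_load(fetch_state: Callable[[], str], interval_s: float = 5.0,
                  timeout_s: float = 3600.0, sleep=time.sleep) -> str:
    """Poll fetch_state() until it reports a terminal state, then return it."""
    waited = 0.0
    while True:
        state = fetch_state().upper()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"load still {state} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting sleep keeps the function testable without real delays; a CANCELLED result should be followed by reading the ErrorMsg column of SHOW LOAD.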

Q2: What should I do if Doris cannot connect to MinIO?

Confirm that the endpoint has the correct protocol prefix:

  • TLS not enabled: Must include http://, such as http://localhost:9000.
  • TLS enabled: Use the https:// prefix.
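The prefix rule can be applied mechanically. The hypothetical normalize_endpoint helper below adds the scheme that matches your TLS setting and raises an error when an existing prefix contradicts it.

```python
def normalize_endpoint(endpoint: str, tls_enabled: bool) -> str:
    """Return the endpoint with the scheme that matches the TLS setting."""
    expected = "https://" if tls_enabled else "http://"
    endpoint = endpoint.strip().rstrip("/")
    for scheme in ("http://", "https://"):
        if endpoint.startswith(scheme):
            # An explicit prefix must agree with the TLS setting.
            if scheme != expected:
                raise ValueError(f"{endpoint!r} uses {scheme} but TLS is "
                                 f"{'enabled' if tls_enabled else 'disabled'}")
            return endpoint
    return expected + endpoint
```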

Q3: What should I do if access reports a bucket parsing error or 404?

MinIO does not support virtual-hosted style access by default. You need to explicitly add the following to the import parameters:

"use_path_style" = "true"

Q4: Are other formats such as Parquet/ORC supported?

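Yes. Replace FORMAT AS "CSV" (or "format" = "csv" in TVF) with parquet, orc, or another supported format. Parquet and ORC files store their own schema, so the CSV-specific settings (COLUMNS TERMINATED BY in S3 Load; column_separator and csv_schema in TVF) are not needed. For details, see the Broker Load Manual.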
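When the same script handles several file types, the format can be derived from the file extension and the CSV-only properties added only when needed. The sketch below assumes files use the conventional .csv, .parquet, and .orc extensions; format_properties is a hypothetical helper that returns the format-related TVF properties.

```python
from pathlib import PurePosixPath
from typing import Optional

# Conventional file extensions and the matching "format" value.
FORMATS = {".csv": "csv", ".parquet": "parquet", ".orc": "orc"}


def format_properties(uri: str, csv_schema: Optional[str] = None) -> dict:
    """Return uri/format properties for an S3() TVF call, plus CSV-only ones."""
    suffix = PurePosixPath(uri).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"cannot infer format from {uri!r}")
    props = {"uri": uri, "format": FORMATS[suffix]}
    if props["format"] == "csv":
        # Only CSV needs a separator and an explicit schema.
        props["column_separator"] = ","
        if csv_schema:
            props["csv_schema"] = csv_schema
    return props
```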