Importing Data from MinIO
MinIO is an S3-compatible object storage. Doris provides two methods for importing files from MinIO. Choose between them based on data volume and timeliness requirements:
| Import method | Execution mode | Applicable scenario | Documentation reference |
|---|---|---|---|
| S3 Load | Asynchronous | Large-batch data import; tasks that need to run in the background | Broker Load Manual |
| TVF (Table Value Function) | Synchronous | Small-batch, ad-hoc query imports; works with INSERT INTO ... SELECT | Examples in this document |
Prerequisites
Before importing MinIO data using either method, confirm the following conditions:
- A Doris cluster is deployed and can access the MinIO service normally.
- You have obtained the MinIO endpoint, region, access key, and secret key.
- The CSV/Parquet files to be imported have been uploaded to a MinIO bucket.
When using S3 Load or TVF to import MinIO data, note the following two points:
- Endpoint protocol prefix: If MinIO is deployed on a local network without TLS enabled, you need to explicitly add
http://to theendpoint, for example"s3.endpoint" = "http://localhost:9000". - Path access style: The S3 SDK uses virtual-hosted style by default, but MinIO does not enable this access mode by default. Add
"use_path_style" = "true"to force path style.
Method 1: Import Using S3 Load (Asynchronous)
S3 Load is suitable for importing files from MinIO into Doris as an asynchronous task. For detailed steps, refer to the Broker Load Manual.
Step 1: Prepare the Data
Create a CSV file s3load_example.csv and upload it to MinIO with the following content:
1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
Step 2: Create a Table in Doris
CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;
Step 3: Import Data Using S3 Load
Execute the following SQL to submit an S3 Load task:
LOAD LABEL s3_load_2022_04_01
(
DATA INFILE("s3://your_bucket_name/s3load_example.csv")
INTO TABLE test_s3load
COLUMNS TERMINATED BY ","
FORMAT AS "CSV"
(user_id, name, age)
)
WITH S3
(
"provider" = "S3",
"s3.endpoint" = "play.min.io:9000",
"s3.region" = "us-east-1",
"s3.access_key" = "myminioadmin",
"s3.secret_key" = "minio-secret-key-change-me",
"use_path_style" = "true"
)
PROPERTIES
(
"timeout" = "3600"
);
Step 4: Verify the Imported Data
Run a query to verify whether the data has been imported successfully:
SELECT * FROM test_s3load;
Expected output:
mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name | age |
+---------+-----------+------+
| 5 | Ava | 17 |
| 10 | Liam | 64 |
| 7 | Sophia | 32 |
| 9 | Emma | 37 |
| 1 | Emily | 25 |
| 4 | Alexander | 60 |
| 2 | Benjamin | 35 |
| 3 | Olivia | 28 |
| 6 | William | 69 |
| 8 | James | 64 |
+---------+-----------+------+
10 rows in set (0.04 sec)
Method 2: Import Using TVF (Synchronous)
The TVF (Table Value Function) method reads MinIO files as a virtual table through the S3() function, and combined with INSERT INTO ... SELECT it completes the import synchronously. It is suitable for small-batch or ad-hoc scenarios.
Step 1: Prepare the Data
Create a CSV file s3load_example.csv and upload it to MinIO with the following content:
1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
Step 2: Create a Table in Doris
CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;
Step 3: Import Data Using TVF
Execute the following SQL to import the data synchronously:
INSERT INTO test_s3load
SELECT * FROM S3
(
"uri" = "s3://your_bucket_name/s3load_example.csv",
"format" = "csv",
"provider" = "S3",
"s3.endpoint" = "play.min.io:9000",
"s3.region" = "us-east-1",
"s3.access_key" = "myminioadmin",
"s3.secret_key" = "minio-secret-key-change-me",
"column_separator" = ",",
"csv_schema" = "user_id:int;name:string;age:int",
"use_path_style" = "true"
);
Step 4: Verify the Imported Data
Run a query to verify whether the data has been imported successfully:
SELECT * FROM test_s3load;
Expected output:
mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name | age |
+---------+-----------+------+
| 5 | Ava | 17 |
| 10 | Liam | 64 |
| 7 | Sophia | 32 |
| 9 | Emma | 37 |
| 1 | Emily | 25 |
| 4 | Alexander | 60 |
| 2 | Benjamin | 35 |
| 3 | Olivia | 28 |
| 6 | William | 69 |
| 8 | James | 64 |
+---------+-----------+------+
10 rows in set (0.04 sec)
Key Parameters
The following parameters must be configured correctly for both S3 Load and TVF:
| Parameter | Description | Example value |
|---|---|---|
provider | Object storage provider. Set to S3 when using MinIO. | S3 |
s3.endpoint | MinIO service address. The http:// prefix is required when TLS is not enabled. | http://localhost:9000 |
s3.region | The region where MinIO is deployed. Can be set to any value but must remain consistent. | us-east-1 |
s3.access_key | MinIO access key ID. | myminioadmin |
s3.secret_key | MinIO access key secret. | minio-secret-key-change-me |
use_path_style | Whether to use path-style access. Must be set to true for MinIO. | true |
FAQ
Q1: How do I choose between S3 Load and TVF?
- S3 Load: Executes asynchronously. Suitable for large-batch data imports. After submission, Doris schedules and runs the task in the background, and you can query the task status with
SHOW LOAD. - TVF: Executes synchronously. Suitable for small-batch, ad-hoc analysis, or scenarios combined with
INSERT INTO ... SELECTpipelines. Returns results immediately.
Q2: What should I do if I get endpoint-related errors when connecting to MinIO?
Confirm whether the endpoint has the correct protocol prefix:
- TLS not enabled: Must include
http://, such ashttp://localhost:9000. - TLS enabled: Use the
https://prefix.
Q3: What should I do if access reports a bucket parsing error or 404?
MinIO does not support virtual-hosted style access by default. You need to explicitly add the following to the import parameters:
"use_path_style" = "true"
Q4: Are other formats such as Parquet/ORC supported?
Yes. Replace FORMAT AS "CSV" (or "format" = "csv" in TVF) with parquet, orc, or other corresponding formats. For details, see the Broker Load Manual.