Skip to main content

Alibaba Cloud OSS

Doris provides two ways to load files from Alibaba Cloud OSS:

  • Use S3 Load to load Alibaba Cloud OSS files into Doris, which is an asynchronous load method.
  • Use TVF to load Alibaba Cloud OSS files into Doris, which is a synchronous load method.

load with S3 Load

Use S3 Load to import files on object storage. For detailed steps, please refer to the Broker Load Manual

Step 1: Prepare the data

Create a CSV file s3load_example.csv The file is stored on Alibaba Cloud OSS and its content is as follows:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

Step 2: Create a table in Doris

CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Load data using S3 Load

Caution

The endpoint URLs for public and internal access are different. If the server and OSS are in the same region, it is recommended to use the internal endpoint link.

  • internal endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  • public endpoint: oss-cn-hangzhou.aliyuncs.com
LOAD LABEL s3_load_2022_04_01
(
DATA INFILE("s3://your_bucket_name/s3load_example.csv")
INTO TABLE test_s3load
COLUMNS TERMINATED BY ","
FORMAT AS "CSV"
(user_id, name, age)
)
WITH S3
(
"provider" = "OSS",
"s3.endpoint" = "oss-cn-hangzhou.aliyuncs.com",
"s3.region" = "oss-cn-hangzhou",
"s3.access_key" = "<your-ak>",
"s3.secret_key" = "<your-sk>"
)
PROPERTIES
(
"timeout" = "3600"
);

Step 4: Check the imported data

SELECT * FROM test_s3load;

Results:

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name | age |
+---------+-----------+------+
| 5 | Ava | 17 |
| 10 | Liam | 64 |
| 7 | Sophia | 32 |
| 9 | Emma | 37 |
| 1 | Emily | 25 |
| 4 | Alexander | 60 |
| 2 | Benjamin | 35 |
| 3 | Olivia | 28 |
| 6 | William | 69 |
| 8 | James | 64 |
+---------+-----------+------+
10 rows in set (0.04 sec)

Load with TVF

Step 1: Prepare the data

Create a CSV file s3load_example.csv The file is stored on Alibaba Cloud OSS and its content is as follows:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

Step 2: Create a table in Doris

CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Load data using TVF

Caution

The endpoint URLs for public and internal access are different. If the server and OSS are in the same region, it is recommended to use the internal endpoint link.

  • internal endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  • public endpoint: oss-cn-hangzhou.aliyuncs.com
INSERT INTO test_s3load
SELECT * FROM S3
(
"uri" = "s3://your_bucket_name/s3load_example.csv",
"format" = "csv",
"provider" = "OSS",
"s3.endpoint" = "oss-cn-hangzhou.aliyuncs.com",
"s3.region" = "oss-cn-hangzhou",
"s3.access_key" = "<your-ak>",
"s3.secret_key" = "<your-sk>",
"column_separator" = ",",
"csv_schema" = "user_id:int;name:string;age:int"
);

Step 4: Check the imported data

SELECT * FROM test_s3load;

Results:

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name | age |
+---------+-----------+------+
| 5 | Ava | 17 |
| 10 | Liam | 64 |
| 7 | Sophia | 32 |
| 9 | Emma | 37 |
| 1 | Emily | 25 |
| 4 | Alexander | 60 |
| 2 | Benjamin | 35 |
| 3 | Olivia | 28 |
| 6 | William | 69 |
| 8 | James | 64 |
+---------+-----------+------+
10 rows in set (0.04 sec)