local-file
{ "title": "local file", "language": "en" }β
Doris provides multiple ways to load data from local sources:
1. Stream Loadβ
Load local files or data streams into Doris via HTTP protocol. Supports CSV, JSON, Parquet, and ORC formats. For more information, refer to the Stream Load documentation.
2. Streamloader Toolβ
Streamloader is a dedicated client tool based on Stream Load, supporting concurrent loads, making it suitable for large data loads. For more information, refer to the Streamloader documentation.
3. MySQL Loadβ
Doris is compatible with MySQL protocol and supports using the standard LOAD DATA syntax to load local files, suitable for loading CSV files.
Using Stream Load to Load Dataβ
Step 1: Prepare Dataβ
Create a CSV file named streamload_example.csv
with the following content:
1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
Step 2: Create Tableβ
Create a table in Doris with the following syntax:
CREATE TABLE testdb.test_streamload(
user_id BIGINT NOT NULL COMMENT "User ID",
name VARCHAR(20) COMMENT "User Name",
age INT COMMENT "User Age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;
Step 3: Load Dataβ
Submit a Stream Load job using curl
:
curl --location-trusted -u <doris_user>:<doris_password> \
-H "column_separator:," \
-H "columns:user_id,name,age" \
-T streamload_example.csv \
-XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
Example of load result:
{
"TxnId": 3,
"Status": "Success",
"NumberTotalRows": 10,
"NumberLoadedRows": 10
}
Step 4: Check Loaded Dataβ
mysql> SELECT COUNT(*) FROM testdb.test_streamload;
+----------+
| count(*) |
+----------+
| 10 |
+----------+
Using Streamloader Tool to Load Dataβ
Step 1: Prepare Dataβ
Create a CSV file named streamloader_example.csv
with the same content as above.
Step 2: Create Tableβ
Create the table in Doris with the same syntax as above.
Step 3: Load Dataβ
Use the Streamloader tool to load data:
doris-streamloader --source_file="streamloader_example.csv" --url="http://localhost:8330" --header="column_separator:," --db="testdb" --table="test_streamloader"
Example of load result:
Load Result: {
"Status": "Success",
"TotalRows": 10,
"LoadedRows": 10
}
Step 4: Check Loaded Dataβ
mysql> SELECT COUNT(*) FROM testdb.test_streamloader;
+----------+
| count(*) |
+----------+
| 10 |
+----------+
Using MySQL Load to Load Dataβ
Step 1: Prepare Dataβ
Create a file named client_local.csv
with the following sample data:
1,10
2,20
3,30
4,40
5,50
6,60
Step 2: Connect to MySQL Clientβ
mysql --local-infile -h <fe_ip> -P <fe_query_port> -u root -D testdb
Step 3: Load Dataβ
Execute the MySQL Load command:
LOAD DATA LOCAL
INFILE 'client_local.csv'
INTO TABLE testdb.t1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n';
Step 4: Check Loaded Dataβ
If the load is successful, the result will be displayed as follows:
Query OK, 6 row affected (0.17 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0