INSERT

Description

The change statement is to complete the data insertion operation.

INSERT INTO table_name
    [ PARTITION (p1, ...) ]
    [ WITH LABEL label]
    [ (column [, ...]) ]
    [ [ hint [, ...] ] ]
    { VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }

Parameters

tablet_name: The destination table for importing data. Can be of the form db_name.table_name

partitions: Specify the partitions to be imported, which must be partitions that exist in table_name. Multiple partition names are separated by commas

label: specify a label for the Insert task

column_name: The specified destination column, must be a column that exists in table_name

expression: the corresponding expression that needs to be assigned to a column

DEFAULT: let the corresponding column use the default value

query: a common query, the result of the query will be written to the target

hint: some indicator used to indicate the execution behavior of INSERT. You can choose one of this values: /*+ STREAMING */, /*+ SHUFFLE */ or `/*+ NOSHUFFLE */.

STREAMING: At present, it has no practical effect and is only reserved for compatibility with previous versions. (In the previous version, adding this hint would return a label, but now it defaults to returning a label)

SHUFFLE: When the target table is a partition table, enabling this hint will do repartiiton.

NOSHUFFLE: Even if the target table is a partition table, repartiiton will not be performed, but some other operations will be performed to ensure that the data is correctly dropped into each partition.

For a Unique table with merge-on-write enabled, you can also perform partial columns updates using the insert statement. To perform partial column updates with the insert statement, you need to set the session variable enable_unique_key_partial_update to true (the default value for this variable is false, meaning partial columns updates with the insert statement are not allowed by default). When performing partial columns updates, the columns being inserted must contain at least all the Key columns and specify the columns you want to update. If the Key column values for the inserted row already exist in the original table, the data in the row with the same key column values will be updated. If the Key column values for the inserted row do not exist in the original table, a new row will be inserted into the table. In this case, columns not specified in the insert statement must either have default values or be nullable. These missing columns will first attempt to be populated with default values, and if a column has no default value, it will be filled with null. If a column cannot be null, the insert operation will fail.

Please note that the default value of the session variable enable_insert_strict, which controls whether the insert statement operates in strict mode, is true. In other words, the insert statement is in strict mode by default, and in this mode, updating non-existing keys in partial column updates is not allowed. Therefore, when using the insert statement for partial columns update and wishing to insert non-existing keys, you need to set enable_unique_key_partial_update to true and simultaneously set enable_insert_strict to false.

Notice:

When executing the INSERT statement, the default behavior is to filter the data that does not conform to the target table format, such as the string is too long. However, for business scenarios that require data not to be filtered, you can set the session variable enable_insert_strict to true to ensure that INSERT will not be executed successfully when data is filtered out.

Examples

The test table contains two columns c1, c2.

Import a row of data into the test table

INSERT INTO test VALUES (1, 2);
INSERT INTO test (c1, c2) VALUES (1, 2);
INSERT INTO test (c1, c2) VALUES (1, DEFAULT);
INSERT INTO test (c1) VALUES (1);

The first and second statements have the same effect. When no target column is specified, the column order in the table is used as the default target column. The third and fourth statements express the same meaning, use the default value of the c2 column to complete the data import.

Import multiple rows of data into the test table at one time

INSERT INTO test VALUES (1, 2), (3, 2 + 2);
INSERT INTO test (c1, c2) VALUES (1, 2), (3, 2 * 2);
INSERT INTO test (c1) VALUES (1), (3);
INSERT INTO test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);

The first and second statements have the same effect, import two pieces of data into the test table at one time The effect of the third and fourth statements is known, and the default value of the c2 column is used to import two pieces of data into the test table

Import a query result into the test table

INSERT INTO test SELECT * FROM test2;
INSERT INTO test (c1, c2) SELECT * from test2;

Import a query result into the test table, specifying the partition and label

INSERT INTO test PARTITION(p1, p2) WITH LABEL `label1` SELECT * FROM test2;
INSERT INTO test WITH LABEL `label1` (c1, c2) SELECT * from test2;

Keywords

INSERT

Best Practice

View the returned results

The INSERT operation is a synchronous operation, and the return of the result indicates the end of the operation. Users need to perform corresponding processing according to the different returned results.
1. The execution is successful, the result set is empty
  
  If the result set of the insert corresponding to the select statement is empty, it will return as follows:
```
mysql> insert into tbl1 select * from empty_tbl;
Query OK, 0 rows affected (0.02 sec)
```
  Query OK indicates successful execution. 0 rows affected means that no data was imported.
2. The execution is successful, the result set is not empty
  
  In the case where the result set is not empty. The returned results are divided into the following situations:
  1. Insert executes successfully and is visible:
    mysql> insert into tbl1 select * from tbl2; Query OK, 4 rows affected (0.38 sec) {'label':'insert_8510c568-9eda-4173-9e36-6adc7d35291c', 'status':'visible', 'txnId':'4005'} mysql> insert into tbl1 with label my_label1 select * from tbl2; Query OK, 4 rows affected (0.38 sec) {'label':'my_label1', 'status':'visible', 'txnId':'4005'} mysql> insert into tbl1 select * from tbl2; Query OK, 2 rows affected, 2 warnings (0.31 sec) {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'visible', 'txnId':'4005'} mysql> insert into tbl1 select * from tbl2; Query OK, 2 rows affected, 2 warnings (0.31 sec) {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
    Query OK indicates successful execution. 4 rows affected means that a total of 4 rows of data were imported. 2 warnings indicates the number of lines to be filtered.
    
    Also returns a json string:
    {'label':'my_label1', 'status':'visible', 'txnId':'4005'} {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'} {'label':'my_label1', 'status':'visible', 'txnId':'4005', 'err':'some other error'}
    label is a user-specified label or an automatically generated label. Label is the ID of this Insert Into import job. Each import job has a unique Label within a single database.
    
    status indicates whether the imported data is visible. Show visible if visible, committed if not visible.
    
    txnId is the id of the import transaction corresponding to this insert.
    
    The err field shows some other unexpected errors.
    
    When you need to view the filtered rows, the user can pass the following statement
    show load where label="xxx";
    The URL in the returned result can be used to query the wrong data. For details, see the summary of Viewing Error Lines later.
    
    Invisibility of data is a temporary state, this batch of data will eventually be visible
    
    You can view the visible status of this batch of data with the following statement:
    show transaction where id=4005;
    If the TransactionStatus column in the returned result is visible, the representation data is visible.
3. Execution failed
  
  Execution failure indicates that no data was successfully imported, and the following is returned:
```
mysql> insert into tbl1 select * from tbl2 where k1 = "a";
ERROR 1064 (HY000): all partitions have no load data. url: http://10.74.167.16:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0
```
  Where ERROR 1064 (HY000): all partitions have no load data shows the reason for the failure. The following url can be used to query the wrong data:
```
show load warnings on "url";
```
  You can view the specific error line.
Timeout time

The timeout for an INSERT operation is governed by max(insert_timeout, query_timeout). Both are environment variables, with insert_timeout defaulting to 4 hours and query_timeout defaulting to 5 minutes. If the operation exceeds the timeout, the job will be canceled. The introduction of insert_timeout is to ensure that INSERT statements have a longer default timeout, allowing import tasks to be unaffected by the shorter default timeout typically applied to regular queries.
Label and atomicity

The INSERT operation also guarantees the atomicity of imports, see the Import Transactions and Atomicity documentation.

When using CTE(Common Table Expressions) as the query part in an insert operation, the WITH LABEL and column parts must be specified.
Filter Threshold

Unlike other import methods, INSERT operations cannot specify a filter threshold (max_filter_ratio). The default filter threshold is 1, which means that rows with errors can be ignored.

For business scenarios that require data not to be filtered, you can set session variable enable_insert_strict to true to ensure that when there is data When filtered out, INSERT will not be executed successfully.
Performance issues

There is no single row insertion using the VALUES method. If you must use it this way, combine multiple rows of data into one INSERT statement for bulk commit.

Description​

Examples​

Keywords​

Best Practice​

Description

Examples

Keywords

Best Practice