Skip to main content

Schema Change

You can modify the schema of a Doris table through ALTER TABLE. This article focuses on column-related schema changes. For index-related changes, refer to Table Index for the modification methods of different indexes.

Two Types of Schema Change

Doris supports two types of Schema Change operations: lightweight and heavyweight. The two differ significantly in execution complexity, speed, and resource consumption. Before choosing, you can refer to the table below for a quick decision:

FeatureLightweight Schema ChangeHeavyweight Schema Change
Execution speedSeconds (almost real-time)Minutes / hours / days (depends on data volume; the larger the volume, the slower)
Requires data rewriteNo, only metadata is modifiedYes, involves rewriting data files
System performance impactSmall impactMay affect system performance, especially during data conversion
Resource consumptionLowHigh, occupies compute resources; storage usage of table data doubles during the process
Typical operationsAdd / drop value columns, rename columns, modify VARCHAR lengthModify column data types, change primary keys, modify column order, etc.

Lightweight Schema Change

Only modifies metadata, without involving physical modification of data files. It typically completes in seconds and has minimal impact on system performance. It includes:

  • Adding or dropping value columns.
  • Renaming columns.
  • Modifying the length of VARCHAR columns (except key columns of UNIQUE and DUP tables).

Heavyweight Schema Change

Involves rewriting or converting data files, with the actual modification or reorganization performed by the Backend (BE) in the background. All operations not in the lightweight category are heavyweight, for example:

  • Modifying the data type of a column.
  • Modifying the sort order of columns.

Execution flow:

  1. The background starts a data conversion job, rewriting the original data into new data files in units of tablets.
  2. During conversion, "double writing" occurs: new data is written to both the new tablet and the old tablet.
  3. After conversion completes, the old tablet is deleted, and the new tablet takes over.

Usage Examples

The table below lists common operation scenarios and the corresponding entry points. Refer to them as needed:

User scenarioOperation syntax
Rename a columnRENAME COLUMN
Add a key/value columnADD COLUMN
Add multiple columns at onceADD COLUMN (...)
Drop a columnDROP COLUMN
Modify column type / positionMODIFY COLUMN
Reorder columnsORDER BY

Rename a Column

ALTER TABLE [database.]table RENAME COLUMN old_column_name new_column_name;

For specific syntax, refer to ALTER TABLE RENAME.

Add a Column

Notes:

  • When adding a value column to an aggregate model, you must specify agg_type.
  • When adding a key column to a non-aggregate model (such as DUPLICATE KEY), you must specify the KEY keyword.

Example 1: Adding a Column to a Non-Aggregate Table

  1. Table creation statement:

    CREATE TABLE IF NOT EXISTS example_db.my_table(
    col1 int,
    col2 int,
    col3 int,
    col4 int,
    col5 int
    ) DUPLICATE KEY(col1, col2, col3)
    DISTRIBUTED BY RANDOM BUCKETS 10;
  2. Add the key column key_col after col1:

    ALTER TABLE example_db.my_table ADD COLUMN key_col INT KEY DEFAULT "0" AFTER col1;
  3. Add the value column value_col after col4:

    ALTER TABLE example_db.my_table ADD COLUMN value_col INT DEFAULT "0" AFTER col4;

Example 2: Adding a Column to an Aggregate Table

  1. Table creation statement:

    CREATE TABLE IF NOT EXISTS example_db.my_table(
    col1 int,
    col2 int,
    col3 int,
    col4 int SUM,
    col5 varchar(32) REPLACE DEFAULT "abc"
    ) AGGREGATE KEY(col1, col2, col3)
    DISTRIBUTED BY HASH(col1) BUCKETS 10;
  2. Add the key column key_col after col1:

    ALTER TABLE example_db.my_table ADD COLUMN key_col INT DEFAULT "0" AFTER col1;
  3. Add the value column value_col after col4 with the SUM aggregate type:

    ALTER TABLE example_db.my_table ADD COLUMN value_col INT SUM DEFAULT "0" AFTER col4;

Add Multiple Columns

Notes:

  • When adding value columns to an aggregate model, you must specify agg_type.
  • When adding key columns to an aggregate model, you must specify the KEY keyword.

Add multiple columns to an aggregate table:

  1. Table creation statement:

    CREATE TABLE IF NOT EXISTS example_db.my_table(
    col1 int,
    col2 int,
    col3 int,
    col4 int SUM,
    col5 varchar(32) REPLACE DEFAULT "abc"
    ) AGGREGATE KEY(col1, col2, col3)
    DISTRIBUTED BY HASH(col1) BUCKETS 10;
  2. Add multiple columns at once:

    ALTER TABLE example_db.my_table
    ADD COLUMN (c1 INT DEFAULT "1", c2 FLOAT SUM DEFAULT "0");

Drop a Column

Notes:

  • Partition columns cannot be dropped.
  • Key columns of a UNIQUE table cannot be dropped.

Drop a column from example_db.my_table:

  1. Table creation statement:

    CREATE TABLE IF NOT EXISTS example_db.my_table(
    col1 int,
    col2 int,
    col3 int,
    col4 int SUM,
    col5 varchar(32) REPLACE DEFAULT "abc"
    ) AGGREGATE KEY(col1, col2, col3)
    DISTRIBUTED BY HASH(col1) BUCKETS 10;
  2. Drop the col4 column from example_db.my_table:

    ALTER TABLE example_db.my_table DROP COLUMN col4;

Modify Column Type and Position

Notes:

  • When modifying a value column in an aggregate model, you must specify agg_type.
  • When modifying a key column in a non-aggregate model, you must specify the KEY keyword.
  • You can only modify the type of a column. Other column attributes must remain unchanged.
  • Partition columns and bucketing columns cannot be modified in any way.
  • Pay attention to precision loss when modifying columns. For supported type conversions, see Supported Type Conversions below.

Example:

  1. Table creation statement:

    CREATE TABLE IF NOT EXISTS example_db.my_table(
    col0 int,
    col1 int DEFAULT "1",
    col2 int,
    col3 varchar(32),
    col4 int SUM,
    col5 varchar(32) REPLACE DEFAULT "abc"
    ) AGGREGATE KEY(col0, col1, col2, col3)
    DISTRIBUTED BY HASH(col0) BUCKETS 10;
  2. Change the type of the key column col1 to BIGINT and move it after col2 (whether modifying a key column or a value column, you must declare the complete column information):

    ALTER TABLE example_db.my_table
    MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;
  3. Modify the maximum length of the col5 column in the base table. The original col5 is VARCHAR(32) REPLACE DEFAULT "abc" (you can only modify the column type; other attributes must remain unchanged):

    ALTER TABLE example_db.my_table
    MODIFY COLUMN col5 VARCHAR(64) REPLACE DEFAULT "abc";
  4. Modify the length of a field in a key column:

    ALTER TABLE example_db.my_table
    MODIFY COLUMN col3 varchar(50) KEY NULL COMMENT 'to 50';

Supported Type Conversions

Pay attention to precision loss when modifying column types. The following conversions are currently supported:

Source typeTarget typeDescription
TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLEA numeric type with a larger range-
TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DECIMALVARCHAR-
VARCHARVARCHAROnly modifying the maximum length is supported
VARCHAR/CHARTINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE-
VARCHAR/CHARDATESix formats are supported: %Y-%m-%d, %y-%m-%d, %Y%m%d, %y%m%d, %Y/%m/%d, %y/%m/%d
DATETIMEDATEOnly year-month-day is retained, for example, 2019-12-09 21:47:05 becomes 2019-12-09
DATEDATETIMEHours, minutes, and seconds are automatically padded with zeros, for example, 2019-12-09 becomes 2019-12-09 00:00:00
FLOATDOUBLE-
INTDATEIf the INT data is invalid, the conversion fails and the original data remains unchanged
Types other than DATE and DATETIMESTRINGSTRING cannot be converted to any other type

Reorder Columns

Notes:

  • All columns in the table must be listed.
  • Value columns must be placed after key columns.

Example:

  1. Table creation statement:

    CREATE TABLE IF NOT EXISTS example_db.my_table(
    k1 int DEFAULT "1",
    k2 int,
    k3 varchar(32),
    k4 date,
    v1 int SUM,
    v2 int MAX,
    ) AGGREGATE KEY(k1, k2, k3, k4)
    DISTRIBUTED BY HASH(k1) BUCKETS 10;
  2. Reorder the columns in example_db.my_table:

    ALTER TABLE example_db.my_table
    ORDER BY (k3, k1, k2, k4, v2, v1);

Job Management

View Jobs

Use the SHOW ALTER TABLE COLUMN command to view the progress of Schema Change jobs. It shows jobs that are currently running or have completed. When a job involves materialized views, this command displays multiple rows, one for each materialized view. Example:

mysql > SHOW ALTER TABLE COLUMN\G;
*************************** 1. row ***************************
JobId: 20021
TableName: tbl1
CreateTime: 2019-08-05 23:03:13
FinishTime: 2019-08-05 23:03:42
IndexName: tbl1
IndexId: 20022
OriginIndexId: 20017
SchemaVersion: 2:792557838
TransactionId: 10023
State: FINISHED
Msg:
Progress: NULL
Timeout: 86400
1 row in set (0.00 sec)

Cancel Jobs

When the job state is not FINISHED or CANCELLED, you can cancel a Schema Change job with the following command:

CANCEL ALTER TABLE COLUMN FROM tbl_name;

Limitations

CategoryLimitation
ConcurrencyA single table can have only one Schema Change job running at a time
Immutable columnsPartition columns and bucketing columns cannot be modified
Key column deletionWhen an aggregate table contains value columns aggregated with REPLACE, key columns cannot be deleted; key columns of a Unique table also cannot be deleted
Aggregate default valueWhen adding a SUM or REPLACE type value column, because the historical data has lost its detail information, the default value cannot actually reflect the post-aggregation value and is meaningless for historical data
Type modificationOnly the column type can be modified; other fields such as the aggregation method, Nullable, and default value must be filled in according to the original column information
Unsupported modificationsModifying the aggregation type, Nullable attribute, and default value is not supported

FE Configuration

Configuration itemDefault valueDescription
alter_table_timeout_second86400 (seconds)Default timeout for jobs

BE Configuration

Configuration itemDefault valueDescription
alter_tablet_worker_count3The number of threads on the BE side used to convert historical data. You can increase this value appropriately to speed up Schema Change jobs, but too many threads may increase IO pressure and affect other operations
alter_index_worker_count3The number of threads on the BE side used to build indexes on historical data (currently only inverted indexes are supported). Adjustment recommendations are the same as above

After adjusting BE configuration, BE must be restarted for the changes to take effect.