ANN Index Management

The Approximate Nearest Neighbor (ANN) index in Apache Doris is used to perform efficient vector similarity searches on high-dimensional vector columns. Starting from Doris 4.x, the general index operation syntax covers ANN indexes. This document focuses on the SQL operation syntax and parameter descriptions related to ANN indexes.

You can jump to the corresponding section based on your use case:

Scenario	Section
Create an index on a vector column for the first time	Create an ANN Index
Build an index offline on existing data	Build an ANN Index
View existing indexes and their parameter settings	View an ANN Index
Drop indexes that are no longer needed	Drop an ANN Index
Choose a specific algorithm such as HNSW, IVF, or a quantizer	Index Parameters

Prerequisites

Before creating an ANN index, confirm the following:

The data type of the vector column is ARRAY<FLOAT> NOT NULL.
A suitable metric type has been chosen: l2_distance (Euclidean distance) or inner_product.
An index algorithm (HNSW, IVF, or IVF On-Disk) has been chosen based on data scale and recall/performance requirements.

Create an ANN Index

Doris provides two ways to create an ANN index. Choose one based on whether the data has already been loaded:

Method	Applicable scenario	When the index is built
Define the index when creating the table	The table has not been created yet, or the index needs to be built continuously as data is written	Built synchronously during data load
Create and build the index separately	The table already exists with data, and the index needs to be added later	Built asynchronously via `BUILD INDEX`

Method 1: Define the Index When Creating the Table

Declare the ANN index directly with INDEX ... USING ANN after the column definitions in the CREATE TABLE statement. The index is built synchronously as data is loaded.

CREATE TABLE [IF NOT EXISTS] <table_name> (
    <columns_definition>
    INDEX <index_name> (<vector_column>) USING ANN PROPERTIES (
        "<key>" = "<value>" [, ...]
    )
)
...

Method 2: Create the Index Separately

For an existing table, use CREATE INDEX or ALTER TABLE ADD INDEX to add an ANN index, and then use BUILD INDEX to build it on the existing data.

CREATE INDEX [IF NOT EXISTS] <index_name>
             ON <table_name> (<column_name>)
             USING ANN
             PROPERTIES ("<key>" = "<value>" [, ...])
             [COMMENT '<index_comment>']

-- Or

ALTER TABLE <table_name> ADD INDEX <index_name>(<column_name>)
             USING ANN
             [PROPERTIES ("<key>" = "<value>" [, ...])]
             [COMMENT '<index_comment>']

Index Parameters

The behavior of an ANN index is determined by the attributes in PROPERTIES, which fall into three categories: general attributes, index-algorithm-specific attributes, and quantizer-specific attributes.

General Attributes

The basic attributes that all ANN indexes need to configure:

Attribute	Description	Default
`index_type`	The ANN index type. Available values: `ivf`, `ivf_on_disk`, `hnsw`	-
`metric_type`	The metric type. Available values: `l2_distance` (Euclidean distance), `inner_product`	-
`dim`	The dimension of the vector column	-
`quantizer`	The quantizer type. Available values: `flat`, `sq4`, `sq8`, `pq`	`flat`

Index-Algorithm-Specific Attributes

IVF / IVF On-Disk

Attribute	Description	Default
`nlist`	The number of clusters (inverted lists). Required for both `ivf` and `ivf_on_disk`. A larger value yields higher recall but increases build time and resource consumption.	`1024`

HNSW

Attribute	Description	Default
`max_degree`	The maximum number of connections per node, which affects recall and query performance.	`32`
`ef_construction`	The size of the candidate queue during index construction. A larger value yields a higher-quality graph but increases build time.	`40`

Quantizer-Specific Attributes

quantizer is used to compress vector storage. The differences between quantizers are as follows:

Quantizer	Meaning	Extra parameters
`flat`	No quantization. Vectors are stored as the original 32-bit floating-point numbers.	None
`sq4`	Scalar Quantization. Each dimension is stored as a 4-bit integer instead of a 32-bit float.	None
`sq8`	Scalar Quantization. Each dimension is stored as an 8-bit integer instead of a 32-bit float.	None
`pq`	Product Quantization. The vector is split into several sub-vectors, which are quantized separately.	`pq_m` and `pq_nbits` are required

Extra Parameters for Product Quantization (PQ)

Attribute	Description
`pq_m`	The number of sub-vectors. The vector dimension `dim` must be divisible by `pq_m`.
`pq_nbits`	The number of bits per sub-vector. In faiss, `pq_nbits` is typically required to be no greater than 24.

Index Creation Examples

The following SQL examples cover common combinations and can be used as templates.

Declare an HNSW Index When Creating the Table

CREATE TABLE tbl_ann (
    id int NOT NULL,
    embedding array<float> NOT NULL,
    INDEX ann_index (embedding) USING ANN PROPERTIES(
        "index_type"="hnsw",
        "metric_type"="l2_distance",
        "dim"="128"
    )
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

IVF Index

CREATE INDEX ann_ivf_index ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
    "index_type"="ivf",
    "metric_type"="l2_distance",
    "dim"="128",
    "nlist"="1024"
);

HNSW Index

CREATE INDEX ann_hnsw_index ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
    "index_type"="hnsw",
    "metric_type"="l2_distance",
    "dim"="128",
    "max_degree"="32",
    "ef_construction"="40"
);

HNSW + SQ

CREATE INDEX ann_hnsw_sq ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
    "index_type"="hnsw",
    "metric_type"="l2_distance",
    "dim"="128",
    "max_degree"="32",
    "ef_construction"="40",
    "quantizer"="sq8"
);

HNSW + PQ

CREATE INDEX ann_hnsw_pq ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
    "index_type"="hnsw",
    "metric_type"="l2_distance",
    "dim"="128",
    "max_degree"="32",
    "ef_construction"="40",
    "quantizer"="pq",
    "pq_m"="8",
    "pq_nbits"="8"
);

IVF + SQ

CREATE INDEX ann_ivf_sq ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
    "index_type"="ivf",
    "metric_type"="l2_distance",
    "dim"="128",
    "nlist"="1024",
    "quantizer"="sq8"
);

IVF + PQ

CREATE INDEX ann_ivf_pq ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
    "index_type"="ivf",
    "metric_type"="l2_distance",
    "dim"="128",
    "nlist"="1024",
    "quantizer"="pq",
    "pq_m"="8",
    "pq_nbits"="8"
);

Build an ANN Index

For indexes that are created separately via CREATE INDEX or ALTER TABLE ADD INDEX, use BUILD INDEX to build them on existing data. This operation runs asynchronously.

Trigger a Build

BUILD INDEX <index_name> ON <table_name> [PARTITION (<partition_name> [, ...])]

Monitor Build Progress

Use SHOW BUILD INDEX to view the progress and status of index build tasks:

-- View the progress of all BUILD INDEX tasks (a database can be specified)
SHOW BUILD INDEX [FROM db_name];

-- View the progress of BUILD INDEX tasks for a specific table
SHOW BUILD INDEX WHERE TableName = "<table_name>";

The output contains columns such as JobId, TableName, State (for example, FINISHED or RUNNING), and Progress. Example:

mysql> show build index where TableName = "sift_1M";
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
| JobId         | TableName | PartitionName | AlterInvertedIndexes                                                                                                                                | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
| 1764579876673 | sift_1M   | sift_1M       | [ADD INDEX idx_test_ann (`embedding`) USING ANN PROPERTIES("dim" = "128", "index_type" = "ivf", "metric_type" = "l2_distance", "nlist" = "1024")],  | 2025-12-01 17:59:54.277 | 2025-12-01 17:59:56.987 | 82            | FINISHED |      | NULL     |
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
1 row in set (0.00 sec)

Cancel a Build

To cancel an index build task that is in progress:

CANCEL BUILD INDEX ON <table_name> [(<job_id> [, ...])]

View an ANN Index

You can view index information through SHOW INDEX or SHOW CREATE TABLE:

SHOW INDEX[ES] FROM [<db_name>.]<table_name> [FROM <db_name>]

-- Or

SHOW CREATE TABLE [<db_name>.]<table_name>

The output of SHOW INDEX includes columns such as Table, Key_name, Index_type (which is shown as ANN for ANN indexes), and Properties (which contains the full index configuration). Example:

mysql> SHOW INDEX FROM sift_1M;
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
| Table   | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Properties                                                                             |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
| sift_1M |            | idx_test_ann |              | embedding   |           |             |          |        |      | ANN        |         | ("dim" = "128", "index_type" = "ivf", "metric_type" = "l2_distance", "nlist" = "1024") |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Drop an ANN Index

Use DROP INDEX or ALTER TABLE DROP INDEX to drop an existing ANN index:

DROP INDEX [IF EXISTS] <index_name> ON [<db_name>.]<table_name>

-- Or

ALTER TABLE [<db_name>.]<table_name> DROP INDEX <index_name>

Quick Navigation​

Prerequisites​

Create an ANN Index​

Method 1: Define the Index When Creating the Table​

Method 2: Create the Index Separately​

Index Parameters​

General Attributes​

Index-Algorithm-Specific Attributes​

IVF / IVF On-Disk​

HNSW​

Quantizer-Specific Attributes​

Extra Parameters for Product Quantization (PQ)​

Index Creation Examples​

Declare an HNSW Index When Creating the Table​

IVF Index​

HNSW Index​

HNSW + SQ​

HNSW + PQ​

IVF + SQ​

IVF + PQ​

Build an ANN Index​

Trigger a Build​

Monitor Build Progress​

Cancel a Build​

View an ANN Index​

Drop an ANN Index​

Quick Navigation