Skip to main content

ANN Index Management

The Approximate Nearest Neighbor (ANN) index in Apache Doris is used to perform efficient vector similarity searches on high-dimensional vector columns. Starting from Doris 4.x, the general index operation syntax covers ANN indexes. This document focuses on the SQL operation syntax and parameter descriptions related to ANN indexes.

Quick Navigation

You can jump to the corresponding section based on your use case:

ScenarioSection
Create an index on a vector column for the first timeCreate an ANN Index
Build an index offline on existing dataBuild an ANN Index
View existing indexes and their parameter settingsView an ANN Index
Drop indexes that are no longer neededDrop an ANN Index
Choose a specific algorithm such as HNSW, IVF, or a quantizerIndex Parameters

Prerequisites

Before creating an ANN index, confirm the following:

  • The data type of the vector column is ARRAY<FLOAT> NOT NULL.
  • A suitable metric type has been chosen: l2_distance (Euclidean distance) or inner_product.
  • An index algorithm (HNSW, IVF, or IVF On-Disk) has been chosen based on data scale and recall/performance requirements.

Create an ANN Index

Doris provides two ways to create an ANN index. Choose one based on whether the data has already been loaded:

MethodApplicable scenarioWhen the index is built
Define the index when creating the tableThe table has not been created yet, or the index needs to be built continuously as data is writtenBuilt synchronously during data load
Create and build the index separatelyThe table already exists with data, and the index needs to be added laterBuilt asynchronously via BUILD INDEX

Method 1: Define the Index When Creating the Table

Declare the ANN index directly with INDEX ... USING ANN after the column definitions in the CREATE TABLE statement. The index is built synchronously as data is loaded.

CREATE TABLE [IF NOT EXISTS] <table_name> (
<columns_definition>
INDEX <index_name> (<vector_column>) USING ANN PROPERTIES (
"<key>" = "<value>" [, ...]
)
)
...

Method 2: Create the Index Separately

For an existing table, use CREATE INDEX or ALTER TABLE ADD INDEX to add an ANN index, and then use BUILD INDEX to build it on the existing data.

CREATE INDEX [IF NOT EXISTS] <index_name>
ON <table_name> (<column_name>)
USING ANN
PROPERTIES ("<key>" = "<value>" [, ...])
[COMMENT '<index_comment>']

-- Or

ALTER TABLE <table_name> ADD INDEX <index_name>(<column_name>)
USING ANN
[PROPERTIES ("<key>" = "<value>" [, ...])]
[COMMENT '<index_comment>']

Index Parameters

The behavior of an ANN index is determined by the attributes in PROPERTIES, which fall into three categories: general attributes, index-algorithm-specific attributes, and quantizer-specific attributes.

General Attributes

The basic attributes that all ANN indexes need to configure:

AttributeDescriptionDefault
index_typeThe ANN index type. Available values: ivf, ivf_on_disk, hnsw-
metric_typeThe metric type. Available values: l2_distance (Euclidean distance), inner_product-
dimThe dimension of the vector column-
quantizerThe quantizer type. Available values: flat, sq4, sq8, pqflat

Index-Algorithm-Specific Attributes

IVF / IVF On-Disk

AttributeDescriptionDefault
nlistThe number of clusters (inverted lists). Required for both ivf and ivf_on_disk. A larger value yields higher recall but increases build time and resource consumption.1024

HNSW

AttributeDescriptionDefault
max_degreeThe maximum number of connections per node, which affects recall and query performance.32
ef_constructionThe size of the candidate queue during index construction. A larger value yields a higher-quality graph but increases build time.40

Quantizer-Specific Attributes

quantizer is used to compress vector storage. The differences between quantizers are as follows:

QuantizerMeaningExtra parameters
flatNo quantization. Vectors are stored as the original 32-bit floating-point numbers.None
sq4Scalar Quantization. Each dimension is stored as a 4-bit integer instead of a 32-bit float.None
sq8Scalar Quantization. Each dimension is stored as an 8-bit integer instead of a 32-bit float.None
pqProduct Quantization. The vector is split into several sub-vectors, which are quantized separately.pq_m and pq_nbits are required

Extra Parameters for Product Quantization (PQ)

AttributeDescription
pq_mThe number of sub-vectors. The vector dimension dim must be divisible by pq_m.
pq_nbitsThe number of bits per sub-vector. In faiss, pq_nbits is typically required to be no greater than 24.

Index Creation Examples

The following SQL examples cover common combinations and can be used as templates.

Declare an HNSW Index When Creating the Table

CREATE TABLE tbl_ann (
id int NOT NULL,
embedding array<float> NOT NULL,
INDEX ann_index (embedding) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128"
)
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

IVF Index

CREATE INDEX ann_ivf_index ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
"index_type"="ivf",
"metric_type"="l2_distance",
"dim"="128",
"nlist"="1024"
);

HNSW Index

CREATE INDEX ann_hnsw_index ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128",
"max_degree"="32",
"ef_construction"="40"
);

HNSW + SQ

CREATE INDEX ann_hnsw_sq ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128",
"max_degree"="32",
"ef_construction"="40",
"quantizer"="sq8"
);

HNSW + PQ

CREATE INDEX ann_hnsw_pq ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128",
"max_degree"="32",
"ef_construction"="40",
"quantizer"="pq",
"pq_m"="8",
"pq_nbits"="8"
);

IVF + SQ

CREATE INDEX ann_ivf_sq ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
"index_type"="ivf",
"metric_type"="l2_distance",
"dim"="128",
"nlist"="1024",
"quantizer"="sq8"
);

IVF + PQ

CREATE INDEX ann_ivf_pq ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
"index_type"="ivf",
"metric_type"="l2_distance",
"dim"="128",
"nlist"="1024",
"quantizer"="pq",
"pq_m"="8",
"pq_nbits"="8"
);

Build an ANN Index

For indexes that are created separately via CREATE INDEX or ALTER TABLE ADD INDEX, use BUILD INDEX to build them on existing data. This operation runs asynchronously.

Trigger a Build

BUILD INDEX <index_name> ON <table_name> [PARTITION (<partition_name> [, ...])]

Monitor Build Progress

Use SHOW BUILD INDEX to view the progress and status of index build tasks:

-- View the progress of all BUILD INDEX tasks (a database can be specified)
SHOW BUILD INDEX [FROM db_name];

-- View the progress of BUILD INDEX tasks for a specific table
SHOW BUILD INDEX WHERE TableName = "<table_name>";

The output contains columns such as JobId, TableName, State (for example, FINISHED or RUNNING), and Progress. Example:

mysql> show build index where TableName = "sift_1M";
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress |
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
| 1764579876673 | sift_1M | sift_1M | [ADD INDEX idx_test_ann (`embedding`) USING ANN PROPERTIES("dim" = "128", "index_type" = "ivf", "metric_type" = "l2_distance", "nlist" = "1024")], | 2025-12-01 17:59:54.277 | 2025-12-01 17:59:56.987 | 82 | FINISHED | | NULL |
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
1 row in set (0.00 sec)

Cancel a Build

To cancel an index build task that is in progress:

CANCEL BUILD INDEX ON <table_name> [(<job_id> [, ...])]

View an ANN Index

You can view index information through SHOW INDEX or SHOW CREATE TABLE:

SHOW INDEX[ES] FROM [<db_name>.]<table_name> [FROM <db_name>]

-- Or

SHOW CREATE TABLE [<db_name>.]<table_name>

The output of SHOW INDEX includes columns such as Table, Key_name, Index_type (which is shown as ANN for ANN indexes), and Properties (which contains the full index configuration). Example:

mysql> SHOW INDEX FROM sift_1M;
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Properties |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
| sift_1M | | idx_test_ann | | embedding | | | | | | ANN | | ("dim" = "128", "index_type" = "ivf", "metric_type" = "l2_distance", "nlist" = "1024") |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Drop an ANN Index

Use DROP INDEX or ALTER TABLE DROP INDEX to drop an existing ANN index:

DROP INDEX [IF EXISTS] <index_name> ON [<db_name>.]<table_name>

-- Or

ALTER TABLE [<db_name>.]<table_name> DROP INDEX <index_name>