ANN Index Management
Overview
Approximate Nearest Neighbor (ANN) indexes in Apache Doris enable efficient vector similarity search for high-dimensional data. Since from Doris 4.x, the universal index operation syntax has also been extended to cover ANN indexes. This article will introduce the specific SQL syntax for ANN index-related operations and provide detailed parameter explanations.
ANN indexes are built on vector columns (typically ARRAY<FLOAT> NOT NULL type) and support distance metrics include L2 distance and inner product.
Creating ANN Indexes
ANN indexes can be created using the CREATE INDEX statement with USING ANN. There are two main approaches:
- Define the index during table creation: The index is built synchronously as data is loaded.
Syntax
CREATE TABLE [IF NOT EXISTS] <table_name> (
<columns_definition>
INDEX <index_name> (<vector_column) USING ANN PROPERTIES (
"<key>" = "<value>" [, ...]
)
)
[ <key_type> KEY (<key_cols>)
[ CLUSTER BY (<cluster_cols>) ]
]
[ COMMENT '<table_comment>' ]
[ <partitions_definition> ]
[ DISTRIBUTED BY { HASH (<distribute_cols>) | RANDOM }
[ BUCKETS { <bucket_count> | AUTO } ]
]
[ <roll_up_definition> ]
[ PROPERTIES (
-- Table property
<table_property>
-- Additional table properties
[ , ... ])
]
- Create the index separately: Define the index first, then build it on existing data using
BUILD INDEX.
Syntax
CREATE INDEX [IF NOT EXISTS] <index_name>
ON <table_name> (<column_name>)
USING ANN
PROPERTIES ("<key>" = "<value>" [, ...])
[COMMENT '<index_comment>']
-- or
ALTER TABLE <table_name> ADD INDEX <index_name>(<column_name>)
USING ANN
[PROPERTIES("<key>" = "<value>" [, ...])]
[COMMENT '<index_comment>']
General Properties
index_type: The type of ANN index. Supported values:"ivf"or"hnsw".metric_type: The distance metric. Supported values:"l2_distance","inner_product".dim: The dimension of the vector column.quantizer: The quantizer type. Supported values:flat,sq4,sq8,pq. Default toflatwhen not specified.
Index-Specific Properties
IVF Index Properties
nlist: Number of clusters (inverted lists). Default: 1024. Higher values improve recall but increase build time and memory usage.
HNSW Index Properties
max_degree: Maximum number of connections per node. Default: 32. Affects recall and query performance.ef_construction: Size of the candidate queue during index construction. Default: 40. Higher values improve graph quality but increase build time.
Quantization-Specific Properties
for quantizer property:
sq4: Scalar Quantization (SQ), uses 4-bit integers instead of the typical 32-bit floating point numbers to store each dimension value of a vector.sq8: Scalar Quantization (SQ), uses 8-bit integers instead of the typical 32-bit floating point numbers to store each dimension value of a vector.pq: Product Quantization (PQ), two additional parameters,pq_mandpq_nbitsare required in the properties.
Product Quantization Properties
pq_m: Specifies how many subvectors are used (vector dimension dim must be divisible by pq_m).pq_nbits: The number of bits used to represent each subvector, in faiss pq_nbits is generally required to be no greater than 24.
Examples
Create table with ANN index
CREATE TABLE tbl_ann (
id int NOT NULL,
embedding array<float> NOT NULL,
INDEX ann_index (embedding) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128"
)
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
IVF Index
CREATE INDEX ann_ivf_index ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
"index_type"="ivf",
"metric_type"="l2_distance",
"dim"="128",
"nlist"="1024"
);
HNSW Index
CREATE INDEX ann_hnsw_index ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128",
"max_degree"="32",
"ef_construction"="40"
);
HNSW Index with SQ
CREATE INDEX ann_hnsw_sq ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128",
"max_degree"="32",
"ef_construction"="40",
"quantizer"="sq8"
);
HNSW Index with PQ
CREATE INDEX ann_hnsw_pq ON tbl_hnsw (`embedding`) USING ANN PROPERTIES(
"index_type"="hnsw",
"metric_type"="l2_distance",
"dim"="128",
"max_degree"="32",
"ef_construction"="40",
"quantizer"="pq",
"pq_m"="8",
"pq_nbits"="8"
);
IVF Index with SQ
CREATE INDEX ann_ivf_sq ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
"index_type"="ivf",
"metric_type"="l2_distance",
"dim"="128",
"nlist"="1024",
"quantizer"="sq8"
);
IVF Index with PQ
CREATE INDEX ann_ivf_pq ON tbl_ivf (`embedding`) USING ANN PROPERTIES(
"index_type"="ivf",
"metric_type"="l2_distance",
"dim"="128",
"nlist"="1024",
"quantizer"="pq",
"pq_m"="8",
"pq_nbits"="8"
);
Building ANN Indexes
For indexes created separately (not during table creation), use BUILD INDEX to build the index on existing data. This operation is asynchronous.
Syntax
BUILD INDEX <index_name> ON <table_name> [PARTITION (<partition_name> [, ...])]
Monitoring Build Progress
Use SHOW BUILD INDEX to check the status of index build jobs.
-- view all the progress of BUILD INDEX tasks [for a specific database]
SHOW BUILD INDEX [FROM db_name];
-- view the progress of BUILD INDEX tasks for a specific table
SHOW BUILD INDEX WHERE TableName = "<table_name>";
The output includes columns such as JobId, TableName, State (e.g., FINISHED, RUNNING), and Progress, for example:
mysql> show build index where TableName = "sift_1M";
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress |
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
| 1764579876673 | sift_1M | sift_1M | [ADD INDEX idx_test_ann (`embedding`) USING ANN PROPERTIES("dim" = "128", "index_type" = "ivf", "metric_type" = "l2_distance", "nlist" = "1024")], | 2025-12-01 17:59:54.277 | 2025-12-01 17:59:56.987 | 82 | FINISHED | | NULL |
+---------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
1 row in set (0.00 sec)
Canceling Index Builds
To cancel an ongoing index build:
CANCEL BUILD INDEX ON <table_name> [(<job_id> [, ...])]
Dropping ANN Indexes
Remove an ANN index using DROP INDEX.
Syntax
DROP INDEX [IF EXISTS] <index_name> ON [<db_name>.]<table_name>
-- or
ALTER TABLE [<db_name>.]<table_name> DROP INDEX <index_name>
Showing ANN Indexes
Display information about indexes on a table using SHOW INDEX or SHOW CREATE TABLE.
Syntax
SHOW INDEX[ES] FROM [<db_name>.]<table_name> [FROM <db_name>]
-- or
SHOW CREATE TABLE [<db_name>.]<table_name>
Example Output
mysql> SHOW INDEX FROM sift_1M;
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Properties |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
| sift_1M | | idx_test_ann | | embedding | | | | | | ANN | | ("dim" = "128", "index_type" = "ivf", "metric_type" = "l2_distance", "nlist" = "1024") |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+----------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
The output includes columns such as Table, Key_name, Index_type (shows ANN for ANN indexes), and Properties (containing the index configuration).