ANN Resource Estimation Guide
ANN workloads are usually constrained by memory and CPU rather than raw storage. This guide provides a practical way to estimate cluster sizing before launch.
The method follows the same pattern commonly used by vector databases:
- Estimate index memory first.
- Estimate CPU cores based on target query performance.
- Reserve memory headroom for non-vector columns and execution overhead.
Why ANN Needs Explicit Capacity Planning
Compared with regular OLAP indexes, ANN has a few specific resource characteristics:
- Index build is CPU-intensive.
- Very large segments may cause index build failures (for example, due to out-of-memory during single-index construction).
- High-performance queries usually require indexes to stay resident in memory.
- High-QPS workloads need enough CPU cores to sustain distance computation and merge overhead.
To reduce memory usage, Doris supports vector quantization (sq8, sq4, pq). Quantization saves memory but may bring trade-offs:
- slower import (extra encoding),
- sometimes slower query (extra decode/reconstruction),
- reduced recall because quantization is lossy.
Step-by-Step Estimation
Prepare the following inputs:
- Vector dimension D
- Total row count N
- Index type (hnsw or ivf)
- Quantizer (flat, sq8, sq4, pq)
- HNSW parameter max_degree (if using HNSW)
- Target QPS and latency goal
Then estimate in this order:
- Index memory
- CPU cores
- Safety headroom
HNSW Memory Estimation
For HNSW with the default max_degree=32, a practical memory estimate is:
HNSW_FLAT_Bytes ~= 1.3 * D * 4 * N
Where:
- D * 4 * N is raw float32 vector memory
- 1.3 includes HNSW graph overhead
If max_degree is increased, scale graph overhead proportionally:
HNSW_factor ~= 1 + 0.3 * (max_degree / 32)
HNSW_FLAT_Bytes ~= HNSW_factor * D * 4 * N
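The two formulas above can be wrapped in a small helper. The following is a minimal Python sketch for back-of-the-envelope use only; the function name hnsw_flat_bytes and its parameter names are illustrative and not part of Doris, and GB is taken as 10^9 bytes.

```python
def hnsw_flat_bytes(dim: int, rows: int, max_degree: int = 32) -> float:
    """Approximate HNSW + FLAT index memory in bytes (hypothetical helper)."""
    # Raw float32 vectors: 4 bytes per dimension per row.
    raw_bytes = dim * 4 * rows
    # Graph overhead is about 0.3x at max_degree=32 and scales linearly with it.
    factor = 1 + 0.3 * (max_degree / 32)
    return factor * raw_bytes


if __name__ == "__main__":
    # D=768, N=1M, default max_degree: roughly the 4 GB shown in the quick reference.
    print(f"{hnsw_flat_bytes(768, 1_000_000) / 1e9:.2f} GB")
    # Doubling max_degree raises the factor from 1.3 to 1.6.
    print(f"{hnsw_flat_bytes(768, 1_000_000, max_degree=64) / 1e9:.2f} GB")
```

Treat the result as a planning figure and confirm it against observed memory on a test cluster.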
Quantizer-based approximations:
- sq8: about 1/4 of flat
- sq4: about 1/8 of flat
- pq: typically close to sq4 in memory (for example pq_m=D/2, pq_nbits=8)
Quick Reference (D=768, max_degree=32)
| Rows | FLAT | SQ8 | SQ4 | PQ (m=384, nbits=8) |
|---|---|---|---|---|
| 1M | 4 GB | 1 GB | 0.5 GB | 0.5 GB |
| 10M | 40 GB | 10 GB | 5 GB | 5 GB |
| 100M | 400 GB | 100 GB | 50 GB | 50 GB |
| 1B | 4000 GB | 1000 GB | 500 GB | 500 GB |
| 10B | 40000 GB | 10000 GB | 5000 GB | 5000 GB |
IVF Memory Estimation
IVF has lower structural overhead than HNSW. A practical approximation is:
IVF_FLAT_Bytes ~= D * 4 * N
Quantizer-based approximations:
- sq8: about 1/4 of flat
- sq4: about 1/8 of flat
- pq: typically close to sq4
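As a rough illustration, the IVF approximation and the quantizer ratios can be combined into one estimator. This is a sketch under the assumptions stated in this section: the pq ratio of 1/8 assumes pq is sized like sq4, GB means 10^9 bytes, and the names RATIO and ivf_bytes are made up for the example.

```python
# Approximate memory ratios relative to flat, taken from the list above.
# The pq entry assumes a configuration sized like sq4 (for example pq_m = D / 2, pq_nbits = 8).
RATIO = {"flat": 1.0, "sq8": 0.25, "sq4": 0.125, "pq": 0.125}


def ivf_bytes(dim: int, rows: int, quantizer: str = "flat") -> float:
    """Approximate IVF index memory in bytes (hypothetical helper)."""
    flat_bytes = dim * 4 * rows  # raw float32 vectors; IVF structural overhead is ignored
    return flat_bytes * RATIO[quantizer]


if __name__ == "__main__":
    for rows in (1_000_000, 10_000_000, 100_000_000):
        cells = ", ".join(
            f"{q}={ivf_bytes(768, rows, q) / 1e9:.2f} GB" for q in RATIO
        )
        print(f"{rows:>11,} rows: {cells}")
```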
Quick Reference (D=768)
| Rows | FLAT | SQ8 | SQ4 | PQ (m=384, nbits=8) |
|---|---|---|---|---|
| 1M | 3 GB | 0.75 GB | 0.35 GB | 0.35 GB |
| 10M | 30 GB | 7.5 GB | 3.5 GB | 3.5 GB |
| 100M | 300 GB | 75 GB | 35 GB | 35 GB |
| 1B | 3000 GB | 750 GB | 350 GB | 350 GB |
| 10B | 30000 GB | 7500 GB | 3500 GB | 3500 GB |
CPU Estimation
For high-QPS ANN search, a practical baseline ratio is:
16 cores : 64 GB memory (about 1 core : 4 GB)
When using quantization, CPU demand does not always shrink proportionally with index memory. In practice, estimate CPU from the FLAT-memory-equivalent workload, then tune down only after benchmark validation.
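The baseline ratio can be turned into a starting core count. The sketch below assumes the input is the FLAT-equivalent index memory (even when a quantizer is used), treats GB as 10^9 bytes, and rounds up; estimate_cores and gb_per_core are illustrative names, and the result is a starting point for benchmarking rather than a guarantee.

```python
import math


def estimate_cores(flat_equivalent_bytes: float, gb_per_core: float = 4.0) -> int:
    """Starting CPU core count from FLAT-equivalent index memory (hypothetical helper)."""
    flat_gb = flat_equivalent_bytes / 1e9
    # About 1 core per 4 GB of FLAT-equivalent memory, rounded up, with at least 1 core.
    return max(1, math.ceil(flat_gb / gb_per_core))


if __name__ == "__main__":
    # 64 GB of FLAT-equivalent index memory maps to the 16-core baseline.
    print(estimate_cores(64e9))
    # HNSW + FLAT, D=768, 10M rows: 1.3 * 768 * 4 * 10M bytes, about 40 GB.
    print(estimate_cores(1.3 * 768 * 4 * 10_000_000))
```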
Real-Query Headroom (Do Not Size to 100%)
The formulas above estimate ANN index memory only. Real SQL often returns extra columns, for example:
SELECT id, text, l2_distance_approximate(embedding, [...]) AS dist
FROM tbl
ORDER BY dist
LIMIT N;
Even with TopN delayed materialization, execution still needs memory for other operators and columns. To reduce risk in production:
- keep ANN index memory below about 70% of machine memory,
- reserve the remaining memory for query execution, compaction, and non-vector data access.
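The 70% guideline translates into a minimum total memory figure for a given index size. The following sketch shows the arithmetic; the function name min_total_memory_gb and the max_index_fraction parameter are hypothetical, and 0.7 is the approximate ceiling suggested above, not a hard limit.

```python
def min_total_memory_gb(index_gb: float, max_index_fraction: float = 0.7) -> float:
    """Minimum total memory so the ANN index stays under the given fraction."""
    if not 0 < max_index_fraction <= 1:
        raise ValueError("max_index_fraction must be in (0, 1]")
    return index_gb / max_index_fraction


if __name__ == "__main__":
    # A 40 GB index needs roughly 57 GB of total memory to stay under 70%.
    print(f"{min_total_memory_gb(40):.1f} GB")
```

The remaining memory is what query execution, compaction, and non-vector column reads have to share, so workloads that return wide columns may need a lower fraction.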
Sizing Recommendations by Scenario
- Highest performance, memory is not a concern: HNSW + FLAT.
- Memory-constrained deployments: HNSW/IVF + PQ (often better practical balance than SQ8/SQ4).
- For PQ parameterization, start from pq_m = D / 2, then tune by recall and latency targets.
- If query performance requirements are moderate, prioritize reducing CPU core count. In some deployments, you can provision higher CPU during import/build and scale down CPU afterward.