
APPROX_COUNT_DISTINCT

Description

The APPROX_COUNT_DISTINCT function is implemented with the HyperLogLog algorithm, which uses a fixed amount of memory to estimate the cardinality of a column (the number of distinct values). The algorithm relies on an assumption about the distribution of trailing zero bits in hashed values, so its accuracy depends on the data distribution. With the fixed bucket size used by Doris, the relative standard error of the algorithm is 0.8125%. For a more detailed analysis, see the related paper.
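To make the error figure concrete, here is a minimal Python sketch of a HyperLogLog estimator. It is illustrative only and is not Doris's actual code: the names `hll_estimate` and `relative_standard_error`, the use of SHA-256 as the hash, and the choice of 2^14 buckets are assumptions. The bucket count was chosen because the standard HyperLogLog error formula, 1.04 / sqrt(m), gives exactly 0.8125% at m = 16384.

```python
import hashlib
import math


def relative_standard_error(p=14):
    """Standard HyperLogLog error estimate: 1.04 / sqrt(m), with m = 2**p buckets."""
    return 1.04 / math.sqrt(1 << p)


def hll_estimate(values, p=14):
    """Estimate the number of distinct values using m = 2**p fixed buckets.

    Hypothetical helper for illustration; the hash function and bit layout
    are assumptions, not Doris internals.
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        # Take 64 bits of a well-mixed hash of the value.
        h = int.from_bytes(hashlib.sha256(str(v).encode()).digest()[:8], "big")
        idx = h & (m - 1)          # low p bits select the bucket
        rest = h >> p              # remaining 64 - p bits
        rest |= 1 << (64 - p)      # sentinel bit bounds the zero run
        # Rank = number of trailing zero bits + 1.
        rank = (rest & -rest).bit_length()
        if rank > registers[idx]:
            registers[idx] = rank

    # Bias-corrected harmonic mean of 2**register across all buckets.
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction (linear counting) while many buckets are empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return round(m * math.log(m / zeros))
    return round(raw)
```

Memory use is fixed by the number of buckets and does not grow with the input, which is why the estimate is approximate: repeated values land in the same bucket with the same rank and never change the result, while distinct values only ever raise a bucket's stored rank.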

Syntax

APPROX_COUNT_DISTINCT(<expr>)

Parameters

| Parameter | Description |
| <expr> | The expression whose distinct values are counted |

Return Value

Returns a value of type BIGINT.

Example

select approx_count_distinct(query_id) from log_statis group by datetime;
+-----------------------------------+
| approx_count_distinct(`query_id`) |
+-----------------------------------+
|                             17721 |
+-----------------------------------+