APPROX_COUNT_DISTINCT

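Description

Returns an approximate count of the distinct non-NULL values. The function is implemented with the HyperLogLog algorithm, which estimates the cardinality of a column using a fixed amount of memory. The algorithm relies on a distributional assumption (a null distribution in the tails), so its accuracy depends on the data distribution. With the fixed bucket count used by Doris, the relative standard error of the algorithm is 0.8125%. For a more detailed analysis, see the related paper.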
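The HyperLogLog idea described above can be sketched in Python. This is a minimal textbook version with the linear-counting correction for small cardinalities, not Doris's implementation. The register count p = 14 (m = 16384) is an assumption, chosen because the standard error bound 1.04/√m gives 1.04/128 = 0.8125%, which matches the figure above. The BLAKE2b-based 64-bit hash is also an assumption. The names `hash64`, `HyperLogLog` and `relative_standard_error` are hypothetical.

```python
import hashlib
import math


def hash64(value) -> int:
    # Assumption: any well-mixed 64-bit hash works; BLAKE2b is used here for determinism.
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Textbook HyperLogLog sketch; p=14 (16384 registers) is an assumed setting."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p              # number of registers (buckets)
        self.registers = [0] * self.m

    def add(self, value) -> None:
        if value is None:
            return                   # NULLs are ignored, as in APPROX_COUNT_DISTINCT
        x = hash64(value)
        w = 64 - self.p              # bits left after choosing a register
        idx = x >> w                 # top p bits select the register
        rest = x & ((1 << w) - 1)    # remaining w bits
        rank = w - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> int:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias-correction constant for m >= 128
        z = sum(2.0 ** -r for r in self.registers)
        raw = alpha * m * m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: linear counting on the empty registers
            return round(m * math.log(m / zeros))
        return round(raw)


def relative_standard_error(m: int) -> float:
    # Standard HyperLogLog error bound: about 1.04 / sqrt(m)
    return 1.04 / math.sqrt(m)


if __name__ == "__main__":
    hll = HyperLogLog()
    for v in ["apple", "banana", "apple", "orange", "orange", "grape", None]:
        hll.add(v)
    print(hll.estimate())                  # approximate distinct count of the four fruits
    print(relative_standard_error(hll.m))  # 1.04 / 128
```

Memory stays fixed at m registers no matter how many rows are added, which is the trade-off the description refers to. With only a handful of distinct values, almost every register in this sketch stays empty and linear counting returns the true count unless two values happen to hash to the same register.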

Syntax

APPROX_COUNT_DISTINCT(<expr>)
NDV(<expr>)

Parameters

Parameter	Description
`<expr>`	The expression whose values are counted. Supported types: String, Date, DateTime, Timestamptz, IPv4, IPv6, TinyInt, Bool, SmallInt, Integer, BigInt, LargeInt, Float, Double, Decimal.

Return Value

Returns a value of type BIGINT.

Examples

-- setup
create table t1(
        k1 int,
        k_string varchar(100),
        k_tinyint tinyint
) distributed by hash (k1) buckets 1
properties ("replication_num"="1");
insert into t1 values 
    (1, 'apple', 10),
    (1, 'banana', 20),
    (1, 'apple', 10),
    (2, 'orange', 30),
    (2, 'orange', 40),
    (2, 'grape', 50),
    (3, null, null);

select approx_count_distinct(k_string) from t1;

String type: computes the approximate number of distinct k_string values. NULL values are not counted.

+---------------------------------+
| approx_count_distinct(k_string) |
+---------------------------------+
|                               4 |
+---------------------------------+

select approx_count_distinct(k_tinyint) from t1;

TinyInt type: computes the approximate number of distinct k_tinyint values.

+----------------------------------+
| approx_count_distinct(k_tinyint) |
+----------------------------------+
|                                5 |
+----------------------------------+

select approx_count_distinct(k1) from t1;

Integer type: computes the approximate number of distinct k1 values.

+---------------------------+
| approx_count_distinct(k1) |
+---------------------------+
|                         3 |
+---------------------------+

select k1, approx_count_distinct(k_string) from t1 group by k1;

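Groups the rows by k1 and computes the approximate number of distinct k_string values in each group. If every k_string value in a group is NULL, 0 is returned for that group.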

+------+---------------------------------+
| k1   | approx_count_distinct(k_string) |
+------+---------------------------------+
|    1 |                               2 |
|    2 |                               2 |
|    3 |                               0 |
+------+---------------------------------+

select ndv(k_string) from t1;

The alias NDV behaves identically to APPROX_COUNT_DISTINCT.

+---------------+
| ndv(k_string) |
+---------------+
|             4 |
+---------------+

select approx_count_distinct(k_string) from t1 where k1 = 999;

If no rows match the query, 0 is returned.

+---------------------------------+
| approx_count_distinct(k_string) |
+---------------------------------+
|                               0 |
+---------------------------------+
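Because t1 is tiny, every approximate result above equals the exact distinct count. A minimal exact reference model in Python reproduces each expected value, including the NULL-only group and the empty result. The names `ROWS`, `exact_distinct` and `distinct_by_group` are hypothetical and illustrative only. The model is meant for checking expected results, not as a description of how Doris evaluates the function.

```python
from collections import defaultdict

# Rows from the t1 setup: (k1, k_string, k_tinyint); None stands for NULL.
ROWS = [
    (1, "apple", 10),
    (1, "banana", 20),
    (1, "apple", 10),
    (2, "orange", 30),
    (2, "orange", 40),
    (2, "grape", 50),
    (3, None, None),
]


def exact_distinct(values) -> int:
    # Exact distinct count that skips NULLs; empty or all-NULL input gives 0.
    return len({v for v in values if v is not None})


def distinct_by_group(rows, key_col: int, val_col: int) -> dict:
    # Equivalent of: SELECT key, COUNT(DISTINCT val) ... GROUP BY key
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row[val_col])
    return {k: exact_distinct(vals) for k, vals in groups.items()}


print(exact_distinct(r[1] for r in ROWS))                 # 4
print(exact_distinct(r[2] for r in ROWS))                 # 5
print(exact_distinct(r[0] for r in ROWS))                 # 3
print(distinct_by_group(ROWS, 0, 1))                      # {1: 2, 2: 2, 3: 0}
print(exact_distinct(r[1] for r in ROWS if r[0] == 999))  # 0
```

On larger tables the approximate results can differ from these exact counts, within the error bound described above.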
