SEQUENCE_COUNT
SEQUENCE-COUNT
描述
语法
sequence_count(pattern, timestamp, cond1, cond2, ...);
计算与模式匹配的事件链的数量。该函数搜索不重叠的事件链。当前链匹配后,它开始搜索下一个链。
警告!
在同一秒钟发生的事件可能以未定义的顺序排列在序列中,会影响最终结果。
参数
pattern
— 模式字符串.
模式语法
(?N)
— 在位置N匹配条件参数。 条件在编号 [1, 32]
范围。 例如, (?1)
匹配传递给 cond1
参数。
.*
— 匹配任何事件的数字。 不需要条件参数来匹配这个模式。
(?t operator value)
— 分开两个事件的时间。 单位为秒。
t
表示为两个时间的差值,单位为秒。 例如: (?1)(?t>1800)(?2)
匹配彼此发生超过1800秒的事件, (?1)(?t>10000)(?2)
匹配彼此发生超过10000秒的事件。 这些事件之间可以存在任意数量的任何事件。 您可以使用 >=
, >
, <
, <=
, ==
运算符。
timestamp
— 包含时间的列。典型的时间类型是: Date
和 DateTime
。也可以使用任何支持的 UInt
数据类型。
cond1
, cond2
— 事件链的约束条件。 数据类型是: UInt8
。 最多可以传递32个条件参数。 该函数只考虑这些条件中描述的事件。 如果序列包含未在条件中描述的数据,则函数将跳过这些数据。
返回值
匹配的非重叠事件链数。
举例
匹配例子
DROP TABLE IF EXISTS sequence_count_test1;
CREATE TABLE sequence_count_test1(
`uid` int COMMENT 'user id',
`date` datetime COMMENT 'date time',
`number` int NULL COMMENT 'number'
)
DUPLICATE KEY(uid)
DISTRIBUTED BY HASH(uid) BUCKETS 3
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_count_test1(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 13:28:02', 2),
(3, '2022-11-02 16:15:01', 1),
(4, '2022-11-02 19:05:04', 2),
(5, '2022-11-02 20:08:44', 3);
SELECT * FROM sequence_count_test1 ORDER BY date;
+------+---------------------+--------+
| uid | date | number |
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 13:28:02 | 2 |
| 3 | 2022-11-02 16:15:01 | 1 |
| 4 | 2022-11-02 19:05:04 | 2 |
| 5 | 2022-11-02 20:08:44 | 3 |
+------+---------------------+--------+
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 3) FROM sequence_count_test1;
+----------------------------------------------------------------+
| sequence_count('(?1)(?2)', `date`, `number` = 1, `number` = 3) |
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 2) FROM sequence_count_test1;
+----------------------------------------------------------------+
| sequence_count('(?1)(?2)', `date`, `number` = 1, `number` = 2) |
+----------------------------------------------------------------+
| 2 |
+----------------------------------------------------------------+
SELECT sequence_count('(?1)(?t>=3600)(?2)', date, number = 1, number = 2) FROM sequence_count_test1;
+---------------------------------------------------------------------------+
| sequence_count('(?1)(?t>=3600)(?2)', `date`, `number` = 1, `number` = 2) |
+---------------------------------------------------------------------------+
| 2 |
+---------------------------------------------------------------------------+
不匹配例子
DROP TABLE IF EXISTS sequence_count_test2;
CREATE TABLE sequence_count_test2(
`uid` int COMMENT 'user id',
`date` datetime COMMENT 'date time',
`number` int NULL COMMENT 'number'
)
DUPLICATE KEY(uid)
DISTRIBUTED BY HASH(uid) BUCKETS 3
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_count_test2(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 11:41:00', 7),
(3, '2022-11-02 16:15:01', 3),
(4, '2022-11-02 19:05:04', 4),
(5, '2022-11-02 21:24:12', 5);
SELECT * FROM sequence_count_test2 ORDER BY date;
+------+---------------------+--------+
| uid | date | number |
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 11:41:00 | 7 |
| 3 | 2022-11-02 16:15:01 | 3 |
| 4 | 2022-11-02 19:05:04 | 4 |
| 5 | 2022-11-02 21:24:12 | 5 |
+------+---------------------+--------+
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 2) FROM sequence_count_test2;
+----------------------------------------------------------------+
| sequence_count('(?1)(?2)', `date`, `number` = 1, `number` = 2) |
+----------------------------------------------------------------+
| 0 |
+----------------------------------------------------------------+
SELECT sequence_count('(?1)(?2).*', date, number = 1, number = 2) FROM sequence_count_test2;
+------------------------------------------------------------------+
| sequence_count('(?1)(?2).*', `date`, `number` = 1, `number` = 2) |
+------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------+
SELECT sequence_count('(?1)(?t>3600)(?2)', date, number = 1, number = 7) FROM sequence_count_test2;
+--------------------------------------------------------------------------+
| sequence_count('(?1)(?t>3600)(?2)', `date`, `number` = 1, `number` = 7) |
+--------------------------------------------------------------------------+
| 0 |
+--------------------------------------------------------------------------+
特殊例子
DROP TABLE IF EXISTS sequence_count_test3;
CREATE TABLE sequence_count_test3(
`uid` int COMMENT 'user id',
`date` datetime COMMENT 'date time',
`number` int NULL COMMENT 'number'
)
DUPLICATE KEY(uid)
DISTRIBUTED BY HASH(uid) BUCKETS 3
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_count_test3(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 11:41:00', 7),
(3, '2022-11-02 16:15:01', 3),
(4, '2022-11-02 19:05:04', 4),
(5, '2022-11-02 21:24:12', 5);
SELECT * FROM sequence_count_test3 ORDER BY date;
+------+---------------------+--------+
| uid | date | number |
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 11:41:00 | 7 |
| 3 | 2022-11-02 16:15:01 | 3 |
| 4 | 2022-11-02 19:05:04 | 4 |
| 5 | 2022-11-02 21:24:12 | 5 |
+------+---------------------+--------+
Perform the query:
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 5) FROM sequence_count_test3;
+----------------------------------------------------------------+
| sequence_count('(?1)(?2)', `date`, `number` = 1, `number` = 5) |
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
上面为一个非常简单的匹配例子, 该函数找到了数字5跟随数字1的事件链。 它跳过了它们之间的数字7,3,4,因为该数字没有被描述为事件。 如果我们想在搜索示例中给出的事件链时考虑这个数字,我们应该为它创建一个条件。
现在,考虑如下执行语句:
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 5, number = 4) FROM sequence_count_test3;
+------------------------------------------------------------------------------+
| sequence_count('(?1)(?2)', `date`, `number` = 1, `number` = 5, `number` = 4) |
+------------------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------------------+
您可能对这个结果有些许疑惑,在这种情况下,函数找不到与模式匹配的事件链,因为数字4的事件发生在1和5之间。 如果在相同的情况下,我们检查了数字6的条件,则序列将与模式匹配。
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 5, number = 6) FROM sequence_count_test3;
+------------------------------------------------------------------------------+
| sequence_count('(?1)(?2)', `date`, `number` = 1, `number` = 5, `number` = 6) |
+------------------------------------------------------------------------------+
| 1 |
+------------------------------------------------------------------------------+
keywords
SEQUENCE_COUNT