REGEXP_COUNT
Descriptionβ
This is a function to count the number of characters in a string that match a given regular expression pattern. The input consists of a user-provided string and a regular expression pattern. The return value is n the total count of matching characters; if no matches are found, it returns 0.
'str' paratemer is 'string' type,it is the string of usr want to match by a regexp expression.
'pattern' paratemer is 'string' type, it is the string of a regexp regular which will be used to match the string;
It should be noted that when handling character set matching, Utf-8 standard character classes should be used. This ensures that functions can correctly identify and process various characters from different languages.
Support character match classes : https://github.com/google/re2/wiki/Syntax
Syntaxβ
REGEXP_COUNT(<str>, <pattern>)
Parametersβ
Parameter | Description |
---|---|
<str> | The parameter is 'string' type,it is the dest value which matched by the regexp expression. |
<pattern> | The parameter is 'string' type, it is a regexp expression and it is used to match the string which meet the regular of the pattern. |
Return Valueβ
- Returns number of matches for a regular expression 'pattern' within a 'str',it is 'int',if no character can be matched, return 0; If pattern is NULL or str is NULL or both are NULL,return NULL; If pattern is not allowed regexp regular , it will throw error,it`s a wrong for this action;
Examplesβ
The string region matching against an expression containing escape characters and return the result
SELECT regexp_count('a.b:c;d', '[\\\\.:;]');
+--------------------------------------+
| regexp_count('a.b:c;d', '[\\\\.:;]') |
+--------------------------------------+
| 3 |
+--------------------------------------+
The string matching result of the regular expression for the ordinary character ':'.
SELECT regexp_count('a.b:c;d', ':');
+------------------------------+
| regexp_count('a.b:c;d', ':') |
+------------------------------+
| 1 |
+------------------------------+
The return result when matching a string against a regular expression containing two square brackets.
SELECT regexp_count('Hello, World!', '[[:punct:]]');
+----------------------------------------------+
| regexp_count('Hello, World!', '[[:punct:]]') |
+----------------------------------------------+
| 2 |
+----------------------------------------------+
Pattern is NULL case
SELECT regexp_count("abc",NULL);
+------------------------+
| regexp_count("abc",NULL) |
+------------------------+
| NULL |
+------------------------+
Str is NULL case
SELECT regexp_count(NULL,"abc");
+------------------------+
| regexp_count(NULL,"abc") |
+------------------------+
| NULL |
+------------------------+
Both are NULL
SELECT regexp_count(NULL,NULL);
+------------------------+
| regexp_count(NULL,NULL) |
+------------------------+
| NULL |
+------------------------+
The return result of inserting certain variable values and then retrieving the variables from the stored rows for matching.
CREATE TABLE test_table_for_regexp_count (
id INT,
text_data VARCHAR(500),
pattern VARCHAR(100)
) PROPERTIES ("replication_num"="1");
INSERT INTO test_table_for_regexp_count VALUES
(1, 'HelloWorld', '[A-Z][a-z]+'),
(2, 'apple123', '[a-z]{5}[0-9]'),
(3, 'aabbcc', '(aa|bb|cc)'),
(4, '123-456-7890', '[0-9][0-9][0-9]'),
(5, 'test,data', ','),
(6, 'a1b2c3', '[a-z][0-9]'),
(7, 'book keeper', 'oo|ee'),
(8, 'ababab', '(ab)(ab)(ab)'),
(9, 'aabbcc', '(aa|bb|cc)'),
(10, 'apple,banana', '[aeiou][a-z]+');
SELECT id, regexp_count(text_data, pattern) as count_result FROM test_table_for_regexp_count ORDER BY id;
+------+--------------+
| id | count_result |
+------+--------------+
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
| 4 | 3 |
| 5 | 1 |
| 6 | 3 |
| 7 | 2 |
| 8 | 1 |
| 9 | 3 |
| 10 | 2 |
+------+--------------+
The return result of inserting certain variable values, retrieving the variables from stored rows for matching, with the regular expression being a constant.
CREATE TABLE test_table_for_regexp_count (
id INT,
text_data VARCHAR(500),
pattern VARCHAR(100)
) PROPERTIES ("replication_num"="1");
INSERT INTO test_table_for_regexp_count VALUES
(1, 'HelloWorld', '[A-Z][a-z]+'),
(2, 'apple123', '[a-z]{5}[0-9]'),
(3, 'aabbcc', '(aa|bb|cc)'),
(4, '123-456-7890', '[0-9][0-9][0-9]'),
(5, 'test,data', ','),
(6, 'a1b2c3', '[a-z][0-9]'),
(7, 'book keeper', 'oo|ee'),
(8, 'ababab', '(ab)(ab)(ab)'),
(9, 'aabbcc', '(aa|bb|cc)'),
(10, 'apple,banana', '[aeiou][a-z]+');
SELECT id, regexp_count(text_data, 'e') as count_e FROM test_table_for_regexp_count WHERE text_data IS NOT NULL ORDER BY id;
+------+---------+
| id | count_e |
+------+---------+
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
| 5 | 1 |
| 6 | 0 |
| 7 | 3 |
| 8 | 0 |
| 9 | 0 |
| 10 | 1 |
+------+---------+
Emoji regexp count
SELECT regexp_count('ππππ', 'π|π|π');
+----------------------------------------------------+
| regexp_count('ππππ', 'π|π|π') |
+----------------------------------------------------+
| 3 |
+----------------------------------------------------+
If the 'pattern' is not allowd regular regular,throw error
SELECT regexp_count('Hello, World!', '[[:punct:');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: [[:punct:
Error: missing ]: [[:punct: