Skip to main content

SOUNDEX

Description

The SOUNDEX function computes the Soundex encoding of a string. Soundex is a phonetic algorithm that encodes English words into codes representing their pronunciation, so words with similar pronunciation will have the same encoding.

Encoding rule: Returns a 4-character code consisting of one uppercase letter followed by three digits (e.g., S530).

Syntax

SOUNDEX(<expr>)

Parameters

ParameterDescription
<expr>The string to compute Soundex encoding for (only supports ASCII characters). Type: VARCHAR

Return Value

Returns VARCHAR(4) type, representing the Soundex encoding of the string.

Special cases:

  • If the argument is NULL, returns NULL
  • If the string is empty or contains no letters, returns an empty string
  • Only processes ASCII letters, ignoring other characters
  • Non-ASCII characters will cause the function to throw an error

Examples

  1. Basic usage: word encoding
SELECT soundex('Doris');
+------------------+
| soundex('Doris') |
+------------------+
| D620 |
+------------------+
  1. Words with similar pronunciation have the same encoding
SELECT soundex('Smith'), soundex('Smyth');
+------------------+------------------+
| soundex('Smith') | soundex('Smyth') |
+------------------+------------------+
| S530 | S530 |
+------------------+------------------+
  1. Empty string processing
SELECT soundex('');
+-------------+
| soundex('') |
+-------------+
| |
+-------------+
  1. NULL value handling
SELECT soundex(NULL);
+---------------+
| soundex(NULL) |
+---------------+
| NULL |
+---------------+
  1. Empty string returns empty string
SELECT soundex('');
+-------------+
| soundex('') |
+-------------+
| |
+-------------+
  1. Non-letter characters only return empty string
SELECT soundex('123@*%');
+-------------------+
| soundex('123@*%') |
+-------------------+
| |
+-------------------+
  1. Ignoring non-letter characters
SELECT soundex('R@b-e123rt'), soundex('Robert');
+-----------------------+-------------------+
| soundex('R@b-e123rt') | soundex('Robert') |
+-----------------------+-------------------+
| R163 | R163 |
+-----------------------+-------------------+
  1. Non-ASCII characters only error example
SELECT soundex('你好');  
ERROR 1105 (HY000): errCode = 2, detailMessage = Not Supported: Not Supported: soundex only supports ASCII, but got: 你
SELECT soundex('Apache Doris 你好');
+--------------------------------+
| soundex('Apache Doris 你好') |
+--------------------------------+
| A123 |
+--------------------------------+

Keywords

SOUNDEX