LLM_SIMILARITY

Description

Determines the semantic similarity between two texts.

Syntax

LLM_SIMILARITY([<resource_name>], <text_1>, <text_2>)

Parameters

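Parameter | Description
<resource_name> | Name of the resource to use. The square brackets in the syntax mark it as optional.
<text_1> | The first text to compare
<text_2> | The second text to compare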
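
As an illustration of the syntax, a minimal call with two literal strings might look like the sketch below. 'resource_name' is a placeholder for a resource that is assumed to already exist in your deployment, and the two sentences are invented for this example.

```sql
-- 'resource_name' is a placeholder; replace it with a resource defined in your cluster.
-- Both sentences are made-up sample inputs.
SELECT LLM_SIMILARITY(
    'resource_name',
    'The package arrived late.',
    'My delivery was delayed.'
) AS score;
```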

Return Value

Returns a floating-point number between 0 and 10, where 0 means no similarity and 10 means strong similarity.

If any input is NULL, returns NULL.
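
The NULL rule can be sketched with a literal NULL argument. This is an illustration only: 'resource_name' is a placeholder, and the table reviews with its column body is hypothetical and not part of this page.

```sql
-- Per the rule above, a NULL input yields a NULL score.
SELECT LLM_SIMILARITY('resource_name', 'The package arrived late.', NULL) AS score;

-- One way to keep NULL scores out of a result set is to filter NULL inputs first.
-- "reviews" and "body" are hypothetical names used only for this sketch.
SELECT body,
       LLM_SIMILARITY('resource_name', 'The package arrived late.', body) AS score
FROM reviews
WHERE body IS NOT NULL;
```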

Because the result is generated by a large language model, the same inputs may return different scores across calls.

Example

Suppose you have the following table representing comments received by a courier company:

CREATE TABLE user_comments (
    id INT,
    comment VARCHAR(500)
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);

To rank comments by how closely they match a strongly negative reference sentence, you can use:

SELECT comment,
       LLM_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments
ORDER BY score DESC
LIMIT 5;

The query result may look like:

+-------------------------------------------------+-------+
| comment                                         | score |
+-------------------------------------------------+-------+
| It arrived broken and I am really disappointed. |   7.5 |
| Delivery was very slow and frustrating.         |   6.5 |
| Not bad, but the packaging could be better.     |   3.5 |
| It is fine, nothing special to mention.         |     3 |
| Absolutely fantastic, highly recommend it.      |     1 |
+-------------------------------------------------+-------+
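
Since scores can differ between runs, one possible way to keep a ranking stable is to compute the scores once, store them, and query the stored values afterwards. The following is a sketch under assumptions: the table user_comment_scores, the DOUBLE type chosen for the score column, and the threshold of 6 are all illustrative choices and do not come from this page.

```sql
-- Hypothetical table that stores one score per comment.
-- DOUBLE is an assumed column type for the floating-point result.
CREATE TABLE user_comment_scores (
    id INT,
    score DOUBLE
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);

-- Score each non-NULL comment once and keep the result.
INSERT INTO user_comment_scores
SELECT id,
       LLM_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment)
FROM user_comments
WHERE comment IS NOT NULL;

-- Later queries read the stored scores instead of calling the model again.
-- The cutoff of 6 is an arbitrary example threshold.
SELECT c.comment, s.score
FROM user_comments c
JOIN user_comment_scores s ON c.id = s.id
WHERE s.score >= 6
ORDER BY s.score DESC;
```

With this layout, repeated reports return the same ordering until the scores table is refreshed.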