跳到主要内容

EMBED

描述

根据输入文本生成语义嵌入向量,用于表示文本的语义信息,可用于相似度计算、检索等场景。

语法

EMBED([<resource_name>], <text>)

参数

参数说明
<resource_name>指定的资源名称
<text>生成嵌入向量的文本

返回值

返回类型为 ARRAY 代表所生成的向量

当输入值为 NULL 时返回 NULL

结果为大模型生成,所以返回内容并不固定

示例

下表模拟某公司的行为手册

CREATE TABLE knowledge_base (
id BIGINT,
title STRING,
content STRING,
embedding ARRAY<FLOAT> COMMENT '由 EMBED 函数生成的嵌入向量'
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES (
"replication_num" = "1"
);

SET default_ai_resource = 'embed_resource_name';

-- `embedding` 是函数 EMBED 根据 content 对应的标签所生成的嵌入向量
INSERT INTO knowledge_base (id, title, content, embedding) VALUES
(1, "Travel Reimbursement Policy",
"Employees must submit a reimbursement request within 7 days after the business trip, with invoices and travel approval attached.",
EMBED("travel reimbursement policy")),
(2, "Leave Policy",
"Employees must apply for leave in the system in advance. If the leave is longer than three days, approval from the direct manager is required.",
EMBED("leave request policy")),
(3, "VPN User Guide",
"To access the internal network, employees must use VPN. For the first login, download and install the client and configure the certificate.",
EMBED("VPN guide intranet access")),
(4, "Meeting Room Reservation",
"Meeting rooms can be reserved in advance through the OA system, with time and number of participants specified.",
EMBED("meeting room booking reservation")),
(5, "Procurement Request Process",
"Departments must fill out a procurement request form for purchasing items. If the amount exceeds $5000, financial approval is required.",
EMBED("procurement request process finance"));

通过对文本的向量化处理,可以进行类似下列操作:

  1. 问答检索(结合 COSINE_DISTANCE)
SELECT 
id, title, content,
COSINE_DISTANCE(embedding, EMBED("How to apply for travel reimbursement?")) AS score
FROM knowledge_base
ORDER BY score ASC
LIMIT 2;
+------+-----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| id | title | content | score |
+------+-----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| 1 | Travel Reimbursement Policy | Employees must submit a reimbursement request within 7 days after the business trip, with invoices and travel approval attached. | 0.4463210454563673 |
| 5 | Procurement Request Process | Departments must fill out a procurement request form for purchasing items. If the amount exceeds $5000, financial approval is required. | 0.5726841578491431 |
+------+-----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+--------------------+
  1. 问题分析匹配(结合 L2_DISTANCE)
SELECT 
id, title, content,
L2_DISTANCE(embedding, EMBED("How to access the company intranet")) AS distance
FROM knowledge_base
ORDER BY distance ASC
LIMIT 2;
+------+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| id | title | content | distance |
+------+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| 3 | VPN User Guide | To access the internal network, employees must use VPN. For the first login, download and install the client and configure the certificate. | 0.5838271122253775 |
| 1 | Travel Reimbursement Policy | Employees must submit a reimbursement request within 7 days after the business trip, with invoices and travel approval attached. | 1.272394695975331 |
+------+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
  1. 根据文章内容进行文本相关度匹配并推荐(结合INNER PRODUCT)
SELECT 
id, title, content,
INNER_PRODUCT(embedding, EMBED("Leave system request leader approval")) AS score
FROM knowledge_base
WHERE id != 2
ORDER BY score DESC
LIMIT 2;
+------+-----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| id | title | content | score |
+------+-----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| 5 | Procurement Request Process | Departments must fill out a procurement request form for purchasing items. If the amount exceeds $5000, financial approval is required. | 0.33268885332504 |
| 4 | Meeting Room Reservation | Meeting rooms can be reserved in advance through the OA system, with time and number of participants specified. | 0.29224032230852487 |
+------+-----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+---------------------+
  1. 寻找差异较小的内容(结合L1_DISTANCE)
SELECT 
id, title, content,
L1_DISTANCE(embedding, EMBED("Procurement application process")) AS distance
FROM knowledge_base
ORDER BY distance ASC
LIMIT 3;
+------+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| id | title | content | distance |
+------+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| 5 | Procurement Request Process | Departments must fill out a procurement request form for purchasing items. If the amount exceeds $5000, financial approval is required. | 18.66882028897362 |
| 4 | Meeting Room Reservation | Meeting rooms can be reserved in advance through the OA system, with time and number of participants specified. | 30.90449328294426 |
| 2 | Leave Policy | Employees must apply for leave in the system in advance. If the leave is longer than three days, approval from the direct manager is required. | 31.060405636536416 |
+------+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+