LLM_Function
Description
LLM Function is a set of built-in functions provided by Doris on top of large language model (LLM) capabilities. Users can call an LLM directly in SQL queries to perform various intelligent text tasks. Through Doris's resource mechanism, LLM Function connects to multiple mainstream LLM providers (such as OpenAI, Anthropic, DeepSeek, Gemini, Ollama, MoonShot, etc.).
The LLM must be served by a provider external to Doris and must support text analysis.
Configure LLM Resource
Before using LLM Function, you need to create a Resource of type LLM to centrally manage access information for the LLM API.
Example: Create LLM Resource
CREATE RESOURCE "llm_resource_name"
PROPERTIES (
'type' = 'llm',
'llm.provider_type' = 'openai',
'llm.endpoint' = 'https://endpoint_example',
'llm.model_name' = 'model_example',
'llm.api_key' = 'sk-xxx',
'llm.temperature' = '0.7',
'llm.max_token' = '1024',
'llm.max_retries' = '3',
'llm.retry_delay_second' = '1'
);
Parameter Description
- type: Required. Must be llm; used as the type identifier of the resource.
- llm.provider_type: Required. External LLM provider type. Currently supported providers: OpenAI, Anthropic, Gemini, DeepSeek, Local, MoonShot, MiniMax, Zhipu, QWen, Baichuan. For a provider not listed above whose API format is the same as OpenAI/Anthropic/Gemini, fill in the corresponding provider directly.
- llm.endpoint: Required. LLM API endpoint.
- llm.model_name: Required. Model name.
- llm.api_key: Required except when llm.provider_type = local. API key.
- llm.temperature: Optional. Controls the randomness of generated content; accepts a float value between 0 and 1. The default value is -1, which means the parameter is not set.
- llm.max_token: Optional. Limits the maximum number of tokens in generated content. The default value is -1, which means the parameter is not set; for Anthropic the default is 2048.
- llm.max_retries: Optional. The maximum number of retries for a single request. The default value is 3.
- llm.retry_delay_second: Optional. The delay time (in seconds) before retrying. The default value is 0.
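Since llm.api_key is not required when llm.provider_type = local, a resource for a locally hosted model can omit it. The sketch below assumes a local model served through an OpenAI-compatible endpoint (for example via Ollama); the resource name, endpoint URL, and model name are placeholders to replace with your own:

```sql
-- Hypothetical resource for a locally served model; no API key needed.
CREATE RESOURCE "local_llm_resource"
PROPERTIES (
    'type' = 'llm',
    'llm.provider_type' = 'local',
    'llm.endpoint' = 'http://127.0.0.1:11434/v1/chat/completions', -- placeholder
    'llm.model_name' = 'local_model_example'                       -- placeholder
);
```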
Resource Selection and Session Variables
When users call LLM-related functions, resources can be specified in the following two ways:
- Explicitly specify the resource: directly pass the resource name when calling the function.
- Implicitly specify the resource: set the Session variable in advance, and the function will automatically use the corresponding resource.
Set Session variable format:
SET default_llm_resource='resource_name';
Function call format:
SELECT LLM_FUNCTION([<resource_name>], <args...>);
Resource Selection Priority
When an LLM function is called, Doris determines which resource to use in the following order:
- The resource explicitly specified by the user in the call
- The global default resource (default_llm_resource)
Example:
SET default_llm_resource='global_default_resource';
SELECT LLM_SENTIMENT('this is a test'); -- Uses resource named 'global_default_resource'
SELECT LLM_SENTIMENT('invoke_resource', 'this is a test'); -- Uses resource named 'invoke_resource'
LLM Functions
Currently supported LLM Functions in Doris include:
- LLM_CLASSIFY: Information classification
- LLM_EXTRACT: Information extraction
- LLM_FIXGRAMMAR: Grammar correction
- LLM_GENERATE: Text generation
- LLM_MASK: Masking sensitive information
- LLM_SENTIMENT: Sentiment analysis
- LLM_SUMMARIZE: Text summarization
- LLM_TRANSLATE: Translation
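As a quick illustration, the single-text functions follow the same call shape as LLM_SENTIMENT: an optional leading resource name followed by the input text. The resource name below is a placeholder, and the exact signatures (in particular for LLM_CLASSIFY and LLM_MASK, which take additional arguments) should be confirmed in each function's documentation:

```sql
-- 'resource_name' is a placeholder for an existing LLM resource.
SELECT LLM_FIXGRAMMAR('resource_name', 'this are a test');
SELECT LLM_SUMMARIZE('resource_name', 'Apache Doris is a real-time analytical database based on the MPP architecture.');
```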
Examples
LLM_TRANSLATE
SELECT LLM_TRANSLATE('resource_name', 'this is a test', 'Chinese');
-- 这是一个测试
LLM_SENTIMENT
SET default_llm_resource = 'resource_name';
SELECT LLM_SENTIMENT('Apache Doris is a great DBMS.');
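Like any scalar function, LLM functions can also be applied to table columns. The table and columns below are hypothetical and rely on default_llm_resource having been set as in the previous example:

```sql
-- Hypothetical table of user reviews; uses the session's default_llm_resource.
SELECT review_id,
       LLM_SENTIMENT(review_text) AS sentiment
FROM product_reviews
LIMIT 10;
```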
For a detailed description and usage of each function, please refer to that function's documentation.