Skip to main content

CREATE STREAMING JOB

Description

Doris Streaming Job is a continuous import task based on the Job + TVF approach. After the Job is submitted, Doris will continuously run the import job, querying the data in TVF and writing it into the Doris table in real time.

Syntax

CREATE JOB <job_name>
ON STREAMING
[job_properties]
[ COMMENT <comment> ]
DO <Insert_Command>

Required parameters

1. <job_name>

The job name is used to uniquely identify an event within a database. The job name must be globally unique; an error will occur if a job with the same name already exists.

3. <sql_body>

The DO clause specifies the operation to be executed when the job is triggered, i.e., an SQL statement. Currently, it only supports S3 TVF.

Optional parameters

1. <job_properties>

ParametersDefault Values ​​Description
session.*NoneSupports configuring all session variables in job_properties
s3.max_batch_files256Triggers an import write when the cumulative number of files reaches this value
s3.max_batch_bytes10GTriggers an import write when the cumulative data volume reaches this value
max_interval10sThe idle scheduling interval when there are no new files or data added upstream.

Access Control Requirements

The user executing this SQL command must have at least the following privileges:

PrivilegeObjectNotes
LOAD_PRIVDatabase (DB)Currently, only the LOAD privilege is supported to perform this operation

Usage Notes

  • The TASK only retains the latest 100 records.
  • Currently, only the INSERT internal table Select * From S3(...) operation is supported; more operations will be supported in the future.

Examples

  • Create a job named my_job that continuously monitors files in a specified directory on S3 and imports data from files ending in .csv into db1.tbl1.

    CREATE JOB my_job
    ON STREAMING
    DO
    INSERT INTO db1.`tbl1`
    SELECT * FROM S3
    (
    "uri" = "s3://bucket/s3/demo/*.csv",
    "format" = "csv",
    "column_separator" = ",",
    "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
    "s3.region" = "ap-southeast-1",
    "s3.access_key" = "",
    "s3.secret_key" = ""
    );

CONFIG

fe.conf

ParametersDefault Values ​​
max_streaming_job_num1024Maximum number of Streaming jobs
job_streaming_task_exec_thread_num10Number of threads used to execute StreamingTasks
max_streaming_task_show_count100Maximum number of task execution records that a StreamingTask keeps in memory