Skip to main content
Skip to main content

Logstash Doris Output Plugin

Doris Output Plugin

This plugin is used to output data to Doris for logstash, use the HTTP protocol to interact with the Doris FE Http interface, and import data through Doris's stream load.

Learn more about Doris Stream Load

Learn more about Doris

Install and compile​

1.Download source code​

2.compile​

Execute under extension/logstash/ directory

gem build logstash-output-doris.gemspec

You will get logstash-output-doris-{version}.gem file in the same directory

3.Plug-in installation​

copy logstash-output-doris-{version}.gem to the logstash installation directory

Executing an order

./bin/logstash-plugin install logstash-output-doris-{version}.gem

Install logstash-output-doris plugin

Configuration​

Example:​

Create a new configuration file in the config directory and name it logstash-doris.conf

The specific configuration is as follows:

output {
doris {
http_hosts => [ "http://fehost:8030" ]
user => user_name
password => password
db => "db_name"
table => "table_name"
label_prefix => "label_prefix"
column_separator => ","
}
}

Configuration instructions:

Connection configuration:

ConfigurationExplanation
http_hostsFE's HTTP interactive address eg : ["http://fe1:8030", "http://fe2:8030"]
userUser name, the user needs to have import permission for the doris table
passwordPassword
dbDatabase name
tableTable name
label_prefixImport the identification prefix, the final generated ID is {label_prefix}_{db}_{table}_{time_stamp}

Load configuration:(Reference documents

ConfigurationExplanation
column_separatorColumn separator, the default is \t
columnsUsed to specify the correspondence between the columns in the import file and the columns in the table
whereThe filter conditions specified by the import task
max_filter_ratioThe maximum tolerance rate of the import task, the default is zero tolerance
partitionPartition information of the table to be imported
timeouttimeout, the default is 600s
strict_modeStrict mode, the default is false
timezoneSpecify the time zone used for this import, the default is the East Eight District
exec_mem_limitImport memory limit, default is 2GB, unit is byte
formatSpecify the format of imported data, support csv, json, csv_with_names, csv_with_names_and_types, parquet, orc, default is csv.
jsonpathsThe matching pattern needs to match the corresponding value via the jsonpaths parameter.
json_rootUsed to specify the root node of a json document, default value is "".
fuzzy_parseBoolean type, true means that json will be parsed with the schema of the first row. Enabling this option can improve the efficiency of json import, but requires that the order of the keys of all json objects is the same as the first row, the default is false, only use in json format
num_as_stringBoolean type, true means that when parsing json data, the numeric type will be converted to a string, and then imported without losing precision.

Other configuration:

ConfigurationExplanation
save_on_failureIf the import fails to save locally, the default is true
save_dirLocal save directory, default is /tmp
automatic_retriesThe maximum number of retries on failure, the default is 3
batch_sizeThe maximum number of events processed per batch, the default is 100000
idle_flush_timeMaximum interval, the default is 20 (seconds)

Start Up​

Run the command to start the doris output plugin:

{logstash-home}/bin/logstash -f {logstash-home}/config/logstash-doris.conf --config.reload.automatic

Complete usage example​

1. Compile doris-output-plugin​

1> Download the ruby compressed package and go to ruby official website to download it. The version 2.7.1 used here

2> Compile and install, configure ruby environment variables

3> Go to the doris source extension/logstash/ directory and execute

gem build logstash-output-doris.gemspec

Get the file logstash-output-doris-0.1.0.gem, and the compilation is complete

2. Install and configure filebeat (here use filebeat as input)​

1> es official website Download the filebeat tar compression package and decompress it

2> Enter the filebeat directory and modify the configuration file filebeat.yml as follows:

filebeat.inputs:
- type: log
paths:
- /tmp/doris.data
output.logstash:
hosts: ["localhost:5044"]

/tmp/doris.data is the doris data path

3> Start filebeat:

./filebeat -e -c filebeat.yml -d "publish"

3.Install logstash and doris-out-plugin​

1> es official website Download the logstash tar compressed package and decompress it

2> Copy the logstash-output-doris-0.1.0.gem obtained in step 1 to the logstash installation directory

3> execute

./bin/logstash-plugin install logstash-output-doris-0.1.0.gem

Install the plugin

4> Create a new configuration file logstash-doris.conf in the config directory as follows:

input {
beats {
port => "5044"
}
}

output {
doris {
http_hosts => [ "http://127.0.0.1:8030" ]
user => doris
password => doris
db => "logstash_output_test"
table => "output"
label_prefix => "doris"
column_separator => ","
columns => "a,b,c,d,e"
}
}

The configuration here needs to be configured according to the configuration instructions

5> Start logstash:

./bin/logstash -f ./config/logstash-doris.conf --config.reload.automatic

4.Test Load​

Add write data to /tmp/doris.data

echo a,b,c,d,e >> /tmp/doris.data

Observe the logstash log. If the status of the returned response is Success, the import was successful. At this time, you can view the imported data in the logstash_output_test.output table