Skip to main content

OpenTelemetry

OpenTelemetry (OTel for short) is a vendor-neutral open-source observability framework used to monitor, generate, collect, and export observability data such as logs, traces, and metrics. OpenTelemetry defines a set of observability standards and protocols that have been widely adopted by the observability community and vendors, gradually becoming the de facto standard in the observability field.

OpenTelemetry provides frameworks and observability data collection SDKs that allow applications and systems to be monitored across different programming languages, infrastructures, and runtime environments. Doris can serve as the storage backend for OpenTelemetry, providing high-performance, low-cost, and unified observability data storage and analysis capabilities. The overall architecture is as follows:

Doris Opentelemetry Integration

Applicable scenarios and integration process

You can choose the corresponding integration approach based on the data type:

User scenarioRecommended configurationReference section
Collect general TEXT logs such as Doris FE logs and handle multi-line logs such as Java stacktracesUse the filelog receiver, multiline, and regex_parser to merge multi-line logs and parse them before writing into DorisCollect Doris FE TEXT logs
Collect structured logs with one JSON object per lineUse the filelog receiver and json_parser to parse JSON line logs and write them into DorisCollect GitHub Events JSON logs
Collect application trace dataUse the otlp receiver to receive data reported by the OpenTelemetry Java Agent, then write it into Doris through the Doris ExporterCollect application trace data
Configure Doris write target, automatic table creation, partition retention, and Stream Load parametersConfigure parameters such as endpoint, database, table.*, create_schema, and headers in the Doris ExporterConfigure Doris Exporter

The basic process for integrating with Doris using the OpenTelemetry Collector Doris Exporter is as follows:

  1. Download and extract OpenTelemetry Collector Contrib.
  2. Configure the Doris Exporter, including the Doris FE address, account, target database and table, and Stream Load parameters.
  3. Configure the receiver, processor, and pipeline based on the data type.
  4. Start the Collector and write log, trace, or metrics data into Doris.
  5. Observe the import results through Stream Load response logs and write speed logs.

Install OpenTelemetry Collector Contrib

Download the OpenTelemetry Collector Contrib installation package from the OpenTelemetry official Release page. The Contrib version includes the Doris Exporter, for example:

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.132.2/otelcol-contrib_0.132.2_linux_amd64.tar.gz

After downloading, extract the installation package to obtain the otelcol-contrib executable file.

Configure Doris Exporter

The OpenTelemetry Collector Doris Exporter writes data through the Doris Stream Load HTTP interface. The core configuration items are as follows:

Configuration itemDefault valueDescription
endpointNoneDoris FE HTTP address, in the format host:port, for example 127.0.0.1:8030.
mysql_endpointNoneDoris FE MySQL address, in the format host:port, for example 127.0.0.1:9030.
usernameNoneDoris username. The user must have write permission on the corresponding database and table.
passwordNonePassword of the Doris user.
databaseNoneName of the Doris database to write to.
table.logsotel_logsDoris table name to which logs data is written.
table.tracesotel_tracesDoris table name to which traces data is written.
table.metricsotel_metricsDoris table name to which metrics data is written.
create_schematrueWhether to automatically create the Doris database and table.
history_days0Number of days to retain historical data in the automatically created Doris table. 0 means retain permanently.
create_history_days0Number of initial partition days for the automatically created Doris table. 0 means do not create partitions.
label_prefixopen_telemetryDoris Stream Load Label prefix. The final generated Label format is {label_prefix}_{db}_{table}_{yyyymmdd_hhmmss}_{uuid}.
headersNoneHeaders parameter of Doris Stream Load. The syntax format is a YAML map.
log_progress_interval10Time interval, in seconds, for outputting write speed in the log. Set to 0 to disable this log.

For more configuration, refer to the OpenTelemetry Collector Contrib Doris Exporter documentation.

Collect Doris FE TEXT logs

This scenario uses Doris FE logs as an example to show how to collect TEXT logs and write them into Doris.

Step 1: Prepare the log file

Doris FE log files are usually located at fe/log/fe.log under the Doris installation directory. FE logs are typical Java program logs that contain fields such as timestamp, log level, thread name, code location, and log content. The logs include both regular logs and exception logs with stacktraces. Because a stacktrace spans multiple lines, the main log and its corresponding stacktrace need to be merged into a single log entry during collection.

2024-07-08 21:18:01,432 INFO (Statistics Job Appender|61) [StatisticsJobAppender.runAfterCatalogReady():70] Stats table not available, skip
2024-07-08 21:18:53,710 WARN (STATS_FETCH-0|208) [StmtExecutor.executeInternalQuery():3332] Failed to run internal SQL: OriginStatement{originStmt='SELECT * FROM __internal_schema.column_statistics WHERE part_id is NULL ORDER BY update_time DESC LIMIT 500000', idx=0}
org.apache.doris.common.UserException: errCode = 2, detailMessage = tablet 10031 has no queryable replicas. err: replica 10032's backend 10008 does not exist or not alive
at org.apache.doris.planner.OlapScanNode.addScanRangeLocations(OlapScanNode.java:931) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.planner.OlapScanNode.computeTabletInfo(OlapScanNode.java:1197) ~[doris-fe.jar:1.2-SNAPSHOT]

Step 2: Write the OpenTelemetry configuration

The log collection configuration file opentelemetry_java_log.yml mainly contains three parts:

Configuration partFunction
receiversRead raw data.
processorsTransform and batch process data.
exportersOutput data to Doris.
# 1. receivers are responsible for reading raw data.
# filelog is a local receiver that can read log files from the local file system.
# multiline appends lines that do not start with a timestamp to the previous line, used to merge stacktraces with the main log.
receivers:
filelog:
include:
- /path/to/fe.log
start_at: beginning
multiline:
line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}' # Match the timestamp as the start of a new log
operators:
- type: regex_parser
regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<severity>INFO|WARN|ERROR) (?P<message>.*)'
timestamp:
parse_from: attributes.time
layout: '%Y-%m-%d %H:%M:%S,%f'
severity:
parse_from: attributes.severity
trace: TRACE
debug: DEBUG
info: INFO
warn: WARN
error: ERROR
fatal: FATAL

# 2. processors are responsible for data transformation.
# The batch processor accumulates data into batches before sending.
processors:
batch:
send_batch_size: 100000 # Number of data entries per batch. It is recommended that the batch data size be between 100M and 1G.
timeout: 10s

# 3. exporters are responsible for outputting data.
# The doris exporter writes to Doris through the Stream Load HTTP interface.
# The default Stream Load data format is JSON. Stream Load parses JSON fields and writes them to the corresponding fields in the Doris table.
exporters:
doris:
endpoint: http://localhost:8030 # FE HTTP address
mysql_endpoint: localhost:9030 # FE MySQL address
database: doris_db_name
username: doris_username
password: doris_password
table:
logs: otel_logs
create_schema: true # Whether to automatically create the schema. If set to false, the table must be created manually.
history_days: 10
create_history_days: 10
timezone: Asia/Shanghai
timeout: 60s # HTTP Stream Load client timeout
log_response: true
sending_queue:
enabled: true
num_consumers: 20
queue_size: 1000
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
headers:
load_to_single_tablet: "true"

service:
pipelines:
logs:
receivers: [filelog]
processors: [batch]
exporters: [doris]

Step 3: Run the OpenTelemetry Collector

./otelcol-contrib --config config/opentelemetry_java_log.yml

Step 4: View the write results

When log_response is true, the log outputs the request parameters and response result of each Stream Load:

2025-08-18T00:33:22.543+0800    info    dorisexporter@v0.132.0/exporter_logs.go:181    log response:
{
"TxnId": 52,
"Label": "open_telemetry_otel_otel_logs_20250818003321_498bb8ec-040c-4982-9eb4-452b15129782",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 50355,
"NumberLoadedRows": 50355,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 31130235,
"LoadTimeMs": 680,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 3,
"ReadDataTimeMs": 106,
"WriteDataTimeMs": 653,
"ReceiveDataTimeMs": 11,
"CommitAndPublishTimeMs": 23
}

By default, write speed information is output every 10 seconds, including the data volume since startup (MB and ROWS), the total speed (MB/s and R/s), and the speed over the last 10 seconds:

2025-08-18T00:05:00.017+0800    info    dorisexporter@v0.132.0/progress_reporter.go:63    [LOG] total 11 MB 18978 ROWS, total speed 0 MB/s 632 R/s, last 10 seconds speed 1 MB/s 1897 R/s

Collect GitHub Events JSON logs

This scenario uses GitHub Events Archive data as an example to show how to collect JSON logs and write them into Doris.

Step 1: Prepare the JSON data

GitHub Events Archive is archive data of GitHub user action events in JSON format, which can be downloaded from https://www.gharchive.org/. The following example downloads the data for 15:00 on January 1, 2024:

wget https://data.gharchive.org/2024-01-01-15.json.gz

The following is a sample data record. The actual data is one JSON object per line; it is formatted here for readability.

{
"id": "37066529221",
"type": "PushEvent",
"actor": {
"id": 46139131,
"login": "Bard89",
"display_login": "Bard89",
"gravatar_id": "",
"url": "https://api.github.com/users/Bard89",
"avatar_url": "https://avatars.githubusercontent.com/u/46139131?"
},
"repo": {
"id": 780125623,
"name": "Bard89/talk-to-me",
"url": "https://api.github.com/repos/Bard89/talk-to-me"
},
"payload": {
"repository_id": 780125623,
"push_id": 17799451992,
"size": 1,
"distinct_size": 1,
"ref": "refs/heads/add_mvcs",
"head": "f03baa2de66f88f5f1754ce3fa30972667f87e81",
"before": "85e6544ede4ae3f132fe2f5f1ce0ce35a3169d21"
},
"public": true,
"created_at": "2024-04-01T23:00:00Z"
}

Step 2: Write the OpenTelemetry configuration

Unlike TEXT log collection, JSON log collection uses json_parser in the filelog receiver. json_parser parses each line of text as JSON, and the parsed fields are used in subsequent processing.

receivers:
filelog:
include:
- /path/to/2024-01-01-15.json
start_at: beginning
operators:
- type: json_parser
timestamp:
parse_from: attributes.created_at
layout: '%Y-%m-%dT%H:%M:%SZ'

processors:
batch:
send_batch_size: 100000 # Number of data entries per batch. It is recommended that the batch data size be between 100M and 1G.
timeout: 10s

exporters:
doris:
endpoint: http://localhost:8030 # FE HTTP address
mysql_endpoint: localhost:9030 # FE MySQL address
database: doris_db_name
username: doris_username
password: doris_password
table:
logs: otel_logs
create_schema: true # Whether to automatically create the schema. If set to false, the table must be created manually.
history_days: 10
create_history_days: 10
timezone: Asia/Shanghai
timeout: 60s # HTTP Stream Load client timeout
log_response: true
sending_queue:
enabled: true
num_consumers: 20
queue_size: 1000
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
headers:
load_to_single_tablet: "true"

service:
pipelines:
logs:
receivers: [filelog]
processors: [batch]
exporters: [doris]

Step 3: Run the OpenTelemetry Collector

./otelcol-contrib --config config/opentelemetry_json_log.yml

Collect application trace data

This scenario shows how to receive trace data reported from the application side through the OTLP protocol and write the traces into Doris.

Step 1: Write the OpenTelemetry configuration

Create the otel_trace.yaml configuration file:

receivers:
otlp: # OTLP protocol, receives data sent by the OpenTelemetry Java Agent
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318

processors:
batch:
send_batch_size: 100000 # Number of data entries per batch. It is recommended that the batch data size be between 100M and 1G.
timeout: 10s

exporters:
doris:
endpoint: http://localhost:8030 # FE HTTP address
database: doris_db_name
username: doris_username
password: doris_password
table:
traces: doris_table_name
create_schema: true # Whether to automatically create the schema. If set to false, the table must be created manually.
mysql_endpoint: localhost:9030 # FE MySQL address
history_days: 10
create_history_days: 10
timezone: Asia/Shanghai
timeout: 60s # HTTP Stream Load client timeout
log_response: true
sending_queue:
enabled: true
num_consumers: 20
queue_size: 1000
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
headers:
load_to_single_tablet: "true"

service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [doris]

Step 2: Run the OpenTelemetry Collector

./otelcol-contrib --config otel_trace.yaml

Step 3: Integrate the OpenTelemetry SDK on the application side

The following example uses a Spring Boot sample application to integrate the OpenTelemetry Java SDK. The sample application comes from the official demo and returns the simple string Hello World! for the path /.

Download the OpenTelemetry Java Agent. The advantage of using the Java Agent is that no modification to existing application code is required. For other languages and integration methods, refer to the Language APIs & SDKs or Zero-code Instrumentation on the OpenTelemetry official website.

Before starting the application, add the following environment variables:

export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS} -javaagent:/your/path/to/opentelemetry-javaagent.jar" # Path to the OpenTelemetry Java Agent
export OTEL_JAVAAGENT_LOGGING="none" # Disable OTel logs to avoid interfering with the service's own logs
export OTEL_SERVICE_NAME="myproject"
export OTEL_TRACES_EXPORTER="otlp" # Use the OTLP protocol to send trace data
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317" # Address of the OpenTelemetry Collector

java -jar myproject-0.0.1-SNAPSHOT.jar