Real-Time Observability
for the AI Agent Era

Apache Doris unifies logs, metrics, traces and AI agent events on a single high-performance analytical foundation, so teams troubleshoot faster, control costs and keep improving AI quality over time.

Why Observability matters for modern teams.

When teams unify observability across logs, traces, metrics and AI agent events, five things shift at once: incident detection, user experience, cost at scale, AI quality, and the link between system behavior and business outcomes.

01 / Faster Incident Detection

Detect anomalies and find
the root cause sooner.

Observability connects logs, traces, and metrics to help teams detect anomalies and find root causes before failures spread. For AI agents, it turns LLM timeouts, prompt failures, failed tool calls, and runaway loops into analyzable execution data.

Where it shows up

Application performance monitoring
Microservice troubleshooting
AI agent workflow debugging

02 / User Experience & SLA

Protect every user experience,
from latency to answer quality.

Modern observability connects system health to real user impact. For AI applications, that means tracking not only latency, errors, and degraded endpoints, but also answer accuracy, task completion, grounded responses, and human handoff.

Where it shows up

Customer-facing analytics
SaaS tenant-level monitoring
AI assistant quality monitoring

Control observability costs
as data grows.

04 / Continuous AI Improvement

Make AI applications
improve over time.

AI applications can return valid responses that are still wrong, incomplete, or ungrounded. By observing prompts, responses, RAG context, tool calls, scores, and user feedback, teams can continuously improve prompt quality, retrieval accuracy, and task completion.

Where it shows up

LLM application monitoring
RAG quality evaluation
AI agent evaluation and optimization

05 / Business-Aware Operations

Connect system behavior
to business outcomes.

Observability becomes more valuable when every signal is tied to business impact. Teams can see which customers, tenants, and workflows are affected — and understand how AI agent failures impact conversion, support load, and revenue.

Where it shows up

Tenant-level impact analysis
Business-impact incident prioritization
AI workflow success tracking

Already running in production.

Three teams run Apache Doris as the analytical foundation for observability: at scale, on real-time operational data, across logs, metrics and events.

Case 01 · PB-Scale Logging

MiniMax: migrated from Grafana Loki to a PB-scale logging system on Apache Doris

“After migrating from Grafana Loki, Apache Doris now powers the logging system across all of MiniMax's business lines, serving PB-scale data with over 99.9% availability and second-level query latency on hundreds of millions of log entries.”

Outcome

PB-scale log storage with 99.9%+ availability across all business lines
Keyword and aggregation queries on 1 billion logs return within 2 seconds
10 GB/s write throughput with second-level ingestion latency
5:1 compression and tiered storage cut storage costs by 70%

Case 02 · Logs & Time Series

NetEase: replaced Elasticsearch and InfluxDB with Apache Doris for monitoring and time series analytics

“NetEase migrated its Eagle monitoring platform off Elasticsearch and its IM time series platform off InfluxDB, consolidating both workloads on Apache Doris for faster queries, lower storage costs, and a more flexible index design.”

Outcome

11× faster query speed and 70% lower storage cost vs. Elasticsearch on monitoring logs
67% less storage and half the servers vs. InfluxDB on time series workloads
1 GB/s peak write throughput, sustaining up to 1M TPS
Flexible inverted indexes that can be added or dropped incrementally without rewriting tables

Case 03 · Elasticsearch Migration

Tencent Music: replaced Elasticsearch with Apache Doris and cut costs by 80%

“The shift from Elasticsearch to Apache Doris has slashed storage costs by 80% while boosting write performance by 4×, with inverted indexes powering full-text search and aggregations in a single SQL query.”

Outcome

80% lower overall operational cost vs. Elasticsearch
72% less storage footprint (697.7 GB → 195.4 GB on the same dataset)
4× faster write throughput, ingestion time cut from 10+ hours to under 3 hours
Alert frequency dropped from 20+ per day to single digits per month

What modern Observability demands and how Apache Doris answers.

Five things a modern observability platform has to be good at, and the specific Apache Doris capabilities that meet each one.

Observability technical requirements

Ingest Every Signal in Real Time

Modern observability data streams from applications, Kubernetes, APIs, gateways, databases, and AI systems. Your platform needs to absorb high-volume logs, traces, metrics, and AI agent events with low latency, so every signal — from LLM requests and tool calls to RAG retrievals, token costs, evaluations, and user feedback — is ready to query in near real time.

Analyze Dynamic JSON Without Heavy ETL

Logs, traces, tool outputs, model responses, and agent events often arrive as nested JSON with constantly changing fields. The platform must handle schema evolution and make any new field immediately queryable, so teams can filter, investigate, and analyze new signals without rebuilding pipelines.
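
To make the schema-evolution requirement concrete, here is a minimal Python sketch of what "any new field is immediately queryable" can look like. It flattens nested JSON events into dotted paths and reports paths that were not seen before. The event shape, field names, and helper functions are hypothetical, and this is an illustration of the idea rather than how Apache Doris stores semi-structured data.

```python
import json
from typing import Any, Dict, Iterable, List, Set, Tuple


def flatten(event: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # lists and scalars are kept as leaf values
    return flat


def discover_fields(
    raw_lines: Iterable[str], known: Set[str]
) -> Tuple[List[Dict[str, Any]], Set[str]]:
    """Return the flattened events plus every path that is not in `known`."""
    rows: List[Dict[str, Any]] = []
    new_fields: Set[str] = set()
    for line in raw_lines:
        row = flatten(json.loads(line))
        new_fields |= set(row) - known
        rows.append(row)
    return rows, new_fields


# Hypothetical agent events: the second one introduces a new "tool" object.
events = [
    '{"trace_id": "t1", "llm": {"model": "m-small", "tokens": 120}}',
    '{"trace_id": "t2", "llm": {"model": "m-small", "tokens": 80}, '
    '"tool": {"name": "search", "ok": false}}',
]
rows, new_fields = discover_fields(
    events, known={"trace_id", "llm.model", "llm.tokens"}
)

# The new field can be filtered on right away, with no pipeline change.
failed_tools = [r["trace_id"] for r in rows if r.get("tool.ok") is False]
```

Here `new_fields` holds the two paths introduced by the second event, and `failed_tools` filters on one of them without any change to the ingestion code.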

Fast Full-Text Search Across Logs and Agent Signals

Observability data contains massive volumes of searchable text, from error messages, stack traces and log lines to prompts, model responses, tool outputs and agent failure reasons. Teams need fast keyword and full-text search so they can find failures, trace requests and investigate agent behavior at scale.
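
As a rough illustration of why keyword search over logs is usually served by an inverted index, the toy Python sketch below maps each term to the set of log lines that contain it, then answers an "all terms must match" query by intersecting those sets. The log lines and function names are made up for the example, and a production engine adds much more (tokenizer options, phrase matching, relevance scoring).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every term to the ids of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical log lines keyed by an id.
logs = {
    1: "ERROR upstream timeout calling LLM provider",
    2: "tool call failed: search_api returned 500",
    3: "LLM timeout after 30s, retrying prompt",
    4: "INFO request completed",
}
index = build_index(logs)

llm_timeouts = search_all(index, "llm timeout")
tool_failures = search_all(index, "tool failed")
```

The lookup cost depends on the size of the posting sets for the query terms, not on the total number of log lines, which is what keeps this kind of search fast at scale.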

Interactive Analytics on Metrics, Cost & Quality

Observability is not only search. Teams need to slice, aggregate, and drill into massive telemetry datasets to analyze reliability, cost, and AI quality — from P99 latency and SLA trends to token usage, model cost, RAG quality, and agent task completion. That requires a high-performance analytical engine alongside fast search.
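
The kind of aggregation described above can be sketched in a few lines of Python: group agent events by model, then compute latency percentiles and token cost per group. The event fields, model names, and per-token prices are assumptions made up for this example, and the percentile uses the simple nearest-rank definition rather than any particular engine's approximation.

```python
import math
from collections import defaultdict
from typing import Dict, List

# Hypothetical prices per 1,000 tokens, for illustration only.
PRICE_PER_1K_TOKENS = {"model-a": 0.5, "model-b": 0.25}


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(events: List[dict]) -> Dict[str, dict]:
    """Group events by model and compute latency and cost metrics."""
    by_model: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        by_model[event["model"]].append(event)

    summary: Dict[str, dict] = {}
    for model, rows in by_model.items():
        latencies = [row["latency_ms"] for row in rows]
        tokens = sum(row["tokens"] for row in rows)
        summary[model] = {
            "p50_ms": percentile(latencies, 50),
            "p99_ms": percentile(latencies, 99),
            "tokens": tokens,
            "cost": tokens / 1000 * PRICE_PER_1K_TOKENS[model],
        }
    return summary


# Hypothetical events: model-a has one very slow request in its tail.
events = (
    [{"model": "model-a", "latency_ms": ms, "tokens": 1000}
     for ms in (100, 200, 300, 400, 5000)]
    + [{"model": "model-b", "latency_ms": 50, "tokens": 500},
       {"model": "model-b", "latency_ms": 60, "tokens": 1500}]
)
summary = summarize(events)
```

In this sample the median for `model-a` looks healthy while its P99 exposes the slow request, which is why tail percentiles matter more than averages for SLA tracking.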

Hybrid Search for AI Agent Observability

AI agent observability requires more than keyword search over logs. Teams need to correlate structured metadata with full-text and semantic search across prompts, responses, tool calls and traces, so they can find exact matches, similar failures, hallucinated outputs and recurring agent behavior patterns in one query.
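
A minimal Python sketch of the hybrid pattern follows: a structured filter and a keyword match narrow the candidates, then cosine similarity over embeddings ranks what remains. The events, the three-dimensional embeddings, and the function names are hypothetical. Real embeddings come from a model and have hundreds of dimensions, and a real engine would use indexes instead of scanning every row.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    events: List[dict],
    *,
    status: str,
    keyword: str,
    query_vec: Sequence[float],
    top_k: int = 2,
) -> List[str]:
    """Filter by metadata and keyword, then rank by semantic similarity."""
    candidates = [
        e for e in events
        if e["status"] == status and keyword.lower() in e["response"].lower()
    ]
    ranked = sorted(
        candidates,
        key=lambda e: cosine(e["embedding"], query_vec),
        reverse=True,
    )
    return [e["trace_id"] for e in ranked[:top_k]]


# Hypothetical agent events with toy 3-dimensional embeddings.
events = [
    {"trace_id": "t1", "status": "failed",
     "response": "Refund policy not found in context", "embedding": [1.0, 0.0, 0.0]},
    {"trace_id": "t2", "status": "failed",
     "response": "Refund issued twice by tool call", "embedding": [0.0, 1.0, 0.0]},
    {"trace_id": "t3", "status": "ok",
     "response": "Refund processed", "embedding": [1.0, 0.0, 0.0]},
    {"trace_id": "t4", "status": "failed",
     "response": "Timeout contacting billing", "embedding": [0.9, 0.1, 0.0]},
    {"trace_id": "t5", "status": "failed",
     "response": "Could not ground refund answer", "embedding": [0.8, 0.6, 0.0]},
]

similar_failures = hybrid_search(
    events, status="failed", keyword="refund", query_vec=[1.0, 0.0, 0.0]
)
```

Here `t3` is dropped by the status filter and `t4` by the keyword match, so only the failed refund traces are ranked by similarity to the query vector.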

Apache Doris capabilities

CAP · 01

Real-Time Ingestion
for Logs & Agent Traces

Apache Doris ingests high-volume telemetry from Kafka, CDC pipelines, and streaming APIs with low latency. Logs, metrics, traces, and AI agent events become queryable in near real time for fast debugging, monitoring, and cost analysis.
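
As one possible shape for the ingestion path, the Python sketch below prepares a batch of agent events for the Apache Doris Stream Load HTTP interface, sending newline-delimited JSON. It only builds the request and does not send it. The host, database, table, credentials, and label are placeholders, and the port and header names reflect the Stream Load documentation as best recalled here, so check them against the version you run.

```python
import base64
import json
from typing import Dict, Iterable, Tuple


def build_stream_load(
    fe_host: str,
    db: str,
    table: str,
    events: Iterable[dict],
    user: str,
    password: str,
    label: str,
) -> Tuple[str, Dict[str, str], bytes]:
    """Build the URL, headers, and body for a Stream Load HTTP PUT request."""
    # 8030 is assumed to be the FE HTTP port; adjust for your deployment.
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "format": "json",             # the body is JSON
        "read_json_by_line": "true",  # one JSON object per line
        "label": label,               # lets the server reject a duplicate batch
    }
    body = "\n".join(
        json.dumps(event, separators=(",", ":")) for event in events
    ).encode("utf-8")
    return url, headers, body


# Hypothetical batch of agent events and placeholder connection details.
batch = [
    {"trace_id": "t1", "event": "llm_request", "tokens": 120},
    {"trace_id": "t1", "event": "tool_call", "tool": "search", "ok": False},
]
url, headers, body = build_stream_load(
    "doris-fe.example", "observability", "agent_events",
    batch, user="ingest", password="secret", label="agent-events-0001",
)
```

To send the batch, issue an HTTP `PUT` with these parts using a client that follows redirects and keeps the credentials, since the frontend typically redirects the load to a backend node. For continuous streams from Kafka, a managed ingestion job is usually a better fit than hand-built batches.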

CAP · 02

Semi-Structured Analytics
with VARIANT

CAP · 03

Fast Full-Text Search
with Inverted Index & BM25

CAP · 04

Interactive Analytics
for Observability Dashboards

CAP · 05

Hybrid Search across
Structured, Text & Vector

Build Observability for the AI Agent Era
with Apache Doris.

Get Started