Real-Time Observability
for the AI Agent Era

Apache Doris unifies logs, metrics, traces and AI agent events on a single high-performance analytical foundation, so teams troubleshoot faster, control costs and keep improving AI quality over time.

Why Observability matters for modern teams.

When teams unify observability across logs, traces, metrics and AI agent events, five things shift at once: incident detection, user experience, cost at scale, AI quality, and the link between system behavior and business outcomes.

01 / Faster Incident Detection

Detect anomalies and find
the root cause sooner.

Observability connects logs, traces, and metrics to help teams detect anomalies and find root causes before failures spread. For AI agents, it turns LLM timeouts, prompt failures, failed tool calls, and runaway loops into analyzable execution data.

Where it shows up
  • Application performance monitoring
  • Microservice troubleshooting
  • AI agent workflow debugging

Already running in production.

Three teams run Apache Doris as the analytical foundation for observability: at scale, on real-time operational data, across logs, metrics and events.

Case 01 · PB-Scale Logging

MiniMax: migrated from Grafana Loki to a PB-scale logging system on Apache Doris

After migrating from Grafana Loki, Apache Doris now powers the logging system across all of MiniMax's business lines, serving PB-scale data with over 99.9% availability and second-level query latency on hundreds of millions of log entries.

Outcome
  • PB-scale log storage with 99.9%+ availability across all business lines
  • Keyword and aggregation queries on 1 billion logs return within 2 seconds
  • 10 GB/s write throughput with second-level ingestion latency
  • 5:1 compression and tiered storage cut storage costs by 70%

Case 02 · Logs & Time Series

NetEase: replaced Elasticsearch and InfluxDB with Apache Doris for monitoring and time series analytics

NetEase migrated its Eagle monitoring platform off Elasticsearch and its IM time series platform off InfluxDB, consolidating both workloads on Apache Doris for faster queries, lower storage costs, and a more flexible index design.

Outcome
  • 11× faster query speed and 70% lower storage cost vs. Elasticsearch on monitoring logs
  • 67% less storage and half the servers vs. InfluxDB on time series workloads
  • 1 GB/s peak write throughput, sustaining up to 1M TPS
  • Flexible inverted indexes that can be added or dropped incrementally without rewriting tables

Case 03 · Elasticsearch Migration

Tencent Music: replaced Elasticsearch with Apache Doris and cut costs by 80%

The shift from Elasticsearch to Apache Doris has slashed storage costs by 80% while boosting write performance by 4×, with inverted indexes powering full-text search and aggregations in a single SQL query.

Outcome
  • 80% lower overall operational cost vs. Elasticsearch
  • 72% less storage footprint (697.7 GB → 195.4 GB on the same dataset)
  • 4× faster write throughput, ingestion time cut from 10+ hours to under 3 hours
  • Alert frequency dropped from 20+ per day to single digits per month

What modern Observability demands, and how Apache Doris answers.

Five things a modern observability platform has to be good at, and the specific Apache Doris capabilities that meet each one.

Observability technical requirements

Ingest Every Signal in Real Time

Modern observability data streams from applications, Kubernetes, APIs, gateways, databases, and AI systems. Your platform needs to absorb high-volume logs, traces, metrics, and AI agent events with low latency, so every signal — from LLM requests and tool calls to RAG retrievals, token costs, evaluations, and user feedback — is ready to query in near real time.

Analyze Dynamic JSON Without Heavy ETL

Logs, traces, tool outputs, model responses, and agent events often arrive as nested JSON with constantly changing fields. The platform must handle schema evolution and make any new field immediately queryable, so teams can filter, investigate, and analyze new signals without rebuilding pipelines.
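As a rough illustration of what "any new field is immediately queryable" means, the Python sketch below flattens each nested JSON record into dotted paths and creates a column the first time a path appears, back-filling earlier rows with None. It is a toy model under simplifying assumptions (arrays stay as opaque values, everything lives in memory), not how Apache Doris works internally. In Doris this job is handled by the VARIANT column type, which among other differences also infers a data type for each path.

```python
import json
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted paths: {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are kept as single leaf values (a simplification for this sketch).
    """
    if isinstance(obj, dict):
        out: Dict[str, Any] = {}
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            out.update(flatten(value, path))
        return out
    return {prefix: obj}


class PathColumns:
    """Toy columnar store with one column per JSON path, created on first sight."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Any]] = {}
        self.rows = 0

    def ingest(self, line: str) -> None:
        record = flatten(json.loads(line))
        # A new path becomes a new column; earlier rows get None for it.
        for path in record:
            if path not in self.columns:
                self.columns[path] = [None] * self.rows
        # Every column receives one value per row (None when the path is absent).
        for path, column in self.columns.items():
            column.append(record.get(path))
        self.rows += 1

    def filter(self, path: str, value: Any) -> List[int]:
        """Row numbers where `path` equals `value`; an unknown path matches nothing."""
        return [i for i, v in enumerate(self.columns.get(path, [])) if v == value]


if __name__ == "__main__":
    store = PathColumns()
    store.ingest('{"level": "ERROR", "agent": {"tool": "search"}}')
    # The second record introduces a field nobody declared in advance.
    store.ingest('{"level": "INFO", "agent": {"tool": "search", "latency_ms": 812}}')
    print(sorted(store.columns))                  # ['agent.latency_ms', 'agent.tool', 'level']
    print(store.filter("agent.latency_ms", 812))  # [1]
```

No pipeline change was needed for `agent.latency_ms`: it became filterable as soon as the first record carrying it arrived.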

Fast Full-Text Search Across Logs and Agent Signals

Observability data contains massive volumes of searchable text, from error messages, stack traces and log lines to prompts, model responses, tool outputs and agent failure reasons. Teams need fast keyword and full-text search so they can find failures, trace requests and investigate agent behavior at scale.
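The data structure that makes keyword search fast is an inverted index: a map from each term to the set of lines that contain it, so a query combines a few small posting lists instead of scanning every log line. The Python toy below is a sketch under simplified assumptions (a regex tokenizer that only handles ASCII letters, digits and underscores; in-memory sets). Apache Doris builds inverted indexes natively and exposes similar any-term and all-term semantics through its `MATCH_ANY` and `MATCH_ALL` SQL operators, but this code is not how Doris implements them.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase, then split on anything that is not a letter, digit or underscore."""
    return [t for t in re.split(r"[^0-9a-z_]+", text.lower()) if t]


class InvertedIndex:
    """Maps each term to the set of line ids that contain it (its posting list)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.lines: List[str] = []

    def add(self, line: str) -> int:
        line_id = len(self.lines)
        self.lines.append(line)
        for term in tokenize(line):
            self.postings[term].add(line_id)
        return line_id

    def match_any(self, query: str) -> List[int]:
        """Lines containing at least one query term: union of posting lists."""
        hits: Set[int] = set()
        for term in tokenize(query):
            hits |= self.postings.get(term, set())
        return sorted(hits)

    def match_all(self, query: str) -> List[int]:
        """Lines containing every query term: intersection of posting lists."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return sorted(hits)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("ERROR tool_call timeout after 30s")
    idx.add("WARN retrying LLM request")
    idx.add("ERROR LLM request timeout")
    print(idx.match_any("timeout retrying"))  # [0, 1, 2]
    print(idx.match_all("llm timeout"))       # [2]
```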

Interactive Analytics on Metrics, Cost & Quality

Observability is not only search. Teams need to slice, aggregate, and drill into massive telemetry datasets to analyze reliability, cost, and AI quality — from P99 latency and SLA trends to token usage, model cost, RAG quality, and agent task completion. That requires a high-performance analytical engine alongside fast search.
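To make the metrics side concrete, the sketch below computes, per model, the kind of numbers listed above: call count, P99 latency, token volume and an estimated cost. It is plain Python over an in-memory list so the arithmetic is visible; the `LlmCall` fields, the model names and the per-1k-token prices are hypothetical, and it uses the exact nearest-rank percentile. In Doris the same question would typically be a single `GROUP BY` query, and for very large tables Doris also provides approximate percentile functions such as `percentile_approx`.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LlmCall(NamedTuple):
    model: str          # hypothetical model label
    latency_ms: float
    tokens: int


def p_nearest_rank(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing to avoid float error in p / 100.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(calls: Iterable[LlmCall], usd_per_1k_tokens: Dict[str, float]) -> Dict[str, dict]:
    """Equivalent of GROUP BY model: call count, P99 latency, tokens and estimated cost."""
    by_model: Dict[str, List[LlmCall]] = defaultdict(list)
    for call in calls:
        by_model[call.model].append(call)
    report = {}
    for model, rows in by_model.items():
        tokens = sum(r.tokens for r in rows)
        report[model] = {
            "calls": len(rows),
            "p99_latency_ms": p_nearest_rank([r.latency_ms for r in rows], 99),
            "tokens": tokens,
            "est_cost_usd": tokens / 1000 * usd_per_1k_tokens[model],
        }
    return report


if __name__ == "__main__":
    calls = [LlmCall("model-a", float(ms), 10) for ms in range(1, 101)]
    calls.append(LlmCall("model-b", 250.0, 400))
    report = summarize(calls, {"model-a": 0.5, "model-b": 2.0})
    print(report["model-a"]["p99_latency_ms"])  # 99.0
    print(report["model-b"]["est_cost_usd"])    # 0.8
```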

Hybrid Search for AI Agent Observability

AI agent observability requires more than keyword search over logs. Teams need to correlate structured metadata with full-text and semantic search across prompts, responses, tool calls and traces, so they can find exact matches, similar failures, hallucinated outputs and recurring agent behavior patterns in one query.
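One common shape for such a query is filter-then-rank: apply exact metadata and keyword conditions first, then order the remaining events by vector similarity to an example failure. The Python sketch below illustrates only that query shape, under loudly simplified assumptions: the event fields and two-dimensional embeddings are hypothetical, keyword matching is a plain substring test, and everything is a linear scan, whereas a production engine relies on inverted and vector indexes. It is not a description of how Apache Doris plans hybrid queries.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class AgentEvent:
    event_id: int
    status: str                  # structured metadata, e.g. "ok" or "failed"
    text: str                    # prompt, response, tool output or trace message
    embedding: Sequence[float]   # hypothetical precomputed embedding of `text`


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    events: Sequence[AgentEvent], status: str, keyword: str,
    query_vec: Sequence[float], top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Filter on metadata and keyword, then rank the survivors by semantic similarity."""
    kw = keyword.lower()
    candidates = [e for e in events if e.status == status and kw in e.text.lower()]
    scored = [(e.event_id, cosine(e.embedding, query_vec)) for e in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    events = [
        AgentEvent(1, "failed", "search tool timed out", [1.0, 0.0]),
        AgentEvent(2, "failed", "search returned an empty result", [3.0, 4.0]),
        AgentEvent(3, "ok", "search succeeded", [1.0, 0.0]),
        AgentEvent(4, "failed", "LLM rate limited", [1.0, 0.0]),
    ]
    # Failed events that mention "search", most similar to the query vector first.
    print(hybrid_search(events, "failed", "search", [1.0, 0.0]))  # [(1, 1.0), (2, 0.6)]
```

Filtering first keeps the expensive similarity step limited to events that already satisfy the exact conditions; other hybrid strategies instead blend keyword and vector scores into one ranking.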

Apache Doris capabilities

CAP · 01

Real-Time Ingestion
for Logs & Agent Traces

Apache Doris ingests high-volume telemetry from Kafka, CDC pipelines, and streaming APIs with low latency. Logs, metrics, traces, and AI agent events become queryable in near real time for fast debugging, monitoring, and cost analysis.

Powered by
  • Routine Load
  • Stream Load
  • Built-in MySQL / PostgreSQL CDC
  • Built-in Kafka subscription
  • Real-time data update
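For a sense of what Stream Load looks like from a client, the sketch below builds (but does not send) the HTTP request for loading newline-delimited JSON agent events. Stream Load is an HTTP `PUT` to `/api/{db}/{table}/_stream_load` with Basic authentication, and `format: json` plus `read_json_by_line: true` are documented Stream Load headers. The host, database `obs`, table `agent_events`, credentials and event fields are hypothetical placeholders; check the port and headers against the Stream Load documentation for your Doris version.

```python
import base64
import json
from typing import Iterable
from urllib.request import Request


def build_stream_load_request(
    host: str, db: str, table: str, user: str, password: str,
    rows: Iterable[dict], port: int = 8030,
) -> Request:
    """Build an HTTP PUT for Doris Stream Load carrying one JSON object per line."""
    url = f"http://{host}:{port}/api/{db}/{table}/_stream_load"
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # urllib normalizes header names with str.capitalize() (e.g. "Format");
    # HTTP header names are case-insensitive, so the server sees the same headers.
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "format": "json",             # payload is JSON ...
        "read_json_by_line": "true",  # ... with one object per line
    }
    return Request(url, data=body, headers=headers, method="PUT")


if __name__ == "__main__":
    events = [
        {"ts": "2024-01-01 00:00:00", "kind": "tool_call", "status": "failed"},
        {"ts": "2024-01-01 00:00:01", "kind": "llm_request", "tokens": 512},
    ]
    req = build_stream_load_request("127.0.0.1", "obs", "agent_events", "root", "", events)
    print(req.get_method(), req.full_url)
    # PUT http://127.0.0.1:8030/api/obs/agent_events/_stream_load
```

When the request goes to an FE node, Doris answers with a redirect to a BE node, which is why the official examples use `curl --location-trusted`. Python's `urllib` does not follow redirects for `PUT` requests, so actually sending this requires a client that re-sends the body and credentials on redirect.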

Build Observability for the AI Agent Era
with Apache Doris.
