MCP Server

TL;DR The Apache Doris MCP Server is a Python service that speaks the Model Context Protocol. The server exposes Apache Doris as a set of tools an AI assistant can call, including listing tables, running SQL, fetching schemas, and reading audit logs. Point Claude Desktop, Cursor, or any MCP client at the server and the model can work against a real cluster without a custom integration.

Apache Doris MCP Server: A Model Context Protocol server for Apache Doris that lets Claude Desktop, Cursor, Cline, and other AI clients query a cluster through tool calls.

Why use the Apache Doris MCP Server?

The Apache Doris MCP Server replaces the chat-assistant-to-database integration that every company would otherwise build from scratch. Without it, you stand up a small Python service, wrap a few SQL helpers, handle credentials, timeouts, result truncation, and read-only enforcement, and then rewrite client-specific glue for Claude Desktop, Cursor, and whatever shows up next month. The work is mostly boilerplate, but the bug surface is large: an over-eager DELETE from an LLM is a real outage. The in-database AI surface (LLM SQL functions and embeddings) complements this card from the SQL side.

Anthropic released MCP in November 2024 to standardize that work. Servers expose tools with typed inputs and outputs; clients (Claude Desktop, Cursor, Cline, Continue, Zed, and others) speak the same protocol; the model decides when to call. Database vendors have followed: ClickHouse, Snowflake, MotherDuck, BigQuery, and Supabase all ship official servers.

The Apache Doris MCP Server is the equivalent for Apache Doris. It lives in a separate repo (apache/doris-mcp-server), it ships under Apache 2.0, and you launch it from any MCP client config in a few lines.

What is the Apache Doris MCP Server?

The Apache Doris MCP Server is a Python 3.12 service built on FastAPI. It connects to Apache Doris over the MySQL protocol, registers a fixed set of MCP tools, and serves them over stdio, Server-Sent Events, or the streamable HTTP transport. AI assistants treat each tool as a function call. The server logs every call, applies a SQL security filter, and returns JSON.

Key terms

  • MCP (Model Context Protocol): an open JSON-RPC 2.0 protocol for connecting LLM clients to external tools and data. Tools are typed functions; resources are read-only data; prompts are reusable templates.
  • Tool: one Python function decorated with @mcp.tool(). The Apache Doris server ships eight: exec_query, get_db_list, get_db_table_list, get_table_schema, get_table_comment, get_table_column_comments, get_table_indexes, and get_recent_audit_logs.
  • Transport: how the client and server talk. stdio runs the server as a subprocess (the default for Claude Desktop). SSE and Streamable HTTP are for remote deployments.
  • SQL security filter: a server-side guard, on by default, that blocks DROP, DELETE, INSERT, UPDATE, ALTER, and CREATE, and adds an automatic LIMIT to bare SELECT statements.
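To make the "tool" and "transport" terms concrete, here is a rough sketch of the message a client sends when the model invokes a tool. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification. The `build_tool_call` helper and the `db_name` argument are hypothetical; the real parameter names come from the server's tool schema, which a client fetches with `tools/list`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: "db_name" is an illustrative argument name, not confirmed
# against the Apache Doris MCP Server's actual tool schema.
request = build_tool_call(1, "get_db_table_list", {"db_name": "ssb"})

# Over stdio the client writes this JSON to the server's stdin; over
# HTTP-based transports the same payload travels in a request body.
print(json.dumps(request, indent=2))
```

The same envelope is used for every tool, which is why one server works with any MCP client without client-specific glue.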

How does the Apache Doris MCP Server work?

The Apache Doris MCP Server runs through a five-step loop: the client launches the server, the server connects to the cluster, the model picks a tool, the server filters and runs the SQL, and results return as JSON.

  1. The client launches the server. In stdio mode, Claude Desktop or Cursor spawns the server as a subprocess and communicates over stdin/stdout. In SSE or HTTP mode, you run the server long-lived and the client connects over the network.
  2. The server connects to Apache Doris. It reads DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, and DB_DATABASE from environment variables, then opens a MySQL-protocol connection to the FE query port (9030 by default). No JDBC URL, no driver setup.
  3. The model calls a tool. The assistant decides, given the user's prompt, which tool to invoke. For "what tables hold order data?", that is get_db_table_list. For "summarize yesterday's slow queries," that is get_recent_audit_logs. The user typically approves each call before it runs.
  4. The server filters and runs the query. exec_query parses the statement, rejects anything that mutates data when ENABLE_SQL_SECURITY_CHECK=true, and appends a LIMIT if the query has none. A 30-second timeout (configurable per call) caps runtime.
  5. Results return as JSON. The client renders them inline in the chat. Large result sets are truncated by max_rows, default 100, so a careless SELECT * does not blow up the model's context window.
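Steps 4 and 5 can be sketched in a few lines. This is a simplified illustration, not the server's actual implementation: `guard_sql` and `truncate_rows` are hypothetical names, and the check looks only at the first keyword, whereas the text above says the real filter parses the statement.

```python
import re

# Statement types the text says the security filter blocks.
BLOCKED = ("DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE")


def guard_sql(sql, max_rows=100):
    """Reject mutating statements and cap bare SELECTs with a LIMIT."""
    statement = sql.strip().rstrip(";").strip()
    first_word = statement.split(None, 1)[0].upper() if statement else ""
    if first_word in BLOCKED:
        raise PermissionError(f"{first_word} statements are blocked")
    # Naive check: a LIMIT inside a subquery would also count here.
    has_limit = re.search(r"\bLIMIT\s+\d+", statement, re.IGNORECASE) is not None
    if first_word == "SELECT" and not has_limit:
        statement = f"{statement} LIMIT {max_rows}"
    return statement


def truncate_rows(rows, max_rows=100):
    """Keep at most max_rows rows and flag whether anything was cut."""
    return {"rows": rows[:max_rows], "truncated": len(rows) > max_rows}


print(guard_sql("SELECT * FROM lineorder;"))
print(truncate_rows(list(range(250)))["truncated"])
```

A keyword check like this is easy to bypass, which is one reason to pair the filter with a read-only database account rather than rely on it alone.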

Quick start

{
  "mcpServers": {
    "doris": {
      "command": "uv",
      "args": ["--project", "/path/to/doris-mcp-server", "run", "doris-mcp"],
      "env": {
        "DB_HOST": "127.0.0.1",
        "DB_PORT": "9030",
        "DB_USER": "root",
        "DB_PASSWORD": "your_password",
        "DB_DATABASE": "your_db"
      }
    }
  }
}

Expected result

Save the snippet as ~/Library/Application Support/Claude/claude_desktop_config.json (macOS), restart Claude Desktop, and the doris server appears in the tools menu. Ask "what databases do we have?" and the assistant calls get_db_list, returning something like:

information_schema, mysql, ssb, tpch_100

The assistant can now compose follow-up calls: get_db_table_list('ssb'), then get_table_schema('lineorder', 'ssb'), then a plain exec_query once it has the column names.
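If the Claude Desktop config file already lists other MCP servers, saving the snippet over it would drop them. A minimal sketch of merging the doris entry instead, assuming the file is valid JSON; `add_doris_server` is a hypothetical helper, and the demo writes to a temporary directory rather than the real config path:

```python
import json
import tempfile
from pathlib import Path


def add_doris_server(config_path, project_dir, env):
    """Add or replace the "doris" entry without touching other servers."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["doris"] = {
        "command": "uv",
        "args": ["--project", project_dir, "run", "doris-mcp"],
        "env": env,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file that already holds another server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = add_doris_server(
        demo_path,
        "/path/to/doris-mcp-server",
        {"DB_HOST": "127.0.0.1", "DB_PORT": "9030", "DB_USER": "root",
         "DB_PASSWORD": "your_password", "DB_DATABASE": "your_db"},
    )
    print(sorted(merged["mcpServers"]))
```

Restart Claude Desktop after changing the file so it picks up the new entry.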

When should you use the Apache Doris MCP Server?

The Apache Doris MCP Server fits read-mostly AI assistant scenarios, especially schema discovery, ad-hoc analysis, and on-call investigation against a real cluster.

Good fit

  • AI-assisted SQL authoring inside Cursor or Claude Code, where the assistant inspects the schema and drafts a query against your real cluster instead of guessing column names.
  • Ad-hoc "ask your data" sessions in Claude Desktop, especially for engineers who would otherwise paste schemas into the chat by hand.
  • On-call assistants that read audit logs (get_recent_audit_logs) to find the slow query that broke a dashboard.
  • Schema discovery and BI prototyping, where the assistant chains get_db_list → get_db_table_list → get_table_schema to sketch a model before anyone writes a query.

Not a good fit

  • Production write paths. The server is preview-grade, the SQL filter is a keyword blocklist rather than a full permission model, and an LLM in the loop is not the right place for INSERT or UPDATE. Use a real application for writes.
  • Untrusted data. An attacker who can put text into a row your assistant later reads can attempt prompt injection. The community has documented real incidents on Postgres MCP servers; treat anything the model fetches as data, not instructions, and review tool calls before running them. See MCP security best practices.
  • Browsing multi-million-row tables. Tool results land in the model's context window, and the per-token bill scales accordingly. Cap max_rows, ask the model to write aggregations, and reach for a notebook for anything beyond a sample.
  • Multi-tenant clusters with no row-level scoping. The server connects with one MySQL account; whatever that account can see, the model can see. Create a dedicated read-only user, restrict its database grants, and never reuse a power-user account.
  • Workloads that need fine-grained, programmable tool access. The eight tools cover schema and read paths well, but anything beyond that (custom workflows, batch jobs, NL2SQL with user-defined prompts) belongs in a custom integration that calls Apache Doris directly.
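The advice above about a dedicated read-only user can be sketched as a small generator for the setup statements. Treat this as an assumption-laden illustration: `read_only_grants` and the `mcp_reader` account name are hypothetical, the password is left as a placeholder, and the `CREATE USER` and `GRANT SELECT_PRIV` forms should be checked against the Apache Doris reference for your version before use.

```python
import re

# Plain identifiers only, so a name cannot smuggle extra SQL into the output.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def read_only_grants(user, databases, host="%"):
    """Return SQL statements for a SELECT-only account on the given databases."""
    for name in [user, *databases]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    identity = f"'{user}'@'{host}'"
    # '<password>' is a placeholder to fill in by hand, never a real secret.
    statements = [f"CREATE USER {identity} IDENTIFIED BY '<password>';"]
    statements += [f"GRANT SELECT_PRIV ON {db}.* TO {identity};" for db in databases]
    return statements


for line in read_only_grants("mcp_reader", ["ssb", "tpch_100"]):
    print(line)
```

Run the generated statements as an administrator, then put the new account in DB_USER and DB_PASSWORD so the model can only see the databases you granted.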

Further reading