
Why Apache Doris is the best open source alternative to Rockset

Zaki Lu

OpenAI dropped a bomb on the data world by announcing its acquisition of Rockset, a cloud-based, fully managed analytical database. Amid all the congratulations, one question stands out: why Rockset?


Founded in 2016 by Venkat Venkataramani, a former Engineering Director at Meta, Rockset focuses on real-time search and data analytics. Compared to other DBMSs, Rockset stands out for its:

  • Real-time data updates: Rockset keeps data fresh by fetching and delivering the latest data. It supports real-time updates at the granularity of individual data fields, and these updates complete within milliseconds.

  • Converged index: It combines the benefits of an inverted index, columnar storage, and row-oriented storage to provide efficient and flexible data querying.

  • Native support for semi-structured data: Rockset is well suited to the growing demand for processing semi-structured data, and it supports hash joins and nested loop joins.

  • SQL and JOIN compatibility: The Search Index of Rockset is optimized for various join queries.
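
To make the idea of a converged index more concrete, here is a toy sketch in Python. It is not Rockset's implementation; the documents, field names, and in-memory structures are all made up for illustration. It only shows the general idea of writing each record into a row store, a column store, and an inverted index at the same time, so that each kind of query can use the structure that suits it best.

```python
from collections import defaultdict

# Toy documents; field names and values are made up for illustration.
docs = {
    1: {"city": "Paris", "status": "active", "amount": 30},
    2: {"city": "Tokyo", "status": "active", "amount": 50},
    3: {"city": "Paris", "status": "closed", "amount": 20},
}

row_store = {}                     # doc id -> full record (point lookups)
column_store = defaultdict(dict)   # field -> {doc id: value} (scans, aggregates)
inverted_index = defaultdict(set)  # (field, value) -> doc ids (selective filters)


def index_document(doc_id, record):
    """Write one record into all three structures."""
    row_store[doc_id] = dict(record)
    for field, value in record.items():
        column_store[field][doc_id] = value
        inverted_index[(field, value)].add(doc_id)


for doc_id, record in docs.items():
    index_document(doc_id, record)

# Selective filter: the inverted index returns the matching ids directly.
paris_ids = sorted(inverted_index[("city", "Paris")])

# Aggregate: only the "amount" column is read.
total_amount = sum(column_store["amount"].values())

# Point lookup: the row store returns the whole record at once.
record_2 = row_store[2]
```

The trade-off in a design like this is extra work and storage at write time in exchange for having a well-suited access path for each query shape at read time.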

The news also handed all Rockset users a ticking time bomb: they have to find a suitable alternative to Rockset for their own use cases within three months. This, of course, is an opportunity for other analytical databases on the market. However, of all the would-be alternatives, only a few cover all of the key Rockset features listed above. Among them, Apache Doris is worth looking into.

Apache Doris is an open-source real-time data warehouse trusted by over 4,000 enterprise users worldwide. Its key capabilities include:

  • Real-time data updates: Apache Doris supports not only real-time updates and deletion, but also real-time partial column updates, making it particularly useful in cases involving frequent data updates.

  • Row/column hybrid storage: Apache Doris is a column-oriented data warehouse that achieves world-leading OLAP performance on ClickBench. Additionally, it supports row-oriented storage to serve high-concurrency point query scenarios, which allows it to respond to almost a million query requests within milliseconds.

  • Inverted index and full-text searches: Apache Doris provides high efficiency and flexibility in keyword searching. It allows index creation on all fields and a flexible combination of data fields for multi-dimensional data analysis.

  • Native support for semi-structured data: Apache Doris has introduced the VARIANT data type to accommodate semi-structured data. It enables flexible data schema and high query speed on top of cost-efficient data storage. Compared to traditional JSON methods, VARIANT can bring a 10x performance improvement.

  • Support for various SQL and join operations: Apache Doris is highly compatible with MySQL syntax and interfaces. It supports INNER JOIN, CROSS JOIN, and all types of OUTER JOIN. Best of all, it optimizes automatically based on data types to deliver strong performance under different circumstances.
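
As a rough illustration of what a partial column update means, the Python sketch below models a keyed table as a dictionary. It is a conceptual toy, not Apache Doris code or its API: the schema, the `partial_update` helper, and the sample rows are hypothetical, and filling missing columns of a new row with `None` is an assumption of this sketch.

```python
# Hypothetical schema and data, for illustration only.
COLUMNS = ("user", "status", "amount")

table = {
    101: {"user": "alice", "status": "pending", "amount": 42},
}


def partial_update(table, key, changes):
    """Upsert only the columns in `changes`; other columns keep their values."""
    unknown = set(changes) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    # New keys start with every column empty (an assumption of this sketch).
    row = table.setdefault(key, dict.fromkeys(COLUMNS))
    row.update(changes)
    return row


partial_update(table, 101, {"status": "shipped"})  # existing row: one column changes
partial_update(table, 102, {"user": "bob"})        # new row: other columns stay None
```

The point of the pattern is that the writer sends only the changed columns and does not need to read the full row first, which matters when updates are frequent.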

As a Top-Level Project of the Apache Software Foundation, Apache Doris is supported by a robust and fast-growing community. It has accumulated over 11.8K GitHub stars and 636 contributors so far.

If you are seeking a fully managed solution instead of an open-source product, you might want to look into VeloDB. As the commercial service provider of Apache Doris, VeloDB offers a wider range of products that are more tailored to the needs of enterprises. VeloDB Cloud decouples compute and storage on the basis of Apache Doris, which gives it greater elasticity and cost efficiency. Like cloud-based Rockset, it frees users from tedious database operations and maintenance so they can focus on what drives their business growth.