Doris Compute-Storage Decoupled Deployment Preparation
1. Overview
This document describes the preparation work for deploying Apache Doris in compute-storage decoupled mode. The decoupled architecture aims to improve system scalability and performance, and is suited to large-scale data processing scenarios.
2. Architecture Components
The Doris compute-storage decoupled architecture consists of three main modules:
- Frontend (FE): Handles user requests and manages metadata.
- Backend (BE): Stateless compute nodes that execute query tasks.
- Meta Service (MS): Manages metadata operations and data recovery.
3. System Requirements
3.1 Hardware Requirements
- Minimum configuration: 3 servers
- Recommended configuration: 5 or more servers
3.2 Software Dependencies
- FoundationDB (FDB) version 7.1.38 or higher
- OpenJDK 17
4. Deployment Planning
4.1 Testing Environment Deployment
Deploy all modules on a single machine. This layout is not suitable for production environments.
4.2 Production Deployment
- Deploy FDB on 3 or more machines
- Deploy FE and Meta Service on 3 or more machines
- Deploy BE on 3 or more machines
On machines with ample resources, FDB, FE, and Meta Service can be co-located on the same hosts, but they should not share disks.
5. Installation Steps
5.1 Install FoundationDB
This section provides a step-by-step guide to configuring, deploying, and starting the FoundationDB (FDB) service using the provided scripts fdb_vars.sh and fdb_ctl.sh. You can download the Doris tools package and find fdb_vars.sh and fdb_ctl.sh in its fdb directory.
5.1.1 Machine Requirements
Typically, at least 3 machines equipped with SSDs are required to form a FoundationDB cluster that keeps two data replicas and tolerates the failure of a single machine. If SSDs are not available, data must at least be stored on standard cloud disks or local disks with a standard POSIX-compliant file system. Otherwise, FoundationDB may fail to operate properly. For instance, storage solutions like JuiceFS should not be used as the underlying storage for FoundationDB.
For development or testing purposes only, a single machine is sufficient.
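The storage requirement above can be checked before deployment. The following is a minimal sketch, not part of the Doris tooling: it assumes GNU coreutils (for df --output) and assumes that FUSE-mounted file systems, which is how JuiceFS is commonly mounted on Linux, report a type beginning with "fuse". The helper is_fuse_fstype and the data_dir value are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fdb_ctl.sh): succeed if a filesystem
# type name looks like a FUSE mount, e.g. "fuse.juicefs" or "fuseblk".
is_fuse_fstype() {
    case "$1" in
        fuse|fuse.*|fuseblk) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical data directory; substitute one of your DATA_DIRS entries.
data_dir="/"

# GNU df prints the type of the filesystem that holds the given path.
# On systems without GNU df this yields an empty string.
fs_type=$(df --output=fstype "$data_dir" 2>/dev/null | tail -n 1)

if is_fuse_fstype "$fs_type"; then
    echo "warning: $data_dir is on a FUSE filesystem ($fs_type)" >&2
else
    echo "$data_dir filesystem type: ${fs_type:-unknown}"
fi
```

A check like this only flags one class of unsuitable storage; it does not prove that a file system is fully POSIX-compliant.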
5.1.2 fdb_vars.sh Configuration
Required Custom Settings
Parameter | Description | Type | Example | Notes |
---|---|---|---|---|
DATA_DIRS | Specify the data directories for FoundationDB storage | Comma-separated list of absolute paths | /mnt/foundationdb/data1,/mnt/foundationdb/data2,/mnt/foundationdb/data3 | Create the directories before running the script; SSDs and separate directories are recommended for production environments |
FDB_CLUSTER_IPS | Define cluster IPs | String (comma-separated IP addresses) | 172.200.0.2,172.200.0.3,172.200.0.4 | At least 3 IP addresses for production clusters; the first IP is used as the coordinator; for high availability, place machines in different racks |
FDB_HOME | Define the main directory for FoundationDB | Absolute path | /fdbhome | Default path is /fdbhome; the path must be absolute |
FDB_CLUSTER_ID | Define the cluster ID | String | SAQESzbh | Each cluster ID must be unique; can be generated using mktemp -u XXXXXXXX |
FDB_CLUSTER_DESC | Define the description of the FDB cluster | String | dorisfdb | Changing this to something meaningful for the deployment is recommended |
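As an illustration, the required settings might look like the following when written as shell assignments. This is a sketch that assumes fdb_vars.sh uses plain shell variable assignments; the values are the ones from the Example column, and the helpers count_items and all_absolute are hypothetical sanity checks that are not part of the provided scripts.

```shell
#!/usr/bin/env bash
# Example values copied from the table above; replace with your own.
DATA_DIRS="/mnt/foundationdb/data1,/mnt/foundationdb/data2,/mnt/foundationdb/data3"
FDB_CLUSTER_IPS="172.200.0.2,172.200.0.3,172.200.0.4"
FDB_HOME="/fdbhome"
FDB_CLUSTER_ID="SAQESzbh"      # or generate one with: mktemp -u XXXXXXXX
FDB_CLUSTER_DESC="dorisfdb"

# Hypothetical helper: print the number of comma-separated items.
count_items() {
    local IFS=','
    local -a items
    read -r -a items <<< "$1"
    echo "${#items[@]}"
}

# Hypothetical helper: succeed only if every comma-separated path is absolute.
all_absolute() {
    local IFS=','
    local -a paths
    local p
    read -r -a paths <<< "$1"
    for p in "${paths[@]}"; do
        [[ "$p" == /* ]] || return 1
    done
}

ip_count=$(count_items "$FDB_CLUSTER_IPS")
if (( ip_count < 3 )); then
    echo "warning: fewer than 3 IPs; acceptable for testing only" >&2
fi
all_absolute "$DATA_DIRS" || echo "error: DATA_DIRS must contain absolute paths" >&2
all_absolute "$FDB_HOME"  || echo "error: FDB_HOME must be an absolute path" >&2
```

The checks only mirror the notes in the table (at least 3 IPs for production, absolute paths); they do not verify that the directories exist or that the hosts are reachable.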
Optional Custom Settings
Parameter | Description | Type | Example | Notes |
---|---|---|---|---|
MEMORY_LIMIT_GB | Define the memory limit for FDB processes in GB | Integer | MEMORY_LIMIT_GB=16 | Adjust this value based on available memory resources and FDB process requirements |
CPU_CORES_LIMIT | Define the CPU core limit for FDB processes | Integer | CPU_CORES_LIMIT=8 | Set this value based on the number of available CPU cores and FDB process requirements |
5.1.3 Deploy FDB Cluster
After configuring the environment with fdb_vars.sh, you can deploy the FDB cluster on each node using the fdb_ctl.sh script.
./fdb_ctl.sh deploy
This command initiates the deployment process of the FDB cluster.
5.1.4 Start FDB Service
Once the FDB cluster is deployed, you can start the FDB service on each node using the fdb_ctl.sh script.
./fdb_ctl.sh start
This command starts the FDB service, which makes the cluster operational, and yields the FDB cluster connection string. That string is needed later when configuring the Meta Service.
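To recognize the connection string, it helps to know its general shape. FoundationDB cluster files use the format description:ID@ip:port[,ip:port...]. The sketch below assembles the string one would expect from the example settings in this document, assuming the coordinator listens on port 4500 (FoundationDB's default; the port actually used by fdb_ctl.sh is an assumption here). Treat the string reported by your own deployment as authoritative.

```shell
#!/usr/bin/env bash
# Example values from the fdb_vars.sh settings in this document.
FDB_CLUSTER_DESC="dorisfdb"
FDB_CLUSTER_ID="SAQESzbh"
FDB_CLUSTER_IPS="172.200.0.2,172.200.0.3,172.200.0.4"
FDB_PORT=4500   # assumption: FoundationDB's default port

# The first IP in the list is used as the coordinator.
coordinator_ip="${FDB_CLUSTER_IPS%%,*}"

# Cluster-file format: description:ID@ip:port
expected="${FDB_CLUSTER_DESC}:${FDB_CLUSTER_ID}@${coordinator_ip}:${FDB_PORT}"
echo "$expected"
```

With the example values, this prints dorisfdb:SAQESzbh@172.200.0.2:4500.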
5.2 Install OpenJDK 17
- Download OpenJDK 17.
- Extract the archive and set the JAVA_HOME environment variable to the extracted directory.
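The second step can be sketched as follows. The path /opt/jdk-17 is a hypothetical install location chosen for illustration; use the directory into which you actually extracted OpenJDK 17.

```shell
#!/usr/bin/env bash
# Hypothetical install location; replace with your extracted JDK directory.
JAVA_HOME="/opt/jdk-17"
export JAVA_HOME
export PATH="$JAVA_HOME/bin:$PATH"

# Confirm the runtime is picked up, if it is installed at that path.
if [[ -x "$JAVA_HOME/bin/java" ]]; then
    "$JAVA_HOME/bin/java" -version
else
    echo "java not found under $JAVA_HOME" >&2
fi
```

To make the setting persistent, the two export lines can be added to the shell profile of the user that runs the Doris processes.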
6. Next Steps
After completing the above preparations, refer to the subsequent deployment documents to continue the deployment.
7. Notes
- Ensure time synchronization across all nodes
- Regularly back up FoundationDB data
- Adjust FoundationDB and Doris configuration parameters based on actual load
- Use standard cloud disks or local disks with a POSIX-compliant file system for data storage; otherwise, FoundationDB may not function properly.
- For example, storage solutions like JuiceFS should not be used as FoundationDB's storage backend.