Installation and Deployment

This topic covers the hardware and software environment needed to deploy Doris, the recommended deployment mode, cluster scaling, and common problems that occur when creating and running clusters.

Before you continue reading, you might want to compile Doris by following the instructions in the Compile topic.

Software and Hardware Requirements

Overview

Doris, as an open source OLAP database with an MPP architecture, can run on most mainstream commercial servers. For you to take full advantage of the high concurrency and high availability of Doris, we recommend that your computer meet the following requirements:

Linux Operating System Version Requirements

| Linux System | Version |
|---|---|
| CentOS | 7.1 and above |
| Ubuntu | 16.04 and above |

Software Requirements

| Software | Version |
|---|---|
| Java | 1.8 and above |
| GCC | 4.8.2 and above |

OS Installation Requirements

Set the maximum number of open file descriptors in the system

vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536

Clock synchronization

The metadata in Doris requires the clock skew between machines to stay below 5000 ms, so all machines in the cluster must synchronize their clocks to avoid service exceptions caused by metadata inconsistencies.

Disable the swap partition

The Linux swap partition can cause serious performance problems for Doris, so you need to disable the swap partition before installation.
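As a rough pre-installation check, the sketch below reads the SwapTotal field that Linux exposes in /proc/meminfo and treats a value of zero as "swap disabled". The function names are hypothetical helpers and are not part of Doris.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal value (in kB) parsed from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo text")


def swap_is_disabled(meminfo_text: str) -> bool:
    """Swap counts as disabled when the kernel reports zero total swap."""
    return swap_total_kb(meminfo_text) == 0
```

On a deployment node, you could pass in the content of /proc/meminfo, for example `swap_is_disabled(open("/proc/meminfo").read())`.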

Linux file system

When installing the operating system, we recommend that you select the ext4 file system.

Development Test Environment

| Module | CPU | Memory | Disk | Network | Number of Instances |
|---|---|---|---|---|---|
| Frontend | 8 core + | 8GB + | SSD or SATA, 10GB + * | Gigabit Network Card | 1 |
| Backend | 8 core + | 16GB + | SSD or SATA, 50GB + * | Gigabit Network Card | 1-3 * |

Production Environment

| Module | CPU | Memory | Disk | Network | Number of Instances (Minimum Requirements) |
|---|---|---|---|---|---|
| Frontend | 16 core + | 64GB + | SSD or RAID card, 100GB + * | 10,000 Mbps network card | 1-3 * |
| Backend | 16 core + | 64GB + | SSD or SATA, 100GB + * | 10-100 Mbps network card | 3 * |

Note 1:

  1. The disk space of FE is mainly used to store metadata, including logs and images. It usually ranges from several hundred MB to several GB.
  2. The disk space of BE is mainly used to store user data. The total disk space taken up is 3 times the total user data (3 copies). Then an additional 40% of the space is reserved for background compaction and intermediate data storage.
  3. On a single machine, you can deploy multiple BE instances but only one FE instance. If you need 3 copies of the data, deploy 3 BE instances on 3 machines (1 instance per machine) instead of 3 BE instances on one machine. Clocks of the FE servers must be consistent (allowing a maximum clock skew of 5 seconds).
  4. A test environment can run with only 1 BE instance. In a production environment, the number of BE instances directly determines the overall query latency.
  5. Disable swap for all deployment nodes.
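The BE sizing rule in item 2 can be turned into a quick estimate. The sketch below assumes that the 40% reserve is added on top of the replicated data size, which is one possible reading of the note. The function name and parameters are illustrative only.

```python
def estimate_be_disk_gb(user_data_gb: int, replicas: int = 3,
                        reserve_percent: int = 40) -> float:
    """Estimate the total BE disk space needed for a given volume of user data.

    Assumption: the reserve for compaction and intermediate data is applied
    on top of the replicated data size (replicated * 1.4 by default).
    """
    replicated = user_data_gb * replicas
    return replicated * (100 + reserve_percent) / 100
```

Under this assumption, 1000 GB of user data would call for about 4200 GB of total BE disk space across the cluster.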

Note 2: Number of FE nodes

  1. FE nodes are divided into Followers and Observers based on their roles. (Leader is an elected role in the Follower group, hereinafter referred to as Follower, too.)
  2. The number of FE nodes should be at least 1 (1 Follower). If you deploy 1 Follower and 1 Observer, you can achieve high read availability; if you deploy 3 Followers, you can achieve high read-write availability (HA).
  3. The number of Followers must be odd, and there is no limit on the number of Observers.
  4. According to past experience, for business that requires high cluster availability (e.g. online service providers), we recommend that you deploy 3 Followers and 1-3 Observers; for offline business, we recommend that you deploy 1 Follower and 1-3 Observers.
  • Usually we recommend 10 to 100 machines to give full play to Doris' performance (deploy FE on 3 of them (HA) and BE on the rest).
  • The performance of Doris is positively correlated with the number of nodes and their configuration. With a minimum of four machines (one FE, three BEs; hybrid deployment of one BE and one Observer FE to provide metadata backup) and relatively low configuration, Doris can still run smoothly.
  • In a hybrid deployment of FE and BE, watch for resource competition and make sure that the metadata directory and the data directory are on different disks.
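The FE node-count rules in Note 2 can be checked with a few lines of code before you plan a cluster. This is a minimal sketch of those rules only. The function name and the returned labels are made up for illustration.

```python
def check_fe_topology(followers: int, observers: int = 0) -> str:
    """Validate an FE layout against the rules in Note 2 and describe its availability."""
    if followers < 1:
        raise ValueError("at least 1 Follower is required")
    if followers % 2 == 0:
        raise ValueError("the number of Followers must be odd")
    if observers < 0:
        raise ValueError("the number of Observers cannot be negative")
    if followers >= 3:
        # 3 or more Followers: high read-write availability (HA)
        return "read-write HA"
    if observers >= 1:
        # 1 Follower plus at least 1 Observer: high read availability
        return "read HA"
    return "no HA"
```

For example, `check_fe_topology(3, 1)` describes the layout recommended above for online services, while `check_fe_topology(2)` is rejected because the Follower count is even.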

Broker Deployment

Broker is a process for accessing external data sources, such as HDFS. Usually, deploying one Broker instance on each machine is enough.

Network Requirements

Doris instances communicate directly over the network. The following table shows all required ports.

| Instance Name | Port Name | Default Port | Communication Direction | Description |
|---|---|---|---|---|
| BE | be_port | 9060 | FE --> BE | Thrift server port on BE for receiving requests from FE |
| BE | webserver_port | 8040 | BE <--> BE | HTTP server port on BE |
| BE | heartbeat_service_port | 9050 | FE --> BE | Heartbeat service port (thrift) on BE, used to receive heartbeat from FE |
| BE | brpc_port | 8060 | FE <--> BE, BE <--> BE | BRPC port on BE for communication between BEs |
| FE | http_port | 8030 | FE <--> FE, user <--> FE | HTTP server port on FE |
| FE | rpc_port | 9020 | BE --> FE, FE <--> FE | Thrift server port on FE; the configuration of each FE should be consistent |
| FE | query_port | 9030 | user <--> FE | MySQL server port on FE |
| FE | edit_log_port | 9010 | FE <--> FE | Port on FE for BDBJE communication |
| Broker | broker_ipc_port | 8000 | FE --> Broker, BE --> Broker | Thrift server port on Broker for receiving requests |

Note:

  1. When deploying multiple FE instances, make sure that the http_port configuration of each FE is consistent.
  2. Make sure that each port has access in its proper direction before deployment.
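One way to confirm that a port is reachable in the required direction is a plain TCP connection test, run from the source side of each arrow in the table. The sketch below uses the default ports listed above. The helper names and the port groupings are illustrative, and a successful TCP connection only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Default ports from the table above; adjust if fe.conf or be.conf overrides them.
BE_PORTS_FROM_FE = {"be_port": 9060, "heartbeat_service_port": 9050, "brpc_port": 8060}
FE_PORTS_FROM_BE = {"rpc_port": 9020}


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(host: str, ports: dict) -> list:
    """Return the names of the ports on host that could not be reached."""
    return [name for name, port in ports.items()
            if not port_is_reachable(host, port)]
```

Run on an FE node, `unreachable_ports("be_host", BE_PORTS_FROM_FE)` would list the BE ports that the FE cannot reach, where be_host is a placeholder for the BE address.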

IP Binding

A host may have several IP addresses, either because it has multiple network cards or because software such as Docker has created virtual network cards. Currently Doris does not automatically identify available IPs, so when the deployment host has multiple IPs, you must specify the correct one via the priority_networks configuration item.

priority_networks is a configuration item that both FE and BE have. It needs to be written in fe.conf and be.conf. It is used to tell the process which IP should be bound when FE or BE starts. Examples are as follows:

priority_networks=10.1.3.0/24

The value is written in CIDR notation. FE or BE uses this configuration item to find the matching IP and takes it as its own local IP.

Note: Configuring priority_networks and starting FE or BE only ensure the correct IP binding of FE or BE. You also need to specify the same IP in ADD BACKEND or ADD FRONTEND statements, otherwise the cluster cannot be created. For example:

BE is configured with priority_networks=10.1.3.0/24.

If you use the following IP in the ADD BACKEND statement: ALTER SYSTEM ADD BACKEND "192.168.0.1:9050";

Then FE and BE will not be able to communicate properly.

At this point, you must DROP the wrong BE configuration and use the correct IP to perform ADD BACKEND.

The same works for FE.
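The mismatch in this example can be seen by testing addresses against the CIDR block. The following sketch uses Python's standard ipaddress module. It only illustrates CIDR matching and is not the selection logic that Doris itself implements. The function name and the sample addresses are assumptions.

```python
import ipaddress


def pick_local_ip(host_ips, cidr):
    """Return the first host IP inside the given CIDR block, or None if none match."""
    network = ipaddress.ip_network(cidr, strict=False)
    for ip in host_ips:
        if ipaddress.ip_address(ip) in network:
            return ip
    return None
```

With priority_networks=10.1.3.0/24, a host that owns both 172.17.0.1 and 10.1.3.25 would match on 10.1.3.25, while 192.168.0.1 from the ADD BACKEND statement above falls outside the block.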

Broker currently does not have the priority_networks configuration item, nor does it need one. Broker's services are bound to 0.0.0.0 by default. Simply specify the correct, reachable Broker IP when running ADD BROKER.

Table Name Case Sensitivity

By default, table names in Doris are case-sensitive. If you need to change that, you may do it before cluster initialization. The table name case sensitivity cannot be changed after cluster initialization is completed.

See the lower_case_table_names section in Variables for details.

Cluster Deployment

Manual Deployment

Deploy FE

  • Copy the FE deployment file into the specified node

    Find the fe folder under the output generated by source code compilation and copy it into the specified deployment path on the FE nodes.

  • Configure FE

    1. The configuration file is conf/fe.conf. Note: meta_dir indicates the metadata storage location. The default value is ${DORIS_HOME}/doris-meta. The directory needs to be created manually.

      Note: For production environments, it is better not to put the directory under the Doris installation directory but in a separate disk (SSD would be the best); for test and development environments, you may use the default configuration.

    2. The default maximum Java heap memory of JAVA_OPTS in fe.conf is 4GB. For production environments, we recommend adjusting it to 8GB or more.

  • Start FE

    bin/start_fe.sh --daemon

    The FE process starts and enters the background for execution. Logs are stored in the log/ directory by default. If startup fails, you can view error messages by checking out log/fe.log or log/fe.out.

  • For details about deployment of multiple FEs, see the FE scaling section.

Deploy BE

  • Copy the BE deployment files to all nodes where BE will be deployed

    Find the be folder under the output generated by source code compilation and copy it into the specified deployment paths on the BE nodes.

  • Modify all BE configurations

    Modify be/conf/be.conf. The main setting is storage_root_path, the data storage directory. The default is be/storage, and the directory needs to be created manually. Use ; to separate multiple paths (do not add ; after the last directory).

You may specify the storage medium of each directory in the path: HDD or SSD. You may also append a capacity limit to each path, separated by a comma (,). Unless you use a mix of SSD and HDD disks, you do not need to follow the configuration methods in Example 1 and Example 2 below; you only need to specify the storage directory, and you do not need to modify the default storage medium configuration of FE either.

Example 1:

Note: For SSD disks, add .SSD to the end of the directory; for HDD disks, add .HDD.

`storage_root_path=/home/disk1/doris.HDD;/home/disk2/doris.SSD;/home/disk2/doris`

Description

* 1. /home/disk1/doris.HDD: The storage medium is HDD;
* 2. /home/disk2/doris.SSD: The storage medium is SSD;
* 3. /home/disk2/doris: The storage medium is HDD (default).

Example 2:

Note: You do not need to add the .SSD or .HDD suffix; instead, specify the medium in the storage_root_path parameter.

`storage_root_path=/home/disk1/doris,medium:hdd;/home/disk2/doris,medium:ssd`

Description

* 1. /home/disk1/doris,medium:hdd: The storage medium is HDD;
* 2. /home/disk2/doris,medium:ssd: The storage medium is SSD.
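To make the two notations in Example 1 and Example 2 concrete, here is a small parser sketch that maps a storage_root_path value to (path, medium) pairs. It is an illustration of the rules described above, not the parser used by BE. The function name is hypothetical, and any comma-separated property other than medium (such as a capacity limit) is ignored.

```python
def parse_storage_root_path(value: str):
    """Parse a storage_root_path value into a list of (path, medium) pairs."""
    result = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        path, *props = entry.split(",")
        medium = "HDD"  # default medium when nothing is specified
        # Example 1 style: medium encoded as a .SSD or .HDD suffix on the path.
        if path.endswith((".SSD", ".HDD")):
            medium = path[-3:]
        # Example 2 style: medium given as a "medium:xxx" property.
        for prop in props:
            key, _, val = prop.partition(":")
            if key.strip() == "medium":
                medium = val.strip().upper()
        result.append((path, medium))
    return result
```

Applied to the value in Example 1, this yields HDD, SSD, and HDD for the three paths, matching the descriptions above.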
  • BE webserver_port configuration

    If BE is installed on a node in a Hadoop cluster, change the webserver_port=8040 setting to avoid a port conflict.

  • Set JAVA_HOME environment variable

    Java UDF is supported since version 1.2.0, so BEs depend on the Java environment. Set the `JAVA_HOME` environment variable before starting BE. You can also do this by adding `export JAVA_HOME=your_java_home_path` to the first line of the `start_be.sh` startup script.

  • Install Java UDF

    Because Java UDF is supported since version 1.2.0, you need to download the Java UDF JAR package from the official website and put it under the lib directory of BE; otherwise, BE may fail to start.

  • Add all BE nodes to FE

    BE nodes need to be added in FE before they can join the cluster. You can use mysql-client (Download MySQL 5.7) to connect to FE:

    ./mysql-client -h fe_host -P query_port -uroot

    fe_host is the node IP where FE is located; query_port is in fe/conf/fe.conf; the root account is used by default and no password is required in login.

    After login, execute the following command to add all the BEs:

    ALTER SYSTEM ADD BACKEND "be_host:heartbeat_service_port";

    be_host is the node IP where BE is located; heartbeat_service_port is in be/conf/be.conf.

  • Start BE

    bin/start_be.sh --daemon

    The BE process will start and go into the background for execution. Logs are stored in the be/log/ directory by default. If startup fails, you can view error messages by checking be/log/be.log or be/log/be.out.

  • View BE status

    Connect to FE using mysql-client and execute SHOW PROC '/backends'; to view BE operation status. If everything goes well, the isAlive column should be true.
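When a cluster has many BE nodes, the ADD BACKEND statements from the steps above can be generated rather than typed by hand. The sketch below only builds the SQL text. The function name is hypothetical, the host list is sample data, and 9050 is the default heartbeat_service_port from the port table.

```python
def add_backend_statements(be_hosts, heartbeat_service_port=9050):
    """Build one ALTER SYSTEM ADD BACKEND statement per BE host."""
    return [
        f'ALTER SYSTEM ADD BACKEND "{host}:{heartbeat_service_port}";'
        for host in be_hosts
    ]


# Sample usage with made-up addresses inside the 10.1.3.0/24 block.
for statement in add_backend_statements(["10.1.3.11", "10.1.3.12", "10.1.3.13"]):
    print(statement)
```

The printed statements can then be pasted into a mysql-client session connected to FE.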

(Optional) FS_Broker Deployment

Broker is deployed as a plug-in, which is independent of Doris. If you need to import data from a third-party storage system, you need to deploy the corresponding Broker. By default, Doris provides fs_broker for HDFS reading and object storage (supporting S3 protocol). fs_broker is stateless and we recommend that you deploy a Broker for each FE and BE node.

  • Copy the corresponding Broker directory in the output directory of the source fs_broker to all the nodes that need to be deployed. It is recommended to keep the Broker directory on the same level as the BE or FE directories.

  • Modify the corresponding Broker configuration

    You can modify the configuration in the configuration file under the corresponding broker/conf/ directory.

  • Start Broker

    bin/start_broker.sh --daemon

  • Add Broker

    To let Doris FE and BE know which nodes Broker is on, add a list of Broker nodes by SQL command.

    Use mysql-client to connect to a started FE, and execute the following command:

    ALTER SYSTEM ADD BROKER broker_name "broker_host1:broker_ipc_port1","broker_host2:broker_ipc_port2",...;

    broker_host is the Broker node IP; broker_ipc_port is set in the Broker configuration file conf/apache_hdfs_broker.conf.

  • View Broker status

    Connect to any started FE using mysql-client and execute the following command to view Broker status: SHOW PROC '/brokers';

Note: In production environments, use a process supervisor such as Supervisor to start all instances, so that processes are automatically restarted after they exit. For supervised startup in Doris 0.9.0 and earlier versions, you need to remove the last & symbol in the start_xx.sh scripts. In Doris 0.10.0 and later versions, you can call sh start_xx.sh directly.

FAQ

  1. How can we know whether the FE process startup succeeds?

    After the FE process starts, metadata is loaded first. Based on the role of FE, you can see transfer from UNKNOWN to MASTER/FOLLOWER/OBSERVER in the log. Eventually, you will see the thrift server started log and can connect to FE through MySQL client, which indicates that FE started successfully.

    You can also check whether the startup was successful by connecting as follows:

    http://fe_host:fe_http_port/api/bootstrap

    If it returns:

    {"status":"OK","msg":"Success"}

    The startup is successful; otherwise, there may be problems.

    Note: If you can't see the boot failure information in fe.log, you may check fe.out.

  2. How can we know whether the BE process startup succeeds?

    After the BE process starts, if data already exists on the node, loading the data index may take several minutes.

    If BE is started for the first time or has not joined any cluster, the BE log periodically prints waiting to receive first heartbeat from frontend, meaning that BE has not yet received the Master's address through FE's heartbeat and is waiting passively. This message disappears once the BE is added with ADD BACKEND in FE and FE sends the heartbeat. If the message master client, get client from cache failed. host:, port: 0, code: 7 appears after the heartbeat is received, FE has successfully connected to BE, but BE cannot actively connect to FE. You may need to check the connectivity of rpc_port from BE to FE.

    If BE has been added to the cluster, the heartbeat log from FE is printed every five seconds: get heartbeat, host:xx.xx.xx.xx, port:9020, cluster id:xxxxxxx, indicating that the heartbeat is normal.

    In addition, if the message finish report task success. return code: 0 is printed every 10 seconds in the log, the BE-to-FE communication is normal.

    Meanwhile, if there is a data query, you will see continuously scrolling logs, including execute time is xxx entries, indicating that BE has started successfully and queries are running normally.

    You can also check whether the startup was successful by connecting as follows:

    http://be_host:be_http_port/api/health

    If it returns:

    {"status": "OK","msg": "To Be Added"}

    That means the startup is successful; otherwise, there may be problems.

    Note: If you can't see the information of boot failure in be.INFO, you may see it in be.out.
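The two HTTP checks described above (api/bootstrap on FE and api/health on BE) both return a JSON body whose status field is OK on success, so they can be probed the same way. The sketch below uses only the Python standard library. The function names are hypothetical, and the URLs you pass in must be built from your own fe_host, be_host and port values.

```python
import json
import urllib.request


def status_is_ok(body: str) -> bool:
    """Return True if the response body is JSON with "status" equal to "OK"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "OK"


def node_is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Fetch a health URL and report whether it returned an OK status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_is_ok(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection errors, timeouts, and malformed URLs all count as unhealthy.
        return False
```

The check looks only at the status field, so it accepts both the Success message from FE and the To Be Added message from BE shown above.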

  3. How can we confirm that the connectivity of FE and BE is normal after building the system?

    First, confirm that the FE and BE processes have each been started and are working normally. Then, confirm that all nodes have been added through ADD BACKEND or ADD FOLLOWER/OBSERVER statements.

    If the heartbeat is normal, BE logs will show get heartbeat, host:xx.xx.xx.xx, port:9020, cluster id:xxxxx; if the heartbeat fails, you will see backend [10001] got Exception: org.apache.thrift.transport.TTransportException in FE's log, or other thrift communication abnormal log, indicating that the heartbeat from FE to 10001 BE fails. Here you need to check the connectivity of the FE to BE host heartbeat port.

    If the BE-to-FE communication is normal, the BE log will display the words finish report task success. return code: 0. Otherwise, the words master client, get client from cache failed will appear. In this case, you need to check the connectivity of BE to the rpc_port of FE.
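The log messages quoted in this FAQ can be matched mechanically when you scan a BE log. The following sketch maps each marker string from the text above to a short diagnosis code. The codes and the function name are invented for illustration, and real log lines may carry extra prefixes such as timestamps, which is why substring matching is used.

```python
# Marker substrings are taken from the log messages quoted in this FAQ.
BE_LOG_MARKERS = [
    ("waiting to receive first heartbeat from frontend", "WAITING_FOR_FIRST_HEARTBEAT"),
    ("get client from cache failed", "BE_CANNOT_REACH_FE_RPC_PORT"),
    ("finish report task success. return code: 0", "BE_TO_FE_OK"),
    ("get heartbeat", "HEARTBEAT_OK"),
]


def diagnose_be_log_line(line: str):
    """Return a diagnosis code for a BE log line, or None if no marker matches."""
    for marker, code in BE_LOG_MARKERS:
        if marker in line:
            return code
    return None
```

A line that yields BE_CANNOT_REACH_FE_RPC_PORT points to the rpc_port connectivity check described above.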

  4. What is the Doris node authentication mechanism?

    In addition to Master FE, the other role nodes (Follower FE, Observer FE, Backend) need to register to the cluster through the ALTER SYSTEM ADD statement before joining the cluster.

    When Master FE is started for the first time, a cluster_id is generated in the doris-meta/image/VERSION file.

    When FE joins the cluster for the first time, it first retrieves the file from Master FE. Each subsequent reconnection between FEs (FE reboot) checks whether its cluster ID is the same as that of other existing FEs. If it is not the same, the FE will exit automatically.

    When BE first receives the heartbeat of Master FE, it gets the cluster ID from the heartbeat and records it in the cluster_id file of the data directory. Each heartbeat after that compares that to the cluster ID sent by FE. If the cluster IDs are not matched, BE will refuse to respond to FE's heartbeat.

    The heartbeat also contains Master FE's IP. If the Master FE changes, the new Master FE will send the heartbeat to BE together with its own IP, and BE will update the Master FE IP it saved.

    priority_networks

    priority_networks is a configuration item that both FE and BE have. It helps FE or BE identify its own IP address when the host has multiple network cards. priority_networks uses CIDR notation: RFC 4632

    If the connectivity of FE and BE is confirmed to be normal, but timeouts still occur when creating tables, and the FE log shows the error message backend does not find. host:xxxx.xxx.XXXX, this means that there is a problem with the IP address automatically identified by Doris and that the priority_networks parameter needs to be set manually.

    The explanation for this error is as follows. When the user adds a BE through the ADD BACKEND statement, FE checks whether the statement specifies a hostname or an IP. If it is a hostname, FE automatically converts it to an IP address and stores it in the metadata. When BE reports the completion of a task, it carries its own IP address. If FE finds that the IP address reported by BE is different from the one in the metadata, the above error message shows up.

    Solutions to this error: 1) Set priority_networks in FE and BE separately. Usually FE and BE are in one network segment, so their priority_networks can be set to the same value. 2) Put the correct IP address instead of the hostname in the ADD BACKEND statement to prevent FE from getting the wrong IP address.

  5. What is the number of file descriptors of a BE process?

    The number of file descriptors of a BE process is bounded by two parameters: min_file_descriptor_number and max_file_descriptor_number.

    If the limit is not in the range [min_file_descriptor_number, max_file_descriptor_number], an error occurs when starting a BE process. You may use the ulimit command to adjust the limit.

    The default value of min_file_descriptor_number is 65536.

    The default value of max_file_descriptor_number is 131072.

    For example, the command ulimit -n 65536; sets the number of file descriptors to 65536.

    After starting a BE process, you can use cat /proc/$pid/limits to check the actual number of file descriptors of the process.

    If you have used supervisord and encountered a file descriptor error, you can fix it by modifying minfds in supervisord.conf.

    vim /etc/supervisord.conf

    minfds=65536 ; (min. avail startup file descriptors;default 1024)
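To check the range rule from this FAQ item on a Linux host, you can compare the current open-file limit with the two default values given above. The sketch below uses Python's standard resource module (Unix only). The function names are hypothetical, and the constants are the defaults stated in this section, so adjust them if be.conf overrides the parameters.

```python
import resource

MIN_FD = 65536   # default min_file_descriptor_number stated above
MAX_FD = 131072  # default max_file_descriptor_number stated above


def fd_limit_in_range(limit: int, low: int = MIN_FD, high: int = MAX_FD) -> bool:
    """Return True if the limit lies in the closed range [low, high]."""
    return low <= limit <= high


def current_soft_fd_limit() -> int:
    """Return the soft open-file limit of the current process (like ulimit -n)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Running `fd_limit_in_range(current_soft_fd_limit())` in the shell session that will start BE gives a quick indication of whether the limit needs to be changed first.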