Skip to main content

load balancing

load balancing

When deploying multiple FE nodes, users can deploy a load balancing layer on top of multiple FEs to achieve high availability of Doris.

Code method

Retry and load balance yourself in the application layer code. For example, if a connection is found to be down, it will automatically retry on other connections. Application layer code retry requires the application to configure multiple doris front-end node addresses.

JDBC Connector

If you use mysql jdbc connector to connect to Doris, you can use jdbc's automatic retry mechanism:

jdbc:mysql:loadbalance://[host:port],[host:port].../[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue

For details, please refer to Mysql official website document

ProxySQL method

ProxySQL is a flexible and powerful MySQL proxy layer. It is a MySQL middleware that can be actually used in a production environment. It can realize read-write separation, support Query routing function, support dynamic designation of a certain SQL for cache, support dynamic loading configuration, failure Switching and some SQL filtering functions.

Doris's FE process is responsible for receiving user connections and query requests. It itself is horizontally scalable and highly available, but it requires users to set up a proxy on multiple FEs to achieve automatic connection load balancing.

Install ProxySQL (yum way)

 Configure yum source
# vim /etc/yum.repos.d/proxysql.repo
[proxysql_repo]
name= ProxySQL YUM repository
baseurl=http://repo.proxysql.com/ProxySQL/proxysql-1.4.x/centos/\$releasever
gpgcheck=1
gpgkey=http://repo.proxysql.com/ProxySQL/repo_pub_key

Perform installation
# yum clean all
# yum makecache
# yum -y install proxysql
View version
# proxysql --version
ProxySQL version 1.4.13-15-g69d4207, codename Truls
Set up auto start
# systemctl enable proxysql
# systemctl start proxysql
# systemctl status proxysql
After startup, it will listen to two ports, the default is 6032 and 6033. Port 6032 is the management port of ProxySQL, and 6033 is the port for ProxySQL to provide external services (that is, the forwarding port connected to the real database of the forwarding backend).
# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 23940/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN

ProxySQL Config

ProxySQL has a configuration file /etc/proxysql.cnf and a configuration database file /var/lib/proxysql/proxysql.db. Special attention is needed here: If there is a "proxysql.db" file (under the /var/lib/proxysql directory), the ProxySQL service will only be read when it is started for the first time The proxysql.cnf file and parse it; after startup, the proxysql.cnf file will not be read! If you want the configuration in the proxysql.cnf file to take effect after restarting the proxysql service (that is, you want proxysql to read and parse the proxysql.cnf configuration file when it restarts), you need to delete /var/lib/proxysql/proxysql first. dbdatabase file, and then restart the proxysql service. This is equivalent to initializing the proxysql service, and a pure proxysql.db database file will be produced again (if proxysql related routing rules, etc. are configured before, it will be erased)

View and modify configuration files

Here are mainly a few parameters, which have been commented out below, and you can modify them according to your needs

# egrep -v "^#|^$" /etc/proxysql.cnf
datadir="/var/lib/proxysql" #data dir
admin_variables=
{
admin_credentials="admin:admin" #User name and password for connecting to the management terminal
mysql_ifaces="0.0.0.0:6032" #Management port, used to connect to proxysql management database
}
mysql_variables=
{
threads=4 #Specify the number of threads opened for the forwarding port
max_connections=2048
default_query_delay=0
default_query_timeout=36000000
have_compress=true
poll_timeout=2000
interfaces="0.0.0.0:6033" #Specify the forwarding port, used to connect to the back-end mysql database, which is equivalent to acting as a proxy
default_schema="information_schema"
stacksize=1048576
server_version="5.5.30" #Specify the version of the backend mysql
connect_timeout_server=3000
monitor_username="monitor"
monitor_password="monitor"
monitor_history=600000
monitor_connect_interval=60000
monitor_ping_interval=10000
monitor_read_only_interval=1500
monitor_read_only_timeout=500
ping_interval_server_msec=120000
ping_timeout_server=500
commands_stats=true
sessions_sort=true
connect_retries_on_failure=10
}
mysql_servers =
(
)
mysql_users:
(
)
mysql_query_rules:
(
)
scheduler=
(
)
mysql_replication_hostgroups=
(
)

Connect to the ProxySQL management port test

# mysql -uadmin -padmin -P6032 -hdoris01
View the global_variables table information of the main library (it is in this library after login by default)
MySQL [(none)]> show databases;
+-----+---------------+-------------------------------------+
| seq | name | file |
+-----+---------------+-------------------------------------+
| 0 | main | |
| 2 | disk | /var/lib/proxysql/proxysql.db |
| 3 | stats | |
| 4 | monitor | |
| 5 | stats_history | /var/lib/proxysql/proxysql_stats.db |
+-----+---------------+-------------------------------------+
5 rows in set (0.000 sec)
MySQL [(none)]> use main;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [main]> show tables;
+--------------------------------------------+
| tables |
+--------------------------------------------+
| global_variables |
| mysql_collations |
| mysql_group_replication_hostgroups |
| mysql_query_rules |
| mysql_query_rules_fast_routing |
| mysql_replication_hostgroups |
| mysql_servers |
| mysql_users |
| proxysql_servers |
| runtime_checksums_values |
| runtime_global_variables |
| runtime_mysql_group_replication_hostgroups |
| runtime_mysql_query_rules |
| runtime_mysql_query_rules_fast_routing |
| runtime_mysql_replication_hostgroups |
| runtime_mysql_servers |
| runtime_mysql_users |
| runtime_proxysql_servers |
| runtime_scheduler |
| scheduler |
+--------------------------------------------+
20 rows in set (0.000 sec)

ProxySQL configuration backend Doris FE

Use the insert statement to add the host to the mysql_servers table, where: hostgroup_id is 10 for the write group, and 20 for the read group. We don't need to read and write the license here, and it doesn't matter which one can be set randomly.

[root@mysql-proxy ~]# mysql -uadmin -padmin -P6032 -h127.0.0.1
............
MySQL [(none)]> insert into mysql_servers(hostgroup_id,hostname,port) values(10,'192.168.9.211',9030);
Query OK, 1 row affected (0.000 sec)

MySQL [(none)]> insert into mysql_servers(hostgroup_id,hostname,port) values(10,'192.168.9.212',9030);
Query OK, 1 row affected (0.000 sec)

MySQL [(none)]> insert into mysql_servers(hostgroup_id,hostname,port) values(10,'192.168.9.213',9030);
Query OK, 1 row affected (0.000 sec)

If an error occurs during the insertion process:
ERROR 1045 (#2800): UNIQUE constraint failed: mysql_servers.hostgroup_id, mysql_servers.hostname, mysql_servers.port

It means that other configurations may have been defined before, you can clear this table or delete the configuration of the corresponding host
MySQL [(none)]> select * from mysql_servers;
MySQL [(none)]> delete from mysql_servers;
Query OK, 6 rows affected (0.000 sec)

Check whether these 3 nodes are inserted successfully and their status.
MySQL [(none)]> select * from mysql_servers\G;
*************************** 1. row ***************************
hostgroup_id: 10
hostname: 192.168.9.211
port: 9030
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 0
max_latency_ms: 0
comment:
*************************** 2. row ***************************
hostgroup_id: 10
hostname: 192.168.9.212
port: 9030
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 0
max_latency_ms: 0
comment:
*************************** 3. row ***************************
hostgroup_id: 10
hostname: 192.168.9.213
port: 9030
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 0
max_latency_ms: 0
comment:
6 rows in set (0.000 sec)

ERROR: No query specified

After the above modification, load it to RUNTIME and save it to disk. The following two steps are very important, otherwise your configuration information will be gone after you exit and must be saved
MySQL [(none)]> load mysql servers to runtime;
Query OK, 0 rows affected (0.006 sec)

MySQL [(none)]> save mysql servers to disk;
Query OK, 0 rows affected (0.348 sec)

Monitor Doris FE node configuration

After adding doris fe nodes, you also need to monitor these back-end nodes. For multiple FE high-availability load balancing environments on the backend, this is necessary because ProxySQL needs to be automatically adjusted by the read_only value of each node

Whether they belong to the read group or the write group.

First create a user name for monitoring on the back-end master main data node

Execute on the node of the doris fe master master database:
# mysql -P9030 -uroot -p
mysql> create user monitor@'192.168.9.%' identified by 'P@ssword1!';
Query OK, 0 rows affected (0.03 sec)
mysql> grant ADMIN_PRIV on *.* to monitor@'192.168.9.%';
Query OK, 0 rows affected (0.02 sec)

Then go back to the mysql-proxy proxy layer node to configure monitoring
# mysql -uadmin -padmin -P6032 -h127.0.0.1
MySQL [(none)]> set mysql-monitor_username='monitor';
Query OK, 1 row affected (0.000 sec)

MySQL [(none)]> set mysql-monitor_password='P@ssword1!';
Query OK, 1 row affected (0.000 sec)

After modification, load to RUNTIME and save to disk
MySQL [(none)]> load mysql variables to runtime;
Query OK, 0 rows affected (0.001 sec)

MySQL [(none)]> save mysql variables to disk;
Query OK, 94 rows affected (0.079 sec)

Verify the monitoring results: The indicators of the ProxySQL monitoring module are stored in the log table of the monitor library.
The following is the monitoring of whether the connection is normal (monitoring of connect indicators):
Note: There may be many connect_errors, this is because there is an error when the monitoring information is not configured. After the configuration, if the result of connect_error is NULL, it means normal。
MySQL [(none)]> select * from mysql_server_connect_log;
+---------------+------+------------------+-------------------------+---------------+
| hostname | port | time_start_us | connect_success_time_us | connect_error |
+---------------+------+------------------+-------------------------+---------------+
| 192.168.9.211 | 9030 | 1548665195883957 | 762 | NULL |
| 192.168.9.212 | 9030 | 1548665195894099 | 399 | NULL |
| 192.168.9.213 | 9030 | 1548665195904266 | 483 | NULL |
| 192.168.9.211 | 9030 | 1548665255883715 | 824 | NULL |
| 192.168.9.212 | 9030 | 1548665255893942 | 656 | NULL |
| 192.168.9.211 | 9030 | 1548665495884125 | 615 | NULL |
| 192.168.9.212 | 9030 | 1548665495894254 | 441 | NULL |
| 192.168.9.213 | 9030 | 1548665495904479 | 638 | NULL |
| 192.168.9.211 | 9030 | 1548665512917846 | 487 | NULL |
| 192.168.9.212 | 9030 | 1548665512928071 | 994 | NULL |
| 192.168.9.213 | 9030 | 1548665512938268 | 613 | NULL |
+---------------+------+------------------+-------------------------+---------------+
20 rows in set (0.000 sec)
The following is the monitoring of heartbeat information (monitoring of ping indicators)
MySQL [(none)]> select * from mysql_server_ping_log;
+---------------+------+------------------+----------------------+------------+
| hostname | port | time_start_us | ping_success_time_us | ping_error |
+---------------+------+------------------+----------------------+------------+
| 192.168.9.211 | 9030 | 1548665195883407 | 98 | NULL |
| 192.168.9.212 | 9030 | 1548665195885128 | 119 | NULL |
...........
| 192.168.9.213 | 9030 | 1548665415889362 | 106 | NULL |
| 192.168.9.213 | 9030 | 1548665562898295 | 97 | NULL |
+---------------+------+------------------+----------------------+------------+
110 rows in set (0.001 sec)

The read_only log is also empty at this time (normally, when the new environment is configured, this read-only log is empty)
MySQL [(none)]> select * from mysql_server_read_only_log;
Empty set (0.000 sec)

All 3 nodes are in the group with hostgroup_id=10.
Now, load the modification of the mysql_replication_hostgroups table just now to RUNTIME to take effect。
MySQL [(none)]> load mysql servers to runtime;
Query OK, 0 rows affected (0.003 sec)

MySQL [(none)]> save mysql servers to disk;
Query OK, 0 rows affected (0.361 sec)


MySQL [(none)]> select hostgroup_id,hostname,port,status,weight from mysql_servers;
+--------------+---------------+------+--------+--------+
| hostgroup_id | hostname | port | status | weight |
+--------------+---------------+------+--------+--------+
| 10 | 192.168.9.211 | 9030 | ONLINE | 1 |
| 20 | 192.168.9.212 | 9030 | ONLINE | 1 |
| 20 | 192.168.9.213 | 9030 | ONLINE | 1 |
+--------------+---------------+------+--------+--------+
3 rows in set (0.000 sec)

Configure Doris users

All the above configurations are about the back-end Doris FE node. Now you can configure the SQL statements, including: the user who sends the SQL statement, the routing rules of the SQL statement, the cache of the SQL query, the rewriting of the SQL statement, and so on.

This section is the user configuration used by the SQL request, such as the root user. This requires that we need to add relevant users to the back-end Doris FE node first. Here are examples of two user names root and doris.

First, execute on the Doris FE master master database node:
# mysql -P9030 -uroot -p
.........

mysql> create user doris@'%' identified by 'P@ssword1!';
Query OK, 0 rows affected, 1 warning (0.04 sec)

mysql> grant ADMIN_PRIV on *.* to doris@'%';
Query OK, 0 rows affected, 1 warning (0.03 sec)


Then go back to the mysql-proxy proxy layer node, configure the mysql_users table, and add the two users just now to the table.
admin> insert into mysql_users(username,password,default_hostgroup) values('root','',10);
Query OK, 1 row affected (0.001 sec)

admin> insert into mysql_users(username,password,default_hostgroup) values('doris','P@ssword1!',10);
Query OK, 1 row affected (0.000 sec)

admin> load mysql users to runtime;
Query OK, 0 rows affected (0.001 sec)

admin> save mysql users to disk;
Query OK, 0 rows affected (0.108 sec)

The mysql_users table has many fields. The three main fields are username, password, and default_hostgroup:
-username: The username used by the front-end to connect to ProxySQL and ProxySQL to route SQL statements to MySQL.
-password: the password corresponding to the user name. It can be a plain text password or a hash password. If you want to use the hash password, you can execute it on a MySQL node first select password(PASSWORD), and then copy the encryption result to this field.
-default_hostgroup: The default routing destination of the username. For example, when the field value of the specified root user is 10, the SQL statement sent by the root user is used by default
In this case, it will be routed to a node in the hostgroup_id=10 group.

admin> select * from mysql_users\G
*************************** 1. row ***************************
username: root
password:
active: 1
use_ssl: 0
default_hostgroup: 10
default_schema: NULL
schema_locked: 0
transaction_persistent: 1
fast_forward: 0
backend: 1
frontend: 1
max_connections: 10000
*************************** 2. row ***************************
username: doris
password: P@ssword1!
active: 1
use_ssl: 0
default_hostgroup: 10
default_schema: NULL
schema_locked: 0
transaction_persistent: 1
fast_forward: 0
backend: 1
frontend: 1
max_connections: 10000
2 rows in set (0.000 sec)

Although the mysql_users table is not described in detail here, only users with active=1 are valid users, and the default active is 1.

MySQL [(none)]> load mysql users to runtime;
Query OK, 0 rows affected (0.001 sec)

MySQL [(none)]> save mysql users to disk;
Query OK, 0 rows affected (0.123 sec)

In this way, you can use the doris username and password to connect to ProxySQL through the sql client

Connect to Doris through ProxySQL for testing

Next, use the root user and doris user to test whether they can be routed to the default hostgroup_id=10 (it is a write group) to read data. The following is connected through the forwarding port 6033, the connection is forwarded to the real back-end database!

#mysql -uroot -p -P6033 -hdoris01 -e "show databases;"
Enter password:
ERROR 9001 (HY000) at line 1: Max connect timeout reached while reaching hostgroup 10 after 10000ms
At this time, an error was found, and it was not forwarded to the real doris fe on the backend.
Through the log, you can see that there is set autocommit=0 to open the transaction
Check the configuration found:

mysql-forward_autocommit=false
mysql-autocommit_false_is_transaction=false

We don’t need to read and write separation here, just turn these two parameters into true directly through the following statement.
mysql> UPDATE global_variables SET variable_value='true' WHERE variable_name='mysql-forward_autocommit';
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE global_variables SET variable_value='true' WHERE variable_name='mysql-autocommit_false_is_transaction';
Query OK, 1 row affected (0.01 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

mysql> SAVE MYSQL VARIABLES TO DISK;
Query OK, 98 rows affected (0.12 sec)

Then we try again and it shows success
[root@doris01 ~]# mysql -udoris -pP@ssword1! -P6033 -h192.168.9.211 -e "show databases;"
Warning: Using a password on the command line interface can be insecure.
+--------------------+
| Database |
+--------------------+
| doris_audit_db |
| information_schema |
| retail |
+--------------------+

OK, that's the end, you can use Mysql client, JDBC, etc. to connect to ProxySQL to operate your doris.

Nginx TCP reverse proxy method

Overview

Nginx can implement load balancing of HTTP and HTTPS protocols, as well as load balancing of TCP protocol. So, the question is, can the load balancing of the Apache Doris database be achieved through Nginx? The answer is: yes. Next, let's discuss how to use Nginx to achieve load balancing of Apache Doris.

Environmental preparation

Note: Using Nginx to achieve load balancing of Apache Doris database, the premise is to build an Apache Doris environment. The IP and port of Apache Doris FE are as follows. Here I use one FE to demonstrate, multiple FEs only You need to add multiple FE IP addresses and ports in the configuration

The Apache Doris and port to access MySQL through Nginx are shown below.

IP: 172.31.7.119 
端口: 9030

Install dependencies

sudo apt-get install build-essential
sudo apt-get install libpcre3 libpcre3-dev
sudo apt-get install zlib1g-dev
sudo apt-get install openssl libssl-dev

Install Nginx

sudo wget http://nginx.org/download/nginx-1.18.0.tar.gz
sudo tar zxvf nginx-1.18.0.tar.gz
cd nginx-1.18.0
sudo ./configure --prefix=/usr/local/nginx --with-stream --with-http_ssl_module --with-http_gzip_static_module --with-http_stub_status_module
sudo make && make install

Configure reverse proxy

Here is a new configuration file

vim /usr/local/nginx/conf/default.conf

Then add the following in it

events {
worker_connections 1024;
}
stream {
upstream mysqld {
hash $remote_addr consistent;
server 172.31.7.119:9030 weight=1 max_fails=2 fail_timeout=60s;
##注意这里如果是多个FE,加载这里就行了
}
###这里是配置代理的端口,超时时间等
server {
listen 6030;
proxy_connect_timeout 30s;
proxy_timeout 30s;
proxy_pass mysqld;
}
}

Start Nginx

Start the specified configuration file

cd /usr/local/nginx
/usr/local/nginx/sbin/nginx -c conf.d/default.conf

verify

mysql -uroot -P6030 -h172.31.7.119

Parameter explanation: -u specifies the Doris username -p specifies the Doris password, my password here is empty, so there is no -h specifies the Nginx proxy server IP-P specifies the port

mysql -uroot -P6030 -h172.31.7.119
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 5.1.0 Doris version 0.15.1-rc09-Unknown

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| test |
+--------------------+
2 rows in set (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------+
| Tables_in_test |
+------------------+
| dwd_product_live |
+------------------+
1 row in set (0.00 sec)
mysql> desc dwd_product_live;
+-----------------+---------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-------+---------+---------+
| dt | DATE | Yes | true | NULL | |
| proId | BIGINT | Yes | true | NULL | |
| authorId | BIGINT | Yes | true | NULL | |
| roomId | BIGINT | Yes | true | NULL | |
| proTitle | VARCHAR(1024) | Yes | false | NULL | REPLACE |
| proLogo | VARCHAR(1024) | Yes | false | NULL | REPLACE |
| shopId | BIGINT | Yes | false | NULL | REPLACE |
| shopTitle | VARCHAR(1024) | Yes | false | NULL | REPLACE |
| profrom | INT | Yes | false | NULL | REPLACE |
| proCategory | BIGINT | Yes | false | NULL | REPLACE |
| proPrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| couponPrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| livePrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| volume | BIGINT | Yes | false | NULL | REPLACE |
| addedTime | BIGINT | Yes | false | NULL | REPLACE |
| offTimeUnix | BIGINT | Yes | false | NULL | REPLACE |
| offTime | BIGINT | Yes | false | NULL | REPLACE |
| createTime | BIGINT | Yes | false | NULL | REPLACE |
| createTimeUnix | BIGINT | Yes | false | NULL | REPLACE |
| amount | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| views | BIGINT | Yes | false | NULL | REPLACE |
| commissionPrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| proCostPrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| proCode | VARCHAR(1024) | Yes | false | NULL | REPLACE |
| proStatus | INT | Yes | false | NULL | REPLACE |
| status | INT | Yes | false | NULL | REPLACE |
| maxPrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| liveView | BIGINT | Yes | false | NULL | REPLACE |
| firstCategory | BIGINT | Yes | false | NULL | REPLACE |
| secondCategory | BIGINT | Yes | false | NULL | REPLACE |
| thirdCategory | BIGINT | Yes | false | NULL | REPLACE |
| fourCategory | BIGINT | Yes | false | NULL | REPLACE |
| minPrice | DECIMAL(18,2) | Yes | false | NULL | REPLACE |
| liveVolume | BIGINT | Yes | false | NULL | REPLACE |
| liveClick | BIGINT | Yes | false | NULL | REPLACE |
| extensionId | VARCHAR(128) | Yes | false | NULL | REPLACE |
| beginTime | BIGINT | Yes | false | NULL | REPLACE |
| roomTitle | TEXT | Yes | false | NULL | REPLACE |
| beginTimeUnix | BIGINT | Yes | false | NULL | REPLACE |
| nickname | TEXT | Yes | false | NULL | REPLACE |
+-----------------+---------------+------+-------+---------+---------+
40 rows in set (0.06 sec)