Q1. Query error: Failed to get scan range, no queryable replica found in tablet: xxxx
This happens because the corresponding tablet does not find a copy that can be queried, usually because the BE is down, the copy is missing, etc. You can first pass the
show tablet tablet_id statement and then execute the following
show proc statement to view the replica information corresponding to this tablet and check whether the replica is complete. At the same time, you can also query the progress of replica scheduling and repair in the cluster through
show proc "/cluster_balance" information.
For commands related to data copy management, please refer to Data Copy Management.
Q2. Show backends/frontends The information viewed is incomplete
After executing certain statements such as
show backends/frontends, some columns may be found to be incomplete in the results. For example, the disk capacity information cannot be seen in the show backends result.
Usually this problem occurs when the cluster has multiple FEs. If users connect to non-Master FE nodes to execute these statements, they will see incomplete information. This is because some information exists only on the Master FE node. For example, BE's disk usage information, etc. Therefore, complete information can only be obtained after a direct connection to the Master FE.
Of course, users can also execute
set forward_to_master=true; before executing these statements. After the session variable is set to true, some information viewing statements executed subsequently will be automatically forwarded to the Master FE to obtain the results. In this way, no matter which FE the user is connected to, the complete result can be obtained.
Q3. invalid cluster id: xxxx
This error may appear in the results of the show backends or show frontends commands. Usually appears in the error message column of an FE or BE node. The meaning of this error is that after the Master FE sends the heartbeat information to the node, the node finds that the cluster id carried in the heartbeat information is different from the cluster id stored locally, so it refuses to respond to the heartbeat.
The Master FE node of Doris will actively send heartbeats to each FE or BE node, and will carry a cluster_id in the heartbeat information. cluster_id is the unique cluster ID generated by the Master FE when a cluster is initialized. When the FE or BE receives the heartbeat information for the first time, the cluster_id will be saved locally in the form of a file. The file of FE is in the image/ directory of the metadata directory, and the BE has a cluster_id file in all data directories. After that, each time the node receives the heartbeat, it will compare the content of the local cluster_id with the content in the heartbeat. If it is inconsistent, it will refuse to respond to the heartbeat.
This mechanism is a node authentication mechanism to prevent receiving false heartbeat messages sent by nodes outside the cluster.
If needed to recover from this error. The first thing to do is to make sure that all the nodes are in the correct cluster. After that, for the FE node, you can try to modify the cluster_id value in the image/VERSION file in the metadata directory and restart the FE. For the BE node, you can delete all the cluster_id files in the data directory and restart the BE.
Q4. Unique Key model query results are inconsistent
In some cases, when a user uses the same SQL to query a table with a Unique Key model, the results of multiple queries may be inconsistent. And the query results always change between 2-3 kinds.
This may be because, in the same batch of imported data, there are data with the same key but different values, which will lead to inconsistent results between different replicas due to the uncertainty of the sequence of data overwriting.
For example, the table is defined as k1, v1. A batch of imported data is as follows:
Then maybe the result of copy 1 is
1, "abc", and the result of copy 2 is
1, "def". As a result, the query results are inconsistent.
To ensure that the data sequence between different replicas is unique, you can refer to the Sequence Column function.
Q5. The problem of querying bitmap/hll type data returns NULL
In version 1.1.x, when vectorization is enabled, and the bitmp type field in the query data table returns a NULL result,
- First you have to
- Turn off vectorization
- Turn off SQL cache
set [global] enable_sql_cache = false;
This is because the bitmap / hll type is in the vectorized execution engine: the input is all NULL, and the output result is also NULL instead of 0
Q6. Error when accessing object storage: curl 77: Problem with the SSL CA cert
curl 77: Problem with the SSL CA cert error appears in the be.INFO log. You can try to solve it in the following ways:
- Download the certificate at https://curl.se/docs/caextract.html: cacert.pem
- Copy the certificate to the specified location:
sudo cp /tmp/cacert.pem /etc/ssl/certs/ca-certificates.crt
- Restart the BE node.