Tablet Local Debug
While Doris is running online, bugs can occur for various reasons, for example inconsistent replicas or data that differs between versions.
In such cases, it helps to copy the replica data of the affected tablet from the online cluster to a local environment, reproduce the problem there, and then locate its cause.
1. Get information about the tablet
The tablet id can be found in the BE log. The tablet's information can then be obtained with the following commands (assuming the tablet id is 10020).
Get the DbId/TableId/PartitionId and other information about where the tablet is located.
mysql> show tablet 10020\G
*************************** 1. row ***************************
DbName: default_cluster:db1
TableName: tbl1
PartitionName: tbl1
IndexName: tbl1
DbId: 10004
TableId: 10016
PartitionId: 10015
IndexId: 10017
IsSync: true
Order: 1
DetailCmd: SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020';
Execute the DetailCmd from the previous step to obtain information such as BackendId/SchemaHash.
mysql> SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020'\G
*************************** 1. row ***************************
ReplicaId: 10021
BackendId: 10003
Version: 3
LstSuccessVersion: 3
LstFailedVersion: -1
LstFailedTime: NULL
SchemaHash: 785778507
LocalDataSize: 780
RemoteDataSize: 0
RowCount: 2
State: NORMAL
IsBad: false
VersionCount: 3
PathHash: 7390150550643804973
MetaUrl: http://192.168.10.1:8040/api/meta/header/10020
CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10020
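When these commands are run from a script instead of by hand, the needed fields can be pulled out of the vertical (\G) output. The sketch below assumes the output was saved to a file, for example with the MySQL client's -E (--vertical) option: mysql -h <fe_host> -P 9030 -uroot -E -e "SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020'" > replica.txt. The helper name get_field and the file name are hypothetical, and port 9030 assumes the default FE query port.

```shell
# get_field NAME: print the value of field NAME from MySQL "\G" (vertical) output on stdin.
# Hypothetical helper. Field lines look like "   BackendId: 10003"; the padding before the
# name is ignored, and only the first match is printed (e.g. the first replica's row).
get_field() {
  awk -v name="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      prefix = name ": "
      if (index(line, prefix) == 1) {   # exact field name, so "Version" != "LstSuccessVersion"
        print substr(line, length(prefix) + 1)
        exit
      }
    }'
}
```

For example, get_field SchemaHash < replica.txt prints the SchemaHash value of the first row, and get_field DetailCmd works the same way on the saved output of show tablet. If the tablet has several replicas, only the first row is read, so pick the row for the BackendId you need before calling it.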
Create a tablet snapshot and get the CREATE TABLE statement
mysql> admin copy tablet 10020 properties("backend_id" = "10003", "version" = "2")\G
*************************** 1. row ***************************
TabletId: 10020
BackendId: 10003
Ip: 192.168.10.1
Path: /path/to/be/storage/snapshot/20220830101353.2.3600
ExpirationMinutes: 60
CreateTableStmt: CREATE TABLE `tbl1` (
`k1` int(11) NULL,
`k2` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"version_info" = "2"
);
The admin copy tablet command generates a snapshot of the specified replica (backend_id) and version of the tablet. The snapshot files are stored in the directory given by the Path field, on the BE node indicated by the Ip field.
This directory contains a subdirectory named after the tablet id; package that subdirectory as a whole for later use. Note that the snapshot is kept for at most 60 minutes (see ExpirationMinutes) and is then deleted automatically.
cd /path/to/be/storage/snapshot/20220830101353.2.3600
tar czf 10020.tar.gz 10020/
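Because the snapshot expires, it can help to wrap the packaging step in a function that first checks that the tablet directory still exists. This is a sketch; package_tablet is a hypothetical helper name, and it writes <tablet_id>.tar.gz into an output directory of your choice.

```shell
# package_tablet SNAPSHOT_PATH TABLET_ID [OUTPUT_DIR]
# Hypothetical helper: archive SNAPSHOT_PATH/TABLET_ID as OUTPUT_DIR/TABLET_ID.tar.gz.
package_tablet() {
  snapshot_path=$1
  tablet_id=$2
  # Resolve the output directory first so a relative path is not affected by tar -C.
  out_dir=$(cd "${3:-.}" && pwd) || return 1
  if [ ! -d "$snapshot_path/$tablet_id" ]; then
    # The snapshot is removed automatically after ExpirationMinutes.
    echo "snapshot directory not found: $snapshot_path/$tablet_id" >&2
    return 1
  fi
  # Archive members are stored as "<tablet_id>/...", the same layout as
  # "cd <Path> && tar czf <tablet_id>.tar.gz <tablet_id>/".
  tar -C "$snapshot_path" -czf "$out_dir/$tablet_id.tar.gz" "$tablet_id"
}
```

For example, package_tablet /path/to/be/storage/snapshot/20220830101353.2.3600 10020 /tmp writes /tmp/10020.tar.gz on the BE node, which can then be copied to the local machine.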
The command also returns a CREATE TABLE statement for the tablet. Note that this is not the original CREATE TABLE statement: both its bucket number and replica number are 1, and the version_info property is specified. This statement is used later when loading the tablet locally.
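If the output of admin copy tablet was saved to a file (for example by copying it from the mysql client session), the statement can be extracted into a .sql file for the local step. A sketch, assuming CreateTableStmt is the last column of the row, as in the output above; extract_create_stmt and the file names are hypothetical.

```shell
# extract_create_stmt: read saved "admin copy tablet ...\G" output on stdin and print
# the CreateTableStmt value. Hypothetical helper; assumes CreateTableStmt is the last
# column, so the statement runs until the next row, the "N row(s) in set" footer,
# or the end of input.
extract_create_stmt() {
  awk '
    /^[0-9]+ rows? in set/ { exit }                 # mysql client footer
    /^\*+ [0-9]+\. row \*+$/ { if (found) exit; next }
    !found {
      line = $0
      sub(/^[ \t]+/, "", line)                      # drop field-name padding
      if (index(line, "CreateTableStmt: ") == 1) {
        found = 1
        print substr(line, length("CreateTableStmt: ") + 1)
      }
      next
    }
    { print }                                       # continuation lines of the statement
  '
}
```

For example, extract_create_stmt < copy_tablet.txt > tbl1.sql; the file can later be run against the local FE, e.g. mysql -h 127.0.0.1 -P 9030 -uroot db1 < tbl1.sql, assuming a local database named db1 already exists.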
At this point, we have obtained all the necessary information:
- Packaged tablet data, such as 10020.tar.gz.
- The CREATE TABLE statement.
2. Load Tablet locally
Build a local debugging environment
Deploy a single-node Doris cluster (1 FE, 1 BE) locally, using the same version as the online cluster. For example, if the online cluster runs DORIS-1.1.1, deploy DORIS-1.1.1 locally as well.
Create a table
Create a table in the local environment using the CREATE TABLE statement obtained in step 1.
Get the tablet information of the newly created table
Because the number of buckets and replicas of the newly created table is 1, there will only be one tablet with one replica:
mysql> show tablets from tbl1\G
*************************** 1. row ***************************
TabletId: 10017
ReplicaId: 10018
BackendId: 10003
SchemaHash: 44622287
Version: 1
LstSuccessVersion: 1
LstFailedVersion: -1
LstFailedTime: NULL
LocalDataSize: 0
RemoteDataSize: 0
RowCount: 0
State: NORMAL
LstConsistencyCheckTime: NULL
CheckVersion: -1
VersionCount: -1
PathHash: 7390150550643804973
MetaUrl: http://192.168.10.1:8040/api/meta/header/10017
CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10017
mysql> show tablet 10017\G
*************************** 1. row ***************************
DbName: default_cluster:db1
TableName: tbl1
PartitionName: tbl1
IndexName: tbl1
DbId: 10004
TableId: 10015
PartitionId: 10014
IndexId: 10016
IsSync: true
Order: 0
DetailCmd: SHOW PROC '/dbs/10004/10015/partitions/10014/10016/10017';
Here we record the following information:
- TableId
- PartitionId
- TabletId
- SchemaHash
We also need to go to the data directory of the BE node in the debugging environment to find the shard id of the new tablet:
cd /path/to/storage/data/*/10017 && pwd
This command enters the directory of tablet 10017 and prints its path, which looks similar to the following:
/path/to/storage/data/0/10017
where 0 is the shard id.
Modify Tablet Data
Extract the tablet data package obtained in step 1. Open the 10017.hdr.json file in an editor and change the following fields to the values obtained in the previous step:
"table_id":10015
"partition_id":10014
"tablet_id":10017
"schema_hash":44622287
"shard_id":0
Load the tablet
First, stop the BE process of the debugging environment (./bin/stop_be.sh). Then copy all the .dat files in the same directory as the 10017.hdr.json file to the /path/to/storage/data/0/10017/44622287 directory. This is the directory of the new tablet in the debugging environment found earlier; 10017 and 44622287 are its tablet id and schema hash respectively.
Delete the original tablet meta with the meta_tool tool, which is located in the be/lib directory:
./lib/meta_tool --root_path=/path/to/storage --operation=delete_meta --tablet_id=10017 --schema_hash=44622287
Here /path/to/storage is the data root directory of the BE. If the deletion is successful, the delete successfully log will appear.
Load the new tablet meta with the meta_tool tool:
./lib/meta_tool --root_path=/path/to/storage --operation=load_meta --json_meta_path=/path/to/10017.hdr.json
If the load is successful, the load successfully log will appear.
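The modify and load steps above can be combined into one script. The sketch below is hypothetical: the helper names, the DRY_RUN switch, and the assumption that it runs from the BE deployment directory (so ./bin and ./lib resolve) are all illustrative. It finds the shard id the same way as the cd command earlier, rewrites the five header fields, stops the BE, copies the .dat files, and runs meta_tool with the same arguments as the commands above. set_field rewrites only the first occurrence of each key, which assumes the top-level fields appear before any nested rowset metadata in the header file; check this against your file. With DRY_RUN=1 the BE-side commands are only printed, but the header file is still edited.

```shell
# Hypothetical sketch of the "Modify Tablet Data" and "Load the tablet" steps.
META_TOOL=${META_TOOL:-./lib/meta_tool}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# set_field FILE KEY VALUE: replace the first "KEY": <integer> in FILE.
set_field() {
  awk -v key="$2" -v val="$3" '
    !done && match($0, "\"" key "\" *: *-?[0-9]+") {
      $0 = substr($0, 1, RSTART - 1) "\"" key "\": " val substr($0, RSTART + RLENGTH)
      done = 1
    }
    { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# find_shard ROOT_PATH TABLET_ID: print <shard> for the directory ROOT_PATH/data/<shard>/TABLET_ID.
find_shard() {
  for d in "$1"/data/*/"$2"; do
    if [ -d "$d" ]; then
      basename "$(dirname "$d")"
      return 0
    fi
  done
  return 1
}

# load_local_tablet ROOT_PATH HDR_JSON TABLE_ID PARTITION_ID TABLET_ID SCHEMA_HASH
load_local_tablet() {
  root=$1 hdr=$2 table_id=$3 partition_id=$4 tablet_id=$5 schema_hash=$6
  shard=$(find_shard "$root" "$tablet_id") || {
    echo "tablet $tablet_id not found under $root/data" >&2
    return 1
  }
  # Point the copied header at the new local tablet.
  set_field "$hdr" table_id "$table_id"
  set_field "$hdr" partition_id "$partition_id"
  set_field "$hdr" tablet_id "$tablet_id"
  set_field "$hdr" schema_hash "$schema_hash"
  set_field "$hdr" shard_id "$shard"

  target="$root/data/$shard/$tablet_id/$schema_hash"
  run ./bin/stop_be.sh
  # Copy the .dat files that sit next to the header file.
  for f in "$(dirname "$hdr")"/*.dat; do
    [ -e "$f" ] && run cp "$f" "$target/"
  done
  run "$META_TOOL" --root_path="$root" --operation=delete_meta \
    --tablet_id="$tablet_id" --schema_hash="$schema_hash"
  run "$META_TOOL" --root_path="$root" --operation=load_meta --json_meta_path="$hdr"
}
```

A cautious way to use it is to review the plan first, e.g. DRY_RUN=1 load_local_tablet /path/to/storage /path/to/10017.hdr.json 10015 10014 10017 44622287, then run it again without DRY_RUN on a fresh copy of the header and start the BE.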
Verification
Restart the BE process of the debugging environment (./bin/start_be.sh) and query the table. If the tablet was loaded correctly, the query returns its data, and you can reproduce the online problem.