Leading Hint
Leading Hint is a powerful query optimization technique that allows users to guide the Doris optimizer to determine the table join order in the query plan. Correct use of Leading Hint can significantly improve the performance of complex queries. This article will describe in detail how to use Leading Hint to control the join order in Doris.
Regular Leading Hint
Syntax
Leading Hint allows specifying the table join order that the optimizer should follow. In Doris, the basic syntax of Leading Hint is as follows:
SELECT /*+ LEADING(tablespec [tablespec]...) */ ...
It should be noted that:
- Leading Hint is surrounded by
/*+
and*/
and placed after the SELECT keyword in the SQL statement. tablespec
is the table name or table alias, and at least two tables need to be specified.- Multiple tables are separated by spaces or ','.
- You can use curly braces
{}
to explicitly specify the shape of the Join Tree.
For example:
mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2;
+------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+------------------------------------------------------------------------------+
When Leading Hint is not effective, the normal process will be used to generate the plan. EXPLAIN will display whether the used Hint is effective, mainly divided into three types:
Used
:Leading Hint is effective normally.Unused
:The unsupported cases here include that the join order specified by Leading Hint is not equivalent to the original SQL or the feature is not supported in this version (see limitations for details).SyntaxError
: Indicates a syntax error in Leading Hint, such as the inability to find the corresponding table.
- The default syntax of Leading Hint constructs a left-deep tree:
mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on c1 = c2 join t3 on c2=c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 t2 t3) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
- At the same time, curly braces can be used to specify the shape of the Join tree:
mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on c1 = c2 join t3 on c2=c3;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 { t2 t3 }) |
| UnUsed: |
| SyntaxError: |
+----------------------------------------------------------------------------------+
- When a View is used as an alias to participate in JoinReorder, the corresponding View can be specified as a parameter of Leading Hint. For example:
mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalProject |
| ------------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalProject |
| --------------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(alias t1) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
Case
Basic Scenario
The table creation statements are as follows:
CREATE DATABASE testleading;
USE testleading;
create table t1 (c1 int, c11 int) distributed by hash(c1) buckets 3 properties('replication_num' = '1');
create table t2 (c2 int, c22 int) distributed by hash(c2) buckets 3 properties('replication_num' = '1');
create table t3 (c3 int, c33 int) distributed by hash(c3) buckets 3 properties('replication_num' = '1');
create table t4 (c4 int, c44 int) distributed by hash(c4) buckets 3 properties('replication_num' = '1');
The original plan:
mysql> explain shape plan select * from t1 join t2 on t1.c1 = c2;
+-------------------------------------------+
| Explain String |
+-------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------PhysicalOlapScan[t1] |
+-------------------------------------------+
When we need to exchange the join order of t1 and t2, we only need to add leading(t2 t1)
in front. When executing explain
, it will show whether this hint is used. The following is the Leading plan: Used
indicates that the Hint is effective normally.
mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2;
+------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+------------------------------------------------------------------------------+
If there is a syntax error in Leading Hint, when executing explain
, the corresponding information will be displayed in SyntaxError
, but the plan can still be generated normally, just without using Leading. For example:
mysql> explain shape plan select /*+ leading(t2 t3) */ * from t1 join t2 on t1.c1 = c2;
+--------------------------------------------------------+
| Explain String |
+--------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute |
| ----------PhysicalOlapScan[t2] |
| |
| Used: |
| UnUsed: |
| SyntaxError: leading(t2 t3) Msg:can not find table: t3 |
+--------------------------------------------------------+
Extended Scenario
- Left-Deep Tree
As mentioned above, when the query statement in Doris does not use any parentheses, Leading will default to generating a left-deep tree.
mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 t2 t3) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
- Right-Deep Tree
When you need to make the shape of the plan a right-deep tree, Bushy tree, or zig-zag tree, you only need to add curly braces to limit the shape of the plan, without the need to use swap like Oracle to adjust step by step from the left-deep tree.
mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3;
+-----------------------------------------------+
| Explain String |
+-----------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute |
| ----------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
| ------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute |
| --------------PhysicalOlapScan[t3] |
| |
| Used: leading(t1 { t2 t3 }) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------+
- Bushy Tree
mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3 join t4 on c3 = c4;
+-----------------------------------------------+
| Explain String |
+-----------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) |
| ------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute |
| --------------PhysicalOlapScan[t4] |
| |
| Used: leading({ t1 t2 } { t3 t4 }) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------+
- zig-zag tree
mysql> explain shape plan select /*+ leading(t1 {t2 t3} t4) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3 join t4 on c3 = c4;
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t3.c3 = t4.c4)) otherCondition=() |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ------------PhysicalOlapScan[t1] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalOlapScan[t3] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t4] |
| |
| Hint log: |
| Used: leading(t1 { t2 t3 } t4) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
- Non-inner Join
When encountering a non-inner join (such as Outer Join or Semi/Anti Join), Leading Hint will automatically derive the join method of each join according to the original SQL semantics. If Leading Hint is different from the original SQL semantics or cannot be generated, it will be placed in UnUsed, but this does not affect the normal generation of the plan. The following is an example that cannot be swapped:
-------- test outer join which can not swap
-- t1 leftjoin (t2 join t3 on (P23)) on (P12) != (t1 leftjoin t2 on (P12)) join t3 on (P23)
mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: |
| UnUsed: leading(t1 { t2 t3 }) |
| SyntaxError: |
+--------------------------------------------------------------------------------+
Here are some examples of Outer, Semi, and Anti Joins with Leading Hint for readers to verify on their own.
-------- test outer join which can swap
-- (t1 leftjoin t2 on (P12)) innerjoin t3 on (P13) = (t1 innerjoin t3 on (P13)) leftjoin t2 on (P12)
explain shape plan select * from t1 left join t2 on c1 = c2 join t3 on c1 = c3;
explain shape plan select /*+ leading(t1 t3 t2) */ * from t1 left join t2 on c1 = c2 join t3 on c1 = c3;
-- (t1 leftjoin t2 on (P12)) leftjoin t3 on (P13) = (t1 leftjoin t3 on (P13)) leftjoin t2 on (P12)
explain shape plan select * from t1 left join t2 on c1 = c2 left join t3 on c1 = c3;
explain shape plan select /*+ leading(t1 t3 t2) */ * from t1 left join t2 on c1 = c2 left join t3 on c1 = c3;
-- (t1 leftjoin t2 on (P12)) leftjoin t3 on (P23) = t1 leftjoin (t2 leftjoin t3 on (P23)) on (P12)
select /*+ leading(t2 t3 t1) SWAP_INPUT(t1) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3;
explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3;
explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3;
-------- test outer join which can not swap
-- t1 leftjoin (t2 join t3 on (P23)) on (P12) != (t1 leftjoin t2 on (P12)) join t3 on (P23)
-- eliminated to inner join
explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 join t3 on c2 = c3;
explain graph select /*+ leading(t1 t2 t3) */ * from t1 left join (select * from t2 join t3 on c2 = c3) on c1 = c2;
-- test semi join
explain shape plan select * from t1 where c1 in (select c2 from t2);
explain shape plan select /*+ leading(t2 t1) */ * from t1 where c1 in (select c2 from t2);
-- test anti join
explain shape plan select * from t1 where exists (select c2 from t2);
- View
In the case of involving aliases (Alias), the alias can be specified as a complete and independent subtree, and the join order is generated according to the text order
mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalProject |
| ------------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalProject |
| --------------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(alias t1) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
Ordered Hint
Ordered hint can be regarded as a special case of leading hint and is used to control the join order as the text order.
Syntax
The syntax of Ordered Hint is /*+ ORDERED */
, which is placed after the SELECT
keyword in the SELECT
statement, followed immediately by the rest of the query.
Case
The following is an example of using Ordered Hint:
mysql> explain shape plan select /*+ ORDERED */ t1.c1 from t2 join t1 on t1.c1 = t2.c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t2] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: ORDERED |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
Relationship with Leading Hint
When Ordered Hint and Leading Hint are used simultaneously, Ordered Hint will take precedence over Leading Hint. This means that even if Leading Hint is specified, if Ordered Hint also exists, the query plan will be executed according to the rules of Ordered Hint, and Leading Hint will be ignored. The following is an example showing the situation when both are used simultaneously:
mysql> explain shape plan select /*+ ORDERED LEADING(t1 t2 t3) */ t1.c1 from t2 join t1 on t1.c1 = t2.c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t2] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: ORDERED |
| UnUsed: leading(t1 t2 t3) |
| SyntaxError: |
+--------------------------------------------------------------------------------+
Summary
Leading Hint is a powerful feature for manually controlling the join order and is widely used in production business tuning. Using leading hint well can meet the tuning requirements for the join order on site and increase the flexibility of system control