Controlling Join Order with Leading Hint: Manually Specify Join Order to Optimize Queries
Leading Hint is a hint syntax for manually specifying the Join order of multiple tables in SQL, used to optimize execution plans for complex queries in specific scenarios. For detailed syntax, refer to the leading hint documentation.
Reading Notes
- You are familiar with the Doris Nereids optimizer and
EXPLAIN SHAPE PLANoutput - The current query is a multi-table Join, and the order automatically chosen by the optimizer does not meet expectations
- You need to control the Join shape as a left-deep tree, right-deep tree, or Bushy tree
- You need to control both the Join order and the distribution method (Shuffle / Broadcast) at the same time
Doris already provides strong out-of-the-box capabilities. In the vast majority of scenarios, the optimizer adaptively optimizes performance for various scenarios, and users do not need to manually tune with Hints. The content of this chapter is mainly intended for professional tuning personnel; business users only need to be aware of it.
Quick Reference for Applicable Scenarios
| Scenario | Recommended Hint Syntax | Resulting Join Shape |
|---|---|---|
| Swap the left and right Join order of two tables | leading(t2 t1) | Left and right swapped |
| Force a left-deep tree for multiple tables | leading(t1 t2 t3) | Left-deep tree |
| Force a right-deep tree for multiple tables | leading(t1 {t2 t3}) | Right-deep tree |
| Force a Bushy tree for multiple tables | leading({t1 t2} {t3 t4}) | Bushy tree |
| Subquery / view participates in the Join as a whole | leading(alias t1) | Alias acts as a single Join node |
| Control both order and distribution method | leading(a shuffle b broadcast c) | Order plus specified Shuffle / Broadcast |
One-sentence definition: Leading Hint = manually telling the optimizer "which tables to Join first, which to Join next, and in what shape."
Case 1: Adjusting the Order of Left and Right Tables
Goal: Adjust the default t1 join t2 order to t2 join t1.
Original plan:
mysql> explain shape plan select * from t1 join t2 on t1.c1 = t2.c2;
+------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t2] |
+------------------------------------------------------------------------------+
Apply Leading Hint:
mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on t1.c1 = t2.c2;
+------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+------------------------------------------------------------------------------+
Explanation: Used: leading(t2 t1) in the Hint log indicates that the Hint has taken effect, and the order of the left and right tables has been swapped.
Case 2: Forcing a Left-Deep Tree
Goal: Make t1, t2, and t3 execute as a left-deep tree in the form ((t1 ⨝ t2) ⨝ t3).
mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
+--------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 t2 t3) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
Explanation: Used: leading(t1 t2 t3) in the Hint log indicates that the Hint has taken effect, and the plan is a left-deep tree.
Case 3: Forcing a Right-Deep Tree
Goal: Use curly braces {} to wrap the right-side sub-Join and construct a right-deep tree of the form (t1 ⨝ (t2 ⨝ t3)).
mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
+----------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 { t2 t3 }) |
| UnUsed: |
| SyntaxError: |
+----------------------------------------------------------------------------------+
Explanation: Used: leading(t1 { t2 t3 }) in the Hint log indicates that the Hint has taken effect, and the plan is a right-deep tree.
Case 4: Forcing a Bushy Tree
Goal: Use two sets of {} to wrap the left and right subtrees respectively, constructing a Bushy tree of the form ((t1 ⨝ t2) ⨝ (t3 ⨝ t4)).
mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3 join t4 on t3.c3 = t4.c4;
+-----------------------------------------------+
| _Explain_ String |
+-----------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) |
| ------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute |
| --------------PhysicalOlapScan[t4] |
| |
| Used: leading({ t1 t2 } { t3 t4 }) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------+
Explanation: Used: leading({ t1 t2 } { t3 t4 }) in the Hint log indicates that the Hint has taken effect, and the plan is a Bushy tree.
Case 5: View / Subquery Participating in the Join as a Whole
Goal: Make the subquery alias alias join with the outer table t1 in a specified order.
mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
+--------------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalProject |
| ------------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalProject |
| --------------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(alias t1) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
Explanation: Used: leading(alias t1) in the Hint log indicates that the Hint has taken effect, and the subquery alias is treated as a single whole node.
Case 6: Mixing Distribute Hint with Leading Hint
Goal: While specifying the Join order, specify the shuffle or broadcast distribution method for each pair of joins.
explain shape plan
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
/*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
Meaning of keywords:
| Keyword | Purpose |
|---|---|
leading(...) | Controls the overall relative Join order and shape between tables |
shuffle | Specifies that this Join uses Shuffle distribution |
broadcast | Specifies that this Join uses Broadcast distribution |
{ ... } | Bundles multiple tables into a subtree, determining the Join shape |
Explanation: By combining the two types of Hints, you can flexibly control both the Join order and the Join method at the same time, making it easier to manually specify the desired execution plan.
- Use
EXPLAINto carefully analyze the execution plan and confirm that the Leading Hint achieves the expected effect. - After Doris version upgrades or changes in business data, re-evaluate the effect of the Leading Hint, and record and adjust it in a timely manner.
Frequently Asked Questions
Q1: The Hint did not take effect, and the Hint log shows content under UnUsed or SyntaxError?
- Common causes: misspelled table names / aliases in the Hint, mismatched number of tables between the Hint and the SQL, or unmatched brackets.
- Troubleshooting: Check the
Hint logsection at the end of theEXPLAIN SHAPE PLANoutput, and locate the specific message on theUnUsedorSyntaxErrorline.
Q2: What is the relationship between Leading Hint and Distribute Hint?
leadingdetermines which tables to Join first and the resulting shape;shuffle/broadcastdetermines the data distribution method for each pair of joins.- They can be used independently or mixed together as in Case 6.
Q3: Why can a subquery not be split into a Leading Hint?
- A subquery / view participates in
leadingordering as a single whole node through its alias. See Case 5. - If you also need to specify the order of tables inside the subquery, write another
leadingHint inside the subquery.
Q4: When should Leading Hint not be used?
- The plan automatically chosen by the optimizer is already optimal.
- When business data changes frequently and statistics are unstable, fixing a Hint may instead degrade performance.
Summary
- Leading Hint is used to manually control the Join order and shape (left-deep tree / right-deep tree / Bushy tree).
- It can be combined with Shuffle / Broadcast Hints to control both order and distribution method at the same time.
- Use with caution: apply it only after fully understanding the query characteristics and data distribution, and review it periodically as versions and data change.