Skip to main content

Controlling Join Order with Leading Hint: Manually Specify Join Order to Optimize Queries

Leading Hint is a hint syntax for manually specifying the Join order of multiple tables in SQL, used to optimize execution plans for complex queries in specific scenarios. For detailed syntax, refer to the leading hint documentation.

Reading Notes

  • You are familiar with the Doris Nereids optimizer and EXPLAIN SHAPE PLAN output
  • The current query is a multi-table Join, and the order automatically chosen by the optimizer does not meet expectations
  • You need to control the Join shape as a left-deep tree, right-deep tree, or Bushy tree
  • You need to control both the Join order and the distribution method (Shuffle / Broadcast) at the same time
Note

Doris already provides strong out-of-the-box capabilities. In the vast majority of scenarios, the optimizer adaptively optimizes performance for various scenarios, and users do not need to manually tune with Hints. The content of this chapter is mainly intended for professional tuning personnel; business users only need to be aware of it.

Quick Reference for Applicable Scenarios

ScenarioRecommended Hint SyntaxResulting Join Shape
Swap the left and right Join order of two tablesleading(t2 t1)Left and right swapped
Force a left-deep tree for multiple tablesleading(t1 t2 t3)Left-deep tree
Force a right-deep tree for multiple tablesleading(t1 {t2 t3})Right-deep tree
Force a Bushy tree for multiple tablesleading({t1 t2} {t3 t4})Bushy tree
Subquery / view participates in the Join as a wholeleading(alias t1)Alias acts as a single Join node
Control both order and distribution methodleading(a shuffle b broadcast c)Order plus specified Shuffle / Broadcast

One-sentence definition: Leading Hint = manually telling the optimizer "which tables to Join first, which to Join next, and in what shape."

Case 1: Adjusting the Order of Left and Right Tables

Goal: Adjust the default t1 join t2 order to t2 join t1.

Original plan:

mysql> explain shape plan select * from t1 join t2 on t1.c1 = t2.c2;
+------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t2] |
+------------------------------------------------------------------------------+

Apply Leading Hint:

mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on t1.c1 = t2.c2;
+------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+------------------------------------------------------------------------------+

Explanation: Used: leading(t2 t1) in the Hint log indicates that the Hint has taken effect, and the order of the left and right tables has been swapped.

Case 2: Forcing a Left-Deep Tree

Goal: Make t1, t2, and t3 execute as a left-deep tree in the form ((t1 ⨝ t2) ⨝ t3).

mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
+--------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 t2 t3) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+

Explanation: Used: leading(t1 t2 t3) in the Hint log indicates that the Hint has taken effect, and the plan is a left-deep tree.

Case 3: Forcing a Right-Deep Tree

Goal: Use curly braces {} to wrap the right-side sub-Join and construct a right-deep tree of the form (t1 ⨝ (t2 ⨝ t3)).

mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
+----------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 { t2 t3 }) |
| UnUsed: |
| SyntaxError: |
+----------------------------------------------------------------------------------+

Explanation: Used: leading(t1 { t2 t3 }) in the Hint log indicates that the Hint has taken effect, and the plan is a right-deep tree.

Case 4: Forcing a Bushy Tree

Goal: Use two sets of {} to wrap the left and right subtrees respectively, constructing a Bushy tree of the form ((t1 ⨝ t2) ⨝ (t3 ⨝ t4)).

mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3 join t4 on t3.c3 = t4.c4;
+-----------------------------------------------+
| _Explain_ String |
+-----------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) |
| ------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute |
| --------------PhysicalOlapScan[t4] |
| |
| Used: leading({ t1 t2 } { t3 t4 }) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------+

Explanation: Used: leading({ t1 t2 } { t3 t4 }) in the Hint log indicates that the Hint has taken effect, and the plan is a Bushy tree.

Case 5: View / Subquery Participating in the Join as a Whole

Goal: Make the subquery alias alias join with the outer table t1 in a specified order.

mysql>  explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
+--------------------------------------------------------------------------------------+
| _Explain_ String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalProject |
| ------------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalProject |
| --------------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(alias t1) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+

Explanation: Used: leading(alias t1) in the Hint log indicates that the Hint has taken effect, and the subquery alias is treated as a single whole node.

Case 6: Mixing Distribute Hint with Leading Hint

Goal: While specifying the Join order, specify the shuffle or broadcast distribution method for each pair of joins.

explain shape plan
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
/*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;

Meaning of keywords:

KeywordPurpose
leading(...)Controls the overall relative Join order and shape between tables
shuffleSpecifies that this Join uses Shuffle distribution
broadcastSpecifies that this Join uses Broadcast distribution
{ ... }Bundles multiple tables into a subtree, determining the Join shape

Explanation: By combining the two types of Hints, you can flexibly control both the Join order and the Join method at the same time, making it easier to manually specify the desired execution plan.

Usage Recommendations
  • Use EXPLAIN to carefully analyze the execution plan and confirm that the Leading Hint achieves the expected effect.
  • After Doris version upgrades or changes in business data, re-evaluate the effect of the Leading Hint, and record and adjust it in a timely manner.

Frequently Asked Questions

Q1: The Hint did not take effect, and the Hint log shows content under UnUsed or SyntaxError?

  • Common causes: misspelled table names / aliases in the Hint, mismatched number of tables between the Hint and the SQL, or unmatched brackets.
  • Troubleshooting: Check the Hint log section at the end of the EXPLAIN SHAPE PLAN output, and locate the specific message on the UnUsed or SyntaxError line.

Q2: What is the relationship between Leading Hint and Distribute Hint?

  • leading determines which tables to Join first and the resulting shape; shuffle / broadcast determines the data distribution method for each pair of joins.
  • They can be used independently or mixed together as in Case 6.

Q3: Why can a subquery not be split into a Leading Hint?

  • A subquery / view participates in leading ordering as a single whole node through its alias. See Case 5.
  • If you also need to specify the order of tables inside the subquery, write another leading Hint inside the subquery.

Q4: When should Leading Hint not be used?

  • The plan automatically chosen by the optimizer is already optimal.
  • When business data changes frequently and statistics are unstable, fixing a Hint may instead degrade performance.

Summary

  • Leading Hint is used to manually control the Join order and shape (left-deep tree / right-deep tree / Bushy tree).
  • It can be combined with Shuffle / Broadcast Hints to control both order and distribution method at the same time.
  • Use with caution: apply it only after fully understanding the query characteristics and data distribution, and review it periodically as versions and data change.