Optimizing Join with Colocate Group
Defining colocate group is an efficient way of Join. It allows the execution engine to effectively avoid the data transmission overhead typically associated with Join operations (for an introduction to Colocate Group, see Colocation Join)
However, in some use cases, even if a Colocate Group has been successfully established, the execution plan may still show as Shuffle Join or Bucket Shuffle Join. This situation typically occurs when Doris is organizing data. For instance, it may be migrating tablets between BEs to ensure a more balanced data distribution across multiple BEs.
You can view the Colocate Group status using the command SHOW PROC "/colocation_group";
. As shown in the figure below, if IsStable
is false
, it indicates that there are unavailable Colocate Group instances.