MERGE JOIN:
Oracle joins two sets of rows using the merge join algorithm. The inputs are two separate row sources, each sorted on the join column; the output is the result of the join. Oracle reads rows from both inputs in alternation and merges matching rows to produce the output.
Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. However, sort merge joins can perform better than hash joins if both of the following conditions hold:
• The row sources are sorted already.
• A sort operation does not have to be done.
Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=. For large data sets they can perform better than nested loop joins. Hash joins cannot be used unless the join has an equality condition.
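To illustrate why sorted input helps with an inequality condition, here is a minimal Python sketch of a join on left.key < right.key. It is not Oracle's implementation; the function name less_than_join and the key-function parameters are assumptions made for this example. Once the right input is sorted, every row after a single binary-search position matches, which a hash lookup cannot provide.

```python
from bisect import bisect_right

def less_than_join(left, right, key_left, key_right):
    """Join rows where key_left(l) < key_right(r), using sorted input."""
    # Sort the right input once on its join key.
    r_sorted = sorted(right, key=key_right)
    r_keys = [key_right(row) for row in r_sorted]
    out = []
    for l_row in sorted(left, key=key_left):
        # First position in r_sorted whose key is strictly greater.
        start = bisect_right(r_keys, key_left(l_row))
        # Every row from that position onward satisfies the condition.
        for r_row in r_sorted[start:]:
            out.append((l_row, r_row))
    return out

pairs = less_than_join([5, 1], [3, 1, 6], lambda x: x, lambda y: y)
```

Here pairs holds (1, 3), (1, 6) and (5, 6): each left value paired with every strictly larger right value.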
In a merge join, there is no concept of a driving table. The join consists of two steps:
1. Sort join operation: Both inputs are sorted on the join key.
2. Merge join operation: The sorted lists are merged together.
If the input is already sorted by the join column, then a sort join operation is not performed for that row source.
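The two steps can be sketched in Python for an equi-join. This is a simplified illustration rather than Oracle's actual code; the function name sort_merge_join and the left_sorted / right_sorted flags are hypothetical and stand in for the case where a row source arrives already ordered and the sort step is skipped.

```python
def sort_merge_join(left, right, key_left, key_right,
                    left_sorted=False, right_sorted=False):
    """Equi-join two row lists with a sort step and a merge step."""
    # Step 1 (sort join): sort each input unless it is already sorted.
    l = list(left) if left_sorted else sorted(left, key=key_left)
    r = list(right) if right_sorted else sorted(right, key=key_right)

    # Step 2 (merge join): advance through both sorted lists together.
    out = []
    i = j = 0
    while i < len(l) and j < len(r):
        kl, kr = key_left(l[i]), key_right(r[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(r) and key_right(r[j_end]) == kl:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(l) and key_left(l[i]) == kl:
                for k in range(j, j_end):
                    out.append((l[i], r[k]))
                i += 1
            j = j_end
    return out
```

Neither input plays the role of a driving table: both lists are consumed in step, and only runs of duplicate keys are revisited.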
The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
• The join condition between two tables is not an equi-join.
• OPTIMIZER_MODE is set to RULE.
• HASH_JOIN_ENABLED is false.
• Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
• The optimizer thinks that the cost of a hash join is higher, based on the settings of HASH_AREA_SIZE and SORT_AREA_SIZE.
To advise the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.
HASH JOIN:
Oracle joins two sets of rows using the hash join algorithm. The inputs and output are the same as for a merge join. Oracle reads all rows from one input (the build input) and builds a hash structure, then reads the other input (the probe input) one row at a time. For each probe row, the hash structure is probed and matching rows generate output.
Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
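The build-and-probe idea can be sketched in Python for the simple case where the whole hash table fits in memory. This is only an illustration under that assumption; the name hash_join and its parameters are hypothetical, and the caller is expected to pass the smaller input as small.

```python
from collections import defaultdict

def hash_join(small, large, key_small, key_large):
    """Equi-join: build a hash table on small, then probe it with large."""
    # Build phase: hash every row of the smaller input on its join key.
    table = defaultdict(list)
    for row in small:
        table[key_small(row)].append(row)

    # Probe phase: scan the larger input once and look up each key.
    out = []
    for row in large:
        for match in table.get(key_large(row), ()):
            out.append((match, row))
    return out
```

Each row of either input is read once, which is why the approach suits large data sets, and the dictionary lookup is an exact-match operation, which is why it needs an equality condition.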
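If the hash table grows too large to fit in memory, it is broken up into partitions, and partitions that exceed the allocated memory are written to disk. After the hash table is complete, the following processes occur: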
- The second, larger table is scanned.
- It is broken up into partitions like the smaller table.
- The partitions are written to disk.
When the hash table build is complete, it is possible that an entire hash table partition is resident in memory. In that case, the corresponding partition for the second (larger) table does not need to be built: when that table is scanned, rows that hash to the resident hash table partition can be joined and returned immediately.
Each hash table partition is then read into memory, and the following processes occur:
- The corresponding partition for the second table is scanned.
- The hash table is probed to return the joined rows.
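The partition-wise processing can be sketched in Python as follows. This is a rough model, not Oracle's implementation: plain lists stand in for the partitions that would be written to disk, the name partitioned_hash_join and the n_partitions parameter are assumptions, and the shortcut for partitions that stay resident in memory is left out.

```python
def partitioned_hash_join(build, probe, key_build, key_probe, n_partitions=4):
    """Equi-join by splitting both inputs into matching hash partitions."""
    # Partition both inputs with the same hash function, so rows with
    # equal join keys always land in partitions with the same index.
    build_parts = [[] for _ in range(n_partitions)]
    probe_parts = [[] for _ in range(n_partitions)]
    for row in build:
        build_parts[hash(key_build(row)) % n_partitions].append(row)
    for row in probe:
        probe_parts[hash(key_probe(row)) % n_partitions].append(row)

    out = []
    # Process one pair of partitions at a time.
    for b_part, p_part in zip(build_parts, probe_parts):
        # Build a hash table for this partition only.
        table = {}
        for row in b_part:
            table.setdefault(key_build(row), []).append(row)
        # Scan the corresponding probe partition and probe the table.
        for row in p_part:
            for match in table.get(key_probe(row), []):
                out.append((match, row))
    return out
```

Only one partition's hash table has to be in memory at a time. The output arrives grouped by partition, so its order generally differs from the order of the probe input.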
If the hash table does not fit in memory, parts of it may need to be swapped in and out, depending on the rows retrieved from the second table. Performance in this scenario can be extremely poor.
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions is true:
- A large amount of data needs to be joined.
- A large fraction of the table needs to be joined.
Apply the USE_HASH hint to advise the optimizer to use a hash join when joining two tables together. If you are having trouble getting the optimizer to use hash joins, investigate the values of the HASH_AREA_SIZE and HASH_JOIN_ENABLED parameters.