Distinct optimization
Distinct is used to compare a value with another value.
There are two methods to write a query that returns distinct values in SQL. One method uses the DISTINCT keyword:
SELECT DISTINCT COL1, COL2 FROM TABLE1The second method uses GROUP BY:
SELECT COL1, COL2 FROM TABLE1 GROUP BY COL1, COL2All queries that contain a DISTINCT, and are run using SQE, will be rewritten into queries using GROUP BY. This rewrite enables queries using DISTINCT to take advantage of the many grouping techniques available to the optimizer.
Distinct to Grouping implementation
Below is an example of a query with a DISTINCT:
SELECT DISTINCT COL1, COL2 FROM T1 WHERE COL2 > 5 AND COL3 = 2The optimizer will rewrite it into this query:
SELECT COL1, COL2 FROM T1 WHERE COL2 > 5 AND COL3 = 2 GROUP BY COL1, COL2
Distinct removal
A query containing a DISTINCT over whole-file aggregation (no grouping or selection) allows the DISTINCT to be removed. For example, look at this query with DISTINCT:
SELECT DISTINCT COUNT(C1), SUM(C1) FROM TABLE1The optimizer rewrites this query as the following:
SELECT COUNT(C1), SUM(C1) FROM TABLE1If the DISTINCT and the GROUP BY fields are identical, the DISTINCT can be removed. If the DISTINCT fields are a subset of the GROUP BY fields (and there are no aggregates), the DISTINCTs can be removed.
Parent topic:
Processing queries: Overview