Index grouping implementation

Index grouping implementation

There are two primary ways to implement grouping via an index: Ordered grouping and pre-summarized processing.

Ordered grouping

This implementation utilizes the Radix Index Scan or the Radix Index Probe access methods to perform the grouping. An index is required that contains all of the grouping columns as contiguous leftmost key columns. The database manager accesses the individual groups through the index and performs the requested summary functions.
Since the index, by definition, already has all of the key values grouped together, the first group result can be returned in less time than the hashing method. This is because of the temporary result that is required for the hashing method. This implementation can be beneficial if an application does not need to retrieve all of the group results or if an index already exists that matches the grouping columns.
When the grouping is implemented with an index and a permanent index does not already exist that satisfies grouping columns, a temporary index is created. The grouping columns specified within the query are used as the key columns for this index.

Pre-summarized processing

This SQE only implementation utilizes an Encoded Vector Index to extract the summary information already in the index's symbol table. The symbol table portion of an EVI contains the unique values of the key along with a count of the number of table records that have that unique value, basically the grouping for the columns of the index key are already performed. If the query references a single table and performs simple aggregation, the EVI may be used for quick access to the grouping results. For example, consider the following query:
SELECT COUNT(*), col1 
	FROM t1 
	GROUP BY col1
If an EVI exists over t1 with a key of col1, the optimizer can rewrite the query to access the precomputed grouping answer in the EVI symbol table. This can result in dramatic improvements in queries when the number of records in the table is large and the number of resulting groups is small (relative to the size of the table). This method is also possible with selection (WHERE clause), as long as the reference columns are in the key definition of the EVI. For example, consider the following query:
SELECT COUNT(*), col1
 FROM t1
 WHERE col1 > 100
 GROUP BY col1
This query can be rewritten by the optimizer to make use of the EVI. This pre-summarized processing works for DISTINCT processing, GROUP BY and for column function COUNT. All columns of the table referenced in the query must also be in the key definition of the EVI. So, for example, the following query can be made to use the EVI:
SELECT DISTINCT col1
	FROM t1
However, this query cannot:
SELECT DISTINCT col1
 FROM t1
 WHERE col2 > 1
The reason that this query cannot use the EVI is because it references col2 of the table, which is not in the key definition of the EVI. Note also that if multiple columns are defined in the EVI key, for example, col1 and col2, that it is important that the left most columns of the key be utilized. For example, if an EVI existed with a key definition of (col1, col2), but the query referenced just col2, it is very unlikely the EVI will be used.

Parent topic:
Grouping optimization