Collecting statistics with the Statistics Manager
As stated earlier, the collection of statistics is handled by a separate component called the Statistics Manager. Statistical information can be used by the query optimizer to determine the best access plan for a query. Since the query optimizer bases its choice of access plan on the statistical information found in the table, it is important that this information be current.
On many platforms, statistics collection is a manual process that is the responsibility of the database administrator. With iSeries™ servers, the database statistics collection process is handled automatically, and only rarely is it necessary to update statistics manually.
The Statistics Manager does not actually run or optimize the query. It controls the access to the metadata and other information that is required to optimize the query. It uses this information to answer questions posed by the query optimizer. The answers can either be derived from table header information, from existing indexes, or from single-column statistics.
The Statistics Manager must always provide an answer to the questions from the Optimizer. It uses the best method available to provide the answers. For example, it may use a single-column statistic or perform a key range estimate over an index. Along with the answer, the Statistics Manager returns a confidence level to the optimizer that the optimizer may use to provide greater latitude for sizing algorithms. If the Statistics Manager provides a low confidence in the number of groups that are estimated for a grouping request, then the optimizer may increase the size of the temporary hash table allocated.
- Automatic statistics collection
When the Statistics Manager prepares its responses to the Optimizer, it keeps track of the responses that are generated by using default filter factors (because column statistics or indexes were not available). It uses this information during the time that the access plan is being written to the Plan Cache to automatically generate a statistic collection request for the columns. If system resources allow it, the Statistics Manager generates statistics collections in real time for direct use by the current query, avoiding a default answer to the Optimizer.
- Automatic statistics refresh
Column statistics are not maintained when the underlying table data changes. The Statistics Manager determines if columns statistics are still valid or if they no longer represent the column accurately (stale).
- Viewing statistics requests
You can view the current statistics requests by using iSeries Navigator or by using Statistics APIs.
- Indexes versus column statistics
If you are trying to decide whether to use statistics or indexes to provide information to the Statistics Manager, keep the following differences in mind.
- Monitoring background statistics collection
The system value QDBFSTCCOL controls who is allowed to create statistics in the background.
- Replication of column statistics with CRTDUPOBJ versus CPYF
You can replicate column statistics with the Create Duplicate Object (CRTDUPOBJ) or the Copy File (CPYF) commands.
- Determining what column statistics exist
You can determine what column statistics exist in a couple of ways.
- Manually collecting and refreshing statistics
You can manually collect and refresh statistics through iSeries Navigator or by using Statistics APIs.
- Statistics Manager APIs
The following APIs are used to implement the statistics function of iSeries Navigator.
Parent topic:
Optimizing query performance using query optimization tools
Related concepts
Statistics Manager