Statistics Manager

 

In releases before V5R2, the retrieval of statistics was a function of the Optimizer. When the Optimizer needed to know information about a table, it looked at the table description to retrieve the row count and table size. If an index was available, the Optimizer might then extract further information about the data in the table. In V5R2, the collection of statistics was removed from the Optimizer and is now handled by a separate component called the Statistics Manager.

The Statistics Manager does not actually run or optimize the query. It controls the access to the metadata and other information that is required to optimize the query. It uses this information to answer questions posed by the query optimizer. The Statistics Manager always provides answers to the optimizer. In cases where it cannot provide an answer based on actual existing statistics information, it is designed to provide a predefined answer.

The Statistics Manager typically gathers and keeps track of the following information:

Cardinality of values

The number of unique or distinct occurrences of a specific value in a single column or multiple columns of a table.

Selectivity

Also known as a histogram, this information is an indication of how many rows will be selected by any given selection predicate or combination of predicates. Using sampling techniques, it describes the selectivity and distribution of values in a given column of the table.

Frequent values

The top nn most frequent values of a column together with account of how frequently each value occurs. This information is obtained by making use of statistical sampling techniques. Built-in algorithms eliminate the possibility of data skewing; for example, NULL values and default values that can influence the statistical values are not taken into account.

Metadata information

This includes the total number of rows in the table, indexes that exist over the table, and which indexes are useful for implementing the particular query.

Estimate of IO operation

This is an estimate of the amount of IO operations that are required to process the table or the identified index.

The Statistics Manager uses a hybrid approach to manage database statistics. The majority of this information can be obtained from existing indexes. In cases where the required statistics cannot be gathered from existing indexes, statistical information is constructed of single columns of a table and stored internally as part of the table. By default, this information is collected automatically by the system, but you can manually control the collection of statistics. Unlike indexes, however, statistics are not maintained immediately as data in the tables change.

 

Parent topic:

Query Engine Overview

 

Related reference


Collecting statistics with the Statistics Manager