Q
When the query optimizer in SQL Server 7.0
examines the distribution statistics of a composite
index, does it take into account the statistics for
each column in the index or just the statistics in
the first column (assuming that this table has only
index statistics, not column statistics)? If the optimizer
considers statistics for each column in the index,
could it choose an index because the last column in
that index has favorable statistics? Consider the
following example (assuming the table contains data):
CREATE TABLE test_table
    (entry_date datetime,
    row_id int,
    first_name char(30),
    last_name char(30))
CREATE INDEX test_table_idx ON test_table
    (last_name, first_name, row_id)
SELECT row_id
FROM test_table
WHERE row_id = 10
GO
If the query optimizer checks the distribution statistics
for each column in the composite index, will SQL Server
possibly choose to use the table's index to run this
query?
A
SQL Server keeps a histogram only for the first
column of a multicolumn index and keeps density information
for each leading prefix of the index key. The index in your
example has three densities: one for last_name; one for the
combination of last_name and first_name; and one for
the last_name, first_name, and row_id combination.
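To look at these statistics yourself, one option is DBCC SHOW_STATISTICS, which reports the density for each prefix of the index key along with the histogram steps for the leading column. The following is a minimal sketch, assuming the table and index from the question already exist and the table contains data:

```sql
-- Refresh the index statistics so the output reflects the current data.
UPDATE STATISTICS test_table test_table_idx
GO

-- Report the densities for each key prefix of test_table_idx
-- and the histogram kept on its leading column, last_name.
DBCC SHOW_STATISTICS (test_table, test_table_idx)
GO
```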
The server might still choose to scan the index rather
than the table, not because of the statistics on the
row_id column but because the index contains row_id and
is narrower than the table, so scanning it costs
fewer I/Os. In addition to scanning the table or the
index shown, the optimizer investigates any other indexes
that exist on the table. The most favorable index
for the query would have row_id as its first
column and would also contain the last_name and first_name
columns. If row_id is unique (or almost unique),
another good plan is to have a single-column index
on row_id alone. In such a case, the index seek
would be followed by a fetch of the corresponding
first_name and last_name column values from the base
table, whether or not the table has a clustered index.
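The two alternative indexes described in this answer could be sketched as follows. The index names are hypothetical, and you would normally create only one of the two. SET SHOWPLAN_TEXT is one way to check which access path the optimizer picks without running the query.

```sql
-- Option 1 (hypothetical name): row_id leads the key, and the
-- name columns are carried in the index as well.
CREATE INDEX test_table_rowid_names_idx ON test_table
    (row_id, last_name, first_name)
GO

-- Option 2 (hypothetical name): a single-column index, a reasonable
-- choice when row_id is unique or almost unique.
-- CREATE INDEX test_table_rowid_idx ON test_table (row_id)
-- GO

-- Show the estimated plan instead of executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON
GO
SELECT row_id
FROM test_table
WHERE row_id = 10
GO
SET SHOWPLAN_TEXT OFF
GO
```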