Estimate_percent
Here’s a live one from OTN – here are a couple of extracts from the problem statement:
We’re experiencing an issue where it seems that the query plan changes from day to day for a particular procedure that runs once a night.
It’s resulting in a performance variance of 10 second completion time vs 20 minutes (nothing in between).
It started occurring about 2 months ago and now it’s becoming more prevalent where the bad query plan is coming up more often.
I noticed that the query plans vary for a simple query.
We do run gather statistics every night. (DBMS_STATS.GATHER_SCHEMA_STATS(ownname => sys_context('userenv', 'current_schema'), estimate_percent => 1);)
The query and two execution plans look like this:
select  count(*)
from    cs_bucket_member_v2
where   bucket_type = 'P'
and     sec_id > 0
and     order_id = 0
;

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |     1 |    12 |   155   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CS_BUCKET_MEMBER_V2  |  1148 | 13776 |   155   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | CS_BUCKET_MEMBER_N1  |  1272 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("BUCKET_TYPE"='P' AND "SEC_ID">0)
   3 - access("ORDER_ID"=0)

------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |    12 | 11215   (2)| 00:01:41 |
|   1 |  SORT AGGREGATE    |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| CS_BUCKET_MEMBER_V2 |  1522K|    17M| 11215   (2)| 00:01:41 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=0 AND "SEC_ID">0 AND "BUCKET_TYPE"='P')
There are a couple of bits of information that would be useful – such as the database version, the number of rows in the table, the number of distinct values in each column, and whether any of the columns have histograms – but there are a couple of reasonable guesses that we might make about the problem. Notice particularly that the number of rows estimated from the index range scan is 1272 and only a small volume is then eliminated by the table filter predicates on sec_id and bucket_type. This suggests that the optimizer has information telling it that most of the rows in the table have sec_id > 0 and bucket_type = 'P', and you might note that that suggests there's a histogram on bucket_type.
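If we had access to the OP's system we could test that guess directly. A query like the following (a sketch – it assumes the table and column names visible in the plans, and that we're connected as the owning schema) would report whether any of the three columns has a histogram:

```sql
rem
rem     Hypothetical check against the OP's schema.
rem     The histogram column of user_tab_cols reports NONE,
rem     FREQUENCY, or HEIGHT BALANCED for each column.
rem

select
        column_name, num_distinct, num_buckets, histogram
from
        user_tab_cols
where
        table_name = 'CS_BUCKET_MEMBER_V2'
and     column_name in ('BUCKET_TYPE', 'SEC_ID', 'ORDER_ID')
order by
        column_name
;
```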
Rather than stating the most obvious guesses about the problem, though, I’ll start by creating a data set and emulating the problem, starting from an empty schema on 11.2.0.4:
create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum  id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                          sec_id,
        case
                when mod(rownum,1000) = 0
                        then 'X'
                        else 'P'
        end                             bucket_type,
        case
                when rownum < 1e6 - 50000
                        then mod(rownum - 1, 1e5)
                        else 1000
        end                             order_id,
        lpad(rownum,10,'0')             id_vc,
        rpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
create index t1_i1 on t1(order_id) nologging;

select  count(*)
from    t1
where   order_id = 1000
and     bucket_type = 'P'
and     sec_id > 1000
;
The column names in the table match those needed by the query, and the bucket_type column has a very skewed distribution that will eliminate very little data; the sec_id column is also not going to eliminate data, but it's very evenly distributed with no large gaps so not a good candidate for a histogram in any case. The order_id column has 50,000 of its 1,000,000 rows (5%) set to a single value, and most of those special rows are at the end of the table – it's a pretty good candidate for a histogram (if Oracle spots it, and if we actually write queries to access that data).
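A quick sanity check on the generated data confirms those distributions (from the case expressions above we'd expect about 1,000 'X' rows and a little over 50,000 rows with order_id = 1000):

```sql
rem
rem     Verify the skew built into the test data.
rem

select  bucket_type, count(*)
from    t1
group by
        bucket_type
order by
        bucket_type
;

select  count(*)
from    t1
where   order_id = 1000
;
```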
I've run a query that references all three columns so that the default method_opt of "for all columns size auto" will apply to them when I gather stats. So here's the code that gathers stats and checks the resulting execution plans, first for "auto_sample_size" then for the 1% used by the OP:
set autotrace traceonly explain

begin
        dbms_stats.gather_schema_stats(
/*              estimate_percent => 1, */
                ownname          => user
        );
end;
/

select  count(*)
from    t1
where   order_id = 1000
and     bucket_type = 'P'
and     sec_id > 1000
;

begin
        dbms_stats.gather_schema_stats(
                estimate_percent => 1,
                ownname          => user
        );
end;
/

select  count(*)
from    t1
where   order_id = 1000
and     bucket_type = 'P'
and     sec_id > 1000
;

set autotrace off
And here are the two plans – in the same order:
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |  2333   (4)| 00:00:12 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 51063 |   598K|  2333   (4)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=1000 AND "SEC_ID">1000 AND "BUCKET_TYPE"='P')

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    12 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1    |    20 |   240 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    20 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SEC_ID">1000 AND "BUCKET_TYPE"='P')
   3 - access("ORDER_ID"=1000)
I don't know why, but with a 1% sample (which really did sample 10,000 rows) the optimizer didn't spot the need for a histogram on order_id, yet with the auto_sample_size (which sampled 5,500 rows – yes, roughly half as many) it did. Checking the trace files, the only visible difference in the sampling SQL was that the 1% sample included the id_vc and padding columns, which the auto_sample_size run ignored because they hadn't been logged as used in col_usage$.
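If you want to see what column usage Oracle has recorded for a table, 11.2 exposes a formatted report through dbms_stats.report_col_usage() – a sketch (it assumes the in-memory usage information has been flushed to col_usage$ first, which the call to flush_database_monitoring_info takes care of):

```sql
rem
rem     Assumes 11.2 for dbms_stats.report_col_usage.
rem

begin
        dbms_stats.flush_database_monitoring_info;
end;
/

set long 20000

select  dbms_stats.report_col_usage(user, 'T1')
from    dual
;
```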
Moral
Histograms are tricky things – and you can only make things worse in 11g by NOT using the auto_sample_size.
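For the OP's nightly job the simplest change would be to drop the estimate_percent parameter, or make the default explicit – a sketch of the revised call:

```sql
rem
rem     Revised nightly gather - let Oracle choose the sample size.
rem

begin
        dbms_stats.gather_schema_stats(
                ownname          => sys_context('userenv', 'current_schema'),
                estimate_percent => dbms_stats.auto_sample_size
        );
end;
/
```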
Footnote
Based on previous experience – my “obvious” guess about the OP’s data was that there was a special-case value for order_id, that the rows for that value were fairly well clustered, probably towards the end of the table, and constituted a small percentage of the table, and that the rest of the data reported “a few” rows per value. That’s why I built the model you see above.