It’s hard to understand all the ramifications of Oracle’s undo handling, and it’s not hard to find cases where the resulting effects are very confusing. In a recent post on the OTN database forum resulted in one response insisting that the OP was obviously updating a table with frequent commits from one session while querying it from another thereby generating a large number of undo reads in the querying session.
It’s a possible cause of the symptoms that had been described – although not the only possible cause, especially since the symptoms hadn’t been described completely. It’s actually possible to see this type of activity when there are no updates and no outstanding commits taking place at all on the target table. Unfortunately it’s quite hard to demonstrate this with a quick, simple, script in recent versions of Oracle unless you do some insanely stupid things to make the problem appear – but I know how to do “insanely stupid” in Oracle, so here we go; first some data creation:
rem rem Script: undo_rec_apply_2.sql rem Author: Jonathan Lewis rem Dated: March 2017 rem create table t2(v1 varchar2(100)); insert into t2 values(rpad('x',100)); commit; create table t1 nologging pctfree 99 pctused 1 as with generator as ( select rownum id from dual connect by level <= 1e4 ) select cast(rownum as number(8,0)) id, cast(lpad(rownum,10,'0') as varchar2(10)) v1, cast(lpad('x',100,'x') as varchar2(100)) padding from generator v1, generator v2 where rownum <= 8e4 -- > comment to bypass WordPress formatting issue ; alter table t1 add constraint t1_pk primary key(id) ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); end; /
The t2 table is there as a target for a large of updates from a session other than the one demonstrating the problem. The t1 table has been defined and populated in a way that puts one row into each of 80,000 blocks (though, with ASSM and my specific tablespace definition of uniform 1MB extents, the total space is about 80,400 blocks). I’ve got a primary key declaration that allows me to pick single rows/blocks from the table if I want to.
At this point I’m going to do a lot of updates to the main table using a very inefficient strategy to emulate the type of thing that can happen on a very large table with lots of random updates and many indexes to maintain:
begin for i in 1..800 loop update t1 set v1 = upper(v1) where id = 100 * i; execute immediate 'alter system switch logfile'; execute immediate 'alter system flush buffer_cache'; commit; dbms_lock.sleep(0.01); end loop; end; / set transaction read only;
I’m updating every 100th row/block in the table with single row commits, but before each commit I’m switching log files and flushing the buffer cache.
This is NOT an experiment to try on a production system, or even a development system if there are lots of busy developers or testers around – and if you’re running your dev/test in archivelog mode (which, for some of your systems you should be) you’re going to end up with a lot of archived redo logs. I have to do this switch to ensure that the updated blocks are unpinned so that they will be written to disc and flushed from the cache by the flush buffer cache. (This extreme approach would not have been necessary in earlier versions of Oracle, but the clever developers at Oracle Corp. keep adding “damage limitation” touches to the code that I have to work around to create small tests.) Because the block has been flushed from memory before the commit the session will record a “commit cleanout failures: block lost” on each commit. By the time this loop has run to completion there will be 800 blocks from the table on disc needing a “delayed block cleanout”.
Despite the extreme brute force I use in this loop, there is a further very important detail that has to be set before this test will work (at least in 22.214.171.124, which is what I’ve used in my test runs). I had to start the database with the hidden parameter _db_cache_pre_warm set to false. If I don’t have the database started with this feature disabled Oracle would notice that the buffer cache had a lot of empty space and would “pre-warm” the cache by loading a few thousand blocks from t1 as I updated one row – with the side effect that the update from the previous cycle of the loop would be cleaned out on the current cycle of the loop. If you do run this experiment, remember to reset the parameter and restart the instance when you’ve finished.
I’ve finished this chunk of code with a call to “set transaction read only” – this emulates the start of a long-running query: it captures a point in time (through the current SCN) and any queries that run in the session from now on have to be read-consistent with that point in time. After doing this I need to use a second session to do a bit of hard work – in my case the following:
execute snap_rollstats.start_snap begin for i in 1..10000 loop update t2 set v1 = upper(v1); update t2 set v1 = lower(v1); commit; end loop; end; / execute snap_rollstats.end_snap
The calls to the snap_rollstats package simply read v$rollstat and give me a report of the changes in the undo segment statistics over the period of the loop. I’ve executed 10,000 transactions in the interval, which was sufficient on my system to use each undo segment header at least 1,000 times and (since there are 34 transaction table slots in each undo segment header) overwrite each transaction table slot about 30 times. You can infer from these comments that I have only 10 undo segments active at the time, your system may have many more (check the number of rows in v$rollstat) so you may want to scale up that 10,000 loop count accordingly.
At this point, then, the only thing I’ve done since the start of my “long running query” is to update another table from another session. What happens when I do a simple count() from t1 that requires a full tablescan ?
alter system flush buffer_cache; execute snap_filestat.start_snap execute snap_my_stats.start_snap select count(v1) from t1; execute snap_my_stats.end_snap execute snap_filestat.end_snap
I’ve flushed the buffer cache to get rid of any buffered undo blocks – again an unreasonable thing to do in production but a valid way of emulating the aging out of undo blocks that would take place in a production system – and surrounded my count() with a couple of packaged call to report the session stats and file I/O stats due to my query. (If you’re sharing your database then the file I/O stats will be affected by the activity of other users, of course, but in my case I had a private database.)
Here are the file stats:
-------------- Datafile Stats -------------- file# Reads Blocks Avg Size Avg Csecs S_Reads Avg Csecs M_Reads Avg Csecs Max Writes Blocks Avg Csecs Max File name ----- ----- ------ -------- --------- ------- --------- ------- --------- --- ------ ------ --------- --- 1 17 17 1.000 .065 17 .065 0 .000 6 0 0 .000 15 /u01/app/oracle/oradata/TEST/datafile/o1_mf_system_938s4mr3_.dbf 3 665 665 1.000 .020 665 .020 0 .000 6 0 0 .000 15 /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf 5 631 80,002 126.786 .000 2 .045 629 .000 6 0 0 .000 17 /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__cz1w7tz1_.dbf
As expected I’ve done a number of multiblock reads of my data tablespace for a total of roughly 80,000 blocks read. What you may not have expected is that I’ve done 665 single block reads of the undo tablespace.
What have I been doing with all those undo blocks ? Check the session stats:
Session stats ------------- Name Value ---- ----- transaction tables consistent reads - undo records applied 10,014 transaction tables consistent read rollbacks 10
We’ve been reading undo blocks so that we can create read-consistent copies of the 10 undo segment headers that were active in my instance. We haven’t (and you’ll have to trust me on this, I can’t show you the stats that aren’t there!) reported any “data blocks consistent reads – undo records applied”.
If you want to see a detailed explanation of what has happened you’ll need to read Oracle Core (UK source), chapter 3 (and possibly chapter 2 to warm yourself up for the topic). In outline the following type of thing happens:
- Oracle gets to the first block updated in t1 and sees that there’s an ITL (interested transaction list) entry that hasn’t been marked as committed (we flushed the block from memory before the commit cleanout could take place so the relevant transaction is, apparently, still running and the row is still marked as locked).
- Let’s say the ITL entry says the transaction was for undo segment 34, transaction table slot 11, sequence 999. Oracle reads the undo segment header block for undo segment 34 and checks transaction table slot 11, which is now at sequence 1032. Oracle can infer from this that the transaction that updated the table has committed – but can’t yet know whether it committed before or after the start of our “long running query”.
- Somehow Oracle has to get slot 11 back to sequence 999 so that it can check the commit SCN recorded in the slot at that sequence number. This is where we see “undo records applied” to make the “transaction table read consistent”. It can do this because the undo segment header has a “transaction control” section in it that records some details of the most recent transaction started in that segment. When a transaction starts it updates this information, but saves the old version of the transaction control and the previous version of its transaction table slot in its first undo record, consequently Oracle can clone the undo segment header block, identify the most recent transaction, find its first undo record and apply it to unwind the transaction table information. As it does so it has also wound the transaction control section backwards one step, so it can use that (older) version to go back another step … and so on, until it takes the cloned undo segment header so far back that it takes our transaction table slot back to sequence 999 – and the job is done, we can now check the actual commit SCN. (Or, if we’re unlucky, we might receive an ORA-01555 before we get there)
So – no changes to the t1 table during the query, but lots of undo records read because OTHER tables have been changing.
In my example the tablescan used direct path reads – so the blocks that went through delayed block cleanout were in private memory, which means they weren’t in the buffer cache and didn’t get written out to disc. When I flushed the buffer cache (again to emulate aging our of undo blocks etc.) and repeated the tablescan Oracle had to go through all that work of creating read consistent transaction tables all over again.