With 12c and later, Oracle has introduced online statistics gathering for bulk operations like CTAS and and INSERT operation (with APPEND).
There is a new hidden parameter which controls this behaviour:
"_optimizer_gather_stats_on_load"
By default, this parameter is true.
Effect On Direct Load Operations
When you create a table with CTAS, its statistics will be collected automatically.
SQL> explain plan for
2 create table my_emp as select * from hr.employees;
Explained.
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY('plan_table', null, 'basic'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3521905591
------------------------------------------------------
| Id | Operation | Name |
------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | |
| 1 | LOAD AS SELECT | MY_EMP |
| 2 | OPTIMIZER STATISTICS GATHERING | |
| 3 | TABLE ACCESS FULL | EMPLOYEES |
------------------------------------------------------
10 rows selected.
As you see in the explain plan, there is an extra step for statistics.
Lets create the table and check weather the table stats exist:
18:04:25 SQL> create table my_emp as select * from hr.employees;
Table created.
18:04:38 SQL>
18:04:42 SQL> column table_name format a30
18:05:02 SQL> column last_analyzed format a30
18:05:10 SQL>
18:06:23 SQL> select table_name, last_analyzed
18:06:38 2 from user_tables
18:06:43 3 where table_name='MY_EMP';
TABLE_NAME LAST_ANALYZED
------------------------------ ------------------------------
MY_EMP 27.06.2019 18:04:38
The "last_analyzed" time is exactly same with the creation time of the table.
Now, disable the online stats gathering on session level and create another table;
18:55:23 SQL> ALTER SESSION SET "_OPTIMIZER_GATHER_STATS_ON_LOAD"=FALSE;
Session altered.
18:55:54 SQL>
18:55:57 SQL> create table my_emp2 as select * from employees;
Table created.
18:56:13 SQL> column table_name format a30
18:56:26 SQL> column last_analyzed format a30
18:56:31 SQL>
18:56:32 SQL> select table_name, last_analyzed
18:56:39 2 from user_tables
18:56:44 3 where table_name='MY_EMP2';
TABLE_NAME LAST_ANALYZED
------------------------------ ------------------------------
MY_EMP2
There is no statistics for MY_EMP2 table.
It is also possible online stats gathering can impact your bulk load performance;
I have a filled table "TEST_DATA" which has 500.000 rows and empty tables TEST_DATA1 and TEST_DATA2 with same structure.
Online Statistics Gathering is Off;
SQL> alter system flush buffer_cache;
SQL> ALTER SESSION SET "_OPTIMIZER_GATHER_STATS_ON_LOAD"=FALSE;
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> insert /*+ APPEND*/ into test_data1 select * from test_data;
5000000 rows created.
Elapsed: 00:00:08.72
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID d10rkrsh5kqgg, child number 0
-------------------------------------
insert /*+ APPEND*/ into test_data1 select * from test_data
Plan hash value: 3839850929
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 5256 (100)| |
| 1 | LOAD AS SELECT | TEST_DATA1 | | | | |
| 2 | TABLE ACCESS FULL| TEST_DATA | 5679K| 157M| 5256 (1)| 00:00:01 |
---------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Online Statistics Gathering is On;
SQL> alter system flush buffer_cache;
SQL> ALTER SESSION SET "_OPTIMIZER_GATHER_STATS_ON_LOAD"=TRUE;
Session altered.
SQL> insert /*+ APPEND*/ into test_data2 select * from test_data;
5000000 rows created.
Elapsed: 00:00:13.27
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID 1n4c3095axqa0, child number 0
-------------------------------------
insert /*+ APPEND*/ into test_data2 select * from test_data
Plan hash value: 3839850929
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 5256 (100)| |
| 1 | LOAD AS SELECT | TEST_DATA2 | | | | |
| 2 | OPTIMIZER STATISTICS GATHERING | | 5679K| 157M| 5256 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST_DATA | 5679K| 157M| 5256 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
It spent almost 5 seconds to gather statistics after insert operation.
So, we can say that if we have huge insert operation and if we want to speed it up, it would be good to disable online statistics and gather manually (maybe parallel) after insert finihed.