I am not a database person, exactly, and most of my db work has been with MySQL, so forgive me if something in this question is incredibly naive.
I need to delete 5.5 million rows from an Oracle table that has about 100 million rows. I have all the IDs of the rows I need to delete in a temporary table. If it were just a few thousand rows, I'd do this:
DELETE FROM table_name WHERE id IN (SELECT id FROM temp_table); COMMIT;
Is there anything I need to be aware of, and/or do differently, because it's 5.5 million rows? I thought about doing a loop, something like this:
DECLARE
  vCT NUMBER(38) := 0;
BEGIN
  FOR t IN (SELECT id FROM temp_table) LOOP
    DELETE FROM table_name WHERE id = t.id;
    vCT := vCT + 1;
    IF MOD(vCT, 200000) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;
First of all - is this doing what I think it is, batching commits of 200,000 at a time? Assuming it is, I'm still not sure whether it's better to generate 5.5 million SQL statements and commit in batches of 200,000, or to have one SQL statement and commit all at once.
Ideas? Best practices?
EDIT: I ran the first option, the single delete statement, and it only took 2 hours to complete in development. Based on that, it's queued to be run in production. Thanks for your input!
The first approach is better, because you give the query optimizer a clear picture of what you are trying to do, instead of trying to hide it. The database engine might take a different approach to deleting 5.5m (or 5.5% of the table) internally than to deleting 200k (or 0.2%).
There is also an article about massive DELETE in Oracle which you might want to read.
The fastest way is to create a new table with CREATE TABLE ... AS SELECT using the NOLOGGING option. I mean:

ALTER TABLE table_to_delete RENAME TO tmp;
CREATE TABLE table_to_delete NOLOGGING AS SELECT .... ;
Of course you have to recreate constraints (with NOVALIDATE), indexes (with NOLOGGING), grants, and so on, but it is very, very fast.
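As an illustration, the recreation steps might look like the sketch below. The index, constraint, grantee, and column names are hypothetical; adapt them to the real table:

```sql
-- Recreate the index without redo generation
CREATE INDEX table_to_delete_ix1 ON table_to_delete (id) NOLOGGING;

-- Re-add the primary key without re-checking existing rows
ALTER TABLE table_to_delete
  ADD CONSTRAINT table_to_delete_pk PRIMARY KEY (id) ENABLE NOVALIDATE;

-- Restore grants
GRANT SELECT, INSERT, UPDATE, DELETE ON table_to_delete TO app_user;

-- Switch logging back on once the rebuild is done
ALTER TABLE table_to_delete LOGGING;
```

ENABLE NOVALIDATE enforces the constraint for new DML without validating the rows that were just copied, which is what makes this step fast.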
If you have to do this in production, you can do the following:

ALTER TABLE table_to_delete RENAME TO tmp;
CREATE VIEW table_to_delete AS SELECT * FROM tmp;
-- The two statements above are instant; the table stays visible meanwhile
CREATE TABLE new_table NOLOGGING AS SELECT .... FROM tmp WHERE ...;
<create indexes with nologging>
<create constraints with novalidate>
<create other things...>
-- From here ...
DROP VIEW table_to_delete;
ALTER TABLE new_table RENAME TO table_to_delete;
-- ... to here, also instant
You have to take care of:

- Stored procedures can be invalidated, but they will be recompiled the second time they are called. You have to test this.
- NOLOGGING means that minimal redo is generated. If you have the DBA role, run ALTER SYSTEM CHECKPOINT to ensure no data is lost if the instance crashes.
- For NOLOGGING to take effect, the tablespace also has to be in NOLOGGING mode.
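To check the tablespace's logging mode (and change it if needed) - assuming a tablespace named USERS and access to the DBA views:

```sql
-- LOGGING column shows the tablespace-level default
SELECT tablespace_name, logging
FROM dba_tablespaces
WHERE tablespace_name = 'USERS';

-- Switch the default off for the duration of the rebuild
ALTER TABLESPACE users NOLOGGING;
```

Remember to switch it back afterwards if the normal mode is LOGGING.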
Another option, better than generating millions of individual DELETE statements, is:

-- Create a table with the ids, then run this 50 times ;-)
DELETE FROM table_to_delete
WHERE ID IN (SELECT ID FROM table_with_ids WHERE ROWNUM < 100000);
DELETE FROM table_with_ids WHERE ROWNUM < 100000;
COMMIT;
The PL/SQL choice is not advisable because it can raise the "snapshot too old" error: you are committing (and closing the transaction) while the looped cursor is still open and you want to keep using it. Oracle allows this, but it's not good practice.
UPDATE: Why can I be sure the last block is going to work? Because I suppose that:

- No one else is using this temporary table for any reason (DBA jobs gathering statistics, DBA tasks like moves, inserting records, and so on). That can be ensured because it is an auxiliary table used only for this.
- Given that assumption, the query will be executed with exactly the same plan and will return the rows in the same order.
When performing massive deletions in Oracle, make sure you are not running out of UNDO space.

When performing DML, Oracle first writes all changes into the REDO log (the old data along with the new data). When the REDO log is filled or a timeout occurs, Oracle performs log synchronization: it writes new data into the datafiles (in your case, marks the datafile blocks as free), and writes old data into the UNDO tablespace (so that it remains visible to concurrent transactions until you commit your changes).

When you commit your changes, the space in the UNDO segments occupied by your transaction is freed.

This means that if you delete 5M rows of data, you'll need to have space for all these rows in your UNDO segments so that the data can be moved there first (all at once) and deleted only after commit.
This also means that concurrent queries (if any) will need to read from the REDO logs or UNDO segments when performing table scans. This is not the fastest way to access data.

This also means that if the optimizer selects a HASH JOIN for your deletion query (which it most probably will), and the temp table does not fit into the HASH_AREA_SIZE (which most probably will be the case), then the query will need several scans over the big table, while some parts of the table will already have been moved into UNDO.
Given all of the above, you'd probably be better off deleting data in chunks of 200,000 and committing the changes in between.

Thus you will, first, get rid of the problems described above and, second, optimize your HASH JOIN, as you will have the same number of reads but the reads themselves will be more efficient.
In your case, though, I would try to force the optimizer to use NESTED LOOPS, as I expect it will be faster here. To do this, make sure your temp table has a primary key on ID, and rewrite your query as follows:
DELETE FROM (
  SELECT /*+ USE_NL(tt, tn) */ tn.id
  FROM temp_table tt, table_name tn
  WHERE tn.id = tt.id
)
You'll need to have the primary key on temp_table for this query to work.
Compare it with the following:

DELETE FROM (
  SELECT /*+ USE_HASH(tn tt) */ tn.id
  FROM temp_table tt, table_name tn
  WHERE tn.id = tt.id
)

, see which is faster, and stick with that.
It's better to do everything at once as in your first example. But I'd definitely go over it with your DBA first since they may want to reclaim the blocks you are no longer using after the purge. Also, there may be scheduling concerns that are not normally visible from the user perspective.
I would recommend running this as a single delete.
Are there any child tables of the one you are deleting from? If so, make sure the foreign key in those tables is indexed. Otherwise, you might do a full scan of the child table for every row you delete, which could make things very slow.
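A rough way to spot unindexed foreign keys pointing at the parent table is to compare FK columns against indexed columns in the data dictionary. This is a simplified sketch: it only matches single columns by position, so verify multi-column keys by hand, and replace 'TABLE_NAME' with the real parent table name (in upper case):

```sql
-- FK columns referencing TABLE_NAME that have no matching index column
SELECT c.table_name, cc.column_name
FROM user_constraints c
JOIN user_cons_columns cc ON cc.constraint_name = c.constraint_name
WHERE c.constraint_type = 'R'
  AND c.r_constraint_name IN (
        SELECT constraint_name
        FROM user_constraints
        WHERE table_name = 'TABLE_NAME'
          AND constraint_type IN ('P', 'U'))
  AND NOT EXISTS (
        SELECT 1
        FROM user_ind_columns ic
        WHERE ic.table_name = cc.table_name
          AND ic.column_name = cc.column_name
          AND ic.column_position = cc.position);
```

Any rows returned are candidate child tables where a delete on the parent could trigger repeated full scans.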
You might want some ways to check the progress of the delete as it runs. See http://stackoverflow.com/questions/622289/how-to-check-oracle-database-for-long-running-queries
As other people have suggested, if you want to test the water, you can add AND ROWNUM < 10000 to the end of your query.
If your original SQL takes a very long time, some concurrent SQLs may run slowly as they have to use UNDO to rebuild a version of the data without your uncommitted changes.
A compromise may be something like:

FOR i IN 1..100 LOOP
  DELETE FROM table_name
  WHERE id IN (SELECT id FROM temp_table)
  AND ROWNUM < 100000;
  EXIT WHEN SQL%ROWCOUNT = 0;
  COMMIT;
END LOOP;
You can adjust the ROWNUM limit as required. A smaller value means more frequent commits and (probably) reduced impact on other sessions in terms of needing to apply undo. However, depending on execution plans, there may be other impacts, and it will probably take more time overall. Technically the FOR part of the loop is unnecessary, as the EXIT will end the loop; but I'm paranoid about unbounded loops, as it is a pain to kill the session if one does get stuck.
I've done something similar in the past with Oracle 7, where I had to delete millions of rows from thousands of tables. For all-round performance, and especially for the large deletes (a million-plus rows in one table), this script worked well.
You'll have to modify it slightly (i.e. check the users/passwords, and get your rollback segments right). Also, you really need to discuss this with your DBA and run it in a TEST environment first. Having said all of that, it's pretty easy. The procedure delete_sql() looks up a batch of rowids in the table you specify, then deletes them batch by batch. For example:
exec delete_sql('MSF710', 'select rowid from msf710 s where (s.equip_no, s.eq_tran_date, s.comp_data, s.rec_710_type, s.seq_710_no) not in (select c.equip_no, c.eq_tran_date, c.comp_data, c.rec_710_type, c.seq_710_no from msf710_sched_comm c)', 500);
The above example deletes 500 records at a time from table MSF710, based on an SQL statement.
If you need to delete data from multiple tables, just include additional exec delete_sql(...) lines in the file delete-tables.sql.
Oh, and remember to put your rollback segments back online; that's not in the script.
spool delete-tables.log;
connect system/SYSTEM_PASSWORD
alter rollback segment r01 offline;
alter rollback segment r02 offline;
alter rollback segment r03 offline;
alter rollback segment r04 offline;

connect mims_3015/USER_PASSWORD

CREATE OR REPLACE PROCEDURE delete_sql (myTable in VARCHAR2, mySql in VARCHAR2, commit_size in number) is
  i INTEGER;
  sel_id INTEGER;
  del_id INTEGER;
  exec_sel INTEGER;
  exec_del INTEGER;
  del_rowid ROWID;
  start_date DATE;
  end_date DATE;
  s_date VARCHAR2(1000);
  e_date VARCHAR2(1000);
  tt FLOAT;
  lrc INTEGER;
BEGIN
  --dbms_output.put_line('SQL is ' || mySql);
  i := 0;
  start_date := SYSDATE;
  s_date := TO_CHAR(start_date, 'DD/MM/YY HH24:MI:SS');
  --dbms_output.put_line('Deleting ' || myTable);
  sel_id := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(sel_id, mySql, dbms_sql.v7);
  DBMS_SQL.DEFINE_COLUMN_ROWID(sel_id, 1, del_rowid);
  exec_sel := DBMS_SQL.EXECUTE(sel_id);
  del_id := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(del_id, 'delete from ' || myTable || ' where rowid = :del_rowid', dbms_sql.v7);
  LOOP
    IF DBMS_SQL.FETCH_ROWS(sel_id) > 0 THEN
      DBMS_SQL.COLUMN_VALUE(sel_id, 1, del_rowid);
      -- you need to get the last_row_count earlier as it changes
      lrc := dbms_sql.last_row_count;
      DBMS_SQL.BIND_VARIABLE(del_id, 'del_rowid', del_rowid);
      exec_del := DBMS_SQL.EXECUTE(del_id);
      IF MOD(lrc, commit_size) = 0 THEN
        i := i + 1;
        --dbms_output.put_line(myTable || ' Commiting Delete no ' || i || ', Rowcount : ' || lrc);
        COMMIT;
      END IF;
    ELSE
      EXIT;
    END IF;
  END LOOP;
  i := i + 1;
  --dbms_output.put_line(myTable || ' Final Commiting Delete no ' || i || ', Rowcount : ' || dbms_sql.last_row_count);
  COMMIT;
  DBMS_SQL.CLOSE_CURSOR(sel_id);
  DBMS_SQL.CLOSE_CURSOR(del_id);
  end_date := SYSDATE;
  e_date := TO_CHAR(end_date, 'DD/MM/YY HH24:MI:SS');
  tt := trunc((end_date - start_date) * 24 * 60 * 60, 2);
  dbms_output.put_line('Deleted ' || myTable || ' Time taken is ' || tt || 's from ' || s_date || ' to ' || e_date || ' in ' || i || ' deletes and Rows = ' || dbms_sql.last_row_count);
END;
/

CREATE OR REPLACE PROCEDURE delete_test (myTable in VARCHAR2, mySql in VARCHAR2, commit_size in number) is
  i INTEGER;
  start_date DATE;
  end_date DATE;
  s_date VARCHAR2(1000);
  e_date VARCHAR2(1000);
  tt FLOAT;
BEGIN
  start_date := SYSDATE;
  s_date := TO_CHAR(start_date, 'DD/MM/YY HH24:MI:SS');
  i := 0;
  i := i + 1;
  dbms_output.put_line(i || ' SQL is ' || mySql);
  end_date := SYSDATE;
  e_date := TO_CHAR(end_date, 'DD/MM/YY HH24:MI:SS');
  tt := round((end_date - start_date) * 24 * 60 * 60, 2);
  dbms_output.put_line(i || ' Time taken is ' || tt || 's from ' || s_date || ' to ' || e_date);
END;
/

show errors procedure delete_sql
show errors procedure delete_test

SET SERVEROUTPUT ON FORMAT WRAP SIZE 200000;

exec delete_sql('MSF710', 'select rowid from msf710 s where (s.equip_no, s.eq_tran_date, s.comp_data, s.rec_710_type, s.seq_710_no) not in (select c.equip_no, c.eq_tran_date, c.comp_data, c.rec_710_type, c.seq_710_no from msf710_sched_comm c)', 500);

spool off;
Oh and one last tip. It's going to be slow and depending on the table may require some downtime. Testing, timing and tuning are your best friend here.
All of the answers here are great; just one thing to add: if you want to delete all of the records in a table, and are sure you won't need to roll back, then you want to use the TRUNCATE TABLE command.
(In your case, you only wanted to delete a subset, but for anybody lurking with a similar problem, I thought I'd add this)
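A minimal example - keeping in mind that TRUNCATE is DDL, so it commits implicitly and cannot be rolled back:

```sql
-- Removes all rows instantly and, by default, returns the freed
-- extents to the tablespace (equivalent to the DROP STORAGE clause)
TRUNCATE TABLE table_name;

-- Use REUSE STORAGE instead if the table will be refilled soon:
-- TRUNCATE TABLE table_name REUSE STORAGE;
```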
The easiest way, to me, is:
DECLARE
  L_exit_flag VARCHAR2(2) := 'N';
  L_row_count NUMBER := 0;
BEGIN
  :exit_code := 0;
  LOOP
    DELETE table_name
    WHERE condition(s)
    AND ROWNUM <= 200000;
    L_row_count := L_row_count + SQL%ROWCOUNT;
    IF SQL%ROWCOUNT = 0 THEN
      COMMIT;
      :exit_code := 0;
      L_exit_flag := 'Y';
    END IF;
    COMMIT;
    IF L_exit_flag = 'Y' THEN
      DBMS_OUTPUT.PUT_LINE('Finally Number of Records Deleted : ' || L_row_count);
      EXIT;
    END IF;
  END LOOP;
  --DBMS_OUTPUT.PUT_LINE('Finally Number of Records Deleted : ' || L_row_count);
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
    DBMS_OUTPUT.PUT_LINE('Error Code: ' || SQLCODE);
    DBMS_OUTPUT.PUT_LINE('Error Message: ' || SUBSTR(SQLERRM, 1, 240));
    :exit_code := 255;
END;