Question
I have a table in Oracle 11g with a size of 62 GB and 1.2 billion records. The table has 4 columns, is range partitioned by month, and hash sub-partitioned on transaction number.
Around 80 million records are deleted and re-inserted into this table once every week. This delete & insert process takes ~4 hours when done with the NOAPPEND PARALLEL hint.
Is there any way I could speed up this process?
One way I can think of is to replace NOAPPEND with APPEND, but that would lead to space wastage and a drastic increase in table size.
Answer 1:
APPEND is made exactly for this purpose. The amount of wasted space depends on extent size. Each INSERT creates one extent per parallel process, fills it, and creates a new one if needed. So with common settings, say an 8 MB extent in a partitioned table, and inserting about 4 GB (62 GB / (1.2 billion / 80 million) records), the average waste will be 4 MB * parallel degree, which I would say is decent. INSERT /*+ APPEND PARALLEL */ can be super fast, multi-million rows per second (and gigabytes per second) on decent hardware. It mostly depends on the number of indexes, because their maintenance is the most time-consuming part.
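As a rough illustration only, a direct-path parallel load might look like the sketch below; the table names txn_fact and txn_stage and the degree of 8 are made up for the example:
alter session enable parallel dml;
-- direct-path, parallel insert of the weekly batch into the big table
insert /*+ APPEND PARALLEL(txn_fact, 8) */ into txn_fact
select /*+ PARALLEL(txn_stage, 8) */ * from txn_stage;
commit; -- the table cannot be read in the same transaction until the direct-path insert is committed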
The bigger issue is the DELETE part. You should think about whether and how it can be transformed into a DDL partition operation (CTAS and EXCHANGE PARTITION, etc.).
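One possible shape of that transformation, sketched with invented names (txn_fact is the table, sp_x one hash subpartition, rows_to_delete the keys of the weekly delete candidates); since the table is composite partitioned, the easiest swap is at subpartition level, where the exchange table can be non-partitioned:
-- rebuild the surviving rows of one subpartition with a direct-path CTAS instead of deleting
create table txn_sub_stage nologging parallel 8 as
select * from txn_fact subpartition (sp_x)
where  txn_no not in (select txn_no from rows_to_delete);

-- swap the rebuilt segment into place; a dictionary operation, not row-by-row DML
alter table txn_fact
  exchange subpartition sp_x with table txn_sub_stage
  including indexes without validation;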
Answer 2:
Check which of the two processes, deleting or inserting, is slower and start from there. Deleting data can generate a lot of redo and can be slow, especially if issued in a single thread. Also, an update should in theory be faster than delete + insert.
Update speed can be heavily affected by table compression and the PCTFREE physical attribute, so check them. It is affected even more heavily by parallelism; issuing the statement in parallel might drastically improve performance.
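A minimal way to check those attributes from the data dictionary (assuming the table is called TXN_FACT; for a composite-partitioned table the values may live at subpartition level, so check user_tab_subpartitions as well):
select partition_name, compression, compress_for, pct_free
from   user_tab_partitions
where  table_name = 'TXN_FACT';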
To be able to execute parallel DML operations, in most cases you will need to enable them at the session level by issuing:
alter session enable parallel dml;
I'd suggest that you try the following:
- Check the table's compression and PCTFREE options
- Try enabling parallel DML for the session before running the statements.
- Try the UPDATE statement in parallel; again, issue the 'alter session...' command before the statement (see the sketch after this list).
- Try a MERGE statement if you have any kind of primary key columns; in some cases it can be faster than UPDATE.
- If the data which needs to be updated fills entire partitions or subpartitions, you can use TRUNCATE instead of DELETE. It is a very fast operation (DDL) which empties a specific partition or subpartition.
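Rough sketches of the parallel UPDATE, the MERGE and the partition TRUNCATE follow; all table and column names (txn_fact, txn_stage, txn_no, col1..col3, the subpartition name) are invented for the example:
alter session enable parallel dml;

-- parallel update driven by the weekly batch
update /*+ PARALLEL(t, 8) */ txn_fact t
set    (col1, col2, col3) = (select s.col1, s.col2, s.col3
                             from   txn_stage s
                             where  s.txn_no = t.txn_no)
where  exists (select 1 from txn_stage s where s.txn_no = t.txn_no);

-- merge: updates matching rows and inserts new ones in a single pass
merge /*+ PARALLEL(t, 8) */ into txn_fact t
using txn_stage s
on    (t.txn_no = s.txn_no)
when matched then update set t.col1 = s.col1, t.col2 = s.col2, t.col3 = s.col3
when not matched then insert (txn_no, col1, col2, col3)
                      values (s.txn_no, s.col1, s.col2, s.col3);

-- if a whole subpartition is being replaced, truncating it avoids the DELETE entirely
alter table txn_fact truncate subpartition sp_x;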
If all else fails, you can create a new table with the same definition, load the remaining data from the main table and the new data into it (again in parallel and with an APPEND hint), and after that drop the old table and rename the new one. Be sure to manage table grants and other privileges. Shifting 60 GB of data is a cumbersome operation and needs a lot of temporary space, but at least you will avoid the DELETE operation. On reasonable hardware it should take less than 4 hours. For the main bulk of data you can also issue a CTAS statement:
create table <new_table_name> parallel <partitioning_info> as select * from <old_table_name> where <filter>;
Beware of library cache locks when issuing long-running CTAS statements.
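A minimal sketch of the final swap, once the new table is fully loaded and its indexes and constraints have been re-created (the names are the same placeholders as above; the grant is just an example, re-issue whatever privileges the old table had):
drop table <old_table_name> purge;
alter table <new_table_name> rename to <old_table_name>;
grant select on <old_table_name> to <some_role>;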
Answer 3:
I feel that there are multiple problems in the current solution; you might want to revisit the partitioning strategy as well.
Ideally:
1. Load the new data into a new table, or it could be an external table (staging).
2. Join the new data with the existing data (the update!!).
3. Insert the result of step 2 directly as a CTAS with parallel, direct-path insert; with a little more preparation you can make it a new partitioned table as well (see the sketch after this list).
4. Once the new insert (CTAS) completes, drop the existing table and rename the new table to the old (dropped) table name.
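A minimal sketch of steps 2 and 3, with invented names (txn_fact is the current table, txn_new the staging table with the weekly batch, txn_no the key); the new version of a row wins where one exists, otherwise the existing row is kept:
-- steps 2+3 combined: build the replacement table in one direct-path, parallel CTAS
-- (a PARTITION BY clause can be added to make the new table partitioned as well;
--  the simple NVL form assumes the batch columns are not null)
create table txn_fact_new nologging parallel 8 as
select nvl(n.txn_no, f.txn_no) as txn_no,
       nvl(n.col1, f.col1)     as col1,
       nvl(n.col2, f.col2)     as col2,
       nvl(n.col3, f.col3)     as col3
from   txn_fact f
full outer join txn_new n on n.txn_no = f.txn_no;
-- step 4: drop txn_fact and rename txn_fact_new to txn_fact, then re-create indexes and grants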
The only issue with this approach is that it requires space. In my experience, this is the best workaround for large, costly update problems in data warehouses. I have tried and tested this in Oracle RAC/Exadata environments.
Answer 4:
When working on such large tables you should already be batching your data so that a single batch touches only one partition. Also consider whether there is any way to avoid physical deletes, either by updating the existing records or by logically setting a delete flag against deleted records and inserting the new records.
The other way of optimizing the whole load process in this case would be:
- Create a separate staging table with the same structure, for the partition on which your current batch is performing the load.
- Load into it the existing data of that partition which is still valid (i.e. not the delete-candidate records), and also insert the new data that needs to go into the destination table.
- Once the staging table has all the data which needs to go into the existing table, switch the partition from the staging table to the destination table, in a similar fashion to the one described on this page (see the sketch below).
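A minimal sketch of that flow for one monthly partition, with invented names (txn_fact is the destination, p_x the partition being loaded, batch_deletes/batch_inserts the weekly delete keys and new rows); because txn_fact has hash subpartitions, the staging table is assumed to be hash-partitioned on the same key with the same number of subpartitions so that EXCHANGE PARTITION is allowed:
-- empty staging table with a matching shape (16 hash partitions assumed here)
create table txn_stage_p nologging
  partition by hash (txn_no) partitions 16
as
select * from txn_fact partition (p_x) where 1 = 0;

alter session enable parallel dml;

-- keep the surviving rows of the partition...
insert /*+ APPEND PARALLEL(txn_stage_p, 8) */ into txn_stage_p
select * from txn_fact partition (p_x)
where  txn_no not in (select txn_no from batch_deletes);
commit; -- a direct-path load must be committed before the next DML on the same table

-- ...and add the new batch
insert /*+ APPEND PARALLEL(txn_stage_p, 8) */ into txn_stage_p
select * from batch_inserts;
commit;

-- switch the fully loaded stage into the destination in one dictionary operation
alter table txn_fact
  exchange partition p_x with table txn_stage_p
  including indexes without validation;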
Source: https://stackoverflow.com/questions/32229729/faster-way-to-load-huge-data-warehouse-table