Fastest way of doing field comparisons in the same table with large amounts of data in Oracle

I am receiving information in a CSV file from one department and need to compare it with the same information from a different department to check for discrepancies (about three quarters of a million rows with 44 columns in each row). Once the data is in a table, I have a program that takes the data and sends reports based on HQ. I feel the way I am going about this is not the most efficient. I am using Oracle for this comparison.

Here is what I have:

I have a VB.NET program that parses the data and inserts it into an extract table
I run a procedure to do a full outer join on the two tables into a new table, with the fields from one department suffixed with '_c'

I run another procedure to compare the old/new data and update 2 different tables with detail and summary information. Here is code from inside the procedure:

DECLARE 
  CURSOR Cur_Comp IS SELECT * FROM T.AEC_CIS_COMP;
BEGIN 
FOR compRow in Cur_Comp LOOP

    --If service pipe exists in CIS but not in FM and the service pipe has status of retired in CIS, ignore the variance
    If(compRow.pipe_num = '' AND cis_status_c = 'R')
        continue
    END IF

    --If there is not a summary record for this HQ in the table for this run, create one
    INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
    SELECT compRow.HQ, to_date(sysdate, 'DD/MM/YYYY') from dual WHERE NOT EXISTS
    (SELECT null FROM t.AEC_CIS_SUM WHERE HQ = compRow.HQ AND RUN_DATE = to_date(sysdate, 'DD/MM/YYYY'))

    -- Check fields and update the tables accordingly
    If (compRow.cis_loop <> compRow.cis_loop_c) Then
        --Insert information into the details table
        INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, 
                                              DateTime, Changed_Field, CIS_Value, FM_Value)
        VALUES(compRow.Fac_ID, compRow.Pipe_Num, compRow.Hq, compRow.Street_Num || ' ' || compRow.Street_Name,
               'Y', sysdate, 'Cis_Loop', compRow.cis_loop, compRow.cis_loop_c); 

        -- Update information into the summary table        
        UPDATE AEC_CIS_SUM                 
        SET cis_loop = cis_loop + 1
        WHERE Hq = compRow.Hq
          AND Run_Date = to_date(sysdate, 'DD/MM/YYYY')               
    End If;       
END LOOP;

END;

Any suggestions of an easier way of doing this rather than an if statement for all 44 columns of the table? (This is run once a week if it matters)

Update: Just to clarify, there are 88 columns of data (44 pairs to compare, with one column of each pair suffixed with _c). One table lists each differing field as its own row, so one source row can mean 30+ records written to that table. The other table keeps a tally of the number of discrepancies for each week.

First of all, I believe your task can (and actually should) be implemented with straight SQL. No fancy cursors, no loops, just selects, inserts and updates. I would start by unpivoting your source data (it is not clear whether you have a primary key to join the two sets; I assume you do):

Col0_PK    Col1    Col2    Col3    Col4
----------------------------------------
Row1_val   A       B       C       D
Row2_val   E       F       G       H

Above is your source data. Using the UNPIVOT clause we convert it to:

Col0_PK     Col_Name    Col_Value
------------------------------
Row1_val    Col1        A
Row1_val    Col2        B
Row1_val    Col3        C
Row1_val    Col4        D
Row2_val    Col1        E
Row2_val    Col2        F
Row2_val    Col3        G
Row2_val    Col4        H

I think you get the idea. Say we have table1 with one set of data and an identically structured table2 with the second set. It is a good idea to use index-organized tables here.
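As a minimal sketch of the unpivot step, assuming Oracle 11g or later and a hypothetical source table named source_data with the columns from the example above, the query could look like the following. UNPIVOT requires all listed columns to share one datatype, so columns of mixed types would first have to be converted (for example with TO_CHAR) in an inline view. INCLUDE NULLS keeps rows whose value is NULL, which UNPIVOT drops by default.

```sql
-- source_data is a hypothetical name for the table shown above
SELECT col0_pk, col_name, col_value
FROM   source_data
UNPIVOT INCLUDE NULLS (
         col_value                -- new column holding the cell value
         FOR col_name             -- new column holding the source column label
         IN (col1 AS 'Col1',
             col2 AS 'Col2',
             col3 AS 'Col3',
             col4 AS 'Col4')
       );
```

Running the same query against each department's data would produce the contents of table1 and table2.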

The next step is to compare the rows with each other and store the difference details. Something like:

insert into diff_details(some_service_info_columns_here)
 select some_service_info_columns_here_along_with_data_difference
  from table1 t1 inner join table2 t2
     on t1.Col0_PK = t2.Col0_PK
    and t1.Col_name = t2.Col_name
    and nvl(t1.Col_value, 'Dummy1') <> nvl(t2.Col_value, 'Dummy2');
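A more concrete version of this comparison might look like the sketch below. The diff_details column names (col0_pk, col_name, value1, value2) are assumptions made for illustration. One point worth noting: with two different NVL dummies, a pair where both values are NULL counts as a difference. If NULL on both sides should count as equal, DECODE is one option, because Oracle treats two NULLs as equivalent inside DECODE.

```sql
-- Hypothetical diff_details layout: one row per differing field
INSERT INTO diff_details (col0_pk, col_name, value1, value2)
SELECT t1.col0_pk, t1.col_name, t1.col_value, t2.col_value
FROM   table1 t1
JOIN   table2 t2
  ON   t1.col0_pk  = t2.col0_pk
 AND   t1.col_name = t2.col_name
-- DECODE returns 0 when the values match (including NULL vs NULL), else 1
WHERE  DECODE(t1.col_value, t2.col_value, 0, 1) = 1;
```

An inner join only reports keys present in both sets; rows missing from one side would need an outer join or a separate check.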

In the last step we fill the difference summary table:

insert into diff_summary(summary_columns_here)
 select diff_row_id, count(*) as diff_count
  from diff_details
 group by diff_row_id;

This is just a rough draft to show my approach; I'm sure there are many more details that should be taken into account. To summarize, I suggest two things:

UNPIVOT data
Use SQL statements instead of cursors

You have several issues in your code:

If(compRow.pipe_num = '' AND cis_status_c = 'R')
    continue
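END IF

"cis_status_c" is not qualified. Is it a variable or a column in AEC_CIS_COMP? If it is a column, put the condition into the cursor query instead, i.e. SELECT * FROM T.AEC_CIS_COMP WHERE NOT (pipe_num IS NULL AND cis_status_c = 'R'). Note that Oracle treats the empty string as NULL, so pipe_num = '' never evaluates to true; test with IS NULL instead. If cis_status_c itself can be NULL, wrap it in NVL so that those rows are not filtered out as well.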

to_date(sysdate, 'DD/MM/YYYY')

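That's nonsense: SYSDATE is already a DATE, so TO_DATE first converts it implicitly to a string using the session's NLS_DATE_FORMAT and then parses that string back with a different mask, which can fail or produce a wrong date. Simply use TRUNC(SYSDATE).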
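To make the difference concrete, here is a small sketch that can be run in any session. The second query spells out the implicit conversion explicitly; whether it errors or returns an unexpected date depends on the session's NLS_DATE_FORMAT, so no particular result is claimed here.

```sql
-- Midnight of the current day, independent of session settings
SELECT TRUNC(SYSDATE) AS run_date FROM dual;

-- Roughly what TO_DATE(SYSDATE, 'DD/MM/YYYY') does behind the scenes:
-- TO_CHAR with the session's NLS_DATE_FORMAT, then a parse with another mask
SELECT TO_DATE(TO_CHAR(SYSDATE), 'DD/MM/YYYY') AS risky_run_date FROM dual;
```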

Anyway, I think you can use three single statements instead of a cursor:

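-- One summary row per HQ for this run (DISTINCT avoids one row per compared record)
INSERT INTO T.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT DISTINCT comp.HQ, TRUNC(SYSDATE)
FROM T.AEC_CIS_COMP comp
WHERE NOT EXISTS
    (SELECT NULL FROM T.AEC_CIS_SUM s WHERE s.HQ = comp.HQ AND s.RUN_DATE = TRUNC(SYSDATE));

-- One detail row per record whose cis_loop values differ
INSERT INTO T.AEC_CIS_DET (Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, DateTime, Changed_Field, CIS_Value, FM_Value)
SELECT comp.Fac_ID, comp.Pipe_Num, comp.Hq, comp.Street_Num || ' ' || comp.Street_Name, 'Y', SYSDATE, 'Cis_Loop', comp.cis_loop, comp.cis_loop_c
FROM T.AEC_CIS_COMP comp
WHERE comp.cis_loop <> comp.cis_loop_c;

-- Add the number of differing records per HQ to the summary counter
-- (NVL guards against a NULL counter on freshly inserted summary rows)
UPDATE T.AEC_CIS_SUM s
SET s.cis_loop = NVL(s.cis_loop, 0) +
    (SELECT COUNT(*) FROM T.AEC_CIS_COMP comp
     WHERE comp.Hq = s.Hq AND comp.cis_loop <> comp.cis_loop_c)
WHERE s.Hq IN (SELECT Hq FROM T.AEC_CIS_COMP)
  AND s.Run_Date = TRUNC(SYSDATE);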

They are not tested but they should give you a hint how to do it.
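To avoid repeating the detail insert 44 times, one possible generalization is Oracle's multi-column UNPIVOT (11g or later), which turns each pair of columns into its own row. This is only a sketch: cis_status is an assumed column name (only cis_status_c appears in the question), the remaining pairs would have to be listed the same way, and all columns in each position of a pair must share one datatype.

```sql
INSERT INTO T.AEC_CIS_DET (Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl,
                           DateTime, Changed_Field, CIS_Value, FM_Value)
SELECT u.Fac_ID, u.Pipe_Num, u.Hq, u.Street_Num || ' ' || u.Street_Name,
       'Y', SYSDATE, u.changed_field, u.cis_value, u.fm_value
FROM   T.AEC_CIS_COMP
UNPIVOT INCLUDE NULLS (
         (cis_value, fm_value)          -- one output row per column pair
         FOR changed_field IN (
             (cis_loop,   cis_loop_c)   AS 'Cis_Loop',
             (cis_status, cis_status_c) AS 'Cis_Status'  -- cis_status is assumed
             -- ... remaining column pairs go here ...
         )
       ) u
-- DECODE treats NULL vs NULL as equal, so only real differences are kept
WHERE  DECODE(u.cis_value, u.fm_value, 0, 1) = 1;
```

With the details stored one row per differing field, the per-HQ counts for the summary table could then be derived from T.AEC_CIS_DET with a GROUP BY on Hq and Changed_Field.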

Source: https://stackoverflow.com/questions/20788699/fastest-way-of-doing-field-comparisons-in-the-same-table-with-large-amounts-of-d
