Extract data with an OLE DB faster

Question

Hi everyone, I'm trying to extract a lot of records from many joined tables and views using SSIS (OLE DB Source), but it takes a very long time. The problem is the query itself: when I ran it on SQL Server, it took more than an hour. Here's my SSIS package design.

I thought of parallel extraction using two OLE DB Sources and a Merge Join, but that isn't recommended, and besides, it takes even more time. Is there any way to help me, please?

Answer 1:

In my opinion, writing the T-SQL query with all the joins in the OLE DB Source will always be faster than using separate sources followed by a Merge Join. The reason is that SSIS has a memory-oriented architecture: it has to bring all the data from the N different tables into its buffers and then filter it with the Merge Join. Moreover, Merge Join is an asynchronous (semi-blocking) component, so it cannot reuse its input buffer for its output. A new buffer is created, and you may run out of memory if a large number of rows is extracted from the tables.
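To make the comparison concrete, here is a minimal sketch in Python. It is not SSIS: an in-memory SQLite database stands in for SQL Server, and the `orders` and `customers` tables are hypothetical. It contrasts a single source query that lets the database do the join with pulling both tables and merging them on the client, which is roughly what two sources plus a Merge Join amount to.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 3, 2.0);
""")

def join_in_database(conn):
    """One source query: the database engine performs the join."""
    sql = """
        SELECT o.order_id, c.name, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
    """
    return conn.execute(sql).fetchall()

def join_on_client(conn):
    """Two source queries merged client-side, like two sources plus a Merge Join."""
    # Both inputs must arrive sorted on the join key, as a merge join requires.
    customers = conn.execute(
        "SELECT customer_id, name FROM customers ORDER BY customer_id"
    ).fetchall()
    orders = conn.execute(
        "SELECT order_id, customer_id, amount FROM orders "
        "ORDER BY customer_id, order_id"
    ).fetchall()
    merged, i = [], 0
    for order_id, customer_id, amount in orders:
        # Advance the customer cursor until it reaches the order's key.
        while i < len(customers) and customers[i][0] < customer_id:
            i += 1
        if i < len(customers) and customers[i][0] == customer_id:
            merged.append((order_id, customers[i][1], amount))
    return sorted(merged)

db_rows = join_in_database(conn)
client_rows = join_on_client(conn)
```

Both functions return the same rows, but the client-side version has to fetch, sort and hold both full inputs in its own memory before it can emit anything, which is the cost described above.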

Having said that, there are a few ways you can improve extraction performance with an OLE DB Source:

1. Tune your SQL query. Avoid using SELECT *.

2. Check your network bandwidth. You simply cannot get higher throughput than your bandwidth supports.

3. All source adapters are asynchronous. The speed of an SSIS source is not about how fast your query runs; it's about how fast the data is retrieved.

As others have suggested above, you should show us the query and the time it takes to retrieve the data; otherwise these are just a few general optimization techniques that can make the extraction faster.

Answer 2:

Thank you for posting a screen shot of your data flow. I doubt whether the slowness you encounter is truly the fault of the OLE DB Source component.

Instead, you have three asynchronous components (AGG, SRT, MRJ): two that fully block your data flow and one that is partially blocking. That first aggregate will have to wait for all 500k rows to arrive before it can finish aggregating and pass the result along to the sort.

These transformations also result in fragmented memory. Normally, a memory buffer is filled with data and visits each component in a data flow. Any changes happen directly to that address space and the engine can parallelize operations if it can determine step 2 is modifying field X and step 3 is modifying Y. The async components are going to cause data to be copied from one space to another. This is a double slow down. The first is the physical act of copying data from address space 0x01 to 0xFA or something. The second is that it reduces the available amount of memory for the dtexec process. No longer can SSIS play with all N gigs of memory. Instead, you'll have quartered your memory and after each async is done, that memory partition is just left there until the data flow completes.

If you want this to run better, you'll need to fix your query. That may mean materializing your aggregated data into a staging table, or doing it all in one big honkin' query.
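As a rough illustration of those two options, the sketch below uses Python with an in-memory SQLite database standing in for SQL Server; the `sales` table, its columns and the `sales_staging` name are all hypothetical. The GROUP BY runs inside the database, so only one row per group leaves it and the data flow would need no Aggregate or Sort component.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 15), ("south", 7), ("south", 3), ("south", 5)],
)

# What the package pulls today: every detail row, aggregated later in the data flow.
detail_rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Option 1: one query that aggregates (and orders) inside the database.
aggregated_rows = conn.execute("""
    SELECT region, SUM(amount) AS total, COUNT(*) AS row_count
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

# Option 2: materialize the aggregate into a staging table first.
# (SQLite syntax; on SQL Server this would typically be SELECT ... INTO.)
conn.execute("""
    CREATE TABLE sales_staging AS
    SELECT region, SUM(amount) AS total, COUNT(*) AS row_count
    FROM sales
    GROUP BY region
""")
staged_rows = conn.execute(
    "SELECT region, total, row_count FROM sales_staging ORDER BY region"
).fetchall()
```

Here five detail rows collapse to two aggregated rows before anything reaches the client; with 500k source rows the reduction, and the memory saved in the data flow, would be correspondingly larger.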

Open a new question and provide insight into the data structures, indexes, data volumes, the query itself and preferably the query plan - estimated or actual. If you need help identifying these things, there are plenty of helpful folks here that can help you through the process.

Source: https://stackoverflow.com/questions/16335664/extract-data-with-an-ole-db-faster

Tags

ssis

oledb

extract