How can I delete the columns in DataFlow Task in SSIS?

走远了吗. Submitted on 2019-12-04 11:43:08

You can add an extra component of some sort. However, this will never reduce complexity or improve performance. Just thinking about it, logically, you are adding an additional interface that needs to be maintained. Performance-wise, anything that will eliminate columns means copying one set of rows from one buffer to a whole other buffer. This is called an asynchronous transformation, and it is better described here and here. You can imagine that copying rows is less efficient than updating them in place.
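To make the in-place versus copy distinction concrete, here is a minimal conceptual sketch. It is not SSIS code: plain Python lists stand in for pipeline buffers, and the function names and sample rows are hypothetical, chosen only for illustration.

```python
# Conceptual model only: plain Python lists stand in for SSIS pipeline buffers.

def uppercase_name_in_place(buffer):
    """Synchronous-style step: rows are modified in the buffer they arrived in."""
    for row in buffer:
        row["name"] = row["name"].upper()
    return buffer  # same buffer object, no rows copied


def drop_columns_by_copy(buffer, keep):
    """Asynchronous-style step: every row is copied into a new, narrower buffer."""
    return [{col: row[col] for col in keep} for row in buffer]


# Hypothetical sample rows with a wide "notes" column that is not needed downstream.
rows = [
    {"id": 1, "name": "ann", "notes": "x" * 100},
    {"id": 2, "name": "bob", "notes": "y" * 100},
]

same = uppercase_name_in_place(rows)
narrow = drop_columns_by_copy(rows, keep=["id", "name"])
```

The first function hands back the buffer it was given, while the second has to build a second set of rows before the unwanted column disappears. That extra copy is the cost described above.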

Here are some recommendations for reducing complexity, which will, in turn, improve performance:

  • Reduce the columns at the source. If you are selecting columns that are not subsequently used in any way, then remove them from the query or uncheck them from the source component. Removing columns in this way removes them from the buffer, which will occupy less memory.
  • Reduce the number of components in the dataflow. Very long dataflows are easy to create, a pain to test, and even harder to maintain. A dataflow expects a unit of work, i.e. a data stream from here to there with a few things in the middle. This is where dataflows shine; in fact, they protect themselves from complexity with memory limitations and a maximum number of threads. It is better to divide the work into separate dataflows or stored procedures. For example, you could stage the data into a table and read it twice rather than use a Multicast.
  • Use the database. SSIS is as much an orchestration tool as it is a data-moving tool. I have often found that using simple dataflows to stage the data, followed by calls to stored procedures to process it, outperforms an all-in-one dataflow.
  • Increase the number of times you write the data. This is completely counterintuitive, but if you process data in smaller sets of operations, it runs faster and is easier to test. Given a clean slate, I will often design an ETL to write data from the source to a staging table, perform a cleansing step from that staging table to another, optionally add a conforming step to combine data from different sources into yet another table, and finally run a last step to load a target table. Note that each source is pushed to its own target table and later combined, leveraging the database. The first and last steps are set up to run fast and avoid locking or blocking at either end.
  • Bulk load. The prior step really does well when you ensure that bulk loading is happening. This can be tricky, but generally you can get there by using "fast load" in the OLE DB Destination and by never using the OLE DB Command. Removing indexes and re-adding them is faster than loading in place (with few exceptions).

These guidelines will get you headed in the general direction, but do post more questions for tuning specific performance problems.
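The stage, cleanse, and load pattern recommended above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real database server, and every table and column name (`source_orders`, `stage_orders`, `clean_orders`, `customer_totals`) is hypothetical. In SSIS, the first step would be a simple dataflow and the later steps would typically be stored procedure calls.

```python
import sqlite3

# An in-memory SQLite database stands in for the real database server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table with a wide column the target never uses.
cur.execute(
    "CREATE TABLE source_orders (id INTEGER, customer TEXT, amount REAL, notes TEXT)"
)
cur.executemany(
    "INSERT INTO source_orders VALUES (?, ?, ?, ?)",
    [(1, " ann ", 10.0, "n/a"), (2, "BOB", 25.5, "n/a"), (3, "ann", 4.5, "n/a")],
)

# Step 1: stage only the columns that are used later (notes is never selected).
cur.execute(
    "CREATE TABLE stage_orders AS SELECT id, customer, amount FROM source_orders"
)

# Step 2: cleanse from the staging table into a second table, inside the database.
cur.execute(
    "CREATE TABLE clean_orders AS "
    "SELECT id, lower(trim(customer)) AS customer, amount FROM stage_orders"
)

# Step 3: load the target table with one set-based statement.
cur.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")
cur.execute(
    "INSERT INTO customer_totals "
    "SELECT customer, SUM(amount) FROM clean_orders GROUP BY customer"
)
conn.commit()

totals = cur.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

Each step is small enough to test on its own by querying the intermediate table, and the unused column never leaves the source.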

I believe that you can pass just one data flow path to a UNION ALL task to remove columns from that single data flow.

Take the single data flow path that you would like to remove columns from and pass it to a Union All task. Then open up the Union All task, right-click the column(s) you would like to remove from that path, and select Delete.

Usually I think the source of the data should be altered so that it does not send out the unwanted columns, but your case is special: one path out of the Multicast needs all of the columns from the source, while the other does not.

Hadi

First of all, I don't think that what you are asking for will give better performance, because the data is loaded from the source, then multiplied by the Multicast, and only then reaches the component that reduces the number of columns...

You can do this in multiple ways:

  1. If you can create another Data Flow Task with a reduced-column source (e.g., an OLE DB source with a SQL command that selects only specific columns), that is better.

  2. You can add a Script Component with an asynchronous output, add the specified columns to the output, and map them using a VB.NET or C# script, something like this:

    ' Add a new row to the asynchronous output, then copy over only the needed column
    Output0Buffer.AddRow()
    Output0Buffer.OutColumn = Row.inColumn
    

  3. Add a UNION ALL component and select the columns you need

Side note: It is good to test the performance of each scenario and choose the best one.
