How to approach an ETL mission?

后端 未结 5 1648
无人及你
无人及你 2020-12-12 21:35

I am supposed to perform ETL where source is a large and badly designed sql 2k database and a a better designed sql 2k5 database. I think SSIS is the way to go. Can anyone s

5条回答
  •  无人及你
    2020-12-12 22:01

    I have experience with ETL processes pulling data from 200+ distributed databases to a central database on a daily, weekly, monthly and yearly basis. It is a massive amount of data and there are many issues we have had specific to our situation. But as I see it, there are several items to think about regardless of the situation:

    • Make sure that you take file locks into consideration, both on the source and destination side. Making sure that other processes do not have the files locked (and removing those locks if necessary and it makes sense) is important.

    • locking the files for yourself. Make sure, especially on the source that you lock the files while pulling out the data so that you do not get halfway updated data.

    • if at all possible, pull deltas, not all of the data. Get a copy of the data and then pull only rows that have changed instead of everything. The larger your data set the more important this becomes. Look at journals and triggers if you have to, but as it becomes more important to have this data on a certain basis, this is probably the number one advice I would give you. Even if it adds a significant amount of time to the project.

    • execution log. make sure you know when it worked and when it didn't, and throwing specific errors in the process can really help in debugging.

    • document, document, document. If you build this right, you are going to build it and then not think about it for a long time. But you can be guaranteed, you or someone else will need to come back to it at some point to enhance it or do a bug fix. Documentation is key in these situations.

    HTH, ill update this if I think of anything else.

提交回复
热议问题