Performance optimization for processing 115 million records and inserting them into Oracle

Posted by 旧时模样 on 2019-12-01 13:11:05

In a project I worked on, we had to transfer 5 billion records from DB2 to Oracle, with quite complex transformation logic. During the transformation, the data was saved about four times in different files. We were able to insert data into an Oracle DB at roughly 50,000 records per second. From that point of view, doing it in under 4 hours seems realistic.

You didn't state where exactly your bottlenecks are, but here are some ideas.

  1. parallelisation - can you split the file up into chunks which can be processed in parallel, for instance by several instances of your job? (See the first sketch after this list.)
  2. chunk size - we used a chunk size of 5,000 to 10,000 when writing to Oracle (second sketch below).
  3. removing unnecessary data parsing, especially Date/Timestamp parsing - for instance, we had a lot of timestamps in our data, but they were not relevant for the processing logic. Since we had to read and write them from/to a file a couple of times during processing, we didn't parse them, we just kept the string representation. Moreover, a lot of these timestamps had special values, like 1.1.0001 00:00:00.000000 or 31.12.9999 23:59:59.000000; we used LD or HD (for low date and high date) to represent them (third sketch below).

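A minimal sketch of idea 1, assuming the big input file has already been split into physical chunk files (e.g. with `split -l`); the `chunks` directory name and the `processChunk` helper are placeholders, and a plain thread pool stands in for the several job instances we actually ran:

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

// Sketch only: each pre-split chunk file is transformed and loaded by its own
// worker, which is roughly what running several instances of the job achieves.
public class ParallelChunkLoader {

    public static void main(String[] args) throws Exception {
        List<Path> chunks;
        try (Stream<Path> s = Files.list(Paths.get("chunks"))) { // hypothetical directory
            chunks = s.sorted().collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<?>> results = new ArrayList<>();
        for (Path chunk : chunks) {
            results.add(pool.submit(() -> processChunk(chunk)));
        }
        for (Future<?> r : results) {
            r.get();                 // propagate failures from the workers
        }
        pool.shutdown();
    }

    private static void processChunk(Path chunk) {
        // transformation + Oracle insert for one chunk goes here
        System.out.println("processed " + chunk);
    }
}
```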
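
For idea 2, chunked writing with plain JDBC could look roughly like this; the table and column names are invented, and the real job used a batch framework's chunk handling rather than hand-rolled JDBC:

```java
import java.sql.*;
import java.util.List;

// Sketch of chunked inserts: addBatch() per record, then one executeBatch()
// and one commit per chunk instead of per record.
public class ChunkedOracleWriter {

    private static final int CHUNK_SIZE = 5_000;

    public static void write(Connection con, List<String[]> records) throws SQLException {
        con.setAutoCommit(false);
        String sql = "INSERT INTO target_table (col1, col2) VALUES (?, ?)"; // hypothetical table
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int inChunk = 0;
            for (String[] rec : records) {
                ps.setString(1, rec[0]);
                ps.setString(2, rec[1]);
                ps.addBatch();
                if (++inChunk == CHUNK_SIZE) {
                    ps.executeBatch();   // one round trip for the whole chunk
                    con.commit();        // commit per chunk, not per record
                    inChunk = 0;
                }
            }
            if (inChunk > 0) {           // flush the last, partially filled chunk
                ps.executeBatch();
                con.commit();
            }
        }
    }
}
```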
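
And idea 3 boils down to token substitution instead of parsing; the timestamp literals and the LD/HD tokens come from the description above, the method names are made up:

```java
// Sketch of "don't parse what you don't need": timestamps stay plain strings
// while they are copied between files, and the two special values are
// collapsed to the short tokens LD/HD.
public class TimestampPassThrough {

    private static final String LOW_DATE  = "1.1.0001 00:00:00.000000";
    private static final String HIGH_DATE = "31.12.9999 23:59:59.000000";

    // Used when writing an intermediate file: no parsing, just token substitution.
    static String encode(String rawTimestamp) {
        if (LOW_DATE.equals(rawTimestamp))  return "LD";
        if (HIGH_DATE.equals(rawTimestamp)) return "HD";
        return rawTimestamp;               // keep the original string untouched
    }

    // Used right before the final insert, the only place a real value is needed.
    static String decode(String stored) {
        if ("LD".equals(stored)) return LOW_DATE;
        if ("HD".equals(stored)) return HIGH_DATE;
        return stored;
    }
}
```
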
HTH.
