So I have a "large" number of "very large" ASCII files of numerical data (gigabytes altogether), and my program will need to process the entirety of it sequentially at least once.
Without any additional insight into what kind of processing is going on, here are some general thoughts from when I have done similar work.
Write a prototype of your application (maybe even "one to throw away") that performs some arbitrary operation on your data set. See how fast it goes. If the simplest, most naive thing you can think of is acceptably fast, no worries!
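For instance, a throwaway prototype might be nothing more than a timed sequential pass that does some stand-in arithmetic. Here is a minimal sketch in Python; the whitespace-separated-columns assumption and the "sum the first column" work are placeholders for whatever your real format and processing are:

```python
import sys
import time

def naive_pass(paths):
    """Stream each file line by line and do stand-in work
    (summing the first column) to get a feel for raw throughput."""
    total = 0.0
    lines = 0
    for path in paths:
        with open(path) as f:
            for line in f:
                fields = line.split()
                if fields:
                    total += float(fields[0])  # arbitrary placeholder work
                lines += 1
    return total, lines

if __name__ == "__main__":
    start = time.monotonic()
    total, lines = naive_pass(sys.argv[1:])
    elapsed = time.monotonic() - start
    print(f"{lines} lines in {elapsed:.1f}s "
          f"({lines / elapsed:.0f} lines/s), checksum={total}")
```

If a pass like that finishes in minutes rather than hours, you may not need anything fancier.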
If the naive approach does not work, consider pre-processing the data so that subsequent runs will run in an acceptable length of time. You mention having to "jump around" in the data set quite a bit. Is there any way to pre-process that out? Or, one pre-processing step can be to generate even more data - index data - that provides byte-accurate location information about critical, necessary sections of your data set. Then, your main processing run can utilize this information to jump straight to the necessary data.
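Here is a rough sketch of what such an index could look like in Python, assuming each critical section begins with some recognizable header line (the `#SECTION` marker and file names below are made up for illustration; substitute whatever actually delimits your data):

```python
import json

def build_index(path, section_key):
    """One-time pre-processing pass: record the byte offset of every line
    that starts a critical section, so later runs can seek() straight to it.
    `section_key` returns a key for a section-header line, or None otherwise."""
    index = {}
    with open(path, "rb") as f:  # binary mode so offsets are exact byte positions
        offset = f.tell()
        for line in iter(f.readline, b""):
            key = section_key(line)
            if key is not None:
                index[key] = offset
            offset = f.tell()
    return index

def read_section(path, index, key, num_lines):
    """Jump directly to a previously indexed section and read it."""
    with open(path, "rb") as f:
        f.seek(index[key])
        return [f.readline() for _ in range(num_lines)]

# Build once, save next to the data, and reuse on every subsequent run:
# index = build_index("data_000.txt",
#                     lambda line: line.split()[1].decode()
#                     if line.startswith(b"#SECTION") else None)
# with open("data_000.idx", "w") as out:
#     json.dump(index, out)
```

The point is that the expensive full scan happens once; after that, "jumping around" becomes a cheap seek to a known byte offset.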
So, to summarize, my approach would be to try something simple right now and see what the performance looks like. Maybe it will be fine. Otherwise, look into processing the data in multiple steps, saving the most expensive operations for infrequent pre-processing.
Don't "load everything into memory". Just perform file accesses and let the operating system's disk page cache decide when you get to actually pull things directly out of memory.