Dealing with large amounts of data in C++


Question


I have an application that sometimes needs to handle a large amount of data. The user has the option to load a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes hard. On my test system, that threshold is about 2 GB, which is the amount of physical RAM.

What is a good way to handle this situation? I get std::bad_alloc thrown from new and have tried trapping it, but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data, but handling data loads of this size is a requirement of the application.

Edit: I'm testing under a 32-bit Windows system for now, but the application will run on various flavors of Windows, Sun and Linux, mostly 64-bit but some 32-bit.

The error handling is not strong: it simply wraps the main instantiation code in a try/catch block, with the catch handling any exception, after a colleague complained that bad_alloc could not be trapped every time.

I think you guys are right: I need a memory management scheme that doesn't actually load all of this data into RAM, it just makes it seem as if it does.

Edit 2: Luther said it best, thanks. For now I just need a way to prevent a crash, which should be possible with proper exception handling. But down the road I'll be implementing the accepted solution.


Answer 1:


There is the STXXL library, which offers STL-like containers for large datasets.

  • http://stxxl.sourceforge.net/

Change "large" into "huge". It is designed and optimized for multicore processing of data sets that fit on terabyte-disks only. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.


It is hard to say anything definite about why your application crashes, because numerous things can go wrong under tight memory conditions: you could hit a hard address-space limit (for example, by default 32-bit Windows gives only 2GB of address space to each user process; this can be changed, see http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ), or be eaten alive by the OOM killer (not a mythical beast; see http://lwn.net/Articles/104179/).

What I'd suggest in any case is to think about a way to keep the data on disk and treat main memory as a kind of level-4 cache for it. For example, if you have blobs of data, wrap each one in a class that can transparently load the blob from disk when it is needed and that registers with some kind of memory manager, which can ask some of the blob holders to free their memory before memory conditions become unbearable. In effect, a buffer cache, as in the sketch below.
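
A minimal sketch of that idea (the names BlobHolder and BlobCache are hypothetical, and the eviction policy is a simple LRU chosen purely for illustration):

    #include <cstddef>
    #include <fstream>
    #include <iterator>
    #include <list>
    #include <string>
    #include <utility>
    #include <vector>

    // Each BlobHolder knows how to reload its blob from disk, so the cache
    // can safely throw the in-memory copy away when it needs the space.
    class BlobHolder {
    public:
        explicit BlobHolder(std::string path) : path_(std::move(path)) {}

        const std::vector<char>& data() {
            if (data_.empty()) {                       // load lazily on first access
                std::ifstream in(path_, std::ios::binary);
                data_.assign(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
            }
            return data_;
        }

        std::size_t bytes() const { return data_.size(); }

        void release() {                               // drop the RAM copy; disk still has it
            std::vector<char>().swap(data_);
        }

    private:
        std::string path_;
        std::vector<char> data_;
    };

    // Tiny "memory manager": holders are registered via get(), and the manager
    // asks the least-recently-used ones to release memory once a budget is exceeded.
    class BlobCache {
    public:
        explicit BlobCache(std::size_t budgetBytes) : budget_(budgetBytes) {}

        const std::vector<char>& get(BlobHolder& holder) {
            lru_.remove(&holder);                      // move holder to the front (most recent)
            lru_.push_front(&holder);
            const std::vector<char>& d = holder.data();
            evict();
            return d;
        }

    private:
        void evict() {
            std::size_t total = 0;
            for (BlobHolder* h : lru_) total += h->bytes();
            while (total > budget_ && lru_.size() > 1) {
                BlobHolder* victim = lru_.back();      // least recently used goes first
                lru_.pop_back();
                total -= victim->bytes();
                victim->release();
            }
        }

        std::size_t budget_;
        std::list<BlobHolder*> lru_;
    };

A real implementation would track sizes incrementally and worry about pinning and threads, but the shape is the same: holders know how to reload themselves, and one manager decides who gives memory back.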




Answer 2:


The user has the option to load in a number of files which are used in a graphical display.

The usual trick is not to load the data into memory directly, but to use the memory-mapping mechanism to make the files look like memory.

Make sure the mapping is done in read-only mode, so that the OS can evict the pages from RAM whenever the memory is needed for something else.
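
A minimal sketch of a read-only mapping on the POSIX systems mentioned in the question (Linux, Sun); on Windows the equivalent calls are CreateFileMapping/MapViewOfFile:

    #include <cstddef>
    #include <fcntl.h>      // open
    #include <sys/mman.h>   // mmap, munmap
    #include <sys/stat.h>   // fstat
    #include <unistd.h>     // close

    // Map a file read-only; the OS pages it in on demand and can drop clean
    // pages again whenever it needs the RAM for something else.
    const char* mapFileReadOnly(const char* path, std::size_t& length) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) return nullptr;

        struct stat st;
        if (fstat(fd, &st) == -1) { close(fd); return nullptr; }
        length = static_cast<std::size_t>(st.st_size);

        void* addr = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                                 // the mapping stays valid after close
        return addr == MAP_FAILED ? nullptr : static_cast<const char*>(addr);
    }

    // When finished: munmap(const_cast<char*>(ptr), length);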

If the user selects more data than the OS can handle, the application crashes pretty hard.

Depending on the OS, this is either the application missing some memory-allocation error handling, or you really hitting the limit of available virtual memory.

Some OSs also have an administrative limit on how large an application's heap can grow.
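
If the immediate goal is only to fail gracefully instead of crashing (as in the question's edit), a sketch of trapping std::bad_alloc around the load path might look like this; loadFile here is a hypothetical stand-in for the application's own loader:

    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <new>
    #include <string>
    #include <vector>

    // Hypothetical loader standing in for the application's own loading code.
    std::vector<char> loadFile(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }

    // Returns false instead of crashing when a file is too big to fit in memory.
    bool tryLoad(const std::string& path) {
        try {
            std::vector<char> data = loadFile(path);   // may throw std::bad_alloc
            // ... hand 'data' to the display code here ...
            return true;
        } catch (const std::bad_alloc&) {
            std::cerr << "Not enough memory to load " << path << "\n";
            return false;                              // keep running, just skip this file
        }
    }

Note that, as Answer 5 points out, on a Linux system that overcommits memory this catch may never fire; the process can be killed before new ever throws.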

On my test system, that number is about the 2 gigs of physical RAM.

It sounds like:

  • your application is 32-bit, and
  • your OS uses the 2GB/2GB virtual memory split.

To avoid hitting the limit, you need to:

  • upgrade your application and OS to 64-bit, or
  • tell the OS (IIRC a patch or boot switch on Windows; most Linux kernels already support it) to use a 3GB/1GB virtual memory split. Many 32-bit OSs use a 2GB/2GB split: 2GB of virtual address space for the kernel and 2GB for the user application. A 3/1 split means 1GB of virtual address space for the kernel and 3GB for the user application.



Answer 3:


How about maintaining a header table instead of loading the entire data set, and loading the actual page only when the user requests that data? You could also use a data compression algorithm (like 7zip, znet etc.) to reduce the file size. (In my project that cut the size from 200MB to 2MB.)
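
A sketch of the paging part of that idea, with hypothetical names (PageEntry, PagedFile); decompression, if used, would plug into readPage:

    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    // Hypothetical index record: where each page lives inside the data file.
    struct PageEntry {
        std::uint64_t offset;
        std::uint32_t size;
    };

    class PagedFile {
    public:
        // The index (header table) is small and stays in memory; the pages do not.
        PagedFile(std::string path, std::vector<PageEntry> index)
            : path_(std::move(path)), index_(std::move(index)) {}

        // Read exactly one page from disk when the user actually requests it.
        std::vector<char> readPage(std::size_t i) const {
            const PageEntry& e = index_.at(i);
            std::ifstream in(path_, std::ios::binary);
            in.seekg(static_cast<std::streamoff>(e.offset));
            std::vector<char> page(e.size);
            in.read(page.data(), e.size);
            return page;                 // decompress here if the pages are compressed
        }

    private:
        std::string path_;
        std::vector<PageEntry> index_;
    };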




Answer 4:


I mention this because it was only touched on briefly above, but a "file paging system" could be a solution. These systems read large data sets in "chunks" by breaking the files into pieces. Once written, they generally "just work" and you hopefully won't have to tinker with them anymore.
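
As a rough illustration of the "chunks" idea (the 64 MB chunk size is an arbitrary choice):

    #include <cstddef>
    #include <fstream>
    #include <vector>

    // Stream a large file in fixed-size chunks so only one chunk is in RAM at a time.
    void processInChunks(const char* path) {
        const std::size_t kChunk = 64 * 1024 * 1024;   // 64 MB, arbitrary
        std::vector<char> buffer(kChunk);
        std::ifstream in(path, std::ios::binary);
        while (in) {
            in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
            std::streamsize got = in.gcount();
            if (got == 0) break;
            // ... process buffer[0 .. got) here ...
        }
    }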

Reading Large Files

Variable Length Data in File--Paging

A newer link below with a very good answer:

Handling Files greater than 2 GB

Search term: "file paging lang:C++"; add "large" or "above 2GB" for more results. HTH




Answer 5:


Not sure if you are hitting it or not, but if you are using Linux, malloc will typically not fail, and operator new will typically not throw bad_alloc. This is because Linux will overcommit, and instead kill your process when it decides the system doesn't have enough memory, possibly at a page fault.

See: Google search for "oom killer".

You can disable this behavior with:

echo 2 > /proc/sys/vm/overcommit_memory



Answer 6:


Upgrade to a 64-bit CPU, 64-bit OS and 64-bit compiler, and make sure you have plenty of RAM.

A 32-bit app is restricted to 2GB of memory (regardless of how much physical RAM you have). This is because a 32-bit pointer can address 2^32 bytes == 4GB of virtual memory. 20 years ago this seemed like a huge amount of memory, so the original OS designers allocated 2GB to the running application and reserved 2GB for use by the OS. There are various tricks you can do to access more than 2GB, but they're complex. It's probably easier to upgrade to 64-bit.
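
If you go the 64-bit route, a tiny compile-time guard (assuming a C++11 compiler) can catch an accidental 32-bit build:

    #include <climits>

    // Fail the build if someone accidentally compiles the large-data build as 32-bit.
    static_assert(sizeof(void*) * CHAR_BIT == 64,
                  "This application must be built as a 64-bit binary");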



Source: https://stackoverflow.com/questions/3494340/dealing-with-large-amounts-of-data-in-c
