C: handling large files

此生再无相见时, submitted on 2019-12-03 07:34:14

On both *nix and Windows, there are extensions to the I/O routines that deal with file sizes and offsets, so that they support files larger than 2GB or 4GB. Naturally, the underlying file system must also support a file that large. On Windows, NTFS does, but FAT does not, for instance (FAT32 caps a single file at just under 4GB). This is generally known as "large file support".

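The two routines that are most critical for these purposes are fseek() and ftell(), because they are what you need for random access to the whole file. Both work with a long offset, which is only 32 bits on some platforms, so large files need the wider variants: fseeko() and ftello() on POSIX systems, _fseeki64() and _ftelli64() on Windows. Otherwise, the ordinary fopen() and fread() and friends can do sequential access to a file of any size, as long as the underlying OS and stdio implementation support large files.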
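As a rough illustration of the sequential case, here is a minimal sketch that streams through a file with plain fopen() and fread() and keeps the running total in a 64-bit counter. The helper name count_bytes_sequential and the 64 KB buffer size are made up for this example, and it assumes the platform's stdio can open large files in the first place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: streams through the file in fixed-size chunks and
   returns the number of bytes read, or -1 on error. No seeking is involved,
   so no file offset ever has to fit in a long. */
int64_t count_bytes_sequential(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char buf[64 * 1024];   /* chunk size is an arbitrary choice */
    int64_t total = 0;              /* 64-bit so it cannot wrap at 2GB/4GB */
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (int64_t)n;

    int failed = ferror(fp);        /* distinguish a read error from EOF */
    fclose(fp);
    return failed ? -1 : total;
}
```

The only thing that has to be wide here is the counter; the read loop itself is the same as for a small file.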

Assuming you're on a linux/bsd/mac/notwindows 64-bit system (and seriously, who isn't these days?), mmap performs extremely well. It essentially lets you map a whole file into a process's address space and lets the kernel perform caching/paging for you.
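A minimal sketch of the mmap approach, assuming a POSIX system: it maps a file read-only and scans it as if it were one big array. The helper name count_newlines_mmap is hypothetical, and counting newlines is just a stand-in for whatever processing you actually need.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole file read-only and counts '\n' bytes.
   Returns -1 on error. */
int64_t count_newlines_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }
    if ((uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);                      /* file does not fit the address space */
        return -1;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    int64_t count = 0;
    for (size_t i = 0; i < len; i++)    /* the kernel pages data in on demand */
        if (p[i] == '\n')
            count++;

    munmap((void *)p, len);
    return count;
}
```

Nothing is read up front: pages are faulted in as the loop touches them, which is why this works for files far larger than physical memory on a 64-bit system.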

And if you MUST use Windows, the same concept exists there as file mapping (CreateFileMapping and MapViewOfFile), made by the friendly folks at Redmond. Note that for either of these you will want to be running on a 64-bit system: on a 32-bit system a single mapping can never exceed the 4GB address space, and the usable limit is lower in practice.

Pass -D_FILE_OFFSET_BITS=64 on the compiler command line, or put #define _FILE_OFFSET_BITS 64 before the first system header, in all relevant sources (preferably the entire project). This common macro is provided automatically by several common build systems. Then use off_t (which will be 64 bit now) wherever the API requires it.
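Here is a small sketch of what that looks like in a single source file, using the POSIX fseeko() and ftello() calls, which take and return off_t. The helpers file_size and byte_at are hypothetical names; the _POSIX_C_SOURCE define is an extra assumption added so the two functions are declared even under a strict -std=c11 style build, and _Static_assert assumes a C11 compiler.

```c
#define _FILE_OFFSET_BITS 64        /* must come before any system header */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* assumption: exposes fseeko/ftello */
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Fail the build early if off_t did not end up 64 bits wide. */
_Static_assert(sizeof(off_t) >= 8, "off_t is narrower than 64 bits");

/* Hypothetical helper: returns the file size in bytes, or -1 on error. */
off_t file_size(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);          /* off_t result, not long */
    fclose(fp);
    return size;
}

/* Hypothetical helper: returns the byte at an absolute offset, or EOF. */
int byte_at(const char *path, off_t offset)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return EOF;

    int c = EOF;
    if (fseeko(fp, offset, SEEK_SET) == 0)
        c = fgetc(fp);
    fclose(fp);
    return c;
}
```

The compile-time check is optional, but it catches the case where one source file in the project was built without the macro.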

In addition to RBerteig's and Matt's answers:

If you enable 64-bit I/O support correctly and carefully for all the files in your project (the methods for doing so are system dependent), you don't have to worry about integer overflow as long as you use the correct types, I think. off_t should then be the correct choice for positioning your file pointer.

If all else fails, go with the correct C99 types whenever you make assumptions about the width of a type. Using int or long is almost always the wrong thing to do, because they are too compiler and platform dependent. Use int64_t (or int_fast64_t if you don't have that).
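To make the point about fixed-width types concrete, here is a small sketch that computes a byte offset in int64_t and prints it portably with the PRId64 macro from inttypes.h. The helpers record_offset and format_offset are made up for this example, and the overflow check is an addition of this sketch rather than something the answer above prescribes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: byte offset of record `index` when every record is
   `record_size` bytes long. Returns -1 if the index is negative or the
   product would not fit in an int64_t. */
int64_t record_offset(int64_t index, int64_t record_size)
{
    assert(record_size > 0);

    if (index < 0 || index > INT64_MAX / record_size)
        return -1;
    return index * record_size;
}

/* Hypothetical helper: formats an offset without guessing whether int64_t
   is long or long long on this platform. Returns snprintf's result. */
int format_offset(char *buf, size_t size, int64_t offset)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRId64, offset);
}
```

With a plain int, the first function would already overflow for the three-billionth 4-byte record; with int64_t the result is an ordinary value.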

Depending on where the file format's grammar sits in the Chomsky hierarchy, there may be several free and commercial toolkits for creating a parser for it. I think the real problem you have is how to 'handle' several GB of data.

Do you want all of the data in memory simultaneously?
One way is to write parts of the data out to temporary files on disk while they are not in use. Simple fread / fwrite of structs, plus some clever ref-counted 'on demand' loading and writing, can do this.
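The fread / fwrite part of that idea might look roughly like the following sketch. The Record layout and the helpers spill_records and reload_records are hypothetical, the temporary file comes from the standard tmpfile() call, and the ref-counting and on-demand logic is left out entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout, purely for illustration. */
typedef struct {
    double x, y;
    int id;
} Record;

/* Writes n records to a fresh temporary file and returns its stream, or
   NULL on error. The caller may free the in-memory copy afterwards. */
FILE *spill_records(const Record *recs, size_t n)
{
    assert(recs != NULL && n > 0);

    FILE *tmp = tmpfile();          /* removed automatically when closed */
    if (tmp == NULL)
        return NULL;
    if (fwrite(recs, sizeof *recs, n, tmp) != n) {
        fclose(tmp);
        return NULL;
    }
    return tmp;
}

/* Reads n records back from the start of the temporary file into a newly
   allocated array. Returns NULL on error; the caller frees the result. */
Record *reload_records(FILE *tmp, size_t n)
{
    assert(tmp != NULL && n > 0);

    Record *recs = calloc(n, sizeof *recs);   /* calloc checks n * size */
    if (recs == NULL)
        return NULL;

    rewind(tmp);
    if (fread(recs, sizeof *recs, n, tmp) != n) {
        free(recs);
        return NULL;
    }
    return recs;
}
```

Raw struct dumps like this are only safe to read back in the same program on the same machine, which is all a temporary spill file needs.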
