Java: handling billions of bytes

Question

I'm creating a compression algorithm in Java; to use my algorithm I require a lot of information about the structure of the target file.

After collecting the data, I need to reread the file. <- But I don't want to.

While rereading the file, I make it a good target for compression by 'converting' the data of the file to a rather peculiar format. Then I compress it.

The problems now are:

I don't want to open a new FileInputStream for rereading the file.
I don't want to save the converted file, which is usually 150% the size of the target file, to disk.

Is there any way to 'reset' a FileInputStream so that it moves back to the start of the file, and how could I store the huge amount of 'converted' data efficiently without writing it to disk?

Answer 1:

You can use one or more RandomAccessFiles. You can memory-map them as MappedByteBuffers, which consume neither heap (strictly speaking, each buffer object uses about 128 bytes) nor direct memory, yet can be accessed randomly.
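As a rough illustration of the memory-mapping idea, here is a minimal sketch that reads a file twice through FileChannel.map without reopening it. The class name MappedTwoPass, the 1 GiB chunk size, and the byte-summing "analysis" are assumptions made for the example, not part of the answer. A single mapping cannot exceed Integer.MAX_VALUE bytes, so a multi-gigabyte file has to be mapped in pieces.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedTwoPass {
    // Assumed chunk size; one mapping is limited to Integer.MAX_VALUE bytes.
    static final long CHUNK = 1L << 30;

    /** Sums all bytes (as unsigned values), mapping the file chunk by chunk. */
    static long sumUnsignedBytes(FileChannel ch) throws IOException {
        long size = ch.size();
        long sum = 0;
        for (long pos = 0; pos < size; pos += CHUNK) {
            long len = Math.min(CHUNK, size - pos);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, (byte) 250});
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long firstPass = sumUnsignedBytes(ch);  // stands in for the analysis pass
            long secondPass = sumUnsignedBytes(ch); // the "reread", same channel
            System.out.println(firstPass + " " + secondPass);
        }
    }
}
```

Because every pass creates its own mapping at an explicit offset, there is no stream position to reset between the two passes.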

Your temporary data can be stored in one or more direct ByteBuffers or in additional memory-mapped files. Since you have random access to the original data, you may not need to duplicate as much data in memory as you think.
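For the temporary data, a direct buffer could be used roughly as follows. This is only a sketch: the class OffHeapScratch and its store method are hypothetical names, and the byte array stands in for whatever the 'converted' data looks like. Note that a single ByteBuffer holds at most Integer.MAX_VALUE bytes, so very large data would need several buffers.

```java
import java.nio.ByteBuffer;

final class OffHeapScratch {
    /** Copies already-converted bytes into a direct (off-heap) buffer, ready for reading. */
    static ByteBuffer store(byte[] converted) {
        ByteBuffer buf = ByteBuffer.allocateDirect(converted.length);
        buf.put(converted);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = store(new byte[] {7, 8, 9});
        // Absolute get() gives random access without moving the position.
        System.out.println(buf.isDirect() + " " + buf.remaining() + " " + buf.get(1));
    }
}
```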

This way you can access the whole data with just a few KB of heap.

Answer 2:

There's the reset method, but you need to wrap the FileInputStream in a BufferedInputStream.
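A minimal sketch of that approach follows; MarkResetDemo and countTwice are hypothetical names. One caveat worth stating: after mark(readlimit), BufferedInputStream has to keep everything read since the mark in memory, so this only suits inputs that fit comfortably in the heap, not files of billions of bytes.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class MarkResetDemo {
    /** Counts the remaining bytes of a stream by reading to end-of-file. */
    static long drain(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Reads a small file twice through one stream using mark/reset. */
    static long[] countTwice(Path file) throws IOException {
        long size = Files.size(file);
        if (size >= Integer.MAX_VALUE - 8) {
            throw new IllegalArgumentException("file too large for mark/reset buffering");
        }
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream(file.toFile()))) {
            // The read limit is one larger than the file so the mark survives reading to EOF.
            in.mark((int) size + 1);
            long first = drain(in);
            in.reset(); // back to the marked position, i.e. the start
            long second = drain(in);
            return new long[] {first, second};
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mark-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, 4, 5});
        long[] counts = countTwice(file);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```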

Answer 3:

You could use RandomAccessFile, or perhaps a java.nio ByteBuffer is what you are looking for. (I am not sure.)

Resources might be saved by using pipes/streams: write the converted data immediately to a compressed stream instead of storing it first.
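The streaming idea might look something like the sketch below. Two things here are assumptions: GZIPOutputStream stands in for the asker's own compressor, and the byte-wise delta encoding is only a placeholder for the unspecified 'conversion'. The point is that each block is converted and handed to the compressor immediately, so the converted data never exists as a whole, neither in memory nor on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

final class StreamingConvert {
    /** Converts the input block by block and writes it straight into a compressed stream. */
    static void convertAndCompress(InputStream in, OutputStream sink) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            byte[] buf = new byte[8192];
            int prev = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                // Placeholder conversion: replace each byte by its difference to the previous one.
                for (int i = 0; i < n; i++) {
                    int cur = buf[i] & 0xFF;
                    buf[i] = (byte) (cur - prev);
                    prev = cur;
                }
                gz.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        convertAndCompress(new ByteArrayInputStream(new byte[] {10, 12, 11}), out);
        System.out.println("compressed bytes: " + out.size());
    }
}
```

In real use the sink would be whatever the compressed result should go to, for example a FileOutputStream for the final output file.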

To answer your question about reset: calling reset() on a FileInputStream is not possible. The base class InputStream has provisions for mark and reset-to-mark, but FileInputStream was made optimal for several operating systems, does purely sequential input, and does not support them (its markSupported() returns false). Closing and reopening is best.
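If reopening the file is undesirable, one alternative worth knowing is that a FileInputStream exposes its underlying FileChannel, and changing the channel's position also changes the stream's file position. The sketch below uses that to read a file twice through a single stream; RewindDemo and its methods are hypothetical names for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RewindDemo {
    /** Reads the stream to end-of-file and returns the number of bytes seen. */
    static long countBytes(FileInputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Reads the file twice through one FileInputStream by rewinding its channel. */
    static long[] readTwice(Path file) throws IOException {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            long first = countBytes(in);
            in.getChannel().position(0); // move the stream back to the start of the file
            long second = countBytes(in);
            return new long[] {first, second};
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rewind-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, 4});
        long[] counts = readTwice(file);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

RandomAccessFile offers the same effect through seek(0). Neither variant buffers the file contents, so the file size does not matter here.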

Source: https://stackoverflow.com/questions/7984740/java-handling-billions-bytes

Tags

java

bigdata