Question
I'm trying to get the data from a text file into a HashMap. The text file has the following format:

It has something like 7 million lines... (size: 700 MB)
So what I do is: I read each line, take the fields in green and concatenate them into a string, which will be the HashMap key. The value will be the field in red.
Every time I read a line, I check whether the HashMap already contains an entry with that key. If so, I update the entry by adding the red field to its current value; if not, a new entry is added to the HashMap.
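In code, what I'm doing looks roughly like this (a minimal sketch; the file name, the field separator, and the column positions are placeholders, since the real format was shown in the sample above):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class Aggregate {
    public static void main(String[] args) throws IOException {
        Map<String, Long> totals = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("data.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\\s+");
                // Placeholder positions: fields 0-2 stand in for the "green"
                // key columns, field 3 for the "red" numeric value.
                String key = fields[0] + "|" + fields[1] + "|" + fields[2];
                long value = Long.parseLong(fields[3]);
                // merge() sums into the existing entry, or inserts if absent,
                // so the check-and-update is a single call.
                totals.merge(key, value, Long::sum);
            }
        }
    }
}
```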
I tried this with text files of 70,000 lines, and it works quite well.
But now, with the 7-million-line text file, I get a "java heap space" error (java.lang.OutOfMemoryError).

Is this due to the HashMap? Is it possible to optimize my algorithm?
Answer 1:
You should increase your heap space:

```
-Xms<size>    set initial Java heap size
-Xmx<size>    set maximum Java heap size
```

For example:

```
java -Xms1024m -Xmx2048m
```
A nice read: From Java code to Java heap
Table 3. Attributes of a HashMap

Default capacity: 16 entries
Empty size: 128 bytes
Overhead: 64 bytes plus 36 bytes per entry
Overhead for a 10K collection: ~360 KB
Search/insert/delete performance: O(1), constant time regardless of the number of elements (assuming no hash collisions)
If you apply the per-entry overhead from the table above, 7 million records come to roughly 250 MB (36 bytes × 7,000,000 entries) of HashMap overhead alone, before counting the key and value objects themselves, so your minimum heap size should be around 1000 MB.
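A quick way to confirm the limit the JVM is actually running with is Runtime.getRuntime().maxMemory():

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the heap ceiling (the -Xmx limit) in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```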
Answer 2:
As well as changing the heap size, consider 'compressing' (encoding) the keys by storing them as packed binary, not String.
Each IP address can be stored as 4 bytes. The port numbers (if that's what they are) are 2 bytes each. The protocol can probably be stored as a byte or less.
That's 13 bytes (4 + 4 + 2 + 2 + 1 for two addresses, two ports, and a protocol), rather than maybe 70 stored as a UTF-16 String, reducing the memory for keys by a factor of 5, if my maths is correct at this time of night...
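A minimal sketch of such a packed key, assuming the key really is two IPv4 addresses, two ports, and a protocol (the actual fields were only shown in the question's image). A Java record stores these as unboxed primitives and generates equals()/hashCode() over them automatically:

```java
import java.util.HashMap;
import java.util.Map;

public class PackedKeyDemo {
    // 4 + 4 + 2 + 2 + 1 = 13 bytes of payload instead of a ~70-byte String.
    record FlowKey(int srcIp, int dstIp, short srcPort, short dstPort, byte protocol) {}

    // Pack a dotted-quad IPv4 address such as "10.0.0.1" into an int.
    static int packIp(String dottedQuad) {
        String[] octets = dottedQuad.split("\\.");
        int ip = 0;
        for (String octet : octets) {
            ip = (ip << 8) | Integer.parseInt(octet);
        }
        return ip;
    }

    public static void main(String[] args) {
        Map<FlowKey, Long> totals = new HashMap<>();
        // Ports above 32767 wrap to negative shorts, which is harmless
        // for a lookup key; protocol 6 is TCP.
        FlowKey key = new FlowKey(packIp("10.0.0.1"), packIp("10.0.0.2"),
                                  (short) 443, (short) 55123, (byte) 6);
        totals.merge(key, 42L, Long::sum);
        System.out.println(totals);
    }
}
```

Each key still pays a per-object header on top of the 13 bytes of fields, but that is far smaller than a concatenated UTF-16 String plus its char array.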
Source: https://stackoverflow.com/questions/13076097/read-big-text-file-to-hashmap-heap-overflow