Read big text file to HashMap - heap overflow

Submitted by 半城伤御伤魂 on 2019-12-13 19:00:26

Question


I'm trying to load the data from a text file into a HashMap. The file has about 7 million lines (size: 700 MB). The original post showed its format in an image, with some fields highlighted in green and one in red.

So what I do is: I read each line, then take the fields in green and concatenate them into a string, which will be the HashMap key. The value will be the field in red.

Every time I read a line, I check whether the HashMap already has an entry with that key. If so, I update the entry by adding the red field to its value; if not, I add a new entry to the HashMap.
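The read-and-sum loop described above can be sketched roughly as follows. The real column layout is not shown in the text, so the layout here is an assumption: whitespace-separated columns, the first two forming the key and the third holding the number to add. The class name `LineAggregator` and the constants `KEY_FIELDS` and `VALUE_INDEX` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

class LineAggregator {
    // Assumed layout (not taken from the post): the first KEY_FIELDS columns
    // form the key, and the column at VALUE_INDEX is the number to sum.
    static final int KEY_FIELDS = 2;
    static final int VALUE_INDEX = 2;

    static Map<String, Long> aggregate(Stream<String> lines) {
        Map<String, Long> totals = new HashMap<>();
        lines.forEach(line -> {
            String[] fields = line.trim().split("\\s+");
            if (fields.length <= VALUE_INDEX) {
                return; // skip blank or short lines
            }
            String key = String.join("|", Arrays.copyOfRange(fields, 0, KEY_FIELDS));
            long value = Long.parseLong(fields[VALUE_INDEX]);
            // merge() inserts the value if the key is new, otherwise adds it to the existing total
            totals.merge(key, value, Long::sum);
        });
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = aggregate(Stream.of("a b 10", "a b 5", "c d 7"));
        System.out.println(totals);
    }
}
```

For a real file, the lines could come from `Files.lines(...)` inside a try-with-resources block, which reads lazily, so only the map itself has to fit in the heap, not the whole file.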

I tried this with text files of 70,000 lines, and it works quite well.

But now, with the 7-million-line text file, I get a "Java heap space" error.

Is this due to the HashMap? Is it possible to optimize my algorithm?


Answer 1:


You should increase your heap space:

-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size

java -Xms1024m -Xmx2048m
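To check that the flags actually took effect, a small program can print what the running JVM reports. This is only a sketch; the class name `HeapInfo` is hypothetical, and the numbers will differ slightly from the flag values depending on the JVM.

```java
class HeapInfo {
    // Maximum heap the JVM will try to use, in MiB (reflects -Xmx when it is set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        System.out.println("max heap (MiB):   " + rt.maxMemory() / mib);
        System.out.println("total heap (MiB): " + rt.totalMemory() / mib);
        System.out.println("free heap (MiB):  " + rt.freeMemory() / mib);
    }
}
```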

A nice read: "From Java code to Java heap".

Table 3. Attributes of a HashMap
Default capacity                    16 entries
Empty size                          128 bytes
Overhead                            64 bytes plus 36 bytes per entry
Overhead for a 10K collection       ~360K
Search/insert/delete performance    O(1): time taken is constant, regardless of the number of elements (assuming no hash collisions)

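Going by the table above, the HashMap overhead alone for 7 million entries comes to about 252 MB (36 bytes × 7,000,000, roughly 240 MiB), and that is before counting the key and value objects themselves. So your minimum heap size should be around 1000 MB.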
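The estimate can be reproduced with a few lines of arithmetic. This is a rough sketch that takes the table's figures (64 bytes plus 36 bytes per entry) at face value. The real overhead depends on the JVM and its pointer size, and the class name `HashMapOverheadEstimate` is hypothetical.

```java
class HashMapOverheadEstimate {
    // Figures from the table above; treat them as approximate.
    static final long BASE_BYTES = 64;
    static final long PER_ENTRY_BYTES = 36;

    // Overhead of the map structure only, excluding the key and value objects.
    static long overheadBytes(long entries) {
        return BASE_BYTES + PER_ENTRY_BYTES * entries;
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(7_000_000L); // 252,000,064 bytes
        System.out.println("overhead in bytes: " + bytes);
        System.out.println("overhead in MiB:   " + bytes / (1024L * 1024L));
    }
}
```

The String keys and boxed values come on top of this figure, which is why the heap needs to be considerably larger than the overhead alone suggests.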




Answer 2:


As well as changing the heap size, consider 'compressing' (encoding) the keys by storing them as packed binary, not String.

Each IP address can be stored as 4 bytes. The port numbers (if that's what they are) are 2 bytes each. The protocol can probably be stored as a byte or less.

That's 13 bytes (two 4-byte addresses, two 2-byte ports, and a 1-byte protocol), rather than maybe 70 stored as a UTF-16 String, reducing the memory for keys by a factor of about 5, if my maths is correct at this time of night...
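One way to sketch such a packed key is a small class holding two `long` fields. It assumes the key really consists of two IPv4 addresses, two ports, and a protocol number, which the question does not confirm. The class name `PackedKey` and its field layout are hypothetical. A raw `byte[]` would not work directly as a HashMap key, because arrays compare by identity, so the class defines `equals` and `hashCode` itself.

```java
import java.util.HashMap;
import java.util.Map;

final class PackedKey {
    private final long addresses; // source IPv4 in the high 32 bits, destination IPv4 in the low 32 bits
    private final long rest;      // source port (16 bits), destination port (16 bits), protocol (8 bits)

    PackedKey(String srcIp, String dstIp, int srcPort, int dstPort, int protocol) {
        this.addresses = (parseIpv4(srcIp) << 32) | parseIpv4(dstIp);
        this.rest = ((long) (srcPort & 0xFFFF) << 24)
                  | ((long) (dstPort & 0xFFFF) << 8)
                  | (protocol & 0xFF);
    }

    // Converts dotted-quad text such as "10.0.0.1" into an unsigned 32-bit value held in a long.
    static long parseIpv4(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        long result = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + dotted);
            }
            result = (result << 8) | octet;
        }
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PackedKey)) {
            return false;
        }
        PackedKey other = (PackedKey) o;
        return addresses == other.addresses && rest == other.rest;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(addresses) + Long.hashCode(rest);
    }

    public static void main(String[] args) {
        Map<PackedKey, Long> totals = new HashMap<>();
        totals.merge(new PackedKey("10.0.0.1", "10.0.0.2", 443, 51000, 6), 100L, Long::sum);
        totals.merge(new PackedKey("10.0.0.1", "10.0.0.2", 443, 51000, 6), 50L, Long::sum);
        System.out.println(totals.size() + " key(s), total = "
                + totals.get(new PackedKey("10.0.0.1", "10.0.0.2", 443, 51000, 6)));
        // prints "1 key(s), total = 150"
    }
}
```

Each key still carries an object header in addition to its 16 bytes of fields, so the saving is somewhat less than the raw 13-byte figure suggests, but it avoids one String and one character array per key.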



Source: https://stackoverflow.com/questions/13076097/read-big-text-file-to-hashmap-heap-overflow
