Question
I'm trying to get the data from a text file into a HashMap. The text file has the following format:

It has something like 7 million lines... (size: 700 MB)
So what I do is: I read each line, take the fields in green and concatenate them into a string, which will be the HashMap key. The value will be the field in red.
Every time I read a line, I check whether the HashMap already contains an entry with that key. If so, I update the entry by adding the red field to its current value; if not, a new entry is added to the HashMap.
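In code, what I'm doing looks roughly like this (a minimal sketch; the file name, the field separator, and the column positions are placeholders, since the real format was shown in the sample above):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class Aggregate {
    public static void main(String[] args) throws IOException {
        Map<String, Long> totals = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("data.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\\s+");
                // Placeholder positions: fields 0-2 stand in for the "green"
                // key columns, field 3 for the "red" numeric value.
                String key = fields[0] + "|" + fields[1] + "|" + fields[2];
                long value = Long.parseLong(fields[3]);
                // merge() sums into the existing entry, or inserts if absent,
                // so the check-and-update is a single call.
                totals.merge(key, value, Long::sum);
            }
        }
    }
}
```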
I tried this with text files of 70,000 lines, and it works quite well.
But now, with the 7-million-line text file, I get a "java heap space" error (java.lang.OutOfMemoryError).

Is this due to the HashMap? Is it possible to optimize my algorithm?
Answer 1:
You should increase your heap space:

```
-Xms<size>    set initial Java heap size
-Xmx<size>    set maximum Java heap size
```

For example:

```
java -Xms1024m -Xmx2048m
```
A nice read: From Java code to Java heap
Table 3. Attributes of a HashMap

Default capacity: 16 entries
Empty size: 128 bytes
Overhead: 64 bytes plus 36 bytes per entry
Overhead for a 10K collection: ~360 KB
Search/insert/delete performance: O(1), constant time regardless of the number of elements (assuming no hash collisions)
If you apply the per-entry overhead from the table above, 7 million records come to roughly 250 MB (36 bytes × 7,000,000 entries) of HashMap overhead alone, before counting the key and value objects themselves, so your minimum heap size should be around 1000 MB.
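A quick way to confirm the limit the JVM is actually running with is Runtime.getRuntime().maxMemory():

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the heap ceiling (the -Xmx limit) in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```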
Answer 2:
As well as changing the heap size, consider 'compressing' (encoding) the keys by storing them as packed binary, not String.
Each IP address can be stored as 4 bytes. The port numbers (if that's what they are) are 2 bytes each. The protocol can probably be stored as a byte or less.
That's 13 bytes (4 + 4 + 2 + 2 + 1 for two addresses, two ports, and a protocol), rather than maybe 70 stored as a UTF-16 String, reducing the memory for keys by a factor of 5, if my maths is correct at this time of night...
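A minimal sketch of such a packed key, assuming the key really is two IPv4 addresses, two ports, and a protocol (the actual fields were only shown in the question's image). A Java record stores these as unboxed primitives and generates equals()/hashCode() over them automatically:

```java
import java.util.HashMap;
import java.util.Map;

public class PackedKeyDemo {
    // 4 + 4 + 2 + 2 + 1 = 13 bytes of payload instead of a ~70-byte String.
    record FlowKey(int srcIp, int dstIp, short srcPort, short dstPort, byte protocol) {}

    // Pack a dotted-quad IPv4 address such as "10.0.0.1" into an int.
    static int packIp(String dottedQuad) {
        String[] octets = dottedQuad.split("\\.");
        int ip = 0;
        for (String octet : octets) {
            ip = (ip << 8) | Integer.parseInt(octet);
        }
        return ip;
    }

    public static void main(String[] args) {
        Map<FlowKey, Long> totals = new HashMap<>();
        // Ports above 32767 wrap to negative shorts, which is harmless
        // for a lookup key; protocol 6 is TCP.
        FlowKey key = new FlowKey(packIp("10.0.0.1"), packIp("10.0.0.2"),
                                  (short) 443, (short) 55123, (byte) 6);
        totals.merge(key, 42L, Long::sum);
        System.out.println(totals);
    }
}
```

Each key still pays a per-object header on top of the 13 bytes of fields, but that is far smaller than a concatenated UTF-16 String plus its char array.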
Source: https://stackoverflow.com/questions/13076097/read-big-text-file-to-hashmap-heap-overflow