I am building a large data dictionary from a set of text files. As I read in the lines and process them, I append(dataline) to a list. At some point I get a MemoryError.
I had a similar problem when evaluating an expression containing large numpy arrays (one of them was actually sparse). I was doing this on a machine with 64 GB of memory, of which only about 8 GB was in use, so I was surprised to get the MemoryError.
It turned out that my problem was array shape broadcasting: I had inadvertently duplicated a large dimension.
It went something like this:

- One array had shape (286577, 1) where I was expecting (286577,).
- It was being combined in the expression with another array of shape (286577, 130).
- Since I was expecting (286577,), I applied [:,newaxis] in the expression to bring it to (286577, 1) so it would be broadcast to (286577, 130).
- Because the array was already (286577, 1), however, [:,newaxis] produced shape (286577, 1, 1), and the two arrays were both broadcast to shape (286577, 286577, 130)... of doubles. With two such arrays, that comes to about 170 TB!
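A minimal sketch of the pitfall, with the dimensions shrunk (5 and 3 standing in for 286577 and 130) so it runs instantly instead of blowing up memory:

```python
import numpy as np

# The array is actually (5, 1), though the code expected (5,).
a = np.zeros((5, 1))
b = np.zeros((5, 3))

# Applying [:, np.newaxis] to the already-2D array gives (5, 1, 1)...
a3 = a[:, np.newaxis]
print(a3.shape)            # (5, 1, 1)

# ...so broadcasting against (5, 3) duplicates the large dimension:
print((a3 + b).shape)      # (5, 5, 3)

# One safe fix: flatten first, so the result is (5, 1) regardless of
# whether the input arrived as (5,) or (5, 1).
a_fixed = a.ravel()[:, np.newaxis]
print((a_fixed + b).shape)  # (5, 3)

# At the original scale, one float64 array of the accidental broadcast
# shape (286577, 286577, 130) would need:
full_bytes = 286577 * 286577 * 130 * 8
print(full_bytes / 1e12)    # ~85.4 (TB) -- ~170 TB for two such arrays
```

The same shape check (printing `.shape` of each operand, or calling `np.broadcast(a, b).shape` before evaluating) is usually the quickest way to spot an accidentally duplicated dimension.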