I have 8823 data points with x, y coordinates. I'm trying to follow the answer on how to get a scatter dataset represented as a heatmap, but when I run
X, Y = np.meshgrid(x, y)
with my data arrays I get a MemoryError. I am new to numpy and matplotlib and am essentially trying to run this by adapting the examples I can find.
Here's how I built my arrays from a file that has them stored:
from numpy import array

XY_File = open('XY_Output.txt', 'r')
XY = XY_File.readlines()
XY_File.close()

Xf = []
Yf = []
for line in XY:
    Xf.append(float(line.split('\t')[0]))
    Yf.append(float(line.split('\t')[1]))

x = array(Xf)
y = array(Yf)
Is there a problem with my arrays? The same code worked when put into this example, so I'm not sure.
Why am I getting this MemoryError and how can I fix this?
Your call to meshgrid
requires a lot of memory: it produces two 8823*8823 floating-point arrays, and each of them is about 0.6 GB.
But your screen can't show (and your eye can't really process) that much information anyway, so you should probably smooth your data down to something more reasonable, such as 1024*1024, before this step.
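One common way to do that smoothing (a sketch, not necessarily what the linked answer uses) is to bin the scatter points into a 2-D histogram with np.histogram2d; the random x and y below just stand in for your coordinate arrays:

```python
import numpy as np

# Stand-ins for your 8823 coordinate arrays.
rng = np.random.default_rng(0)
x = rng.normal(size=8823)
y = rng.normal(size=8823)

# Bin the points into a 256x256 grid -- well under 1024*1024,
# and only ~0.5 MB instead of two 0.6 GB meshgrid arrays.
heatmap, xedges, yedges = np.histogram2d(x, y, bins=256)
print(heatmap.shape)  # (256, 256)
print(heatmap.sum())  # 8823.0 -- every point falls in some bin
```

The resulting `heatmap` array can be passed straight to something like matplotlib's `imshow` or `pcolormesh`.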
In numpy 1.7.0 and newer, meshgrid
has the sparse
keyword argument. A sparse meshgrid is set up so that it broadcasts to a full meshgrid when used. This can save large amounts of memory, e.g. when using the meshgrid to index arrays.
In [2]: np.meshgrid(np.arange(10), np.arange(10), sparse=True)
Out[2]:
[array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]),
 array([[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]])]
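To see the broadcasting in action, here is a small sketch: the sparse grids have shapes (1, n) and (n, 1), and any arithmetic between them expands to the full (n, n) result without ever allocating two dense n*n coordinate arrays:

```python
import numpy as np

# Sparse grids: shapes (1, 10) and (10, 1).
X, Y = np.meshgrid(np.arange(10), np.arange(10), sparse=True)
Z = X + Y  # broadcasts to the full (10, 10) result
print(X.shape, Y.shape, Z.shape)  # (1, 10) (10, 1) (10, 10)

# Same values as the dense version:
Xd, Yd = np.meshgrid(np.arange(10), np.arange(10))
print(np.array_equal(Z, Xd + Yd))  # True
```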
Another option is to use smaller integers that are still able to represent the range:
np.meshgrid(np.arange(10).astype(np.int8), np.arange(10).astype(np.int8), sparse=True, copy=False)
though as of numpy 1.9, using these smaller integers for indexing will be slower, as they are internally converted back to larger integers in small (np.setbufsize-sized) chunks.
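As a sketch of the indexing use case mentioned above: sparse integer grids broadcast inside a fancy-indexing expression, so they select the same elements as full dense grids at a fraction of the memory:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# With the default indexing='xy', the first output varies along
# columns and the second along rows, so J indexes rows and I columns.
I, J = np.meshgrid(np.arange(10), np.arange(10), sparse=True, copy=False)
print(np.array_equal(a[J, I], a))  # True -- same as indexing with full grids
```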