I have 8823 data points with x,y coordinates. I'm trying to follow the answer on how to get a scatter dataset to be represented as a heatmap, but when I go
When you call np.meshgrid for a scatter figure, you need to normalize your data if it is too large to process. Try this module:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
st = StandardScaler()
X = st.fit_transform(X)
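To make this concrete, here is a minimal sketch of applying the scaler, assuming your coordinates live in two hypothetical 1-D arrays `x` and `y` (the random data below just stands in for your 8823 points). `fit_transform` expects an `(n_samples, n_features)` array, so the two coordinates are stacked into columns first:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for your 8823 scatter points
rng = np.random.default_rng(0)
x = rng.normal(50, 10, 8823)
y = rng.normal(200, 40, 8823)

# Stack into an (n_samples, n_features) array, as fit_transform expects
X = np.column_stack([x, y])
X_scaled = StandardScaler().fit_transform(X)

# Each column now has mean ~0 and standard deviation ~1
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

After scaling, both coordinates are on comparable, small ranges, which makes downstream gridding steps better behaved.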
Your call to meshgrid
requires a lot of memory -- it produces two 8823*8823 floating-point arrays. Each of them is about 0.6 GB.
But your screen can't show (and your eye can't really process) that much information anyway, so you should probably bin or smooth your data down to something more reasonable, like 1024*1024, before this step.
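One common way to do that binning is np.histogram2d, which counts the scatter points on a fixed grid instead of building a full per-point meshgrid. A sketch, again using hypothetical random data in place of your `x` and `y`:

```python
import numpy as np

# Hypothetical stand-in for your 8823 scatter points
rng = np.random.default_rng(1)
x = rng.normal(size=8823)
y = rng.normal(size=8823)

# Bin the points onto a 1024x1024 grid; each cell counts the points inside it
heat, xedges, yedges = np.histogram2d(x, y, bins=1024)

print(heat.shape)  # a (1024, 1024) count grid
print(heat.sum())  # every point lands in exactly one bin
```

The resulting `heat` array is only ~8 MB and can be passed straight to an image plot (e.g. matplotlib's imshow, transposed so x runs horizontally), instead of materializing two 0.6 GB meshgrid arrays.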
In numpy 1.7.0 and newer, meshgrid
has the sparse
keyword argument. A sparse meshgrid is set up so that it broadcasts to a full meshgrid when used. This can save large amounts of memory, e.g. when using the meshgrid to index arrays.
In [2]: np.meshgrid(np.arange(10), np.arange(10), sparse=True)
Out[2]:
[array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]), array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])]
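To illustrate the broadcasting point: the sparse grids above are a (1, 10) row and a (10, 1) column, yet any elementwise operation on them produces the same full 10x10 result as the dense meshgrid, at a fraction of the memory:

```python
import numpy as np

# Sparse grids: one row vector and one column vector
xs, ys = np.meshgrid(np.arange(10), np.arange(10), sparse=True)
# Full grids: two dense 10x10 arrays
Xf, Yf = np.meshgrid(np.arange(10), np.arange(10))

print(xs.shape, ys.shape)  # (1, 10) (10, 1)

# Broadcasting makes the sparse pair behave exactly like the full pair
assert np.array_equal(xs + ys, Xf + Yf)
```

For an n*n grid the sparse version stores 2n values instead of 2n², so with 8823 points it is two ~70 KB vectors instead of two ~0.6 GB arrays.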
Another option is to use smaller integers that can still represent the range:
np.meshgrid(np.arange(10).astype(np.int8), np.arange(10).astype(np.int8),
sparse=True, copy=False)
though as of numpy 1.9, using these smaller integers for indexing will be slower, since they are internally converted back to larger integers in small (np.setbufsize-sized) chunks.
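A short sketch of the trade-off: the int8 sparse grids use an eighth of the memory of the default integer dtype, and they still work for fancy indexing, just with the conversion overhead noted above:

```python
import numpy as np

# Sparse meshgrid with a small integer dtype
x8, y8 = np.meshgrid(np.arange(10, dtype=np.int8),
                     np.arange(10, dtype=np.int8),
                     sparse=True, copy=False)
print(x8.dtype, x8.nbytes)  # int8 grid: 10 bytes per axis vector

# Indexing a 10x10 array with the small-int grids reconstructs it in full
a = np.arange(100).reshape(10, 10)
assert np.array_equal(a[y8, x8], a)
```

The memory saving matters mostly for large grids; for a 10x10 example it is negligible, but the same pattern applies at 8823*8823.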