Concatenate two big numpy 2D arrays

隐身守侯 submitted on 2019-12-06 03:15:27

Unless there's something wrong with your NumPy build or your OS (both of which are unlikely), this is almost certainly a memory error.

For example, let's say all these values are float64. You've already allocated at least 18GB and 20GB for these two arrays, and now you're trying to allocate another 38GB for the concatenated array. But you only have, say, 64GB of RAM plus 2GB of swap, so there isn't enough room for another 38GB. On some platforms, this allocation will simply fail, and NumPy should catch that and raise a MemoryError. On other platforms, the allocation may succeed, but as soon as you actually touch all of that memory the process crashes or gets killed (see overcommit handling in Linux for an example). On still other platforms, the system will try to auto-expand swap, and if you're out of disk space the process crashes at that point.
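To make the arithmetic concrete, here is a minimal sketch that estimates how many bytes the two inputs and the concatenated result need. The helper name concat_bytes_needed and the shapes are hypothetical, chosen only for illustration; substitute the real shapes and dtype of X1 and X2.

```python
import numpy as np

def concat_bytes_needed(shape1, shape2, dtype=np.float64):
    """Return (bytes for X1, bytes for X2, bytes for the concatenated X)."""
    itemsize = np.dtype(dtype).itemsize  # 8 bytes for float64
    n1 = shape1[0] * shape1[1] * itemsize
    n2 = shape2[0] * shape2[1] * itemsize
    # The concatenated array holds every element of both inputs.
    return n1, n2, n1 + n2

# Hypothetical shapes, not the ones from the question.
n1, n2, n_out = concat_bytes_needed((1000, 200), (1000, 300))

# During the concatenation, both inputs and the output are alive at once.
peak = n1 + n2 + n_out
print(f"X1: {n1} B, X2: {n2} B, X: {n_out} B, peak: {peak} B")
```

If the peak figure is close to or above RAM plus swap, the concatenation is unlikely to succeed regardless of which NumPy function is used.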

Whatever the reason, if you can't fit X1, X2, and X into memory at the same time, what can you do instead?

  • Build X in the first place, and create X1 and X2 as sliced views of X, so that filling them fills X.
  • Write X1 and X2 out to disk, concatenate on disk, and read the result back in.
  • Send X1 and X2 to a subprocess that reads them iteratively and builds X and then continues the work.
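The first option can be sketched roughly as follows. This assumes the arrays are joined along axis 1 (side by side) and uses tiny hypothetical shapes and placeholder fill values; the point is only that X is allocated once and X1 and X2 are views into it, so no second copy is ever made.

```python
import numpy as np

# Hypothetical small shapes; the real arrays would be far larger.
n_rows, n1_cols, n2_cols = 4, 3, 2

# Allocate the final array once, instead of X1, X2, and then X.
X = np.empty((n_rows, n1_cols + n2_cols), dtype=np.float64)

# Basic slicing returns views: writing to X1 or X2 writes into X.
X1 = X[:, :n1_cols]
X2 = X[:, n1_cols:]

# Placeholders for whatever code actually produces the data.
X1[...] = 1.0
X2[...] = 2.0

# Confirm that no extra copies were made.
print(np.shares_memory(X1, X), np.shares_memory(X2, X))
```

If even X alone does not fit in RAM, the same slicing pattern should also work on a disk-backed array created with np.memmap, which is one way to approach the second option.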

I'm not an expert in NumPy, but why not use numpy.concatenate()?

http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html

For example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])