Question:
I am building a large data dictionary from a set of text files. As I read in the lines and process them, I append(dataline) to a list.
At some point the append() raises a MemoryError exception. However, watching the program run in the Windows Task Manager, at the point of the crash I see 4.3 GB available and 1.1 GB free.
Thus, I do not understand the reason for the exception.
The Python version is 2.6.6. I guess the only explanation is that it is not able to use more of the available RAM. If so, is it possible to increase the allocation?
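For reference, the pattern looks roughly like this (an illustrative sketch; the file names and the per-line processing are placeholders):

    # Sketch of the pattern described above; file names are placeholders.
    data = []
    for path in ["part1.txt", "part2.txt"]:
        with open(path) as f:
            for dataline in f:
                data.append(dataline)  # eventually raises MemoryError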
Answer 1:
If you're using a 32-bit build of Python, you might want to try a 64-bit version.
A process using 32-bit addresses can address at most 4 GB of RAM, and typically (depending on the OS) it gets considerably less. It sounds like your Python process may be hitting this limit; 64-bit addressing removes it.
Edit: Since you're asking about Windows, the following page is relevant: Memory Limits for Windows Releases. As you can see, the limit per 32-bit process is 2, 3, or 4 GB depending on the OS version and configuration.
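One quick way to check which build you're running (standard-library Python; works on 2.6 as well):

    import struct
    import sys

    # Pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build.
    print(struct.calcsize("P") * 8)

    # Equivalent check: sys.maxsize exceeds 32 bits only on a 64-bit build.
    print(sys.maxsize > 2**32)

If the first line prints 32, the 2-4 GB per-process cap above applies no matter how much RAM the machine has.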
Answer 2:
If you're open to restructuring the code instead of throwing more memory at it, you might be able to get by with this:
    data = (processraw(raw) for raw in lines)
where lines is either a list of lines, file.xreadlines(), or similar.
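To make the difference concrete, here is a sketch (processraw is the hypothetical processing function from the snippet above, and big.txt is a placeholder): a list comprehension materializes every processed line at once, while the generator expression yields them one at a time, so peak memory stays flat regardless of file size.

    # processraw is a stand-in for whatever per-line processing you do.
    def processraw(raw):
        return raw.strip().split("\t")

    with open("big.txt") as f:
        # Generator expression: no list of results is ever built in memory.
        data = (processraw(raw) for raw in f)
        for record in data:
            pass  # consume one processed line at a time

The trade-off is that a generator can only be iterated once; if you need random access or multiple passes, you still have to materialize a list.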
Answer 3:
I had a similar problem using a 32-bit version of Python in a 64-bit Windows environment. I tried the 64-bit Windows version of Python and very quickly ran into trouble with the SciPy libraries compiled for 64-bit Windows.
The totally free solution that I implemented was:
1) Install VirtualBox
2) Install CentOS 5.6 on the VM
3) Get the Enthought Python Distribution (free 64-bit Linux version)
Now all of my Python code that depends on NumPy, SciPy, and Matplotlib can use as much memory as I have RAM and available Linux swap.
Answer 4:
I had a similar problem when evaluating an expression containing large numpy arrays (actually, one was sparse). I was doing this on a machine with 64 GB of memory, of which only about 8 GB was in use, so I was surprised to get the MemoryError.
It turned out that my problem was array shape broadcasting: I had inadvertently duplicated a large dimension.
It went something like this:
- I had passed an array with shape (286577, 1) where I was expecting (286577,).
- This was subtracted from an array with shape (286577, 130).
- Because I was expecting (286577,), I applied [:, newaxis] in the expression to bring it to (286577, 1) so it would be broadcast to (286577, 130).
- When I passed shape (286577, 1), however, [:, newaxis] produced shape (286577, 1, 1), and the two arrays were both broadcast to shape (286577, 286577, 130)... of doubles. Each array of that shape holds about 10^13 values, roughly 85 TB at 8 bytes apiece, so the allocation never stood a chance!
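The same trap can be reproduced at a harmless size. A sketch with the shapes shrunk from 286577 and 130 down to 5 and 3:

    import numpy as np

    b = np.ones((5, 3))                    # stands in for the (286577, 130) array

    # Intended case: a 1-D input of shape (5,).
    good = np.ones(5)[:, np.newaxis] - b   # (5,) -> (5, 1), broadcasts to (5, 3)
    print(good.shape)                      # (5, 3)

    # Actual case: the input already had shape (5, 1).
    a = np.ones((5, 1))
    bad = a[:, np.newaxis] - b             # (5, 1) -> (5, 1, 1), broadcasts to (5, 5, 3)
    print(bad.shape)                       # (5, 5, 3) -- a spurious extra dimension

At the original sizes, that spurious extra dimension turns a result of roughly 300 MB into tens of terabytes.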
Answer 5:
As has already been mentioned, you'll need a 64-bit build of Python (on a 64-bit version of Windows).
Be aware that you'll probably face a lot of conflicts and problems with some of the basic packages you might want to work with. To avoid this problem I'd recommend Anaconda from Continuum Analytics. I'd advise you to look into it :)