Huge memory leak in repeated os.path.isdir calls?

Submitted by 折月煮酒 on 2019-11-30 16:27:49

Question


I've been scripting something that has to do with scanning directories and noticed a severe memory leak when calling os.path.isdir, so I've tried the following snippet:

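import os

def func():
    # Byte-string (non-unicode) path; the raw string keeps the backslash literal.
    if not os.path.isdir(r'D:\Downloads'):
        return False

# Call it in a tight loop and watch the process's memory usage grow.
while True:
    func()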

Within a few seconds, the Python process reached 100MB RAM.

I'm trying to figure out what's going on. It seems like the huge memory leak is in effect only when the path is indeed a valid directory path (meaning the 'return False' is not executed). Also, it is interesting to see what happens in related calls, like os.path.isfile.

Thoughts?

Edit: I think I'm onto something. Although isfile and isdir are implemented in the genericpath module, on Windows isdir is imported from the builtin nt module instead. So I had to download the 2.7.3 source (which I should've done a long time ago...).
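A quick way to check which implementation a given interpreter is using is to look at the function's __module__ attribute. This is only a sketch: defining_module is a hypothetical helper name, and the value printed depends on the platform and Python version (I would expect nt on Windows and genericpath elsewhere, but that is an assumption worth verifying on your own setup).

```python
import os.path


def defining_module(func):
    """Return the name of the module that defines func, or None if unknown."""
    return getattr(func, "__module__", None)


if __name__ == "__main__":
    # Compare the two calls mentioned in the question.
    for name in ("isdir", "isfile"):
        print(name, defining_module(getattr(os.path, name)))
```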

After a little bit of searching, I found the posix__isdir function in Modules\posixmodule.c, which I assume is the 'isdir' function imported from nt.

This part of the function (and comment) caught my eye:

    if (PyArg_ParseTuple(args, "U|:_isdir", &po)) {
        Py_UNICODE *wpath = PyUnicode_AS_UNICODE(po);

        attributes = GetFileAttributesW(wpath);
        if (attributes == INVALID_FILE_ATTRIBUTES)
            Py_RETURN_FALSE;
        goto check;
    }
    /* Drop the argument parsing error as narrow strings
       are also valid. */
    PyErr_Clear();

It seems that it all boils down to a Unicode/ASCII handling bug.

I've just tried my snippet above with the path argument as a unicode string (i.e. u'D:\Downloads') - no memory leak whatsoever. haha.
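Until the interpreter itself is fixed, one possible workaround is to make sure the path is always a unicode string before it reaches os.path.isdir, so the call takes the wide-character branch shown above. The sketch below assumes that byte-string paths can be decoded with the filesystem encoding; isdir_unicode is a hypothetical helper name, not part of the standard library.

```python
import os
import sys


def isdir_unicode(path):
    """Hypothetical helper: call os.path.isdir with a text (unicode) path."""
    if isinstance(path, bytes):
        # On Python 2, a plain 'str' literal is a byte string and lands here.
        # Assumption: the bytes are in the filesystem encoding.
        encoding = sys.getfilesystemencoding() or "utf-8"
        path = path.decode(encoding)
    return os.path.isdir(path)


if __name__ == "__main__":
    print(isdir_unicode(b"."))
```

On Python 3 the decode step only applies to explicit bytes paths, since ordinary string literals are already unicode there.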


Answer 1:


The root cause is that the non-Unicode code path never calls PyMem_Free on the path variable, so the buffer allocated for it leaks on every call:

    if (!PyArg_ParseTuple(args, "et:_isdir",
                          Py_FileSystemDefaultEncoding, &path))
        return NULL;

    attributes = GetFileAttributesA(path);
    if (attributes == INVALID_FILE_ATTRIBUTES)
        Py_RETURN_FALSE;

check:
    if (attributes & FILE_ATTRIBUTE_DIRECTORY)
        Py_RETURN_TRUE;
    else
        Py_RETURN_FALSE;

As per the documentation on PyArg_ParseTuple:

  • et: Same as es...
  • es: PyArg_ParseTuple() will allocate a buffer of the needed size, copy the encoded data into this buffer and adjust *buffer to reference the newly allocated storage. The caller is responsible for calling PyMem_Free() to free the allocated buffer after use.

It's a bug in Python's standard library (fixed in Python 3 by using bytes objects directly); file a bug report at http://bugs.python.org.



Source: https://stackoverflow.com/questions/12648737/huge-memory-leak-in-repeated-os-path-isdir-calls
