When would the Python tracemalloc module's allocation statistics not match what's shown in ps or pmap?

Question

I'm trying to track down a memory leak, so I've done

import tracemalloc
tracemalloc.start()

<function call>

# copy pasted this from documentation
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)

This shows no major allocations: every allocation it reports is small, yet ps and pmap show more than 8 GB allocated (I checked before and after running the command, and again after running garbage collection). Furthermore, tracemalloc.get_traced_memory confirms that tracemalloc is not seeing many allocations. pympler does not see the allocations either.

Does anyone know when this could be the case? Some modules use Cython; could this cause issues for tracemalloc?

In pmap the allocation looks like:

0000000002183000 6492008 6491876 6491876 rw--- [ anon ]

Answer 1:

From the documentation on tracemalloc:

The tracemalloc module is a debug tool to trace memory blocks allocated by Python.

In other words, memory that is not allocated through the Python interpreter's allocators is not seen by tracemalloc. This includes anything that does not go through the Python memory API at the C level (PyMem_RawMalloc, PyMem_Malloc, PyObject_Malloc and friends), such as standard libc malloc calls made by native libraries used via extensions, or extension code calling malloc directly.
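The difference can be reproduced without writing a C extension. The following is only a minimal sketch, assuming a POSIX system where ctypes can load the C library; the helper names (traced_growth, python_growth, native_growth) and the 50 MiB size are made up for illustration. It allocates the same amount of memory once through a Python object and once through libc malloc, and reports how much tracemalloc attributes to each.

```python
import ctypes
import ctypes.util
import tracemalloc

SIZE = 50 * 1024 * 1024  # 50 MiB, large enough to stand out

# Assumption: POSIX. find_library may return None, in which case
# CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def traced_growth(allocate, release):
    """Return how many bytes tracemalloc attributes to one allocation."""
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        handle = allocate()
        after, _peak = tracemalloc.get_traced_memory()
        release(handle)
        return after - before
    finally:
        tracemalloc.stop()


def python_growth():
    # bytearray storage comes from Python's allocator, so it is traced.
    return traced_growth(lambda: bytearray(SIZE), lambda buf: None)


def native_growth():
    # libc malloc bypasses Python's allocator, so tracemalloc does not see it.
    return traced_growth(lambda: libc.malloc(SIZE), libc.free)


if __name__ == "__main__":
    print("bytearray:  ", python_growth(), "bytes traced")
    print("libc malloc:", native_growth(), "bytes traced")
```

The bytearray case should report at least the full size, while the malloc case should report only the few small Python objects created by the ctypes call itself.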

Whether that is the case here is impossible to tell for certain without code to reproduce. You can try running the native code part outside of python through, for example, valgrind, to detect memory leaks in the native code.
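Before reaching for valgrind, it can help to measure how large the gap between the process size and the traced memory actually is. This is a rough sketch, assuming Linux (it reads /proc/self/statm); the function names are hypothetical. Some gap is always expected, since the interpreter itself, shared libraries and allocator overhead are not traced, so only a large or steadily growing "untraced" value points towards native allocations.

```python
import os
import tracemalloc


def resident_set_size():
    """Return this process's resident set size in bytes (Linux only)."""
    with open("/proc/self/statm") as statm:
        # Second field of statm is the number of resident pages.
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def memory_report():
    """Compare what the OS sees with what tracemalloc has traced."""
    traced, _peak = tracemalloc.get_traced_memory()
    rss = resident_set_size()
    return {"rss": rss, "traced": traced, "untraced": rss - traced}


if __name__ == "__main__":
    tracemalloc.start()
    # ... run the suspicious function call here ...
    print(memory_report())
```

Calling memory_report() before and after the suspicious call shows whether the growth lands in "traced" (Python objects) or in "untraced" (most likely native code).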

If there is Cython code calling malloc, it could be switched to PyMem_Malloc so that the allocations are traced.

Answer 2:

An addition to @danny's answer, because it is too long for a comment.

As explained in PEP 454, tracemalloc uses the functionality introduced in PEP 445 to track memory allocations.

Normally, one would have to use PyMem_RawMalloc instead of malloc in order to be able to use tracemalloc for a C extension. However, for quite some time now (the functions are public as of Python 3.7) it has also been possible to call PyTraceMalloc_Track and PyTraceMalloc_Untrack from pymem.h in addition to malloc, instead of replacing it with PyMem_RawMalloc.

This is, for example, what numpy does. To be able to wrap raw C pointers and take over their ownership, numpy uses malloc rather than the Python allocator, which is optimized for small objects (not the most crucial scenario for numpy), as can be seen here:

/*NUMPY_API
 * Allocates memory for array data.
 */
NPY_NO_EXPORT void *
PyDataMem_NEW(size_t size)
{
    void *result;

    result = malloc(size);
    if (_PyDataMem_eventhook != NULL) {
        NPY_ALLOW_C_API_DEF
        NPY_ALLOW_C_API
        if (_PyDataMem_eventhook != NULL) {
            (*_PyDataMem_eventhook)(NULL, result, size,
                                    _PyDataMem_eventhook_user_data);
        }
        NPY_DISABLE_C_API
    }
    PyTraceMalloc_Track(NPY_TRACE_DOMAIN, (npy_uintp)result, size);
    return result;
}

So basically, it is the responsibility of the C extension to report its memory allocations to the tracemalloc module. Conversely, tracemalloc cannot really be trusted to register all memory allocations.
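The first argument to PyTraceMalloc_Track in the numpy code above is a tracemalloc domain, so allocations reported this way are recorded under the extension's own domain rather than domain 0, which Python's own allocators use. From Python, a snapshot can be split by domain with tracemalloc.DomainFilter. The helpers below are a hypothetical sketch; to my knowledge numpy exposes its domain number as numpy.lib.tracemalloc_domain, but treat that as an assumption and check it against your numpy version.

```python
import tracemalloc


def traced_bytes_in_domain(snapshot, domain):
    """Sum the sizes of all traces in `snapshot` that belong to `domain`."""
    only_domain = snapshot.filter_traces(
        [tracemalloc.DomainFilter(inclusive=True, domain=domain)]
    )
    return sum(trace.size for trace in only_domain.traces)


def snapshot_after(allocate):
    """Run `allocate` under tracemalloc and return (snapshot, result)."""
    tracemalloc.start()
    try:
        result = allocate()  # returned so the caller keeps the memory alive
        return tracemalloc.take_snapshot(), result
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    snap, buf = snapshot_after(lambda: bytearray(10 * 1024 * 1024))
    # Domain 0 holds everything allocated through Python's own allocators.
    print("domain 0:", traced_bytes_in_domain(snap, 0), "bytes")
```

If an extension's domain shows nothing while the process keeps growing, the extension is probably not reporting its allocations at all.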

Source: https://stackoverflow.com/questions/50148554/when-would-the-python-tracemalloc-module-allocations-statistics-not-match-whats

Tags

python

memory-management

cython