Increasing memory limit in Python?

一曲冷凌霜 提交于 2019-12-30 08:14:39

问题


I am currently using a function making extremely long dictionaries (used to compare DNA strings) and sometimes I'm getting MemoryError. Is there a way to allot more memory to Python so it can deal with more data at once?


回答1:


Python doesn’t limit memory usage on your program. It will allocate as much memory as your program needs until your computer is out of memory. The most you can do is reduce the limit to a fixed upper cap. That can be done with the resource module, but it isn't what you're looking for.

You'd need to look at making your code more memory/performance friendly.




回答2:


If you use linux, You can try to Extend Memory with Swap - a simple way to run programs which require more memory than installed in the machine.

However, a better way is to update to program to handle the data in chunks if possible or to extend memory in the machine as there is a performance penalty using this method (using the slower disk device).




回答3:


Python has MomeoryError which is the limit of your System RAM util you've defined it manually with resource package.

Defining your class with slots makes the python interpreter know that the attributes/members of your class are fixed. And can lead to significant memory savings!

You can reduce dict creation by python interpreter by using __slot__ . This will tell interpreter to not create dict internally and reuse same variable.

If the memory consumed by your python processes will continue to grow with time. This seems to be a combination of:

  • How the C memory allocator in Python works. This is essentially memory fragmentation, because the allocation cannot call ‘free’ unless the entire memory chunk is unused. But the memory chunk usage is usually not perfectly aligned to the objects that you are creating and using.
  • Using a number of small string to compare data. A process called interning used internally but creating multiple small strings brings load on interpreter.

The best way is to create Worker Thread or single threaded pool to do your work and invalidate worker/kill to free up resources attached/used in worker thread.

Below code creates single thread worker :

__slot__ = ('dna1','dna2','lock','errorResultMap')
lock = threading.Lock()
errorResultMap = []
def process_dna_compare(dna1, dna2):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = {executor.submit(getDnaDict, lock, dna_key): dna_key for dna_key in dna1}
    '''max_workers=1 will create single threadpool'''
    dna_differences_map={}
    count = 0
    dna_processed = False;
    for future in concurrent.futures.as_completed(futures):
        result_dict = future.result()
        if result_dict :
            count += 1
            '''Do your processing XYZ here'''
    logger.info('Total dna keys processed ' + str(count))

def getDnaDict(lock,dna_key):
    '''process dna_key here and return item'''
    try:
        dataItem = item[0]
        return dataItem
    except:
        lock.acquire()
        errorResultMap.append({'dna_key_1': '', 'dna_key_2': dna_key_2, 'dna_key_3': dna_key_3,
                          'dna_key_4': 'No data for dna found'})
        lock.release()
        logger.error('Error in processing dna :'+ dna_key)
    pass

if __name__ == "__main__":
    dna1 = '''get data for dna1'''
    dna2 = '''get data for dna2'''
    process_dna_compare(dna1,dna2)
    if errorResultMap != []:
       ''' print or write to file the errorResultMap'''

Below code will help you understand memory usage : import objgraph import random import inspect

class Dna(object):
    def __init__(self):
        self.val = None
    def __str__(self):
        return "dna – val: {0}".format(self.val)

def f():
    l = []
    for i in range(3):
        dna = Dna()
        #print “id of dna: {0}”.format(id(dna))
        #print “dna is: {0}”.format(dna)
        l.append(dna)
    return l

def main():
    d = {}
    l = f()
    d['k'] = l
    print("list l has {0} objects of type Dna()".format(len(l)))
    objgraph.show_most_common_types()
    objgraph.show_backrefs(random.choice(objgraph.by_type('Dna')),
    filename="dna_refs.png")

    objgraph.show_refs(d, filename='myDna-image.png')

if __name__ == "__main__":
    main()

Output for memory usage :

list l has 3 objects of type Dna()
function                   2021
wrapper_descriptor         1072
dict                       998
method_descriptor          778
builtin_function_or_method 759
tuple                      667
weakref                    577
getset_descriptor          396
member_descriptor          296
type                       180

More read on slots please visit : https://elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/



来源:https://stackoverflow.com/questions/44508254/increasing-memory-limit-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!