Maximum size of pandas dataframe

南旧 2020-12-05 00:11

I'm trying to read in a somewhat large dataset using the pandas read_csv or read_stata functions, but I keep running into Memory Errors.

  • 2020-12-05 00:47

    I'm posting this answer as discussed in the comments. I've seen this question come up numerous times without an accepted answer.

    A MemoryError is intuitive: the process has run out of memory. But debugging it can be frustrating when you seem to have enough memory and the error still occurs.

    1) Check for code errors

    This may be a "dumb step", but that's why it's first. Make sure there are no infinite loops or operations you know will take a long time (like using something such as the os module to search your entire computer and write the output to an Excel file).

    2) Make your code more efficient

    This goes along with step 1. If something simple is taking a long time, there's usually a module or a better approach that is faster and more memory efficient. That's the beauty of Python and open-source languages in general!
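
As one concrete illustration of step 2 (my own example; the answer does not name a specific technique), pandas' read_csv can load only the columns you need via usecols and store them in narrower types via dtype. The sketch below assumes a small in-memory CSV with made-up column names standing in for the real file; actual savings depend on your data.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file on disk.
csv_text = "id,category,value,notes\n" + "".join(
    f"{i},{'abc'[i % 3]},{i * 0.5},some free text {i}\n" for i in range(10_000)
)

# Default read: pandas infers int64/float64 for numbers and object for strings.
default_df = pd.read_csv(io.StringIO(csv_text))

# Leaner read: skip a column we don't need and request smaller dtypes.
lean_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "category", "value"],
    dtype={"id": "int32", "category": "category", "value": "float32"},
)

# deep=True also counts the Python string objects in object-dtype columns.
default_bytes = default_df.memory_usage(deep=True).sum()
lean_bytes = lean_df.memory_usage(deep=True).sum()
print(f"default: {default_bytes:,} bytes, lean: {lean_bytes:,} bytes")
```

With a real file you would pass its path instead of io.StringIO(...). The category dtype tends to help most for string columns with few distinct values.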

    3) Check the total memory of the object

    Start by checking how much memory an object uses. There are a ton of threads on Stack Overflow about this, so you can search them. Popular answers are here and here.

    To find the size of an object in bytes, you can use sys.getsizeof():

    import sys
    # Replace OBJECT_NAME_HERE with the variable you want to measure.
    print(sys.getsizeof(OBJECT_NAME_HERE))
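
For a pandas DataFrame specifically, DataFrame.memory_usage(deep=True) gives a per-column breakdown. The sketch below builds a hypothetical DataFrame just to have something to measure. Keep in mind that for plain Python containers such as lists, sys.getsizeof() is shallow: it counts the container itself but not the objects it references.

```python
import sys

import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for the object you want to measure.
df = pd.DataFrame({
    "x": np.arange(100_000, dtype="int64"),
    "label": ["row"] * 100_000,
})

# Per-column byte counts; deep=True also counts the Python string objects
# referenced by object-dtype columns, which a shallow count would miss.
per_column = df.memory_usage(deep=True)
print(per_column)
print(f"total: {per_column.sum() / 1024**2:.2f} MiB")

# For a DataFrame, sys.getsizeof() goes through pandas' own __sizeof__,
# which is based on memory_usage(deep=True), so the numbers should be close.
print(f"sys.getsizeof: {sys.getsizeof(df) / 1024**2:.2f} MiB")
```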
    

    The error may occur before the object is even created, but if you read the CSV in chunks you can see how much memory each chunk uses.
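
Here is a minimal sketch of reading in chunks with read_csv's chunksize parameter and printing the deep memory usage of each chunk. The in-memory CSV and the chunk size of 10,000 rows are arbitrary assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice you would pass a file path.
csv_text = "a,b,c\n" + "".join(f"{i},{i * 2},text{i}\n" for i in range(25_000))

chunk_sizes = []  # deep memory usage (bytes) of each chunk
total_rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# rather than loading every row at once.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=10_000)
for i, chunk in enumerate(reader):
    nbytes = chunk.memory_usage(deep=True).sum()
    chunk_sizes.append(nbytes)
    total_rows += len(chunk)
    print(f"chunk {i}: {len(chunk)} rows, {nbytes / 1024**2:.2f} MiB")

print(f"{total_rows} rows read in {len(chunk_sizes)} chunks")
```

Multiplying the per-chunk size by the number of chunks gives a rough estimate of what loading the whole file at once would need.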

    4) Check the memory while running

    Sometimes you have enough memory for the finished object, but the function that builds it uses much more memory while it runs. Memory usage then spikes beyond the size of the finished object and the process fails. Monitoring memory in real time takes some effort, but it can be done. IPython handles this well with the memory_profiler magics; check their documentation.

    Use the code below to see the documentation directly in a Jupyter Notebook:

    %load_ext memory_profiler
    %mprun?
    %memit?
    

    Sample use:

    %load_ext memory_profiler
    def lol(x):
        return x
    %memit lol(500)
    #output --- peak memory: 48.31 MiB, increment: 0.00 MiB
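
If you are not working in IPython, the standard-library tracemalloc module offers a rough alternative for finding the peak memory during a call. This is a swapped-in technique, not the one the answer uses. As far as I know, recent NumPy versions report their array buffers to tracemalloc, but memory that C code allocates directly (for example, some of the CSV parser's internal buffers) is not traced, so treat the peak as a lower bound. The CSV data is hypothetical.

```python
import io
import tracemalloc

import pandas as pd

# Hypothetical input; substitute your real file path.
csv_text = "a,b\n" + "".join(f"{i},{i * 1.5}\n" for i in range(200_000))

tracemalloc.start()
df = pd.read_csv(io.StringIO(csv_text))
current, peak = tracemalloc.get_traced_memory()  # both values in bytes
tracemalloc.stop()

final_size = df.memory_usage(deep=True).sum()
print(f"final DataFrame: {final_size / 1024**2:.2f} MiB")
print(f"peak traced during read_csv: {peak / 1024**2:.2f} MiB")
```

A peak well above the final size is the runtime spike described in step 4.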
    

    If you need help with magic functions, this is a great post.

    5) This one could arguably come first, but check simple things like the bit version (32-bit vs. 64-bit Python)

    As in your case, simply switching the version of Python you were running solved the issue.
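
Assuming "bit version" means 32-bit versus 64-bit Python, here is a quick way to check which interpreter you are running. A 32-bit process can address only about 2 to 4 GiB regardless of how much RAM the machine has, which is one common reason for a MemoryError despite plenty of free memory.

```python
import platform
import struct
import sys

# Pointer size in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"Python {platform.python_version()} ({pointer_bits}-bit)")
if not is_64bit:
    print("32-bit interpreter: the process can address only a few GiB, "
          "so large DataFrames may raise MemoryError even with free RAM.")
```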

    Usually the above steps solve my issues.
