How to append rows in a pandas dataframe in a for loop?

前端 未结 5 1557
日久生厌
日久生厌 2020-11-29 16:43

I have the following for loop:

for i in links:
     data = urllib2.urlopen(str(i)).read()
     data = json.loads(data)
     data = pd.DataFrame(data.items())         


        
5条回答
  •  失恋的感觉
    2020-11-29 17:08

    Suppose your data looks like this:

    import pandas as pd
    import numpy as np
    
    np.random.seed(2015)
    df = pd.DataFrame([])
    for i in range(5):
        data = dict(zip(np.random.choice(10, replace=False, size=5),
                        np.random.randint(10, size=5)))
        data = pd.DataFrame(data.items())
        data = data.transpose()
        data.columns = data.iloc[0]
        data = data.drop(data.index[[0]])
        df = df.append(data)
    print('{}\n'.format(df))
    # 0   0   1   2   3   4   5   6   7   8   9
    # 1   6 NaN NaN   8   5 NaN NaN   7   0 NaN
    # 1 NaN   9   6 NaN   2 NaN   1 NaN NaN   2
    # 1 NaN   2   2   1   2 NaN   1 NaN NaN NaN
    # 1   6 NaN   6 NaN   4   4   0 NaN NaN NaN
    # 1 NaN   9 NaN   9 NaN   7   1   9 NaN NaN
    

    Then it could be replaced with

    np.random.seed(2015)
    data = []
    for i in range(5):
        data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                             np.random.randint(10, size=5))))
    df = pd.DataFrame(data)
    print(df)
    

    In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.

    Each call to df.append requires allocating space for a new DataFrame with one extra row, copying all the data from the original DataFrame into the new DataFrame, and then copying data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient. The time cost of copying grows quadratically with the number of rows. Not only is the call-DataFrame-once code easier to write, it's performance will be much better -- the time cost of copying grows linearly with the number of rows.

提交回复
热议问题