How to append rows in a pandas dataframe in a for loop?

日久生厌 2020-11-29 16:43

I have the following for loop:

for i in links:
     data = urllib2.urlopen(str(i)).read()
     data = json.loads(data)
     data = pd.DataFrame(data.items())         


        
5 answers
  • 2020-11-29 17:00

    I built the data frame inside a for loop with the help of a temporary, initially empty data frame. Each iteration of the loop creates a new data frame, which overwrites the one from the previous iteration.

    So the contents of each new data frame have to be moved into the empty data frame created beforehand. The .append method does this, as shown below:

    temp_df = pd.DataFrame() #Temporary empty dataframe
    for sent in Sentences:
        New_df = pd.DataFrame({'words': sent.words}) #Creates a new dataframe and contains tokenized words of input sentences
        temp_df = temp_df.append(New_df, ignore_index=True) #Moving the contents of newly created dataframe to the temporary dataframe
    

    Outside the for loop, you can copy the contents of the temporary data frame into the master data frame, then delete the temporary one if you no longer need it.
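
    Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. A minimal sketch of the same idea with pd.concat follows; the `sentences` list is a hypothetical stand-in for the tokenized input, and `frames` and `master_df` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the tokenized sentences used in the answer.
sentences = [["append", "rows"], ["in", "a", "loop"]]

frames = []  # one small DataFrame per loop iteration
for words in sentences:
    frames.append(pd.DataFrame({"words": words}))

# Concatenate once, after the loop, and renumber the index from 0.
master_df = pd.concat(frames, ignore_index=True)
print(master_df)
```

    Collecting the frames in a list and concatenating once also avoids re-copying the accumulated rows on every iteration.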

  • 2020-11-29 17:01

    A more compact and efficient way would perhaps be:

    cols = ['frame', 'count']
    N = 4
    dat = pd.DataFrame(columns = cols)
    for i in range(N):
        dat = dat.append({'frame': str(i), 'count': i}, ignore_index=True)
    

    The output would be:

    >>> dat
       frame count
    0     0     0
    1     1     1
    2     2     2
    3     3     3
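
    On pandas 2.0 and later, where DataFrame.append is no longer available, one possible translation of this loop is row enlargement with .loc. This is a sketch under that assumption, not the original author's code; it keeps the row-at-a-time shape of the loop and is not meant as the fastest option.

```python
import pandas as pd

cols = ["frame", "count"]
N = 4
dat = pd.DataFrame(columns=cols)
for i in range(N):
    # Assigning to a label that does not exist yet adds a new row.
    dat.loc[len(dat)] = [str(i), i]

print(dat)
```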
    
  • 2020-11-29 17:08

    Suppose your data looks like this:

    import pandas as pd
    import numpy as np
    
    np.random.seed(2015)
    df = pd.DataFrame([])
    for i in range(5):
        data = dict(zip(np.random.choice(10, replace=False, size=5),
                        np.random.randint(10, size=5)))
        data = pd.DataFrame(data.items())
        data = data.transpose()
        data.columns = data.iloc[0]
        data = data.drop(data.index[[0]])
        df = df.append(data)
    print('{}\n'.format(df))
    # 0   0   1   2   3   4   5   6   7   8   9
    # 1   6 NaN NaN   8   5 NaN NaN   7   0 NaN
    # 1 NaN   9   6 NaN   2 NaN   1 NaN NaN   2
    # 1 NaN   2   2   1   2 NaN   1 NaN NaN NaN
    # 1   6 NaN   6 NaN   4   4   0 NaN NaN NaN
    # 1 NaN   9 NaN   9 NaN   7   1   9 NaN NaN
    

    Then it could be replaced with

    np.random.seed(2015)
    data = []
    for i in range(5):
        data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                             np.random.randint(10, size=5))))
    df = pd.DataFrame(data)
    print(df)
    

    In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.

    Each call to df.append allocates space for a new DataFrame with one extra row, copies all the data from the original DataFrame into it, and then copies data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient: the time cost of copying grows quadratically with the number of rows. The call-DataFrame-once code is not only easier to write, its performance is also much better, because the time cost of copying grows only linearly with the number of rows.
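
    The quadratic-versus-linear claim can be checked with simple arithmetic. The sketch below only counts rows copied under the cost model described above (each append copies every existing row plus the new one); the function names are made up for illustration, and nothing here measures pandas itself.

```python
def rows_copied_appending_in_loop(n_rows):
    """Total rows copied when every append rebuilds the whole frame."""
    # Append number k (1-based) copies k - 1 existing rows plus 1 new row.
    return sum(range(1, n_rows + 1))


def rows_copied_building_once(n_rows):
    """Total rows copied when the frame is built from a list in one call."""
    return n_rows


for n in (10, 1_000, 100_000):
    print(n, rows_copied_appending_in_loop(n), rows_copied_building_once(n))
```

    Under this model the loop copies n(n + 1)/2 rows in total, while the build-once version copies n.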

  • 2020-11-29 17:12

    There are two reasons you might append rows in a loop: (1) to add to an existing df, and (2) to create a new df.

    To create a new df, I think it's well documented that you should either build your data as a list and then create the data frame:

    cols = ['c1', 'c2', 'c3']
    lst = []
    for a in range(2):
        lst.append([1, 2, 3])
    df1 = pd.DataFrame(lst, columns=cols)
    df1
    Out[3]: 
       c1  c2  c3
    0   1   2   3
    1   1   2   3
    

    Or create the dataframe with an index and then fill it in:

    cols = ['c1', 'c2', 'c3']
    df2 = pd.DataFrame(columns=cols, index=range(2))
    for a in range(2):
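        # Set each cell with one .loc[row, column] call; the chained form
        # df2.loc[a].c1 = 4 can end up writing to a temporary copy.
        df2.loc[a, 'c1'] = 4
        df2.loc[a, 'c2'] = 5
        df2.loc[a, 'c3'] = 6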
    df2
    Out[4]: 
      c1 c2 c3
    0  4  5  6
    1  4  5  6
    

    If you want to add to an existing dataframe, you could use either method above and then append the data frames together (with or without the index):

    df3 = df2.append(df1, ignore_index=True)
    df3
    Out[6]: 
      c1 c2 c3
    0  4  5  6
    1  4  5  6
    2  1  2  3
    3  1  2  3
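
    With pd.concat, which replaces DataFrame.append now that the latter has been removed (pandas 2.0), the two variants look roughly like this. The frames are rebuilt inline so the sketch runs on its own; `df3_keep_index` is an illustrative name.

```python
import pandas as pd

cols = ["c1", "c2", "c3"]
df1 = pd.DataFrame([[1, 2, 3], [1, 2, 3]], columns=cols)
df2 = pd.DataFrame([[4, 5, 6], [4, 5, 6]], columns=cols)

# Without the original index: rows are renumbered 0..3.
df3 = pd.concat([df2, df1], ignore_index=True)

# With the original index: labels 0, 1 appear twice.
df3_keep_index = pd.concat([df2, df1])

print(df3)
print(df3_keep_index)
```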
    

    Alternatively, create a list of dictionary entries and append those, as in the answer above:

    lst_dict = []
    for a in range(2):
        lst_dict.append({'c1':2, 'c2':2, 'c3': 3})
    df4 = df1.append(lst_dict)
    df4
    Out[7]: 
       c1  c2  c3
    0   1   2   3
    1   1   2   3
    0   2   2   3
    1   2   2   3
    

    Or build each dictionary with dict(zip(cols, vals)):

    lst_dict = []
    for a in range(2):
        vals = [7, 8, 9]
        lst_dict.append(dict(zip(cols, vals)))
    df5 = df1.append(lst_dict)
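
    A list of dictionaries cannot be passed to pd.concat directly, so one option on current pandas is to wrap the list in a DataFrame first. A sketch assuming the same cols and df1 as above, redefined here so that it is self-contained:

```python
import pandas as pd

cols = ["c1", "c2", "c3"]
df1 = pd.DataFrame([[1, 2, 3], [1, 2, 3]], columns=cols)

lst_dict = []
for _ in range(2):
    vals = [7, 8, 9]
    lst_dict.append(dict(zip(cols, vals)))  # {'c1': 7, 'c2': 8, 'c3': 9}

# Turn the collected rows into a frame, then join it to the existing one.
df5 = pd.concat([df1, pd.DataFrame(lst_dict)])
print(df5)
```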
    
  • 2020-11-29 17:17

    First, create an empty DataFrame with column names. Then, inside the for loop, define a dictionary (one row) with the data to append:

    df = pd.DataFrame(columns=['A'])
    for i in range(5):
        df = df.append({'A': i}, ignore_index=True)
    df
       A
    0  0
    1  1
    2  2
    3  3
    4  4
    

    If you want to add a row with more columns, the code looks like this:

    df = pd.DataFrame(columns=['A','B','C'])
    for i in range(5):
        df = df.append({'A': i, 'B': i * 2, 'C': i * 3},
                       ignore_index=True)
    df
       A  B   C
    0  0  0   0
    1  1  2   3
    2  2  4   6
    3  3  6   9
    4  4  8  12
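
    For reference, the same table can be built without DataFrame.append (removed in pandas 2.0) by collecting the row dictionaries first and constructing the frame once. A minimal sketch, where `rows` is an illustrative name:

```python
import pandas as pd

# One dictionary per row, gathered before any DataFrame exists.
rows = [{"A": i, "B": i * 2, "C": i * 3} for i in range(5)]

# Passing columns= fixes the column order explicitly.
df = pd.DataFrame(rows, columns=["A", "B", "C"])
print(df)
```

    Built this way, the columns come out with an integer dtype.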
    

