Add numpy array as column to Pandas data frame

离开以前 2020-12-02 16:30

I have a Pandas data frame object of shape (X,Y) that looks like this:

[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]

and a numpy sparse matrix (CSC) of

5 Answers
  • 2020-12-02 16:55
    import numpy as np, pandas as pd
    df = pd.DataFrame(np.arange(1,10).reshape(3,3))
    # pd.Series rejects 2D input, so store one row (as a list) per cell
    df['newcol'] = your_2d_numpy_array.tolist()
    
  • 2020-12-02 16:57

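    Consider using a higher-dimensional data structure (a Panel) rather than storing an array in each cell of a column. Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the code below only runs on older versions: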

    In [11]: p = pd.Panel({'df': df, 'csc': csc})
    
    In [12]: p.df
    Out[12]: 
       0  1  2
    0  1  2  3
    1  4  5  6
    2  7  8  9
    
    In [13]: p.csc
    Out[13]: 
       0  1  2
    0  0  1  0
    1  0  0  1
    2  1  0  0
    

    You can then take cross-sections and so on:

    In [14]: p.xs(0)
    Out[14]: 
       csc  df
    0    0   1
    1    1   2
    2    0   3
    

    See the docs for more on Panels.
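
On pandas versions without Panel, a similar layout can be sketched with `pd.concat` and a two-level column index. This is only one possible substitute, not the approach from the answer above; the dense `csc` frame is an assumption standing in for the sparse matrix, and the names `combined` and `cross` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))
# Dense stand-in for the sparse matrix (assumption: small enough to densify)
csc = pd.DataFrame(np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))

# Place both frames side by side under a two-level column index
combined = pd.concat({'df': df, 'csc': csc}, axis=1)

# Select one of the two tables by its top-level key
print(combined['csc'])

# Row 0 of both tables, reshaped to resemble the Panel cross-section
cross = combined.loc[0].unstack(level=0)
print(cross)
```

Selecting `combined['df']` or `combined['csc']` plays the role of `p.df` and `p.csc`, and `cross` has one column per table, much like `p.xs(0)`.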

  • 2020-12-02 17:06

    Here is another example:

    import numpy as np
    import pandas as pd
    
    # Build a list of tuples; each element of a tuple is an array
    a = [(np.random.randint(1, 10, 10), np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
         for i in range(0, 10)]
    
    # pandas places each tuple element in its own column, one array per cell
    df = pd.DataFrame(data=a, columns=['random_num', 'sequential_num'])
    

    The general trick is to arrange the data as a list of rows, a = [ (array_11, array_12,...,array_1n),...,(array_m1,array_m2,...,array_mn) ], and pandas will lay it out as m rows and n columns with one array per cell. Lists can be used instead of tuples, in which case the form is: a = [ [array_11, array_12,...,array_1n],...,[array_m1,array_m2,...,array_mn] ]

    This is the output if you print(df) from the code above:

                           random_num                  sequential_num
    0  [7, 9, 2, 2, 5, 3, 5, 3, 1, 4]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    1  [8, 7, 9, 8, 1, 2, 2, 6, 6, 3]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    2  [3, 4, 1, 2, 2, 1, 4, 2, 6, 1]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    3  [3, 1, 1, 1, 6, 2, 8, 6, 7, 9]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    4  [4, 2, 8, 5, 4, 1, 2, 2, 3, 3]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    5  [3, 2, 7, 4, 1, 5, 1, 4, 6, 3]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    6  [5, 7, 3, 9, 7, 8, 4, 1, 3, 1]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    7  [7, 4, 7, 6, 2, 6, 3, 2, 5, 6]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    8  [3, 1, 6, 3, 2, 1, 5, 2, 2, 9]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    9  [7, 2, 3, 9, 5, 5, 8, 6, 9, 8]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    

    Another variation of the example above, mixing scalars, strings, lists, and arrays:

    b = [ (i,"text",[14, 5,], np.array([0,1,2,3,4,5,6,7,8,9]))  for i in 
    range(0,10) ]
    df = pd.DataFrame(data=b,columns=['Number','Text','2Elemnt_array','10Element_array'])
    

    Output of df:

       Number  Text 2Elemnt_array                 10Element_array
    0       0  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    1       1  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    2       2  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    3       3  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    4       4  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    5       5  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    6       6  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    7       7  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    8       8  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    9       9  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    

    To add another column of arrays, assign a list with one entry per row:

    df['3Element_array'] = [[1, 2, 3] for _ in range(10)]
    

    The final output of df will be:

       Number  Text 2Elemnt_array                 10Element_array 3Element_array
    0       0  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    1       1  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    2       2  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    3       3  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    4       4  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    5       5  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    6       6  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    7       7  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    8       8  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    9       9  text       [14, 5]  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      [1, 2, 3]
    
  • 2020-12-02 17:06

    For ordinary (dense) numpy arrays, you can store the array in a DataFrame and retrieve it again as shown below. This builds on the previous answer, whose sparse-matrix part confused me because I only had a plain numpy array.

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'b':range(10)}) # target dataframe
    a = np.random.normal(size=(10,2)) # numpy array
    df['a']=a.tolist() # save array
    np.array(df['a'].tolist()) # retrieve array
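
As a quick check of the round trip above, the following sketch confirms that the restored array matches the original. The seeded generator and the name `restored` are assumptions added for reproducibility and are not part of the answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)           # seeded only so the run is repeatable
a = rng.normal(size=(10, 2))             # original 2D array
df = pd.DataFrame({'b': range(10)})
df['a'] = a.tolist()                     # one Python list of floats per row

restored = np.array(df['a'].tolist())    # rebuild the 2D array from the column

print(df['a'].dtype)                     # the column holds generic Python objects
print(restored.shape)
print(np.array_equal(a, restored))
```

Because the column has object dtype, vectorized numeric operations on it are not available directly; converting back with `np.array(...)` first is the usual workaround.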
    
  • 2020-12-02 17:11
    import numpy as np
    import pandas as pd
    import scipy.sparse as sparse
    
    df = pd.DataFrame(np.arange(1,10).reshape(3,3))
    arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
    df['newcol'] = arr.toarray().tolist()
    print(df)
    

    yields

       0  1  2     newcol
    0  1  2  3  [0, 1, 0]
    1  4  5  6  [0, 0, 1]
    2  7  8  9  [1, 0, 0]
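
If densifying the matrix is undesirable, one alternative sketch keeps the data sparse by giving each matrix column its own sparse-typed DataFrame column via `pd.DataFrame.sparse.from_spmatrix` (available in pandas 0.25 and later). This changes the layout from a single `newcol` to several columns; the `csc_*` column names and the variable `joined` are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sparse

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))
arr = sparse.coo_matrix(([1, 1, 1], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))

# One sparse-typed column per matrix column (names are illustrative only)
sparse_df = pd.DataFrame.sparse.from_spmatrix(
    arr, index=df.index, columns=['csc_0', 'csc_1', 'csc_2']
)

# Attach the sparse columns next to the existing dense ones
joined = df.join(sparse_df)
print(joined)
print(joined.dtypes)
```

The original three columns stay dense, while the `csc_*` columns store only their non-zero entries.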
    