How do I select an element in an array column of a data frame?

囚心锁ツ 2020-12-06 19:07

I have the following data frame:

pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})

I want to select the column 'a' and then on

2 Answers
  • 2020-12-06 19:30

    Storing lists as values in a Pandas DataFrame tends to be a mistake because it prevents you from taking advantage of fast NumPy or Pandas vectorized operations.

    Therefore, you might be better off converting your DataFrame of lists of numbers into a wider DataFrame with native NumPy dtypes:

    import numpy as np
    import pandas as pd
    
    # dtype=object is required for ragged nested lists in recent NumPy versions
    pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})
    df = pd.DataFrame(pa['a'].values.tolist())
    #      0    1    2
    # 0  1.0  4.0  NaN
    # 1  2.0  NaN  NaN
    # 2  3.0  4.0  5.0
    

    Now, you could select the first column like this:

    In [36]: df.iloc[:, 0]
    Out[36]: 
    0    1.0
    1    2.0
    2    3.0
    Name: 0, dtype: float64
    

    or the first row like this:

    In [37]: df.iloc[0, :]
    Out[37]: 
    0    1.0
    1    4.0
    2    NaN
    Name: 0, dtype: float64
    

    If you wish to drop NaNs, use .dropna():

    In [38]: df.iloc[0, :].dropna()
    Out[38]: 
    0    1.0
    1    4.0
    Name: 0, dtype: float64
    

    and .tolist() to retrieve the values as a list:

    In [39]: df.iloc[0, :].dropna().tolist()
    Out[39]: [1.0, 4.0]
    

    but if you wish to leverage NumPy/Pandas for speed, you'll want to express your calculation as vectorized operations on df itself without converting back to Python lists.
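
    As a rough illustration of what "vectorized operations on df itself" could look like, here is a minimal sketch. The reductions chosen (row sums, counts, and means) are only examples and are not part of the original question; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same ragged data as above; dtype=object keeps each list as a single cell.
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})
df = pd.DataFrame(pa['a'].tolist())  # wide frame, short rows padded with NaN

# Row-wise reductions skip the NaN padding by default (skipna=True).
row_sums = df.sum(axis=1)      # sum of each original list
row_counts = df.count(axis=1)  # number of non-NaN values, i.e. each list's length
row_means = df.mean(axis=1)    # mean of each original list

# Element-wise arithmetic applies to the whole frame at once; NaN stays NaN.
doubled = df * 2
```

    Because the padding is NaN rather than 0, reductions such as sum and mean behave as if each row still had its original length.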

  • 2020-12-06 19:36

    pa.loc[row] selects the row with label row.

    pa.loc[row, col] selects the cell (or cells) at the intersection of row and col.

    pa.loc[:, col] selects all rows and the column named col. Note that although this works, it is not the idiomatic way to refer to a column of a DataFrame. For that you should use pa['a'].
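
    The three forms of .loc described above can be sketched on the question's data as follows. The variable names are hypothetical, and the frame is assumed to have the default integer index 0, 1, 2.

```python
import numpy as np
import pandas as pd

# dtype=object keeps each ragged list as a single cell.
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})

first_row = pa.loc[0]      # Series for the row labelled 0 (one entry, for column 'a')
cell = pa.loc[0, 'a']      # the Python list stored at row 0, column 'a'
col_loc = pa.loc[:, 'a']   # column 'a' as a Series
col_idx = pa['a']          # the same column, written the usual way

# Ordinary list indexing then picks one element out of a single cell.
element = pa.loc[0, 'a'][1]
```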

    Now you have lists in the cells of your column so you can use the vectorized string methods to access the elements of those lists like so.

    pa['a'].str[0]   # first value of each list
    pa['a'].str[-1]  # last value of each list
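
    For reference, a small sketch of what these .str lookups return on the question's data, including a position that some lists do not have. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# dtype=object keeps each ragged list as a single cell.
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})

first = pa['a'].str[0]   # first value of each list
last = pa['a'].str[-1]   # last value of each list
third = pa['a'].str[2]   # lists shorter than three elements yield NaN

# Chaining a row label picks a single element:
# position 1 of every list, then the row labelled 0.
value = pa['a'].str[1][0]
```

    Positions that fall outside a list give NaN instead of raising an error, so the result always has one entry per row.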
    