Selecting last n columns and excluding last n columns in dataframe

深忆病人 2020-12-05 14:01

How do I:

  1. Select last 3 columns in a dataframe and create a new dataframe?

I tried:

y = dataframe.iloc[:,-3:]
3 Answers
  • 2020-12-05 14:37

    Just do:

    y = dataframe[dataframe.columns[-3:]]
    

    This slices the column labels, which you can then use to sub-select from the df.

    Example:

    In [221]:
    df = pd.DataFrame(columns=np.arange(10))
    df[df.columns[-3:]]
    
    Out[221]:
    Empty DataFrame
    Columns: [7, 8, 9]
    Index: []
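
    As a rough sketch of how this compares with the iloc attempt in the question: with unique column labels the two forms pick the same columns, but they can differ when labels repeat, because df[...] looks columns up by label. The frames and column names below are made up for illustration.

```python
import pandas as pd

# Unique labels: label-based and position-based selection agree.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "c", "d"])
by_label = df[df.columns[-3:]]
by_position = df.iloc[:, -3:]
same_result = by_label.equals(by_position)

# Duplicate labels (hypothetical frame): selecting by label pulls in
# every column named "a", so more than three columns come back.
dup = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "a", "c"])
dup_by_label = dup[dup.columns[-3:]]
dup_by_position = dup.iloc[:, -3:]

print(same_result)
print(dup_by_label.shape[1], dup_by_position.shape[1])
```

    If your column names might repeat, the positional form is probably the safer of the two.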
    

    I think the issue here is that, because you have taken a slice of the df, pandas may have returned a view rather than a copy, and depending on what the rest of your code does with it, a warning is raised (most likely SettingWithCopyWarning). You can make an explicit copy by calling .copy() to remove the warnings.

    So if we take a copy then assignment only affects the copy and not the original df:

    In [15]:
    df = pd.DataFrame(np.random.randn(5,10), columns= np.arange(10))
    df
    
    Out[15]:
              0         1         2         3         4         5         6  \
    0  0.568284 -1.488447  0.970365 -1.406463 -0.413750 -0.934892 -1.421308   
    1  1.186414 -0.417366 -1.007509 -1.620530 -1.322004  0.294540  1.205115   
    2 -1.073894 -0.214972  1.516563 -0.705571  0.068666  1.690654 -0.252485   
    3  0.923524 -0.856752  0.226294 -0.660085  1.259145  0.400596  0.559028   
    4  0.259807  0.135300  1.130347 -0.317305 -1.031875  0.232262  0.709244   
    
              7         8         9  
    0  1.741925 -0.475619 -0.525770  
    1  2.137546  0.215665  1.908362  
    2  1.180281 -0.144652  0.870887  
    3 -0.609804 -0.833186 -1.033656  
    4  0.480943  1.971933  1.928037  
    
    In [16]:    
    y = df[df.columns[-3:]].copy()
    y
    
    Out[16]:
              7         8         9
    0  1.741925 -0.475619 -0.525770
    1  2.137546  0.215665  1.908362
    2  1.180281 -0.144652  0.870887
    3 -0.609804 -0.833186 -1.033656
    4  0.480943  1.971933  1.928037
    
    In [17]:    
    y[y>0] = 0
    print(y)
    df
    
              7         8         9
    0  0.000000 -0.475619 -0.525770
    1  0.000000  0.000000  0.000000
    2  0.000000 -0.144652  0.000000
    3 -0.609804 -0.833186 -1.033656
    4  0.000000  0.000000  0.000000
    Out[17]:
              0         1         2         3         4         5         6  \
    0  0.568284 -1.488447  0.970365 -1.406463 -0.413750 -0.934892 -1.421308   
    1  1.186414 -0.417366 -1.007509 -1.620530 -1.322004  0.294540  1.205115   
    2 -1.073894 -0.214972  1.516563 -0.705571  0.068666  1.690654 -0.252485   
    3  0.923524 -0.856752  0.226294 -0.660085  1.259145  0.400596  0.559028   
    4  0.259807  0.135300  1.130347 -0.317305 -1.031875  0.232262  0.709244   
    
              7         8         9  
    0  1.741925 -0.475619 -0.525770  
    1  2.137546  0.215665  1.908362  
    2  1.180281 -0.144652  0.870887  
    3 -0.609804 -0.833186 -1.033656  
    4  0.480943  1.971933  1.928037  
    

    Here no warning is raised and the original df is untouched.

  • 2020-12-05 14:44

    This happens because the index holds integers: ix then treats -3 as a label rather than a position. This is by design; see "integer indexing" in the pandas "gotchas"*.

    *In newer versions of pandas prefer loc or iloc to remove the ambiguity of ix as position or label:

    df.iloc[-3:]

    See the docs.

    As Wes points out, in this specific case you should just use tail!
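
    A minimal sketch of the tail suggestion, using a made-up frame: tail(3) and iloc[-3:] both select the last three rows, whereas the last three columns (what the question asks for) need the slice on the second axis.

```python
import pandas as pd

# Hypothetical 5x4 frame, just for illustration.
df = pd.DataFrame({
    "w": range(5),
    "x": range(5, 10),
    "y": range(10, 15),
    "z": range(15, 20),
})

last_rows_tail = df.tail(3)       # last three rows
last_rows_iloc = df.iloc[-3:]     # same rows, selected by position
last_cols_iloc = df.iloc[:, -3:]  # last three columns instead

print(last_rows_tail.equals(last_rows_iloc))
print(list(last_rows_tail.index))
print(list(last_cols_iloc.columns))
```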

    It should also be noted that in Pandas pre-0.14 iloc will raise an IndexError on an out-of-bounds access, while .head() and .tail() will not:

    >>> pd.__version__
    '0.12.0'
    >>> df = pd.DataFrame([{"a": 1}, {"a": 2}])
    >>> df.iloc[-5:]
    ...
    IndexError: out-of-bounds on slice (end)
    >>> df.tail(5)
       a
    0  1
    1  2

    Old answer (deprecated method):

    You can use the irow DataFrame method to overcome this ambiguity:

    In [11]: df1.irow(slice(-3, None))
    Out[11]:
        STK_ID  RPT_Date  TClose   sales  discount
    8      568  20080331   38.75  12.668       NaN
    9      568  20080630   30.09  21.102       NaN
    10     568  20080930   26.00  30.769       NaN

    Note: Series has a similar iget method.

  • 2020-12-05 14:46

    The most efficient way:

    1. Select last n columns

    df1 = df.iloc[:,-n:]

    2. Exclude last n columns

    df1 = df.iloc[:,:-n]
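
    One thing to watch with both slices is n = 0: since -0 is just 0, df.iloc[:,-0:] returns every column and df.iloc[:,:-0] returns none. A possible workaround, assuming n is a non-negative integer, is to compute the split point from the column count. The helper names below are hypothetical, not part of pandas.

```python
import pandas as pd


def last_n_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy of the last n columns (no columns when n == 0)."""
    start = max(df.shape[1] - n, 0)  # clamp so n > column count keeps everything
    return df.iloc[:, start:].copy()


def drop_last_n_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy without the last n columns (all columns when n == 0)."""
    stop = max(df.shape[1] - n, 0)
    return df.iloc[:, :stop].copy()


# Made-up frame for illustration.
df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], columns=list("abcde"))

tail_cols = last_n_columns(df, 3)
head_cols = drop_last_n_columns(df, 3)
plain_slice_zero = df.iloc[:, -0:]   # plain slice with n = 0
helper_zero = last_n_columns(df, 0)  # helper with n = 0

print(list(tail_cols.columns))
print(list(head_cols.columns))
print(plain_slice_zero.shape[1], helper_zero.shape[1])
```

    The .copy() calls are optional; they are there so that later assignments to the result do not touch the original df.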
