Convert Select Columns in Pandas Dataframe to Numpy Array

南方客 2020-12-13 02:05

I would like to convert everything but the first column of a pandas dataframe into a numpy array. For some reason using the columns= parameter of DataFram

6 answers
  • 2020-12-13 02:07

    The columns parameter accepts a collection of column names. You're passing a list containing a dataframe with two rows:

    >>> [df[1:]]
    [  viz  a1_count  a1_mean  a1_std
    1   n         0      NaN     NaN
    2   n         2       51      50]
    >>> df.as_matrix(columns=[df[1:]])
    array([[ nan,  nan],
           [ nan,  nan],
           [ nan,  nan]])
    

    Instead, pass the column names you want:

    >>> df.columns[1:]
    Index(['a1_count', 'a1_mean', 'a1_std'], dtype='object')
    >>> df.as_matrix(columns=df.columns[1:])
    array([[  3.      ,   2.      ,   0.816497],
           [  0.      ,        nan,        nan],
           [  2.      ,  51.      ,  50.      ]])
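
On pandas versions where as_matrix is no longer available, the same idea (pass column labels, not a sliced dataframe) can be sketched with to_numpy(). The frame below is rebuilt from the values shown in this thread, with NaN written as np.nan; treat it as an illustration, not the asker's exact data.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the values shown in the thread (an assumption:
# missing entries are real NaN values, not strings).
df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2.0, np.nan, 51.0],
    "a1_std": [0.816497, np.nan, 50.0],
})

# Select by column labels: everything except the first column.
wanted = df.columns[1:]
arr = df[wanted].to_numpy()

print(arr.shape)  # (3, 3)
```

Because the remaining columns are all numeric, the result here is a float array.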
    
  • 2020-12-13 02:21

    The recommended way to convert to a NumPy array is '.to_numpy(dtype=None, copy=False)', which is new in pandas 0.24.0.

    You can also use '.array' on a Series or Index, though it returns a pandas ExtensionArray rather than a NumPy ndarray.

    '.as_matrix' has been deprecated since pandas 0.23.0 and was removed in pandas 1.0.
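
As a rough illustration of the dtype and copy parameters mentioned above, the sketch below converts all but the first column of a small frame. The frame is rebuilt from the values shown in this thread and is only an assumption about the asker's data.

```python
import numpy as np
import pandas as pd

# Sample frame with the thread's values; NaN entries are real np.nan here.
df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2.0, np.nan, 51.0],
    "a1_std": [0.816497, np.nan, 50.0],
})

# dtype forces a single element type for the result;
# copy=True guarantees the array does not share memory with the frame.
arr = df.iloc[:, 1:].to_numpy(dtype="float64", copy=True)

# Writing to the copy leaves the DataFrame unchanged.
arr[0, 0] = -1.0
```

Requesting a copy is the safer choice when the array will be modified afterwards.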

  • 2020-12-13 02:21

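    A short way is to select the columns by position and call .as_matrix() (deprecated since pandas 0.23.0; on newer versions .to_numpy() goes in the same place). One short line: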

    df.iloc[:,[1,2,3]].as_matrix()
    

    Gives:

    array([[3, 2, 0.816497],
           [0, 'NaN', 'NaN'],
           [2, 51, 50.0]], dtype=object)
    

    By using indices of the columns, you can use this code for any dataframe with different column names.

    Here are the steps for your example:

    import pandas as pd
    columns = ['viz', 'a1_count', 'a1_mean', 'a1_std']
    index = [0,1,2]
    vals = {'viz': ['n','n','n'], 'a1_count': [3,0,2], 'a1_mean': [2,'NaN', 51], 'a1_std': [0.816497, 'NaN', 50.000000]}
    df = pd.DataFrame(vals, columns=columns, index=index)
    

    Gives:

       viz  a1_count a1_mean    a1_std
    0   n         3       2  0.816497
    1   n         0     NaN       NaN
    2   n         2      51        50
    

    Then:

    x1 = df.iloc[:,[1,2,3]].as_matrix()
    

    Gives:

    array([[3, 2, 0.816497],
           [0, 'NaN', 'NaN'],
           [2, 51, 50.0]], dtype=object)
    

    Where x1 is numpy.ndarray.
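
The dtype=object in the output above comes from the 'NaN' strings in the sample data. If a float array is wanted, one possible approach (a sketch, not part of the original answer) is to coerce the selected columns with pd.to_numeric before converting; to_numpy() stands in for as_matrix() here.

```python
import numpy as np
import pandas as pd

vals = {
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2, "NaN", 51],           # string placeholders, as in the answer
    "a1_std": [0.816497, "NaN", 50.0],
}
df = pd.DataFrame(vals)

# Columns that contain strings force the whole result to dtype=object.
obj_arr = df.iloc[:, 1:].to_numpy()

# Coerce each selected column to numbers; unparseable entries become NaN.
numeric = df.iloc[:, 1:].apply(pd.to_numeric, errors="coerce")
float_arr = numeric.to_numpy()

print(obj_arr.dtype, float_arr.dtype)
```

Building the frame with np.nan from the start avoids the coercion step altogether.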

  • 2020-12-13 02:27

    The easy way is the "values" property: df.iloc[:, 1:].values

    a=df.iloc[:,1:]
    b=df.iloc[:,1:].values
    
    print(type(df))
    print(type(a))
    print(type(b))
    

    This prints the following types:

    <class 'pandas.core.frame.DataFrame'>
    <class 'pandas.core.frame.DataFrame'>
    <class 'numpy.ndarray'>
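
For comparison, here is a minimal sketch on a small made-up frame (the column names A, B, C are illustrative) showing that the "values" property and the to_numpy() method give the same array for plain numeric columns.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the column names are arbitrary.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

sub = df.iloc[:, 1:]             # still a DataFrame
via_values = sub.values          # attribute access
via_to_numpy = sub.to_numpy()    # method, available since pandas 0.24

# Both are NumPy arrays with identical contents.
same = np.array_equal(via_values, via_to_numpy)
print(same)  # True
```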
    
  • 2020-12-13 02:27

    Hope this easy one-liner helps:

    cols_as_np = df[df.columns[1:]].to_numpy()
    
  • 2020-12-13 02:29

    Please use the pandas to_numpy() method. Below is an example:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
    >>> df
       A  B  C
    0  1  3  5
    1  2  4  6
    >>> s_array = df[["A", "B", "C"]].to_numpy()
    >>> s_array
    array([[1, 3, 5],
           [2, 4, 6]])
    >>> t_array = df[["B", "C"]].to_numpy()
    >>> print(t_array)
    [[3 5]
     [4 6]]
    

    Hope this helps. You can select any number of columns by passing a list of their names:

    columns = ['col1', 'col2', 'col3']
    df1 = df[columns]
    

    Then apply the to_numpy() method to the selection, i.e. df1.to_numpy().
