Convert pandas dataframe to NumPy array

别那么骄傲 2020-11-21 23:57

I would like to know how to convert a pandas DataFrame into a NumPy array.

dataframe:

import numpy as np
import pandas as pd

index = [1, 2, 3,         


        
15 Answers
  •  醉梦人生
    2020-11-22 00:46

    Here is my approach to making a structured array from a pandas DataFrame.

    Create the DataFrame:

    import pandas as pd
    import numpy as np
    import six
    
    NaN = float('nan')
    ID = [1, 2, 3, 4, 5, 6, 7]
    A = [NaN, NaN, NaN, 0.1, 0.1, 0.1, 0.1]
    B = [0.2, NaN, 0.2, 0.2, 0.2, NaN, NaN]
    C = [NaN, 0.5, 0.5, NaN, 0.5, 0.5, NaN]
    columns = {'A':A, 'B':B, 'C':C}
    df = pd.DataFrame(columns, index=ID)
    df.index.name = 'ID'
    print(df)
    
          A    B    C
    ID               
    1   NaN  0.2  NaN
    2   NaN  NaN  0.5
    3   NaN  0.2  0.5
    4   0.1  0.2  NaN
    5   0.1  0.2  0.5
    6   0.1  NaN  0.5
    7   0.1  NaN  NaN
    

    Define a function that makes a NumPy structured array (not a record array) from a pandas DataFrame:

    def df_to_sarray(df):
        """
        Convert a pandas DataFrame object to a numpy structured array.
        This is functionally equivalent to but more efficient than
        np.array(df.to_records(index=False))
    
        :param df: the data frame to convert
        :return: a numpy structured array representation of df
        """
    
        v = df.values
        cols = df.columns
    
        # Field names must be byte strings on Python 2 and str on Python 3.
        if six.PY2:
            types = [(k.encode(), df[k].dtype.type) for k in cols]
        else:
            types = [(k, df[k].dtype.type) for k in cols]
        dtype = np.dtype(types)
        # Allocate one record per row, then fill the array field by field.
        z = np.zeros(v.shape[0], dtype)
        for (i, k) in enumerate(z.dtype.names):
            z[k] = v[:, i]
        return z
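
For comparison, here is a minimal sketch of the built-in route, pandas' `DataFrame.to_records`, which returns a record array rather than a plain structured array. The two-row frame is a made-up stand-in for the data above, and using `np.asarray` to drop the record-array subclass is one possible way to do it, not necessarily what the answer's author had in mind.

```python
import numpy as np
import pandas as pd

nan = float("nan")
# Small illustrative frame (hypothetical data, same layout as the answer's df).
df = pd.DataFrame(
    {"A": [nan, 0.1], "B": [0.2, nan]},
    index=pd.Index([1, 2], name="ID"),
)

# to_records returns a numpy record array; with index=True (the default)
# the index becomes the first field, named after the index.
rec = df.to_records(index=True)
print(type(rec).__name__)
print(rec.dtype.names)
print(rec.A)  # record arrays allow attribute-style field access

# np.asarray drops the recarray subclass; fields are then accessed by key.
sarr = np.asarray(rec)
print(type(sarr).__name__)
print(sarr["B"])
```

The practical difference is mostly in access style: a record array accepts `rec.A`, while a plain structured array only accepts `sarr["A"]`.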
    

    Use reset_index to make a new DataFrame that includes the index as part of its data, then convert that DataFrame to a structured array.

    sa = df_to_sarray(df.reset_index())
    sa
    
    array([(1L, nan, 0.2, nan), (2L, nan, nan, 0.5), (3L, nan, 0.2, 0.5),
           (4L, 0.1, 0.2, nan), (5L, 0.1, 0.2, 0.5), (6L, 0.1, nan, 0.5),
           (7L, 0.1, nan, nan)], 
          dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
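
Once the data is in a structured array, each column is reachable by field name. The sketch below rebuilds an array like `sa` by hand so that it runs on its own; the int64/float64 field types are an assumption based on the data shown, and the variable names are illustrative.

```python
import numpy as np

nan = float("nan")
# Hand-built stand-in for `sa` (field dtypes assumed: int64 ID, float64 A/B/C).
sa = np.array(
    [(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
     (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
     (7, 0.1, nan, nan)],
    dtype=[("ID", "<i8"), ("A", "<f8"), ("B", "<f8"), ("C", "<f8")],
)

ids = sa["ID"]                    # one field as an ordinary 1-D array
mean_a = np.nanmean(sa["A"])      # NaN-aware mean over field A
has_b = sa[~np.isnan(sa["B"])]    # keep only records where B is not NaN
first = sa[0]                     # a single record, indexable by field name

print(ids)
print(mean_a)
print(has_b["ID"])
print(first["B"])
```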

    EDIT: Updated df_to_sarray to avoid an error when calling .encode() under Python 3. Thanks to Joseph Garvin and halcyon for their comments and solution.
