DataFrame into NumPy array with comma-separated values

-上瘾入骨i 2021-01-14 21:53

The Scenario

I've read a CSV file (which is tab-separated) into a DataFrame, and I now need it as a NumPy array for clustering, without changing the column types.
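
For context, the setup described above might look like the following minimal sketch. The column names and rows are hypothetical stand-ins borrowed from the answer's sample output, and an in-memory string replaces the real file, which the question does not show.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for the asker's file.
tsv_text = "uid\tiid\trat\n196\t242\t3.0\n186\t302\t3.0\n22\t377\t1.0\n"

# sep="\t" tells read_csv that the fields are tab-delimited.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Inspect the inferred column types before converting to an array.
print(df.dtypes)
```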

3 Answers
  •  温柔的废话
    2021-01-14 22:56

    Use label-based selection and the .values attribute of the resulting pandas object, which returns a NumPy array (in pandas 0.24 and later, .to_numpy() is the recommended equivalent):

    >>> df
       uid  iid  rat
    0  196  242  3.0
    1  186  302  3.0
    2   22  377  1.0
    >>> df.loc[:,['iid','rat']]
       iid  rat
    0  242  3.0
    1  302  3.0
    2  377  1.0
    >>> df.loc[:,['iid','rat']].values
    array([[ 242.,    3.],
           [ 302.,    3.],
           [ 377.,    1.]])
    

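    Note that your integer column gets promoted to float: a 2-D NumPy array has a single dtype, so selecting an integer column together with a float column yields a float array.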
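
If the promotion matters for the "without changing type" requirement, one possible workaround is a record array, which keeps a separate dtype per column. This is only a sketch under the assumption that the sample frame above is representative; whether a clustering library accepts a record array is a separate question, and most expect a plain float matrix anyway.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the one printed in this answer.
df = pd.DataFrame(
    {"uid": [196, 186, 22], "iid": [242, 302, 377], "rat": [3.0, 3.0, 1.0]}
)

# Mixed int/float columns collapse to one common float dtype.
mixed = df[["iid", "rat"]].to_numpy()
print(mixed.dtype)

# A record array keeps one dtype per field instead of a single shared one.
records = df[["iid", "rat"]].to_records(index=False)
print(records.dtype)
print(records["iid"])
```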

    Also note, this particular selection could be approached in different ways:

    >>> df.iloc[:, 1:] # integer-position based
       iid  rat
    0  242  3.0
    1  302  3.0
    2  377  1.0
    >>> df[['iid','rat']] # plain indexing performs column-based selection
       iid  rat
    0  242  3.0
    1  302  3.0
    2  377  1.0
    

    I like label-based because it is more explicit.
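
As a quick sanity check, the three selection styles can be compared directly. The frame below is a reconstruction of the sample data, so treat it as illustrative rather than the asker's actual file.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the one printed in this answer.
df = pd.DataFrame(
    {"uid": [196, 186, 22], "iid": [242, 302, 377], "rat": [3.0, 3.0, 1.0]}
)

by_label = df.loc[:, ["iid", "rat"]].values   # label-based
by_position = df.iloc[:, 1:].values           # integer-position based
by_plain = df[["iid", "rat"]].values          # plain column indexing

# All three selections produce the same underlying array.
same = np.array_equal(by_label, by_position) and np.array_equal(by_label, by_plain)
print(same)
```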

    Edit

    The reason you aren't seeing commas is an artifact of how numpy arrays are printed:

    >>> df[['iid','rat']].values
    array([[ 242.,    3.],
           [ 302.,    3.],
           [ 377.,    1.]])
    >>> print(df[['iid','rat']].values)
    [[ 242.    3.]
     [ 302.    3.]
     [ 377.    1.]]
    

    And actually, it is the difference between the str and repr results of the numpy array:

    >>> print(repr(df[['iid','rat']].values))
    array([[ 242.,    3.],
           [ 302.,    3.],
           [ 377.,    1.]])
    >>> print(str(df[['iid','rat']].values))
    [[ 242.    3.]
     [ 302.    3.]
     [ 377.    1.]]
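
If what you actually want is comma-separated text, for example to write out or paste elsewhere, np.array2string accepts a separator argument. The sketch below rebuilds the array by hand as an assumption; the commas exist only in the string form, never in the array itself.

```python
import numpy as np

# Array equivalent to df[['iid', 'rat']].values from the sample data.
arr = np.array([[242.0, 3.0], [302.0, 3.0], [377.0, 1.0]])

as_str = str(arr)    # what print(arr) shows: no commas
as_repr = repr(arr)  # what the interactive prompt shows: commas

# Ask explicitly for a comma separator in the string form.
with_commas = np.array2string(arr, separator=", ")
print(with_commas)

# For plain nested Python lists (which also print with commas):
as_lists = arr.tolist()
print(as_lists)
```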
    
