Mapping rows of a Pandas dataframe to numpy array

Posted by 柔情痞子 on 2021-02-08 08:00:46

Question


Sorry, I know there are so many questions relating to indexing, and it's probably staring me in the face, but I'm having a little trouble with this. I am familiar with .loc, .iloc, and .index methods and slicing in general. The method .reset_index may not have been (and may not be able to be) called on our dataframe, and therefore index labels may not be in order. The dataframe and numpy array(s) are actually different-length subsets of the dataframe, but for this example I'll keep them the same size (I can handle offsetting once I have an example).

Here is a picture that shows what I'm looking for:

I can pull selected columns of rows from the dataframe based on some search criteria.

idxlbls = df.index[df['timestamp'] == dt]
stuff = df.loc[idxlbls, 'col3':'col5']

But how do I map that to row number (array indices, not label indices) to be used as an array index in numpy (assuming same row length)?

stuffprime = array[?, ?]

The reason I need it is that the dataframe is much larger and more complete and contains the columns used as search criteria, but the numpy arrays are subsets that have been extracted and modified earlier in the pipeline (and do not contain those search columns). I need to search the dataframe and pull the equivalent data from the numpy arrays. Basically, I need to correlate specific rows of a dataframe with the corresponding rows of a numpy array.


Answer 1:


I believe you need get_indexer to get positions from the filtered column names. For the index you can use it the same way, or use numpy.where to get row positions from the boolean mask:

import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp':list('abadef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4]}, index=list('ABCDEF'))

print (df)
  timestamp  B  C  D  E
A         a  4  7  1  5
B         b  5  8  3  3
C         a  4  9  5  6
D         d  5  4  7  9
E         e  5  2  1  2
F         f  4  3  0  4

idxlbls = df.index[df['timestamp'] == 'a']
stuff = df.loc[idxlbls, 'C':'E']
print (stuff)
   C  D  E
A  7  1  5
C  9  5  6

a = df.index.get_indexer(stuff.index)

Or get positions by boolean mask:

a = np.where(df['timestamp'] == 'a')[0]

print (a)
[0 2]

b = df.columns.get_indexer(stuff.columns)
print (b)
[2 3 4]
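
To close the loop on the question's array[?, ?], the positions in a and b can be combined to index the NumPy array. The following is a minimal sketch, assuming the array (called arr here, a hypothetical name) has the same row and column order as the dataframe. np.ix_ is used because passing the two position arrays directly would pair them element by element rather than select the row/column cross product.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))

# Assumption: arr stands in for the array produced earlier in the pipeline
# and has the same row and column order as df.
arr = df.to_numpy()

# Row positions of the matching rows (independent of the index labels).
a = np.where((df['timestamp'] == 'a').to_numpy())[0]

# Column positions of 'C', 'D' and 'E'.
b = df.columns.get_indexer(['C', 'D', 'E'])

# np.ix_ builds an open mesh so that every row in a is combined
# with every column in b.
stuffprime = arr[np.ix_(a, b)]
print(stuffprime)
```

Note that get_indexer expects a unique index; if the labels may repeat, the boolean-mask route with np.where is likely the safer of the two.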



Answer 2:


I would map pandas indices to numpy indices:

keys_dict = dict(zip(idxlbls, range(len(idxlbls))))

Then you may use the dictionary keys_dict to address the array elements by a pandas index: array[keys_dict[some_df_index], :]
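
One caveat worth spelling out: built from idxlbls, the dictionary maps each label to its position within the filtered subset (0, 1, ...), not within the dataframe. If the array rows line up with the full dataframe, as the question assumes, building the dictionary from df.index is probably what is wanted. A minimal sketch under that assumption (arr is a hypothetical stand-in for the array, and the index labels are assumed to be unique):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': list('abadef'),
                   'C': [7, 8, 9, 4, 2, 3]}, index=list('ABCDEF'))

# Hypothetical array with one row per dataframe row, in the same order.
arr = df[['C']].to_numpy()

# Map every index label to its row position in df.
# With duplicate labels, later positions would overwrite earlier ones.
keys_dict = dict(zip(df.index, range(len(df.index))))

# Look up the positions of the rows selected by the search criterion.
idxlbls = df.index[df['timestamp'] == 'a']
rows = [keys_dict[lbl] for lbl in idxlbls]

print(rows)
print(arr[rows, :])
```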



Source: https://stackoverflow.com/questions/51468593/mapping-rows-of-a-pandas-dataframe-to-numpy-array
