Selecting a row of pandas series/dataframe by integer index

Anonymous (unverified), submitted 2019-12-03 02:31:01

Question:

I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.

In [26]: df.ix[2]
Out[26]:
A    1.027680
B    1.514210
C   -1.466963
D   -0.162339
Name: 2000-01-03 00:00:00

In [27]: df[2:3]
Out[27]:
                  A        B         C         D
2000-01-03  1.02768  1.51421 -1.466963 -0.162339

I would expect df[2] to work the same way as df[2:3] to be consistent with Python indexing convention. Is there a design reason for not supporting indexing row by single integer?

Answer 1:

Echoing @HYRY, see the new docs in 0.11:

http://pandas.pydata.org/pandas-docs/stable/indexing.html

Here we have new operators: .iloc to explicitly support only integer indexing, and .loc to explicitly support only label indexing.

For example, imagine this scenario:

In [1]: df = DataFrame(randn(5,2),index=range(0,10,2),columns=list('AB'))

In [2]: df
Out[2]:
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654

In [5]: df.iloc[[2]]
Out[5]:
          A         B
4 -0.284561  0.756029

In [6]: df.loc[[2]]
Out[6]:
          A         B
2 -0.470056  1.192211

[] slices the rows (by label location) only
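The scenario above can be reproduced as a script on current pandas releases. This is a minimal sketch: the random values are seeded and purely illustrative, and the variable names are made up for the example. Note that df.ix from the question was removed in pandas 1.0, so .iloc and .loc are the supported options.

```python
import numpy as np
import pandas as pd

# Integer labels that deliberately differ from the row positions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 2)),
                  index=range(0, 10, 2), columns=list("AB"))

row_by_position = df.iloc[2]      # third row (label 4), returned as a Series
row_by_label = df.loc[2]          # row labelled 2 (second row), as a Series
frame_by_position = df.iloc[[2]]  # list selector keeps a one-row DataFrame

print(row_by_position.name, row_by_label.name)
print(frame_by_position)
```

A scalar selector returns a Series, while a list selector such as [[2]] keeps the result two-dimensional.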



Answer 2:

You can think of a DataFrame as a dict of Series. df[key] tries to select a column by key and returns a Series object.

However, slicing inside [] slices the rows, because it's a very common operation.

See the documentation for details:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
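A small sketch of the dict-of-Series view, using a made-up two-column frame (the column names and values are assumptions for illustration only):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

column = df["A"]   # a key looks up a column and returns a Series
rows = df[0:2]     # a slice selects rows and returns a DataFrame

# The dict analogy: column name -> Series
as_dict = {name: df[name] for name in df.columns}

print(type(column).__name__, type(rows).__name__)
```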



Answer 3:

For position-based access to a pandas table, one can also convert the table to a NumPy array. The older df.as_matrix() was removed in pandas 1.0; df.to_numpy() is the current equivalent:

np_df = df.to_numpy()

and then

np_df[i]  

would work.
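As a hedged illustration for recent pandas versions, where DataFrame.as_matrix() no longer exists and DataFrame.to_numpy() takes its place, the conversion might look like this (the sample frame is invented for the example):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

np_df = df.to_numpy()   # plain 2-D NumPy array; row and column labels are dropped
third_row = np_df[2]    # positional row access now works with a single integer

print(third_row)
```

The trade-off is that the array carries no index or column names, so this suits purely positional work.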



Answer 4:

The primary purpose of the DataFrame indexing operator, [], is to select columns.

When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.

So, in the question above: df[2] searches for a column name matching the integer value 2. This column does not exist and a KeyError is raised.
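The column lookup can be checked directly. In this sketch the frames and variable names are hypothetical; the second frame has integer column names, so the same expression succeeds there:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

try:
    df[2]                      # looks for a column named 2
    outcome = "found a column"
except KeyError:
    outcome = "KeyError"

# With default integer column names (0, 1, 2), df[2] selects the third column.
df_int_cols = pd.DataFrame([[10, 20, 30], [40, 50, 60]])
col_two = df_int_cols[2]

print(outcome)
print(col_two.tolist())
```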


The DataFrame indexing operator completely changes behavior to select rows when slice notation is used

Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.

df[2:3] 

This will slice beginning from the row with integer location 2 up to 3, exclusive of the last element. So, just a single row. The following selects rows beginning at integer location 6 up to but not including 20 by every third row.

df[6:20:3] 

You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.

I almost never use this slice notation with the indexing operator, as it's not explicit and hardly ever used. When slicing by rows, stick with .loc/.iloc.
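To make the comparison concrete, here is a sketch on a small frame with a string index (invented data), showing each [] slice next to its explicit counterpart:

```python
import pandas as pd

df = pd.DataFrame({"value": range(5)}, index=list("abcde"))

by_position = df[1:3]             # integer slice: positions 1 and 2
explicit_position = df.iloc[1:3]  # same rows, stated explicitly

by_label = df["b":"d"]            # label slice: end label is included
explicit_label = df.loc["b":"d"]  # same rows, stated explicitly

print(list(by_position.index), list(by_label.index))
```

The explicit forms say which kind of lookup is intended, so they read the same way whatever type the index has.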



Answer 5:

You can take a look at the source code.

DataFrame has a private function _slice() to slice the DataFrame, and its axis parameter determines which axis to slice. The __getitem__() of DataFrame doesn't set the axis when invoking _slice(), so _slice() slices along the default axis 0.

You can try a simple experiment that might help (this is a private API, so its signature may differ between pandas versions):

print(df._slice(slice(0, 2)))
print(df._slice(slice(0, 2), 0))
print(df._slice(slice(0, 2), 1))


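Answer 6:

You can loop through the data frame like this. Note that len(dataframe_c) gives the number of rows, whereas dataframe_c.size is rows times columns and would run past the last row:

for ad in range(len(dataframe_c)):
    print(dataframe_c.values[ad])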
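Two other ways to walk the rows, sketched on a made-up dataframe_c (the columns x and y are assumptions for illustration): positional access with .iloc, and itertuples(), which yields one named tuple per row.

```python
import pandas as pd

dataframe_c = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# Positional access, one row at a time
rows_by_position = [dataframe_c.iloc[i].tolist() for i in range(len(dataframe_c))]

# itertuples avoids building a Series for every row
rows_by_tuple = [(row.x, row.y) for row in dataframe_c.itertuples(index=False)]

print(rows_by_position)
print(rows_by_tuple)
```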

