可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am curious as to why df[2]
is not supported, while df.ix[2]
and df[2:3]
both work.
In [26]: df.ix[2] Out[26]: A 1.027680 B 1.514210 C -1.466963 D -0.162339 Name: 2000-01-03 00:00:00 In [27]: df[2:3] Out[27]: A B C D 2000-01-03 1.02768 1.51421 -1.466963 -0.162339
I would expect df[2]
to work the same way as df[2:3]
to be consistent with Python indexing convention. Is there a design reason for not supporting indexing row by single integer?
回答1:
echoing @HYRY, see the new docs in 0.11
http://pandas.pydata.org/pandas-docs/stable/indexing.html
Here we have new operators, .iloc
to explicity support only integer indexing, and .loc
to explicity support only label indexing
e.g. imagine this scenario
In [1]: df = DataFrame(randn(5,2),index=range(0,10,2),columns=list('AB')) In [2]: df Out[2]: A B 0 1.068932 -0.794307 2 -0.470056 1.192211 4 -0.284561 0.756029 6 1.037563 -0.267820 8 -0.538478 -0.800654 In [5]: df.iloc[[2]] Out[5]: A B 4 -0.284561 0.756029 In [6]: df.loc[[2]] Out[6]: A B 2 -0.470056 1.192211
[]
slices the rows (by label location) only
回答2:
You can think DataFrame as a dict of Series. df[key]
try to select the column index by key
and returns a Series object.
However slicing inside of [] slices the rows, because it's a very common operation.
You can read the document for detail:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
回答3:
To index-based access to the pandas table, one can also consider numpy.as_array option to convert the table to Numpy array as
np_df = df.as_matrix()
and then
np_df[i]
would work.
回答4:
The primary purpose of the DataFrame indexing operator, []
is to select columns.
When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.
So, in the question above: df[2]
searches for a column name matching the integer value 2
. This column does not exist and a KeyError
is raised.
The DataFrame indexing operator completely changes behavior to select rows when slice notation is used
Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.
df[2:3]
This will slice beginning from the row with integer location 2 up to 3, exclusive of the last element. So, just a single row. The following selects rows beginning at integer location 6 up to but not including 20 by every third row.
df[6:20:3]
You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.
I almost never use this slice notation with the indexing operator as its not explicit and hardly ever used. When slicing by rows, stick with .loc/.iloc
.
回答5:
You can take a look at the source code .
DataFrame
has a private function _slice()
to slice the DataFrame
, and it allows the parameter axis
to determine which axis to slice. The __getitem__()
for DataFrame
doesn't set the axis while invoking _slice()
. So the _slice()
slice it by default axis 0.
You can take a simple experiment, that might help you:
print df._slice(slice(0, 2)) print df._slice(slice(0, 2), 0) print df._slice(slice(0, 2), 1)
回答6:
you can loop through the data frame like this .
for ad in range(1,dataframe_c.size): print(dataframe_c.values[ad])