Question
Suppose I want to select a range of columns from a dataframe: Call them 'column_1' through 'column_60'. I know I could use loc like this:
df.loc[:, 'column_1':'column_60']
That will give me all rows in columns 1-60.
But what if I wanted that range of columns plus 'column_81'? This doesn't work:
df.loc[:, 'column_1':'column_60', 'column_81']
It throws a "Too many indexers" error. Is there another way to state this using loc? Or is loc even the best function to use in this case?
Many thanks.
Answer 1:
How about
df.loc[:, [f'column_{i}' for i in range(1, 61)] + ['column_81']]
or
df.reindex([f'column_{i}' for i in range(1, 61)] + ['column_81'], axis=1)
if you want any of the requested columns that are missing from df to be created and filled with NaN values.
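The difference between the two calls only shows up when a requested column is absent. Here is a minimal sketch, assuming a recent pandas version (where .loc raises KeyError for a missing label in a list) and a made-up three-column frame that stands in for the real data:

```python
import pandas as pd

# Hypothetical frame with only column_1..column_3, so 'column_81' is missing.
df = pd.DataFrame([[1, 2, 3]], columns=[f'column_{i}' for i in range(1, 4)])
wanted = ['column_1', 'column_2', 'column_81']

# .loc with a list of labels fails if any label is absent.
try:
    df.loc[:, wanted]
    loc_failed = False
except KeyError:
    loc_failed = True

# reindex keeps the requested order and fills absent columns with NaN.
filled = df.reindex(wanted, axis=1)
print(loc_failed)
print(filled)
```

So .loc is the stricter choice when every column is expected to exist, while reindex silently creates the missing ones.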
Answer 2:
You can use pandas.concat():
pd.concat([df.loc[:, 'column_1':'column_60'], df.loc[:, 'column_81']], axis=1)
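A small self-contained check of the concat approach, using a made-up one-row frame with columns 'column_1' to 'column_81'. Note that df.loc[:, 'column_81'] returns a Series, which pd.concat accepts. Passing a one-element list instead, as in this sketch, keeps that piece a DataFrame:

```python
import pandas as pd

# Hypothetical one-row frame with columns column_1 .. column_81.
cols = [f'column_{i}' for i in range(1, 82)]
df = pd.DataFrame([list(range(1, 82))], columns=cols)

# Label slice (inclusive on both ends) plus a single extra column,
# glued together side by side.
res = pd.concat(
    [df.loc[:, 'column_1':'column_60'], df.loc[:, ['column_81']]],
    axis=1,
)
print(res.shape)
```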
Answer 3:
You can use numpy.r_ to combine ranges with scalars. The only complication is that you need to use pd.DataFrame.iloc instead, but this can be facilitated via df.columns.get_loc.
Here's a demo:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['column'+str(i) for i in range(1, 82)])
colidx = df.columns.get_loc
res = df.iloc[:, np.r_[colidx('column1'):colidx('column5'), colidx('column80')]]
print(res.columns)
Index(['column1', 'column2', 'column3', 'column4', 'column80'], dtype='object')
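Positional slices exclude their stop value, which is why 'column5' is absent from the output above. If the end column should be included, as a label slice with loc would do, one option is to add 1 to the stop position. A sketch using the same empty demo frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['column' + str(i) for i in range(1, 82)])
colidx = df.columns.get_loc

# Add 1 to the stop position so that 'column5' itself is included.
res = df.iloc[:, np.r_[colidx('column1'):colidx('column5') + 1, colidx('column80')]]
print(list(res.columns))
```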
Answer 4:
You can use the numpy concatenate function. Assuming you know the order of the columns, and that 'column_1' sits at position 0, you can use:
df.loc[:, df.columns[np.concatenate([np.arange(0, 60), np.array(80)], axis=None)]]
The positions are zero-based, so this gives you columns 1 through 60 plus column 81 from your data frame.
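Because the positions passed to df.columns[...] are zero-based, it is worth checking which labels a given set of positions actually picks. A quick sketch on a made-up empty frame, assuming the columns are named 'column_1' to 'column_100' and stored in that order:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column_1 is at position 0, column_81 at position 80.
df = pd.DataFrame(columns=[f'column_{i}' for i in range(1, 101)])

# Positions 0..59 cover column_1..column_60; position 80 is column_81.
positions = np.concatenate([np.arange(0, 60), np.array([80])])
labels = df.columns[positions]
res = df.loc[:, labels]
print(labels[0], labels[59], labels[-1])
```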
Source: https://stackoverflow.com/questions/50647832/can-you-use-loc-to-select-a-range-of-columns-plus-a-column-outside-of-the-range