I have a DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({\'foo.aa\': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
\'foo.fighters
Now that pandas' indexes support string operations, arguably the simplest and best way to select columns beginning with 'foo' is just:
df.loc[:, df.columns.str.startswith('foo')]
Alternatively, you can filter column (or row) labels with df.filter(). To specify a regular expression to match the names beginning with foo.
:
>>> df.filter(regex=r'^foo\.', axis=1)
foo.aa foo.bars foo.fighters foo.fox foo.manchu
0 1.0 0 0 2 NA
1 2.1 0 1 4 0
2 NaN 0 NaN 1 0
3 4.7 0 0 0 0
4 5.6 0 0 0 0
5 6.8 1 0 5 0
To select only the required rows (containing a 1
) and the columns, you can use loc
, selecting the columns using filter
(or any other method) and the rows using any
:
>>> df.loc[(df == 1).any(axis=1), df.filter(regex=r'^foo\.', axis=1).columns]
foo.aa foo.bars foo.fighters foo.fox foo.manchu
0 1.0 0 0 2 NA
1 2.1 0 1 4 0
2 NaN 0 NaN 1 0
5 6.8 1 0 5 0