How do you programmatically retrieve the number of columns in a pandas dataframe? I was hoping for something like:
df.num_columns
To count the row index levels as "columns" in the total, I would add the number of data columns, df.columns.size, to the number of index levels, df.index.nlevels. The nlevels attribute exists on both pd.Index (where it is always 1) and pd.MultiIndex:
Set up dummy data
import pandas as pd
flat_index = pd.Index([0, 1, 2], name="id")
multi_index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"])
columns = ["cat", "dog", "fish"]
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_df = pd.DataFrame(data, index=flat_index, columns=columns)
multi_df = pd.DataFrame(data, index=multi_index, columns=columns)
# Show data
# -----------------
# 3 columns, 4 including the index
print(flat_df)
    cat  dog  fish
id
0     1    2     3
1     4    5     6
2     7    8     9
# -----------------
# 3 columns, 5 including the index
print(multi_df)
           cat  dog  fish
letter id
a      1     1    2     3
       2     4    5     6
b      1     7    8     9
Writing our process as a function:
def total_ncols(df, include_index=False):
    # Number of data columns only
    ncols = df.columns.size
    if include_index:
        # Add one "column" per index level (1 for a flat index)
        ncols += df.index.nlevels
    return ncols
print("Ignore the index:")
print(total_ncols(flat_df), total_ncols(multi_df))
print("Include the index:")
print(total_ncols(flat_df, include_index=True), total_ncols(multi_df, include_index=True))
This prints:
Ignore the index:
3 3
Include the index:
4 5
If you only want to count the index levels when the index is a pd.MultiIndex, you can add an isinstance check to the function.
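A minimal sketch of that isinstance variant follows. The function name total_ncols_multi_only is made up for illustration, and the dummy frames are rebuilt here so the snippet runs on its own:

```python
import pandas as pd


def total_ncols_multi_only(df, include_index=False):
    """Count data columns; add index levels only for a pd.MultiIndex."""
    ncols = df.columns.size
    # A flat pd.Index is deliberately ignored here (hypothetical variant)
    if include_index and isinstance(df.index, pd.MultiIndex):
        ncols += df.index.nlevels
    return ncols


columns = ["cat", "dog", "fish"]
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_df = pd.DataFrame(data, index=pd.Index([0, 1, 2], name="id"), columns=columns)
multi_df = pd.DataFrame(
    data,
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]
    ),
    columns=columns,
)

print(total_ncols_multi_only(flat_df, include_index=True))   # 3
print(total_ncols_multi_only(multi_df, include_index=True))  # 5
```

With this version the flat frame stays at 3 because its single index level is not counted, while the MultiIndex frame still reports 5.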
As an alternative, df.reset_index().columns.size gives the same count as total_ncols(df, include_index=True), but it won't be as performant: reset_index builds a new DataFrame, inserting every index level as a column and creating a fresh default index, just so the columns can be counted.
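As a rough check that the two approaches agree, here is a small sketch on a dummy frame shaped like the one above. The second half shows one case where they diverge, assuming an index level shares its name with a column; the name clash is only an illustrative variable:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]
    ),
    columns=["cat", "dog", "fish"],
)

via_nlevels = df.columns.size + df.index.nlevels
via_reset = df.reset_index().columns.size
print(via_nlevels, via_reset)  # 5 5

# If a column is already called "id", reset_index() cannot insert the
# "id" index level and raises ValueError; the nlevels sum still works.
clash = df.rename(columns={"cat": "id"})
try:
    clash.reset_index()
except ValueError as exc:
    print("reset_index failed:", exc)
print(clash.columns.size + clash.index.nlevels)  # 5
```

So besides avoiding the extra copy, adding nlevels does not depend on the index names being distinct from the column labels.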