How do I retrieve the number of columns in a Pandas data frame?

心在旅途 2021-01-29 18:06

How do you programmatically retrieve the number of columns in a pandas dataframe? I was hoping for something like:

df.num_columns
9 answers
  •  渐次进展
    2021-01-29 18:54

    To count the row index levels as "columns" in your total, I would add the number of columns, df.columns.size, to the number of index levels, df.index.nlevels (nlevels is available on both pd.Index and pd.MultiIndex):

    Set up dummy data

    import pandas as pd
    
    # The flat index is named "id" so that it matches the printed output below
    flat_index = pd.Index([0, 1, 2], name="id")
    multi_index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"])
    
    columns = ["cat", "dog", "fish"]
    
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    flat_df = pd.DataFrame(data, index=flat_index, columns=columns)
    multi_df = pd.DataFrame(data, index=multi_index, columns=columns)
    
    # Show data
    # -----------------
    # 3 columns, 4 including the index
    print(flat_df)
        cat  dog  fish
    id                
    0     1    2     3
    1     4    5     6
    2     7    8     9
    
    # -----------------
    # 3 columns, 5 including the index
    print(multi_df)
               cat  dog  fish
    letter id                
    a      1     1    2     3
           2     4    5     6
    b      1     7    8     9
    

    Writing our process as a function:

    def total_ncols(df, include_index=False):
        ncols = df.columns.size
        if include_index:
            ncols += df.index.nlevels
        return ncols
    
    print("Ignore the index:")
    print(total_ncols(flat_df), total_ncols(multi_df))
    
    print("Include the index:")
    print(total_ncols(flat_df, include_index=True), total_ncols(multi_df, include_index=True))
    

    This prints:

    Ignore the index:
    3 3
    Include the index:
    4 5
    

    If you only want to count the index levels when the index is a pd.MultiIndex, you can add an isinstance check to the function.
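
    A minimal sketch of that isinstance variant, rebuilding the same dummy frames so it runs on its own. The function name total_ncols_multi_only is made up for illustration and is not part of pandas:

```python
import pandas as pd


def total_ncols_multi_only(df):
    """Count columns, adding the index levels only for a MultiIndex.

    Hypothetical helper for illustration; the name is an assumption.
    """
    ncols = df.columns.size
    if isinstance(df.index, pd.MultiIndex):
        ncols += df.index.nlevels
    return ncols


# Same dummy data as in the setup above
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
columns = ["cat", "dog", "fish"]
flat_df = pd.DataFrame(data, index=pd.Index([0, 1, 2], name="id"), columns=columns)
multi_df = pd.DataFrame(
    data,
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]
    ),
    columns=columns,
)

flat_total = total_ncols_multi_only(flat_df)    # flat index is ignored
multi_total = total_ncols_multi_only(multi_df)  # two index levels are added
print(flat_total, multi_total)
```

    With this version the flat frame counts only its 3 data columns, while the MultiIndex frame counts 3 columns plus its 2 index levels.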

    As an alternative, df.reset_index().columns.size gives the same result, but it is less performant: reset_index has to move the index levels into new columns and build a new default index before the columns are counted.
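
    To check that the two approaches agree, here is a small sketch on the MultiIndex frame from above. The variable names via_nlevels and via_reset are only illustrative, and no timing is measured here:

```python
import pandas as pd

# Same MultiIndex frame as in the setup above
multi_df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]
    ),
    columns=["cat", "dog", "fish"],
)

# Approach 1: add the number of index levels to the number of columns
via_nlevels = multi_df.columns.size + multi_df.index.nlevels

# Approach 2: turn the index levels into ordinary columns, then count
via_reset = multi_df.reset_index().columns.size

print(via_nlevels, via_reset)
```

    Both values should match. The second approach builds a whole new DataFrame just to count its columns, which is where the extra cost comes from.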
