Get a list of pandas DataFrame columns based on data type

后悔当初 2020-12-07 07:37

If I have a dataframe with the following columns:

1. NAME                                     object
2. On_Time                                      object
         


        
12 answers
  • 2020-12-07 07:57

    If after 6 years you still have the issue, this should solve it :)

    cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]
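
A related option is pandas' built-in `DataFrame.select_dtypes`, which filters columns by dtype directly. The following is only a minimal sketch: the sample frame and the `score` column are made up for illustration, and the `NAME` column is forced to `object` dtype so the result does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical sample data, loosely modeled on the columns in the question.
df = pd.DataFrame({
    "NAME": pd.Series(["a", "b"], dtype=object),
    "On_Time": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "score": [1.5, 2.5],
})

# Keep only object and datetime columns, then take their names as a list.
cols = df.select_dtypes(include=["object", "datetime"]).columns.tolist()
print(cols)
```

`select_dtypes` also accepts an `exclude` argument, so the same call can be used to drop dtypes instead of keeping them.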
    
  • 2020-12-07 08:00

    I came up with this short snippet.

    Essentially, here's what it does:

    1. Fetches the column names and their respective data types.
    2. Optionally writes them to a CSV file.

    import pandas as pd

    inp = pd.read_csv('filename.csv')  # read input; add read_csv arguments as needed
    # one row per column: its name and its dtype
    columns = pd.DataFrame({'column_names': inp.columns, 'datatypes': inp.dtypes})
    columns.to_csv('columns_list.csv', encoding='utf-8')  # encoding is optional
    

    This made my life much easier when generating schemas on the fly. Hope this helps.

  • 2020-12-07 08:03

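    If you want a list of only the non-numeric columns (here, anything that is not float64 or int64) you could do:

    import numpy as np

    # columns whose dtype is neither float64 nor int64
    non_numerics = [x for x in df.columns
                    if not (df[x].dtype == np.float64
                            or df[x].dtype == np.int64)]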
    

    Then, if you want another list of only the numeric columns:

    numerics = [x for x in df.columns if x not in non_numerics]
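
The comparison above only recognizes float64 and int64, so columns stored as int32 or float32 would land in the non-numeric list. One way around that, sketched here with made-up column names, is `pandas.api.types.is_numeric_dtype`, which checks for any numeric dtype. Note that it also treats boolean columns as numeric.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame with narrower numeric dtypes and one object column.
df = pd.DataFrame({
    "small_int": np.array([1, 2], dtype=np.int32),
    "ratio": np.array([0.5, 1.5], dtype=np.float32),
    "label": pd.Series(["x", "y"], dtype=object),
})

# Split the column names by whether pandas considers the dtype numeric.
numerics = [c for c in df.columns if is_numeric_dtype(df[c])]
non_numerics = [c for c in df.columns if c not in numerics]
print(numerics, non_numerics)
```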
    
  • 2020-12-07 08:03

    I use infer_objects()

    Docstring: Attempt to infer better dtypes for object columns.

    Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

    df.infer_objects().dtypes
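
As a small illustration of the soft conversion described in the docstring, the sketch below builds an object column that contains only integers after slicing. The column name `A` and the data are assumptions chosen for the example.

```python
import pandas as pd

# The leading string forces the whole column to object dtype.
df = pd.DataFrame({"A": ["a", 1, 2, 3]})

# Drop the string row; the remaining values are ints but the dtype is still object.
df = df.iloc[1:]
before = str(df.dtypes["A"])

# infer_objects() returns a new frame with a better dtype where one can be inferred.
converted = df.infer_objects()
after = str(converted.dtypes["A"])
print(before, after)
```

Because `infer_objects()` returns a new DataFrame, the original `df` keeps its object dtype unless the result is assigned back.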

  • 2020-12-07 08:04

    The dtype attribute gives you the data type of a single column:

    dataframe['column1'].dtype
    

    If you want to know the data types of all columns at once, use the plural form, dtypes:

    dataframe.dtypes
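
Since `dtypes` is a Series indexed by column name, it can be turned into lists of column names per data type. This is only a sketch; the frame contents and the `columns_by_dtype` name are hypothetical.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical frame with two integer columns and one float column.
dataframe = pd.DataFrame({
    "column1": [1, 2],
    "column2": [0.1, 0.2],
    "column3": [3, 4],
})

# Map each dtype name (as a string) to the columns that have it.
columns_by_dtype = defaultdict(list)
for name, dtype in dataframe.dtypes.items():
    columns_by_dtype[str(dtype)].append(name)

print(dict(columns_by_dtype))
```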
    
  • 2020-12-07 08:06

    You can use a boolean mask on the dtypes attribute:

    In [11]: df = pd.DataFrame([[1, 2.3456, 'c']])
    
    In [12]: df.dtypes
    Out[12]: 
    0      int64
    1    float64
    2     object
    dtype: object
    
    In [13]: msk = df.dtypes == np.float64  # or object, etc.
    
    In [14]: msk
    Out[14]: 
    0    False
    1     True
    2    False
    dtype: bool
    

    You can look at just those columns with the desired dtype:

    In [15]: df.loc[:, msk]
    Out[15]: 
            1
    0  2.3456
    

    Now you can apply round (or any other operation) and assign the result back:

    In [16]: np.round(df.loc[:, msk], 2)
    Out[16]: 
          1
    0  2.35
    
    In [17]: df.loc[:, msk] = np.round(df.loc[:, msk], 2)
    
    In [18]: df
    Out[18]: 
       0     1  2
    0  1  2.35  c
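
To get an actual list of column names out of the mask, as the question asks, the mask can be used to index `df.columns`. A minimal sketch using the same one-row frame; the `float_cols` name is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2.3456, 'c']])

# Boolean Series: True for each column whose dtype is float64.
msk = df.dtypes == np.float64

# Index the column labels with the mask and convert to a plain list.
float_cols = df.columns[msk.to_numpy()].tolist()
print(float_cols)
```

Here the column labels are the default integers, so the list holds the label `1` rather than a string name.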
    