How to determine whether a column/variable is numeric or not in Pandas/NumPy?

佛祖请我去吃肉 2020-12-01 02:47

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric?

I have a self defined

9 Answers
  • 2020-12-01 02:51

    Based on @jaime's answer in the comments, you need to check .dtype.kind for the column of interest. For example:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
    >>> df['numeric'].dtype.kind in 'biufc'
    True
    >>> df['not_numeric'].dtype.kind in 'biufc'
    False
    

    NB: the letters in biufc are dtype kind codes: b is bool, i is signed int, u is unsigned int, f is float, c is complex. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind
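
    The kind check above can be wrapped in a small helper. This is only a sketch: the names NUMERIC_KINDS and has_numeric_kind are made up for illustration, and whether booleans should count as numeric is an assumption you may want to change by dropping "b" from the string.

```python
import numpy as np
import pandas as pd

# Kind codes treated as numeric, following the answer above (hypothetical constant name).
NUMERIC_KINDS = "biufc"


def has_numeric_kind(values):
    """Return True if a NumPy array or pandas Series has a numeric dtype kind."""
    return values.dtype.kind in NUMERIC_KINDS


df = pd.DataFrame({"numeric": [1, 2, 3], "not_numeric": ["A", "B", "C"]})

print(has_numeric_kind(df["numeric"]))            # True
print(has_numeric_kind(df["not_numeric"]))        # False
print(has_numeric_kind(np.array([True, False])))  # True, because "b" is in the list
```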

  • 2020-12-01 02:51

    Pandas has a select_dtypes method. You can easily filter your columns on int64 and float64 like this:

    df.select_dtypes(include=['int64','float64'])
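
    One thing to watch with an explicit dtype list is that only those exact dtypes are matched. The sketch below uses invented column names to compare it with the more general np.number, which select_dtypes also accepts.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with several numeric widths and one text column.
df = pd.DataFrame({
    "small_int": np.array([1, 2, 3], dtype="int32"),
    "big_int": np.array([1, 2, 3], dtype="int64"),
    "ratio": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "label": ["a", "b", "c"],
})

# Listing exact dtypes keeps only the columns that match them exactly.
exact = df.select_dtypes(include=["int64", "float64"])

# np.number matches NumPy integer, float and complex dtypes of any width.
generic = df.select_dtypes(include=[np.number])

print(list(exact.columns))
print(list(generic.columns))
```

    Here the int32 and float32 columns are dropped by the explicit list but kept by np.number.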
    
  • 2020-12-01 02:51

    How about just checking type for one of the values in the column? We've always had something like this:

    isinstance(x, (int, float, complex))  # on Python 2, add long to the tuple
    

    When I check the dtypes of the columns in the DataFrame below, I get 'object' rather than the numeric type I'm expecting:

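    from datetime import datetime, timedelta
    import pandas as pd

    # Build the frame row by row, starting from empty (untyped) columns
    df = pd.DataFrame(columns=('time', 'test1', 'test2'))
    for i in range(20):
        df.loc[i] = [datetime.now() - timedelta(hours=i*1000), i*10, i*100]
    df.dtypes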
    
    time     datetime64[ns]
    test1            object
    test2            object
    dtype: object
    

    When I do the following, it seems to give me an accurate result:

    isinstance(df['test1'][len(df['test1'])-1], (int, float, complex))
    

    returns

    True
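
    Checking a single value only tells you about that one element. A possible extension, sketched here with a hypothetical helper name (all_values_numeric) and the standard-library numbers.Number class instead of the tuple above, checks every element of an object column and then converts it with pd.to_numeric.

```python
import numbers

import pandas as pd


def all_values_numeric(series):
    """Return True if every element is a Python or NumPy number.

    Note: bool is a subclass of int, so True/False also pass this check,
    and an empty Series returns True.
    """
    return all(isinstance(value, numbers.Number) for value in series)


as_objects = pd.Series([0, 10, 20], dtype=object)
mixed = pd.Series([0, "ten", 20], dtype=object)

print(all_values_numeric(as_objects))  # True
print(all_values_numeric(mixed))       # False

# Once the values are known to be numeric, the column can get a numeric dtype.
converted = pd.to_numeric(as_objects)
print(converted.dtype)
```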
    
  • 2020-12-01 02:53

    You can check whether a given column contains numeric values by testing that its dtype is not object:

    numerical_features = [feature for feature in train_df.columns if train_df[feature].dtypes != 'O']
    

    Note: the "O" (the code for the object dtype) must be a capital letter.
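
    A caveat with the "not object" test is that it also keeps columns that are neither object nor numeric, such as datetimes. The sketch below uses an invented train_df to compare it with pandas.api.types.is_numeric_dtype.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical training frame: one float, one datetime and one object column.
train_df = pd.DataFrame({
    "price": [1.5, 2.5, 3.5],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "name": pd.Series(["a", "b", "c"], dtype=object),
})

# The "not object" test keeps the datetime column as well.
not_object = [c for c in train_df.columns if train_df[c].dtype != "O"]

# is_numeric_dtype rejects the datetime column.
numeric = [c for c in train_df.columns if is_numeric_dtype(train_df[c])]

print(not_object)  # ['price', 'when']
print(numeric)     # ['price']
```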

  • 2020-12-01 02:57

    Just to add to the other answers, you can also use df.info() to see the data type of each column.
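
    A short sketch of that, with a made-up two-column frame. df.info() prints its report, so the example captures it in a buffer and also shows df.dtypes, which returns the same per-column dtypes as a Series that code can inspect.

```python
import io

import pandas as pd

df = pd.DataFrame({"score": [1.0, 2.0, 3.0], "name": ["a", "b", "c"]})

# df.info() writes to stdout by default; passing buf captures the same text.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# df.dtypes holds one dtype per column, indexed by column name.
print(df.dtypes)
```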

  • 2020-12-01 03:10

    You can use np.issubdtype to check whether the dtype is a subtype of np.number. Examples:

    np.issubdtype(arr.dtype, np.number)  # where arr is a numpy array
    np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series
    

    This works for NumPy's dtypes but fails for pandas-specific types such as pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0], 
                       'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
    df
    Out: 
       A    B   C  D
    0  1  1.0  1j  a
    1  2  2.0  2j  b
    2  3  3.0  3j  c
    
    df.dtypes
    Out: 
    A         int64
    B       float64
    C    complex128
    D        object
    dtype: object
    

    np.issubdtype(df['A'].dtype, np.number)
    Out: True
    
    np.issubdtype(df['B'].dtype, np.number)
    Out: True
    
    np.issubdtype(df['C'].dtype, np.number)
    Out: True
    
    np.issubdtype(df['D'].dtype, np.number)
    Out: False
    

    For multiple columns you can use np.vectorize:

    is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
    is_number(df.dtypes)
    Out: array([ True,  True,  True, False], dtype=bool)
    

    And for selection, pandas now has select_dtypes:

    df.select_dtypes(include=[np.number])
    Out: 
       A    B   C
    0  1  1.0  1j
    1  2  2.0  2j
    2  3  3.0  3j
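
    For the pandas-specific dtypes mentioned earlier in this answer, a minimal sketch with is_numeric_dtype could look like this. The frame is invented for illustration, and the nullable Int64 column is an extra case that the answer itself does not discuss.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [1.0, 2.0, 3.0],
    "D": ["a", "b", "c"],
    "E": pd.Categorical(["x", "y", "x"]),       # pandas-specific categorical dtype
    "F": pd.array([1, None, 3], dtype="Int64"),  # pandas nullable integer dtype
})

# One flag per column; is_numeric_dtype accepts a Series, an array or a dtype.
numeric_flags = {col: is_numeric_dtype(df[col]) for col in df.columns}
print(numeric_flags)
```

    The categorical and string columns come back as False, while the int, float and nullable integer columns come back as True.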
    