Create Empty Dataframe in Pandas specifying column types

萌比男神i 2020-11-28 08:39

I'm trying to create an empty data frame with an index and specify the column types. The way I am doing it is the following:

df = pd.DataFrame(index=['pbp'...


        
11 Answers
  •  爱一瞬间的悲伤
    2020-11-28 09:02

    I found this question after running into the same issue. I prefer the following solution (Python 3) for creating an empty DataFrame with no index.

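    import numpy as np
    import pandas as pd
    
    def make_empty_typed_df(dtype):
        # dtype: list of (column_name, type, ...) tuples, as accepted by np.array(..., dtype=dtype).
        # np.typeDict was a deprecated alias of np.sctypeDict and is gone from recent NumPy releases.
        tdict = np.sctypeDict
        # Resolve string aliases such as 'int8' to the corresponding NumPy scalar types.
        types = tuple(tdict.get(t, t) for (_, t, *__) in dtype)
        if any(t == np.void for t in types):
            raise NotImplementedError('Not Implemented for columns of type "void"')
        # Build a one-row structured array of default values, then slice the row off
        # so that only the column names and dtypes remain.
        return pd.DataFrame.from_records(np.array([tuple(t() for t in types)], dtype=dtype)).iloc[:0, :]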
    

    Testing this out ...

    from itertools import chain
    
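    # np.sctypeDict replaces the deprecated alias np.typeDict.
    dtype = [('col%d' % i, t) for i, t in enumerate(chain(np.sctypeDict, set(np.sctypeDict.values())))]
    # Drop void columns and the integer type-number keys that the dict may contain.
    dtype = [(c, t) for (c, t) in dtype if (np.sctypeDict.get(t, t) != np.void) and not isinstance(t, int)]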
    
    print(make_empty_typed_df(dtype))
    

    Out:

    Empty DataFrame
    
    Columns: [col0, col6, col16, col23, col24, col25, col26, col27, col29, col30, col31, col32, col33, col34, col35, col36, col37, col38, col39, col40, col41, col42, col43, col44, col45, col46, col47, col48, col49, col50, col51, col52, col53, col54, col55, col56, col57, col58, col60, col61, col62, col63, col64, col65, col66, col67, col68, col69, col70, col71, col72, col73, col74, col75, col76, col77, col78, col79, col80, col81, col82, col83, col84, col85, col86, col87, col88, col89, col90, col91, col92, col93, col95, col96, col97, col98, col99, col100, col101, col102, col103, col104, col105, col106, col107, col108, col109, col110, col111, col112, col113, col114, col115, col117, col119, col120, col121, col122, col123, col124, ...]
    Index: []
    
    [0 rows x 146 columns]
    

    And the datatypes ...

    print(make_empty_typed_df(dtype).dtypes)
    

    Out:

    col0      timedelta64[ns]
    col6               uint16
    col16              uint64
    col23                int8
    col24     timedelta64[ns]
    col25                bool
    col26           complex64
    col27               int64
    col29             float64
    col30                int8
    col31             float16
    col32              uint64
    col33               uint8
    col34              object
    col35          complex128
    col36               int64
    col37               int16
    col38               int32
    col39               int32
    col40             float16
    col41              object
    col42              uint64
    col43              object
    col44               int16
    col45              object
    col46               int64
    col47               int16
    col48              uint32
    col49              object
    col50              uint64
                   ...       
    col144              int32
    col145               bool
    col146            float64
    col147     datetime64[ns]
    col148             object
    col149             object
    col150         complex128
    col151    timedelta64[ns]
    col152              int32
    col153              uint8
    col154            float64
    col156              int64
    col157             uint32
    col158             object
    col159               int8
    col160              int32
    col161             uint64
    col162              int16
    col163             uint32
    col164             object
    col165     datetime64[ns]
    col166            float32
    col167               bool
    col168            float64
    col169         complex128
    col170            float16
    col171             object
    col172             uint16
    col173          complex64
    col174         complex128
    dtype: object
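
For a handful of known columns, a lighter alternative to the structured-array route is to build each column as an empty `pd.Series` with an explicit dtype. This is a minimal sketch and not the approach used in this answer; the `schema` dict and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical schema: column name -> dtype string (names are illustrative only).
schema = {
    "count": "int64",
    "price": "float64",
    "when": "datetime64[ns]",
    "flag": "bool",
}

# One empty Series per column keeps the requested dtype; the frame has zero rows.
empty = pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in schema.items()})

print(empty.dtypes)
```

This avoids looking types up in NumPy's alias dictionary, at the cost of having to spell out each column by hand.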
    

    Adding an index gets tricky because most data types have no true missing value, so the empty cells get cast to some other type that does have one (e.g., ints are cast to floats or objects). If you have complete data of the types you've specified, you can insert rows as needed and the values will fit the declared types, although some pandas versions still upcast columns while enlarging the frame, so it is worth checking df.dtypes afterwards. Rows can be inserted with:
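
The casting described here can be reproduced by attaching index labels to an empty typed frame. The sketch below uses `reindex` for that, with made-up column names; the integer column typically comes back as float64 and the boolean column as object, while the datetime column can keep its dtype because NaT is a native missing value.

```python
import pandas as pd

# Hypothetical columns, chosen to show three different outcomes.
empty = pd.DataFrame({
    "n": pd.Series(dtype="int64"),
    "ok": pd.Series(dtype="bool"),
    "when": pd.Series(dtype="datetime64[ns]"),
})

# Adding index labels creates rows that must hold a missing value,
# so columns without a native missing value are cast to another dtype.
with_index = empty.reindex(["a", "b"])

print(with_index.dtypes)
```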

    df.loc[index, :] = new_row
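
A short sketch of this row-by-row insertion, under the assumption of a three-column schema invented for illustration. Because enlargement through `.loc` may change dtypes depending on the pandas version, the sketch re-applies the intended schema with `astype` at the end rather than relying on the dtypes surviving.

```python
import pandas as pd

# Hypothetical schema and rows; none of these names come from the original post.
schema = {"count": "int64", "price": "float64", "name": "object"}
df = pd.DataFrame({col: pd.Series(dtype=t) for col, t in schema.items()})

# Setting with enlargement: each assignment adds one complete row.
df.loc["row0", :] = [3, 1.5, "widget"]
df.loc["row1", :] = [4, 2.0, "gadget"]

# Re-apply the declared dtypes in case the enlargement upcast any column.
df = df.astype(schema)

print(df.dtypes)
```

Appending one row at a time is slow for large data; collecting rows first and constructing the frame once is usually the better fit.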
    

    Again, as @Hun pointed out, this is NOT how pandas is intended to be used.
