Create Empty Dataframe in Pandas specifying column types

萌比男神i 2020-11-28 08:39

I'm trying to create an empty DataFrame with an index and specify the column types. This is how I am doing it:

df = pd.DataFrame(index=['pbp'


        
11 answers
  • 2020-11-28 09:06

    I think this is perfect!!

    import pandas as pd
    
    c1 = pd.Series(data=None, dtype='string', name='c1')
    c2 = pd.Series(data=None, dtype='bool', name='c2')
    c3 = pd.Series(data=None, dtype='float', name='c3')
    c4 = pd.Series(data=None, dtype='int', name='c4')
    
    df = pd.concat([c1, c2, c3, c4], axis=1)
    
    df.info(verbose=True)
    

    We create each column as a Series with the desired dtype, then concatenate the Series into a DataFrame, and that's it.

    The result is an empty DataFrame whose columns carry the requested dtypes:

    <class 'pandas.core.frame.DataFrame'>
    Index: 0 entries
    Data columns (total 4 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   c1      0 non-null      string 
     1   c2      0 non-null      bool   
     2   c3      0 non-null      float64
     3   c4      0 non-null      int32  
    dtypes: bool(1), float64(1), int32(1), string(1)
    memory usage: 0.0+ bytes
    
  • 2020-11-28 09:07

    You can do this by passing a dictionary into the DataFrame constructor:

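    import numpy as np
    import pandas as pd

    # np.nan cannot be stored in an integer array, so NumPy casts it
    # (recent NumPy versions may emit a RuntimeWarning here).
    df = pd.DataFrame(index=['pbp'],
                      data={'contract' : np.full(1, "", dtype=str),
                            'starting_membership' : np.full(1, np.nan, dtype=int),
                            'projected_membership' : np.full(1, np.nan, dtype=float)
                           }
                     )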
    

    This gives you a DataFrame that looks like:

         contract  projected_membership   starting_membership
    pbp     ""             NaN           -9223372036854775808
    

    With dtypes:

    contract                 object
    projected_membership    float64
    starting_membership       int64
    

    That said, there are two things to note:

    1) str isn't a dtype that this DataFrame column keeps; the column falls back to the general-purpose object dtype instead. It still works properly.

    2) Why don't you see NaN under starting_membership? NaN is only defined for floats; NumPy integers have no missing-value representation, so np.nan is cast to an integer (here the minimum int64 value). If you want a different default value, you can change that in the np.full call.
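
If the integer column really needs to hold a missing value, one possible alternative (not part of the answer above) is pandas' nullable "Int64" extension dtype. The sketch below assumes pandas 1.0 or later, where pd.NA is available; the column names are reused from the question.

```python
import pandas as pd

# Assumption: pandas >= 1.0, which provides pd.NA and the nullable "Int64" dtype.
df = pd.DataFrame(
    {
        "contract": pd.Series([""], index=["pbp"], dtype="object"),
        # Capital-I "Int64" is the nullable extension dtype, not NumPy's int64.
        "starting_membership": pd.Series([pd.NA], index=["pbp"], dtype="Int64"),
        "projected_membership": pd.Series([float("nan")], index=["pbp"], dtype="float64"),
    }
)

print(df.dtypes)
```

With this dtype the placeholder shows up as a missing value rather than as a large negative integer.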

  • 2020-11-28 09:08

    You can do it like this:

    import numpy
    import pandas
    
    dtypes = numpy.dtype([
              ('a', str),
              ('b', int),
              ('c', float),
              ('d', numpy.datetime64),
              ])
    data = numpy.empty(0, dtype=dtypes)
    df = pandas.DataFrame(data)
    
  • 2020-11-28 09:11

    You can use the following:

    df = pd.DataFrame({'a': pd.Series([], dtype='int'),
                       'b': pd.Series([], dtype='str'),
                       'c': pd.Series([], dtype='float')})
    

    Then displaying df gives:

    >>> df 
    Empty DataFrame 
    Columns: [a, b, c]
    Index: []
    

    and checking its dtypes gives:

    >>> df.dtypes
    a      int32
    b     object
    c    float64
    dtype: object
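
The int32 in the output above comes from dtype='int', which maps to NumPy's default integer and can be 32-bit on some platforms (for example Windows with older NumPy releases) and 64-bit on others. A minimal sketch, assuming you want the same result everywhere, is to spell out the width explicitly:

```python
import pandas as pd

# Explicit widths avoid depending on the platform's default integer size.
df = pd.DataFrame({
    "a": pd.Series([], dtype="int64"),
    "b": pd.Series([], dtype="object"),
    "c": pd.Series([], dtype="float64"),
})

print(df.dtypes)
```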
    
  • I found the easiest workaround for me was to simply concatenate a list of empty series for each individual column:

    import pandas as pd
    
    columns = ['contract',
               'state_and_county_code',
               'state',
               'county',
               'starting_membership',
               'starting_raw_raf',
               'enrollment_trend',
               'projected_membership',
               'projected_raf']
    dtype = ['str', 'str', 'str', 'str', 'int', 'float', 'float', 'int', 'float']
    df = pd.concat([pd.Series(name=col, dtype=dt) for col, dt in zip(columns, dtype)], axis=1)
    df.info()
    # <class 'pandas.core.frame.DataFrame'>
    # Index: 0 entries
    # Data columns (total 9 columns):
    # contract                 0 non-null object
    # state_and_county_code    0 non-null object
    # state                    0 non-null object
    # county                   0 non-null object
    # starting_membership      0 non-null int32
    # starting_raw_raf         0 non-null float64
    # enrollment_trend         0 non-null float64
    # projected_membership     0 non-null int32
    # projected_raf            0 non-null float64
    # dtypes: float64(3), int32(2), object(4)
    # memory usage: 0.0+ bytes
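
The same concatenation idea can be wrapped in a small function that takes a column-to-dtype mapping, which keeps each name next to its type. This is only a sketch: empty_frame is a hypothetical helper name, and the schema below uses a subset of the columns with explicit-width dtypes.

```python
import pandas as pd

def empty_frame(schema):
    """Hypothetical helper: build a zero-row DataFrame from {column: dtype}."""
    # One empty, named Series per column, joined side by side.
    return pd.concat(
        [pd.Series(name=col, dtype=dt) for col, dt in schema.items()],
        axis=1,
    )

df = empty_frame({
    "contract": "object",
    "starting_membership": "int64",
    "starting_raw_raf": "float64",
})

print(df.dtypes)
```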
    