Pandas pytable: how to specify min_itemsize of the elements of a MultiIndex

逝去的感伤 2020-12-11 23:07

I am storing a pandas DataFrame that contains a MultiIndex as a PyTables table.

The first level of the MultiIndex is a string corresponding to a userID. Now, most of the

1 Answer
  • 2020-12-11 23:29

    You need to specify the name of the MultiIndex level that you want to set a min_itemsize for: each level is stored as its own column in the table, and min_itemsize is keyed by column name. Here's an example:
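Because the level name is the key, a level that was never named has nothing to key on. A minimal sketch, assuming a frame whose levels are unnamed; the names `userID` and `number` are hypothetical choices for this illustration, not anything pandas requires:

```python
import numpy as np
import pandas as pd

# A MultiIndexed frame whose levels were never named: names is [None, None].
idx = pd.MultiIndex.from_product([["abcdefghijklm", "foo"], [1, 2]])
df = pd.DataFrame(np.random.randn(4, 2), index=idx)

# Name the levels so each one has a key usable in min_itemsize.
# "userID" and "number" are hypothetical names chosen for this sketch.
df.index = df.index.set_names(["userID", "number"])

print(list(df.index.names))  # ['userID', 'number']
```

With the names in place, something like `min_itemsize={'userID': 15}` refers to the first level (the width 15 is arbitrary here).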

    Create 2 multi-indexed frames

    In [1]: import numpy as np, pandas as pd
       ...: from pandas import DataFrame, MultiIndex
       ...: df1 = DataFrame(np.random.randn(4, 2), index=MultiIndex.from_product([['abcdefghijklm', 'foo'], [1, 2]], names=['string', 'number']))
    
    In [2]: df2 = DataFrame(np.random.randn(4, 2), index=MultiIndex.from_product([['abcdefghijklmop', 'foo'], [1, 2]], names=['string', 'number']))
    
    In [3]: df1
    Out[3]: 
                                 0         1
    string        number                    
    abcdefghijklm 1       0.737976  0.840718
                  2       0.605763  1.797398
    foo           1       1.589278  0.104186
                  2       0.029387  1.417195
    
    [4 rows x 2 columns]
    
    In [4]: df2
    Out[4]: 
                                   0         1
    string          number                    
    abcdefghijklmop 1       0.539507 -1.059085
                    2       1.263722 -1.773187
    foo             1       1.625073  0.078650
                    2      -0.030827 -1.691805
    
    [4 rows x 2 columns]
    

    Create a store

    In [9]: store = pd.HDFStore('test.h5',mode='w')
    
    In [10]: store.append('df1',df1)
    

    Here you can see the computed length (the itemsize of the `string` column):

    In [12]: store.get_storer('df1').table
    Out[12]: 
    /df1/table (Table(4,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
      "number": Int64Col(shape=(), dflt=0, pos=2),
      "string": StringCol(itemsize=13, shape=(), dflt='', pos=3)}
      byteorder := 'little'
      chunkshape := (1456,)
      autoindex := True
      colindexes := {
        "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
        "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
        "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}
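The 13 in `StringCol(itemsize=13)` matches the longest label in df1's `string` level, since the table is sized from the first frame written to it. A quick check that recreates only df1's index labels (the data values do not affect the width):

```python
import pandas as pd

# Same index labels as df1 above; only the index matters for the string width.
idx1 = pd.MultiIndex.from_product(
    [["abcdefghijklm", "foo"], [1, 2]], names=["string", "number"]
)

# Length of the longest label in the "string" level.
width = idx1.get_level_values("string").str.len().max()
print(width)  # 13, the itemsize reported for df1's table
```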
    

    Appending df2, whose longest label has 15 characters, raises the error you are getting now:

    In [13]: store.append('df1',df2)
    
    ValueError: Trying to store a string with len [15] in [string] column but
    this column has a limit of [13]!
    Consider using min_itemsize to preset the sizes on these columns
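One way to avoid this error is to work out the width every frame needs before the first write. A minimal sketch; `min_itemsize_for` and `make_frame` are hypothetical helpers (not part of pandas), and the sketch assumes the levels you care about are named:

```python
import numpy as np
import pandas as pd


def min_itemsize_for(frames, pad=0):
    """Map each named string level to a width wide enough for all frames.

    Hypothetical helper (not part of pandas). The width is the longest
    label seen in that level across every frame, plus optional headroom.
    """
    sizes = {}
    for frame in frames:
        for position, name in enumerate(frame.index.names):
            values = frame.index.get_level_values(position)
            # Only named levels that hold strings need an entry.
            if name is not None and pd.api.types.is_string_dtype(values):
                longest = int(values.str.len().max())
                sizes[name] = max(sizes.get(name, 0), longest + pad)
    return sizes


def make_frame(label):
    # Same shape and level names as df1/df2 in this answer.
    index = pd.MultiIndex.from_product(
        [[label, "foo"], [1, 2]], names=["string", "number"]
    )
    return pd.DataFrame(np.random.randn(4, 2), index=index)


df1 = make_frame("abcdefghijklm")
df2 = make_frame("abcdefghijklmop")
sizes = min_itemsize_for([df1, df2])
print(sizes)  # {'string': 15}
```

The result is what you would pass on the first write, e.g. `store.append('df', df1, min_itemsize=sizes)`. How much `pad` headroom to reserve for labels that have not arrived yet is a judgment call.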
    

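    Specify min_itemsize keyed by the level name when the table is first created (here under a new key, 'df'), since the column width is fixed at creation: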

    In [14]: store.append('df',df1,min_itemsize={ 'string' : 15 })
    
    In [15]: store.get_storer('df').table
    Out[15]: 
    /df/table (Table(4,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
      "number": Int64Col(shape=(), dflt=0, pos=2),
      "string": StringCol(itemsize=15, shape=(), dflt='', pos=3)}
      byteorder := 'little'
      chunkshape := (1394,)
      autoindex := True
      colindexes := {
        "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
        "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
        "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}
    

    Now df2 can be appended:

    In [16]: store.append('df',df2)
    
    In [19]: store.df
    Out[19]: 
                                   0         1
    string          number                    
    abcdefghijklm   1       0.737976  0.840718
                    2       0.605763  1.797398
    foo             1       1.589278  0.104186
                    2       0.029387  1.417195
    abcdefghijklmop 1       0.539507 -1.059085
                    2       1.263722 -1.773187
    foo             1       1.625073  0.078650
                    2      -0.030827 -1.691805
    
    [8 rows x 2 columns]
    
    In [20]: store.close()
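Putting the steps together: a minimal sketch that writes several frames under one key and reads them back. It assumes PyTables is installed; `write_frames` and its `sizes` argument (a `{level_name: width}` dict) are hypothetical. Using `HDFStore` as a context manager closes the file even if an append raises, which takes the place of the explicit `store.close()`:

```python
import pandas as pd


def write_frames(path, key, frames, sizes):
    """Append every frame under `key` and return what was stored.

    Hypothetical helper. min_itemsize is passed only on the first append,
    because that call creates the table and fixes its string column widths.
    """
    # The context manager closes the file even if an append raises.
    with pd.HDFStore(path, mode="w") as store:
        for position, frame in enumerate(frames):
            if position == 0:
                store.append(key, frame, min_itemsize=sizes)
            else:
                store.append(key, frame)
        return store.select(key)
```

For the two frames above this would be called as `write_frames('test.h5', 'df', [df1, df2], {'string': 15})`. Later appends cannot widen the column, so the first call has to reserve enough room.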
    