Saving in a file an array or DataFrame together with other information

误落风尘 2020-12-23 00:09

The statistical software Stata allows short text snippets to be saved within a dataset. This is done using notes and/or characteristics.

This is a fea

6 answers
  •  暖寄归人
    2020-12-23 00:56

    jpp's answer is pretty comprehensive; I just wanted to mention that as of pandas v0.22, parquet is a very convenient and fast option with almost no drawbacks vs. csv (except perhaps the coffee break).

    read parquet: pandas.read_parquet

    write parquet: DataFrame.to_parquet

    At the time of writing, you'll also need to:

    pip install pyarrow
    

    To add information, you can use the schema metadata, which is attached to the data:

    import pyarrow as pa
    import pyarrow.parquet as pq
    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.normal(size=(1000, 10)))
    
    tab = pa.Table.from_pandas(df)
    
    tab = tab.replace_schema_metadata({'here' : 'it is'})
    
    pq.write_table(tab, 'where_is_it.parq')
    
    pq.read_table('where_is_it.parq')

    which then yields a table:

    Pyarrow table
    0: double
    1: double
    2: double
    3: double
    4: double
    5: double
    6: double
    7: double
    8: double
    9: double
    __index_level_0__: int64
    metadata
    --------
    {b'here': b'it is'}
    

    To get this back to pandas:

    tab.to_pandas()
    
