pandas dataframe with 2-rows header and export to csv

爱一瞬间的悲伤 2020-12-11 03:29

I have a dataframe

df = pd.DataFrame(columns = ["AA", "BB", "CC"])
df.loc[0]= ["a", "b", "c1"]
df.loc[1]= ["a", "b", "c2"]
df.loc[2]= [\"a\         


        
4 answers
  • 2020-12-11 03:35

    I think this is a bug in to_csv. If you're looking for workarounds, here are a couple.

    To read this CSV back in, specify the header rows*:

    In [10]: from io import StringIO

    In [11]: csv = """AA,BB,CC
    DD,EE,FF
    ,,
    a,b,c1
    a,b,c2
    a,b,c3"""
    
    In [12]: pd.read_csv(StringIO(csv), header=[0, 1])
    Out[12]:
      AA BB  CC
      DD EE  FF
    0  a  b  c1
    1  a  b  c2
    2  a  b  c3
    

    * Strangely, this seems to ignore the blank row (the ",," line).

    To write out you could write the header first and then append:

    with open('test.csv', 'w') as f:
        f.write('\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n')
    df.to_csv('test.csv', mode='a', index=False, header=False)
    

    Note what the header text built from the MultiIndex columns looks like:

    In [21]: '\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n'
    Out[21]: 'AA,BB,CC\nDD,EE,FF\n'
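
    As a minimal, self-contained sketch of the write-then-append idea above: the helper name write_two_row_header and the in-memory buffer are assumptions made for illustration, not part of the original answer. It writes one CSV line per column level with the standard csv module (so values containing commas get quoted), appends the data rows, and reads the result back with header=[0, 1].

```python
import csv
import io

import pandas as pd


def write_two_row_header(df, buf):
    """Write a frame with MultiIndex columns as CSV, one header row per level.

    Hypothetical helper for illustration; `buf` is any writable text handle.
    """
    # zip(*df.columns) transposes the column tuples into one row per level.
    csv.writer(buf, lineterminator="\n").writerows(zip(*df.columns))
    # Append only the data rows: no header, no index column.
    df.to_csv(buf, index=False, header=False)


columns = pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")])
df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=columns,
)

buf = io.StringIO()
write_two_row_header(df, buf)
csv_text = buf.getvalue()

# Read it back, telling pandas that the first two lines form the header.
round_trip = pd.read_csv(io.StringIO(csv_text), header=[0, 1])
print(csv_text)
print(round_trip)
```

    To write to disk instead, the same helper can be given a handle such as open("test.csv", "w", newline="").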
    
  • 2020-12-11 03:40

    On older pandas versions (the tupleize_cols argument was deprecated in 0.21 and removed in 1.0), use df.to_csv("test.csv", index = False, tupleize_cols=True) to get the resulting CSV to be:

    "('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
    a,b,c1
    a,b,c2
    a,b,c3
    

    To read it back:

    df2=pd.read_csv("test.csv", tupleize_cols=True)
    df2.columns=pd.MultiIndex.from_tuples(eval(','.join(df2.columns)))
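
    A hedged variant of the read-back step, assuming the CSV already has the tuple-style header shown above: the header cells are ordinary strings, so they can be parsed one at a time with ast.literal_eval instead of eval, and no tupleize_cols argument is needed for reading. The inline csv_text is only there to keep the sketch self-contained.

```python
import ast
import io

import pandas as pd

# Same layout as the file above: each header cell is the repr of a
# (level 0, level 1) tuple.
csv_text = '''"('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
a,b,c1
a,b,c2
a,b,c3
'''

df2 = pd.read_csv(io.StringIO(csv_text))

# literal_eval only accepts Python literals, so a malformed or hostile
# header raises an error instead of executing code.
df2.columns = pd.MultiIndex.from_tuples(
    [ast.literal_eval(name) for name in df2.columns]
)
print(df2)
```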
    

    To get the exact output you wanted:

    import numpy as np

    with open('test.csv', 'w') as f:  # 'w' so an existing test.csv is overwritten, not appended to
        pd.DataFrame(np.asanyarray(df.columns.tolist())).T.to_csv(f, index = False, header=False)
        df.to_csv(f, index = False, header=False)
    
  • 2020-12-11 03:55

    Building on top of @DSM's solution:

    If you need (as I did) to apply the same hack to an Excel export, the main change (apart from the expected differences in the to_excel method) is to drop the MultiIndex used for your column labels.

    That's because, unlike .to_csv, .to_excel doesn't support writing a DataFrame that has MultiIndex columns when index=False is passed.

    Anyway, here's what it would look like:

    >>> writer = pd.ExcelWriter("noblankrows.xlsx")
    >>> headers = pd.DataFrame(df.columns.tolist()).T
    >>> headers.to_excel(
            writer, header=False, index=False)
    >>> df.columns = pd.Index(range(len(df.columns)))  # that's what I was referring to...
    >>> df.to_excel(
            writer, header=False, index=False, startrow=len(headers))
    >>> writer.close()  # the original used writer.save(), which was removed in pandas 2.0
    >>> pd.read_excel("noblankrows.xlsx").to_csv(sys.stdout, index=False)
    AA,BB,CC
    DD,EE,FF
    a,b,c1
    a,b,c2
    a,b,c3
    
  • 2020-12-11 04:00

    It's an ugly hack, but if you needed something to work Right Now(tm), you could write it out in two parts:

    >>> pd.DataFrame(df.columns.tolist()).T.to_csv("noblankrows.csv", mode="w", header=False, index=False)
    >>> df.to_csv("noblankrows.csv", mode="a", header=False, index=False)
    >>> !cat noblankrows.csv
    AA,BB,CC
    DD,EE,FF
    a,b,c1
    a,b,c2
    a,b,c3
    