Load CSV to Pandas MultiIndex DataFrame

被撕碎了的回忆 2020-12-03 02:31

I have a 719 MB CSV file that looks like:

from, to, dep, freq, arr, code, mode   (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99         

2 Answers
  • 2020-12-03 03:11

    You could use pd.read_csv:

    >>> import pandas as pd
    >>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
    >>> df
                           dep  freq   arr   code  mode
    from       to                                      
    RGBOXFD    RGBPADTON   127     0    27  99999     2
               RGBPADTON   127     0    33  99999     2
               RGBRDLEY    127     0  1425  99999     2
               RGBCHOLSEY  127     0    52  99999     2
               RGBMDNHEAD  127     0    91  99999     2
    RGBDIDCOTP RGBPADTON   127     0    46  99999     2
               RGBPADTON   127     0     3  99999     2
               RGBCHOLSEY  127     0    61  99999     2
               RGBRDLEY    127     0  1430  99999     2
               RGBPADTON   127     0   115  99999     2
    

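    where index_col=[0,1] turns the first two columns (from and to) into the two levels of a MultiIndex, and skipinitialspace=True strips the space that follows each comma, so the header row yields column names like dep instead of ' dep'.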
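    To try this without the original 719 MB file, you can read a few rows (taken from the output above) from an in-memory string with io.StringIO. The sample rows and the CSV_TEXT name are illustrative assumptions; the sketch reads the same text twice to show what skipinitialspace changes.

```python
import io

import pandas as pd

# A few rows taken from the output above (illustrative sample).
# Note the space after each comma in the header row.
CSV_TEXT = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
"""

# Without skipinitialspace, the leading spaces stay in the column names.
raw = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 1])
print(list(raw.columns))  # [' dep', ' freq', ' arr', ' code', ' mode']

# With skipinitialspace=True the spaces are stripped, and index_col=[0, 1]
# makes the first two columns the levels of a MultiIndex.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 1], skipinitialspace=True)
print(list(df.columns))      # ['dep', 'freq', 'arr', 'code', 'mode']
print(list(df.index.names))  # ['from', 'to']
```

    Duplicate (from, to) pairs such as the two RGBOXFD/RGBPADTON rows are allowed; a MultiIndex does not have to be unique.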

  • 2020-12-03 03:31

    from_csv() works similarly:

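    import pandas as pd
    
    # Requires pandas < 1.0: DataFrame.from_csv was removed in pandas 1.0.
    # index_col=[0, 1] makes the 'from' and 'to' columns the two index levels.
    df = pd.DataFrame.from_csv(
        'data.txt',
        index_col=[0, 1]
    )
    
    print(df)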
    
    --output:--
                            dep   freq   arr   code   mode
    from        to                                        
    RGBOXFD    RGBPADTON    127      0    27  99999      2
               RGBPADTON    127      0    33  99999      2
               RGBRDLEY     127      0  1425  99999      2
               RGBCHOLSEY   127      0    52  99999      2
               RGBMDNHEAD   127      0    91  99999      2
    RGBDIDCOTP RGBPADTON    127      0    46  99999      2
               RGBPADTON    127      0     3  99999      2
               RGBCHOLSEY   127      0    61  99999      2
               RGBRDLEY     127      0  1430  99999      2
               RGBPADTON    127      0   115  99999      2
    

    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html#pandas.DataFrame.from_csv

    From this discussion,

    https://github.com/pydata/pandas/issues/4916

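    it looks like read_csv() was implemented to expose more options, which makes from_csv() superfluous. from_csv() was in fact deprecated in pandas 0.21 and removed in pandas 1.0, so on current versions use read_csv().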
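    As a sketch of the extra options read_csv() accepts, the snippet below also passes usecols and dtype, which can reduce memory use on a file as large as the question's. The column selection and the int32 choice are hypothetical: they assume you only need dep and arr and that those values always fit in 32-bit integers. The in-memory CSV_TEXT stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file, in the question's layout.
CSV_TEXT = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skipinitialspace=True,                   # strip the space after each comma
    usecols=["from", "to", "dep", "arr"],    # parse only these columns (assumption)
    dtype={"dep": "int32", "arr": "int32"},  # assumes the values fit in int32
    index_col=["from", "to"],                # MultiIndex from the station columns
)

print(df)
print(df.dtypes)
```

    Using column names rather than positions for usecols and index_col keeps the call readable and avoids having to work out which positions remain after columns are dropped.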
