How to convert bytes data into a Python pandas DataFrame?

刺人心 2021-01-04 02:00

I would like to convert 'bytes' data into a pandas DataFrame.

The data looks like this (first few lines):

    (b'#Settlement Date,Settlement Peri


        
3 Answers
  • 2021-01-04 02:24

    You can also use BytesIO directly:

    from io import BytesIO
    
    import pandas as pd
    
    df = pd.read_csv(BytesIO(bytes_data))
    

    This saves you the step of converting bytes_data to a string.
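
    As a quick illustration of the BytesIO route, here is a minimal sketch. The bytes_data value is a hypothetical, shortened sample in the shape of the question's data, not the real payload:

```python
from io import BytesIO

import pandas as pd

# Hypothetical sample: a shortened stand-in for the CSV payload in the question.
bytes_data = (
    b"#Settlement Date,Settlement Period,CCGT,OIL,COAL\n"
    b"2017-01-01,1,7727,0,3815\n"
    b"2017-01-01,2,8338,0,3815\n"
)

# BytesIO wraps the bytes in a file-like object that read_csv can consume.
df = pd.read_csv(BytesIO(bytes_data))
print(df.shape)
```

    read_csv also takes an encoding argument, so a payload that is not UTF-8 can be handled in the same call.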

  • 2021-01-04 02:37

    I had the same issue and found the StringIO class (https://docs.python.org/2/library/stringio.html; in Python 3 it lives in the io module) from the answer here: How to create a Pandas DataFrame from a string

    Try something like:

    from io import StringIO
    
    import pandas as pd
    
    s = str(bytes_data, 'utf-8')  # decode the bytes to text
    data = StringIO(s)            # wrap the text in a file-like object
    df = pd.read_csv(data)
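
    One pitfall worth showing: calling str() on bytes without an encoding does not decode them. It returns the repr text (b'...'), which may be how data ends up looking like the sample in the question. A small sketch with a hypothetical sample value:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample payload, shortened for illustration.
bytes_data = b"#Settlement Date,Settlement Period,CCGT\n2017-01-01,1,7727\n"

# No encoding given: this is the repr, starting with b' and holding a literal backslash-n.
wrong = str(bytes_data)

# Encoding given: real decoding, equivalent to bytes_data.decode("utf-8").
right = str(bytes_data, "utf-8")

df = pd.read_csv(StringIO(right))
print(df)
```

    If your text already starts with b', it was probably produced by the first form, and going back to the original bytes is usually easier than cleaning up the string.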
    
  • 2021-01-04 02:43

    OK, your input formatting is quite awkward, but the following works:

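    import pandas as pd
    
    with open('file.txt', 'r') as myfile:
        data = myfile.read().replace('\n', '')  # read in file as a single string
    
    # Strip the b'...' wrappers, join the pieces, then split on the literal "\n" text
    rows = " ".join(data.strip(' b\'').strip('\'').split('\' b\'')).split('\\n')
    df = pd.Series(rows).str.split(',', expand=True)
    
    print(df)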
    

    This produces the following:

                     0                  1     2    3     4        5      6   7   \
    0  #Settlement Date  Settlement Period  CCGT  OIL  COAL  NUCLEAR   WIND  PS   
    1        2017-01-01                  1  7727    0  3815     7404   3923   0   
    2        2017-01-01                  2  8338    0  3815     7403   3658  16   
    3        2017-01-01                  3  7927    0  3801     7408   3925   0   
    
           8      9      10     11      12      13     14       15  
    0  NPSHYD  OCGT   OTHER  INTFR  INTIRL  INTNED  INTEW  BIOMASS  
    1     944      0   2123    948     296     856    238           
    2     909      0   2124    998     298     874    288           
    3     864      0   2122    998     298     816    286     None 
    

    In order for this to work you will need to ensure that your input file contains only a collection of complete rows. For this reason I removed the partial row for the purposes of the test.
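
    If the file really holds the text of Python bytes literals, an alternative to the string surgery might be ast.literal_eval, which evaluates adjacent literals into a single bytes object. This is a sketch under that assumption; the text value is a made-up sample, and a truncated final row would still have to be removed first because the literal must be complete:

```python
import ast
from io import BytesIO

import pandas as pd

# Hypothetical file content: the text of adjacent Python bytes literals,
# with each newline written as the two characters backslash + n.
text = (
    "(b'#Settlement Date,Settlement Period,CCGT\\n'\n"
    " b'2017-01-01,1,7727\\n'\n"
    " b'2017-01-01,2,8338\\n')"
)

# literal_eval only accepts literals; adjacent bytes literals are concatenated.
raw = ast.literal_eval(text)

# raw is now real bytes containing real newlines, so read_csv can parse it.
df = pd.read_csv(BytesIO(raw))
print(df)
```

    Unlike the split-based version, this keeps the first row as the header, so the columns carry their names.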

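    Since you said the data source is an HTTP GET request, the initial read could take place using pandas.read_html, which parses HTML tables. If the response body is plain CSV, pandas.read_csv accepts a URL or file-like object as well.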

    More detail on this can be found in the pandas.read_html documentation. Note specifically the section on the io parameter (io : str or file-like).
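
    For a plain CSV response, a minimal sketch of the GET-then-parse flow could look like this. The helper names csv_bytes_to_frame and fetch_csv are hypothetical, the URL you pass in is your own, and only the offline part is exercised here:

```python
from io import BytesIO
from urllib.request import urlopen

import pandas as pd


def csv_bytes_to_frame(payload: bytes) -> pd.DataFrame:
    """Parse a CSV payload held in memory as bytes."""
    return pd.read_csv(BytesIO(payload))


def fetch_csv(url: str) -> pd.DataFrame:
    """Download a CSV body with an HTTP GET and parse it (url is a placeholder)."""
    with urlopen(url) as response:
        return csv_bytes_to_frame(response.read())


# Offline check with a hypothetical payload, so no network access is needed.
sample = b"#Settlement Date,Settlement Period,CCGT\n2017-01-01,1,7727\n"
df = csv_bytes_to_frame(sample)
print(df)
```

    Keeping the parsing step separate from the download makes it easy to test with in-memory bytes before pointing it at the real endpoint.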
