python pandas read_csv delimiter in column data

日久生厌 2020-11-30 12:54

I'm having this type of CSV file:

12012;My Name is Mike. What is your's?;3;0
1522;In my opinion: It's cool; or at least not bad;4;0
21427;Hello. I like this feature!;5;1


        
1 Answer
  • 2020-11-30 13:23

    Dealing with unquoted delimiters is always a nuisance. In this case, since the broken text field sits between correctly encoded columns (one before it, two after it), we can recover. TBH, I'd just use the standard-library csv reader, rejoin the pieces of the text field, and build the DataFrame once from that:

    import csv
    import pandas as pd
    
    with open("semi.dat", "r", newline="") as fp:
        reader = csv.reader(fp, delimiter=";")
        # Keep the first field and the last two; rejoin everything in between.
        rows = [x[:1] + [";".join(x[1:-2])] + x[-2:] for x in reader]
    df = pd.DataFrame(rows)
    

    which produces

           0                                              1  2  3
    0  12012               My Name is Mike. What is your's?  3  0
    1   1522  In my opinion: It's cool; or at least not bad  4  0
    2  21427                    Hello. I like this feature!  5  1
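
    The slicing trick can be wrapped in a small helper. This is only a sketch: `repair_rows`, `n_lead` and `n_trail` are names made up for illustration, and it assumes every row has a fixed number of clean columns before and after the single free-text field. It reads from an in-memory copy of the sample instead of "semi.dat" so it runs on its own:

```python
import csv
import io


def repair_rows(lines, n_lead=1, n_trail=2, sep=";"):
    """Rejoin a free-text field that was split on unquoted separators.

    n_lead / n_trail are assumptions about the layout: the number of
    clean columns before and after the free-text field.
    """
    fixed = []
    for fields in csv.reader(lines, delimiter=sep):
        if not fields:  # csv.reader yields [] for blank lines
            continue
        if len(fields) < n_lead + n_trail + 1:
            raise ValueError(f"too few fields: {fields!r}")
        end = len(fields) - n_trail  # index where the trailing columns start
        text = sep.join(fields[n_lead:end])
        fixed.append(fields[:n_lead] + [text] + fields[end:])
    return fixed


# In-memory stand-in for the sample file from the question.
raw = io.StringIO(
    "12012;My Name is Mike. What is your's?;3;0\n"
    "1522;In my opinion: It's cool; or at least not bad;4;0\n"
    "21427;Hello. I like this feature!;5;1\n"
)
rows = repair_rows(raw)
```

    Passing `rows` to `pd.DataFrame` should give the same frame as above. Raising on short rows is a design choice here; a file with a different kind of damage would need different handling.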
    

    Then we can immediately save it and get something quoted correctly:

    In [67]: df.to_csv("fixedsemi.dat", sep=";", header=False, index=False)
    
    In [68]: more fixedsemi.dat
    12012;My Name is Mike. What is your's?;3;0
    1522;"In my opinion: It's cool; or at least not bad";4;0
    21427;Hello. I like this feature!;5;1
    
    In [69]: df2 = pd.read_csv("fixedsemi.dat", sep=";", header=None)
    
    In [70]: df2
    Out[70]: 
           0                                              1  2  3
    0  12012               My Name is Mike. What is your's?  3  0
    1   1522  In my opinion: It's cool; or at least not bad  4  0
    2  21427                    Hello. I like this feature!  5  1
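
    The reason only the second row gets quoted is that `to_csv` defaults to `csv.QUOTE_MINIMAL`, the same rule the standard-library writer uses: a field is quoted only if it contains the separator, the quote character, or a line break. A minimal sketch of that rule with the plain `csv` module, using an in-memory buffer instead of a file:

```python
import csv
import io

rows = [
    ["12012", "My Name is Mike. What is your's?", "3", "0"],
    ["1522", "In my opinion: It's cool; or at least not bad", "4", "0"],
]

# Write with ';' as separator; QUOTE_MINIMAL quotes only fields that need it.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")
writer.writerows(rows)
text = buf.getvalue()

# Reading the text back with the same separator restores four fields per row.
restored = list(csv.reader(io.StringIO(text), delimiter=";"))
```

    Here the second line of `text` comes out with the text field wrapped in double quotes, and `restored` equals `rows`, which is why the round trip through `read_csv` works without any extra options.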
    