Pandas read_csv expects wrong number of columns, with ragged csv file

佛祖请我去吃肉 2020-12-08 14:34

I have a csv file that has a few hundred rows and 26 columns, but the last few columns only have a value in a few rows and they are towards the middle or end of the file. Wh
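Here is a small reproduction of the problem. It uses hypothetical in-memory data rather than the asker's file. read_csv takes the column count from the first line, so a longer line further down raises pandas.errors.ParserError. When the widest row is known in advance, passing one label per column through names (with header=None) pads the shorter rows with NaN instead. The helper names read_naively and read_with_known_width are illustrative only.

```python
import io

import pandas as pd

# Hypothetical ragged data: only the last row fills the extra columns
DATA = "1,2\n3,4\n5,6,7,8\n"


def read_naively(text):
    """Let pandas infer the number of columns from the first line."""
    return pd.read_csv(io.StringIO(text))


def read_with_known_width(text, width):
    """Supply one column name per field so shorter rows are padded with NaN."""
    return pd.read_csv(io.StringIO(text), header=None, names=list(range(width)))


try:
    read_naively(DATA)
except pd.errors.ParserError as exc:
    print("ParserError:", exc)

print(read_with_known_width(DATA, 4))
```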

4 answers
  •  青春惊慌失措
    2020-12-08 15:25

    The problem with the given solution is that you have to know the maximum number of columns in advance. I couldn't find a built-in function for this, but you can write a function to:

    1. read all the lines
    2. split each line on the separator
    3. count the number of fields in each row
    4. keep track of the maximum count
    5. use that maximum to build the names option (as suggested by Roman Pekar)
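The steps above can be sketched with Python's standard csv module instead of str.split. This is a swapped-in technique, not the answerer's code. With csv.reader, a quoted field that contains the delimiter counts as one field. The sketch assumes a comma delimiter, and max_field_count and read_ragged are hypothetical names.

```python
import csv
import io

import pandas as pd


def max_field_count(lines, delimiter=","):
    """Steps 1-4: parse every row with csv.reader and keep the largest field count."""
    return max((len(row) for row in csv.reader(lines, delimiter=delimiter)), default=0)


def read_ragged(path, delimiter=","):
    """Step 5: give read_csv one column name per field (assumes no header row)."""
    with open(path, newline="") as f:  # newline="" as the csv docs recommend
        width = max_field_count(f, delimiter)
    return pd.read_csv(path, sep=delimiter, header=None, names=list(range(width)))


# The quoted field "y,z,w" counts once; str.split(',') would report 4 for line 1
print(max_field_count(io.StringIO('x,"y,z,w"\n1,2,3\n')))  # 3
```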

    Here is the function I wrote for my files:

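    import pandas as pd

    # sep=' ' matches the original; pass sep=',' for comma-separated files
    def ragged_csv(filename, sep=' '):
        # First pass: count the fields in every line and keep the maximum
        max_n = 0
        with open(filename) as f:
            for line in f:
                n_fields = len(line.rstrip('\n').split(sep))
                max_n = max(max_n, n_fields)
        # Second pass: one column name per field, so shorter rows are
        # padded with NaN instead of raising a ParserError
        return pd.read_csv(filename, sep=sep, names=list(range(max_n)))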
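The function above reads the file twice. If that is a concern, there is a different, single-pass technique, sketched here under stated assumptions: parse the rows with the csv module, pad short rows with None yourself, and build the DataFrame directly. Because csv.reader yields strings, the sketch then tries pd.to_numeric on each column, a step read_csv would otherwise do for you. ragged_frame and _maybe_numeric are hypothetical names, and blank lines are skipped to mirror read_csv's default.

```python
import csv
import io

import pandas as pd


def _maybe_numeric(col):
    """Convert a column to numbers if every non-missing value parses; else keep it."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


def ragged_frame(lines, delimiter=","):
    """Single pass: parse rows, pad short ones with None, then build a DataFrame."""
    rows = [r for r in csv.reader(lines, delimiter=delimiter) if r]  # skip blank lines
    width = max((len(r) for r in rows), default=0)
    padded = [r + [None] * (width - len(r)) for r in rows]
    return pd.DataFrame(padded).apply(_maybe_numeric)


df = ragged_frame(io.StringIO("1,2\n3,4\n5,6,7,8\n"))
print(df.shape)  # (3, 4)
```

This keeps every row in memory as Python lists, so for very large files the two-pass read_csv approach is probably the lighter option.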
    
