Most efficient way to parse a large .csv in python?

你的背包 · 2020-12-05 15:36

I tried looking at other answers, but I am still not sure of the right way to do this. I have a number of really large .csv files (they could be a gigabyte each), and I want to first …

4 Answers
    轻奢々 (OP) · 2020-12-05 16:14

    How much do you care about sanitization?

    The csv module is really good at understanding different csv file dialects and ensuring that escaping happens properly, but it's definitely overkill for simple data and can often be way more trouble than it's worth (especially if you have Unicode!)
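
    For example, a minimal sketch of leaning on the csv module (the function name and the 4 KB sniffing sample here are arbitrary choices of mine, not anything required by the library) might look like:

    import csv

    def read_csv_with_dialect(path):
        # Let csv.Sniffer guess the dialect (delimiter, quoting, escaping)
        # from a small sample, then parse the whole file with it.
        with open(path, newline='') as file_obj:
            dialect = csv.Sniffer().sniff(file_obj.read(4096))
            file_obj.seek(0)
            return list(csv.reader(file_obj, dialect))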

    A really naive implementation that properly handles backslash-escaped commas (\,) would be:

    import re

    def read_csv_naive(path):
        # Split each line on commas that are NOT preceded by a backslash,
        # so an escaped comma like "a\,b" stays inside a single field.
        with open(path, 'r') as file_obj:
            return [re.split(r'(?<!\\),', line.rstrip('\n'))
                    for line in file_obj]
    

    If your data is simple this will work great. If you have data that might need more escaping, the csv module is probably your most stable bet.
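
    One caveat, given the "gigabyte each" file sizes in the question: both versions above build the whole result list in memory. A generator-based sketch (again, the names iter_csv_rows and process are placeholders, not part of the csv API) streams one row at a time instead:

    import csv

    def iter_csv_rows(path):
        # Yield one parsed row at a time so the whole file never
        # has to sit in memory at once.
        with open(path, newline='') as file_obj:
            for row in csv.reader(file_obj):
                yield row

    Usage would then be something like:

    for row in iter_csv_rows('big.csv'):  # 'big.csv' is a placeholder path
        process(row)                      # placeholder per-row handler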
