Most efficient way to parse a large .csv in python?

你的背包 · 2020-12-05 15:36

I tried looking at other answers, but I am still not sure of the right way to do this. I have a number of really large .csv files (they could be a gigabyte each), and I want to first …

4 Answers
    轻奢々 (OP) · 2020-12-05 16:14

    How much do you care about sanitization?

    The csv module is really good at understanding different csv file dialects and ensuring that escaping happens properly, but it's definitely overkill for simple data and can often be way more trouble than it's worth (especially if you have Unicode!)
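
    For example, a minimal sketch of leaning on the csv module (the function name and the 4 KB sniffing sample here are arbitrary choices of mine, not anything required by the library) might look like:

    import csv

    def read_csv_with_dialect(path):
        # Let csv.Sniffer guess the dialect (delimiter, quoting, escaping)
        # from a small sample, then parse the whole file with it.
        with open(path, newline='') as file_obj:
            dialect = csv.Sniffer().sniff(file_obj.read(4096))
            file_obj.seek(0)
            return list(csv.reader(file_obj, dialect))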

    A really naive implementation that properly handles backslash-escaped commas (\,) would be:

    import re

    def read_csv_naive(path):
        # Split each line on commas that are NOT preceded by a backslash,
        # so an escaped comma like "a\,b" stays inside a single field.
        with open(path, 'r') as file_obj:
            return [re.split(r'(?<!\\),', line.rstrip('\n'))
                    for line in file_obj]
    

    If your data is simple this will work great. If you have data that might need more escaping, the csv module is probably your most stable bet.
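
    One caveat, given the "gigabyte each" file sizes in the question: both versions above build the whole result list in memory. A generator-based sketch (again, the names iter_csv_rows and process are placeholders, not part of the csv API) streams one row at a time instead:

    import csv

    def iter_csv_rows(path):
        # Yield one parsed row at a time so the whole file never
        # has to sit in memory at once.
        with open(path, newline='') as file_obj:
            for row in csv.reader(file_obj):
                yield row

    Usage would then be something like:

    for row in iter_csv_rows('big.csv'):  # 'big.csv' is a placeholder path
        process(row)                      # placeholder per-row handler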
