I tried to look on other answers but I am still not sure the right way to do this. I have a number of really large .csv files (could be a gigabyte each), and I want to first
How much do you care about sanitization?
The csv module is really good at understanding different csv file dialects and ensuring that escaping is happing properly, but it's definitely overkill and can often be way more trouble than it's worth (especially if you have unicode!)
A really naive implementation that properly escapes \,
would be:
import re
def read_csv_naive():
with open(, 'r') as file_obj:
return [re.split('[^\\],', x) for x in file_obj.splitlines()]
If your data is simple this will work great. If you have data that might need more escaping, the csv module is probably your most stable bet.