Importing CSV that has line breaks within the actual fields

浪尽此生 posted on 2019-12-04 02:24:56

I had that problem too and did not find a way to read the data correctly.

In my case it was a one-time import, so I wrote a script that searched for all line breaks within a column and replaced them with a placeholder such as #####. Then I imported the data and replaced the placeholder with line breaks again.
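A minimal sketch of such a one-time script, assuming the multi-line columns are enclosed in double quotes. The helper name maskLineBreaks and the sample data are made up for illustration; only the ##### placeholder comes from the description above.

```php
<?php
// Hypothetical helper: replace line breaks inside double-quoted fields
// with a placeholder so that every record fits on one physical line.
function maskLineBreaks(string $csv, string $placeholder = '#####'): string
{
    $out = '';
    $inQuotes = false;
    $len = strlen($csv);
    for ($i = 0; $i < $len; $i++) {
        $ch = $csv[$i];
        if ($ch === '"') {
            // An escaped quote ("") toggles the flag twice, so the state stays correct.
            $inQuotes = !$inQuotes;
            $out .= $ch;
        } elseif ($inQuotes && ($ch === "\n" || $ch === "\r")) {
            // Treat \r\n as a single line break.
            if ($ch === "\r" && $i + 1 < $len && $csv[$i + 1] === "\n") {
                $i++;
            }
            $out .= $placeholder;
        } else {
            $out .= $ch;
        }
    }
    return $out;
}

// Made-up sample input: the second record has a line break inside a quoted field.
$raw = "id,comment\n1,\"first line\nsecond line\"\n2,plain\n";
$masked = maskLineBreaks($raw);

// Naive line-based import, which now works because each record is one line.
$rows = [];
foreach (explode("\n", trim($masked)) as $line) {
    $fields = str_getcsv($line, ',', '"', '\\');
    // Restore the real line breaks after the import.
    $rows[] = str_replace('#####', "\n", $fields);
}
```

The placeholder must be a string that never occurs in the real data, otherwise the final replacement would introduce line breaks that were not there originally.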

If you need a regular import, you could write your own CSV parser that handles the problem. If the text columns are enclosed in "", you could treat everything between the opening and the closing " as one column (with a check for escaped " within the content).
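A rough sketch of what such a hand-written parser could look like. The function name parseCsvString is hypothetical, and the sketch assumes that quotes inside a field are escaped by doubling them (""), which is common but not guaranteed for every CSV export.

```php
<?php
// Hypothetical minimal parser: returns an array of records,
// each record being an array of field strings.
function parseCsvString(string $csv, string $delimiter = ','): array
{
    $records = [];
    $record = [];
    $field = '';
    $inQuotes = false;
    $len = strlen($csv);

    for ($i = 0; $i < $len; $i++) {
        $ch = $csv[$i];
        if ($inQuotes) {
            if ($ch === '"') {
                if ($i + 1 < $len && $csv[$i + 1] === '"') {
                    $field .= '"';      // doubled quote = escaped quote
                    $i++;
                } else {
                    $inQuotes = false;  // closing quote
                }
            } else {
                $field .= $ch;          // line breaks are kept as field content
            }
        } elseif ($ch === '"') {
            $inQuotes = true;           // opening quote
        } elseif ($ch === $delimiter) {
            $record[] = $field;
            $field = '';
        } elseif ($ch === "\n" || $ch === "\r") {
            // A line break outside quotes ends the record; \r\n counts once.
            if ($ch === "\r" && $i + 1 < $len && $csv[$i + 1] === "\n") {
                $i++;
            }
            $record[] = $field;
            $records[] = $record;
            $record = [];
            $field = '';
        } else {
            $field .= $ch;
        }
    }
    // Flush the last record if the input does not end with a line break.
    if ($field !== '' || $record !== []) {
        $record[] = $field;
        $records[] = $record;
    }
    return $records;
}

// Made-up sample: a quoted field with a line break and escaped quotes.
$parsed = parseCsvString("name,note\nAnn,\"line one\nline \"\"two\"\"\"\nBob,ok");
```

Before writing your own parser it may be worth checking whether PHP's built-in fgetcsv() already does the job, since it also reads quoted fields that span several lines.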

danieltalsky

The accepted answer didn't solve the problem for me, but I eventually found this CSV parser library on Google Code that works well for multiline fields in CSVs.

parsecsv-for-php:
https://github.com/parsecsv/parsecsv-for-php


For historical purposes, the original project home was:
http://code.google.com/p/parsecsv-for-php/

Mike Wilding

My solution is the following:

nl2br(string);

http://php.net/manual/en/function.nl2br.php

Once you get down to the individual cell (string) level, run it on the string and it will convert the line breaks to HTML breaks for you.
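A small example of that call, assuming the cell is going to be printed into an HTML page. The sample value is made up, and the htmlspecialchars() call is my addition rather than part of the answer above.

```php
<?php
// Sample cell value as it might come out of a CSV parser (made up for illustration).
$cell = "First line\nSecond line";

// Escape HTML special characters first, then insert <br /> before each line break.
$html = nl2br(htmlspecialchars($cell));

echo $html;
```

Note that nl2br() inserts the break tag in front of each line break and keeps the original line break in the string.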

Yes, you need to find that comma and replace it with a special character sequence, such as the combination {()}, and finally replace that sequence with the , you originally had.

Hope that helps you.

Although this is an old question, the answer might still be relevant to people. There is now a newer, framework-independent library, http://csv.thephpleague.com/, which supports newline characters in fields as well as some filtering.

It's an old thread, but I encountered this problem and solved it with a regex, so you can avoid pulling in a library just for that. The code here is in PHP, but it can be adapted to other languages.

$parsedCSV = preg_replace('/(,|\n|^)"(?:([^\n"]*)\n([^\n"]*))*"/', '$1"$2 $3"', $parsedCSV);

This solution assumes that fields containing a line break are enclosed in double quotes, which seems to be a valid assumption, at least from what I have seen so far. Also, the opening double quote should follow a , or be placed at the start of a new line (or of the first line).

Example:

field1,"field2-part1\nfield2-part2",field3

Here the \n is replaced by a space, so the result would be:

field1,"field2-part1 field2-part2",field3

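The regex also matches fields with multiple line breaks, but because the two capture groups sit inside a repeated group, only the text around the last line break survives the replacement. As written it is therefore only safe for fields with a single line break.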

This might not be efficient if the content is very large, but it can help in many cases, and the idea can be reused, perhaps optimized by processing smaller chunks (but you would then need to handle fields that are cut at the boundaries of the fixed-size buffers).
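A variant of the same idea, sketched with preg_replace_callback so that each quoted field is rewritten as a whole and any number of line breaks inside it is replaced. The function name flattenQuotedLineBreaks is hypothetical, and the sketch assumes quotes inside a field are escaped by doubling them ("").

```php
<?php
// Hypothetical helper: replace every line break inside double-quoted fields
// with a single space. Returns null if PCRE reports an error.
function flattenQuotedLineBreaks(string $csv)
{
    return preg_replace_callback(
        // Group 1: start of input, a comma, or a line break before the field.
        // Group 2: the quoted field, allowing doubled quotes ("") inside it.
        '/(^|[,\n])("[^"]*(?:""[^"]*)*")/',
        function (array $m): string {
            // Only the quoted field is changed; the delimiter in group 1 is kept.
            return $m[1] . preg_replace('/\r\n|\r|\n/', ' ', $m[2]);
        },
        $csv
    );
}

// Made-up sample: one field with two line breaks, one with \r\n, one empty.
$input = "field1,\"a\nb\nc\",field3\n\"x\r\ny\",\"\",z";
$flat = flattenQuotedLineBreaks($input);
```

Line breaks between records are left alone because they sit outside the quoted fields, and empty quoted fields ("") pass through unchanged.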
