I'm processing a string which is tab delimited. I'm accomplishing this using the split
function, and it works in most situations. The problem occurs when a f
Implementations based on String.split will have serious limitations if the data in a tab-delimited field itself contains newline, tab, and possibly double-quote (") characters.
Tab-delimited formats have been around for donkey's years, but the format is not standardised and varies between implementations. Many implementations don't escape characters (newlines and tabs) appearing within a field. Instead, they follow CSV conventions and wrap any non-trivial field in "double quotes", escaping only the double-quote character itself. So a single logical "line" (one record) could extend over multiple physical lines.
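To make the problem concrete, here is a minimal sketch in plain Java. It assumes the common CSV-style convention where a quoted field may contain literal tabs and newlines and a double quote inside it is written as two double quotes; your data may follow a different convention. The class name TabFieldDemo and the method parseRecord are made up for this illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

class TabFieldDemo {

    // Splits one record on tabs. Inside "..." tabs and newlines are kept
    // literally, and "" is read as a single double-quote character.
    // (Assumed convention; not every tab-delimited producer follows it.)
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote -> literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // tab or newline inside quotes is data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == '\t') {
                fields.add(current.toString()); // unquoted tab ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Three logical fields; the second contains a newline and a tab.
        String record = "id-1\t\"first line\nsecond\tline\"\t\"say \"\"hi\"\"\"";
        System.out.println("split:       " + record.split("\t").length + " pieces");
        System.out.println("parseRecord: " + parseRecord(record).size() + " fields");
    }
}
```

On this input the naive split cuts the quoted field in two and leaves the quote characters in place, while the quote-aware loop returns the three intended fields. Note the sketch expects the whole record as one string, so reading the input line by line first would already break it; that is one more reason to prefer a tested library.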
Reading around, I heard "just reuse Apache tools", which sounds like good advice.
In the end I personally chose opencsv. I found it lightweight, and since it provides options for the escape and quote characters, it should cover most popular comma- and tab-delimited data formats.
Example:
CSVReader tabFormatReader = new CSVReader(new FileReader("yourfile.tsv"), '\t');