I want to be able to solve problems like this: Getting std :: ifstream to handle LF, CR, and CRLF? where an istream needs to be tokenized by a complex delimiter
This question is about code appearance:
regex will work 1 character at a time, this question is asking to use a library to parse the istream 1 character at a time rather than internally reading and parsing the istream 1 character at a timeistream 1 character at a time will still copy that one character to a temp variable (buffer) this code seeks to avoid buffering all the code internally, depending on a library instead to abstract thatC++11's regexes use ECMA-262 which does not support look aheads or look behinds: https://stackoverflow.com/a/14539500/2642059 This means that a regex could match using only an input_iterator_tag, but clearly those implemented in C++11 do not.
boost::regex_iterator on the other hand does support the boost::match_partial flag (which is not available in C++11 regex flags.) boost::match_partial allows the user to slurp part of the file and run the regex over that, on a mismatch due to end of input the regex will "hold it's finger" at that position in the regex and await more being added to the buffer. You can see an example here: http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/partial_matches.html In the average case, like "A\nB\rC\n\r", this can save buffer size.
boost::match_partial has 4 drawbacks:
"ABC\n" this saves the user no size and he must slurp the whole istreamboost always causes bloatCircling back to answer the question: A standard library regex_iterator cannot operate on an input_iterator_tag, slurping of the whole istream required. A boost::regex_iterator allows the user to possibly slurp less than the whole istream. Because this is a question about code appearance though, and because boost::regex_iterator's worst case requires slurping of the whole file anyway, it is not a good answer to this question.
For the best code appearance slurping the whole file and running a standard regex_iterator over it is your best bet.
I think not. istream_iterator has the input_iterator_tag tag, whereas regex_iterator expects to be initialized using bi-directional iterators (bidirectional_iterator_tag).
If your delimiter regex is complex enough to avoid reading the stream yourself, the best way to do this is to indeed slurp the istream.