Why does std::basic_istream::ignore() extract more characters than specified?

Submitted by 天大地大妈咪最大 on 2021-02-18 10:55:46

Question


I have the following code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    stringstream buffer("1234567890 ");
    cout << "pos-before: " << buffer.tellg() << endl;
    buffer.ignore(10, ' ');
    cout << "pos-after: " << buffer.tellg() << endl;
    cout << "eof: " << buffer.eof() << endl;
}

And it produces this output:

pos-before: 0
pos-after: 11
eof: 0

I would expect pos-after to be 10 and not 11. According to the specification, the ignore method should stop when any one of the following conditions is met:

  1. count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
  2. end of file conditions occurs in the input sequence, in which case the function calls setstate(eofbit)
  3. the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()

In this case I expect rule 1 to trigger before all the other rules and to stop when the stream position is 10.

Execution shows that this is not the case. What did I misunderstand?

I also tried a variation of the code where I ignore only 9 characters. In this case the output is the expected one:

pos-before: 0
pos-after: 9
eof: 0

So it looks like in the case where ignore() extracted the count of characters, it still checks if the next character is the delimiter and if it is, it extracts it too. I can reproduce with g++ and clang++.

I also tried this variation of the code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    cout << "--- 10x get\n";
    stringstream buffer("1234567890");
    cout << "pos-before: " << buffer.tellg() << '\n';
    for(int i=0; i<10; ++i)
        buffer.get();
    cout << "pos-after: " << buffer.tellg() << '\n';
    cout << "eof: " << buffer.eof() << '\n';
    
    cout << "--- ignore(10)\n";
    stringstream buffer2("1234567890");
    cout << "pos-before: " << buffer2.tellg() << '\n';
    buffer2.ignore(10);
    cout << "pos-after: " << buffer2.tellg() << '\n';
    cout << "eof: " << buffer2.eof() << '\n';
}

And the result is:

--- 10x get
pos-before: 0
pos-after: 10
eof: 0
--- ignore(10)
pos-before: 0
pos-after: -1
eof: 1

We see that using ignore() produces an end-of-file condition on the stream, which indicates that ignore() did try to extract a character after having extracted 10 characters. But in this case the 3rd condition is disabled, so ignore() should not have tried to look at the next character.


Answer 1:


The specification of std::basic_istream::ignore in [istream.unformatted] paragraph 25 is a bit unclear: it states "Characters are extracted until any of the following occurs:" without any indication of order. Paragraph 25.1 states that at most n characters are extracted (unless n is std::numeric_limits<std::streamsize>::max()) and paragraph 25.3 states that extraction stops when the next character matches the delimiter. However, even if the conditions can be applied in any order, there is no conflict here: the nth character is not the delimiter, so once the count is reached ignore() is supposed to stop.

As was pointed out in a comment, there was/is a bug in libstdc++ which seems to be still present with the library shipping with gcc-10.2.0. Using clang++ with libc++ (if necessary, use -stdlib=libc++ when invoking clang++) doesn't show the same behavior.

As an aside: the unformatted input operations set a count of characters read, which can be accessed using gcount(). Seeking within a stream is a considerably more expensive operation than accessing this count. Using gcount() also shows the problem (and speaking of expensive operations, I also replaced std::endl with '\n'; see this video or this article for more details):

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::istringstream buffer("1234567890 ");
    buffer.ignore(10, ' ');
    std::cout << "gcount: " << buffer.gcount() << '\n';
    std::cout << "eof: " << std::boolalpha << buffer.eof() << '\n';
}



Answer 2:


cppreference is notorious -- you should generally not rely on it for corner cases in the language, and refer to the spec instead, which says:

Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:

  • n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
  • end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
  • traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).

Using "any of" here instead of "one of" makes it clear that ignore will stop if more than one of the conditions applies. That's essentially the issue here -- both the first and third conditions apply, which brings up an underspecified corner case -- the third condition states that the next available character (that matches the delimiter) will also be extracted.

So this is exactly what the library is doing in this case -- the third condition applies, so it extracts the character. The fact that the first condition also applies is immaterial.



Source: https://stackoverflow.com/questions/64204443/why-does-stdbasic-istreamignore-extract-more-characters-than-specified
