stringstream unsigned input validation

旧时难觅i 2021-01-13 03:13

I'm writing part of a program that parses and validates user input passed as console arguments. I chose to use stringstream for that purpose, but encountered a problem.

2 answers
  •  日久生厌
    2021-01-13 03:36

    Version disclaimer: The answer is different for C++03. The following deals with C++11.

    First, let's analyse what's happening.

    ss >> res; This calls std::istream::operator>>(unsigned). In [istream.formatted.arithmetic]/1, the effects are defined as follows:

    These extractors behave as formatted input functions (as described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment:

    typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
    iostate err = iostate::goodbit;
    use_facet< numget >(loc).get(*this, 0, *this, err, val);
    setstate(err);
    

    In the above fragment, loc stands for the private member of the basic_ios class.

    As [istream::sentry] specifies for formatted input functions, the main effect of the sentry object here is to consume leading white-space characters. It also prevents execution of the code shown above if the stream is already in an error state (failed or eof).
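Both effects of the sentry can be observed with a small sketch. The function names below are made up for illustration and are not part of the OP's program; the value 7 is just a sentinel.

```cpp
#include <sstream>

// The sentry skips leading white space before num_get sees any input.
unsigned extract_after_spaces() {
    std::istringstream ss("  \t\n 42");
    unsigned res = 0;
    ss >> res;
    return res;
}

// Once failbit is set, the sentry fails and later extractions
// never reach num_get::get, so the target is left untouched.
unsigned extract_from_failed_stream() {
    std::istringstream ss("abc 42");
    unsigned res = 0;
    ss >> res;   // fails: 'a' is not part of a number, failbit is set
    res = 7;     // sentinel value
    ss >> res;   // sentry fails, nothing is written to res
    return res;  // still the sentinel
}
```

Calling ss.clear() before the second extraction would reset the error state and let the sentry succeed again.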

    The locale used is the "C" locale. Rationale:

    For a stringstream constructed via stringstream ss(s);, the locale of that iostream is the current global locale at the time of construction (that's guaranteed deep down in the rabbit hole at [ios.base.locales]/4). As the global locale hasn't been changed in the OP's program, [locale.cons]/2 specifies the "classic" locale, i.e. the "C" locale.
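A quick way to confirm this on a given implementation is to compare the stream's locale with std::locale::classic(). This is only a sketch: the helper names are hypothetical, and it assumes nothing has called std::locale::global beforehand.

```cpp
#include <locale>
#include <sstream>

// Hypothetical helper: true if the stream's locale compares equal
// to the classic "C" locale.
bool uses_classic_locale(const std::ios_base& stream) {
    return stream.getloc() == std::locale::classic();
}

// A freshly constructed stream picks up the global locale, which is
// the classic locale as long as the program has not changed it.
bool fresh_stream_uses_classic_locale() {
    std::istringstream ss("42");
    return uses_classic_locale(ss);
}
```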

    use_facet< numget >(loc).get uses the member function num_get::get(iter_type in, iter_type end, ios_base&, ios_base::iostate& err, unsigned int& v) const; specified in [locale.num.get] (note the unsigned int, everything is still fine). The details of the string -> unsigned int conversion for the "C" locale are lengthy and described in [facet.num.get.virtuals]. Some interesting details:

    • For an unsigned integer value, the function strtoull is used.
    • If the conversion fails, ios_base::failbit is assigned to err. Specifically: "The numeric value to be stored can be one of: [...] the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err."

    We need to go to C99, 7.20.1.4 for the definition of strtoull, under paragraph 5:

    If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).

    and under paragraph 8:

    If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno

    It seems there has been debate in the past about whether negative values are valid input for strtoul. In any case, this function is where the problem lies. A quick check with gcc shows that a negative value is treated as valid input, which explains the behaviour you observed.
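The two C99 paragraphs quoted above can be checked directly against strtoull. The struct and function names in this sketch are made up for illustration; it only reports the returned value and whether errno was set to ERANGE.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical result type: the converted value plus the ERANGE indicator.
struct StrtoullResult {
    unsigned long long value;
    bool range_error;
};

// Converts text in base 10 and records whether strtoull reported ERANGE.
StrtoullResult convert(const char* text) {
    errno = 0;  // strtoull only sets errno on error, so clear it first
    unsigned long long v = std::strtoull(text, nullptr, 10);
    return StrtoullResult{v, errno == ERANGE};
}
```

With this helper, convert("-1") yields ULLONG_MAX without a range error, because the value 1 is in range and is then negated in the unsigned return type. A range error is only reported when the magnitude itself does not fit, as in convert("-99999999999999999999999").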


    Historic note: C++03

    C++03 used scanf inside the num_get conversion. Unfortunately, I'm not quite sure (yet) how the conversion for scanf is specified, and under which circumstances errors occur.


    An explicit error check:

    We can add that check manually, either by converting into a signed type and testing for < 0, or by looking for the - character (which isn't a good idea because of possible localization issues).
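A minimal sketch of the first option, converting into a signed type and testing the sign, might look like this. The name parse_unsigned is hypothetical, and the code assumes long long is wider than unsigned, which holds on common platforms but is not guaranteed by the standard. Rejecting trailing characters is an extra choice made here, not something the OP asked for.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: parses s as a non-negative integer that fits
// into unsigned. Returns false and leaves out untouched otherwise.
bool parse_unsigned(const std::string& s, unsigned& out) {
    std::istringstream ss(s);
    long long tmp = 0;
    if (!(ss >> tmp)) {
        return false;  // not a number, or outside the range of long long
    }
    char extra;
    if (ss >> extra) {
        return false;  // non-whitespace characters follow the number
    }
    if (tmp < 0) {
        return false;  // the explicit sign check
    }
    if (static_cast<unsigned long long>(tmp) >
        std::numeric_limits<unsigned>::max()) {
        return false;  // too large for unsigned
    }
    out = static_cast<unsigned>(tmp);
    return true;
}
```

For example, parse_unsigned("42", v) succeeds and stores 42 in v, while "-1", "abc" and "42x" are all rejected.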
