Difference between scanf() and strtol() / strtod() in parsing numbers

陌清茗 2020-12-05 20:28

Note: I completely reworked the question to more properly reflect what I am setting the bounty for. Please excuse any inconsistencies with already-given ans

8 answers
  •  Happy的楠姐
    2020-12-05 20:44

    According to the C99 spec, the scanf() family of functions parses integers the same way as the strto*() family of functions. For example, for the conversion specifier x this reads:

    Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument.

    So if sscanf() and strtoul() give different results, the libc implementation doesn't conform.
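
One way to check a given libc against this claim is to run both functions on the same input and compare the value and the number of characters consumed. The following is only a minimal sketch: `struct hex_result`, `parse_with_strtoul()` and `parse_with_sscanf()` are hypothetical names made up for illustration, and `%n` is used to learn how far `sscanf()` got.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical result record used only for this comparison. */
struct hex_result {
    int matched;          /* 1 if a conversion took place, 0 otherwise */
    unsigned long value;  /* converted value (0 if nothing matched) */
    int consumed;         /* characters consumed from the start of the string */
};

/* Parse s as a hexadecimal integer with strtoul(), base 16. */
static struct hex_result parse_with_strtoul(const char *s)
{
    struct hex_result r = {0, 0, 0};
    char *end;

    r.value = strtoul(s, &end, 16);
    r.consumed = (int)(end - s);
    r.matched = (end != s);   /* end == s means no conversion was performed */
    return r;
}

/* Parse s as a hexadecimal integer with sscanf() and the x specifier. */
static struct hex_result parse_with_sscanf(const char *s)
{
    struct hex_result r = {0, 0, 0};

    /* %n is only reached (and consumed only set) if %lx matched. */
    if (sscanf(s, "%lx%n", &r.value, &r.consumed) == 1) {
        r.matched = 1;
    } else {
        r.value = 0;
        r.consumed = 0;
    }
    return r;
}
```

Calling both helpers on an input such as "0xz" and printing the three fields shows whether the two functions agree on a particular implementation. The outcome depends on the libc, so no expected output is given here.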

    It is a bit unclear, though, what the expected results of your sample code should be:

    strtoul() accepts an optional prefix of 0x or 0X if base is 16, and the spec reads

    The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form.

    For the string "0xz", in my opinion the longest initial subsequence of the expected form is "0", so the value should be 0 and endptr should be set to point at the x.
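
To make this reading concrete, here is a hand-written sketch of a parser that follows it: the "0x" prefix is accepted only when a hex digit follows. `hex_digit_value()` and `parse_hex_prefix()` are hypothetical helpers, not part of any libc, and the sketch deliberately ignores signs and overflow.

```c
#include <ctype.h>

/* Hypothetical helper: numeric value of a hex digit, or -1 if c is not one. */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse an unsigned hexadecimal integer from s.
 * *end is set to the first unconsumed character, or to s if no digits
 * were found. Signs and overflow are not handled in this sketch.
 */
static unsigned long parse_hex_prefix(const char *s, const char **end)
{
    const char *p = s;
    const char *digits;
    unsigned long value = 0;

    while (isspace((unsigned char)*p))
        p++;

    /* Treat "0x"/"0X" as a prefix only if a hex digit follows it. */
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X') &&
        hex_digit_value((unsigned char)p[2]) >= 0)
        p += 2;

    digits = p;
    while (hex_digit_value((unsigned char)*p) >= 0) {
        value = value * 16 + (unsigned long)hex_digit_value((unsigned char)*p);
        p++;
    }

    *end = (p == digits) ? s : p;
    return value;
}
```

With this sketch, the input "0xz" yields 0 and leaves `*end` pointing at the x, while "0x1A" yields 26 with all four characters consumed.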

    mingw-gcc 4.4.0 disagrees and fails to parse the string with both strtoul() and sscanf(). The reasoning could be that the longest initial subsequence of expected form is "0x" - which is not a valid integer literal, so no parsing is done.

    I think this interpretation of the standard is wrong: A subsequence of expected form should always yield a valid integer value (if out of range, the MIN/MAX values are returned and errno is set to ERANGE).

    cygwin-gcc 3.4.4 (which uses newlib as far as I know) will also not parse the literal if strtoul() is used, but parses the string according to my interpretation of the standard with sscanf().

    Beware that my interpretation of the standard runs into your initial problem, i.e. that the standard only guarantees that ungetc() can push back a single character. To decide whether the 0x is part of the literal, you have to read ahead two characters: the x and the character following it. If that character is not a hex digit, both have to be pushed back. If there are more tokens to parse, you can buffer them and work around this problem, but if it's the last token, you have to ungetc() both characters.
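
The buffering workaround could look roughly like the sketch below: instead of relying on `ungetc()`, the reader keeps its own two-character pushback buffer on top of the `FILE *`. Everything named `ps_*` and `struct peek_stream` is hypothetical, and whitespace skipping, signs and overflow are left out to keep the lookahead logic visible.

```c
#include <stdio.h>

/* Hypothetical wrapper: a stream with its own two-character pushback stack. */
struct peek_stream {
    FILE *fp;
    int buf[2];
    int count;   /* number of pushed-back characters, 0..2 */
};

static int ps_getc(struct peek_stream *ps)
{
    if (ps->count > 0)
        return ps->buf[--ps->count];
    return getc(ps->fp);
}

/* Push c back; EOF is ignored, and so is a push onto a full buffer. */
static void ps_ungetc(struct peek_stream *ps, int c)
{
    if (c != EOF && ps->count < 2)
        ps->buf[ps->count++] = c;
}

static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read an unsigned hex integer; returns 1 on success, 0 if no digits. */
static int ps_read_hex(struct peek_stream *ps, unsigned long *out)
{
    unsigned long value = 0;
    int ndigits = 0;
    int c = ps_getc(ps);

    if (c == '0') {
        int x = ps_getc(ps);
        if (x == 'x' || x == 'X') {
            int next = ps_getc(ps);
            if (hex_value(next) < 0) {
                /* Not a prefix: only the "0" is the number. Both
                   lookahead characters go back, last read first. */
                ps_ungetc(ps, next);
                ps_ungetc(ps, x);
                *out = 0;
                return 1;
            }
            c = next;        /* real prefix: continue with the first digit */
        } else {
            ndigits = 1;     /* the leading '0' is itself a digit */
            c = x;
        }
    }

    while (hex_value(c) >= 0) {
        value = value * 16 + (unsigned long)hex_value(c);
        ndigits++;
        c = ps_getc(ps);
    }
    ps_ungetc(ps, c);        /* give back the character that ended the number */

    if (ndigits == 0)
        return 0;
    *out = value;
    return 1;
}
```

The design choice here is that all later reads must go through `ps_getc()` so the privately buffered characters are not lost. That is exactly the part a real `fscanf()` cannot do for the last token, since the caller may continue with plain `getc()`.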

    I'm not really sure what fscanf() should do if ungetc() fails. Maybe just set the stream's error indicator?
