How to parse a string to an int in C++?

忘了有多久 2020-11-21 11:01

What's the C++ way of parsing a string (given as char *) into an int? Robust and clear error handling is a plus (instead of returning zero).

17 answers
  • 2020-11-21 12:03

    I think these three links sum it up:

    • http://tinodidriksen.com/2010/02/07/cpp-convert-int-to-string-speed/
    • http://tinodidriksen.com/2010/02/16/cpp-convert-string-to-int-speed/
    • http://www.fastformat.org/performance.html

    The stringstream and lexical_cast solutions perform about the same, because lexical_cast uses stringstream internally.

    Some specializations of lexical_cast take a different approach; see http://www.boost.org/doc/libs/release/boost/lexical_cast.hpp for details. Integer and floating-point types now have specializations for conversion to string.

    You can specialize lexical_cast for your own needs and make it fast. That would be the ultimate solution: clean, simple, and satisfying to all parties.

    The articles linked above compare different methods of converting between integers and strings. The following approaches make sense: the old C way, Spirit.Karma, FastFormat, and a simple naive loop.

    lexical_cast is fine in some cases, e.g. for int-to-string conversion.

    Converting a string to an int with lexical_cast is not a good idea, as it is 10-40 times slower than atoi, depending on the platform and compiler used.

    Boost.Spirit.Karma seems to be the fastest library for converting integer to string.

    For example: generate(ptr_char, int_, integer_number);
    

    The simple loop below, taken from the article mentioned above, is the fastest way to convert a string to an int. It is obviously not the safest one, since it checks neither for overflow nor for trailing non-digit characters; strtol() seems like a safer solution.

    int naive_char_2_int(const char *p) {
        int x = 0;
        bool neg = false;
        if (*p == '-') {
            neg = true;
            ++p;
        }
        while (*p >= '0' && *p <= '9') {
            x = (x*10) + (*p - '0');
            ++p;
        }
        if (neg) {
            x = -x;
        }
        return x;
    }
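The loop above trades safety for speed. As a rough sketch of what the missing checks could look like, the variant below rejects empty input, trailing characters, and values that do not fit in an int. The name `checked_char_2_int` and the bool-plus-out-parameter interface are assumptions made for illustration; they do not come from the cited article.

```cpp
#include <climits>

// Hypothetical checked variant of the naive loop: returns false instead of
// overflowing or silently ignoring trailing characters.
bool checked_char_2_int(const char *p, int &out) {
    bool neg = false;
    if (*p == '-') {
        neg = true;
        ++p;
    }
    if (*p < '0' || *p > '9') {
        return false;              // no digits at all
    }
    // Accumulate as a negative number: INT_MIN has a larger magnitude than
    // INT_MAX, so this covers the full range of int.
    int x = 0;
    for (; *p >= '0' && *p <= '9'; ++p) {
        const int digit = *p - '0';
        if (x < (INT_MIN + digit) / 10) {
            return false;          // the next step would overflow
        }
        x = x * 10 - digit;
    }
    if (*p != '\0') {
        return false;              // trailing non-digit characters
    }
    if (!neg) {
        if (x == INT_MIN) {
            return false;          // the positive value does not fit in int
        }
        x = -x;
    }
    out = x;
    return true;
}
```

The extra comparisons cost some of the speed that made the naive loop attractive, so this only makes sense when the input is not already known to be valid.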
    
  • 2020-11-21 12:04

    The C++ String Toolkit Library (StrTk) has the following solution:

    static const std::size_t digit_table_symbol_count = 256;
    static const unsigned char digit_table[digit_table_symbol_count] = {
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x00 - 0x07
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x08 - 0x0F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x10 - 0x17
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x18 - 0x1F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x20 - 0x27
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x28 - 0x2F
       0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, // 0x30 - 0x37
       0x08, 0x09, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x38 - 0x3F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x40 - 0x47
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x48 - 0x4F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x50 - 0x57
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x58 - 0x5F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x60 - 0x67
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x68 - 0x6F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x70 - 0x77
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x78 - 0x7F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x80 - 0x87
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x88 - 0x8F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x90 - 0x97
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0x98 - 0x9F
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xA0 - 0xA7
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xA8 - 0xAF
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xB0 - 0xB7
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xB8 - 0xBF
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xC0 - 0xC7
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xC8 - 0xCF
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xD0 - 0xD7
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xD8 - 0xDF
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xE0 - 0xE7
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xE8 - 0xEF
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 0xF0 - 0xF7
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF  // 0xF8 - 0xFF
     };
    
    template<typename InputIterator, typename T>
    inline bool string_to_signed_type_converter_impl_itr(InputIterator begin, InputIterator end, T& v)
    {
       if (0 == std::distance(begin,end))
          return false;
       v = 0;
       InputIterator it = begin;
       bool negative = false;
       if ('+' == *it)
          ++it;
       else if ('-' == *it)
       {
          ++it;
          negative = true;
       }
       if (end == it)
          return false;
       while(end != it)
       {
          // Index through unsigned char so that negative char values stay within the table.
          const T digit = static_cast<T>(digit_table[static_cast<unsigned char>(*it++)]);
          if (0xFF == digit)
             return false;
          v = (10 * v) + digit;
       }
       if (negative)
          v *= -1;
       return true;
    }
    

    The InputIterator can be unsigned char*, char*, or a std::string iterator, and T is expected to be a signed integer type such as int or long. Note that the routine does not check for overflow.
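To illustrate how an iterator-based converter of this shape might be called with both kinds of range, here is a small sketch. To stay self-contained it uses a simplified stand-in, `parse_signed`, which classifies digits with a comparison instead of StrTk's lookup table; that name and the two wrapper functions are hypothetical and are not part of StrTk.

```cpp
#include <cstring>
#include <string>

// Simplified stand-in for the StrTk routine (hypothetical name). Same shape:
// iterator range in, value out through a reference, bool result.
// Like the original, it does not check for overflow.
template <typename InputIterator, typename T>
bool parse_signed(InputIterator begin, InputIterator end, T &v) {
    if (begin == end) return false;
    bool negative = false;
    if (*begin == '+') {
        ++begin;
    } else if (*begin == '-') {
        negative = true;
        ++begin;
    }
    if (begin == end) return false;   // a sign with no digits
    T result = 0;
    for (; begin != end; ++begin) {
        if (*begin < '0' || *begin > '9') return false;
        result = 10 * result + static_cast<T>(*begin - '0');
    }
    v = negative ? -result : result;
    return true;
}

// Call with a pointer range over a C string.
bool parse_c_string(const char *s, long &v) {
    return parse_signed(s, s + std::strlen(s), v);
}

// Call with std::string iterators.
bool parse_std_string(const std::string &s, int &v) {
    return parse_signed(s.begin(), s.end(), v);
}
```

Because the range is passed explicitly, the input does not need to be null-terminated, which is handy when parsing a field out of a larger buffer.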

  • 2020-11-21 12:05

    What not to do

    Here is my first piece of advice: do not use stringstream for this. While at first it may seem simple to use, you'll find that you have to do a lot of extra work if you want robustness and good error handling.

    Here is an approach that intuitively seems like it should work:

    bool str2int (int &i, char const *s)
    {
        std::stringstream ss(s);
        ss >> i;
        if (ss.fail()) {
            // not an integer
            return false;
        }
        return true;
    }
    

    This has a major problem: str2int(i, "1337h4x0r") will happily return true and i will get the value 1337. We can work around this problem by ensuring there are no more characters in the stringstream after the conversion:

    bool str2int (int &i, char const *s)
    {
        char              c;
        std::stringstream ss(s);
        ss >> i;
        if (ss.fail() || ss.get(c)) {
            // not an integer
            return false;
        }
        return true;
    }
    

    We fixed one problem, but there are still a couple of other problems.

    What if the number in the string is not base 10? We can try to accommodate other bases by setting the stream to the correct mode (e.g. ss << std::hex) before trying the conversion. But this means the caller must know a priori what base the number is, and how can the caller possibly know that? The caller doesn't know what the number is yet. They don't even know that it is a number! How can they be expected to know what base it is?

    We could just mandate that all numbers input to our programs must be base 10 and reject hexadecimal or octal input as invalid. But that is not very flexible or robust. There is no simple solution to this problem. You can't simply try the conversion once for each base, because the decimal conversion will always succeed for octal numbers (with a leading zero) and the octal conversion may succeed for some decimal numbers. So now you have to check for a leading zero. But wait! Hexadecimal numbers can start with a leading zero too (0x...). Sigh.
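The ambiguity is easy to reproduce. In the sketch below, `parse_with_base` is a hypothetical helper (not from the answer) that applies one of the standard base manipulators before extracting; the same text then converts successfully under every base, each time to a different value.

```cpp
#include <ios>
#include <sstream>

// Hypothetical helper: parse s after applying a base manipulator
// (std::dec, std::oct or std::hex). Returns false if extraction fails.
bool parse_with_base(const char *s,
                     std::ios_base &(*base)(std::ios_base &),
                     int &out) {
    std::istringstream ss(s);
    ss >> base >> out;
    return !ss.fail();
}
```

With this helper, "010" parses as 10 with std::dec, 8 with std::oct, and 16 with std::hex, and "12" parses as 10 with std::oct. None of these calls fails, so trying each base in turn cannot tell you which one the author of the input intended.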

    Even if you succeed in dealing with the above problems, there is still another bigger problem: what if the caller needs to distinguish between bad input (e.g. "123foo") and a number that is out of the range of int (e.g. "4000000000" for 32-bit int)? With stringstream, there is no way to make this distinction. We only know whether the conversion succeeded or failed. If it fails, we have no way of knowing why it failed. As you can see, stringstream leaves much to be desired if you want robustness and clear error handling.
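A short sketch of that limitation: the function below has the same shape as the stringstream-based str2int, renamed to `stream_str2int` (a hypothetical name) so the example stands alone. The caller gets a single bool for both kinds of failure.

```cpp
#include <sstream>

// Same approach as the stringstream version of str2int, renamed
// (hypothetical name) to keep this sketch self-contained.
bool stream_str2int(int &i, const char *s) {
    char c;
    std::istringstream ss(s);
    ss >> i;
    // Fail on a bad conversion, or if any character is left over.
    return !(ss.fail() || ss.get(c));
}
```

Both stream_str2int(i, "123foo") and stream_str2int(i, "99999999999999999999") return false, and the return value gives the caller no way to report "trailing garbage" for the first and "number too large" for the second.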

    This leads me to my second piece of advice: do not use Boost's lexical_cast for this. Consider what the lexical_cast documentation has to say:

    Where a higher degree of control is required over conversions, std::stringstream and std::wstringstream offer a more appropriate path. Where non-stream-based conversions are required, lexical_cast is the wrong tool for the job and is not special-cased for such scenarios.

    What?? We've already seen that stringstream has a poor level of control, and yet it says stringstream should be used instead of lexical_cast if you need "a higher level of control". Also, because lexical_cast is just a wrapper around stringstream, it suffers from the same problems that stringstream does: poor support for multiple number bases and poor error handling.

    The best solution

    Fortunately, somebody has already solved all of the above problems. The C standard library contains strtol and family which have none of these problems.

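    #include <cerrno>   // errno, ERANGE
    #include <climits>  // INT_MAX, INT_MIN, LONG_MAX, LONG_MIN
    #include <cstdlib>  // strtol

    // Note: some platforms' <math.h> may define OVERFLOW and UNDERFLOW as
    // macros; rename these enumerators if that causes a clash.
    enum STR2INT_ERROR { SUCCESS, OVERFLOW, UNDERFLOW, INCONVERTIBLE };

    STR2INT_ERROR str2int (int &i, char const *s, int base = 0)
    {
        char *end;
        long  l;
        errno = 0;  // strtol only sets errno on failure, so clear it first
        l = strtol(s, &end, base);
        // Too large for long (ERANGE), or fits in long but not in int.
        if ((errno == ERANGE && l == LONG_MAX) || l > INT_MAX) {
            return OVERFLOW;
        }
        // Too small for long (ERANGE), or fits in long but not in int.
        if ((errno == ERANGE && l == LONG_MIN) || l < INT_MIN) {
            return UNDERFLOW;
        }
        // Empty input, or characters left over after the number.
        if (*s == '\0' || *end != '\0') {
            return INCONVERTIBLE;
        }
        i = l;
        return SUCCESS;
    }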
    

    Pretty simple for something that handles all the error cases and also supports any number base from 2 to 36. If base is zero (the default), strtol detects the base from the prefix: 0x or 0X means hexadecimal, a leading 0 means octal, and anything else is decimal. Or the caller can supply the third argument and specify that the conversion should only be attempted for a particular base. It is robust and handles all errors with a minimal amount of effort.
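To make the base-detection behavior concrete, here is a minimal sketch that calls strtol with base 0 and requires the whole string to be consumed. The helper name `parse_long_auto_base` is hypothetical and the error reporting is deliberately reduced to a bool.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: convert the whole string with strtol, letting
// base 0 pick the base from the prefix. Returns false on any error.
bool parse_long_auto_base(const char *s, long &out) {
    char *end = nullptr;
    errno = 0;
    const long value = std::strtol(s, &end, 0);
    if (end == s || *end != '\0') return false;  // no digits, or leftover characters
    if (errno == ERANGE) return false;           // out of range for long
    out = value;
    return true;
}
```

With this helper, "42" yields 42, "0x1A" yields 26, and "010" yields 8. The string "08" is rejected: the leading zero selects octal, so parsing stops at the 8 and the leftover-character check fails.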

    Other reasons to prefer strtol (and family):

    • It exhibits much better runtime performance
    • It introduces less compile-time overhead (the others pull in nearly 20 times more SLOC from headers)
    • It results in the smallest code size

    There is absolutely no good reason to use any other method.

  • 2020-11-21 12:06

    C++11 added functions for this: stoi, stol, stoll, stoul and so on.

    int myNr = std::stoi(myString);
    

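    It throws an exception on conversion error: std::invalid_argument if no conversion can be performed, and std::out_of_range if the value does not fit in the result type.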

    Even these new functions still have the same issue as noted by Dan: they will happily convert the string "11x" to integer "11".
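One way to catch the "11x" case is the optional second parameter of std::stoi, which receives the number of characters that were processed. The sketch below compares it with the string length; the wrapper name `stoi_strict` and its bool return are assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around std::stoi that also rejects trailing characters.
bool stoi_strict(const std::string &s, int &out) {
    try {
        std::size_t pos = 0;
        const int value = std::stoi(s, &pos);
        if (pos != s.size()) return false;  // e.g. "11x" stops after two characters
        out = value;
        return true;
    } catch (const std::invalid_argument &) {
        return false;   // no digits to convert
    } catch (const std::out_of_range &) {
        return false;   // value does not fit in int
    }
}
```

A caller that needs to tell the failure modes apart could return an error code from each branch instead of a plain false.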

    See more: http://en.cppreference.com/w/cpp/string/basic_string/stol

  • 2020-11-21 12:08

    The good old C way still works. I recommend strtol or strtoul. Between the return value, errno, and the end pointer ('endPtr'), you can give good diagnostic output. They also handle multiple bases nicely.
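As a sketch of the kind of diagnostics this allows, the function below uses the end pointer to report where parsing stopped. The name `describe_parse` and the message wording are hypothetical; only strtol, errno, and ERANGE come from the standard library.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns an empty string on success, otherwise a
// human-readable description of what went wrong and where.
std::string describe_parse(const char *s, long &out, int base = 10) {
    char *endPtr = nullptr;
    errno = 0;
    const long value = std::strtol(s, &endPtr, base);
    if (endPtr == s) return "no digits found";
    if (errno == ERANGE) return "value out of range for long";
    if (*endPtr != '\0') {
        // endPtr points at the first character strtol could not consume.
        return "unexpected character '" + std::string(1, *endPtr) +
               "' at offset " + std::to_string(endPtr - s);
    }
    out = value;
    return "";
}
```

For the input "12z" this reports an unexpected 'z' at offset 2, which is more helpful to a user than a bare failure flag.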
