Can I use 2 or more delimiters in C++ function getline? [duplicate]

Question

I would like to know how I can use two or more delimiters with the getline function. Here is my problem:

The program reads a text file in which each line looks like this:

   New York, Paris, 100
   CityA, CityB, 200

I am using getline(file, line), but that gives me the whole line, when what I want is CityA, then CityB, and then the number. If I use ',' as the delimiter instead, I won't know where the next line starts, so I'm trying to figure out a solution.

So, how can I use both a comma and \n as delimiters? By the way, I'm working with the string type, not char arrays, so strtok is not an option :/

Some scratch code:

string line;
ifstream file("text.txt");
if(file.is_open())
   while(!file.eof()){
     getline(file, line);
        // here I need to get each string before comma and \n
   }

Answer 1:

You can read a line using std::getline, then pass the line to a std::stringstream and read the comma-separated values from it:

std::string line;
std::ifstream file("text.txt");
if(file.is_open()){
   while(std::getline(file, line)){   // get a whole line
       std::stringstream ss(line);
       std::string field;
       while(std::getline(ss, field, ',')){
            // field now holds one comma-separated entry
            // (any space after the comma is still attached to it)
       }
   }
}

Answer 2:

No. std::getline() accepts only a single character to override the default delimiter; it has no option for multiple alternate delimiters.

The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream from it, and then parse that further into comma-separated values.

However, if you are truly parsing comma-separated values, you should be using a proper CSV parser.

Answer 3:

Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each of the blocks, splitting them up into smaller parts, and so on.

An alternative to this is to tokenize like strtok does -- from the beginning of input, handling one token at a time until the end of input is encountered. This may be preferred when parsing simple inputs, because it is straightforward to implement. This style can also be used when parsing inputs with nested structure, but this requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or limited region of code.

Someone relying on the C++ std library usually ends up using a std::stringstream, along with std::getline to tokenize string input. But, this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So, they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.

But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position nearest to the beginning of the string containing a character from the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.

Another option is the <regex> library (added in C++11), which can split on multiple delimiters and handle far more complex patterns, but you will need to get used to its syntax.

But, with very little effort, you can leverage existing functions in std::string to perform tokenizing tasks, and without resorting to streams. Here is a simple example. get_to() is the tokenizing function and tokenize demonstrates how it is used.

The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenizing is impossible. It wouldn't even be that much more complicated than this -- you would just keep track of your current position, use this as the start argument in std::string member functions, and never alter the source string. And even better techniques exist, no doubt.

To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.

#include <iostream>
#include <string>
#include <utility>

namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
    auto space_is_it = [] (char c) {
        // A few asks:
        // * Suppress criticism WRT localization concerns
        // * Avoid jumping to conclusions! And seeing monsters everywhere! 
        //   Things like...ah! Believing "thoughts" that assumptions were made
        //   regarding character encoding.
        // * If an obvious, portable alternative exists within the C++ Standard Library,
        //   you will see it in 2.0, so no new defect tickets, please.
        // * Go ahead and ignore the rumor that using lambdas just to get 
        //   local function definitions is "cheap" or "dumb" or "ignorant."
        //   That's the latest round of FUD from...*mumble*.
        return c > '\0' && c <= ' '; 
    };

    for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
        if(!space_is_it(*rit)) {
            // *rit is the last non-space character: drop everything after it.
            if(rit != str.rbegin()) {
                str.erase(rit.base(), str.end());
            }
            // Then drop everything before the first non-space character.
            for(auto fit=str.begin(); fit != str.end(); ++fit) {
                if(!space_is_it(*fit)) {
                    if(fit != str.begin()) {
                        str.erase(str.begin(), fit);
                    }
                    return;
                }
            }
        }
    }
    str.clear();
}

// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurrence of one
// from a set of delimiters.  All characters to the left of the delimiter, and the
// delimiter itself, are deleted in-place, and the substring which was to the left
// of the delimiter is returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
    const auto pos = str.find_first_of(std::forward<D>(delimiters));
    if(pos == std::string::npos) {
        // When none of the delimiters are present,
        // clear the string and return its last value.
        // This effectively makes the end of a string an
        // implied delimiter.
        // This behavior is convenient for parsers which
        // consume chunks of a string, looping until
        // the string is empty.
        // Without this feature, it would be possible to 
        // continue looping forever, when an iteration 
        // leaves the string unchanged, usually caused by
        // a syntax error in the source string.
        // So the implied end-of-string delimiter takes
        // away the caller's burden of anticipating and 
        // handling the range of possible errors.
        found_delimiter = '\0';
        std::string result;
        std::swap(result, str);
        trim(result);
        return result;
    }
    found_delimiter = str[pos];
    auto left = str.substr(0, pos);
    trim(left);
    str.erase(0, pos + 1);
    return left;
}

template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
    char discarded_delimiter;
    return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}

inline std::string pad_right(const std::string&     str,
                             std::string::size_type min_length,
                             char                   pad_char=' ')
{
    if(str.length() >= min_length ) return str;
    return str + std::string(min_length - str.length(), pad_char);
}

inline void tokenize(std::string source) {
    std::cout << source << "\n\n";
    bool quote_opened = false;
    while(!source.empty()) {
        // If we just encountered an open-quote, only include the quote character
        // in the delimiter set, so that a quoted token may contain any of the
        // other delimiters.
        const char* delimiter_set = quote_opened ? "'" : ",'{}";
        char delimiter;
        auto token = get_to(source, delimiter_set, delimiter);
        quote_opened = delimiter == '\'' && !quote_opened;
        std::cout << "    " << pad_right('[' + token + ']', 16) 
            << "   " << delimiter << '\n';
    }
    std::cout << '\n';
}
}

int main() {
    string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}

This outputs:

{1.5, null, 88, 'hi, {there}!'}

    []                 {
    [1.5]              ,
    [null]             ,
    [88]               ,
    []                 '
    [hi, {there}!]     '
    []                 }

Answer 4:

I don't think that's how you should attack the problem (even if you could do it); instead:

1. Use what you have to read in each line.
2. Then split up that line by the commas to get the pieces that you want.

If strtok will do the job for #2, you can always convert your string into a char array.

Source: https://stackoverflow.com/questions/37957080/can-i-use-2-or-more-delimiters-in-c-function-getline

Tags

c++

delimiter

getline