How to include C++ input stream delimiters in the resulting tokens

自闭症网瘾萝莉.ら submitted on 2021-02-09 07:00:41

Question


The C++ standard library offers a few ways to define custom delimiters for input streams. As I understand it, the recommended way is to imbue the stream with a new locale that carries a custom ctype facet:

First way (deriving from std::ctype and overriding do_is):

//  Note: std::ctype<char> classifies characters through a lookup table and has
//  no virtual do_is, so this override is written against std::ctype<wchar_t>.
struct csv_whitespace : std::ctype<wchar_t>
{
    bool do_is(mask m, char_type c) const override
    {
        if ((m & space) && c == L' ') {
            return false; // space will NOT be classified as whitespace
        }
        if ((m & space) && c == L',') {
            return true; // comma will be classified as whitespace
        }
        return ctype::do_is(m, c); // leave the rest to the parent class
    }
};
//  for the wcin stream:
std::wcin.imbue(std::locale(std::wcin.getloc(), new csv_whitespace));

Second way (std::ctype<char> constructed with a custom classification table):

//  get the existing table of the std::ctype<char> specialization
const auto temp = std::ctype<char>::classic_table();
//  create a copy of the table in a vector
std::vector<std::ctype<char>::mask> new_table_vector(temp, temp + std::ctype<char>::table_size);

//  add/remove stream separators using bitwise arithmetic.
//  character literals are used as indices because the table is indexed by character code
new_table_vector[' '] &= ~std::ctype_base::space;
new_table_vector['\t'] &= ~(std::ctype_base::space | std::ctype_base::cntrl);
new_table_vector[':'] |= std::ctype_base::space;
//  A ctype initialized with new_table_vector would delimit on '\n' and ':' but not ' ' or '\t'.

//  ....
//  usage of the table above.
//  new_table_vector must outlive the facet, which stores only a pointer to the table.
std::cin.imbue(std::locale(std::cin.getloc(), new std::ctype<char>(new_table_vector.data())));
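
To make the behaviour of the second way concrete, here is a minimal sketch that applies it to a string stream instead of cin. The helper name split_dropping_delimiters is made up for this illustration, and the choice of '&', '*' and '%' as separators is an assumption taken from the example input further down.

```cpp
#include <initializer_list>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits `input` with
// operator>> after marking '&', '*' and '%' as whitespace and un-marking ' '.
std::vector<std::string> split_dropping_delimiters(const std::string& input)
{
    using ctype = std::ctype<char>;

    // Copy the classic table once. It is static because the facet keeps only
    // a pointer to it, so it has to outlive every locale that uses the facet.
    static const std::vector<ctype::mask> table = [] {
        const ctype::mask* classic = ctype::classic_table();
        std::vector<ctype::mask> t(classic, classic + ctype::table_size);
        for (unsigned char c : {'&', '*', '%'}) {
            t[c] |= std::ctype_base::space;   // treat as a separator
        }
        t[' '] &= ~std::ctype_base::space;    // a blank is ordinary text now
        return t;
    }();

    std::istringstream in(input);
    // The locale takes ownership of the facet; the table itself is not deleted.
    in.imbue(std::locale(in.getloc(), new ctype(table.data())));

    std::vector<std::string> tokens;
    for (std::string token; in >> token;) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Called with the example input below, this sketch yields the tokens without their separators (aaa, bbb, ccc, ddd, eee), because operator>> skips anything the facet classifies as whitespace.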

But is there a way to include the delimiters in the resulting tokens? For example, given the input

aaa&bbb*ccc%ddd&eee

where

& * %

are delimiters defined using one of the methods above, the resulting strings would be:

aaa

&bbb

*ccc

%ddd

&eee

As you can see, the delimiters are included in the resulting strings. So the question is: is it possible to configure an input stream to do this, and if so, how?

Thank you


Answer 1:


The short answer is no: istreams do not provide an innate method for extracting and retaining separators. They provide the following extraction methods:

  • operator>> - discards the delimiter
  • get - does not extract a delimiter at all
  • getline - extracts and discards the delimiter
  • read - doesn't respect delimiters
  • readsome - doesn't respect delimiters
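
Since none of these keeps the separator, one possible workaround is to read the stream character by character with get and start a new token at every delimiter. The following is only a sketch of that idea, not a standard facility; the names is_delimiter and split_keeping_delimiters are hypothetical, and the delimiter set is taken from the question.

```cpp
#include <istream>
#include <sstream>   // only needed to feed the helper from a string
#include <string>
#include <vector>

// Hypothetical predicate: the delimiters used in the question.
inline bool is_delimiter(char c)
{
    return c == '&' || c == '*' || c == '%';
}

// Reads `in` to the end. Every delimiter starts a new token, so each
// delimiter stays attached to the text that follows it.
std::vector<std::string> split_keeping_delimiters(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string current;
    char c;
    while (in.get(c)) {                  // unformatted read, nothing is skipped
        if (is_delimiter(c) && !current.empty()) {
            tokens.push_back(current);   // close the token collected so far
            current.clear();
        }
        current += c;
    }
    if (!current.empty()) {
        tokens.push_back(current);       // flush the last token
    }
    return tokens;
}
```

For example, wrapping the question's input in a std::istringstream and passing it to split_keeping_delimiters gives aaa, &bbb, *ccc, %ddd and &eee. The loop never consults the stream's locale, so no custom ctype facet is needed for this variant.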

However, let's assume that you have slurped your istream into a string foo; then you could use a regex like this to tokenize:

((?:^|[&*%])[^&*%]*)


This could be used with a regex_token_iterator like this:

const regex re{ "((?:^|[&*%])[^&*%]*)" };
const vector<string> bar{ sregex_token_iterator(cbegin(foo), cend(foo), re, 1), sregex_token_iterator() };

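
Putting the pieces together, a self-contained sketch of this approach might look as follows. The helpers slurp and tokenize_with_regex are names invented for this illustration, and reading the stream through rdbuf is just one assumed way of getting its contents into foo.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the rest of the stream into one string.
std::string slurp(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Splits the stream contents so that each token starts either at the
// beginning of the input or at one of the delimiters & * %.
std::vector<std::string> tokenize_with_regex(std::istream& in)
{
    const std::string foo = slurp(in);
    static const std::regex re{"((?:^|[&*%])[^&*%]*)"};

    // Submatch 1 is the whole token, including its leading delimiter.
    return std::vector<std::string>(
        std::sregex_token_iterator(foo.cbegin(), foo.cend(), re, 1),
        std::sregex_token_iterator());
}
```

With the question's input in a std::istringstream, tokenize_with_regex returns aaa, &bbb, *ccc, %ddd and &eee. The whole input is held in memory at once, which is the trade-off of slurping compared with reading the stream incrementally.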



Source: https://stackoverflow.com/questions/50154766/how-to-include-c-input-stream-delimiters-into-result-tokens
