Regular Expression For Parsing Data

Posted by 让人想犯罪 __ on 2019-12-20 10:08:27

Question


I am writing an application that reads some data from a simple text file. The data files I am interested in have lines of the following form:

Mem(100) = 120
Mem(200) = 231
Mem(43) = 12
...
Mem(1293) = 12.54

So, as you can understand, the pattern of each line is something like

(\s)*(\t)*Mem([0-9]*) (\s,\t)*= (\s,\t)*[0-9]*(.)*[0-9]*

That is, there is any number of whitespace characters before the character sequence "Mem", followed by a left parenthesis. Then there is a number and a right parenthesis. After that, there is any number of whitespace characters until an '=' (equals) character is encountered, and then any number of whitespace characters until a (possibly) floating-point number.

How can I express that in a C++ regex pattern? I am really new to the regular expression concept in C++ so I would need some help.

Thank you


Answer 1:


First of all, remember to #include <regex>.

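std::regex uses a modified ECMAScript (JavaScript-style) grammar by default, so patterns look much like the regular expressions in other languages. Note that std::regex_match succeeds only if the pattern matches the entire string; to find a match anywhere inside a string, use std::regex_search instead.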

Let's start with a simple example:

std::string str = "Mem(100)=120";
std::regex regex("^Mem\\([0-9]+\\)=[0-9]+$");
std::cout << std::regex_match(str, regex) << std::endl;

In this case, our regex is ^Mem\([0-9]+\)=[0-9]+$. Let's take a look at what it does:

  • The ^ at the beginning tells the regex engine this is where the line starts, so AMem(1)=2 should not match.
  • The $ at the end tells the regex engine this is where the line ends, so Mem(1)=2x should not match. (std::regex_match already requires the whole string to match, so both anchors are redundant here, but they matter with std::regex_search.)
  • \\( is a literal ( character. ( has a special meaning in regular expressions, so we escape it as \(. However, the \ character also has a special meaning in C++ string literals, so we write \\( to make C++ pass \( to the regular expression engine.
  • [0-9] matches a digit. \\d (that is, \d in the regex itself) also works with the default ECMAScript grammar.
  • [0-9]+ means at least one digit. If Mem() is acceptable, then use [0-9]* instead.

As you can see, this is just like the regular expressions you'd find in other languages (such as Java or C# ).
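As a minimal sketch of the points above, the two functions below wrap the simple pattern in reusable checks. The names matches_strict and matches_strict_d are hypothetical helpers introduced only for illustration; the second one assumes the default ECMAScript grammar, where \d is equivalent to [0-9].

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original answer): true if `line`
// has the strict form Mem(<digits>)=<digits> with no whitespace at all.
bool matches_strict(const std::string& line) {
    static const std::regex pattern("^Mem\\([0-9]+\\)=[0-9]+$");
    return std::regex_match(line, pattern);
}

// The same check written with \d instead of [0-9].
bool matches_strict_d(const std::string& line) {
    static const std::regex pattern("^Mem\\(\\d+\\)=\\d+$");
    return std::regex_match(line, pattern);
}
```

With these, Mem(100)=120 is accepted, while AMem(1)=2, Mem(1)=2x and Mem()=2 are rejected.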

Now, to consider whitespace, use std::regex regex("^\\s*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+\\s*$");

Note that \s includes \t, so no need to specify both. If it didn't, you'd use (\s|\t) or [\s\t], not (\s,\t).
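The whitespace-tolerant pattern can be tried out with a sketch like the following. Both function names are hypothetical and exist only for this illustration; is_regex_space simply checks whether \s on its own matches a given character, which is one way to confirm that a tab is covered.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts optional whitespace (spaces or tabs) before
// "Mem", around '=', and at the end of the line. Integer values only.
bool matches_spaced(const std::string& line) {
    static const std::regex pattern(
        "^\\s*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+\\s*$");
    return std::regex_match(line, pattern);
}

// Hypothetical helper: true if \s by itself matches the character `c`.
bool is_regex_space(char c) {
    static const std::regex space("\\s");
    return std::regex_match(std::string(1, c), space);
}
```

Note that this pattern still rejects Mem (100) = 120, because no whitespace is allowed between Mem and the opening parenthesis.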

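Finally, to include floating-point numbers, we first need to decide whether Mem(1) = 1. (that is, a dot without a digit after it) is acceptable.

If it isn't, then the fractional part (the .23 in 1.23) must be treated as a single optional unit: a dot followed by at least one digit. In regexes, we use ? to mark something as optional.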

std::regex regex("^[\\s]*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+(\\.[0-9]+)?\\s*$");

Note that we use \. instead of a plain dot. An unescaped . has a special meaning in regular expressions (it matches any character), so we need to escape it.

If you have a compiler that supports raw strings (e.g. Visual Studio 2013, GCC 4.5, Clang 3.0), you can simplify the regex string:

std::regex regex(R"(^[\s]*Mem\([0-9]+\)\s*=\s*[0-9]+(\.[0-9]+)?\s*$)");
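A small sketch of the raw-string version in use, assuming a compiler with C++11 raw string literal support. The function name matches_value_line is a hypothetical helper, not something from the original answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts an integer or decimal value. The pattern is
// written as a raw string literal, so each regex backslash appears only once.
bool matches_value_line(const std::string& line) {
    static const std::regex pattern(
        R"(^\s*Mem\([0-9]+\)\s*=\s*[0-9]+(\.[0-9]+)?\s*$)");
    return std::regex_match(line, pattern);
}
```

Because the dot is escaped and must be followed by at least one digit, inputs such as Mem(1) = 1. and Mem(1) = 1x5 are rejected, while Mem(1293) = 12.54 is accepted.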

To extract information about the matched string, you can use std::smatch and groups.

Let's start with a small change:

std::string str = " Mem(100)=120";
std::regex regex("^[\\s]*Mem\\(([0-9]+)\\)\\s*=\\s*([0-9]+(\\.[0-9]+)?)\\s*$");
std::smatch m;

std::cout << std::regex_match(str, m, regex) << std::endl;

Note three things:

  1. We added smatch. This class stores extra result info about the match.
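  2. We added additional parentheses around [0-9]+. This defines a group. Groups tell the regex engine to keep track of whatever is within them.
  3. Yet more parentheses around the floating point number. This defines a second group. The parentheses that were already around the optional fractional part, (\\.[0-9]+)?, form a third group nested inside it.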

Very importantly, the parentheses that define groups are NOT escaped, since we don't want them to match literal parenthesis characters. We want their special regex meaning.

Now that we have the groups, we can use them:

for (auto result : m) {
    std::cout << result << std::endl;
}

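This will first print the whole match, then the number in Mem(), then the final number, and finally the optional fractional part, which is an empty line here because 120 has no decimal point.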
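To make the group numbering concrete, the sketch below collects every sub-match into a vector of strings. capture_groups is a hypothetical helper name; the pattern is the one from the answer, unchanged.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every sub-match of the answer's pattern as a
// string, or an empty vector if the line does not match at all.
std::vector<std::string> capture_groups(const std::string& line) {
    static const std::regex pattern(
        "^[\\s]*Mem\\(([0-9]+)\\)\\s*=\\s*([0-9]+(\\.[0-9]+)?)\\s*$");
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(line, m, pattern)) {
        for (const auto& sub : m) {
            groups.push_back(sub.str());  // a group that did not take part yields ""
        }
    }
    return groups;
}
```

For " Mem(100)=120" this yields four entries: the whole match, "100", "120", and an empty string for the fractional part. For "Mem(1293) = 12.54" the last two entries are "12.54" and ".54".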

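In other words, m[0] gives us the whole match, m[1] gives us the first group, m[2] gives us the second group, and m[3] gives us the third group: the optional fractional part (such as .54), which is empty when the number has no decimal point. If you don't need that third group, write it as a non-capturing group, (?:\\.[0-9]+)?, which groups without capturing.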
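Putting the pieces together, one possible way to turn a matched line into numbers is sketched here. MemEntry and parse_mem_line are hypothetical names chosen for this illustration, and the fractional part uses a non-capturing group (?:...) so that only m[1] and m[2] exist. This is a sketch under those assumptions, not the original author's code.

```cpp
#include <regex>
#include <string>

// Hypothetical record type for one parsed line.
struct MemEntry {
    int address;
    double value;
};

// Hypothetical helper: fills `out` and returns true if `line` matches,
// otherwise returns false and leaves `out` untouched.
bool parse_mem_line(const std::string& line, MemEntry& out) {
    // (?:...) groups without capturing, so the groups are just m[1] and m[2].
    static const std::regex pattern(
        R"(^\s*Mem\(([0-9]+)\)\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;
    }
    out.address = std::stoi(m[1].str());  // throws std::out_of_range if too large for int
    out.value = std::stod(m[2].str());
    return true;
}
```

Reading a data file then comes down to calling parse_mem_line on each line obtained from std::getline and skipping the lines for which it returns false.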



Source: https://stackoverflow.com/questions/19327562/regular-expression-for-parsing-data
