std regex_search to match only current line

Submitted by 元气小坏坏 on 2019-12-31 04:10:24

Question


I use various regexes to parse a C source file, line by line. First I read the entire content of the file into a string:

ifstream file_stream("commented.cpp",ifstream::binary);

std::string txt((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());

Then I use a set of regexes, which should be applied in turn until a match is found. Here I will give only one as an example:

vector<regex> rules = { regex("^//[^\n]*$") };

const char* search = txt.c_str();

int position = 0, length = 0;

for (std::size_t i = 0; i < rules.size(); i++) {
  cmatch match;

  if (regex_search(search + position, match, rules[i],regex_constants::match_not_bol | regex_constants::match_not_eol)) 
  {
     position += ( match.position() + match.length() );        
  }

}

But it doesn't work. Instead of matching a comment only on the current line, it searches the whole string for the first match. I expected regex_constants::match_not_bol and regex_constants::match_not_eol to make regex_search treat ^ and $ as the start and end of a line only, not the start and end of the whole block. Here is my file:

commented.cpp:

#include <stdio.h>
//comment

The code should fail: my reasoning is that, with those options passed to regex_search, the match should fail, because it should search for the pattern only in the first line:

#include <stdio.h>

But instead it searches the whole string and immediately finds //comment. I need help making regex_search match only in the current line. The options match_not_bol and match_not_eol do not help. Of course I could read the file line by line into a vector and then try all the rules on each string in the vector, but that is very slow. I have done that, and it takes too long to parse a big file, which is why I want to let the regex deal with newlines and use a position counter.


Answer 1:


If this is not what you want, please comment and I will delete the answer.

What you are doing is not the right way to use a regex library.
So here is my advice for anyone who wants to use the std::regex library.

  1. Its default grammar is a modified ECMAScript (the alternatives are POSIX-style grammars), which is somewhat poorer than the syntax of most modern regex libraries.
  2. It has plenty of bugs (these are just the ones I found):

    1. the same regex but different results on Linux and Windows only C++
    2. std::regex and ignoring flags
    3. std::regex_match and lazy quantifier with strange behavior
  3. In some cases (I tested specifically with std::match_results) it is 200 times slower than std.regex in the D language.

  4. Its match flags are very confusing and mostly do not work as expected (at least for me).
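
As an illustration of how easy the match flags are to misread, here is a minimal sketch (the helper name found is hypothetical, not part of std::regex). match_not_bol and match_not_eol do not make ^ and $ line-aware. They tell the engine that the first and last positions of the searched range are not a line start or a line end, so ^ and $ cannot match there:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `rule` matches anywhere in `text`
// under the given match flags.
bool found(const std::string& text, const std::regex& rule,
           std::regex_constants::match_flag_type flags =
               std::regex_constants::match_default) {
    return std::regex_search(text, rule, flags);
}
```

With this helper, found("//comment", std::regex("^//")) is true, while the same call with std::regex_constants::match_not_bol as the third argument is false, because ^ is no longer allowed to match at the first character.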

Conclusion: do not use it at all.


But if you still need to use C++, you can:

  1. Use boost::regex from the Boost library, because:

    1. It supports Perl-compatible (PCRE-style) syntax
    2. It has fewer bugs (I have not seen any)
    3. It produces a smaller binary (the executable file after compiling)
    4. It is faster than std::regex
  2. Use gcc version 7.1.0 or later, NOT an earlier one. The last bug I found was in version 6.3.0

  3. Use clang version 3 or above

If you have been persuaded NOT to use C++, then you can:

  1. Use the D regular expression library, std.regex, for large tasks, because it is:

    1. Fast (see "Faster Command Line Tools in D")
    2. Easy
    3. Flexible drn
  2. Use native pcre or pcre2, which are written in C

    • Extremely fast but a little complicated
  3. Use Perl for a simple task, especially a Perl one-liner



Answer 2:


#include <stdio.h>
//comment

The code should fail: my reasoning is that, with those options passed to regex_search, the match should fail, because it should search for the pattern only in the first line:

#include <stdio.h>

But instead it searches the whole string and immediately finds //comment. I need help making regex_search match only in the current line.

Are you trying to match all // comments in a source code file, or only the first line?

The former can be done like this:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main()
{
  auto input = std::ifstream{"stream_union.h"};
  // Compile the pattern once, outside the loop.
  const auto pattern = std::regex(R"(//)");

  for(auto line = std::string{}; std::getline(input, line); )
  {
    auto submatch = std::smatch{};
    // Skip lines that contain no "//".
    if(!std::regex_search(line, submatch, pattern)) continue;

    std::cout << line << std::endl;
  }
  std::cout << std::endl;

  return EXIT_SUCCESS;
}

And the latter can be done like this:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main()
{
  auto input = std::ifstream{"stream_union.h"};
  auto line = std::string{};
  // Read only the first line of the file.
  std::getline(input, line);

  auto submatch = std::smatch{};
  const auto pattern = std::regex(R"(//)");
  // Fail if the first line contains no "//".
  if(!std::regex_search(line, submatch, pattern)) { return EXIT_FAILURE; }

  std::cout << line << std::endl;

  return EXIT_SUCCESS;
}
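
If the whole file is already in one string, as in the question, the search can also be limited to the current line without copying each line out. The sketch below is one possible approach, not the asker's code: it passes an iterator range that ends at the next '\n' to std::regex_search, so ^ and $ refer to the ends of that line. The names LineMatch and match_per_line are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result record: which line matched, and where in the buffer.
struct LineMatch {
    std::size_t line;    // zero-based line number
    std::size_t offset;  // offset of the match from the start of the buffer
    std::size_t length;  // length of the match
};

// Tries each rule on each line of `txt`; the first rule that matches a line wins.
std::vector<LineMatch> match_per_line(const std::string& txt,
                                      const std::vector<std::regex>& rules) {
    std::vector<LineMatch> result;
    std::size_t line_no = 0;
    auto line_begin = txt.cbegin();
    while (line_begin != txt.cend()) {
        // The current line is the range [line_begin, line_end).
        const auto line_end = std::find(line_begin, txt.cend(), '\n');
        std::smatch match;
        for (const auto& rule : rules) {
            if (std::regex_search(line_begin, line_end, match, rule)) {
                result.push_back({line_no,
                                  static_cast<std::size_t>(match[0].first - txt.cbegin()),
                                  static_cast<std::size_t>(match.length(0))});
                break;
            }
        }
        if (line_end == txt.cend()) break;  // last line had no trailing newline
        line_begin = line_end + 1;          // step over the '\n'
        ++line_no;
    }
    return result;
}
```

Because the regex objects are built once and no per-line strings are allocated, this avoids some of the overhead of reading the file into a vector of lines, although how much faster it is in practice would need measuring.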

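If for any reason you need the position of the match, submatch.position(0) gives its offset within the line, and calling tellg() on the stream before reading the line gives the offset of that line in the file.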
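
A minimal sketch of combining the two, assuming the stream is opened in binary mode so that stream offsets correspond to bytes. The function name comment_offsets is hypothetical:

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Returns the offset from the start of the stream of the first "//" on every
// line that contains one.
std::vector<std::size_t> comment_offsets(std::istream& input) {
    static const std::regex pattern(R"(//)");
    std::vector<std::size_t> offsets;
    std::string line;
    std::smatch submatch;
    while (true) {
        // Stream offset of the line that is about to be read.
        const std::streamoff line_start = input.tellg();
        if (!std::getline(input, line)) break;
        if (std::regex_search(line, submatch, pattern)) {
            offsets.push_back(
                static_cast<std::size_t>(line_start + submatch.position(0)));
        }
    }
    return offsets;
}
```

The function takes any std::istream, so it can be called with a std::ifstream opened on the source file or with a std::istringstream when experimenting.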



Source: https://stackoverflow.com/questions/46087665/std-regex-search-to-match-only-current-line
