Create regex from glob expression

I am writing a program that parses text with a regular expression. The regular expression should be obtained from the user. I decided to use glob syntax for user input, and convert glob string

6 answers

礼貌的吻别

2020-12-03 10:53
I'm not sure I fully understand the requirements. If I assume the users want to find text "entries" where their search matches, then I think this brute-force approach would work as a start.

First escape everything that is meaningful in a regex. Then use plain (non-regex) string replacements to turn the now-escaped glob characters into their regex equivalents. Like so in Python:
```
regexp = re.escape(search_string).replace(r'\?', '.').replace(r'\*', '.*?')
```
For the search string in the question, this builds a regexp that looks like so (raw):
```
foo\..\ bar.*?
```
Used in a Python snippet:
```
import re

search = "foo.? bar*"
text1 = 'foo bar'
text2 = 'gazonk foo.c bar.m m.bar'

# Escape the whole glob, then turn the escaped '?' and '*' into regex wildcards.
searcher = re.compile(re.escape(search).replace(r'\?', '.').replace(r'\*', '.*?'))

for text in (text1, text2):
    if searcher.search(text):
        print('Match: "%s"' % text)
```
Produces:
```
Match: "gazonk foo.c bar.m m.bar"
```
Note that if you examine the match object you can find out more about the match and use it for highlighting or whatever.

Of course, there might be more to it, but it should be a start.
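As a rough illustration of the highlighting idea, here is a minimal sketch that wraps the escape-then-replace step in a function and uses the match object's span. The names `glob_to_regex` and `highlight`, and the bracket markers, are made up for this example.

```python
import re

def glob_to_regex(glob):
    """Escape everything, then turn the escaped glob characters into regex."""
    escaped = re.escape(glob)
    return escaped.replace(r"\?", ".").replace(r"\*", ".*?")

def highlight(glob, text, left="[", right="]"):
    """Wrap the first match of `glob` in `text` with markers; None if no match."""
    match = re.search(glob_to_regex(glob), text)
    if match is None:
        return None
    start, end = match.span()
    return text[:start] + left + text[start:end] + right + text[end:]

print(highlight("foo.? bar*", "gazonk foo.c bar.m m.bar"))
# prints: gazonk [foo.c bar].m m.bar
```

Because the trailing `.*?` is lazy, it matches the empty string under `search`, so the highlighted span stops right after `bar`. Switching to a greedy `.*` would extend it to the end of the text.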
广开言路

2020-12-03 10:57

In R, there's the glob2rx function included in the base distribution:

http://stat.ethz.ch/R-manual/R-devel/library/utils/html/glob2rx.html

挽巷

2020-12-03 10:58

Jakarta ORO has an implementation in Java.

谎友^

2020-12-03 11:02

jPaq's RegExp.fromWildExp function does something similar to this. The following is taken from the example that is on the front page of the site:

// Find a first substring that starts with a capital "C" and ends with a
// lower case "n".
alert("Where in the world is Carmen Sandiego?".findPattern("C*n"));

// Finds two words (first name and last name), flips their order, and places
// a comma between them.
alert("Christopher West".replacePattern("(<*>) (<*>)", "p", "$2, $1"));

// Finds the first number that is at least three numbers long.
alert("2 to the 64th is 18446744073709551616.".findPattern("#{3,}", "ol"));

耶瑟儿～

2020-12-03 11:11

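I wrote my own function, using C++ and boost::regex:

#include <iterator>
#include <sstream>
#include <string>
#include <boost/algorithm/string/trim.hpp>
#include <boost/regex.hpp>

std::string glob_to_regex(std::string val)
{
    boost::trim(val);
    // Alternatives: 1 = '*', 2 = '?', 3 = blank, 4 = regex metacharacter.
    const char* expression = "(\\*)|(\\?)|([[:blank:]])|(\\.|\\+|\\^|\\$|\\[|\\]|\\(|\\)|\\{|\\}|\\\\)";
    // Per alternative: '*' -> \w+, '?' -> ., blank -> \s*, metacharacter -> escaped.
    const char* format = "(?1\\\\w+)(?2\\.)(?3\\\\s*)(?4\\\\$&)";
    std::stringstream final;
    final << "^.*";
    std::ostream_iterator<char, char> oi(final);
    boost::regex re;
    re.assign(expression);
    boost::regex_replace(oi, val.begin(), val.end(), re, format, boost::match_default | boost::format_all);
    final << ".*";  // no std::ends here: it would append a '\0' to the returned string
    return final.str();
}

It appears to work fine.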
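For readers who do not use Boost, the substitution table in the function above can be sketched in Python. This is only my reading of the expression and format strings (`*` becomes `\w+`, `?` becomes `.`, a blank becomes `\s*`, and the listed metacharacters get a backslash), not a tested port. The helper name `_replace` is invented for the example.

```python
import re

# Same four token classes as the C++ expression: '*', '?', a blank,
# or one of the regex metacharacters listed there.
_TOKEN = re.compile(r"(\*)|(\?)|([ \t])|([.+^$\[\](){}\\])")

def _replace(m):
    if m.group(1):
        return r"\w+"            # '*' -> one or more word characters
    if m.group(2):
        return "."               # '?' -> any single character
    if m.group(3):
        return r"\s*"            # blank -> optional whitespace
    return "\\" + m.group(4)     # metacharacter -> escaped literal

def glob_to_regex(val):
    # Trim, translate token by token, and allow any text before and after.
    return "^.*" + _TOKEN.sub(_replace, val.strip()) + ".*"

print(glob_to_regex("foo.? bar*"))
```

Note that under this mapping `*` requires at least one word character, so `bar*` matches `barn` but not `bar.m`. That is stricter than the usual glob meaning of `*`.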

逝去的感伤

2020-12-03 11:19
No need for incomplete or unreliable hacks. Python ships a function for this, `fnmatch.translate` (the exact string it returns varies between Python versions):
```
>>> import fnmatch
>>> fnmatch.translate( '*.foo' )
'.*\\.foo$'
>>> fnmatch.translate( '[a-z]*.txt' )
'[a-z].*\\.txt$'
```
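A short sketch of how the translated pattern might be put to use: compile it with `re` and match against candidate strings. The wrapper name `glob_matcher` is invented for this example.

```python
import fnmatch
import re

def glob_matcher(glob):
    """Compile a glob into a regex object via fnmatch.translate."""
    return re.compile(fnmatch.translate(glob))

matcher = glob_matcher("*.foo")
print(bool(matcher.match("bar.foo")))      # True
print(bool(matcher.match("bar.foo.bak")))  # False

txt = glob_matcher("[a-z]*.txt")
print(bool(txt.match("notes.txt")))        # True
print(bool(txt.match("Notes.txt")))        # False
```

The translated pattern is anchored at the end, and `match` anchors at the start, so it tests the whole string. To find a glob inside a larger text, as in the question, one option is to wrap the user's glob in `*` on both sides before translating.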