Regex to match a String with optional Conditions [duplicate]

Submitted by 自古美人都是妖i on 2019-12-24 01:44:49

Question


Possible Duplicate:
How do I make part of a regular expression optional in Ruby?

I'm trying to build a regular expression with Rubular to match:

On Feb 23, 2011, at 10:22 , James Bond wrote:

OR

On Feb 23, 2011, at 10:22 AM , James Bond wrote:

Here's what I have so far, but for some reason it's not matching. Any ideas?

(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)

How can I make the AM/PM text optional, so that the regex matches AM, PM, or neither?


Answer 1:


This seems to catch the date info. I purposely captured in groups, making it easier to build a real date:

# Capture the date, the time, and the optional AM/PM marker.
regex = /^On (\w+ \d+, \d+), \w+ (\S+) (\w*)\s*,/

[
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:'
].each do |line|
  line =~ regex
  puts "#{$1} #{$2} #{$3}"
end
# >> Feb 23, 2011 10:22 
# >> Feb 23, 2011 10:22 AM

I purposely didn't try to match on the months. Your sample strings look like quote headers from email messages. Those are very standard and generated by software, so you should see a lot of consistency in the format, which allows some simplification in the regex. If you can't trust the format, then match on the month name abbreviations to help avoid false-positive matches. The same applies to the day, year, and time values.

The important thing in the regex is how to deal with the AM/PM when it's missing. Here (\w*) matches zero or more word characters, so it captures AM or PM when present and an empty string otherwise, and \s* absorbs the space that may precede the comma.
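To illustrate the point about building a real date from the captured groups, here is a minimal sketch. The helper name `header_time` and the group names are hypothetical, and it assumes that a header without AM/PM uses a 24-hour clock, which the question does not confirm.

```ruby
require 'time'

# Same pattern as above, with named groups (the names are illustrative).
HEADER = /^On (?<date>\w+ \d+, \d+), \w+ (?<time>\S+) (?<meridiem>\w*)\s*,/

# Hypothetical helper: returns a Time, or nil when the line does not match.
def header_time(line)
  m = HEADER.match(line)
  return nil unless m

  if m[:meridiem].empty?
    # Assumption: no AM/PM marker means the time is on a 24-hour clock.
    Time.strptime("#{m[:date]} #{m[:time]}", '%b %d, %Y %H:%M')
  else
    Time.strptime("#{m[:date]} #{m[:time]} #{m[:meridiem]}", '%b %d, %Y %I:%M %p')
  end
end

puts header_time('On Feb 23, 2011, at 10:22 , James Bond wrote:')
puts header_time('On Feb 23, 2011, at 10:22 PM , James Bond wrote:')
```

Because (\w*) accepts any word, a header with something other than AM or PM in that position would make `Time.strptime` raise an ArgumentError; tightening the group to (AM|PM)? would avoid that.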




Answer 2:


Maybe this:

(On\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+[12]\d{3},\s+at\s+\d{1,2}:\d{1,2}\s+(?:AM|PM)*,.*wrote:)

However, if you can verify that only these lines have this shape, you don't need such a detailed regex. If every such line starts with "On" and ends with "wrote:", your regex might simply be /^On.*wrote:/
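A short sketch of that trade-off, using made-up sample lines: the loose pattern accepts any line that starts with "On" and contains "wrote:", so it can pick up ordinary sentences too. The `ANCHORED` variant is only an illustrative tightening and is not part of the answer.

```ruby
# The loose pattern from the answer.
LOOSE = /^On.*wrote:/

# Illustrative variant: require "On " at the very start and "wrote:" at the very end.
ANCHORED = /\AOn .*wrote:\z/

# Made-up sample lines for demonstration.
lines = [
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:',
  'On reflection, here is what I wrote: nothing.',
  'Thanks, James'
]

loose_hits    = lines.grep(LOOSE)     # the header and the ordinary sentence
anchored_hits = lines.grep(ANCHORED)  # only the header

puts loose_hits.size
puts anchored_hits.size
```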




Answer 3:


Just use the question mark operator after any group you want to be optional, so in this case:

(?:(?:AM|PM) )?

Be sure to match the space as well; otherwise the strings without AM/PM would need to contain two spaces. The solution with (?:AM|PM)* would also match AMAMPM, so that's probably not what you want. But why do you match those groups without creating backreferences? Aren't you going to use the values?

For info on backreferences: http://www.regular-expressions.info/brackets.html
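As a sketch of how the optional group fits into the full pattern, the following adapts the question's regex. The constant name `QUOTE_HEADER` is hypothetical, and the pattern assumes the exact spacing shown in the two sample strings (a space before the comma after the time).

```ruby
# Question's pattern with the AM/PM part (and its trailing space) made optional.
QUOTE_HEADER = /On (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}, at \d{1,2}:\d{1,2} (?:(?:AM|PM) )?,.*wrote:/

samples = [
  'On Feb 23, 2011, at 10:22 , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AM , James Bond wrote:',
  'On Feb 23, 2011, at 10:22 AMAMPM , James Bond wrote:'
]

# The optional group accepts zero or one marker, so the repeated marker is rejected.
results = samples.map { |line| QUOTE_HEADER.match?(line) }
p results # [true, true, false]
```

Replacing (?:AM|PM) with (AM|PM) would turn the marker into a capturing group, so its value becomes available as a backreference when it is present.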



Source: https://stackoverflow.com/questions/5130733/regex-to-match-a-string-with-optional-conditions
