python regex match optional square brackets

断了今生、忘了曾经 submitted on 2019-12-22 08:13:05

Question


I have the following strings:

1 "R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003",     
2 "H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]",  
3 '''GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND  
ERIC JOHN WATSON CA CA51/03 26 May 2003''' 

I am trying to find a regular expression that matches all of them. I don't know how to match the optional square brackets around the date at the end of the string, e.g. [16 May 2014].

import re
casename = re.compile(r'(^[A-Z][A-Za-z\'\(\) ]+\b[v|V]\b[A-Za-z\'\(\) ]+(.*?)[ \[ ]\d+ \w+ \d\d\d\d[\] ])', re.S)

The date part at the end of the regex only matches dates in square brackets, not the ones without.
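A quick way to see the failure is to test only the date fragment of the pattern above. This sketch uses single spaces between day, month and year and drops the duplicated space inside [ \[ ] (a repeated character has no effect in a class). It then compares that fragment with a version where each bracket is made optional with ?.

```python
import re

# Date fragment from the question: it requires exactly one space or '['
# before the day AND exactly one ']' or space after the year.
strict_date = re.compile(r'[ \[]\d+ \w+ \d{4}[\] ]')

# Same fragment with each bracket optional (zero or one occurrence).
optional_date = re.compile(r'\[?\d+ \w+ \d{4}\]?')

plain = "R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003"
bracketed = "H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]"

for text in (plain, bracketed):
    strict = strict_date.search(text)
    relaxed = optional_date.search(text)
    # For the plain string the strict fragment finds nothing, because no
    # character follows "2003" to satisfy [\] ].
    print(strict.group(0) if strict else None, "|", relaxed.group(0))
```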

Thanks to everybody who answered. @Matt Clarkson: what I am trying to match is a judicial decision 'handle' in a much larger text. There is a lot of variation among those handles, but they all start at the beginning of a line, have a 'v' (for versus) between the party names, and end with a date. The party names are mostly, but not always, in capitals. I am trying to get only one match per document and no false positives.
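For the larger-text case, a minimal sketch: compile the pattern from the first answer below with re.M so that ^ matches at every line start, then keep only the first hit with re.search. The outer capture group is dropped (it does not change what matches), the document text is invented, and this sketch does not handle a handle wrapped over two lines like the third string. Taking the first match also does not by itself rule out false positives.

```python
import re

# Pattern from the first answer, without its outer capture group.
# re.I: case-insensitive; re.M: ^ matches at the start of every line.
HANDLE = re.compile(
    r"^[a-z][a-z'&() ]+\bv\b[a-z&'() ]+.*? \[?\d+ \w+ \d{4}\]?",
    re.I | re.M,
)

# Hypothetical document: a heading, the handle, then body text that
# happens to look like a handle too.
document = """IN THE HIGH COURT OF NEW ZEALAND
H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]
The plaintiff v the defendant argued on 3 March 2014 that ...
"""

def first_handle(text):
    """Return the first handle-like line prefix in text, or None."""
    match = HANDLE.search(text)
    return match.group(0) if match else None

print(first_handle(document))
print(len(HANDLE.findall(document)))  # findall would return every hit
```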


Answer 1:


I got all of them to match using this pattern (you'll need to add the case-insensitive flag, re.I):

(^[a-z][a-z\'&\(\) ]+\bv\b[a-z&\'\(\) ]+(?:.*?) \[?\d+ \w+ \d{4}\]?)


Explanation:

  • ( Begin capture group
    • ^ Match the start of the string
    • [a-z] Match a single letter
    • [a-z\'&\(\) ]+ Match one or more of the characters in this group
    • \b Match a word boundary
    • v Match the character 'v' literally
    • \b Match a word boundary
    • [a-z&\'\(\) ]+ Match one or more of the characters in this group
    • (?: Begin non-capturing group
      • .*? Match any characters, as few as possible (lazy)
    • ) End non-capturing group
    • \[?\d+ \w+ \d{4}\]? Match a date, optionally surrounded by brackets
  • ) End capture group
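A minimal Python sketch that runs the pattern above against the three strings from the question. It assumes re.I for the case-insensitive flag and adds re.S (which the question's own regex also used) so that .*? can cross the line break in the third string; without re.S that string does not match.

```python
import re

# The answer's pattern. re.I is the case-insensitive flag the answer asks for;
# re.S is added here so '.' (inside .*?) can also match a newline.
case_name = re.compile(
    r"(^[a-z][a-z\'&\(\) ]+\bv\b[a-z&\'\(\) ]+(?:.*?) \[?\d+ \w+ \d{4}\]?)",
    re.I | re.S,
)

samples = [
    "R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003",
    "H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]",
    "GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND\n"
    "ERIC JOHN WATSON CA CA51/03 26 May 2003",
]

for text in samples:
    m = case_name.match(text)
    # Print whether it matched, and whether group 1 covers the whole string.
    print(bool(m), m.group(1) == text if m else None)
```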



Answer 2:


Square brackets can be made optional like this:

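[\[]*: the * makes the opening [ optional. Strictly, * allows zero or more, so \[? (zero or one) is the tighter choice; the same applies to the closing ].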
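A small comparison of the two quantifiers. Applying [\]]* to the closing bracket is an assumption added for symmetry; the answer only shows the opening side.

```python
import re

# * allows zero OR MORE brackets on each side.
star = re.compile(r'[\[]*\d+ \w+ \d{4}[\]]*')
# ? allows zero or ONE, which is usually what "optional" means.
question = re.compile(r'\[?\d+ \w+ \d{4}\]?')

for text in ("16 May 2014", "[16 May 2014]", "[[16 May 2014]]"):
    print(text, bool(star.fullmatch(text)), bool(question.fullmatch(text)))
```

Both accept the plain and the single-bracketed date; only the * version also accepts doubled brackets.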

A few recommendations if I may:

  • \d\d\d\d can also be written as \d{4}

  • In [v|V] the | is not needed: a character class already matches any one of its characters, so [vV] is enough (as written, [v|V] also matches a literal |)
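Both recommendations can be checked directly; this sketch simply compares each pair of patterns on a few sample inputs.

```python
import re

# \d{4} is shorthand for \d\d\d\d: exactly four digits.
year_long = re.compile(r'\d\d\d\d')
year_short = re.compile(r'\d{4}')

# Inside [...] the | is an ordinary character, so [v|V] matches 'v', 'V' or '|'.
with_pipe = re.compile(r'[v|V]')
without_pipe = re.compile(r'[vV]')

for s in ("2014", "214"):
    print(s, bool(year_long.fullmatch(s)), bool(year_short.fullmatch(s)))

for s in ("v", "V", "|"):
    print(s, bool(with_pipe.fullmatch(s)), bool(without_pipe.fullmatch(s)))
```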





Answer 3:


Using your regex and input strings, it looks like you will match only the 2nd line (if you get rid of the '^' at the beginning of the regex). I've added inline comments to each section of the regular expression you provided to make it clearer.

Can you indicate what you are trying to capture from each line? Do you want the entire string? Only the word immediately preceding the lone letter 'v'? Do you want the date captured separately?

Depending on the portions you wish to capture, each section can be broken out into its own match group: regex101.com example. This is a little looser than yours (it captures the entire section between the quotation marks instead of only the single word immediately preceding the lone 'v'), and it is split up for readability (each "group" on its own line).
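The regex101 example itself is not reproduced here, so the following is a hypothetical Python sketch in the same spirit: each group on its own line using re.X (verbose mode), with illustrative group names applicant, rest and date that are not taken from the answer.

```python
import re

# Hypothetical layout: one piece per line, commented via re.X (verbose mode).
# In verbose mode, literal spaces outside [...] must be escaped as "\ ".
HANDLE = re.compile(
    r"""
    ^(?P<applicant>[a-z][a-z'&() ]*?)   # everything before the lone v
    \s+v\s+                             # the v for versus
    (?P<rest>.*?)                       # respondent plus any citation
    \s\[?                               # a space, then an optional [
    (?P<date>\d+\ \w+\ \d{4})           # the date itself
    \]?$                                # optional ] at the very end
    """,
    re.I | re.S | re.X,
)

m = HANDLE.match("H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]")
print(m.group("applicant"), "|", m.group("rest"), "|", m.group("date"))
```

Because the date is anchored to the end with $, the lazy rest group stretches only as far as the last date.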

This example also assumes the newline is intentional and supports it (warning: it COULD consume more than you intend, depending on whether the date at the end gets matched).
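A sketch of both sides of that trade-off, assuming the first answer's pattern (minus its outer capture group). The SMITH V JONES text is invented for illustration.

```python
import re

PATTERN = r"^[a-z][a-z'&() ]+\bv\b[a-z&'() ]+.*? \[?\d+ \w+ \d{4}\]?"

wrapped = ("GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND\n"
           "ERIC JOHN WATSON CA CA51/03 26 May 2003")

# Without re.S, '.' stops at the line break, so the wrapped handle is missed.
missed = re.match(PATTERN, wrapped, re.I)
# With re.S, '.*?' may cross the line break and reach the date.
found = re.match(PATTERN, wrapped, re.I | re.S)

# The downside: a handle with no date lets '.*?' run on into later text
# until something date-like turns up.
no_date = "SMITH V JONES CA CA1/99\nThe hearing was adjourned to 4 June 2001 for argument."
overrun = re.match(PATTERN, no_date, re.I | re.S)

print(missed)
print(found.group(0) == wrapped)
print(repr(overrun.group(0)))
```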



Source: https://stackoverflow.com/questions/25510289/python-regex-match-optional-square-brackets
