Matching dates with regular expressions in Python?

-上瘾入骨i 2020-12-31 17:56

I know that similar questions have already been answered, but after reading through them I still don't have the solution I'm looking for.

Using Pyth

6 answers
  •  一向 (original poster)
    2020-12-31 18:12

    Here's one way to make a regular expression that will match any date of your desired format (though you could obviously tweak whether commas are optional, add month abbreviations, and so on):

    import re

    # Four-digit years from 1900 through 2099.
    years = r'((?:19|20)\d\d)'
    # Template "(month) (day), year"; the %%s placeholders are filled in below.
    pattern = r'(%%s) +(%%s), *%s' % years
    
    # Months with 30 days.
    thirties = pattern % (
         "September|April|June|November",
         r'0?[1-9]|[12]\d|30')
    
    # Months with 31 days.
    thirtyones = pattern % (
         "January|March|May|July|August|October|December",
         r'0?[1-9]|[12]\d|3[01]')
    
    # Two-digit multiples of four (04, 08, ..., 96), used for leap years.
    fours = '(?:%s)' % '|'.join('%02d' % x for x in range(4, 100, 4))
    
    feb = r'(February) +(?:%s|%s)' % (
         r'(?:(0?[1-9]|1\d|2[0-8])), *%s' % years, # 1-28 any year
         r'(?:(29), *((?:(?:19|20)%s)|2000))' % fours)  # 29 leap years only
    
    result = '|'.join('(?:%s)' % x for x in (thirties, thirtyones, feb))
    r = re.compile(result)
    print(result)
    

    Then we have:

    >>> r.match('January 30, 2001') is not None
    True
    >>> r.match('January 31, 2001') is not None
    True
    >>> r.match('January 32, 2001') is not None
    False
    >>> r.match('February 32, 2001') is not None
    False
    >>> r.match('February 29, 2001') is not None
    False
    >>> r.match('February 28, 2001') is not None
    True
    >>> r.match('February 29, 2000') is not None
    True
    >>> r.match('April 30, 1908') is not None
    True
    >>> r.match('April 31, 1908') is not None
    False
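
The checks above only test whether a match exists. Two details matter when using the pattern for real: `match` anchors only the start of the string, so trailing text after a valid date still passes, and each alternative has its own capture groups, so most groups are `None` for any given match. A minimal sketch of handling both, assuming Python 3.4+ for `re.fullmatch`; the names `build_date_regex`, `DATE_RE`, and `parse_date` are hypothetical and not part of the original answer:

```python
import re

def build_date_regex():
    """Rebuild the pattern from the answer (hypothetical helper)."""
    years = r'((?:19|20)\d\d)'
    pattern = r'(%%s) +(%%s), *%s' % years
    thirties = pattern % ("September|April|June|November",
                          r'0?[1-9]|[12]\d|30')
    thirtyones = pattern % ("January|March|May|July|August|October|December",
                            r'0?[1-9]|[12]\d|3[01]')
    fours = '(?:%s)' % '|'.join('%02d' % x for x in range(4, 100, 4))
    feb = r'(February) +(?:%s|%s)' % (
        r'(?:(0?[1-9]|1\d|2[0-8])), *%s' % years,
        r'(?:(29), *((?:(?:19|20)%s)|2000))' % fours)
    return '|'.join('(?:%s)' % x for x in (thirties, thirtyones, feb))

DATE_RE = re.compile(build_date_regex())

def parse_date(text):
    """Return (month, day, year) as strings, or None if text is not a date."""
    # fullmatch requires the whole string to be a date; match() would
    # accept 'January 30, 20011' because its prefix is a valid date.
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None
    # Only the alternative that matched fills its groups; the rest are None.
    month, day, year = (g for g in m.groups() if g is not None)
    return month, day, year

print(parse_date('February 29, 2000'))  # ('February', '29', '2000')
print(parse_date('January 30, 20011'))  # None
```

Filtering out the `None` groups works here because every alternative captures exactly three fields: month, day, and year.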
    

    And what is this glorious regexp, you may ask?

    >>> print(result)
    (?:(September|April|June|November) +(0?[1-9]|[12]\d|30), *((?:19|20)\d\d))|(?:(January|March|May|July|August|October|December) +(0?[1-9]|[12]\d|3[01]), *((?:19|20)\d\d))|(?:(February) +(?:(?:(0?[1-9]|1\d|2[0-8])), *((?:19|20)\d\d)|(?:(29), *((?:(?:19|20)(?:04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))|2000))))
    

    (I initially intended to do a tongue-in-cheek enumeration of the possible dates, but I basically ended up hand-writing that whole gross thing except for the multiples of four, anyway.)
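
For comparison, the calendar rules do not have to live in the regex at all. The following is a rough sketch of a different technique, not the approach used above: a loose regex extracts the three fields, and `datetime.date` decides whether that day actually exists. The names `MONTHS`, `LOOSE_DATE`, and `is_valid_date` are made up for illustration, and the 1900-2099 year range is carried over from the pattern above.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Loose shape check only: month name, one or two digits, four-digit year.
LOOSE_DATE = re.compile(
    r'(%s) +(\d{1,2}), *((?:19|20)\d\d)' % '|'.join(MONTHS))

def is_valid_date(text):
    """True if text looks like 'Month D, YYYY' and names a real calendar day."""
    m = LOOSE_DATE.fullmatch(text)
    if m is None:
        return False
    month = MONTHS.index(m.group(1)) + 1
    try:
        # date() raises ValueError for impossible days such as April 31
        # or February 29 in a non-leap year.
        date(int(m.group(3)), month, int(m.group(2)))
    except ValueError:
        return False
    return True
```

The trade-off is that this validates a string that has already been isolated, whereas the single large regex can also be used to search for valid dates inside longer text.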
