Parsing text using Combine is not returning any results

Submitted by 空扰寡人 on 2019-12-14 02:29:30

Question


I am new to pyparsing. I am attempting to parse some text but don't really understand how pyparsing is behaving.

from pyparsing import *

number = Word(nums)
yearRange = Combine(number+"-"+number)
copyright = Literal("Copyright (C)")+yearRange+Literal("CA. All Rights Reserved.")
copyrightCombine = Combine(copyright)
date = Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))
time = Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))
dateTime = Combine(date+time)
pageNumber = Suppress(Literal("PAGE"))+number
pageLine = Word(nums)+"Copyright (C) 1986-2014 CA. All Rights Reserved."+Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))+Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))+pageNumber
pageLine2 = number+copyright+dateTime+pageNumber
pageLine3 = Word(nums)+copyright+Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))+Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))+pageNumber

test = "1  Copyright (C) 1986-2014 CA. All Rights Reserved.                                                07/05/17  10:58:56     PAGE  1241"
print(pageLine.searchString(test))
print(copyright.searchString(test))
print(copyrightCombine.searchString(test))
print(pageLine2.searchString(test))
print(pageLine3.searchString(test))

Output:

[['1', 'Copyright (C) 1986-2014 CA. All Rights Reserved.', '07/05/17', '10:58:56', '1241']]
[['Copyright (C)', '1986-2014', 'CA. All Rights Reserved.']]
[]
[]
[['1', 'Copyright (C)', '1986-2014', 'CA. All Rights Reserved.', '07/05/17', '10:58:56', '1241']]

I want to use the parser defined as pageLine2, but for some reason it and the parser copyrightCombine are not returning any results. It seems that when I use Combine(), something causes the parser to not return the match.


Answer 1:


I figured out that the behavior is caused by the way Combine() works. By default it expects the matched tokens to be adjacent, with no whitespace between them, but this can be overridden.
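A minimal sketch of the difference, reusing the copyright expression from the question. The names `strict` and `relaxed` and the shortened test string are illustrative and not part of the original code; the join string `" "` is an assumption about the desired output.

```python
from pyparsing import Combine, Literal, Word, nums

number = Word(nums)
year_range = Combine(number + "-" + number)
copyright_notice = (
    Literal("Copyright (C)") + year_range + Literal("CA. All Rights Reserved.")
)

# Default Combine: the contained tokens must be adjacent, so the spaces
# between the three parts of the notice make the match fail.
strict = Combine(copyright_notice)

# adjacent=False allows whitespace between the tokens; the second
# argument is the join string used to glue the matched tokens together.
relaxed = Combine(copyright_notice, " ", adjacent=False)

test = "1  Copyright (C) 1986-2014 CA. All Rights Reserved.   07/05/17  10:58:56"

strict_matches = [list(m) for m in strict.searchString(test)]
relaxed_matches = [list(m) for m in relaxed.searchString(test)]

print(strict_matches)   # []
print(relaxed_matches)  # [['Copyright (C) 1986-2014 CA. All Rights Reserved.']]
```

The inner `year_range` keeps the default, since "1986-2014" really should contain no spaces.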

According to the documentation:

Combine - joins all matched tokens into a single string, using specified joinString (default joinString=""); expects all matching tokens to be adjacent, with no intervening whitespace (can be overridden by specifying adjacent=False in constructor)
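Applied to the question, the only Combine in pageLine2 that spans whitespace is the date/time pair. The following is a sketch under that assumption, with the variables renamed to snake_case for illustration (`date_time`, `page_line2`, and `clock_time` are not names from the original) and a single space chosen as the join string.

```python
from pyparsing import Combine, Literal, Suppress, Word, nums

number = Word(nums)
year_range = Combine(number + "-" + number)
copyright_notice = (
    Literal("Copyright (C)") + year_range + Literal("CA. All Rights Reserved.")
)
date = Combine(number + "/" + number + "/" + number)
clock_time = Combine(number + ":" + number + ":" + number)

# The date and the time are separated by spaces in the input, so the
# outer Combine needs adjacent=False; " " is the join string.
date_time = Combine(date + clock_time, " ", adjacent=False)

page_number = Suppress(Literal("PAGE")) + number
page_line2 = number + copyright_notice + date_time + page_number

test = (
    "1  Copyright (C) 1986-2014 CA. All Rights Reserved."
    "      07/05/17  10:58:56     PAGE  1241"
)

matches = [list(m) for m in page_line2.searchString(test)]
print(matches)
# [['1', 'Copyright (C)', '1986-2014', 'CA. All Rights Reserved.', '07/05/17 10:58:56', '1241']]
```

The inner `date` and `clock_time` expressions keep the default adjacent=True, so inputs such as "07/ 05/17" are still rejected.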



Source: https://stackoverflow.com/questions/45090980/parsing-text-usng-combine-is-not-returning-any-results
