Pyparsing - where order of tokens in unpredictable

此生再无相见时 提交于 2019-12-04 01:19:06

问题


I want to be able to pull out the type and count of letters from a piece of text where the letters could be in any order. There is some other parsing going on which I have working, but this bit has me stumped!

input -> result
"abc" -> [['a',1], ['b',1],['c',1]]
"bbbc" -> [['b',3],['c',1]]
"cccaa" -> [['a',2],['c',3]]

I could use search or scan and repeat for each possible letter, but is there a clean way of doing it?

This is as far as I got:

from pyparsing import *


def handleStuff(string, location, tokens):

        return [tokens[0][0], len(tokens[0])]


stype = Word("abc").setParseAction(handleStuff)
section =  ZeroOrMore(stype("stype"))


print section.parseString("abc").dump()
print section.parseString("aabcc").dump()
print section.parseString("bbaaa").dump()

回答1:


I wasn't clear from your description whether the input characters could be mixed like "ababc", since in all your test cases, the letters were always grouped together. If the letters are always grouped together, you could use this pyparsing code:

def makeExpr(ch):
    expr = Word(ch).setParseAction(lambda tokens: [ch,len(tokens[0])])
    return expr

expr = Each([Optional(makeExpr(ch)) for ch in "abc"])

for t in tests:
    print t,expr.parseString(t).asList()

The Each construct takes care of matching out of order, and Word(ch) handles the 1-to-n repetition. The parse action takes care of converting the parsed tokens into the (character, count) tuples.




回答2:


One solution:

text = 'sufja srfjhvlasfjkhv lasjfvhslfjkv hlskjfvh slfkjvhslk'
print([(x,text.count(x)) for x in set(text)])

No pyparsing involved, but it seems like overkill.




回答3:


I like Lennart's one-line solution.

Alex mentions another great option if you're using 3.1

Yet another option is collections.defaultdict:

>>> from collections import defaultdict
>>> mydict = defaultdict(int)
>>> for c in 'bbbc':
...   mydict[c] += 1
...
>>> mydict
defaultdict(<type 'int'>, {'c': 1, 'b': 3})



回答4:


If you want a pure-pyparsing approach, this feels about right:

from pyparsing import *

# lambda to define expressions
def makeExpr(ch):
    expr = Literal(ch).setResultsName(ch, listAllMatches=True)
    return expr

expr = OneOrMore(MatchFirst(makeExpr(c) for c in "abc"))
expr.setParseAction(lambda tokens: [[a,len(b)] for a,b in tokens.items()])


tests = """\
abc
bbbc
cccaa
""".splitlines()

for t in tests:
    print t,expr.parseString(t).asList()

Prints:

abc [['a', 1], ['c', 1], ['b', 1]]
bbbc [['c', 1], ['b', 3]]
cccaa [['a', 2], ['c', 3]]

But this starts to get into an obscure code area, since it relies on some of the more arcane features of pyparsing. In general, I like frequency counters that use defaultdict (haven't tried Counter yet), since it's pretty clear just what you are doing.




回答5:


pyparsing apart -- in Python 3.1, collections.Counter makes such counting tasks really easy. A good version of Counter for Python 2 can be found here.



来源:https://stackoverflow.com/questions/2134416/pyparsing-where-order-of-tokens-in-unpredictable

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!