Using integers/dates as terminals in NLTK parser

我只是一个虾纸丫 提交于 2019-12-04 05:52:32

For this to work, you'll need to tokenize the date so that each digit and slash is a separate token.

from nltk.parse.earleychart import EarleyChartParser
import nltk

grammar = nltk.parse_cfg("""
DATE -> MONTH SEP DAY SEP YEAR
SEP -> "/"
MONTH -> DIGIT | DIGIT DIGIT
DAY -> DIGIT | DIGIT DIGIT
YEAR -> DIGIT DIGIT DIGIT DIGIT
DIGIT -> '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
""")

parser = EarleyChartParser(grammar)
print parser.parse(["1", "/", "1", "0", "/", "1", "9", "8", "7"])

The output is:

(DATE
  (MONTH (DIGIT 1))
  (SEP /)
  (DAY (DIGIT 1) (DIGIT 0))
  (SEP /)
  (YEAR (DIGIT 1) (DIGIT 9) (DIGIT 8) (DIGIT 7)))

This also affords some flexibility in the form of allowing dates and months to be single-digit.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!