Parse measurements (multiple dimensions) from a given string in Python 3

只愿长相守 posted on 2021-02-19 08:30:06

Question


I'm aware of this post and this library, but they didn't help with the specific cases below. How can I parse the measurements out of strings like these?

"Square 10 x 3 x 5 mm"
"Round 23/22; 24,9 x 12,2 x 12,3"
"Square 10x2"
"Straight 10x2mm"

I'm looking for a Python package or some other way to get results like this:

>>> a = amazing_parser.parse("Square 10 x 3 x 5 mm")
>>> print(a)
10 x 3 x 5 mm

Likewise:

>>> a = amazing_parser.parse("Round 23/22; 24,9x12,2")
>>> print(a)
24,9 x 12,2

I also tried named entity recognition with the "ner_ontonotes_bert_mult" model, but the results looked like this:

>>> from deeppavlov import configs, build_model
>>> ner_model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)
>>> print(ner_model(["Round 23/22; 24,9 x 12,2 x 12,3"]))
<class 'list'>: [[['Round', '23', '/', '22', ';', '24', ',', '9', 'x', '12', ',', '2', 'x', '12', ',', '3']], [['O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL']]]

I have no idea how to extract those measurements from this list properly.

I also found this regex:

>>> re.findall(r"(\d+(?:,\d+)?) x (\d+(?:,\d+)?)(?: x (\d+(?:,\d+)?))?", "Straight 10 x 2 mm")
<class 'list'>: [('10', '2', '')]

But it leaves an empty value in the result when the input has only 2 dimensions, and it doesn't match at all when there is no whitespace between the numbers and the "x"s. I'm not good with regex...


Answer 1:


For the given examples, you might use:

(?<!\S)\d+(?:,\d+)? ?x ?\d+(?:,\d+)?(?: ?x ?\d+(?:,\d+)?)*

In parts

  • (?<!\S) Negative lookbehind, assert that what is directly to the left is not a non-whitespace char
  • \d+(?:,\d+)? Match 1+ digits and optionally a , and 1+ digits
  •  ?x ? Match x with an optional space on each side
  • \d+(?:,\d+)? Match 1+ digits and optionally a , and 1+ digits
  • (?: Non capturing group
    •  ?x ?\d+ Match x with an optional space on each side, then 1+ digits
    • (?:,\d+)? Optionally match a , and 1+ digits
  • )* Close non capturing group and repeat 0+ times
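
The same breakdown can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch: the names NUMBER and DIMENSIONS are made up for illustration, literal spaces have to be written as [ ] because verbose mode ignores plain whitespace, and the "first pair plus 0+ repeats" is folded into a single group repeated 1+ times, which should match the same strings.

```python
import re

# One number with an optional decimal comma, e.g. "10" or "24,9".
# NUMBER and DIMENSIONS are illustrative names, not part of any library.
NUMBER = r"\d+(?:,\d+)?"

DIMENSIONS = re.compile(
    rf"""
    (?<!\S)          # left side must be start of string or whitespace
    {NUMBER}         # first number
    (?:              # non capturing group: one more dimension
        [ ]?x[ ]?    # x with an optional space on each side
        {NUMBER}     # next number
    )+               # repeat 1+ times, so at least two dimensions
    """,
    re.VERBOSE,
)

print(DIMENSIONS.search("Round 23/22; 24,9 x 12,2 x 12,3").group())
```

Since the pattern contains no capturing groups, re.findall returns the whole matched text for every match instead of tuples with empty slots.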


For example

import re

regex = r"(?<!\S)\d+(?:,\d+)? ?x ?\d+(?:,\d+)?(?: ?x ?\d+(?:,\d+)?)*"
test_str = ("Square 10 x 3 x 5 mm\n"
    "Round 23/22; 24,9 x 12,2 x 12,3\n"
    "Square 10x2\n"
    "Straight 10x2mm\n"
    "Round 23/22; 24,9x12,2")
result = re.findall(regex, test_str)
print(result)

Output

['10 x 3 x 5', '24,9 x 12,2 x 12,3', '10x2', '10x2', '24,9x12,2']
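
The question also asks for the unit and for consistent spacing in the result ("10 x 3 x 5 mm", "24,9 x 12,2"). A possible way to get there is to wrap the pattern in a small helper. This is a rough sketch, not a definitive implementation: parse_measurement is a hypothetical name, and the unit list (mm, cm, m) is an assumption that would need to be adjusted to the real data.

```python
import re

NUMBER = r"\d+(?:,\d+)?"

# Group 1: the dimensions. Group 2: an optional unit.
# The unit alternatives (mm|cm|m) are an assumption, extend as needed.
PATTERN = re.compile(
    rf"(?<!\S)({NUMBER}(?: ?x ?{NUMBER})+)(?: ?(mm|cm|m)\b)?"
)

def parse_measurement(text):
    """Return the first measurement in text as 'a x b [x c] [unit]', or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Re-join the individual numbers so the spacing is always " x ".
    numbers = re.findall(NUMBER, match.group(1))
    result = " x ".join(numbers)
    unit = match.group(2)
    return f"{result} {unit}" if unit else result

print(parse_measurement("Square 10 x 3 x 5 mm"))
print(parse_measurement("Round 23/22; 24,9x12,2"))
```

The helper returns None when no measurement is found, so callers can tell "no match" apart from an empty result.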


Source: https://stackoverflow.com/questions/58097949/parse-measurements-multiple-dimensions-from-a-given-string-in-python-3
