How to return a string if a re.findall finds no match

自古美人都是妖i submitted on 2021-01-28 07:26:50

Question


I am writing a script that takes scanned PDF files and converts them into lines of text to enter into a database. I use re.findall with a list of regular expressions to pull certain values out of the strings that tesseract extracts. My problem: when a regular expression can't find a match, I want it to return "Error" so I can see that something went wrong.

I have tried a handful of if/else statements but I can't seem to get any to notice the None value.

from wand.image import Image as Img
import ghostscript
from PIL import Image
import pytesseract
import re
import os

def get_text_from_pdf(pendingpdf,pendingimg):
    with Img(filename=pendingpdf, resolution=300) as img:
        img.compression_quality = 99
        img.save(filename=pendingimg)
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
    extractedtext = pytesseract.image_to_string(Image.open(pendingimg))
    os.unlink(pendingimg)
    return extractedtext

def get_results(vendor,extracted_string,results):
    for v in vendor:
        pattern = re.compile(v)
        for match in re.findall(pattern,extracted_string):
            if type(match) is str:
                results.append(match)
            else:
                results.append("Error")
    return results

pendingpdf = r'J:\TBHscan07022019090315001.pdf'
pendingimg = 'Test1.jpg'
aggind = [r"^(\w+)(?:.+)\n+3600",
          r"Ticket: (nonsensewordstothrowerror)",
          r"Ticket: \d+\s([0-9|/]+)",
          r"Product: (\w+.+)\n",
          r"Quantity: ([\d\.]+)",
          r"Truck (\w+)"]
vendor = aggind
extracted_string = get_text_from_pdf(pendingpdf,pendingimg)
results = []

print(get_results(vendor,get_text_from_pdf(pendingpdf,pendingimg),results))

Answer 1:


You could do this in a single line:

results += re.findall(pattern, extracted_string) or ["Error"]
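
This works because an empty list is falsy in Python, so `or` falls through to its right-hand operand. A minimal sketch of the idea, where the sample text and the two patterns are made-up stand-ins for the OCR output and the vendor list:

```python
import re

# Hypothetical sample text standing in for the tesseract output.
sample_text = "Ticket: 12345 07/02/2019\nQuantity: 21.5\n"

results = []
for pattern in [r"Quantity: ([\d.]+)", r"Truck (\w+)"]:
    # findall returns [] when nothing matches; [] is falsy,
    # so `or` substitutes the one-element list ["Error"].
    results += re.findall(pattern, sample_text) or ["Error"]

print(results)  # ['21.5', 'Error']
```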

BTW, you get no benefit from compiling the pattern inside the vendor loop because you're only using it once.

Your function could also return the whole search result using a single list comprehension:

return [m for v in vendor for m in re.findall(v, extracted_string) or ["Error"]]

It is a bit weird that you would actually want to modify AND return the results list being passed as parameter. This may produce some unexpected side effects when you use the function.
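
To illustrate the kind of side effect meant here, the sketch below calls a function with the question's signature twice on the same list. The function names and sample strings are hypothetical, chosen only for the demonstration:

```python
import re

def get_results_mutating(patterns, text, results):
    # Same shape as the question's function: extends the caller's list
    # in place and also returns it.
    for p in patterns:
        results += re.findall(p, text) or ["Error"]
    return results

def get_results_fresh(patterns, text):
    # Alternative: build a new list on every call and leave the caller's data alone.
    fresh = []
    for p in patterns:
        fresh += re.findall(p, text) or ["Error"]
    return fresh

shared = []
first = get_results_mutating([r"Truck (\w+)"], "Truck T42", shared)
second = get_results_mutating([r"Truck (\w+)"], "Truck T99", shared)

# first, second and shared are all the same list object,
# so the "first" result now also contains the second call's match.
print(first)  # ['T42', 'T99']
```

Returning a fresh list avoids the surprise, at the cost of the caller having to combine results itself if it wants them accumulated.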

Your "Error" flag may appear several times in the result list, and given that each pattern may return multiple matches, it will be hard to determine which pattern failed to find a value.
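
One possible way around that ambiguity, sketched under the assumption that a dictionary result is acceptable to the caller: key the matches by pattern and use None for a pattern that found nothing, so a failed pattern can never be confused with a real match. The helper name find_by_pattern and the sample text are hypothetical.

```python
import re

def find_by_pattern(patterns, text):
    """Map each pattern to its list of matches, or to None if it found nothing."""
    return {p: re.findall(p, text) or None for p in patterns}

# Hypothetical sample text standing in for the tesseract output.
sample_text = "Ticket: 12345 07/02/2019\nQuantity: 21.5\n"
patterns = [r"Ticket: \d+\s([0-9/]+)", r"Truck (\w+)"]

found = find_by_pattern(patterns, sample_text)
failed = [p for p, matches in found.items() if matches is None]

print(found[patterns[0]])  # ['07/02/2019']
print(failed)              # ['Truck (\\w+)']
```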

If you only want to signal an error when none of the vendor patterns match, you could use the or ["Error"] trick on the whole result:

return [m for v in vendor for m in re.findall(v, extracted_string)] or ["Error"]
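
The position of or ["Error"] is the only difference between the two comprehensions, so a side-by-side run may help. This is a sketch with hypothetical function names and sample inputs:

```python
import re

def per_pattern(patterns, text):
    # `or` applies to each findall call: one "Error" per pattern that finds nothing.
    return [m for p in patterns for m in re.findall(p, text) or ["Error"]]

def whole_result(patterns, text):
    # `or` applies to the finished list: a single "Error" only if nothing matched at all.
    return [m for p in patterns for m in re.findall(p, text)] or ["Error"]

patterns = [r"Quantity: ([\d.]+)", r"Truck (\w+)"]

print(per_pattern(patterns, "Quantity: 21.5\n"))   # ['21.5', 'Error']
print(whole_result(patterns, "Quantity: 21.5\n"))  # ['21.5']
print(per_pattern(patterns, "no match here"))      # ['Error', 'Error']
print(whole_result(patterns, "no match here"))     # ['Error']
```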



Answer 2:


With a loop like for match in re.findall(pattern,extracted_string):, if re.findall(...) finds no matches, the body of the for loop never runs.

Save the result of the match in a variable first, then check it with a condition:

...
matches = re.findall(pattern, extracted_string)
if not matches:
    results.append("Error")
else:
    for match in matches:
        results.append(match)

Note that when iterating over the results of re.findall(...), the check if type(match) is str: serves no purpose here: each of these patterns has a single capturing group, so every matched item is already a string. (findall returns tuples only when a pattern contains more than one group.)
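
For reference, the item type that re.findall yields depends on how many capturing groups the pattern has. The short check below uses a made-up sample string:

```python
import re

text = "Ticket: 12345 07/02/2019"

# No groups: each item is the whole match, as a string.
no_groups = re.findall(r"\d+", text)

# One group: each item is that group's text, as a string.
one_group = re.findall(r"Ticket: (\d+)", text)

# Two or more groups: each item is a tuple of strings, one per group.
two_groups = re.findall(r"Ticket: (\d+)\s([0-9/]+)", text)

print(no_groups)   # ['12345', '07', '02', '2019']
print(one_group)   # ['12345']
print(two_groups)  # [('12345', '07/02/2019')]
```

In none of these cases does an unmatched pattern produce a non-string item; it simply produces an empty list.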




Answer 3:


re.findall returns an empty list when there are no matches. So it should be as simple as:

result = re.findall(my_pattern, my_text)
if result:
    # Successful logic here
else:
    return "Error"
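
Since the question mentions testing for None, here is a quick check of what re.findall actually returns on a miss, followed by the snippet above wrapped in a function so that the return statement is valid. The name first_match_or_error is hypothetical, and returning only the first match is an assumption made for the sake of a small example:

```python
import re

miss = re.findall(r"Ticket: (nonsensewordstothrowerror)", "Ticket: 12345")
print(miss)          # []
print(miss is None)  # False: an `is None` test never fires
print(not miss)      # True: a truthiness test does

def first_match_or_error(pattern, text):
    result = re.findall(pattern, text)
    if result:
        return result[0]  # assumption: only the first match is wanted
    return "Error"

print(first_match_or_error(r"Truck (\w+)", "Truck T42"))  # T42
print(first_match_or_error(r"Truck (\w+)", "no truck"))   # Error
```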



Answer 4:


You have

for match in re.findall(pattern,extracted_string):
    if type(match) is str:
        results.append(match)
    else:
        results.append("Error")

but re.findall() returns an empty list, not None, when it doesn't find anything, so the body of

for match in re.findall(pattern,extracted_string):

never runs: there is nothing to iterate over, and match is never assigned.

You need to check for an empty result outside of the for loop.



Source: https://stackoverflow.com/questions/56855558/how-to-return-a-string-if-a-re-findall-finds-no-match
