Getting text out of an image in Python with Scrapy after joining it to the base URL?

纵然是瞬间 submitted on 2019-12-27 02:35:09

Question


I tried this code:

src1 = "https://hms.harvard.edu/"
src = response.css('div.person-line > div > img::attr("src")').extract_first()
# src is now: sites/default/files/hms-faculty-emails/BX0UVXkP.jpg
import urlparse
src2 = urlparse.urljoin(src1, src)
# src2 is now: https://hms.harvard.edu/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg
email = pytesseract.image_to_string(Image.open(src2))

I'm getting this error:

IOError: [Errno 22] invalid mode ('rb') or filename

How can I get the email text out of the image? Can anyone help, please?


Answer 1:


You are passing an HTTP URL to Image.open, which expects a local file path or a file-like object, so the call fails. Download the image first and wrap its bytes in an io.BytesIO buffer. You need to write code like this:

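import io
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
import pytesseract
from PIL import Image

def get_text(src):
    response = urlopen(src)  # fetch the image over HTTP
    buffer = io.BytesIO(response.read())  # in-memory file-like object
    return pytesseract.image_to_string(Image.open(buffer))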
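
To tie the question's URL-joining step to the buffer approach, here is a minimal sketch assuming Python 3, where `urljoin` lives in `urllib.parse` rather than the Python 2 `urlparse` module used in the question. The helper names (`absolute_image_url`, `fetch_image_buffer`, `ocr_image_url`) and the `opener` parameter are hypothetical and only for illustration; the OCR step assumes Pillow and pytesseract are installed.

```python
import io
from urllib.parse import urljoin  # Python 3 home of Python 2's urlparse.urljoin
from urllib.request import urlopen


def absolute_image_url(base_url, relative_src):
    """Resolve a relative img src against the site's base URL."""
    return urljoin(base_url, relative_src)


def fetch_image_buffer(url, opener=urlopen):
    """Download the image and wrap its bytes in an in-memory file object.

    `opener` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to urllib's urlopen.
    """
    response = opener(url)
    try:
        return io.BytesIO(response.read())
    finally:
        response.close()


def ocr_image_url(url):
    """Run OCR on a remote image (assumes Pillow and pytesseract are installed)."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(fetch_image_buffer(url)))


base = "https://hms.harvard.edu/"
src = "sites/default/files/hms-faculty-emails/BX0UVXkP.jpg"
print(absolute_image_url(base, src))
# prints https://hms.harvard.edu/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg
```

Inside a Scrapy callback, `response.urljoin(src)` should give the same absolute URL without hard-coding the base, and the result can then be passed to `ocr_image_url`.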


Source: https://stackoverflow.com/questions/46169963/by-combining-base-url-getting-text-out-of-image-in-python-using-scrapy
