Tesseract OCR on AWS Lambda via virtualenv

终归单人心 2020-11-30 22:36

I have spent all week attempting this, so this is a bit of a hail mary.

I am attempting to package up Tesseract OCR into AWS Lambda running on Python (I am also usin

4 Answers

不思量自难忘° (OP)

2020-11-30 23:19
Generate the zip files with shell scripts that compile Tesseract 4 for Python 3.7

I struggled for a few days to get Tesseract 4 working in a Python 3.7 Lambda function. I finally found a Medium article and an accompanying GitHub repository that describe how to build the zip files for tesseract, pytesseract, opencv, and pillow with shell scripts that run Docker images on an EC2 instance. The process takes less than 20 minutes with the steps below and is reliably reproducible.

Summarized Steps:

Start an Amazon Linux EC2 instance (a t2.micro is sufficient) and run:
```
sudo yum update -y
sudo yum install git-core -y
sudo yum install docker -y
sudo service docker start
sudo usermod -a -G docker ec2-user  # allows ec2-user to call docker
```
After running the fifth command (usermod), you need to log out and log back in for the group change to take effect. Then clone the repository and run the build scripts:
```
git clone https://github.com/amtam0/lambda-tesseract-api.git
cd lambda-tesseract-api/
bash build_tesseract4.sh #takes a few minutes
bash build_py37_pkgs.sh
```
This generates .zip files for tesseract, pytesseract, pillow, and opencv. To use them with Lambda, complete two more steps:
1. Create Lambda layers, one for each zip file, and attach the layers to your Lambda function.
2. Create an environment variable on the function with key PYTHONPATH and value /opt/
(Note: you will probably need to increase the function's memory allocation and timeout.)
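
The two configuration steps above can also be scripted. The following is only a sketch, assuming boto3 is installed and AWS credentials are configured. The function name, the layer ARNs, and the memory and timeout defaults are hypothetical placeholders, not values taken from the article or the repository.

```python
def build_lambda_config(function_name, layer_arns, memory_mb=1024, timeout_s=60):
    """Build keyword arguments for boto3's update_function_configuration.

    memory_mb and timeout_s are illustrative defaults (assumptions);
    tune them for your own images.
    """
    if not layer_arns:
        raise ValueError("at least one layer ARN is required")
    if len(layer_arns) > 5:
        # A Lambda function can have at most five layers attached.
        raise ValueError("a Lambda function can use at most 5 layers")
    return {
        "FunctionName": function_name,
        "Layers": list(layer_arns),
        # Note: this replaces ALL existing environment variables on the function.
        "Environment": {"Variables": {"PYTHONPATH": "/opt/"}},
        "MemorySize": memory_mb,
        "Timeout": timeout_s,
    }


def apply_lambda_config(config):
    """Send the configuration to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("lambda")
    return client.update_function_configuration(**config)
```

Calling `apply_lambda_config(build_lambda_config("my-ocr-function", my_layer_arns))` would attach the layers and set PYTHONPATH in one request, where `my-ocr-function` and `my_layer_arns` stand in for your own function name and the ARNs returned when you published each layer.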

At this point you are all set to upload your code and start using Tesseract on AWS Lambda! Refer back to the Medium article for a test script.
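
The article's test script is not reproduced here. A minimal handler of the same kind might look like the sketch below, assuming the pytesseract and pillow layers are attached and the Tesseract binary from the tesseract layer can be found by pytesseract. The `image_base64` event field is a hypothetical name chosen for illustration.

```python
import base64
import io


def decode_image_bytes(event):
    """Return raw image bytes from the hypothetical 'image_base64' event field."""
    encoded = event.get("image_base64")
    if not encoded:
        raise ValueError("event must contain a non-empty 'image_base64' field")
    return base64.b64decode(encoded)


def lambda_handler(event, context):
    # Imported inside the handler so that the modules are resolved from the
    # attached layers (via PYTHONPATH=/opt/) when the function is invoked.
    import pytesseract
    from PIL import Image

    image = Image.open(io.BytesIO(decode_image_bytes(event)))
    text = pytesseract.image_to_string(image)
    return {"statusCode": 200, "body": text}
```

If pytesseract cannot locate the binary, setting `pytesseract.pytesseract.tesseract_cmd` to the path where the layer unpacks it is one option. That path depends on how the zip was built, so check the layer contents rather than assuming it.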