Question
I have a folder that contains multiple subfolders of images. I want to use Baidu OCR to extract the text from the image files in each subfolder and write it to one Excel file per subfolder (the contents need to be split), named after the subfolder:
folder
        \ sub1\file0.jpg
        \ sub1\file1.jpg
        \ sub1\file2.png
        .
        .
        .
        \ sub2\xxx.png
        \ sub2\yyy.jpg
        \ sub2\zzz.png
        .
        .
        .
Expected results:
folder
        \ sub1\file0.jpg
        \ sub1\file1.jpg
        \ sub1\file2.png
        \ sub1\sub1.xlsx
        .
        .
        .
        \ sub2\xxx.png
        \ sub2\yyy.jpg
        \ sub2\zzz.png
        \ sub2\sub2.xlsx
        .
        .
        .
Here is what I have tried, but I don't know how to put the whole process together. Please share your insights and ideas. Thanks.
Step 1: iterate over all subfolders and image files:
import os
dir_name = "D:/folder"     
for root, dirs, files in os.walk(dir_name, topdown=False):
    for file in files:
        print(file)
        print(root)
        print(dirs)
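As a minimal sketch of how Step 1 could feed the later steps, the walk can collect image paths per subfolder instead of printing them. The function name `images_by_subfolder` and the extension set `IMAGE_EXTENSIONS` are illustrative assumptions, not part of the original code:

```python
import os
from collections import defaultdict

# Assumed set of image extensions; extend it if other formats are present.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def images_by_subfolder(top):
    """Map each directory under `top` to the image paths it directly contains."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in sorted(files):
            # Compare extensions case-insensitively so "A.PNG" is also picked up.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                groups[root].append(os.path.join(root, name))
    # Directories without images never get a key, so they are skipped.
    return dict(groups)
```

Calling `images_by_subfolder("D:/folder")` would then give one entry per subfolder, which is the grouping needed to write one workbook per subfolder.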
Step 2: OCR one image
from aip import AipOcr
# pandas provides the Excel export (writing .xlsx also needs an engine such as openpyxl)
from pandas import DataFrame, ExcelWriter
APP_ID = '<APP_ID>'
API_KEY = '<APP_KEY>'
SECRET_KEY = '<APP_SECRET>'
aipOcr = AipOcr(APP_ID, API_KEY, SECRET_KEY)
filePath = "test.jpg"
def get_file_content(filePath):
    # Read the image as raw bytes for the OCR request
    with open(filePath, 'rb') as fp:
        return fp.read()
options = {
    'detect_direction': 'true',
    'language_type': 'CHN_ENG',
    'recognize_granularity': 'big',
    'vertexes_location': 'true',
    #'probability': 'true',
    #'detect_language': 'true'
}
result = aipOcr.basicAccurate(get_file_content(filePath), options)
print(result)
df = DataFrame(result)
# The context manager saves and closes the workbook on exit
with ExcelWriter('test.xlsx') as writer:
    df.to_excel(writer, index=False)
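For the "split contents" part, one option is to flatten the OCR response into one row per recognized line before building the DataFrame. This is a sketch that assumes the response is a dict with a `words_result` list whose items carry the text under a `words` key; the helper name `ocr_result_to_rows` and the column names are made up for illustration:

```python
def ocr_result_to_rows(image_name, result):
    """Turn one OCR response into a list of row dicts, one per recognized line."""
    rows = []
    # Assumption: recognized lines live under "words_result"; a response
    # without that key (for example a failed request) yields no rows.
    for line_no, item in enumerate(result.get("words_result", []), start=1):
        rows.append({
            "file": image_name,             # which image the text came from
            "line": line_no,                # 1-based position within the image
            "text": item.get("words", ""),  # the recognized text itself
        })
    return rows
```

The rows from all images in a subfolder can be concatenated and passed to `DataFrame(rows)`, which gives one spreadsheet row per line of text rather than one cell holding a whole dict.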
Step 3: write an Excel file for each subfolder (thanks to @Florian H)
Make empty file for each subfolder using subfolders' name in Python
from os import listdir
from os.path import isfile, join
mypath = "D:/"
def write_files(path):
    folders = [f for f in listdir(path) if not isfile(join(path, f))]
    if len(folders) == 0:
        # Create an empty placeholder file named after the folder and close the handle right away
        with open(path+"/"+path.split("/")[-1]+".xlsx", "w+"):
            pass
    else:
        for folder in folders:
            write_files(path+"/"+folder)
write_files(mypath)
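Putting the three steps together might look like the sketch below. It is not a tested Baidu OCR integration: `ocr_folder_tree`, `write_xlsx`, and the `ocr_image` callable are hypothetical names, the response layout (`words_result` / `words`) is assumed, and the default writer assumes pandas with an .xlsx engine such as openpyxl is installed. The OCR call is passed in as a function so the folder logic can be tried without network access.

```python
import os

# Assumed set of image extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def write_xlsx(rows, xlsx_path):
    """Default writer: one spreadsheet row per recognized line of text."""
    import pandas as pd  # imported lazily; only needed when actually writing
    frame = pd.DataFrame(rows, columns=["file", "line", "text"])
    frame.to_excel(xlsx_path, index=False)

def ocr_folder_tree(top, ocr_image, write_rows=write_xlsx):
    """OCR the images in every directory under `top` and write <dir>/<dirname>.xlsx.

    `ocr_image` takes an image path and returns the OCR response dict.
    Returns the list of workbook paths that were written.
    """
    written = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subfolders in a stable order
        images = sorted(f for f in files
                        if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS)
        if not images:
            continue  # nothing to OCR in this directory
        rows = []
        for name in images:
            result = ocr_image(os.path.join(root, name))
            for line_no, item in enumerate(result.get("words_result", []), start=1):
                rows.append({"file": name, "line": line_no,
                             "text": item.get("words", "")})
        # Name the workbook after the directory that holds the images.
        folder_name = os.path.basename(os.path.normpath(root))
        xlsx_path = os.path.join(root, folder_name + ".xlsx")
        write_rows(rows, xlsx_path)
        written.append(xlsx_path)
    return written

# Possible usage with the names from Step 2 (requires valid credentials):
# ocr_folder_tree("D:/folder",
#                 lambda p: aipOcr.basicAccurate(get_file_content(p), options))
```

Because the workbook is written into the same directory as the images, the result matches the expected layout (`sub1\sub1.xlsx`, `sub2\sub2.xlsx`). If the top folder itself holds images, it gets a workbook named after it as well.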
Source: https://stackoverflow.com/questions/54252754/iterate-all-subfolders-and-ocr-images-in-python