Why my code not correctly split every page in a scanned pdf?

前端未结

关注

 3  1432

旧时难觅i 2021-01-02 10:27

Update: Thanks to stardt whose script works! The pdf is a page of another one. I tried the script on the other one, and it also correctly spit each pdf page

3条回答

遥遥无期 (楼主)

2021-01-02 10:35

Your code assumes that p.mediaBox.lowerLeft is (0,0) but it is actually (0, 497)

This works for the file you provided:

#!/usr/bin/env python
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
input = PdfFileReader(sys.stdin)
output = PdfFileWriter()
for i in range(input.getNumPages()):
    p = input.getPage(i)
    q = copy.copy(p)

    bl = p.mediaBox.lowerLeft
    ur = p.mediaBox.upperRight

    print >> sys.stderr, 'splitting page',i
    print >> sys.stderr, '\tlowerLeft:',p.mediaBox.lowerLeft
    print >> sys.stderr, '\tupperRight:',p.mediaBox.upperRight

    p.mediaBox.upperRight = (ur[0], (bl[1]+ur[1])/2)
    p.mediaBox.lowerLeft = bl

    q.mediaBox.upperRight = ur
    q.mediaBox.lowerLeft = (bl[0], (bl[1]+ur[1])/2)
    if i%2==0:
        output.addPage(q)
        output.addPage(p)
    else:
        output.addPage(p)
        output.addPage(q)

output.write(sys.stdout)

0 讨论(0)

查看其它3个回答