Excel VBA to return Page Count from protected PDF file

微笑、不失礼 submitted on 2020-07-24 05:48:53

Question


I need to retrieve the number of pages in PDF files (with security), using Excel VBA.

The following code works when there is no security enabled in the PDF file:

Sub PDFandNumPages()

   Dim Folder As Object
   Dim file As Object
   Dim fso As Object
   Dim iExtLen As Integer, iRow As Integer
   Dim sFolder As String, sExt As String
   Dim sPDFName As String

   sExt = "pdf"
   iExtLen = Len(sExt)
   iRow = 1
   ' Must have a '\' at the end of path
   sFolder = "C:\test\"

   Set fso = CreateObject("Scripting.FileSystemObject")

   If sFolder <> "" Then
      Set Folder = fso.GetFolder(sFolder)
      For Each file In Folder.Files
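         ' Compare the extension case-insensitively so "A.PDF" is also matched
         If LCase(fso.GetExtensionName(file.Name)) = sExt Then
            Cells(iRow, 1).Value = file.Name
            ' pageCount is a user-defined function that is not shown in this post
            Cells(iRow, 2).Value = pageCount(sFolder & file.Name)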
            iRow = iRow + 1
         End If
      Next file
   End If

End Sub

However, if any kind of security is enabled, the code cannot extract the page count and returns zero pages.

[Image: properties of a PDF file with some kind of security]

Note: These PDF files do not need a password to open. They only have some security features enabled to prevent modification of the PDF.

Sample PDFs with security enabled are available at the following Google Drive link: Google Drive PDF with security

I need to tweak the code so that it returns the page count of each PDF file whether or not security is enabled.

For Python, I found a similar question and solution at this page, but it relies on Python libraries. If possible, I'd like a VBA expert to suggest how I can replicate this in VBA.


Answer 1:


If the PDF document doesn't have a Permissions Password set up (or if you know the password), you can modify the document restrictions such as page extraction.

  • Open the document manually with a proprietary or 3rd-party editor
  • Go to File → Properties
  • In the Security tab, choose Show Details…

  • To make changes to the PDF’s restrictions, go to View → Tools → Protection
  • In the Tools Pane, click Encrypt and in the Protection section, choose Remove Security.
  • If there is a Permissions Password, you will need to enter it here.

The permissions will now change to "Allowed".

(Source)


The "Hacky" Method:

If the above method doesn't work for you, there's a workaround that may do the trick. You won't be unlocking the file itself per se, but you can generate an unlocked equivalent that can be edited and manipulated to your heart's content.

  • Open the document that you wish to unlock in Adobe Acrobat Reader
  • Click File and then Print.
  • In the Printers list, select "Microsoft XPS Document Writer" and then click Print.

If you try to use Adobe's PDF printer driver, it will detect that you are attempting to export a secured PDF to a fresh file and it will refuse to continue. Even third-party PDF print drivers tend to choke on such files.

However, by using the XPS Document Writer, you effectively circumvent that check entirely, leaving yourself with an XPS output.

  • Open the new XPS file you have just created and simply repeat the printing process, only this time printing to PDF format.

If you do not have a PDF printer to select in your list of printers, there are various freeware options available online (such as CutePDF Writer) which will allow you to set up a virtual printer that generates PDFs. (Source)


Edit: (Alternate Answer)

Returning the Page Count of a PDF File

To find the total number of pages in a PDF file in VBA, you could open it as a binary file, search it for the "/Count" tag, and read the number that follows.

Below is an example that works on your sample files (6 & 8 pages), but may need "tweaking" depending on the structure of the individual PDF files on hand.

(In some cases, you may be better off counting the individual occurrences of the "/Page" or "/Pages" tags, although that number may need to be reduced by 1 or 2.)

Note that this is not a very efficient way of parsing binaries, so large files could take a while to parse.

Sub Get_PDF_Page_Count()
'scrape PDF file as binary, looking for "/Count" tag, then return the number following it
    Const fName = "C:\your_path_here\1121-151134311859-64.pdf"
    Dim bytTemp As Byte, fileStr As String, c As Long, p1 As Long, p2 As Long

    'open PDF as binary file
    Debug.Print "Reading File '" & fName & "'";
    Open fName For Binary Access Read As #1

    'read file into string
    Do While Not EOF(1)

        'parse PDF file, one byte at a time
        Get #1, , bytTemp
        c = c + 1
        fileStr = fileStr & Chr(bytTemp)

        'check every 20000 characters, if the tag was found yet
        If c Mod 20000 = 0 Then
            If InStr(fileStr, "/Count") > 0 Then Exit Do
            ' not found yet, keep going
            Debug.Print ".";
            DoEvents
        End If
    Loop

    'close file
    Close #1
    Debug.Print

    'check if tag was found
    If InStr(fileStr, "/Count") = 0 Then
        Debug.Print "'/Count' tag not found in file: " & fName
        Exit Sub
    End If

    'return page count
    p1 = InStr(fileStr, "/Count")
    p1 = InStr(p1, fileStr, " ") + 1
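    p2 = InStr(p1, fileStr, vbLf)
    'no line feed after the number: fall back to a short fixed-width window
    If p2 = 0 Then p2 = p1 + 10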
    Beep
    Debug.Print Val(Mid(fileStr, p1, p2 - p1)) & " pages in file: " & fName

End Sub
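
The two notes above (counting "/Page" tags as a fallback, and the slow byte-by-byte read) could be addressed along the following lines. This is only a sketch under stated assumptions: the helper names ReadFileAsString, CountTagValue and CountPageObjects are hypothetical and not part of the original answer, the page count is assumed to follow the first "/Count" tag as in the routine above, and the tags are assumed to be stored as plain text in the file.

```vba
' Hypothetical helper: load a whole file into a string with a single Get.
Private Function ReadFileAsString(ByVal pdfPath As String) As String
    Dim fileNum As Integer
    Dim bytes() As Byte

    fileNum = FreeFile                      ' avoid hard-coding file number 1
    Open pdfPath For Binary Access Read As #fileNum
    If LOF(fileNum) > 0 Then
        ReDim bytes(0 To LOF(fileNum) - 1)
        Get #fileNum, , bytes               ' one read for the whole file
        ' Convert the raw bytes to a VBA string so InStr and RegExp can search it
        ReadFileAsString = StrConv(bytes, vbUnicode)
    End If
    Close #fileNum
End Function

' Hypothetical helper: number that follows the first "/Count" tag, or 0 if none.
Public Function CountTagValue(ByVal pdfPath As String) As Long
    Dim fileStr As String, digits As String, ch As String
    Dim p As Long

    fileStr = ReadFileAsString(pdfPath)
    p = InStr(fileStr, "/Count")
    If p = 0 Then Exit Function
    p = p + Len("/Count")

    ' Skip whitespace between the tag and the number
    Do While p <= Len(fileStr)
        ch = Mid$(fileStr, p, 1)
        If InStr(" " & vbTab & vbCr & vbLf, ch) = 0 Then Exit Do
        p = p + 1
    Loop

    ' Collect the digits that make up the number
    Do While p <= Len(fileStr)
        ch = Mid$(fileStr, p, 1)
        If Not ch Like "[0-9]" Then Exit Do
        digits = digits & ch
        p = p + 1
    Loop

    ' Nine digits or fewer always fit in a Long
    If Len(digits) > 0 And Len(digits) < 10 Then CountTagValue = CLng(digits)
End Function

' Hypothetical helper: count "/Type /Page" entries, skipping "/Type /Pages".
Public Function CountPageObjects(ByVal pdfPath As String) As Long
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' [^A-Za-z] rejects longer names such as /Pages
    re.Pattern = "/Type\s*/Page[^A-Za-z]"
    CountPageObjects = re.Execute(ReadFileAsString(pdfPath)).Count
End Function
```

In the loop from the question, a line such as Cells(iRow, 2).Value = CountTagValue(sFolder & file.Name) could then fill in the page count, with CountPageObjects as a cross-check when the two numbers disagree. Both functions return 0 when the tags cannot be found as plain text, which may happen with PDFs that keep their page objects inside compressed object streams.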


Source: https://stackoverflow.com/questions/48484855/excel-vba-to-return-page-count-from-protected-pdf-file