VBA - IE Automation - save as PDF isn't working

面向向阳花 2020-12-12 07:04

I'm trying to automatically download PDFs from a website (job postings) using IE automation in VBA, but for some reason I can't generate a single PDF.

1 answer
  • 2020-12-12 07:47

    Shadow DOM and invalid link generation:

    On the initial job page, automating a click on the target href doesn't produce a usable page link. This is presumably because the important work actually happens server side.

    Target href:

    You can click the actual download button on this page

    Download button:

    This launches a new window, which is where Selenium helps: it has methods to switch to the new window. Otherwise, you can use the FindWindow methods I detail later in the answer for finding the Save As window.

    In this new window you cannot interact with the buttons the way you normally can when scraping, because the required content is not available via the DOM. If you look closely you will see the pdf button sits inside a shadow root, i.e. somewhere you cannot reach with normal selectors. This is a design choice. I still need to investigate selecting through the shadow DOM with the '/deep/' combinator at some point, but I don't think that works from VBA.

    Download button in Shadow root:


    Mimicking keyboard actions:

    I am using the Selenium Basic VBA wrapper together with Windows APIs to mimic the on-screen actions needed to save as pdf through the Save As window (see image at very bottom), in particular the Save keyboard shortcut sent via SendKeys. This works. I used Spy++ to inspect the window tree structure and check window class names and titles.

    I use SendKeys to open the Save As dialog for the pdf. I then descend the window tree to get handles on the ComboBox where the file name is entered, so I can send it a message containing the file name, and on the Save button, so I can click it. You may need a longer wait to ensure the download goes through correctly. This bit is a little buggy in my opinion and I hope to improve it.

    Window Structure via Spy++
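
    The descent through the window tree can also be written as a small helper that matches each level by class name instead of taking the first child. This is only a sketch: FindDescendant is a hypothetical helper, and the class names in the usage line are assumptions based on what Spy++ typically shows for a Save As dialog, so check them against your own window tree. It is meant to live in its own module, which is why the declaration is Private.

    ```vba
    Option Explicit

    ' Private so it does not clash with a Public declaration in another module.
    Private Declare PtrSafe Function FindWindowExW Lib "user32" ( _
        ByVal hWndParent As LongPtr, ByVal hWndChildAfter As LongPtr, _
        ByVal lpszClass As LongPtr, ByVal lpszWindow As LongPtr) As LongPtr

    ' Hypothetical helper: walks down from hRoot, one class name per level.
    ' Returns 0 as soon as a level cannot be found.
    Public Function FindDescendant(ByVal hRoot As LongPtr, _
                                   ParamArray classNames() As Variant) As LongPtr
        Dim h As LongPtr, i As Long, cls As String

        If hRoot = 0 Then Exit Function   ' 0 would search top-level windows instead

        h = hRoot
        For i = LBound(classNames) To UBound(classNames)
            cls = CStr(classNames(i))
            ' StrPtr points at the null-terminated UTF-16 buffer of the VBA string.
            h = FindWindowExW(h, 0, StrPtr(cls), 0)
            If h = 0 Then Exit For
        Next i

        FindDescendant = h
    End Function
    ```

    Usage, with the assumed class names: editHWND = FindDescendant(ptrSaveWindow, "DUIViewWndClassName", "DirectUIHWND", "FloatNotifySink", "ComboBox", "Edit"). Note that FindWindowExW only returns the first direct child of the given class at each level, so if the dialog has several siblings of the same class the helper may pick a different one than you expect.
    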

    It is fairly robust. I used Selenium Basic for the ease of working with iframes and getting round same-origin policy problems. With IE you cannot simply grab the src link of the iframe and navigate to the pdf print page from the original ad. What you can do, I believe, is issue an initial XMLHTTP request and grab the src attribute value, i.e. the link, then pass that src link to IE and carry on as shown below for the window handling parts.
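
    A rough sketch of that XMLHTTP idea follows. It is untested against this site: GetIframeSrc is a hypothetical helper, it assumes the iframe is present in the raw HTML response (not injected later by script), and it assumes the frame id is "altiframe" as used in the Selenium code further down.

    ```vba
    Option Explicit

    ' Hypothetical helper: fetch a page and return the src of an iframe by id.
    ' Returns an empty string if the request fails or the iframe is not found.
    Public Function GetIframeSrc(ByVal pageUrl As String, ByVal frameId As String) As String
        Dim http As Object, html As Object, frame As Object
        Dim src As Variant

        Set http = CreateObject("MSXML2.XMLHTTP")
        http.Open "GET", pageUrl, False      ' synchronous request
        http.send
        If http.Status <> 200 Then Exit Function

        ' Parse the response with a late-bound MSHTML document.
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = http.responseText

        Set frame = html.getElementById(frameId)
        If frame Is Nothing Then Exit Function

        src = frame.getAttribute("src")
        If Not IsNull(src) Then GetIframeSrc = CStr(src)
    End Function
    ```

    You would call it as GetIframeSrc(URL, "altiframe") and navigate IE to the result. If the src is relative, the parser may hand it back unresolved or with an about: prefix, so expect to clean it up before navigating.
    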

    With more time I could add the IE version, and I will look at a more robust method than adding an explicit wait time for detecting that the file has downloaded before quitting the IE instance. Likely along the lines of this (as stated in one of the answers: use SetWindowsHookEx to set up a WH_SHELL hook and look for the HSHELL_WINDOWCREATED event).
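
    Short of a shell hook, a simpler stand-in for the fixed wait is to poll for the saved file with a timeout. This is not the SetWindowsHookEx approach, just a plain polling sketch: WaitForFile is a hypothetical helper, and it assumes the browser only creates the final file name once the download has completed and that the path points at a valid, reachable folder.

    ```vba
    Option Explicit

    ' Hypothetical helper: returns True once fullPath exists, False on timeout.
    Public Function WaitForFile(ByVal fullPath As String, _
                                Optional ByVal timeoutSec As Single = 10) As Boolean
        Dim started As Single, elapsed As Single

        If Len(fullPath) = 0 Then Exit Function

        started = Timer                      ' seconds since midnight
        Do
            If Len(Dir$(fullPath)) > 0 Then
                WaitForFile = True
                Exit Function
            End If
            DoEvents                         ' keep the host responsive
            elapsed = Timer - started
            If elapsed < 0 Then elapsed = elapsed + 86400   ' crossed midnight
        Loop While elapsed < timeoutSec
    End Function
    ```

    For example, WaitForFile("C:\Users\me\Downloads\myTest.pdf", 15) could replace the Application.Wait call before .Quit. The path here is made up; the real folder is whatever the Save As dialog defaulted to.
    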


    Notes:

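    1. This is written for 64-bit Office. PtrSafe and LongPtr also compile under 32-bit VBA7 (Office 2010 and later), so I think the declarations remain compatible there. Only on older VBA6 hosts do you need to remove PtrSafe and switch LongPtr for Long.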
    2. Huge thanks to @ErikvonAsmuth for his enormous patience in going through the APIs with me. Take a look at his excellent answer here for working with Windows.
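
    Regarding note 1, one common way to keep a single code base compiling on both old and new hosts is conditional compilation on the built-in VBA7 constant. A minimal sketch for one of the declarations (the others would follow the same pattern):

    ```vba
    Option Explicit

    #If VBA7 Then
        ' Office 2010 and later, 32-bit or 64-bit: PtrSafe and LongPtr are available.
        Private Declare PtrSafe Function FindWindowW Lib "user32" ( _
            ByVal lpClassName As LongPtr, ByVal lpWindowName As LongPtr) As LongPtr
    #Else
        ' Older VBA6 hosts: no PtrSafe keyword, pointers and handles are 32-bit Long.
        Private Declare Function FindWindowW Lib "user32" ( _
            ByVal lpClassName As Long, ByVal lpWindowName As Long) As Long
    #End If
    ```

    Variables that hold handles then need the same treatment (LongPtr under VBA7, Long otherwise).
    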

    VBA:

    Option Explicit
    
    ' wMsg is a 32-bit UINT in the Windows API, so it is declared As Long rather than LongPtr
    Declare PtrSafe Function SendMessageW Lib "User32" (ByVal hWnd As LongPtr, ByVal wMsg As Long, ByVal wParam As LongPtr, ByVal lParam As LongPtr) As LongPtr
    
    Declare PtrSafe Function FindWindowExW Lib "User32" (ByVal hWndParent As LongPtr, _
                                                         Optional ByVal hwndChildAfter As LongPtr, Optional ByVal lpszClass As LongPtr, _
                                                         Optional ByVal lpszWindow As LongPtr) As LongPtr
    
    Public Declare PtrSafe Function FindWindowW Lib "User32" (ByVal lpClassName As LongPtr, Optional ByVal lpWindowName As LongPtr) As LongPtr
    
    Public Const WM_SETTEXT = &HC
    Public Const BM_CLICK = &HF5
    
    Public Sub GetInfo()
        Dim d As WebDriver, keys As New Selenium.keys
        Const MAX_WAIT_SEC As Long = 5
        Dim t As Single   ' Timer returns a Single (seconds since midnight), not a Date
    
        Set d = New ChromeDriver
        Const URL = "https://www.recrutement.banque-france.fr/detail-offre/charge-de-recrutement-confirme-h-f-2037343/"
        With d
            .start "Chrome"
            .get URL
            .SwitchToFrame .FindElementById("altiframe")
            .FindElementById("btn-pdf").Click
            .SwitchToNextWindow
            .SendKeys keys.Control, "s"
    
            Dim str1 As String, cls As String, name As String
            Dim ptrSaveWindow As LongPtr
    
            str1 = "#32770" & vbNullChar
    
            t = Timer
            Do
                DoEvents
                ptrSaveWindow = FindWindowW(StrPtr(str1))
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While ptrSaveWindow = 0
    
            Dim duiViewWND As LongPtr, directUIHWND As LongPtr
            Dim floatNotifySinkHWND As LongPtr, comboBoxHWND As LongPtr, editHWND As LongPtr
    
    
            If Not ptrSaveWindow > 0 Then Exit Sub
    
            duiViewWND = FindWindowExW(ptrSaveWindow, 0&)
    
            If Not duiViewWND > 0 Then Exit Sub
    
            directUIHWND = FindWindowExW(duiViewWND, 0&)
    
            If Not directUIHWND > 0 Then Exit Sub
    
            floatNotifySinkHWND = FindWindowExW(directUIHWND, 0&)
    
            If Not floatNotifySinkHWND > 0 Then Exit Sub
    
            comboBoxHWND = FindWindowExW(floatNotifySinkHWND, 0&)
    
            If Not comboBoxHWND > 0 Then Exit Sub
    
            editHWND = FindWindowExW(comboBoxHWND, 0&)
    
            If Not editHWND > 0 Then Exit Sub
    
            Dim msg As String
            msg = "myTest.pdf" & vbNullChar
    
            SendMessageW editHWND, WM_SETTEXT, 0, StrPtr(msg)
    
            .SendKeys keys.Control, "s"
    
            Dim ptrSaveButton As LongPtr
            cls = "Button" & vbNullChar
            name = "&Save" & vbNullChar
    
            ptrSaveButton = FindWindowExW(ptrSaveWindow, 0, StrPtr(cls), StrPtr(name))
    
            SendMessageW ptrSaveButton, BM_CLICK, 0, 0
    
            Application.Wait Now + TimeSerial(0, 0, 4)
    
            .Quit
        End With
    End Sub
    

    Save As Dialog Window:


    References:

    1. Shadow DOM
    2. Using shadow DOM - Developer Mozilla pages.

    Project references:

    1. Selenium Type Library
