Cycle through webpages and copy data

我在风中等你 2021-01-20 00:26

I created this script for a friend; it cycles through a real estate website and collects email addresses for her (for promotion). The site offers them freely, but it's inconve

2 answers
  •  灰色年华
    2021-01-20 00:55

    Although it is not complete, not optimal, and not bug-free, this could help:

    ' VB Script Document
    option explicit
    
    Dim strResult: strResult = Wscript.ScriptName
    Dim numResult: numResult = 0
    Dim ii, IE, pageText, fso, ts, xLink, Links
    
      set fso = createobject("scripting.filesystemobject") 
      set ts = fso.opentextfile("d:\bat\files\28384650_webdump.txt",8,true) 
    
      set IE = createobject("internetexplorer.application") 
    
      'read first page
      IE.navigate "https://netforum.avectra.com/eweb/DynamicPage.aspx?Site=NEFAR&WebCode=IndResult&FromSearchControl=Yes&FromSearchControl=Yes"
      IE.Visible = True
    
    For ii = 1 to 3 '239
      ts.writeLine "-----------------" & ii
      strResult = strResult & vbNewLine & ii
    
      While IE.Busy
        Wscript.Sleep 100
      Wend
      While IE.ReadyState <> 4
        Wscript.Sleep 100
      Wend
      While IE.document.readystate <> "complete" 
          wscript.sleep 100
      Wend
      WScript.Sleep 100
    
      pageText = IE.document.body.innertext
      ts.writeLine pageText
    
      ' get sublinks and collect them in the 'strResult' variable
      Set Links = IE.document.getElementsByTagName("a")
      For Each xLink In Links
        If InStr(1, xLink.href, "WebCode=PrimaryContactInfo" _
          , vbTextCompare) > 0 Then
          ' record each matching link only once
          If InStr(1, strResult, xLink.href, vbTextCompare) = 0 Then
            numResult = numResult + 1
            strResult = strResult & vbNewLine & xLink.href
          End If
        End If
      Next
    
      ' read a page of the 'ii' index
      IE.Navigate "javascript:window.__doPostBack('JumpToPage','" & ii+1 & "');"
      IE.Visible = True
    Next
    
      ts.writeLine "===========" & numResult & vbTab & strResult
      ts.close 
    
    Wscript.Echo "All site data copied! " _
        & numResult & vbNewline & strResult
    Wscript.Quit
    

    Explanation:

    • navigates to the first page using the usual http(s) address
    • navigates to the following pages (index ii+1) with a javascript ... __doPostBack call (the same as filling in the Jump to Page field and clicking the GO button)
    • not complete: it collects the (indirect) links to the Primary Contact Info pages, where the e-mail addresses could be found, but does not fetch the addresses themselves
    • not optimal: it keeps collecting the full text of every page visited
    • not bug-free:

      • works fine with freshly cleared MSIE temporary files, history and cookies; otherwise it starts at an odd (last visited?) page of netforum.avectra.com
      • navigates to page ii+1 at the end of each iteration, so it fails on the last page.
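
    Two of the gaps listed above could be narrowed roughly as follows. This is only a sketch, not part of the tested script: ExtractEmails and NextPageUrl are hypothetical helper names, the e-mail pattern is a deliberately loose approximation, and lastPage is an assumed parameter (the real page count has to be supplied by the caller). It runs under cscript without Internet Explorer, so the helpers can be tried on plain strings first.

```vbscript
Option Explicit

' Hypothetical helper: returns the distinct e-mail-like strings found in
' "text", one per line. The pattern is a loose approximation, not a full
' address validator.
Function ExtractEmails(text)
  Dim re, m, found
  Set re = New RegExp
  re.Pattern = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
  re.Global = True
  re.IgnoreCase = True

  ' A dictionary with case-insensitive keys keeps each address only once.
  Set found = CreateObject("Scripting.Dictionary")
  found.CompareMode = vbTextCompare
  For Each m In re.Execute(text)
    If Not found.Exists(m.Value) Then found.Add m.Value, True
  Next
  ExtractEmails = Join(found.Keys, vbNewLine)
End Function

' Hypothetical helper: builds the __doPostBack address for the page after
' "ii", or returns an empty string when "ii" is already the last page.
' "lastPage" is an assumption; the caller must know the real page count.
Function NextPageUrl(ii, lastPage)
  If ii < lastPage Then
    NextPageUrl = "javascript:window.__doPostBack('JumpToPage','" & (ii + 1) & "');"
  Else
    NextPageUrl = ""
  End If
End Function

' Small demonstration on plain strings.
WScript.Echo ExtractEmails("Contact: jane@example.com, JANE@example.com; bob@example.org")
WScript.Echo NextPageUrl(1, 3)
```

    In the script above, ExtractEmails could be applied to IE.document.body.innertext after navigating to each collected Primary Contact Info link, and the final IE.Navigate could be skipped whenever NextPageUrl returns an empty string.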
