Parsing local HTML file using New-Object -ComObject “HTMLFile” broken?

最后都变了 - posted on 2019-12-05 13:03:57

You could try with an Internet Explorer COM object:

$ie = New-Object -COM 'InternetExplorer.Application'

$ie.Navigate("file://$($PWD.Path)/passwordreminder.html")
do {
  Start-Sleep -Milliseconds 100
} until ($ie.ReadyState -eq 4)

# do stuff

I don't have PowerShell v5, though, so I can't test. If HTMLFile is broken, this might be as well.

If you need to run it repeatedly, you can call the Navigate() method, together with the loop that waits for the page to finish loading, inside an outer loop:

$ie = New-Object -COM 'InternetExplorer.Application'

foreach (...) {
  $ie.Navigate("file://$($PWD.Path)/passwordreminder.html")
  do {
    Start-Sleep -Milliseconds 100
  } until ($ie.ReadyState -eq 4)

  # do stuff
}
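The polling loops above wait forever if ReadyState never reaches 4 (READYSTATE_COMPLETE), for example when the file is missing. A minimal sketch of a bounded wait follows. The function name Wait-Condition and its parameters are hypothetical and not part of the original answer.

```powershell
# Hypothetical helper (not from the original answer): polls a condition
# until it returns $true or the timeout elapses.
function Wait-Condition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 100
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        # Give up once the deadline has passed.
        if ((Get-Date) -gt $deadline) { return $false }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $true
}

# Usage with the $ie object from the answer above
# (Windows with Internet Explorer only, so left commented out here):
#   $ie.Navigate("file://$($PWD.Path)/passwordreminder.html")
#   if (-not (Wait-Condition -Condition { $ie.ReadyState -eq 4 } -TimeoutSeconds 15)) {
#       throw 'Page did not finish loading in time.'
#   }
```

When you are done with the browser object, calling $ie.Quit() closes the hidden Internet Explorer process instead of leaving it running.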

This seems to work properly if you pass a UTF-16LE (UCS-2) byte array, which is what [System.Text.Encoding]::Unicode produces, instead of a string:

$html = New-Object -ComObject "HTMLFile"
$src = Get-Content -path "./passwordreminder.html" -Raw
$src = [System.Text.Encoding]::Unicode.GetBytes($src)
try
{
    # This works in PowerShell 4
    $html.IHTMLDocument2_write($src)
}
catch
{
    # This works in PowerShell 5
    $html.write($src)
}
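Once the document has been written, it can be queried like any other DOM. The sketch below wraps the byte-array workaround in a function and returns the inner text of every element with a given tag. The function name Get-HtmlElementText and its parameters are hypothetical, and the sketch assumes that getElementsByTagName is available on the HTMLFile object on your system (Windows only).

```powershell
# Hypothetical helper (not from the original answer): parses an HTML string
# with the HTMLFile COM object and emits the inner text of matching elements.
function Get-HtmlElementText {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $TagName
    )

    $doc = New-Object -ComObject 'HTMLFile'
    # Same workaround as above: pass UTF-16LE bytes rather than a string.
    $bytes = [System.Text.Encoding]::Unicode.GetBytes($Html)
    try {
        $doc.IHTMLDocument2_write($bytes)
    }
    catch {
        # Fall back to write() when IHTMLDocument2_write is not exposed.
        $doc.write($bytes)
    }

    foreach ($element in $doc.getElementsByTagName($TagName)) {
        $element.innerText
    }
}

# Usage (commented out because it needs Windows and the HTMLFile COM class):
#   $src = Get-Content -Path './passwordreminder.html' -Raw
#   Get-HtmlElementText -Html $src -TagName 'a'
```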

Solved by adding a reference to the interop assembly by path and specifying the type of object to create:

Add-Type -Path "C:\Program Files (x86)\Microsoft.NET\Primary Interop Assemblies\Microsoft.mshtml.dll"

$webpage = New-Object mshtml.HTMLDocumentClass

Here is the full code:

$url = 'http://website'
$outFile = 'C:\content.txt'
$showCount = 10;

[net.httpwebrequest]$httpwebrequest = [net.webrequest]::create($url)
[net.httpWebResponse]$httpwebresponse = $httpwebrequest.getResponse()
$reader = new-object IO.StreamReader($httpwebresponse.getResponseStream())
$html = $reader.ReadToEnd()
$reader.Close()

Add-Type -Path "C:\Program Files (x86)\Microsoft.NET\Primary Interop Assemblies\Microsoft.mshtml.dll"


$webpage = New-Object mshtml.HTMLDocumentClass
$webpage.IHTMLDocument2_write($html)

$topicElements = $webpage.documentElement.getElementsByClassName('topic')

$time = (Get-Date).ToString("yyyy-MM-dd HH:mm:ss")
$content = '[www.hkgalden.com] [' + $time + '] '

$i = 0;
foreach ($topicElement in $topicElements) {
    $titleElement = $topicElement.getElementsByClassName('title')[0].getElementsByTagName('a')[0]
    $title = $titleElement.innerText

    $usernameElement = $topicElement.getElementsByClassName('username')[0]
    $username = $usernameElement.innerText

    $content += $username + ': ' + $title + ' // '
    $i++
    if ($i -ge $showCount) {
        break
    }
}
#$content
$content | Out-File -Encoding utf8 $outFile

This snippet works because the Microsoft.mshtml assembly, which defines the mshtml.HTMLDocumentClass type, is loaded first using the Add-Type cmdlet with its -AssemblyName parameter:

Add-Type -AssemblyName "Microsoft.mshtml"
$html = New-Object -ComObject "HTMLFile"
$svc = Get-Service | Select-Object Name, Status | ConvertTo-Html
$svc | Out-File -FilePath .\report.html -Force
$htmlFile = Get-Content -Path .\report.html -Raw
$html.IHTMLDocument2_write($htmlFile)

The $html variable contains the "HTMLFile" object reference with all its methods and properties.
