How to extract specific tables from an HTML file using native PowerShell commands?

梦谈多话 2020-12-10 17:10

I make use of the PAL tool (https://pal.codeplex.com/) to generate HTML reports from perfmon logs within Windows. After PAL processes .blg files from perfmon it dumps the in

2 Answers
  • 2020-12-10 18:01

    OK, this isn't thoroughly tested but works with your example table in PS 2.0 with IE11:

    # Parsing HTML with IE.
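    $oIE = New-Object -ComObject InternetExplorer.Application
    # Navigate() expects a URL or a full path, so resolve the file name first.
    $oIE.Navigate((Resolve-Path "file.html").ProviderPath)
    # Navigate() returns immediately; wait until the page has loaded (ReadyState 4 = complete).
    while ($oIE.Busy -or $oIE.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
    $oHtmlDoc = $oIE.Document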
    
    # Getting table by ID.
    $oTable = $oHtmlDoc.getElementByID("table6")
    
    # Extracting table rows as a collection.
    $oTbody = $oTable.childNodes | Where-Object { $_.tagName -eq "tbody" }
    $cTrs = $oTbody.childNodes | Where-Object { $_.tagName -eq "tr" }
    
    # Creating a collection of table headers.
    $cThs = $cTrs[0].childNodes | Where-Object { $_.tagName -eq "th" }
    $cHeaders = @()
    foreach ($oTh in $cThs) {
        $cHeaders += `
            ($oTh.childNodes | Where-Object { $_.tagName -eq "b" }).innerHTML
    }
    
    # Converting rows to a collection of PS objects exportable to CSV.
    $cCsv = @()
    foreach ($oTr in $cTrs) {
        # @() keeps a single <td> indexable as an array.
        $cTds = @($oTr.childNodes | Where-Object { $_.tagName -eq "td" })
        # Skipping the first row (headers), which has no <td> cells.
        if ($cTds.Count -eq 0) { continue }
        $oRow = New-Object PSObject
        for ($i = 0; $i -lt $cHeaders.Count; $i++) {
            $oRow | Add-Member -MemberType NoteProperty -Name $cHeaders[$i] `
                -Value $cTds[$i].innerHTML
        }
        $cCsv += $oRow
    }
    
    # Closing IE.
    $oIE.Quit()
    
    # Exporting CSV.
    $cCsv | Export-Csv -Path "file.csv" -NoTypeInformation
    

    Honestly, I didn't aim for optimal code. It's just an example of how you could work with DOM objects in PS and convert them to PS objects.
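
    The header-to-property step can be tried without IE. This is only a sketch of that one step: the $Headers and $Cells values are made-up stand-ins for the innerHTML strings the DOM code would return.

```powershell
# Hypothetical values standing in for the <th> and <td> innerHTML strings.
$Headers = @('Time', 'Value')
$Cells   = @('10:00', '42')

# Build one PS object whose property names come from the headers.
$Row = New-Object PSObject
for ($i = 0; $i -lt $Headers.Count; $i++) {
    $Row | Add-Member -MemberType NoteProperty -Name $Headers[$i] -Value $Cells[$i]
}

# ConvertTo-Csv shows what Export-Csv would write to disk.
$CsvLines = @($Row | ConvertTo-Csv -NoTypeInformation)
$CsvLines
```

    Because each row is an ordinary object, it can also be filtered or sorted with Where-Object and Sort-Object before export.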

  • 2020-12-10 18:11

    I see you accepted an answer, but I thought I'd add a RegEx solution here too. No COM objects are needed for this one, and I'm pretty sure it should be PSv2 friendly.

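    $Path = 'C:\Path\To\File.html'
    # (?s) lets "." span newlines; the pattern is case-sensitive and expects upper-case tags.
    [regex]$regex = "(?s)<TABLE ID=.*?</TABLE>"
    # ReadAllText and an explicit ForEach-Object keep this PSv2 compatible
    # (Get-Content -Raw and .groups.value on a collection need PS 3.0 or later).
    $tables = $regex.Matches([IO.File]::ReadAllText($Path)) | ForEach-Object { $_.Value }
    ForEach($String in $tables){
        # One HTML row per line is assumed: line 0 is <TABLE ...>, line 1 the header row.
        $table = $String -split "`r?`n"
        $CurTable = @()
        $CurTableName = ([regex]'TABLE ID="([^"]*)"').Match($table[0]).Groups[1].Value
        $CurTable += ($table[1] -replace "</B></TH><TH><B>",",") -replace "</?(TR|TH|B)>"
        $CurTable += $table[2..($table.count-2)]|ForEach{$_ -replace "</TD><TD>","," -replace "</?T(D|R)>"}
        $CurTable | convertfrom-csv | export-csv "C:\Path\To\Output\$CurTableName.csv" -notype
    }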
    

    That should output a CSV file for each table found, such as table6.csv, table9.csv, etc. If you wanted to output CSVs per HTML file, you could wrap the entire thing in a ForEach loop like:

    ForEach($File in (Get-ChildItem "$Path\*.html")){
        # Insert the code above here.
    }
    

    You would need to modify the $tables = line so that it reads $File.FullName, so that it loads each file as the loop iterates. Note that $Path must then point to the folder holding the HTML files rather than to a single file.

    Then just modify the Export-Csv to something like:

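    # Export-Csv does not create missing folders, so create the per-file folder first.
    # ($OutDir is a variable name introduced here for illustration.)
    $OutDir = "C:\Path\To\Output\$($File.BaseName)"
    if (-not (Test-Path $OutDir)) { New-Item -ItemType Directory -Path $OutDir | Out-Null }
    $CurTable | convertfrom-csv | export-csv "$OutDir\$CurTableName.csv" -notype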
    

    So if you had Server01.html with 3 tables in it you would get a folder named Server01 with 3 CSV files in it, one for each table.
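
    To check the replacements without touching any files, the same regex steps can be run against an in-memory string. This is a rough sketch, not the exact PAL output: the sample HTML, the one-row-per-line layout, and the function name Convert-HtmlTable are all assumptions made for illustration.

```powershell
# Hypothetical sample; one row per line and <B> inside <TH> are assumed.
$Html = @(
    '<HTML><BODY>',
    '<TABLE ID="table6" BORDER=1>',
    '<TR><TH><B>Time</B></TH><TH><B>Value</B></TH></TR>',
    '<TR><TD>10:00</TD><TD>42</TD></TR>',
    '<TR><TD>10:05</TD><TD>57</TD></TR>',
    '</TABLE>',
    '</BODY></HTML>'
) -join "`n"

# Hypothetical helper: emits one object per table with its ID and parsed rows.
function Convert-HtmlTable {
    param([string]$Text)
    $TableRegex = [regex]'(?s)<TABLE ID=.*?</TABLE>'
    foreach ($Match in $TableRegex.Matches($Text)) {
        $Lines = $Match.Value -split "`r?`n"
        $Name  = ([regex]'TABLE ID="([^"]*)"').Match($Lines[0]).Groups[1].Value
        # Header row first, then the data rows, all reduced to comma-separated text.
        $Csv   = @()
        $Csv  += ($Lines[1] -replace '</B></TH><TH><B>', ',') -replace '</?(TR|TH|B)>'
        $Csv  += $Lines[2..($Lines.Count - 2)] | ForEach-Object {
            $_ -replace '</TD><TD>', ',' -replace '</?T(D|R)>'
        }
        New-Object PSObject -Property @{ Name = $Name; Rows = @($Csv | ConvertFrom-Csv) }
    }
}

$Result = @(Convert-HtmlTable -Text $Html)
$Result[0].Name                  # table6
$Result[0].Rows | Format-Table
```

    Since the cells are joined with plain commas, a cell that itself contains a comma would be split into extra columns; the DOM approach does not have that limitation.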
