I make use of the PAL tool (https://pal.codeplex.com/) to generate HTML reports from perfmon logs within Windows. After PAL processes .blg files from perfmon it dumps the in
OK, this isn't thoroughly tested but works with your example table in PS 2.0 with IE11:
# Parsing the HTML with the IE COM object.
$oIE = New-Object -ComObject InternetExplorer.Application
$oIE.Navigate("file.html")
# Waiting for IE to finish loading before touching the DOM.
while ($oIE.Busy -or $oIE.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$oHtmlDoc = $oIE.Document
# Getting the table by ID.
$oTable = $oHtmlDoc.getElementById("table6")
# Extracting the table rows as a collection.
$oTbody = $oTable.childNodes | Where-Object { $_.tagName -eq "tbody" }
$cTrs = $oTbody.childNodes | Where-Object { $_.tagName -eq "tr" }
# Creating a collection of table headers from the <th><b> cells of the first row.
$cThs = $cTrs[0].childNodes | Where-Object { $_.tagName -eq "th" }
$cHeaders = @()
foreach ($oTh in $cThs) {
    $cHeaders += `
        ($oTh.childNodes | Where-Object { $_.tagName -eq "b" }).innerHTML
}
# Converting rows to a collection of PS objects exportable to CSV.
$cCsv = @()
foreach ($oTr in $cTrs) {
    $cTds = $oTr.childNodes | Where-Object { $_.tagName -eq "td" }
    # Skipping the header row, which has no <td> cells.
    if ([String]::IsNullOrEmpty($cTds)) { continue }
    $oRow = New-Object PSObject
    for ($i = 0; $i -lt $cHeaders.Count; $i++) {
        $oRow | Add-Member -MemberType NoteProperty -Name $cHeaders[$i] `
            -Value $cTds[$i].innerHTML
    }
    $cCsv += $oRow
}
# Closing IE.
$oIE.Quit()
# Exporting to CSV.
$cCsv | Export-Csv -Path "file.csv" -NoTypeInformation
Honestly, I didn't aim for optimal code. It's just an example of how you could work with DOM objects in PS and convert them to PS objects.
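If you'd rather not spin up a visible IE window at all, the same DOM walking can be done against the HTMLFile COM object instead. This is a minimal, untested sketch: how the write call is exposed varies by machine (IHTMLDocument2_write is the usual name when the mshtml interop is present), so treat it as an alternative rather than a drop-in replacement. Everything from $oTable onward in the code above would stay the same.
# Alternative sketch (untested here): parse the file without launching IE.
# Assumes IHTMLDocument2_write is exposed on your system; if it errors,
# fall back to the InternetExplorer.Application approach above.
$sHtml = Get-Content -Path "file.html" | Out-String
$oHtmlDoc = New-Object -ComObject "HTMLFile"
$oHtmlDoc.IHTMLDocument2_write($sHtml)
$oTable = $oHtmlDoc.getElementById("table6")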
I see you accepted an answer, but I thought I'd add a regex solution here too. No COM objects are needed for this one, though note that the -Raw switch and the member enumeration used below do require PowerShell 3.0 or later.
$Path = 'C:\Path\To\File.html'
# Grabbing each <TABLE ID=...>...</TABLE> block as a single string.
[regex]$regex = "(?s)<TABLE ID=.*?</TABLE>"
$tables = $regex.Matches((Get-Content $Path -Raw)).Groups.Value
ForEach($String in $tables){
    $table = $String.Split("`n")
    $CurTable = @()
    # Pulling the table's ID out of its opening tag to use as the file name.
    $CurTableName = ([regex]'TABLE ID="([^"]*)"').Matches($table[0]).Groups[1].Value
    # Turning the header row into a comma-separated header line.
    $CurTable += ($table[1] -replace "</B></TH><TH><B>",",") -replace "</?(TR|TH|B)>"
    # Turning each data row into a comma-separated line.
    $CurTable += $table[2..($table.Count-2)] | ForEach{$_ -replace "</TD><TD>","," -replace "</?T(D|R)>"}
    $CurTable | ConvertFrom-Csv | Export-Csv "C:\Path\To\Output\$CurTableName.csv" -NoTypeInformation
}
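To make the replace chain a bit more concrete, here's what it does to a single made-up row (not taken from a real PAL report; your column values will differ):
# Hypothetical data row for illustration only:
$sSample = '<TR><TD>\\SERVER01\Processor(_Total)\% Processor Time</TD><TD>85</TD></TR>'
$sSample -replace "</TD><TD>","," -replace "</?T(D|R)>"
# Outputs: \\SERVER01\Processor(_Total)\% Processor Time,85
# which ConvertFrom-Csv can then pair up with the header line built from $table[1].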
That should output a CSV file for each table found, such as table6.csv, table9.csv, and so on. If you wanted to output CSVs per HTML file, you could wrap the entire thing in a ForEach loop like:
ForEach($File in (Get-ChildItem "$Path\*.html")){
    # Insert the code above here
}
You would need to modify the $tables = line so that it used GC $File.FullName, so each file gets loaded as the loop iterates (and $Path would then point at the folder containing the HTML reports rather than a single file).
Then just modify the Export-Csv to something like:
$CurTable | ConvertFrom-Csv | Export-Csv "C:\Path\To\Output\$($File.BaseName)\$CurTableName.csv" -NoTypeInformation
So if you had Server01.html with 3 tables in it you would get a folder named Server01 with 3 CSV files in it, one for each table.
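Putting that together, a wrapped-up version might look like the sketch below. It assumes $Path now points at the folder holding the reports and $OutRoot is wherever you want the CSVs; both are placeholders. I've also added a New-Item call, since Export-Csv won't create the per-server folders on its own.
$Path = 'C:\Path\To\Reports'     # folder containing the PAL HTML reports (placeholder)
$OutRoot = 'C:\Path\To\Output'   # root folder for the CSV output (placeholder)
[regex]$regex = "(?s)<TABLE ID=.*?</TABLE>"
ForEach($File in (Get-ChildItem "$Path\*.html")){
    # One output folder per report, created up front since Export-Csv won't make it.
    $OutDir = Join-Path $OutRoot $File.BaseName
    New-Item -Path $OutDir -ItemType Directory -Force | Out-Null
    $tables = $regex.Matches((Get-Content $File.FullName -Raw)).Groups.Value
    ForEach($String in $tables){
        $table = $String.Split("`n")
        $CurTable = @()
        $CurTableName = ([regex]'TABLE ID="([^"]*)"').Matches($table[0]).Groups[1].Value
        $CurTable += ($table[1] -replace "</B></TH><TH><B>",",") -replace "</?(TR|TH|B)>"
        $CurTable += $table[2..($table.Count-2)] | ForEach{$_ -replace "</TD><TD>","," -replace "</?T(D|R)>"}
        $CurTable | ConvertFrom-Csv | Export-Csv (Join-Path $OutDir "$CurTableName.csv") -NoTypeInformation
    }
}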