Parse Atom RSS feed and remove HTML tags

难免孤独 2021-01-26 15:20

I am developing this code using PowerShell. I need to be able to strip the HTML tags out of the feed content.

  Invoke-WebRequest -Uri 'https://psu.box.com/shared/static/jf36ohodxnw7oe


        
2 Answers
  •  無奈伤痛
    2021-01-26 16:00

    You should be able to use the following script. It makes use of the HTMLFile COM object.

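        # Download the feed to disk, then load it as XML.
        Invoke-WebRequest -Uri 'https://*.rss' -OutFile C:\*.rss
        [xml]$Content = Get-Content C:\*.rss -Raw

        # (?s) lets '.' span newlines; (?<Description>...) is a named capture group.
        $Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'

        # Match against the raw XML text, because $Content itself is an XmlDocument.
        If ($Content.OuterXml -match $Regex) {
            "Description is '$($Matches['Description'])'"
            # do something here with $Matches['Description']
        }
        Else {
            "No match."
        }

        $Feed = $Content.rss.channel
        ForEach ($msg in $Feed.Item) {

            $ParseData = $msg.description
            # Note: -like without wildcards is an exact comparison; use "*Title*" for a
            # substring test. The [int] cast also assumes the first token is a number.
            ForEach ($Datum in $ParseData) {
                If ($Datum -like "Title") { [int]$Upvote = ($Datum).Split(' ') | Select-Object -First 1 }#EndIf
                If ($Datum -like "comments") { [int]$Downvote = ($Datum).Split(' ') | Select-Object -First 1 }#EndIf
            }#EndForEach

            # Load the description HTML into an HTMLFile COM document (Windows only).
            # This assumes description comes back as an XML element; if it is a plain
            # string, pass $ParseData itself instead of $ParseData.InnerText.
            $HTML = New-Object -ComObject "HTMLFile"
            $HTML.IHTMLDocument2_write($ParseData.InnerText)

            # $Validation, $Workaround and $FeedBackID are not assigned anywhere in
            # this snippet, so those properties will be empty.
            [PSCustomObject]@{
                'LastUpdated' = [datetime]$msg.pubDate
                'Title'       = $msg.title
                'Category'    = $msg.category
                'Author'      = $msg.author
                'Link'        = $msg.link
                'UpVotes'     = $Upvote
                'DownVotes'   = $Downvote
                'Validations' = $Validation
                'WorkArounds' = $Workaround
                # Text of every <p> element, with the markup removed.
                'Comments'    = $HTML.all.tags("p") | ForEach-Object InnerText
                'FeedbackID'  = $FeedBackID
            }#EndPSCustomObject
        }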
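
The HTMLFile COM object only exists on Windows. Where it is not available, one possible fallback is to strip the tags with a regular expression and then decode the HTML entities. The sketch below is an illustration only: `Remove-HtmlTag` is a hypothetical helper name, the sample string is made up, and regex-based stripping is approximate (it can mis-handle malformed markup or a `>` inside an attribute value).

```powershell
# Hypothetical helper, not part of the original answer.
function Remove-HtmlTag {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Html
    )
    # Replace anything that looks like <tag ...> or </tag> with a space.
    $text = $Html -replace '<[^>]+>', ' '
    # Turn entities such as &amp; or &lt; back into plain characters.
    $text = [System.Net.WebUtility]::HtmlDecode($text)
    # Collapse the whitespace left behind by the removed tags.
    ($text -replace '\s+', ' ').Trim()
}

# Made-up sample input, for illustration only.
Remove-HtmlTag -Html '<p>Hello <b>world</b> &amp; friends</p>'
# -> Hello world & friends
```

Inside the item loop, the same helper could be applied to the description text, for example `Remove-HtmlTag -Html $ParseData.InnerText`, assuming the description holds HTML as in the script above.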
    
