Get the value of an HTML element

执念已碎 2020-12-19 10:43

I have the HTML code of a webpage in a text file. I'd like my program to return the value that is inside a tag. E.g. I want to get "Julius" out of



        
4 Answers
  • 2020-12-19 11:20

    I'd strongly recommend you look into something like the HTML Agility Pack

  • 2020-12-19 11:26

    I asked the same question a few days ago and ended up using the HTML Agility Pack, but here are the regular expressions you want.

    This one ignores the attributes:

    <span[^>]*>(.*?)</span>
    

    This one takes the attributes into account, matching only spans whose tag starts with that exact class attribute:

    <span class="hidden first"[^>]*>(.*?)</span>
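
    As a rough illustration of how these two patterns could be applied in C#, here is a minimal sketch using System.Text.RegularExpressions. The sample markup and the class name SpanRegexDemo are assumptions made up for the example, since the question's actual HTML is not shown. RegexOptions.Singleline is also an addition so that the captured content may span several lines.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    class SpanRegexDemo
    {
        static void Main()
        {
            // Hypothetical input; the real page markup is not given in the question.
            string html = "<p><span class=\"hidden first\">Julius</span> <span>Caesar</span></p>";

            // First pattern: any <span>, whatever its attributes.
            // Singleline lets '.' also match newlines inside the element.
            var anySpan = new Regex(@"<span[^>]*>(.*?)</span>", RegexOptions.Singleline);
            foreach (Match m in anySpan.Matches(html))
            {
                // Group 1 is the lazily captured text between the tags.
                Console.WriteLine(m.Groups[1].Value);
            }

            // Second pattern: only spans whose tag begins with class="hidden first".
            var hiddenFirst = new Regex(
                @"<span class=""hidden first""[^>]*>(.*?)</span>",
                RegexOptions.Singleline);
            Match first = hiddenFirst.Match(html);
            if (first.Success)
            {
                Console.WriteLine(first.Groups[1].Value);
            }
        }
    }
    ```

    With the sample input above, the loop prints Julius and Caesar, and the second pattern prints Julius. Nested spans or attributes in a different order will break both patterns, which is one reason the other answers suggest a real parser.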
    
  • 2020-12-19 11:30

    I would use the Html Agility Pack to parse the HTML in C#.

  • 2020-12-19 11:31

    You should be using an HTML parser such as HtmlAgilityPack. Regex is not a good choice for parsing HTML files, because HTML is neither strict nor regular in its format.

    You can use the code below to retrieve the value with HtmlAgilityPack:

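    // Requires the HtmlAgilityPack NuGet package, plus:
    // using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.Load(yourStream);

    // This XPath selects every <span> whose class attribute is exactly "hidden first".
    // SelectNodes can return null when nothing matches, so guard before calling Select.
    var nodes = doc.DocumentNode.SelectNodes("//span[@class='hidden first']");
    var itemList = nodes == null
        ? new List<string>()
        : nodes.Select(p => p.InnerText).ToList();

    // itemList now holds the inner text of each matching <span>.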
    