C# Regular Expressions - Get Second Number, not First


I have the following HTML code:

<td class="actual">106.2% </td>

I extract the number in two phases:

R         


        
4 Answers
  • 2020-12-22 08:03
    string html = @"<td class=""actual""><span class=""revised worse"" title=""Revised From 107.2%"">106.4%</span></td>
    <td class=""actual"">106.2% </td>";
    string pattern = @"<td\s+class=""actual"">.*(?<=>)(.+?)(?=</).*?</td>";
    foreach (Match match in Regex.Matches(html, pattern))
    {
        Console.WriteLine(match.Groups[1].Value);
    }
    

    I have changed the regex as you asked. The output is:

    106.4%
    106.2%
    
  • 2020-12-22 08:05

    Whenever the HTML comes from different providers, or your current provider uses several CMSs with different HTML formatting styles, it is not safe to rely on regex.

    I suggest an HtmlAgilityPack based solution:

    public string getCleanHtml(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        return HtmlAgilityPack.HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
    

    And then:

    var txt = "<td class=\"actual\">106.2% </td>";
    var clean = getCleanHtml(txt);
    txt = "<td class=\"actual\"><span class=\"revised worse\" title=\"Revised From 107.2%\">106.4%</span></td>";
    clean = getCleanHtml(txt);
    

    Result: "106.2% " (the trailing space is preserved) and "106.4%".

    You do not have to worry about nested formatting tags or XML/HTML entity references.

    If the value you need is only part of the cleaned text, you can then apply a regex or any other string manipulation method.

    UPDATE:

    You seem to need the node values from <td> tags. Here is a handy method for you:

    private List<string> GetTextFromHtmlTag(string html, string tag)
    {
        var result = new List<string>();
        HtmlAgilityPack.HtmlDocument hap;
        Uri uriResult;
        if (Uri.TryCreate(html, UriKind.Absolute, out uriResult) && uriResult.Scheme == Uri.UriSchemeHttp)
        { // html is a URL: download and parse the page
            var web = new HtmlAgilityPack.HtmlWeb();
            hap = web.Load(uriResult.AbsoluteUri);
        }
        else
        { // html is a markup string: parse it directly
            hap = new HtmlAgilityPack.HtmlDocument();
            hap.LoadHtml(html);
        }
        // Requires "using System.Linq;". Only top-level nodes with class "previous" are inspected here;
        // for tags nested deeper in a page, use hap.DocumentNode.SelectNodes("//" + tag) instead.
        var nodes = hap.DocumentNode.ChildNodes.Where(p => p.Name.ToLower() == tag.ToLower() && p.GetAttributeValue("class", string.Empty) == "previous");
        foreach (var node in nodes)
            result.Add(HtmlAgilityPack.HtmlEntity.DeEntitize(node.InnerText));
        return result;
    }
    

    You can call it like this:

    var html = "<td class=\"previous\"><span class=\"revised worse\" title=\"Revised From 1.3\">0.9</span></td>\n<td class=\"previous\"><span class=\"revised worse\" title=\"Revised From 107.2%\">106.4%</span></td>";
    var res = GetTextFromHtmlTag(html, "td");
    


    If you need to get only specific tags,

    If the text contains a number and you need just the number, you can use a regex for that:

    var rx = new Regex(@"[+-]?\d*\.?\d+"); // Matches "-1.23", "+5", ".677"
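
    As a rough illustration of how that pattern might be applied, here is a minimal sketch. The class name NumberExtractor and the helper FirstNumber are made up for this example; only the pattern itself comes from the answer.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class NumberExtractor
    {
        // Same pattern as in the answer: optional sign, optional integer part, optional dot, digits.
        private static readonly Regex Rx = new Regex(@"[+-]?\d*\.?\d+");

        // Hypothetical helper: returns the first number found in text, or null when there is none.
        public static string FirstNumber(string text)
        {
            Match m = Rx.Match(text);
            return m.Success ? m.Value : null;
        }

        public static void Main()
        {
            Console.WriteLine(FirstNumber("106.4%"));            // 106.4
            Console.WriteLine(FirstNumber("Revised From -1.3")); // -1.3
            Console.WriteLine(FirstNumber("n/a") ?? "(none)");   // (none)
        }
    }
    ```

    Note that Regex.Match returns only the first number in the input, so this is best applied to text that has already been reduced to a single cell value.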
    


  • Try the XML method:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    
    
    namespace ConsoleApplication34
    {
        class Program
        {
    
            static void Main(string[] args)
            {
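                string input = "<td class=\"actual\"><span class=\"revised worse\" title=\"Revised From 107.2%\">106.4%</span></td>";
    
                // XElement.Parse requires well-formed XML, so this only works on clean fragments.
                XElement element = XElement.Parse(input);
    
                // The cast to string returns the text content of the first <span>, or null if there is none.
                string value = element.Descendants("span").Select(x => (string)x).FirstOrDefault();
                Console.WriteLine(value); // 106.4%
    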
    
        }
    
    }
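
    The snippet above returns null for a cell with no <span>. One possible way to cover both shapes of cell is to read the text of the <td> element itself, since XElement.Value concatenates the text of all descendants. This is only a sketch: the names CellText and Read are invented here, and it assumes every fragment is well-formed XML.

    ```csharp
    using System;
    using System.Xml.Linq;

    public static class CellText
    {
        // Hypothetical helper: returns the trimmed text of the element and all of its descendants.
        // Assumes the fragment is well-formed XML; XElement.Parse throws otherwise.
        public static string Read(string fragment)
        {
            return XElement.Parse(fragment).Value.Trim();
        }

        public static void Main()
        {
            // Plain cell
            Console.WriteLine(Read("<td class=\"actual\">106.2% </td>")); // 106.2%
            // Cell whose value is wrapped in a <span>; the title attribute is not part of the text
            Console.WriteLine(Read("<td class=\"actual\"><span class=\"revised worse\" title=\"Revised From 107.2%\">106.4%</span></td>")); // 106.4%
        }
    }
    ```

    Because attribute values are not part of the element text, the "Revised From" number in the title never shows up in the result.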
    
  • 2020-12-22 08:22

    I want to share the solution I have found for my problem.

    So, I can have HTML tags like the following:

    <td class="previous"><span class="revised worse" title="Revised From 1.3">0.9</span></td>
    <td class="previous"><span class="revised worse" title="Revised From 107.2%">106.4%</span></td>
    

    Or simpler:

    <td class="previous">51.4</td>
    

    First, I capture the entire cell content with the following code:

    MatchCollection mPrevious = Regex.Matches(html, "<td class=\"previous\">\\s*(.*?)\\s*</td>", RegexOptions.Singleline);
    

    And second, I use the following code to extract the numbers only:

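    // lPrevious is assumed to be a List<string> declared earlier.
    foreach (Match m in mPrevious)
    {
        if (m.Groups[1].Value.Contains("span"))
        {
            // Match the title number and the displayed number together, e.g. 107.2%">106.4
            string stringtemp = Regex.Match(m.Groups[1].Value, "-?\\d+\\.\\d+.\">-?\\d+\\.\\d+|-?\\d+\\.\\d+\">-?\\d+\\.\\d+|-?\\d+.\">-?\\d+|-?\\d+\">-?\\d+").Value;
            int indextemp = stringtemp.IndexOf(">");
            if (indextemp <= 0) continue; // no number pair found: skip this cell instead of ending the loop
            // Drop everything up to and including '>' so only the displayed number remains
            lPrevious.Add(stringtemp.Remove(0, indextemp + 1));
        }
        else lPrevious.Add(Regex.Match(m.Groups[1].Value, @"-?\d+\.\d+|-?\d+").Value);
    }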
    

    First I check whether the cell contains a SPAN tag. If it does, I match the two numbers together (the one in the title attribute and the displayed one), with the regular expression covering the different possible formats. Then I locate the ">" character and remove everything up to and including it, which leaves only the displayed number.

    It works perfectly.

    Thank you all for the support and quick answers.
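
    For comparison, the two phases described in this answer could probably be collapsed into a single pattern that skips an optional opening <span ...> tag before capturing the number. The sketch below is not the author's code: the class name PreviousValues and the method Extract are invented, and it assumes attribute values never contain a ">" character.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class PreviousValues
    {
        // <td class="previous">, optional whitespace, an optional opening <span ...> tag,
        // then the displayed number captured in group 1.
        private static readonly Regex CellRx = new Regex(
            @"<td class=""previous"">\s*(?:<span[^>]*>\s*)?(-?\d+(?:\.\d+)?)");

        // Hypothetical helper: returns the displayed number of every matching cell, in document order.
        public static List<string> Extract(string html)
        {
            var values = new List<string>();
            foreach (Match m in CellRx.Matches(html))
                values.Add(m.Groups[1].Value);
            return values;
        }

        public static void Main()
        {
            string html =
                "<td class=\"previous\"><span class=\"revised worse\" title=\"Revised From 1.3\">0.9</span></td>\n" +
                "<td class=\"previous\"><span class=\"revised worse\" title=\"Revised From 107.2%\">106.4%</span></td>\n" +
                "<td class=\"previous\">51.4</td>";
            Console.WriteLine(string.Join(", ", Extract(html))); // 0.9, 106.4, 51.4
        }
    }
    ```

    Since [^>]* consumes the whole opening tag, including the title attribute, the "Revised From" number is skipped and only the second, displayed number is captured.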
