Regular Expression to Extract the URL from an Anchor Tag

隐瞒了意图╮ 2020-12-11 12:54

How can I extract the HTTP link from inside anchor tags? Only links to files with the .wmv extension should be extracted.

3 Answers
  •  北海茫月
    2020-12-11 13:21

    Regex (written with C# string escapes):

    <a\\s*href\\s*=\\s*(?:(\"|\')(?<link>[^\"]*\\.wmv)(\"|\'))\\s*>(?<name>.*)\\s*</a>
    

    [Note: \s* is used in several places to match the extra whitespace characters that can occur in the HTML.]

    Sample C# code:

    /// <summary>
    /// Assigns proper values to link and name, if the htmlATag matches the pattern.
    /// Matches only .wmv files.
    /// </summary>
    /// <returns>true if success, false otherwise</returns>
    public static bool TryGetHrefDetailsWMV(string htmlATag, out string wmvLink, out string name)
    {
        wmvLink = null;
        name = null;
    
        // The dot is escaped so that only a literal ".wmv" extension matches.
        string pattern = "<a\\s*href\\s*=\\s*(?:(\"|\')(?<link>[^\"]*\\.wmv)(\"|\'))\\s*>(?<name>.*)\\s*</a>";
    
        // Match once, with the same options for the test and the extraction.
        Match m = Regex.Match(htmlATag, pattern, RegexOptions.IgnoreCase);
        if (!m.Success)
            return false;
    
        wmvLink = m.Groups["link"].Value;
        name = m.Groups["name"].Value;
        return true;
    }
    
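    // Example calls; the href values are illustrative.
    string wmvLink, name;
    MyRegEx.TryGetHrefDetailsWMV("<a href='/path/to/file'>Name of File</a>",
                    out wmvLink, out name); // No match
    MyRegEx.TryGetHrefDetailsWMV("<a href='/path/to/file.wmv'>Name of File</a>",
                    out wmvLink, out name); // Match
    MyRegEx.TryGetHrefDetailsWMV("<a   href = '/path/to/file.wmv'  >Name of File</a>", out wmvLink, out name); // Match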
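
When the input is a whole page rather than a single anchor tag, the same idea can be applied with Regex.Matches to collect every .wmv link in one pass. The following is a minimal sketch, not the answer's own code: the class name WmvLinkExtractor, the method ExtractAll, and the stricter pattern (http/https URLs only, a backreference so the closing quote matches the opening one, a lazy name group so one anchor does not swallow the next) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class WmvLinkExtractor
{
    // (?<q>["'])  remembers the opening quote, \k<q> requires the same closing quote.
    // (?<link>..) captures an http/https URL that ends in a literal ".wmv".
    // (?<name>.*?) is lazy, so it stops at the first closing </a>.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\s+[^>]*?href\s*=\s*(?<q>[""'])(?<link>https?://[^""']*?\.wmv)\k<q>[^>]*>\s*(?<name>.*?)\s*</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one (link, name) pair per matching anchor, in document order.
    public static List<KeyValuePair<string, string>> ExtractAll(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value, m.Groups["name"].Value));
        }
        return results;
    }
}
```

Each entry's Key holds the URL and its Value holds the link text. A regular expression like this only copes with reasonably regular markup; for arbitrary HTML a real HTML parser is usually the more robust choice.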
    
