C# code to linkify urls in a string

Submitted by 别说谁变了你拦得住时间么 on 2019-11-26 08:08:29

Question


Does anyone have any good C# code (and regular expressions) that will parse a string and "linkify" any URLs that may be in the string?


Answer 1:


It's a pretty simple task: you can achieve it with Regex and a ready-to-go regular expression from:

  • http://regexlib.com/

Something like:

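// html: an existing string holding the text to linkify.
// No ^/$ anchors, so URLs are found anywhere in the text; verbatim strings (@"...") keep the regex escapes intact.
// $0 is the whole match; $1 would be only the scheme group (http|https|ftp).
html = Regex.Replace(html, @"\b(http|https|ftp)\://[a-zA-Z0-9\-\.]+" +
                     @"\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?" +
                     @"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*",
                     "<a href=\"$0\">$0</a>");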
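If the input is raw user text rather than trusted HTML, running `Regex.Replace` over it directly also passes through any markup it contains. A minimal sketch of one way around that, assuming you want everything except the generated `<a>` tags HTML-encoded: match against the raw text, encode the plain-text gaps between matches with `WebUtility.HtmlEncode`, and build the output with a `StringBuilder`. The class name `EncodingLinkifier` and its simplified URL pattern are assumptions for illustration, not taken from regexlib.com.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodingLinkifier
{
    // Simplified, assumed pattern: http/https/ftp URL whose last character is not
    // whitespace, a quote, an angle bracket, or common trailing punctuation.
    private static readonly Regex Url = new Regex(
        @"\b(https?|ftp)://[^\s<>""]+[^\s<>"".,;:!?)]", RegexOptions.IgnoreCase);

    public static string Linkify(string text)
    {
        var sb = new StringBuilder();
        int last = 0;
        foreach (Match m in Url.Matches(text))
        {
            // Encode the plain text between URLs so "<" or "&" in it cannot become markup.
            sb.Append(WebUtility.HtmlEncode(text.Substring(last, m.Index - last)));
            // Encode the URL too, so "&" in a query string becomes "&amp;" in both places.
            string url = WebUtility.HtmlEncode(m.Value);
            sb.Append("<a href=\"").Append(url).Append("\">").Append(url).Append("</a>");
            last = m.Index + m.Length;
        }
        // Encode whatever follows the last match.
        sb.Append(WebUtility.HtmlEncode(text.Substring(last)));
        return sb.ToString();
    }
}
```

For example, `EncodingLinkifier.Linkify("Tom & Jerry: http://example.com/a?b=1&c=2.")` returns `Tom &amp; Jerry: <a href="http://example.com/a?b=1&amp;c=2">http://example.com/a?b=1&amp;c=2</a>.`; the last character class keeps the sentence's final period out of the link.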

You may also be interested in shortening URLs, not only in creating links. Here is a good article on the subject:

  • Resolve and shorten URLs in C#

See also:

  • Regular Expression Workbench at MSDN
  • Converting a URL into a Link in C# Using Regular Expressions
  • Regex to find URL within text and make them as link
  • Regex.Replace Method at MSDN
  • The Problem With URLs by Jeff Atwood
  • Parsing URLs with Regular Expressions and the Regex Object
  • Format URLs in string to HTML Links in C#
  • Automatically hyperlink URL and Email in ASP.NET Pages with C#



Answer 2:


Well, after a lot of research on this, and several attempts to handle cases where:

  1. people enter both http://www.sitename.com and www.sitename.com in the same post
  2. URLs sit inside parentheses, like (http://www.sitename.com), or contain parentheses, like http://msdn.microsoft.com/en-us/library/aa752574(vs.85).aspx
  3. URLs are long, like: http://www.amazon.com/gp/product/b000ads62g/ref=s9_simz_gw_s3_p74_t1?pf_rd_m=atvpdkikx0der&pf_rd_s=center-2&pf_rd_r=04eezfszazqzs8xfm9yd&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846

we are now using this HtmlHelper extension. I thought I would share it and get comments:

    private static Regex regExHttpLinks = new Regex(@"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)|\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Format(this HtmlHelper htmlHelper, string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        html = htmlHelper.Encode(html);
        html = html.Replace(Environment.NewLine, "<br />");

        // replace periods on numeric values that appear to be valid domain names
        var periodReplacement = "[[[replace:period]]]";
        html = Regex.Replace(html, @"(?<=\d)\.(?=\d)", periodReplacement);

        // create links for matches
        var linkMatches = regExHttpLinks.Matches(html);
        for (int i = 0; i < linkMatches.Count; i++)
        {
            var temp = linkMatches[i].ToString();

            if (!temp.Contains("://"))
            {
                temp = "http://" + temp;
            }

            // Keep the URL's original case in the href: paths and query strings can be case-sensitive.
            html = html.Replace(linkMatches[i].ToString(), String.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>", temp.Replace(".", periodReplacement), linkMatches[i].ToString().Replace(".", periodReplacement)));
        }

        // Clear out period replacement
        html = html.Replace(periodReplacement, ".");

        return html;
    }
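In the extension above, the text inside each generated anchor has its periods swapped for a placeholder, apparently so that a later `html.Replace` call for a repeated URL cannot match inside an anchor that was already produced. A sketch of an alternative, assuming you can drop the `HtmlHelper` dependency: encode with `WebUtility.HtmlEncode` and let `Regex.Replace` with a `MatchEvaluator` rewrite each match exactly once, which removes the need for the placeholder. It reuses the answer's regular expression unchanged; the class name `UrlFormatter` is hypothetical, and the newline-to-`<br />` step is omitted.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class UrlFormatter
{
    // The regular expression from the answer above, unchanged.
    private static readonly Regex RegExHttpLinks = new Regex(
        @"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)|\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Format(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        // Encode first, as the HtmlHelper version does, so user input cannot inject markup.
        string encoded = WebUtility.HtmlEncode(text);

        // Regex.Replace calls the evaluator once per match and splices the results into
        // a new string, so a repeated URL is never wrapped twice.
        return RegExHttpLinks.Replace(encoded, match =>
        {
            // "www." matches get an http:// scheme in the href; the visible text is unchanged.
            string href = match.Value.Contains("://") ? match.Value : "http://" + match.Value;
            return string.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>", href, match.Value);
        });
    }
}
```

With this, `UrlFormatter.Format("(http://www.sitename.com)")` leaves the surrounding parentheses outside the link, and `www.sitename.com` gets `http://` added to its `href` only.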



Answer 3:


protected string Linkify( string SearchText ) {
    // requires: using System; using System.Text.RegularExpressions;
    // this will find links like:
    // http://www.mysite.com
    // as well as any links with other characters directly in front of them, like:
    // href="http://www.mysite.com"
    // you can then use your own logic to determine which links to linkify
    Regex regx = new Regex( @"\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b", RegexOptions.IgnoreCase );
    SearchText = SearchText.Replace( "&nbsp;", " " );

    // Replace each match in place with a MatchEvaluator. Calling string.Replace once per
    // match would wrap a URL that occurs twice in a second, nested <a> tag.
    return regx.Replace( SearchText, match => {
        // if it starts with anything else then don't linkify -- it may already be linked!
        if ( match.Value.StartsWith( "http", StringComparison.OrdinalIgnoreCase ) ) {
            return "<a href='" + match.Value + "'>" + match.Value + "</a>";
        }
        return match.Value;
    } );
}



Answer 4:


It's not that easy, as you can read in this blog post by Jeff Atwood. It is especially hard to detect where a URL ends.

For example, is the trailing parenthesis part of the URL or not?

  • http://en.wikipedia.org/wiki/PCTools(CentralPointSoftware)
  • a URL in parentheses (http://en.wikipedia.org) more text

In the first case, the parentheses are part of the URL. In the second case they are not!
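One common heuristic for this (not from this answer; offered as a sketch under that assumption) is to match greedily with parentheses included, then drop a trailing `)` only while the candidate contains more `)` than `(`. `ParenHeuristic.TrimUnbalancedParen` is a hypothetical helper name.

```csharp
public static class ParenHeuristic
{
    // Hypothetical helper: removes trailing ")" characters only while the candidate
    // has more closing than opening parentheses, i.e. while the last ")" is unmatched.
    public static string TrimUnbalancedParen(string candidate)
    {
        while (candidate.Length > 0
               && candidate[candidate.Length - 1] == ')'
               && Count(candidate, ')') > Count(candidate, '('))
        {
            candidate = candidate.Substring(0, candidate.Length - 1);
        }
        return candidate;
    }

    // Counts occurrences of a single character.
    private static int Count(string s, char c)
    {
        int n = 0;
        foreach (char ch in s)
        {
            if (ch == c)
            {
                n++;
            }
        }
        return n;
    }
}
```

Applied to the two examples, the PCTools URL keeps its closing parenthesis because it is balanced, while `http://en.wikipedia.org)`, as a greedy match would capture it from the second example, loses it. The heuristic still guesses wrong on text such as a URL ending in an unbalanced `)` on purpose.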




Answer 5:


I have found the following regular expression: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

To me it looks very good. Jeff Atwood's solution doesn't handle many cases. josefresno's seems to me to handle all cases, but when I tried to understand it (in case of any support requests), my brain boiled.




Answer 6:


Here is a class:

public class TextLink
{
    #region Properties

    // "www\." matches only a literal "www." (an unescaped "." would match any character)
    public const string BeginPattern = @"((http|https)://)?(www\.)?";

    public const string MiddlePattern = @"([a-z0-9\-]*\.)+[a-z]+(:[0-9]+)?";

    public const string EndPattern = @"(/\S*)?";

    public static string Pattern { get { return BeginPattern + MiddlePattern + EndPattern; } }

    public static string ExactPattern { get { return string.Format("^{0}$", Pattern); } }

    public string OriginalInput { get; private set; }

    public bool Valid { get; private set; }

    private bool _isHttps;

    private string _readyLink;

    #endregion

    #region Constructor

    public TextLink(string input)
    {
        this.OriginalInput = input;

        // Trim all leading and trailing whitespace; treat a null input as empty
        var text = (input ?? string.Empty).Trim();

        Valid = Regex.IsMatch(text, ExactPattern);

        if (Valid)
        {
            _isHttps = Regex.IsMatch(text, "^https:", RegexOptions.IgnoreCase);
            // clear begin (anchored with ^ so "http://" or "www." later in the path is left alone):
            _readyLink = Regex.Replace(text, "^" + BeginPattern, "", RegexOptions.IgnoreCase);
            // HTTPS
            if (_isHttps)
            {
                _readyLink = "https://www." + _readyLink;
            }
            // Default
            else
            {
                _readyLink = "http://www." + _readyLink;
            }
        }
    }

    #endregion

    #region Methods

    public override string ToString()
    {
        return _readyLink;
    }

    #endregion
}

Use it in this method:

public static string ReplaceUrls(string input)
{
    // null-safe: treat a null input as an empty string
    var result = Regex.Replace(input ?? string.Empty, TextLink.Pattern, match =>
    {
        var textLink = new TextLink(match.Value);
        return textLink.Valid ?
            string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", textLink, textLink.OriginalInput) :
            textLink.OriginalInput;
    });
    return result;
}

Test cases:

[TestMethod]
public void RegexUtil_TextLink_Parsing()
{
    Assert.IsTrue(new TextLink("smthing.com").Valid);
    Assert.IsTrue(new TextLink("www.smthing.com/").Valid);
    Assert.IsTrue(new TextLink("http://smthing.com").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com/").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com/publisher").Valid);

    // port
    Assert.IsTrue(new TextLink("http://www.smthing.com:80").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com:80/").Valid);
    // https
    Assert.IsTrue(new TextLink("https://smthing.com").Valid);

    Assert.IsFalse(new TextLink("").Valid);
    Assert.IsFalse(new TextLink("smthing.com.").Valid);
    Assert.IsFalse(new TextLink("smthing.com-").Valid);
}

[TestMethod]
public void RegexUtil_TextLink_ToString()
{
    // default
    Assert.AreEqual("http://www.smthing.com", new TextLink("smthing.com").ToString());
    Assert.AreEqual("http://www.smthing.com", new TextLink("http://www.smthing.com").ToString());
    Assert.AreEqual("http://www.smthing.com/", new TextLink("smthing.com/").ToString());

    Assert.AreEqual("https://www.smthing.com", new TextLink("https://www.smthing.com").ToString());
}



Answer 7:


This works for me:

str = Regex.Replace(str,
                @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)",
                "<a target='_blank' href='$1'>$1</a>");
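A self-contained way to try this pattern. The character classes use a plain `&` (the `&amp;` that appears in some copies of this pattern looks like HTML-encoding debris; that reading is an assumption). Because the final character class omits `.` and `,`, punctuation that ends a sentence stays outside the link. `Answer7Demo` is a hypothetical class name.

```csharp
using System.Text.RegularExpressions;

public static class Answer7Demo
{
    // Scheme, host labels, then an optional path whose last character cannot be "." or ",".
    private const string UrlPattern =
        @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)";

    public static string Linkify(string str)
    {
        // $1 is the outer group, which wraps the whole URL.
        return Regex.Replace(str, UrlPattern, "<a target='_blank' href='$1'>$1</a>");
    }
}
```

For example, `Answer7Demo.Linkify("Read http://example.com/docs.")` returns `Read <a target='_blank' href='http://example.com/docs'>http://example.com/docs</a>.`, with the final period left as text.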


Source: https://stackoverflow.com/questions/758135/c-sharp-code-to-linkify-urls-in-a-string
