Removing URLs from text using Java

梦毁少年i 2020-12-15 07:59

How can I remove the URLs present in a piece of text? For example:

String str = "Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";


        
7 answers
  •  夕颜
    夕颜 (original poster)
    2020-12-15 08:42

    Note that if your URL contains characters like ? or \, the answers above will not work, because replaceAll treats its first argument as a regular expression and reads those characters as regex syntax instead of literal text. What worked for me was to strip the troublesome characters (? and & in the code below) from a copy of the string, strip them from each result of m.find() as well, and then call replaceAll on the copy.

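    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private String removeUrl(String commentstr)
    {
        // Strip ? and & from a working copy so they are not read as regex syntax.
        // Note: this also removes them from the non-URL text.
        String commentstr1 = commentstr.replaceAll("\\?", "").replaceAll("&", "");

        // '-' is escaped inside the character class so "+-=" is not parsed as a range.
        String urlPattern = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+\\-=\\\\\\.&]*)";
        Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(commentstr);
        while (m.find()) {
            // m.group() is the whole current match; m.group(i) would select capturing group i instead.
            String url = m.group().replaceAll("\\?", "").replaceAll("&", "");
            // Pattern.quote makes any remaining special characters (such as \ or .) match literally.
            // Assigning back to commentstr1 keeps earlier removals instead of discarding them.
            commentstr1 = commentstr1.replaceAll(Pattern.quote(url), "").trim();
        }
        return commentstr1;
    }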
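
    The approach above strips ? and & from the whole string, so it also changes the text outside the URLs. If that matters, one possible alternative is to let the Matcher delete the matched spans itself with Matcher.replaceAll(""), so the matched URL is never compiled as a regex a second time. The following is only a minimal sketch under my own assumptions: the class name UrlStripper, the method name stripUrls, and the simplified pattern (a scheme, "://", then any run of non-whitespace) are hypothetical and not taken from the answer.

```java
import java.util.regex.Pattern;

class UrlStripper {
    // Hypothetical, simplified pattern: http, https or ftp, then "://",
    // then everything up to the next whitespace character.
    private static final Pattern URL =
            Pattern.compile("\\b(?:https?|ftp)://\\S+", Pattern.CASE_INSENSITIVE);

    // Deletes every matched URL, then collapses the whitespace runs left behind.
    static String stripUrls(String text) {
        String withoutUrls = URL.matcher(text).replaceAll("");
        return withoutUrls.replaceAll("\\s{2,}", " ").trim();
    }

    public static void main(String[] args) {
        String str = "Fear psychosis after #AssamRiots - "
                + "http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";
        System.out.println(stripUrls(str));
        // A URL with a query string, plus ? and & in the surrounding text.
        System.out.println(stripUrls("Really? See http://example.com/a?b=1&c=2 & more"));
    }
}
```

    With the question's string this should print "Fear psychosis after #AssamRiots -", and the second call should print "Really? See & more", leaving the ? and & outside the URL untouched. Because \S+ is greedy, punctuation that directly follows a URL (a trailing comma or closing parenthesis, for example) is removed along with it, so the pattern would need tightening for text like that.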
    
