How to remove extra returns and spaces in a string by regex?

可紊 提交于 2019-12-17 19:14:34

问题


I convert a HTML code to plain text.But there are many extra returns and spaces.How to remove them?


回答1:


I'm assuming that you want to

  • find two or more consecutive spaces and replace them with a single space, and
  • find two or more consecutive newlines and replace them with a single newline.

If that's correct, then you could use

resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");

This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use

resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");

To condense a string of newlines and spaces (any number of each) into a single newline, use

resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", @"\n");



回答2:


string new_string = Regex.Replace(orig_string, @"\s", "") will remove all whitespace

string new_string = Regex.Replace(orig_string, @"\s+", " ") will just collapse multiple whitespaces into one




回答3:


I used a lot of algorithm for that. Every loop was good but this was clear and absolute.

//define what you want to remove as char

char tb = (char)9; //Tab char ascii code
spc = (char)32;    //space char ascii code
nwln = (char)10;   //New line char ascii char

yourstring.Replace(tb,"");
yourstring.Replace(spc,"");
yourstring.Replace(nwln,"");

//by defining chars, result was better.



回答4:


You can use Trim() to remove the spaces and returns. In HTML the spaces is not important so you can omit them by using the Trim() method in System.String class.



来源:https://stackoverflow.com/questions/4973524/how-to-remove-extra-returns-and-spaces-in-a-string-by-regex

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!