How can I remove non-alphabet chars from a string[]? [duplicate]

Anonymous (unverified), posted on 2019-12-03 02:26:02

Question:

This is the code:

    StringBuilder sb = new StringBuilder();
    Regex rgx = new Regex("[^a-zA-Z0-9 -]");

    var words = Regex.Split(textBox1.Text, @"(?=(?<=[^\s])\s+\w)");
    for (int i = 0; i < words.Length; i++)
    {
        words[i] = rgx.Replace(words[i], "");
    }

When I call Regex.Split(), the resulting words also include strings with extra characters in them, for example:

Daniel>

or

Hello:

or

\r\nNew

or

hello---------------------------

And I need to get only the words, without all the signs.

So I tried to use this loop, but I end up with many entries in words that are "" and some that contain only ------------------------

And I can't use these as strings later in my code.

Answer 1:

You don't need a regex to clear non-letters. This will remove every character that is not a Unicode letter:

    public string RemoveNonUnicodeLetters(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in input)
        {
            if (Char.IsLetter(c))
                sb.Append(c);
        }

        return sb.ToString();
    }

Alternatively, if you only want to allow the basic Latin letters (a-z and A-Z), you can use this:

    public string RemoveNonLatinLetters(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in input)
        {
            // The whole condition must sit inside one pair of parentheses to compile.
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
                sb.Append(c);
        }

        return sb.ToString();
    }
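Either method can be applied to the string[] from the question. The sketch below also drops entries that end up empty, which is the "" problem the question describes. The names WordCleaner, KeepLetters and CleanWords are made up for illustration. Note that, unlike the question's pattern, this keeps letters only, so digits, spaces and hyphens are removed too.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of the original answer.
public static class WordCleaner
{
    // Keeps only the characters that char.IsLetter accepts.
    public static string KeepLetters(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetter(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Cleans every entry, then drops the entries that contained no letters at all.
    public static string[] CleanWords(string[] words)
    {
        return words
            .Select(w => KeepLetters(w))
            .Where(w => w.Length > 0)
            .ToArray();
    }
}
```

Calling WordCleaner.CleanWords(new[] { "Daniel>", "Hello:", "\r\nNew", "-----", "" }) returns the three entries Daniel, Hello and New.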

Benchmark vs Regex

    using System;
    using System.Diagnostics;
    using System.Text;
    using System.Text.RegularExpressions;

    class Program
    {
        // Variant 1: plain string manipulation.
        public static string RemoveNonUnicodeLetters(string input)
        {
            StringBuilder sb = new StringBuilder();
            foreach (char c in input)
            {
                if (Char.IsLetter(c))
                    sb.Append(c);
            }

            return sb.ToString();
        }

        static readonly Regex nonUnicodeRx = new Regex("\\P{L}");

        // Variant 2: cached regex.
        public static string RemoveNonUnicodeLetters2(string input)
        {
            return nonUnicodeRx.Replace(input, "");
        }

        static void Main(string[] args)
        {
            Stopwatch sw = new Stopwatch();
            StringBuilder sb = new StringBuilder();

            // Generate GUIDs as input.
            for (int j = 0; j < 1000; j++)
            {
                sb.Append(Guid.NewGuid().ToString());
            }

            string input = sb.ToString();

            sw.Start();
            for (int i = 0; i < 1000; i++)
            {
                RemoveNonUnicodeLetters(input);
            }
            sw.Stop();
            Console.WriteLine("SM: " + sw.ElapsedMilliseconds);

            sw.Restart();
            for (int i = 0; i < 1000; i++)
            {
                RemoveNonUnicodeLetters2(input);
            }
            sw.Stop();
            Console.WriteLine("RX: " + sw.ElapsedMilliseconds);
        }
    }

Output in milliseconds for three runs (SM = String Manipulation, RX = Regex):

    SM: 581
    RX: 9882

    SM: 545
    RX: 9557

    SM: 664
    RX: 10196


Answer 2:

keyboardP's solution is worth considering. But as I've argued in the comments, regular expressions are actually the correct tool for the job; you're just making it unnecessarily complicated. The actual solution is a one-liner:

    var result = Regex.Replace(input, "\\P{L}", "");

\P{…} matches any character that is not in the named Unicode category (the opposite of \p{…}). L is the Unicode general category for letters.
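To make the difference between the two escapes concrete, here is a small sketch that uses each one with Regex.Replace. The class and method names are invented for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are for illustration only.
public static class CategoryDemo
{
    // \P{L} matches anything that is NOT a letter, so removing the matches leaves only letters.
    public static string StripNonLetters(string s)
    {
        return Regex.Replace(s, @"\P{L}", "");
    }

    // \p{L} matches any letter, so removing the matches leaves only the non-letters.
    public static string StripLetters(string s)
    {
        return Regex.Replace(s, @"\p{L}", "");
    }
}
```

For example, CategoryDemo.StripNonLetters("Hello: 42") returns "Hello", while CategoryDemo.StripLetters("Hello: 42") returns ": 42". Because L covers all Unicode letters, accented and non-Latin letters are kept by the first method as well.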

Of course it makes sense to encapsulate this in a method, as keyboardP did. To avoid constructing the regular expression over and over again, you should also consider pulling the regex creation out of the method (although this probably won't have a big impact on performance):

    static readonly Regex nonUnicodeRx = new Regex("\\P{L}");

    public static string RemoveNonUnicodeLetters(string input)
    {
        return nonUnicodeRx.Replace(input, "");
    }


Answer 3:

To help Konrad and keyboardP resolve their differences, I ran a benchmark using their code. It turns out that keyboardP's code is about 10x faster than Konrad's code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string input = "asdf234!@#*advfk234098awfdasdfq9823fna943";
                DateTime start = DateTime.Now;
                for (int i = 0; i < 100000; i++)
                {
                    RemoveNonUnicodeLetters(input);
                }
                Console.WriteLine(DateTime.Now.Subtract(start).TotalSeconds);
                start = DateTime.Now;
                for (int i = 0; i < 100000; i++)
                {
                    RemoveNonUnicodeLetters2(input);
                }
                Console.WriteLine(DateTime.Now.Subtract(start).TotalSeconds);
            }

            public static string RemoveNonUnicodeLetters(string input)
            {
                StringBuilder sb = new StringBuilder();
                foreach (char c in input)
                {
                    if (Char.IsLetter(c))
                        sb.Append(c);
                }

                return sb.ToString();
            }

            public static string RemoveNonUnicodeLetters2(string input)
            {
                var result = Regex.Replace(input, "\\P{L}", "");
                return result;
            }
        }
    }

I got the following output (in seconds):

    0.12
    1.2

UPDATE:

To see if it is the Regex compilation that is slowing down the Regex method, I put the regex in a static variable that is only constructed once.

    static Regex rex = new Regex("\\P{L}");

    public static string RemoveNonUnicodeLetters2(string input)
    {
        var result = rex.Replace(input, m => "");
        return result;
    }

But this had no effect on the runtime.
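For anyone who wants to repeat the comparison, here is a minimal Stopwatch-based sketch. It also includes a RegexOptions.Compiled variant, which was not measured anywhere in this thread and is included purely as an assumption about what else might be worth trying. The class name RegexTiming and its methods are hypothetical, and no results are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical timing harness; not taken from any of the answers.
public static class RegexTiming
{
    static readonly Regex Interpreted = new Regex(@"\P{L}");

    // Assumption: RegexOptions.Compiled was not part of the original benchmarks.
    static readonly Regex Compiled = new Regex(@"\P{L}", RegexOptions.Compiled);

    public static string ViaLoop(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetter(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static string ViaRegex(string input)
    {
        return Interpreted.Replace(input, "");
    }

    public static string ViaCompiledRegex(string input)
    {
        return Compiled.Replace(input, "");
    }

    // Runs action on input the given number of times and returns elapsed milliseconds.
    public static long Time(Func<string, string> action, string input, int iterations)
    {
        action(input); // warm-up call, so one-time setup cost is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call such as RegexTiming.Time(RegexTiming.ViaCompiledRegex, input, 100000) can then be compared against the same call with ViaLoop and ViaRegex. Absolute numbers will depend on the machine and the .NET runtime.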


