Finding the number of times strings in a specific format occur in a given text

帅比萌擦擦* submitted on 2019-12-13 15:09:39

Question


I have a large string in which certain words (text followed by a single colon, like "test:") can occur more than once. For example:

word:
TEST:
word:

TEST:
TEST: // random text

"word" occurs twice and "TEST" occurs three times, but the number of occurrences varies. Also, these words don't have to appear in any particular order, and there can be more text on the same line as the word (as in the last "TEST" example). What I need to do is append the occurrence number to each word; for example, the output string needs to be:

word_ONE:
TEST_ONE:
word_TWO:

TEST_TWO:
TEST_THREE: // random text

The regex I've written to match these words is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above efficiently. Any ideas?
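As a sketch of just the counting part, assuming .NET's `System.Text.RegularExpressions.Regex`: in .NET, `^` matches only at the very start of the input unless `RegexOptions.Multiline` is passed, so the question's pattern needs that option to find a word at the start of every line. A capture group is added here to drop the colon. `WordCounter` and `CountWords` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Hypothetical helper: counts how often each "word:" appears at the start of a line.
    public static Dictionary<string, int> CountWords(string text)
    {
        // Group 1 captures the word without its trailing colon.
        // Multiline makes ^ match after every newline, not only at the start of the text.
        MatchCollection matches = Regex.Matches(
            text, @"^\b([A-Za-z0-9_]{4,})\b:", RegexOptions.Multiline);

        // Group the matches by word and count each group.
        return matches
            .Cast<Match>()
            .GroupBy(m => m.Groups[1].Value)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Run on the sample text above, `CountWords` should report `word` twice and `TEST` three times.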


Answer 1:


Regex is perfect for this job, using Replace with a match evaluator:

This example has been neither tested nor compiled:

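using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Fix
{
    public static String Execute(string largeText)
    {
        // Verbatim string so "\w" reaches the regex engine;
        // Multiline makes ^ match at the start of every line, not only the first
        return Regex.Replace(largeText, @"^(\w{4,}):", new Fix().Evaluator, RegexOptions.Multiline);
    }

    // Per-instance counters: each Execute call creates a fresh Fix, so counts start at zero
    private Dictionary<String, int> counters = new Dictionary<String, int>();
    // Extend this list as needed; counts beyond its length fall back to digits
    private static String[] numbers = {"ONE", "TWO", "THREE"};

    public String Evaluator(Match m)
    {
        String word = m.Groups[1].Value;
        int count;
        if (!counters.TryGetValue(word, out count))
            count = 0;
        count++;
        counters[word] = count;

        string suffix = count <= numbers.Length ? numbers[count - 1] : count.ToString();
        return word + "_" + suffix + ":";
    }
}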

This should return what you requested when calling:

result = Fix.Execute(largeText);
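A self-contained variant of the same match-evaluator technique, sketched under the assumption that counts past the listed number words may simply fall back to digits: here the evaluator is a lambda that closes over a local dictionary, so every call starts counting from zero without needing an instance class. `OccurrenceSuffixer`, `NumberWords` and `AppendOccurrence` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OccurrenceSuffixer
{
    // Illustrative list; extend as needed. Counts past the end fall back to digits.
    private static readonly string[] NumberWords = { "ONE", "TWO", "THREE" };

    public static string AppendOccurrence(string text)
    {
        // Local state: each call gets its own counters.
        var counters = new Dictionary<string, int>();

        return Regex.Replace(
            text,
            @"^(\w{4,}):",
            m =>
            {
                string word = m.Groups[1].Value;
                counters.TryGetValue(word, out int count); // count is 0 if the word is new
                counters[word] = ++count;
                string suffix = count <= NumberWords.Length
                    ? NumberWords[count - 1]
                    : count.ToString();
                return word + "_" + suffix + ":";
            },
            RegexOptions.Multiline); // ^ matches at the start of every line
    }
}
```

`Regex.Replace` invokes the evaluator for each match from left to right, which is what makes a running counter produce `_ONE`, `_TWO`, `_THREE` in document order. On the question's sample, the last line becomes `TEST_THREE: // random text`.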



Answer 2:


I think you can do this with Regex.Replace(string, string, MatchEvaluator) and a dictionary.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // Counts how many times each word has been seen so far
    static Dictionary<string, int> wordCount = new Dictionary<string, int>();

    static string AppendIndex(Match m)
    {
        // Group 1 is the word without the trailing colon
        string word = m.Groups[1].Value;
        if (wordCount.ContainsKey(word))
            wordCount[word] = wordCount[word] + 1;
        else
            wordCount.Add(word, 1);
        return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
    }

    static void Main()
    {
        string text = "....";
        // Multiline makes ^ match at the start of every line, not only the first
        string result = Regex.Replace(text, @"^\b([A-Za-z0-9_]{4,})\b:",
            new MatchEvaluator(AppendIndex), RegexOptions.Multiline);
        Console.WriteLine(result);
    }
}

See: http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx




Answer 3:


If I understand you correctly, regex is not necessary here.

You could split your large string on the ':' character; you may also need to read it line by line (splitting on '\n'). Then create a dictionary (IDictionary<string, int>) that counts the occurrences of each word: every time you find word x, increment its counter in the dictionary.

EDIT

  1. Read your file line by line OR split the string by '\n'
  2. Check if your delimiter is present. Either by splitting by ':' OR using regex.
  3. Get the first item from the split array OR the first match of your regex.
  4. Use a dictionary to count your occurrences.

    if (dictionary.ContainsKey(key)) dictionary[key]++;
    else dictionary.Add(key, 1);

  5. If you need words instead of numbers, create another dictionary that maps numbers to words, so that, for example, looking up 1 gives "ONE". Maybe there is another solution for that.
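The steps above can be sketched without a regex roughly as follows. This assumes, as in the question's pattern, that a word is at least four letters, digits or underscores at the very start of a line, immediately followed by a colon (`char.IsLetterOrDigit` is Unicode-aware, so it is slightly broader than `[A-Za-z0-9]`). `LineSuffixer` and `Process` are hypothetical names, and the number-to-word dictionary from step 5 only covers one to three here, falling back to digits beyond that.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LineSuffixer
{
    // Step 5: a second dictionary mapping counts to words (extend as needed).
    private static readonly Dictionary<int, string> Words = new Dictionary<int, string>
    {
        { 1, "ONE" }, { 2, "TWO" }, { 3, "THREE" }
    };

    public static string Process(string text)
    {
        var counts = new Dictionary<string, int>();
        // Step 1: split the text into lines.
        string[] lines = text.Split('\n');

        for (int i = 0; i < lines.Length; i++)
        {
            // Steps 2-3: find the first ':' and take everything before it.
            int colon = lines[i].IndexOf(':');
            if (colon < 4) continue; // no colon, or the word is shorter than four characters

            string key = lines[i].Substring(0, colon);
            if (!key.All(c => char.IsLetterOrDigit(c) || c == '_')) continue;

            // Step 4: count occurrences with a dictionary.
            if (counts.ContainsKey(key)) counts[key]++;
            else counts.Add(key, 1);

            int n = counts[key];
            string suffix = Words.TryGetValue(n, out string word) ? word : n.ToString();
            // Keep the colon and anything after it unchanged.
            lines[i] = key + "_" + suffix + lines[i].Substring(colon);
        }

        return string.Join("\n", lines);
    }
}
```

Lines whose text before the colon contains other characters, such as `some text: x`, are left unchanged, which mirrors what the question's regex would skip.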




Answer 4:


Look at this example (I know it's neither perfect nor pretty, and let's set aside the exact argument for the Split function); I think it can help:

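using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string a = "word:word:test:-1+234=567:test:test:";
        string[] tks = a.Split(':');
        Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");

        // Running count per word, so each occurrence gets its own number
        var seen = new Dictionary<string, int>();
        foreach (string x in tks.Where(t => re.IsMatch(t)))
        {
            seen.TryGetValue(x, out int count); // count is 0 for a new word
            seen[x] = ++count;
            Console.WriteLine(x + DecodeNO(count));
        }
        Console.ReadLine();
    }

    private static string DecodeNO(int n)
    {
        switch (n)
        {
            case 1:
                return "_one";
            case 2:
                return "_two";
            case 3:
                return "_three";
        }
        return ""; // counts above three get no suffix
    }
}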


Source: https://stackoverflow.com/questions/8630235/finding-the-number-of-occurences-strings-in-a-specific-format-occur-in-a-given-t
