Regex: how to get words from a string (C#)

半阙折子戏 2020-12-02 21:40

My input consists of user-posted strings.

What I want to do is create a dictionary of words and how often each one has been used. This means I want to parse a string,

6 Answers
  •  不思量自难忘°
    2020-12-02 22:11

    Using the following

    // Runs as C# 9+ top-level statements.
    using System;
    using System.Text.RegularExpressions;
    
    var pattern = new Regex(
      @"( [^\W_\d]              # starting with a letter
                                # followed by a run of either...
          ( [^\W_\d] |          #   more letters or
            [-'\d](?=[^\W_\d])  #   ', -, or digit followed by a letter
          )*
          [^\W_\d]              # and finishing with a letter
        )",
      RegexOptions.IgnorePatternWhitespace);
    
    var input = "#@!@LOLOLOL YOU'VE BEEN *PWN3D* ! :') !!!1einszwei drei foo--bar!";
    
    foreach (Match m in pattern.Matches(input))
      Console.WriteLine("[{0}]", m.Groups[1].Value);
    

    produces output of

    [LOLOLOL]
    [YOU'VE]
    [BEEN]
    [PWN3D]
    [einszwei]
    [drei]
    [foo]
    [bar]
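The question's actual goal is a word-frequency dictionary, so here is a minimal sketch of that last step. It assumes C# 9+ top-level statements and reuses the answer's pattern, written on one line so `IgnorePatternWhitespace` is not needed. `CountWords` is a hypothetical helper name, and counting case-insensitively via `StringComparer.OrdinalIgnoreCase` is an assumption; the question does not say whether "Foo" and "foo" should be merged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The answer's pattern on one line: a letter, then letters or ', -, digit
// (each only when followed by a letter), finishing with a letter.
var wordPattern = new Regex(@"[^\W_\d]([^\W_\d]|[-'\d](?=[^\W_\d]))*[^\W_\d]");

// Hypothetical helper: maps each matched word to how many times it occurs.
// Case-insensitive comparison is an assumption; remove the comparer to
// count "FOO" and "foo" as different words.
static Dictionary<string, int> CountWords(Regex pattern, string text)
{
    var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    foreach (Match m in pattern.Matches(text))
    {
        // TryGetValue leaves n at 0 when the word has not been seen yet.
        counts.TryGetValue(m.Value, out int n);
        counts[m.Value] = n + 1;
    }
    return counts;
}

var wordCounts = CountWords(wordPattern, "foo bar FOO baz-qux foo");
foreach (var pair in wordCounts)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

With this input "foo" is counted 3 times, and "baz-qux" stays a single word because its hyphen is followed by a letter (unlike "foo--bar" in the answer's example). Because the pattern must both start and finish with a letter, single-letter words such as "I" or "a" are never matched; if those should be counted, the trailing `[^\W_\d]` can be dropped, since the lookahead already guarantees every match ends in a letter.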
