What is a regular expression for parsing out individual sentences?

攒了一身酷 2020-11-27 18:16

I am looking for a good .NET regular expression that I can use for parsing out individual sentences from a body of text.

It should be able to parse the following blo

6 Answers
  •  盖世英雄少女心
    2020-11-27 19:04

    I used the suggestions posted here and came up with a regex that seems to achieve what I want:

    (?<Sentence>\S.+?(?<Terminator>[.!?]|\Z))(?=\s+|\Z)
    

    I used Expresso to come up with:

    //  using System.Text.RegularExpressions;
    /// 
    ///  Regular expression built for C# on: Sun, Dec 27, 2009, 03:05:24 PM
    ///  Using Expresso Version: 3.0.3276, http://www.ultrapico.com
    ///  
    ///  A description of the regular expression:
    ///  
    ///  [Sentence]: A named capture group. [\S.+?(?<Terminator>[.!?]|\Z)]
    ///      \S.+?(?<Terminator>[.!?]|\Z)
    ///          Anything other than whitespace
    ///          Any character, one or more repetitions, as few as possible
    ///          [Terminator]: A named capture group. [[.!?]|\Z]
    ///              Select from 2 alternatives
    ///                  Any character in this class: [.!?]
    ///                  End of string or before new line at end of string
    ///  Match a suffix but exclude it from the capture. [\s+|\Z]
    ///      Select from 2 alternatives
    ///          Whitespace, one or more repetitions
    ///          End of string or before new line at end of string
    ///  
    ///
    /// 
    public static Regex regex = new Regex(
          "(?<Sentence>\\S.+?(?<Terminator>[.!?]|\\Z))(?=\\s+|\\Z)",
        RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
        );
    
    
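    // This is the replacement string. Note: Day, Month, and Year appear to be
    // left over from an Expresso template; this pattern defines no such groups,
    // so .NET would treat "${Day}" etc. as literal text.
    public static string regexReplace = 
          "$& [${Day}-${Month}-${Year}]";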
    
    
    //// Replace the matched text in the InputText using the replacement pattern
    // string result = regex.Replace(InputText,regexReplace);
    
    //// Split the InputText wherever the regex matches
    // string[] results = regex.Split(InputText);
    
    //// Capture the first Match, if any, in the InputText
    // Match m = regex.Match(InputText);
    
    //// Capture all Matches in the InputText
    // MatchCollection ms = regex.Matches(InputText);
    
    //// Test to see if there is a match in the InputText
    // bool IsMatch = regex.IsMatch(InputText);
    
    //// Get the names of all the named and numbered capture groups
    // string[] GroupNames = regex.GetGroupNames();
    
    //// Get the numbers of all the named and numbered capture groups
    // int[] GroupNumbers = regex.GetGroupNumbers();
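
For anyone wanting to try this, here is a minimal sketch of how the pattern might be used to pull sentences out of a string. It assumes the two capture groups are named Sentence and Terminator, as in the Expresso description above. The `SentenceSplitter` class and its `Split` method are hypothetical names made up for illustration, and `IgnorePatternWhitespace` is left out because the pattern contains no literal whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer).
public static class SentenceSplitter
{
    // Same pattern as above, written as a verbatim string so the
    // backslashes do not need doubling.
    private static readonly Regex SentenceRegex = new Regex(
        @"(?<Sentence>\S.+?(?<Terminator>[.!?]|\Z))(?=\s+|\Z)",
        RegexOptions.CultureInvariant | RegexOptions.Compiled);

    // Returns the text of every Sentence group found in the input.
    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match m in SentenceRegex.Matches(text))
        {
            sentences.Add(m.Groups["Sentence"].Value);
        }
        return sentences;
    }
}
```

Calling `SentenceSplitter.Split("Hello world. How are you? Fine!")` should yield three items: `Hello world.`, `How are you?` and `Fine!`. Because the lookahead requires whitespace or end of input after the terminator, `3.14 is pi.` stays in one piece, but an abbreviation such as `Dr. Smith left.` would still be cut after `Dr.`, so the pattern is only a rough splitter.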
    
