How to split string preserving whole words?

梦谈多话 2020-11-30 09:18

I need to split a long sentence into parts while preserving whole words. Each part should have at most a given number of characters (including spaces, dots, etc.). For example:

10 Answers
  • 2020-11-30 09:36

    Expanding on jon's answer above: I needed to replace g with g.ToArray(), and also change max to (max + 2), to get exact wrapping at the max'th character.

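    using System;
    using System.Linq;

    public static class ExtensionMethods
    {
        // Caveat: charCount is never reset at a line break, so the group key only
        // approximates line boundaries and a line can still exceed max.
        public static string[] Wrap(this string text, int max)
        {
            var charCount = 0;
            var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            // Each word costs its own length plus one separating space.
            return words.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
                        .Select(g => string.Join(" ", g.ToArray()))
                        .ToArray();
        }
    }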
    

    And here is sample usage as NUnit tests:

    [Test]
    public void TestWrap()
    {
        Assert.AreEqual(2, "A B C".Wrap(4).Length);
        Assert.AreEqual(1, "A B C".Wrap(5).Length);
    
        Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
        Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);
    
        Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
        Assert.AreEqual(2, "  TEST TEST TEST TEST  ".Wrap(10).Length);
        Assert.AreEqual("TEST TEST", "  TEST TEST TEST TEST  ".Wrap(10)[0]);
    }
    
  • 2020-11-30 09:36

    While CsConsoleFormat† was primarily designed to format text for the console, it supports generating plain text as well.

    var doc = new Document().AddChildren(
      new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
        TextWrap = TextWrapping.WordWrap
      }
    );
    var bounds = new Rect(0, 0, 35, Size.Infinity);
    string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);
    

    And, if you actually need trimmed strings like in your question:

    List<string> lines = text.Trim()
      .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
      .Select(s => s.Trim())
      .ToList();
    

    In addition to word wrap on spaces, you get proper handling of hyphens, zero-width spaces, no-break spaces etc.

    † CsConsoleFormat was developed by me.

  • 2020-11-30 09:38

    I knew there had to be a nice LINQ-y way of doing this, so here it is for the fun of it:

    var input = "The quick brown fox jumps over the lazy dog.";
    var charCount = 0;
    var maxLineLength = 11;
    
    var lines = input.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
        .Select(g => string.Join(" ", g));
    
    // That's all :)
    
    foreach (var line in lines) {
        Console.WriteLine(line);
    }
    

    Obviously this code works only as long as the query is not parallel, since it depends on charCount to be incremented "in word order".
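
    One more limitation of the grouping key above: charCount is never reset at a line break, so a line can come out longer than maxLineLength. For example, "a bbbbbbbb cccccc" with a limit of 10 yields "bbbbbbbb cccccc" (15 characters) as its second line. The following is a minimal sketch, not the answer author's code, of a LINQ-style variant that tracks the length of the current line instead. The names GreedyWrapSketch and WrapGreedy are hypothetical, and it assumes words are separated by spaces.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class GreedyWrapSketch
    {
        // Hypothetical helper; the name and signature are assumptions for illustration.
        public static List<string> WrapGreedy(string input, int maxLineLength)
        {
            var words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            // Fold the words into a list of lines, extending the last line while it fits.
            return words.Aggregate(new List<string>(), (lines, word) =>
            {
                int last = lines.Count - 1;
                if (last >= 0 && lines[last].Length + 1 + word.Length <= maxLineLength)
                    lines[last] += " " + word;  // word plus one separating space still fits
                else
                    lines.Add(word);            // start a new line; an over-long word gets its own line
                return lines;
            });
        }
    }
    ```

    Like the original query, this relies on the words being processed in order, so it should not be combined with AsParallel().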

  • 2020-11-30 09:38

    Split the string on spaces, then build up new strings from the resulting array, starting a new segment before the current one would exceed your limit.

    Untested pseudo-code:

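    // Requires: using System; using System.Collections.Generic;
    // sentence and myLimit are assumed to be defined by the caller.
    string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    IList<string> sentenceParts = new List<string>();
    sentenceParts.Add(string.Empty);

    int partCounter = 0;

    foreach (var word in words)
    {
      string current = sentenceParts[partCounter];

      // Start a new part when appending a space plus the word would exceed the limit.
      // The Length > 0 check avoids an empty first part when a word is longer than myLimit.
      if (current.Length > 0 && current.Length + 1 + word.Length > myLimit)
      {
         partCounter++;
         sentenceParts.Add(string.Empty);
         current = string.Empty;
      }

      // Join words with a single space, without leaving a trailing space.
      sentenceParts[partCounter] = current.Length == 0 ? word : current + " " + word;
    }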
    
  • 2020-11-30 09:43

    It seems like everyone is using some form of "Split then rebuild the sentence"...

    I thought I would take a stab at this the way my brain would logically think about doing this manually, which is:

    • Split on length
    • Go backwards to the nearest space and use that chunk
    • Remove the used chunk and start over

    The code ended up being a little more complex than I was hoping for; however, I believe it handles most (all?) edge cases, including words that are longer than maxLength, words that end exactly on maxLength, etc.

    Here's my function:

    private static List<string> SplitWordsByLength(string str, int maxLength)
    {
        List<string> chunks = new List<string>();
        while (str.Length > 0)
        {
            if (str.Length <= maxLength)                    //if remaining string is less than length, add to list and break out of loop
            {
                chunks.Add(str);
                break;
            }
    
            string chunk = str.Substring(0, maxLength);     //Get maxLength chunk from string.
    
            if (char.IsWhiteSpace(str[maxLength]))          //if next char is a space, we can use the whole chunk and remove the space for the next line
            {
                chunks.Add(chunk);
                str = str.Substring(chunk.Length + 1);      //Remove chunk plus space from original string
            }
            else
            {
                int splitIndex = chunk.LastIndexOf(' ');    //Find last space in chunk.
                if (splitIndex != -1)                       //If space exists in string,
                    chunk = chunk.Substring(0, splitIndex); //  remove chars after space.
                str = str.Substring(chunk.Length + (splitIndex == -1 ? 0 : 1));      //Remove chunk plus space (if found) from original string
                chunks.Add(chunk);                          //Add to list
            }
        }
        return chunks;
    }
    

    Test usage:

    string testString = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
    int length = 35;
    
    List<string> test = SplitWordsByLength(testString, length);
    
    foreach (string chunk in test)
    {
        Console.WriteLine(chunk);  
    }
    
    Console.ReadLine();
    
  • 2020-11-30 09:47

    At first I was thinking this might be a Regex kind of thing but here's my shot at it:

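    // Requires: using System; using System.Collections.Generic; using System.Text;
    List<string> parts = new List<string>();
    int partLength = 35;
    string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

    string[] pieces = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    StringBuilder tempString = new StringBuilder();

    foreach (var piece in pieces)
    {
        // Flush the current part if adding a space plus this piece would exceed the limit.
        if (tempString.Length > 0 && tempString.Length + 1 + piece.Length > partLength)
        {
            parts.Add(tempString.ToString());
            tempString.Clear();
        }

        // Separate words with a single space, but never start a part with one.
        if (tempString.Length > 0)
            tempString.Append(' ');
        tempString.Append(piece);
    }

    // Add the final part, which the loop itself never flushes.
    if (tempString.Length > 0)
        parts.Add(tempString.ToString());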
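
    For comparison, the regex idea mentioned at the top of this answer could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested drop-in: the names RegexWrapSketch and WrapWithRegex are hypothetical, whitespace runs are collapsed to single spaces first, and a word longer than maxLength is rejected because this pattern cannot break inside a word.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    public static class RegexWrapSketch
    {
        // Hypothetical helper; the name and signature are assumptions for illustration.
        public static List<string> WrapWithRegex(string text, int maxLength)
        {
            if (maxLength < 1)
                throw new ArgumentOutOfRangeException(nameof(maxLength));

            // Passing a null separator array splits on any whitespace.
            string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Assumption: every word fits on a line. Without this guard the pattern
            // would silently skip the leading characters of an over-long word.
            if (words.Any(w => w.Length > maxLength))
                throw new ArgumentException("A word is longer than maxLength.", nameof(text));

            // Normalize to single spaces so each match starts and ends on a word.
            string normalized = string.Join(" ", words);

            // Greedily take up to maxLength characters that are followed by a space
            // or the end of the input; backtracking lands on a word boundary.
            string pattern = @"(.{1," + maxLength + @"})(?: |$)";

            return Regex.Matches(normalized, pattern)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToList();
        }
    }
    ```

    Calling RegexWrapSketch.WrapWithRegex(sentence, partLength) with the values above should give three parts: "Silver badges are awarded for", "longer term goals. Silver badges" and "are uncommon.".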
    