Removing all whitespace lines from a multi-line string efficiently

名媛妹妹 2020-12-29 04:25

In C#, what's the best way to remove blank lines (i.e., lines that contain only whitespace) from a string? I'm happy to use a Regex if that's the best solution.


19 answers
  •  陌清茗 (original poster)
    2020-12-29 04:52

    In response to Will's bounty, which expects a solution that takes "test\r\n \r\nthis\r\n\r\n" and outputs "test\r\nthis", I've come up with a solution that makes use of atomic grouping (aka Nonbacktracking Subexpressions on MSDN). I recommend reading those articles for a better understanding of what's happening. Ultimately the atomic group helped match the trailing newline characters that were otherwise left behind.

    Use RegexOptions.Multiline with this pattern:

    ^\s+(?!\B)|\s*(?>[\r\n]+)$
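
To make the two alternatives easier to read, here is a minimal sketch that spells out the same pattern with `RegexOptions.IgnorePatternWhitespace`, so each part can carry an inline comment. The names `BlankLineRemover` and `Remove` are hypothetical and exist only for this illustration; the pattern and `RegexOptions.Multiline` are taken from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankLineRemover
{
    // Same pattern as in the answer. IgnorePatternWhitespace lets the pattern
    // span several lines and allows '#' comments inside it.
    private static readonly Regex BlankLines = new Regex(@"
          ^\s+(?!\B)        # whitespace run that starts at a line start and stops at a word boundary
        |                   # or
          \s*(?>[\r\n]+)$   # optional whitespace, then an atomic run of newline characters, up to a line end
        ", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the text with every match removed.
    public static string Remove(string text)
    {
        return BlankLines.Replace(text, "");
    }
}
```

Calling `BlankLineRemover.Remove("test\r\n \r\nthis\r\n\r\n")` should return `"test\r\nthis"`, the case from Will's bounty. Building the `Regex` once in a static field avoids re-parsing the pattern on every call, which may matter if the helper runs over many strings.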
    

    Here is an example with some test cases, including some I gathered from Will's comments on other posts, as well as my own.

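    using System;
    using System.Text.RegularExpressions;

    // Test inputs: blank lines in the middle, at the end, and at the start.
    string[] inputs =
    {
        "one\r\n \r\ntwo\r\n\t\r\n \r\n",
        "test\r\n \r\nthis\r\n\r\n",
        "\r\n\r\ntest!",
        "\r\ntest\r\n ! test",
        "\r\ntest \r\n ! "
    };
    // Expected result for each input, in the same order.
    string[] outputs =
    {
        "one\r\ntwo",
        "test\r\nthis",
        "test!",
        "test\r\n ! test",
        "test \r\n ! "
    };

    string pattern = @"^\s+(?!\B)|\s*(?>[\r\n]+)$";

    for (int i = 0; i < inputs.Length; i++)
    {
        // Multiline makes ^ and $ match at the start and end of each line.
        string result = Regex.Replace(inputs[i], pattern, "",
                                      RegexOptions.Multiline);
        // Prints whether the result matches the expected output.
        Console.WriteLine(result == outputs[i]);
    }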
    

    EDIT: To fix the pattern failing to clean up text that mixes whitespace and newlines, I added \s* to the second alternative of the regex. My previous pattern was redundant, and I realized \s* would handle both cases.
