How are indents processed in a text block (Java 13)?

Submitted by 冷暖自知 on 2020-03-18 09:29:29

Question


I just tried the new text block feature in Java 13 and encountered a small issue.

I have read this article from JAXenter.

The position of the closing triple quotation marks affects the indentation of the resulting string.

String query = """
            select firstName,
            lastName,
            email
            from User
            where id= ?
        """;

System.out.println("SQL or JPL like query string :\n" + query);

The format above works well. Because indentation is measured relative to the closing delimiter ("""), the resulting multiline string keeps the extra leading spaces before every line.

But when I compare the following two text blocks, they look the same in the console output, yet they are not equal, even after stripIndent().

String hello = """
    Hello,
    Java 13
    """;

String hello2 = """
    Hello,
    Java 13
""";

System.out.println("Hello1:\n" + hello);
System.out.println("Hello2:\n" + hello2);

System.out.println("hello is equals hello2:" + hello.equals(hello2));

System.out.println("hello is equals hello2 after stripIndent():" + hello.stripIndent().equals(hello2.stripIndent()));

The console output is:

hello is equals hello2:false
hello is equals hello2 after stripIndent():false

I am not sure what is wrong. Or is this the intended design of text blocks?

Update: I just printed hello2 after stripIndent():

System.out.println("hello2 after stripIndent():\n" + hello2.stripIndent());

The whitespace before every line is NOT removed by stripIndent(), contrary to what I expected.

Updated: After reading the related Javadoc, I think that once the text block is compiled, the leading indentation of its lines should already have been stripped. So what is the purpose of stripIndent() for a text block? I can see how it is useful on a normal string.

The complete code is here.


Answer 1:


There is a concept of incidental white space.

JEP 355: Text Blocks (Preview)

Compile-time processing

A text block is a constant expression of type String, just like a string literal. However, unlike a string literal, the content of a text block is processed by the Java compiler in three distinct steps:

  • Line terminators in the content are translated to LF (\u000A). The purpose of this translation is to follow the principle of least surprise when moving Java source code across platforms.

  • Incidental white space surrounding the content, introduced to match the indentation of Java source code, is removed.

  • Escape sequences in the content are interpreted. Performing interpretation as the final step means developers can write escape sequences such as \n without them being modified or deleted by earlier steps.

...

Incidental white space

Here is the HTML example using dots to visualize the spaces that the developer added for indentation:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............""";

Since the opening delimiter is generally positioned to appear on the same line as the statement or expression which consumes the text block, there is no real significance to the fact that 14 visualized spaces start each line. Including those spaces in the content would mean the text block denotes a string different from the one denoted by the concatenated string literals. This would hurt migration, and be a recurring source of surprise: it is overwhelmingly likely that the developer does not want those spaces in the string. Also, the closing delimiter is generally positioned to align with the content, which further suggests that the 14 visualized spaces are insignificant.
...
Accordingly, an appropriate interpretation for the content of a text block is to differentiate incidental white space at the start and end of each line, from essential white space. The Java compiler processes the content by removing incidental white space to yield what the developer intended.
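As a small illustration of the point about concatenated string literals, the sketch below compares the HTML text block with its concatenated equivalent. The class and method names are made up for this example, and it assumes JDK 15 or later (or JDK 13/14 with preview features enabled).

```java
class IncidentalWhitespaceDemo {

    // The 8 spaces in front of every content line and in front of the
    // closing delimiter are incidental, so the compiler removes them.
    static String textBlock() {
        return """
        <html>
            <body>
                <p>Hello, world</p>
            </body>
        </html>
        """;
    }

    // The same string written as concatenated string literals.
    static String concatenated() {
        return "<html>\n"
             + "    <body>\n"
             + "        <p>Hello, world</p>\n"
             + "    </body>\n"
             + "</html>\n";
    }

    public static void main(String[] args) {
        // prints "true": only the essential indentation survives
        System.out.println(textBlock().equals(concatenated()));
    }
}
```

Only the relative indentation of the nested tags (the essential white space) ends up in the string.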

Your assumption that

    Hello,
    Java 13
<empty line>

equals

....Hello,
....Java 13
<empty line>

is inaccurate since those are essential white spaces and they will not be removed by either the compiler or String#stripIndent.

To make it clear, let's keep representing an incidental white space as a dot.

String hello = """
....Hello,
....Java 13
....""";

String hello2 = """
    Hello,
    Java 13
""";

Let's print them.

Hello,
Java 13
<empty line>

    Hello,
    Java 13
<empty line>

Let's call String#stripIndent on both and print the results.

Hello,
Java 13
<empty line>

    Hello,
    Java 13
<empty line>

To understand why nothing has changed, we need to look into the documentation.

String#stripIndent

Returns a string whose value is this string, with incidental white space removed from the beginning and end of every line.

Then, the minimum indentation (min) is determined as follows. For each non-blank line (as defined by isBlank()), the leading white space characters are counted. The leading white space characters on the last line are also counted even if blank. The min value is the smallest of these counts.

For each non-blank line, min leading white space characters are removed, and any trailing white space characters are removed. Blank lines are replaced with the empty string.

For both Strings, the minimum indentation is 0.

Hello,          // 0
Java 13         // 0    min(0, 0, 0) = 0 
<empty line>    // 0

    Hello,      // 4
    Java 13     // 4    min(4, 4, 0) = 0
<empty line>    // 0

String#stripIndent gives developers access to a Java version of the re-indentation algorithm used by the compiler.
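The minimum-indentation rule from the String#stripIndent documentation can be sketched roughly as follows. This is a simplified illustration, not the JDK's actual implementation; the class name and the helper methods leadingWhitespace and minIndent are hypothetical, and the sketch assumes lines are separated by \n only.

```java
class MinIndentSketch {

    // Counts the white space characters at the start of a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Smallest indentation over all non-blank lines; the last line is
    // counted even when it is blank, as the documentation describes.
    static int minIndent(String s) {
        String[] lines = s.split("\n", -1); // -1 keeps a trailing empty line
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < lines.length; i++) {
            boolean isLast = (i == lines.length - 1);
            if (!lines[i].isBlank() || isLast) {
                min = Math.min(min, leadingWhitespace(lines[i]));
            }
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(minIndent("Hello,\nJava 13\n"));           // 0
        System.out.println(minIndent("    Hello,\n    Java 13\n"));   // 0
        System.out.println(minIndent("    Hello,\n    Java 13\n  ")); // 2
    }
}
```

In the second call the empty last line contributes an indentation of 0, which is why stripIndent leaves the four leading spaces of hello2 untouched.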

JEP 355

The re-indentation algorithm will be normative in The Java Language Specification. Developers will have access to it via String::stripIndent, a new instance method.

Specification for JEP 355

The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:

  1. Line terminators are normalized to the ASCII LF character (...)

  2. Incidental white space is removed, as if by execution of String::stripIndent on the characters in the content.

  3. Escape sequences are interpreted, as in a string literal.
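The three transformations can be approximated at run time with existing String methods. The sketch below is only an emulation under stated assumptions: the class name and the emulate method are hypothetical, rawContent stands for the characters between the opening delimiter's line terminator and the closing delimiter, and String::translateEscapes requires JDK 15 or later.

```java
class TextBlockStepsSketch {

    // Applies the three steps in the order given by the specification.
    static String emulate(String rawContent) {
        // 1. Normalize line terminators to LF.
        String normalized = rawContent.replace("\r\n", "\n").replace('\r', '\n');
        // 2. Remove incidental white space.
        String stripped = normalized.stripIndent();
        // 3. Interpret escape sequences last.
        return stripped.translateEscapes();
    }

    public static void main(String[] args) {
        // Closing delimiter aligned with the content: indentation removed.
        String hello = emulate("    Hello,\r\n    Java 13\r\n    ");
        // Closing delimiter at column 0: the four spaces are essential.
        String hello2 = emulate("    Hello,\n    Java 13\n");

        System.out.println(hello.equals("Hello,\nJava 13\n"));          // true
        System.out.println(hello2.equals("    Hello,\n    Java 13\n")); // true

        // A written-out \n does not influence the indentation, because
        // escapes are interpreted only after the white space is stripped.
        String escaped = emulate("    one\\ntwo\n    three\n    ");
        System.out.println(escaped.equals("one\ntwo\nthree\n"));        // true
    }
}
```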




Answer 2:


TL;DR: Your example strings are not equal, and Java is correct to tell you so.

Consider reading a description of the String.stripIndent method. Here is a paraphrase from a jaxenter.com post:

The stripIndent method removes whitespace in front of multi-line strings that all lines have in common, i.e. moves the entire text to the left without changing the formatting.

Note the words "that all lines have in common".

Now, apply "that all lines have in common" to the following literal string:

String hello2 = """
    Hello,
    First, notice that the final line of this example has zero spaces.
    Next, notice that all other lines of this example have non-zero spaces.
"""; // <--- This is a line in the text block.

The key takeaway is "0 != 4": the line holding the closing delimiter has zero leading spaces, while every other line has four.
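To see the effect of the closing delimiter in running code, here is a hedged sketch. The class and method names are invented for this example; it assumes JDK 15 or later and uses String::indent (available since Java 12) only to show how the two values relate.

```java
class ClosingDelimiterDemo {

    // Closing delimiter aligned with the content: no leading spaces remain.
    static String hello() {
        return """
            Hello,
            Java 13
            """;
    }

    // Closing delimiter four columns to the left of the content:
    // four leading spaces remain on every content line.
    static String hello2() {
        return """
            Hello,
            Java 13
        """;
    }

    public static void main(String[] args) {
        System.out.println(hello().equals(hello2()));           // false
        System.out.println(hello().indent(4).equals(hello2())); // true
        // hello2 ends with '\n', so its last (empty) line has indentation 0
        // and stripIndent() has nothing in common to remove.
        System.out.println(hello2().stripIndent().equals(hello2())); // true
    }
}
```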




Answer 3:


Testing with jshell:

String hello = """
    Hello,
    Java 13
    """;
hello.replace(" ", ".");

results in

"Hello,\nJava.13\n"

Note: no leading spaces at all (the only dot is the space inside "Java 13").

String hello2 = """
    Hello,
    Java 13
""";
hello2.replace(" ", ".");

results in

"....Hello,\n....Java.13\n"

Note that neither result has any spaces after the last \n, so the minimum indentation is 0 and stripIndent() does not strip anything.


stripIndent() does the same thing the compiler does with text blocks. For example:

String hello3 = ""
    + "    Hello\n"
    + "    Java13\n"
    + "  ";
hello3.stripIndent().replace(" ", ".");

results in

"..Hello\n..Java13\n"

That is, two spaces are removed from all 3 lines. Only two are removed because the last line has 2 spaces; the other lines have more, so at most 2 spaces are common to all lines.
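The jshell experiment can be packaged as a small program that varies only the indentation of the last line. This is a sketch: the class name and the dots helper are made up here, and String::stripIndent assumes JDK 15 or later.

```java
class StripIndentLastLineDemo {

    // Strips incidental white space, then makes remaining spaces visible.
    static String dots(String s) {
        return s.stripIndent().replace(" ", ".");
    }

    public static void main(String[] args) {
        String body = "    Hello\n    Java13\n";

        // Last line has 2 spaces: 2 spaces are removed from every line.
        System.out.println(dots(body + "  ").equals("..Hello\n..Java13\n"));   // true
        // Last line has 4 spaces: all 4 leading spaces are removed.
        System.out.println(dots(body + "    ").equals("Hello\nJava13\n"));     // true
        // Nothing after the last \n: minimum indentation is 0, nothing removed.
        System.out.println(dots(body).equals("....Hello\n....Java13\n"));      // true
    }
}
```

The third case corresponds to hello2 in the question: a closing delimiter at column 0 leaves an empty last line, so no indentation is common to all lines.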



Source: https://stackoverflow.com/questions/58030419/how-the-intents-processed-in-a-text-blockjava-13
