Format a string using regex in Java

执笔经年 2020-12-14 11:16

Is there any way to format a string into a specific pattern using regex, or is StringBuilder + substring a faster approach?

For example, say a phone number --> 123

6 answers
  •  长情又很酷
    2020-12-14 11:46

    Disclaimer

    Since several answers have already addressed the greater efficiency of string builders, etc., I wanted to show you how it could be done with regex and address the benefits of using this approach.

    One REGEX Solution

    Using this matching regex (similar to Alan Moore's expression):

    (.{3})(.{3})(.{4})
    

    allows you to capture exactly 10 characters in 3 groups and then use a replacement expression that refers to those groups and adds the extra characters:

    ($1) $2-$3
    

    thus producing the result you requested. Of course, the . wildcard also matches punctuation and letters, which is a reason to use \d (written in a Java string literal as \\d) instead.
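    A minimal Java sketch of this, assuming the input should be exactly 10 digits and that anything else should be rejected rather than passed through. It uses \d instead of . and checks the whole string with Matcher.matches() before applying the replacement. The class name PhoneFormatter and the null-on-failure behavior are illustrative choices, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFormatter {
    // Three capturing groups of 3, 3 and 4 digits; \d is written as \\d in a Java string literal.
    private static final Pattern TEN_DIGITS = Pattern.compile("(\\d{3})(\\d{3})(\\d{4})");

    // Returns the formatted number, or null if the input is not exactly 10 digits
    // (the null contract is a hypothetical choice for this sketch).
    static String format(String input) {
        Matcher m = TEN_DIGITS.matcher(input);
        if (!m.matches()) {           // matches() requires the entire input to match
            return null;
        }
        return m.replaceFirst("($1) $2-$3"); // $1, $2, $3 refer to the captured groups
    }

    public static void main(String[] args) {
        System.out.println(format("1234567890")); // (123) 456-7890
        System.out.println(format("12345abcde")); // null
    }
}
```

    Compiling the Pattern once in a static field avoids recompiling the regex on every call.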

    Why REGEX?

    The potential advantage of a regex approach to something like this is that it compresses the "logic" of the string manipulation into data. Since all the logic fits into a string of characters rather than compiled code, the matching and replacement strings can be stored in a database, where an experienced user of the system can edit, update, or customize them more easily. This makes the system more complex on several levels, but gives users considerably more flexibility.

    With the other approaches (string manipulation), changing the formatting so that it produces (555)123-4567 or 555.123.4567 instead of your specified (555) 123-4567 would essentially be impossible through the user interface alone. With the regex approach, the change is as simple as replacing ($1) $2-$3 (in the database or similar store) with $1.$2.$3 or ($1)$2-$3, as appropriate.
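    To illustrate, here is a minimal Java sketch in which the matching regex stays fixed and only the replacement string varies. The map stands in for the database or similar store; its keys (spaced, compact, dotted) and the class name ReplacementSwap are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class ReplacementSwap {
    // The matching regex is fixed; only the replacement string changes.
    private static final Pattern PHONE = Pattern.compile("^(\\d{3})(\\d{3})(\\d{4})$");

    // Applies a replacement string that would normally be read from a database or config store.
    // If the input does not match, replaceFirst returns it unchanged.
    static String format(String digits, String replacement) {
        return PHONE.matcher(digits).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for stored formats: style name -> replacement string.
        Map<String, String> storedFormats = new LinkedHashMap<>();
        storedFormats.put("spaced", "($1) $2-$3");
        storedFormats.put("compact", "($1)$2-$3");
        storedFormats.put("dotted", "$1.$2.$3");

        for (Map.Entry<String, String> e : storedFormats.entrySet()) {
            System.out.println(e.getKey() + ": " + format("5551234567", e.getValue()));
        }
        // spaced: (555) 123-4567
        // compact: (555)123-4567
        // dotted: 555.123.4567
    }
}
```

    Changing the output format then means editing a stored string, not the Java code.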

    If you wanted to modify your system to accept "dirtier" input, which might include various attempts at formatting such as 555-123.4567, and reformat it into something consistent, you could write a string-manipulation algorithm capable of this and recompile the application. With a regex solution, however, no system overhaul is needed: you merely change the parsing and replacement expressions, like so (perhaps a little complex for beginners to understand right away):

    ^\D*1?\D*([2-9])\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d).*$
    ($1$2$3) $4$5$6-$7$8$9$10
    

    This gives the program a significant "upgrade" in capability, as the following reformatting results show:

    "Input"                       "Output"
    ----------------------------- --------------------------------
    "1323-456-7890 540"           "(323) 456-7890"
    "8648217634"                  "(864) 821-7634"
    "453453453322"                "(453) 453-4533"
    "@404-327-4532"               "(404) 327-4532"
    "172830923423456"             "(728) 309-2342"
    "jh345gjk26k65g3245"          "(345) 266-5324"
    "jh3g24235h2g3j5h3"           "(324) 235-2353"
    "12345678925x14"              "(234) 567-8925"
    "+1 (322)485-9321"            "(322) 485-9321"
    "804.555.1234"                "(804) 555-1234"
    "08648217634"                 (no match)
    

    As you can see, it is very "tolerant" of input "formatting": it knows that a leading 1 should be ignored and that a number starting with 0 is invalid, so such input fails to match and can be treated as an error. All of this is stored in a single string.
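    A minimal Java sketch of applying the tolerant expression, assuming a non-match should be reported (here as null) rather than silently returning the input, which is what String.replaceAll does when nothing matches. Each backslash is doubled in the Java string literal. The class name TolerantPhoneFormatter is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TolerantPhoneFormatter {
    // The "dirty input" expression from above, as a Java string literal (\D -> \\D, \d -> \\d).
    private static final Pattern DIRTY = Pattern.compile(
        "^\\D*1?\\D*([2-9])\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d).*$");
    // With 10 groups, Java reads $10 as a reference to group 10.
    private static final String REPLACEMENT = "($1$2$3) $4$5$6-$7$8$9$10";

    // Returns the normalized number, or null when the pattern does not match
    // (for example, when the first significant digit is 0).
    static String format(String input) {
        Matcher m = DIRTY.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.replaceFirst(REPLACEMENT);
    }

    public static void main(String[] args) {
        String[] inputs = {"1323-456-7890 540", "+1 (322)485-9321", "804.555.1234", "08648217634"};
        for (String in : inputs) {
            System.out.println("\"" + in + "\" -> " + format(in));
        }
    }
}
```

    Because the pattern is anchored and checked with matches(), the trailing .* is what absorbs extra characters such as " 540" at the end of the first input.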

    The question comes down to performance vs. potential for customization. String manipulation is generally faster than regex, but customizing it later requires a recompile rather than a simple edit to a string. That said, some things can't be expressed very well with regex (or not as readably as the change above), and some are not possible with regex at all.
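    For comparison, a minimal sketch of the StringBuilder + substring alternative from the question, next to a regex version that reuses a precompiled Pattern. It assumes the input has already been validated as exactly 10 digits; no benchmark results are implied, so measure with your own data if performance matters. One relevant detail: String.replaceAll compiles its regex on every call, so code that formats many strings usually keeps a static Pattern as below. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class FormatComparison {
    private static final Pattern TEN_DIGITS = Pattern.compile("(\\d{3})(\\d{3})(\\d{4})");

    // Plain string manipulation: assumes exactly 10 digits and performs no validation.
    static String withBuilder(String d) {
        return new StringBuilder(14)          // "(555) 123-4567" is 14 characters
            .append('(').append(d, 0, 3).append(") ")
            .append(d, 3, 6).append('-')
            .append(d, 6, 10)
            .toString();
    }

    // Regex version reusing a precompiled Pattern instead of calling String.replaceAll.
    static String withRegex(String d) {
        return TEN_DIGITS.matcher(d).replaceFirst("($1) $2-$3");
    }

    public static void main(String[] args) {
        String input = "5551234567";
        System.out.println(withBuilder(input)); // (555) 123-4567
        System.out.println(withRegex(input));   // (555) 123-4567
    }
}
```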

    TL;DR:

    Regex lets you express a parsing algorithm as a relatively short string, which can be stored outside the code and modified without recompiling. Simpler, more focused string-manipulation functions are more efficient and can sometimes accomplish more than regex can. The key is to understand both tools and the requirements of the application, and to use the one most appropriate for the situation.
