Is there any way I can format a string into a specific pattern using regex or is stringbuilder + substring a faster approach?
For example, say a phone number --> 123
The same technique works in Java; you just have to adjust it to Java syntax and API:
s = s.replaceFirst("(\\d{3})(\\d{3})(\\d{4})", "($1) $2-$3");
I don't understand why you're asking about the faster approach, though. Have you tried something like this and experienced performance problems? You can almost certainly do this more efficiently with a StringBuilder, but in practical terms it's almost certainly not worth the effort.
Or were you talking about the time it would take to learn how to accomplish this with a regex relative to hand-coding it with a StringBuilder? That's kind of a moot point now, though. :D
A regular expression matcher with groups is really nothing else but a number of String containers, plus a lot of RE matching code. (You can actually look at the source code and see for yourself.) No way is this cheaper than just using substring() yourself, especially with a fixed offset as in your case.
I would use a combination of Java's String.format() method and String.substring().
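As a rough sketch of that combination (the class and method names here are made up for illustration, and the input is assumed to be exactly 10 digits, already validated):

```java
class FormatWithSubstring {
    // Hypothetical helper: assumes `digits` is exactly 10 characters long.
    static String formatPhone(String digits) {
        // Slice the fixed offsets with substring(), then let
        // String.format() place the punctuation around them.
        return String.format("(%s) %s-%s",
                digits.substring(0, 3),
                digits.substring(3, 6),
                digits.substring(6));
    }

    public static void main(String[] args) {
        System.out.println(formatPhone("1234567890")); // (123) 456-7890
    }
}
```

The format string keeps the target layout visible in one place, which is the main thing this buys over plain concatenation.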
One reaches for a regex when the same thing cannot be done with substring, or is more difficult to do that way. In your case it is better to just use StringBuilder and insert().
Assuming phone number length validation is already in place (= 10 chars):
String phoneNumber = "1234567890";
StringBuilder sb = new StringBuilder(phoneNumber)
.insert(0,"(")
.insert(4,")")
.insert(8,"-");
String output = sb.toString();
System.out.println(output);
Output
(123)456-7890
Since several answers have already addressed the greater efficiency of string builders, etc., I wanted to show you how it could be done with regex and address the benefits of using this approach.
Using this matching regex (similar to Alan Moore's expression):
(.{3})(.{3})(.{4})
allows you to match precisely 10 characters into 3 groups, then use a replace expression that references those groups, with additional characters added:
($1) $2-$3
thus producing the replacement like you requested. Of course, it will also match punctuation and letters, which is a reason to use \d (encoded into a Java string as \\d) rather than the . wildcard character.
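To make the difference between the two patterns concrete, here is a small sketch (the class and method names are hypothetical) that runs both against a 10-character string that is not a phone number:

```java
class DigitVsDot {
    // Wildcard version: any 10 characters are accepted.
    static String withDot(String s) {
        return s.replaceFirst("(.{3})(.{3})(.{4})", "($1) $2-$3");
    }

    // Digit version: only 10 consecutive digits are accepted.
    static String withDigits(String s) {
        return s.replaceFirst("(\\d{3})(\\d{3})(\\d{4})", "($1) $2-$3");
    }

    public static void main(String[] args) {
        System.out.println(withDot("1234567890"));    // (123) 456-7890
        System.out.println(withDot("abc-def-gh"));    // (abc) -de-f-gh
        System.out.println(withDigits("abc-def-gh")); // abc-def-gh (no match, unchanged)
    }
}
```

Note that replaceFirst() returns the input unchanged when nothing matches, so a failed match is silent unless you check for it separately.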
The potential advantage of a regex approach to something like this is the compression of "logic" to the string manipulation. Since all the "logic" can be compressed into a string of characters, rather than pre-compiled code, the regex matching and replacement strings can be stored in a database for easier manipulation, updating, or customization by an experienced user of the system. This makes the situation more complex on several levels, but allows considerably more flexibility for users.
With the other approaches (string manipulation), changing a formatting algorithm so that it will produce (555)123-4567 or 555.123.4567 instead of your specified (555) 123-4567 would essentially not be possible merely through the user interface. With the regex approach, the modification would be as simple as changing ($1) $2-$3 (in the database or similar store) into $1.$2.$3 or ($1)$2-$3 as appropriate.
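A minimal sketch of that idea follows. The ConfigurableFormatter class is hypothetical, and the hard-coded strings in main() stand in for values that would be loaded from a database or configuration store:

```java
import java.util.regex.Pattern;

class ConfigurableFormatter {
    private final Pattern pattern;
    private final String replacement;

    // Both arguments are plain strings, so they can come from any external store.
    ConfigurableFormatter(String regex, String replacement) {
        this.pattern = Pattern.compile(regex); // compile once, reuse per call
        this.replacement = replacement;
    }

    String format(String input) {
        return pattern.matcher(input).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String regex = "(\\d{3})(\\d{3})(\\d{4})";
        // Only the replacement string changes between these three formatters.
        System.out.println(new ConfigurableFormatter(regex, "($1) $2-$3").format("5551234567")); // (555) 123-4567
        System.out.println(new ConfigurableFormatter(regex, "$1.$2.$3").format("5551234567"));   // 555.123.4567
        System.out.println(new ConfigurableFormatter(regex, "($1)$2-$3").format("5551234567"));  // (555)123-4567
    }
}
```

If the strings really are user-editable, Pattern.compile() can throw PatternSyntaxException on a malformed expression, so a real system would want to validate them when they are saved.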
If you wanted to modify your system to accept "dirtier" input, which might include various attempts at formatting, such as 555-123.4567, and reformat them to something consistent, it would be possible to make a string-manipulation algorithm that would be capable of this and recompile the application to work how you would like. With a regex solution, however, a system overhaul would not be necessary; merely change the parsing and replacement expressions like so (maybe a little complex for beginners to understand right away):
^\D*1?\D*([2-9])\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d).*$
($1$2$3) $4$5$6-$7$8$9$10
This would allow a significant "upgrade" in the program's ability, as shown in the following reformatting:
"Input" "Output"
----------------------------- --------------------------------
"1323-456-7890 540" "(323) 456-7890"
"8648217634" "(864) 821-7634"
"453453453322" "(453) 453-4533"
"@404-327-4532" "(404) 327-4532"
"172830923423456" "(728) 309-2342"
"jh345gjk26k65g3245" "(345) 266-5324"
"jh3g24235h2g3j5h3" "(324) 235-2353"
"12345678925x14" "(234) 567-8925"
"+1 (322)485-9321" "(322) 485-9321"
"804.555.1234" "(804) 555-1234"
"08648217634" <no match or reformatting>
As you can see, it is very "tolerant" of input "formatting" and knows that 1 should be ignored at the beginning of the number and that 0 should cause an error because it is invalid - all stored in a single string.
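For reference, this is roughly how those two expressions could be wired up in Java. The class name and the choice of Optional to signal "no match" are assumptions made for this sketch; the pattern and replacement are the ones above, with backslashes doubled for a Java string literal:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TolerantPhoneFormatter {
    // Skips non-digits and an optional leading 1, then captures ten digits,
    // the first of which must be 2-9.
    private static final Pattern DIRTY = Pattern.compile(
            "^\\D*1?\\D*([2-9])\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)"
            + "\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d).*$");
    private static final String REPLACEMENT = "($1$2$3) $4$5$6-$7$8$9$10";

    // Returns the reformatted number, or empty if the input does not match.
    static Optional<String> format(String input) {
        Matcher m = DIRTY.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.replaceFirst(REPLACEMENT));
    }

    public static void main(String[] args) {
        System.out.println(format("+1 (322)485-9321")); // Optional[(322) 485-9321]
        System.out.println(format("804.555.1234"));     // Optional[(804) 555-1234]
        System.out.println(format("08648217634"));      // Optional.empty
    }
}
```

Checking matches() first matters here because replaceFirst() on its own would hand back the original string for invalid input such as the leading-0 case.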
The question comes down to performance vs. potential to customize. String manipulation is faster than regex, but future enhancement or customization requires a recompile rather than a simple alteration of a string. That said, there are things that regex can't express very well (or at least not as readably as the above change), and some things that are not possible with regex at all.
Regex allows storage of parsing algorithms into a relatively short string, which can be easily stored so as to be modifiable without recompiling. Simpler, more focused string manipulation functions are more efficient and can sometimes accomplish more than regex can. The key is to understand both tools and the requirements of the application and use the one most appropriate for the situation.
StringBuilder with substring will be faster, but not always the simplest/best approach. In this case I would just use substring.
String num = "1234567890";
String formatted = "(" + num.substring(0,3) + ") "
+ num.substring(3,6) + "-" + num.substring(6);