I'm processing some Java source code using Java. I'm extracting the string literals and feeding them to a function taking a String. The problem is that I need to pass the
If you are reading Unicode-escaped characters from a file, you will have a tough time, because the escape sequence is read literally, as a backslash followed by "u0020", rather than as the character it stands for:
my_file.txt
Blah blah...
Column delimiter=;
Word delimiter=\u0020 #This is just unicode for whitespace
.. more stuff
Here, when you read line 3 from the file, the string will contain:
"Word delimiter=\u0020 #This is just unicode for whitespace"
and the char[] in the string will show:
{...., '=', '\\', 'u', '0', '0', '2', '0', ' ', '#', 'T', 'h', ...}
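Apache Commons Lang's StringEscapeUtils.unescapeXml() will not unescape this for you (I tried it); it only decodes XML entities. StringEscapeUtils.unescapeJava() is the method aimed at Java-style escapes such as \u0020. Otherwise, you'll have to do it manually as described here.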
So the six-character substring "\u0020" should become the single char '\u0020' (a space).
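A minimal sketch of the manual approach, assuming only the simple form (one backslash, one 'u', four hex digits) needs handling. The class and method names are hypothetical, and unlike the Java compiler this does not treat a doubled backslash or repeated 'u' characters specially:

```java
// Hypothetical helper, not part of any library: replaces each backslash-u
// escape (backslash, 'u', four hex digits) with the character it denotes.
final class UnicodeUnescaper {

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // A full escape occupies six chars: backslash, 'u', four hex digits.
            if (c == '\\' && i + 5 < input.length() && input.charAt(i + 1) == 'u'
                    && isHex(input, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Anything else, including an incomplete escape, is copied as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every char of s in [from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The doubled backslash in this literal yields the same six
        // characters that would be read from the file.
        String raw = "Word delimiter=\\u0020 #This is just unicode for whitespace";
        int start = raw.indexOf('=') + 1;
        String value = raw.substring(start, start + 6);
        System.out.println(value.length());           // 6
        System.out.println(unescape(value).length()); // 1
    }
}
```

Sequences that are not a complete escape are left untouched, so the method is safe to run over a whole line.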
But if you are passing this "\u0020" as the delimiter, as in "... ..... ..".split(columnDelimiterReadFromFile), it will work directly. String.split() treats its argument as a regular expression, and the regex engine itself understands the \uXXXX form, so the escaped text read from the file is already a valid pattern that matches a space. (Confused?)
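To make the regex case concrete, here is a small sketch; the class and method names are made up for illustration, and the delimiter is hard-coded rather than actually read from my_file.txt:

```java
// Illustrative only: shows that the six-character escaped text can be
// handed to String.split() without unescaping it first.
final class SplitWithEscapedDelimiter {

    static String[] splitWords(String text, String delimiterFromFile) {
        // split() compiles its argument as a regular expression, and
        // java.util.regex.Pattern accepts the backslash-u hex notation.
        return text.split(delimiterFromFile);
    }

    public static void main(String[] args) {
        // The doubled backslash yields the six characters as read from the file.
        String delimiter = "\\u0020";
        String[] words = splitWords("Blah blah more stuff", delimiter);
        System.out.println(words.length);            // 4
        System.out.println(String.join("|", words)); // Blah|blah|more|stuff
    }
}
```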