Recover wrongly encoded character (Java)

北恋 2020-12-21 12:51

We ran some Java code from cron on Linux to persist thousands of records in the production database. The locale charmap on that box was "ANSI_X3.4-1968". Now, we took followi

1 answer
  • 2020-12-21 13:31

    Basically, no. Your biggest mistake was new String(insertSpecial.getBytes(), "UTF-8"), which once again shows that character encoding is surprisingly difficult to get right.

    What that piece of code does, step by step:

    1. Get the bytes of insertSpecial, encoded in the platform default encoding.
    2. Create a new String from those bytes, claiming that they are UTF-8 (even though they were encoded in the platform encoding just a moment earlier).
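    The two steps can be reproduced without depending on the machine's settings. In this sketch US-ASCII (the charset that the name ANSI_X3.4-1968 refers to) is passed in explicitly to stand in for the platform encoding of the cron box; the class name RoundTripDemo, the method brokenRoundTrip and the sample string are invented for illustration and are not taken from the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Hypothetical helper mirroring new String(s.getBytes(), "UTF-8").
    // The "platform" charset is a parameter so the demo does not depend on
    // the default charset of the machine it runs on.
    static String brokenRoundTrip(String s, Charset platform) {
        // Step 1: encode. getBytes(Charset) replaces every character the
        // charset cannot represent with its replacement byte, '?' for US-ASCII.
        byte[] bytes = s.getBytes(platform);
        // Step 2: decode those bytes as if they were UTF-8.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the literal intact whatever encoding javac uses.
        String insertSpecial = "Cr\u00e8me br\u00fbl\u00e9e";
        System.out.println(brokenRoundTrip(insertSpecial, StandardCharsets.US_ASCII));
        // prints: Cr?me br?l?e
    }
}
```

    Every accented character comes back as a question mark; decoding as UTF-8 in step 2 cannot restore it, because the bytes produced in step 1 no longer contain it.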

    I've seen this code several times, and unfortunately all it does is break things. It's completely unnecessary, and it wouldn't "convert" anything even if it were written correctly. If the platform encoding is not UTF-8, it will most likely destroy any special characters (or even the whole String, if the platform encoding and the one given to the String constructor differ enough).
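    A Java String already holds Unicode text, so there is nothing to convert: use the String as it is, and when bytes really are needed, name the charset on both sides. The sketch below contrasts that with what the line does on a non-UTF-8 platform. ISO-8859-1 is only an example of such a platform encoding, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // The original pattern on a machine whose platform encoding is ISO-8859-1:
    // U+00E9 becomes the single byte 0xE9, which is not valid UTF-8 on its own,
    // so the decoder substitutes U+FFFD (the replacement character).
    static String latin1BytesReadAsUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encoding and decoding with the same, explicitly named charset is lossless
    // for any text that charset can represent; UTF-8 covers all of Unicode.
    static String utf8RoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        System.out.println(latin1BytesReadAsUtf8(s).indexOf('\uFFFD') >= 0); // true
        System.out.println(utf8RoundTrip(s).equals(s));                       // true
    }
}
```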

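    With a platform encoding of ANSI_X3.4-1968 (US-ASCII) there is no byte for any non-ASCII character, so in practice getBytes() writes a question mark in its place. That question mark is only a placeholder for a character that could not be converted, meaning the original character is gone for good.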
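    Because different characters all turn into the same byte, the loss cannot be undone after the fact; the most you can do is make encoding fail loudly instead of writing question marks. The sketch below shows both, using the standard java.nio.charset.CharsetEncoder with CodingErrorAction.REPORT. The class and method names are hypothetical, and US-ASCII again stands in for the platform encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossDemo {

    // Different inputs collapse to identical bytes, so nothing in the stored
    // data says which character a given '?' used to be.
    static boolean collapseToSameBytes(String a, String b) {
        return Arrays.equals(a.getBytes(StandardCharsets.US_ASCII),
                             b.getBytes(StandardCharsets.US_ASCII));
    }

    // Strict encoding: throws CharacterCodingException on an unmappable
    // character instead of silently substituting '?'.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // U+00E9 and U+00E8 both become "caf?"
        System.out.println(collapseToSameBytes("caf\u00e9", "caf\u00e8")); // true
        try {
            encodeStrict("caf\u00e9");
        } catch (CharacterCodingException e) {
            System.out.println("refused to encode: " + e);
        }
    }
}
```

    Failing like this surfaces the problem on the first bad record rather than after the data has already been stored.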

    Here's some reading so you won't make that mistake again: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
