Converting UTF-8 to ISO-8859-1 in Java

挽巷 2020-12-08 11:45

I am reading an XML document (UTF-8) and ultimately displaying the content on a Web page using ISO-8859-1. As expected, a few characters are not displayed correctly.
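
One likely cause, assuming the affected characters are ones that ISO-8859-1 cannot represent (curly quotes, for example), is that a plain charset conversion silently replaces them. The sketch below only illustrates that behaviour; the class name `Latin1LossDemo` and the sample strings are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class Latin1LossDemo {
    // Encodes the text as ISO-8859-1 bytes and decodes it again,
    // the way a naive UTF-8 to ISO-8859-1 conversion would.
    public static String roundTrip(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) exists in ISO-8859-1 and survives.
        System.out.println(roundTrip("caf\u00e9"));
        // U+201C and U+201D (curly quotes) do not exist there;
        // String.getBytes substitutes '?' for each of them.
        System.out.println(roundTrip("\u201cquoted\u201d"));
    }
}
```

Characters up to U+00FF survive the round trip; anything above that is lost unless it is escaped before encoding.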

4 answers
  •  没有蜡笔的小新
    2020-12-08 12:27

    With Java 8, McDowell's answer can be simplified like this (while preserving correct handling of surrogate pairs):

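    import java.io.IOException;
    import java.util.PrimitiveIterator;

    public final class HtmlEncoder {
        private HtmlEncoder() {
        }

        // Copies BASIC_LATIN (ASCII) characters through unchanged and writes every
        // other code point as a hexadecimal character reference such as &#x201c;.
        public static <T extends Appendable> T escapeNonLatin(CharSequence sequence,
                                                              T out) throws IOException {
            // codePoints() yields whole code points, so a surrogate pair is
            // escaped as one character reference rather than two.
            for (PrimitiveIterator.OfInt iterator = sequence.codePoints().iterator(); iterator.hasNext(); ) {
                int codePoint = iterator.nextInt();
                if (Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.BASIC_LATIN) {
                    out.append((char) codePoint);
                } else {
                    out.append("&#x");
                    out.append(Integer.toHexString(codePoint));
                    out.append(";");
                }
            }
            return out;
        }
    }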
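
A minimal usage sketch of the approach above. To compile on its own it repeats the `escapeNonLatin` method, with the generic parameter written as `<T extends Appendable>`; that bound is an assumption inferred from the calls to `append` and the `throws` clause. The class name `HtmlEncoderDemo` and the `escape` wrapper are hypothetical helpers, not part of the answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.PrimitiveIterator;

public class HtmlEncoderDemo {
    // Same logic as the answer's method, repeated so this example is self-contained.
    public static <T extends Appendable> T escapeNonLatin(CharSequence sequence, T out)
            throws IOException {
        for (PrimitiveIterator.OfInt it = sequence.codePoints().iterator(); it.hasNext(); ) {
            int codePoint = it.nextInt();
            if (Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.BASIC_LATIN) {
                out.append((char) codePoint);
            } else {
                out.append("&#x").append(Integer.toHexString(codePoint)).append(";");
            }
        }
        return out;
    }

    // Hypothetical convenience wrapper that returns the escaped text as a String.
    public static String escape(String text) {
        try {
            return escapeNonLatin(text, new StringBuilder()).toString();
        } catch (IOException e) {
            // StringBuilder does not throw IOException; wrap it to keep the signature simple.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9"));          // caf&#xe9;
        System.out.println(escape("\u201chi\u201d"));     // &#x201c;hi&#x201d;
        System.out.println(escape("\uD83D\uDE00"));       // &#x1f600; (one surrogate pair)
    }
}
```

The output contains only ASCII, so it displays the same whether the page is served as ISO-8859-1 or UTF-8. Note that ASCII characters such as `<` and `&` pass through unchanged, so this does not replace ordinary HTML escaping.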
    
