Java - Converting from Unicode to ANSI

Posted by 泄露秘密 on 2020-01-15 12:25:08

Question


I have the string \u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF. I need to convert it to Avwg wKsewš—i K_v ejwQ, which is in ANSI format. How can I convert this Unicode string to ANSI characters in Java?

Edit:

resultView.setTypeface(typeFace);
// A string literal is already a String; wrapping it in new String(...) is unnecessary.
String str = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
resultView.setText(str);

Answer 1:


I need to convert it in AvwgwKsewš—i K_v ejwQ which is in ANSI format.

That's not ANSI format. The (misleadingly named) "ANSI" code pages in Windows are all based on ASCII, with different characters added in the high bytes. Byte 0x41 (A) as a leading letter in an ANSI code page always means Latin A and never a Bengali letter.
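As a quick check of that claim, the sketch below decodes the single byte 0x41 with a few ASCII-based charsets. The class name AnsiByteDemo and the list of charset names are only illustrative; any name the JVM does not support is skipped.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class AnsiByteDemo {
    // Decodes the single byte 0x41 with each charset that this JVM supports.
    public static List<String> decode0x41(String... charsetNames) {
        byte[] data = {0x41};
        List<String> results = new ArrayList<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) {
                results.add(new String(data, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Every supported entry should print as the Latin letter A.
        System.out.println(decode0x41("US-ASCII", "ISO-8859-1", "windows-1252", "windows-1251"));
    }
}
```

Whichever of these code pages are installed, the decoded text is the Latin letter A, which is why the string in the question cannot be an ANSI encoding of Bengali.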

What I think you have is a custom symbol font, that maps arbitrary symbols to completely unrelated codepoints. Every such font has its own visual encoding; to convert between Unicode and the custom visual encoding you'd have to build up your own translation table by looking at the glyphs for each character and matching them to the Unicode character that represents the same letter.
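A translation table of that kind might be sketched as follows. The glyph assignments are assumptions inferred from the example pair in the question (for instance ক as K, থ as _, and the vowel sign ি as w, drawn before its consonant); a real font needs its complete table worked out by inspecting its glyphs, and conjuncts are not handled here. VisualFontSketch and toVisual are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class VisualFontSketch {
    // Vowel sign I (U+09BF): stored after the consonant in Unicode,
    // but drawn before it in a visual encoding.
    private static final char I_KAR = '\u09BF';

    // Partial, assumed table inferred from the question's example pair.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0986', "Av"); // letter AA
        TABLE.put('\u09AE', "g");  // letter MA
        TABLE.put('\u0995', "K");  // letter KA
        TABLE.put('\u09A5', "_");  // letter THA
        TABLE.put('\u09AC', "e");  // letter BA
        TABLE.put('\u09B2', "j");  // letter LA
        TABLE.put('\u099B', "Q");  // letter CHA
        TABLE.put('\u09BE', "v");  // vowel sign AA
        TABLE.put(I_KAR, "w");     // vowel sign I
    }

    public static String toVisual(String unicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < unicode.length(); i++) {
            char c = unicode.charAt(i);
            boolean nextIsIKar = i + 1 < unicode.length() && unicode.charAt(i + 1) == I_KAR;
            if (nextIsIKar && TABLE.containsKey(c)) {
                // Emit the vowel sign first, then the character it follows logically.
                out.append(TABLE.get(I_KAR)).append(TABLE.get(c));
                i++; // the vowel sign has been consumed
            } else {
                String mapped = TABLE.get(c);
                // Characters missing from the table pass through unchanged.
                out.append(mapped != null ? mapped : String.valueOf(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVisual("\u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF")); // K_v ejwQ
    }
}
```

Even this small sketch needs a reordering rule on top of the lookup table, which shows why such font-specific encodings are awkward to work with.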

I would strongly advise getting a proper Unicode-aware font that supports Bengali instead. Content stuck in an arbitrary font-specific encoding is difficult to deal with, because semantically you really are dealing with a string that means "AvwgwKsewš—i K_v ejwQ", with all the editing and case-changing gotchas that implies.

Visual-encoded fonts are an unhappy relic of the time before Windows had good Unicode (or even ISCII) support. They should not be used for anything today.




Answer 2:


I'm not sure exactly what you're asking, but I'll assume you're asking how to convert some characters from Unicode into an 8-bit character set (e.g. ISO-8859-1 is the character set for 'Western European' languages, like English).

I don't know of any way to automatically detect the relevant 8-bit charset, so I looked up one of your characters (on here http://unicode.org/charts/ ), and I can see that these characters are Bengali.

I think the equivalent 8-bit character set for Bengali is known as x-iscii-be. I don't have this installed on my system, so I couldn't do the conversion successfully.

EDIT: Java does not support the charset x-iscii-be, but I'll leave the remainder of this answer for illustration purposes. See http://download.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html for a list of supported Charsets.

EDIT2: Android certainly doesn't guarantee support for this charset (the only 8-bit character set it guarantees is ISO-8859-1). See: http://developer.android.com/reference/java/nio/charset/Charset.html .

So, I think you should run some Charset-detecting code on a Bengali Android device - perhaps it supports this charset. Everything you need is in my code sample.
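A minimal probe for that check could look like the sketch below. CharsetProbe and findCharsets are hypothetical names; the method lists every installed charset whose canonical name or alias contains a given fragment, so searching for "iscii" shows whether the device ships anything ISCII-related.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CharsetProbe {
    // Returns the canonical names of installed charsets whose name
    // or any alias contains the given fragment (case-insensitive).
    public static List<String> findCharsets(String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            boolean hit = cs.name().toLowerCase(Locale.ROOT).contains(needle);
            for (String alias : cs.aliases()) {
                hit |= alias.toLowerCase(Locale.ROOT).contains(needle);
            }
            if (hit) {
                matches.add(cs.name());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The results depend on the JVM or device this runs on.
        System.out.println("ISCII-like charsets: " + findCharsets("iscii"));
        System.out.println("x-iscii-be supported: " + Charset.isSupported("x-iscii-be"));
    }
}
```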

To have Java convert your data to a different charset, all you need to do is check that the desired Charset is installed, and then specify it when you convert the String into bytes.

The conversion itself would be extremely simple:

    str.getBytes("x-iscii-be");

So, you see, the String itself is held internally in one fixed form (a sequence of UTF-16 code units, whatever the platform's default charset is), and you can treat getBytes(charsetName) as a kind of 'alternative output format' for the String. Sorry - poor explanation!
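To make the 'alternative output format' idea concrete, here is a small sketch that encodes the same strings with charsets every JVM is required to support. EncodingBytesDemo is a hypothetical name. Note that a character the target charset cannot represent is silently replaced, which for ISO-8859-1 means the byte 0x3f ('?').

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {
    // Formats each byte as two lowercase hex digits.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String accented = "\u00c2"; // capital A with a circumflex
        System.out.println(toHex(accented.getBytes(StandardCharsets.ISO_8859_1))); // c2
        System.out.println(toHex(accented.getBytes(StandardCharsets.UTF_8)));      // c382
        System.out.println(toHex(accented.getBytes(StandardCharsets.UTF_16BE)));   // 00c2

        String bengali = "\u0986"; // Bengali letter AA
        System.out.println(toHex(bengali.getBytes(StandardCharsets.UTF_8)));       // e0a686
        // ISO-8859-1 has no Bengali letters, so the encoder substitutes '?'.
        System.out.println(toHex(bengali.getBytes(StandardCharsets.ISO_8859_1))); // 3f
    }
}
```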

In your situation, perhaps you just need to assign a Charset to the resultView, and the framework will work its magic for you ...

Here's some test code I put together to illustrate the point, and to check whether a given charset is supported on a system.

I have got this code to output the byte-arrays as 'hex' strings, so that you can see that the data is different after conversion.

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class UnicodeTest {
    public static void main(String[] args) {
        try {
            testWestern();
            testBengali();
        } catch (UnsupportedEncodingException e) {
            // Expected for x-iscii-be on JVMs that do not ship that charset.
            System.out.println(e.getMessage());
        }
    }

    public static void testWestern() throws UnsupportedEncodingException {
        String unicodeStr = "\u00c2"; // Capital A with a circumflex.
        String charsetName = "ISO-8859-1";
        System.out.println("Input (printed with the platform default charset): " + unicodeStr);
        attempt8bitCharsetConversion(unicodeStr, charsetName);
    }

    public static void testBengali() throws UnsupportedEncodingException {
        String unicodeStr = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
        String charsetName = "x-iscii-be";
        System.out.println(unicodeStr);
        attempt8bitCharsetConversion(unicodeStr, charsetName);
    }

    public static void attempt8bitCharsetConversion(String input, String charsetName) throws UnsupportedEncodingException {
        // Fail only when the charset is missing; the original version threw
        // unconditionally, even after a successful conversion.
        if (!Charset.isSupported(charsetName)) {
            throw new UnsupportedEncodingException(charsetName + " is not supported on this system");
        }
        System.out.println("HEXED input : " + toHex(input.getBytes(Charset.defaultCharset())));
        System.out.println("HEXED output: " + toHex(input.getBytes(Charset.forName(charsetName))));
    }

    public static String toHex(byte[] input) {
        // Format byte by byte: BigInteger would drop leading zero bytes and
        // print a minus sign whenever the first byte has its high bit set.
        StringBuilder hex = new StringBuilder();
        for (byte b : input) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}

See also here for more information on charset conversion: http://download.oracle.com/javase/tutorial/i18n/text/string.html

Character sets are a tricky business, so please forgive my convoluted answer.

HTH




Answer 3:


I've written a class that solves the rendering problem for 09CB ো, 09CC ৌ, 09C7 ে, 09C8 ৈ, 09BF ি, ্য, ্র and ৃ in UTF-8. It reshapes the text by editing the font's glyphs, so you don't need to convert it to extended ASCII. :( Unfortunately, I still couldn't solve your Bengali conjuncts. Proper rendering requires Android 3.5 or higher; it works smoothly on Android 4.0 (Ice Cream Sandwich).



Source: https://stackoverflow.com/questions/7943781/java-converting-from-unicode-to-ansi
