Check if a String contains encoded characters

情深已故 2020-12-18 08:46

Hello, I am looking for a way to detect whether a string has been encoded.

For example:

    String name = "Hellä world";
    String encoded = new String(n         


        
6 Answers
  • 2020-12-18 09:15
    String name = "Hellä world";
    String encoded = new String(name.getBytes("utf-8"), "iso8859-1");
    

    This code is just a character corruption bug. You take a UTF-16 string, transcode it to UTF-8, pretend it is ISO-8859-1 and transcode it back to UTF-16, resulting in incorrectly encoded characters.
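
    The round trip described above can be reproduced with a minimal sketch. The class and method names (MojibakeDemo, corrupt, repair) are made up for illustration, and the repair step only works because ISO-8859-1 maps every byte value to a character, so no information is lost along the way.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // The step from the question: encode as UTF-8, then decode those bytes as ISO-8859-1.
    static String corrupt(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo it: recover the original bytes via ISO-8859-1, then decode them as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Hellä world", written with an escape so the source file encoding does not matter.
        String name = "Hell\u00e4 world";
        String encoded = corrupt(name);
        System.out.println(name.length());                // 11
        System.out.println(encoded.length());             // 12: two chars now stand in for the one 'ä'
        System.out.println(repair(encoded).equals(name)); // true
    }
}
```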

  • 2020-12-18 09:19

    Sounds like you want to check whether a string that was decoded from bytes as Latin-1 could also have been decoded as UTF-8. That's easy, because illegal byte sequences are replaced by the replacement character \ufffd:

    String recoded = new String(encoded.getBytes("iso-8859-1"), "UTF-8");
    return recoded.indexOf('\uFFFD') == -1; // No replacement character found
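
    Wrapped in a method, the same check might look like the sketch below. The names Utf8Check and couldBeUtf8 are hypothetical. Note that a pure ASCII string also passes, since ASCII is valid UTF-8, so a true result only means the bytes are compatible with UTF-8, not that the string was definitely mis-decoded.

```java
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true when the Latin-1 bytes of the string also form valid UTF-8.
    static boolean couldBeUtf8(String encoded) {
        byte[] raw = encoded.getBytes(StandardCharsets.ISO_8859_1);
        // Malformed UTF-8 sequences are decoded to the replacement character U+FFFD.
        String recoded = new String(raw, StandardCharsets.UTF_8);
        return recoded.indexOf('\uFFFD') == -1; // no replacement character found
    }
}
```

    For the string from the question, the corrupted form "Hell\u00c3\u00a4 world" passes the check, while the original "Hell\u00e4 world" does not, because a lone 0xE4 byte followed by a space is not valid UTF-8.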
    
  • 2020-12-18 09:30

    Your question doesn't make sense. A Java String is a sequence of characters. It doesn't have an encoding until you convert it into bytes, at which point you need to specify one (although you will see a lot of code that uses the platform default, which is what, for example, String.getBytes() with no argument does).
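
    To illustrate the point, here is a small sketch showing that one and the same String produces different bytes depending on the charset chosen at conversion time. The class and helper names (BytesDemo, utf8, latin1) are invented for this example.

```java
import java.nio.charset.StandardCharsets;

class BytesDemo {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Hellä world", written with an escape so the source file encoding does not matter.
        String name = "Hell\u00e4 world";
        System.out.println(name.length());         // 11 characters, independent of any charset
        System.out.println(utf8(name).length);     // 12 bytes: 'ä' becomes 0xC3 0xA4
        System.out.println(latin1(name).length);   // 11 bytes: 'ä' becomes 0xE4
    }
}
```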

    I suggest you read this http://kunststube.net/encoding/.

  • 2020-12-18 09:33

    If I understood your question correctly, this code may help you. The function isEncoded checks whether its parameter can be encoded as ASCII or whether it contains non-ASCII characters.

    public boolean isEncoded(String text){
    
        Charset charset = Charset.forName("US-ASCII");
        String checked=new String(text.getBytes(charset),charset);
        return !checked.equals(text);
    
    }
    
    @Test
    public void testAscii() throws Exception{
        Assert.assertFalse(isEncoded("Hello world"));
    }
    
    
    @Test
    public void testNonAscii() throws Exception{
        Assert.assertTrue(isEncoded("Hellä world"));
    }
    

    You can also check against other charsets by changing the charset variable or by turning it into a parameter.
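
    A parameterized variant could be sketched as follows. The class name CharsetCheck is made up, and the method keeps the same round-trip idea: characters the charset cannot represent are replaced during getBytes, so the decoded result no longer equals the input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCheck {
    // Returns true when text contains characters that the given charset cannot represent.
    static boolean isEncoded(String text, Charset charset) {
        // Unmappable characters are replaced while encoding, so the round trip changes the text.
        String checked = new String(text.getBytes(charset), charset);
        return !checked.equals(text);
    }

    public static void main(String[] args) {
        String name = "Hell\u00e4 world"; // "Hellä world"
        System.out.println(isEncoded(name, StandardCharsets.US_ASCII));   // true
        System.out.println(isEncoded(name, StandardCharsets.ISO_8859_1)); // false
    }
}
```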

  • 2020-12-18 09:38

    You can check whether your string is encoded with this code:

    public boolean isEncoded(String input) {
        // Note: OTHER_LETTER covers letters without case (e.g. CJK ideographs).
        // 'ä' is a LOWERCASE_LETTER, so "Hellä world" returns false here.
        for (char c : input.toCharArray()) {
            if (Character.getType(c) == Character.OTHER_LETTER) {
                return true;
            }
        }
        return false;
    }
    
  • 2020-12-18 09:40

    I'm not really sure what you are trying to do or what your problem is.

    This line doesn't make any sense:

    String encoded = new String(name.getBytes("utf-8"), "iso8859-1");
    

    You are encoding your name as "UTF-8" and then trying to decode it as "iso8859-1".

    If you want to encode your name as "iso8859-1", just do name.getBytes("iso8859-1").

    Please tell us what problem you encountered so that we can help more.
