Implement a function to check if a string/byte array follows the UTF-8 format

遥遥无期 2020-12-16 00:20

I am trying to solve this interview question.

I was given a clear definition of the UTF-8 format, e.g. 1 byte: 0b0xxxxxxx, 2 bytes: .... Asked to wri

5 answers
  •  执念已碎
    2020-12-16 00:57

    The CharsetDecoder might be what you are looking for:

    @Test
    public void testUTF8() throws CharacterCodingException {
        // the desired charset (StandardCharsets avoids the string lookup)
        final Charset UTF8 = StandardCharsets.UTF_8;
        // prepare decoder
        final CharsetDecoder decoder = UTF8.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);
        decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    
        byte[] bytes = new byte[48];
        new Random().nextBytes(bytes);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        try {
            decoder.decode(buffer);
            fail("Should not be UTF-8");
        } catch (final CharacterCodingException e) {
            // expected: decoding random bytes should fail here (they are almost certainly not valid UTF-8)
        }
    
        final String string = "hallo welt!";
        bytes = string.getBytes(UTF8);
        buffer = ByteBuffer.wrap(bytes);
        final String result = decoder.decode(buffer).toString();
        assertEquals(string, result);
    }
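
    Random bytes make the negative case above slightly non-deterministic. A minimal sketch of the same idea with fixed malformed inputs follows; the class name DecoderDemo and the helper decodes are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: feeds fixed byte sequences to a strict UTF-8 decoder.
final class DecoderDemo {

    // Returns true if the strict decoder accepts the whole array.
    static boolean decodes(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ascii = "hallo welt!".getBytes(StandardCharsets.UTF_8);
        byte[] loneContinuation = {(byte) 0x80};           // 10xxxxxx without a lead byte
        byte[] truncated = {(byte) 0xE2, (byte) 0x82};     // 3-byte lead, one byte missing
        byte[] invalidLead = {(byte) 0xFF};                // never a legal UTF-8 byte

        System.out.println("ascii: " + decodes(ascii));
        System.out.println("lone continuation: " + decodes(loneContinuation));
        System.out.println("truncated: " + decodes(truncated));
        System.out.println("invalid lead: " + decodes(invalidLead));
    }
}
```

    With REPORT as the error action, the decoder throws instead of silently substituting a replacement character, which is what turns decoding into a validity check.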
    

    So your function might look like this:

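    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;

    // Returns true if the bytes decode without errors in the named charset, e.g. "UTF-8".
    public static boolean checkEncoding(final byte[] bytes, final String encoding) {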
        final CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);
        decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
        final ByteBuffer buffer = ByteBuffer.wrap(bytes);
    
        try {
            decoder.decode(buffer);
            return true;
        } catch (final CharacterCodingException e) {
            return false;
        }
    }
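
    If the interviewer expects the bit patterns to be checked by hand rather than delegated to CharsetDecoder, the check could be sketched roughly as below. The question text is cut off, so this assumes the standard UTF-8 layout (0xxxxxxx, 110xxxxx 10xxxxxx, 1110xxxx plus two continuation bytes, 11110xxx plus three) and also assumes overlong forms, surrogates and code points above U+10FFFF should be rejected. Utf8Validator and isValidUtf8 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written validator; a sketch under the assumptions stated above.
final class Utf8Validator {

    private Utf8Validator() {}

    // Returns true if the whole array is a well-formed UTF-8 byte sequence.
    static boolean isValidUtf8(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            int lead = bytes[i] & 0xFF;   // treat the byte as unsigned
            int extra;                    // continuation bytes expected after the lead byte
            int min;                      // smallest code point allowed for this length
            int cp;                       // code point being assembled

            if (lead < 0x80) {                    // 0xxxxxxx: single-byte (ASCII)
                i++;
                continue;
            } else if ((lead & 0xE0) == 0xC0) {   // 110xxxxx
                extra = 1; min = 0x80; cp = lead & 0x1F;
            } else if ((lead & 0xF0) == 0xE0) {   // 1110xxxx
                extra = 2; min = 0x800; cp = lead & 0x0F;
            } else if ((lead & 0xF8) == 0xF0) {   // 11110xxx
                extra = 3; min = 0x10000; cp = lead & 0x07;
            } else {
                return false;                     // stray continuation byte or illegal lead byte
            }

            if (i + extra >= bytes.length) {
                return false;                     // sequence is cut off at the end of the input
            }
            for (int k = 1; k <= extra; k++) {
                int b = bytes[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    return false;                 // continuation bytes must look like 10xxxxxx
                }
                cp = (cp << 6) | (b & 0x3F);
            }

            if (cp < min) return false;                       // overlong encoding
            if (cp > 0x10FFFF) return false;                  // beyond the Unicode range
            if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate

            i += extra + 1;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ok = "hallo welt!".getBytes(StandardCharsets.UTF_8);
        byte[] overlong = {(byte) 0xC0, (byte) 0x80};   // two-byte encoding of U+0000
        System.out.println(isValidUtf8(ok));
        System.out.println(isValidUtf8(overlong));
    }
}
```

    Checking only the leading bit patterns is not enough: the minimum-value comparison is what rejects overlong forms such as C0 80, which match the 110xxxxx 10xxxxxx shape but are not legal UTF-8.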
    
