Is there a way to use UTF-8 with App Engine?

忘掉有多难 2020-12-10 03:06

I'm looking for an explanation of how App Engine deals with character encodings. I'm working on a client-server application where the server runs on App Engine.

4 Answers
  • 2020-12-10 03:35

    I see two things you should do.

    1) Set the system properties (if you use them) to UTF-8 in your appengine-web.xml:

    <system-properties>
        <property name="java.util.logging.config.file" value="WEB-INF/logging.properties" />
        <property name="file.encoding" value="UTF-8" />
        <property name="DEFAULT_ENCODING" value="UTF-8" />
    </system-properties>
    

    That is what I have, but the docs suggest this instead:

    <env-variables>
        <env-var name="DEFAULT_ENCODING" value="UTF-8" />
    </env-variables>
    

    https://developers.google.com/appengine/docs/java/config/appconfig

    2) Specify the encoding when you set the content type; otherwise it falls back to the default (ISO-8859-1 for servlet responses).

    The content type may include the type of character encoding used, for example, text/html; charset=ISO-8859-4.

    I'd try

    resp.setContentType("application/json; charset=UTF-8");
    

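    Also write the response through resp.getWriter(), and call setContentType (or setCharacterEncoding) before getWriter(): the writer encodes with whatever charset is in effect when you obtain it.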

    http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html#getWriter%28%29
    http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html#setContentType(java.lang.String)
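    To see what the fallback in point 2 does to non-Latin text, here is a small stdlib-only sketch (it does not use the servlet API, and the class and method names are made up for illustration). It mimics a writer that encodes with ISO-8859-1, the servlet default when no charset is given: characters that ISO-8859-1 cannot represent come back as ?, while UTF-8 round-trips them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Mimics a response writer that encodes with a given charset and a client that decodes with it.
class CharsetFallbackDemo {

    static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) replaces characters the charset cannot map
        // with the charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String japanese = "日本語";
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // ???
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true
    }
}
```

    Latin-1 text such as "café" survives either way, which is why the problem often only shows up with Japanese or other non-Latin content.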

    For what it's worth, I need UTF-8 for Japanese content and have no trouble. I'm not using a filter or setContentType at all; I'm using GWT and step 1 above, and it works.
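    If you want to confirm that step 1 actually took effect, one option is to log what the JVM ended up with, for example from a servlet's init(). This is a hypothetical diagnostic helper, not part of the answer above; whether the properties in appengine-web.xml change Charset.defaultCharset() depends on when the runtime applies them, so treat this output rather than the config file as the ground truth.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic: report the charset settings the running JVM actually uses.
class DefaultCharsetCheck {

    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // In a servlet you might log this from init() instead of printing it.
        System.out.println(report());
    }
}
```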

  • 2020-12-10 03:39

    Workaround (safe)

    None of these answers worked for me, so I wrote this class to encode UTF strings as ASCII strings: every character outside the ASCII table (and the mark character itself) is replaced by its numeric code, preceded and followed by a mark. Use AsciiEncoder.encode(yourString).

    The String can then be decoded back to UTF with AsciiEncoder.decode(yourAsciiEncodedString).

    // package <your_package>;  (put the class in your own package)
    
    /**
     * Created by Micha F. aka Peracutor.
     * 04.06.2017
     */
    
    public class AsciiEncoder {
    
        // Use whatever ASCII char you like (ideally one that rarely occurs in regular text).
        public static final char MARK = '%';
    
        /** Replaces every non-ASCII char, and MARK itself, with MARK + numeric char code + MARK. */
        public static String encode(String s) {
            // Extra capacity leaves headroom for a few escaped chars.
            StringBuilder result = new StringBuilder(s.length() + 4 * 10);
            for (char c : s.toCharArray()) {
                if (c > 127 || c == MARK) {
                    result.append(MARK).append((int) c).append(MARK);
                } else {
                    result.append(c);
                }
            }
            return result.toString();
        }
    
        /**
         * Reverses encode() in a single left-to-right pass, so decoded output
         * is never re-read as an escape sequence.
         */
        public static String decode(String s) {
            StringBuilder result = new StringBuilder(s.length());
            int i = 0;
            while (i < s.length()) {
                char c = s.charAt(i);
                int close = (c == MARK) ? s.indexOf(MARK, i + 1) : -1;
                if (close > i + 1) {
                    // MARK + digits + MARK -> the original char
                    result.append((char) Integer.parseInt(s.substring(i + 1, close)));
                    i = close + 1;
                } else {
                    // Plain ASCII char (or a stray MARK): copy it unchanged.
                    result.append(c);
                    i++;
                }
            }
            return result.toString();
        }
    }
    

    Hope this helps someone.
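    For comparison, the JDK already has a reversible ASCII-only encoding: percent-encoding via URLEncoder/URLDecoder. This is a swapped-in technique, not the scheme above, and the Charset overloads used here need Java 10 or later (older JDKs have the overload that takes the name "UTF-8" and throws UnsupportedEncodingException). The class name is made up for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Alternative sketch: JDK percent-encoding instead of a custom mark scheme.
class PercentEncodingDemo {

    static String encode(String s) {
        // Non-ASCII chars become %XX escapes of their UTF-8 bytes; '%' becomes %25
        // and a space becomes '+', so the result is pure ASCII.
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String decode(String ascii) {
        return URLDecoder.decode(ascii, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("日本 100%");
        System.out.println(encoded);          // %E6%97%A5%E6%9C%AC+100%25
        System.out.println(decode(encoded));  // the original string again
    }
}
```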

  • 2020-12-10 03:57

    Found a way to work around it. This is how I did it:

    • Used "application/json; charset=UTF-8" as the content-type. Alternatively, set the response charset to "UTF-8" (either will work fine, no need to do both).

    • Base64-encoded any input strings that arrive as UTF-8 and aren't ASCII-safe. Otherwise, apparently, they get decoded as ISO-8859-1 by the time they reach the servlet.

    • Used resp.getWriter() instead of resp.getOutputStream() to print the JSON response.

    After all those conditions were met, I was finally able to output UTF-8 back to the client.
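    The Base64 step can be sketched with java.util.Base64 (Java 8+). The client sends Base64 of the UTF-8 bytes, which is plain ASCII and therefore unaffected if the servlet decodes the request as ISO-8859-1; the server reverses it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the Base64 workaround for non-ASCII input strings.
class Base64TransportDemo {

    // Client side: UTF-8 bytes -> ASCII-only Base64 text.
    static String toWire(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Server side: Base64 text (e.g. a request parameter) -> original string.
    static String fromWire(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wire = toWire("日本語");
        System.out.println(wire);            // 5pel5pys6Kqe
        System.out.println(fromWire(wire));  // the original string again
    }
}
```

    Standard Base64 output can contain + and /, so if the value travels in a form or query parameter, either URL-encode it or use Base64.getUrlEncoder() and Base64.getUrlDecoder() instead.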

  • 2020-12-10 03:59

    This is not specific to GAE, but in case you find it useful: I made my own filter:

    In web.xml

    <filter>
        <filter-name>charsetencoding</filter-name>
        <filter-class>mypackage.CharsetEncodingFilter</filter-class>
    </filter>
        ...
    <filter-mapping>
       <filter-name>charsetencoding</filter-name>
       <url-pattern>/*</url-pattern> 
    </filter-mapping>
    

    (Place the filter-mapping fragment near the beginning of your filter mappings, and check your url-pattern.)

    And the filter class itself:

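    package mypackage;
    
    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    
    public class CharsetEncodingFilter implements Filter {
    
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
            // Decode request parameters/body as UTF-8 (must happen before any parameter is read).
            request.setCharacterEncoding("UTF-8");
            // Set the response charset before the servlet obtains its Writer;
            // setting it after chain.doFilter() is too late to have any effect.
            response.setCharacterEncoding("UTF-8");
            chain.doFilter(request, response);
        }
    
        public void destroy() { }
    
        public void init(FilterConfig filterConfig) throws ServletException { }
    }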
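    Why the request half of the filter matters, as a stdlib-only sketch: a browser submits "日" in a UTF-8 form body as the escaped bytes %E6%97%A5, and the container decodes those bytes with the request's character encoding. Here URLDecoder (Java 10+) stands in for the container's parameter parsing, and the class name is made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Stand-in for form-parameter decoding: the same bytes, two request charsets.
class RequestDecodingDemo {

    // The first character of "日本" as it appears in a UTF-8 form body (three escaped bytes).
    static final String FORM_VALUE = "%E6%97%A5";

    static String decodeAs(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Without setCharacterEncoding("UTF-8"): each byte becomes its own char.
        String wrong = decodeAs(FORM_VALUE, StandardCharsets.ISO_8859_1);
        // With it: the three bytes decode to one character.
        String right = decodeAs(FORM_VALUE, StandardCharsets.UTF_8);
        System.out.println(wrong.length() + " chars vs " + right.length() + " char"); // 3 chars vs 1 char
    }
}
```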
    