Is there a way to use UTF-8 with app engine?

情到浓时终转凉″ 提交于 2019-11-27 03:14:37

问题


I'm looking for some explanation on how the app engine deals with character encodings. I'm working on a client-server application where the server is on app engine.

This is a new application built from scratch, so we're using UTF-8 everywhere. The client sends some strings to the server through POST, x-www-form-urlencoded. I receive them and echo them back. When the client gets it back, it's ISO-8859-1! I also see this behavior when POSTing to the blobstore, with the parameters sent as UTF-8, multipart/form-data encoded.

For the record, I'm seeing this in Wireshark. So I'm 100% sure I send UTF-8 and receive ISO-8859-1. Also, I'm not seeing mojibake: the ISO-8859-1 encoded strings are perfectly fine. This is also not an issue of misinterpreting the Content-Type. It's not the client. Something along the way is correctly recognizing I'm sending UTF-8 parameters, but is converting them to ISO-8859-1 for some reason.

I'm led to believe ISO-8859-1 is the default character encoding for the GAE servlets. My question is, is there a way to tell GAE not to convert to ISO-8859-1 and instead use UTF-8 everywhere?

Let's say the servlet does something like this:

public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    resp.setContentType("application/json");
    String name = req.getParameter("name");
    String json = "{\"name\":\"" + name + "\"}";
    resp.getOutputStream().print(json);
}

I tried setting the character encoding of the response and request to "UTF-8", but that didn't change anything.

Thanks in advance,


回答1:


I see two things you should do.

1) set system-properties (if you are using it) to utf8 in your appengine-web.xml

<system-properties>
    <property name="java.util.logging.config.file" value="WEB-INF/logging.properties" />
    <property name="file.encoding" value="UTF-8" />
    <property name="DEFAULT_ENCODING" value="UTF-8" />
</system-properties>

OK that above is what I have but the docs suggest this below:

<env-variables>
    <env-var name="DEFAULT_ENCODING" value="UTF-8" />
</env-variables>

https://developers.google.com/appengine/docs/java/config/appconfig

2) specify the encoding when you set the content type or it will revert to the default

The content type may include the type of character encoding used, for example, text/html; charset=ISO-8859-4.

I'd try

resp.setContentType("application/json; charset=UTF-8");

You could also try a writer which lets you set the content type to it directly.

http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html#getWriter%28%29
http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html#setContentType(java.lang.String)

For what it's worth, I need utf8 for Japanese content and I have no trouble. I'm not using a filter or setContentType anyway. I am using gwt and #1 above and it works.




回答2:


Found a way to work around it. This is how I did it:

  • Used "application/json; charset=UTF-8" as the content-type. Alternatively, set the response charset to "UTF-8" (either will work fine, no need to do both).

  • Base64-encoded the input strings that aren't ASCII-safe and come as UTF-8. Otherwise they get converted to ISO-8859-1 when they get to the servlet, apparently.

  • Used resp.getWriter() instead of resp.getOutputStream() to print the JSON response.

After all those conditions were met, I was finally able to output UTF-8 back to the client.




回答3:


This is not specific to GAE, but in case you find it useful: I made my own filter:

In web.xml

<filter>
    <filter-name>charsetencoding</filter-name>
    <filter-class>mypackage.CharsetEncodingFilter</filter-class>
</filter>
    ...
<filter-mapping>
   <filter-name>charsetencoding</filter-name>
   <url-pattern>/*</url-pattern> 
</filter-mapping>

(place the filter-mapping fragment quite at the beginning of the filter-mappings, and check your url-pattern.

And

public class CharsetEncodingFilter implements Filter {

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse res = (HttpServletResponse) response;
        req.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
        res.setCharacterEncoding("UTF-8");
    }

    public void destroy() { }

    public void init(FilterConfig filterConfig) throws ServletException { }
}



回答4:


Workaround (safe)

Nothing of these answers worked for me, so I wrote this class to encode UTF-Strings to ASCII-Strings (replacing all chars which are not in the ASCII-table with their table-number, preceded and followed by a mark), using AsciiEncoder.encode(yourString)

The String can then be decoded back to UTF with AsciiEncoder.decode(yourAsciiEncodedString).

package <your_package>;

import java.util.ArrayList;

/**
 * Created by Micha F. aka Peracutor.
 * 04.06.2017
 */

public class AsciiEncoder {

    public static final char MARK = '%'; //use whatever ASCII-char you like (should be occurring not often in regular text)

    public static String encode(String s) {
        StringBuilder result = new StringBuilder(s.length() + 4 * 10); //buffer for 10 special characters (4 additional chars for every special char that gets replaced)
        for (char c : s.toCharArray()) {
            if ((int) c > 127 || c == MARK) {
                result.append(MARK).append((int) c).append(MARK);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static String decode(String s) {
        int lastMark = -1;
        ArrayList<Character> chars = new ArrayList<>();
        try {
            //noinspection InfiniteLoopStatement
            while (true) {
                String charString = s.substring(lastMark = s.indexOf(MARK, lastMark + 1) + 1, lastMark = s.indexOf(MARK, lastMark));
                char c = (char) Integer.parseInt(charString);
                chars.add(c);
            }
        } catch (IndexOutOfBoundsException | NumberFormatException ignored) {}

        for (char c : chars) {
            s = s.replace("" + MARK + ((int) c) + MARK, String.valueOf(c));
        }
        return s;
    }
}

Hope this helps someone.



来源:https://stackoverflow.com/questions/11907764/is-there-a-way-to-use-utf-8-with-app-engine

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!