Is there a way to use UTF-8 with app engine?

情到浓时终转凉″ submitted on 2019-11-27 03:14:37


I'm looking for an explanation of how App Engine handles character encodings. I'm working on a client-server application where the server runs on App Engine.

This is a new application built from scratch, so we're using UTF-8 everywhere. The client sends some strings to the server in a POST body encoded as application/x-www-form-urlencoded. I receive them and echo them back. When the client gets them back, they're ISO-8859-1! I also see this behavior when POSTing to the blobstore, with the parameters sent as UTF-8 in a multipart/form-data body.

For the record, I'm seeing this in Wireshark. So I'm 100% sure I send UTF-8 and receive ISO-8859-1. Also, I'm not seeing mojibake: the ISO-8859-1 encoded strings are perfectly fine. This is also not an issue of misinterpreting the Content-Type. It's not the client. Something along the way is correctly recognizing I'm sending UTF-8 parameters, but is converting them to ISO-8859-1 for some reason.
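To make the Wireshark observation concrete, here is a small sketch of what "UTF-8 in, ISO-8859-1 out" looks like at the byte level. The class name `EncodingBytes` and the sample string are made up for illustration. It only shows that the same Java string produces different bytes under each charset; it says nothing about where in the stack the conversion happens.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {

    // Render the bytes of s in the given charset as space-separated hex.
    public static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file encoding does not matter
        String sample = "caf\u00e9";
        System.out.println("UTF-8:      " + hex(sample, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + hex(sample, StandardCharsets.ISO_8859_1));
    }
}
```

The last character takes two bytes in UTF-8 and a single byte in ISO-8859-1, so a correctly converted string is still readable on both sides, which matches the "no mojibake" observation.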

I'm led to believe ISO-8859-1 is the default character encoding for the GAE servlets. My question is, is there a way to tell GAE not to convert to ISO-8859-1 and instead use UTF-8 everywhere?

Let's say the servlet does something like this:

public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    String name = req.getParameter("name");
    String json = "{\"name\":\"" + name + "\"}";
    // Echo the JSON back to the client through the byte stream
    resp.getOutputStream().print(json);
}

I tried setting the character encoding of the response and request to "UTF-8", but that didn't change anything.

Thanks in advance,


I see two things you should do.

1) Set the encoding to UTF-8 in the system-properties section of your appengine-web.xml (if you are using one):

    <property name="java.util.logging.config.file" value="WEB-INF/" />
    <property name="file.encoding" value="UTF-8" />
    <property name="DEFAULT_ENCODING" value="UTF-8" />

That is what I have, but the docs suggest this instead:

    <env-var name="DEFAULT_ENCODING" value="UTF-8" />

2) Specify the encoding when you set the content type, or it will revert to the default. From the docs:

The content type may include the type of character encoding used, for example, text/html; charset=ISO-8859-4.

I'd try

resp.setContentType("application/json; charset=UTF-8");

You could also write the response through a writer, which lets you set the character encoding on it directly.
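The writer idea can be sketched without the servlet API. In the example below, a `ByteArrayOutputStream` stands in for the response body (an assumption for illustration; in a servlet the writer would normally come from `resp.getWriter()` after the charset has been set), and the class name `WriterCharsetDemo` is hypothetical. The point is only that a writer bound to an explicit charset decides how the string becomes bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterCharsetDemo {

    // Encode text through a Writer bound to an explicit charset and
    // return the raw bytes that would go out on the wire.
    public static byte[] write(String text, Charset cs) {
        ByteArrayOutputStream body = new ByteArrayOutputStream(); // stand-in for the response body
        try (Writer w = new OutputStreamWriter(body, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed here, so everything has been flushed into 'body'.
        return body.toByteArray();
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"\u65e5\u672c\"}"; // two Japanese characters in the value
        System.out.println("UTF-8 bytes:      " + write(json, StandardCharsets.UTF_8).length);
        // ISO-8859-1 cannot represent these characters; the writer substitutes a replacement byte.
        System.out.println("ISO-8859-1 bytes: " + write(json, StandardCharsets.ISO_8859_1).length);
    }
}
```

With UTF-8 the bytes decode back to the original string; with ISO-8859-1 the Japanese characters are lost, so the charset has to be chosen before anything is written.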

For what it's worth, I need UTF-8 for Japanese content and I have no trouble. I'm not using a filter or setContentType; I'm using GWT and #1 above, and it works.


Found a way to work around it. This is how I did it:

  • Used "application/json; charset=UTF-8" as the content-type. Alternatively, set the response charset to "UTF-8" (either will work fine, no need to do both).

  • Base64-encoded the input strings that aren't ASCII-safe and come as UTF-8. Otherwise they get converted to ISO-8859-1 when they get to the servlet, apparently.

  • Used resp.getWriter() instead of resp.getOutputStream() to print the JSON response.

After all those conditions were met, I was finally able to output UTF-8 back to the client.
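The Base64 step above can be sketched with `java.util.Base64` (Java 8+). The class name `Base64Transport` is hypothetical, and the choice of the URL-safe, unpadded variant is my assumption, made so the token survives a form body without extra escaping; the workaround does not say which variant was used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Transport {

    // Client side: turn arbitrary text into an ASCII-only token.
    public static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(utf8);
    }

    // Server side: recover the original text from the token.
    public static String decode(String token) {
        byte[] utf8 = Base64.getUrlDecoder().decode(token);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u65e5\u672c"; // Latin-1 and Japanese characters mixed
        String token = encode(original);
        System.out.println(token);
        System.out.println(decode(token).equals(original));
    }
}
```

Because the token contains only ASCII letters, digits, '-' and '_', it reads the same whichever charset the container uses to decode parameters. The cost is that the parameter is no longer human-readable on the wire.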


This is not specific to GAE, but in case you find it useful: I made my own filter:

In web.xml


(Place the filter-mapping fragment near the beginning of the filter mappings, and check your url-pattern.)


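import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CharsetEncodingFilter implements Filter {

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse res = (HttpServletResponse) response;
        // Set both encodings before the rest of the chain reads parameters or writes output
        req.setCharacterEncoding("UTF-8");
        res.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
    }

    public void destroy() { }

    public void init(FilterConfig filterConfig) throws ServletException { }
}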
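As a rough illustration of why the request encoding has to be fixed before any parameter is read, the sketch below decodes the same percent-encoded form value under two charsets. It uses `java.net.URLDecoder` as a stand-in for the container's parameter parsing, which is an assumption on my part: a real container has its own parser, and this may not be the exact failure the question describes. The class name `FormDecodingDemo` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Decode one percent-encoded form value using the named charset.
    public static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "café" as a UTF-8 client would send it in a form body
        String raw = "caf%C3%A9";
        System.out.println(decode(raw, "UTF-8"));      // the text the client meant
        System.out.println(decode(raw, "ISO-8859-1")); // each UTF-8 byte read as its own character
    }
}
```

Once the parameters have been parsed with the wrong charset, the original string is no longer available to the servlet, which is why the filter mapping should come before anything that calls `getParameter`.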


Workaround (safe)

None of these answers worked for me, so I wrote this class to encode Unicode strings as ASCII strings. AsciiEncoder.encode(yourString) replaces every character outside the ASCII table (and the mark character itself) with its numeric code, preceded and followed by a mark.

The string can then be decoded back with AsciiEncoder.decode(yourAsciiEncodedString).

package <your_package>;

import java.util.ArrayList;

/**
 * Created by Micha F. aka Peracutor.
 * 04.06.2017
 */
public class AsciiEncoder {

    public static final char MARK = '%'; //use whatever ASCII-char you like (should be occurring not often in regular text)

    public static String encode(String s) {
        StringBuilder result = new StringBuilder(s.length() + 4 * 10); //buffer for 10 special characters (4 additional chars for every special char that gets replaced)
        for (char c : s.toCharArray()) {
            if ((int) c > 127 || c == MARK) {
                //non-ASCII char (or the mark itself): write its code between two marks
                result.append(MARK).append((int) c).append(MARK);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static String decode(String s) {
        int lastMark = -1;
        ArrayList<Character> chars = new ArrayList<>();
        try {
            //collect every code found between two marks; the loop ends via an exception when no further pair is found
            //noinspection InfiniteLoopStatement
            while (true) {
                String charString = s.substring(lastMark = s.indexOf(MARK, lastMark + 1) + 1, lastMark = s.indexOf(MARK, lastMark));
                char c = (char) Integer.parseInt(charString);
                chars.add(c);
            }
        } catch (IndexOutOfBoundsException | NumberFormatException ignored) {}

        //replace each marked code with the character it stands for
        for (char c : chars) {
            s = s.replace("" + MARK + ((int) c) + MARK, String.valueOf(c));
        }
        return s;
    }
}

Hope this helps someone.

