Detect the URI encoding automatically in Tomcat

小蘑菇 2020-12-03 02:36

I have an instance of Apache Tomcat 6.x running, and I want it to interpret the character set of incoming URLs a little more intelligently than the default behavior does. In partic

2 answers
  •  囚心锁ツ
    2020-12-03 02:54

    We already did something similar to Roland's solution on SGES 2.1.1 (I think it uses Catalina, the same as some older Tomcats), but it had some problems:

    1. it duplicates what the application server already does
    2. it must also take care of internal JSP requests, such as included pages with parameters ...
    3. it must parse the query string
    4. it must do it all again every time setRequest is called, but later, because of 2.
    5. it is too heavy a workaround

    Today, after reading many blogs and much advice, I deleted the whole class and did only one simple thing: I parsed the charset from the Content-Type header in the wrapper's constructor and set it on the wrapped instance.

    It works: all 988 of our tests passed.

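    import java.io.UnsupportedEncodingException;
    import java.util.Locale;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;

    // Class declaration assumed; the original post showed only the members.
    public class CisHttpRequestWrapper extends HttpServletRequestWrapper {

      // Captures the charset parameter value, with or without surrounding quotes.
      private static final Pattern CHARSET_PATTERN
          = Pattern.compile("(?i)\\bcharset=\\s*\"?([^\\s;\"]*)");
      private static final String CHARSET_DEFAULT = "ISO-8859-2";

      public CisHttpRequestWrapper(final HttpServletRequest request) {
        super(request);
        // Leave the request alone if an encoding is already set.
        if (request.getCharacterEncoding() != null) {
          return;
        }
        final String charset = parseCharset(request);
        try {
          // The wrapper delegates this call to the wrapped request.
          setCharacterEncoding(charset);
        } catch (final UnsupportedEncodingException e) {
          throw new IllegalStateException("Unknown charset: " + charset, e);
        }
      }

      private String parseCharset(final HttpServletRequest request) {
        final String contentType = request.getHeader("Content-Type");
        if (contentType == null || contentType.isEmpty()) {
          return CHARSET_DEFAULT;
        }
        final Matcher m = CHARSET_PATTERN.matcher(contentType);
        // An empty value (e.g. "charset=") is treated like a missing one.
        if (!m.find() || m.group(1).isEmpty()) {
          return CHARSET_DEFAULT;
        }
        // Locale.ROOT keeps the upper-casing independent of the server locale.
        return m.group(1).toUpperCase(Locale.ROOT);
      }
    }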
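
To see what the header parsing yields without a servlet container, the same regex can be exercised on plain strings. This is only a minimal sketch: the class name `ContentTypeCharset` and its `parse` method are hypothetical, and falling back to the default for a charset the JVM does not support is an assumption of this sketch (the wrapper above throws an IllegalStateException in that case).

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone helper; only the regex and the default come from the answer.
final class ContentTypeCharset {
    private static final Pattern CHARSET_PATTERN =
            Pattern.compile("(?i)\\bcharset=\\s*\"?([^\\s;\"]*)");
    private static final String CHARSET_DEFAULT = "ISO-8859-2";

    private ContentTypeCharset() {}

    /** Returns the charset named in a Content-Type value, or the default. */
    static String parse(final String contentType) {
        if (contentType == null || contentType.isEmpty()) {
            return CHARSET_DEFAULT;
        }
        final Matcher m = CHARSET_PATTERN.matcher(contentType);
        if (!m.find() || m.group(1).isEmpty()) {
            return CHARSET_DEFAULT;
        }
        final String name = m.group(1).toUpperCase(Locale.ROOT);
        try {
            // Assumption of this sketch: unknown names fall back to the default.
            return Charset.isSupported(name) ? name : CHARSET_DEFAULT;
        } catch (final IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return CHARSET_DEFAULT;
        }
    }

    public static void main(final String[] args) {
        System.out.println(parse("text/html; charset=utf-8"));     // UTF-8
        System.out.println(parse("text/html; charset=\"utf-8\"")); // UTF-8
        System.out.println(parse("text/plain"));                   // ISO-8859-2
        System.out.println(parse(null));                           // ISO-8859-2
    }
}
```

Note that the header is client-supplied, so whether an unsupported name should fail the request or quietly fall back is a design choice for the application.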
    
