I have an instance of Apache Tomcat 6.x running, and I want it to interpret the character set of incoming URLs a little more intelligently than the default behavior. In partic
We already did something similar to Roland's solution on SGES2.1.1 (I think it uses Catalina, the same as some older Tomcat versions), but it had some problems:
Today, after reading many blogs and a lot of advice, I deleted the whole class and did just one simple thing: I parse the charset from the Content-Type header in the wrapper's constructor and set it on the wrapped instance.
It works; all 988 of our tests succeeded.
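    import java.io.UnsupportedEncodingException;
    import java.util.Locale;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;

    // Wrapper class; the superclass is assumed from the super(request) call.
    public class CisHttpRequestWrapper extends HttpServletRequestWrapper {

        // Captures the value of the charset parameter, with or without quotes.
        private static final Pattern CHARSET_PATTERN
                = Pattern.compile("(?i)\\bcharset=\\s*\"?([^\\s;\"]*)");
        // Fallback used when the request does not declare a charset.
        private static final String CHARSET_DEFAULT = "ISO-8859-2";

        public CisHttpRequestWrapper(final HttpServletRequest request) {
            super(request);
            // Keep an encoding that has already been determined for the request.
            if (request.getCharacterEncoding() != null) {
                return;
            }
            final String charset = parseCharset(request);
            try {
                // Delegates to the wrapped request.
                setCharacterEncoding(charset);
            } catch (final UnsupportedEncodingException e) {
                throw new IllegalStateException("Unknown charset: " + charset, e);
            }
        }

        private String parseCharset(final HttpServletRequest request) {
            final String contentType = request.getHeader("Content-Type");
            if (contentType == null || contentType.isEmpty()) {
                return CHARSET_DEFAULT;
            }
            final Matcher m = CHARSET_PATTERN.matcher(contentType);
            if (!m.find()) {
                return CHARSET_DEFAULT;
            }
            // Locale.ROOT keeps the upper-casing independent of the server's default locale.
            return m.group(1).trim().toUpperCase(Locale.ROOT);
        }
    }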
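
If you want to check the header parsing without a servlet container, something like the following standalone sketch might help. It reuses the same pattern and default as above; the class name `CharsetParsingDemo`, the `isSupported` helper, and the extra fallback for empty or unsupported charset names are my own assumptions and are not part of the wrapper shown in the post.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone class, used only to exercise the parsing logic.
public class CharsetParsingDemo {

    // Same pattern and default as in the wrapper.
    private static final Pattern CHARSET_PATTERN =
            Pattern.compile("(?i)\\bcharset=\\s*\"?([^\\s;\"]*)");
    private static final String CHARSET_DEFAULT = "ISO-8859-2";

    // Takes the raw Content-Type header value instead of a request object.
    public static String parseCharset(final String contentType) {
        if (contentType == null || contentType.isEmpty()) {
            return CHARSET_DEFAULT;
        }
        final Matcher m = CHARSET_PATTERN.matcher(contentType);
        if (!m.find()) {
            return CHARSET_DEFAULT;
        }
        final String name = m.group(1).trim().toUpperCase(Locale.ROOT);
        // Assumption: fall back to the default instead of failing later
        // when the declared charset is empty or unknown to this JVM.
        if (name.isEmpty() || !isSupported(name)) {
            return CHARSET_DEFAULT;
        }
        return name;
    }

    // Charset.isSupported throws for syntactically illegal names,
    // so treat those as unsupported as well.
    public static boolean isSupported(final String name) {
        try {
            return Charset.isSupported(name);
        } catch (final IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println(parseCharset("text/html; charset=utf-8"));
        System.out.println(parseCharset("text/plain; charset=\"UTF-8\""));
        System.out.println(parseCharset("application/x-www-form-urlencoded"));
    }
}
```

Whether silently falling back is better than throwing, as the wrapper does, depends on how strict you want to be with clients that send a bogus charset.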