Java web application i18n

南方客 2021-02-01 10:16

I've been given the (rather daunting) task of introducing i18n to a J2EE web application using the 2.3 servlet specification. The application is very large and has been in acti

3 answers
  •  暖寄归人
    2021-02-01 11:16

    There are a lot of things to take care of when internationalizing an application:

    Locale detection

    The very first thing you need to think about is detecting the end user's Locale. Depending on what you want to support, this might be easy or a bit complicated.

    1. As you surely know, web browsers tend to send the end user's preferred language via the HTTP Accept-Language header. Accessing this information in a Servlet can be as simple as calling request.getLocale(). If you are not planning to support any fancy Locale detection workflow, you can just stick to this method.
    2. If you have User Profiles in your application, you might want to add Preferred Language and Preferred Formatting Locale to them. In that case you would need to switch the Locale after the user logs in.
    3. You might want to support URL-based language switching (for example: http://deutsch.example.com/ or http://example.com?lang=de). You would need to set a valid Locale based on the URL information - this could be done in various ways (e.g. a URL filter).
    4. You might want to support explicit language switching (selecting the language from a drop-down menu or similar), however I would not recommend it (unless it is combined with point 3).

    The JSTL approach could be sufficient if you only want to support the first method, or if you are not planning to add any dependencies (like Spring Framework).

    While we are at Spring Framework: it has quite a few nice features that you can use both to detect the Locale (like CookieLocaleResolver, AcceptHeaderLocaleResolver, SessionLocaleResolver and LocaleChangeInterceptor) and to externalize strings and format messages (see the spring:message tag).
    Spring Framework would allow you to implement all the scenarios above quite easily, and that is why I prefer it.
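
    A minimal sketch of how the detection options above could be combined without any framework, assuming a priority of URL parameter, then user profile, then browser locale. The class name, the list of supported languages and the fallback are hypothetical. In a servlet the three inputs would come from request.getParameter("lang"), your own user profile object and request.getLocale(). Locale.forLanguageTag needs Java 7 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Picks the locale for a request from several sources, in priority order. */
final class LocaleChooser {
    // Hypothetical: the languages this application has translations for.
    private static final List<String> SUPPORTED = Arrays.asList("en", "de", "pl");
    // Hypothetical: what to use when nothing else matches.
    private static final Locale FALLBACK = Locale.ENGLISH;

    /**
     * @param langParam     value of an assumed "lang" URL parameter, or null
     * @param profileLocale locale stored in the user profile, or null
     * @param browserLocale locale derived from Accept-Language, or null
     */
    static Locale choose(String langParam, Locale profileLocale, Locale browserLocale) {
        if (langParam != null) {
            // An ill-formed tag yields a locale with an empty language, which is rejected below.
            Locale fromUrl = Locale.forLanguageTag(langParam.trim());
            if (isSupported(fromUrl)) {
                return fromUrl;
            }
        }
        if (isSupported(profileLocale)) {
            return profileLocale;
        }
        if (isSupported(browserLocale)) {
            return browserLocale;
        }
        return FALLBACK;
    }

    private static boolean isSupported(Locale locale) {
        return locale != null && SUPPORTED.contains(locale.getLanguage());
    }
}
```

    Keeping the decision in one plain class like this makes it easy to call from a filter and to unit test without a servlet container.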

    String externalization

    This is something that should be easy, right? Well, mostly it is - just use the appropriate tag. The only problem you might face is externalizing client-side (JavaScript) texts. There are several possible approaches, but let me mention these two:

    1. Have each JSP write an array of translated strings (with the message tag) and simply access that array in client code. This is the easier approach but the less maintainable one - you would need to write the right strings from the right pages (the ones that actually reference your client-side scripts). I have done that before and believe me, this is not something you want to do in a large application (but it is probably the best solution for a small one).
    2. Another approach may sound hard in principle but is actually much easier to handle in the future. The idea is to centralize strings on the client side (move them to a common JavaScript file). After that you would need to implement your own Servlet that returns this script upon request - its contents should be translated. You won't be able to use JSTL here; you would need to get strings from Resource Bundles directly.
      It is much easier to maintain, because you would have one central point to add translatable strings.
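
    A rough sketch of the core of the second approach: turning a Resource Bundle into a script the Servlet can return. The class name, the MESSAGES variable and the demo keys are made up for illustration, and the demo bundle only stands in for a real .properties file so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

final class ClientMessages {
    /** Renders every key of the bundle as "var MESSAGES = {...};". */
    static String toJavaScript(ResourceBundle bundle) {
        List<String> keys = new ArrayList<>(bundle.keySet());
        Collections.sort(keys); // stable output regardless of bundle ordering
        StringBuilder js = new StringBuilder("var MESSAGES = {");
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                js.append(',');
            }
            String key = keys.get(i);
            js.append('"').append(escape(key)).append("\":\"")
              .append(escape(bundle.getString(key))).append('"');
        }
        return js.append("};").toString();
    }

    /** Escapes text for use inside a double-quoted JavaScript string literal. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '<':  out.append("\\u003c"); break; // cannot close a script element
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // Non-ASCII and control characters become numeric escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Stand-in for a real translated bundle (hypothetical keys and texts). */
    static final class DemoBundle extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"confirm.delete", "Delete \"{0}\"?"},
                {"greeting", "Witaj"}
            };
        }
    }
}
```

    In the real Servlet you would presumably load the bundle with ResourceBundle.getBundle(baseName, locale) for the detected Locale and write the returned string to the response with a JavaScript content type and UTF-8 encoding.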

    Concatenations

    I hate to say it, but concatenations are really painful from a localizability perspective. They are very common and most people don't realize it.

    So what is concatenation then?

    In principle, each English sentence needs to be translated into the target language. The problem is that a correctly translated message often uses a different word order than its English counterpart (so English "Security policy" is translated to Polish "Polityka bezpieczeństwa" - "policy" is "polityka" - the order is different).

    OK, but how is this related to software?

    In a web application you could concatenate Strings like this:

    String securityPolicy = "Security " + "policy";
    

    or like this:

    <p>Security <span>policy</span></p>

    Both would be problematic. In the first case you would need to use the MessageFormat.format() method and externalize the strings as (for example) "Security {0}" and "policy"; in the latter you would externalize the contents of the whole paragraph (p tag), including the span tag. I know that this is painful for translators but there is really no better way.
    Sometimes you have to use dynamic content in your paragraph - the JSTL fmt:message tag (with fmt:param) will help you here as well (it works like MessageFormat on the backend side).
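
    To make the placeholder idea concrete, here is a small sketch using the "Security policy" example from above. The class and method names are illustrative, and in practice the patterns would come from a Resource Bundle rather than being hard-coded.

```java
import java.text.MessageFormat;

final class WordOrderDemo {
    /** Fills the placeholders of an externalized pattern. */
    static String render(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // English pattern: the placeholder comes second.
        System.out.println(render("Security {0}", "policy"));
        // Polish pattern: the translator moved the placeholder to the front.
        // The escape U+0144 is the Polish letter n with acute.
        System.out.println(render("{0} bezpiecze\u0144stwa", "Polityka"));
    }
}
```

    The code never changes between languages; only the pattern does. One thing to watch for: a literal apostrophe in a MessageFormat pattern has to be doubled (''), otherwise it starts a quoted section.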

    Layouts

    In a localized application, it often happens that translated strings are much longer than the English ones. The result could look very ugly. Somehow, you would need to fix the styles. There are again two approaches:

    1. Fix issues as they happen by adjusting common styles (and pray that it won't break other languages). This is very painful to maintain.
    2. Implement a CSS localization mechanism. The mechanism I am talking about should serve a default, language-independent CSS file plus per-language overrides. The idea is to have an override CSS file for each language, so that you can adjust layouts on demand (just for one language). For this to work, the default CSS file, as well as the JSP pages, must not contain the !important keyword next to any style definitions. If you really have to use it, move those definitions to the language-based en.css - this would allow other languages to modify them.
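
    One possible shape for the server side of such a mechanism, as a sketch only: a helper that lists the stylesheets a page should link, default first and the language override second. The /css/ path and the file naming are assumptions, not something prescribed by JSP or CSS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class StylesheetResolver {
    // Hypothetical location of the stylesheets inside the web application.
    private static final String BASE = "/css/";

    /** Returns the stylesheets to link, in the order they must appear in the page. */
    static List<String> stylesheetsFor(Locale locale) {
        List<String> sheets = new ArrayList<>();
        sheets.add(BASE + "default.css"); // language-independent rules
        String language = locale.getLanguage();
        if (!language.isEmpty()) {
            // Linked last so its rules override the defaults without !important.
            sheets.add(BASE + language + ".css");
        }
        return sheets;
    }
}
```

    Note that JDKs older than 17 report legacy codes for a few languages (for example iw instead of he for Hebrew), so the override file names need to match whatever getLanguage() returns on your runtime.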

    Culture specific issues

    Avoid using graphics, colors and sounds that might be specific to western culture. If you really need them, please provide a means of localizing them. Avoid direction-sensitive graphics (as these would be a problem when you try to localize into, say, Arabic or Hebrew). Also, do not assume that the whole world uses the same digits (this is not true for Arabic, for example).

    Dates and time zones

    Handling dates and times in Java is, to say the least, not easy. If you are not going to support anything other than the Gregorian calendar, you could stick to the built-in Date and Calendar classes. You can use JSTL fmt:timeZone, fmt:formatDate and fmt:parseDate to correctly set the time zone, format and parse dates in JSP.

    I strongly suggest using fmt:formatDate with its timeZone attribute set to the end user's time zone and with the default date and time styles.

    It is important to convert the date and time to the valid (end user's) time zone. It is also quite important to convert it to an easily understandable format - that is why I recommend the default formatting style.
    BTW, time zone detection is not easy, as web browsers are not so nice as to send anything. Instead, you can either add a preferred time zone field to the user preferences (if you have them) or get the current time zone offset from the web browser via a client-side script (see the Date object's methods).
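
    For code that runs outside a JSP, the same two steps can be sketched with the built-in classes. This assumes the browser offset is the value of JavaScript's Date.getTimezoneOffset() (minutes, positive when local time is behind UTC) posted back to the server; the class and method names are hypothetical.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class UserDateFormatter {
    /** Formats an instant in the user's locale and time zone, default styles. */
    static String format(Date instant, Locale locale, TimeZone zone) {
        DateFormat format =
                DateFormat.getDateTimeInstance(DateFormat.DEFAULT, DateFormat.DEFAULT, locale);
        format.setTimeZone(zone);
        return format.format(instant);
    }

    /** Builds a fixed-offset zone from a getTimezoneOffset()-style value. */
    static TimeZone fromBrowserOffset(int offsetMinutes) {
        int eastOfUtc = -offsetMinutes; // JavaScript reports the sign reversed
        char sign = eastOfUtc < 0 ? '-' : '+';
        int abs = Math.abs(eastOfUtc);
        // Locale.ROOT keeps the digits ASCII whatever the server's default locale is.
        String id = String.format(Locale.ROOT, "GMT%c%02d:%02d", sign, abs / 60, abs % 60);
        return TimeZone.getTimeZone(id);
    }
}
```

    A bare offset carries no daylight saving rules, so it is only correct around the moment it was measured. A time zone ID stored in the user preferences (such as Europe/Warsaw) is more reliable when you have one.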

    Numbers and currencies

    Numbers as well as currencies should be converted to the local format. This is done in a similar way to formatting dates, with fmt:formatNumber (and fmt:parseNumber for parsing).
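
    On the Java side, java.text.NumberFormat does the same job. A minimal sketch with hypothetical helper names:

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocalNumbers {
    /** Formats a plain number with the locale's grouping and decimal separators. */
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    /** Formats an amount in the currency that belongs to the locale's country. */
    static String currency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    /** Parses user input written in the locale's number format. */
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(number(1234567.891, Locale.GERMANY)); // 1.234.567,891
        System.out.println(parse("1.234,5", Locale.GERMANY));    // 1234.5
    }
}
```

    Keep in mind that getCurrencyInstance only changes how an amount is written. It does not convert between currencies, so an amount stored in one currency should be formatted with that currency, not with whatever the user's locale implies.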

    Compound messages

    You have already been warned not to concatenate strings. Instead you would probably use MessageFormat. However, I must state that you should minimize the use of compound messages. Target grammar rules are quite commonly different, so translators might need not only to re-order the sentence (which placeholders and MessageFormat.format() resolve), but also to translate the whole sentence differently depending on what will be substituted. Let me give you some examples:

    // Multiple plural forms
    English: 4 viruses found.
    Polish: Znaleziono 4 wirusy. **OR** Znaleziono 5 wirusów.
    
    // Conjugation
    English: Program encountered incorrect character | Application encountered incorrect character.
    Polish: Program napotkał nieznaną literę | Aplikacja napotkała nieznaną literę.
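
    If a count really has to be substituted, one way to cope is to externalize one whole sentence per plural form and pick the form in code. The sketch below assumes the usual Polish rule for whole numbers (1 is singular; numbers ending in 2-4, except 12-14, take one plural; everything else takes the other). The class is hypothetical, and only the two sentences from the example above are included.

```java
import java.text.MessageFormat;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

final class PolishPlurals {
    enum Form { ONE, FEW, MANY }

    /** Plural form for a non-negative whole number in Polish. */
    static Form formFor(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        if (n == 1) {
            return Form.ONE;
        }
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) {
            return Form.FEW;
        }
        return Form.MANY;
    }

    /** Picks the whole-sentence pattern for n and fills in the number. */
    static String format(Map<Form, String> patterns, long n) {
        String pattern = patterns.get(formFor(n));
        if (pattern == null) {
            throw new IllegalArgumentException("No translation for form " + formFor(n));
        }
        return new MessageFormat(pattern, Locale.ROOT).format(new Object[] { n });
    }

    /** The two sentences quoted above; the ONE form is left for the translator. */
    static Map<Form, String> virusesFound() {
        Map<Form, String> patterns = new EnumMap<>(Form.class);
        patterns.put(Form.FEW, "Znaleziono {0} wirusy.");
        patterns.put(Form.MANY, "Znaleziono {0} wirus\u00f3w."); // U+00F3: o with acute
        return patterns;
    }
}
```

    This handles plural forms only. Agreement problems like the conjugation example still require a separate full sentence per subject.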
    

    Character encoding

    If you are planning to localize into languages that the ISO 8859-1 code page does not cover, you would need to support Unicode - the best way is to set the page encoding to UTF-8. I have seen people doing it like this:

    <%@ page contentType="text/html; charset=UTF-8" %>
    

    I must warn you: this is not enough. You actually need this declaration:

    <%@page pageEncoding="UTF-8" %>
    

    Also, you would still need to declare the encoding in the page header (an HTML meta tag with charset=UTF-8), just to be on the safe side.
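
    A quick way to check whether a given text needs more than ISO 8859-1 is to round-trip it through the charset. This is a small illustrative sketch; the class name is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    /** True if the text is unchanged after encoding to and decoding from the charset. */
    static boolean survives(String text, Charset charset) {
        // Characters the charset cannot represent are replaced (with '?' for ISO 8859-1).
        return text.equals(new String(text.getBytes(charset), charset));
    }

    public static void main(String[] args) {
        String polish = "Polityka bezpiecze\u0144stwa"; // U+0144: n with acute
        System.out.println(survives(polish, StandardCharsets.ISO_8859_1)); // false
        System.out.println(survives(polish, StandardCharsets.UTF_8));      // true
    }
}
```

    The Polish example from earlier already fails under ISO 8859-1, which is why UTF-8 has to be used consistently from the JSP source through to the response.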

    The list I gave you is not exhaustive, but it is a good starting point. Good luck :)
