Adding information, xml column or new table?

Question

We want to extend our database with multi-language support, but we are unsure how to do this. Our table looks like this:

ID – Name – Description – (a lot of irrelevant columns)

Option 1 is to add an XML column to the table. In this column we can store the information we need, like this:

<translation>
    <language value='en'>
        <Name value=''/>
        <Description value=''/>
    </language>
    <language value='fr'>
        <Name value=''/>
        <Description value=''/>
    </language>
</translation>

This does the trick, and the advantage is that when I delete the row, its translations are deleted with it.
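A minimal sketch of reading one language back out of such a column, using Python's standard `xml.etree.ElementTree`. It assumes a well-formed version of the sample above (straight quotes, self-closing `<Name .../>` and `<Description .../>` tags) and made-up values; `get_translation` is a hypothetical helper. Note that the whole document is parsed even though only one `<language>` element is used.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical content of the XML column for one row (illustrative values).
xml_column = """
<translation>
    <language value='en'>
        <Name value='Chair'/>
        <Description value='Something to sit on'/>
    </language>
    <language value='fr'>
        <Name value='Chaise'/>
        <Description value='Un meuble'/>
    </language>
</translation>
"""


def get_translation(xml_text: str, lang: str) -> Optional[Dict[str, str]]:
    """Return {'Name': ..., 'Description': ...} for `lang`, or None if absent."""
    # The entire document is parsed, even if we only need one language.
    root = ET.fromstring(xml_text.strip())
    for language in root.findall("language"):
        if language.get("value") == lang:
            # Each child element (Name, Description) keeps its text in @value.
            return {child.tag: child.get("value", "") for child in language}
    return None


print(get_translation(xml_column, "fr"))  # {'Name': 'Chaise', 'Description': 'Un meuble'}
```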

Option 2 is to add an extra table. It's easy to create a table to store the information in, but it requires inner joins when reading the information, and more effort to delete the translations when the original row is deleted.

What is the preferred option in this case? Or are there other good solutions for this?

Answer 1:

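I'd recommend the "relational" approach, i.e. separate translation table(s). Consider a model with a LANGUAGE table and, for each multi-lingual table TABLEx, a translation table whose primary key {LANGUAGE_ID, TABLEx_ID} consists of foreign keys to LANGUAGE and TABLEx (identifying relationships).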
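A minimal sketch of this model as DDL, assuming SQLite driven from Python's `sqlite3` module (the question doesn't name a DBMS). `TABLE1` is a hypothetical stand-in for one multi-lingual table, and the column sizes and sample rows are illustrative only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE LANGUAGE (
    LANGUAGE_ID  TEXT PRIMARY KEY              -- e.g. 'en', 'fr'
);

CREATE TABLE TABLE1 (
    TABLE1_ID    INTEGER PRIMARY KEY
    -- ... the language-independent columns go here ...
);

-- One translation table per multi-lingual table. Identifying relationships:
-- both foreign keys are also part of the primary key.
CREATE TABLE TABLE1_TRANSLATION (
    LANGUAGE_ID  TEXT    NOT NULL REFERENCES LANGUAGE (LANGUAGE_ID) ON DELETE CASCADE,
    TABLE1_ID    INTEGER NOT NULL REFERENCES TABLE1 (TABLE1_ID) ON DELETE CASCADE,
    NAME         VARCHAR(100) NOT NULL,
    DESCRIPTION  VARCHAR(1000),
    PRIMARY KEY (LANGUAGE_ID, TABLE1_ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# Illustrative data: one row translated into two languages.
conn.executemany("INSERT INTO LANGUAGE VALUES (?)", [("en",), ("fr",)])
conn.execute("INSERT INTO TABLE1 (TABLE1_ID) VALUES (1)")
conn.executemany(
    "INSERT INTO TABLE1_TRANSLATION VALUES (?, ?, ?, ?)",
    [("en", 1, "Chair", "Something to sit on"), ("fr", 1, "Chaise", "Un meuble")],
)
conn.commit()
```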

This model has some nice properties:

- For each multi-lingual table, create a separate translation table. This way, you can use the fields appropriate for that particular table, and a translation cannot be "misconnected" to the wrong table.
- The LANGUAGE table and the associated FOREIGN KEYs ensure that a translation cannot exist for a non-existent language, unlike with the XML.
- The ON DELETE CASCADE referential action ensures that no "orphaned" translation can be left behind when a language is removed, unlike with the XML.
- While XML may be faster in simpler cases, I suspect a JOIN is more scalable as the number of languages grows.¹ In any case, measure the difference and decide for yourself whether it's significant enough.
- Separate fields such as NAME and DESCRIPTION may be easier to index. With XML, you'd probably need a DBMS with special support for XML, or possibly some sort of full-text index.
- Fields such as NAME and DESCRIPTION will likely be just regular VARCHARs. OTOH, putting them together may produce XML too large for a regular VARCHAR, forcing you to use a CLOB/BLOB, which may have its own performance complications.
- If your DBMS supports clustering (see below), the whole translation table can be stored in a single B-Tree. XML has a lot of redundant data (opening and closing tags), which likely makes it larger and less cache-friendly than the B-Tree (even when we count in all the associated overheads).
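A minimal sketch of what the FOREIGN KEYs and ON DELETE CASCADE give you, again assuming SQLite via Python's `sqlite3` with hypothetical table names and data. Because the FK to `TABLE1` also cascades in this sketch, deleting a row removes its translations too, which covers the deletion concern from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE LANGUAGE (LANGUAGE_ID TEXT PRIMARY KEY);
CREATE TABLE TABLE1 (TABLE1_ID INTEGER PRIMARY KEY);
CREATE TABLE TABLE1_TRANSLATION (
    LANGUAGE_ID TEXT    NOT NULL REFERENCES LANGUAGE (LANGUAGE_ID) ON DELETE CASCADE,
    TABLE1_ID   INTEGER NOT NULL REFERENCES TABLE1 (TABLE1_ID) ON DELETE CASCADE,
    NAME        VARCHAR(100) NOT NULL,
    DESCRIPTION VARCHAR(1000),
    PRIMARY KEY (LANGUAGE_ID, TABLE1_ID)
);
INSERT INTO LANGUAGE VALUES ('en'), ('fr');
INSERT INTO TABLE1 VALUES (1), (2);
INSERT INTO TABLE1_TRANSLATION VALUES
    ('en', 1, 'Chair', NULL), ('fr', 1, 'Chaise', NULL),
    ('en', 2, 'Table', NULL), ('fr', 2, 'Table', NULL);
""")

# 1. A translation for a non-existent language ('de') is rejected by the FK.
try:
    conn.execute("INSERT INTO TABLE1_TRANSLATION VALUES ('de', 1, 'Stuhl', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# 2. Removing a language cascades to its translations (no orphans).
conn.execute("DELETE FROM LANGUAGE WHERE LANGUAGE_ID = 'fr'")

# 3. Removing a row cascades to all of its translations.
conn.execute("DELETE FROM TABLE1 WHERE TABLE1_ID = 2")

remaining = conn.execute(
    "SELECT LANGUAGE_ID, TABLE1_ID FROM TABLE1_TRANSLATION ORDER BY 1, 2"
).fetchall()
print(rejected, remaining)  # True [('en', 1)]
```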

You'll notice that the model above uses identifying relationships, and the resulting PK {LANGUAGE_ID, TABLEx_ID} can be used for clustering (so the translations that belong to the same language are stored physically close together in the database). As long as you have only a few predominant (or "hot") languages, this should be OK: caching is done at the database page level, so keeping "hot" and "cold" data out of the same page avoids caching "cold" data (which would effectively make the cache "smaller").
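One way to get this clustering, assuming SQLite (the question names no DBMS): a `WITHOUT ROWID` table is stored as a single B-Tree ordered by its primary key. Other systems have their own mechanisms; for example, InnoDB in MySQL always clusters on the primary key. The sketch uses the language-first key; the LANGUAGE table and the FKs are left out for brevity, and `names_in` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (TABLE1_ID INTEGER PRIMARY KEY);
-- WITHOUT ROWID: rows live in one B-Tree ordered by (LANGUAGE_ID, TABLE1_ID),
-- so all translations in one language are stored next to each other.
CREATE TABLE TABLE1_TRANSLATION (
    LANGUAGE_ID TEXT    NOT NULL,
    TABLE1_ID   INTEGER NOT NULL,
    NAME        VARCHAR(100) NOT NULL,
    PRIMARY KEY (LANGUAGE_ID, TABLE1_ID)
) WITHOUT ROWID;
INSERT INTO TABLE1 VALUES (1), (2);
INSERT INTO TABLE1_TRANSLATION VALUES
    ('en', 1, 'Chair'), ('fr', 1, 'Chaise'),
    ('en', 2, 'Table'), ('fr', 2, 'Table');
""")


def names_in(lang):
    # Typical "hot language" query: one language, many rows. The rows it needs
    # share the LANGUAGE_ID prefix of the clustered key.
    return conn.execute(
        """SELECT t.TABLE1_ID, tr.NAME
           FROM TABLE1 AS t
           JOIN TABLE1_TRANSLATION AS tr
             ON tr.TABLE1_ID = t.TABLE1_ID AND tr.LANGUAGE_ID = ?
           ORDER BY t.TABLE1_ID""",
        (lang,),
    ).fetchall()


print(names_in("fr"))  # [(1, 'Chaise'), (2, 'Table')]
```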

OTOH, if you routinely need to query for many languages, consider flipping the clustering key order to {TABLEx_ID, LANGUAGE_ID}, so all the translations of the same row are stored physically close together in the database. Once you retrieve one translation, the other translations of the same row are probably already cached. Or, if you want to extract multiple translations in a single query, you can do it with less I/O.
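The flipped key order in the same kind of SQLite sketch (an assumption, with the parent tables and FKs omitted): with `PRIMARY KEY (TABLE1_ID, LANGUAGE_ID)` on a `WITHOUT ROWID` table, all translations of one row are adjacent in the B-Tree, so fetching several languages of that row reads neighbouring entries. `translations_of` is a hypothetical helper and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Key order flipped: all translations of one row are adjacent in the B-Tree.
CREATE TABLE TABLE1_TRANSLATION (
    TABLE1_ID   INTEGER NOT NULL,
    LANGUAGE_ID TEXT    NOT NULL,
    NAME        VARCHAR(100) NOT NULL,
    PRIMARY KEY (TABLE1_ID, LANGUAGE_ID)
) WITHOUT ROWID;
INSERT INTO TABLE1_TRANSLATION VALUES
    (1, 'en', 'Chair'), (1, 'fr', 'Chaise'), (1, 'de', 'Stuhl'),
    (2, 'en', 'Table'), (2, 'fr', 'Table');
""")


def translations_of(row_id, langs):
    # Several languages of one row in a single query; the matching keys share
    # the TABLE1_ID prefix. Only "?" placeholders are formatted into the SQL.
    placeholders = ", ".join("?" for _ in langs)
    rows = conn.execute(
        f"""SELECT LANGUAGE_ID, NAME FROM TABLE1_TRANSLATION
            WHERE TABLE1_ID = ? AND LANGUAGE_ID IN ({placeholders})
            ORDER BY LANGUAGE_ID""",
        (row_id, *langs),
    ).fetchall()
    return dict(rows)


print(translations_of(1, ["en", "de"]))  # {'de': 'Stuhl', 'en': 'Chair'}
```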

¹ We can JOIN to just the translation in the desired language. With XML, you must load (and parse) the whole XML before deciding to use only the small portion of it that pertains to the desired language. Whenever you add a new language (and the associated translations to the XML), it slows down the processing of existing rows, even if you rarely use the new language.

Source: https://stackoverflow.com/questions/14835486/adding-information-xml-column-or-new-table

Tags

sql

database-design