How would storing XML in a relational database violate Normalization Principles?

Question

In Regina Obe & Leo Hsu, PostgreSQL Up & Running, p. 101, the introduction to the PostgreSQL XML data type says:

The XML data type, similar to JSON, is “controversial” in a relational database because it violates principles of normalization.

The book gives no further explanation. Could someone explain what the normalization principles are and why XML violates some of them?

Answer 1:

There are many tutorials on relational database normalization in books and on the web. See for example https://en.wikipedia.org/wiki/Database_normalization

First normal form says that a column should contain only "atomic" or "indivisible" values - which if you interpret it over-strictly means you're not even allowed to store a date. Storing an XML document in a column certainly goes against that principle. Which doesn't mean it's necessarily a bad thing to do, just that you need to be aware of the consequences (which generally means that updating the database and keeping it consistent is going to be more difficult).
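To make the "updating is going to be more difficult" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL only so that it runs without a server, and the table names (`order_xml`, `order_item`) and the sample order document are hypothetical, invented for illustration. It stores the same facts once as an XML value in one column and once as ordinary rows, then changes a single quantity in each design.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Design A (hypothetical): the whole order is one XML value in a single column.
conn.execute("CREATE TABLE order_xml (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO order_xml VALUES (1, ?)",
    ("<order><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>",),
)

# Design B (hypothetical): the same facts as rows, one row per order line.
conn.execute(
    "CREATE TABLE order_item ("
    "order_id INTEGER, sku TEXT, qty INTEGER, PRIMARY KEY (order_id, sku))"
)
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?)", [(1, "A1", 2), (1, "B7", 1)]
)

# Design B: changing one quantity is a single declarative statement.
conn.execute("UPDATE order_item SET qty = 5 WHERE order_id = 1 AND sku = 'B7'")

# Design A: the application reads, parses, edits and rewrites the whole document.
(doc,) = conn.execute("SELECT doc FROM order_xml WHERE id = 1").fetchone()
root = ET.fromstring(doc)
for item in root.findall("item"):
    if item.get("sku") == "B7":
        item.set("qty", "5")
conn.execute(
    "UPDATE order_xml SET doc = ? WHERE id = 1",
    (ET.tostring(root, encoding="unicode"),),
)

# Read both values back to compare the two designs.
qty_relational = conn.execute(
    "SELECT qty FROM order_item WHERE order_id = 1 AND sku = 'B7'"
).fetchone()[0]
stored = conn.execute("SELECT doc FROM order_xml WHERE id = 1").fetchone()[0]
qty_from_xml = ET.fromstring(stored).find("item[@sku='B7']").get("qty")
```

Both designs end up holding the new quantity, but in the XML design the database only ever sees "replace this text with that text"; the knowledge of what an item or a quantity is lives in application code.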

Answer 2:

The relational model is a first-order logical model, meaning that variables in our predicates can only contain values. Any structure / associations among values should be recorded as relations so that normalization and other relational features like queries and constraints can operate on them.

Storing complex values like XML or JSON as opaque values is not a problem, but when we interpret these as data structures, we have a higher order model (predicates which vary over predicates). Such models are much more complicated to deal with in general (despite looking more natural at first). For example, it would require additional operators to traverse, join, manipulate, compare and constrain (parts of) hierarchies.
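As a rough illustration of the "constrain (parts of) hierarchies" point, the sketch below again uses Python's sqlite3 module for self-containment; the tables `person_phone` and `person_xml` and the sample data are hypothetical. A plain UNIQUE constraint rejects a duplicate phone number stored as a row, while the same duplicate nested inside an XML value is accepted, because the database treats the document as an opaque value.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Relational design (hypothetical): one row per phone number, with a constraint.
conn.execute(
    "CREATE TABLE person_phone ("
    "person_id INTEGER, phone TEXT NOT NULL, UNIQUE (person_id, phone))"
)
# Document design (hypothetical): phone numbers nested inside an XML value.
conn.execute("CREATE TABLE person_xml (person_id INTEGER PRIMARY KEY, doc TEXT)")

conn.execute("INSERT INTO person_phone VALUES (1, '555-0100')")
try:
    # Second identical row: the declared constraint stops it.
    conn.execute("INSERT INTO person_phone VALUES (1, '555-0100')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same duplicate inside the XML value is stored without complaint,
# because the constraint machinery cannot see inside the document.
conn.execute(
    "INSERT INTO person_xml VALUES (1, ?)",
    ("<person><phone>555-0100</phone><phone>555-0100</phone></person>",),
)

# Detecting the duplicate now needs an extra traversal step outside plain SQL.
doc = conn.execute(
    "SELECT doc FROM person_xml WHERE person_id = 1"
).fetchone()[0]
phones = [p.text for p in ET.fromstring(doc).findall("phone")]
duplicates_in_xml = len(phones) - len(set(phones))
```

Databases with XML support can offer operators that look inside such values, but those are exactly the additional operators described above; ordinary keys and constraints do not reach into the document on their own.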

Answer 3:

Current mainstream thinking about 1NF is that it is undefinable, formally speaking. The reason is that the notion of "atomicity" on which the usual definitions rely is itself not formally definable; as an illustration I need to point no further than Michael Kay, who put the terms "atomic" and "indivisible" in scare quotes. So whether something constitutes a violation of 1NF is not objectively decidable.

The consequence is that, purely formally speaking, "there is no problem" - as rb stated. But that only means that it is terribly hard to pin down precisely, in mathematical terms, what the downsides of the approach actually are, not that there are no such downsides.

EDIT

A good read:

https://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/

With the proviso that 1NF should be taken to mean "what the original 1NF intended to address, at least in spirit".

Note in particular that its conclusion mostly applies just as well to your scenario:

The effects of violating 1NF are sometimes considered harmless, though most of them compromise the structural soundness and integrity of the schema. You will do little more than impose added burden on the overall stability of a database if you try to “patch” up the problem by using complex routines that parse and pivot values or rely on external applications to enforce sufficient integrity. In a nutshell, any perception that you have achieved simplicity of design by eschewing 1NF is merely an illusion. On the other hand, there is much to gain by simply embracing it as the most foundational dictum of integrity in data management.

Source: https://stackoverflow.com/questions/39516417/how-would-storing-xml-in-a-relational-database-violate-normalization-principles

Tags

xml

relational-database

normalization