Database schema normalization checker?

岁酱吖の 提交于 2019-12-03 12:34:50

It can be done, and there are at least 2 commecially available tools that can do normalization for you: GeneXus and DeKlarit. They use a process named NormalizationBySynthesis

Bill Karwin

A tool like you describe, that tried to analyze your data and metadata and advise you of possible non-normalized structure, would give wrong suggestions so frequently that it would make Clippy, the Microsoft assistant seem like an indispensable writer's aid.

The process of normalization involves mapping software requirements to a logical data model. An analysis tool cannot know your data requirements any better than you do. So it can't infer from an incorrect database design which parts of it are wrong.

I understand you have qualified the question and you have limited expectations for the tool's capability. But you would want it to be useful for ordinary, everyday tasks -- but it would not be reliable even for the most simple cases.

Compare to static code analysis tools. Suppose you write an application and deliver it to your client, and the client says "why can't I send email from this app?" How would a static code analysis tool tell you that you omitted a desired feature? It can't know those requirements.

Likewise, how would a database normalization analysis tool know whether it's appropriate for the UserAccount table to have a single MobilePhoneNumber attribute, or if it would be more appropriate to separate phone numbers into another table so a single user could have multiple phones listed?

bkm

I have thought about this problem too. It is theoritically possible and there are some research papers on this topic. there used to be pretty cool tool at www.dbtools.cs.cornell.edu. This was developed by the famous author Raghu Ramakrishnan. He is currently at Yahoo Research. You can refer to the following papers for more information

  1. Diederich,T. and Miton,J.,(1988),"New Methods and Fast Algorithms for Database Normalization ",ACM Transactions on Database Systems, 13(3),339-365.
  2. Bernstein, P. A.(1986),"Synthesizing Third Normal Form Relations from Functional Dependencies",ACM Transactions on Database Systems , Vol.1. No. 4, pp 277-298.
  3. JMathNorm: A Database Normalization Tool Using Mathematica, Lecture Notes In Computer Science; Vol. 4488, Proceedings of the 7th international conference on Computational Science, Part II, Ali Yazici, Ziya Karakaya

The third link is very interesting. Here is the abstract of the paper:

This paper is about designing a complete interactive tool, named JMathNorm, for relational database (RDB) normalization using Mathematica. It is an extension of the prototype developed by the same authors [1] with the inclusion of Second Normal Form (2NF), and Boyce-Codd Normal Form (BCNF) in addition to the existing Third normal Form (3NF) module. The tool developed in this study is complete and can be used for real-time database design as well as an aid in teaching fundamental concepts of DB normalization to students with limited mathematical background. JMathNorm also supports interactive use of modules for experimenting the fundamental set operations such as closure, and full closure together with modules to obtain the minimal cover of the functional dependency set and testing an attribute for a candidate key. JMathNorm's GUI interface is written in Java and utilizes Mathematica's JLink facility to drive the Mathematica kernel.

I am also very interested in an answer to this question. If anyone has come across a tool, please let us know!

The open-source tool SchemaSpy detects some "Anomalies", like "Tables without indexes" or "Columns that are flagged as both 'nullable' and 'must be unique'". One of these anomalies is "Tables with incrementing column names, potentially indicating denormalization"

http://schemaspy.sourceforge.net/

I think that's hard to implement.

Like, a table

Id | Name | Surname | SSN 

is in 1NF, and

Id | Name | Surname | Mobile

is not, but you can say it neither from the design nor from data, only from the field name.

I've seen once a database for ultrasonography that actually had both GENDER and LMP (last menstrual period) in one table.

It can't be done.

Normalization is defined in terms of functional dependencies which

  1. Cannot be expressed in SQL
  2. Cannot be inferred from the data

You can look at a table such as

A | B
--+--
1 | 1
1 | 2

and deduce that B is not dependent on A (because it has two distinct values for a single value of A), but you can never deduce that a dependency does exist, e.g. A may or may not depend on B.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!