Using a Relational Database for Schemaless Data - Best Practices

After reading a shocking article written by Bret Taylor (co-creator of FriendFeed; current CTO of Facebook), How FriendFeed uses MySQL to store schema-less data, I began to wonder if there are best practices for using a RDBMS such as Oracle, MySQL, or PostgreSQL for storing and querying schemaless data?

Few people like to admit they're using a relational database when NoSQL is the new hotness, which makes it difficult to find good articles on the topic. How do I implement a schemaless (or "document-oriented") database as a layer on top of a relational database?

thats the classic article in this topic: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html (Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server)

Storing schemaless data in SQL basically means implementing a key-value store that happens to use SQL as a back end. Since you aren't using any relational features and the schema is fairly trivial you won't find a lot of information on designing SQL databases this way. However, you should be able to find plenty of more general information on designing applications for key-value storage that will apply.

I've researched this issue extensively. It is rather trivial to model schemaless data in a RDBMS using a "properties" table (essentially using key/value pairs). The hard part is indexing and querying against your things. (Essentially all of the complexity that Friendfeed dealt with centered around this issue.)

If you index the properties table you end up with an index against all properties. This is undesirable as it adds too much overhead since you'll only want to query against certain properties. Also, you'll certainly want to access your things via compound indexes. It's incredibly complex to model compound indexes. The only solutions I've found require you to build your own indexes using schema just for that purpose -- very cumbersome. The more I looked at it the less practical it looked.

A good solution to this problem relies on using Partial Indexes (aka. Filtered Indexes).

You won't find much on this topic because most people build single purpose solutions. Their solutions are designed to meet one need very well. NoSQL databases take a lot of the pain out of building these single purpose data stores but you pay for not having the flexibility and some of the built-in controls and safety features of an RDBMS.

The engineers at Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, CouchDB etc. They partition data at the application level, which means they partition data only if necessary, keep data on one machine if possible and use a hash of the primary key to partition larger datasets across multiple databases. Application-level partitioning of data works in such a manner that data meeting one set of criteria are “handed off” to one database while data not meeting those criteria (or perhaps a different set of criteria) can be sent to a different database

来源：https://stackoverflow.com/questions/4189709/using-a-relational-database-for-schemaless-data-best-practices

标签

nosql

relational-database

schemaless