Using a Relational Database for Schemaless Data - Best Practices

﹥>﹥吖頭↗ 提交于 2019-11-28 16:44:05

thats the classic article in this topic: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html (Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server)

Storing schemaless data in SQL basically means implementing a key-value store that happens to use SQL as a back end. Since you aren't using any relational features and the schema is fairly trivial you won't find a lot of information on designing SQL databases this way. However, you should be able to find plenty of more general information on designing applications for key-value storage that will apply.

I've researched this issue extensively. It is rather trivial to model schemaless data in a RDBMS using a "properties" table (essentially using key/value pairs). The hard part is indexing and querying against your things. (Essentially all of the complexity that Friendfeed dealt with centered around this issue.)

If you index the properties table you end up with an index against all properties. This is undesirable as it adds too much overhead since you'll only want to query against certain properties. Also, you'll certainly want to access your things via compound indexes. It's incredibly complex to model compound indexes. The only solutions I've found require you to build your own indexes using schema just for that purpose -- very cumbersome. The more I looked at it the less practical it looked.

A good solution to this problem relies on using Partial Indexes (aka. Filtered Indexes).

You won't find much on this topic because most people build single purpose solutions. Their solutions are designed to meet one need very well. NoSQL databases take a lot of the pain out of building these single purpose data stores but you pay for not having the flexibility and some of the built-in controls and safety features of an RDBMS.

The engineers at Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, CouchDB etc. They partition data at the application level, which means they partition data only if necessary, keep data on one machine if possible and use a hash of the primary key to partition larger datasets across multiple databases. Application-level partitioning of data works in such a manner that data meeting one set of criteria are “handed off” to one database while data not meeting those criteria (or perhaps a different set of criteria) can be sent to a different database

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!