What scalability problems have you encountered using a NoSQL data store? [closed]


别跟我提以往 2020-12-04 04:19

15 answers

爱一瞬间的悲伤 (original poster)

2020-12-04 04:51
I switched from MySQL (InnoDB) to Cassandra for an M2M system, which basically stores time series of sensor readings for each device. Each record is indexed by (device_id, date) and (device_id, type_of_sensor, date). The MySQL version contained 20 million rows.
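
To make the two access paths concrete, here is a minimal in-memory sketch in Python. It is not Cassandra or MySQL code, just a toy model of the two indexes described above; the class name `SensorStore`, its method names, and the half-open `[start, end)` range convention are all assumptions made for illustration.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime


class SensorStore:
    """Toy model of the (device_id, date) and (device_id, type_of_sensor, date) indexes."""

    def __init__(self):
        # device_id -> list of (date, sensor_type, value), kept sorted by date
        self._by_device = defaultdict(list)
        # (device_id, sensor_type) -> list of (date, value), kept sorted by date
        self._by_device_sensor = defaultdict(list)

    def insert(self, device_id, sensor_type, date, value):
        # Each reading is written once per index, so both queries stay cheap.
        insort(self._by_device[device_id], (date, sensor_type, value))
        insort(self._by_device_sensor[(device_id, sensor_type)], (date, value))

    @staticmethod
    def _slice(rows, start, end):
        # A 1-tuple (d,) sorts before any longer tuple starting with d,
        # so these two bounds select the half-open range [start, end).
        lo = bisect_left(rows, (start,))
        hi = bisect_left(rows, (end,))
        return rows[lo:hi]

    def by_device(self, device_id, start, end):
        return self._slice(self._by_device.get(device_id, []), start, end)

    def by_device_sensor(self, device_id, sensor_type, start, end):
        rows = self._by_device_sensor.get((device_id, sensor_type), [])
        return self._slice(rows, start, end)


store = SensorStore()
store.insert("dev1", "temp", datetime(2020, 1, 1), 20.5)
store.insert("dev1", "humidity", datetime(2020, 1, 2), 40.0)
store.insert("dev1", "temp", datetime(2020, 1, 3), 21.0)
store.insert("dev2", "temp", datetime(2020, 1, 2), 19.0)

recent = store.by_device("dev1", datetime(2020, 1, 1), datetime(2020, 1, 3))
temps = store.by_device_sensor("dev1", "temp", datetime(2020, 1, 1), datetime(2020, 1, 4))
```

The point of the sketch is that a query only touches the rows of one device (or one device and sensor type), which is the access pattern the rest of this answer relies on.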

MySQL:
- Set up in master-master replication. A few problems appeared around loss of synchronization. It was stressful and, especially in the beginning, could take hours to fix.
- Insertion time wasn't a problem, but querying required more and more memory as the data grew. The problem is that the indexes are treated as a whole. In my case, I was only using a very thin part of the indexes, and only that part needed to be loaded in memory (only a few percent of the devices were frequently monitored, and only their most recent data).
- It was hard to back up. Rsync isn't able to do fast backups of big InnoDB table files.
- It quickly became clear that it wasn't possible to update the schema of the heavy tables, because it took way too much time (hours).
- Importing data took hours (even when indexing was done at the end). The best rescue plan was to always keep a few copies of the database (data file + logs).
- Moving from one hosting company to another was a really big deal. Replication had to be handled very carefully.
Cassandra:
- Even easier to install than MySQL.
- Requires a lot of RAM. The first versions couldn't run on a 2GB instance; now it can work on a 1GB instance, but that's not ideal (way too many data flushes). Giving it 8GB was enough in our case.
- Once you understand how to organize your data, storing is easy. Querying is a little more complex. But once you get the hang of it, it is really fast (you can't really make a mistake unless you really want to).
- If the previous step was done right, it is and stays super-fast.
- It almost seems like the data is organized to be backed up. All new data is added as new files. Personally (though it's not a good practice), I flush data every night and before every shutdown (usually for an upgrade) so that restoring takes less time, because there are fewer logs to read. It doesn't create many files, as they are compacted.
- Importing data is fast as hell, and the more hosts you have, the faster it gets. Exporting and importing gigabytes of data isn't a problem anymore.
- Not having a schema is very interesting because you can make your data evolve to follow your needs. This might mean having different versions of your data at the same time in the same column family.
- Adding a host was easy (though not fast), but I haven't done it on a multi-datacenter setup.
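
On the point about different versions of the data living in the same column family: one way to cope with that is to upgrade rows when reading them. The sketch below is only an illustration in plain Python, with no Cassandra client involved; the `version`, `value`, and `unit` field names and the two layouts are hypothetical, not what my system actually stored.

```python
def normalize(row):
    """Upgrade a stored row (a plain dict) to the newest layout at read time."""
    # Assumption: rows written before versioning existed carry no 'version' field.
    version = row.get("version", 1)
    if version == 1:
        # Hypothetical v1 layout: a bare 'value', with the unit implied.
        return {"version": 2, "value": row["value"], "unit": "celsius"}
    if version == 2:
        # Hypothetical v2 layout: the unit is stored explicitly.
        return dict(row)
    raise ValueError(f"unknown row version: {version}")


old_row = {"value": 20.5}
new_row = {"version": 2, "value": 68.9, "unit": "fahrenheit"}
rows = [normalize(r) for r in (old_row, new_row)]
```

Old rows never have to be rewritten in bulk this way; they are converted whenever they happen to be read.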
Note: I have also used Elasticsearch (document-oriented, based on Lucene), and I think it should be considered a NoSQL database. It is distributed, reliable and often fast (some complex queries can perform quite badly).