问题
I’m learning Zookeeper and so far I don't understand the purpose of using it for distributed systems that databases can't solve.
The use cases I’ve read are implementing a lock, barrier, etc for distributed systems by having Zookeeper clients read/write to Zookeeper servers. Can’t the same be achieved by read/write to databases?
For example my book describes the way to implement a lock with Zookeeper is to have Zookeeper clients who want to acquire the lock create an ephemeral znode
with a sequential flag set under the lock-znode
. Then the lock is owned by the client whose child znode has the lowest sequence number.
All other Zookeeper examples in the book are again just using it to store/retrieve values.
It seems the only thing that differs Zookeeper from a database/any storage is the “watcher” concept. But that can be built using something else.
I know my simplified view of Zookeeper is a misunderstanding. So can someone tell me what Zookeeper truly provides that a database/custom watcher can’t?
回答1:
I think you're asking yourself the wrong question when you try to figure out the purpose of Zookeeper, instead of asking what Zookeeper can do that "databases" can not do (btw Zookeeper is also a database) ask what Zookeeper is better at than other available databases. If you start to ask yourself that question you will hopefully understand why people decide to use Zookeeper in their distributed services.
Take ephemeral nodes for example, the huge benefit of using them is not that they make a much better lock than some other way. The benefit of using ephemeral nodes is that they will automatically be removed if the client loses connection to Zookeeper.
And then we can have a look at the CAP theorem where Zookeeper closest resembles a CP system. And you must once again decide if this is what you want out of your database.
tldr: Zookeeper is better in some aspects and worse in others compared to other databases.
回答2:
Can’t the same be achieved by read/write to databases?
In theory, yes it is possible, but usually, it is not a good idea to use databases for demanding usecases of distributed coordination. I have seen microservices using relational databases for managing distributed locks with very bad consequences (e.g. thousands of deadlocks in the databases) which in turn resulted in poor DBA-developer relation :-)
Zookeeper has some key characteristics which make it a good candidate for managing application metadata
- Possibility to scale horizontally by adding new nodes to ensemble
- Data is guaranteed to be eventually consistent within a certain timebound. It is possible to have strict consistency at a higher cost if clients desire it (Zookeeper is a CP system in CAP terms)
- Ordering guarantee -- all clients are guaranteed to be able to read data in the order in which they have been written
All of the above could be achieved by databases, but only with significant effort from application clients. Also watches and ephemeral nodes could be achieved by databases by using techniques such as triggers, timeouts etc. But they are often considered inefficient or antipatterns.
Relational databases offer strong transactional guarantees which usually come at a cost but are often not required for managing application metadata. So it make sense to look for a more specialized solution such as Zookeeper or Chubby.
Also, Zookeeper stores all its data in memory (which limits its usecases), resulting in highly performant reads. This is usually not the case with most databases.
来源:https://stackoverflow.com/questions/36312640/whats-the-purpose-of-using-zookeeper-rather-than-just-databases-for-managing-di