One-way Database Synchronization

两盒软妹~` 提交于 2019-12-03 14:34:05

Better ask on serverfault.com (I can't post comments, scripts are broken in SO, so I have to post a full answer)

Update: (switched to Safari, scripts work again, I can post properly)

There is no silver bullet. For ease of use and 'one key turn' deployment nothing can beat replication. Is the only solution that covers deeply conflict detection and resolution, has support for pushing schema changes and comes with a comprehensive set of tools for setting it up and monitoring it. It has been the MS poster child of data synchronization for many years before this 'agenda' was taken over by the .Net crowd. Replication has two underlying problems in my opinion:

  • The technology used to pushing changes is primitive, slow and unreliable. It requires file shares to initiate the replicas and it depends on T-SQL to actually replicate data, resulting in all sort of scalability problems: the replication threads use server worker threads and the fact that they interact with arbitrary tables and application queries lead to blocking and deadlocks. The biggest deployments I've heard of are around 400-500 sites and are done by superhuman MVPs and top dollar consultants. This stops on its track many projects that start at 1500 sites (way beyond largest deployed replication projects). I'm curious to hear if I'm wrong and you know of a SQL Server replication solution deployed with more than 500 sites.
  • The replication metaphor is too data centric. It does not take into account the requirements of distributed applications: need of versioned and formalized contracts, autonomy of data 'fiefdoms', loose coupling from availability and security pov. As a result replication based solution solve the immediate need to 'make data available there', but fail to solve the true problem of 'my app needs to talk with your app'.

At the other end of the spectrum you'll find solutions that truly address the problem of application communication, like services based on queued messaging. But are either painfully slow and riddled with problems rooted in the separation of the communication mechanism (web services and or msmq) and the data storage (DTC transactions between comm and db, no common high availability story, no common recoverability story etc etc). Solutions that are blazingly fast and fully integrated with DB exists in the MS stack, but nobody knows how to use them. Somewhere in between these and replication you'll find various intermediate solutions, like OCS/Synch framework and SSIS based custom solutions. None will offer the ease of setup and monitoring of replication, but they might scale and perform better.

I was involved with several projects that required 'data synchronization' on a very large scale (+1200 sites, +1600 sites) and my solution was to turn the problem on a 'application communication' problem. Once the mindset is changed to this and the data flow is no longer seen as 'record with key X of table Y' but instead 'message communicating the purchase of item X by customer Y' the solution becomes easier to understand and apply. You no longer think in terms of 'insert records in order X-Y-Z so FK relations don't break' but instead in terms of 'process purchase as described by message XYZ'.

In my view replication, and it derivatives (ie. data tracking and data-gram shipping), are solutions anchored in the '80 technologies and view of the data/applications. Obsolete dinosaurs (and by no way turning into birds).

I know this does not even begin to address all your (very legit) concerns, but writing out all I have to say/rant/rable on this topic would fill volumes of paperback...

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!