Transforming (Synchronizing) Data between SQL to HBase

巧了我就是萌 提交于 2019-12-08 07:08:41

问题


We are overhauling our product by completely moving from Microsoft and .NET family to open source (well one of the reasons is cost cutting and exponential increase in data).

We plan to move our data model completely from SQL Server (relational data) to Hadoop (the famous key-Value pair ecosystem).

In the beginning, we want to support both versions (say 1.0 and new v2.0). In order to maintain the data consistency, we plan to sync the data between both systems, which is a fairly challenging task and error prone, but we don't have any other option.

A bit confused where to start from, I am looking up to the community of experts. Any strategy/existing literature or any other kind of guidance in this direction would be greatly helpful.


回答1:


I am not entirely sure how your code is structured, but if you currently have a data or persistence layer, or at least a database access class where all your SQL is executed through, you could override the save functions to write changes to both databases. If you do not have a data layer, you may want to considering writing one before starting the transition.

Otherwise, you could add triggers in MSSQL to update Hadoop, not sure what you can do in Hadoop to keep MSSQL in-sync.

Or, you could have a process that runs every x minutes, that manually syncs the two databases.

Personally, I would try to avoid trying to maintain two databases of record. Moving changes from a new, experimental database to your stable database seems risky. You stand the chance of corrupting your stable system. Instead, I would write a convertor to move data from your relational DB to Hadoop. Then every night or so, copy your data into Hadoop and use it for the development and testing of your new system. I think test users would understand if you said your beta version is just a test playground, and won't effect your live product. If you plan on making major changes to your UI and fear some will not want to transition to 2.0, then you might be trying to tackle too much at once.

Those are the solutions I came up with... Good luck!




回答2:


Consider using a queuing tool like Flume (http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-flume/) to split your input between both systems.



来源:https://stackoverflow.com/questions/6358291/transforming-synchronizing-data-between-sql-to-hbase

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!