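Migrating HDFS Data Between Hadoop Clusters

Migrating Hive data from one cluster to another is a task that comes up from time to time. It involves migrating the data itself, migrating the metastore, and possibly upgrading the Hive version.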

1. Migrate the HDFS data to the new cluster
hadoop distcp -skipcrccheck -update hdfs://xxx.xxx.xxx.xxx:8020/user/risk hdfs://xxx.xxx.xxx.xxx:8020/user/
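-skipcrccheck: skips the CRC check between source and target. It was used here because this migration goes from an older Hadoop version to a newer one; it is not needed if both clusters run the same Hadoop version.

-update: incremental update. Files are compared by name and size, and a file is copied again only if source and target differ.

hadoop distcp /apps/hive/warehouse/userinfo hdfs://10.11.32.76:8020/apps/hive/warehouse/ (what I ran in my environment)

hadoop distcp -update /apps/hive/warehouse/ hdfs://10.11.32.76:8020/apps/hive/warehouse/ (copies only the differences)

For xxx.xxx.xxx.xxx it is best to use an IP address. Avoid host names or cluster names where possible; otherwise they must be resolvable, for example through entries in the hosts file.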
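
The distcp invocation above can be assembled by a small helper so the flags are chosen consistently. This is only a sketch: the function name build_distcp_cmd, the sample addresses 10.0.0.1 and 10.0.0.2, and the "yes"/"no" switch are assumptions made for illustration, and the command is printed for review rather than executed.

```shell
#!/usr/bin/env bash
# Build a distcp command line for copying one HDFS path between clusters.
# Arguments: source NameNode, target NameNode, source path, target path,
# and "yes" if the two clusters run different Hadoop versions.
# All values passed below are placeholders, not real cluster addresses.
build_distcp_cmd() {
  local src_nn="$1" dst_nn="$2" src_path="$3" dst_path="$4" cross_version="$5"
  local opts="-update"
  # -skipcrccheck is added only for a cross-version copy.
  if [ "$cross_version" = "yes" ]; then
    opts="-skipcrccheck $opts"
  fi
  echo "hadoop distcp $opts hdfs://${src_nn}:8020${src_path} hdfs://${dst_nn}:8020${dst_path}"
}

# Print the command for review instead of running it.
build_distcp_cmd "10.0.0.1" "10.0.0.2" "/user/risk" "/user/" "yes"
```

Once the printed command looks right, it can be pasted into a shell on a node that has the hadoop client configured.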

2. Back up and export the metastore data on the source cluster (MySQL export)
mysqldump -u root -p'password' --skip-lock-tables -h xxx.xxx.xxx.xxx hive > mysql_hive.sql
mysqldump -uroot -p --databases hive > mysql_hive_data.sql (what I ran in my environment)
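
Before moving the dump to the new cluster, it may be worth checking that it actually contains the metastore tables that are edited in a later step. The following is a minimal sketch, assuming the dump was written by mysqldump in its default format (table names quoted with backticks); the function name check_metastore_dump is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only if the dump file contains CREATE TABLE statements for the
# two metastore tables that are edited later (DBS and SDS).
check_metastore_dump() {
  local dump_file="$1" table
  for table in DBS SDS; do
    if ! grep -q "CREATE TABLE \`${table}\`" "$dump_file"; then
      echo "missing table: ${table}" >&2
      return 1
    fi
  done
  echo "dump looks complete: ${dump_file}"
}

# Usage (not run here because the file name depends on your export):
#   check_metastore_dump mysql_hive.sql
```

This only checks that the table definitions are present; it does not validate the row data.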

3. Import the metastore data on the new cluster (MySQL import)

mysql -u root -proot --default-character-set=utf8 hive < mysql_hive.sql
mysql -uroot -p < mysql_hive_data.sql (what I ran in my environment)

4. Upgrade the Hive metastore schema (only needed if the Hive version changes; skip this step when both clusters run the same version)
mysql -uroot -proot risk -hxxx.xxx.xxx.xxx < mysqlupgrade-0.13.0-to-0.14.0.mysql.sql
mysql -uroot -proot risk -hxxx.xxx.xxx.xxx < mysqlupgrade-0.14.0-to-1.1.0.mysql.sql
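Upgrades must be applied in ascending version order without skipping a version. For example, to go from Hive 0.12 to 0.14, upgrade to 0.13 first and then to 0.14.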
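
The ordering rule can be sketched as a helper that lists the scripts to apply, in sequence, for a given start and target version. The version list covers only the versions named in this step, the function name list_upgrade_scripts is hypothetical, and the file name pattern (SCRIPT_PREFIX plus from-to versions) is an assumption; adjust it to match the script names in your Hive distribution.

```shell
#!/usr/bin/env bash
# Ordered metastore schema versions mentioned in this step.
VERSIONS="0.12.0 0.13.0 0.14.0 1.1.0"
# Assumed file name prefix; change it if your scripts are named differently.
SCRIPT_PREFIX="upgrade-"

# Print the upgrade scripts needed to go from version $1 to version $2,
# one per line, in the order they must be applied.
list_upgrade_scripts() {
  local from="$1" to="$2" prev="" started="no" v
  for v in $VERSIONS; do
    # Once the start version has been passed, every step is required.
    if [ "$started" = "yes" ]; then
      echo "${SCRIPT_PREFIX}${prev}-to-${v}.mysql.sql"
    fi
    if [ "$v" = "$from" ]; then started="yes"; fi
    if [ "$v" = "$to" ]; then break; fi
    prev="$v"
  done
}

list_upgrade_scripts 0.12.0 0.14.0
```

Each printed script is then fed to mysql in the same way as the commands above, one after the other.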

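5. Update the cluster information in the metastore database (important)

Because the data has moved to a different cluster, the name used to access HDFS may have changed, so the DBS and SDS tables in the hive database need to be updated. If the new cluster name or HA name is identical to the old one, no change is needed.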

Log in to the MySQL database and inspect the two tables:

mysql> use hive;

mysql> select * from DBS;

mysql> select * from SDS;

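Apply the change to both tables:

update DBS set DB_LOCATION_URI = replace(DB_LOCATION_URI ,'hdfs://ns2','hdfs://adhoc') ;
update SDS set LOCATION = replace(LOCATION ,'hdfs://ns2','hdfs://adhoc') ;

In my case the change would have been from hdfs://HACluster to hdfs://HACluster_New. To keep things simple, I gave the HA of the new cluster the same name, hdfs://HACluster, so no update was required.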
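
The rewrite for both tables can be generated from a single old/new prefix pair so the two statements cannot drift apart. This is a sketch only: the function name gen_location_updates is hypothetical, and DB_LOCATION_URI is the column that holds the database location in the standard Hive metastore schema for MySQL; check the column names against your own schema before running anything.

```shell
#!/usr/bin/env bash
# Emit the UPDATE statements that rewrite an old HDFS prefix to a new one
# in the DBS and SDS metastore tables. Nothing is executed here.
gen_location_updates() {
  local old="$1" new="$2"
  echo "update DBS set DB_LOCATION_URI = replace(DB_LOCATION_URI, '${old}', '${new}');"
  echo "update SDS set LOCATION = replace(LOCATION, '${old}', '${new}');"
}

# Print the statements for the prefixes used in the example above.
gen_location_updates "hdfs://ns2" "hdfs://adhoc"
```

After reviewing the printed statements, they can be run in the mysql client against the hive database. Taking a fresh mysqldump beforehand makes the change easy to roll back.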

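6. Copy the Hive installation package, then copy core-site.xml and hdfs-site.xml into its conf directory. After that, Hive can be started as usual. (This step is usually not needed.)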


Source: https://blog.csdn.net/qq_39142369/article/details/90442900
