How to change sqoop metastore?

Submitted by 折月煮酒 on 2019-11-28 09:31:10
ravik

The answer is yes; in my case I am using PostgreSQL. I ran into this recently with Sqoop version 1.4.4. I am not sure whether what I did is the recommended way, but it works. Here are the steps I followed:

  1. In sqoop-site.xml, I configured the connect string for my database, along with the username and password.

  2. Created the following table in the database, because Sqoop was failing at this step:

    CREATE TABLE SQOOP_ROOT ( version INT, propname VARCHAR(128) NOT NULL, propval VARCHAR(256), CONSTRAINT SQOOP_ROOT_unq UNIQUE (version, propname) );

  3. Inserted the following row (this seems to be the reason your script is failing):

    INSERT INTO SQOOP_ROOT VALUES( NULL, 'sqoop.hsqldb.job.storage.version', '0' );
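The three steps above can be scripted roughly as follows. This is a minimal Python sketch, not part of the original answer: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere (with `psycopg2` the placeholders would be `%s` instead of `?`), and the property names `sqoop.metastore.client.autoconnect.url`, `.username` and `.password` are taken from the stock `sqoop-site.xml` template, so check them against your Sqoop version. The function names, JDBC URL and credentials are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Marker row from step 3 of the answer.
STORAGE_VERSION_KEY = "sqoop.hsqldb.job.storage.version"

# Table from step 2; IF NOT EXISTS makes re-running the script harmless.
CREATE_ROOT = """
CREATE TABLE IF NOT EXISTS SQOOP_ROOT (
    version INT,
    propname VARCHAR(128) NOT NULL,
    propval VARCHAR(256),
    CONSTRAINT SQOOP_ROOT_unq UNIQUE (version, propname)
)
"""


def metastore_properties(url, username, password):
    """Step 1: render the metastore connection properties as sqoop-site.xml XML."""
    root = ET.Element("configuration")
    for name, value in [
        ("sqoop.metastore.client.autoconnect.url", url),
        ("sqoop.metastore.client.autoconnect.username", username),
        ("sqoop.metastore.client.autoconnect.password", password),
    ]:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def seed_metastore(conn):
    """Steps 2 and 3: create SQOOP_ROOT and insert the version row once."""
    conn.execute(CREATE_ROOT)
    # NULLs are distinct under UNIQUE, so the constraint alone would not
    # stop a duplicate (NULL, key) row; check explicitly before inserting.
    exists = conn.execute(
        "SELECT 1 FROM SQOOP_ROOT WHERE version IS NULL AND propname = ?",
        (STORAGE_VERSION_KEY,),
    ).fetchone()
    if exists is None:
        conn.execute(
            "INSERT INTO SQOOP_ROOT VALUES (NULL, ?, '0')",
            (STORAGE_VERSION_KEY,),
        )
    conn.commit()


if __name__ == "__main__":
    # Hypothetical URL and credentials; sqlite3 stands in for PostgreSQL.
    print(metastore_properties("jdbc:postgresql://dbhost:5432/sqoop", "sqoop", "secret"))
    conn = sqlite3.connect(":memory:")
    seed_metastore(conn)
    seed_metastore(conn)  # second call is a no-op
    print(conn.execute("SELECT * FROM SQOOP_ROOT").fetchall())
```

Running `seed_metastore` twice leaves a single row. The explicit existence check matters because the `UNIQUE (version, propname)` constraint treats NULLs as distinct, so by itself it would not reject a second `(NULL, 'sqoop.hsqldb.job.storage.version', '0')` row.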

I think the correct way might be to download the source and extend org.apache.sqoop.metastore.JobStorage with your own DB implementation.

The Sqoop metastore does not support any database other than HSQLDB. See note 2 in the linked Cloudera documentation.

Public service announcement: Sqoop Metastore on other DBs may fail

We have been able to get PostgreSQL and MySQL working as targets for the Sqoop Metastore on Sqoop 1, replacing the HyperSQL database. A little setup and seeding of the database is needed, but from then on it seemed fine.

However, we are seeing problems when many Sqoop jobs run and update the metastore concurrently: Sqoop 1.4.6 has no code to trap and handle cases where the metastore update for an incremental import fails due to concurrency issues. In particular, Sqoop will complete its import successfully but not update the metastore with the most recently imported values, which causes the next incremental run to import duplicate data. Sqoop will return a non-zero return code, but the data in either Hadoop or the metastore needs to be synced afterward for the data to be correct.
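One way to spot the failure described above is to compare, after each run, the last value stored in the metastore with the highest check-column value that actually landed in Hadoop. The sketch below is a hypothetical detection helper, not a fix: it again uses `sqlite3` as a stand-in, assumes a session table named `SQOOP_SESSIONS` with `job_name`, `propname` and `propval` columns and a property named `incremental.last.value` (verify these against your own metastore schema), and assumes an integer check column as used with `--incremental append`. How you compute `max_imported_value` from the imported data is left to you; the job name `orders_incr` is made up.

```python
import sqlite3

# Assumed property name under which the job's last imported value is stored.
LAST_VALUE_KEY = "incremental.last.value"


def stored_last_value(conn, job_name):
    """Return the job's saved last value as a string, or None if absent."""
    row = conn.execute(
        "SELECT propval FROM SQOOP_SESSIONS WHERE job_name = ? AND propname = ?",
        (job_name, LAST_VALUE_KEY),
    ).fetchone()
    return None if row is None else row[0]


def metastore_is_stale(conn, job_name, max_imported_value):
    """True if the metastore lags behind the highest value actually imported.

    Assumes an integer check column (as with --incremental append);
    max_imported_value must be computed from the imported data by the caller.
    """
    stored = stored_last_value(conn, job_name)
    return stored is None or int(stored) < max_imported_value


if __name__ == "__main__":
    # Simplified stand-in for the metastore session table (names assumed).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SQOOP_SESSIONS (job_name VARCHAR(64), "
        "propname VARCHAR(128), propval VARCHAR(1024))"
    )
    conn.execute(
        "INSERT INTO SQOOP_SESSIONS VALUES ('orders_incr', ?, '100')",
        (LAST_VALUE_KEY,),
    )
    # The import reached id 150, but the concurrent metastore update was lost.
    print(metastore_is_stale(conn, "orders_incr", 150))
```

With the sample row above the check prints `True`: the metastore still says 100 although the import reached 150, so the next incremental run would re-import the rows above 100 that were already loaded.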

We're not sure there is a solution, but this is an expansion of @SandeerKumar's answer. This may also be an issue with HyperSQL, but it is much less likely there because HSQL runs in memory and is therefore faster.
