Sqoop job fails with KiteSDK validation error for Oracle import

Submitted on 2019-12-10 19:46:39

Question


I am attempting to run a Sqoop job that loads data from an Oracle database into Parquet format on a Hadoop cluster. The job is incremental.

Sqoop version is 1.4.6. Oracle version is 12c. Hadoop version is 2.6.0 (distro is Cloudera 5.5.1).

The Sqoop commands are as follows (the first creates the job, the second executes it):

$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
--create myJob \
-- import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--as-parquetfile \
--table <USERNAME>.<TABLENAME> \
--target-dir <HDFSPATH>  \
--incremental append  \
--check-column <TABLEPRIMARYKEY>

$ sqoop job --exec myJob

Error on execution:

16/02/05 11:25:30 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.ValidationException: Dataset name
05112528000000918_2088_<USERNAME>.<TABLENAME>
is not alphanumeric (plus '_')
    at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
    at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103)
    at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:66)
    at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
    at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
    at org.kitesdk.data.Datasets.create(Datasets.java:239)
    at org.kitesdk.data.Datasets.create(Datasets.java:307)
    at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
    at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:80)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:106)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:668)
    at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Troubleshooting Steps:

0) HDFS is stable, other Sqoop jobs are functional, the Oracle source DB is up, and the connection has been tested.

1) I tried creating a synonym in Oracle so that I could specify the --table option without the schema prefix:

--table TABLENAME (without the username)

This failed with an error that the table name was not correct; the --table option requires the fully qualified USERNAME.TABLENAME.

Error:

16/02/05 12:04:46 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLENAME>. Please ensure that your table name is correct.

2) I confirmed that this is a Parquet-specific issue: removing the --as-parquetfile option made the job succeed.

3) I wondered whether the incremental options were somehow the cause. Removing the --incremental append and --check-column options (while keeping --as-parquetfile) also made the job succeed, which confuses me. Both variants are sketched after step 4.

4) I tried the job with MySQL and it was successful.
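
For reference, here are the two variants from steps 2 and 3 (minimal sketches: the job names are hypothetical, and all other options are unchanged from the original command):

# Variant from step 2: --as-parquetfile removed, incremental options kept -- succeeded
$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
--create myJobNoParquet \
-- import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--table <USERNAME>.<TABLENAME> \
--target-dir <HDFSPATH> \
--incremental append \
--check-column <TABLEPRIMARYKEY>

# Variant from step 3: incremental options removed, --as-parquetfile kept -- succeeded
$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
--create myJobNoIncremental \
-- import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--as-parquetfile \
--table <USERNAME>.<TABLENAME> \
--target-dir <HDFSPATH>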

Has anyone run into something similar? Is there a way (and is it even advisable) to disable the Kite validation? It seems that the dataset name is being created with a dot ("."), which the Kite SDK then rejects - but this is an assumption on my part, as I am not too familiar with the Kite SDK.

Thanks in advance,

Jose


Answer 1:


Resolved. There seems to be a known issue with JDBC connectivity to Oracle 12c. Using the OJDBC6 driver (instead of OJDBC7) did the trick. FYI: the OJDBC jar is installed in /usr/share/java/ and a symbolic link is created in /installpath.../lib/sqoop/lib/.
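
In case it helps, the driver swap looks roughly like this (a sketch, not from the original answer: ojdbc6.jar must be obtained from Oracle separately, and /installpath.../ is elided above and depends on where Sqoop is installed):

# Put the Oracle 12c-compatible OJDBC6 driver with the other shared jars
$ sudo cp ojdbc6.jar /usr/share/java/
# Symlink it into Sqoop's lib directory (/installpath.../ as elided above)
$ sudo ln -s /usr/share/java/ojdbc6.jar /installpath.../lib/sqoop/lib/ojdbc6.jar
# If an ojdbc7 jar is already linked there, remove or replace that link first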




Answer 2:


As reported by user @Remya Senan,

breaking the parameter

--hive-table my_hive_db_name.my_hive_table_name

into separate params

--hive-database my_hive_db_name
--hive-table my_hive_table_name

did the trick for me
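
Put together, the relevant part of the import command ends up looking like this (a minimal sketch; the connection options are placeholders borrowed from the question, not part of the original answer):

$ sqoop import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--table <USERNAME>.<TABLENAME> \
--hive-import \
--hive-database my_hive_db_name \
--hive-table my_hive_table_name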


My environment was

  • Sqoop v1.4.7
  • Hive 2.3.3

    Tip: I was on emr-5.19.0




Answer 3:


I also got this error when I was using Sqoop to import all tables as Parquet files on CDH 5.8. Looking at the error message, I felt this implementation does not support directories with "-" in their names. Based on this understanding, I removed the "-" from the directory name, re-ran the sqoop import command, and everything worked fine. Hope this helps!
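
As a sketch of what I mean (hypothetical paths; the only point is the "-" in the directory name):

# Fails Kite validation: the dataset name derived from the path contains '-'
$ sqoop import-all-tables --connect <JDBCURL> --username <USERNAME> -P \
--as-parquetfile --warehouse-dir /user/hive/my-import-dir

# Works: same command with the '-' removed from the directory name
$ sqoop import-all-tables --connect <JDBCURL> --username <USERNAME> -P \
--as-parquetfile --warehouse-dir /user/hive/my_import_dir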



Source: https://stackoverflow.com/questions/35276816/sqoop-job-fails-with-kitesdk-validation-error-for-oracle-import
