Pig join fails with java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

Posted by £可爱£侵袭症+ on 2019-12-13 06:13:15

Question


I have two files. data1 contains:

1 3
1 2
5 1

data2 contains:

2 3
2 4

I then tried to read them into Pig:

d1 = LOAD 'data1';
d2 = foreach d1 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int);
d3 = LOAD 'data2' ;
d4 = foreach d3 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int);
data = join d2 by f1, d4 by f2;

Then I got:

2013-08-04 00:48:26,032 [Thread-21] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
    at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:85)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Could anybody help me? Thank you.


Answer 1:


First, I'd define a simple schema for the inputs. Based on your example, I assume that your inputs are text files.
You get the ClassCastException because just applying the schema (f1:int, f2:int) in the as clause unfortunately does not do any conversion: STRSPLIT still produces strings, and the join then fails when it expects Integer keys. You need to explicitly cast the output of STRSPLIT to (tuple(int,int)) so that flatten can generate f1:int and f2:int from it, i.e.:

-- Load each line as a single chararray so STRSPLIT receives a string.
d1 = LOAD 'data1' as (line:chararray);
-- Cast the STRSPLIT result to tuple(int,int) before flattening it.
d2 = foreach d1 generate flatten((tuple(int,int))(STRSPLIT($0, ' +')))
       as (f1:int,f2:int);

d3 = LOAD 'data2' as (line:chararray);
d4 = foreach d3 generate flatten((tuple(int,int))(STRSPLIT($0, ' +')))
       as (f1:int,f2:int);

data = join d2 by f1, d4 by f2;



Answer 2:


If you are using UDFs in your Pig script and get this casting exception, check your UDF code as well as the Pig script, and make sure the types of the values it actually returns match the types declared in @outputSchema.
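As a rough illustration of that point, here is a minimal sketch of a Python UDF whose returned values match its declared schema. The function name `split_ints` and the schema string are hypothetical and not taken from the question. The stand-in `outputSchema` decorator is an assumption added only so the sketch can be run outside Pig, where the real decorator is not defined.

```python
# Stand-in decorator (assumption): only defined when the script is run
# outside Pig, where no outputSchema decorator is available.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            # Record the declared schema on the function for inspection.
            func.outputSchema = schema
            return func
        return wrap


# Hypothetical UDF: the declared schema promises two ints, so the function
# converts the split strings with int() instead of returning them as-is.
@outputSchema("pair:tuple(f1:int, f2:int)")
def split_ints(line):
    parts = line.split()
    if len(parts) != 2:
        return None  # malformed line: return null rather than a wrong type
    try:
        return (int(parts[0]), int(parts[1]))
    except ValueError:
        return None  # non-numeric field


if __name__ == "__main__":
    print(split_ints("1 3"))
```

Returning `tuple(parts)` here instead would hand back strings under a schema that declares ints, which is the kind of mismatch this answer warns about.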



Source: https://stackoverflow.com/questions/18038760/pig-join-with-java-lang-classcastexception-java-lang-string-cannot-be-cast-to-j
