strsplit issue - Pig

甜味超标 2021-02-20 04:28

I have the following relation H1, and I want to STRSPLIT its $0 into a tuple. However, I always get an error message:

DUMP H1:
(item32;item31;,1)

m = FOREACH H1 GENERATE S         

2 Answers
  • 2021-02-20 05:17

    There is an escaping problem in Pig's parsing routines when they encounter this semicolon.

    You can use a Unicode escape sequence for the semicolon: \u003B. However, the backslash must itself be escaped (\\u003B), and the sequence must be placed in a single-quoted string. Alternatively, you can rewrite the statement over multiple lines as a nested FOREACH block, as in Neil's answer. In all cases, the delimiter must be a single-quoted string.

    H1 = LOAD 'h1.txt' as (splitme:chararray, name);
    
    A1 = FOREACH H1 GENERATE STRSPLIT(splitme,'\\u003B'); -- OK
    B1 = FOREACH H1 GENERATE STRSPLIT(splitme,';');       -- ERROR
    C1 = FOREACH H1 GENERATE STRSPLIT(splitme,':');       -- OK
    D1 = FOREACH H1 {                                     -- OK
        splitup = STRSPLIT( splitme, ';' );
        GENERATE splitup;
    }
    
    A2 = FOREACH H1 GENERATE STRSPLIT(splitme,"\\u003B"); -- ERROR
    B2 = FOREACH H1 GENERATE STRSPLIT(splitme,";");       -- ERROR
    C2 = FOREACH H1 GENERATE STRSPLIT(splitme,":");       -- ERROR
    D2 = FOREACH H1 {                                     -- ERROR
        splitup = STRSPLIT( splitme, ";" );
        GENERATE splitup;
    }
    
    DUMP H1;
    -- (item32;item31;,1)
    
    DUMP A1;
    -- ((item32,item31))
    
    DUMP C1;
    -- ((item32;item31;))
    
    DUMP D1;
    -- ((item32,item31))
    
  • 2021-02-20 05:32

    STRSPLIT on a semicolon is tricky. I got it to work by putting the call inside a nested FOREACH block.

    raw = LOAD 'cname.txt' as (name,cname_string:chararray);
    
    xx = FOREACH raw {
      cname_split = STRSPLIT(cname_string,';');
      GENERATE cname_split;
    }
    

    Funnily enough, this is how I originally implemented my STRSPLIT() call. I only ran into the same issue after trying to get it to split on a semicolon.
