How to match ',' in PIG?

和自甴很熟 posted on 2019-12-12 20:42:52

Question


The Pig script below counts how often each character occurs in a file. It works for every character except ','.

My code:

A = load 'a.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = filter B by word matches '(.+)';
D = foreach C generate flatten(TOKENIZE(REPLACE(word,'','|'), '|')) as letter;
E = group D by letter;
F = foreach E generate COUNT(D), group;
store F into 'pigfiles/wordcount';

This counts every character except ',' and produces the output below.

Input (cat a.txt):

HI, I.

Output (contents of the generated file):

1 H
2 I
1 .

The output contains no count for ','. Why is the comma not counted?


Answer 1:


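The first TOKENIZE call, which is given no delimiter argument, treats the following characters as token separators and discards them: space, double quote ("), comma (,), parentheses (()), and star (*). The comma is therefore removed before the script ever counts characters. Instead, skip that first TOKENIZE and use REPLACE to split each line into individual characters, then count them, as in the Pig script below.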
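The separator behaviour described above can be modelled outside Pig. The following is a minimal Python sketch, not Pig code. It assumes the default separator set is exactly the one listed in this answer, and the name tokenize_default is hypothetical, introduced only for illustration.

```python
import re

# Assumed default separators, taken from the list in the answer:
# space, double quote, comma, parentheses, star.
DEFAULT_DELIMS = ' ",()*'

def tokenize_default(line):
    """Hypothetical model of TOKENIZE without a delimiter argument:
    split on any run of separators and drop empty pieces."""
    pattern = "[" + re.escape(DEFAULT_DELIMS) + "]+"
    return [tok for tok in re.split(pattern, line) if tok]

tokens = tokenize_default("HI, I.")
print(tokens)  # the comma and the space do not appear in any token
```

In this model the sample line yields the tokens HI and I., so the comma is already gone before any per-character step runs.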

Input

HI, I.

PigScript

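-- Load the input; $0 is the first field of each line.
A = LOAD 'test3.txt';
-- Insert '|' around every character, then split on '|' so each character becomes one token.
B = FOREACH A GENERATE FLATTEN(TOKENIZE(REPLACE((chararray)$0,'','|'), '|')) AS letter;
-- Drop spaces so that they are not counted.
C = FILTER B BY letter != ' ';
-- Group identical characters and count each group.
D = GROUP C BY letter;
E = FOREACH D GENERATE COUNT(C.letter), group;
DUMP E;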
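To see what this pipeline computes, the same steps can be modelled in Python. This is a sketch under assumptions, not Pig output: it assumes the input line contains no '|' character (in the Pig script a literal '|' would act as a separator), and the name count_characters is hypothetical.

```python
from collections import Counter

def count_characters(line):
    """Hypothetical Python model of relations B through E in the script."""
    # B: one token per character (the effect of REPLACE plus TOKENIZE on '|').
    letters = list(line)
    # C: drop spaces.
    letters = [ch for ch in letters if ch != " "]
    # D and E: group identical characters and count each group.
    return Counter(letters)

counts = count_characters("HI, I.")
for letter, n in counts.items():
    print(n, letter)
```

For the sample line this model counts the comma once, alongside H, I, and the period.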

Output



Source: https://stackoverflow.com/questions/36930492/how-to-match-in-pig
