Not able to split chararray field containing spaces and tabs between the words. Help me with the command using Apache Pig?

Posted by 末鹿安然 on 2019-12-13 03:07:11

Question


Sample.txt File

2017-01-01 10:21:59 THURSDAY    -39 3 Pick up a bus - Travel for two hours
2017-02-01 12:45:19 FRIDAY  -55 8 Pick up a train - Travel for one hour
2017-03-01 11:35:49 SUNDAY  -55 8 Pick up a train - Travel for one hour
...

When I ran the suggested command, each line was split into three fields.

But when I run the operations below, the split does not work as expected.

A = LOAD 'Sample.txt' USING PigStorage() as (line:chararray);
B = foreach A generate STRSPLIT(line, ' ', 3);
C = foreach B generate $2;
split C into buslog if $0 matches '.*bus*.', trainlog if $0 matches '.*train*.';

Note: a DUMP of C gives the result below.

THURSDAY    -39 3 Pick up a bus - Travel for two hours
FRIDAY  -55 8 Pick up a train - Travel for one hour
SUNDAY  -55 8 Pick up a train - Travel for one hour

Requirement: from the result above, I want to split the train rows and the bus rows into two relations, but this is not working as expected.


Answer 1:


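The pattern should be .*string.*, with .* on both sides of the string. The matches operator uses Java regular expressions and succeeds only when the pattern covers the entire value. The pattern '.*bus*.' means "anything, then 'bu', then zero or more 's', then exactly one more character", so it matches only values that end one character after "bu" or "bus". None of the rows end that way, so both conditions in the question are false for every row.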

split C into buslog if $0 matches '.*bus.*', trainlog if $0 matches '.*train.*';
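The difference between the two patterns can be checked outside Pig. The following is a minimal sketch in plain Java, assuming that Pig's matches behaves like Java's whole-string Pattern.matches; the class name MatchesDemo and the helper fullMatch are made up for this illustration and are not part of Pig.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // Whole-string match: true only if the regex covers the entire text.
    // Assumption: this mirrors how Pig's `matches` operator evaluates.
    static boolean fullMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Two rows from the DUMP of C shown in the question.
        String bus = "THURSDAY    -39 3 Pick up a bus - Travel for two hours";
        String train = "FRIDAY  -55 8 Pick up a train - Travel for one hour";

        // Pattern from the question: at most one character may follow "bus".
        System.out.println(fullMatch(".*bus*.", bus));     // false
        // Corrected pattern: any text may follow "bus".
        System.out.println(fullMatch(".*bus.*", bus));     // true

        System.out.println(fullMatch(".*train*.", train)); // false
        System.out.println(fullMatch(".*train.*", train)); // true

        // The corrected bus pattern does not pick up train rows.
        System.out.println(fullMatch(".*bus.*", train));   // false
    }
}
```

With the corrected patterns each row satisfies exactly one condition, which is what the SPLIT statement needs to route bus rows to buslog and train rows to trainlog.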


Source: https://stackoverflow.com/questions/47071395/not-able-to-split-chararray-field-containing-spaces-and-tabs-between-the-words
