Pig REGEX_EXTRACT_ALL function -> no results

Submitted by 北城余情 on 2019-12-24 08:16:30

Question


I have been stuck on an issue for several hours. I have a .csv file with JSON strings inside: every column contains a string holding several JSON objects. I loaded several columns with PigStorage, which worked so far. Then I tried to extract the JSON objects, which have the following form:

[{"tmestmp":"2014-05-14T07:01:00","Value":0,"Quality":1},{"tmestmp":"2014-05-14T07:01:00.02","Value":10,"Quality":4},{"tmestmp":"2014-05-14T07:01:00.04","Value":17,"Quality":9},{"tmestmp":"2014-05-14T07:01:00.06","Value":75,"Quality":6},{"tmestmp":"2014-05-14T07:01:00.08","Value":63,"Quality":9}];

This is one column.

The REGEX_EXTRACT_ALL function does not work with the following lines of code. Does anyone have an idea why? I always receive empty results. Here is my code:

 A = LOAD '/user/hue/test.csv' USING PigStorage(';') AS (timestamp, mv1, mv2, mv3, mv4, mv5); -- using five columns
 B = FOREACH A GENERATE mv1, mv2, mv3, mv4, mv5; -- removing the timestamp in the first column, not needed anymore
 C = FOREACH B GENERATE REGEX_EXTRACT_ALL($0, '(\\{[^{]*\\})') AS (T:tuple(r1,r2,r3,r4,r5));

If I use only one column instead of $0, it does not work either.

Any help or explanation is very welcome.

Cheers, Joe


Answer 1:


There is a JsonLoader() for reading JSON-formatted input. You can use JsonLoader() instead of the regex, and it is very easy to use. See http://joshualande.com/read-write-json-apache-pig/ for more info.
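
A note on why the regex in the question probably returns nothing: as far as I know, REGEX_EXTRACT_ALL is built on java.util.regex and only returns groups when the pattern matches the entire input string, not when it merely occurs somewhere inside it. The column starts with `[` and ends with `];`, so a pattern describing a single `{...}` object cannot cover the whole value. The Java sketch below illustrates the difference between full-match and search semantics with the same pattern. The class and method names are made up for this illustration, the sample string is shortened to two objects, and this is not Pig's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only illustrates matches() versus find().
final class JsonObjectRegexDemo {
    // Same pattern as in the question, written as a Java string literal.
    static final Pattern OBJECT = Pattern.compile("(\\{[^{]*\\})");

    // Full-match semantics: true only if the pattern covers the entire input.
    static boolean matchesWholeInput(String input) {
        return OBJECT.matcher(input).matches();
    }

    // Search semantics: collect every substring that matches the pattern.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = OBJECT.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened version of one column from the question (two objects).
        String column =
            "[{\"tmestmp\":\"2014-05-14T07:01:00\",\"Value\":0,\"Quality\":1},"
          + "{\"tmestmp\":\"2014-05-14T07:01:00.02\",\"Value\":10,\"Quality\":4}];";

        System.out.println(matchesWholeInput(column)); // prints false
        System.out.println(findAll(column).size());    // prints 2
    }
}
```

If that assumption about full-match semantics holds, a pattern passed to REGEX_EXTRACT_ALL would have to span the whole column value, including the surrounding `[` and `];`, with one capture group per object. Parsing the JSON instead of matching it with a regex, as suggested above, avoids that fragility.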



Source: https://stackoverflow.com/questions/24774574/pig-regex-extract-all-function-no-results
