Looking up variable keys in a Pig map

Submitted by 核能气质少年 on 2020-01-24 20:36:08

Question


I'm trying to use Pig to break text into lowercased words, and then look up each word in a map. Here's my example map, which I have in mapping.txt (it is only 1 line long):

[this#1.9,is#2.5,my#3.3,vocabulary#4.1]

I load this like so:

M = LOAD 'mapping.txt' USING PigStorage AS (mp: map[float]);

which works just fine. Then I do the following to load the text and break it into lowercased words:

LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) as (word:chararray);

Now, I'd like to do something like this:

RESULTS = FOREACH TOKENS GENERATE M.mp#word;

so that if I have a line like "this my my vocabulary", I'd get the following output: 1.9 3.3 3.3 4.1. Instead, I keep getting various errors. How can I look up variable keys in a map?

I've looked at "How can I use the map datatype in Apache Pig?" and http://pig.apache.org/docs/r0.10.0/basic.html#map-schema , but these only help when looking up a fixed key in a map, for example M.mp#'this', which is not what I want to do here.


Answer 1:


You can also FLATTEN M into (word, value) rows and then JOIN that relation with TOKENS on the word. You can make it a 'replicated' join on M, so that the small mapping relation is copied to each mapper.
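The join approach can be sketched roughly as follows. This is only a sketch under one assumption: instead of flattening the map (whether FLATTEN accepts a map directly depends on the Pig version), the same vocabulary is stored as a two-column file. The file name vocab.tsv and the relation names VOCAB, JOINED and RESULTS are hypothetical and do not come from the question.

```pig
-- Assumption: vocab.tsv is a hypothetical tab-separated file with one
-- "word<TAB>score" pair per line, holding the same data as the map.
VOCAB = LOAD 'vocab.tsv' USING PigStorage('\t') AS (word:chararray, score:float);

-- Same tokenization as in the question.
LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- Replicated join: the small relation is listed last and is held in memory on each mapper.
JOINED = JOIN TOKENS BY word, VOCAB BY word USING 'replicated';

-- Keep the word next to its looked-up value.
RESULTS = FOREACH JOINED GENERATE TOKENS::word AS word, VOCAB::score AS score;
DUMP RESULTS;
```

Because this is an inner join, tokens that are missing from the vocabulary are dropped; a LEFT OUTER join would keep them with a null score. The output also comes back as one row per token rather than one line per input line, and the original word order is not guaranteed.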



Source: https://stackoverflow.com/questions/15372657/looking-up-variable-keys-in-pig-map
