Question
I have some problems understanding how maps should be used in Pig.
Following this tutorial, I created a file containing the following text:
[open#apache]
[apache#hadoop]
Then I was able to load that file without errors:
a = load 'data/file_name.txt' as (M:map[]);
Now, how can I get the list of all the values, i.e.:
(apache)
(hadoop)
Also, I have just started to learn Pig, so any hints would be very helpful.
Answer 1:
There is only one way to interact with a map, and that is to use the # operator. To get more functionality out of it, you'll have to define some UDFs. Therefore, in pure Pig, the only way a map can really be used is like this:
B = FOREACH A GENERATE M#'open' ;
Which produces this as output:
(apache)
()
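One rough way to picture this is a plain-Python model. It illustrates the semantics and is not Pig code. Each row's map is treated as a dict, and the # lookup behaves like dict.get, which returns None (standing in for Pig's NULL) when the key is missing. The names rows and deref are made up for the illustration.

```python
# Plain-Python model (not Pig code) of: B = FOREACH A GENERATE M#'open';
# Each input row carries one map field M, modelled here as a dict.
rows = [
    {"open": "apache"},    # from the line [open#apache]
    {"apache": "hadoop"},  # from the line [apache#hadoop]
]


def deref(relation, key):
    """FOREACH emits exactly one output tuple per input tuple, so a row whose
    map lacks the key still produces a tuple, holding None (Pig's NULL)."""
    return [(m.get(key),) for m in relation]


print(deref(rows, "open"))  # [('apache',), (None,)]
```

The None in the second tuple corresponds to the empty () that Pig prints for the second line.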
Note that the key after the # is a quoted string constant. It cannot change and must be set before you run the job.
Also, notice that it creates a NULL for the second line, because that map does not contain the key 'open'. This is slightly different from using FILTER on a schema of two chararrays, key and value:
B = FILTER A BY key=='open' ;
Which produces the output:
(open,apache)
If only the value is desired, then it can be done simply by:
B = FOREACH (FILTER A BY key=='open') GENERATE value ;
Which produces:
(apache)
If keeping the NULLs is important, they can also be generated by using a bincond:
B = FOREACH A GENERATE (key=='open'?value:NULL) ;
Which produces the same output as M#'open'.
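The three approaches differ only in which rows survive. The sketch below is a plain-Python model of the key/value relation and is not Pig code. It assumes A holds one (key, value) tuple per line of the sample file, and the function names are illustrative only.

```python
# Plain-Python model (not Pig code) of a relation A with schema
# (key:chararray, value:chararray): one tuple per [key#value] line.
A = [("open", "apache"), ("apache", "hadoop")]


def filter_only(rel, wanted):
    """Model of B = FILTER A BY key=='open'; non-matching tuples are dropped."""
    return [(key, value) for key, value in rel if key == wanted]


def filter_then_project(rel, wanted):
    """Model of B = FOREACH (FILTER A BY key=='open') GENERATE value;"""
    return [(value,) for key, value in rel if key == wanted]


def bincond_project(rel, wanted):
    """Model of B = FOREACH A GENERATE (key=='open'?value:NULL);
    every tuple survives, and non-matching ones carry None (Pig's NULL)."""
    return [(value if key == wanted else None,) for key, value in rel]


print(filter_only(A, "open"))          # [('open', 'apache')]
print(filter_then_project(A, "open"))  # [('apache',)]
print(bincond_project(A, "open"))      # [('apache',), (None,)]
```

Only the bincond version keeps one output row per input row, which is why it matches the M#'open' result.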
In my experience, maps are not very useful because of how restrictive they are.
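Getting all the values of a map, as the question asks, needs a UDF. Below is a minimal sketch of one, assuming Pig's Jython UDF support. In that setup a map reaches the function as a Python dict, and a bag is returned as a list of tuples. The file name map_udfs.py, the function map_values and the alias map_udfs are all hypothetical. Loading the map as map[chararray] is also an assumption, made so that the values arrive as strings.

```python
# map_udfs.py (hypothetical file name): sketch of a Pig Jython UDF.
# Possible use from Pig (names are illustrative):
#   REGISTER 'map_udfs.py' USING jython AS map_udfs;
#   a = LOAD 'data/file_name.txt' AS (M:map[chararray]);
#   b = FOREACH a GENERATE FLATTEN(map_udfs.map_values(M));
#   DUMP b;

try:
    outputSchema  # provided by Pig's Jython runtime when the script is registered
except NameError:
    # Fallback no-op decorator so the function can be tested with plain Python.
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


@outputSchema("vals:bag{t:tuple(value:chararray)}")
def map_values(m):
    """Return every value of a Pig map as a bag of single-field tuples.

    A NULL map arrives as None and is passed back as NULL.
    """
    if m is None:
        return None
    return [(v,) for v in m.values()]
```

FLATTEN then turns each bag element into its own output row, giving one row per map value.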
Source: https://stackoverflow.com/questions/17772308/understanding-map-syntax