Understanding map syntax

Question

I'm having trouble understanding how the map type should be used.

Following this tutorial, I created a file containing the following text:

[open#apache]
[apache#hadoop]

Then I was able to load that file without errors:

a = load 'data/file_name.txt' as (M:map []);

Now, how can I get the list of all the "values"? That is:

(apache)
(hadoop)

Furthermore, I have just started learning Pig, so any hints would be very helpful.

Answer 1:

There is only one way to interact with a map, and that is to use the # operator. For more functionality, you'll have to define some UDFs. Therefore, the only way a map can really be used in pure Pig is like this:

B = FOREACH A GENERATE M#'open' ;

Which produces this as output:

(apache)
()

Note that the value after the # is a quoted string; it cannot change and must be set before you run the job.

Also, notice that it creates a NULL for the second line, because that map does not contain a key with the value 'open'. This is slightly different from using FILTER on a schema of two chararrays, key and value:

B = FILTER A BY key=='open' ;

Which produces the output:

(open,apache)

If only the value is desired, then it can be done simply by:

B = FOREACH (FILTER A BY key=='open') GENERATE value ;

Which produces:

(apache)

If keeping the NULLs is important, they can also be generated by using a bincond:

B = FOREACH A GENERATE (key=='open'?value:NULL) ;

Which produces the same output as M#'open'.

In my experience, maps are not very useful because of how restrictive they are.
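
For the original question of listing every value regardless of key, here is one possible sketch. It assumes a Pig release that ships VALUELIST as a built-in function; that function is not mentioned in the answer above, so check its availability against your version. The aliases A and B are just illustrative names.

```pig
-- Load each line as a single untyped map, as in the question.
A = LOAD 'data/file_name.txt' AS (M:map[]);

-- Assumption: VALUELIST is available as a built-in and returns a bag
-- holding the values of the map. FLATTEN then emits one record per value.
B = FOREACH A GENERATE FLATTEN(VALUELIST(M));

DUMP B;
```

Unlike M#'open', this does not require knowing the key in advance, so it is intended to emit one record for each value in each map.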

Source: https://stackoverflow.com/questions/17772308/understanding-map-syntax

Tags

apache-pig