Completely confused about MapReduce in Riak + Erlang's riakc client

Question

The main thing I'm confused about here (I think) is what the arguments to the qfun are supposed to be and what the return value should be. The README basically doesn't say anything about this and the example it gives throws away the second and third args.

Right now I'm only trying to understand the arguments and not using Riak for anything practical. Eventually I'll be trying to rebuild our (slow, MySQL-based) financial reporting system with it. So ignoring the pointlessness of my goal here, why does the following give me a badfun exception?

The data is just tuples (pairs) of Names and Ages, with the keys being the name. I'm not doing any conversion to JSON or such before inserting the data from the Erlang console.

Now with some {Name, Age} pairs stored in <<"people">> I want to use MapReduce (for no other reason than to understand "how") to get the values back out, unchanged in this first use.

riakc_pb_socket:mapred(
    Pid, <<"people">>,
    [{map, {qfun, fun(Obj, _, _) -> [Obj] end}, none, true}]).

This just gives me a badfun, however:

{error,<<"{\"phase\":0,\"error\":\"{badfun,#Fun<erl_eval.18.17052888>}\",\"input\":\"{ok,{r_object,<<\\\"people\\\">>,<<\\\"elaine\\\">"...>>}

How do I just pass the data through my map function unchanged? Is there any better documentation of the Erlang client than what is in the README? That README seems to assume you already know what the inputs are.

Answer 1:

There are 2 Riak Erlang clients that serve different purposes.

The first one is the internal Riak client that is included in the riak_kv module (riak_client.erl and riak_object.erl). This can be used if you are attached to the Riak console or if you are writing a MapReduce function or a commit hook. As it is run from within a Riak node it works quite well with qfuns.

The other client is the official Riak client for Erlang that is used by external applications and connects to Riak through the protocol buffers interface. This is what you are using in your example above. As this connects through protocol buffers, it is usually recommended that MapReduce functions in Erlang are compiled and deployed on the nodes of the cluster as named functions. This will also make them accessible from other client libraries.
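As a rough sketch of the named-function approach: Riak calls a map function with three arguments (the stored object or `{error, notfound}`, the key data supplied with the input, and the static argument from the phase definition) and expects a list back. The module and function names below (`people_mapreduce`, `map_value`) are hypothetical. `riak_object:get_value/1` only exists on a Riak node, so the compiled `.beam` file is assumed to be on each node's code path.

```erlang
-module(people_mapreduce).
-export([map_value/3]).

%% Map phase function. Arguments: the object, the key data, and the
%% static phase argument. A map function must return a list.
map_value({error, notfound}, _KeyData, _Arg) ->
    %% Missing keys contribute nothing to the result.
    [];
map_value(Obj, _KeyData, _Arg) ->
    %% riak_object is part of riak_kv, so this clause only works
    %% when the module runs inside a Riak node.
    [riak_object:get_value(Obj)].

%% Assumed client-side usage (same call shape as in the question,
%% with modfun instead of qfun):
%%   riakc_pb_socket:mapred(Pid, <<"people">>,
%%       [{map, {modfun, people_mapreduce, map_value}, none, true}]).
```

For a plain pass-through of the stored values, the built-in `riak_kv_mapreduce:map_object_value` can be referenced the same way via `{modfun, riak_kv_mapreduce, map_object_value}` without deploying anything.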

Answer 2:

I think my code is actually correct and my problem lies in the fact that I'm trying to execute it from the shell. The code needs to be compiled before it can run in Riak. This is a limitation of the Erlang shell and the way it evaluates funs.
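A small way to see the difference, as a sketch: `erlang:fun_info/2` reports which module a fun belongs to. The module name `fun_origin` and its functions are hypothetical and only for illustration.

```erlang
-module(fun_origin).
-export([origin/1, compiled_fun/0]).

%% Returns the module that a fun belongs to.
origin(F) when is_function(F) ->
    {module, Mod} = erlang:fun_info(F, module),
    Mod.

%% The same map fun as in the question, but defined in compiled code.
compiled_fun() ->
    fun(Obj, _KeyData, _Arg) -> [Obj] end.
```

Calling `fun_origin:origin(fun(Obj, _, _) -> [Obj] end)` from the shell returns `erl_eval`, which matches the `#Fun<erl_eval...>` in the error above, while `fun_origin:origin(fun_origin:compiled_fun())` returns `fun_origin`. A compiled fun still refers to its defining module, so that module is assumed to be loaded on the Riak nodes as well.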

Answer 3:

After a few days of playing around, here's a neat trick that makes development easier. Use Erlang's RPC support and runtime code loading to distribute your code across all the Riak nodes:

%% Call this somewhere during your app's initialization routine.
%% Assumes you have a list of available Riak nodes in your app's env.
load_mapreduce_in_riak() ->
  load_mapreduce_in_riak(application:get_env(app_name, riak_nodes, [])).

load_mapreduce_in_riak([]) ->
  ok;
load_mapreduce_in_riak([{Node, Cookie}|Tail]) ->
  erlang:set_cookie(Node, Cookie),
  case net_adm:ping(Node) of
    pong ->
      {Mod, Bin, Path} = code:get_object_code(app_name_mapreduce),
      rpc:call(Node, code, load_binary, [Mod, Path, Bin]);
    pang ->
      io:format("Riak node ~p down! (ping <-> pang)~n", [Node])
  end,
  load_mapreduce_in_riak(Tail).

Now you can refer to any of the functions in the module app_name_mapreduce and they'll be visible to the Riak cluster. The code can be removed again with code:delete/1, if needed.
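For illustration, a minimal sketch of what `app_name_mapreduce` might contain. The function names `map_count` and `reduce_total` are hypothetical and not part of the original answer; they count the objects in a bucket.

```erlang
-module(app_name_mapreduce).
-export([map_count/3, reduce_total/2]).

%% Emit 1 for every object that exists, nothing for missing keys.
map_count({error, notfound}, _KeyData, _Arg) -> [];
map_count(_Obj, _KeyData, _Arg) -> [1].

%% Sum partial counts. Reduce functions may be re-run on their own
%% output, so the result is again a list of numbers.
reduce_total(Values, _Arg) ->
    [lists:sum(Values)].

%% Assumed client-side usage once the module is loaded on the nodes:
%%   riakc_pb_socket:mapred(Pid, <<"people">>,
%%       [{map,    {modfun, app_name_mapreduce, map_count},    none, false},
%%        {reduce, {modfun, app_name_mapreduce, reduce_total}, none, true}]).
```

Neither function touches `riak_object`, so the module can also be exercised locally before it is pushed to the cluster.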

Source: https://stackoverflow.com/questions/15950139/completely-confused-about-mapreduce-in-riak-erlangs-riakc-client

Tags

erlang

riak