How to efficiently read thousand of lines from STDIN in Erlang?

混江龙づ霸主 提交于 2019-12-11 12:46:10

问题


I've stumbled upon an issue when reading thousands of lines from STDIN. This would have been an imaginary edge case until I found out that some tests for this problem require reading thousand of lines from STDIN. At first I thought that my algorithms were not optimal, and only by accident I've found out that only reading lines without any computations could make half of the test time out.

Here is part code that times out:

process_queries(0, _) -> ok;
process_queries(N, A) -> 
    case io:fread("", "~s~d~d") of
        {ok, _} -> process_queries(N - 1, A)
        %{ok, ["Q", L, R]} -> process_queries(N - 1, apply_q(L, R, A));
        %{ok, ["U", Idx, Val]} -> process_queries(N - 1, apply_u(Idx, Val, A))
    end
.

I deliberately left comments to show that all the computations were disabled. So this code timed out given N=7984.

Is there a better way of reading and processing thousands lines from STDIN in Erlang?

  • io:get_line gets only one line a time.
  • io:get_chars requires you to know how many character to get.

回答1:


I'd suggest switching stdio to binary and then using io:get_line. Your data's format is pretty simple to parse by splitting on whitespace and converting two values to integers. The following code runs ~10 times faster than your code for me in a simple benchmark. I used escript to benchmark, which means it's highly likely that the difference is actually more than 10 times since escript parses and compiles the code on the fly.

process_queries_2(0, _) -> ok;
process_queries_2(N, A) -> 
    Line = io:get_line(""),
    [X, Y0, Z0, _] = binary:split(Line, [<<$\s>>, <<$\n>>], [global]),
    Y = binary_to_integer(Y0),
    Z = binary_to_integer(Z0),
    % use X, Y, Z
    process_queries_2(N - 1, A).

Here's the code I used to benchmark:

main(["1"]) ->
  ok = io:setopts(standard_io, [binary]),
  process_queries(10000, {});
main(["2"]) ->
  ok = io:setopts(standard_io, [binary]),
  process_queries_2(10000, {}).%
$ time yes 'Q 100 200' | escript a.erl 1
yes 'Q 100 200'  4.64s user 0.11s system 93% cpu 5.063 total
escript a.erl 1  4.67s user 1.44s system 120% cpu 5.062 total
$ time yes 'Q 100 200' | escript a.erl 2
yes 'Q 100 200'  0.36s user 0.01s system 77% cpu 0.466 total
escript a.erl 2  0.40s user 0.10s system 106% cpu 0.464 total

The reason for the speedup is that Erlang Strings are linked lists, which are very inefficient both for CPU time and Memory usage compared to binaries, which is a sequential chunk of memory.




回答2:


There is an excerpt from my solution. There are few tricks how to do it really efficient.

read_command(CP) ->
    {ok, Line} = file:read_line(standard_io),
    [C, A, B] = binary:split(Line, CP, [global, trim_all]),
    {case C of <<"Q">> -> 'Q'; <<"U">> -> 'U' end,
     binary_to_integer(A),
     binary_to_integer(B)}.

read_commands(N, CP) ->
    [ read_command(CP) || _ <- lists:seq(1, N) ].

execute(Array, L) ->
    lists:foldl(fun({'Q', F, T}, A) ->
                        {Val, A2} = query(A, F, T),
                        file:write(standard_io, [integer_to_binary(Val), $\n]),
                        A2;
                   ({'U', I, V}, A) ->
                        update(A, I, V)
                end, Array, L).

read_int_line(CP) ->
    {ok, Line} = file:read_line(standard_io),
    [binary_to_integer(X) || X <- binary:split(Line, CP, [global, trim_all])].

main() ->
    ok = io:setopts([binary]),
    CP = binary:compile_pattern([<<" ">>, <<$\n>>]),
    [N] = read_int_line(CP),
    L = read_int_line(CP),
    N = length(L),
    [K] = read_int_line(CP),
    execute(init(L), read_commands(K, CP)).

You have to write your own init/1, update/3 and query/3 of course.



来源:https://stackoverflow.com/questions/46493207/how-to-efficiently-read-thousand-of-lines-from-stdin-in-erlang

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!