What's the best way to count unique visitors with Hadoop?

無奈伤痛 2021-01-02 09:33

Hey all, I'm just getting started with Hadoop and curious what the best way in MapReduce would be to count unique visitors if your log files looked like this...

DAT         


        
4 answers
  •  情深已故
    2021-01-02 10:33

    Use a secondary sort so that, within each site ID, the records reach the reducer sorted by user ID. That way you don't need to hold anything in memory: just stream the data through and increment your distinct counter every time the user ID changes for a given site ID.

    Here is some documentation.
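
A minimal sketch of the streaming count described in this answer, in plain Python rather than against the Hadoop API. It assumes each record is a `(site_id, user_id)` pair, which is a guess because the sample log in the question did not survive. The call to `sorted()` only stands in for the ordering a secondary sort would provide, and the name `count_unique_visitors` and the sample records are hypothetical.

```python
# Sentinel meaning "no user seen yet for this site"; safe even if a user ID is None.
_UNSET = object()


def count_unique_visitors(sorted_records):
    """Yield (site_id, distinct_user_count) pairs.

    Assumes the input is already sorted by (site_id, user_id), as it would
    be after a secondary sort. Only the previous key is kept in memory.
    """
    current_site = _UNSET
    previous_user = _UNSET
    distinct = 0

    for site_id, user_id in sorted_records:
        if site_id != current_site:
            # A new site begins: emit the finished site and reset the state.
            if current_site is not _UNSET:
                yield current_site, distinct
            current_site = site_id
            previous_user = _UNSET
            distinct = 0
        if user_id != previous_user:
            # The user ID changed, so this is a visitor not yet counted.
            distinct += 1
            previous_user = user_id

    # Emit the last site, if there was any input at all.
    if current_site is not _UNSET:
        yield current_site, distinct


# Hypothetical sample data: (site_id, user_id) pairs in arbitrary order.
records = [
    ("siteA", "u1"),
    ("siteB", "u2"),
    ("siteA", "u1"),
    ("siteA", "u3"),
    ("siteB", "u2"),
]

# sorted() stands in for the shuffle phase with a secondary sort.
for site, count in count_unique_visitors(sorted(records)):
    print(site, count)
```

In an actual job the framework would deliver the sorted stream, so probably only the two comparisons inside the loop would carry over to the reducer.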
