What's the best way to count unique visitors with Hadoop?

無奈伤痛 2021-01-02 09:33

Hey all, I'm just getting started with Hadoop and am curious what the best way in MapReduce would be to count unique visitors if your log files looked like this...

DAT         


        
4 Answers
  •  佛祖请我去吃肉
    2021-01-02 10:21

    For many simple tasks like this, it is often quicker to use HiveQL than to write MapReduce by hand. Hive translates your queries into Hadoop MapReduce jobs. In this case you could use:

    SELECT COUNT(DISTINCT username) FROM logviews
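
If you want to see roughly what a distinct count looks like as a MapReduce job, here is a minimal sketch in plain Python that simulates the map, shuffle, and reduce steps in memory. It is not Hadoop code and not necessarily the plan Hive generates. The log sample in the question is cut off, so the tab-separated layout, the position of the username field, and all function names below are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(lines, user_field=1, sep="\t"):
    """Map step: emit (username, 1) for every well-formed log line.

    ASSUMPTION: lines are tab-separated and the username is the second
    field. Adjust user_field and sep to match your real log format.
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > user_field and fields[user_field]:
            yield fields[user_field], 1


def shuffle(pairs):
    """Shuffle step: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: each username arrives as exactly one key, so counting keys counts unique visitors."""
    return sum(1 for _ in groups)


def count_unique_visitors(lines, user_field=1, sep="\t"):
    return reduce_phase(shuffle(map_phase(lines, user_field, sep)))


# Hypothetical sample data: date, username, page
sample = [
    "2021-01-01\talice\t/home",
    "2021-01-01\tbob\t/home",
    "2021-01-02\talice\t/about",
]
print(count_unique_visitors(sample))  # prints 2
```

On a real cluster the last step is the awkward part: the deduplicated keys are spread over several reducers, so you would typically need either a single reducer or a small second job to add up the per-reducer counts.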
    

    You may find a more advanced example here: http://www.dataminelab.com/blog/calculating-unique-visitors-in-hadoop-and-hive/
