What's the best way to count unique visitors with Hadoop?

無奈伤痛 2021-01-02 09:33

Hey all, I'm just getting started with Hadoop and am curious what the best way in MapReduce would be to count unique visitors if your log files looked like this...

DAT         


        
4 answers
  •  慢半拍i (original poster)
    2021-01-02 10:31

    My approach is similar to what tzaman gave, with a small twist:

    1. map output : (username, siteid) => ("")
    2. reduce output: (siteid) => (1)
    3. map : identity mapper
    4. reduce : LongSumReducer (i.e. simply sum the values)

    Note that the first reduce does not need to iterate over any of the records it is presented with. Each distinct (username, siteid) key reaches the reducer exactly once, so you can simply examine the key and produce the output.
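To make the two-job pipeline above concrete, here is a minimal in-memory sketch in plain Python. It is not Hadoop code: `run_job` is a hypothetical stand-in for the shuffle phase, and since the log sample in the question is cut off, the records are assumed to be already parsed into `(username, siteid)` tuples. The function names (`map1`, `reduce1`, `map2`, `reduce2`) are made up for illustration.

```python
from collections import defaultdict


def map1(record):
    # Assumption: each record is an already-parsed (username, siteid) tuple.
    username, siteid = record
    # The composite key carries all the information; the value is empty.
    yield (username, siteid), ""


def reduce1(key, values):
    # Called once per distinct (username, siteid) pair, so the values
    # never need to be read: emit one visitor for this site.
    _username, siteid = key
    yield siteid, 1


def map2(key, value):
    # Identity mapper for the second job.
    yield key, value


def reduce2(siteid, counts):
    # Plays the role of a summing reducer: add up the 1s per site.
    yield siteid, sum(counts)


def run_job(pairs, reducer):
    # Hypothetical stand-in for the shuffle: group values by key,
    # then call the reducer once per key in sorted key order.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


def count_unique_visitors(records):
    stage1_in = [kv for record in records for kv in map1(record)]
    stage1_out = run_job(stage1_in, reduce1)
    stage2_in = [kv for key, value in stage1_out for kv in map2(key, value)]
    return dict(run_job(stage2_in, reduce2))


records = [
    ("alice", "site1"),
    ("alice", "site1"),  # repeat visit, must not be counted twice
    ("bob", "site1"),
    ("alice", "site2"),
]
print(count_unique_visitors(records))  # {'site1': 2, 'site2': 1}
```

The deduplication happens entirely in the first shuffle: repeat visits collapse into a single key, which is why `reduce1` can ignore its values.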

    HTH
