Hey all, I'm just getting started with Hadoop and am curious what the best way to count unique visitors in MapReduce would be if your log files looked like this...
DAT
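For many simple tasks like this it is often faster to write a HiveQL query than a MapReduce job by hand. Hive translates your queries into Hadoop MapReduce jobs for you. In this case, assuming the logs are loaded into a table called logviews with a username column, you could use: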
SELECT COUNT(DISTINCT username) FROM logviews
You may find a more advanced example here: http://www.dataminelab.com/blog/calculating-unique-visitors-in-hadoop-and-hive/
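If you would still like to see what the plain MapReduce version looks like, here is a rough sketch in Python that imitates the map, shuffle/sort and reduce steps in a single process. It is not Hadoop code and not what Hive generates internally. Since the sample log above is cut off, it assumes the username is the first whitespace-separated field on each line; the function names and the sample lines are made up for illustration.

```python
from itertools import groupby


def map_phase(lines):
    """Mapper: emit one (username, 1) pair per non-empty log line."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            # ASSUMPTION: the username is the first field of the line.
            yield fields[0], 1


def reduce_phase(pairs):
    """Reducer: count how many distinct keys arrive.

    sorted() stands in for Hadoop's shuffle/sort, which delivers all
    pairs with the same key to the reducer together.
    """
    shuffled = sorted(pairs)
    return sum(1 for _key, _group in groupby(shuffled, key=lambda kv: kv[0]))


def count_unique_visitors(lines):
    """Run the map and reduce steps over an iterable of log lines."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    # Hypothetical log lines, only for demonstration.
    sample = ["alice /home", "bob /home", "alice /about"]
    print(count_unique_visitors(sample))
```

In a real job with several reducers, each username still goes to exactly one reducer, so you would add up the per-reducer counts to get the total.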