Understanding algorithms for measuring trends

醉梦人生 2021-01-30 02:17

What's the rationale behind the formula used in the hive_trend_mapper.py program of this Hadoop tutorial on calculating Wikipedia trends?

There are actuall

4 Answers
  •  予麋鹿 (OP)
    2021-01-30 02:43

    Another way to look at it:

    Suppose your page and my page were created on the same day, and up to some point your page has about ten million total views and mine about one million. Now suppose the slope at that point is one million views for me and 0.5 million for you. If you use the slope alone, my page wins. But your page was already getting more views per day at that point: 5 million for yours versus 1 million for mine. So my extra million only brings me to 2 million for that day, while yours reaches 5.5 million.

    Maybe the scaling is meant to adjust the results so that your page also shows up as a trend setter: its slope is smaller, but it was already more popular. Since the scaling is only a logarithmic factor, though, it doesn't seem too problematic to me.
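    A minimal sketch of the comparison above, using this answer's numbers. It assumes the trend score is the day-over-day slope multiplied by log(1 + total views); that exact form is an assumption standing in for "a log factor", not a quote of hive_trend_mapper.py. The page dictionaries and function names are hypothetical.

```python
import math


def slope_only(y1: float, y2: float) -> float:
    """Day-over-day change in page views (y2 is today, y1 is yesterday)."""
    return y2 - y1


def log_scaled_trend(y1: float, y2: float, total_views: float) -> float:
    """Slope multiplied by log(1 + total views).

    Assumed form of the 'log factor' scaling discussed in the answer;
    the real hive_trend_mapper.py formula may differ.
    """
    return slope_only(y1, y2) * math.log(1.0 + total_views)


# Figures taken from the answer (hypothetical pages).
# y1/y2: daily views before and after the jump; total: views so far.
my_page = {"y1": 1_000_000, "y2": 2_000_000, "total": 1_000_000}
your_page = {"y1": 5_000_000, "y2": 5_500_000, "total": 10_000_000}

my_slope = slope_only(my_page["y1"], my_page["y2"])
your_slope = slope_only(your_page["y1"], your_page["y2"])
my_trend = log_scaled_trend(my_page["y1"], my_page["y2"], my_page["total"])
your_trend = log_scaled_trend(your_page["y1"], your_page["y2"], your_page["total"])

# Compare how far ahead "my" page is under each scoring rule.
print(f"slope only:  mine={my_slope:,.0f}  yours={your_slope:,.0f}  "
      f"ratio={my_slope / your_slope:.2f}")
print(f"log-scaled:  mine={my_trend:,.0f}  yours={your_trend:,.0f}  "
      f"ratio={my_trend / your_trend:.2f}")
```

    With these numbers the log factor narrows my page's lead from 2x (slope alone) to roughly 1.7x but does not reverse it, which fits the point that a logarithmic scaling only nudges the ranking toward already popular pages.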
