Efficiently determining the probability of a user clicking a hyperlink

佛祖请我去吃肉 2020-12-20 10:46

So I have a bunch of hyperlinks on a web page. From past observation I know the probabilities that a user will click on each of these hyperlinks. I can therefore calculate

4 Answers
  • 2020-12-20 11:05

    Bayes' Theorem Proof:

    P(A,B) = P( A | B ) * P( B )    (1) 
    

    since,

    P(A,B) = P(B,A)                 (2)
    

    and combining (1) with (2),

    P(A | B) * P( B ) = P (B | A) * P(A)
    

    thus (Bayes' Theorem),

               P( B | A ) * P(A)
    P(A | B) = -----------------
                     P(B)
    
    P(A)   -- prior/marginal probability of A, may or may not take into account B
    P(A|B) -- conditional/posterior probability of A, given B.
    P(B|A) -- conditional probability of B given A.
    P(B)   -- prior/marginal probability of B
    

    Consequences: if A and B are independent, then

    P( A | B ) = P( A )
    P( B | A ) = P( B )
    

    and the definition of independence is,

    P(A,B) = P(A | B) * P( B ) = P( A )* P( B )
    

    It should be noted that it is easy to manipulate the probability to your liking by changing the priors and the way the problem is framed; take a look at this discussion of the Anthropic Principle and Bayes' Theorem.
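As a quick sanity check, the identities above can be verified numerically from a small joint table of counts (the counts below are made up purely for illustration):

```python
# Toy joint counts over two events: A (user in some group) and B (link clicked).
# These numbers are illustrative only.
joint = {("A", "B"): 30, ("A", "~B"): 10,
         ("~A", "B"): 20, ("~A", "~B"): 40}
total = sum(joint.values())

p_a_and_b = joint[("A", "B")] / total
p_a = (joint[("A", "B")] + joint[("A", "~B")]) / total
p_b = (joint[("A", "B")] + joint[("~A", "B")]) / total

p_a_given_b = p_a_and_b / p_b   # definition of conditional probability
p_b_given_a = p_a_and_b / p_a

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
```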

  • 2020-12-20 11:13

    P/N is actually correct from a frequentist perspective.

    You could also use a bayesian approach to incorporate prior knowledge, but since you don't seem to have that knowledge, I guess P/N is the way to go.

    If you want, you can also use Laplace's rule, which comes down to a uniform prior: give each link on the page a starting count of 1 instead of 0. (So when you count how often a link was clicked, give each link a +1 bonus and reflect those extra counts in your N.)
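A minimal sketch of that Laplace-smoothed estimate (the function name and counts are illustrative):

```python
# Laplace's rule: start every link's click count at 1 instead of 0,
# and include the extra pseudo-counts in the total N.
def laplace_estimate(click_counts):
    smoothed = {link: c + 1 for link, c in click_counts.items()}
    n = sum(smoothed.values())
    return {link: c / n for link, c in smoothed.items()}

probs = laplace_estimate({"home": 8, "about": 0, "contact": 1})
# Even the never-clicked "about" link now gets a nonzero probability.
```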

    [UPDATE] Here is a bayesian approach:

    Let p(W) be the probability that a person is in a specific group W. Let p(L) be the probability that a specific link is clicked. Then the probability you are looking for is p(L|W). By Bayes' theorem, you can calculate this as

    p(L|W) = p(W|L) * p(L) / p(W)

    You can estimate p(L) by how often L was clicked, p(W) by the size of that group relative to all users, and p(W|L) = p(W and L) / p(L) by the number of people in group W who clicked L divided by the total number of clicks on L.
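Plugging those count-based estimates into the formula gives something like the following sketch (all names and numbers are illustrative):

```python
# Estimate p(L|W) = p(W|L) * p(L) / p(W) from raw counts.
def p_link_given_group(group_clicks, link_clicks, group_size,
                       total_users, total_clicks):
    p_l = link_clicks / total_clicks          # how often L was clicked
    p_w = group_size / total_users            # relative size of group W
    p_w_given_l = group_clicks / link_clicks  # group-W share of L's clicks
    return p_w_given_l * p_l / p_w

# 5 of L's 20 clicks came from a group of 100 users (out of 1000);
# 200 clicks were recorded in total.
print(p_link_given_group(5, 20, 100, 1000, 200))  # → 0.25
```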

  • 2020-12-20 11:14

    You need to know how strongly X is correlated with W.

    Most likely you also want a more complex mathematical model if you develop a big website. If you run a site like Digg, you have a lot of prior knowledge that you have to factor into your calculation. That leads to multivariate statistics.

  • 2020-12-20 11:17

    I made this a new answer since it's fundamentally different.

    This is based on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2 "Probability Distributions", pp. 71 ff., and http://en.wikipedia.org/wiki/Beta_distribution.

    First we fit a beta distribution to the given mean and variance in order to build a distribution over the parameters. Then we return the mode of the posterior distribution, which is the most probable parameter for a Bernoulli variable.

    def estimate(prior_mean, prior_variance, clicks, views):
      # Fit Beta(a, b) to the prior mean and variance:
      #   mean = a / (a + b),  variance = a*b / ((a + b)**2 * (a + b + 1))
      c = (prior_mean * (1 - prior_mean)) / prior_variance - 1
      a = prior_mean * c
      b = (1 - prior_mean) * c
      # The posterior after `clicks` successes in `views` trials is
      # Beta(a + clicks, b + views - clicks); return its mode.
      return (a + clicks - 1) / (a + b + views - 2)
    

    However, I am quite positive that the prior mean/variance will not work for you, since it throws away information about how many samples you have and thus how good your prior is.

    Instead: given a set of (webpage, link_clicked) pairs, you can count the number of pages on which a specific link was clicked. Let that be m, and let the number of times that link was not clicked be l.

    Now let a be the number of clicks on your new link and b the number of visits to the site. Then the estimated probability of your new link is

    def estimate(m, l, a, b):
      # Posterior mean: the global history (m, l) acts as prior pseudo-counts.
      return (m + a) / (m + l + a + b)
    

    This looks trivial but actually has a valid probabilistic foundation. From an implementation perspective, you can keep m and l as global counters.
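Keeping m and l globally might look like this small sketch (the class and its names are illustrative, not from the answer):

```python
# Maintain the global click history (m, l) incrementally and reuse it
# as prior counts when scoring a new link, per the estimator above.
class ClickEstimator:
    def __init__(self):
        self.m = 0  # total clicks across all (webpage, link) pairs
        self.l = 0  # total non-clicks

    def observe(self, clicked):
        if clicked:
            self.m += 1
        else:
            self.l += 1

    def estimate(self, a, b):
        # a = clicks on the new link, b = visits to the site
        return (self.m + a) / (self.m + self.l + a + b)
```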
