Efficiently determining the probability of a user clicking a hyperlink

佛祖请我去吃肉 2020-12-20 10:46

So I have a bunch of hyperlinks on a web page. From past observation I know the probabilities that a user will click on each of these hyperlinks. I can therefore calculate

4 Answers
  • 2020-12-20 11:05

    Bayes' Theorem Proof:

    P(A,B) = P( A | B ) * P( B )    (1) 
    

    since,

    P(A,B) = P(B,A)                 (2)
    

    and combining (1) with (2),

    P(A | B) * P( B ) = P (B | A) * P(A)
    

    thus (Bayes' Theorem),

               P( B | A ) * P(A)
    P(A | B) = -----------------
                     P(B)
    
    P(A)   -- prior/marginal probability of A, may or may not take into account B
    P(A|B) -- conditional/posterior probability of A, given B.
    P(B|A) -- conditional probability of B given A.
    P(B)   -- prior/marginal probability of B
    

    Consequences: if A and B are independent, then

    P( A | B ) = P( A )
    P( B | A ) = P( B )
    

    and the definition of independence is,

    P(A,B) = P(A | B) * P( B ) = P( A )* P( B )
    

    It should be noted that it is easy to manipulate the probability to your liking by changing the priors and the way the problem is framed; take a look at this discussion of the Anthropic Principle and Bayes' Theorem.
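As a quick sanity check, the identities above can be verified numerically from a small joint table of counts (the counts below are made up purely for illustration):

```python
# Toy joint counts over two events: A (user in some group) and B (link clicked).
# These numbers are illustrative only.
joint = {("A", "B"): 30, ("A", "~B"): 10,
         ("~A", "B"): 20, ("~A", "~B"): 40}
total = sum(joint.values())

p_a_and_b = joint[("A", "B")] / total
p_a = (joint[("A", "B")] + joint[("A", "~B")]) / total
p_b = (joint[("A", "B")] + joint[("~A", "B")]) / total

p_a_given_b = p_a_and_b / p_b   # definition of conditional probability
p_b_given_a = p_a_and_b / p_a

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
```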

  • 2020-12-20 11:13

    P/N is actually correct from a frequentist perspective.

    You could also use a bayesian approach to incorporate prior knowledge, but since you don't seem to have that knowledge, I guess P/N is the way to go.

    If you want, you can also use Laplace's rule, which comes down to a uniform prior: give each link on the page a starting count of 1 instead of 0. (So when you count how often a link was clicked, give each link a +1 bonus and reflect those extra counts in your N.)
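A minimal sketch of that Laplace-smoothed estimate (the function name and counts are illustrative):

```python
# Laplace's rule: start every link's click count at 1 instead of 0,
# and include the extra pseudo-counts in the total N.
def laplace_estimate(click_counts):
    smoothed = {link: c + 1 for link, c in click_counts.items()}
    n = sum(smoothed.values())
    return {link: c / n for link, c in smoothed.items()}

probs = laplace_estimate({"home": 8, "about": 0, "contact": 1})
# Even the never-clicked "about" link now gets a nonzero probability.
```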

    [UPDATE] Here is a bayesian approach:

    Let p(W) be the probability that a person is in a specific group W. Let p(L) be the probability that a specific link is clicked. Then the probability you are looking for is p(L|W). By Bayes' theorem, you can calculate this as

    p(L|W) = p(W|L) * p(L) / p(W)

    You can estimate p(L) by how often L was clicked, p(W) by the size of that group relative to all users, and p(W|L) = p(W and L) / p(L) by the number of people in group W who clicked L divided by the total number of clicks on L.
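Plugging those count-based estimates into the formula gives something like the following sketch (all names and numbers are illustrative):

```python
# Estimate p(L|W) = p(W|L) * p(L) / p(W) from raw counts.
def p_link_given_group(group_clicks, link_clicks, group_size,
                       total_users, total_clicks):
    p_l = link_clicks / total_clicks          # how often L was clicked
    p_w = group_size / total_users            # relative size of group W
    p_w_given_l = group_clicks / link_clicks  # group-W share of L's clicks
    return p_w_given_l * p_l / p_w

# 5 of L's 20 clicks came from a group of 100 users (out of 1000);
# 200 clicks were recorded in total.
print(p_link_given_group(5, 20, 100, 1000, 200))  # → 0.25
```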

  • 2020-12-20 11:14

    You need to know how strongly X is correlated with W.

    Most likely you also want a more complex mathematical model if you develop a big website. If you run a site like Digg, you have a lot of prior knowledge that you have to factor into your calculation. That leads to multivariate statistics.

  • 2020-12-20 11:17

    I made this a new answer since it's fundamentally different.

    This is based on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2 "Probability Distributions", pp. 71 ff., and http://en.wikipedia.org/wiki/Beta_distribution.

    First we fit a beta distribution to the given mean and variance in order to build a distribution over the parameters. Then we return the mode of the posterior distribution, which is the most probable parameter for a Bernoulli variable.

    def estimate(prior_mean, prior_variance, clicks, views):
      # Fit Beta(a, b) to the prior mean and variance:
      #   mean = a / (a + b),  variance = a*b / ((a + b)**2 * (a + b + 1))
      c = (prior_mean * (1 - prior_mean)) / prior_variance - 1
      a = prior_mean * c
      b = (1 - prior_mean) * c
      # The posterior after `clicks` successes in `views` trials is
      # Beta(a + clicks, b + views - clicks); return its mode.
      return (a + clicks - 1) / (a + b + views - 2)
    

    However, I am quite positive that the prior mean/variance will not work for you, since it throws away information about how many samples you have and thus how good your prior is.

    Instead: given a set of (webpage, link_clicked) pairs, you can count the number of pages on which a specific link was clicked. Let that be m, and let the number of times that link was not clicked be l.

    Now let a be the number of clicks on your new link and b the number of visits to the site. Then the estimated probability of your new link is

    def estimate(m, l, a, b):
      # Posterior mean: the global history (m, l) acts as prior pseudo-counts.
      return (m + a) / (m + l + a + b)
    

    This looks trivial but actually has a valid probabilistic foundation. From an implementation perspective, you can keep m and l as global counters.
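Keeping m and l globally might look like this small sketch (the class and its names are illustrative, not from the answer):

```python
# Maintain the global click history (m, l) incrementally and reuse it
# as prior counts when scoring a new link, per the estimator above.
class ClickEstimator:
    def __init__(self):
        self.m = 0  # total clicks across all (webpage, link) pairs
        self.l = 0  # total non-clicks

    def observe(self, clicked):
        if clicked:
            self.m += 1
        else:
            self.l += 1

    def estimate(self, a, b):
        # a = clicks on the new link, b = visits to the site
        return (self.m + a) / (self.m + self.l + a + b)
```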
