Random Walk on Bipartite Graph with Gremlin

Posted by 时光毁灭记忆、已成空白 on 2019-12-06 06:46:29

Question


I would like to rank items according to a given user's preferences (the items that user likes), based on a random walk on a directed bipartite graph, using Gremlin in Groovy.

The graph has the following basic structure:

[User1] ---'likes'---> [ItemA] <---'likes'--- [User2] ---'likes'---> [ItemB]

Here is the query I came up with:

def runRankQuery(def userVertex) {
    def m = [:]
    def c = 0
    while (c < 1000) {
        userVertex
            .out('likes')   // get all liked items of current or similar user
            .shuffle[0]     // select randomly one liked item
            .groupCount(m)  // update counts for selected item
            .in('likes')    // get all users who also liked item
            .shuffle[0]     // select randomly one user that liked item
            .loop(5){Math.random() < 0.5}   // follow liked edge of new user (feed new user in loop) 
                                            // OR abort query (restart from original user, outer loop)      
            .iterate()
        c++
    }
    m = m.sort {a, b -> b.value <=> a.value}
    println "intermediate result $m"
    m.keySet().removeAll(userVertex.out('likes').toList())
    // EDIT (makes no sense - remove): m.each{k,v -> m[k] = v / m.values().sum()}
    // EDIT (makes no sense - remove): m.sort {-it.value }
    return m.keySet() as List;
}

However, this code does not find new items ([ItemB] in the example above); it only returns the items already liked by the given user (e.g. [ItemA]).

  • What do I need to change to feed a new user (e.g. [User2]) with the loop step back to the 'out('likes')' step in order to continue the walk?

  • Once this code is working, can it be seen as an implementation of 'Personalized PageRank'?


Here is the code to run the example:

g = new TinkerGraph()

user1 = g.addVertex()
user1.name ='User1'
user2 = g.addVertex()
user2.name ='User2'
itemA = g.addVertex()
itemA.name ='ItemA'
itemB = g.addVertex()
itemB.name ='ItemB'

g.addEdge(user1, itemA, 'likes')
g.addEdge(user2, itemA, 'likes')
g.addEdge(user2, itemB, 'likes')

println runRankQuery(user1)

And the output:

intermediate result [v[2]:1000]
[]
==>null
gremlin> g.v(2).name
==>ItemA
gremlin> 

Answer 1:


This turned out to be a really strange issue. I ran into a couple of problems that I can't easily explain, and in the end I'm not sure what causes them. The two main things that look strange to me are:

  1. I'm not sure if there is a problem with the shuffle step. It does not seem to randomize properly in your case here. I can't seem to recreate the problem outside of this case, so I'm not sure if it's somehow related to the size of your data or something else.
  2. I hit strange problems when using Math.random() to break out of the loop.

Anyway, I think I've captured the essence of your code here with my changes that seem to do what you want:

runRankQuery = { userVertex ->
    def m = [:]
    def c = 0
    def rand = new java.util.Random()
    while (c < 1000) {
        def max = rand.nextInt(10) + 1
        userVertex._().as('x')
            .out('likes')   
            .gather.transform{it[rand.nextInt(it.size())]}
            .groupCount(m) 
            .in('likes')    
            .gather.transform{it[rand.nextInt(it.size())]}
            .loop('x'){it.loops < max}  
            .iterate()
        c++
    }
    println "intermediate result $m"
    m.keySet().removeAll(userVertex.out('likes').toList())
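    def total = m.values().sum()        // compute the total once, before any value is overwritten
    m.each{k,v -> m[k] = v / total}     // normalize counts to relative frequencies
    m = m.sort {-it.value }             // sort() returns a new map, so reassign it
    return m.keySet() as List;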
}

I replaced shuffle with my own brand of "shuffle": gather the vertices into a list and pick a single one at random. I also pick a random maximum number of loops for each walk rather than relying on Math.random(). When I run this now, I think I get the results you are looking for:

gremlin> runRankQuery(user1)                                       
intermediate result [v[2]:1787, v[3]:326]
==>v[3]
gremlin> runRankQuery(user1)
intermediate result [v[2]:1848, v[3]:330]
==>v[3]
gremlin> runRankQuery(user1)
intermediate result [v[2]:1899, v[3]:339]
==>v[3]
gremlin> runRankQuery(user1)
intermediate result [v[2]:1852, v[3]:360]
==>v[3]

You might still get Math.random() to work: it did behave predictably for me in some of my attempts at this.
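
To see what the walk is doing without any Gremlin steps involved, here is a rough plain-Groovy sketch of the same idea. It is not the traversal above: the `likes` and `likedBy` maps and the `rankItems` function are hypothetical stand-ins for the graph and the query, and I'm assuming the same random hop limit (1 to 10) per walk.

```groovy
// Plain-Groovy sketch (no Gremlin): the maps are hypothetical stand-ins for the graph.
def likes = [User1: ['ItemA'], User2: ['ItemA', 'ItemB']]    // user -> liked items

// Invert the map to get item -> users who like it
def likedBy = [:].withDefault { [] }
likes.each { user, items -> items.each { likedBy[it] << user } }

def rankItems(Map likes, Map likedBy, String start, int walks, Random rand) {
    def counts = [:].withDefault { 0 }
    walks.times {
        def user = start                                      // every walk restarts at the given user
        def maxHops = rand.nextInt(10) + 1                    // random walk length, 1..10
        maxHops.times {
            def items = likes[user]
            def item = items[rand.nextInt(items.size())]      // pick one liked item uniformly
            counts[item]++                                    // count the visit
            def users = likedBy[item]
            user = users[rand.nextInt(users.size())]          // pick one user who likes that item
        }
    }
    // Drop items the start user already likes, then order by visit count (descending)
    def fresh = counts.findAll { item, n -> !(item in likes[start]) }
    return fresh.sort { -it.value }.keySet() as List
}

println rankItems(likes, likedBy, 'User1', 1000, new Random())
```

On the toy graph from the question, the only item left after filtering out User1's own likes is ItemB, so that is what the ranking should contain. The important part is that the user picked at the end of each hop is fed back into the "liked items" lookup, which is the behaviour the loop step is supposed to provide in the traversal.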



Source: https://stackoverflow.com/questions/24783212/random-walk-on-bipartite-graph-with-gremlin
