Query in a MongoDB Map Reduce Function

Submitted by 本秂侑毒 on 2019-12-29 09:22:08

Question


I have streamed about 250k tweets and saved them into MongoDB. Here I retrieve them based on a word (keyword) present in the tweet:

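    import com.mongodb.*;

    // Connect to the local server and open the collection of stored tweets
    Mongo mongo = new Mongo("localhost", 27017);
    DB db = mongo.getDB("TwitterData");
    DBCollection collection = db.getCollection("publicTweets");
    // Return only the "tweet" field, without _id
    BasicDBObject fields = new BasicDBObject().append("tweet", 1).append("_id", 0);
    // Match tweets whose text contains "autobiography" (case-sensitive regex)
    BasicDBObject query = new BasicDBObject("tweet", new BasicDBObject("$regex", "autobiography"));
    DBCursor cur = collection.find(query, fields);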

What I would like to do is use Map-Reduce: categorize each tweet based on the keyword it contains, then pass it to the reduce function to count the number of tweets in each category, much like the example here. In that example, the author counts the number of pages, which is a simple number. I want to do something like:

    "if (this.tweet.contains('kword1')) " +
    "    category = 'kword1 tweets'; " +
    "else if (this.tweet.contains('kword2')) " +
    "    category = 'kword2 tweets'; "

and then use the reduce function to get the count, just like in the sample program.

I know that the syntax is incorrect, but that's pretty much what I would like to do. Is there any way of achieving it? Thanks!

PS: I'm coding in Java, so Java syntax would be much appreciated. Thank you!

The code above produces output like this:

{ "tweet" : "An autobiography is a book that reveals nothing bad about its writer except his memory."}
{ "tweet" : "I refuse to read anything that's not real the only thing I've read since biff books is Jordan's autobiography #lol"}
{ "tweet" : "well we've had the 2012 publication of Ashley's Good Books, I predict 2013 will be seeing an autobiography ;)"}

This, of course, covers all tweets containing the word "autobiography". What I'd like is to do this inside the map function: categorize each tweet as an "autobiography tweet" (and likewise for other keywords), then send it to the reduce function, which counts everything and returns the number of tweets containing each word.

Something like:

{"_id" : "Autobiography Tweets" , "value" : { "publicTweets" : 3.0}}
{"_id" : "Biography Tweets" , "value" : { "publicTweets" : 15.0}}

Answer 1:


You might want to try the following:

    // Map: emit exactly one category per tweet (the first matching pattern wins)
    String map = "function() { " +
                 "    var regex1 = new RegExp('autobiography', 'i'); " +
                 "    var regex2 = new RegExp('book', 'i'); " +
                 "    if (regex1.test(this.tweet) ) " +
                 "         emit('Autobiography Tweet', 1); " +
                 "    else if (regex2.test(this.tweet) ) " +
                 "         emit('Book Tweet', 1); " +
                 "    else " +
                 "       emit('Uncategorized Tweet', 1); " +
                 "}";

    // Reduce: add up the 1s emitted for each category
    String reduce = "function(key, values) { " +
                    "    return Array.sum(values); " +
                    "}";

    // No output collection (results returned INLINE), no query filter (all documents)
    MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
             null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = collection.mapReduce(cmd);

    try {
        for (DBObject o : out.results()) {
            System.out.println(o.toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
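To see what this job computes without a running server, here is a local sketch in plain Java that mimics the map and reduce functions above. It runs on the three sample tweets from the question plus one made-up tweet. `CategorizeTweets`, `categorize` and `countByCategory` are hypothetical names; this is not the MongoDB API, only an illustration of the same if/else-if categorization followed by a per-key sum.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CategorizeTweets {
    // Same case-insensitive patterns as new RegExp('...', 'i') in the map function
    private static final Pattern AUTOBIOGRAPHY = Pattern.compile("autobiography", Pattern.CASE_INSENSITIVE);
    private static final Pattern BOOK = Pattern.compile("book", Pattern.CASE_INSENSITIVE);

    // Local stand-in for "map": choose exactly one category, first match wins (if / else if)
    static String categorize(String tweet) {
        if (AUTOBIOGRAPHY.matcher(tweet).find()) return "Autobiography Tweet";
        if (BOOK.matcher(tweet).find()) return "Book Tweet";
        return "Uncategorized Tweet";
    }

    // Local stand-in for "reduce": sum one count per tweet for each category key,
    // which is what Array.sum(values) does over the emitted 1s
    static Map<String, Integer> countByCategory(List<String> tweets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tweet : tweets) {
            counts.merge(categorize(tweet), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = List.of(
            "An autobiography is a book that reveals nothing bad about its writer except his memory.",
            "I refuse to read anything that's not real the only thing I've read since biff books is Jordan's autobiography #lol",
            "well we've had the 2012 publication of Ashley's Good Books, I predict 2013 will be seeing an autobiography ;)",
            "Just finished a great book"); // hypothetical extra tweet, not from the question
        System.out.println(countByCategory(tweets)); // prints {Autobiography Tweet=3, Book Tweet=1}
    }
}
```

Because the map function chains `if`/`else if`, a tweet matching both patterns (the first sample contains both "autobiography" and "book") is counted only under the first category. If you want a tweet counted under every keyword it contains, emit once per matching pattern instead of chaining `else if`.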



Answer 2:


Although you already accepted the answer by Kay and this one will likely be ignored, I would like to suggest an alternative solution.

The MongoDB documentation has an article about how to perform full-text search in Mongo. To allow text fields to be searched quickly for individual words, it suggests preparing the documents by splitting the text fields into arrays of individual words, storing these arrays in the documents together with the full text, and creating an index over the array.

Afterwards you can very quickly find all documents that contain a specific word, because your search query (1) can use an index and (2) does not need a regular expression (which can be very expensive).
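The preparation step could look roughly like this in plain Java, assuming a simple tokenizer that lower-cases the tweet and splits on any run of non-letter, non-digit characters. `TweetKeywords`, `toKeywords` and the `keywords` field name are hypothetical. You would store the returned list next to the full text (for example with `$set`), create an index on that field, and then query by exact word, e.g. `new BasicDBObject("keywords", "autobiography")`. That query matches any document whose array contains the element and can use the index instead of scanning with a regex; a per-keyword count could then come from `collection.count(...)` with that query.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TweetKeywords {
    // Split a tweet into distinct lower-case words, in order of first appearance.
    // The result is meant to be stored in an array field (hypothetically "keywords").
    static List<String> toKeywords(String tweet) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty first token
                words.add(token);
            }
        }
        return new ArrayList<>(words);
    }

    public static void main(String[] args) {
        System.out.println(toKeywords(
            "An autobiography is a book that reveals nothing bad about its writer except his memory."));
        // prints [an, autobiography, is, a, book, that, reveals, nothing, bad, about, its, writer, except, his, memory]
    }
}
```

This tokenizer is deliberately naive: "that's" becomes "that" and "s", and "#lol" becomes "lol". Stemming or stop-word removal would be further design choices.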



Source: https://stackoverflow.com/questions/13732735/query-in-a-mongodb-map-reduce-function
