Oct 31, 2010

Basic usage of MapReduce in MongoDB

After reading some useful MongoDB MapReduce materials (see References below), it is time to try it myself in practice. Consider a business rating system that lets users rate a business by choosing a point from 1 (bad) to 5 (good). Each business document has the following structure; the business 'Old Novice' below has received 2 ratings:
{
  name: 'Old Novice',
  rating : [ {user: 'ngsiolei', point: 3},
             {user: 'lei', point: 4} ]
}


The rating distribution (the number of ratings at each point) is something we commonly need to know. Here I compute the rating distribution grouped by business. The map function walks through every document, creates a count object for each rating, and emits that object with the business name as the key. The reduce function then adds up all the counts emitted for each key.
db.runCommand({
    mapreduce: 'business',
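    map: function() {
        var rating = this.rating;
        for (var i = 0; i < rating.length; i++) {
            // one count object per rating; 'all' is 1 so that a business with a single
            // rating (for which reduce is never called) still ends up with all = 1
            var count = { '1' : 0, '2' : 0, '3' : 0, '4' : 0, '5' : 0, 'all' : 1 };
            count[rating[i]['point'].toString()] = 1;
            emit(this.name, count);
        }
    },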
    reduce: function(key, values) {
        // sum the per-point counts for this business; values may include results
        // of earlier reduce calls, which have the same shape as the emitted objects
        var count = { '1' : 0, '2' : 0, '3' : 0, '4' : 0, '5' : 0, 'all' : 0 };
        for (var i = 0; i < values.length; i++) {
            count['1'] += values[i]['1'];
            count['2'] += values[i]['2'];
            count['3'] += values[i]['3'];
            count['4'] += values[i]['4'];
            count['5'] += values[i]['5'];
        }
        // recompute the total from the per-point counts
        count['all'] = count['1'] + count['2'] + count['3'] + count['4'] + count['5'];
        return count;
    },
    out: 'res20101031'
});
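To make the flow concrete without a running server, here is a minimal in-memory sketch in plain JavaScript (Node.js). It is not how MongoDB is implemented: the helper `runMapReduce` and the second business 'Hypothetical Cafe' are made up for illustration, and `emit` is passed to map as an argument instead of being a shell global. Like MongoDB, the sketch only calls reduce for keys that received more than one emitted value, which is why the map below sets 'all' to 1: a business with a single rating never reaches reduce, so its count object must already be complete.

```javascript
// Minimal in-memory sketch of the map/reduce flow (NOT MongoDB's implementation).
// Assumption: emit is passed to map as an argument rather than being a global.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  const out = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has only one emitted value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

function map(emit) {
  const rating = this.rating;
  for (let i = 0; i < rating.length; i++) {
    // 'all' starts at 1 so a single-rating business is already complete.
    const count = { '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, all: 1 };
    count[String(rating[i].point)] = 1;
    emit(this.name, count);
  }
}

function reduce(key, values) {
  const count = { '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, all: 0 };
  for (const v of values) {
    for (const p of ['1', '2', '3', '4', '5']) count[p] += v[p];
  }
  count.all = count['1'] + count['2'] + count['3'] + count['4'] + count['5'];
  return count;
}

const business = [
  { name: 'Old Novice', rating: [{ user: 'ngsiolei', point: 3 }, { user: 'lei', point: 4 }] },
  // Hypothetical business with a single rating, to exercise the no-reduce path.
  { name: 'Hypothetical Cafe', rating: [{ user: 'lei', point: 5 }] },
];

const result = runMapReduce(business, map, reduce);
console.log(JSON.stringify(result));
```

For 'Old Novice' this yields counts of 1 for points 3 and 4 with all = 2, while 'Hypothetical Cafe' gets 1 for point 5 and all = 1 without reduce ever running.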

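Now I can look up a business's rating distribution with a simple query. The output collection stores each key in _id (not name) and the reduced result in value, so the query filters on _id:
db.res20101031.find({ '_id' : 'Old Novice' });
{ "_id" : "Old Novice", "value" : { "1" : 0, "2" : 0, "3" : 1, "4" : 1, "5" : 0, "all" : 2 } }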

I learned a lesson about reading documentation carelessly: I spent hours debugging an inconsistent data format between map and reduce, only to find that the MongoDB documentation states it explicitly:
The output of the map function's emit (the second argument) and the value returned by reduce should be the same format to make iterative reduce possible. If not, there will be weird bugs that are hard to debug.
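A small plain-JavaScript (Node.js) sketch of why this rule matters. MongoDB may call reduce more than once for the same key and pass earlier reduce results back in as values; the particular split used here is made up, and `reduceGood` and `reduceBad` are illustrative names. A reduce that keeps the count-object shape gives the same answer in one pass or two, while a hypothetical reduce that returns a bare number silently turns into NaN once its own output is reduced again.

```javascript
// Sketch of why emitted values and reduce results must share one shape:
// reduce may be called again with earlier reduce results among its values
// (the split below is made up for illustration).
const emitted = [
  { '1': 0, '2': 0, '3': 1, '4': 0, '5': 0, all: 1 },
  { '1': 0, '2': 0, '3': 0, '4': 1, '5': 0, all: 1 },
  { '1': 0, '2': 0, '3': 0, '4': 0, '5': 1, all: 1 },
];

// Returns the same shape it receives, so its output can be reduced again.
function reduceGood(key, values) {
  const count = { '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, all: 0 };
  for (const v of values) {
    for (const p of ['1', '2', '3', '4', '5']) count[p] += v[p];
  }
  count.all = count['1'] + count['2'] + count['3'] + count['4'] + count['5'];
  return count;
}

// Hypothetical broken reduce: returns a bare number instead of a count object.
function reduceBad(key, values) {
  let total = 0;
  for (const v of values) total += v.all; // a number has no .all, so this adds undefined
  return total;
}

const key = 'Old Novice';
const oneGood = reduceGood(key, emitted);
const twoGood = reduceGood(key, [reduceGood(key, emitted.slice(0, 2)), emitted[2]]);
console.log(oneGood.all, twoGood.all); // 3 3

const oneBad = reduceBad(key, emitted);
const twoBad = reduceBad(key, [reduceBad(key, emitted.slice(0, 2)), emitted[2]]);
console.log(oneBad, twoBad); // 3 NaN
```

The broken version looks correct when all values arrive in a single call, which is exactly what makes this kind of bug hard to spot.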

References

http://www.mongodb.org/display/DOCS/MapReduce
http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
http://rickosborne.org/blog/index.php/2010/02/08/playing-around-with-mongodb-and-mapreduce-functions/
