Chris Umbel

MapReduce with MongoMapper

A number of Rails projects I've been working on lately have used MongoDB as a back end via MongoMapper. In general it seems to do pretty much anything I'd want in a typical web app, but finding documentation on how to do it can be difficult.

One such task I came across recently was performing on-the-fly map-reduce. After implementing it myself I decided to share a simple example.

Blog Post Example

Consider a typical Article model, which is essentially a blog post: a title, some content and a list of tags. What I'll do is produce aggregate counts per tag that could be used to display a tag cloud.

class Article
  include MongoMapper::Document

  key :title, String
  key :content, String
  key :tags, Array
end

Sample Data

I'll throw in three sample articles from the Rails console.

Article.new(:title => 'one', :content => 'article one', :tags => ['number', 'one']).save()
Article.new(:title => 'two', :content => 'article two', :tags => ['number', 'two']).save()
Article.new(:title => 'uno', :content => 'article uno', :tags => ['number', 'uno', 'one']).save()

Map Reduce

Here's the money. I'll just slap a string containing a JavaScript map function and one containing a JavaScript reduce function into the Collection.map_reduce method.

I'll encapsulate this into a TagCloud class to be tidy.

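class TagCloud
  # JavaScript map function: emits (tag, 1) once for every tag on an article.
  def self.map
    <<-JS
    function() {
      this.tags.forEach(function(tag) {
        emit(tag, 1);
      });
    }
    JS
  end

  # JavaScript reduce function: receives a tag (key) and the array of
  # values emitted for it, and returns their sum.
  def self.reduce
    <<-JS
    function(key, values) {
      var count = 0;

      for (var i = 0; i < values.length; i++) {
        count += values[i];
      }

      return count;
    }
    JS
  end

  # Runs the map-reduce over all articles and returns the output collection.
  def self.build
    Article.collection.map_reduce(map, reduce, :query => {})
  end
end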

In this case, my map function simply iterates over the tags in each document and emits each one with a value of 1. The reduce function in turn tallies those values for each tag.
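To make the data flow concrete, here is a plain-Ruby sketch of the same tally run against the three sample articles. It doesn't touch MongoDB at all; emit_pairs and reduce_counts are hypothetical helpers of mine that only mimic what the JavaScript map and reduce functions do.

```ruby
# Plain-Ruby stand-in for the sample articles (only the tags matter here).
ARTICLES = [
  { 'title' => 'one', 'tags' => ['number', 'one'] },
  { 'title' => 'two', 'tags' => ['number', 'two'] },
  { 'title' => 'uno', 'tags' => ['number', 'uno', 'one'] }
]

# Hypothetical helper mimicking the map step:
# one [tag, 1] pair per tag per article.
def emit_pairs(articles)
  articles.flat_map { |article| article['tags'].map { |tag| [tag, 1] } }
end

# Hypothetical helper mimicking the reduce step:
# sum the emitted values for each tag.
def reduce_counts(pairs)
  counts = Hash.new(0)
  pairs.each { |tag, value| counts[tag] += value }
  counts
end

counts = reduce_counts(emit_pairs(ARTICLES))
counts.each { |tag, count| puts "#{tag} (#{count})" }
# prints:
#   number (3)
#   one (2)
#   two (1)
#   uno (1)
```

The counts are integers here because everything stays in Ruby; the values that come back from MongoDB are JavaScript numbers, which is why they show up as floats.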

Querying

Now, in the articles controller, I'll query the map-reduce output in the tag_cloud action. This is a perfect action to sit behind a partial.

  def tag_cloud
    # here's where you could also add some filtering or sorting
    @tags = TagCloud.build.find()
  end
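As one way to act on that comment, the sketch below sorts and filters the result documents in plain Ruby once they've been loaded. The sample array stands in for the documents the query returns, and top_tags with its min_count parameter is a hypothetical helper, not part of MongoMapper. Pushing the sort into the database query is another option that isn't shown here.

```ruby
# Stand-in for the documents held in the map-reduce output collection.
tags = [
  { '_id' => 'number', 'value' => 3.0 },
  { '_id' => 'one',    'value' => 2.0 },
  { '_id' => 'two',    'value' => 1.0 },
  { '_id' => 'uno',    'value' => 1.0 }
]

# Hypothetical helper: keep tags used at least min_count times,
# most-used first. Ties are broken alphabetically by tag name.
def top_tags(tags, min_count = 1)
  kept = tags.select { |tag| tag['value'] >= min_count }
  kept.sort_by { |tag| [-tag['value'], tag['_id']] }
end

popular = top_tags(tags, 2)
popular.each { |tag| puts "#{tag['_id']} (#{tag['value']})" }
```

With the sample data, only the tags that appear on at least two articles survive the filter.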

Display

The map-reduce output is a set of documents, each holding a tag name in _id and its count in value, which we can use to display a tag cloud. For simplicity's sake I'll just display counts here, but this is where fancy tag-cloud style formatting could occur.

<% @tags.each do |tag| %>
   <%= tag['_id'] %> (<%= tag['value'] %>)
<% end %>

Producing:

number (3.0) one (2.0) two (1.0) uno (1.0)
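For the fancier formatting mentioned above, one common approach is to scale each tag's font size linearly with its count. This is only a sketch: tag_font_size and the 12 to 32 pixel range are assumptions of mine, not something provided by MongoMapper or Rails.

```ruby
# Hypothetical helper: map a tag count onto a font size in pixels,
# scaling linearly between min_px and max_px.
def tag_font_size(count, min_count, max_count, min_px = 12, max_px = 32)
  return min_px if max_count == min_count # avoid dividing by zero
  ratio = (count - min_count).to_f / (max_count - min_count)
  (min_px + ratio * (max_px - min_px)).round
end

# Stand-in for the documents produced by the map-reduce.
tags = [
  { '_id' => 'number', 'value' => 3.0 },
  { '_id' => 'one',    'value' => 2.0 },
  { '_id' => 'two',    'value' => 1.0 },
  { '_id' => 'uno',    'value' => 1.0 }
]

counts = tags.map { |tag| tag['value'] }
sizes = tags.map do |tag|
  [tag['_id'], tag_font_size(tag['value'], counts.min, counts.max)]
end

sizes.each { |name, px| puts "#{name}: #{px}px" }
# prints:
#   number: 32px
#   one: 22px
#   two: 12px
#   uno: 12px
```

In the view, the computed size could be dropped into an inline font-size style on each tag.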

Sun Aug 01 2010 02:52:42 GMT+0000 (UTC)

Comments
Thanks for posting. One thing I have seen people do is move each map/reduce to its own class. Something like this:

class PageViewsByMonth
  def self.map
    <<-MAP
      function() {…}
    MAP
  end
  
  def self.reduce
    <<-REDUCE
      function() {…}
    REDUCE
  end
  
  def self.build
    Views.collection.map_reduce(map, reduce)
  end
end
by John Nunemaker on Sun Aug 01 2010 21:32:34 GMT+0000 (UTC)
Great, thanks John!
by Chris Umbel on Mon Aug 02 2010 02:02:54 GMT+0000 (UTC)
I couldn't help it.  I updated the example to match John's style.
by chrisumbel on Sun Sep 05 2010 13:05:48 GMT+0000 (UTC)
Great stuff on m/r. I have a very large data set of hundreds of millions of documents that are basically Events that are logged from impressions. The Events have a field called Publisher and I break this large collection down into smaller manageable collections via the mongo command line that are named pub.<name of publisher>. Any recommendation on how I can access those using MongoMapper? I also further map reduce those publisher collections down to unique tables as well as other aggregations that happen on a nightly cron. Any thought on accessing those through the app? Again, awesome post dude!
by Chris Barretto on Fri Dec 10 2010 09:02:56 GMT+0000 (UTC)