Chris Umbel

MapReduce with MongoMapper

A number of Rails projects I've been working on lately have used MongoDB as a back end via MongoMapper. In general it does pretty much anything I'd want in a typical web app, but finding documentation on how to do it can be difficult.

One such task I came across recently was performing on-the-fly map-reduce. After implementing it myself I decided to share a simple example.

Blog Post Example

Consider the typical Article model, which is essentially a blog post: a title, some content and a list of tags. What I'll do is produce aggregate counts per tag that could be used to display a tag cloud.

class Article
  include MongoMapper::Document

  key :title, String
  key :content, String
  key :tags, Array
end

Sample Data

I'll throw in three sample articles from the Rails console.

Article.new(:title => 'one', :content => 'article one', :tags => ['number', 'one']).save()
Article.new(:title => 'two', :content => 'article two', :tags => ['number', 'two']).save()
Article.new(:title => 'uno', :content => 'article uno', :tags => ['number', 'uno', 'one']).save()

Map Reduce

Here's the money. I'll just slap a string containing a JavaScript map function and one containing a JavaScript reduce function into the Collection.map_reduce method.

I'll encapsulate this into a TagCloud class to be tidy.

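class TagCloud
  # JavaScript map function: emits (tag, 1) once per tag on each article.
  def self.map
    <<-JS
    function() {
      this.tags.forEach(function(tag) {
        emit(tag, 1);
      });
    }
    JS
  end

  # JavaScript reduce function: receives a tag and the array of values
  # emitted for it, and returns their sum.
  def self.reduce
    <<-JS
    function(key, values) {
      var count = 0;

      for (var i = 0; i < values.length; i++) {
        count += values[i];
      }

      return count;
    }
    JS
  end

  # Runs the map-reduce over all articles and returns the result collection.
  # Later MongoDB releases may also expect an :out option here.
  def self.build
    Article.collection.map_reduce(map, reduce, :query => {})
  end
end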

In this case my mapping function simply iterates over all tags in each document and emits each one with a value of 1. The reduce function in turn tallies up the values emitted for each tag.
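To see what those two steps work out to, here is a rough plain-Ruby sketch that mimics the map and reduce phases in memory against the three sample articles. Nothing here touches MongoDB; the names map_tags, reduce_counts and tag_counts are made up for illustration and are not part of MongoMapper or the Mongo driver.

```ruby
# Plain-Ruby simulation of the map and reduce phases (no MongoDB involved).
# The method names below are hypothetical, chosen only for this sketch.

# Map: emit a [tag, 1] pair for every tag on every article.
def map_tags(articles)
  articles.flat_map { |article| article['tags'].map { |tag| [tag, 1] } }
end

# Reduce: sum all of the values emitted under a single key.
def reduce_counts(values)
  values.inject(0) { |sum, value| sum + value }
end

# Group the emitted pairs by tag, then reduce each group to one count.
def tag_counts(articles)
  grouped = map_tags(articles).group_by { |pair| pair.first }
  grouped.each_with_object({}) do |(tag, pairs), counts|
    counts[tag] = reduce_counts(pairs.map { |pair| pair.last })
  end
end

# Hashes mirroring the three sample articles.
sample_articles = [
  { 'title' => 'one', 'tags' => ['number', 'one'] },
  { 'title' => 'two', 'tags' => ['number', 'two'] },
  { 'title' => 'uno', 'tags' => ['number', 'uno', 'one'] }
]

p tag_counts(sample_articles)
```

For the sample data this gives a count of 3 for number, 2 for one, and 1 each for two and uno.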

Querying

Now in the Article's controller I'll query the map_reduce result in the tag_cloud action. This is a perfect action to sit behind a partial.

  def tag_cloud
    # here's where you could also add some filtering or sorting
    @tags = TagCloud.build.find()
  end
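As for the filtering and sorting mentioned in that comment, one option is to do it in plain Ruby once the rows are loaded. The sketch below assumes each row is a hash with the tag in '_id' and the count in 'value'; popular_tags is a hypothetical helper, not something MongoMapper provides.

```ruby
# Hypothetical helper: keep tags used at least min_count times and order
# them by count (highest first), breaking ties alphabetically by tag name.
def popular_tags(rows, min_count = 1)
  kept = rows.select { |row| row['value'] >= min_count }
  kept.sort_by { |row| [-row['value'], row['_id']] }
end

# Rows shaped like the documents in the map_reduce result.
rows = [
  { '_id' => 'uno',    'value' => 1.0 },
  { '_id' => 'one',    'value' => 2.0 },
  { '_id' => 'number', 'value' => 3.0 },
  { '_id' => 'two',    'value' => 1.0 }
]

popular_tags(rows, 2).each { |row| puts "#{row['_id']} (#{row['value']})" }
```

In the controller this could presumably be wired in as popular_tags(TagCloud.build.find.to_a, 2), assuming the cursor can be turned into an array that way. For large result sets, sorting on the database side would likely be the better choice.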

Display

Our map_reduce produces one document per tag, with the tag name in '_id' and its count in 'value', which we can use to display a tag cloud. For simplicity's sake I'll just display counts here, but here's where fancy tag-cloud style formatting could occur.

<% @tags.each do |tag| %>
   <%= tag['_id'] %> (<%= tag['value'] %>)
<% end %>

Producing:

number (3.0) one (2.0) two (1.0) uno (1.0)
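For the fancier formatting, one simple approach is to scale each tag's font size linearly between a minimum and a maximum based on its count. This is only a sketch: tag_font_sizes and its pixel defaults are hypothetical, and the rows are assumed to have the same '_id' and 'value' keys as above.

```ruby
# Hypothetical helper: map each tag's count onto a font size in pixels,
# scaling linearly so the rarest tag gets min_px and the most used max_px.
def tag_font_sizes(rows, min_px = 12, max_px = 32)
  return {} if rows.empty?

  counts = rows.map { |row| row['value'] }
  low = counts.min
  spread = counts.max - low

  rows.each_with_object({}) do |row, sizes|
    # When every tag has the same count, fall back to the minimum size.
    ratio = spread.zero? ? 0.0 : (row['value'] - low) / spread.to_f
    sizes[row['_id']] = (min_px + ratio * (max_px - min_px)).round
  end
end

rows = [
  { '_id' => 'number', 'value' => 3.0 },
  { '_id' => 'one',    'value' => 2.0 },
  { '_id' => 'two',    'value' => 1.0 },
  { '_id' => 'uno',    'value' => 1.0 }
]

tag_font_sizes(rows).each { |tag, px| puts "#{tag}: #{px}px" }
```

With the sample counts, number lands at 32px, one at 22px, and two and uno at 12px. In the view, each size could then go into an inline style on the tag's element.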

Sun Aug 01 2010 02:52:42 GMT+0000 (UTC)
