A number of Rails projects I've been working on lately have used MongoDB as a back end
via MongoMapper. In general it seems to do pretty much anything I'd want to do
in a typical web app, but finding documentation on how to do it can be difficult.
One such task I came across recently was performing on-the-fly map-reduce. After implementing it myself, I decided to share a simple example.
Blog Post Example
Consider the typical Article model, which is essentially a blog post: a title, some content and a list of tags. I'll produce aggregate tag counts that could be used to display a tag cloud.
class Article
  include MongoMapper::Document

  key :title, String
  key :content, String
  key :tags, Array
end
Sample Data
I'll throw in three sample articles from the Rails console.
Article.new(:title => 'one', :content => 'article one', :tags => ['number', 'one']).save
Article.new(:title => 'two', :content => 'article two', :tags => ['number', 'two']).save
Article.new(:title => 'uno', :content => 'article uno', :tags => ['number', 'uno', 'one']).save
Map Reduce
Here's the money. I'll just slap a string containing a JavaScript map function and one containing a JavaScript reduce function into the Collection.map_reduce method.
I'll encapsulate this in a TagCloud class to keep things tidy.
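class TagCloud
  # JavaScript map function: emit each tag on the article with a count of 1
  def self.map
    <<-JS
      function() {
        this.tags.forEach(function(tag) {
          emit(tag, 1);
        });
      }
    JS
  end

  # JavaScript reduce function: receives one tag (key) and the array of
  # values emitted for it, and returns their sum
  def self.reduce
    <<-JS
      function(key, values) {
        var count = 0;
        for (var i = 0; i < values.length; i++) {
          count += values[i];
        }
        return count;
      }
    JS
  end

  # Run the map-reduce over every article (an empty query means no filter)
  def self.build
    Article.collection.map_reduce(map, reduce, :query => {})
  end
end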
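In this case my map function simply iterates over all the tags in each document and emits each one with a value of 1. The reduce function receives a tag along with the array of values emitted for it and tallies them up. Because MongoDB may call reduce more than once for the same tag, feeding an earlier result back in alongside new values, reduce has to return the same kind of value it receives: here, a plain number.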
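To make the data flow concrete, here is a pure-Ruby sketch that emulates the map, group and reduce steps in memory over the three sample articles. It is an illustration only, not how MongoDB executes map-reduce internally; the helper names (map_phase, reduce_values, map_reduce) are hypothetical, and no database or MongoMapper is involved. The last line checks the re-reduce property: reducing a partial sum together with a fresh value gives the same total.

```ruby
# Hypothetical in-memory stand-ins for the three sample articles.
ARTICLES = [
  { title: 'one', content: 'article one', tags: %w[number one] },
  { title: 'two', content: 'article two', tags: %w[number two] },
  { title: 'uno', content: 'article uno', tags: %w[number uno one] }
]

# Map phase: like the JavaScript map, emit [tag, 1] for every tag.
def map_phase(docs)
  docs.flat_map { |doc| doc[:tags].map { |tag| [tag, 1] } }
end

# Reduce: same contract as the JavaScript reduce, taking a key and an
# array of values and returning a single value of the same kind.
def reduce_values(_key, values)
  values.sum
end

# Group emitted pairs by key, reduce each group, and shape the output
# like MongoDB's result documents ({'_id' => key, 'value' => result}).
def map_reduce(docs)
  map_phase(docs)
    .group_by(&:first)
    .map { |tag, pairs| { '_id' => tag, 'value' => reduce_values(tag, pairs.map(&:last)) } }
    .sort_by { |row| row['_id'] }
end

map_reduce(ARTICLES).each { |row| puts "#{row['_id']} (#{row['value']})" }

# Re-reduce: reducing a partial result with a new value gives the same total.
puts reduce_values('number', [reduce_values('number', [1, 1]), 1])
```

The Ruby version yields integers; the 3.0-style values in the post's output come from MongoDB treating JavaScript numbers as doubles.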
Querying
Now, in the articles controller, I'll run the map-reduce in a tag_cloud action. This is a perfect action to sit behind a partial.
def tag_cloud
  # here's where you could also add some filtering or sorting
  @tags = TagCloud.build.find
end
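As a sketch of the filtering and sorting mentioned in that comment, the hypothetical top_tags helper below drops tags under a minimum count, orders the rest by count (highest first, ties broken by tag name) and keeps the top N. It assumes the rows have already been fetched as hashes shaped like the map-reduce output, for example by calling to_a on the cursor returned by find (assuming your driver's cursor supports it). Sorting in the database instead would avoid loading every tag when the set is very large.

```ruby
# Hypothetical helper: filter and sort map-reduce result rows
# ({'_id' => tag, 'value' => count}), keeping at most `limit` rows.
def top_tags(rows, limit, min_count = 1)
  rows
    .select { |row| row['value'] >= min_count }       # drop rare tags
    .sort_by { |row| [-row['value'], row['_id']] }   # count desc, then name
    .first(limit)
end

# Sample rows matching the counts this post's example produces.
rows = [
  { '_id' => 'number', 'value' => 3.0 },
  { '_id' => 'one',    'value' => 2.0 },
  { '_id' => 'two',    'value' => 1.0 },
  { '_id' => 'uno',    'value' => 1.0 }
]

top_tags(rows, 3).each { |row| puts "#{row['_id']} (#{row['value']})" }
# prints number (3.0), one (2.0), two (1.0), one per line
```

Because each row keeps its '_id' and 'value' keys, the view below would work unchanged on the filtered list.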
Display
Our map_reduce returns a collection of documents, one per tag, each holding the tag name in _id and its count in value, which is all we need for a tag cloud. For simplicity's sake I'll just display the counts here, but this is where fancy tag-cloud formatting could go.
<% @tags.each do |tag| %>
  <%= tag['_id'] %> (<%= tag['value'] %>)
<% end %>
Producing:
number (3.0) one (2.0) two (1.0) uno (1.0)
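For the fancier tag-cloud formatting, one common approach is to scale each tag's count linearly onto a font-size range. The sketch below assumes result rows shaped like the map-reduce output ('_id' and 'value'); the font_sizes helper and its 12 to 32 pixel default range are hypothetical choices, not part of MongoMapper or Rails.

```ruby
# Hypothetical helper: map each tag's count linearly onto a font-size
# range in pixels. min_px/max_px are illustrative defaults.
def font_sizes(rows, min_px: 12, max_px: 32)
  counts = rows.map { |row| row['value'] }
  lo, hi = counts.min, counts.max
  rows.each_with_object({}) do |row, sizes|
    # If every tag has the same count, use the smallest size
    # rather than dividing by zero.
    ratio = hi == lo ? 0.0 : (row['value'] - lo).to_f / (hi - lo)
    sizes[row['_id']] = (min_px + ratio * (max_px - min_px)).round
  end
end

rows = [
  { '_id' => 'number', 'value' => 3.0 },
  { '_id' => 'one',    'value' => 2.0 },
  { '_id' => 'two',    'value' => 1.0 },
  { '_id' => 'uno',    'value' => 1.0 }
]

font_sizes(rows).each { |tag, px| puts "#{tag}: #{px}px" }
# prints number: 32px, one: 22px, two: 12px, uno: 12px, one per line
```

In the view, each size could go into a style attribute on the tag's element. A linear scale lets one very popular tag squash everything else toward the minimum; scaling on Math.log of the count is a common alternative when counts vary widely.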
Sun Aug 01 2010 02:52:42 GMT+0000 (UTC)
Thanks for posting. One thing I have seen people do is move each map/reduce to its own class. Something like this:
class PageViewsByMonth
  def self.map
    <<-MAP
      function() {…}
    MAP
  end

  def self.reduce
    <<-REDUCE
      function() {…}
    REDUCE
  end

  def self.build
    Views.collection.map_reduce(map, reduce)
  end
end
by John Nunemaker on Sun Aug 01 2010 21:32:34 GMT+0000 (UTC)
I couldn't help it. I updated the example to match John's style.
by chrisumbel on Sun Sep 05 2010 13:05:48 GMT+0000 (UTC)
Great stuff on M/R. I have a very large data set of hundreds of millions of documents, basically Events logged from impressions. The Events have a field called Publisher, and I break this large collection down into smaller, manageable collections named pub.<name of publisher> via the mongo command line. Any recommendation on how I can access those using MongoMapper? I also further map-reduce those publisher collections down to unique tables, as well as other aggregations that run on a nightly cron. Any thoughts on accessing those through the app? Again, awesome post dude!
by Chris Barretto on Fri Dec 10 2010 09:02:56 GMT+0000 (UTC)