Chris Umbel

Faceted Queries on acts_as_solr Associations

Solr LogoRecently in a rails app that employs Solr (via the acts_as_solr plugin) I've had the need to produce aggregate counts of entities on the far end of a many-to-many relationship. Essentially a tag cloud.

My first attempt was to keep it entirely in ActiveRecord which resulted in a proliferation of SQL command executions. Obviously that wasn't performant. Sure it looked elegant, but was slow and unsustainable. While I could have hand-crafted the SQL it was more performant still to retrieve the aggregations from Solr via facets. Hell, I had the data handy!

Such a faceted query directly from Solr is qutie simple but it required a little research to get it done with acts_as_solr due to my unfamiliarity with it. In order to make it simple for others attempting to do the same thing I figured I'd post a basic example here.

Blog Post/Tag Cloud Example

Let's consider a typical blog post/tag cloud scenario. In the spirit of demonstration I'll try to stick to manually created schemata and whatnot.

The Database

Here's the database schema... Posts, Tags and a junction table. While my example is in SQL Server's T-SQL it's not a dependancy.

create table Posts (
    id int identity primary key,
    title varchar(1024),
    content varchar(max)
)

create table Tags (
    id int identity primary key,
    name varchar(64)	
)

create table Posts_Tags (
    post_id int,
    tag_id int
    primary key (post_id, tag_id)	
)

Sample Data

Now I'll provide three sample posts. The first two tagged to "Sports", the last tagged to "Technology" and all tagged to "Hobbies"

insert into Posts
select 'A post about hockey', 'This is a sample post about hockey'

insert into Posts
select 'A post about football', 'This is a sample post about football'

insert into Posts
select 'A post about Computers', 'This is a sample post about computers'

insert into Tags
select 'Sports'

insert into Tags
select 'Technology'

insert into Tags
select 'Hobbies'

insert into Posts_Tags
select 1, 1 
union
select 2, 1
union
select 3, 2
union
select 1, 3
union
select 2, 3
union
select 3, 3

Conceptual Model

Here's the ActiveRecord model including the acts_as_solr bit. Check out the "include" key. All of that maps tags into a multivalued attribute in Solr. Pay special attention to the lambda function in the "using" key. That function produes the actual string value stored for each tag.

If the "using" key is omitted the tag will be serialized in a form similar to the string "id=1 name=Sports post_id=1 tag_id=1" which isn't what I'm interested in. I can't imagine anybody would be interested in it, honestly. That's why my lambda function produces just the tag's name.

class Tag < ActiveRecord::Base
  has_and_belongs_to_many :posts
end

class Post < ActiveRecord::Base
  has_and_belongs_to_many :tags

  acts_as_solr  :include => [
                            # include the associated tags in Solr
                            {:tags => {
                              # name the entities "tag_name"
                              :as => :tag_name,
                              # solr type for tag names will be string
                              :type => :string,
                              # a post has many tags
                              :multivalued => true,
                              # only store the name of the tag
                              :using => lambda{|tag| tag.name}}
                            }],
                :fields => [:title, :content],
                :facets => [:tag_name]
end

Solr Schema

Here's an excerpt from a manually created Solr schema that will suffice for this example. Note the additional tag_name_facet field. acts_as_solr will employ that field in the production of my tag cloud via a facetted query.

<fields>
  <field name="id" type="string" indexed="true" stored="true"/>  
  <field name="pk_i" type="int" indexed="true" stored="true"/>
  <field name="type_s" type="string" indexed="true" stored="true"/>

  <field name="title_t" type="text" indexed="true" stored="true"/>
  <field name="content_t" type="text" indexed="true" stored="true"/>

  <field name="tag_name_s" type="string" indexed="true" stored="false" multiValued="true"/>  
  <field name="tag_name_facet" type="string" indexed="true" stored="true" multiValued="true"/>  
  
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<copyField source="title_t" dest="text"/>
<copyField source="tag_name_s" dest="text"/>
<copyField source="content_t" dest="text"/>
<copyField source="tag_name_s" dest="tag_name_facet"/>

<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>

Querying Solr

Assuming the data has been indexed it could now be queried in a controller like this:

class PostController < ApplicationController
  def index
    docs = Post.find_by_solr("*", :facets => {:fields => [:tag_name]})
    @tag_facets = docs.facets["facet_fields"]["tag_name_facet"]
  end
end

Notice that I've dug into the factes hash to retreive the counts per tag name. That's the data that will power the tag cloud.

The View

Now I'll render the data. Naturally this would be far fancier in an actual tag cloud.

<% @tag_facets.each_pair do | name, count | %>
  <div><%= name %> (<%= count %>)</div>
<% end %>

which would produce:

Hobbies (3)
Sports (2)
Technology (1)

Thu May 20 2010 02:05:15 GMT+0000 (UTC)

Follow Chris
RSS Feed
Twitter
Facebook
CodePlex
github
LinkedIn
Google