Recently in a rails app that employs
Solr
(via the acts_as_solr plugin) I've
had the need to produce aggregate counts of entities on the far end
of a many-to-many relationship. Essentially a tag cloud.
My first attempt was to keep it entirely in ActiveRecord which resulted in a proliferation of SQL command executions. Obviously that wasn't performant. Sure it looked elegant, but was slow and unsustainable. While I could have hand-crafted the SQL it was more performant still to retrieve the aggregations from Solr via facets. Hell, I had the data handy!
Such a faceted query directly from Solr is qutie simple but it required a little research to get it done with acts_as_solr due to my unfamiliarity with it. In order to make it simple for others attempting to do the same thing I figured I'd post a basic example here.
Blog Post/Tag Cloud Example
Let's consider a typical blog post/tag cloud scenario. In the spirit of demonstration I'll try to stick to manually created schemata and whatnot.
The Database
Here's the database schema... Posts, Tags and a junction table. While my example is in SQL Server's T-SQL it's not a dependancy.
create table Posts (
id int identity primary key,
title varchar(1024),
content varchar(max)
)
create table Tags (
id int identity primary key,
name varchar(64)
)
create table Posts_Tags (
post_id int,
tag_id int
primary key (post_id, tag_id)
)
Sample Data
Now I'll provide three sample posts. The first two tagged to "Sports", the last tagged to "Technology" and all tagged to "Hobbies"
insert into Posts select 'A post about hockey', 'This is a sample post about hockey' insert into Posts select 'A post about football', 'This is a sample post about football' insert into Posts select 'A post about Computers', 'This is a sample post about computers' insert into Tags select 'Sports' insert into Tags select 'Technology' insert into Tags select 'Hobbies' insert into Posts_Tags select 1, 1 union select 2, 1 union select 3, 2 union select 1, 3 union select 2, 3 union select 3, 3
Conceptual Model
Here's the ActiveRecord model including the acts_as_solr bit. Check out the "include" key. All of that maps tags into a multivalued attribute in Solr. Pay special attention to the lambda function in the "using" key. That function produes the actual string value stored for each tag.
If the "using" key is omitted the tag will be serialized in a form similar to the string "id=1 name=Sports post_id=1 tag_id=1" which isn't what I'm interested in. I can't imagine anybody would be interested in it, honestly. That's why my lambda function produces just the tag's name.
class Tag < ActiveRecord::Base
has_and_belongs_to_many :posts
end
class Post < ActiveRecord::Base
has_and_belongs_to_many :tags
acts_as_solr :include => [
# include the associated tags in Solr
{:tags => {
# name the entities "tag_name"
:as => :tag_name,
# solr type for tag names will be string
:type => :string,
# a post has many tags
:multivalued => true,
# only store the name of the tag
:using => lambda{|tag| tag.name}}
}],
:fields => [:title, :content],
:facets => [:tag_name]
end
Solr Schema
Here's an excerpt from a manually created Solr schema that will suffice for this example. Note the additional tag_name_facet field. acts_as_solr will employ that field in the production of my tag cloud via a facetted query.
<fields> <field name="id" type="string" indexed="true" stored="true"/> <field name="pk_i" type="int" indexed="true" stored="true"/> <field name="type_s" type="string" indexed="true" stored="true"/> <field name="title_t" type="text" indexed="true" stored="true"/> <field name="content_t" type="text" indexed="true" stored="true"/> <field name="tag_name_s" type="string" indexed="true" stored="false" multiValued="true"/> <field name="tag_name_facet" type="string" indexed="true" stored="true" multiValued="true"/> <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> </fields> <copyField source="title_t" dest="text"/> <copyField source="tag_name_s" dest="text"/> <copyField source="content_t" dest="text"/> <copyField source="tag_name_s" dest="tag_name_facet"/> <uniqueKey>id</uniqueKey> <defaultSearchField>text</defaultSearchField>
Querying Solr
Assuming the data has been indexed it could now be queried in a controller like this:
class PostController < ApplicationController
def index
docs = Post.find_by_solr("*", :facets => {:fields => [:tag_name]})
@tag_facets = docs.facets["facet_fields"]["tag_name_facet"]
end
end
Notice that I've dug into the factes hash to retreive the counts per tag name. That's the data that will power the tag cloud.
The View
Now I'll render the data. Naturally this would be far fancier in an actual tag cloud.
<% @tag_facets.each_pair do | name, count | %> <div><%= name %> (<%= count %>)</div> <% end %>
which would produce:
Hobbies (3) Sports (2) Technology (1)
Thanks for this helpful article, I am using acts_as_solr on associations as well but differently.
It used to work until I changed the primary keys to string as I needed to generate random ids with uuid-tools.
I just got an error because the pk_i key expected is of type int and not a string. I can't change it to string as some of my other models use int ids.
Do you have any suggestions by any chance for this problem?
Below is the stacktrace:
INFO: [] webapp=/solr path=/select params={fl=pk_i,score&wt=ruby&qt=standard&q=(hsbc)+AND+(type_s:CrmCompanyHolding);sh_held_t+desc&rows=200} hits=94 status=0 QTime=137
Jul 29, 2010 9:35:39 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NumberFormatException: For input string: "079e9e96-991e-11df-bcc3-001279deb1c4"
Thanks,
by
Mazhout on 2010-07-29 13:51:47 UTC
Hi,
Here is a more precise log:
Thanks,
(eval):1: syntax error, unexpected tIDENTIFIER, expecting ']'
...94,'docs'=>[{'pk_i'=>[798bc248-9a15-11df-8829-001279deb1c4],...
^
(eval):1: syntax error, unexpected tIDENTIFIER, expecting $end
...ocs'=>[{'pk_i'=>[798bc248-9a15-11df-8829-001279deb1c4],'scor...
by
Mazhout on 2010-07-29 14:30:13 UTC
Hi, I found the solution: I changed in acts_as_solr/lib/acts_methods the following line: :primary_key_field => "pk_i", to: :primary_key_field => "pk_s", rake solr:reindex and it's ok now, I guess solr can casts id into string. Cheers.by Mazhout on 2010-07-30 01:00:47 UTC
Ok, So the RIGHT solution is very simple: in my models where the id is a string I just need to add: acts_as_solr :fields => [:category], :primary_key_field => 'pk_s'by Mazhout on 2010-07-30 05:59:33 UTC
Sorry I was unable to get back to you. Thanks for posting your solution!by Chris Umbel on 2010-07-30 15:19:59 UTC
Digg it
Reddit
Delicous
Facebook











