XML
All of the examples below depend on the following XML living in a file named "settings.xml" in the current working directory.
<settings>
<timeout>5000</timeout>
<email>example@mail.com</email>
<hosts>
<host url="http://computer1.domain.com">COMPUTER1</host>
<host url="http://computer2.domain.com">COMPUTER2</host>
<host url="http://computer3.domain.com">COMPUTER3</host>
</hosts>
</settings>
Basic Requirements
To prepare Clojure for the task at hand we'll need java.io.File and clojure.xml.
(import '(java.io File))
(use 'clojure.xml)
Examples
Just to get warmed up, consider the following. All it does is parse "settings.xml" into a tree of struct-maps and then, via xml-seq, flatten that tree into a depth-first sequence of nodes.
(let [xml-file (File. "settings.xml")]
  (xml-seq (parse xml-file)))
With that sequence in hand we can actually pull data out of it. The following code simply prints the content of any "email" nodes it finds while traversing the sequence returned by xml-seq.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :email (:tag x))]
    (println (first (:content x)))))
producing the output
example@mail.com
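If clojure.xml is new to you, it may help to see the shape of the data these examples walk over. Here's a rough Ruby sketch of the tree that parse builds from settings.xml, modeled as nested hashes with :tag, :attrs and :content keys, plus a hypothetical xml_seq helper that walks it depth-first the way xml-seq does. The helper names are mine, not clojure.xml's, and unlike the real xml-seq this version is eager rather than lazy.

```ruby
# Hypothetical Ruby stand-in for the tree clojure.xml/parse builds from
# settings.xml: each element is a hash with :tag, :attrs and :content, where
# :content holds child elements or text strings (attrs is nil when absent).
def element(tag, content, attrs = nil)
  { tag: tag, attrs: attrs, content: content }
end

SETTINGS = element(:settings, [
  element(:timeout, ["5000"]),
  element(:email, ["example@mail.com"]),
  element(:hosts, [
    element(:host, ["COMPUTER1"], { url: "http://computer1.domain.com" }),
    element(:host, ["COMPUTER2"], { url: "http://computer2.domain.com" }),
    element(:host, ["COMPUTER3"], { url: "http://computer3.domain.com" })
  ])
])

# Depth-first, pre-order walk in the spirit of xml-seq: every node comes
# before its children, and text leaves appear as plain strings.
# Unlike Clojure's lazy xml-seq, this builds the whole array eagerly.
def xml_seq(node)
  rest = node[:content].flat_map { |child| child.is_a?(Hash) ? xml_seq(child) : [child] }
  [node] + rest
end

# Equivalent of the "email" example: print the first content item of each
# :email element found anywhere in the sequence.
xml_seq(SETTINGS)
  .select { |n| n.is_a?(Hash) && n[:tag] == :email }
  .each { |n| puts n[:content].first }
```

Note that text nodes show up in the sequence too, which is why the Clojure examples test (:tag x) before using a node.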
The following example performs the same task on all "host" nodes. Note that no attention is paid to the actual tree structure: "host" nodes would be matched regardless of their position within the XML hierarchy.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :host (:tag x))]
    (println (first (:content x)))))
resulting in
COMPUTER1
COMPUTER2
COMPUTER3
Now I'll pay some additional attention to structure. This example only matches "host" nodes that are children of a "hosts" parent.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (first (:content y))))))
resulting in
COMPUTER1
COMPUTER2
COMPUTER3
From here it's simple to extract the contents of attributes as well. Here I'm extracting "url" attributes from "host" tags.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    ;; only look at "host" children, as the prose says
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (:url (:attrs y))))))
giving us
http://computer1.domain.com
http://computer2.domain.com
http://computer3.domain.com
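To make the difference between matching anywhere and matching only under "hosts" concrete, here's a Ruby sketch over a hypothetical variant of settings.xml that has one extra host element sitting directly under settings. That stray element and the helper names (elements, hosts_anywhere, hosts_under_hosts, host_urls) are made up for illustration; the tree uses the same :tag/:attrs/:content shape that clojure.xml produces.

```ruby
# Element shaped like clojure.xml's output: :tag, :attrs (nil if none), :content.
def el(tag, content, attrs = nil)
  { tag: tag, attrs: attrs, content: content }
end

# Hypothetical variant of settings.xml with one <host> outside <hosts>.
TREE = el(:settings, [
  el(:host, ["STRAY"]),
  el(:hosts, [
    el(:host, ["COMPUTER1"], { url: "http://computer1.domain.com" }),
    el(:host, ["COMPUTER2"], { url: "http://computer2.domain.com" }),
    el(:host, ["COMPUTER3"], { url: "http://computer3.domain.com" })
  ])
])

# Pre-order walk over element nodes only (text leaves are skipped).
def elements(node)
  [node] + node[:content].grep(Hash).flat_map { |child| elements(child) }
end

# Like the first "host" example: match <host> at any depth.
def hosts_anywhere(root)
  elements(root).select { |n| n[:tag] == :host }.map { |n| n[:content].first }
end

# Like the nested doseq: only <host> elements that are children of <hosts>.
def host_nodes_under_hosts(root)
  elements(root)
    .select { |n| n[:tag] == :hosts }
    .flat_map { |n| n[:content].grep(Hash).select { |c| c[:tag] == :host } }
end

def hosts_under_hosts(root)
  host_nodes_under_hosts(root).map { |n| n[:content].first }
end

# Like the attribute example: the url attribute of each <host> under <hosts>.
def host_urls(root)
  host_nodes_under_hosts(root).map { |n| n[:attrs][:url] }
end
```

hosts_anywhere picks up "STRAY" along with the three computers, while hosts_under_hosts and host_urls see only the three hosts nested inside "hosts".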

Recently, in a Rails app that employs Solr (via the acts_as_solr plugin), I've had the need to produce aggregate counts of entities on the far end of a many-to-many relationship. Essentially a tag cloud.
My first attempt kept everything in ActiveRecord, which resulted in a proliferation of SQL command executions. Obviously that wasn't performant. Sure, it looked elegant, but it was slow and unsustainable. While I could have hand-crafted the SQL, it was more performant still to retrieve the aggregations from Solr via facets. Hell, I had the data handy!
Such a faceted query directly against Solr is quite simple, but because I was unfamiliar with acts_as_solr it took a little research to get it done through the plugin. To make it simple for others attempting the same thing, I figured I'd post a basic example here.
Blog Post/Tag Cloud Example
Let's consider a typical blog post/tag cloud scenario. In the spirit of demonstration I'll try to stick to manually created schemata and whatnot.
The Database
Here's the database schema: Posts, Tags and a junction table. While my example is in SQL Server's T-SQL, the approach doesn't depend on SQL Server.
create table Posts (
id int identity primary key,
title varchar(1024),
content varchar(max)
)
create table Tags (
id int identity primary key,
name varchar(64)
)
create table Posts_Tags (
post_id int,
tag_id int,
primary key (post_id, tag_id)
)
Sample Data
Now I'll provide three sample posts: the first two tagged "Sports", the last tagged "Technology", and all three tagged "Hobbies".
insert into Posts select 'A post about hockey', 'This is a sample post about hockey'
insert into Posts select 'A post about football', 'This is a sample post about football'
insert into Posts select 'A post about Computers', 'This is a sample post about computers'
insert into Tags select 'Sports'
insert into Tags select 'Technology'
insert into Tags select 'Hobbies'
insert into Posts_Tags
select 1, 1 union select 2, 1 union select 3, 2
union select 1, 3 union select 2, 3 union select 3, 3
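Before involving Solr at all, it's worth knowing what the tag cloud should show. Here's a small Ruby sketch that tallies the sample Posts_Tags rows by tag name, which is the same aggregation the facet will perform. It assumes the identity columns start at 1, so the ids follow insertion order.

```ruby
# Rows copied from the sample-data inserts (ids assume identity seeds of 1).
TAGS = { 1 => "Sports", 2 => "Technology", 3 => "Hobbies" }
POSTS_TAGS = [[1, 1], [2, 1], [3, 2], [1, 3], [2, 3], [3, 3]] # [post_id, tag_id]

# Count posts per tag name: one increment per junction-table row.
def tag_counts(posts_tags, tags)
  counts = Hash.new(0)
  posts_tags.each { |_post_id, tag_id| counts[tags.fetch(tag_id)] += 1 }
  # Order by descending count (ties by name), similar to Solr's default
  # count-based facet ordering.
  counts.sort_by { |name, count| [-count, name] }.to_h
end

tag_counts(POSTS_TAGS, TAGS).each { |name, count| puts "#{name} (#{count})" }
```

For the sample data this prints Hobbies (3), Sports (2) and Technology (1), the numbers the Solr-backed view should reproduce.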
Conceptual Model
Here's the ActiveRecord model, including the acts_as_solr bit. Check out the "include" key: all of that maps tags into a multivalued attribute in Solr. Pay special attention to the lambda in the "using" key. That function produces the actual string value stored for each tag.
If the "using" key is omitted the tag will be serialized in a form similar to the string "id=1 name=Sports post_id=1 tag_id=1" which isn't what I'm interested in. I can't imagine anybody would be interested in it, honestly. That's why my lambda function produces just the tag's name.
class Tag < ActiveRecord::Base
  has_and_belongs_to_many :posts
end
class Post < ActiveRecord::Base
  has_and_belongs_to_many :tags
  acts_as_solr :include => [
    # include the associated tags in Solr
    {:tags => {
      # name the entities "tag_name"
      :as => :tag_name,
      # solr type for tag names will be string
      :type => :string,
      # a post has many tags
      :multivalued => true,
      # only store the name of the tag
      :using => lambda { |tag| tag.name }}
    }],
    :fields => [:title, :content],
    :facets => [:tag_name]
end
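To see what the :using lambda actually contributes, here's a tiny stand-alone Ruby sketch. A Struct stands in for the Tag model (an assumption so it runs without a database), and TAG_NAME is the same lambda given to :using above. Applied to post 1's tags from the sample data, it yields just the names that land in the multivalued tag_name field.

```ruby
# Hypothetical stand-in for the ActiveRecord Tag model: only attributes matter.
Tag = Struct.new(:id, :name)

# The same lambda passed as :using in the Post model.
TAG_NAME = lambda { |tag| tag.name }

# Post 1 ("A post about hockey") is tagged Sports and Hobbies in the sample data.
POST1_TAGS = [Tag.new(1, "Sports"), Tag.new(3, "Hobbies")]

# Values indexed into tag_name for post 1: names only, no ids or join columns.
p POST1_TAGS.map(&TAG_NAME)
```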
Solr Schema
Here's an excerpt from a manually created Solr schema that will suffice for this example. Note the additional tag_name_facet field: acts_as_solr will use it to produce my tag cloud via a faceted query.
<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="pk_i" type="int" indexed="true" stored="true"/>
  <field name="type_s" type="string" indexed="true" stored="true"/>
  <field name="title_t" type="text" indexed="true" stored="true"/>
  <field name="content_t" type="text" indexed="true" stored="true"/>
  <field name="tag_name_s" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="tag_name_facet" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>
<copyField source="title_t" dest="text"/>
<copyField source="tag_name_s" dest="text"/>
<copyField source="content_t" dest="text"/>
<copyField source="tag_name_s" dest="tag_name_facet"/>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
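The copyField lines do the quiet work here. As a rough Ruby model (not Solr code), the sketch below applies those rules to a hypothetical document for post 1, assuming acts_as_solr sends the title, content and tag names under the _t/_s field names from the schema. tag_name_facet ends up holding the tag names, and text collects title, tags and content.

```ruby
# copyField rules from the schema excerpt, as [source, dest] pairs, in order.
COPY_FIELDS = [
  %w[title_t text], %w[tag_name_s text], %w[content_t text], %w[tag_name_s tag_name_facet]
]

# Hypothetical document for post 1 (field name => array of values).
POST1 = {
  "title_t" => ["A post about hockey"],
  "content_t" => ["This is a sample post about hockey"],
  "tag_name_s" => ["Sports", "Hobbies"]
}

# Apply copyField rules. Values are copied only from the original source
# fields, mirroring Solr, which does not chain copyFields.
def apply_copy_fields(doc, rules)
  result = doc.transform_values(&:dup)
  rules.each do |source, dest|
    next unless doc.key?(source)
    (result[dest] ||= []).concat(doc[source])
  end
  result
end

p apply_copy_fields(POST1, COPY_FIELDS)["tag_name_facet"]
```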
Querying Solr
Assuming the data has been indexed, it can now be queried in a controller like this:
class PostController < ApplicationController
  def index
    docs = Post.find_by_solr("*", :facets => {:fields => [:tag_name]})
    @tag_facets = docs.facets["facet_fields"]["tag_name_facet"]
  end
end
Notice that I've dug into the facets hash to retrieve the counts per tag name. That's the data that will power the tag cloud.
The View
Now I'll render the data. Naturally this would be far fancier in an actual tag cloud.
<% @tag_facets.each_pair do |name, count| %>
  <div><%= name %> (<%= count %>)</div>
<% end %>
which would produce:
Hobbies (3)
Sports (2)
Technology (1)
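For the "far fancier" version, one common approach is to scale each count into a font size. The sketch below is a hypothetical helper (tag_font_sizes, with made-up default em bounds) that linearly maps facet counts onto a size range. It assumes @tag_facets maps tag names to integer counts, as the each_pair loop suggests.

```ruby
# Hypothetical helper: linearly map each facet count to a font size (in em)
# between min_em and max_em, so more frequent tags render larger.
def tag_font_sizes(facets, min_em: 1.0, max_em: 2.0)
  lo, hi = facets.values.minmax
  facets.transform_values do |count|
    # When every tag has the same count, avoid dividing by zero.
    next min_em if hi == lo
    min_em + (count - lo) * (max_em - min_em) / (hi - lo).to_f
  end
end

p tag_font_sizes({ "Hobbies" => 3, "Sports" => 2, "Technology" => 1 })
```

For the sample data this gives Hobbies 2.0, Sports 1.5 and Technology 1.0; the view could then put each size into a style attribute on its div.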

Just a quick note: I've again changed the architecture of this site. It's now IronRuby on Rails, running on Windows Server 2008 with SQL Server 2008 R2/Solr.
Previously I was using Django on Linux with Oracle Express/Solr, which was just one node in a long list of architectures I've used here.
Why the change from Django? Why a Microsoft-ish IronRuby stack? Well, I'll surely be blogging about that shortly when I update my Tale of a Website post.
In the meantime please keep an eye out for anything not working and let me know if you find something broken.

One thing plagued me, however. Because the universal version doesn't have an actual .app file to put in your Applications directory you can't add it to the dock. While it's not a show-stopper it's clearly suboptimal.
I was certain there must be a way to wrap the NetBeans startup shell script in an .app that I could add to my dock and maybe even give an icon.
After some googling I found several tutorials that did the job. Unfortunately they either involved unnecessary steps or contained information that was no longer accurate for Mac OS X 10.6.2 Snow Leopard.
My approach may not be perfect either, but I decided to outline the steps I followed in the hope that it will help other Mac users trying to accomplish the same thing.
Wrapping your program in an AppleScript
The first step is really all the meat. We'll put together a three-line AppleScript that simply executes our target program, NetBeans in my case. Open the AppleScript Editor (located at /Applications/Utilities/AppleScript Editor.app in 10.6.2) and enter code similar to the following:
to run
    do shell script "/opt/netbeans/bin/netbeans"
end run
Make sure to substitute the full path to the executable of your interest into line 2.
Now save the script into your /Applications folder but be careful to save it as an application, not as a script. I'll save mine as /Applications/run_netbeans.app.

Create an icon
Really we could stop now. You could copy the .app file to your dock and life would be merry. It would be ugly, however. We might as well give it a pretty icon while we've gone to all this trouble.
Keep in mind that to follow these steps you'll need Icon Composer, which requires the Apple developer tools to be installed.
It's quite simple, really. Just find or create a .png file of roughly 128x128 pixels that you'd like to use as your application's icon.
Now fire up Icon Composer which will be located at /Developer/Applications/Utilities/Icon Composer.app. Drag your .png file into the square labeled 128 and ensure the contents that display in the square are correct. Click on the square so it's highlighted and copy it with command-c.
Using the Finder locate the .app file you created in your /Applications directory and hit command-i. Click on the icon at the top so it's highlighted then hit command-v which will paste your icon into the information window. Close out the window and, POW, you've changed the icon!
Feel free to close Icon Composer. It's up to you whether you want to save the icon or not; I recommend at least keeping the original .png artwork.
We're done!
Now you can drag your .app to the dock and enjoy!

One thing that's impressed me with Solr is the flexibility of the Data Import Handlers (DIHs). When I was new to Solr there were several times I thought for sure I'd have to write my own extension of DataImportHandler. Every time that's happened I've been wrong. A transformer or something handled my needs. Sometimes it's wonderful to be wrong! Especially when it means less code I have to write myself!
One of the aspects of DIHs that provides such great flexibility is transformers like RegexTransformer and TemplateTransformer. In this post, however, I'm going to *quickly* cover the ScriptTransformer, which lets you use your own custom JavaScript code when processing imports.
Prerequisites
Obviously you'll need a functional Solr instance. ScriptTransformers also require Java 6, since they rely on its built-in JavaScript scripting support. I'll also assume you understand how dynamicFields work.
Objective
At the office I've recently used a ScriptTransformer to build the field names of dynamicFields and I'm going to do the same in this article. The actual use-case I dealt with was very esoteric and honestly a bit proprietary so I'll substitute an example data scenario here.
Basically I'll import data about students' grades in various courses from different institutions. In the resulting Solr index I'll provide a dynamicField for every course so students can easily be sorted by their grades in the courses they took.
Consider the following MySQL schema and data and try to think beyond this sample data. Think about hundreds of schools, thousands of courses and, well, a ton of students.
create table schools (
id int auto_increment primary key,
name varchar(255)
);
insert into schools (name) values ('Pitt');
insert into schools (name) values ('Penn State');
create table students (
id int auto_increment primary key,
first_name varchar(255),
last_name varchar(255),
current_school_id int references schools(id)
);
insert into students (first_name, last_name, current_school_id) values
('John', 'Doe', 1);
insert into students (first_name, last_name, current_school_id) values
('Bill', 'Miller', 1);
insert into students (first_name, last_name, current_school_id) values
('Jane', 'Dow', 2);
insert into students (first_name, last_name, current_school_id) values
('Dennis', 'Itchison', 2);
create table courses (
id int auto_increment primary key,
school_id int references schools(id),
course_number varchar(10),
name varchar(255)
);
insert into courses (school_id, course_number, name) values
(1, 'CS1501', 'Algorithm Implementations');
insert into courses (school_id, course_number, name) values
(1, 'CS1541', 'Introduction to Computer Architecture');
insert into courses (school_id, course_number, name) values
(2, 'CMPSC465', 'Data Structures and Algorithm');
insert into courses (school_id, course_number, name) values
(2, 'CMPSC473', 'Operating Systems');
create table grades (
id int auto_increment primary key,
value FLOAT,
course_id int references courses(id),
student_id int references students (id)
);
insert into grades (value, course_id, student_id) values (4.0, 1, 1);
insert into grades (value, course_id, student_id) values (2.5, 2, 1);
insert into grades (value, course_id, student_id) values (3.0, 3, 1);
insert into grades (value, course_id, student_id) values (3.0, 1, 2);
insert into grades (value, course_id, student_id) values (3.5, 2, 2);
insert into grades (value, course_id, student_id) values (3.5, 3, 3);
insert into grades (value, course_id, student_id) values (2.5, 4, 3);
insert into grades (value, course_id, student_id) values (3.0, 3, 4);
insert into grades (value, course_id, student_id) values (2.0, 4, 4);
Keep in mind that the idea here is that there would be far too many courses to conceivably have a sparse-style column per course if we were denormalizing a list of students. A student can also have taken courses at several institutions, regardless of where they're currently enrolled.
Solr Schema
The data above will be transformed into the following Solr schema:
<fields>
  <field name="id" type="int" indexed="true" stored="true" required="true"/>
  <field name="first_name" type="string" indexed="true" stored="true"/>
  <field name="last_name" type="string" indexed="true" stored="true"/>
  <dynamicField name="grade_*" type="float" indexed="true" stored="true" required="false"/>
</fields>
DIH Configuration
In order to facilitate the transformation of the data into the schema defined above I'll employ the following DIH configuration:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/school"
user="YOUR_USER"
password="YOUR_PASSWORD"/>
<script>
<![CDATA[
function pivotGrades(row) {
var courseNumber = row.get("course_number");
var gradeValue = row.get("value");
var fieldName = "grade_" + courseNumber;
row.put(fieldName, gradeValue);
return row;
}
]]>
</script>
<document name="studentGrades">
<entity name="student" query="
SELECT id, first_name, last_name
FROM students
">
<field column="id" name="id"/>
<field column="first_name" name="first_name"/>
<field column="last_name" name="last_name"/>
<entity name="grade" query="
SELECT course_number, value
FROM grades
INNER JOIN courses
ON grades.course_id = courses.id
WHERE grades.student_id = ${student.id}
" transformer="script:pivotGrades">
</entity>
</entity>
</document>
</dataConfig>
See the script tag? That's where I've defined a pivotGrades JavaScript function that turns the rows of the grade sub-entity on their side into dynamicFields: each grade row gets a grade_<course_number> column holding that grade's value. In the real world you might expect to see some more intense text manipulation here to warrant a ScriptTransformer, I s'pect.
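To make the pivot concrete outside of DIH, here's a Ruby rendering of pivotGrades plus a rough model of how the transformed child rows land on a student's document. It's an approximation for illustration, not DIH's actual code. The sketch assumes that columns matching no schema field (course_number, value) are simply dropped, and that grade_* columns are picked up by the dynamicField. The rows are Bill Miller's grades from the sample data.

```ruby
# Grade rows the nested "grade" entity would return for student 2 (Bill Miller).
BILL_GRADE_ROWS = [
  { "course_number" => "CS1501", "value" => 3.0 },
  { "course_number" => "CS1541", "value" => 3.5 }
].freeze

# Ruby rendering of pivotGrades: add a grade_<course_number> column to the row.
def pivot_grades(row)
  row.merge("grade_#{row['course_number']}" => row["value"])
end

# Rough model of the parent document: keep the student's own fields and add
# every grade_* column from the transformed child rows; other columns are dropped.
def student_document(student, grade_rows)
  doc = student.dup
  grade_rows.each do |row|
    pivot_grades(row).each do |column, value|
      doc[column] = value if column.start_with?("grade_")
    end
  end
  doc
end

p student_document({ "id" => 2, "first_name" => "Bill", "last_name" => "Miller" }, BILL_GRADE_ROWS)
```

The resulting fields match Bill's document in the query response: grade_CS1501 is 3.0 and grade_CS1541 is 3.5.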
Querying
All the work I've done above was done specifically so I can easily and concisely sort students by their grades in specific courses. Here's the money:
http://localhost:8080/solr/students/select/?q=*:*&version=2.2&sort=grade_CS1541%20desc
Resulting in:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="sort">grade_CS1541 desc</str>
      <str name="indent">on</str>
      <str name="q">*:*</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="4" start="0">
    <doc>
      <str name="first_name">Bill</str>
      <float name="grade_CS1501">3.0</float>
      <float name="grade_CS1541">3.5</float>
      <int name="id">2</int>
      <str name="last_name">Miller</str>
    </doc>
    <doc>
      <str name="first_name">John</str>
      <float name="grade_CMPSC465">3.0</float>
      <float name="grade_CS1541">2.5</float>
      <int name="id">1</int>
      <str name="last_name">Doe</str>
    </doc>
    <doc>
      <str name="first_name">Jane</str>
      <float name="grade_CMPSC465">3.5</float>
      <float name="grade_CMPSC473">2.5</float>
      <int name="id">3</int>
      <str name="last_name">Dow</str>
    </doc>
    <doc>
      <str name="first_name">Dennis</str>
      <float name="grade_CMPSC465">3.0</float>
      <float name="grade_CMPSC473">2.0</float>
      <int name="id">4</int>
      <str name="last_name">Itchison</str>
    </doc>
  </result>
</response>
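The ordering in that response can be reproduced with a small Ruby sketch: highest grade_CS1541 first, with students who never took CS1541 at the end. Whether Solr puts documents missing a sort field first or last depends on the field type (its sortMissingLast/sortMissingFirst settings), so the missing-last rule here simply mirrors this particular response. Ties are broken by id, which matches the order shown.

```ruby
# The four documents from the response, reduced to id, name and grade_CS1541.
DOCS = [
  { "id" => 1, "first_name" => "John", "grade_CS1541" => 2.5 },
  { "id" => 2, "first_name" => "Bill", "grade_CS1541" => 3.5 },
  { "id" => 3, "first_name" => "Jane" },
  { "id" => 4, "first_name" => "Dennis" }
].freeze

# sort=<field> desc with missing values last; ties (including all missing
# values) fall back to ascending id.
def sort_desc_missing_last(docs, field)
  docs.sort_by { |d| [d.key?(field) ? 0 : 1, -(d[field] || 0), d["id"]] }
end

p sort_desc_missing_last(DOCS, "grade_CS1541").map { |d| d["first_name"] }
```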
