Chris Umbel

Notes on Clojure XML Parsing

I figured I'd share some quick notes I had on a simple task that's not exactly straightforward in Clojure to the Lisp neophyte, like myself: XML parsing. Clojure goes a long way toward making it easy with clojure.xml/parse and xml-seq, but complete, concise examples can be difficult to come by.


All of the examples I'll outline below will depend on the following xml living in a file named "settings.xml" in the current working directory.

        <host url="">COMPUTER1</host>
        <host url="">COMPUTER2</host>
        <host url="">COMPUTER3</host>

Basic Requirements

In order to prepare Clojure for the task at hand we'll need to make use of java.io.File and clojure.xml.

(import '(java.io File))
(use 'clojure.xml)


Just to get warmed up, consider the following. All this does is parse the contents of "settings.xml" into a tree of struct-maps, which xml-seq then flattens into a sequence of every node.

(let [xml-file (File. "settings.xml")]
  (xml-seq (parse xml-file)))

With the struct-maps in hand we can actually pull data out of them. The following code simply outputs the content of any "email" nodes it finds while traversing the sequence produced by the previous example.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
        :when (= :email (:tag x))]
    (println (first (:content x)))))

producing the output

The following example performs the same task but on all "host" nodes. Note that no concern is placed on the actual tree structure. "host" nodes would be matched regardless of their position within the XML hierarchy.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
        :when (= :host (:tag x))]
    (println (first (:content x)))))

resulting in

COMPUTER1
COMPUTER2
COMPUTER3

Now I'll pay some additional attention to structure. This example will only match "host" nodes that are direct children of a "hosts" parent.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (first (:content y))))))

resulting in

COMPUTER1
COMPUTER2
COMPUTER3

From here it's simple to extract the contents of attributes as well. Here I'm extracting "url" attributes from "host" tags.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)]
      (println (:url (:attrs y))))))

giving us

Wed Jun 23 2010 04:06:47 GMT+0000 (UTC)


Faceted Queries on acts_as_solr Associations

Recently in a Rails app that employs Solr (via the acts_as_solr plugin) I've had the need to produce aggregate counts of entities on the far end of a many-to-many relationship. Essentially a tag cloud.

My first attempt was to keep it entirely in ActiveRecord, which resulted in a proliferation of SQL command executions. Obviously that wasn't performant. Sure, it looked elegant, but it was slow and unsustainable. While I could have hand-crafted the SQL, it was more performant still to retrieve the aggregations from Solr via facets. Hell, I had the data handy!

Such a faceted query directly from Solr is quite simple, but it required a little research to get it done with acts_as_solr due to my unfamiliarity with it. In order to make it simple for others attempting to do the same thing I figured I'd post a basic example here.

Blog Post/Tag Cloud Example

Let's consider a typical blog post/tag cloud scenario. In the spirit of demonstration I'll try to stick to manually created schemata and whatnot.

The Database

Here's the database schema... Posts, Tags and a junction table. While my example is in SQL Server's T-SQL, it's not a dependency.

create table Posts (
    id int identity primary key,
    title varchar(1024),
    content varchar(max)
)

create table Tags (
    id int identity primary key,
    name varchar(64)
)

create table Posts_Tags (
    post_id int,
    tag_id int,
    primary key (post_id, tag_id)
)

Sample Data

Now I'll provide three sample posts. The first two are tagged to "Sports", the last is tagged to "Technology" and all three are tagged to "Hobbies".

insert into Posts
select 'A post about hockey', 'This is a sample post about hockey'

insert into Posts
select 'A post about football', 'This is a sample post about football'

insert into Posts
select 'A post about Computers', 'This is a sample post about computers'

insert into Tags
select 'Sports'

insert into Tags
select 'Technology'

insert into Tags
select 'Hobbies'

insert into Posts_Tags
select 1, 1 union all
select 2, 1 union all
select 3, 2 union all
select 1, 3 union all
select 2, 3 union all
select 3, 3
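
As a sanity check on the sample data, the counts the tag cloud should end up with can be worked out by hand. What follows is just a minimal plain-Ruby sketch, not part of the app. It assumes the identity columns hand out ids 1 through 3 in insertion order, and the TAGS and POSTS_TAGS constants and the tag_counts helper are hypothetical stand-ins for the rows above.

```ruby
# Stand-ins for the rows inserted above, assuming identity ids start at 1.
TAGS = { 1 => "Sports", 2 => "Technology", 3 => "Hobbies" }

# [post_id, tag_id] pairs from the Posts_Tags junction table.
POSTS_TAGS = [[1, 1], [2, 1], [3, 2], [1, 3], [2, 3], [3, 3]]

# Count how many posts reference each tag name.
def tag_counts(tags, posts_tags)
  counts = Hash.new(0)
  posts_tags.each { |_post_id, tag_id| counts[tags[tag_id]] += 1 }
  counts
end

# Print the counts in alphabetical order of tag name.
tag_counts(TAGS, POSTS_TAGS).sort.each do |name, count|
  puts "#{name} (#{count})"
end
```

These are the numbers the Solr facets should hand back once everything is indexed.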

Conceptual Model

Here's the ActiveRecord model including the acts_as_solr bit. Check out the "include" key. All of that maps tags into a multivalued attribute in Solr. Pay special attention to the lambda function in the "using" key. That function produces the actual string value stored for each tag.

If the "using" key is omitted the tag will be serialized in a form similar to the string "id=1 name=Sports post_id=1 tag_id=1" which isn't what I'm interested in. I can't imagine anybody would be interested in it, honestly. That's why my lambda function produces just the tag's name.

class Tag < ActiveRecord::Base
  has_and_belongs_to_many :posts
end

class Post < ActiveRecord::Base
  has_and_belongs_to_many :tags

  acts_as_solr  :include => [
                            # include the associated tags in Solr
                            {:tags => {
                              # name the entities "tag_name"
                              :as => :tag_name,
                              # solr type for tag names will be string
                              :type => :string,
                              # a post has many tags
                              :multivalued => true,
                              # only store the name of the tag
                              :using => lambda{|tag| tag.name}}}
                            ],
                :fields => [:title, :content],
                :facets => [:tag_name]
end
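
To make the effect of the "using" key a little more concrete, here's a rough plain-Ruby sketch. It doesn't touch acts_as_solr at all. The Tag struct, the NAME_ONLY constant and the field_values helper are hypothetical stand-ins, there only to show how a lambda turns each associated tag into one string of a multivalued field.

```ruby
# Hypothetical stand-in for the ActiveRecord Tag model.
Tag = Struct.new(:id, :name)

# The same kind of lambda passed to the "using" key: keep only the name.
NAME_ONLY = lambda { |tag| tag.name }

# Illustrative helper (not an acts_as_solr API): apply the lambda to every
# associated record to build the values of a multivalued field.
def field_values(records, using)
  records.map { |record| using.call(record) }
end

tags = [Tag.new(1, "Sports"), Tag.new(3, "Hobbies")]
p field_values(tags, NAME_ONLY)
```

Swapping in a different lambda changes what gets stored per tag without touching anything else.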

Solr Schema

Here's an excerpt from a manually created Solr schema that will suffice for this example. Note the additional tag_name_facet field. acts_as_solr will employ that field in the production of my tag cloud via a faceted query.

  <field name="id" type="string" indexed="true" stored="true"/>  
  <field name="pk_i" type="int" indexed="true" stored="true"/>
  <field name="type_s" type="string" indexed="true" stored="true"/>

  <field name="title_t" type="text" indexed="true" stored="true"/>
  <field name="content_t" type="text" indexed="true" stored="true"/>

  <field name="tag_name_s" type="string" indexed="true" stored="false" multiValued="true"/>  
  <field name="tag_name_facet" type="string" indexed="true" stored="true" multiValued="true"/>  
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="title_t" dest="text"/>
<copyField source="tag_name_s" dest="text"/>
<copyField source="content_t" dest="text"/>
<copyField source="tag_name_s" dest="tag_name_facet"/>


Querying Solr

Assuming the data has been indexed it could now be queried in a controller like this:

class PostController < ApplicationController
  def index
    docs = Post.find_by_solr("*", :facets => {:fields => [:tag_name]})
    @tag_facets = docs.facets["facet_fields"]["tag_name_facet"]
  end
end

Notice that I've dug into the facets hash to retrieve the counts per tag name. That's the data that will power the tag cloud.
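
For anyone who'd like to see that digging in isolation, here's a small sketch in plain Ruby. The FACETS hash is hand-built to mimic the nesting I pulled apart above, using the counts from the sample data. It's an assumption for illustration, not captured Solr output, and ordered_tags is a hypothetical helper.

```ruby
# Hand-built hash mimicking the nesting used in the controller above.
FACETS = {
  "facet_fields" => {
    "tag_name_facet" => { "Sports" => 2, "Hobbies" => 3, "Technology" => 1 }
  }
}

# Hypothetical helper: pull out the per-tag counts and order them
# from most to least used, breaking ties alphabetically.
def ordered_tags(facets)
  counts = facets["facet_fields"]["tag_name_facet"]
  counts.sort_by { |name, count| [-count, name] }
end

ordered_tags(FACETS).each { |name, count| puts "#{name} (#{count})" }
```

Sorting up front like this keeps the ordering logic out of the view.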

The View

Now I'll render the data. Naturally this would be far fancier in an actual tag cloud.

<% @tag_facets.each_pair do | name, count | %>
  <div><%= name %> (<%= count %>)</div>
<% end %>

which would produce:

Hobbies (3)
Sports (2)
Technology (1)
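
As a rough idea of what "fancier" might mean, a tag cloud usually scales each tag's font size by its count. The sketch below is one simple way to do that in plain Ruby. The font_size helper and the 12 to 24 pixel range are assumptions of mine, not something from the app.

```ruby
# Hypothetical helper: linearly map a count onto a font size in pixels.
def font_size(count, min_count, max_count, min_px = 12, max_px = 24)
  return max_px if max_count == min_count # every tag is equally common
  min_px + (count - min_count) * (max_px - min_px) / (max_count - min_count)
end

# Counts taken from the output above.
tag_facets = { "Hobbies" => 3, "Sports" => 2, "Technology" => 1 }
min, max = tag_facets.values.minmax

tag_facets.each do |name, count|
  puts "#{name}: #{font_size(count, min, max)}px"
end
```

In the view the computed size would go into a style attribute on each tag's element.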

Thu May 20 2010 02:05:15 GMT+0000 (UTC)


Now in IronRuby on Rails

Just a quick note. I've again changed the architecture of this site. It's now in IronRuby on Rails, running on Windows Server 2008 with SQL 2008 R2/Solr.

Previously I was using Django on Linux with Oracle Express/Solr, which was just one node in a long list of architectures I've used here.

Why the change from Django? Why the choice of an IronRuby, Microsoft-ish stack? Well, I'll surely be blogging about that shortly when I update my Tale of a Website post.

In the meantime please keep an eye out for anything not working and let me know if you find something broken.

Mon May 17 2010 04:05:24 GMT+0000 (UTC)


Wrapping Arbitrary Executables in an .app in Mac OS X 10.6.2

Recently I purchased a new Mac and went through the typical, long and involved process of installing all the software I wanted on it. One application in particular, NetBeans, gave me some trouble. The installer simply refused to work. No matter, I figured. I'll just grab the universal version (it's written in Java after all). Sure enough it ran just fine.

One thing plagued me, however. Because the universal version doesn't have an actual .app file to put in your Applications directory you can't add it to the dock. While it's not a show-stopper it's clearly suboptimal.

I was certain there must be several ways I could wrap the netbeans startup shell script into an .app that I could add to my dock and maybe even set an icon for.

After some googling about I found several tutorials that did the job. Unfortunately they either involved some unnecessary steps or had information that was no longer accurate in Mac OS X 10.6.2 Snow Leopard.

While I'm sure it may not be perfect either I decided to outline the steps I followed in the hopes that it will possibly help other Mac users trying to accomplish the same thing.

Wrapping your program in an AppleScript

The first thing we'll do is really all the meat. We'll put a three line AppleScript together that simply executes our target executable, netbeans in my case. Open up the AppleScript editor (located at /Applications/Utilities/AppleScript in 10.6.2) and enter code similar to the following:

to run
	do shell script "/opt/netbeans/bin/netbeans"
end run

Make sure to substitute the full path to the executable of your interest into line 2.

Now save the script into your /Applications folder but be careful to save it as an application, not as a script. I'll save mine as /Applications/

Create an icon

Really we could stop now. You could copy the .app file to your dock and life would be merry. It would be ugly, however. We might as well give it a pretty icon while we've gone to all this trouble.

Keep in mind that to follow these steps you'll need Icon Composer, which requires you to have the Apple developer tools installed.

It's quite simple, really. Just find or create a .png file of around 128x128-ish that you'd like for your application's icon.

Now fire up Icon Composer which will be located at /Developer/Applications/Utilities/Icon Drag your .png file into the square labeled 128 and ensure the contents that display in the square are correct. Click on the square so it's highlighted and copy it with command-c.

Using the Finder locate the .app file you created in your /Applications directory and hit command-i. Click on the icon at the top so it's highlighted then hit command-v which will paste your icon into the information window. Close out the window and, POW, you've changed the icon!



Feel free to close out Icon Composer. It's up to you whether you want to save the icon or not. I recommend at least keeping the original .png artwork.

We're done!

Now you can drag your .app to the dock and enjoy!

Wed Apr 14 2010 16:04:00 GMT+0000 (UTC)
