I figured I'd share some quick notes on a simple task that isn't exactly straightforward in Clojure for a Lisp neophyte like myself: XML parsing. Clojure goes a long way toward making it easy with clojure.xml/parse and xml-seq, but complete, concise examples can be difficult to come by.

XML

All of the examples below depend on the following XML living in a file named "settings.xml" in the current working directory.

<settings>
    <timeout>5000</timeout>
    <email>example@mail.com</email>
    <hosts>
        <host url="http://computer1.domain.com">COMPUTER1</host>
        <host url="http://computer2.domain.com">COMPUTER2</host>
        <host url="http://computer3.domain.com">COMPUTER3</host>
    </hosts>
</settings>

Basic Requirements

To prepare Clojure for the task at hand we'll need java.io.File and clojure.xml. xml-seq lives in clojure.core, so it needs no import of its own.

(import '(java.io File))
(use 'clojure.xml)

Examples

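Just to get warmed up, consider the following. All this does is parse "settings.xml" into a tree of struct-maps (each with :tag, :attrs and :content keys) and then flatten that tree into a sequence of nodes with xml-seq.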

(let [xml-file (File. "settings.xml")]
  (xml-seq (parse xml-file)))
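
To make the shape of those nodes a bit more concrete, here is a minimal sketch that parses a small in-memory document instead of the file. The parse-string helper and the root name are my own inventions for illustration, not part of clojure.xml; the XML is a trimmed-down copy of the settings above.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML held in a string so the sketch
;; runs without settings.xml on disk.
(defn parse-string [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (parse-string
    "<settings><timeout>5000</timeout><email>example@mail.com</email></settings>"))

(:tag root)                 ; the element name, as a keyword
(map :tag (:content root))  ; tags of the direct children
(:attrs root)               ; nil here, since <settings> has no attributes
(count (xml-seq root))      ; counts elements AND their text nodes
```

Note that xml-seq yields the text content as plain strings alongside the element maps. Calling (:tag x) on a string just returns nil, which is why the :when filters in the examples below quietly skip them.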

With the sequence of nodes in hand we can start pulling data out of it. The following code simply prints the content of any "email" nodes it finds while traversing the sequence produced by the previous example.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
        :when (= :email (:tag x))]
    (println (first (:content x)))))

producing the output

example@mail.com

The following example performs the same task on all "host" nodes. Note that it pays no attention to the actual tree structure: "host" nodes are matched regardless of their position within the XML hierarchy.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
        :when (= :host (:tag x))]
    (println (first (:content x)))))

resulting in

COMPUTER1
COMPUTER2
COMPUTER3
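
Printing is fine for poking around, but most of the time you want the values back. Here is a rough sketch of the same traversal that returns a sequence instead. The names parse-string, tag-text and settings are hypothetical helpers of mine, and the XML is inlined (and shortened) so the snippet stands alone.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML from a string instead of a file.
(defn parse-string [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Hypothetical helper: first content item of every node with the
;; given tag, wherever that node sits in the tree.
(defn tag-text [root tag]
  (for [node (xml-seq root)
        :when (= tag (:tag node))]
    (first (:content node))))

(def settings
  (parse-string
    (str "<settings><timeout>5000</timeout><hosts>"
         "<host url=\"http://computer1.domain.com\">COMPUTER1</host>"
         "<host url=\"http://computer2.domain.com\">COMPUTER2</host>"
         "</hosts></settings>")))

(tag-text settings :host)  ; lazy seq of the host names

;; Text content always comes back as a string, so numbers need converting.
(Long/parseLong (first (tag-text settings :timeout)))
```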

Now I'll pay some attention to structure. This example only matches "host" nodes that are direct children of a "hosts" parent.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (first (:content y))))))

resulting in

COMPUTER1
COMPUTER2
COMPUTER3
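
The nested doseq gets unwieldy once the path is more than two levels deep. One possible generalisation, sketched under my own naming (parse-string, children and select-path are hypothetical helpers, not clojure.xml functions), is to walk a vector of tags one level at a time. The sample XML adds a stray "host" outside "hosts" to show the difference from the flat xml-seq match.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML from a string instead of a file.
(defn parse-string [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Direct children of node whose tag matches.
(defn children [node tag]
  (filter #(= tag (:tag %)) (:content node)))

;; Follow path (a vector of tags) downward from root, one level per tag.
(defn select-path [root path]
  (reduce (fn [nodes tag] (mapcat #(children % tag) nodes))
          [root]
          path))

(def settings
  (parse-string
    (str "<settings><host>STRAY</host><hosts>"
         "<host>COMPUTER1</host><host>COMPUTER2</host>"
         "</hosts></settings>")))

;; Only hosts under <hosts>; the stray one is not on this path.
(map (comp first :content) (select-path settings [:hosts :host]))
```

The path is relative to the root element, so [:hosts :host] here means settings/hosts/host.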

From here it's simple to extract the contents of attributes as well. Here I'm extracting "url" attributes from "host" tags.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))] ; only "host" children, as described above
      (println (:url (:attrs y))))))

giving us

http://computer1.domain.com
http://computer2.domain.com
http://computer3.domain.com
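
Putting content and attributes together, a small sketch like the following builds a map from host name to URL. As before, parse-string, host-urls and settings are hypothetical names of mine, and the XML is inlined so the snippet runs on its own.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML from a string instead of a file.
(defn parse-string [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Hypothetical helper: map each host's text content to its "url" attribute.
;; Attribute names come back as keywords, hence :url.
(defn host-urls [root]
  (into {}
        (for [node (xml-seq root)
              :when (= :host (:tag node))]
          [(first (:content node)) (:url (:attrs node))])))

(def settings
  (parse-string
    (str "<settings><hosts>"
         "<host url=\"http://computer1.domain.com\">COMPUTER1</host>"
         "<host url=\"http://computer2.domain.com\">COMPUTER2</host>"
         "</hosts></settings>")))

(get (host-urls settings) "COMPUTER2")  ; look up a URL by host name
```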
Created on 2010-06-23 04:06:47 UTC
 