I figured I'd share some quick notes on a simple task that's not exactly straightforward in Clojure for a Lisp neophyte like myself: XML parsing. Clojure goes a long way toward making it easy with clojure.xml/parse and xml-seq, but complete, concise examples can be difficult to come by.
XML
All of the examples below depend on the following XML living in a file named "settings.xml" in the current working directory.
<settings>
  <timeout>5000</timeout>
  <email>example@mail.com</email>
  <hosts>
    <host url="http://computer1.domain.com">COMPUTER1</host>
    <host url="http://computer2.domain.com">COMPUTER2</host>
    <host url="http://computer3.domain.com">COMPUTER3</host>
  </hosts>
</settings>
Basic Requirements
To prepare Clojure for the task at hand, we'll need java.io.File and clojure.xml.
(import '(java.io File))
(use 'clojure.xml)
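If you'd rather not pull every clojure.xml name into your namespace with use, an aliased require does the same job. Here's a minimal sketch of that alternative; the xml and io aliases are my own choice, and a throwaway temp file stands in for "settings.xml" so the snippet runs on its own.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.java.io :as io])

;; Assumption: a temp file stands in for settings.xml.
(def tmp (java.io.File/createTempFile "settings" ".xml"))
(.deleteOnExit tmp)
(spit tmp "<settings><timeout>5000</timeout></settings>")

;; io/file turns a path string into a java.io.File, which xml/parse accepts.
(def root (xml/parse (io/file (.getPath tmp))))

(:tag root) ; => :settings
```

The rest of this post sticks with use and the unqualified parse.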
Examples
Just to get warmed up, consider the following. All this does is parse "settings.xml" and flatten the resulting tree into a lazy sequence of nodes. Each element node is a struct-map with :tag, :attrs and :content keys.
(let [xml-file (File. "settings.xml")]
  (xml-seq (parse xml-file)))
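To get a feel for what those nodes look like, it helps to poke at a tiny document in the REPL. This is only a sketch: parse-str is a hypothetical helper of mine (it is not part of clojure.xml) that parses a string so the example doesn't need a file on disk.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper (not part of clojure.xml): parse XML held in a string.
(defn parse-str [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (parse-str "<settings><timeout>5000</timeout><email>example@mail.com</email></settings>"))

(:tag root)                             ; => :settings
(map :tag (:content root))              ; => (:timeout :email)
(:content (first (:content root)))      ; => ["5000"]

;; xml-seq walks the tree depth-first and yields text strings as well as elements.
(count (xml-seq root))                  ; => 5
(remove nil? (map :tag (xml-seq root))) ; => (:settings :timeout :email)
```

Looking up :tag on a string simply returns nil, which is why filtering the sequence by tag quietly skips the text nodes.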
With that sequence in hand, we can actually pull data out of it. The following code simply outputs the content of any "email" nodes it finds while traversing the sequence produced by the previous example.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :email (:tag x))]
    (println (first (:content x)))))
producing the output
example@mail.com
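Printing is fine for a demo, but in a real program you usually want the values back. As a rough sketch, the same filter can be written with for so it returns a sequence; texts-for and parse-str are hypothetical helper names of my own, and the XML is inlined as a string to keep the snippet standalone.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML held in a string.
(defn parse-str [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Hypothetical helper: the first text child of every element named `tag`.
(defn texts-for [root tag]
  (for [x (xml-seq root)
        :when (= tag (:tag x))]
    (first (:content x))))

(def root
  (parse-str "<settings><timeout>5000</timeout><email>example@mail.com</email></settings>"))

(first (texts-for root :email))                    ; => "example@mail.com"
(Long/parseLong (first (texts-for root :timeout))) ; => 5000
(texts-for root :missing)                          ; => ()
```

Element text always comes back as a string, so numeric settings like the timeout need an explicit conversion.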
The following example performs the same task but on all "host" nodes. Note that no concern is placed on the actual tree structure. "host" nodes would be matched regardless of their position within the XML hierarchy.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :host (:tag x))]
    (println (first (:content x)))))
resulting in
COMPUTER1
COMPUTER2
COMPUTER3
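To see what "regardless of position" means in practice, here's a small sketch against a made-up document in which a "host" node also appears under a "backup" element. The backup element and parse-str helper are my own inventions for illustration and are not part of settings.xml or clojure.xml.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML held in a string.
(defn parse-str [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Made-up document: one host under <hosts>, another under <backup>.
(def root
  (parse-str (str "<settings>"
                  "<hosts><host>COMPUTER1</host></hosts>"
                  "<backup><host>SPARE1</host></backup>"
                  "</settings>")))

;; Filtering the flat xml-seq by tag picks up both hosts.
(def all-hosts
  (for [x (xml-seq root)
        :when (= :host (:tag x))]
    (first (:content x))))

all-hosts ; => ("COMPUTER1" "SPARE1")
```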
Now I'll pay some attention to structure. This example will only match "host" nodes that are children of a "hosts" parent.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (first (:content y))))))
resulting in
COMPUTER1
COMPUTER2
COMPUTER3
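Nesting a doseq per level gets unwieldy once paths grow deeper. One possible generalisation, offered only as a sketch, is a small function that follows a vector of tags down from the root. select-path and parse-str are hypothetical names of mine. Note that this is a little stricter than the code above: it anchors the path at the root element instead of matching "hosts" anywhere in the tree.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML held in a string.
(defn parse-str [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Hypothetical helper: starting at the root, descend one level per tag,
;; keeping only the children whose :tag matches at each step.
(defn select-path [root tags]
  (reduce (fn [nodes tag]
            (for [n nodes
                  c (:content n)
                  :when (= tag (:tag c))]
              c))
          [root]
          tags))

;; Made-up document: the host under <backup> should NOT be matched.
(def root
  (parse-str (str "<settings>"
                  "<hosts><host>COMPUTER1</host></hosts>"
                  "<backup><host>SPARE1</host></backup>"
                  "</settings>")))

(map (comp first :content) (select-path root [:hosts :host]))
;; => ("COMPUTER1")
```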
From here it's simple to extract the contents of attributes as well. Here I'm extracting "url" attributes from "host" tags.
(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (:url (:attrs y))))))
giving us
http://computer1.domain.com
http://computer2.domain.com
http://computer3.domain.com
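Since each node carries both its text and its attributes, the two can be combined. As a sketch, here's one way to build a lookup map from host name to URL; host-urls and parse-str are hypothetical names of mine, and the XML is a trimmed-down inline copy of the hosts section.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse XML held in a string.
(defn parse-str [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (parse-str (str "<hosts>"
                  "<host url=\"http://computer1.domain.com\">COMPUTER1</host>"
                  "<host url=\"http://computer2.domain.com\">COMPUTER2</host>"
                  "</hosts>")))

;; Pair each host's text content with its url attribute.
(def host-urls
  (into {}
        (for [x (xml-seq root)
              :when (= :host (:tag x))]
          [(first (:content x)) (:url (:attrs x))])))

(get host-urls "COMPUTER2") ; => "http://computer2.domain.com"
```

Attribute names become keywords in the :attrs map, just as element names do in :tag.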
Wed Jun 23 2010 04:06:47 GMT+0000 (UTC)