Chris Umbel

Notes on Clojure XML Parsing

I figured I'd share some quick notes on a simple task that's not exactly straightforward in Clojure for a Lisp neophyte like myself: XML parsing. Clojure goes a long way toward making it easy with clojure.xml/parse and xml-seq, but complete, concise examples can be difficult to come by.

XML

All of the examples below depend on the following XML living in a file named "settings.xml" in the current working directory.

<settings>
    <timeout>5000</timeout>
    <email>example@mail.com</email>
    <hosts>
        <host url="http://computer1.domain.com">COMPUTER1</host>
        <host url="http://computer2.domain.com">COMPUTER2</host>
        <host url="http://computer3.domain.com">COMPUTER3</host>
    </hosts>
</settings>

Basic Requirements

In order to prepare Clojure for the task at hand we'll need to make use of java.io.File and clojure.xml.

(import '(java.io File))
(use 'clojure.xml)

Examples

Just to get warmed up, consider the following. All this does is parse "settings.xml" into a tree of struct-maps and then flatten that tree, with xml-seq, into a sequence of every node it contains.

(let [xml-file (File. "settings.xml")]
  (xml-seq (parse xml-file)))
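
To make the shape of those nodes concrete, here's a minimal sketch that parses a single element from an in-memory string instead of "settings.xml". The parse-str helper is a name I made up for illustration, not part of clojure.xml; it only wraps the string in a ByteArrayInputStream, which parse accepts in place of a File.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper (not part of clojure.xml): parse XML held in a string.
(defn parse-str [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def node
  (parse-str "<host url=\"http://computer1.domain.com\">COMPUTER1</host>"))

(:tag node)     ; the element name as a keyword
(:attrs node)   ; a map from attribute keyword to string value
(:content node) ; a vector of child nodes and text strings
```

Every element comes back as a map with those three keys, which is all the examples that follow rely on.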

With that sequence in hand, we can actually pull data out of it. The following code simply outputs the content of any "email" nodes it finds while traversing the sequence produced by the previous example.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :email (:tag x))]
    (println (first (:content x)))))

producing the output

example@mail.com

The following example performs the same task but on all "host" nodes. Note that no concern is placed on the actual tree structure. "host" nodes would be matched regardless of their position within the XML hierarchy.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :host (:tag x))]
    (println (first (:content x)))))

resulting in

COMPUTER1
COMPUTER2
COMPUTER3
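
Since the last two examples differ only in the tag they look for, the pattern can be folded into one function that returns the values instead of printing them. This is just a sketch under a few assumptions: tag-texts, settings-xml and root are names I picked for illustration, and the document is a shortened copy of "settings.xml" held in a string so the snippet runs on its own.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Shortened copy of settings.xml, kept in a string (an assumption for this
;; sketch) so nothing has to exist on disk.
(def settings-xml
  (str "<settings><timeout>5000</timeout><email>example@mail.com</email>"
       "<hosts>"
       "<host url=\"http://computer1.domain.com\">COMPUTER1</host>"
       "<host url=\"http://computer2.domain.com\">COMPUTER2</host>"
       "</hosts></settings>"))

(def root
  (xml/parse (ByteArrayInputStream. (.getBytes settings-xml "UTF-8"))))

;; Hypothetical helper: the first content item of every node carrying the
;; given tag, no matter where the node sits in the tree.
(defn tag-texts [root tag]
  (vec (for [node (xml-seq root)
             :when (= tag (:tag node))]
         (first (:content node)))))

(tag-texts root :email)
(tag-texts root :host)
```

Returning a vector rather than printing makes the result easy to hand to other code; swapping doseq for for is the only real change from the examples above.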

Now I'll pay some additional attention to structure. This example will only match "host" nodes that are children of a "hosts" parent.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (first (:content y))))))

resulting in

COMPUTER1
COMPUTER2
COMPUTER3
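
A stricter variant of the same idea is to follow an explicit path of tags down from the root, one level per tag, rather than finding "hosts" anywhere in the tree. The sketch below is not how the example above works; children-at is a hypothetical helper of my own, and the sample document is invented so that it contains a stray "host" outside of "hosts".

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Invented sample document with one host outside the hosts element.
(def sample-xml
  (str "<settings><host>STRAY</host>"
       "<hosts><host>COMPUTER1</host><host>COMPUTER2</host></hosts>"
       "</settings>"))

(def sample-root
  (xml/parse (ByteArrayInputStream. (.getBytes sample-xml "UTF-8"))))

;; Hypothetical helper: starting at root, descend one level for each tag in
;; path, keeping only the children whose tag matches at every step.
(defn children-at [root path]
  (reduce (fn [nodes tag]
            (for [node nodes
                  child (:content node)
                  :when (= tag (:tag child))]
              child))
          [root]
          path))

;; Text of the hosts found at settings > hosts > host; the stray one is skipped.
(map (comp first :content) (children-at sample-root [:hosts :host]))
```

Text children are plain strings, and looking up :tag on a string gives nil, so they fall out of the match without any special handling.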

From here it's simple to extract the contents of attributes as well. Here I'm extracting "url" attributes from "host" tags.

(let [xml-file (File. "settings.xml")]
  (doseq [x (xml-seq (parse xml-file))
          :when (= :hosts (:tag x))]
    ;; only look at "host" children so other nodes don't print nil
    (doseq [y (:content x)
            :when (= :host (:tag y))]
      (println (:url (:attrs y))))))

giving us

http://computer1.domain.com
http://computer2.domain.com
http://computer3.domain.com
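
Putting the content and the attributes together, each host can be collected into a small map. As before this is only a sketch: host-entries, hosts-xml and hosts-root are names I chose for illustration, and the XML is a trimmed copy of "settings.xml" held in a string.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Trimmed copy of settings.xml kept in a string (an assumption for this sketch).
(def hosts-xml
  (str "<settings><hosts>"
       "<host url=\"http://computer1.domain.com\">COMPUTER1</host>"
       "<host url=\"http://computer2.domain.com\">COMPUTER2</host>"
       "</hosts></settings>"))

(def hosts-root
  (xml/parse (ByteArrayInputStream. (.getBytes hosts-xml "UTF-8"))))

;; Hypothetical helper: one map per host child of any hosts element,
;; pairing the element text with its url attribute.
(defn host-entries [root]
  (vec (for [hosts (xml-seq root)
             :when (= :hosts (:tag hosts))
             host (:content hosts)
             :when (= :host (:tag host))]
         {:name (first (:content host))
          :url  (:url (:attrs host))})))

(host-entries hosts-root)
```

A single for with two bindings does the work of the nested doseq forms, and the result is ordinary Clojure data that can be filtered or looked up like anything else.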

Wed Jun 23 2010 04:06:47 GMT+0000 (UTC)
