Saturday, March 14, 2009

More XML Processing With XmlSlurper

In a previous post I explored how simple XML processing can be in Groovy using XmlSlurper.  That example was fairly basic, and it occurred to me that a fuller one covering other aspects of working with XmlSlurper would be helpful.  More specifically, in this post I’d like to offer a more robust example that covers the following:

1. Referencing an element that does not exist

2. Verifying that XmlSlurper is case sensitive

3. Extracting attributes from the XML

4. Referencing XML elements whose names contain a hyphen

5. Walking a more complex nested graph

For each example in this post, we’ll use the following XML, which is housed in a file, to demonstrate the features we’re going to explore.

    <?xml version='1.0' encoding='UTF-8'?>
    <customers>
        <customer id='12345'>
            <firstName>FirstOne</firstName>
            <lastName>FirstLastName</lastName>
            <addresses>
                <address id='11'>
                    <line-1>first address line1</line-1>
                    <line-2>first address line2</line-2>
                    <city>first-city</city>
                    <state>first-state</state>
                    <postal-code>first-postal</postal-code>
                </address>
                <address id='22'>
                    <line-1>second address line1</line-1>
                    <line-2>second address line2</line-2>
                    <city>second-city</city>
                    <state>second-state</state>
                    <postal-code>second-postal</postal-code>
                </address>
            </addresses>
        </customer>
        <customer id='67890'>
            <firstName>SecondOne</firstName>
            <lastName>SecondLastName</lastName>
            <addresses>
                <address id='33'>
                    <line-1>third address line1</line-1>
                    <line-2>third address line2</line-2>
                    <city>third-city</city>
                    <state>third-state</state>
                    <postal-code>third-postal</postal-code>
                </address>
                <address id='44'>
                    <line-1>fourth address line1</line-1>
                    <line-2>fourth address line2</line-2>
                    <city>fourth-city</city>
                    <state>fourth-state</state>
                    <postal-code>fourth-postal</postal-code>
                </address>
            </addresses>
        </customer>
    </customers>

Here’s the test method that I’ll use to touch on each point:

    public void processAdvancedXMLFile(String fileName)
    {
        // fileName points at the XML shown above, e.g. "D:\\temp\\AdvancedExample.xml"
        def file = new File(fileName)
        def customers = new XmlSlurper().parse(file)

        // printing a node prints the concatenated text of everything beneath it
        println("customers: " + customers)
        println("customers: " + customers.customer)

        // a missing element and a wrongly-cased name both print as empty
        println("Bogus element: " + customers.bogusField)
        println("Case Matters: " + customers.customer[0].FIRSTNAME)
        // attributes are read with the @ prefix
        println("Customer ID Number: " + customers.customer[0].@id)

        println("All Customer first names: " + customers.customer.firstName)
        println("First Customer's first names: " + customers.customer[0].firstName)

        // indexes [0], so despite the label only the first customer's last name prints
        println("All Customer last  names: " + customers.customer[0].lastName)

        // walk the entire xml tree
        println("Walking the entire xml tree using xmlslurper")
        customers.children().each {customer ->
            println("First Name: " + customer.firstName)
            println("Last Name: " + customer.lastName)

            customer.addresses.children().each {address ->
                println("Address id: " + address.@id)
                // hyphenated element names must be quoted
                println("Address line 1: " + address.'line-1')
                println("Address line 2: " + address.'line-2')
                println("Address city: " + address.city)
                println("Address state: " + address.state)
                println("Address postal-code: " + address.'postal-code')
            }
            println("============================================")
        }

    }

Finally, here’s the output window for a sample run:

    customers: FirstOneFirstLastNamefirst address line1first address line2first-cityfirst-statefirst-postalsecond address line1second address line2second-citysecond-statesecond-postalSecondOneSecondLastNamethird address line1third address line2third-citythird-statethird-postalfourth address line1fourth address line2fourth-cityfourth-statefourth-postal
    customers: FirstOneFirstLastNamefirst address line1first address line2first-cityfirst-statefirst-postalsecond address line1second address line2second-citysecond-statesecond-postalSecondOneSecondLastNamethird address line1third address line2third-citythird-statethird-postalfourth address line1fourth address line2fourth-cityfourth-statefourth-postal
    Bogus element: 
    Case Matters: 
    Customer ID Number: 12345
    All Customer first names: FirstOneSecondOne
    First Customer's first names: FirstOne
    All Customer last  names: FirstLastName
    Walking the entire xml tree using xmlslurper
    First Name: FirstOne
    Last Name: FirstLastName
    Address id: 11
    Address line 1: first address line1
    Address line 2: first address line2
    Address city: first-city
    Address state: first-state
    Address postal-code: first-postal
    Address id: 22
    Address line 1: second address line1
    Address line 2: second address line2
    Address city: second-city
    Address state: second-state
    Address postal-code: second-postal
    ============================================
    First Name: SecondOne
    Last Name: SecondLastName
    Address id: 33
    Address line 1: third address line1
    Address line 2: third address line2
    Address city: third-city
    Address state: third-state
    Address postal-code: third-postal
    Address id: 44
    Address line 1: fourth address line1
    Address line 2: fourth address line2
    Address city: fourth-city
    Address state: fourth-state
    Address postal-code: fourth-postal
    ============================================
    BUILD SUCCESSFUL (total time: 6 seconds)

One nice aspect of working with XmlSlurper is that asking for an element that is not found gives you back an empty result that prints as an empty string, rather than null.  Another key aspect is that XmlSlurper is case sensitive, so using the wrong case for an element name is the same as asking for an element that does not exist: you get the same empty response.

In my example, I first requested an element that does not exist (customers.bogusField), followed by one that does exist (firstName) but which I tried to retrieve using the wrong case (FIRSTNAME).  In the output window you can see that nothing comes back in either case.
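
If you need to tell a missing element apart from one that is present, a check along these lines might help. This is only a minimal sketch: it assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml (on older versions you can drop the import), and it parses a trimmed-down inline copy of the customers XML rather than the file.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions import groovy.util by default

def customers = new XmlSlurper().parseText('''
<customers>
  <customer id='12345'>
    <firstName>FirstOne</firstName>
  </customer>
</customers>''')

def first = customers.customer[0]
def missing = first.bogusField     // no such element
def wrongCase = first.FIRSTNAME    // exists only as firstName

// neither lookup is null; each is an empty result
assert missing != null
assert missing.isEmpty()
assert missing.text() == ''
assert wrongCase.isEmpty()

// the correctly-cased lookup finds exactly one element
assert first.firstName.size() == 1
assert first.firstName.text() == 'FirstOne'
```

Checking isEmpty() or size() rather than comparing against null is what makes the difference here, since the lookup itself never comes back as null.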

The next item on my list is to show how easy it is to pull attributes from the XML.  Accessing attributes is just as easy as accessing elements; the format is only slightly different.  Where customers.customer[0].firstName grabs the first customer in the XML doc and returns the contents of its firstName element, you grab an attribute from that same customer element by writing customers.customer[0].@nameOfAttribute.  In the example provided I pulled the id attribute from the customer using the following code: customers.customer[0].@id
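
To go a little further with attributes, here is a small sketch that reads an attribute as text, converts it to a number, lists all attributes of an element, and looks up a customer by its id. It assumes Groovy 3 or later for the import and uses a cut-down inline version of the customers XML.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions import groovy.util by default

def customers = new XmlSlurper().parseText('''
<customers>
  <customer id='12345'><firstName>FirstOne</firstName></customer>
  <customer id='67890'><firstName>SecondOne</firstName></customer>
</customers>''')

def first = customers.customer[0]

// the attribute as a String and as an Integer
def idText = first.@id.text()
def idNumber = first.@id.toInteger()
assert idText == '12345'
assert idNumber == 12345

// all attributes of the element as a map
def attrs = first.attributes()
assert attrs == [id: '12345']

// find the customer whose id attribute matches
def match = customers.customer.find { it.@id == '67890' }
assert match.firstName.text() == 'SecondOne'
```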

The next item on my list is one of the few quirky things I’ve uncovered when working with XmlSlurper.  If your XML element name contains a hyphen you must tweak your syntax just a bit.  It helps to understand why: Groovy sees the hyphen as a minus sign, so address.line-1 is read as address.line minus 1.  To tell Groovy not to treat the hyphen as a minus sign you simply need to enclose the element name in single quotes.

Here are a couple of lines from my example above which illustrate this:

println("Address line 1: " + address.'line-1')
println("Address line 2: " + address.'line-2')
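
The quoted form is ordinary Groovy property syntax, so a couple of variations work as well. The sketch below (assuming Groovy 3 or later for the import, and a single inline address element) shows single quotes, double quotes, and a name held in a variable, which can be handy when the element name is only known at runtime.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions import groovy.util by default

def address = new XmlSlurper().parseText('''
<address id='11'>
  <line-1>first address line1</line-1>
  <postal-code>first-postal</postal-code>
</address>''')

// single or double quotes both stop Groovy from reading the hyphen as a minus
def line1 = address.'line-1'.text()
def postal = address."postal-code".text()

// the element name can also come from a variable
def fieldName = 'line-1'
def viaVariable = address."$fieldName".text()
def viaGetProperty = address.getProperty(fieldName).text()

assert line1 == 'first address line1'
assert postal == 'first-postal'
assert viaVariable == line1
assert viaGetProperty == line1
```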

Walking a more complex tree is the next item I wanted to address.  The example presented here has multiple customers within the customers tag, with each customer in turn having simple elements (firstName, lastName) as well as multiple addresses.  The provided example walks this tree and prints out all of the element values using the children() method with a Groovy closure.  I don’t show it in the example, but it’s also very easy to print out the name of an XML element by using the name() method.  Here’s an example of doing that:

println("Prints firstName: " + customers.customer[0].firstName.name())
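
Combining name() with the tree walk means you don't have to hard-code each field of an address. Here is a rough sketch of that idea, assuming Groovy 3 or later for the import and a shortened inline copy of the customers XML; the lines list is just a hypothetical holder for the output.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions import groovy.util by default

def customers = new XmlSlurper().parseText('''
<customers>
  <customer id='12345'>
    <firstName>FirstOne</firstName>
    <addresses>
      <address id='11'><city>first-city</city><state>first-state</state></address>
      <address id='22'><city>second-city</city><state>second-state</state></address>
    </addresses>
  </customer>
</customers>''')

def lines = []   // hypothetical collector, one entry per address field
customers.customer.each { customer ->
    customer.addresses.address.each { address ->
        // children() yields the child elements; name() and text() describe each one
        address.children().each { field ->
            lines << "${customer.@id}/${address.@id} ${field.name()}=${field.text()}".toString()
        }
    }
}

lines.each { println it }
```

Each entry has the form customerId/addressId name=value, so new elements added under an address would show up without any change to the code.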

In a future post I’ll devote some time to moving data between domain objects and XML.

 

==================================================================

An example of pulling elements from an Atom feed:

Since someone asked, I’ve added a sample which pulls an id element from a sample Atom feed (you can find this sample on Wikipedia):

    def data = '''<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Example Feed</title>
      <subtitle>A subtitle.</subtitle>
      <link href="http://example.org/feed/" rel="self"/>
      <link href="http://example.org/"/>
      <updated>2003-12-13T18:30:02Z</updated>
      <author>
        <name>John Doe</name>
        <email>johndoe@example.com</email>
      </author>
      <id>urn:uuid:60a76c80-d399-11d9-b91C-0003939e0af6</id>
      <entry>
        <title>Atom-Powered Robots Run Amok</title>
        <link href="http://example.org/2003/12/13/atom03"/>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
        <updated>2003-12-13T18:30:02Z</updated>
        <summary>Some text.</summary>
      </entry>
    </feed>'''

    def atomData = new XmlSlurper().parseText(data)
    println("Sample output from our xml atom feed:")
    println("id for the feed: " + atomData.id)
    println("id for the entry: " + atomData.entry.id)

You can also set up XmlSlurper to be aware of your namespaces if needed.  Here’s an example that shows how to do that:

    public void processXmlWithNamespaces(){
        def loanXml =
        '''<loan
              xmlns:customer="urn:somecompany:customers"
              xmlns:account="urn:somecompany:accounts">
              <customer:name>Joe Customer</customer:name>
              <account:name>First Mortgage</account:name>
              <periods>360</periods>
           </loan>'''

        def loan = new XmlSlurper().parseText(loanXml)

        println("combined name: " + loan.name)
        def ns = [:]
        ns.customer = "urn:somecompany:customers"
        ns.account = "urn:somecompany:accounts"
        loan.declareNamespace(ns)

        println("customer name: " + loan.'customer:name')
        println("account name: " + loan.'account:name')
    }

Note the single quotes around ‘account:name’ and ‘customer:name’.  These are needed for the same reason as the quotes around the hyphenated names: a colon is not valid in a plain Groovy property name.

Here’s the output from a sample run:

    combined name: Joe CustomerFirst Mortgage
    customer name: Joe Customer
    account name: First Mortgage
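
One detail worth knowing is that the prefixes you declare in code are matched to the document by namespace URI, so they don't have to be the same prefixes the XML uses. The sketch below reuses the loan XML with the shorter prefixes c and a, which are my own hypothetical choices, and assumes Groovy 3 or later for the import.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions import groovy.util by default

def loanXml = '''<loan
    xmlns:customer="urn:somecompany:customers"
    xmlns:account="urn:somecompany:accounts">
  <customer:name>Joe Customer</customer:name>
  <account:name>First Mortgage</account:name>
  <periods>360</periods>
</loan>'''

def loan = new XmlSlurper().parseText(loanXml)

// without a prefix, both name elements match
def combined = loan.name.text()

// map our own prefixes (c, a) to the URIs used in the document
loan.declareNamespace(c: 'urn:somecompany:customers', a: 'urn:somecompany:accounts')

def customerName = loan.'c:name'.text()
def accountName = loan.'a:name'.text()
def periods = loan.periods.toInteger()

assert combined == 'Joe CustomerFirst Mortgage'
assert customerName == 'Joe Customer'
assert accountName == 'First Mortgage'
assert periods == 360
```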

If you want to learn more about XmlSlurper I’d also highly recommend the book: 

Groovy Recipes by Scott Davis

The format for the namespace example presented above came from his excellent book (Scott’s a good presenter too if you ever have the opportunity to hear him speak).

6 comments:

  1. Handy, thanks ... one question, how would you use XmlSlurper inside a Grails app to pull in an element named 'id' ? (as in an atom feed, where there is an atom : id element)

    'node.id' works in groovysh, but not in a Grails app; I am suspecting the issue is the inject special behaviour Grails has around the id property.

  2. I updated the post to include a couple of new examples. If this doesn't help, feel free to drop me an email (send me the xml you're trying to process).

  4. Is it also possible to write xml with XmlSlurper or only read?

  6. are this examples also valid with XmlParser use? thanks
