Introduction

XML and formatted data

XML is flexible enough, in the context of a web service for example, to transfer almost any kind of data, from simple integers and text strings to binary data. A special case of text strings is XML-formatted data used as the value of an XML tag.
A common use case is an XML document that transports strings containing XML special characters ('<', '>', '&', …), such as XML or XHTML data.
If the formatted string is used directly as the value of an XML tag, the resulting XML document no longer validates against its schema, because the special characters in the value conflict with the markup of the containing document.

Formatted data used as the value of an XML tag can be handled in three different ways:
1. Use string character escaping

2. Encode the message with Base64

3. Encapsulate the message within a CDATA tag
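
As a rough illustration of what each of the three methods produces, here is a minimal sketch that uses only the JDK and no JAX-B. The class name FormattedPayloadDemo and its helper methods are hypothetical and exist only for this example; the CDATA helper assumes the payload does not itself contain the sequence "]]>".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: three encodings of one payload, without JAX-B. */
final class FormattedPayloadDemo {

    static final String MESSAGE = "<hello><world>Hello World !</world></hello>";

    /** Method 1: replace the XML special characters with entity references. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Method 2: Base64-encode the UTF-8 bytes of the payload. */
    static String base64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Method 3: wrap the payload in a CDATA section (assumes no "]]>" inside). */
    static String cdata(String s) {
        return "<![CDATA[" + s + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(escape(MESSAGE));
        System.out.println(base64(MESSAGE));
        System.out.println(cdata(MESSAGE));
    }
}
```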

JAX-B

JAX-B is one of the main tools for handling XML in Java applications, especially in the context of web services. It maps simple or complex XML types to Java classes, much like an ORM tool (Hibernate, JPA, …) maps entity beans to SQL tables.
As of Java 6, JAX-B is bundled with the standard JDK (although the bundled version differs from one Java update to the next), which makes the tool even easier to adopt.
The main components and features that come with JAX-B are:

  • The Marshaller, a class that transforms Java classes to XML
  • The Unmarshaller, a class that transforms XML to Java classes
  • xjc, a utility that can generate Java classes from an XML schema

The Java classes representing the XML types can be customized extensively through JAX-B bindings, which makes the whole system very flexible. For formatted content, however, JAX-B has one shortcoming: it does not natively support CDATA strings.

This article shows how to handle formatted content with JAX-B using each of the three methods.

Demo Setup

For the purposes of this example, let us consider a simple web service, WSecho. The service exposes a single method, echo, that allows a client and a server to exchange an EchoMessage. The EchoMessage contains three versions of the same formatted string, one for each of the three methods presented above.

The echo method signature is:

  public EchoMessage echo(EchoMessage request);

XML schema

The service request and response are defined by the following XML schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema 
	xmlns:xs="http://www.w3.org/2001/XMLSchema" 	
	targetNamespace="http://www.example.com/ws/xsd/hello" 
	xmlns="http://www.example.com/ws/xsd/hello"
	elementFormDefault="qualified" >
	        
	<xs:element name="EchoMessage">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="simpleMessage" type="xs:string" minOccurs="0" maxOccurs="1" />
				<xs:element name="cdataMessage" type="CDataString"  minOccurs="0" maxOccurs="1" />
				<xs:element name="base64Message" type="xs:base64Binary" minOccurs="0" maxOccurs="1" />
			</xs:sequence>
		</xs:complexType>
	</xs:element>

	<xs:simpleType name="CDataString">
		<xs:restriction base="xs:string"></xs:restriction>
	</xs:simpleType>
</xs:schema>

This schema defines a service request and response that can be used to exchange three versions of the same message.

  • EchoMessage : the message that is passed between the client and the server. This message contains :
    • simpleMessage : the escaped text, used to illustrate method 1
    • base64Message : the base64 representation of the text, used to illustrate method 2
    • cdataMessage : the unformatted message encapsulated by CDATA tags, used to illustrate method 3

Example messages

The following example shows an XML message used to pass the simple formatted text Hello World ! back and forth:

<?xml version="1.0" encoding="UTF-8"?>
<p:EchoMessage xmlns:p="http://www.example.com/ws/xsd/hello"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.example.com/ws/xsd/hello ../WSEcho.xsd  ">
	
		<p:simpleMessage>&lt;hello&gt;&lt;world&gt;Hello World !&lt;/world&gt;&lt;/hello&gt;</p:simpleMessage>
		<p:cdataMessage><![CDATA[<hello><world>Hello World !</world></hello>]]></p:cdataMessage>
		<p:base64Message>PGhlbGxvPjx3b3JsZD5IZWxsbyBXb3JsZCAhPC93b3JsZD48L2hlbGxvPg==</p:base64Message>

</p:EchoMessage>

Using character escaping

This is the simplest method, in that everything is handled automatically by the JAX-B Marshaller and Unmarshaller.
The XML tag is simply defined as a string:

  <xs:element name="simpleMessage" type="xs:string" minOccurs="0" maxOccurs="1" />

The corresponding Java attribute is a java.lang.String.
To use this element, pass or retrieve the String data directly through the attribute's setter and getter; the JAX-B Marshaller and Unmarshaller automatically escape and unescape the special characters.

This is done as follows:

  • passing the message :
      EchoMessage echoRequest = new EchoMessage();		
      echoRequest.setSimpleMessage(MESSAGE);
    

  • retrieving the message :

      String simpleMessage = echoRequest.getSimpleMessage();
    

Thus the string

  <hello><world>Hello World !</world></hello>

becomes, in the XML document:

  &lt;hello&gt;&lt;world&gt;Hello World !&lt;/world&gt;&lt;/hello&gt;
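
The reverse direction can be checked without JAX-B. The sketch below is only an illustration: a plain JDK DOM parser stands in for the Unmarshaller, and the class name EscapedTextDemo and its helper are hypothetical. It parses an element whose value is the escaped text and reads back the original string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative only: a plain DOM parser stands in for the JAX-B Unmarshaller. */
final class EscapedTextDemo {

    /** Parses the given XML and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<simpleMessage>&lt;hello&gt;&lt;world&gt;Hello World !"
                + "&lt;/world&gt;&lt;/hello&gt;</simpleMessage>";
        // The parser turns the entity references back into '<' and '>'.
        System.out.println(rootText(xml));
    }
}
```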

Using base64 encoding

With this method, the message is passed in the XML as its Base64 representation.
To use this method, define the tag as base64Binary in the XML schema:

  <xs:element name="base64Message" type="xs:base64Binary" minOccurs="0" maxOccurs="1" />

The corresponding code generated by xjc is:

  protected byte[] base64Message;

The base64Message is set as follows:

  EchoMessage echoRequest = new EchoMessage();
  echoRequest.setBase64Message(MESSAGE.getBytes());

and retrieved as :

  String base64Message= new String(echoRequest.getBase64Message());

The Marshaller and Unmarshaller automatically convert the byte array version of the String to and from the Base64 message. Thus the string

  <hello><world>Hello World !</world></hello>

becomes, in the XML document:

  PGhlbGxvPjx3b3JsZD5IZWxsbyBXb3JsZCAhPC93b3JsZD48L2hlbGxvPg==
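
The snippets above call getBytes() and new String(byte[]) without a charset, so the result depends on the platform default encoding. The following sketch shows the same round trip with an explicit UTF-8 charset, using java.util.Base64 to mimic what the Marshaller and Unmarshaller do for a base64Binary element. The class Base64MessageDemo and its helpers are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: Base64 round trip with an explicit charset, without JAX-B. */
final class Base64MessageDemo {

    static final String MESSAGE = "<hello><world>Hello World !</world></hello>";

    /** What would be passed to setBase64Message(...). */
    static byte[] toBytes(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    /** What would be done with the result of getBase64Message(). */
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Text form of the byte array, as it appears inside the XML element. */
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** Byte array recovered from the element text. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String onTheWire = encode(toBytes(MESSAGE));
        System.out.println(onTheWire);
        System.out.println(fromBytes(decode(onTheWire)));
    }
}
```

Fixing the charset on both sides avoids corrupted text when the client and the server run with different default encodings.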

Using CData

Overview

The purpose of using CDATA elements is to be able to specify that the content of an XML tag should not be parsed. The text that should not be parsed is encapsulated between <![CDATA[ and ]]>.
As stated above, JAX-B does not natively support CDATA; a bit of customization can however make JAX-B CDATA-aware.
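
To see what "should not be parsed" means in practice, the sketch below feeds two small documents to the JDK DOM parser: one with the markup wrapped in CDATA and one with the markup inlined. The class CdataParseDemo and its helpers are hypothetical and only serve this illustration; no JAX-B is involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative only: how a parser treats CDATA content versus inline markup. */
final class CdataParseDemo {

    /** Parses the given XML and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    /** Returns true if the element has at least one child element. */
    static boolean hasChildElements(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String payload = "<hello><world>Hello World !</world></hello>";

        Element wrapped = parseRoot("<m><![CDATA[" + payload + "]]></m>");
        // The payload is kept as character data: no child elements are created.
        System.out.println(hasChildElements(wrapped));
        System.out.println(wrapped.getTextContent());

        Element inline = parseRoot("<m>" + payload + "</m>");
        // Without CDATA the payload is parsed as markup.
        System.out.println(hasChildElements(inline));
    }
}
```

Note that the parser hands back the text without the CDATA markers, so the receiving side sees the same string as with character escaping.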

Add a new simpleType for CDATA

Assuming that you have control over the XSD, create a new simpleType CDataString that derives from the standard string type.

<xs:simpleType name="CDataString">
  <xs:restriction base="xs:string"></xs:restriction>
</xs:simpleType>

The purpose of CDataString is to identify the elements that should be handled using CDATA. Use this simpleType for each element that contains CDATA strings.

Create a custom handler for CDataString

Create a Java Class CDataAdapter that will convert String to and from CDATA. This class contains two public static methods:

  • public static String parse(String s) : removes the CDATA start and end tags from String s
  • public static String print(String s) : encapsulates the given String with the CDATA start and end tags

public class CDataAdapter  {

	private static final String CDATA_START = "<![CDATA[";
	private static final String CDATA_STOP = "]]>";

	/**
	 * Check whether a string is a CDATA string
	 * @param s the string to check
	 * @return true if the trimmed string starts and ends with the CDATA markers
	 */
	public static boolean isCdata(String s) {
		s = s.trim();
		return s.startsWith(CDATA_START) && s.endsWith(CDATA_STOP);
	}

	/**
	 * Parse a CDATA String.<br />
	 * If is a CDATA, removes leading and trailing string<br />
	 * Otherwise does nothing
	 * @param s the string to parse
	 * @return the parsed string
	 */
	public static String parse(String s)  {

		StringBuilder sb = null;
		s = s.trim();

		if(isCdata(s)) {
			sb = new StringBuilder(s);
			sb.replace(0, CDATA_START.length(), "");

			s = sb.toString();
			sb = new StringBuilder(s);
			sb.replace(s.lastIndexOf(CDATA_STOP), s.lastIndexOf(CDATA_STOP)+CDATA_STOP.length(),"");
			s = sb.toString();
		}
		return s;
	}

	/**
	 * Add CDATA leading and trailing to a string if not already a CDATA
	 * @param s
	 * @return
	 */
	public static String print(String s) {
		if(isCdata(s)) {
			return s;
		} else {
			return CDATA_START + s + CDATA_STOP;
		}
	}
}
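
One caveat with plain concatenation is a payload that itself contains the sequence "]]>", since that would close the CDATA section early. A common workaround is to split that sequence across two adjacent CDATA sections. The sketch below is a hypothetical variant of the print method, not part of the adapter above; the class name SafeCdata is an assumption, and a JDK DOM parser is used only to check the round trip.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Illustrative only: CDATA wrapping that tolerates "]]>" inside the payload. */
final class SafeCdata {

    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_STOP = "]]>";

    /** Wraps s in CDATA, splitting any embedded "]]>" across two sections. */
    static String wrap(String s) {
        // "]]>" becomes "]]" + end of section + start of new section + ">"
        String safe = s.replace(CDATA_STOP, "]]" + CDATA_STOP + CDATA_START + ">");
        return CDATA_START + safe + CDATA_STOP;
    }

    /** Parses <m>content</m> with the JDK DOM parser and returns the element text. */
    static String parseText(String content) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<m>" + content + "</m>")))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + content, e);
        }
    }

    public static void main(String[] args) {
        String tricky = "if (a[b[0]]>1) { }";
        System.out.println(wrap(tricky));
        System.out.println(parseText(wrap(tricky)).equals(tricky));
    }
}
```

The wrapped result still starts with the CDATA start marker and ends with the CDATA end marker, so a check like isCdata keeps recognizing it.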

Add custom JAX-B binding

The aim here is to tell the JAX-B Marshaller and Unmarshaller that CDataString should be handled by the CDataAdapter class. This is achieved using a JAX-B binding, either embedded or external.
In the former case, add the following binding directly to the XSD file.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema 
	xmlns:xs="http://www.w3.org/2001/XMLSchema" 	
	targetNamespace="http://www.example.com/ws/xsd/hello" 
	xmlns="http://www.example.com/ws/xsd/hello"
	xmlns:jaxb="http://java.sun.com/xml/ns/jaxb"
	jaxb:version="2.0"
	elementFormDefault="qualified" >
	
	
	<xs:annotation>
		<xs:appinfo>
			<jaxb:globalBindings>
				<jaxb:javaType name="java.lang.String" xmlType="CDataString"
					parseMethod="com.example.sandbox.ws.helper.CDataAdapter.parse"
					printMethod="com.example.sandbox.ws.helper.CDataAdapter.print"/>
			</jaxb:globalBindings>
			
		</xs:appinfo>
	</xs:annotation>

This specifies that any CDataString element should be handled with the class com.example.sandbox.ws.helper.CDataAdapter.
In effect, this causes the Marshaller to process CDataString values through the print method and the Unmarshaller to process them through the parse method.
The corresponding class EchoMessage generated by xjc now shows:

public class EchoMessage {

    protected String simpleMessage;
    @XmlJavaTypeAdapter(Adapter1 .class)
    protected String cdataMessage;
    protected byte[] base64Message;

The Adapter1 is also generated by xjc and reads:

public class Adapter1  extends XmlAdapter<String, String>
{
    public String unmarshal(String value) {
        return (com.example.sandbox.ws.helper.CDataAdapter.parse(value));
    }

    public String marshal(String value) {
        return (com.example.sandbox.ws.helper.CDataAdapter.print(value));
    }
}

Override the default character escape mechanism

By default, the JAX-B Marshaller and Unmarshaller will automatically escape/unescape any XML special characters. When handling CDATA elements, this feature must be disabled. This is achieved by specifying a custom implementation of the CharacterEscapeHandler used by the JAX-B Marshaller :

JAXBContext jcb = JAXBContext.newInstance(clazz);
Marshaller m = jcb.createMarshaller();
m.setProperty(
	"com.sun.xml.bind.marshaller.CharacterEscapeHandler",
	new CdataCharacterEscapeHandler());

where CdataCharacterEscapeHandler is a custom class that is CDATA-aware. It escapes characters only for strings that do not represent CDATA strings. The source for this class is:

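import java.io.IOException;
import java.io.Writer;

// Interface from the JAX-B reference implementation, matching the property name used above
import com.sun.xml.bind.marshaller.CharacterEscapeHandler;

public class CdataCharacterEscapeHandler implements CharacterEscapeHandler {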

	public CdataCharacterEscapeHandler() {
		super();
	}

	/**
	 * @param ch The array of characters.
	 * @param start The starting position.
	 * @param length The number of characters to use.
	 * @param isAttVal true if this is an attribute value literal.
	 */
	public void escape(char[] ch, int start, int length, boolean isAttVal, Writer writer) throws IOException {

		// Only inspect the range being written, not the whole buffer
		if(CDataAdapter.isCdata(new String(ch, start, length))) {
			writer.write( ch, start, length );
		} else {
			useStandardEscape(ch, start, length, isAttVal, writer);
		}
	}

	private void useStandardEscape(char[] ch, int start, int length, boolean isAttVal, Writer writer) throws IOException {
		CharacterEscapeHandler escapeHandler = StandardEscapeHandler.getInstance();
		escapeHandler.escape(ch, start, length, isAttVal, writer);
	}

	/**
	 * A standard XML character escape handler
	 * @author coderleaf
	 *
	 */
	private static final class StandardEscapeHandler implements CharacterEscapeHandler {

		private static StandardEscapeHandler instance;

		public static final StandardEscapeHandler getInstance() {

			if(instance == null) {
				instance = new StandardEscapeHandler();
			}

			return instance;
		}

		private StandardEscapeHandler() {
			super();
		}

		public void escape(char[] ch, int start, int length, boolean isAttVal, Writer out) throws IOException {

			int limit = start + length;
			for (int i = start; i < limit; i++) {
				char c = ch[i];

				if (c == '&' || c == '<' || c == '>' || (c == '\"' && isAttVal)
						|| (c == '\'' && isAttVal)) {

					if (i != start) {
						out.write(ch, start, i - start);
					}

					start = i + 1;

					switch (ch[i]) {
					case '&':
						out.write("&amp;");
						break;

					case '<':
						out.write("&lt;");
						break;

					case '>':
						out.write("&gt;");
						break;

					case '\"':
						out.write("&quot;");
						break;

					case '\'':
						out.write("&apos;");
						break;
					}
				}
			}

			if (start != limit) {
				out.write(ch, start, limit - start);
			}
		}
	}
}
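
The decision logic of the handler can be tried out without the JAX-B runtime. The sketch below is a standalone approximation, not the handler above: the class CdataAwareEscaper is hypothetical, its static escape method only mirrors the parameter list of CharacterEscapeHandler.escape, and it adds one assumption of its own, namely that attribute values are always escaped because CDATA sections cannot appear inside attributes.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Illustrative only: CDATA-aware escaping with no dependency on JAX-B. */
final class CdataAwareEscaper {

    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_STOP = "]]>";

    /** Writes ch[start, start + length) to out, escaping unless it is a CDATA string. */
    static void escape(char[] ch, int start, int length, boolean isAttVal, Writer out)
            throws IOException {
        // Look only at the range being written, not at the whole buffer.
        String segment = new String(ch, start, length).trim();
        if (!isAttVal && segment.startsWith(CDATA_START) && segment.endsWith(CDATA_STOP)) {
            out.write(ch, start, length);
            return;
        }
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;"); break;
                case '>': out.write("&gt;"); break;
                case '"':
                    if (isAttVal) { out.write("&quot;"); } else { out.write(c); }
                    break;
                case '\'':
                    if (isAttVal) { out.write("&apos;"); } else { out.write(c); }
                    break;
                default:
                    out.write(c);
            }
        }
    }

    /** Convenience wrapper that escapes a range of a string into a new string. */
    static String escapeToString(String buffer, int start, int length, boolean isAttVal) {
        StringWriter writer = new StringWriter();
        try {
            escape(buffer.toCharArray(), start, length, isAttVal, writer);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected with a StringWriter
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        String plain = "<hello>";
        String cdata = "<![CDATA[<hello>]]>";
        System.out.println(escapeToString(plain, 0, plain.length(), false));
        System.out.println(escapeToString(cdata, 0, cdata.length(), false));
    }
}
```

Testing only the range between start and start + length matters because the marshaller may hand over a buffer that holds more characters than the value being written.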

Using the CDATA element

Using the CDATA element in the Java code is as simple as with the standard character escaping method:
Setting the message :

EchoMessage echoRequest = new EchoMessage();
echoRequest.setCdataMessage(MESSAGE);

Retrieving the message :

String cdataMessage = echoRequest.getCdataMessage();

The difference is that during the marshalling and unmarshalling phases, the custom classes we have defined are used; therefore the String

<hello><world>Hello World !</world></hello>

now translates to :

<![CDATA[<hello><world>Hello World !</world></hello>]]>

Alternatives

If you do not have control over the XSD, you can either :

  • Add an additional XSD file and use the XSD redefine tag to change the definition of the relevant tags, replacing string elements with CDataString
  • Use an external binding whose XPath identifies each element that should be considered a CDATA string, and specify the CDataAdapter

The external binding file would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<bindings xmlns="http://java.sun.com/xml/ns/jaxb"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/jaxb http://java.sun.com/xml/ns/jaxb/bindingschema_2_0.xsd"
	version="2.1">
	
	<globalBindings fixedAttributeAsConstantProperty="true">
		<serializable uid="1"/>
    	   	
		<javaType name="java.lang.String" xmlType="CDataString"
			parseMethod="com.example.sandbox.ws.helper.CDataAdapter.parse"
			printMethod="com.example.sandbox.ws.helper.CDataAdapter.print"/>
    </globalBindings>    
</bindings>

Integration with an application server

Overview

The examples shown above are usable almost directly in the context of an application server.
The only catch is controlling the Marshaller configuration, which in our case means being able to pass a custom implementation of the CharacterEscapeHandler.

With most implementations there is no direct access to the Marshaller used by the JAX-RS or JAX-WS implementation. Depending on the framework you are using, you will have to dig into the JAX-RS/WS configuration files to specify custom Marshaller properties.

Marshaller configuration for Apache CXF

As an example, let us consider using Apache CXF with Spring to implement the WSecho REST service. In this case, Marshaller properties can be set through the jaxbProvider used by the jaxrs server.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jaxrs="http://cxf.apache.org/jaxrs"
       xmlns:jaxws="http://cxf.apache.org/jaxws"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
       http://cxf.apache.org/jaxrs http://cxf.apache.org/schemas/jaxrs.xsd
       http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
       http://cxf.apache.org/jaxws http://cxf.apache.org/schemas/jaxws.xsd">

    <import resource="classpath:META-INF/cxf/cxf.xml"/>
    <import resource="classpath:META-INF/cxf/cxf-extension-jaxrs-binding.xml"/>
    <import resource="classpath:META-INF/cxf/cxf-extension-soap.xml" />
    <import resource="classpath:META-INF/cxf/cxf-servlet.xml"/>

    <context:component-scan base-package="com.example"/>


    <jaxrs:server id="restContainer" address="/">

		<jaxrs:providers>
			<ref bean="jaxbProvider" />
		</jaxrs:providers>

  
        <jaxrs:serviceBeans>
            <ref bean="WSecho"/>
        </jaxrs:serviceBeans>
  
         <jaxrs:extensionMappings>
            <entry key="xml" value="application/xml" />
        </jaxrs:extensionMappings>
        
    </jaxrs:server>

	<bean id="jaxbProvider" class="org.apache.cxf.jaxrs.provider.JAXBElementProvider">
		<property name="marshallerProperties">
			<map>
				<entry>
					<key>
						<value>com.sun.xml.bind.marshaller.CharacterEscapeHandler</value>
					</key>

					<ref bean="cdataCharacterEscapeHandler" />
				</entry>
			</map>
		</property>
	</bean>
	
	<bean id="cdataCharacterEscapeHandler" class="com.example.sandbox.ws.helper.CdataCharacterEscapeHandler"/>

</beans>