Wednesday, August 30, 2006

Type-preserving copy in XSLT 2.0

Disclaimer

This post refers to FXSL, because its currying functionality was the starting point and the context for the following thoughts. But there is no official link between this post and FXSL, so neither Dimitre nor Colin can be blamed for what is written here. I want to thank them both for all their valuable input; all remaining errors are mine alone.

The problem

A few months ago, I finally had a look at FXSL, a project that provides functions as first-class objects in XSLT. That opens up some very interesting possibilities, including a more functional programming style.

An interesting feature is the ability to curry parameters to a function, creating another function that takes fewer parameters. The principle is to attach parameter values to the function. The new function can then be used like any other function, with the curried parameters bound to the given values.
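As a rough analogy only (this is Java, not FXSL, and the class and method names are made up for illustration), binding one parameter of a two-parameter function can be sketched as follows:

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper class, for illustration only; it is not part of FXSL.
final class CurryDemo {

    // Bind the first argument of a two-argument function,
    // yielding a function of the remaining argument.
    static <A, B, R> Function<B, R> bindFirst(BiFunction<A, B, R> f, A a) {
        return b -> f.apply(a, b);
    }

    public static void main(String[] args) {
        BiFunction<Integer, Integer, Integer> add = (x, y) -> x + y;
        // "add" with its first parameter bound to 10.
        Function<Integer, Integer> addTen = bindFirst(add, 10);
        System.out.println(addTen.apply(5)); // prints 15
    }
}
```

In Java the bound value simply lives in the closure. XSLT 2.0 has no closures, which is why a container for the function and its bound values is needed there.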

To achieve this goal, we need a complex structure, because we have to be able to retrieve the original function and each curried parameter. The first thing that comes to mind is to use a sequence of the needed items. But this is not possible: we want to be able to use the resulting function like any other function object, for example to create a sequence of functions. As sequences cannot be nested, we would not be able to retrieve the new function after adding it to a sequence; we would only get the individual items, no longer related to each other.

Instead, FXSL uses a dynamically built element as a complex container. An element is both a single item and a complex structure, from which we can easily retrieve specific pieces of information.

But unlike a sequence, the content of an element cannot reference an item. When we attach an item to a tree in XSLT, it is copied. Many properties are copied as is, but some change. The most obvious change is that atomic values are no longer atomic: they become text nodes. So it is not possible to know later whether we attached an atomic value or a text node, for example.

If we do nothing special, the type changes too: it is always set to xs:untyped. But we want to preserve it, because it can change the result of evaluating the new function (the one with curried parameters).

Solution

The idea is to have two functions: f:copy-with-type, which takes a sequence of zero or more items as its argument and returns a node, and f:get-typed, which takes a node obtained from the former as its argument and returns a sequence of zero or more items:

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy>
    <!-- Still to implement... -->
  </copy>
</xsl:function>

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <!-- Still to implement... -->
</xsl:function>

The solution differs depending on whether we are in Basic mode or Schema Aware (SA) mode. It also differs between nodes and atomic values.

For nodes in Basic mode, it is simple. A node can never have an annotation other than xs:untyped (or xs:untypedAtomic for an attribute), so just using xsl:copy-of is enough. In SA mode, XSLT 2.0 also has the solution: just use the [xsl:]validation attribute with the value "preserve". This preserves the type annotation of the copied nodes:

<!-- In Basic mode -->
<xsl:when test="$arg instance of node()">
  <node>
    <xsl:copy-of select="$arg"/>
  </node>
</xsl:when>

<!-- In Schema Aware mode -->
<xsl:when test="$arg instance of node()">
  <node xsl:validation="preserve">
    <xsl:copy-of select="$arg" validation="preserve"/>
  </node>
</xsl:when>

For atomic values, it is more complex. There is actually no way to say "I want to get the type of this atomic value and copy both the value and the type to the tree". The only way to simulate this is to use an xsl:choose on the type of the item (using instance of). In SA mode, we can use the [xsl:]type attribute to set the type of the container element to the type of the item. But in Basic mode, it is impossible to set the type of a node to anything other than xs:untyped. Instead, we use the name of the simple type as the name of the container element. This name will act as a constructor function later (actually, these constructors are already defined in FXSL).

<!-- In Basic mode -->
<xsl:when test="$arg instance of xs:double">
  <f:double>
    <xsl:copy-of select="$arg"/>
  </f:double>
</xsl:when>

<!-- In Schema Aware mode -->
<xsl:when test="$arg instance of xs:double">
  <atomic xsl:type="xs:double">
    <xsl:copy-of select="$arg" validation="preserve"/>
  </atomic>
</xsl:when>
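The Basic-mode trick (name the container after the type, then use that name to pick a constructor) is not specific to XSLT. Here is a minimal Java sketch of the same idea using the standard DOM API; the class, the method names and the three supported types are assumptions chosen for illustration, not part of FXSL:

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical Java analogy of the Basic-mode trick; all names are illustrative.
final class TypeTagDemo {

    // One "constructor" per supported type, keyed by the element name.
    static final Map<String, Function<String, Object>> CONSTRUCTORS = Map.of(
        "double",  Double::valueOf,
        "decimal", BigDecimal::new,
        "string",  s -> s);

    // Copy step: wrap the value in an element named after its type.
    static Element wrap(Document doc, Object value) {
        String tag;
        if (value instanceof Double) {
            tag = "double";
        } else if (value instanceof BigDecimal) {
            tag = "decimal";
        } else if (value instanceof String) {
            tag = "string";
        } else {
            throw new IllegalArgumentException("Unsupported type: " + value.getClass());
        }
        Element elem = doc.createElement(tag);
        elem.setTextContent(value.toString());
        return elem;
    }

    // Retrieval step: the element name selects the constructor to apply.
    static Object unwrap(Element elem) {
        return CONSTRUCTORS.get(elem.getTagName()).apply(elem.getTextContent());
    }

    // Create an empty DOM document to own the wrapper elements.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object back = unwrap(wrap(newDocument(), 1.5));
        System.out.println(back.getClass().getSimpleName() + " " + back); // prints "Double 1.5"
    }
}
```

As in the XSLT version, the chain of type tests has to list every supported type explicitly.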

Below is what the whole solution looks like:

<!-- In Basic mode -->

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <xsl:apply-templates select="$arg/*" mode="f:get-typed"/>
</xsl:function>

<xsl:template match="node" mode="f:get-typed" as="node()">
  <xsl:sequence select="@*|node()"/>
</xsl:template>

<xsl:template match="f:*" mode="f:get-typed" as="item()">
  <xsl:sequence select="f:apply(., data(.))"/>
</xsl:template>

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy>
    <xsl:sequence select="for $a in $arg return
                            f:copy-with-type-1($a)"/>
  </copy>
</xsl:function>

<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <xsl:choose>
    <xsl:when test="$arg instance of node()">
      <node>
        <xsl:copy-of select="$arg"/>
      </node>
    </xsl:when>
    <!-- One xsl:when per simple type here. Test derived types before their
         base types: an xs:integer is also an instance of xs:decimal. -->
    <xsl:when test="$arg instance of xs:a-basic-type">
      <f:a-basic-type>
        <xsl:copy-of select="$arg"/>
      </f:a-basic-type>
    </xsl:when>
    ...
  </xsl:choose>
</xsl:function>

<!-- In SA mode -->

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <xsl:apply-templates select="$arg/*" mode="f:get-typed"/>
</xsl:function>

<xsl:template match="node" mode="f:get-typed" as="node()">
  <xsl:sequence select="@*|node()"/>
</xsl:template>

<xsl:template match="atomic" mode="f:get-typed" as="item()">
  <xsl:sequence select="data(.)"/>
</xsl:template>

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy xsl:validation="preserve">
    <xsl:sequence select="for $a in $arg return
                            f:copy-with-type-1($a)"/>
  </copy>
</xsl:function>

<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <xsl:choose>
    <xsl:when test="$arg instance of node()">
      <node xsl:validation="preserve">
        <xsl:copy-of select="$arg" validation="preserve"/>
      </node>
    </xsl:when>
    <!-- One xsl:when per simple type here, derived types first. -->
    <xsl:when test="$arg instance of xs:a-type">
      <atomic xsl:type="xs:a-type">
        <xsl:copy-of select="$arg" validation="preserve"/>
      </atomic>
    </xsl:when>
    ...
  </xsl:choose>
</xsl:function>

For the actual complete files, you can go to:

Problem & Future

Of course, there is a problem with atomic items in SA mode. Because we use an xsl:choose, we have to know all the possible types statically. For the standard types this is not a problem, but the solution is not usable as is with user-defined types.

Two compatible techniques could help us live with this restriction. The first one combines the import mechanism of XSLT with the ability to define functions as first-class objects. If we think about facilities to define resolver functions per namespace (i.e. per piece of XML Schema), that could result in a flexible system.

The second technique is to use a generator for pieces of XSLT code. Actually, I use such a simple generator to generate both xsl:choose elements in full (with one xsl:when per atomic type). The input is an ad-hoc document that lists the standard simple types an XSLT processor has to know. But we could perhaps write a generator that takes XML Schemas as input.
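To give an idea of what such a generator does, here is a minimal sketch in Java. It is not the generator mentioned above, whose code and input format are not shown here; the class name is made up, and the list of type names is passed in directly instead of being read from an input document. It emits the Basic-mode branches only:

```java
import java.util.List;

// Hypothetical generator sketch; the actual generator and its input
// document are not shown in this post.
final class ChooseGenerator {

    // Emit the Basic-mode xsl:when branch for one simple type.
    static String when(String type) {
        return "  <xsl:when test=\"$arg instance of xs:" + type + "\">\n"
             + "    <f:" + type + ">\n"
             + "      <xsl:copy-of select=\"$arg\"/>\n"
             + "    </f:" + type + ">\n"
             + "  </xsl:when>\n";
    }

    // Emit a whole xsl:choose, one branch per type, in the given order.
    static String choose(List<String> types) {
        StringBuilder out = new StringBuilder("<xsl:choose>\n");
        for (String type : types) {
            out.append(when(type));
        }
        return out.append("</xsl:choose>\n").toString();
    }

    public static void main(String[] args) {
        // Derived types come first: an xs:integer is also an xs:decimal.
        System.out.print(choose(List.of("integer", "decimal", "double")));
    }
}
```

The caller is responsible for the order of the list, since the first matching xsl:when wins.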

I hope this will be the subject of another post.


Monday, August 28, 2006

Add a namespace node to an element in XQuery

David Carlisle just sent me a way to add a namespace node to an element in XQuery. Here is his example:

declare function local:add-ns-node(
    $elem   as element(),
    $prefix as xs:string,
    $ns-uri as xs:string
  ) as element()
{
  element { QName($ns-uri, concat($prefix, ":x")) }{ $elem }/*
};

local:add-ns-node(<xxx><a/></xxx>, "p1", "uri2")

Run with Saxon 8.7.3 for Java, it results in:

<?xml version="1.0" encoding="UTF-8"?>
<xxx xmlns:p1="uri2">
   <a/>
</xxx>

The context of the discussion can be found in this thread (you'll have to go deep into the thread to find David's post).

Maybe a candidate for an XQuery FAQ? Anyway, thanks David.


Thursday, August 24, 2006

Translate SAX events to a DOM tree

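I had to pass an XML document produced by one piece of software to another. The first one provides the document as SAX events, but the second one expects a DOM Document. So here is a SAX-events-to-DOM-Document translator:

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.w3c.dom.Text;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxToDom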
{
    public SaxToDom(XMLReader reader, InputSource input) {
        myReader = reader;
        myInput  = input;
    }

    public Document makeDom() {
        Document doc = null;
        try {
            // Find the implementation
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder        builder = factory.newDocumentBuilder();
            DOMImplementation      impl    = builder.getDOMImplementation();

            // Create the document
            doc = impl.createDocument(null, null, null);

            // The Handlers and the actual building
            SaxToDomHandler handlers = new SaxToDomHandler(doc);
            myReader.setContentHandler(handlers);
            myReader.setErrorHandler(handlers);
            myReader.parse(myInput);
        }
        // For the catch handlers below, use your usual logging facilities.
        catch (DOMException e) {
            System.err.println(e); 
        }
        catch (ParserConfigurationException e) {
            System.err.println(e); 
        }
        catch (SAXException e) {
            System.err.println(e); 
        }
        catch (IOException e) {
            System.err.println(e); 
        }
        return doc;
    }

    private XMLReader   myReader;
    private InputSource myInput;
}


class SaxToDomHandler
    extends DefaultHandler
{
    public SaxToDomHandler(Document doc) {
        myDoc         = doc;
        myCurrentNode = myDoc;
    }

    // Create a new element and add it to the DOM tree, at the right place.
    public void startElement(String uri, String name, String qName, Attributes attrs) {
        // Create the element.
        Element elem = myDoc.createElementNS(uri, qName);
        // Add each attribute.
        for ( int i = 0; i < attrs.getLength(); ++i ) {
            String ns_uri = attrs.getURI(i);
            String qname  = attrs.getQName(i);
            String value  = attrs.getValue(i);
            Attr   attr   = myDoc.createAttributeNS(ns_uri, qname);
            attr.setValue(value);
            elem.setAttributeNodeNS(attr);
        }
        // Actually add it in the tree, and adjust the right place.
        myCurrentNode.appendChild(elem);
        myCurrentNode = elem;
    }

    // Adjust the current place for subsequent additions.
    public void endElement(String uri, String name, String qName) {
        myCurrentNode = myCurrentNode.getParentNode();
    }

    // Add a new text node in the DOM tree, at the right place.
    public void characters(char[] ch, int start, int length) {
        String str  = new String(ch, start, length);
        Text   text = myDoc.createTextNode(str);
        myCurrentNode.appendChild(text);
    }

    // Add a new text node in the DOM tree, at the right place.
    public void ignorableWhitespace(char[] ch, int start, int length) {
        String str  = new String(ch, start, length);
        Text   text = myDoc.createTextNode(str);
        myCurrentNode.appendChild(text);
    }

    // Add a new processing instruction to the DOM tree, at the right place.
    public void processingInstruction(String target, String data) {
        ProcessingInstruction pi = myDoc.createProcessingInstruction(target, data);
        myCurrentNode.appendChild(pi);
    }

    // For the handlers below, use your usual logging facilities.
    public void error(SAXParseException e) {
        System.err.println("Non-fatal error (line " + e.getLineNumber() + ", col " +
                           e.getColumnNumber() + "): " + e.getMessage());
    }

    public void fatalError(SAXParseException e) {
        System.err.println("Fatal error: " + e.getMessage());
    }

    public void warning(SAXParseException e) {
        System.err.println("Warning: " + e.getMessage());
    }

    private Document myDoc;
    private Node     myCurrentNode;
}
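For comparison, the same conversion can be sketched with JAXP alone: a Transformer created without a stylesheet performs an identity transformation, and it accepts a SAXSource as input and a DOMResult as output. This is an alternative technique, not the class above. The class name and the helper newReader() are assumptions for illustration, and error handling is reduced to rethrowing:

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Alternative sketch using only JAXP; it does not use the SaxToDom class.
final class SaxToDomViaJaxp {

    // Let the identity transformer drive the reader and build the DOM tree.
    static Document toDom(XMLReader reader, InputSource input) {
        try {
            // With no node set, the transformer creates a new Document node.
            DOMResult result = new DOMResult();
            TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(reader, input), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: a namespace-aware reader from the default parser.
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InputSource input = new InputSource(new StringReader("<doc><a id='1'>text</a></doc>"));
        Document doc = toDom(newReader(), input);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "doc"
    }
}
```

The hand-written handler remains useful when the events need to be filtered or logged on the way, or when no Transformer implementation is available.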
