Wednesday, August 30, 2006

Type-preserving copy in XSLT 2.0

Disclaimer

This post refers to FXSL, because its currying functionality was the starting point and the context of the following thoughts. But there is no official link between this post and FXSL, so neither Dimitre nor Colin can be blamed for what is written here. I want to thank them a lot for all their valuable input; all remaining errors are mine alone.

The problem

A few months ago, I finally had a look at FXSL. This project provides functions as first-class objects in XSLT, which opens up some very interesting possibilities, in particular a more functional programming style.

An interesting feature is the ability to curry parameters to a function, creating another function that takes fewer parameters. The principle is to attach parameters to the function. The new function can then be used like any other function, with the specified parameters bound to the specified values.

To achieve this goal, we need a complex structure, because we have to be able to retrieve the original function and each curried parameter. The first thing that comes to mind is to use a sequence of the needed items. But this is not possible: we want to be able to use the resulting function like any other function object, for example to create a sequence of functions. As sequences cannot be nested, we would not be able to retrieve the new function after having added it to a sequence (we would only get the individual items, no longer related to each other).
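The flattening of sequences can be seen with a minimal, self-contained stylesheet. This is only a sketch: modelling a function as a name followed by its arguments, and the initial template named main, are assumptions made for the illustration, not the way FXSL represents functions.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Two hypothetical "curried functions", each modelled as a
         sequence: a name followed by its bound arguments. -->
    <xsl:variable name="f1" as="item()*" select="('add', 1)"/>
    <xsl:variable name="f2" as="item()*" select="('mul', 2, 3)"/>
    <!-- Putting both in one sequence flattens them: the result is a
         single sequence of five items, not a sequence of two. -->
    <xsl:variable name="both" as="item()*" select="($f1, $f2)"/>
    <xsl:value-of select="count($both)"/>
  </xsl:template>

</xsl:stylesheet>
```

The count is 5, not 2: nothing in $both tells where the first function ends and the second one begins.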

Instead, FXSL uses a dynamically built element as a complex container. An element is at the same time a unique item and a complex structure, from which we may easily retrieve specific pieces of information.

But unlike a sequence, the content of an element cannot reference an item. When we attach an item to a tree in XSLT, it is copied. Many properties are copied as is, but some change. The most obvious is that atomic items are no longer atomic values, but become nodes. So it is not possible to know later whether we attached an atomic value or a text node, for example.

If we do nothing special, the type annotation is changed too: it is always reset to xs:untyped (or xs:untypedAtomic for an attribute). But we want to preserve it, because it can change the result of evaluating the new function (with curried parameters).
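Both effects can be observed with a small stylesheet. It is only a sketch: the holder element and the initial template named main are names chosen for the illustration.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="value" as="xs:double" select="xs:double(42)"/>
    <!-- Attaching the atomic value to a tree copies it as a text node
         inside the (hypothetical) holder element. -->
    <xsl:variable name="holder" as="element(holder)">
      <holder>
        <xsl:sequence select="$value"/>
      </holder>
    </xsl:variable>
    <!-- The original item is an xs:double: true -->
    <xsl:value-of select="$value instance of xs:double"/>
    <xsl:text> </xsl:text>
    <!-- What the element contains is a text node: true -->
    <xsl:value-of select="$holder/node() instance of text()"/>
    <xsl:text> </xsl:text>
    <!-- The value read back is no longer an xs:double: false -->
    <xsl:value-of select="data($holder) instance of xs:double"/>
    <xsl:text> </xsl:text>
    <!-- It is an untyped atomic value instead: true -->
    <xsl:value-of select="data($holder) instance of xs:untypedAtomic"/>
  </xsl:template>

</xsl:stylesheet>
```

The string "42" survives the copy, but the information that it was an xs:double does not.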

Solution

The idea is to have two functions: f:copy-with-type, which takes a sequence of zero or more items as its argument and returns a node, and f:get-typed, which takes a node obtained from the former as its argument and returns a sequence of zero or more items:

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy>
    <!-- Still to implement... -->
  </copy>
</xsl:function>

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <!-- Still to implement... -->
</xsl:function>

The solution is different depending on whether we are in Basic mode or in Schema Aware (SA) mode. It also differs for nodes and for atomic values.

For nodes in Basic mode, it is simple. A node can never have a type annotation other than xs:untyped (or xs:untypedAtomic for an attribute), so just using xsl:copy-of is enough. In SA mode, XSLT 2.0 also has the solution: just use the [xsl:]validation attribute with the value "preserve". This preserves the type annotation of the copied nodes:

<!-- In Basic mode -->
<xsl:when test="$arg instance of node()">
  <node>
    <xsl:copy-of select="$arg"/>
  </node>
</xsl:when>

<!-- In Schema Aware mode -->
<xsl:when test="$arg instance of node()">
  <node xsl:validation="preserve">
    <xsl:copy-of select="$arg" validation="preserve"/>
  </node>
</xsl:when>

For atomic values, it is more complex. In fact, there is no way to say "I want to get the type of this atomic value and copy both the value and the type to the tree". The only way we have to simulate this is to use an xsl:choose on the type of the item (using instance of). In SA mode, we can use the [xsl:]type attribute to set the container element type to the same type as the item. But in Basic mode, it is impossible to set the type of a node to anything other than xs:untyped. Instead, we use the name of the simple type as the container element name. This name will act as a constructor function later (in fact, these constructors are already defined in FXSL).

<!-- In Basic mode -->
<xsl:when test="$arg instance of xs:double">
  <f:double>
    <xsl:copy-of select="$arg"/>
  </f:double>
</xsl:when>

<!-- In Schema Aware mode -->
<xsl:when test="$arg instance of xs:double">
  <atomic xsl:type="xs:double">
    <xsl:copy-of select="$arg" validation="preserve"/>
  </atomic>
</xsl:when>
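To make the Basic mode idea concrete without depending on FXSL, here is a minimal sketch of how the element name can select a constructor when reading the value back. The my: namespace, the my:restore mode and the initial template named main are hypothetical names for this illustration; FXSL provides its own constructors, which are not shown here.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="urn:x-example:typed-copy">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- What a Basic mode copy of an xs:double could look like. -->
    <xsl:variable name="stored" as="element()">
      <my:double>1.5</my:double>
    </xsl:variable>
    <!-- Read it back: the element name decides which constructor runs. -->
    <xsl:variable name="back" as="item()">
      <xsl:apply-templates select="$stored" mode="my:restore"/>
    </xsl:variable>
    <!-- The restored value is an xs:double again: true -->
    <xsl:value-of select="$back instance of xs:double"/>
  </xsl:template>

  <!-- One template per simple type; only two are shown here. -->
  <xsl:template match="my:double" mode="my:restore" as="xs:double">
    <xsl:sequence select="xs:double(.)"/>
  </xsl:template>

  <xsl:template match="my:integer" mode="my:restore" as="xs:integer">
    <xsl:sequence select="xs:integer(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

The type is not stored as a type annotation, which Basic mode does not allow, but it can still be recovered from the element name.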

Below is what the whole solution looks like:

<!-- In Basic mode -->

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <xsl:apply-templates select="$arg/*" mode="f:get-typed"/>
</xsl:function>

<xsl:template match="node" mode="f:get-typed" as="node()">
  <xsl:sequence select="@*|node()"/>
</xsl:template>

<xsl:template match="f:*" mode="f:get-typed" as="item()">
  <xsl:sequence select="f:apply(., data(.))"/>
</xsl:template>

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy>
    <xsl:sequence select="for $a in $arg return
                            f:copy-with-type-1($a)"/>
  </copy>
</xsl:function>

<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <xsl:choose>
    <xsl:when test="$arg instance of node()">
      <node>
        <xsl:copy-of select="$arg"/>
      </node>
    </xsl:when>
    <!-- One xsl:when per simple type, directly in the xsl:choose
         (an xsl:when cannot be a child of xsl:otherwise). -->
    <xsl:when test="$arg instance of xs:a-basic-type">
      <f:a-basic-type>
        <xsl:copy-of select="$arg"/>
      </f:a-basic-type>
    </xsl:when>
    <!-- An xsl:when by simple type here... -->
    ...
  </xsl:choose>
</xsl:function>

<!-- In SA mode -->

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <xsl:apply-templates select="$arg/*" mode="f:get-typed"/>
</xsl:function>

<xsl:template match="node" mode="f:get-typed" as="node()">
  <xsl:sequence select="@*|node()"/>
</xsl:template>

<xsl:template match="atomic" mode="f:get-typed" as="item()">
  <xsl:sequence select="data(.)"/>
</xsl:template>

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy xsl:validation="preserve">
    <xsl:sequence select="for $a in $arg return
                            f:copy-with-type-1($a)"/>
  </copy>
</xsl:function>

<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <xsl:choose>
    <xsl:when test="$arg instance of node()">
      <node xsl:validation="preserve">
        <xsl:copy-of select="$arg" validation="preserve"/>
      </node>
    </xsl:when>
    <!-- One xsl:when per simple type, directly in the xsl:choose
         (an xsl:when cannot be a child of xsl:otherwise). -->
    <xsl:when test="$arg instance of xs:a-type">
      <atomic xsl:type="xs:a-type">
        <xsl:copy-of select="$arg" validation="preserve"/>
      </atomic>
    </xsl:when>
    <!-- An xsl:when by simple type here... -->
    ...
  </xsl:choose>
</xsl:function>

For the actual complete files, you can go to:

Problem & Future

Of course, there is a problem with atomic items in SA mode. Because we use an xsl:choose, we have to know all the possible types statically. For the standard types this is not a problem, but the solution cannot be used as is with user-defined types.

Two compatible techniques could help to live with this restriction. The first one combines the import mechanism of XSLT with the possibility to define functions as first-class objects. With facilities to define resolver functions by namespace (i.e. by piece of XML Schema), that could result in a flexible system.

The second technique is to use a generator for pieces of XSLT code. In fact, I use such a simple generator to generate both xsl:choose elements in their entirety (with one xsl:when per atomic type). Its input is an ad-hoc document that lists the standard simple types an XSLT processor has to know. But we could maybe write a generator that takes XML Schemas as input.
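To give an idea of what such a generator can look like, here is a minimal sketch for the SA mode xsl:choose. It is not the actual generator: the type vocabulary (nested type elements, embedded here in a variable to keep the example self-contained), the out: alias prefix and the initial template named main are all assumptions made for the illustration, and only four types are listed.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:out="urn:x-example:alias">

  <!-- Names written with the out: prefix end up in the XSLT
       namespace in the generated code. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>

  <xsl:output indent="yes"/>

  <!-- Hypothetical input: a derived type is nested in its base type. -->
  <xsl:variable name="types">
    <type name="decimal">
      <type name="integer">
        <type name="long"/>
      </type>
    </type>
    <type name="double"/>
  </xsl:variable>

  <xsl:template name="main">
    <out:choose>
      <xsl:apply-templates select="$types/type"/>
    </out:choose>
  </xsl:template>

  <xsl:template match="type">
    <!-- Derived types first, so that their xsl:when comes before
         the xsl:when of their base type. -->
    <xsl:apply-templates select="type"/>
    <out:when test="$arg instance of xs:{@name}">
      <atomic out:type="xs:{@name}">
        <out:copy-of select="$arg"/>
      </atomic>
    </out:when>
  </xsl:template>

</xsl:stylesheet>
```

With this input, the generated xsl:when elements come out in the order long, integer, decimal, double. The post-order traversal is what guarantees that a derived type is always tested before its base type.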

I hope this will be the subject of another post.

2 Comments:

Anonymous said...

Hi Florent,

This exploration of type-preserving copy is very useful.

I have just the following comments:

1. In the code of f:copy-with-type-1() (maybe you can think of a name that is easier to remember?) the order of the <xsl:when> instructions is very important.

The only correct ordering is from more specific (derived) types to their base types.

Any violation of this ordering will produce unwanted results, for example an xs:integer could be typed as xs:decimal.

2. My second comment is about the user-defined simple types.

Maybe we are paying too much attention to this problem. Instead of trying to correctly discover such user-defined types, it would be better to impose the restriction that simple-typed arguments of first-class functions must be only of the standard simple types.

This is not a big restriction. If a user-defined type was really required for an argument, the function code can immediately apply the necessary constructor to the standard-typed argument and thus obtain the user-defined typed value.

Thank you for your work in this area. It has been a pleasure to read.

Cheers,
Dimitre Novatchev

23:48  
Florent Georges said...

Hi Dimitre

Thanks for your comment.

I agree that f:copy-with-type-1() is not the best name I have ever found for a function. I named it partly after the ELisp de facto convention for utility routines, and partly because it is the implementation of f:copy-with-type() for a single item as argument.

The order of the xsl:when clauses is clearly important. They are generated by data/func-copy-make-whens.xsl from data/xs-simple-types.xml (note the hierarchical relation between the types), and you can see the result in data/func-copy-whens.xml. I guess (I hope) this kind of relation could be inferred from XML Schemas, but I haven't written such a generator yet.

About your second comment, I'm a little bit skeptical. The first point is that the result of calling a function could be very different if the type of the argument is not preserved. This is more true for nodes, but it is still true for atomic items.

The second point is that the user will choose whether he needs an exact type-preserving copy for user-defined simple types. If he doesn't, he'll just use the built-in implementation (the only one we can provide, aware only of the standard simple types). If he does, he can provide a custom resolving function (maybe generated).

The user will be able to register a default resolving function, but also to provide different resolving functions at several call points of f:copy-with-type(), as an additional argument.

But all of this still needs to be proved by an implementation...

Thanks a lot. Regards,

--drkm

03:06  
