Wednesday, September 07, 2011

Packaging extension steps for Calabash

In my previous blog post, I introduced how to develop an extension XProc step in Java, for the Calabash processor. Even though writing such an extension is quite easy when you know what to do, the configuration part for the end user is quite tricky. That complexity could be enough for a potential user to give up even before he/she is able to run an example using your extension step. See the previous blog entry for details, but basically the user has to configure the classpath for Calabash with your JAR and all its dependencies, point to your config file when launching Calabash, and import your library into the main pipeline (after having decided where to install your extension step).

At the end of the previous post, I introduced the idea of having such extension steps, written in Java for Calabash, supported out of the box by the repository implementing the Packaging System. I played a little bit with the idea and came up with the following design (and implementation). Of course you still have to provide the same information (the step interface, its implementation, and the link between its type and the class implementing it), but the goal is to enable the author to do it once and for all, so the user can simply use the following commands to install the package and run a pipeline using it:

> xrepo install http://example.org/path/to/your-package.xar
> calabash pipeline.xproc
...
The only constraint on the user is to use the absolute URI you defined to import the XProc library you wrote with the step interface declaration. This absolute URI will be resolved automatically into the user's local repository, and the repository system will configure Calabash with the Java code automatically. In order to achieve that goal, you, as an extension step author, have to provide a package with the following structure:

expath-pkg.xml
calabash.xml
your-steps/
   your-steps-lib.xpl
   your-steps.jar
   dependency.jar

This structure looks familiar to anyone who knows the structure of a standard package: the package descriptor, expath-pkg.xml, contains meta-information about the package and its content, and the package directory holds the components, that is, the content of the package itself. There is also a second descriptor, specific to Calabash: calabash.xml. In this case, the content of the package is an XProc library containing the step declarations, the JAR file with the compiled Java implementation of your extension steps, and all its dependencies (the other Java libraries it uses). Let's see how the two descriptors carry all the information needed to use the extension steps. First the standard package descriptor, expath-pkg.xml:

<package xmlns="http://expath.org/ns/pkg"
         name="http://example.org/lib/your-steps"
         abbrev="your-steps"
         version="0.1.0"
         spec="1.0">

   <title>Your XProc steps for Calabash</title>

   <dependency processor="http://xmlcalabash.com/"/>

   <xproc>
      <import-uri>http://example.org/your-steps/lib.xpl</import-uri>
      <file>your-steps-lib.xpl</file>
   </xproc>

</package>

Besides the usual information about the package (its name, textual description, version number, etc.), we state that this package is specific to Calabash (by depending on that processor). We also declare a public component, a standard XProc library, by assigning a public, absolute URI to it, and by linking to its file by name, within the package content. Keep in mind that this library only declares the step interfaces and is standard XProc; it remains the same even if there are several implementations. The library itself is:

<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:y="http://example.org/ns/your-steps"
           version="1.0">

   <p:declare-step type="y:some-of-your-steps">
      <p:input  port="source" primary="true"/>
      <p:output port="result" primary="true"/>
      <p:option name="username"/>
   </p:declare-step>

   <p:declare-step type="y:another-one">
      <p:output port="result" primary="true"/>
   </p:declare-step>

</p:library>
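To make the role of the standard descriptor concrete, here is a minimal sketch of how a tool could read expath-pkg.xml and recover the mapping from the public import URI to the library file. The class `PackageDescriptor` and its method are hypothetical names used for illustration only; this is not the code of the repository implementation, and it assumes every `xproc` element has both an `import-uri` and a `file` child.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class PackageDescriptor {
    static final String PKG_NS = "http://expath.org/ns/pkg";

    /** Returns the XProc components as a map: public import URI -> file name in the package content. */
    static Map<String, String> xprocComponents(String descriptorXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Cannot parse the package descriptor", ex);
        }
        Map<String, String> components = new LinkedHashMap<>();
        NodeList xprocs = doc.getElementsByTagNameNS(PKG_NS, "xproc");
        for (int i = 0; i < xprocs.getLength(); i++) {
            Element xproc = (Element) xprocs.item(i);
            components.put(text(xproc, "import-uri"), text(xproc, "file"));
        }
        return components;
    }

    // Assumes the child element exists (a sketch: no validation of the descriptor).
    private static String text(Element parent, String localName) {
        return parent.getElementsByTagNameNS(PKG_NS, localName)
                .item(0).getTextContent().trim();
    }
}
```

With the descriptor shown earlier, the map would contain a single entry linking http://example.org/your-steps/lib.xpl to your-steps-lib.xpl.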

Finally, the second descriptor, specific to Calabash and named calabash.xml, describes the Java implementation: the JAR files to add to the classpath, and the Java class implementing each of the extension step types:

<package xmlns="http://xmlcalabash.com/ns/expath-pkg">

   <jar>your-steps.jar</jar>
   <jar>dependency.jar</jar>

   <step>
      <type>{http://example.org/ns/your-steps}some-of-your-steps</type>
      <class>org.example.yours.SomeStep</class>
   </step>

   <step>
      <type>{http://example.org/ns/your-steps}another-one</type>
      <class>org.example.yours.AnotherStep</class>
   </step>

</package>

The JAR files are referenced by filename (relative to the package content dir), the step types are identified by their QName (using Clark notation, to represent both the namespace URI and the local name as one single string), and the implementation class is referenced by its fully qualified name.
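As an illustration of how little is needed to consume this descriptor, here is a minimal sketch that reads a calabash.xml document with the JDK's DOM API. The class `CalabashDescriptor` is a hypothetical name, not part of Calabash or of the repository implementation. The one convenient fact it relies on is that the standard `javax.xml.namespace.QName.valueOf` accepts exactly the Clark notation used in the `type` elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class CalabashDescriptor {
    static final String NS = "http://xmlcalabash.com/ns/expath-pkg";

    /** JAR file names, relative to the package content dir. */
    final List<String> jars = new ArrayList<>();
    /** Step type (as a QName) -> fully qualified name of the implementation class. */
    final Map<QName, String> steps = new LinkedHashMap<>();

    static CalabashDescriptor parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Cannot parse the Calabash descriptor", ex);
        }
        CalabashDescriptor desc = new CalabashDescriptor();
        NodeList jarElems = doc.getElementsByTagNameNS(NS, "jar");
        for (int i = 0; i < jarElems.getLength(); i++) {
            desc.jars.add(jarElems.item(i).getTextContent().trim());
        }
        NodeList stepElems = doc.getElementsByTagNameNS(NS, "step");
        for (int i = 0; i < stepElems.getLength(); i++) {
            Element step = (Element) stepElems.item(i);
            // QName.valueOf() understands the Clark notation "{namespace-uri}local-name".
            QName type = QName.valueOf(child(step, "type"));
            desc.steps.put(type, child(step, "class"));
        }
        return desc;
    }

    // Assumes the child element exists (a sketch: no validation of the descriptor).
    private static String child(Element parent, String localName) {
        return parent.getElementsByTagNameNS(NS, localName).item(0).getTextContent().trim();
    }
}
```

From there, a repository implementation has what it needs to add the JAR files to the classpath and to register each class against its step type.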

The package author just has to respect those conventions and to provide those two descriptors. He/she can package everything up by zipping this into one single ZIP file (usually using the extension *.xar, for XML ARchive). He/she is then able to publish and distribute the package to users. If the users have support for the packages, the only piece of documentation to provide is the public URI of the XProc library, to import it into their own pipeline.
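Since a XAR file is a plain ZIP file, any ZIP tool does the job. For those who prefer to build it from Java, here is a minimal sketch using only `java.util.zip`. The class `XarBuilder` is a hypothetical name, and the entries are passed in memory (path in the archive mapped to content) to keep the example short; a real build would rather read them from disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class XarBuilder {
    /**
     * Zips the given entries (path inside the archive -> content) and
     * returns the bytes of the resulting XAR file.
     */
    static byte[] build(Map<String, byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(entry.getKey()));
                zip.write(entry.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // The ZIP stream is closed at this point, so the archive is complete.
        return bytes.toByteArray();
    }
}
```

The paths to use as keys are those of the structure shown above: expath-pkg.xml and calabash.xml at the root, and the components under your-steps/.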

An interesting point is that this strategy is usable as well for private extensions. Let's take the set of XSLT 2.0 stylesheets for DocBook for instance. A pipeline, or even a set of pipelines, might make perfect sense to drive some processing using this large application. If that processing needs some extensions to the standard languages, then it is possible to write extension steps for Calabash, integrate them within the package with the standard XSLT stylesheets and XProc pipelines, and use them internally. If the XProc library declaring the steps is not publicly exposed in the package descriptor, then only the other components in the package itself can use it.

In that case, a user using Calabash just installs the package like any other package, and does not have to, you know, configure the extensions...


Sunday, September 04, 2011

Writing an extension step for Calabash, to use BaseX

Introduction

Writing an extension for Calabash in Java involves three different things: 1/ the Java class itself, which has to implement the interface XProcStep, 2/ binding a step name to the implementation class, and 3/ declaring the step in XProc.

Java

Let's take, as an example, a step evaluating a query using the standalone BaseX processor. The goal is not to have a fully functional step, nor to have a best-quality-ever step with error reporting and such, but rather to emphasize how to glue all the things together. The step has one input port, named source, and one output port, named result. The step gets the string value of the input port (typically a c:query element) and evaluates it as an XQuery, using BaseX. The result is parsed as an XML document and sent to the output port (it is a parse error if the result of the query is not an XML document or element). Let's start with the Java class implementing the extension step:

/****************************************************************************/
/*  File:       BasexStandaloneQuery.java                                   */
/*  Author:     F. Georges - H2O Consulting                                 */
/*  Date:       2011-08-31                                                  */
/*  Tags:                                                                   */
/*      Copyright (c) 2011 Florent Georges.                                 */
/* ------------------------------------------------------------------------ */


package org.fgeorges.test;

import com.xmlcalabash.core.XProcException;
import com.xmlcalabash.core.XProcRuntime;
import com.xmlcalabash.io.ReadablePipe;
import com.xmlcalabash.io.WritablePipe;
import com.xmlcalabash.library.DefaultStep;
import com.xmlcalabash.runtime.XAtomicStep;
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;
import org.basex.core.BaseXException;
import org.basex.core.Context;
import org.basex.core.cmd.XQuery;


/**
 * Sample extension step to evaluate a query using BaseX.
 *
 * @author Florent Georges
 * @date   2011-08-31
 */
public class BasexStandaloneQuery
        extends DefaultStep
{
    public BasexStandaloneQuery(XProcRuntime runtime, XAtomicStep step)
    {
        super(runtime,step);
    }

    @Override
    public void setInput(String port, ReadablePipe pipe)
    {
        mySource = pipe;
    }

    @Override
    public void setOutput(String port, WritablePipe pipe)
    {
        myResult = pipe;
    }

    @Override
    public void reset()
    {
        mySource.resetReader();
        myResult.resetWriter();
    }

    @Override
    public void run()
            throws SaxonApiException
    {
        super.run();

        XdmNode query_doc = mySource.read();
        String query_txt = query_doc.getStringValue();
        XQuery query = new XQuery(query_txt);
        Context ctxt = new Context();
        // TODO: There should be something more efficient than serializing
        // everything and parsing it again...  Plus, if the result is not an XML
        // document, wrap it into a c:data element.  But that's beyond the point.
        String result;
        try {
            result = query.execute(ctxt);
        }
        catch ( BaseXException ex ) {
            throw new XProcException("Error executing a query with BaseX", ex);
        }
        DocumentBuilder builder = runtime.getProcessor().newDocumentBuilder();
        Source src = new StreamSource(new StringReader(result));
        XdmNode doc = builder.build(src);

        myResult.write(doc);
    }

    private ReadablePipe mySource = null;
    private WritablePipe myResult = null;
}

An extension step has to implement the Calabash interface XProcStep. Calabash provides a convenient class, DefaultStep, that implements all the methods with default behaviour suitable for most uses. The only things we have to do are to save the input and output pipes for later use, to reset them in case the step object is reused, and of course to provide the main processing in run(). In the run() method, we read the document from the source port, get its string value, execute it as a query using the BaseX API, and parse the result as XML to write it to the result port.
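The "serialize then parse again" part of run() can be tried in isolation, without Calabash or BaseX on the classpath. The following minimal sketch uses the JDK's own DOM parser instead of Saxon's DocumentBuilder, so it is an analogy of what the step does rather than the same code; the class name `QueryResultParser` is hypothetical. It shows why a query result that is not an XML document or element ends up as a parse error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only (JDK parser, not Saxon).
class QueryResultParser {
    /**
     * Parses the serialized result of a query as an XML document.
     * Throws IllegalArgumentException if the text is not well-formed XML,
     * for instance if the query returned an atomic value or a sequence.
     */
    static Document parse(String serialized) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(serialized)));
        } catch (SAXException ex) {
            throw new IllegalArgumentException("The query result is not well-formed XML", ex);
        } catch (ParserConfigurationException | IOException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

A string containing a single element parses fine, whereas a string such as "1 2 3" is rejected, which is the case the TODO comment in the step suggests wrapping into a c:data element.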

As you can see, there is nothing in the class itself about the interface of the step: its type name, its inputs and outputs, its options, etc. This is done in two different places. First you link the step type to the implementation class, then you declare the step with XProc.

Tell Calabash about the class

Linking the step type to the implementation class is done in a Calabash config file. So you have to create a new config file, and pass it to Calabash on the command line with the option --config (abbreviated -c). The file itself is very simple, and links the step type (a QName) to the class (a fully qualified Java class name):

<xproc-config xmlns="http://xmlcalabash.com/ns/configuration"
              xmlns:fg="http://fgeorges.org/ns/tmp/basex">

   <implementation type="fg:ad-hoc-query"
                   class-name="org.fgeorges.test.BasexStandaloneQuery"/>

</xproc-config>
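To give an idea of what such a binding enables, here is a minimal sketch of a registry that maps a step type to a class name and instantiates the class by reflection. It is not Calabash's actual code: the class `StepRegistry` is hypothetical, and it calls a no-argument constructor to stay self-contained, whereas the step class above is constructed with an XProcRuntime and an XAtomicStep.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical registry, for illustration only.
class StepRegistry {
    private final Map<QName, String> classNames = new HashMap<>();

    /** Records the binding between a step type and its implementation class. */
    void register(QName type, String className) {
        classNames.put(type, className);
    }

    /** Creates a new instance of the class bound to the given step type. */
    Object instantiate(QName type) {
        String name = classNames.get(type);
        if (name == null) {
            throw new IllegalArgumentException("No implementation registered for " + type);
        }
        try {
            // Assumption of this sketch: the class has a public no-argument constructor.
            return Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot instantiate " + name, ex);
        }
    }
}
```

This is also why the class must be on the classpath when Calabash runs: the config file only carries its name.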

Declare the step

Finally, declaring the step in XProc is done using the standard p:declare-step. If it contains no subpipeline (that is, if it contains only p:input, p:output and p:option children), then it is considered a declaration of a step whose implementation is somewhere else; if it contains a subpipeline, then this is a step type definition, with the implementation defined in XProc itself. The declaration can be copied and pasted into the main pipeline itself, but as with any other language, the best practice is rather to declare it in an XProc library and to import this library (composed only of step declarations) into the main pipeline using p:import. In our case, we define the step type to have an input port source, an output port result (both primary), and no options:

<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:fg="http://fgeorges.org/ns/tmp/basex"
           xmlns:pkg="http://expath.org/ns/pkg"
           pkg:import-uri="http://fgeorges.org/tmp/basex.xpl"
           version="1.0">

   <p:declare-step type="fg:ad-hoc-query">
      <p:input  port="source" primary="true"/>
      <p:output port="result" primary="true"/>
   </p:declare-step>

</p:library>

Using it

Now that we have all the pieces, we can write an example main pipeline using this new extension step:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:fg="http://fgeorges.org/ns/tmp/basex"
                name="pipeline"
                version="1.0">

   <p:import href="basex-lib.xpl"/>

   <p:output port="result" primary="true"/>

   <fg:ad-hoc-query>
      <p:input port="source">
         <p:inline>
            <c:query>
               &lt;res> { 1 + 1 } &lt;/res>
            </c:query>
         </p:inline>
      </p:input>
   </fg:ad-hoc-query>

</p:declare-step>

To run it, just issue the following command on the command line (where basex-steps.jar is the JAR file you compiled the extension step class into):

> java -cp ".../calabash.jar:.../basex-6.7.1.jar:.../basex-steps.jar" \
       com.xmlcalabash.drivers.Main \
       -c basex-config.xml \
       example.xproc

If you use this script, you can then use the following command:

> calabash ++add-cp .../basex-6.7.1.jar \
           ++add-cp .../basex-steps.jar \
           -c basex-config.xml \
           example.xproc

Packaging

Update: The mechanism described in this section has been implemented, see this blog entry.

If you want to publicly distribute your extension, you have to provide your users with 1/ the JAR file, 2/ the config file and 3/ the library file. Thus the user needs to correctly configure Java with the JAR file, to correctly configure Calabash with the config file, and to use a suitable URI in the p:import/@href in his/her pipeline. This is a lot of different places where the user can make a mistake.

The EXPath Packaging open-source implementation for Calabash does not support Java extension steps yet, but it is planned to support them, in order to handle that configuration part automatically. The goal is to have the library author define an absolute URI for the XProc library (declaring the steps), which the user uses in p:import, regardless of where it is actually installed (it will be resolved automatically). The details (classpath setting, XProc library resolving, and Calabash config) should then be handled by the packaging support. Once the package of the extension step has been installed in the repository, one can then execute the following pipeline (note the import URI has changed):

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:fg="http://fgeorges.org/ns/tmp/basex"
                name="pipeline"
                version="1.0">

   <p:import href="http://fgeorges.org/tmp/basex.xpl"/>

   <p:output port="result" primary="true"/>

   <fg:ad-hoc-query>
      <p:input port="source">
         <p:inline>
            <c:query>
               &lt;res> { 1 + 1 } &lt;/res>
            </c:query>
         </p:inline>
      </p:input>
   </fg:ad-hoc-query>

</p:declare-step>

by simply invoking the following command:

> calabash example.xproc
