Friday, October 02, 2009

EXPath Packaging System prototype implementation for Saxon

Introduction

After releasing a first implementation of the EXPath Packaging System for eXist, here is a version for Saxon. You can read the previous blog entry to get more information on the packaging system; in particular, it says: "The concept is quite simple: defining a package format to enable users to install libraries in their processor with just a few clicks, and to enable library authors to provide a single package to be installed on every processors, without the need to document (and maintain) the installation process for each of them."

The package manager for Saxon is a graphical application (a textual front-end will be provided soon) and is provided as a single JAR file. Go to the implementations page, or use the direct link to get the JAR. Run it as usual, for instance by double-clicking on it or by executing the command java -jar expath-pkg-saxon-0.1.jar. That will launch the package manager window.

Repositories

The implementation for Saxon differs from the one for eXist in a fundamental way: Saxon does not have a home directory where you can put the installed packages, and you can invoke Saxon in many different ways (while the eXist core is always started the same way). Package management with Saxon therefore involves two different aspects: the package manager itself, which installs and removes packages, and a way to configure Saxon, regardless of how you invoke it. In addition, because Saxon has no home directory, we need to introduce the concept of a package repository.

A repository is a directory dedicated to installed packages, and should only be modified through the package manager. It contains the packages themselves (in a form usable by Saxon) as well as the administrative information needed to use them (catalogs, etc.). The graphical package manager lets you create a new repository directly from the graphical interface, as well as switch between different repositories (if you need to maintain several repositories for several purposes).

Importing stylesheet

But as I said above, having a repository full of packages is not enough. You have to configure Saxon to use this repository. Because you can invoke Saxon in plenty of ways, the configuration itself is implemented as a Java helper class that you can use in your own code if you invoke Saxon from within Java (for instance in a Java EE web application). If you use Saxon from the command line, there is a script that takes care of configuring everything for you.

But before looking in detail at how to configure Saxon to use a repository, let's have a look at how a stylesheet can use an installed package. This is the whole point of the packaging system, after all. The goal is simply to be able to use a public import URI in an import statement, this URI being automatically resolved to its local copy in the repository. Just as a namespace URI is only a kind of identifier (it is used as a plain string; your processor does not try to actually access anything at that address), the public import URI is an identifier for a specific stylesheet. This mechanism also supports functions implemented in Java. So all you need to do is use this public URI, as in the following:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="http://www.example.org/hello"
                version="2.0">

   <xsl:import href="http://www.example.org/hello.xsl"/>

   <xsl:template ...>
      ...
      <xsl:value-of select="h:hello('world')"/>

For XQuery, this is a bit different, as XQuery does have a module system. But it is actually very similar. XQuery library modules are identified by their namespace URI. Once again, it can be seen as a public identifier for that XQuery module. So let's say we have an XQuery library module for the namespace URI http://www.example.org/hello; you can then simply write a module that imports it as follows:

import module namespace h = "http://www.example.org/hello";
h:hello('world')

And that's it! In the package samples section below, you can see complete examples of such importing stylesheets and queries, as well as the packages they use.
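To make the idea of "a public URI resolved to a local copy" more concrete, here is a minimal sketch using the standard JAXP URIResolver interface. It is only an illustration of the principle, not how EXPath Pkg is implemented (the repository relies on catalogs); the class name ImportUriResolverSketch and the local path repo/hello/hello.xsl are hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration: maps public import URIs to local files. */
final class ImportUriResolverSketch implements URIResolver {

    // public import URI -> local copy (the entries are purely illustrative)
    private final Map<String, File> localCopies = new HashMap<>();

    void register(String publicUri, File localFile) {
        localCopies.put(publicUri, localFile);
    }

    @Override
    public Source resolve(String href, String base) {
        File local = localCopies.get(href);
        if (local == null) {
            // Unknown URI: returning null lets the processor resolve it itself.
            return null;
        }
        // The URI is used as a plain identifier; nothing is fetched from the web.
        return new StreamSource(local);
    }

    public static void main(String[] args) {
        ImportUriResolverSketch resolver = new ImportUriResolverSketch();
        resolver.register("http://www.example.org/hello.xsl",
                          new File("repo/hello/hello.xsl")); // assumed location
        Source src = resolver.resolve("http://www.example.org/hello.xsl", null);
        System.out.println(src.getSystemId());
    }
}
```

With plain JAXP, such a resolver could be passed to TransformerFactory.setURIResolver() so that it is consulted for xsl:import and xsl:include.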

Java configuration

To configure Saxon to use a repository from Java, you need to get a Configuration object. This is a central class in Saxon, which is used almost everywhere in the Saxon code base. You can get it from a Saxon TransformerFactory or from an S9API Processor. With that object on the one hand, and a File object pointing to the repository directory on the other, you can just call:

// the repo directory
File          repo   = ...;
// the Saxon config object
Configuration config = ...;
// the EXPath Pkg configurer
ConfigHelper  helper = new ConfigHelper(repo);
// actually configure Saxon
helper.config(config);

Besides the Java code itself, you have to make sure (1) that there is an actual repository at the location you pass to the ConfigHelper constructor and (2) that the JAR files containing the extension functions written in Java, and the JAR files they use, are on your classpath. The only exception to this rule is when you register such an extension function (written in Java) with Saxon 9.2; in this case EXPath Pkg will try to dynamically add the JAR files from the repository to the classpath. But playing with the classpath at runtime is not something I would recommend in Java.
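As a rough aid for the two checks above, the following sketch verifies that the repository directory exists and lists the JAR files found under it, so you can add them to your classpath yourself. It assumes only that extension JARs are stored as .jar files somewhere below the repository directory; the actual repository layout is not described here, and the class RepoClasspathSketch is hypothetical, not part of EXPath Pkg.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds the JAR files stored under a repository directory. */
final class RepoClasspathSketch {

    /** Returns every *.jar file below the repository, in sorted order. */
    static List<Path> findJars(Path repo) {
        if (!Files.isDirectory(repo)) {
            throw new IllegalArgumentException("Not a repository directory: " + repo);
        }
        try (Stream<Path> files = Files.walk(repo)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".jar"))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Joins the JAR paths with the platform's classpath separator. */
    static String toClasspath(List<Path> jars) {
        return jars.stream()
                   .map(Path::toString)
                   .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        // Pass the repository directory as the first argument.
        Path repo = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(toClasspath(findJars(repo)));
    }
}
```

The printed string can be appended to the -cp option of the java command that runs your application.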

Shell script

When using Saxon from the command line, EXPath Pkg comes with an alternate class to launch Saxon (this class automatically uses ConfigHelper to configure Saxon) as well as with a shell script to launch Saxon with the correct classpath.

To use this shell script (only available on Unix-like systems for now, including Cygwin under Windows) you have to set the environment variables SAXON_HOME to the directory where you put the Saxon JAR files, EXPATH_PKG_JAR to the EXPath Pkg JAR file, and APACHE_XML_RESOLVER_JAR to the XML Resolver JAR file from Apache. Additionally, you can set EXPATH_REPO to the repository directory, so that you do not have to explicitly give it as an option each time you invoke Saxon. If all the above environment variables have been correctly set, and the script added to your PATH, you can just invoke Saxon as usual: saxon -s:source.xml -xsl:stylesheet.xsl.

Use saxon --help to get the usage help of this script. You can set the EXPath repository (and thus override EXPATH_REPO if it is set) with the option --repo=. You can add items to the classpath with the option --add-cp=. You can set the classpath (thus overriding SAXON_HOME and the other environment variables) with the option --cp=. The script detects whether Saxon SA is present, and if so uses the SA version. You can force either the B or the SA version with --b or --sa. You can also pass any option to the Java Virtual Machine by using --java=, for instance to set a system property, and use --mem= to set the amount of memory of the virtual machine (a shortcut for the Java option -Xmx). And finally, you can also set the HTTP and HTTPS proxy information with --proxy=host:port (for instance --proxy=proxyhost:8080).
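The rules described above (three required environment variables, and --repo= taking precedence over EXPATH_REPO) can be summarised in a few lines of Java. This is only a sketch of the rules as stated in this post, not the script's actual code; the class LauncherEnvSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical check mirroring the environment the shell script expects. */
final class LauncherEnvSketch {

    // The three variables the post says must be set.
    static final List<String> REQUIRED = Arrays.asList(
        "SAXON_HOME", "EXPATH_PKG_JAR", "APACHE_XML_RESOLVER_JAR");

    /** Returns the names of the required variables that are unset or empty. */
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** The --repo= option wins over EXPATH_REPO; null when neither is given. */
    static String effectiveRepo(String repoOption, Map<String, String> env) {
        if (repoOption != null && !repoOption.isEmpty()) {
            return repoOption;
        }
        return env.get("EXPATH_REPO");
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("Missing variables: " + missingVariables(env));
        System.out.println("Repository: " + effectiveRepo(null, env));
    }
}
```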

Package samples

The first example is a packaged version of Priscilla Walmsley's FunctX. This package contains both the XSLT and the XQuery versions of this library. Of course, the XQuery module defines a module namespace, but the XSLT stylesheet does not have any public import URI (as this concept is outside the standard). I chose the URI http://www.functx.com/functx-1.0.xsl, but keep in mind this is not official by any means; it is just the URI I chose. It is intended that library authors package their own libraries and choose the public URIs themselves.

The package itself is a plain ZIP file. If you open it or unzip it with your preferred tool, you can see that at the top level, there is a file named expath-pkg.xml. This is the package descriptor, which defines what the package contains (at least what is publicly exported from the package, that is, what can be used from within a stylesheet or a query). In the case of this FunctX package, this descriptor looks like:

<package xmlns="http://expath.org/mod/expath-pkg">
   <module version="1.0" name="functx">
      <title>FunctX library for XQuery 1.0 and XSLT 2.0</title>
      <xsl>
         <import-uri>http://www.functx.com/functx-1.0.xsl</import-uri>
         <file>functx-1.0-doc-2007-01.xsl</file>
      </xsl>
      <xquery>
         <namespace>http://www.functx.com</namespace>
         <file>functx-1.0-doc-2007-01.xq</file>
      </xquery>
   </module>
</package>
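To see what a tool can extract from such a descriptor, here is a small sketch that reads it with the standard Java DOM API and lists the public identifiers it exports. It only illustrates the structure shown above; the class DescriptorSketch is hypothetical and is not the code used by the package manager.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the expath-pkg.xml descriptor shown above. */
final class DescriptorSketch {

    // Namespace of the descriptor elements, as used in the FunctX example.
    static final String PKG_NS = "http://expath.org/mod/expath-pkg";

    /** Returns the text of every descriptor element with the given local name. */
    static List<String> values(String descriptorXml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(descriptorXml)));
            NodeList nodes = doc.getElementsByTagNameNS(PKG_NS, localName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse descriptor", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<package xmlns=\"http://expath.org/mod/expath-pkg\">"
            + "<module version=\"1.0\" name=\"functx\">"
            + "<xsl><import-uri>http://www.functx.com/functx-1.0.xsl</import-uri>"
            + "<file>functx-1.0-doc-2007-01.xsl</file></xsl>"
            + "<xquery><namespace>http://www.functx.com</namespace>"
            + "<file>functx-1.0-doc-2007-01.xq</file></xquery>"
            + "</module></package>";
        System.out.println("XSLT import URIs:  " + values(xml, "import-uri"));
        System.out.println("XQuery namespaces: " + values(xml, "namespace"));
    }
}
```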

To install the package, just download it to a temporary location, launch the package manager as explained at the beginning of this blog post, choose "install" in the file menu, and choose the package on your filesystem. To test if it is correctly installed, write the following stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:f="http://www.functx.com"
                version="2.0">

   <xsl:import href="http://www.functx.com/functx-1.0.xsl"/>

   <xsl:template match="/" name="main">
      <result>
         <xsl:sequence select="f:date(1979, 9, 1)"/>
      </result>
   </xsl:template>

</xsl:stylesheet>

and/or the following XQuery main module (depending on what you want to test):

import module namespace f = "http://www.functx.com";

<result> {
   f:date(1979, 9, 1)
}
</result>

To evaluate them, make sure you configured the shell script correctly, as explained above, then open a shell and type one of the following commands (or both), where style.xsl is the file where you saved the above stylesheet and query.xq is the file where you saved the above query:

$ saxon -xsl:style.xsl -it:main
<result>1979-09-01</result>
$ saxon --xq query.xq
<result>1979-09-01</result>
$ 

If you prefer to test from Java, just write a simple main class that evaluates the above stylesheet and/or query, taking care to use ConfigHelper to set up the Saxon Configuration object. For instance, if you want to use S9API, you can configure the Processor object as follows (don't forget to add the EXPath Pkg and the Apache XML resolver JAR files to your classpath):

// the repo directory
File         repo   = new File("...");
// the EXPath Pkg configurer
ConfigHelper helper = new ConfigHelper(repo);
// the Saxon processor
Processor    proc   = new Processor(false);
// actually configure Saxon
helper.config(proc.getUnderlyingConfiguration());
// then use 'proc' as usual...

The second sample package provides a single function: ext:hello($who). It is written in Java. Besides other files related to the packaging itself, it contains a JAR file with the implementation of that extension function. To test it, just follow the same steps as for the FunctX package, except that you have to add the installed JAR file (from within the repository) to your classpath (this is done automatically for you if you use the shell script, but not if you test it from a Java program).

Conclusion

This is just a prototype implementation of a package manager for Saxon, consistent with the one for eXist. The main issue is the configuration of the classpath, but I think this is better left to the user than handled by the package manager, in particular within the context of a Java EE application. This issue also shows up in your IDE configuration. For now, I configure oXygen by adding the catalogs from the repository to oXygen's main catalog list, and the extension JAR files to the oXygen classpath, so the built-in Saxon processors can be used exactly as usual. But such issues could be resolved by native support built right into the processors and IDEs.

Besides this classpath issue, I am convinced that package management will really improve the current situation, and could be the missing piece needed to distribute real general-purpose libraries for XQuery and XSLT, as well as one of the foundations for other systems, like an implementation-independent XRX system.
