Sunday, November 15, 2009

EXPath Packaging System: the on-disk repository layout

While implementing the EXPath Packaging System for Calabash, I found myself writing, once again, a repository manager dedicated to a single processor, exactly as I had done for Saxon a month earlier. Why? Both repositories provide the same features. It should therefore be possible for Calabash and Saxon to share the same repository, provided Saxon simply ignores components other than XSLT and XQuery (for instance XProc pipelines) in that repository. One would then have to maintain only a single repository for a whole computer (or one repository dedicated to a single project, like a Java EE application).

Going further, I think the layout of such an on-disk repository should be part of the packaging specification itself. An implementation does not have to use such a standard repository, but if it does, it doesn't have to worry about package installation, repository management software, or even the mechanism that resolves a component URI to the actual file containing that component. One repository layout, one set of tools for all those tasks.

This introduces a new concept: each kind of component (XSLT, XQuery, XML Schema, etc.) has its own URI space. For instance, when Saxon runs a transform, it resolves xsl:import URIs only in the XSLT space; Calabash uses the appropriate space for each step. The resolving machinery is based on OASIS XML Catalogs, and the repository has one top-level catalog for each URI space.

Seen as a whole, the repository is a set of subdirectories, one per installed package. Each package is unzipped exactly as it was created (with the exact same files and the exact same structure). One of those direct subdirectories is special: its name is .expath-pkg/ and it contains the catalogs and other administrative files. It can also contain configuration files dedicated to a specific processor; for instance, the extensions written in Java for Saxon need a configuration file to be stored there. There is one top-level catalog for each URI space in the repository, and each package has one catalog for each URI space it contains. The top-level catalogs just point to all the existing package-level catalogs.

repo/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      .saxon/
         ...        [Saxon-specific stuff at the repository level]
      lib1/
         xquery-catalog.xml
         xslt-catalog.xml
         saxon/
            ...     [Saxon-specific stuff in lib1]
      lib2/
         ...
   lib1/
      query.xq
      style.xsl
   lib2/
      ...
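
To make the catalog chaining concrete, here is a minimal shell sketch that writes by hand the two XSLT catalogs for lib1 in the layout above. The catalog contents are an assumption (the files actually generated by the repository manager are not shown in this post), and the import URI http://example.org/lib1/style.xsl is hypothetical; only the catalog namespace and the nextCatalog and uri elements come from the OASIS XML Catalogs standard.

```shell
#!/bin/sh
# Sketch only: hand-written catalogs for the layout above, not real xrepo output.
# REPO is a throwaway directory unless the caller sets it.
REPO="${REPO:-$(mktemp -d)/repo}"
mkdir -p "$REPO/.expath-pkg/lib1" "$REPO/lib1"

# Empty placeholder standing in for the packaged stylesheet.
: > "$REPO/lib1/style.xsl"

# Top-level catalog for the XSLT URI space: it only delegates to the
# package-level catalogs. Relative paths resolve against this file's location.
cat > "$REPO/.expath-pkg/xslt-catalog.xml" <<'EOF'
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <nextCatalog catalog="lib1/xslt-catalog.xml"/>
</catalog>
EOF

# Package-level catalog: maps the (hypothetical) public import URI of lib1
# to the stylesheet as it was unzipped in the repository.
cat > "$REPO/.expath-pkg/lib1/xslt-catalog.xml" <<'EOF'
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <uri name="http://example.org/lib1/style.xsl"
        uri="../../lib1/style.xsl"/>
</catalog>
EOF

echo "catalogs written under $REPO/.expath-pkg"
```

Under these assumptions, a resolver pointed at the top-level xslt-catalog.xml would follow nextCatalog into the catalog of lib1 and map the import URI to repo/lib1/style.xsl, while the XQuery catalogs sitting next to it would never be consulted for an xsl:import.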

There is a specific project aimed only at managing such a repository. For now it offers only a command-line interface, but a graphical interface should follow in the near future. The same project provides helpers for other Java-based applications that want to use repositories. For instance, the implementations for Saxon and Calabash use this JAR file to get resolving support for some URI spaces, based on Norman Walsh's resolver for XML Catalogs. It could likewise be used in applications like Kernow and oXygen, or even in eXist. The following steps set up the repository management application, Saxon and Calabash, so as to get a usable packaging system.

  • 1/ download expath-pkg-repo-0.1.jar. I created a shell script on my system so that I can invoke it by typing just xrepo, but it is a plain JAR file that you can run with java -jar expath-pkg-repo-0.1.jar. Hereafter I simply use xrepo to refer to this application.
  • 2/ set $EXPATH_REPO, for instance to ~/share/expath/repo or to /usr/local/share/expath/repo or to c:/expath/repo
  • 3/ initialize the repository with xrepo create $EXPATH_REPO
  • 4/ put the saxon and calabash scripts into your $PATH; they rely on the environment variables described in the next two steps
  • 5/ set SAXON_CP to the classpath required to execute Saxon; it must contain the following JARs: saxon9he.jar (or any other version), resolver.jar, expath-pkg-repo-0.1.jar and expath-pkg-saxon-0.2.jar
  • 6/ set CALABASH_CP to the classpath required to execute Calabash; it must contain the following JARs: my modified version of Calabash, saxon9he.jar (or any other 9.2 version), resolver.jar, expath-pkg-repo-0.1.jar, expath-pkg-saxon-0.2.jar and expath-pkg-calabash-0.1.jar
  • 4b/ instead of steps 4, 5 and 6 (for example if you do not have a Unix shell), you can just create a simple script with the appropriate classpath and Java command to launch Saxon, as well as one for Calabash. The only drawback is that the JAR files for extensions written in Java for Saxon won't be picked up automatically from the repository
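
The environment described in steps 1 to 6 could be gathered in a small file sourced from a shell profile. This is only a sketch: JAR_DIR and the file name calabash.jar are hypothetical (the modified Calabash build is not named here), and the saxon and calabash launcher scripts themselves are not reproduced.

```shell
#!/bin/sh
# Sketch of an environment setup; JAR_DIR is an assumed location for the JARs.
JAR_DIR="${JAR_DIR:-$HOME/lib/java}"
EXPATH_REPO="${EXPATH_REPO:-$HOME/share/expath/repo}"

# Join JAR file names into a classpath rooted at $JAR_DIR, separated by ':'
# (the Unix separator; Windows uses ';').
join_cp() {
    classpath=""
    for jar in "$@"; do
        classpath="${classpath:+$classpath:}$JAR_DIR/$jar"
    done
    printf '%s\n' "$classpath"
}

# Step 1: shorthand for the repository manager.
xrepo() {
    java -jar "$JAR_DIR/expath-pkg-repo-0.1.jar" "$@"
}

# Step 5: classpath for Saxon.
SAXON_CP=$(join_cp saxon9he.jar resolver.jar \
    expath-pkg-repo-0.1.jar expath-pkg-saxon-0.2.jar)

# Step 6: classpath for Calabash (calabash.jar is a hypothetical file name).
CALABASH_CP=$(join_cp calabash.jar saxon9he.jar resolver.jar \
    expath-pkg-repo-0.1.jar expath-pkg-saxon-0.2.jar \
    expath-pkg-calabash-0.1.jar)

export EXPATH_REPO SAXON_CP CALABASH_CP
```

Once this file is sourced, running xrepo create "$EXPATH_REPO" would correspond to step 3.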

We are now going to test the EXPath HTTP Client, delivered as a XAR file. First, we create three test files: an XSLT stylesheet, an XQuery main module and an XProc pipeline. All those files are simple and use the extension function http:send-request() to send an HTTP request to a website, get the result, and extract the HTML title. Save them somewhere as, say, http-client-test.xsl, http-client-test.xq and http-client-test.xproc:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:http="http://www.expath.org/mod/http-client"
                xmlns:h="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="http h"
                version="2.0">

   <xsl:import href="http://www.expath.org/mod/http-client.xsl"/>

   <xsl:template name="main">
      <xsl:variable name="request" as="element()">
         <http:request href="http://www.fgeorges.org/" method="get"/>
      </xsl:variable>
      <title>
         <xsl:value-of select="http:send-request($request)
                                 / h:html/h:head/h:title"/>
      </title>
   </xsl:template>

</xsl:stylesheet>

import module namespace http = "http://www.expath.org/mod/http-client";
declare namespace h = "http://www.w3.org/1999/xhtml";

http:send-request(
   <http:request href="http://www.fgeorges.org/" method="get"/>
)
  / h:html/h:head/h:title

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step">

   <p:input  port="source"/>
   <p:output port="result"/>

   <p:xslt template-name="main">
      <p:input port="stylesheet">
         <p:document href="http-client-test.xsl"/>
      </p:input>
      <p:input port="parameters">
         <p:empty/>
      </p:input>
   </p:xslt>

</p:declare-step>

If you try to evaluate those test files before installing the package, you will get errors from Saxon and Calabash (disclaimer: I rewrote the output of both processors, just to make it easier to read, but the meaning stays intact):

$ saxon -xsl:http-client-test.xsl -it:main
File not found: http://www.expath.org/mod/http-client.xsl

$ saxon --xq http-client-test.xq
Cannot locate module for namespace http://www.expath.org/mod/http-client

$ calabash http-client-test.xproc
File not found: http://www.expath.org/mod/http-client.xsl

Now, install the package directly from the Internet (just press ENTER at both questions from the installer to keep the default values), then run the test files again:

$ xrepo install http://www.cxan.org/tmp/expath-http-client-0.1.xar
Install module EXPath HTTP Client? [true]: 
Install it to dir [expath-http-client-0.1]: 

$ saxon -xsl:http-client-test.xsl -it:main
<title>Florent Georges</title>

$ saxon --xq http-client-test.xq
<title xmlns="http://www.w3.org/1999/xhtml">Florent Georges</title>

$ calabash http-client-test.xproc
<title>Florent Georges</title>

While I think the runtime support for packaging is best handled in each processor's internals, a common repository layout (and actually shared repositories) could make that support easier for processors to implement and, above all, allow a single set of independent applications to manage repositories and packages.

The next step is, finally, to release a new version of the specification, including this repository layout. See the EXPath Packaging page for more information, and subscribe to the EXPath mailing list to stay tuned.
