Automate documentation with XProc and the GitHub Web API

for the Open Source framework transpect / @mkraetke


  1. Background
  2. Towards a modular architecture
  3. Establishing standards
  4. Implementation
  5. Outlook


code documentation before transpect

The picture, which appears to be from the Middle Ages, shows a monk writing in a book with blank pages. One page bears the text 'README'.

code documentation before transpect

  • only a few READMEs and wiki pages existed
  • some inline XSLT comments
  • no common methodology or standards

how the code was organized
before transpect

  • many repositories in our non-public SVN
  • code was copied from project to project
  • several languages used to glue XSLT pipelines together

Towards a modular architecture

Pipelining previously

  • pipelines glued together with Make, Ruby, Perl
  • XSLT micropipelines
  • dependent on preinstalled tools and the operating system

Pipelining with XProc

  • declarative vocabulary
  • port connections
  • interoperability

Encapsulation and modularity

Declaring canonical import URIs with XML Catalogs

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="" rewritePrefix=""/>
  <nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../xproc-util/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../xslt-util/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../mml2tex/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../xml2tex/xmlcatalog/catalog.xml"/>
</catalog>
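How a resolver applies a rewriteURI entry can be sketched in a few lines of Python. This only illustrates the longest-prefix rule from the OASIS XML Catalogs specification; it is not the resolver transpect uses. The `example.org` URIs, the local paths and the function name `rewrite_uri` are invented for the illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical catalog entries as (uriStartString, rewritePrefix) pairs.
# The example.org URIs and file paths are placeholders, not transpect's real ones.
REWRITE_RULES: List[Tuple[str, str]] = [
    ("http://example.org/", "file:/opt/modules/"),
    ("http://example.org/xml2tex/", "file:/opt/modules/xml2tex/"),
]


def rewrite_uri(uri: str, rules: List[Tuple[str, str]] = REWRITE_RULES) -> Optional[str]:
    """Return the rewritten URI, or None if no rewriteURI rule matches.

    When several entries match, the one with the longest
    uriStartString wins, as the XML Catalogs specification requires.
    """
    matches = [(start, prefix) for start, prefix in rules if uri.startswith(start)]
    if not matches:
        # A real resolver would now move on to later entries such as nextCatalog.
        return None
    start, prefix = max(matches, key=lambda rule: len(rule[0]))
    # Swap the matched prefix for the local one and keep the rest of the URI.
    return prefix + uri[len(start):]
```

With these made-up rules, a canonical import URI below `http://example.org/xml2tex/` resolves into the module's local checkout, which is what lets a pipeline import a step without knowing where it is installed.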

Import and use steps

<p:import href=""/>
<xml2tex:convert name="xml2tex">
  <p:input port="conf">
    <p:pipe port="result" step="load-config"/>
  </p:input>
  <p:with-option name="table-model" select="$table-model"/>
  <p:with-option name="table-grid" select="$table-grid"/>
  <p:with-option name="debug" select="$debug"/>
  <p:with-option name="debug-dir-uri" select="$debug-dir-uri"/>
  <p:with-option name="status-dir-uri" select="$status-dir-uri"/>
  <p:with-option name="fail-on-error" select="$fail-on-error"/>
</xml2tex:convert>

We named it transpect and published it under the FreeBSD license.

Still, only a few guides and some inline documentation existed, and there were no standards.

Establishing standards

Coding style

more details here

Naming conventions

  |  |--catalog.xml
  |  |--myStylesheet.css
  |  |--myPipeline.xpl
  |  |--myXSLT.xsl

Moving to GitHub: The good parts

SVN is our favorite version control system, but …

… we wanted to lower the barrier for external users to use our code, file bug reports, submit pull requests, etc.

The Good, the Bad and GitHub

  • GitHub downtimes (bad for our CI system)
  • the SVN adapter sometimes does not seem to work properly
  • Git submodules (Detached HEAD)


XProc inline documentation

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p=""
  xmlns:c="" version="1.0">
  <p:documentation xmlns="">
    The documentation may include 
    <a href="">HTML</a> 
    markup as well.
  </p:documentation>
  <p:input port="source">
    <p:inline>
      <doc>Hello world!</doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>
</p:declare-step>
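Reading such inline documentation back out of a step is straightforward. The following Python sketch uses the standard library's ElementTree and is an illustration only; transpect does this in XProc and XSLT. The sample step, the XHTML namespace on `p:documentation`, the `example.org` link and the function name `step_documentation` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XPROC_NS = "http://www.w3.org/ns/xproc"

# A small stand-in step. The XHTML namespace and the link target are assumptions.
STEP = """\
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:documentation xmlns="http://www.w3.org/1999/xhtml">
    The documentation may include
    <a href="http://example.org/">HTML</a>
    markup as well.
  </p:documentation>
  <p:input port="source"/>
  <p:output port="result"/>
  <p:identity/>
</p:declare-step>
"""


def step_documentation(xproc_source: str) -> str:
    """Return the step-level documentation as whitespace-normalized plain text."""
    root = ET.fromstring(xproc_source)
    # Only look at p:documentation that is a direct child of the step.
    doc = root.find(f"{{{XPROC_NS}}}documentation")
    if doc is None:
        return ""
    # itertext() also visits text inside nested markup such as <a>.
    return " ".join("".join(doc.itertext()).split())
```

Flattening to plain text discards the HTML markup; a generator that wants to keep links would copy the child elements instead.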

Accessing the GitHub API

  • XProc p:http-request and Calabash transparentJSON extension used to access the GitHub API
  • two pipelines: one accesses the organization, the other recursively lists the folders and files in a repository
  • XProc pipelines, XSLT stylesheets and XML catalogs are included as well

<?xml version="1.0" encoding="UTF-8"?>
<j:item xmlns:j="" type="object">
  <j:language type="string">XProc</j:language>
  <j:svn_005furl type="string"></j:svn_005furl>
  <j:forks type="number">0</j:forks>
  <j:ssh_005furl type="string"></j:ssh_005furl>
  <j:full_005fname type="string">transpect/cascade</j:full_005fname>
  <j:clone_005furl type="string"></j:clone_005furl>
  <j:default_005fbranch type="string">master</j:default_005fbranch>
  <j:private type="boolean">false</j:private>
  <j:open_005fissues_005fcount type="number">0</j:open_005fissues_005fcount>
  <j:description type="string">Libraries to implement a transpect cascade configuration</j:description>
  <j:git_005furl type="string">git://</j:git_005furl>
  <j:has_005fissues type="boolean">true</j:has_005fissues>
  <j:contents_005furl type="string">{+path}</j:contents_005furl>
</j:item>
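Two details of this step can be sketched in Python for illustration; the actual implementation is written in XProc. First, the element names above escape the underscore of the JSON keys as `_005f`. The decoding rule below (`_` followed by four hex digits) is inferred from that sample and is an assumption. Second, the recursive listing descends into every entry whose `type` is `dir`, which is how GitHub's repository contents endpoint marks directories. The `fetch` callback and the file names in `FAKE_REPO` are hypothetical stand-ins for the HTTP request.

```python
import re
from typing import Callable, Dict, Iterator, List

Entry = Dict[str, str]

# Assumed escape pattern: an underscore followed by four hex digits.
_ESCAPE = re.compile(r"_([0-9A-Fa-f]{4})")


def decode_name(local_name: str) -> str:
    """Turn an escaped element name such as 'full_005fname' back into 'full_name'."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), local_name)


def walk(fetch: Callable[[str], List[Entry]], path: str = "") -> Iterator[str]:
    """Yield the paths of all files below `path`, descending into directories.

    `fetch(path)` is expected to return the decoded JSON list that the
    contents endpoint delivers for a directory.
    """
    for entry in fetch(path):
        if entry["type"] == "dir":
            yield from walk(fetch, entry["path"])
        elif entry["type"] == "file":
            yield entry["path"]


# Offline stand-in for the HTTP call; the file names are invented.
FAKE_REPO: Dict[str, List[Entry]] = {
    "": [{"type": "dir", "path": "xpl"}, {"type": "file", "path": "README.md"}],
    "xpl": [{"type": "file", "path": "xpl/example.xpl"}],
}
```

Passing the fetch function in as a parameter keeps the traversal testable without network access: `list(walk(FAKE_REPO.__getitem__))` walks the fake repository, while a real caller would supply a function that requests the contents URL for the given path.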

Generating the documentation XML

  1. analyze the XProc documentation tags, input and output ports, options and imports, and generate DocBook from them
  2. use XInclude to pull the generated DocBook into the general DocBook documentation
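The analysis in step 1 can be sketched as follows. This is a minimal Python illustration using the standard library, not the XSLT that transpect actually runs. The sample step, the layout of the resulting DocBook section and the names `STEP` and `step_to_docbook` are assumptions.

```python
import xml.etree.ElementTree as ET

P = "{http://www.w3.org/ns/xproc}"       # XProc namespace in ElementTree notation
DB = "{http://docbook.org/ns/docbook}"   # DocBook 5 namespace

# Invented sample step; the import URI is a placeholder.
STEP = """\
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:documentation>Converts XML to TeX.</p:documentation>
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="debug"/>
  <p:import href="http://example.org/lib.xpl"/>
  <p:identity/>
</p:declare-step>
"""


def step_to_docbook(xproc_source: str, title: str) -> ET.Element:
    """Build a DocBook section that summarizes one p:declare-step."""
    step = ET.fromstring(xproc_source)
    section = ET.Element(DB + "section")
    ET.SubElement(section, DB + "title").text = title

    # Step-level documentation becomes the introductory paragraph.
    doc = step.find(P + "documentation")
    if doc is not None:
        para = ET.SubElement(section, DB + "para")
        para.text = " ".join("".join(doc.itertext()).split())

    # One titled list per kind of declaration: (heading, element, attribute).
    for heading, tag, attr in [
        ("Input ports", "input", "port"),
        ("Output ports", "output", "port"),
        ("Options", "option", "name"),
        ("Imports", "import", "href"),
    ]:
        names = [el.get(attr) for el in step.findall(P + tag) if el.get(attr)]
        if not names:
            continue
        items = ET.SubElement(section, DB + "itemizedlist")
        ET.SubElement(items, DB + "title").text = heading
        for name in names:
            item = ET.SubElement(items, DB + "listitem")
            ET.SubElement(item, DB + "para").text = name
    return section
```

Serializing the returned element with `ET.tostring(section, encoding="unicode")` gives a standalone DocBook fragment. Written to a file, such a fragment is what an `xi:include` in the main documentation could point at.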

HTML representation

  • HTML template and MaterializeCSS framework
  • DocBook converted to HTML and injected into the template
  • Output HTML files are stored with XProc
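Injecting the converted content into a template can be pictured like this. The Python sketch below is illustrative only, since the real pipeline does this in XProc; the template markup, the placeholder names and the function `render_page` are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical page template; the real one is built on the Materialize CSS framework.
PAGE = Template("""\
<!DOCTYPE html>
<html>
  <head><title>$title</title></head>
  <body>
    <main class="container">$content</main>
  </body>
</html>
""")


def render_page(title: str, content_html: str) -> str:
    """Insert an already converted HTML fragment into the page template."""
    # The title is plain text and gets escaped; the content is trusted HTML.
    return PAGE.substitute(title=escape(title), content=content_html)
```

Keeping the template separate from the generated fragment means the page layout can change without touching the DocBook-to-HTML conversion.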


commit HTML and XML (still manually) to update



  • writing documentation
  • writing documentation
  • writing documentation


  • Jenkins integration
  • use inline documentation of input and output ports, options
  • perhaps visualize XProc pipelines

Thank you!