
XProc 3.0 Tutorial

Working with Text Documents

In contrast to earlier versions of XProc, XProc 3.0 offers dedicated steps for manipulating text documents: p:text-count, p:text-head, p:text-tail, p:text-join, p:text-replace, and p:text-sort.

XProc 3.0 also accepts text documents as input. Just add content-types="text/plain" to the declaration of the input port to tell the XProc processor to treat the input as text.

<p:text-count/>

The p:text-count step counts the number of lines in a text document. The text document is passed to its source port, and the step returns an XML document containing a single c:result element with the line count. Suppose we have the following text document and are too busy to count the lines ourselves:

From time to time,
one needs a rhyme.

To get the number of lines in this text document, we can use the XProc pipeline below:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" content-types="text/plain"/>
  
  <p:output port="result"/>
  
  <p:text-count/>
  
</p:declare-step>

Output

<?xml version="1.0" encoding="UTF-8"?>
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">2</c:result>
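If you want to try p:text-count without preparing a separate file, you can also supply the text inline. The sketch below assumes that your XProc processor supports p:inline with a content-type attribute, as defined in XProc 3.0. The poem is bound directly to the source port of p:text-count, so the pipeline does not need an input port at all, and it should again report 2 lines.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- count the lines of an inline text document instead of an external file -->
  <p:text-count>
    <p:with-input port="source">
      <p:inline content-type="text/plain">From time to time,
one needs a rhyme.</p:inline>
    </p:with-input>
  </p:text-count>

</p:declare-step>
```

Inlining is handy for quick experiments; for real documents, the pipeline with an input port shown above is more flexible.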

<p:text-head/>

The p:text-head step returns a given number of lines from the beginning of the input document. The number of lines is passed with the count option. To get the first line of our little poem above, we can use this pipeline:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" content-types="text/plain"/>
  
  <p:output port="result"/>
  
  <p:text-head count="1"/>
  
</p:declare-step>

Output

From time to time,
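The count option also accepts zero and negative values. As we read the XProc 3.0 step specification, count="0" returns all lines, while a negative value returns all lines except that many lines at the beginning. Under that reading, the following variant should skip the first line of our poem and return only one needs a rhyme.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" content-types="text/plain"/>

  <p:output port="result"/>

  <!-- a negative count removes lines at the beginning instead of keeping them -->
  <p:text-head count="-1"/>

</p:declare-step>
```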

<p:text-tail/>

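In contrast to the previous step, p:text-tail returns lines from the end of the document. Again, the count option specifies the number of lines. To get the last line of our poem, we can use this pipeline: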

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" content-types="text/plain"/>
  
  <p:output port="result"/>
  
  <p:text-tail count="1"/>
  
</p:declare-step>

Output

one needs a rhyme.
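Assuming p:text-tail follows the same rules for zero and negative values as p:text-head, only counted from the other end, a negative count removes that many lines from the end of the document. The sketch below should therefore return only the first line of our poem, From time to time,

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" content-types="text/plain"/>

  <p:output port="result"/>

  <!-- a negative count removes lines at the end instead of keeping them -->
  <p:text-tail count="-1"/>

</p:declare-step>
```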

<p:text-join/>

To merge multiple text documents, we can use the p:text-join step. It takes all documents that appear on its source port, concatenates them into a single document, and returns that document on its result port. The separator option specifies a string that is inserted between the documents. The values of the prefix and suffix options are added to the beginning and end of the resulting document, respectively.

Suppose we have written another great verse and want to append it to our poem above:

and if you're bright
you use this website. 

The pipeline below concatenates the two text documents. Via the separator option, we insert the character reference &#xa; for a newline character, because the last line of the first document does not end with a newline. Note that the input port is declared with sequence="true" so that it accepts more than one document. To learn how to pass multiple documents to a single input port with your XProc processor, please read the article The XProc Processor.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  
  <p:input port="source" sequence="true" content-types="text/plain"/>
  
  <p:output port="result"/>
  
  <p:text-join separator="&#xa;"/>
  
</p:declare-step>

Output

From time to time,
one needs a rhyme.
and if you're bright
you use this website. 
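The prefix and suffix options are not used in the pipeline above. The following sketch adds both; the header and footer strings are made-up examples. The &#xa; character references put the prefix and the suffix on lines of their own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" sequence="true" content-types="text/plain"/>

  <p:output port="result"/>

  <!-- separator goes between the documents, prefix before the first, suffix after the last -->
  <p:text-join separator="&#xa;"
               prefix="=== Poem ===&#xa;"
               suffix="&#xa;=== End ==="/>

</p:declare-step>
```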

<p:text-replace/>

The p:text-replace step matches all occurrences of a regular expression, given with the pattern option, and replaces them with the string given with the replacement option. The changed document is returned on the result port. To demonstrate this step, we extend our p:text-join pipeline from above.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  
  <p:input port="source" sequence="true" content-types="text/plain"/>
  
  <p:output port="result"/>
  
  <p:text-join separator="&#xa;"/>
  
  <p:text-replace pattern="website" 
                  replacement="super phenomenal XProc 3.0 tutorial"/>

</p:declare-step>

Output

From time to time,
one needs a rhyme.
and if you're bright
you use this super phenomenal XProc 3.0 tutorial. 
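Because pattern is a regular expression, you can also capture groups and refer to them in the replacement string with $1, $2 and so on, as with the XPath function fn:replace. The sketch below, applied to our first verse, should wrap every occurrence of time and rhyme in asterisks.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" content-types="text/plain"/>

  <p:output port="result"/>

  <!-- (time|rhyme) captures the matched word; $1 reinserts it between asterisks -->
  <p:text-replace pattern="(time|rhyme)" replacement="*$1*"/>

</p:declare-step>
```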

<p:text-sort/>

Another step for text documents is p:text-sort. It sorts the lines of a document by a sort key, which you can specify as an XPath expression with the sort-key option. The sort key is evaluated for each line. Further options help you refine the sort:

  • The order option defines whether the lines are sorted in ascending or descending order. Permitted values are ascending and descending.
  • The case-order option defines whether lowercase or uppercase letters are sorted first. Permitted values are lower-first and upper-first.
  • Since the sort order is language-dependent, you may need to specify the language with the lang option.
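The following sketch sorts the lines of our merged poem. The sort key lower-case(.) is our own choice for illustration: assuming that the sort-key expression is evaluated with each line as the context item, it makes the comparison ignore case. The value of lang is likewise an assumption about the language of the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" sequence="true" content-types="text/plain"/>

  <p:output port="result"/>

  <!-- merge both verses into one document first -->
  <p:text-join separator="&#xa;"/>

  <!-- compare the lower-cased lines and put the largest key first -->
  <p:text-sort sort-key="lower-case(.)" order="descending" lang="en"/>

</p:declare-step>
```

Leaving out sort-key falls back to sorting by the line itself, so the explicit key is only needed when you want to compare something other than the raw line.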