Working with Text Documents
Unlike its predecessors, XProc 3.0 offers dedicated steps for manipulating text documents:
<p:text-count/>
The p:text-count step counts the number of lines in a text document. The text document must be passed to its input port, and the step returns an XML document containing a single c:result element with the line count. Suppose we have this text document and are too busy to count the lines ourselves:
From time to time,
one needs a rhyme.
To get the number of lines of this text document, we can utilize the XProc pipeline below:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source" content-types="text/plain"/>
<p:output port="result"/>
<p:text-count/>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?>
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">2</c:result>
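For quick experiments, the text can also be embedded in the pipeline instead of being passed through the source port. The following is a minimal sketch, assuming your processor supports p:inline with content-type="text/plain" as described in XProc 3.0; the inlined text is simply our poem from above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>
  <!-- The text document is embedded directly in the pipeline -->
  <p:text-count>
    <p:with-input port="source">
      <p:inline content-type="text/plain">From time to time,
one needs a rhyme.</p:inline>
    </p:with-input>
  </p:text-count>
</p:declare-step>
```

Whitespace inside p:inline belongs to the text document, which is why the second line of the poem is not indented. This variant should report the same line count as the pipeline above.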
<p:text-head/>
The p:text-head step returns a number of lines from the beginning of the input document. The number of lines is set with the count option. If we want to get the first line of our little poem above, we can use this pipeline:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source" content-types="text/plain"/>
<p:output port="result"/>
<p:text-head count="1"/>
</p:declare-step>
Output
From time to time,
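The count option is not limited to positive numbers. As far as we read the XProc 3.0 step library, a negative value skips lines instead of keeping them. The sketch below relies on that reading, so treat it as an assumption and verify it with your processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" content-types="text/plain"/>
  <p:output port="result"/>
  <!-- Assumed semantics: count="-1" returns all lines except the first one -->
  <p:text-head count="-1"/>
</p:declare-step>
```

Applied to our poem, this should leave only the second line.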
<p:text-tail/>
In contrast to the previous step, p:text-tail returns lines from the end of the document. The count option again sets the number of lines:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source" content-types="text/plain"/>
<p:output port="result"/>
<p:text-tail count="1"/>
</p:declare-step>
Output
one needs a rhyme.
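The two steps can also be chained to pick a single line from the middle of a document. Here is a minimal sketch that extracts the second line: p:text-head keeps the first two lines, and p:text-tail then keeps the last of those. The line number 2 is only an illustrative choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" content-types="text/plain"/>
  <p:output port="result"/>
  <!-- Keep lines 1 to 2 ... -->
  <p:text-head count="2"/>
  <!-- ... and from those, keep only the last one, i.e. line 2 -->
  <p:text-tail count="1"/>
</p:declare-step>
```

For our two-line poem the result is the same as with p:text-tail alone, but the pattern carries over to longer documents.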
<p:text-join/>
If we want to merge multiple documents, we can use the p:text-join step. The step takes all documents that appear on the input port and concatenates them into a single document, which is returned on the result port. The step provides a separator option whose value is inserted between the documents. The values of the prefix and suffix options are added to the beginning and end of the resulting document, respectively.
Suppose we have written another great verse and want to append it to our poem above:
and if you're bright you use this website.
The pipeline below concatenates the two text documents. We insert the XML character reference for a newline, &#xa;, via the separator option because the last line of the first document does not end with a newline. If you would like to know how to use your XProc processor to pass multiple documents to a single input port, please read the article The XProc Processor.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source" sequence="true" content-types="text/plain"/>
<p:output port="result"/>
<p:text-join separator="&#xa;"/>
</p:declare-step>
Output
From time to time,
one needs a rhyme.
and if you're bright you use this website.
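The prefix and suffix options can be combined with separator in the same step. The following sketch frames the joined poem with a title line and a closing line; both strings are made up for illustration, and the newline character references keep them on lines of their own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true" content-types="text/plain"/>
  <p:output port="result"/>
  <!-- prefix is added once at the start, suffix once at the end,
       separator between each pair of input documents -->
  <p:text-join separator="&#xa;"
               prefix="A little poem&#xa;"
               suffix="&#xa;The end"/>
</p:declare-step>
```

The result should start with the title line, continue with the two verses, and finish with the closing line.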
<p:text-replace/>
The p:text-replace step replaces every match of a regular expression, given in the pattern option, with the string given in the replacement option. The changed document is returned on the output port. To demonstrate this step, we extend our p:text-join pipeline from above.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source" sequence="true" content-types="text/plain"/>
<p:output port="result"/>
<p:text-join separator="&#xa;"/>
<p:text-replace pattern="website"
replacement="super phenomenal XProc 3.0 tutorial"/>
</p:declare-step>
Output
From time to time,
one needs a rhyme.
and if you're bright you use this super phenomenal XProc 3.0 tutorial.
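Because the pattern is a regular expression, it can contain capture groups that the replacement string refers to. The sketch below assumes that the step follows the rules of the XPath function fn:replace and offers a flags option, as we understand the XProc 3.0 step library; the pattern and the asterisk markers are purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true" content-types="text/plain"/>
  <p:output port="result"/>
  <p:text-join separator="&#xa;"/>
  <!-- flags="i" is assumed to make the match case-insensitive;
       $1 stands for the text matched by the first capture group -->
  <p:text-replace pattern="(RHYME|WEBSITE)"
                  replacement="*$1*"
                  flags="i"/>
</p:declare-step>
```

Under these assumptions, the words rhyme and website should each end up wrapped in asterisks. Note that option values written as attributes are attribute value templates, so literal curly braces in a pattern would have to be doubled.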
<p:text-sort/>
Another step for text documents is p:text-sort. The step sorts all lines in a document by a sort key, which you can specify as an XPath expression via the sort-key option. The sort key is evaluated for each line. Other options help you refine the sort:
- the order option defines whether the results are presented in ascending or descending order. Permitted values are ascending and descending.
- the case-order option defines whether lowercase or uppercase characters are sorted first. Permitted values are lower-first and upper-first.
- since the sort order is language-dependent, you may need to specify the language with the lang option.
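As with the other steps, a short pipeline shows the options in context. This is a minimal sketch, assuming that the context item of the sort-key expression is the current line as a string; the key lower-case(.) and the language code en are illustrative choices, not requirements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" content-types="text/plain"/>
  <p:output port="result"/>
  <!-- Sort the lines by their lower-cased text, last line first -->
  <p:text-sort sort-key="lower-case(.)"
               order="descending"
               lang="en"/>
</p:declare-step>
```

For our two-line poem, this should move the line starting with "one" before the line starting with "From".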