Rerouting and Interrupting the Data Flow

The data flow in a pipeline does not have to be linear. If steps draw data from other steps in the pipeline, the data flow can branch and merge, or be interrupted at one point and continued later. For this purpose, XProc provides two very simple but frequently used steps: p:identity and p:sink.

We used p:identity before. This step simply copies its input to its output. In contrast, p:sink takes its input and discards it; it is one of the steps without an output port. Because they are so simple, these steps are well suited to controlling the data flow in a pipeline. Let’s have a look at this example:


<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  
  <p:input port="source">
    <p:inline>
      <doc>Hello XProc!</doc>
    </p:inline>
  </p:input>
  
  <p:output port="result"/>
  
  <!-- Copies the input unchanged; named so it can be referenced later -->
  <p:identity name="identity"/>
  
  <!-- Discards the output of the preceding step -->
  <p:sink/>
  
  <!-- Picks the data flow up again from the named step -->
  <p:identity>
    <p:with-input pipe="result@identity"/>
  </p:identity>
  
</p:declare-step>

Output

<?xml version="1.0" encoding="UTF-8"?>
<doc>Hello XProc!</doc>

Let’s take a closer look at what’s happening:

The first p:identity takes the input and passes it to the output unchanged. The @name attribute is set so that we can reference the step’s output later.
The p:sink discards the output of our first p:identity.
After p:sink, there is no output, but our next step expects an input. To avoid an error, the second p:identity draws its input explicitly from the first p:identity and continues the data flow.

p:identity and p:sink are ideal for rerouting and stopping the data flow. Often you can omit the p:identity by connecting the input port of another step directly with p:with-input. Nevertheless, keep in mind that your XProc code is easier to read the less often you interrupt the data flow and continue it elsewhere.
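
As a minimal sketch of connecting a step directly with p:with-input, the following variant drops the second p:identity and lets another step read from the named step instead. The choice of p:add-attribute, the step name "original", and the attribute name and value are assumptions made only for illustration; any step with a primary input port could take its place.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source">
    <p:inline>
      <doc>Hello XProc!</doc>
    </p:inline>
  </p:input>

  <p:output port="result"/>

  <!-- Named so that later steps can draw from its result port -->
  <p:identity name="original"/>

  <!-- Interrupts the default data flow -->
  <p:sink/>

  <!-- Illustrative step: reads directly from "original",
       so no second p:identity is needed to resume the flow -->
  <p:add-attribute match="/doc"
                   attribute-name="status"
                   attribute-value="rerouted">
    <p:with-input pipe="result@original"/>
  </p:add-attribute>

</p:declare-step>
```

With this connection, the pipeline's result should be the doc element carrying the additional status attribute. The p:sink is kept here only to mirror the example above; without it, p:add-attribute would read from the preceding step by default.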
