Steps for Basic XML Manipulations
When processing XML, you often want to make small changes to the data, for example to insert, rename, or delete elements and attributes. XProc offers various steps for these tasks that allow you to manipulate XML without having to write an XSLT stylesheet. Here is a selection of steps to start with:
<p:insert/>
p:insert takes a document and inserts it into another document. For this purpose, the step has two input ports: an insertion port for the document to be inserted and a source port for the document into which it is inserted. The match option expects a selection pattern that selects the nodes relative to which the insertion takes place. The position option controls where the insertion is made relative to each matched node and accepts one of the values first-child, last-child, before, and after. In this example, we insert a paragraph after the title:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source">
<p:inline>
<doc>
<title>Poem about XML</title>
</doc>
</p:inline>
</p:input>
<p:output port="result"/>
<p:insert match="/doc/title" position="after">
<p:with-input port="insertion">
<p:inline>
<para>A node is a node is a node.</para>
</p:inline>
</p:with-input>
</p:insert>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <doc> <title>Poem about XML</title> <para>A node is a node is a node.</para> </doc>
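To illustrate one of the other position values, here is a minimal sketch that reuses the document from the example above. It assumes we match the doc element itself and choose first-child, so the paragraph should end up before the title instead of after it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <doc>
        <title>Poem about XML</title>
      </doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- Match the parent element and insert the new node as its first child -->
  <p:insert match="/doc" position="first-child">
    <p:with-input port="insertion">
      <p:inline>
        <para>A node is a node is a node.</para>
      </p:inline>
    </p:with-input>
  </p:insert>
</p:declare-step>
```

With last-child and the same match pattern, the paragraph would be appended after the title.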
<p:delete/>
With p:delete you can delete nodes in an XML document. In the following example, we delete all plant elements that have no @name attribute. The selection pattern //plant[not(@name)] is passed via the match option.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source">
<p:inline>
<office-plants>
<plant name="Saintpaulia ionantha">African Violet</plant>
<plant name="Dracaena">Snake Plant</plant>
<plant>Cactus</plant>
</office-plants>
</p:inline>
</p:input>
<p:output port="result"/>
<p:delete match="//plant[not(@name)]"/>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <office-plants> <plant name="Saintpaulia ionantha">African Violet</plant> <plant name="Dracaena">Snake Plant</plant> </office-plants>
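p:delete is not limited to elements; the match pattern can also select attributes. The following sketch assumes a made-up variant of the input in which the cactus carries an empty name attribute, and removes only that attribute while keeping the element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <office-plants>
        <plant name="Saintpaulia ionantha">African Violet</plant>
        <!-- Hypothetical input: an empty name attribute -->
        <plant name="">Cactus</plant>
      </office-plants>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- Delete name attributes whose value is the empty string -->
  <p:delete match="plant/@name[. = '']"/>
</p:declare-step>
```

The cactus element itself stays in the result; only its empty attribute is dropped.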
<p:rename/>
For renaming elements, attributes, or processing instructions, XProc provides the p:rename step. Each node matched by the pattern in the match option is renamed to the name given in the new-name option. Here we use p:rename to rename the element foo to bar.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source">
<p:inline>
<foo>My element</foo>
</p:inline>
</p:input>
<p:output port="result"/>
<p:rename match="foo" new-name="bar"/>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <bar>My element</bar>
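Renaming an attribute works the same way. As a small sketch, assuming an input with a name attribute that we would rather call latin-name (both the input and the new name are chosen for illustration only):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <plant name="Saintpaulia ionantha">African Violet</plant>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- The pattern selects the attribute node, not the element -->
  <p:rename match="plant/@name" new-name="latin-name"/>
</p:declare-step>
```

The attribute value and the element content are left untouched; only the attribute name changes.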
<p:replace/>
The p:replace step replaces matching nodes with the top-level node(s) of another document. For this purpose, the step has a replacement port for the document whose content replaces the matched nodes of the document arriving at the source port. The pipeline below shows how p:replace works:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source">
<p:inline>
<foo>My element</foo>
</p:inline>
</p:input>
<p:output port="result"/>
<p:replace match="foo">
<p:with-input port="replacement">
<p:inline>
<bar>My replacement</bar>
</p:inline>
</p:with-input>
</p:replace>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <bar>My replacement</bar>
<p:wrap/>
With p:wrap you can wrap matching nodes in a new parent element, just like wrapping a box in wrapping paper. The nodes to be wrapped are selected with the match option and the name of the wrapper element is specified with the wrapper option.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source">
<p:inline>
<doc>
<para>I am a very strong believer in
listening and learning from others.</para>
<source>Ruth Bader Ginsburg</source>
</doc>
</p:inline>
</p:input>
<p:output port="result"/>
<p:wrap match="/doc/para" wrapper="quote"/>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <doc> <quote> <para>I am a very strong believer in listening and learning from others.</para> </quote> <source>Ruth Bader Ginsburg</source> </doc>
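By default, each matching node gets its own wrapper. The standard step library also defines a group-adjacent option for p:wrap, which lets adjacent matching nodes share one wrapper when the expression yields the same value for them. The following is a sketch only: the two-paragraph input is made up, and the constant expression true() is an assumption chosen so that all adjacent para elements fall into the same group:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <doc>
        <para>First paragraph of the quote.</para>
        <para>Second paragraph of the quote.</para>
        <source>Hypothetical source</source>
      </doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- Adjacent para elements with the same group-adjacent value
       are expected to be wrapped together in one quote element -->
  <p:wrap match="/doc/para" wrapper="quote" group-adjacent="true()"/>
</p:declare-step>
```

Without group-adjacent, the same pipeline would produce two separate quote elements, one per paragraph.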
<p:unwrap/>
p:unwrap provides the reverse operation: it replaces each element selected by the match option with its children. Here we remove the quote wrapper again:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source">
<p:inline>
<doc>
<quote>
<para>I am a very strong believer in
listening and learning from others.</para>
</quote>
<source>Ruth Bader Ginsburg</source>
</doc>
</p:inline>
</p:input>
<p:output port="result"/>
<p:unwrap match="quote"/>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <doc> <para>I am a very strong believer in listening and learning from others.</para> <source>Ruth Bader Ginsburg</source> </doc>
<p:add-attribute/>
p:add-attribute adds an attribute to each matching element and provides the result on the output port. We used this step frequently in previous lessons, so let’s take a slightly more complex example. Here we want to add the matching price to each product. Product and price are connected via their @id and @ref attributes, and we use p:viewport to iterate over all products. The prices are read from the secondary prices port through the pipe attribute of p:with-option.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0"
name="my-pipeline">
<p:input port="source" primary="true">
<p:inline>
<product-catalogue>
<product id="prc-01">High Pressure Water Broom</product>
<product id="prc-02">Electric Pasta Maker</product>
<product id="prc-03">Retractable Telescoping Stool</product>
</product-catalogue>
</p:inline>
</p:input>
<p:input port="prices" primary="false">
<p:inline>
<prices>
<price ref="prc-01">120.43</price>
<price ref="prc-02">40.87</price>
<price ref="prc-03">61.23</price>
</prices>
</p:inline>
</p:input>
<p:output port="result"/>
<p:viewport match="product">
<p:variable name="product-id" select="product/@id"/>
<p:add-attribute match="product" attribute-name="price">
<p:with-option name="attribute-value" select="//price[@ref eq $product-id]" pipe="prices@my-pipeline"/>
</p:add-attribute>
</p:viewport>
</p:declare-step>
Output
<?xml version="1.0" encoding="UTF-8"?> <product-catalogue> <product price="120.43" id="prc-01">High Pressure Water Broom</product> <product price="40.87" id="prc-02">Electric Pasta Maker</product> <product price="61.23" id="prc-03">Retractable Telescoping Stool</product> </product-catalogue>
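For comparison, here is a minimal sketch of the simple case, in which the attribute value is a fixed string and no viewport is needed. The currency attribute and its value EUR are assumptions made for illustration; they are not part of the catalogue above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <product-catalogue>
        <product id="prc-01">High Pressure Water Broom</product>
        <product id="prc-02">Electric Pasta Maker</product>
      </product-catalogue>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- Every matched product receives the same constant attribute value -->
  <p:add-attribute match="product" attribute-name="currency" attribute-value="EUR"/>
</p:declare-step>
```

The viewport in the previous pipeline is only necessary because the value differs per product and has to be looked up in a second document.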