Handling JSON Data in Pipelines
JSON support was added in XProc 3.0. A pipeline accepts JSON documents as input, and XML content can be converted to JSON. For example, the p:cast-content-type step can convert XML to JSON. It works much like the xml-to-json() function in XPath: you describe the data in the XML representation of JSON defined for that function (the elements map, array, string, number, boolean, and null in the namespace http://www.w3.org/2005/xpath-functions), and the step converts that representation to JSON. The following pipeline creates a sample JSON document with date and time information, using every element of that vocabulary.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <map xmlns="http://www.w3.org/2005/xpath-functions">
        <array key="date">
          <number>2024</number>
          <number>28</number>
          <number>11</number>
        </array>
        <array key="time">
          <number>19</number>
          <number>47</number>
        </array>
        <string key="timezone">Europe/Berlin</string>
        <boolean key="valid">true</boolean>
        <null key="null"/>
      </map>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- Convert the XML representation of the map to a JSON document -->
  <p:cast-content-type content-type="application/json"/>
</p:declare-step>
Output
{ "date": [2024,28,11], "time": [19,47], "timezone": "Europe/Berlin", "valid": true, "null": null }
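The parallel to xml-to-json() can be made explicit. The following sketch is not part of the original example. It uses a shortened input and assumes that a select expression on p:with-input which returns a map is emitted by the processor as a JSON document. It calls xml-to-json() to serialize the mapping to a JSON string and parse-json() to turn that string into a map, which should give the same kind of result as p:cast-content-type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline>
      <map xmlns="http://www.w3.org/2005/xpath-functions">
        <string key="timezone">Europe/Berlin</string>
        <boolean key="valid">true</boolean>
      </map>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- xml-to-json() returns a string; parse-json() converts it to a map,
       which the processor is assumed to emit as a JSON document -->
  <p:identity>
    <p:with-input select="parse-json(xml-to-json(.))"/>
  </p:identity>
</p:declare-step>
```

The result should be a JSON object with the keys timezone and valid. In a real pipeline p:cast-content-type is the more direct choice; the XPath variant is mainly useful when the conversion is only one part of a larger select expression.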
<p:json-join/>
This step takes a sequence of JSON documents as input and creates a single JSON array from them. Suppose we want a pipeline that randomly produces dishes covering 80% of German cuisine. To do this, we pass three XML elements that represent the three typical components of a German dish, each of which lists three ingredients. We use p:for-each to iterate over the three documents. Within the iteration, we create a random number between 1 and 3 to pick one ingredient, and then create a JSON document from the selected element. After the iteration, we use p:json-join to combine the three JSON documents into one array. Voilà! Here is your typical German dish recommendation generated with XProc. The selection is random, so the output shown is only one possible result.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true">
    <array key="meat" xmlns="http://www.w3.org/2005/xpath-functions">
      <string>schnitzel</string>
      <string>sausage</string>
      <string>fish</string>
    </array>
    <array key="garnish" xmlns="http://www.w3.org/2005/xpath-functions">
      <string>cooked potatoes</string>
      <string>mashed potatoes</string>
      <string>fried potatoes</string>
    </array>
    <array key="vegetables" xmlns="http://www.w3.org/2005/xpath-functions">
      <string>sauerkraut</string>
      <string>mixed vegetables</string>
      <string>salad</string>
    </array>
  </p:input>
  <p:output port="result" content-types="application/json"/>
  <!-- Process each of the three array documents separately -->
  <p:for-each>
    <p:with-input select="*:array"/>
    <!-- Pick a random position between 1 and 3 -->
    <p:variable name="random-number-from-one-to-three"
                select="random-number-generator()?permute(1 to 3)[1]"/>
    <!-- Keep only the string element at that position -->
    <p:filter select="*:array/*[{$random-number-from-one-to-three}]"/>
    <!-- Turn the selected element into a JSON document -->
    <p:cast-content-type content-type="application/json"/>
  </p:for-each>
  <!-- Collect the three JSON documents in a single array -->
  <p:json-join/>
</p:declare-step>
Output
["schnitzel","cooked potatoes","sauerkraut"]
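Stripped of the random selection, the behavior of p:json-join can be seen in a smaller sketch. The inline values below are made up for illustration and are not part of the example above; the point is only that each JSON document on the input port becomes one member of the resulting array, whatever its type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- expand-text="false" keeps the curly brackets from being read as a value template -->
  <p:input port="source" content-types="application/json"
           sequence="true" expand-text="false">
    <p:inline content-type="application/json">"schnitzel"</p:inline>
    <p:inline content-type="application/json">42</p:inline>
    <p:inline content-type="application/json">{"vegetarian": false}</p:inline>
  </p:input>
  <p:output port="result" content-types="application/json"/>
  <!-- Each input document becomes one member of the resulting array -->
  <p:json-join/>
</p:declare-step>
```

The result should be an array with three members in input order: the string, the number, and the object.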
<p:json-merge/>
The p:json-merge step merges a sequence of JSON documents into a single JSON object. The pipeline below shows a typical use case: it takes a sequence of JSON objects and uses p:json-merge to merge them. Because the JSON objects are declared inline, the input port needs the attribute expand-text="false" so that the XProc processor does not treat the curly brackets as the start of a text value template (TVT). Also note the duplicates attribute, which states how to deal with duplicate keys and may have one of the following values: reject, use-first, use-last, use-any, or combine.
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" content-types="application/json"
           sequence="true" expand-text="false">
    <p:inline content-type="application/json">
      {"name":"Jonathan Osterman"}
    </p:inline>
    <p:inline content-type="application/json">
      {"alias":"Doctor Manhattan"}
    </p:inline>
    <p:inline content-type="application/json">
      {"date":"1949-12-01"}
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- reject: fail if the same key occurs in more than one object -->
  <p:json-merge duplicates="reject"/>
</p:declare-step>
Output
{"name":"Jonathan Osterman","alias":"Doctor Manhattan","date":"1949-12-01"}
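The three objects above have distinct keys, so the duplicates setting never comes into play. The following variation is a sketch with two objects that deliberately share the key name (an arrangement invented here for illustration) and uses duplicates="use-last", so the value from the later document is kept.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" content-types="application/json"
           sequence="true" expand-text="false">
    <p:inline content-type="application/json">
      {"name":"Jonathan Osterman"}
    </p:inline>
    <p:inline content-type="application/json">
      {"name":"Doctor Manhattan"}
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- use-last: for a key that occurs twice, keep the value from the later document -->
  <p:json-merge duplicates="use-last"/>
</p:declare-step>
```

The result should be {"name":"Doctor Manhattan"}. With duplicates="reject", the same input would make the step fail instead of producing a result.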