Working With Documents
Document Representation and Properties
XProc is designed to process data in the form of documents. In XProc, documents are described by their representation and properties. The representation is the content of the document. Properties are document-related metadata such as the MIME type or the base URI of the document. With XProc, you can change both the representation and the properties of a document.
The following properties are specified by default:
content-type | the MIME type of the document (see RFC 2046) |
base-uri | the base URI of the document |
serialization | a set of serialization properties as an XPath map |
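As a rough illustration of reading and changing properties, the following sketch attaches a custom property and then copies two property values onto the root element so that they become visible in the result. It assumes the standard steps p:set-properties and p:add-attribute and the XProc function p:document-property(); the property name "status" and the inline <doc/> element are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:inline><doc/></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Attach a custom property named "status" (hypothetical name).
       With the default merge="true" the existing properties are kept. -->
  <p:set-properties>
    <p:with-option name="properties" select="map{'status': 'draft'}"/>
  </p:set-properties>

  <!-- Copy two property values onto the root element of the document. -->
  <p:add-attribute match="/*" attribute-name="content-type"
                   attribute-value="{p:document-property(., 'content-type')}"/>
  <p:add-attribute match="/*" attribute-name="status"
                   attribute-value="{p:document-property(., 'status')}"/>
</p:declare-step>
```

The representation changes here only because the attributes are added. The p:set-properties step on its own leaves the content of the document untouched.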
Types of Documents
Although primarily designed to handle XML documents, XProc can handle other documents as well:
- XML is treated by the XProc processor as tree-like data in the form of the XQuery and XPath Data Model (XDM).
- HTML is parsed internally and then treated as XDM, just like XML.
- JSON is typically represented as a map, a data type that was introduced with XPath 3.1 and holds a set of entries, each consisting of a key and an associated value. The XProc processor should treat JSON the same way as the XPath function parse-json() does.
- Text files are treated as a simple XDM tree with a document node and a single text node.
- Other: The XProc spec does not impose any rules on how to handle other documents such as binary files and leaves this to the XProc processor.
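The document types in the list above can be tried out with inline documents. The following sketch assumes that the content-type attribute on p:inline selects how the content is interpreted, and that expand-text="false" is needed so the curly braces of the JSON object are not read as a text value template. The sample values are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result" sequence="true"/>

  <p:identity>
    <p:with-input>
      <!-- XML: an XDM tree with a document node and an element node -->
      <p:inline><greeting>Hello</greeting></p:inline>
      <!-- JSON: parsed into an XPath map, as parse-json() would do -->
      <p:inline content-type="application/json"
                expand-text="false">{"greeting": "Hello"}</p:inline>
      <!-- Text: a document node with a single text node -->
      <p:inline content-type="text/plain">Hello</p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

How a sequence of mixed document types is written to the result port depends on the XProc processor in use.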
Casting Content Types
XProc has no built-in magic for casting between incompatible data structures. For example, you cannot cast arbitrary XML, a tree-like data model, directly to JSON, a map-like model. However, you can express a map in the XML notation defined for XPath and convert that to JSON. You can also use XPath functions such as parse-xml(), parse-json() and json-to-xml() in XProc. The following pipeline shows how a map in XML notation is converted to JSON with <p:cast-content-type/>.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:fn="http://www.w3.org/2005/xpath-functions"
                version="3.0">
  <!-- A map in the XML notation for JSON defined by XPath -->
  <p:input port="source">
    <p:inline>
      <fn:map>
        <fn:string key="my-message">Hello XProc!</fn:string>
      </fn:map>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <!-- Convert the XML notation into a JSON document -->
  <p:cast-content-type content-type="application/json"/>
</p:declare-step>
Output
{"my-message":"Hello XProc!"}
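The XPath functions mentioned above cover the opposite direction as well. The following sketch goes from JSON to XML: it serializes the map back to a JSON string with serialize() and passes that string to json-to-xml(). The use of p:identity with a select expression and the inline sample document are assumptions made for this example, not the only way to do it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <!-- expand-text="false" keeps the curly braces literal -->
    <p:inline content-type="application/json"
              expand-text="false">{"my-message": "Hello XProc!"}</p:inline>
  </p:input>
  <p:output port="result"/>

  <p:identity>
    <!-- The context item is the map; json-to-xml() expects a string,
         so the map is serialized as JSON first. -->
    <p:with-input select="json-to-xml(serialize(., map{'method': 'json'}))"/>
  </p:identity>
</p:declare-step>
```

The result should be an XML document in the same map notation that the pipeline above uses as its input, with elements in the http://www.w3.org/2005/xpath-functions namespace.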