Connecting the Dots
Between BITS and PDF

A Comparison of Technologies
for Automated Typesetting

@xporc@mstdn.social / @mkraetke

Print-based Publishing Workflows

  • XML is created after print PDF
  • XML source: page layout file or PDF
  • problem: extracting semantics from the visual representation

XML First

  • XML is available after the editorial stage
  • page layout software carries XML through the entire process
  • XML export after typesetting

InDesign and XML


  • XML import/export
  • XPath for ExtendScript

  • IDML

  • add features to IDML

XML and InDesign

  • apart from IDML, the XML features are no longer developed
    (XSLT 1.0, XPath 1.0 subset)
  • not XML agnostic
  • corrections can corrupt XML data

Is automated typesetting the solution?

Automated Typesetting

  • Print PDF is generated automatically from source
  • no manual editing necessary
  • XML not affected by corrections

Automated Typesetting: Layout

  • usually WYSIWYM (except InDesign Server)
  • layout configuration
  • no interactive changes possible

Automated Typesetting: XML

  • corrections are made in the XML, not in the layout
  • XML not affected by typesetting
  • but processing instructions (PIs) are sometimes necessary
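
A processing instruction can carry a layout hint without changing the element structure of the document. A minimal sketch: the PI target `typeset` and its content `pagebreak` are hypothetical names, not part of BITS or of any particular typesetting system.

```xml
<sec>
  <title>Board Games</title>
  <p>This paragraph should end the page.</p>
  <!-- hypothetical PI: asks the typesetter for a page break at this point -->
  <?typeset pagebreak?>
  <p>This paragraph starts on a new page.</p>
</sec>
```

Because a PI is not an element, schema validation of the XML is unaffected, and a converter that does not know the target can simply ignore it.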

InDesign Server

  • layout engine without a built-in user interface
  • support for multiple instances, error capturing and logging
  • run via SOAP, COM or InDesign plugins

Pros and Cons

  • ➕ work with your accustomed designers
  • ➖ XML interfaces are just as outdated as in the desktop version
  • ➖ limited error feedback, high license costs

XSL-FO

  • XML-based vocabulary for formatting (print) media
  • W3C (2001): Extensible Stylesheet Language
  • implementations: Antenna House Formatter, RenderX XEP and Apache FOP

Vocabulary

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> 
  <fo:layout-master-set>
    <fo:simple-page-master master-name="myPageMaster" page-width="210mm" page-height="297mm">
      <fo:region-body/> 
    </fo:simple-page-master> 
  </fo:layout-master-set>
  <fo:page-sequence master-reference="myPageMaster"> 
    <fo:flow flow-name="xsl-region-body"> 
      <fo:block font-size="12pt">Hello World!</fo:block>
    </fo:flow> 
  </fo:page-sequence> 
</fo:root>

Pros and Cons

  • ➕ pure XML solution
  • ➖ standard no longer maintained, several limitations
  • ➖ proprietary extensions necessary

PrintCSS

  • extends CSS to format page-based content
  • W3C working drafts: CSS Paged Media Module Level 3, CSS Generated Content for Paged Media Module
  • implementations: Antenna House Formatter, Prince XML, PDFreactor
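
/* store the text of each author paragraph in a named string */
p.author { string-set: author content(); }
@page {
  size: 210mm 297mm;
  /* margin boxes must be nested inside an @page rule */
  @top-center {
    content: string(author);
  }
}
/* page number in the bottom left corner of left-hand pages */
@page :left {
  @bottom-left {
    content: counter(page);
  }
}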

Pros and Cons

  • ➕ CSS is simple and straightforward
  • ➖ limited standard, many things left to the implementors
  • ➖ XSLT preprocessing needed for generated content

LaTeX

  • LaTeX is a collection of macros that simplify and extend the use of TeX
  • based on TeX programming language
  • Open Source
\documentclass{article}
\begin{document}
  \section{Hello world!}
\end{document}

Two Methods to Convert From XML to TeX

  1. parsing XML with xmltex
  2. generating LaTeX with XSLT (what else, of course?)
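
The second method can be sketched as an XSLT stylesheet with text output. This is a minimal illustration, not the xerif or xml2tex implementation. It assumes a namespace-free BITS `book-part` as the root element; mapping the title to `\chapter` and `italic` to `\emph` is likewise an assumption. LaTeX special characters such as `&` or `%` are not escaped here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- emit plain text instead of XML -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- wrap the book part in a LaTeX document skeleton -->
  <xsl:template match="/book-part">
    <xsl:text>\documentclass{book}&#10;\begin{document}&#10;</xsl:text>
    <xsl:apply-templates select="book-part-meta/title-group/title | body//p"/>
    <xsl:text>\end{document}&#10;</xsl:text>
  </xsl:template>

  <!-- chapter title (assumed mapping) -->
  <xsl:template match="title-group/title">
    <xsl:text>\chapter{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- paragraphs are separated by a blank line -->
  <xsl:template match="p">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- inline emphasis (assumed mapping) -->
  <xsl:template match="italic">
    <xsl:text>\emph{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to run this. A production stylesheet would also need to escape LaTeX special characters and cover the remaining BITS elements.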

Pros and Cons

  • ➕ Open Source, rich ecosystem of packages
  • ➖ not standardized, conflicting packages
  • ➖ lack of XML integration

xerif: typesetting system

  • Automated typesetting system
  • module of le-tex transpect framework
  • implemented in XSLT, XProc, LaTeX

xml2tex

<?xml version="1.0" encoding="UTF-8"?>
<set xmlns="http://transpect.io/xml2tex">  
  <template context="book-part/book-part-meta/title-group/title">
    <rule break-after="2" name="chapter" type="cmd">
      <param/>
    </rule>
  </template>
</set>

CoCoTeX LaTeX framework

\begin{heading}{chapter}
  \tpTitle{Playing with Maps: Cartographic Games in the Western Culture}
\end{heading}

CoCoTeX LaTeX framework

\begin{heading}{chapter}
  \begin{tpAuthor}
    \tpFullName{Seville, Adrian}
    \tpAffil{1}
    \tpEmail{adrian.seville@btopenworld.com}
  \end{tpAuthor}
  \begin{tpAuthor}
    \tpFullName{Depaulis, Thierry}
    \tpAffil{2}
  \end{tpAuthor}
  \begin{tpAuthor}
    \tpFullName{Bekkering, Geert H.}
    \tpAffil{3}
  \end{tpAuthor}
  \tpTitle{Playing with Maps: Cartographic Games in the Western Culture}
  \tpDOI{10.1163/9789004681149\_002}
  \tpTocAuthorNameList{Adrian Seville et al.}
  \tpCopyright{Adrian Seville, Thierry Depaulis and Geert H. Bekkering}
  \tpYear{2023}
  \tpRunTitle{Playing with Maps}
  \tpRunAuthorNameList{Seville et al.}
  \tpAbstract{This is the first academic book wholly devoted to games based on maps. The authors are experts in their respective fields: board games, playing cards and dissected puzzles, bringing an informed historical approach to the development and diffusion of these games in Western Europe and America up to about the beginning of the twentieth century. The book gives an overview of research in this area of material studies, setting it in the context of cultural and cartographic history, while introductory background is given for those new to the field. Collectors and curators will also benefit from the taxonomic approach, particularly in the board game section, assisting in the organisation of this highly diverse material.}
  \tpKeywords{board games – playing cards – dissected maps – history of cartography – cultural history – material culture}
  \begin{tpAffil}
    \tpAffilID{1}
    \tpAffiliation{University of Edinburgh, \uppercase{UK}}
  \end{tpAffil}
  \begin{tpAffil}
    \tpAffilID{2}
    \tpAffiliation{Independent scholar}
  \end{tpAffil}
  \begin{tpAffil}
    \tpAffilID{3}
    \tpAffiliation{University of Utrecht, The Netherlands}
  \end{tpAffil}
\end{heading}

Demo Time!

Outlook

Criteria for Automated Typesetting Systems

  1. typographic quality
  2. XML integration
  3. difficulty
  4. configuration
  5. performance
  6. network effects
  7. vendor lock-in
  8. licensing costs

There is no perfect typesetting technology, only the solution that is most practical for a particular purpose.

Thank you for your attention!