A build system for Web content

· work-in-progress ·

The case has already been made that, to produce great content and integrate it successfully into a website or app, one should start with content during the prototyping phase and build the canvas from the content rather than the other way round. Meanwhile, with the rise of modern build tools for Web development, many options have emerged, and the community has surveyed them extensively, along with their classic alternatives.

For a content designer, it makes sense to put all this together and set out to find a build system that works well for content. Using a content build system should result in improved content workflows that help reduce friction between the work of content editors, designers and developers. It should also help deliver quality products more consistently to the integrators, who will be handed assets that feel substantial and fall into place naturally within the site or app structure.

In my work with scientific or academic organizations, I’ve been led to use different content building workflows depending on the respective practices of the creators on the one hand and the developers or integrators on the other. I will be presenting one possible workflow that targets a static website with technical content, using both venerable tools like make and shiny modern formats like SVG and CommonMark. The emphasis will be on the programmatic techniques that suit text content that is highly structured (with e.g. WAI-ARIA accessibility markup) and finely styled (with modern CSS respecting recommended practices). I will also mention alternatives (for example, Gulp in place of make) in order to show general applicability.

Content acquisition phase

If I am working on a pre-existing website, I make sure to start with the actual content that is already in the wild and has been serving its purpose with real users.

I use an HTML content selection tool to make “atomic” files from the blocks identified as self-contained content units in the original HTML page. This is one of the first steps in my typical makefile.
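To make the extraction step concrete, here is a minimal sketch in Python using only the standard library. It is not the selection tool I actually use; the class and function names, the choice of article and section as atom boundaries, and the atom-NN.html file naming are all assumptions made for illustration. It also ignores special cases such as script or style content.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Assumption: atoms are the outermost <article> or <section> elements.
ATOM_TAGS = {"article", "section"}


class AtomExtractor(HTMLParser):
    """Collect the markup of each outermost atom element as a string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.atoms = []
        self._buffer = []
        self._depth = 0  # how many atom elements are currently open

    def handle_starttag(self, tag, attrs):
        if tag in ATOM_TAGS:
            self._depth += 1
        if self._depth:
            self._buffer.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> are copied once, unchanged.
        if self._depth:
            self._buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._buffer.append(f"</{tag}>")
        if tag in ATOM_TAGS:
            self._depth -= 1
            if self._depth == 0:  # the outermost atom just closed
                self.atoms.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        if self._depth:
            self._buffer.append(escape(data, quote=False))


def extract_atoms(page):
    parser = AtomExtractor()
    parser.feed(page)
    parser.close()
    return parser.atoms


def write_atoms(atoms, out_dir):
    """Write each atom to its own file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, atom in enumerate(atoms, start=1):
        path = out / f"atom-{number:02d}.html"
        path.write_text(atom, encoding="utf-8")
        paths.append(path)
    return paths
```

A makefile rule could then call something like write_atoms(extract_atoms(page), "atoms") on the downloaded page, so that every later step depends on the individual atom files rather than on the whole page.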

Atomic content styling

Because the content atoms have been extracted into separate files, they can be included as individual content frames in my SVG/ODG, PDF or PNG prototype. The software used for graphical prototyping does not come into the scope of the present article per se, but I will provide a more detailed account of it elsewhere. Now this is an important detail: ideally my graphical prototype references the content atoms, so by default they remain as separate files and don’t need to be explicitly re-included if I modify them from outside. This allows me to carry out the content prototyping (acquisition) and the content styling steps concurrently. Right now, LibreOffice (ODG, SVG and PDF), GIMP (PNG) and the TeX system (SVG and PDF) are my tools of choice for that because they have some intrinsic advantages when working with highly-structured HTML, but of course many alternatives exist. You can consider the prototyping tool as another swappable module in the system I’m describing here. In fact, not all projects require these tools for prototyping; I like to prototype directly in HTML and CSS when the context is right.

Disposition in canvas

The definitions of content atom and canvas here will depend on multiple variables given by the domain and the particular project. Typically, a content atom can be an article element if each article is short, but if the whole content run is lengthier each atom might be a section. Now it’s important to remember that, in the modern HTML world-view, section elements are very different from div elements and I believe it is proper to enclose content in as many sections as necessary.

Structuring the content in this way has many advantages over simply relying on the sequence of h1 ... h6 elements dispersed among paragraphs. section elements add paired delimiters that make more relations explicit, which in turn makes contextual styling and transclusion easier. The elusive “HTML outline algorithm” might have been shunned by the mainstream HTML5 implementations, but at any rate nothing prevents us from adding more structure around the parts of content runs! In HTML 5.1, we are even explicitly allowed to add a header element as the first element of any enclosing section; it will automatically receive a different ARIA role from that of the page’s outermost header. This is great for e.g. titling blocks containing ancillary information items that must receive their own scoped style rules.

Beyond the atomic runs, the canvas styling proper refers to the local “narrative” layout, which can be handled by CSS modules such as Flexbox – more on that below.

Content plate build

This is the step that can be automated most easily. Once the assets are ready, the modules can be assembled into what I like to call a plate. I do not say page, because the final HTML page that actually gets published might be handled further downstream depending on the organization’s needs. The plate is the content set in its canvas, bringing with it all of the scoped styles necessary for its full interpretation.

The main tasks here are actually transclusion and validation. Much of the work done here is what is traditionally associated with template processors, server-side includes and suchlike.
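As an illustration of these two tasks, here is a minimal Python sketch. The placeholder syntax (an HTML comment of the form "atom: name"), the function names and the sample markup are hypothetical. The validation shown is only an XML well-formedness check, which assumes XHTML-style markup with every element closed; a real build would more likely hand the plate to a proper HTML validator.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical placeholder syntax: <!-- atom: name -->
PLACEHOLDER = re.compile(r"<!--\s*atom:\s*([\w-]+)\s*-->")


def transclude(canvas, atoms):
    """Replace every placeholder in the canvas with the named atom."""

    def replace(match):
        name = match.group(1)
        if name not in atoms:
            raise KeyError(f"no atom named {name!r}")
        return atoms[name]

    return PLACEHOLDER.sub(replace, canvas)


def is_well_formed(markup):
    """Weak validation: does the plate parse as well-formed XML?"""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


canvas = '<main class="plate"><!-- atom: intro --><!-- atom: notes --></main>'
atoms = {
    "intro": "<section><p>Hello</p></section>",
    "notes": "<section><p>Notes</p></section>",
}
plate = transclude(canvas, atoms)
```

Raising an error on an unknown atom name, instead of leaving the placeholder in place, makes the build fail early, which is the behaviour one usually wants from a make target.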

A “grammar” of prototyping and building

All that we have seen so far is a series of conditional steps, with varying numbers and levels of sub-tasks at each step. There happens to exist a convenient way to represent such processes both schematically and operationally: grammatical forms. Below is a BNF-like grammar that sums up the steps I have described in the previous sections. Following early examples, I use the arrow as the derivation metasymbol, since it works just as well as the established ::= but is arguably clearer. It’s also very enlightening to note that in its original use, BNF encourages natural-language descriptions in the angle brackets and that these variable expressions were originally called classes (in the mathematical sense). In our case, the variables represent classes of tasks.

Web content build → 〈 content acquisition 〉〈 atomic content styling 〉〈 disposition in canvas 〉〈 content plate build 〉;
Content acquisition → 〈 write-up 〉 | 〈 conversion 〉 | 〈 copy 〉;
Atomic content styling → 〈 live CSS attribution 〉 | 〈 graphical sculpting 〉〈 CSS conversion 〉;
Disposition in canvas → 〈 HTML∘CSS composition 〉 | 〈 graphical frame manipulation 〉〈 HTML∘CSS composition 〉;
Content plate build → 〈 atom conclusion 〉〈 canvas hanging 〉〈 validation 〉;
HTML∘CSS composition → 〈 Flexbox composition 〉 | 〈 Grid layout 〉 | 〈 Block positioning 〉;
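To show that such a grammar can be used operationally as well as schematically, here is a minimal Python sketch that encodes the productions as data and enumerates every workflow they allow. The dictionary layout and the derive function are assumptions of mine, not part of any existing tool; class names are lowercased so that both sides of each production use the same key.

```python
from itertools import product

# Each class of tasks maps to its alternatives; each alternative is a sequence.
# Names that do not appear as keys are treated as terminal tasks.
GRAMMAR = {
    "web content build": [
        ["content acquisition", "atomic content styling",
         "disposition in canvas", "content plate build"],
    ],
    "content acquisition": [["write-up"], ["conversion"], ["copy"]],
    "atomic content styling": [
        ["live CSS attribution"],
        ["graphical sculpting", "CSS conversion"],
    ],
    "disposition in canvas": [
        ["HTML∘CSS composition"],
        ["graphical frame manipulation", "HTML∘CSS composition"],
    ],
    "content plate build": [["atom conclusion", "canvas hanging", "validation"]],
    "HTML∘CSS composition": [
        ["Flexbox composition"], ["Grid layout"], ["Block positioning"],
    ],
}


def derive(symbol, grammar=GRAMMAR):
    """Yield every sequence of terminal tasks derivable from a symbol."""
    if symbol not in grammar:  # terminal task: nothing left to expand
        yield [symbol]
        return
    for alternative in grammar[symbol]:
        expansions = [list(derive(part, grammar)) for part in alternative]
        for combination in product(*expansions):
            yield [task for part in combination for task in part]


workflows = list(derive("web content build"))
```

Each item in workflows is one complete path through the build, for example a write-up followed by live CSS attribution, Flexbox composition, atom conclusion, canvas hanging and validation. Such a list can serve as a checklist when deciding which targets a build script needs.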

I use the expression “HTML∘CSS composition”, with a relation composition operator, very much on purpose: in this step, the correct discursive placement of content items respects the standards and best practices, with special regard to the HTML content categories, the use of idiomatic, scoped CSS, and a reasoned separation of concerns between the emphasis structure notation (HTML) and the graphical designation language (CSS). The resulting structure is very much a product type, since the meaning of the content is complete within a given canvas; the emphasis structure and the graphical style make a complete meaningful item.

In that line of thought, I must put emphasis on my deep appreciation for Flexbox. I have written Flexbox composition because I think the Flexbox module is the most appropriate so far for logical composition. It encourages one to make content structures that can show their inner relations by way of the style rules themselves. In much the same way that Python is sometimes called executable pseudocode, I would be tempted to call Flexbox an interpretable modular style guide.

In the content plate build phase, the first item in the sequence is named atom conclusion, where conclusion has its etymological sense. What I mean to convey is that node sequences, considered as sets, will be both included within other sets and concatenated into a canvas that can then be apprehended graphically (a visual sum). Here again I playfully use theoretical vocabulary in what is nonetheless a reasoned way.

Depending on the tools at each step, conversions might happen multiple times both to and from HTML. The rule that I set for myself is: though some format pairings make round-trip conversions impractical or downright impossible, all data items in the source should appear in some form in the destination; if a data item cannot be sensibly reused, that raises the question of whether that data item was correctly encoded to begin with.
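This rule can be partly checked by a script. The Python sketch below is one possible approach under simplifying assumptions: a “data item” is taken to be either a text node or the value of a few chosen attributes, and “appearing in some form” is reduced to a plain substring test on the converted text. The names and the attribute list are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these attributes carry data worth preserving across conversions.
DATA_ATTRIBUTES = {"href", "src", "alt", "title"}


class DataItemCollector(HTMLParser):
    """Collect text nodes and selected attribute values from HTML."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in DATA_ATTRIBUTES and value:
                self.items.append(value)

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text:
            self.items.append(text)


def missing_items(source_html, destination):
    """Return the source data items that cannot be found in the destination."""
    collector = DataItemCollector()
    collector.feed(source_html)
    collector.close()
    flat = " ".join(destination.split())
    return [item for item in collector.items if item not in flat]


source = ('<p>See the <a href="https://example.org/spec" '
          'title="Full text">specification</a>.</p>')
converted = "See the [specification](https://example.org/spec)."
lost = missing_items(source, converted)
```

Here the link title is the only item the conversion dropped, so it is the one to question: either the converter should carry it over, or it did not belong in the source in the first place.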

Not all class variables are fully derived to terminal steps; they could be, but that is not the main point. The main point is that this description drives further improvements to the workflow and, correspondingly, the build script (makefile or otherwise). Thanks to that, I can leave you with both a concrete tool and a more abstract methodology that can be usefully adapted to many domains.