Getting LaTeX on to the web

Introduction

We have had quite a few questions on TeX.SX that reduce to:

How do I put LaTeX on the web?

As it stands, this is too broad and too vague to be answered, with the result that many of our answers to this question are not very focussed (and therefore not all that much use). It can mean everything from:

How do I take a LaTeX document that I already have and put it on the web so that it looks exactly the same? (Answer: link to the PDF.)

to:

How do I get the equation E = m c² to appear on my webpage? (Answer: <i>E = m c<sup>2</sup></i>.)

At this breadth, the question really is unanswerable. However, it can be narrowed to the point where, whilst still too broad for the TeX.SX site, it is suitable for a blog post. Hence this one. That question is:

How can I use my LaTeX skills to produce web pages?

The Question

Let me start by expanding that question, in particular to differentiate it from the two variants given above.

What I mean by this question is that I want to write a document with the intention from the outset that it be a web page. I also know what system I am going to use to serve the web page, be it as a raw (X)HTML file, a WordPress post, or a page on some wiki. The system puts certain limitations on what I can do with my document, but I know what those limitations are at the outset. Thus I do not want to replicate all of LaTeX in, say, a WordPress post. Rather, I want to be able to do everything I could in a WordPress post by using LaTeX.

I also, for the sake of this post, am assuming that I’ll be doing a lot of these posts. So I do not want to do away with the conveniences of LaTeX – in particular, its programmable side. In fact, I’ll make that a key requirement: I want to be able to do true macro expansion. If I just wanted to get E = m c² to appear, then I could accept just about any solution since I’m only going to use it once. But by the time I’ve written \mathbb{R} as many times as I have, then the relief at being able to write \newcommand{\R}{\mathbb{R}} is unimaginable. Indeed, the vast majority of the features that TeX has are rendered meaningless by the fact that I, as author, have no power to force the output to appear in a particular fashion (I can suggest, but that is different). Kerning, ligatures, hyphenation, all of these are at the mercy of the browser. It is that LaTeX is familiar to me and that I can customise it that make me want to use it to produce webpages. So my system must not cripple either of these features.

Framing the Solution

Now, if I were going to produce HTML, or even XHTML+MathML, then the amazing program TeX4HT would be perfect. But I also want to be able to author blog posts, nLab pages, and StackExchange answers. For those, I need a LaTeX to some-variant-of-Markdown converter. Other systems might need other output formats. There are programs, such as pandoc, which convert a subset of LaTeX to some of these formats, but these tend to be written as parsers and so can run into difficulty when coping with the full extensibility of LaTeX’s macro capability. Also, for many of these converters, “LaTeX” is often confused with “Mathematics” and the focus is often to get the Mathematics right, leaving the rest behind.

A quote that I’ve seen a fair bit on the TeX.SX site is the following:

The only thing that can understand TeX is tex itself.

Indeed, having tried writing TeX-parsers in both PHP and Perl, I’m convinced of the truth of this. So what I’m really after is some way to convince LaTeX itself to produce output suitable for inputting to WordPress, Instiki, or whatever.

Now, one major trap to avoid here is to think that the solution is to write a LaTeX-plugin for, say, WordPress so that I could simply cut-and-paste my LaTeX source in to WordPress. This would be wrong for many reasons, the biggest being that the conversion from source to target format is made by WordPress when the page is rendered. So having a true LaTeX-to-HTML converter at that stage would be slow and prone to error (imagine what would happen if I inadvertently wrote \newcommand{\a}{\a}\a). It would also mean that the WordPress installation had to depend on external programs, something that systems administrators Don’t Like. The fact that many systems actually use this idea is more of an indication of the paucity of alternatives than a demonstration of how good it is.

The other trap to avoid is of using something like TeX4HT to produce the HTML and paste that straight in to WordPress. This makes it harder to edit posts, which may not be a tremendous burden on WordPress but would kill half the benefits of using a wiki. Moreover, Markdown (and similar) provides a neat buffer against validation. By entering Markdown, I don’t have to worry about making sure that my post is valid HTML. If I produce the HTML myself, then I do. And woe betide me if I ever decide to switch from HTML to XHTML!

Describing the Solution

Thus we have the final scenario. I wish to write a post, say on WordPress, as a LaTeX document, using all the benefits of LaTeX, but knowing all along that my document will be put into WordPress. I shall then compile it with LaTeX and, somehow, produce a text file that I can cut-and-paste in to WordPress. WordPress will then do its magic on the post, producing lovely HTML which you can all then read.

Turns out the only difficult part of this is getting LaTeX to produce text output. It can’t, or at least it can’t without great difficulty. But, as Aditya points out in the comments to my question on this, the program pdftotext works wonderfully. Of course, there are hiccups along the way (working out how to deal with the placement of new lines being one, figuring out how to keep the double quotes mechanism being another), but by and large, making LaTeX in to a preprocessor for Markdown isn’t all that hard.
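To make the route through pdftotext concrete, here is a minimal sketch of what such a source file might look like. It is not the setup used for this post: the file name post.tex, the macro \name, and the particular layout choices (one very tall page, no page numbers, a monospaced font) are all assumptions made for illustration.

```latex
% post.tex (hypothetical file name)
\documentclass{article}
\usepackage[T1]{fontenc}
% Assumption: one very tall page, so the text output has no page breaks in it.
\usepackage[paperheight=200cm, margin=1cm]{geometry}
\pagestyle{empty}        % no page numbers to leak into the text file
\raggedright             % no justification
\hyphenpenalty=10000     % discourage hyphenation; the browser handles line breaks
\renewcommand{\familydefault}{\ttdefault} % monospaced font for plain characters

% An ordinary macro (hypothetical): LaTeX expands it before any text is extracted.
\newcommand{\name}{World}

\begin{document}
\# Hello, \name

This is *Markdown* that was generated by LaTeX.
\end{document}

% Then, on the command line (assuming pdflatex and pdftotext are installed):
%   pdflatex post.tex
%   pdftotext -layout -nopgbrk post.pdf post.txt
```

The point of the design is that LaTeX does all of the macro expansion and pdftotext only has to recover characters, so the Markdown markers are typed as ordinary text and pass through untouched.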

What is slightly more intricate is making LaTeX in to a preprocessor for Markdown+Mathematics. Nearly all of the ways of getting mathematics on to the web start with some sort of LaTeX-like syntax. So when getting mathematics on to the web, one has to allow certain commands to go through. It takes a little work to do this properly, particularly as the arguments also have to go through as arguments to the escaped commands, but again: it can be made to work. (The techniques for this, by the way, could be used to turn LaTeX in to a LaTeX preprocessor, selectively expanding certain macros and letting others through.)
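The idea of expanding some macros while letting others through can be sketched as follows. This is an illustration under stated assumptions, not the code behind this post: \webmath is a hypothetical name, and only \mathbb is protected here, whereas a real setup would need a longer list of protected commands.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}

% A personal shortcut that should be expanded before the text reaches the web.
\newcommand{\R}{\mathbb{R}}

% Hypothetical helper: expand the author's macros, but print the remaining
% commands as literal source between dollar signs.
\newcommand{\webmath}[1]{%
  \begingroup
  \let\mathbb\relax        % unexpandable inside \edef, so \mathbb survives as-is
  \edef\x{#1}%             % \R is expanded here, \mathbb is not
  \texttt{\$\detokenize\expandafter{\x}\$}% print the result as plain characters
  \endgroup}

\begin{document}
A function on \webmath{\R^n}.
\end{document}
```

With these definitions, the intent is that the PDF contains the literal characters of \mathbb{R}^n between dollar signs (\detokenize inserts a space after the control word), ready to be picked up by whatever renders the mathematics on the web side.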

Is what I’ve described possible? Does it work? Well, the proof of the pudding, as they say, is in the reading. After all, how do you think I wrote this post?

Epilogue

After writing this post, I cut-and-pasted the resulting text file in to the relevant place in WordPress and previewed the page. Very oddly, the quotations weren’t coming through correctly. Something in WordPress was blocking me from using the > quotation syntax. One solution would be to fix the code, as it’s an odd thing to have happen. But that looks a little tricky, to say the least. Fortunately I can still use the <blockquote> HTML environment. In an ordinary setting, I would then have to go through my post changing all instances of > something to <blockquote>something</blockquote>. However, all I needed to do was add the following code at the top of my source:

\renewenvironment{quote}{\html{blockquote}}{\endhtml}
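The definitions of \html and \endhtml are not shown here. For readers who want to experiment, one way such a pair could be written is as an environment taking the tag name as its argument. The following is a guess at the shape, not the actual implementation, and \currenthtmltag is a hypothetical helper name.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}

% Hypothetical stand-in: \newenvironment{html} defines both \html and \endhtml.
\newenvironment{html}[1]{%
  \def\currenthtmltag{#1}% remember the tag name for the closing tag
  \par\texttt{<#1>}\par
}{%
  \par\texttt{</\currenthtmltag>}\par
}

% The line from the post, now backed by the stand-in above.
\renewenvironment{quote}{\html{blockquote}}{\endhtml}

\begin{document}
\begin{quote}
The only thing that can understand TeX is tex itself.
\end{quote}
\end{document}
```

Because \begin{quote} opens a group, the stored tag name is local to that quotation, so nested or successive uses with different tags do not interfere with one another.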

3 thoughts

  1. At the moment, I’d say it’s an “alpha” package. It is certainly usable and I’m using it to write blog posts (all of my contributions here) and nLab pages (and on other Instiki installations), and parts of my website. But I frequently encounter things that need tweaking so it’ll be a while before I consider it good for general release. If you’re interested in trying it out, send me an email (I’m fairly easy to find).

  2. Yeah this seems really interesting. Could you post it on github or as a Gist somewhere for people to view? I’d be interested
