LaTeX4Web: a simple LATEX TO HTML converter (javascript)

This is the LaTeX4Web (LaTeX to HTML converter) page on Eric Chopin's web site.

LaTeX4Web 1.4: a LATEX TO HTML converter (javascript) released 20150120

SEE THE README FILE USER GUIDE / TUTORIAL

Performance was tested with my latest publication (source (67kb, 20 pages) | HTML output | original pdf):
the conversion takes about 500 s on a Pentium II 300 MHz with 64 MB of RAM, and about 45 s on a Pentium 4 2.2 GHz with 512 MB of RAM.
v1.2 has also been tested with another publication (source (72kb, 27 pages, 1 image) | HTML output | original pdf).
Download the full package (compressed with WinZip) | v1.2 | v1.1 | v1.0
Download the full package (.tar for Unix/Linux) | v1.2 | v1.1 | v1.0

This page contains JavaScript code that converts TeX and LaTeX mathematical formulas (thanks, Mr. Knuth!) into HTML code that you can preview in a separate browser window. The code recognizes only a small subset of LaTeX commands (a bit fewer than 200 so far); see the tutorial and the index of commands with basic samples. All the JavaScript code runs on your computer, so once the page is loaded you do not need an internet connection to use the program. Just save the full page (choose "Save As..." in the "File" menu of your browser). Older browsers save only the main HTML file and not the dependent files; in that case, download the full package using the links above to get the whole program and the tutorial web page.

This script is based on a deterministic finite automaton (DFA), and its great advantage is that you can easily customize the HTML output if you don't like mine. If you want to convert big LaTeX files with a correct interpretation of most existing TeX/LaTeX commands, check the TTH program, which is written in C and works well under Linux or Windows. TTH is much more complex, and you need to recompile it if you want to change the HTML rendering. In my script as well as in TTH, adding new LaTeX commands that trigger actions is not very easy, since you have to rebuild the tables used by the automaton. In my case it is a bit simpler because the main transition table is not compressed, whereas TTH was built using the FLEX scanner generator (not to be confused with the FLEX java compiler, which usurped the name and has nothing to do with it).
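As a rough illustration of the table-driven approach described above, here is a minimal sketch of a longest-match tokenizer driven by an uncompressed transition table. All names (TOKENS, buildTable, nextToken, convert, HTML) and the token ids are hypothetical and are not taken from the LaTeX4Web source; the real tables are much larger. The point is that the HTML rendering lives in a separate lookup, so it can be changed without touching the automaton.

```javascript
// Hypothetical sketch: a tiny DFA that recognizes a few LaTeX tokens.
// State 0 is the "null" state: no token is in progress.
const TOKENS = { "\\int": 1, "\\in": 2, "\\alpha": 3 };

// Build an uncompressed transition table:
// transitions[state][char] -> next state, accepting[state] -> token id (0 = none).
function buildTable(tokens) {
  const transitions = [{}];
  const accepting = [0];
  for (const [text, id] of Object.entries(tokens)) {
    let state = 0;
    for (const ch of text) {
      if (transitions[state][ch] === undefined) {
        transitions.push({});
        accepting.push(0);
        transitions[state][ch] = transitions.length - 1;
      }
      state = transitions[state][ch];
    }
    accepting[state] = id;
  }
  return { transitions, accepting };
}

// Return the longest token starting at pos, or null if none matches.
function nextToken(table, input, pos) {
  let state = 0;
  let best = null;
  for (let i = pos; i < input.length; i++) {
    const next = table.transitions[state][input[i]];
    if (next === undefined) break;
    state = next;
    if (table.accepting[state] !== 0) {
      best = { id: table.accepting[state], end: i + 1 };
    }
  }
  return best;
}

// Copy unrecognized characters through; replace tokens using a render callback.
function convert(input, render) {
  const table = buildTable(TOKENS);
  let out = "";
  let pos = 0;
  while (pos < input.length) {
    const tok = nextToken(table, input, pos);
    if (tok) {
      out += render(tok.id);
      pos = tok.end;
    } else {
      out += input[pos];
      pos += 1;
    }
  }
  return out;
}

// The HTML for each token id is kept apart from the automaton.
const HTML = { 1: "&int;", 2: "&isin;", 3: "&alpha;" };
console.log(convert("x \\in A, \\int \\alpha", (id) => HTML[id]));
```

In this sketch, adding a token means rebuilding the table, which mirrors the limitation mentioned above; changing the HTML output only means editing the lookup passed to convert.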

To improve performance you can do something very simple. I have tried to produce HTML code with clean indentation, but this costs a lot of spaces and unnecessary carriage returns. You can greatly reduce the size of the output by removing these spaces (and the calls to the Spaces() function) and many of the Windows line breaks ("\r\n") in the code of this page.
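The same effect can be approximated after the fact by post-processing the generated HTML. The following is only a sketch: compressOutput is a hypothetical helper, not a function of the script, and it assumes that no whitespace-sensitive content (such as a PRE block or words separated only by a line break) appears in the output.

```javascript
// Hypothetical helper: remove the indentation and Windows line breaks
// found in a pretty-printed HTML output.
function compressOutput(html) {
  return html
    .replace(/\r\n[ \t]*/g, "")  // drop each "\r\n" and the indentation after it
    .replace(/^[ \t]+/, "");     // drop indentation on the very first line
}

const pretty = "<table>\r\n  <tr>\r\n    <td>x</td>\r\n  </tr>\r\n</table>\r\n";
const compact = compressOutput(pretty);
console.log(compact);
console.log(pretty.length + " -> " + compact.length + " characters");
```

Removing the Spaces() calls at the source, as suggested above, avoids generating the extra characters in the first place and is likely the cheaper option.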

This "software" is provided with no warranty of any kind. It may contain bugs (feel free to send me feedback when you find some). It is totally free and open source (like any other javascript), so you can customize or improve it if you want.

This script is referenced on ScriptSearch.com

CHANGES IN VERSION 1.4 FROM VERSION 1.3

You can now add new tokens recognized by the DFA. When you click on "Re-generate DFA", an additional component computes the DFA arrays using LaTeX_tok.js as input. If you add new tokens, this component writes to the output box the definitions of the 3 arrays needed by the DFA, which must then be copied into the LaTeX_asc.js, LaTeX_acc.js and LaTeX_dfa_comp.js files.
Bug fix: in an expression like \int du_1, the _1 was interpreted as the lower bound of the integral.

CHANGES IN VERSION 1.3 FROM VERSION 1.2

The transition table is now stored in a compressed format and uncompressed when you load the page. The file latex_dfa.js is now only 9kb instead of 88kb, which shortens the download time if you have limited bandwidth but has no effect on execution time.
The HTML output of the main release can now be compressed (click on "Compress Output"): the indentation of the HTML tags, which consumes a lot of bytes, is removed (typically you gain 25% on the output size). I still develop new releases from the version that includes the indentation, because the script is easier to debug with a clean output.
Bug fix: if the last token was followed by only 1 character, the first char of the token was appended to the output. The cause was a bad calculation of the CurPos variable in the GetNextToken function when the end of the input text is reached. This is corrected.
Underscores are no longer treated as subscripts outside of math mode. This allows free use of underscores, for instance in image file names, as in <IMG src="File_Name.jpg">.
Added aliases to support \centerline and \newpage.
Added aliases to support accented letters, as in French, that is, code like \'e or \`u.
Style correction for exponents following a \rightX, where X can be ], }, ) or |. An exponent following such a right delimiter must be placed in a table cell that is vertically aligned to the top.
Added support for Unicode letters (Cyrillic, Japanese, etc.). These letters always have transitions to the Null state (StateId=0), so they are simply transferred to the output without parsing (thanks to Ilia Kantor for this suggestion, among others).
The display of \oint, which was not correct, is fixed by applying the same rules as for \int regarding the display of lower/upper bounds.
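To make the first item of this list more concrete, here is a sketch of how a transition table row can be stored compressed and expanded once at load time. The actual format used in latex_dfa.js is not described on this page; the run-length encoding below, and the names rleCompress and rleExpand, are assumptions chosen only for illustration.

```javascript
// Assumed format (not the script's real one): a flat array of
// count, value, count, value, ... pairs.
function rleCompress(row) {
  const out = [];
  for (const value of row) {
    const last = out.length - 2;
    if (out.length > 0 && out[last + 1] === value) {
      out[last] += 1;          // extend the current run
    } else {
      out.push(1, value);      // start a new run
    }
  }
  return out;
}

function rleExpand(packed) {
  const row = [];
  for (let i = 0; i < packed.length; i += 2) {
    for (let n = 0; n < packed[i]; n++) row.push(packed[i + 1]);
  }
  return row;
}

// A transition row is mostly zeros (the Null state), so it packs well.
const row = new Array(256).fill(0);
row["i".charCodeAt(0)] = 7;    // hypothetical transition on the letter "i"
const packed = rleCompress(row);
console.log(packed);
console.log(rleExpand(packed).length);
```

Because the expansion runs once when the page loads, the lookups performed during conversion work on the same full table as before, which is consistent with the remark that compression does not change execution time.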

CHANGES IN VERSION 1.2 FROM VERSION 1.1

\section, \subsection and \subsubsection are now implemented, and a table of contents is automatically added at the end. The TOC HTML code is stored in a variable named g_TOC, which is appended at the end of the text but can alternatively be placed at the beginning, or in a popup if you add suitable JavaScript to control the main document from the table of contents HTML code.
Added \sla to aliases, to put a slash on the preceding symbol, in latex_aliases.js
\begin{array}...\end{array} is now implemented. The alignment parameters that optionally follow the \begin{array} token are ignored (too complex to be managed here).
Equation numbers are now placed in a new cell of an equation array, exactly as if they were an expression. Previously the HTML code for the equation number was simply appended to the current expression in the current equation array cell, so if the last cell contained a matrix, the equation number appeared under the matrix instead of to its right.
Added support for syntax like \bar{...} \vec{...} \tilde{...} \hat{...} (in previous versions the software could parse \bar X or \bar\lambda but not \bar{\bf p}).
Implementation of \left and \right, needed for the array environment. It produces a square bracket or a vertical bar whatever the following delimiter is, except if it is a dot (in this case it does nothing).
Added support for \begin{itemize}..\end{itemize} and \begin{enumerate}..\end{enumerate} environments, using regular expressions (DFA not involved).
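The table of contents mechanism from the first item of this list can be sketched as follows. This is not the script's actual code: convertSections, the anchor naming, and the heading tags are hypothetical, and section titles containing nested braces are not handled. Only the idea of collecting the TOC in a separate string (the role g_TOC plays in the script) comes from the description above.

```javascript
// Hypothetical sketch: number the headings, give each one an anchor, and
// collect the table of contents in a separate string.
function convertSections(src) {
  const levels = { section: 0, subsection: 1, subsubsection: 2 };
  const counters = [0, 0, 0];
  let toc = "";
  const body = src.replace(
    /\\(section|subsection|subsubsection)\{([^}]*)\}/g,
    (match, kind, title) => {
      const depth = levels[kind];
      counters[depth] += 1;
      // A new heading resets the counters of all deeper levels.
      for (let d = depth + 1; d < counters.length; d++) counters[d] = 0;
      const number = counters.slice(0, depth + 1).join(".");
      const anchor = "sec" + number;
      toc += '<div style="margin-left:' + depth + 'em">' +
             '<a href="#' + anchor + '">' + number + " " + title + "</a></div>\n";
      const tag = "h" + (depth + 2);
      return "<" + tag + ' id="' + anchor + '">' + number + " " + title + "</" + tag + ">";
    }
  );
  return { body, toc };
}

const result = convertSections("\\section{Intro}text\\subsection{Goal}more");
console.log(result.body + "\n" + result.toc);  // TOC appended at the end
console.log(result.toc + "\n" + result.body);  // or placed at the beginning
```

Keeping the TOC in its own string is what makes the placement a one-line decision, as the two console.log calls show.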

CHANGES IN VERSION 1.1 FROM VERSION 1.0

\label and \ref bug fix. If the anchor keyword used as the parameter of \label or \ref contained a managed token (like "-" or "<"), the anchor was split into parts and the cross-references were set incorrectly. This is fixed: the anchor is no longer parsed using the DFA; all the characters between \label{ or \ref{ and the closing "}" character are extracted as a whole from the source to set the anchor tag.
The \tilde command was not correctly implemented; this is now fixed.
\tilde, \hat or \bar followed by a token like \lambda was not rendered correctly. The tilde was placed before the token, whereas it must be above it. This is now fixed.
The background color of the output was changed to the same color as my web site.
Bug fix: \i was not recognized because the transition table was buggy, due to an error in the algorithm I used to generate it. This is now fixed.
The DFA now supports the following additional tokens:
\cite, \bibitem, \section, \subsection, \subsubsection, \footnote. However they are not implemented yet in the main script. For this release, these tokens are still interpreted using regular expressions. But if you want to customize the script and implement these tokens, the DFA is ready and the token ids are 192, 193, 194, 195, 196, 197 respectively.
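The \label and \ref fix in the first item of this list amounts to taking the anchor text as a whole instead of tokenizing it. A minimal sketch of that idea, using regular expressions, is shown here; convertCrossRefs and escapeHtml are hypothetical names, and the exact HTML that the script emits for anchors and references is not documented on this page.

```javascript
// Escape characters that are special in HTML text.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical sketch: everything between "\label{" or "\ref{" and the next
// "}" is taken as one anchor name, so "-" or "<" inside it cannot split it.
function convertCrossRefs(src) {
  return src
    .replace(/\\label\{([^}]*)\}/g, (match, key) =>
      '<a name="' + encodeURIComponent(key) + '"></a>')
    .replace(/\\ref\{([^}]*)\}/g, (match, key) =>
      '<a href="#' + encodeURIComponent(key) + '">[' + escapeHtml(key) + "]</a>");
}

console.log(convertCrossRefs("\\label{eq-1<2} see \\ref{eq-1<2}"));
```

Running this step before the DFA pass would keep the automaton from ever seeing the anchor text, which is one way to obtain the behavior described above.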

TODO LIST for future releases

Is \matrix{...} really useful (it does not really exist in LaTeX)? It may be replaced by another command later. Any suggestions are welcome.
Add support for \index{...} and generate the index table.
Implementation of \overbrace and \underbrace (low priority)
Implementation of \Big (low priority)
Buffer the output to get better performance
Implement stacks for opening delimiters and for the HTML output of nested groups. Sometimes the size of a left delimiter is determined by the content that follows it. Drawing parentheses or brackets of different sizes depending on the nesting level of the current text would require a stack of left delimiters and a stack for the parsed output at each nesting level. When a group is closed, one looks in the stacks and appends the newly generated HTML code to the HTML code of the previous nesting level. This will not be easy to implement, especially if output buffering is also implemented. It will probably be for version 2.0 or later.
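The stack idea in the last item can be sketched on a toy input made only of parentheses and plain characters. Nothing here exists in the script: renderNested, the font-size rule, and the SPAN markup are assumptions used to show how one output buffer per nesting level lets the size of a delimiter depend on what it encloses.

```javascript
// Hypothetical sketch: one output buffer per nesting level. When a group
// closes, its delimiter size is chosen from the depth of its content, and
// the finished HTML is appended to the buffer one level up.
function renderNested(input) {
  const buffers = [[]];  // stack of output buffers (arrays of strings)
  const depths = [0];    // deepest nesting seen inside each open group
  for (const ch of input) {
    if (ch === "(") {
      buffers.push([]);
      depths.push(0);
    } else if (ch === ")") {
      if (buffers.length === 1) throw new Error("unbalanced ')'");
      const inner = buffers.pop().join("");
      const innerDepth = depths.pop();
      const size = 100 + 20 * innerDepth;  // percent; grows with nested content
      const open = '<span style="font-size:' + size + '%">(</span>';
      const close = '<span style="font-size:' + size + '%">)</span>';
      buffers[buffers.length - 1].push(open + inner + close);
      const top = depths.length - 1;
      depths[top] = Math.max(depths[top], innerDepth + 1);
    } else {
      buffers[buffers.length - 1].push(ch);
    }
  }
  if (buffers.length !== 1) throw new Error("unbalanced '('");
  return buffers[0].join("");
}

console.log(renderNested("((a)b)"));
```

Collecting pieces in arrays and joining them once per group also gives a simple form of output buffering, so the two TODO items need not conflict in a design like this one.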