LaTeX4Web 1.4: a LATEX TO HTML converter (javascript) released 20150120
SEE THE README FILE USER GUIDE / TUTORIAL
Performance tested with my latest publication (source (67kb, 20 pages) | HTML output | original pdf):
it takes about 500 s on a Pentium II 300 MHz with 64 MB of RAM, and 45 s on a Pentium 4 2.2 GHz with 512 MB of RAM
v1.2 has also been tested with another publication (source (72kb, 27 pages, 1 image) | HTML output | original pdf):
Download the full package (compressed with winzip) | v1.2| v1.1| v1.0
Download the full package (.tar for unix/linux) | v1.2| v1.1| v1.0
This page contains JavaScript code that converts TeX and
LaTeX mathematical formulas (thanks, Mr. Knuth!)
into HTML code that you can preview in a separate browser window. The code only
recognizes a small subset of LaTeX commands (a bit fewer than 200 so far). See the
tutorial and index of commands with basic samples. Note that
all the JavaScript code runs on your computer, so once the page is loaded there is no need
to be connected to the internet to use this program. Just save the full page (open the "File" menu
in your browser and choose "Save As..."). Older browsers save only the main HTML
and not the dependent files. In that case, just download the full package with the
links above; you'll get the whole program and the tutorial web page.
This script is built on a deterministic finite automaton (DFA), and its great advantage is that
you can easily customize the HTML output if you don't like mine. If you want to convert
big LaTeX files with a correct interpretation of most existing TeX/LaTeX commands,
check the TTH program,
which is written in C and works well under Linux or
Windows. TTH is much more complex, and you'll need to recompile it if you want to change
the HTML rendering. In my script as well as in TTH, adding new LaTeX commands that trigger
some actions is not very easy, since you have to rebuild the tables used by the automaton.
In my case it is a bit simpler since the main transition table is not compressed, whereas TTH was built
using the FLEX scanner generator
(not to be confused with the FLEX java compiler, which usurped the name and has nothing to do
with it).
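As an illustration only, a table-driven scanner of this kind can be sketched as below. This is not the actual LaTeX4Web code: the three-token table and the names TRANSITIONS, ACCEPT, nextToken and tokenize are hypothetical, chosen to show how a transition table drives longest-match token recognition and how unmatched characters pass straight through to the output.

```javascript
// Hypothetical miniature DFA recognising \int, \in and \pi (longest match wins).
// TRANSITIONS[state][char] gives the next state; a missing entry means "no transition".
const TRANSITIONS = {
  1: { "\\": 2 },
  2: { i: 3, p: 6 },
  3: { n: 4 },
  4: { t: 5 },
  6: { i: 7 },
};
// ACCEPT[state] is the token name recognised when the scanner stops in that state.
const ACCEPT = { 4: "IN", 5: "INT", 7: "PI" };
const START = 1;

// Returns { token, text, next }: the longest token starting at pos,
// or a single raw character (token === null) when nothing matches.
function nextToken(input, pos) {
  let state = START;
  let lastAccept = null;
  let lastEnd = pos;
  for (let i = pos; i < input.length; i++) {
    const row = TRANSITIONS[state];
    const target = row ? row[input[i]] : undefined;
    if (target === undefined) break; // dead end: stop scanning
    state = target;
    if (ACCEPT[state] !== undefined) {
      lastAccept = ACCEPT[state];    // remember the longest match so far
      lastEnd = i + 1;
    }
  }
  if (lastAccept === null) {
    return { token: null, text: input[pos], next: pos + 1 };
  }
  return { token: lastAccept, text: input.slice(pos, lastEnd), next: lastEnd };
}

// Replaces each recognised token by <NAME> and copies everything else unchanged.
function tokenize(input) {
  const out = [];
  let pos = 0;
  while (pos < input.length) {
    const t = nextToken(input, pos);
    out.push(t.token === null ? t.text : "<" + t.token + ">");
    pos = t.next;
  }
  return out.join("");
}
```

Customizing the output then only means changing what tokenize emits for each token name; the tables stay untouched as long as no new token is added.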
To improve performance you can do something very simple. I have tried to produce
HTML code with clean indentation, but this consumes a lot of spaces and unnecessary
carriage returns. You can greatly reduce the size of the output by removing these
spaces (and the calls to the Spaces() function) and many of the Windows line endings ( "\r\n" )
in the code of this page.
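The same effect can also be approximated after the fact, on the generated HTML. The sketch below is an assumption and not part of the script: compressHtml is a hypothetical helper, and it presumes that every output line holds complete tags or cells, so that joining lines cannot glue two words together (it would not be safe for text wrapped across lines or for <pre> content).

```javascript
// Hypothetical post-processing step: drop indentation and line breaks between tags.
function compressHtml(html) {
  return html
    .split(/\r?\n/)                     // split on Windows or Unix line endings
    .map((line) => line.trim())         // remove the indentation spaces
    .filter((line) => line.length > 0)  // drop lines that are now empty
    .join("");
}
```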
This "software" is given with no warranty of any kind. It may contain some bugs (feel free
to give me some feedback when you find some). It is totally free and open source
(like any other javascript), so that you can customize it or improve it if you want.
This script is referenced on ScriptSearch.com
CHANGES IN VERSION 1.4 FROM VERSION 1.3
- You can now add new tokens to be recognized by the DFA.
When you click on "Re-generate DFA", an additional component computes the DFA arrays
using LaTeX_tok.js as input. So if you add new tokens,
the new component writes to the output box the definitions of the 3 arrays needed by the DFA, which
must be copied into the LaTeX_asc.js, LaTeX_acc.js and LaTeX_dfa_comp.js
files.
- Bug fix: in an expression like \int du_1, the _1 was interpreted as the lower bound of the integral.
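To give an idea of what regenerating the DFA involves, here is a minimal sketch that builds transition and accept tables from a token list by growing a trie. It is an assumption about the general technique, not the real generator: buildDfa and lookup are hypothetical names, and the layout of the three arrays actually written to the output box is not reproduced here. The token ids in the usage line are the ones listed in the version 1.1 notes.

```javascript
// Hypothetical generator: builds DFA tables from a list of [text, tokenId] pairs.
// State 0 is reserved as the null state, state 1 is the start state.
function buildDfa(tokens) {
  const transitions = [new Map(), new Map()]; // index = state id
  const accept = [0, 0];                       // 0 = not an accepting state
  for (const [text, id] of tokens) {
    let state = 1;
    for (const ch of text) {
      if (!transitions[state].has(ch)) {
        // No path yet for this character: allocate a new state.
        transitions.push(new Map());
        accept.push(0);
        transitions[state].set(ch, transitions.length - 1);
      }
      state = transitions[state].get(ch);
    }
    accept[state] = id; // the last state of the path recognises this token
  }
  return { transitions, accept };
}

// Follows the tables from the start state; returns the token id, or 0 if none.
function lookup(dfa, text) {
  let state = 1;
  for (const ch of text) {
    const next = dfa.transitions[state].get(ch);
    if (next === undefined) return 0;
    state = next;
  }
  return dfa.accept[state];
}

const demoDfa = buildDfa([["\\cite", 192], ["\\section", 194], ["\\subsection", 195]]);
```

Tokens that share a prefix (here \section and \subsection share "\s") share the corresponding states, which is why adding one token can change the tables in several places and they have to be regenerated as a whole.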
CHANGES IN VERSION 1.3 FROM VERSION 1.2
- The transition table is now stored in a compressed format and uncompressed when you
load the page. The file latex_dfa.js is now only 9kb instead of 88kb, which
shortens the download time if you have low bandwidth, but has no effect on execution time.
- The HTML output of the main release can now be compressed (click on "Compress Output"),
that is to say, the HTML tags are no longer indented;
the indentation consumes a lot of bytes (typically you gain 25% on the output size).
I still develop new releases from the version that includes the indentation,
because the script is easier to debug with a clean output.
- Bug fix: if the last token was followed by only 1 character, the first character of the token
was appended to the output. The cause was a wrong calculation of the CurPos variable in the GetNextToken function
when the end of the input text is reached. This is corrected.
- Underscores are no longer treated as subscripts outside of math mode. This lets you use
underscores freely, for instance in image file names as in <IMG src="File_Name.jpg">
- Added aliases to support \centerline and \newpage
- Added aliases to support accented letters as used in French, that is, code like
\'e or \`u etc.
- Style correction of exponents following a \rightX where X can be ], }, ) or |. An exponent following such
a right delimiter must be placed in a table cell that is vertically aligned to the top.
- Added support for Unicode letters (Cyrillic, Japanese, etc.). These letters simply always
have transitions to the Null state (StateId=0), so that they are transferred to the output
without parsing (thanks to Ilia Kantor for this suggestion, among others).
- Display of \oint, which was not correct, is fixed, applying the same rules as for \int regarding
lower/upper bound display.
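The accent aliases mentioned in the list above can be pictured with the following sketch. It is only an assumption about one possible approach: ACCENTS and convertAccents are hypothetical names, the regular expression covers vowels only, and the real aliases may be defined quite differently. The HTML entity names themselves (&eacute;, &ugrave;, ...) are standard.

```javascript
// Hypothetical alias table: TeX accent command -> HTML entity suffix.
const ACCENTS = { "'": "acute", "`": "grave", "^": "circ", '"': "uml" };

// Replaces \'e, \`u, \^o, \"i (braced or not) by the matching HTML entity.
// The braced and bare forms are separate alternatives so that a closing brace
// belonging to an enclosing group is never swallowed.
function convertAccents(tex) {
  return tex.replace(
    /\\(['`^"])(?:\{([aeiouAEIOU])\}|([aeiouAEIOU]))/g,
    (match, accent, braced, bare) => "&" + (braced || bare) + ACCENTS[accent] + ";"
  );
}
```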
CHANGES IN VERSION 1.2 FROM VERSION 1.1
- \section, \subsection, \subsubsection are now implemented, and a table
of contents is automatically added at the end. The TOC html code is stored in a variable
named g_TOC, which is appended at the end of the text, but can alternately be placed at the
beginning, or in a popup if you add the suitable javascript to control the main document
from the Table Of Content html code.
- Added \sla to aliases, to put a slash on the preceding symbol, in latex_aliases.js
- \begin{array}...\end{array} is now implemented. The optional alignment parameters following
the \begin{array} token are ignored (too complex to be managed here).
- Equation numbers are now placed in a new cell of the equation array, exactly as if
they were an expression. Previously the HTML code for the equation number was simply
appended to the current expression in the current equation array cell, so if the last cell
contained a matrix, the equation number ended up under the matrix instead of on
its right.
- Added support for syntax like \bar{...} \vec{...} \tilde{...} \hat{...}
(in previous versions the software could parse \bar X or \bar\lambda but not \bar{\bf p}).
- Implementation of \left and \right, needed for the array environment. It produces
a square bracket or a vertical bar whatever the following delimiter is,
except if it is a dot (in this case it does nothing).
- Added support for \begin{itemize}..\end{itemize} and \begin{enumerate}..\end{enumerate}
environments, using regular expressions (DFA not involved).
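A regular-expression treatment of the list environments could look like the sketch below. It is an assumption for illustration, not the script's own code: convertLists is a hypothetical name, nested lists are not handled (the match stops at the first matching \end), and a command such as \itemsep would be wrongly split as an item.

```javascript
// Hypothetical regex-based conversion of list environments (no DFA involved).
function convertLists(tex) {
  const envs = { itemize: "ul", enumerate: "ol" };
  return tex.replace(
    /\\begin\{(itemize|enumerate)\}([\s\S]*?)\\end\{\1\}/g,
    (match, env, body) => {
      const items = body
        .split("\\item")
        .slice(1) // text before the first \item is dropped
        .map((item) => "<li>" + item.trim() + "</li>");
      return "<" + envs[env] + ">" + items.join("") + "</" + envs[env] + ">";
    }
  );
}
```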
CHANGES IN VERSION 1.1 FROM VERSION 1.0
- \label and \ref bug fix. When the anchor keyword used as a parameter of \label or \ref
contained a managed token (like "-" or "<"), the anchor was split into parts
and cross-references were set incorrectly. This is fixed: the anchor is no longer parsed using the DFA; all the
characters between \label{ or \ref{ and the closing "}" character are extracted as a whole from the source
to set the anchor tag.
- \tilde command was not correctly implemented, this is now fixed.
- \tilde \hat or \bar followed by a token like \lambda was not rendered correctly. The
tilde was before the token, whereas it must be above. This is now fixed.
- Background color of the output was changed to the same color as my web site.
- Bug fix: \i was not recognized, because the transition table was buggy, due to an error
in the generation algorithm I used to generate it. Now it's ok.
- The DFA now supports the following additional tokens:
\cite, \bibitem, \section, \subsection, \subsubsection, \footnote. However they are not implemented
yet in the main script. For this release, these tokens are still interpreted using regular
expressions. But if you want to customize the script and implement these tokens,
the DFA is ready and the token ids are 192, 193, 194, 195, 196, 197 respectively.
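The \label / \ref fix described in the list above, taking the anchor as a whole instead of tokenizing it, can be sketched as follows. This is an assumption: convertCrossRefs is a hypothetical name, and showing the key itself as the link text is a simplification (the real script may display an equation number instead).

```javascript
// Hypothetical helper: takes the raw text between "\label{" / "\ref{" and "}"
// as the anchor name, without running it through the tokenizer.
function convertCrossRefs(tex) {
  return tex
    .replace(/\\label\{([^}]*)\}/g, (m, key) => '<a name="' + key + '"></a>')
    .replace(/\\ref\{([^}]*)\}/g, (m, key) => '<a href="#' + key + '">' + key + "</a>");
}
```

Because the key is captured in one piece, a name such as eq-1 stays intact even though "-" is itself a managed token.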
TODO LIST for future releases
- Is \matrix{...} really useful (it does not really exist in LaTeX)? It may later be replaced
by another command. Any suggestion is welcome.
- Adding support for \index{...} and generating the index table.
- Implementation of \overbrace and \underbrace (low priority)
- Implementation of \Big (low priority)
- Buffering the output to get better performance
- Implementing stacks for opening delimiters and HTML output of nested groups.
Sometimes the size of a left delimiter is determined by the content following the delimiter.
Handling this, in order to draw
parentheses or brackets of different sizes depending on the nesting level of the current
text, would require a stack of left delimiters and a stack for the parsed output at each nesting level.
When closing a group, one looks in the stacks and appends the newly generated HTML code to the
HTML code at the previous nesting level. This will not be easy to implement, especially if buffering
is also implemented. This will probably be for version 2.0 or later.
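The stack idea in the last item can be sketched as below. This is a simplified assumption, not a planned implementation: renderGroups is a hypothetical name, a single stack holds both the opening delimiter and the output buffer of each level, only ( and [ are handled, and the delimiter size is expressed as a made-up CSS class d1, d2, ... derived from how deeply the enclosed content nests.

```javascript
// Hypothetical sketch: one output buffer per nesting level, held in a stack.
function renderGroups(input) {
  const closers = { "(": ")", "[": "]" };
  const stack = [{ open: null, html: "", depth: 0 }]; // bottom = top-level output
  for (const ch of input) {
    const top = stack[stack.length - 1];
    if (closers[ch] !== undefined) {
      // Opening delimiter: start a fresh buffer for the nested content.
      stack.push({ open: ch, html: "", depth: 0 });
    } else if (top.open !== null && ch === closers[top.open]) {
      // Closing delimiter: its size depends on how deep the content nests.
      const group = stack.pop();
      const parent = stack[stack.length - 1];
      const size = group.depth + 1;
      parent.html +=
        '<span class="d' + size + '">' + group.open + "</span>" +
        group.html +
        '<span class="d' + size + '">' + ch + "</span>";
      parent.depth = Math.max(parent.depth, size);
    } else {
      top.html += ch;
    }
  }
  // Unclosed groups are flushed as plain text so that no input is lost.
  while (stack.length > 1) {
    const group = stack.pop();
    stack[stack.length - 1].html += group.open + group.html;
  }
  return stack[0].html;
}
```

The opening delimiter is only written when its group closes, which is exactly why the output of each level has to be held back in its own buffer.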