[Chugalug] Programmatically generating LaTeX

Robert A. Kelly III bluethegrappler at gmail.com
Fri Aug 23 16:00:44 UTC 2013


On 08/22/2013 07:14 PM, Dan Lyke wrote:
> On Thu, 22 Aug 2013 18:41:29 -0400
> "Robert A. Kelly III" <bluethegrappler at gmail.com> wrote:
>> What are the usual or recommended ways of handling this? I'm sure it
>> is something that has been addressed before as I have heard many
>> times of LaTeX being generated programmatically to generate PDF
>> reports, by Emacs calendar and org-mode, etc.
> 
> I think a good portion of this is related to issues that come up
> whenever someone tries to write a markdown-like processor. I've got
> such a processor that makes a bunch of assumptions, I use similar code
> on both Flutterby.com and Flutterby.net, but it occasionally gets
> things wrong.
> 
> One of the problems is that I don't have explicit ways in the language
> that gets parsed to override certain idioms. For instance, at some
> point I put in a _underscore quoted_ string, which is great, things
> which are underscore quoted get surrounded with the "<cite>" tag and
> linked to a Wiki facility...
> 
> ... except that if someone writes some C code that has an expression
> that uses variables with leading and trailing underscores...
> 
>    int abc_ = 8;
>    int _xyz = 4;
>    a = _xyz - abc_
> 
> All of a sudden that's no longer code that can be copied and pasted,
> 'cause  some of those underscores disappear into <cite> tags.
> 
> On Flutterby.com, I give everyone the option of just using straight
> HTML and not trying to interpret the text context stuff.
> 
> On Flutterby.net, I haven't figured out what to do yet. I want to do
> more of this kind of thing (ie: No reason that I can't turn "4th" into
> "4<sup>th</sup>"), but I need override tags, and the language starts to
> get kinda complex, and...
> 
> If I were smart at the very least I'd offer up some sort of override
> tag, but it also becomes hard to proof-read and notice that, for
> instance, some of those dashes are a little narrow...
> 
> So this starts to become an "autocorrect" problem. Except that at least
> there you get real-time feedback, on mine you wait 'til the page is
> written and you mash "submit".
> 
> Another way to go about this, and probably a smarter one, is to promote
> semantic tagging, ie: "<phonenumber>123-345-6789</phonenumber>" which
> can then be rendered in a phone number style, with whatever that means
> for dashes.

After some thought, I imagine that many of the use cases for something
like this get by because the input they deal with is limited in some
way. Generating a calendar doesn't require processing paragraphs of
free-form text, and a report might have its text descriptions coded into
the template, with the variable data being structured data from a
database, etc.

I suppose you could deal with free-form text by giving up many of the
niceties of LaTeX, e.g., using \frenchspacing so there is no distinction
between inter-sentence and inter-word spacing, and simply using
hyphen-minus everywhere.
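
As a rough sketch of that fallback (Python is just my choice here, and
the names escape_latex and wrap_plain are made up for illustration), the
generator would only need to escape LaTeX's special characters and leave
every hyphen-minus alone:

```python
# Hypothetical helpers; a sketch of the "give up the niceties" approach.
# Maps each of LaTeX's ten special characters to a safe replacement.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    # Work character by character so that a replacement (which itself
    # contains backslashes and braces) is never escaped a second time.
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def wrap_plain(text):
    # \frenchspacing removes the inter-sentence/inter-word distinction,
    # so the generator never has to guess where sentences end.
    return "\\frenchspacing\n" + escape_latex(text) + "\n"


print(wrap_plain("a = _xyz - abc_"))
```

This makes no attempt to pick en or em dashes or to curl quotes, which
is exactly the trade-off.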

I think that if you want to take advantage of these features when
generating multiple formats from a common source (e.g., LaTeX for print
and HTML for the web), you would really have to use a source that has
some representation for these things. This basically equates to what you
suggested: supporting more semantics.

You could, of course, actually generate HTML and other formats from
LaTeX source, but then again, you might want to use semantic features of
those other representations that are not present in LaTeX (e.g., <abbr
title="Computer-Assisted Telephone Interviewing">CATI</abbr>), in which
case you really need a source that supports the relevant semantics of both.
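
To make that concrete, here is a minimal sketch of such a common source,
assuming a made-up representation (a list of tuples with the kinds
"text", "abbr", and "range") and two hypothetical renderers, to_html and
to_latex. It is not any existing tool's format, just an illustration:

```python
import html

# Hypothetical semantic source: each item is (kind, field, ...).
DOC = [
    ("text", "Surveyed by "),
    ("abbr", "CATI", "Computer-Assisted Telephone Interviewing"),
    ("text", " during "),
    ("range", "1982", "2013"),
    ("text", "."),
]


def to_html(doc):
    out = []
    for kind, *fields in doc:
        if kind == "text":
            out.append(html.escape(fields[0]))
        elif kind == "abbr":
            short, title = fields
            out.append('<abbr title="%s">%s</abbr>'
                       % (html.escape(title, quote=True), html.escape(short)))
        elif kind == "range":
            # HTML gets a real en dash (U+2013).
            out.append("%s\u2013%s" % (html.escape(fields[0]),
                                       html.escape(fields[1])))
        else:
            raise ValueError("unknown kind: %r" % kind)
    return "".join(out)


def to_latex(doc):
    # Assumes the fields contain no LaTeX special characters; a real
    # renderer would have to escape them.
    out = []
    for kind, *fields in doc:
        if kind == "text":
            out.append(fields[0])
        elif kind == "abbr":
            # Plain LaTeX has no direct <abbr> equivalent, so this
            # sketch prints only the short form.
            out.append(fields[0])
        elif kind == "range":
            out.append("%s--%s" % (fields[0], fields[1]))
        else:
            raise ValueError("unknown kind: %r" % kind)
    return "".join(out)


print(to_html(DOC))
print(to_latex(DOC))
```

Each output format takes what it can represent from the source and
drops the rest, rather than one format being derived from the other.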

> Personally based on all of the upside-down and contained-in-circles and
> other variations on standard Latin alphabets I'm seeing, I think
> Unicode is a step a few thousand years backwards to hieroglyphs, but
> then most users are probably stuck at a pre-CE level of
> sophistication...

I assume you are referring to some of the common abuses of Unicode, such
as finding various glyphs from other scripts that resemble upside-down
Latin characters, etc. For all of the potential abuses, Unicode avoids
the problems of different encodings for different scripts, makes it easy
to mix multiple scripts within a single text (such as including quotes
from Greek in an English text), and includes proper representations for
accented characters, ligatures, and typographical features such as
distinct hyphen, minus, en dash, and em dash characters. Not that
Unicode doesn't have drawbacks, but it certainly solves a lot of
potential problems.

In the context of using it as the source encoding for XeLaTeX, it would
allow the inclusion of typographically correct quotation marks, en
dashes, em dashes, accented characters, etc. in a form that is
appropriate to many contexts (Unicode can be included directly in HTML
output) without fussing with LaTeX-specific representations like
"``Quoted text''", "1982--2013", or "r\'{e}sum\'{e}".

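For comparison, here is roughly the translation step a generator would
need if it could not hand Unicode straight to XeLaTeX. This is only a
sketch in Python: to_legacy_latex is a hypothetical name, and the tables
cover just a handful of punctuation marks and accents.

```python
import unicodedata

# Typographic punctuation -> classic LaTeX input conventions.
PUNCT = {
    "\u201c": "``",   # left double quotation mark
    "\u201d": "''",   # right double quotation mark
    "\u2018": "`",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
}

# Combining accents -> LaTeX accent commands (a small subset).
ACCENTS = {
    "\u0301": "\\'",  # acute
    "\u0300": "\\`",  # grave
    "\u0308": '\\"',  # diaeresis
    "\u0302": "\\^",  # circumflex
    "\u0303": "\\~",  # tilde
}


def to_legacy_latex(text):
    out = []
    for ch in text:
        if ch in PUNCT:
            out.append(PUNCT[ch])
            continue
        # NFD splits a precomposed letter into base + combining accent.
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] in ACCENTS:
            out.append("%s{%s}" % (ACCENTS[parts[1]], parts[0]))
        else:
            out.append(ch)
    return "".join(out)


print(to_legacy_latex("\u201cQuoted text\u201d, 1982\u20132013, r\u00e9sum\u00e9"))
```

With Unicode as the source encoding this whole step goes away, and the
same text can be passed to the HTML output unchanged.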

