Wednesday, 4 July 2012

(Semi)automatic indexing in LaTeX

This may be of some help to LaTeX users. After I finished writing the book (by the way: proofs expected in August), I had to deal with the problem of creating an index, which the editor definitely wanted. You may think this is the least of the problems once you've actually written the whole thing, but, as it turns out, it is not quite like that...

Of course, I used LaTeX to write the book, which means that, theoretically, you can use the command \makeindex (together with the makeindex program) to generate the index. Unfortunately, this requires that a suitable \index tag is included at every place where a particular word (or, more generally, concept) appears in the document.

In other words, this meant: 
a) creating a list of concepts that I thought should be indexed;
b) inserting the tag \index{concept} next to every single instance of that concept.

I started to do that but after 10 minutes I realised this would be too tedious. I briefly lost the will to live and played around with the idea of calling the whole writing-a-book thing off. Fortunately, I soon came to my senses and started to look for a cleverer way. A nice solution was here. It still requires some tweaking, but I think it gave me quite a good result with a relatively small amount of work.

The whole thing is based on a Tcl script, which adds the \index tags to the LaTeX document. Before you can run it, you need to do some preparation.

First, you need to create a text file containing all the concepts to be indexed. The syntax is quite intuitive: basically, you need to declare the concepts and their synonyms. In particular, you need to follow the LaTeX convention whereby you can nest concepts within each other. For example, I wanted to nest the concept "prior" within the concept "probability"; you can do so by using the notation
probability!prior
In addition, you can link combinations of words to a given concept. For example, I wanted every instance of the words "prior probability" found in the text to be indexed as
\index{probability!prior}
and to do so I only needed to include in the file the syntax
prior probability -> probability!prior
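
To give an idea of what the tagging step amounts to, here is a minimal shell sketch. It is not the Tcl script itself and does not claim to reproduce its behaviour: it only assumes a concepts file with one "phrase -> entry" rule per line, as in the example above, and uses GNU sed to append an \index tag after every match. The file names (concepts.txt, chapter.tex, chapter_indexed.tex) and the second rule are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical concepts file: one "phrase -> index entry" rule per line.
cat > concepts.txt <<'EOF'
prior probability -> probability!prior
posterior probability -> probability!posterior
EOF

# Hypothetical one-line chapter to be tagged.
cat > chapter.tex <<'EOF'
The prior probability is updated into the posterior probability.
EOF

# Work on a copy, so that the original stays free of tags.
cp chapter.tex chapter_indexed.tex

while IFS= read -r line; do
  [ -z "$line" ] && continue      # skip empty lines
  phrase=${line%% -> *}           # text before the first " -> "
  entry=${line#* -> }             # text after the first " -> "
  # "&" re-inserts the matched phrase; "\\\\" ends up as one literal backslash.
  # Assumes the phrase contains no "/" or regex special characters.
  sed -i "s/${phrase}/&\\\\index{${entry}}/g" chapter_indexed.tex
done < concepts.txt

cat chapter_indexed.tex
```

A naive loop like this will happily match phrases in places where a tag does not belong (inside other commands, for instance), which gives a flavour of why some manual clean-up is still needed afterwards.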

This is relatively straightforward, except that you have to think carefully about what you want in your index, and the list of concepts/words can be quite long. I created a spreadsheet with all the main concepts and tried to think of all the others that I wanted to nest within them and then translated this into a suitable text file.

Next you need to use the Tcl programme that does all the tagging, i.e. puts the tag \index{concept} in the text, according to the list you've specified in the previous step. Because the marked-up file(s) can be quite messy and thus difficult to read, I thought it would be more efficient to work on copies of the original LaTeX files. Thus I ended up with an original copy (with no tags) and an "indexed" one, including all the tags, which can be compiled to produce the final document with the index.

At this point, theoretically, you only need to compile the tagged files with \makeindex in the preamble and run makeindex to produce the complete document. I say theoretically because this procedure is not 100% effective and there are still some problems. In my case, I had several exceptions that I needed to deal with to avoid problems when compiling the LaTeX file (for example, an extra "\" would occasionally be inserted in the text, which leads to errors in the compilation).
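
For reference, the usual sequence is sketched below, under the assumption of a standard setup with the makeidx package: \makeindex in the preamble writes the raw entries to an .idx file, the makeindex program sorts them into an .ind file, and \printindex typesets the result on the next LaTeX run. The file name demo_indexed.tex and the function name build_with_index are made up for illustration.

```shell
#!/usr/bin/env bash
# Write a tiny stand-in for the tagged document (hypothetical file name).
cat > demo_indexed.tex <<'EOF'
\documentclass{article}
\usepackage{makeidx}
\makeindex   % preamble: write raw index entries to demo_indexed.idx
\begin{document}
The prior probability\index{probability!prior} is specified first.
\printindex  % typeset the sorted index from demo_indexed.ind
\end{document}
EOF

# LaTeX, then the makeindex program, then LaTeX again to pull in the index.
build_with_index() {
  base="$1"
  pdflatex "$base" && makeindex "$base" && pdflatex "$base"
}

# Usage (needs a TeX installation):
#   build_with_index demo_indexed
```

Chaining the three steps with && means the build stops at the first compilation error, which is where the problems described here show up.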

The errors can be long-ish to fix, but not to identify, since LaTeX will tell you where they are when compiling the document. If you can do a bit of Linux programming (but you can probably do this on Mac and Windows machines too, perhaps using DOS-like commands) they are not too difficult to correct; I did most of it using the command
sed -i
which allows you to modify a given string within a text file, editing the file in place.
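
As an illustration, the sketch below fixes one made-up instance of the "extra backslash" problem, where a tag has come out as \\index{...} instead of \index{...}. The file name and the broken line are hypothetical, and the syntax is that of GNU sed (on a Mac, the built-in sed wants an explicit backup suffix after -i, as in sed -i '').

```shell
#!/usr/bin/env bash
# Hypothetical tagged file containing a doubled backslash before "index".
printf '%s\n' 'The prior probability\\index{probability!prior} is elicited first.' \
  > chapter_indexed.tex

# In the pattern, "\\\\" matches two literal backslashes;
# in the replacement, "\\" produces a single one.
sed -i 's/\\\\index{/\\index{/g' chapter_indexed.tex

cat chapter_indexed.tex
```

Since -i overwrites the file, it is worth running the same command without -i first, to check the output on screen before committing to the change.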

All in all, I think it was very helpful and saved me some time. I still had to do some work, but I think not as much as I would have, had I decided to create the index from scratch in the original LaTeX files. 
