Monday, January 23, 2012

Searching Whitecotton

One of the challenges of working with Colonial Zapotec is that although there is an enormous dictionary, it is a Spanish-to-Zapotec dictionary, and only in a vague approximation of alphabetical order. Joe and Ruth Bradley Whitecotton produced a very valuable resource in 1993 when they published a Zapotec-to-Spanish reverse version. This is much better for going from Zapotec to Spanish, but there are still problems:

Zapotec verbs normally have a prefix, so the forms of a single verb stem may be scattered across different aspects, relativized forms, nominalized forms, etc. Nouns might appear alone, but also as parts of compounds or in variant spellings. It would be nice to be able to find all of the relevant forms. People at UCLA produced a PDF version of Whitecotton (which is otherwise hard to obtain), but it consists of page images and is not searchable.

After a little fiddling, I was able to apply optical character recognition to this PDF, resulting for the first time in a searchable version of the Whitecotton and Whitecotton Colonial Zapotec-Spanish dictionary. In the first afternoon of having it available, I'm already finding words more easily.
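For anyone who wants to go a step beyond the PDF viewer's search box, here is a rough sketch of the kind of stem search that OCR text makes possible. It assumes the recognized text has been exported as plain text lines, which is my assumption and not something the PDF does on its own. The function name `find_forms` and the sample entries are hypothetical placeholders, not real dictionary entries. The idea is simply to match a stem anywhere inside a word, so that prefixed, compounded, and suffixed forms all turn up together.

```python
import re


def find_forms(lines, stem):
    """Return (line_number, matched_word, line) for every word containing `stem`.

    The stem may appear at the start, middle, or end of a word, so forms
    with different prefixes or inside compounds are all found.
    """
    # \w* on both sides lets the match grow to the whole word around the stem.
    pattern = re.compile(r"\w*" + re.escape(stem) + r"\w*", re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(0), line.strip()))
    return hits


# Hypothetical placeholder lines standing in for OCR output; "xyz" is not a real stem.
sample = [
    "tixyz  placeholder gloss one",
    "coxyz  placeholder gloss two",
    "abc    unrelated entry",
    "huexyzni  placeholder gloss three",
]

for number, word, line in find_forms(sample, "xyz"):
    print(number, word, "|", line)
```

OCR output usually contains misread characters, and colonial spelling varies a good deal, so a search like this will miss some forms. It is a starting point for finding candidates, not a complete index.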

Title page for the Whitecotton and Whitecotton dictionary

PDF version, now processed via OCR and searchable
