Monday, May 14, 2012

More on Importing Cordova into the FLEx database

For many years, Thom Smith Stark and his students worked on an electronic version of the giant Cordova dictionary of Colonial Valley Zapotec.  I've been working for the last few days on the process for importing this into FLEx.

I have a few different versions of the electronic Cordova.  One is an MS Access database, and the advantage of this version is that each of the multiple Zapotec words listed as the translation of the Spanish gets its own record.  In the design of the Access database, the ID number is keyed to the entry in the Cordova dictionary, and there are two linked tables: one with the entry, folio number, and Spanish, the other with the Zapotec, notes, etc.  To see both, you construct an MS Access query, which looks like this:



This can be output as an Excel file.  (Then from Excel --> Sheet Swiper --> FLEx.)

After the export to Excel, you need to add a row at the top with \lx over the column that will be the Lexeme Form and \gn (= Gloss National) over the Spanish.  The original database has a Folio field that tells you which page of the book the word is on.  I called this field \cordova page number.  And for all the rows, I added a field \source with the value [Cordova import].

In order to get Sheet Swiper to work properly, the \lx column needs to be first, so I reordered the Excel columns.  I also sorted on the Zapotec column so that I could remove the rows where it is blank.  (In the original book, these are cross-references from one Spanish entry to another.)
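The spreadsheet preparation described above can also be sketched in a few lines of Python. I did the actual work by hand in Excel, so this is only an illustration: the record keys `zapotec`, `spanish`, and `folio` and the function name `prepare_rows` are hypothetical, and the sample values are placeholders rather than real Cordova data.

```python
def prepare_rows(records, source_label="[Cordova import]"):
    """Return a header row plus data rows, with the lexeme column first."""
    # Marker row: the first column must be the lexeme marker.
    header = [r"\lx", r"\gn", r"\cordova page number", r"\source"]
    rows = [header]
    for rec in records:
        zapotec = (rec.get("zapotec") or "").strip()
        if not zapotec:
            # Blank Zapotec cell: a cross-reference row, so skip it.
            continue
        rows.append([zapotec, rec.get("spanish", ""), rec.get("folio", ""), source_label])
    return rows


# Placeholder records (hypothetical keys, not real dictionary data).
sample = [
    {"zapotec": "form-a", "spanish": "gloss a", "folio": "001r"},
    {"zapotec": "", "spanish": "cross-reference only", "folio": "001r"},
    {"zapotec": "form-b", "spanish": "gloss b", "folio": "001v"},
]

for row in prepare_rows(sample):
    print(row)
```

Filtering out the blank rows directly does the same job as sorting the column and deleting the empty block at the top or bottom.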

The original database also has two Zapotec fields, one with diacritics (ZAP_COMP) and one without (ZAP).  I decided not to import the version without diacritics, so I didn't put a backslash code over that column in the Excel file.  The result looks something like this:


Then I ran Sheet Swiper, which converts the Excel file into a standard format dictionary file.  That is fairly easy.
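For readers who haven't seen a standard format file, here is a minimal Python sketch of the kind of conversion involved. It is not Sheet Swiper's code and I don't claim its output matches Sheet Swiper's exactly. It only shows the general shape: one marker-plus-value line per field and a blank line between records. The function name `rows_to_sfm` and the sample values are hypothetical, and treating columns without a backslash code as skipped is an assumption of the sketch.

```python
def rows_to_sfm(rows):
    """Turn a marker row plus data rows into standard-format text."""
    header, *data = rows
    records = []
    for row in data:
        lines = []
        for marker, value in zip(header, row):
            value = str(value).strip()
            # Skip columns with no backslash code and skip empty cells.
            if not marker.startswith("\\") or not value:
                continue
            lines.append(f"{marker} {value}")
        if lines:
            records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


# Placeholder table: the second column has no marker, so it is left out.
table = [
    [r"\lx", "", r"\gn", r"\source"],
    ["form-a", "plain-a", "gloss a", "[Cordova import]"],
    ["form-b", "plain-b", "gloss b", "[Cordova import]"],
]

print(rows_to_sfm(table))
```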


Within FLEx, you use the Lexicon | Import | Standard Format lexical data dialogue to pull this into FLEx.


You have to go through a few steps here.  Most are pretty easy to understand, but a few are not completely intuitive.  In the old Shoebox/Toolbox format the languages were divided into Vernacular, National, Regional, English.  I had used Spanish as the equivalent of National in earlier versions, and that is why I put the \gn tag over that column.  So in the dialogues, you need to tell FLEx that for this project National = Spanish.

For the various fields, you also need to tell FLEx where they will go in the entry.  If it doesn't know where they go, it will put the information in a field called "Import Residue".  That's okay, since it doesn't lose the information, but it is better to specify where it will go, if you know.  I wanted the Folio number to go in the Source field in each entry so I specified it in that way.

Running through all of this let me import the first 483 test entries from Cordova into FLEx.

Within FLEx, I also made a few global changes.  Since the Zapotec form listed has all kinds of information in it, I wanted to segregate this out into other fields and work towards having just the root of the word as the Lexeme Form.  But because I didn't want to lose any information, I copied all of the information from the original entry into the Citation Form field via the Bulk Edit Entries dialogues.  A few examples:

The Zapotec field has information about things that are corrected somewhere (presumably in the pages of errata at the beginning).  This entry has a correction in the original.  I changed the Lexeme Form to reflect the correction and show the original form + correction in the Citation Form field.



The original also contains information about the prefix that a verb takes in the completive and potential aspects.  (The form is cited in the habitual.)  In the TSS database, this follows the field marker /cv/.

This entry shows how the information got processed.  I created custom fields for the completive, potential, and habitual.  Then I filtered the imported entries to find /cv/ and used the "Click Copy" feature in Bulk Edit to populate these fields.  After the fields had been populated, I used the "Delete" feature in Bulk Edit to remove this material from the Lexeme Form.
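The same separation could in principle be done before import. The Python sketch below only splits an imported form at the /cv/ marker. It makes no assumption about how the aspect prefixes are laid out after the marker, so that part comes back as one unparsed string. The function name `split_cv` and the sample strings are hypothetical placeholders, not real entries.

```python
def split_cv(form, marker="/cv/"):
    """Split a form into (text before the marker, text after it)."""
    head, sep, tail = form.partition(marker)
    if not sep:
        # No marker present: nothing to move out of the form.
        return form.strip(), None
    return head.strip(), tail.strip()


# Placeholder strings standing in for imported forms.
print(split_cv("form-a /cv/ aspect-info"))
print(split_cv("form-b"))
```

In FLEx itself I did this with filters and Bulk Edit, as described above, which keeps every step visible and reversible.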

