Working linguist blog: May 2012

Wednesday, May 30, 2012

Scilicet in Cordova dictionary entries

In the Cordova dictionary of Colonial Valley Zapotec, the abbreviation scilicet means 'that is, for example'. In the Spanish orthography of the time, this is written as a long s followed by a period. In Thom Smith Stark's markup, this is {¥s[cilicet].}

Here is a screenshot of a sample Cordova entry that uses this.

Almoçadas means something like 'handfuls' in Spanish. The FLEx entry that shows my best guess of the correct interpretation of the Cordova entry is as follows:

I think the o in front of the word xoopa is probably Spanish o 'or'.

This example shows some of the difficulties of deciphering all the information that Cordova includes in his entries!

Bulk editing Cordova Zapotec entries

Continuing my efforts to make the enormous Cordova dictionary of colonial Zapotec usable in FLEx.

My general goal is to have the Lexeme Form of each verb show the verb root. Cordova's usual practice was to cite a verb in the habitual aspect for the 1st person singular.

The habitual has several allomorphs, written in the following way by Cordova (with my best guess at the intended phonemic form):

<to> /ru-/
<ti> /ri-/
<t> /r-/

It is also sometimes written as <te>, though I'm not sure about /re-/ as an allomorph of the habitual in modern Valley Zapotec.

The first person is usually written as

<a> /=a/ after a consonant
<ya> /=ya/ after a vowel

The version of the database that we have inherited from Thom Smith Stark often has these separated from the root as follows:

to+chìba-ya ticha-pitào, 'bendezir algo o consagrar' ('bless something or consecrate')

So the stem should be

chiba

with /to-/ and /-ya/ stripped away. This entry also shows that the /-ya/ is not necessarily final in the entry, since Cordova often includes a typical object along with the verb. Here the object is ticha pitào 'word of God'.
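As a rough illustration of the stripping just described, here is a minimal Python sketch. It is not part of FLEx or of Thom's database; the function name strip_verb is made up for the example, and it assumes the entry uses the + and - markup shown above. It leaves Cordova's diacritics untouched, so it returns chìba rather than plain chiba.

```python
import re

# Habitual prefixes (to+, ti+, t+) and 1sg suffixes (-ya, -a) as they appear
# in the analysed entries. Longer alternatives are listed first.
HABITUAL = re.compile(r"^(?:to|ti|t)\+")
FIRST_SG = re.compile(r"-(?:ya|a)$")

def strip_verb(entry):
    """Split off the first word, then remove its habitual prefix and 1sg suffix.

    Returns (stem, rest), where rest is any object that follows the verb.
    """
    verb, _, rest = entry.partition(" ")
    stem = HABITUAL.sub("", verb)
    stem = FIRST_SG.sub("", stem)
    return stem, rest

stem, rest = strip_verb("to+chìba-ya ticha-pitào")
print(stem)  # chìba
print(rest)  # ticha-pitào
```

Only the first word is touched, so the object ticha-pitào is kept as it is.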

I've worked through most of the verbs in the 5000 imported Cordova entries at this point.

My first step was to copy all the information in the original entry to the Citation Form field, so that I always have the original form available. Then I work on the Lexeme Form field to a.) remove the habitual aspect prefixes, b.) remove the 1st singular suffix, and c.) put the information about completive and potential aspects into special fields.

The procedure uses the Bulk Edit function of FLEx, generally searching for various allomorphs of the habitual and 1sg and replacing them with nothing. This is easiest for the entries that carry Thom Smith Stark's analysis, where + follows the prefix and - precedes the suffix. I can search for entries with to+, ti+, t+, -a, and -ya pretty easily.

Here are some screen shots, first filtering to find all the examples of the pattern. The search uses regular expressions, so ^ means at the beginning of the record and the \ makes the following + be interpreted literally as + (not some function).
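The effect of the backslash can be checked outside FLEx. The following is a small sketch in Python, on the assumption that FLEx's filter treats ^ and \+ the way Python's re module does. The two forms are taken from this post: the analysed entry for 'bless' and tola 'sin', a noun that merely begins with to.

```python
import re

forms = ["to+chìba-ya ticha-pitào", "tola"]

escaped = re.compile(r"^to\+")   # a literal "to+" at the start of the record
unescaped = re.compile(r"^to+")  # "t" followed by one or more "o"

only_prefixed = [f for f in forms if escaped.search(f)]
too_broad = [f for f in forms if unescaped.search(f)]

print(only_prefixed)  # ['to+chìba-ya ticha-pitào']
print(too_broad)      # ['to+chìba-ya ticha-pitào', 'tola']
```

Without the backslash, the + is read as a repetition operator, so the pattern also picks up every record that simply starts with to.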

Here is the bulk replace setup screen:

And here is an example of an entry after the Bulk Replace has removed the to- prefix.

(I also changed the part of speech for all items with the to+ pattern to Verb.) More difficult are the entries where Thom did not do the analysis. It's not correct to remove every initial to sequence, since some of the resulting items are just nouns that start with to. For example, the noun tola 'sin' shouldn't be changed to la with a to prefix.

What I tried here was searching for the pattern to...a or to...ya, then inspecting the results to make sure that the Spanish gloss seems to be a verbal form (generally cited in either the infinitive or the past participle form). I changed the part of speech for all of the good instances to Verb, and made a few manual changes to other parts of speech when I could figure them out.

After the valid instances of verbal to...(y)a were identified, I filtered the data to show only verbs and then used the same Bulk Replace method to delete the prefixes and suffixes from the Lexeme Form.
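For the unanalysed entries, the search-then-inspect step could be roughed out as below. This is only a sketch of the idea, not what FLEx does: both function names are invented for the example, and the gloss test is a crude assumption (a word ending like a Spanish infinitive or past participle) that will produce false positives, so it narrows the list for manual inspection rather than replacing it.

```python
import re

# First word starts with "to" and ends in "a" (which also covers "ya").
CANDIDATE = re.compile(r"^to\S+a$")
# Crude assumption: some word in the gloss ends in -ar, -er, -ir, -ado or -ido.
VERBAL = re.compile(r"\b\w+(?:ar|er|ir|ado|ido)\b")

def is_candidate(form):
    """True if the first word of the entry fits the to...(y)a pattern."""
    words = form.split()
    return bool(words) and bool(CANDIDATE.match(words[0]))

def looks_verbal(gloss):
    """True if the Spanish gloss contains a word that looks like a verb form."""
    return bool(VERBAL.search(gloss.lower()))

# tochìbaya is the 'bless' entry from above with the markup removed.
print(is_candidate("tochìbaya ticha pitào"))      # True
print(is_candidate("tola"))                       # True, so the gloss must be checked
print(looks_verbal("bendezir algo o consagrar"))  # True
```

The pattern alone accepts the noun tola, which is why the gloss still has to be looked at before anything is deleted.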

Monday, May 14, 2012

More on Importing Cordova into the FLEx database

For many years, Thom Smith Stark and his students worked on an electronic version of the giant Cordova dictionary of Colonial Valley Zapotec. I've been working for the last few days on the process for importing this into FLEx.

I have a few different versions of the electronic Cordova. One is an MS-Access database, and the advantage of this version is that each of the multiple Zapotec words listed as the translation of the Spanish gets its own record. In the design of the MS-Access database, the ID number is keyed to the entry in the Cordova dictionary and there are two linked tables. (One has the entry, folio number and Spanish; the other has the Zapotec, notes, etc.) To see both you construct an MS-Access query, which looks like this:

This can be output as an Excel file. (Then from Excel --> Sheet Swiper --> FLEx.)

After the export to Excel, you need to add a row to the top with \lx over the column that will be the Lexeme Form and \gn (=Gloss National) for the Spanish. The original database has a Folio field that tells you what page of the book the word is located on. I called this field \cordova page number. And for all the rows, I added a field \source with the value [Cordova import].

In order to get Sheet Swiper to work properly, the \lx column needs to be first. So I reordered the Excel columns. I also sorted the Zapotec column to remove blanks. (In the original book, these are cross-references from one Spanish entry to another.)

The original database also has two Zapotec fields, one with diacritics (ZAP_COMP) and one without (ZAP). I decided not to import the version without diacritics, so I didn't put a backslash code over that column in the Excel. The result looks something like this

Then I ran Sheet Swiper, which converts the Excel file into a standard format dictionary file. That is fairly easy.
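To make the spreadsheet-to-standard-format step concrete, here is a minimal Python sketch of the same kind of conversion. It is not Sheet Swiper's actual code, and the function name rows_to_sfm is made up. It assumes the first row holds the backslash codes, that \lx is the first marked column, that columns without a code (like the ZAP column) are skipped, and that rows with a blank \lx are dropped. The cross-reference row is a placeholder, not a real Cordova entry.

```python
def rows_to_sfm(rows):
    """Turn a header row plus data rows into standard-format records."""
    header, *data = rows
    # Keep only the columns whose header is a backslash code.
    marked = [(i, h) for i, h in enumerate(header) if h.startswith("\\")]
    records = []
    for row in data:
        fields = [(h, row[i].strip()) for i, h in marked]
        # Skip rows with no lexeme form (cross-references in the original book).
        if not fields or fields[0][0] != "\\lx" or not fields[0][1]:
            continue
        records.append("\n".join(f"{h} {v}" for h, v in fields if v))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"

rows = [
    ["\\lx", "ZAP", "\\gn", "\\source"],
    ["to+chìba-ya ticha-pitào", "to+chiba-ya ticha-pitao",
     "bendezir algo o consagrar", "[Cordova import]"],
    ["", "", "(placeholder cross-reference row)", "[Cordova import]"],
]
sfm = rows_to_sfm(rows)
print(sfm)
```

For the sample rows this prints a single record with \lx, \gn and \source lines, which is the shape that the Standard Format import in FLEx expects.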

Within FLEx, you use the Lexicon | Import | Standard Format lexical data dialogue to pull this into FLEx.

You have to go through a few steps here. Most are pretty easy to understand, but a few are not completely intuitive. In the old Shoebox/Toolbox format the languages were divided into Vernacular, National, Regional, English. I had used Spanish as the equivalent of National in earlier versions, and that is why I put the \gn tag over that column. So in the dialogues, you need to tell FLEx that for this project National = Spanish.

For the various fields, you also need to tell FLEx where they will go in the entry. If it doesn't know where they go, it will put the information in a field called "Import Residue". That's okay, since it doesn't lose the information, but it is better to specify where it will go, if you know. I wanted the Folio number to go in the Source field in each entry so I specified it in that way.

Running through all of this let me import the first 483 test entries from Cordova to FLEx.

Within FLEx, I also made a few global changes. Since the Zapotec form listed has all kinds of information in it, I wanted to segregate this out into other fields and work towards having just the root of the word as the Lexeme Form. But because I didn't want to lose any information, I copied all of the information from the original entry into the Citation Form field via the Bulk Edit Entries dialogues. A few examples:

The Zapotec field has information about things that are corrected somewhere (presumably in the pages of errata at the beginning). This entry has a correction in the original. I changed the Lexeme Form to reflect the correction and show the original form + correction in the Citation Form field.

The original also contains information about the prefix that a verb takes in the completive and potential aspects. (The form is cited in the habitual.) In the TSS database, this follows the field marker /cv/.

This entry shows how the information got processed. I created custom fields for the completive, potential, and habitual. Then I filtered the imported entries to find /cv/ and used the "Click copy" feature in Bulk Edit to populate these fields. After the fields had been populated, I used the "Delete" feature in Bulk Edit to remove this from the Lexeme Form:

Sunday, May 13, 2012

Two things you should listen to with attention

In this bit of the Feria doctrina, I was interested in the verb /zoba..tiyaga/, which means 'listen'. The first part is 'set', and the second part is 'ear'. I think these function together as a single (complex) root in modern Zapotec. The evidence is that the subject agreement follows 'ear' only. Since modern Valley Zapotec languages are not generally pro-drop, that's good evidence that the pronoun after 'ear' is the subject of the whole preceding verb + noun complex.

So it is not 'you set your ear' but the equivalent of 'you ear-set'. (Zapotec-like equivalents: not Set=you ear=you, but set-ear=you.)

We could call this a kind of incorporation in modern Valley Zapotec (though confined to a small set of V + N combinations).

But in this colonial document, the two parts are separated by an adverbial element chahui. That seems to imply that the two parts were less lexicalized as a compound 500 years ago.

I remember that Pam Munro, John Foreman, (and Aaron Sonnenschein?) talked about something like this in the colonial Valley Zapotec documents at a SSILA meeting some number of years ago.

Saturday, May 12, 2012

Working with Thom Smith Stark's material on colonial Zapotec

I've undertaken the task of seeing whether it is possible to incorporate into our project the enormous work that Thom Smith Stark and his group put into creating an electronic and searchable version of the massive Cordova (1578) dictionary of Colonial Valley Zapotec. The big problem for modern researchers is that the Cordova dictionary is only Spanish - Zapotec, so it cannot easily be used to look up Zapotec words when reading documents.

There is tremendous potential here, but one initial challenge is figuring out the various notes, abbreviations, conventions, and file formats involved.

For example, I have one set of Word documents which are a reversal of the Cordova, now alphabetized Zapotec to Spanish. Here is an image of part of one of them:

Here are my guesses about what some of the mark-up means. In the Spanish column, I think the asterisk must indicate a word for which a new entry ought to be created. So for the second word, I think this means make an entry for this Zapotec word under 'dar cuenta o razon' and also under 'razon, dar cuenta o'.

In the Zapotec, I think the | after /ti/ is showing that this is a prefix. I don't know exactly why there is also a + symbol at this point. The ÷ precedes a clitic /=a/. (Cordova's convention was to list all the verbs in the 1st person habitual.)

In Cordova's dictionary, at the entry for a verb he lists the prefixes that are used for the preterite (or completive). So when we see prt> in TSS, that means that the completive prefix is what follows. Sometimes the completive attaches to a different form of the root.

For example, with ti-bee=a quij 'fuego sacar con yslabon o assi', the prt> co+lè means that the completive is co-lèe=a quij.

For comparison, here is Cordova's entry for this verb

Although Córdova writes this all as one word tibèeaquij, Thom's morphological analysis is that the /=a/ is the 1st person. So the verb must end after this, and quij is a separate word. Thus - in Thom's dictionary seems to mean 'the following is a separate word'.

Thursday, May 10, 2012

Proceedings of CILLA V

The proceedings of the Conference on Indigenous Languages of Latin America-V are now available at http://www.ailla.utexas.org/site/cilla5_toc.html

They include my paper on Negation as raising, as well as many other interesting papers!

Tuesday, May 8, 2012

More on comparatives in Triqui

In previous posts 1 and 2, I discussed comparatives in Copala Triqui. The following sentence shows a comparative-like structure that I haven't noticed before, which uses síj 'reach, arrive' before another verb:

I can't find another example like this in my corpus, so I will want to check with our language consultant for his thoughts.

Monday, May 7, 2012

Orthography worries in San Dionisio Zapotec

Unfortunately, the meetings in Oaxaca left me less certain, rather than more certain, about what the best practical orthography to use in Zapotec is. I have been using an orthography which is essentially the same as that used in Isthmus and Mitla Zapotec, but the meeting made it fairly clear to me that no one agrees on what to use, and that especially includes speakers of the various kinds of Valley Zapotec.

In my current dictionary database, I've got too many spellings floating around, possibly confusing me. The old one that I used is the top one given, but I've now added an Americanist style phonetic field just to avoid too much confusion.

The citation field was initially composed from the old practical orthography by deleting tones, vowel length, and breathiness, along the lines used in the Cali chiu simplified orthography for San Lucas Quiavini Zapotec (Munro and Lillehaugen). The other puzzling/difficult question is how to represent the difference between plain vowels and diphthongs in the practical orthography.

Finally, the difference between ʃ, ʒ, tʃ, and dʒ is a plague. <ch> for /tʃ/ is the only simple solution. But every other way of writing these seems to cause reading problems. Currently, I am leaning toward a solution where /ʃ/ is <x>, /ʒ/ is <zh>, and /dʒ/ is <dx>.
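The solution I am leaning toward can be written down as a small conversion table. This is a minimal Python sketch, assuming the phonemic forms are typed with the IPA symbols above; the function name to_practical is made up, and the sample string is just the four symbols in a row, not a Zapotec word.

```python
# Affricates come first, so that the ʃ inside tʃ (and the ʒ inside dʒ)
# is not rewritten on its own.
IPA_TO_PRACTICAL = [("tʃ", "ch"), ("dʒ", "dx"), ("ʃ", "x"), ("ʒ", "zh")]

def to_practical(phonemic):
    """Rewrite the four sibilant/affricate symbols in the practical orthography."""
    out = phonemic
    for ipa, letters in IPA_TO_PRACTICAL:
        out = out.replace(ipa, letters)
    return out

print(to_practical("tʃ dʒ ʃ ʒ"))  # ch dx x zh
```

The order of the table matters: with the plain fricatives first, /tʃ/ would come out as <tx> instead of <ch>.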

I am getting indications from my speaker that she is finding the simplified orthography too simplified in some areas (the diphthong issue being a prominent difficulty in her reading). I still don't know how to solve this issue.

Sunday, May 6, 2012

Negative focus in Colonial Zapotec

The following example shows an interesting case of a preverbal negative focus in Colonial Valley Zapotec. This is very much like what would occur in San Dionisio Ocotepec Zapotec in the same context.

Look at the part about 'no one can rise', where we get aca ru-ti benni zoaca chapi...

Here in SDOZ, we would have ru-te'ca biiny 'no person', where the /ru-/ prefix on the negative matches the animacy of the noun that follows.

Thursday, May 3, 2012

Have you been a witch or sorcerer

From the text in Colonial Valley Zapotec that I am working on today. This is from a ca. 1823 Confessionario. (And I know that 1823 is a couple of years too late for the colonial period in Mexico, but the language is pretty similar!)

I don't know the meaning difference between the two Zapotec words used here. They are both given as synonyms for brujo in Cordova.

And another question from the same document, a few lines earlier:

Wednesday, May 2, 2012

Experimenting with semantic domains

I have been playing around a bit with the Semantic Domain feature in FLEx today. I have mostly found this fairly unwieldy, since the lists of semantic domains are in an elaborate hierarchy and it takes a long time to find the domain you want.

However, if there are only a few domains that you are interested in, you can also just add them as top-level domains (using the Lists | Semantic Domains menu). One that I am interested in is the psychological verbs that show up with ni'yaj after them.

I added psych verbs as a top-level semantic domain, as follows

Then in the entry of a verb that licenses this marker, I select this as "Semantic Domain" in the entry:

If I want to see all the verbs that have been identified for this semantic domain, I use the "Classified Dictionary" option in the Lexicon pane, and it shows me the following: