Working linguist blog

Friday, May 17, 2013

Possible output of Colonial Valley Zapotec dictionary

Maybe it is because I have been working with output formats for Copala Triqui and San Dionisio Ocotepec Zapotec dictionaries lately, but I spent some time today thinking about what an eventual output format for a Colonial Valley Zapotec dictionary might look like.

This is one draft of a format to consider, where the entries have the form:

CVZ orthography (part of speech) English gloss Spanish gloss (Examples) {Cordova page numbers, references to other sources such as Whitecotton} {Cordova original entry/entries} (alternate spellings)
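To make the field order concrete, here is a minimal Python sketch (Python 3.9 or later) of how an entry in this draft format could be stored and printed. The class and field names are hypothetical and not part of FLEx or any existing export. The sample values are placeholders, not real dictionary data.

```python
from dataclasses import dataclass, field


@dataclass
class CVZEntry:
    """A hypothetical record for one entry in the draft output format."""
    headword: str  # CVZ orthography
    pos: str       # part of speech
    gloss_en: str  # English gloss
    gloss_es: str  # Spanish gloss
    examples: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)            # Cordova pages, Whitecotton, etc.
    cordova_originals: list[str] = field(default_factory=list)  # original Cordova entry/entries
    alternate_spellings: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Lay the fields out in the order of the draft format, skipping empty groups."""
        parts = [f"{self.headword} ({self.pos}) {self.gloss_en} {self.gloss_es}"]
        if self.examples:
            parts.append("(" + "; ".join(self.examples) + ")")
        if self.sources:
            parts.append("{" + ", ".join(self.sources) + "}")
        if self.cordova_originals:
            parts.append("{" + "; ".join(self.cordova_originals) + "}")
        if self.alternate_spellings:
            parts.append("(" + ", ".join(self.alternate_spellings) + ")")
        return " ".join(parts)


# Placeholder values only, to show the layout.
entry = CVZEntry("HEADWORD", "v", "English gloss", "Spanish gloss",
                 examples=["example 1", "example 2"],
                 sources=["p. 000"],
                 cordova_originals=["original entry"],
                 alternate_spellings=["variant"])
print(entry.render())
```

Skipping empty groups is one possible design choice. Many Córdova-derived entries will not yet have examples or alternate spellings, and empty brackets would only add noise.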

Here is one entry fleshed out with examples, with several Cordova entries merged together …

I also put the different languages in different colors to make it easier to distinguish the different parts of the entry from each other. Examples here are chosen to illustrate some of the range of different morphology that the root chono appears with. I pulled them from the corpus using the 'Find Example' feature in FLEx and edited them for length. I also italicized the word or phrase that contains chono and the corresponding part of the translation to make it easier for the reader.

(One thing I can see already is that I have been inconsistent in the abbreviations for the names of the texts -- sometimes Feria and sometimes Doctrina!)

A posted version of a dictionary like this would let users who don't have access to the full FLEx project search through it anyway and go beyond what is currently available to them. (Of course, I think what we have still needs lots of cleanup before it would be ready for posting.)

Thursday, May 16, 2013

New draft of the San Dionisio Ocotepec Zapotec dictionary posted

I'm more and more adopting the philosophy that we shouldn't keep our data buried in our field notes and computers until it is perfect. It's more likely to be useful to others if we make it available in intermediate draft stages, so I'm starting to post draft versions of various projects on my page at Academia.edu. (Inspired by the wisdom of Peter Austin, who posts draft grammars and dictionaries on his page!)

Today I've posted the current, imperfect state of the San Dionisio Ocotepec Zapotec dictionary (Ethnologue ZTU) at http://www.academia.edu/3546909/San_Dionisio_Ocotepec_Zapotec_--_Spanish_--_English_dictionary_interim_draft_version_

Here's a screenshot of one part of it:

Monday, April 22, 2013

Progressive aspect as an innovation in Central Zapotec

An updated version of my paper about the evolution of progressive aspect in Central Zapotec languages has just been posted on my page at Academia.edu.

My essential argument is that progressive aspect marking with ca- is an innovation in Central Zapotec, and that you can still see its early stages in Colonial Valley Zapotec. In the 16th-century texts, ca- is not yet an obligatory marker of progressive (as it is in the modern languages of this group).

The Central Zapotec languages include the ones shown in this tree (some branches omitted for visibility), using the classification of Thom Smith Stark (2007):

Tuesday, April 16, 2013

The send/receive feature in FLEx 8

The send/receive feature in FieldWorks Language Explorer (FLEx) 8 has just made collaboration on language projects much easier, and my colleague and I have finally got one of our projects up and running under the new system.

FLEx 8 can be downloaded from the SIL FieldWorks download page. A few comments on things that we learned in the process:

You need an account at languagedepot.org where the joint files will be kept. Each member will need an account. If you previously had an account and files there, the account was probably set up for some other kind of synchronization capability (LIFT, which is lexicon only), and you'll need help from the staff to reset your account to the new standard (FLExBridge).
After you get the account set up there, one member of the current team (call this person manager1) sends a complete copy of the project. To send the project, use the menu option in FLEx that says Send/Receive.

The wrench icon is where you enter your languagedepot account details. Here you need the project ID they assigned you, your login, and your password.
After you and your colleagues on the project are sure that you're ready and that a complete version is on the languagedepot site, everyone who is not manager1 should delete their own copy of the project from their computer, then go to FLEx and use the menu item Send/Receive --> Get Project from Colleague. This will download a fresh copy that will (hopefully) be set up for synchronization with others.
The way we tested this was for the two of us to be online with the database and a gmail chat window. We separately modified a text and a lexical entry, then pressed the "Send/Receive" button afterward to check that the project was updating and merging properly.
This feature is still in beta. Occasionally the merge was slow or produced warning messages such as "operation timed out; retrying", but it always succeeded in the end.
If the operation detects a conflict, it pops up a separate window with the conflicting entries/texts. You can't send/receive until you look at these and check a box that says "resolved". You may not notice that this window is open, but if it is, the main FLEx window behaves oddly and you can't do ordinary editing.

Wednesday, April 10, 2013

Some photos of San Dionisio Ocotepec

And for a break from linguistics, some photos of San Dionisio Ocotepec which I stumbled across on the web.

Town hall in San Dionisio, Entrance to the town, entrance to the church

Tuesday, April 2, 2013

More "perfect" plus habitual in Colonial Zapotec

The following passage from Feria shows another instance of the so-called "perfect" hua~oa- before the habitual aspect marker <t>.

The semantic function in this case seems to be what the aspect theorists call the 'universal perfect' (See a nice explanation of the different kinds of perfect in this paper by Paul Kiparsky).

'Because very numerous are the sins the devil always teaches us; they are uncountable; he wants to make us sin in the face, the hear, the ear, the mouth...'

The notable thing is the occurrence of oa-ti- before the verb 'desire'. Since the devil's desires are eternal, the universal perfect is presumably appropriate here.

Thursday, March 28, 2013

New instances of progressive aspect in Colonial Valley Zapotec

I was very pleased to find some more instances of the progressive aspect marker in Colonial Valley Zapotec texts this morning, since it is not generally known that this morpheme was present in the language at this stage. Smith Stark (2008), for example, doesn't include any mention of it in an otherwise comprehensive overview of the Colonial Valley Zapotec aspect morphology.

These new examples involve the verbs 'do' and 'be sitting'. Interestingly, with the first verb, the form in the text is

ca-g-oni

where the /g/ looks like a reflex of the potential aspect. Thom Smith Stark argued in a 2004 paper that the progressive evolved from a construction in which a verb of position was followed by a verb in the potential.

Saturday, March 23, 2013

Central Zapotec languages

The tree below is the current version of something that I am working on. It's intended to show major branches of Central Zapotec (according to the classification of Smith Stark 2007), and it also shows (in brackets and italics) where we currently have Colonial Valley Zapotec documents in our FLEx corpus.

I know that there are many other Colonial Valley Zapotec documents out there, so the tree will get 'bushier' over time, but I found this a useful graphic view of where the documents come from in terms of distribution.

Friday, March 8, 2013

The function of hua- in Colonial Valley Zapotec

In Colonial Valley Zapotec texts, we often see an aspect marker that is spelled something like hue or hua before the verb. The function of this marker is a bit uncertain, and in the modern Valley Zapotec languages that I am most familiar with, it is no longer in use.

The following passages, from an 1823 Catechism, however, seem to show that hua is not strictly speaking an aspect marker, but another kind of morpheme that precedes the aspect prefix /r-/.

In modern Valley Zapotec /r-/ is habitual, but in Colonial Valley Zapotec, /r-/ had a wider range of uses. Perhaps at this point in the 19th century, the combination huari was the normal way to indicate repeated actions?

Contrast the following where the /r-/ appears without the hua. Here we seem to have more of an eternal 'gnomic' style reading of the clauses:

Monday, March 4, 2013

A verb of position as a diachronic source for continuous aspect in Zapotec?

This morning I was reading a bit of

Bybee, Joan, Revere Perkins, and William Pagliuca (1994). The Evolution of Grammar: Tense, Aspect, and Modality in the Languages of the World. Chicago: University of Chicago Press.

where the authors discuss the evolution of progressive aspect in languages around the world. In many languages, it originates in some construction that involves a locative expression, either with a verb of position or something like 'be in X'.

This got me to wondering about the Continuous aspect in modern Valley Zapotec languages. This is not a well-attested aspect in Colonial Valley Zapotec, but I find some examples of incipient grammaticalization, all in the construction 'I say', where the verb nni is used with a preceding ca- prefix that looks very much like the continuous aspect ca- found in modern Valley Zapotec.

A possible connection to a verb of position is the word cáá 'hang, be located (in a high place)'. The lexical entry for this word in San Dionisio Ocotepec Zapotec in the current draft of my own dictionary looks like the following:

In the Colonial Valley Zapotec examples that I have, only 'I speak' is used with a preceding ca:

It is probably also significant that both of these examples are accompanied by the adverb anna 'now', as would be appropriate for a continuous aspect marker.

But I do want to emphasize that continuous aspect marking is only incipient in Colonial Valley Zapotec, and I haven't identified any other examples in the texts. In modern Valley Zapotec languages, the continuous can be used with a wide range of verbs; in CVZ only the verb 'say' is attested so far.

Sunday, March 3, 2013

An expression of judgment with chiba in Colonial Zapotec

The following is my best guess at the analysis of a passage from Feria's Doctrina in Colonial Valley Zapotec. What is of some interest is trying to work out the right analysis of the word chiba here. It probably is related to the word 'put', which is a causative of 'sit':

The word that follows is xihui, which means 'sin'. So the whole phrase, we know from the translation, means 'condemned by Pontius Pilate'. But what is the literal Zapotec? Chiba xihui seems like 'place sin', so is 'condemn' = 'place sin', with =ni serving as the subject of this verb?

Still, it's hard to understand the syntax of the part that follows this... justicia ni pettogo xihui xitichani Iuez nila Poncio Pylato. From looking at other examples of 'judge' it seems that the verb ttogo is always followed by a form of ticha 'word', so that this is a quasi-idiomatic expression. (The possessor of the word seems to be the person who is judged.) "Judge a person's sins" seems to involve putting the word xihui between these two parts.

Saturday, March 2, 2013

More examples of regular expressions in FLEx

This regular expression helps me separate out the right allomorph of the habitual prefix of the Colonial Valley Zapotec verb. The habitual aspect comes in a few allomorphic variants, which are usually written <te, to, ti>. I want to strip the prefix from the citation form and put it in a separate field called Cordova habitual form.

The procedure is to copy the whole citation form to the Cordova habitual form, then use a regular expression in bulk edit to remove everything except the prefix. The following search and replace comes fairly close to what I want:

This bulk replace operation looks for the beginning of the record (^), then any number of characters (.*), then t followed by one character (\w). I put the sequence t & one character inside the capturing parentheses, because I want to be able to refer to it in the Replace operation on the next line.

After the parenthesis, I have any number of characters (.*) and the end of the record ($).

This is replaced by t, whatever the letter that was captured in the previous parenthesis was ($1) and a hyphen.

So if the first line finds blah blah ti+capaya blah blah it will be replaced by ti-.

This mostly works, except that in the Cordova entries as they came to me, several Zapotec verbs are sometimes listed together in a single entry, possibly with different potential prefixes. So I have to inspect the results visually to make sure nothing has gone wrong before hitting the Apply button in Bulk Edit. If there is a record where this will give the wrong result, I can just uncheck it and edit it manually.
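The screenshot with the actual find/replace pair isn't reproduced here, so the following Python sketch is a reconstruction from the description, assuming the Find pattern is `^.*(t\w).*$` and the replacement is `$1-`. Python's `re.sub` writes the back-reference as `\1` rather than FLEx's `$1`. The second call shows why the visual check matters: the greedy `.*` runs ahead to the *last* t-plus-letter in the record, so a record listing several verbs can yield the wrong prefix. The two-verb test string is made up.

```python
import re

# Reconstructed from the prose description; FLEx writes the back-reference as $1.
FIND_PREFIX = r"^.*(t\w).*$"
REPLACE_PREFIX = r"\1-"


def habitual_prefix(record: str) -> str:
    """Reduce a citation-form record to its habitual prefix plus a hyphen.

    A record with no 't' followed by a word character is returned unchanged.
    """
    return re.sub(FIND_PREFIX, REPLACE_PREFIX, record)


print(habitual_prefix("blah blah ti+capaya blah blah"))  # ti-

# Made-up record with two verbs: the greedy .* backs off only as far as the last match.
two_verbs = "blah ti+capaya blah te+blah"
print(habitual_prefix(two_verbs))                          # te-
# A lazy .*? stops at the first t-plus-letter instead.
print(re.sub(r"^.*?(t\w).*$", REPLACE_PREFIX, two_verbs))  # ti-
```

Either quantifier will also match any other word that happens to contain a t followed by a letter, so the visual inspection is still needed.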

The counterpart to this regular expression is one that takes the citation form, strips off the habitual prefix, and returns just the rest of the form. The search and replace that does that is the following:

The Find expression looks for a t followed by one word-forming character and a literal (+) at the beginning of a record (^). It starts to capture everything from here to the end of the record ($) and stores it in the variable $1. When this search and replace is applied, the effect is to take a word like

ti+capaya

and replace it with capaya.
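The stem-stripping step can be checked the same way. This sketch assumes the Find pattern is `^t\w\+(.*)$` and the replacement is `$1` (written `\1` in Python). The `+` has to be escaped because an unescaped `+` is a quantifier. Records that don't begin with t, a letter, and a literal + are left as they are.

```python
import re

# t, one word character, a literal +, then capture the rest of the record.
FIND_STEM = r"^t\w\+(.*)$"
REPLACE_STEM = r"\1"


def strip_habitual(citation: str) -> str:
    """Remove a leading habitual prefix such as 'ti+' from a citation form."""
    return re.sub(FIND_STEM, REPLACE_STEM, citation)


print(strip_habitual("ti+capaya"))  # capaya
print(strip_habitual("capaya"))     # capaya (no prefix, so unchanged)
```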

Friday, March 1, 2013

More sophisticated regular expression searching in the Cordova

I am not a computational linguist by any means, but I have slowly been learning enough about regular expressions to do some useful things with FLEx. One thing I have just learned is how to use regular expressions in search and replace operations in FLEx. (The FLEx help menus are not very explicit about this.)

In a FLEx search and replace function (in Bulk Edit, for example), each part of the expression that is enclosed in parentheses returns a match, called a capture. You can refer to a capture with $ followed by its number: the material matched by the first set of parentheses is $1, the material in the second is $2, and so on. Here is an example of how I used this in the Colonial Valley Zapotec database.

Córdova normally cites a verb in the 1st person habitual. Depending on the verb, the habitual prefix might be /ti, to, te/. The verb root will usually be four to eight letters long, and the first-person form ends in /a/.

So if the form cited is tichapa, I would like it to be segmented ti+chap-a.

The "Find what" on the first line sets up a first capture group, which is the prefix, made up of t plus either e, i, or o. (Elements between square brackets are options.) Because this whole first unit is between parentheses, it is capture group one, which I can refer to as $1 in the "Replace With" line below.

I want to replace it with the same thing, followed by a + to show the boundary.

The next capture group is a group of letters (shown by \w — meaning any wordforming character), and I have shown the number as between 4 and 8. (On second thought, perhaps the lower number should have been three…)

Since this is the second capture group, I can refer to it by $2, in the "Replace With" line, and this time I replace it with the same thing, followed by a hyphen.
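Here is a Python version of this segmentation. It assumes the pattern is `(t[eio])(\w{4,8})a` with the replacement `$1+$2-a`. The exact Find line isn't reproduced in the text, and the final literal `a` is an assumption based on the first-person /a/ ending. The root-length bounds are parameters so a lower limit of three can be tried. Because `\w{4,8}` is greedy, it first takes the final a as well and then gives back one character so the literal `a` can match, which is what keeps the ending outside the root.

```python
import re


def segment(citation: str, min_root: int = 4, max_root: int = 8) -> str:
    """Segment a Córdova 1st-person habitual form as prefix+root-a.

    Group 1 is the habitual prefix (t plus e, i, or o); group 2 is the root.
    The pattern is not anchored, so it could also fire inside a longer record.
    """
    pattern = rf"(t[eio])(\w{{{min_root},{max_root}}})a"
    return re.sub(pattern, r"\1+\2-a", citation)


print(segment("tichapa"))              # ti+chap-a
print(segment("tichapa", min_root=3))  # ti+chap-a (same result for this form)
```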

This is a first attempt at using the regular expressions with FLEx, but I think I can already see how they are going to make it possible to accomplish more sophisticated data manipulation as we try to get the Córdova diccionario into a format that we can understand.

Monday, February 18, 2013

The worst religious metaphor ever

I think the following passage from the Feria Doctrina (in Colonial Valley Zapotec, 1567) may be one of the worst extended religious metaphors ever. God is like an abusive husband: he beats us to punish us for our sins, but won't leave us so long as we are faithful.

The beni niguio [ni=ti-gapa __ ][ ni=ti-bibe __ loo lechela=ni]] 'man who beats and thrashes his wife' is an interesting example. It shows what Pam Munro, John Foreman, Aaron Sonnenschein, and Heriberto Avelino have called the Covert Subject Construction -- here across apparent conjoined relative clauses. 'His wife' ought to be the object of both clauses, so this is also an example of Right Node Raising, and the possessor pronoun on 'his wife' is somehow responsible for licensing the omission of the subjects of both preceding verbs.

Thursday, December 20, 2012

'Goody goody' in Trique

One of my favorite lexical entries in a while is the difficult-to-translate interjection arsínj, said when a bad person, or a person who has rejected your good advice, gets into trouble. The closest we could come up with in English was 'Goody! Goody!'

Here is an English expression of the same sentiment.

Orange oil in darkness — A poetic view of two ways of doing linguistics

Orange oil in darkness

The useful part
of things is elegance --
in mathematics, bridges.

Even in hedges
of ripe persimmons
or mandarin oranges,

elegance solves
for the minimum possible
then dissolves.

The art is what's extra:
a fragrance penciled in,
or long division's inescapable reminder.

Not quite unplanned for,
more the unexpected, impractical gift.
Not the figures traced

in the bridge's stanchions,
but the small
and lovely sounds they make in the wind.

Who drew that in?
Who could have?
...

-- Jane Hirshfield "The Lives of the Heart"

This poem sums up about as well as anything the aesthetic that I've come to after three decades in linguistics. I am profoundly moved by the elegance of language, and it is something that I try to convey to my students and in my writing. But I'm also delighted by the quirky art of language -- the delicate curlicues, impractical extravagances, and intensely detailed categorizations of the world. I seem to only be happy in linguistics when I'm able to be both moved and delighted!

Friday, November 23, 2012

Clarification on older practical orthography for tone in Copala Triqui

In an earlier post, I was uncertain about the right interpretation of the tone orthography used in Hollenbach's (1977) versions of the Sun and Moon myth. Barbara Hollenbach was kind enough to clarify for me that in this previous version of the practical orthography "The three lower patterns (1, 2, 13) were written with a single macron on the final vowel."

So cacaā could represent /kakaa¹, kakaa², kakaa¹³/. The particular line that worried me in the previous post contained the word <cacaā>. In current orthographic practice, the macron is replaced with a line under the vowel, so that would translate to cacaa̱, and <aa̱> = /aa³¹/. However, in the orthography of the 1977 texts, cacaā had a different interpretation, and in this text, the right interpretation is /kakaa²/.

Another way in which it is easy to get confused by the 1977 texts is that 31 tone is not indicated in any special way in that orthography. (It is marked by aa̱ in the current practice, as just mentioned.)

For long vowels, the differences can (I think) be summed up this way:

Tone   Current orthography   Previous orthography (1977)
3      aa                    aa
32     aa                    aa
31     aa̱                    aa
4      aá                    aá
5      áá                    aá
1      a̱a̱                    aā
2      a̱a̱                    aā
13     a̱a                    aā

The previous orthography thus makes only three distinctions — <a, ā, á> — while the current orthography makes six <aa, aa̱, a̱a, a̱a̱, aá, áá> for long vowels.

This does create a practical problem for entering the 1977 texts in the FLEx database. On the one hand, it seems desirable to preserve the original orthography of these texts; on the other hand, that orthography is strongly at variance with the majority of the material, which uses the more recent orthography. I can't always predict the modern spelling of a word from the older spelling without looking it up in the dictionary and understanding its morphology, so there is no way to do the conversion automatically.
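Encoding the long-vowel table and inverting it makes the reason for this concrete. The sketch below only restates the table, using Unicode escapes so the combining macron below is unambiguous. Every 1977 spelling maps back to more than one current spelling, so the choice has to come from the dictionary and the morphology.

```python
# (tone, current orthography, 1977 orthography) for long vowels, from the table.
# \u0331 = combining macron below, \u0101 = ā, \u00e1 = á
LONG_VOWELS = [
    ("3",  "aa",             "aa"),
    ("32", "aa",             "aa"),
    ("31", "aa\u0331",       "aa"),
    ("4",  "a\u00e1",        "a\u00e1"),
    ("5",  "\u00e1\u00e1",   "a\u00e1"),
    ("1",  "a\u0331a\u0331", "a\u0101"),
    ("2",  "a\u0331a\u0331", "a\u0101"),
    ("13", "a\u0331a",       "a\u0101"),
]


def candidates(old_spelling: str) -> dict[str, list[str]]:
    """Map each current spelling a 1977 spelling could stand for to its tone(s)."""
    result: dict[str, list[str]] = {}
    for tone, current, previous in LONG_VOWELS:
        if previous == old_spelling:
            result.setdefault(current, []).append(tone)
    return result


print(len({prev for _, _, prev in LONG_VOWELS}))  # 3 distinctions in 1977
print(len({cur for _, cur, _ in LONG_VOWELS}))    # 6 in the current orthography
# aā is ambiguous between two current spellings covering tones 1, 2 and 13.
print(candidates("a\u0101"))
```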

Thursday, November 22, 2012

More on causative in Copala Triqui

I wrote a bit about the odd causative construction in Copala Triqui in this post. To recap, the normal syntax is

[VSO] cause SUBJ.

In the examples cited there, the aspect of the verb 'make' ('yaj) seems to match the aspect of the verb in the complement. The following example, if I am interpreting it correctly, seems to show a failure of aspect matching:

One slight uncertainty -- this text comes from Hollenbach's (1977) version of these texts, published in Tlalocan. The orthography uses a macron over the vowel to represent low register tone, rather than a line under the vowel. So the original form of the word that I've retranscribed as cacaa̱ was written cacaā.

In the usual form of the practical orthography for Triqui tone, cacaa̱ represents only cacaa³¹, while cacaa is either cacaa³ or cacaa³². So as the text is written, it would normally correspond to c-acaa³¹. However, the word for 'burn' in Triqui is acaa³² (practical orthography acaa) and its potential is c-acaa² (c-aca̱a̱).

I'm tentatively guessing that this represents an earlier version of the practical orthography where perhaps both 31 and 32 tones were written vv̱…

If my interpretation is correct, then the two instances of the verb 'burn' are completive, while 'will make' qui'ya̱j is potential.

Update 11/23/2012

Barbara Hollenbach has kindly clarified for me by email how the older practical orthography for tone worked. I've tried to summarize my understanding in this post. That means that my analysis in the earlier post is incorrect in identifying the instances of the verb <cacaā> as cacaa³². Instead, it is an older way of writing c-acaa², which is the potential aspect form of 'burn'. Thus the aspect of 'make' (potential) does match the aspect of the complement verb 'burn' (also potential), and I don't have a counterexample to the tentative generalization that the two always match. (A counterexample might still be found, but this doesn't turn out to be a valid one.)

The new analysis of this passage is shown below:

Tuesday, November 20, 2012

Then he ordered the opossum

I think the syntax of the following line in a Triqui folktale is pretty interesting for its lack of anything like PRO or Condition B effects:

Literally, 'He ordered the opossum (that) it will lower it in the water'. The verb tanij is in the potential aspect, as is normal for the complement of a verb like 'order'. But nothing seems to signal the coreference between the main clause object 'opossum' and the two pronouns coreferent to it in the embedded clause.

Thursday, November 8, 2012

Phrase-book material in a lexicon

Native speakers or learners of a language often find it useful to have a set of common phrases in the language. But do these go in the dictionary?

It seems awkward to have entries like the following:

But an alternative is to use the Publications field in FLEx to define a separate Phrasebook. You configure this via Lists | Publications and add a new Publication called Phrasebook. The default publication is Main Dictionary, so if you haven't changed anything, that is where all the entries will appear.

The default seems to be that all entries are in all publications. I used the Bulk Edit commands to remove Phrasebook from the Publications lists of all the entries, then went back in and selected the entries I wanted for the Phrasebook.

For this entry, this means adjusting the Publications field to say only Phrasebook and not Main Dictionary. After changing the field, it now looks like this:

In the lexicon pane, we can pick which dictionary to display. The default will be the main dictionary:

From the pull-down arrow next to the words Main Dictionary Entries, we can also select other publications, such as the Phrasebook. The following screen shows a partially populated Phrasebook:
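The two-step Bulk Edit procedure described here (remove Phrasebook from every entry, then add it back only where wanted) can be modeled as set operations on each entry's publication list. This is a conceptual sketch only: the function names are hypothetical, it does not touch FLEx's own data or API, and the entry names are placeholders.

```python
PUBLICATIONS = {"Main Dictionary", "Phrasebook"}


def default_publications(entries: list[str]) -> dict[str, set[str]]:
    """Every entry starts out in every publication (the default described above)."""
    return {e: set(PUBLICATIONS) for e in entries}


def bulk_remove(pubs: dict[str, set[str]], publication: str) -> None:
    """Analogue of the Bulk Edit step: drop one publication from every entry."""
    for entry_pubs in pubs.values():
        entry_pubs.discard(publication)


def phrasebook_only(pubs: dict[str, set[str]], entry: str) -> None:
    """Set an entry's Publications field to Phrasebook only."""
    pubs[entry] = {"Phrasebook"}


def entries_in(pubs: dict[str, set[str]], publication: str) -> list[str]:
    """What the lexicon pane would list for the chosen publication."""
    return sorted(e for e, p in pubs.items() if publication in p)


pubs = default_publications(["entry-a", "entry-b", "phrase-a"])
bulk_remove(pubs, "Phrasebook")
phrasebook_only(pubs, "phrase-a")
print(entries_in(pubs, "Main Dictionary"))  # ['entry-a', 'entry-b']
print(entries_in(pubs, "Phrasebook"))       # ['phrase-a']
```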