Working linguist blog: January 2012

Tuesday, January 31, 2012

Root vs stem in FLEx

My friend Danny asked me a question about morpheme types in FLEx that prompted some thought. FLEx allows you to specify a pretty wide range of morpheme types, including a distinction between roots and stems.

The documentation resources that come with FLEx describe the distinction in the following terms:

I can imagine that there are projects where the root/stem distinction has some practical application, but I have not found much use for it in my own work.

I don't use the automatic parsing function of FLEx; I parse manually instead. Possibly the set-up of a project with morphological rules would require the rules to be sensitive to whether affixes attach to roots or stems.

Apart from that, it has seemed to me like I have been able to do pretty much everything I need without really using the distinction very much.

For my own purposes, I use 'stem' for nearly everything, 'compound' for most of the rest, and 'bound stem' for a small number of morphemes that appear in compounds but not independently. The possible advantage of 'stem' as the default is that you can use the 'Components' field to show when stems contain other morphemes.

These two screen shots show the Triqui verbs achén 'pass' and its causative tacuachén 'deliver'. They show that tacuachén has achén as a component and achén is part of a complex form tacuachén.

(The default order of the fields in the dictionary puts components right after the lemma. I don't much like this, so I think I would edit the dictionary configuration to put them closer to the end.)

This isn't a problem with the design of FLEx, but one of the practical implications for a person in language documentation is that sometimes the software gives you the ability to make more distinctions than are really needed for a dictionary and text collection. At least it gives me more options than I seem to need so far...

Stranded prepositions in Copala Triqui

Stranded prepositions are certainly possible in Copala Triqui, but mostly I have them from elicitation. So it is nice to come across occasional examples in text:

Monday, January 30, 2012

Strange psych verbs in Copala Triqui

A certain subclass of transitive verbs of psychological attitude appears with a very odd biclausal structure in Copala Triqui, where the object of the verb is syntactically the object of a verb ni'yaj 'see'. Here are some examples:

We might compare expressions in English like 'they looked with envy on...'

Verbs so far that have this kind of complement structure are niha' rá 'be happy (with)', aran' rá 'like', uun rá 'think (of)', chumán rá 'believe in', ...

The most frequent structure is EmotionVerb SUBJ1 ni'yaj SUBJ2 man OBJ, where SUBJ2 is a pronoun coreferent with SUBJ1. If the whole construction appears in a relative, SUBJ1 can be omitted:

Sunday, January 29, 2012

Stranded accusatives in Copala Triqui

A small lesson for me on why it is useful to work through texts -- you encounter things that you would not have known were possible. Today I found three examples of stranded accusative markers in the Copala Triqui New Testament translation, and this is a construction that I did not know was grammatical!

I think it is probably important that all of these appear in relative clause contexts -- "the apostles that Jesus named ACC", "the ones that Jesus named", and "the ones that you put in jail ACC".

Saturday, January 28, 2012

Why has the evil one entered your heart?

At least two things strike me as interesting about this passage. One is that 'your heart' doesn't require an obligatory accusative, even though 'you' would. So possessed body parts occupy an interesting place on the hierarchy of direct object animacy. The other is the phrase ndaa rihaan. Ndaa alone means 'up to' and rihaan alone is 'to, for'. The combination seems to mean 'even up to' or an emphatic version of rihaan.

Friday, January 27, 2012

More Triqui New Testament

I discovered today that Worldbibles.org has a copy of the Copala Triqui New Testament in a format that is considerably easier to use than the PDF version of the Testament. When you cut and paste from this site, all the font information is preserved (particularly the underscore diacritic and the saltillo). For the right column, you can cut and paste from an English translation.

I've been using copy and paste from this site, along with Translation Editor, to produce parallel versions of the first few chapters of Acts.

Transferring the first sentence over to FLEx gives us

There are still some rough patches, but enough of the vocabulary is familiar to make this worth doing.

Thursday, January 26, 2012

The body was formed of mud and so it can easily be destroyed.

Working on this passage in Colonial Valley Zapotec. Verbs are generally harder to work out in this language than other parts of the sentence. I was pleased to work out several new nouns, however!

Wednesday, January 25, 2012

Improved text statistics in FLEx

I was pleased to see that the latest release of Fieldworks Language Explorer (7.2 beta) has much improved statistics. I had been estimating the number of words in my Timucua corpus as 20,000 in some recent papers, but running the stats now shows that there are nearly twice as many words as I thought. (A pleasing discovery!)
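For a rough cross-check of a word count outside FLEx, here is a minimal sketch. It assumes the corpus has been exported as plain text, and the tokenization rule (a word is a run of letters, optionally joined by an internal apostrophe or hyphen) is my own assumption, not the rule FLEx uses, so the two counts need not agree exactly. The names `WORD` and `corpus_stats` are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by internal
# apostrophes or hyphens. Letters carrying combining diacritics that have
# no precomposed form would be split by this pattern.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")

def corpus_stats(text):
    """Return (number of word tokens, number of distinct word types)."""
    normalized = unicodedata.normalize("NFC", text).lower()
    tokens = WORD.findall(normalized)
    return len(tokens), len(Counter(tokens))
```

Calling `corpus_stats` on the exported text gives a token count to compare against the FLEx statistics, plus a type count as a rough measure of vocabulary size.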

Tuesday, January 24, 2012

23rd psalm in Choctaw

Because sheep aren't native to North America, the translation of the 23rd Psalm has always been a little difficult in Native American languages. Choctaw uses a compound that originally meant 'tame rabbit' as its translation of 'sheep' and a shepherd is one who watches over tame rabbits. (For modern speakers, of course, 'tame rabbit' is an opaque compound for 'sheep').

The following is the 23rd Psalm in Choctaw with a rough analysis of the first verse.

(I do have a few puzzles about this verse -- the 'because' suffix is usually -hatoko, rather than -yatoko...)

Monday, January 23, 2012

Searching Whitecotton

One of the challenges of working with Colonial Zapotec is that although there is an enormous dictionary, it is a Spanish-to-Zapotec dictionary, in only a vague approximation of alphabetical order. Joe and Ruth Bradley Whitecotton produced a very valuable resource in 1993: a Zapotec-to-Spanish reverse version. This is much better for trying to go from Zapotec to Spanish, but still there are problems --

Zapotec verbs normally have a prefix, so verb stems might be distributed among different aspects, relativized forms, nominalized forms, etc. Nouns might appear alone, but also as parts of compounds or in variant spellings. It would be nice to be able to find all of the relevant forms. People at UCLA produced a PDF version of Whitecotton (which is otherwise hard to obtain), but this is an image file and not searchable.

After a little fiddling, I was able to apply optical character recognition to this PDF, resulting for the first time in a searchable version of the Whitecotton and Whitecotton Colonial Zapotec -- Spanish dictionary. In the first afternoon of having it available, I'm already finding words with more ease.

Title page for the Whitecotton and Whitecotton dictionary

PDF version, now processed via OCR and searchable

The foul body

Continuing the wax and the wick analogy, more on the filthiness of the body...

Saturday, January 21, 2012

The wax and the wick

This passage from the Feria doctrina (1567) is in Colonial Valley Zapotec.

I can't completely make out the Spanish, but the following images show a very rough analysis.

Wednesday, January 18, 2012

Words for think in Triqui

Why the two different words for 'think' in this passage from the Copala Triqui New Testament? It seems that maybe guun rá introduces a quote, while nuchruj raa is more of an intransitive.

Monday, January 16, 2012

Timucua doctrina (Movilla 1635)

The cover page and first folio from the 1635 Movilla Doctrina:

The document (apart from the cover and prologue) is entirely in Timucua. A small breakthrough was finding the Spanish original which this is a translation of. As the Timucua cover page explains, this is a translation of a doctrina that Cardinal Belarmino composed at the command of Pope Clement the 8th.

The Spanish version was available online at the National Library of Portugal. Here is an image of the cover page:

After a bit of examination, it was clear that there was an excellent correspondence, with matching proper names, sections, etc. That made it possible to read and analyse the Movilla.

A colonial Zapotec will

I love this image of a will written in Zapotec from 1740 (I think!). The Zapotexts group at UCLA has been working out the analysis of this document.

Our current journal article

Our latest journal article, hot off the press at the Journal of Natural Language Engineering. (A collaboration between linguistics, communication, and computer science.)

You can get a PDF of the article at my page at academia.edu.

Saturday, January 14, 2012

Speaker dictionary vs linguist dictionary in Zapotec

The idea for the San Dionisio Ocotepec Zapotec dictionary is to have two versions.

The dictionary for linguists will have:

the regular orthography with phonation, length, and tone indicated
no separate pronunciation
English before Spanish
it also includes clitics and affixes
possibly it includes more loan words that appear in texts (but which speakers may not want in their dictionary)

The dictionary for native speakers will have:

the practical orthography
a separate pronunciation field that shows phonation, length, and tone between [ ]
Spanish gloss before the English

It is not too difficult to produce both these dictionaries in FLEx. Go to List | Publications and add "Linguist's dictionary". By default, most lexical entries will show in their Publications Setting that they appear in both the Main Dictionary and the Linguist's Dictionary:

For items that you don't want to appear in the Main Dictionary, you find the entry, and delete "Main Dictionary" from the Publish In list.

You can do this manually, or you can Bulk Edit the lexical entries. You filter the entries to show only those with +, -, or = in the Lexeme form field. (Or for Borrowed in some field, or whatever you want to exclude.) You can then use the Bulk Edit to change the value of the Publish In field to whatever you want.
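The filtering logic is simple enough to sketch outside FLEx. This is not FLEx's own API, just an illustration in Python of the rule described above: a lexeme form containing +, -, or = is treated as an affix or clitic and kept out of the Main Dictionary. The function names and the publication labels used as strings are assumptions for illustration.

```python
# Boundary markers that signal an affix or clitic in the Lexeme form field.
AFFIX_MARKERS = ("+", "-", "=")

def is_affix_or_clitic(lexeme_form):
    """True if the form contains one of the markers +, -, or =."""
    return any(marker in lexeme_form for marker in AFFIX_MARKERS)

def publications_for(lexeme_form):
    """Hypothetical rule: bound forms go only in the linguist's dictionary."""
    if is_affix_or_clitic(lexeme_form):
        return ["Linguist's Dictionary"]
    return ["Main Dictionary", "Linguist's Dictionary"]
```

With this rule, a prefix such as `ru-` would be listed only in the linguist's dictionary, while an unmarked form such as `bduld` stays in both. One caveat: a stem that happens to be written with an internal hyphen would also be caught, so the filtered list is worth checking by eye before bulk editing.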

Once you have two dictionaries listed in the Publications list, you can use the Tools | Configure menu to include, format, and order fields differently in the two dictionaries.

A remaining puzzle now -- the regular orthography that I designed a few years ago now seems too complex for Zapotec speakers to use. It is directly transferable to a phonemic orthography, but uses Spanish-influenced spelling (like <c> and <qu> for /k/, <ch> for /tʃ/, etc.). My previous papers on the language used this orthography. But does it have any real use now? Should I just write things for linguists in IPA? (Confession -- I think IPA is too cumbersome for anything but phonology...)

Differences between San Pablo Güilá Zapotec and San Dionisio Ocotepec Zapotec

In the Ethnologue classification of languages, San Pablo Güilá Zapotec and San Dionisio Ocotepec Zapotec are supposed to be two dialects of the same language. (They both have the Ethnologue code ztu).

These two towns are both located in the southern Tlacolula Valley area of Oaxaca and they are adjacent to each other. The main source of information I have for Güilá is the 2009 dissertation of Francisco Arellanes.

[Arellanes, Francisco. 2009. El sistema fonológico y las propiedades fonéticas del zapoteco de San Pablo Güilá. Descripción y análisis formal. Tesis doctoral. Colegio de México.]

However, as I work through this thesis, it becomes clearer to me that the Güilá and San Dionisio varieties are more different from each other than you might expect if they were simply dialects.

Some differences:

Presence of a /ɨ/ phoneme in Güilá. There is no phonemic /ɨ/ in San Dionisio, though [ɨ] appears as an allophone of /i/ after /ts/ and /dz/ affricates. Sample contrasts:

Güilá gɨ:ʒ 'roncha (spots on the skin)' Arell. p 150/ San Dionisio ge:dʒ 'grano (pimple)'
Güilá mɨ:lj 'dinero (money)'/San Dionisio me:lj

Affricates /dz/ and /dʒ/ in San Dionisio; these have simplified to /z/ and /ʒ/ in Güilá. (This is a sound change that Güilá shares with other Zapotec varieties such as San Lucas Quiaviní Zapotec.) The previous example also shows this correspondence:

Güilá gɨ:ʒ 'roncha (spots on the skin)' Arell. p 150/ San Dionisio ge:dʒ 'grano (pimple)'
Güilá na:ʒ 'mojado (wet)'/San Dionisio na:dʒ

Fortis L is /ld/ in San Dionisio but /l/ or /lθ/ in Güilá in at least some positions (syllable final only?)

Güilá bel: 'pescado (fish)'/San Dionisio beʰld

Other kinds of changes (affecting loans)

Güilá btʃar 'spoon'/San Dionisio kutʃarr 'spoon' (< Spanish cuchara 'spoon')
Güilá ta:ɸ 'cuento (account)'/San Dionisio ta:bl (< Spanish tabla 'table')
Güilá pun: 'punta (first pressing of mezcal)'/San Dionisio punt (< Spanish punta)

[BTW, I do not really understand the semantic shift here; possibly there are meanings of cuento and tabla in Spanish that are more similar to each other that I am unaware of?]
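The affricate correspondence listed among the differences above can be stated as a small rewrite rule and checked against the two pairs given there. This is only a sketch: it treats the San Dionisio forms as stand-ins for the older forms, which is an assumption, and the names `AFFRICATE_TO_FRICATIVE` and `simplify_affricates` are hypothetical.

```python
# Correspondence from the list above: San Dionisio /dz/, /dʒ/
# correspond to Güilá /z/, /ʒ/.
AFFRICATE_TO_FRICATIVE = {"dz": "z", "dʒ": "ʒ"}

def simplify_affricates(form):
    """Apply the affricate simplification to a transcribed form."""
    for affricate, fricative in AFFRICATE_TO_FRICATIVE.items():
        form = form.replace(affricate, fricative)
    return form

# (Güilá, San Dionisio) pairs taken from the examples above.
pairs = [("gɨ:ʒ", "ge:dʒ"), ("na:ʒ", "na:dʒ")]
for guila, san_dionisio in pairs:
    predicted = simplify_affricates(san_dionisio)
    print(san_dionisio, "->", predicted, "| attested Güilá:", guila)
```

For na:dʒ the rule alone yields the attested Güilá form na:ʒ. For ge:dʒ it yields ge:ʒ, which still differs from Güilá gɨ:ʒ in the vowel, so that pair also involves the separate /ɨ/ difference.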

Friday, January 13, 2012

Draft of San Dionisio Ocotepec Zapotec dictionary (in revised practical orthography)

Below is a screenshot of a current version of a dictionary of San Dionisio Ocotepec Zapotec in a simplified orthography. I do not have the phonetic field consistently filled out, so it shows up for some entries (like bduld) but not others. (I mostly filled it in for entries where there is something unusual, like one of the diphthongs.)

I am debating whether it would be better to leave this out of the printed version or whether I ought to populate this field for other entries.

Thursday, January 12, 2012

Producing the Triqui address

Here are the first few lines of "Words of counsel for the Triqui people" (Nana̱ naguan' rihaan nij síí chihaan') as a Word document:

We produced this by first entering the text in the baseline of FLEx. The advantage of doing this is that we were able to do an interlinear analysis and double check the spelling and gloss of each word, adding any new items to the lexicon. Here's the baseline view:

And here is the Analyse tab view:

We used the analysis to constantly revise the baseline spelling. Note also that we have both the practical orthography (soj, me) and the phonetic spellings (zoj³, me³).

A mistake we made was copying the baseline text to a word processing document too soon. Although we thought we had the final version, we discovered various additional errors that needed to be corrected. Perhaps we should have deleted the Triqui from the word processor and pasted it in again (though that would require a certain amount of clean up). But thinking that it was just a few words, we changed items in FLEx and then did the equivalent change in Word.

The danger of this approach is that we risked getting the two versions out of sync with each other, and we had a small catastrophe with a missing paragraph in the Word version. A lesson for the future is that we need to delay copying anything from the baseline until the absolute last minute!
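One way to catch this kind of drift is to compare the two versions paragraph by paragraph. The following is a minimal sketch, assuming both the FLEx baseline and the word-processor text have been saved as plain text with one paragraph per line. The function name `out_of_sync` is hypothetical. It uses Python's standard `difflib` module.

```python
import difflib

def out_of_sync(baseline_text, document_text):
    """List paragraphs that differ between two plain-text versions.

    Lines starting with '- ' occur only in the baseline;
    lines starting with '+ ' occur only in the document.
    """
    base = [p.strip() for p in baseline_text.split("\n") if p.strip()]
    doc = [p.strip() for p in document_text.split("\n") if p.strip()]
    diff = difflib.ndiff(base, doc)
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

An empty result means the paragraphs match. A paragraph missing from the word-processor version shows up as a single line prefixed with `- `, which is exactly the kind of error that is easy to miss by eye.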

Wednesday, January 11, 2012

Triqui testament to FLEx, part 4

The following shows what this looks like in the Analyse tab for FLEx.

Working through this is a somewhat slow process, but I'm finding lots of new phrases and constructions this way. It expands the collection of texts to include a genre that I don't otherwise have good data for.

Triqui testament to FLEx, part 3

After clean-up, I used the Translation Editor function in FLEx (Bible translator edition) to neatly line up the Triqui version and its Spanish translation:

Then I used the Tools > Back Translation > Use Interlinear Text Tool to look at the same text in FLEx and to work on the Interlinear analysis.

Triqui testament to FLEx, part 2

After downloading the PDF, you can cut and paste into a word-processing document. Unfortunately, the underline and the glottal stop don't come out correctly, so it requires a bit of clean-up. Here is a sample page from the testament and the first stage result of cut and paste...

Result of cut and paste

Getting the Triqui New Testament into our FLEx project, part 1

For the last couple of months, I've been working on trying to get text from the Copala Triqui translation of the New Testament into the Fieldworks Language Explorer project to try to expand the range of textual and lexical material that we have available.

The PDF version of the text is available at worldbibles.org. Here is the front page of the volume:

(A small note -- the actual published paper volume reflects an earlier orthography where low tone is marked with a line above the vowel. In Hollenbach's current orthography, this has been replaced with a line under the vowel.)
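If text in the earlier orthography ever needs to be brought into line with the current one, the change from a line above the vowel to a line under it can be sketched as a Unicode substitution. This assumes the line above is encoded as a macron (either a precomposed letter such as ā or the combining character U+0304) and that the line below should be U+0331 COMBINING MACRON BELOW. Neither encoding is confirmed for the actual files, and the function name is hypothetical.

```python
import unicodedata

MACRON_ABOVE = "\u0304"  # COMBINING MACRON (assumed encoding of the old low-tone mark)
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW (assumed encoding of the new mark)

def line_above_to_line_below(text):
    """Replace every macron above a letter with a macron below it."""
    # Decompose so that precomposed letters like U+0101 become a + U+0304.
    decomposed = unicodedata.normalize("NFD", text)
    converted = decomposed.replace(MACRON_ABOVE, MACRON_BELOW)
    # Recompose whatever can be recomposed (for example, acute accents).
    return unicodedata.normalize("NFC", converted)
```

Decomposing first means the same replacement handles both precomposed and combining encodings, and the final normalization leaves other diacritics, such as the acute accent in achén, as they were.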

Tuesday, January 10, 2012

Simplified orthography for Zapotec

I'm trying a fairly radically simplified practical orthography for San Dionisio Zapotec, which doesn't show tone, vowel length, or phonation contrasts other than checked.

The following FLEx screen shows the options, where Lexeme Form is the full phonemic spelling and Citation Form is the practical spelling.

Zapotec valence

Reading through the description of a proposed volume for Brill on valence-changing devices in Zapotecan languages. I'd like to contribute something on San Dionisio Ocotepec Zapotec.

Following Kaufman, the editors (Aaron S and Natalie O) talk about *o- as one of the markers of causative, but I have always thought of it in the terms that Briggs (1961) (and maybe Pickett, but I don't remember) laid out, where some of the aspect markers have two allomorphs. So the habitual is ru- for some verb classes and ri- for others, but the -u and -i portions of these prefixes are not synchronically segmentable.

Also, an interesting comparison to the causative in Copala Triqui, which is usually tuk- added to the verb. The t- portion here is cognate to si- or ti-/di- prefixes in Zapotec and Chatino; the u- is possibly cognate to *o-, and the k- to the fortition prefix.

Incongruous lexical units and metaphor identification

I picked up this book by Gerard Steen et al. about metaphors at the annual LSA meeting. I need to summarize it for the rest of our team. A key idea in their method for finding metaphors is the identification of 'incongruous lexical units' which may be signals of 'cross-domain mapping'. The 'incongruous' idea is related to work by Charteris-Black and his students.

Monday, January 9, 2012

Timucua Lord's prayer, version 2

Here is another version of the Lord's Prayer in Timucua (Movilla 1635). Notice that in this one as well, the Timucua says 'may we be given eternal life' instead of 'thy kingdom come'.

The author (or editor) of the second version was Gregorio de Movilla (1635). So a little more than 20 years later, he retained the 'eternal life' part, but simplified the 'being brought together with the Lord in heaven' from the 1612 version.

Timucua Lord's prayer, version 1

The following version of the Lord's Prayer is from Timucua. I've given a rough interlinear gloss.

I think the literal translation is something like "Our father who lives in heaven, may your name be honored and known, may our souls (be) come together with (where?) the Lord lives, (in?) eternal life."

The puzzle for me is why the Spanish priest who translated this (Fr. Pareja in about 1612) didn't try to translate something more directly equivalent to 'thy kingdom come'. Does this show that he considered 'kingdom' an inappropriate idea for conveying Christianity to the natives?

They didn't have their own kingdoms, and the only king they probably knew about was the King of Spain...