Wednesday, June 27, 2012

Farsi language learning software

I am trying to learn Farsi this summer for one of our research projects.

I've been exploring two online language learning systems.  One is Headstart2, developed by the military for soldiers, though it seems that anyone can register and use the system for free.

Some features of Headstart2:

  • In each unit, there is an initial vocabulary introduction section, then a subsequent group of exercises in increasing difficulty that involve using this vocabulary.
  • For example, in the following screenshot, the system is helping the user learn words that describe people, such as 'old', 'fat', 'man', 'blonde', 'tall', etc.


The question at the bottom asks "Who is the short man?"   The pictures alone are often enough to answer, but the text underneath the first picture ("Amir is short") also helps, especially when it is not completely obvious from the picture.
  • Some tasks can be rather complex, as in the following screen, where you need to be able to read the names of cities, days, and times in order to find the right flight.


Some likes and dislikes:
  • You can always go back and forth in the lessons to double check vocabulary.
  • You can play sound clips of all the words.
  • The system also teaches you to read the Farsi alphabet
  • The lessons get complex fairly quickly.  I think it might be a bit overwhelming for some users.
  • There is a link for a 'glossary', but this doesn't work for me.  So there is no single place where all the vocabulary is collected.
  • There are tests at the end of each chapter, so you can check your progress.

The second system is Mango.  My library system has a subscription for patrons.

Mango has a simpler rotation of activities.  It is always along the lines of 

English1   ---  Farsi1
English1  ---  ?

where at the prompt, you are supposed to say the Farsi word.  The system is based on a spaced repetition model, so it is similar overall to Pimsleur, but it presents the material in both written and audio form.
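For readers unfamiliar with spaced repetition, here is a minimal sketch of the general idea using Leitner-style boxes.  This is not how Mango or Pimsleur actually schedule their prompts (I don't know their internals); the interval values and the names `Card`, `review`, and `due_cards` are all made up for illustration.

```python
from dataclasses import dataclass

# Days to wait before the next review, one value per box.
# These numbers are illustrative assumptions, not taken from any product.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    english: str
    farsi: str
    box: int = 0   # index into INTERVALS
    due: int = 0   # day number on which the card should next be shown


def review(card: Card, correct: bool, today: int) -> Card:
    """Promote the card one box after a correct answer; reset it otherwise."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + INTERVALS[card.box]
    return card


def due_cards(cards, today: int):
    """Return the cards whose review day has arrived."""
    return [c for c in cards if c.due <= today]


card = Card("water", "آب")
review(card, correct=True, today=0)   # answered correctly, so the gap grows
```

The point of the scheme is that a word answered correctly comes back after a longer gap each time, while a missed word returns to the shortest gap.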

Here is a sample screenshot


If you mouse over the words, you can see their transcription or play any word individually.  The play button at the left plays the whole thing.   

The general approach for each section is to present a long phrase, then break it apart into constituent parts, drill each separately (interleaving them), then recombine them into the whole phrase at the end of the section.

Likes and dislikes
  • As with Headstart2, there is no central vocabulary list that collects all the words.
  • All the words are presented in Farsi orthography.  Since I know how to read this alphabet, this works for me.  But I also tried a little of the Mandarin course, and without knowing any characters, it was much, much more difficult to do the lessons.  (Essentially it then works without any readable written portion, if you exclude the mouse-over transcriptions.)
  • The variety of tasks is low.
A possible weakness of both programs: I see drilling/repetition of words within a chapter, but once you've left the chapter on colors (for instance), the color words don't seem to reappear in the subsequent chapters, so it would be easy to forget them.

Both systems are far more advanced than what is available for the endangered languages I know.  Would it be possible for one of these authoring systems to be made available to people working on small/endangered languages?  

I know that Rosetta Stone is working with Navajo and Chitimacha on language preservation and revitalization projects.  Unfortunately, I don't have access to their Farsi product for a comparison with the other two.

Monday, June 18, 2012

Homograph numbers as an aid to merging entries

In a previous post I mentioned the issue of merging what are separate entries in Cordova's dictionary of Colonial Valley Zapotec.

Today I experimented with using the Homograph Numbers feature in FLEx to help me locate these entries.  In the configure columns menu, one of the options is Homograph Numbers.  I chose to restrict that column to entries with a homograph number greater than 0.  (Ordinary entries don't have homograph numbers; they are only assigned when two entries have identical Lexeme Forms.)



This locates about 6000 entries where there is a homograph.  In this screenshot, zèni is both 'tomar en la mano' and 'tener en la mano', so these entries should be merged.

Unfortunately, that still leaves me with lots of entries to look at -- some of them should be merged and some should not, but you need to look at the translations to decide.

I think the homograph number method would also fail to catch entries that are mostly identical, but differ by the placement of an accent, e.g. if there were an entry zéni that means the same thing as zèni.
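One way to catch such near-homographs outside of FLEx would be to group lexeme forms after stripping their accents.  The following is only a sketch: it assumes the forms have already been exported as a plain list of strings, and the function names are my own, not part of FLEx.

```python
import unicodedata
from collections import defaultdict


def strip_accents(form: str) -> str:
    """Remove combining marks, so that 'zèni' and 'zéni' both become 'zeni'."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def near_homographs(forms):
    """Group forms that are identical once accents and case are ignored."""
    groups = defaultdict(set)
    for form in forms:
        groups[strip_accents(form).lower()].add(form)
    # Keep only the groups with more than one distinct spelling.
    return {key: sorted(variants) for key, variants in groups.items()
            if len(variants) > 1}


candidates = near_homographs(["zèni", "zéni", "aate"])
```

Note that this also strips the cedilla from ç, which may or may not be desirable for the colonial orthography.  The groups it returns are only candidates; the translations would still need to be checked by hand.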

Friday, June 15, 2012

Merging entries from Cordova's colonial Zapotec dictionary

A couple of weeks ago, I managed to import all of Cordova's 1567 dictionary of Spanish into my FLEx project.

Since that book is Spanish -- Zapotec, one thing that is revealed by creating a project that can sort on the Zapotec is how often Zapotec words are listed under multiple translations in Spanish.  This gives us a much better sense of the range of meanings of the Zapotec word.
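The underlying idea is just inverting an index.  As a rough sketch (not what FLEx does internally), suppose the dictionary were a list of (Spanish headword, Zapotec form) pairs; the pair layout and the function name `invert` are assumptions for illustration only.

```python
from collections import defaultdict


def invert(spanish_entries):
    """Turn (Spanish headword, Zapotec form) pairs into Zapotec -> [Spanish glosses]."""
    by_zapotec = defaultdict(list)
    for spanish, zapotec in spanish_entries:
        if spanish not in by_zapotec[zapotec]:
            by_zapotec[zapotec].append(spanish)
    return dict(by_zapotec)


entries = [
    ("tomar en la mano", "zèni"),
    ("tener en la mano", "zèni"),
]
merged = invert(entries)
```

Each Zapotec form then carries the full list of Spanish glosses it appeared under, which is the view that a Spanish-sorted book hides.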

Consider the following merged entry for aba-queta 'fall', which combines information from six different Spanish - Zapotec entries.


Here it would not have been too difficult to see this, since all but one are listed under caer 'fall' in Spanish.

But the value is much clearer when the Spanish glosses are more varied.   Consider the following entry for aate 'harden',  where the various Spanish glosses are not alphabetically adjacent.  Nor is it obvious that the same word is used for freezing and hardening.


Thursday, June 14, 2012

Senses of psychological verbs in Copala Triqui

It is very difficult to accurately translate all the different psychological and emotional verbs into English or Spanish, since Triqui divides the world up somewhat differently.

Case in point is the verb aráya'anj, which has a range of meanings that includes 'be amazed at something; be concerned about someone; be alarmed at something.'   Working through the large corpus with this verb allows us to find examples that show the different shades of meaning.



Here's an example of the 'be concerned' sense:


Here is an example of the 'be amazed' sense:


And here is an example of the 'alarmed' sense:



The common semantics of this verb seem to include an aspect of worry/lack of knowledge about the future and concern for people affected in the future.

We also see this verb in combination with the ni'yanj psych marker to show an attitude of amazement toward someone.


Wednesday, June 13, 2012

Accidentally working with different project versions in FLEx -- how to fix the problem

I was away for a couple of weeks and my graduate student was working with our Triqui FLEx database.  She and our native speaker colleague were looking for entries where there was no example sentence and he was creating examples for these entries.

Unfortunately, when I got back and we looked at the project together, we saw that she had accidentally opened and modified an old version of the project.

At first, I thought we would just have to look for the entries modified in the old project and cut and paste them to the new, but after a bit of thought, I found a much easier way.  Since this problem is likely to arise in any collaborative FLEx project, it's possibly useful to others as well.

Since I knew the dates that she had worked with our speaker while I was away, the first step was to go to Bulk Edit and add the Date Modified field to the Column Choices.  The Restrict choice lets you select the dates that you want to look at.


Once I had done this, I could see that there were 14 entries modified during this time.

So I exported these entries from the older version of the project via LIFT.  Then I imported the same entries into the current project version.



This resulted in a little bit of duplication.  Luckily the import log that is produced showed me all the entries where this was an issue:


What I needed to do then was to look at the entries for the listed conflicts.  The first one involved duplicated entries, so I used the Merge Entry feature to combine the two forms of ananj chij.  The other three conflicts involved duplicated senses, so I went to each entry and merged the senses.  (Pull-down menu to the left of the Sense label.)

It's still best to try to avoid working on different versions of a project, and I don't know what the solution would be if the interlinear texts had been modified.  But if the accidental use of different versions only affects lexical entries, then
  • filtering by Date Modified, 
  • exporting entries from Project A, 
  • importing entries to Project B, and 
  • checking the Import Log for conflicts
results in a much simpler solution than cutting and pasting.
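The same date filter could also be run on a LIFT file outside of FLEx, since LIFT is XML.  The sketch below assumes each entry element carries a dateModified attribute in ISO form and stores its lexeme under lexical-unit/form/text; that matches the LIFT files I have seen, but please check it against your own export.  The sample data, including the dates and the "trc" language code, is invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date


def entries_modified_between(lift_xml: str, start: date, end: date):
    """Return (lexeme form, timestamp) for entries modified within [start, end]."""
    root = ET.fromstring(lift_xml)
    hits = []
    for entry in root.iter("entry"):
        stamp = entry.get("dateModified", "")
        try:
            # Only the leading YYYY-MM-DD part is needed for a day-level filter.
            modified = date.fromisoformat(stamp[:10])
        except ValueError:
            continue  # missing or unparseable date: skip the entry
        if start <= modified <= end:
            form = entry.findtext("lexical-unit/form/text", default="")
            hits.append((form, stamp))
    return hits


# Invented sample data, for illustration only.
SAMPLE = """<lift version="0.13">
  <entry id="a" dateModified="2012-06-05T10:00:00Z">
    <lexical-unit><form lang="trc"><text>ananj chij</text></form></lexical-unit>
  </entry>
  <entry id="b" dateModified="2012-04-01T10:00:00Z">
    <lexical-unit><form lang="trc"><text>aráya'anj</text></form></lexical-unit>
  </entry>
</lift>"""

recent = entries_modified_between(SAMPLE, date(2012, 5, 28), date(2012, 6, 10))
```

This only lists the affected entries; the actual import and conflict checking would still be done in FLEx.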

Thursday, June 7, 2012

Culling blanks from the Colonial Zapotec

In the process of importing entries from Cordova's dictionary of Colonial Valley Zapotec, I forgot that I needed to weed out the cases where the Zapotec Lexeme Form is blank.

This is generally because the original entry in Cordova is a cross reference.  For example:


Here we don't have a Zapotec word for the entry under Cepo de animales.  The underlying file from Thom Smith Stark's group has all of these listed as records with no Zapotec form.

That doesn't make much sense in a Zapotec-Spanish-English dictionary, so I weeded those out tonight. That results in a dictionary with 46,712 entries.  (There are still plenty of duplicates in there also...)
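If the records were available as simple dictionaries before import, the weeding step could look something like the sketch below.  The field names "spanish" and "zapotec" are hypothetical; the real file uses its own markers.

```python
def has_lexeme(record: dict) -> bool:
    """True when the record has a non-blank Zapotec form."""
    return bool((record.get("zapotec") or "").strip())


def cull_blanks(records):
    """Drop cross-reference records that carry no Zapotec form."""
    return [r for r in records if has_lexeme(r)]


records = [
    {"spanish": "Cepo de animales", "zapotec": ""},      # cross reference only
    {"spanish": "tomar en la mano", "zapotec": "zèni"},
]
kept = cull_blanks(records)
```

Treating whitespace-only forms as blank is a design choice here, on the guess that stray spaces are more likely than meaningful ones.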

Here's a screenshot with the new number:


Sunday, June 3, 2012

The mercies of the Spanish man in the 16th century

From the Zapotec Doctrina of 1567.  There are still some parts of the Zapotec analysis to be worked out,  but of interest for its horrifying gender politics!

(Click to enlarge, if you dare...)


Friday, June 1, 2012

Cordova import complete!

I finished importing entries from the massive Cordova dictionary of Colonial Zapotec into the FLEx project today.  Lots of entries will need editing and clean-up, but it's great to have them all there finally.   At this point there are about 47,000 entries in the lexicon for Colonial Valley Zapotec.  Lots of them will probably need to be merged, since each entry represents a different Spanish-language translation in the original (with perhaps several different entries corresponding to the same Zapotec word).  On the other hand, a lot of entries also need to be split, since more than one Zapotec word shows up in the entry.

Here's the screenshot of the last form that was imported.  Note the nice big number at the bottom on the left :)