Monday, June 18, 2012

Homograph numbers as an aid to merging entries

In a previous post I mentioned the issue of merging what are separate entries in Cordova's dictionary of Colonial Valley Zapotec.

Today I experimented with using the Homograph Numbers feature in FLEx to help me locate these entries.  In the Configure Columns menu, one of the options is Homograph Numbers.  I chose to restrict that column to entries with a homograph number greater than 0.  (Ordinary entries don't have homograph numbers; they are only assigned when two entries have identical Lexeme Forms.)



This locates about 6000 entries where there is a homograph.  In this screenshot, zèni is both 'tomar en la mano' and 'tener en la mano', so these entries should be merged.

Unfortunately, that still leaves me with lots of entries to look at -- some of them should be merged and some should not, but you need to look at the translations to decide.
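To make that review pass a little easier, here is a minimal sketch in Python of the same idea outside FLEx: group entries by identical form and print the glosses side by side. It assumes the entries have already been exported to a plain list of (form, gloss) pairs; that layout, the function name group_homographs, and the third sample entry are my own illustration, not anything FLEx produces.

```python
from collections import defaultdict

# Sample data. The two "zèni" glosses come from the dictionary;
# the third entry is a made-up placeholder for a non-homograph.
entries = [
    ("zèni", "tomar en la mano"),
    ("zèni", "tener en la mano"),
    ("placeholder", "some unrelated gloss"),
]

def group_homographs(entries):
    """Return {form: [glosses]} for every form that occurs more than once."""
    groups = defaultdict(list)
    for form, gloss in entries:
        groups[form].append(gloss)
    # Keep only forms with at least two entries, i.e. the homographs.
    return {form: glosses for form, glosses in groups.items() if len(glosses) > 1}

# Print each homograph with its glosses on one line for quick comparison.
for form, glosses in group_homographs(entries).items():
    print(form, "|", " ; ".join(glosses))
```

This only puts the candidates next to each other; deciding whether two glosses really belong in one entry is still a judgment call.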

I think the homograph number method would also fail to catch entries that are mostly identical, but differ by the placement of an accent, e.g. if there were an entry zéni that means the same thing as zèni.
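One possible way around this, again sketched in Python rather than inside FLEx, is to compare forms after stripping the accents. The sketch below uses the standard unicodedata module to decompose each form and drop the combining marks, then reports groups where more than one distinct spelling collapses to the same bare form. The function names are hypothetical, and zéni is the hypothetical entry from the paragraph above. Note the assumption that every diacritic can be ignored; if some marks are contrastive in the orthography, this would over-report.

```python
import unicodedata
from collections import defaultdict

def strip_accents(form):
    """Decompose the form (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def near_homographs(forms):
    """Return {bare form: [distinct spellings]} where spellings differ only by accents."""
    groups = defaultdict(set)
    for form in forms:
        # Normalize to NFC so the same spelling typed two ways counts once.
        groups[strip_accents(form)].add(unicodedata.normalize("NFC", form))
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

# "zéni" is hypothetical; "zèni" appears twice to show exact duplicates are not reported here.
forms = ["zèni", "zéni", "zèni"]
print(near_homographs(forms))
```

Exact homographs are deliberately left out of this report, since the homograph numbers already catch those.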
