Saturday, March 2, 2013

More examples of regular expressions in FLEx

This regular expression helps me separate out the right allomorph of the habitual prefix of the Colonial Valley Zapotec verb.  The habitual aspect has a few allomorphs, usually written <te, to, ti> in the orthography.  I want to strip the prefix from the citation form and put it in a separate field called Cordova habitual form.

The procedure is to copy the whole citation form to the Cordova habitual form, then use a regular expression in bulk edit to remove everything except the prefix.   The following search and replace comes fairly close to what I want:


This bulk replace operation looks for the beginning of the record (^), then any number of characters (.*), then t followed by one word-forming character (\w).  I put the sequence of t plus one character inside capturing parentheses, because I want to be able to refer to it in the Replace line. After the parentheses, I have any number of characters (.*) and the end of the record ($).

The whole match is replaced by the captured sequence ($1), that is, t plus whatever letter followed it, and then a hyphen.

So if the Find expression matches a record like   blah blah ti+capaya blah blah   the whole record will be replaced by ti-.
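For anyone who wants to try this outside FLEx, here is a minimal Python sketch of the same idea. The pattern is my reconstruction from the description above, not a copy of the FLEx dialog, and the function name is made up for illustration. Note that Python's re module writes the back-reference as \1 where FLEx uses $1. The second test string uses placeholder stems, not real Zapotec data.

```python
import re

# Assumed reconstruction of the Find pattern described in the post:
# start of record, anything, then a captured "t + one word character",
# anything, end of record.
FIND = re.compile(r"^.*(t\w).*$")

def habitual_prefix(record: str) -> str:
    """Hypothetical helper: reduce a record to its habitual prefix plus a hyphen."""
    # \1 is Python's spelling of the $1 back-reference used in FLEx.
    return FIND.sub(r"\1-", record)

print(habitual_prefix("blah blah ti+capaya blah blah"))

# Placeholder stems (xxx, yyy), only to show what happens with two verbs in one record.
print(habitual_prefix("ti+xxx, to+yyy"))
```

Because the leading .* is greedy, a record containing more than one t + letter sequence yields the last one, not the first. That is one way a record listing several verbs can come out wrong.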

This mostly works, except that in the original Cordova entries as they came to me, there are sometimes several Zapotec verbs listed together in a single entry, possibly with different potential prefixes.  So I have to give the results a visual inspection to make sure nothing has gone wrong before hitting the Apply button in Bulk Edit.  If there is a record where this will give the wrong result, I can just uncheck it and edit it manually.


The counterpart to this regular expression is the one that takes the citation form, strips off the habitual prefix, and returns just the remainder.  The search and replace that will do that is the following:



The Find expression looks for a t followed by one word-forming character and a literal plus sign (+) at the beginning of a record (^).  It then captures everything from there to the end of the record ($) and stores it in the variable $1.  When this search and replace is applied, the effect is to take a word like

ti+capaya

and replace it with capaya.
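The same operation can be sketched in Python. Again, the pattern is reconstructed from my description rather than copied from FLEx, the function name is hypothetical, and Python writes the back-reference as \1 instead of $1.

```python
import re

# Assumed reconstruction: "t", one word character, and a literal "+" at the
# start of the record, then capture everything up to the end of the record.
STRIP = re.compile(r"^t\w\+(.*)$")

def strip_habitual(citation: str) -> str:
    """Hypothetical helper: drop a leading habitual prefix such as "ti+"."""
    # Records that do not start with t + letter + "+" are returned unchanged.
    return STRIP.sub(r"\1", citation)

print(strip_habitual("ti+capaya"))
```

The plus sign has to be escaped (\+) in the pattern, since an unescaped + is a repetition operator in regular expressions.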
