Wednesday, October 30, 2013

Bulk edit, character formatting, and regular expressions

I am continuing to proofread and correct the Copala Triqui online dictionary at copalatriqui.webonary.org.

One error that crept in at some point was the use of the wrong character styles for accented letters and anything that follows them.  (I think perhaps I cut and pasted these examples from a Word document...)  Consider the following entry:


In this entry, the accented characters are smaller than the characters before them in a way that looks odd.

In FLEx, I found that these characters are not formatted with Default Paragraph Characters but with a character style called "Character Style 2". Checking Character Style 2, I see that it is 12 pt, while the Normal style is currently set to 16 pt. I change the Normal style up and down depending on which screen is being used; when we project the entries on the wall while working with our language consultant, I set it to a large font for better visibility.

I suppose I could have redefined Character Style 2 as 16 pt, but then I would always have to remember to change both the Normal style and Character Style 2.

After trying various regular expressions, I found that this one gives the desired results:


The parentheses set up a capture group, which is referred to by the variable $1 in the Replace With field. So this regular expression says "find any character in áéíóúńj that is formatted with Character Style 2 and store it as the 1st variable," then "replace it with the 1st variable, formatted with Default Paragraph Characters."
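To make the logic of that Find/Replace pair concrete, here is a minimal Python sketch. It is not how FLEx works internally: it assumes a field can be modeled as a list of (text, style) runs, and the names bulk_replace and merge_runs are hypothetical. Note that Python writes the back-reference as \1 rather than $1.

```python
import re
import unicodedata
from typing import List, Tuple

# Hypothetical model of a FLEx field: a list of (text, style) runs.
Run = Tuple[str, str]

# The character class from the post; the parentheses form capture group 1.
PATTERN = re.compile(r"([áéíóúńj])")


def merge_runs(runs: List[Run]) -> List[Run]:
    """Join neighbouring runs that ended up with the same style."""
    merged: List[Run] = []
    for text, style in runs:
        if merged and merged[-1][1] == style:
            merged[-1] = (merged[-1][0] + text, style)
        else:
            merged.append((text, style))
    return merged


def bulk_replace(runs: List[Run],
                 find_style: str = "Character Style 2",
                 new_style: str = "Default Paragraph Characters") -> List[Run]:
    """Restyle matching characters that carry find_style; leave everything else alone."""
    out: List[Run] = []
    for text, style in runs:
        if style != find_style:
            # The Find pattern only applies to text in find_style.
            out.append((text, style))
            continue
        # Assumption: accents are matched as precomposed (NFC) characters.
        text = unicodedata.normalize("NFC", text)
        pos = 0
        for m in PATTERN.finditer(text):
            if m.start() > pos:
                out.append((text[pos:m.start()], style))
            # Python spells the $1 back-reference as \1.
            out.append((m.expand(r"\1"), new_style))
            pos = m.end()
        if pos < len(text):
            out.append((text[pos:], style))
    return merge_runs(out)
```

For example, bulk_replace([("ta", "Default Paragraph Characters"), ("ájx", "Character Style 2")]) returns [("taáj", "Default Paragraph Characters"), ("x", "Character Style 2")]: the á and j are restyled, while the x, which is outside the character class, keeps Character Style 2.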

Some notes:
  • I included n and j because the phonotactics of Triqui make these the only consonants that can follow a vowel with an acute accent.
  • You can access the formats under the Format button. It seems that you need to select the whole Find or Replace string before applying the format.
  • In the Bulk Replace window, I set Example Sentences as the Target Field.
After doing this, I exported the project as XHTML again and uploaded it to Webonary.  Here is the same entry after the changes: