Thursday, February 16, 2012

SayMore to FLEx

I've been experimenting today with SayMore, which is a way of organizing language documentation materials.

I think I'm probably typical of many people doing this work in that my materials are scattered across several locations -- on the hard drive at my office, on my laptop, in Dropbox -- and they are in several different formats (sound files, text files, some images, etc.). One of the potential advantages of SayMore would be using it to organize links to all these things and keep the metadata about speakers, permissions, dates, and formats in a single place. This screen, for example, has information about the various contributors and their roles. There is a central repository for these, where you can keep background info (age, native language, contact information, permission form). In this screen, Román Vidal López is the speaker in this audio clip.




I've also been interested in the new Annotation feature (present in the Alpha release), which provides a way to do first-pass transcription of audio and then export the annotations into a format that FLEx can read, for further analysis and correction.

To test this out today, I got an audio portion of the Address to the Triqui people and added it to SayMore.  (I did something wrong because the file comes out with the name NewEvent...)


When you have this in a format that you like, you can export it:


It ends up in a format called FLEx Interlinear XML.

You can now import this into a FLEx project.  Here is what the screen looks like after File | Import | FLExText Interlinear.


You browse to wherever the export file was located.  One little thing is that the default extension for an import file is .flextext, so at first you won't see your file.  You need to click on the FileType button and select .xml.
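If you would rather not switch the file type every time, one workaround is to make a copy of the export with the extension FLEx looks for by default. Here is a minimal Python sketch of that idea. It assumes the exported file is ordinary FLEx Interlinear XML and that only the extension differs; the function name and the file name export.xml are made up for illustration and are not part of SayMore or FLEx.

```python
import shutil
from pathlib import Path


def copy_as_flextext(xml_path):
    """Copy an exported .xml file to a sibling file ending in .flextext.

    The original file is left untouched; the path of the copy is returned.
    (Hypothetical helper, not part of SayMore or FLEx.)
    """
    src = Path(xml_path)
    dst = src.with_suffix(".flextext")  # e.g. export.xml -> export.flextext
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    # "export.xml" is a placeholder for wherever your export was saved.
    source = Path("export.xml")
    if source.exists():
        print("Copied to", copy_as_flextext(source))
    else:
        print("No file named export.xml here; adjust the path.")
```

The copy should then show up in the import dialog without changing the file type. I have only sketched this, so treat it as a convenience rather than a tested part of the workflow.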


After you do that (if everything is working right), you should see your transcription as an Interlinear Text in FLEx:


Overall, I was fairly pleased with the whole process.  There were one or two little glitches that may improve in the later releases:

a.) Before you do the annotation, you need to do a segmentation of the audio into little chunks.  You can then listen to these chunks at any speed you like, and they repeat until you've filled in the Transcription line to your satisfaction.   However -- if you start doing the transcription and discover you've accidentally put the segment boundary in a bad place (perhaps you cut off the last vowel), it does not seem like you can ever edit the segmentation.  At least, it wasn't obvious to me how to do this.

b.) To transcribe the Triqui properly, I needed to be able to use the underscore diacritic (for low register tone).  I have a Keyman keyboard that allows me to do this properly in FLEx, but it did not seem that it would work in SayMore.  Possibly I missed some dialog that would allow me to set the font to a Unicode font that supports this diacritic?  Or something that allows me to pick my keyboard?
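When a diacritic does not show up, it helps to know whether the problem is the keyboard (the character never got typed) or the font (the character is there but does not display). A small Python sketch for checking which combining marks a piece of pasted text really contains is below. The function name is made up, and the sample string is an assumption: Unicode has two likely candidates for an "underscore" diacritic, COMBINING MACRON BELOW (U+0331) and COMBINING LOW LINE (U+0332), and I am not claiming either one is what a particular Keyman keyboard produces.

```python
import unicodedata


def describe_combining_marks(text):
    """Return (code point, Unicode name) for each combining mark in text.

    Hypothetical helper for inspecting transcribed text.
    """
    marks = []
    for ch in text:
        # unicodedata.combining() is non-zero for combining marks
        # such as the under-line diacritics.
        if unicodedata.combining(ch):
            marks.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return marks


if __name__ == "__main__":
    # Illustrative sample only: the letter "a" followed by U+0331.
    sample = "a\u0331"
    for code_point, name in describe_combining_marks(sample):
        print(code_point, name)
```

If the mark is listed but does not display, the font is the likely culprit; if nothing is listed, the keyboard never inserted it.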

Still, these are fairly minor problems, and I think I could easily see using this for my next text transcription session.

Compared to ELAN, Praat, or Transcriber, the SayMore annotation tool is less elaborate.  (In my view ELAN is way too elaborate and unwieldy for my needs.  Praat is a tool better suited to a phonetician's needs, and Transcriber works fine but its output does not import into FLEx in any straightforward way.)

Since my working style relies heavily on FLEx, anything that imports smoothly into that program has a huge advantage.  It is possible that if I were more focused on phonological, intonational, or gestural properties of the texts, it would be worth spending the time on something like Praat or ELAN.  But since my interest is more keyed to morphosyntax and lexicon, I like a fairly light transcription tool that will help me do first-pass transcription, and SayMore looks promising for that.

2 comments:

John Hatton said...

Aaron, thanks for the review and feedback. We've updated the export to use the ".flextext" extension used by the newer FLEx versions. You mentioned that this was an alpha test version, and I'd like to emphasize what that means for folks... this version is extremely experimental, with lots of new features but lots of unfinished parts, too. We strongly recommend that people avoid the alpha test version unless they want to help us by testing and reporting problems. With your help, we'll get a beta out in a month or two and follow that with a stable release by mid-year.
John Hatton
SIL International

Jack Martin said...

This was very helpful. It looks as though you can change the segmentation by selecting the segment and pressing DELETE. Any transcription associated with it moves to the previous segment.

I'm a little confused about how these time values in the segmentation will be used by other programs, though. It's a lot of work dividing the text this way, so one hopes FLEx or ELAN will be able to read these time values.

I also need a Unicode underscore and turned to Language Geek's font for Choctaw: http://www.languagegeek.com/keyboard_general/all_keyboards.html. That seemed to work fine, and it's free.