Monday, January 20, 2014

Discovering distribution differences in the Timucua texts with Voyant

Voyant Tools is a set of text exploration tools for use in the Digital Humanities.  I started using it today to see what it might be able to tell me about the Timucua texts.

Within FLEx, I filtered the texts so that only those with 1612 Cat in the title were included, and I altered the lines visible in Interlinear View so that only the Timucua is shown.  That looks like this:


The logic is that we want to compare the Timucua in this set of texts with the Timucua in another set of texts.  (If the glosses, translations, etc. were included, Voyant would count those elements in its frequency calculations as well.)

After all and only the 1612 texts were selected, I used the export command in FLEx to output them all as generic XML.

Then I did the same thing with a portion of the Confessionario text (Pareja 1613).  (Because I had been inconsistent in text names early in the Timucua project, it was not as easy to filter so that I got all and only Confessionario portions.  I should be able to fix this by either correcting text names or adding Genre information.)

I uploaded the two sets of texts to Voyant, which ran its analysis over both.  It turned up two interesting things which I had not noticed.  One is that the word nanacu 'because' is very frequent in the 1612 Catechism (87 instances), but does not appear at all in the Confessionario.  The second is that the relative clause/previous mention suffix -michunu is frequent in the Confessionario (11 instances), but not found in the 1612 Catechism (0 instances).
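For readers who want to try this kind of comparison outside Voyant, here is a minimal sketch in Python.  It is not Voyant's own method; it simply ranks words by the difference in their rate per thousand tokens between two texts.  The function names and the toy strings are my own placeholders (apart from nanacu, the "words" are not Timucua), and the tokenizer is a crude letters-only split.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def rate_per_thousand(counts, total, word):
    """Occurrences of `word` per 1,000 tokens (0.0 for an empty text)."""
    return 1000 * counts[word] / total if total else 0.0


def distinctive_words(text_a, text_b, top=5):
    """Rank words by how much their relative frequency differs between texts.

    Returns tuples of (word, count_in_a, count_in_b, rate_a - rate_b),
    largest absolute difference first.
    """
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    rows = []
    for word in set(counts_a) | set(counts_b):
        rate_a = rate_per_thousand(counts_a, len(tokens_a), word)
        rate_b = rate_per_thousand(counts_b, len(tokens_b), word)
        rows.append((word, counts_a[word], counts_b[word], rate_a - rate_b))
    # Sort by size of the difference, then alphabetically for stable output.
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows[:top]


# Toy stand-ins for the two exported texts (placeholder data, not real corpus text).
toy_a = "nanacu alpha beta nanacu alpha nanacu"
toy_b = "alpha beta gamma alpha beta gamma"

result = distinctive_words(toy_a, toy_b)
for word, in_a, in_b, diff in result:
    print(f"{word:10} A={in_a:3} B={in_b:3} diff={diff:+.1f}")
```

A word that is common in one text and absent from the other, as nanacu is in the toy data, rises to the top of the list.  Using rates rather than raw counts keeps a longer text from dominating the comparison.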

Here is a screenshot of the Voyant output page.  The distinctive words box was one of the most informative.



I went back to FLEx to run concordance searches on these two forms, just to make sure. On the larger corpus, I did find 8 instances of -michunu in the 1612 Catechism, but that compares to 48 instances in the Confessionario.
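Raw counts like these are easier to compare once they are scaled by the size of each text.  The following is a rough sketch of that normalization, assuming the texts are available as plain strings.  It is only a crude stand-in for a FLEx concordance search: it matches the form as a substring of each token rather than as an analyzed morpheme, and the function name and toy string are hypothetical.

```python
import re


def count_form(text, form):
    """Return (hits, total_tokens, hits_per_thousand) for tokens containing `form`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    hits = sum(1 for token in tokens if form in token)
    rate = 1000 * hits / len(tokens) if tokens else 0.0
    return hits, len(tokens), rate


# Placeholder string, not real corpus text.
toy_text = "xxmichunu yy zz xxmichunu"
hits, total, rate = count_form(toy_text, "michunu")
print(hits, total, rate)
```

Running this over each exported text in turn would give a per-thousand rate for -michunu in each, which is a fairer basis for comparison than the raw counts alone.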

The difference in content between the two texts may partly explain this: -michunu is mostly a marker of a previously mentioned noun in discourse, and the Confessionario has many more short narrative portions, so one might expect more instances of an affix that correlates with narration.

But I do not see a good reason to expect the distribution of nanacu 'because' to be so different between the two texts.  For that reason, I think this word may lend further support to my hypothesis that the two texts have different native authors, even though both have the Spanish priest, Francisco de Pareja, listed on the cover page as the author.
