File Repository

 Texts

 

 

Code

 

 

Analyses of individual texts

Analyses of text corpora

  • Calculation of type-token ratio: ttr.pl. To keep type-token ratios comparable across texts of different lengths, this program counts types only in the first 3000 words of each text. To use a different cutoff, change the value of the $maxToken variable.
  • Calculation of the number of sentences, the number of syllables and the Flesch-Kincaid readability index: readability.pl
  • Counts of Part of Speech tags: pos.pl
  • Basic sentiment analysis: sentimentAnalysis.pl. This application makes use of two files containing lists of words with positive and negative connotations: positive.txt and negative.txt.
  • Script to find the words that are unique to specific texts: unique.pl
  • Script to create a term-document matrix: tdm.pl
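
The fixed-window idea behind ttr.pl can be illustrated with a short sketch. This is not the Perl script itself but a minimal Python approximation: the function names, the tokenization rule, and the choice to divide by the number of tokens actually read (rather than by the cutoff) are assumptions made for illustration. Only the default cutoff of 3000 words comes from the description above, where it corresponds to $maxToken.

```python
import re

# Default cutoff; corresponds to $maxToken in the description of ttr.pl.
MAX_TOKENS = 3000


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a word is a run of ASCII letters, optionally followed by
    an apostrophe and more letters. The real script may tokenize differently.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def truncated_ttr(text, max_tokens=MAX_TOKENS):
    """Return the type-token ratio over the first max_tokens words.

    Texts shorter than max_tokens are measured over their full length,
    so their ratios are not strictly comparable with longer texts.
    """
    tokens = tokenize(text)[:max_tokens]
    if not tokens:
        return 0.0
    # Types are the distinct word forms among the tokens considered.
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "The cat and the dog chased the other cat."
    print(truncated_ttr(sample))
    print(truncated_ttr(sample, max_tokens=3))
```

Measuring every text over the same number of words matters because the type-token ratio tends to fall as a text grows: common words repeat more often the longer the text runs, so an uncapped ratio would make long texts look less varied than short ones.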