 Phrase extractor v.1.2
      Find collocations in a text/corpus with MI

  See also related N-Gram (lexical bundles)
Collocations are high mutual-information (MI) phrases: phrases whose frequency as a phrase is high relative to the average frequency of their component words taken separately. E.g., in some corpus, Puerto Rico freq = 10, Puerto freq = 11, Rico freq = 11. I.e., the words in this phrase appear mainly in this one phrase, not in other phrases or independently. The frequency of Puerto Rico is 10, the average frequency of the component words is (11+11)/2 = 11, so the ratio of phrase to word frequency is 10:11, or about .91. MI becomes interesting at ratios of about .5 or .7 (see examples in the sample corpora provided). Max 800k words.
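The ratio described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code this tool actually runs: the function name `phrase_ratio`, the whitespace-style token list, and the toy corpus are all assumptions made for the example, and the measure shown is the simple phrase-to-word frequency ratio from the paragraph above rather than a log-based mutual-information score.

```python
from collections import Counter


def phrase_ratio(tokens, phrase):
    """Ratio of a phrase's frequency to the mean frequency of its words.

    tokens: list of word tokens (hypothetical pre-tokenized corpus)
    phrase: tuple of words, e.g. ("Puerto", "Rico")
    """
    phrase = tuple(phrase)
    n = len(phrase)
    word_freq = Counter(tokens)

    # Count every position where the phrase occurs as a contiguous sequence.
    phrase_freq = sum(
        1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase
    )

    # Average frequency of the component words, counted separately.
    mean_word_freq = sum(word_freq[w] for w in phrase) / n
    return phrase_freq / mean_word_freq if mean_word_freq else 0.0


# Toy corpus built to match the example frequencies:
# "Puerto Rico" occurs 10 times; "Puerto" and "Rico" each occur 11 times.
tokens = ["Puerto", "Rico", "is"] * 10 + ["Puerto", "Vallarta", "and", "Rico", "Suave"]

print(round(phrase_ratio(tokens, ("Puerto", "Rico")), 2))
```

A phrase would then be kept when its ratio meets the chosen M-I level, e.g. `phrase_ratio(tokens, phrase) >= 0.7`.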

(1) Upload TXT
       (Only the first 700k words are used)

OR

Choose Big Text/Corpus

  (2) Choose M-I level

HIGH (fewer/better)
.9
.7
.5
.3
.1
LOW (more, ≅ N-Gram)

  (3) Click     (4) Get Result >