Coverage Calculator v.3 (checked and updated 3 Dec 2025)
    The percentage of list words in a corpus
This program calculates how often the words on a list occur in a corpus, expressed as a percentage of the corpus's running words (tokens). A list of the 2,000 most common word families is often said to 'cover' up to 80% of the tokens in a general corpus of English; that is, up to 80% of the word tokens in the corpus will be words from that list.

Treatment of proper nouns is a checkbox option.
Headword lists can be expanded into family/lemma lists here.
List coverage in texts can be calculated here (Demo 7).
Known maximum for this routine (2024): a list of 13,000 words by a corpus of ≈ 1 million words. Larger test corpora or texts will be reduced by the program if needed.
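The coverage figure described above can be illustrated with a minimal Python sketch. It is not the calculator's own code: the function names `tokenize` and `coverage`, the tokenizing rule, and the demo data are all assumptions made for illustration. It also matches exact word forms only, whereas the lists here are family or lemma based.

```python
import re

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe.
# The real calculator's tokenizing rules are not documented on this page.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return WORD.findall(text.lower())

def coverage(list_words, corpus_text):
    """Return the percentage of corpus tokens that appear on the list."""
    listed = {w.lower() for w in list_words}
    tokens = tokenize(corpus_text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in listed)
    return 100.0 * hits / len(tokens)

# Hypothetical demo data: 7 of the 9 tokens are on the list.
demo_list = ["the", "cat", "sat", "on", "mat"]
demo_text = "The cat sat on the mat. The dog barked."
print(round(coverage(demo_list, demo_text), 1))  # 77.8
```

To approximate family-based coverage with this sketch, the list would first need to be expanded so that every family member (e.g. inflected forms) is included as its own entry.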

SELECTED RESEARCH: 1. Nation (2006)   2. Laufer & Ravenhorst (2010)   3. Schmitt, Jiang & Grabe (2011)   4. Schmitt, Cobb et al. (2015)   5. Laufer (2020)   6. Cobb & Laufer (2021)


DEMO LISTS

BNC/COCA: 1-3k, 1-4k, 1-5k, 1-6k, 1-7k, 1-8k

NUCLEARIZED 1-3k (as per Cobb & Laufer 2021): nfl-1, nfl-2, nfl-7

NFL-0 HyperFams, v.2 Mar '26 (NFL-0 based nuclear lists with an additional layer of nuclearization, family consolidation & build-out): 1-3k, 1-4k, 1-5k

NUCLEAR French 1-3k (French nuclear frequency lists, as per Cobb, Lindqvist & Ramnas 2026): fr_lfnf-0, fr_lfnf-1, fr_lfnf-5
Updates may be available at Nuc. List Builder.
(1) Select or paste a LIST, and name it

(2) Choose a test text (English, then French; truncated to under 1 million words)

(4) Click

(5) See the result (proper nouns and leftovers are at the bottom)
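The proper-noun and leftover listing at the bottom of the result can be pictured with the rough Python sketch below. The calculator's actual rule for identifying proper nouns is not stated on this page, so the capitalisation test used here is purely an assumption, as are the function name `off_list_report` and the sample data.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")

def off_list_report(list_words, text):
    """Split off-list tokens into proper-noun candidates and leftovers.

    Assumption: an off-list token starting with a capital letter is
    counted as a proper-noun candidate; every other off-list token is
    a leftover. Sentence-initial off-list words are therefore
    misclassified as propers by this simple heuristic.
    """
    listed = {w.lower() for w in list_words}
    propers, leftovers = Counter(), Counter()
    for token in WORD.findall(text):
        if token.lower() in listed:
            continue  # on-list tokens count toward coverage, not the report
        if token[0].isupper():
            propers[token] += 1
        else:
            leftovers[token] += 1
    return propers, leftovers

# Hypothetical sample data.
sample_list = ["the", "cat", "sat", "on", "mat"]
sample_text = "The cat sat on the mat. Felix barked at London."
propers, leftovers = off_list_report(sample_list, sample_text)
print(sorted(propers))    # ['Felix', 'London']
print(sorted(leftovers))  # ['at', 'barked']
```

Keeping the two groups as separate counters makes it straightforward to report coverage both with and without proper nouns, which is one plausible reading of the checkbox option.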