COVERAGE CALCULATOR


Coverage Calculator v.3 CHECK + UPDATE 3 DEC 2025
The percentage of list words in a corpus

This program calculates how often the words on a list appear in a corpus, expressed as a percentage of the corpus's word tokens. A list of the 2,000 most common word families is often said to 'cover' up to 80% of the individual words (tokens) in a general corpus of English; that is, up to 80% of the word tokens in the corpus will be words from that list.

Treatment of proper nouns is a checkbox option. || Headword lists can be expanded into family/lemma lists here. || List coverage in texts can be calculated here (Demo 7). || Known maximum for this routine (2024): a list of 13,000 words against a corpus of ≈ 1 million words (the program will reduce test corpora/texts if needed).
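
The coverage figure described above can be illustrated with a minimal sketch. This is not the program's own code: the function names `tokenize` and `coverage` are hypothetical, the tokenizer simply treats any run of letters as a token, and the list is matched as plain word forms with no family or lemma expansion.

```python
import re

def tokenize(text):
    """Lowercase the text and return every run of letters as a token.

    Assumption: this simple rule splits contractions and hyphenated words,
    which a real coverage routine may handle differently.
    """
    return re.findall(r"[^\W\d_]+", text.lower())

def coverage(word_list, corpus_text):
    """Return (percent of tokens on the list, on-list tokens, total tokens)."""
    listed = {w.lower() for w in word_list}
    tokens = tokenize(corpus_text)
    if not tokens:
        return 0.0, 0, 0
    hits = sum(1 for t in tokens if t in listed)
    return 100.0 * hits / len(tokens), hits, len(tokens)

# Illustrative data only: the default six-word list and an invented sentence.
sample_list = ["the", "is", "of", "and", "a", "an"]
sample_text = "The cat is a friend of the dog and an enemy of the mouse."

pct, hits, total = coverage(sample_list, sample_text)
print(f"{hits} of {total} tokens are on the list = {pct:.1f}% coverage")
```

Here coverage is counted over tokens (every running word), not types (distinct words), which matches the sense of 'cover' used in the description.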

SELECTED RESEARCH: 1. Nation (2006) 2. Laufer & Ravenhorst-Kalovski (2010) 3. Schmitt, Jiang & Grabe (2011) 4. Schmitt, Cobb et al. (2015) 5. Laufer (2020) 6. Cobb & Laufer (2021)

DEMO LISTS
BNC/COCA
1-3k 1-4k 1-5k 1-6k 1-7k 1-8k
~ NUCLEARIZED 1-3k
As per Cobb & Laufer (2021)
nfl-1 nfl-2 nfl-7

v.2 (Mar '26), NFL-0-based nuclear, with an added layer of nuclearization: family consolidation & build-out

NFL-0 HyperFams
1-3k || 1-4k || 1-5k
NUCLEAR French (1-3k)
Listes de fréquence nucléaire françaises
As per Cobb, Lindqvist & Ramnas (2026)
fr_lfnf-0
fr_lfnf-1
fr_lfnf-5
Updates may be available at Nuc. List Builder
(1) Select or paste + name LIST

the is of and a an
(2) Choose Test Text
(English, then French; truncated to under 1 million words)

(4) Click

(5) See result (Propers/leftovers at bottom)
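
As a rough illustration of what a "propers/leftovers" listing might contain, the sketch below collects the tokens that are not on the list and splits them into presumed proper nouns and other leftovers. The page does not say how the program identifies proper nouns, so the heuristic here (capitalized and never seen in lowercase in the same text) is purely an assumption, and `off_list_report` is a hypothetical name.

```python
import re
from collections import Counter

def off_list_report(word_list, text):
    """Return (propers, leftovers) as Counters of off-list tokens.

    Assumed heuristic, not the program's documented method: an off-list
    token counts as a proper noun if it starts with a capital letter and
    its lowercase form never occurs in the text.
    """
    listed = {w.lower() for w in word_list}
    raw = re.findall(r"[^\W\d_]+", text)          # runs of letters, case kept
    seen_lowercase = {t for t in raw if t.islower()}

    propers, leftovers = Counter(), Counter()
    for tok in raw:
        low = tok.lower()
        if low in listed:
            continue                              # covered by the list
        if tok[0].isupper() and low not in seen_lowercase:
            propers[tok] += 1
        else:
            leftovers[low] += 1
    return propers, leftovers

# Illustrative data only.
sample_list = ["the", "is", "of", "and", "a", "an"]
sample_text = "The cat saw Paris. Paris is a city of the cat and the dog."

propers, leftovers = off_list_report(sample_list, sample_text)
print("Propers:  ", propers.most_common())
print("Leftovers:", leftovers.most_common())
```

Sentence-initial words that also occur in lowercase elsewhere are not treated as proper nouns under this rule, but a capitalized word that appears only once at the start of a sentence would be, which is one reason a real routine would need something more careful.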
