UPDATE 2025-12-12
Nuclear List Builder v.4.3
  Reduce a family list to its frequent members
  NEW: derivations count; French families
Jan '25
The BNC/COCA family lists are based on large corpora, with families made as complete as possible so that every word of any text can be classified (e.g., in VocabProfiles). But even the K-1 to K-3 families may contain members that learners will never meet, or that appear mainly in specific text types (medicine, engineering). There is thus a case for reducing these lists to their essentials, for both initial and specialist learning.
    Nuclear List Builder "crosses" family lists against word frequencies in a smaller corpus (1-4 million words) to obtain a list of just the family members that are frequent in that corpus. Why is this interesting? Read a paper about this, or its summary. (*Parallel French study arrived 14 January 2026*)
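As a rough illustration of what "crossing" means, here is a minimal Python sketch. It is not the tool's own code: the family data, the sample text, the tokenizing rule, and the names `FAMILIES`, `count_words` and `cross` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical family list: headword -> family members (illustrative data only).
FAMILIES = {
    "accept": ["accept", "accepts", "accepted", "accepting",
               "acceptable", "acceptance", "unacceptably"],
    "cell": ["cell", "cells", "cellular"],
}

def count_words(text):
    """Lowercase the text and count alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))

def cross(families, corpus_counts):
    """For each family, keep only the members attested in the corpus."""
    result = {}
    for head, members in families.items():
        found = {m: corpus_counts[m] for m in members if corpus_counts[m] > 0}
        if found:
            result[head] = found
    return result

# Tiny stand-in for the cross-corpus (a real one would be 1-4 million words).
corpus = ("She accepted the offer. They accept that the cells were damaged; "
          "acceptance of the cell theory came later. He accepted.")
crossed = cross(FAMILIES, count_words(corpus))
```

Family members that never occur in the cross-corpus (here "cellular", "unacceptably" and others) simply drop out, which is the reduction the paragraph above describes.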


(1) Choose family list
(2) Choose Cross-Corpus

User upload
(850k words; format .txt; encoding UTF-8)

 OR 

Stored corpus

(3) Click 'Make List' to see complete list

FIRST: Explore cutoffs, or just Fam sum

(4) THEN: Choose cut-offs

  (5) Cut-offs↓
Include only words >
of Fam

OR
Count > in Cross-Corpus

WITH OPTIONS:

Mark derived words "z_"

Show %

Fam sums

(6)

(7) Get Result   (8) Copy list
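The cut-offs and options in the form above might work roughly as in the following Python sketch. This is a guess at the logic, not the tool's implementation. It assumes that "of Fam" means a percentage of the family's total count, that the two cut-offs are alternatives, and that the caller supplies the set of derived forms. The function name `nuclear_list`, its parameters, the output layout and the sample counts are all hypothetical.

```python
def nuclear_list(crossed, derived=frozenset(), min_pct=None, min_count=None,
                 mark_derived=False, show_pct=False, fam_sums=False):
    """Return output lines for the family members that pass a cut-off.

    A member is kept if its share of the family total exceeds min_pct
    (in percent) OR its raw count exceeds min_count. If neither cut-off
    is set, every member is kept.
    """
    lines = []
    for head, counts in crossed.items():
        total = sum(counts.values())
        if total == 0:
            continue  # nothing attested for this family
        if fam_sums:
            lines.append(f"{head.upper()} (family sum {total})")
        # Most frequent members first.
        for word, n in sorted(counts.items(), key=lambda kv: -kv[1]):
            pct = 100 * n / total
            no_cutoff = min_pct is None and min_count is None
            passes_pct = min_pct is not None and pct > min_pct
            passes_count = min_count is not None and n > min_count
            if not (no_cutoff or passes_pct or passes_count):
                continue
            # "z_" prefix marks derived (as opposed to inflected) forms.
            label = f"z_{word}" if mark_derived and word in derived else word
            lines.append(f"{label} {n}" + (f" {pct:.1f}%" if show_pct else ""))
    return lines

# Illustrative counts only, not real corpus data.
crossed = {"accept": {"accept": 60, "accepted": 30,
                      "acceptance": 8, "acceptably": 2}}
derived = {"acceptance", "acceptably"}

result = nuclear_list(crossed, derived, min_pct=5,
                      mark_derived=True, show_pct=True, fam_sums=True)
by_count = nuclear_list(crossed, min_count=10)
```

With a 5% cut-off, "acceptably" (2% of its family) is dropped while "acceptance" survives and is marked `z_acceptance`. With a raw count cut-off of 10, only "accept" and "accepted" remain.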