NUCLEAR INPUT

UPDATE 2025-01-28

Nuclear List Builder v.4
Reduce a family list to frequent members
+ NEW - DERIVATIONS COUNT || FRENCH FAMILIES

+ Mobile
Jan '25

The BNC/Coca family lists are based on very large corpora, with families made as complete as possible in order to classify every word of any text (in, e.g., VP). But even K-1 to K-3 families may contain members that learners will never meet, or that appear mainly in specific text types (medicine, engineering). Hence the case for reducing these lists to their essentials for initial or specialist learning.
Nuclear List Builder "crosses" family lists against word frequencies in a smaller resident corpus (1-4 million words) or a user's specialist corpus (up to ≅ 800,000 words) to obtain a list of just those family members that are frequent in that corpus. Read a paper or summary that applies this idea to English (French en route).
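The "crossing" idea can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the `families` dictionary layout, the function name `cross_families`, and the simple letter-run tokenizer are all assumptions made for the example.

```python
import re
from collections import Counter


def cross_families(families, corpus_text):
    """Count how often each family member occurs in a cross-corpus.

    families: hypothetical layout, {headword: [member, member, ...]}.
    Returns {headword: {member: count}}; members absent from the corpus get 0.
    """
    # Assumed tokenizer: lowercase runs of letters (Unicode-aware, so French
    # accented forms stay whole). The real tool may tokenize differently.
    tokens = re.findall(r"[^\W\d_]+", corpus_text.lower())
    freqs = Counter(tokens)
    return {
        head: {member: freqs[member] for member in members}
        for head, members in families.items()
    }


families = {"nation": ["nation", "national", "nationally", "nationhood"]}
corpus = "The nation held a national vote. National pride rose across the nation."
counts = cross_families(families, corpus)
print(counts)
```

Members that score zero or near zero in the cross-corpus are the candidates for removal from the reduced list.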

(1) Choose full BNC/Coca or familized Lonsdale Fr (FNFL-0)

(2) Choose Cross-Corpus

User upload (850k wds; format ~.txt; Enc UTF-8)

Stored corpus

(3) Click 'Get List' to view complete list

FIRST: Explore cut-offs
OPTION: Fam Freqs

(4) THEN: Choose cut-offs

(5) Cut-offs↓

Exclude words <= [ ] of Fam AND Count < [ ] in Cross-Corpus?

OPTION: Mark derived words "z_" ?

(6) Click

OPTION: Review % (on/off)

(7) Get Result
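
As a rough illustration of the cut-offs in step (5) and the "z_" option, the sketch below drops a family member only when both conditions hold: its share of the family's total frequency is at or below one threshold AND its raw count in the cross-corpus is below another. This reading of the form, the percentage interpretation of "of Fam", the function name `apply_cutoffs`, its parameters, and the `derived` set used to decide which words get the "z_" prefix are all assumptions, not the tool's documented behaviour.

```python
def apply_cutoffs(family_counts, max_share_pct, min_count,
                  derived=frozenset(), mark_derived=False):
    """Reduce each family to members that survive both cut-offs.

    family_counts: {headword: {member: count}} from the cross-corpus.
    A member is excluded only if share <= max_share_pct AND count < min_count.
    derived: hypothetical set of words to prefix with "z_" when mark_derived is on.
    """
    result = {}
    for head, counts in family_counts.items():
        total = sum(counts.values())
        kept = []
        for member, n in counts.items():
            # Member's share of the whole family's frequency, in percent.
            share = 100.0 * n / total if total else 0.0
            if share <= max_share_pct and n < min_count:
                continue  # fails both cut-offs, so it is dropped
            if mark_derived and member in derived:
                member = "z_" + member
            kept.append(member)
        result[head] = kept
    return result


family_counts = {
    "nation": {"nation": 40, "national": 55, "nationally": 4, "nationhood": 1}
}
reduced = apply_cutoffs(
    family_counts,
    max_share_pct=5,
    min_count=5,
    derived={"national", "nationally", "nationhood"},
    mark_derived=True,
)
print(reduced)
```

With these illustrative numbers, "nationally" and "nationhood" fall under both thresholds and are removed, while "national" survives and is marked as derived.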
