The BNC/COCA family lists are based on very large corpora, with families made as complete as possible in order to classify every word of any text (in, e.g., VP). But even the K-1 to K-3 families may contain members that learners will never meet, or that appear mainly in specific text types (medicine, engineering). Hence the case for reducing these lists to their essentials for initial or specialist learning. Nuclear List Builder "crosses" family lists against word frequencies in a smaller resident corpus (1-4 million words) or a user's specialist corpus (up to approximately 800,000 words) to obtain a list of just the family members that are frequent in that corpus. Read a paper or summary that applies this idea to English (French en route).
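The "crossing" step described above can be pictured with a minimal sketch. This is not Nuclear List Builder's actual code: the function name `nuclear_list`, the `min_freq` cutoff, the simple regex tokenizer, and the toy families and corpus are all assumptions made for illustration, since the tool's own threshold and tokenization are not stated here.

```python
import re
from collections import Counter


def nuclear_list(families, corpus_text, min_freq=2):
    """Keep only family members occurring at least min_freq times in the corpus.

    families: dict mapping a headword to its list of family members.
    min_freq: hypothetical cutoff for "frequent"; the real tool's value is unknown.
    """
    # Crude lowercase tokenizer (an assumption, not the tool's own method).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", corpus_text.lower())
    counts = Counter(tokens)

    reduced = {}
    for headword, members in families.items():
        kept = [m for m in members if counts[m.lower()] >= min_freq]
        if kept:  # drop families with no frequent member in this corpus
            reduced[headword] = kept
    return reduced


# Toy data, invented for the example.
families = {
    "accept": ["accept", "accepted", "accepting", "acceptable", "acceptability"],
    "nation": ["nation", "national", "nationhood"],
}
corpus = (
    "They accept it. She accepted it and we accept that. "
    "The nation is national; a nation, national."
)

result = nuclear_list(families, corpus, min_freq=2)
print(result)
```

With this toy corpus, members such as "acceptability" and "nationhood" drop out because they never occur, which is the kind of reduction the tool is meant to produce on a much larger scale.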