One size fits all? Francophone learners and English vocabulary tests.

Tom Cobb
Département de linguistique et de didactique des langues
Université du Québec à Montréal

WWW pre-publication of paper to appear in Canadian Modern Language Review 57 (2), 295-324.
Abstract. Which need is
greater, the need for standard measures of vocabulary knowledge, or the need
for vocabulary measures tailored to learners' first languages (L1s)? This
question is explored using placement test data from more than 1000 francophone
students entering English language courses at the Université du Québec à
Montréal in 1997 and 1998. The test consisted of several measures including
a standard vocabulary size test (Nation's (1990) Levels Test). The study shows
that a standard vocabulary measure can miss important information about
learners' knowledge. It also suggests that an interlanguage-sensitive measure
can be a better predictor of broader language proficiency, and concludes that
different tests may be needed for different stages of second language
development.
Résumé. Which is more important: establishing standardized norms for measuring vocabulary knowledge, or creating tests tailored to the learner's first language (L1)? This question is explored by examining the placement test results of more than 1000 francophone students learning English at the Université du Québec à Montréal in 1997 and 1998. The tests comprised several parts, including a standardized test of vocabulary size (Nation's (1990) Levels Test). The study shows that a standardized vocabulary measure can miss important information about the learner's knowledge. It also suggests that a more nuanced, interlanguage-sensitive test will be better able to predict the extent of a learner's language mastery, and concludes that different types of tests are needed for the different stages of second language development.
Prospects and problems for a standard vocabulary test
Advantages of a standard measure
Vocabulary acquisition was once the neglected
area of language study (Meara, 1980), but its theoretical interest and
practical importance are now generally recognized. Even so, the abundant
research into second language vocabulary acquisition (SLVA) of the last 15
years has been slow to find its way into classrooms and course books
(Singleton, 1997). A reason often cited for this is the absence of standard
measures of vocabulary size that would allow course designers and instructors
to determine where, on the open seas of a second lexicon, particular learners
could most usefully cast their nets. The alternative, in the absence of a
systematic approach, has been to serve up the most common 1000 or so words of
English through direct instruction, as most courses do reasonably well (Meara,
1993), and set learners adrift thereafter to haul in the words they happen to
meet.
Meara (e.g., 1996, p. 41) proposes several
advantages for using standardized vocabulary measures in instructional
programs. These would "ask (and answer) questions about how many words
people know, how fast their vocabularies grow, and how these factors are
related to other aspects of linguistic competence." Instead, the
vocabulary measures we have are mainly one-off tests that are designed for use
with particular groups and purposes. Since they are incompatible with each
other, it is difficult to integrate the data they produce. This approach to
testing contributes to the fragmentation of the SLVA field. Meara regards
Nation's (1983/1990) Vocabulary Levels Test as "the nearest thing we have
to a standard test in vocabulary" (1996, p. 38), and his own Yes/No
Vocabulary Checklist (Meara & Buxton, 1987) is another candidate.
Nation and his colleagues (1990; 1995; 1997)
have attempted to build a systematic approach to vocabulary instruction, with
their frequency-based Vocabulary Levels Test at its centre. Based on corpus
analysis and experimental research, the Levels Test samples words from the 2000,
3000, 5000, and 10,000-word frequency levels, and from a zone of academic
discourse known as the University Word List (UWL, recently supplanted by the
Academic Word List). The test provides diagnostic advice as to where learners
could most usefully direct their word-learning efforts, in view of their
reading goals (e.g., whether or not they intend to do academic reading) and the
predicted return on learning investment at the various levels (e.g., high at
the 2000 level, low at the 10,000 level).
The Levels Test measures recognition knowledge of 18 words sampled from each of five frequency levels, in the manner shown in
Table 1. The test-taker's task is to match one of the six words on the left to
one of the three brief definitions on the right by writing the appropriate
number in the space. The total number of words tested at each level is actually
more than 18, because the words in the definitions are also test words. With
only 18 items at each of the five levels the test is compact (takes a native
speaker about five minutes) and usable in classroom conditions (especially
since the entire test may not be applicable in every case). Guessing is reduced
by employing the multiple choice format shown in Table 1, where six words are
matched to three glosses, making the choice ratio 1:6 rather than the usual 1:4
but without increasing the time for reading through additional distractors. A weak score at any
level is defined as knowing fewer than 15 out of 18 items, or less than 83%
according to Nation's (1990, p. 140) experience using the test.
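In computational terms the diagnostic logic is simple. The sketch below, in Python, scores one testee's responses level by level and applies the fewer-than-15-of-18 criterion. The function names and data layout are invented for illustration; they are not part of Nation's published materials.

```python
# Sketch of Levels Test section scoring. Names and data layout are
# illustrative only, not Nation's published materials.

WEAK_CUTOFF = 15  # Nation (1990): fewer than 15 of 18 correct marks a weak level

def score_level(responses, answer_key):
    """Number of correct matches out of 18 for one frequency level."""
    return sum(given == correct for given, correct in zip(responses, answer_key))

def diagnose(results_by_level):
    """Label each level (e.g. '2000', 'UWL') as mastered or weak."""
    report = {}
    for level, (responses, key) in results_by_level.items():
        score = score_level(responses, key)
        report[level] = (score, "weak" if score < WEAK_CUTOFF else "mastered")
    return report

if __name__ == "__main__":
    key = list(range(18))  # hypothetical answer key
    results = {
        # 16/18 correct at the 2000 level, 12/18 at the UWL level
        "2000": ([99 if i in (3, 7) else i for i in range(18)], key),
        "UWL":  ([99 if i % 3 == 0 else i for i in range(18)], key),
    }
    print(diagnose(results))  # {'2000': (16, 'mastered'), 'UWL': (12, 'weak')}
```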
TABLE 1: ITEMS FROM TWO LEVELS OF THE VOCABULARY
LEVELS TEST
Testees try to identify the meanings of three of
the words on the left, by writing the number of the appropriate word beside the
given meaning. Item (a) is taken from the 2000 level, item (b) from the
University Word List level.
(a)  1. blame
     2. hide           ___ keep out of sight
     3. hit            ___ have a bad effect
     4. invite         ___ ask
     5. pour
     6. spoil

(b)  1. affluence
     2. axis           ___ introduction of a new thing
     3. episode        ___ one event in a series
     4. innovation     ___ wealth
     5. precise
     6. tissue
Once a target learning zone has been identified,
how the words in this zone should be learned is left to learners and their
instructors, but Nation and his colleagues also offer ample suggestions for
classroom activities (1990; 1994), text sequencing procedures (Worthington
& Nation, 1996), and procedures for matching texts to learners with the
text analysis computer program
VocabProfile (Hwang & Nation, 1994). Given the
likelihood of individual differences in levels and acquisition rates in this
area there is also a case for independent learning systems, whether flashcards
or on-line tutoring. (For an example of the latter incorporating these tests and
lists, see The Compleat Lexical Tutor at http://132.208.224.131).
The Levels Test has proven useful in several
classroom applications and research ventures since its inception in 1983. One
such venture (recounted in Cobb, 1999) took place at Sultan Qaboos University,
in the Sultanate of Oman, where it shed light on the longstanding mystery of
why many entering students were unable to pass a standard elementary test of
language proficiency after four months of intensive language study. The
proficiency test was Cambridge University's Preliminary English Test,
with a stated lexical base of 2,387 words, while the Levels Test revealed that
these students' typical vocabulary size was more in the range of 500 to 1000
words. With this information, it was possible to design materials that met the
students' needs.
The Levels Test provided useful information of a
different type at City University of Hong Kong (Cobb & Horst, 1999). A
longstanding problem in this institution is that diploma students have
difficulty reading academic texts. The Levels Test disclosed that these
students' knowledge of terms from the University Word List (850 sub-technical
terms used to scaffold information in such texts) was consistently weak, and
that UWL scores strongly predicted reading comprehension scores. Retesting with
the same instrument at yearly intervals also showed these students' incidental
acquisition in this zone to be rather minor, suggesting a role for direct
instruction. These latter findings were particularly interesting in that they
could be compared to those of EFL learners elsewhere in the world who had been
tested on the same measure, for example Laufer's (1994) Israeli learners, who
made strong UWL gains over a similar period of time but without direct
instruction. This is an instance of Meara's (1996) point, that a standard test
allows comparison across learning contexts. However, it is also an instance of
how a standard vocabulary test may not measure the same thing when used with
members of different language groups.
Some problems with a standard measure
Unfortunately, the Chinese and Israeli learners'
progress with the UWL may not be strictly comparable. These two groups of
learners come from typologically different first language backgrounds, and the
manner in which their L1s interact with test form or content is not controlled.
For instance, most Chinese words are monosyllabic, so that the polysyllabic
items of the UWL (episode, affluence, innovation) may well pose a
greater learning burden (Hsia, Chung, & Wong, 1995) than they do for
Israelis, whose L1, Hebrew, is mainly polysyllabic. Such factors could have
implications for how these two groups of students are best tested. It is
arguable that a vocabulary test for use in instructional planning should load on
the zones that are least similar to a learner's L1 and hence measure knowledge
of the L2 lexicon independent of any transfer facilitation (or inhibition).
This would be particularly true if one purpose of a vocabulary test, as
proposed above by Meara (1996), is to predict broader language ability. This
ability will arguably correlate more strongly with the effort a learner has put
into mastering aspects of the L2 lexicon that are not similar to the L1
lexicon, particularly in the case of cognate languages where some portion of
the second lexicon can be had cheap or even for free.
Another way that a vocabulary test may
mispredict broader proficiency is in cases where learners' L1s encourage them to
adopt what Johnson and Ngor (1996) call a ‘lexical processing strategy’ for
reading. They found that Chinese learners reading English tended not to use the
grammatical information contained in words (Chinese words do not contain this
information) but rather to guess at relationships between content words. In
this case it could be predicted that Chinese learners would equate learning
word meanings with learning the language, work hard on vocabulary, and then do
well on a recognition vocabulary test but badly on a test involving use of the
same words in sentences or texts. It could be concluded that a recognition
vocabulary test should not be used with Chinese learners.
Vocabulary tests can also interact with
learners' L1s on the level of format and culture. Experiments with Meara and
Buxton's (1987) Yes/No checklist provide examples of each. The format of the
checklist test is that learners are asked to indicate, yes or no, whether they
are familiar with each word in a series of lists at ascending frequency levels.
Guessing is controlled by the inclusion of plausible non-words in the lists
(e.g., cheatle), which are used to calculate how much testees are
overestimating their lexical knowledge. (See Table 2.) If testees indicate they
know non-words like mascarate, then they are penalized. However, the
test is known to function poorly with Arabic speaking learners, who identify a
very large proportion of non-words as known (Al-Hazemi, 1993; Ryan, 1997). An
explanation for this is that vowels are not normally written in Arabic script
but rather supplied by the reader following a contextual interpretation (Abu-Rabia & Siegel, 1995). With cognitive process transfer (Koda, 1988), Arabic
speakers reading English are often blind to vowel-based distinctions between
words, especially words out of context. Thus, they are likely to judge tilt
and toilet as the same word (Ryan & Meara, 1991), or mascarate
(in Table 2) as miscreate.
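Meara and Buxton's exact scoring adjustment is not reproduced here, but the general logic of discounting scores by non-word 'yes' responses can be illustrated with the classic correction for guessing associated with Anderson and Freebody (1983), cited later in this paper. The sketch below assumes simple counts of real-word and non-word 'yes' responses; it illustrates the principle and is not the checklist's actual algorithm.

```python
def corrected_knowledge(yes_to_words, n_words, yes_to_nonwords, n_nonwords):
    """Correction for guessing on a Yes/No checklist (illustrative only).

    With hit rate h (yes-responses to real words) and false-alarm rate f
    (yes-responses to non-words), the estimated true proportion of words
    known is (h - f) / (1 - f), the correction used by Anderson &
    Freebody (1983).
    """
    h = yes_to_words / n_words
    f = yes_to_nonwords / n_nonwords
    if f >= 1.0:
        return 0.0  # 'yes' to every non-word: the score is uninterpretable
    return max(0.0, (h - f) / (1 - f))

# A testee says yes to 50 of 60 real words, but also to 6 of 20 non-words:
print(round(corrected_knowledge(50, 60, 6, 20), 2))  # 0.76
```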
Culture, or more specifically the conditions of
language use within a culture, can also interact with lexical knowledge in a
vocabulary test. Meara and Buxton's Yes/No test was used with francophone
learners in Montreal, and there was once again a high degree of opting for
non-words, although this time it was for a different reason. Unlike Arabic
speakers, francophone learners expect written words to contain vowels, but in
French Canada they may never have seen the written forms of many of the English
words they have heard. Meara, Lightbown, and Halter (1994) report that subjects often claimed to 'know' non-words, such as leddy, that sounded like a word they might have heard on English television (lady) but had never seen written.
TABLE 2: ITEMS FROM LEVEL 1 OF THE YES/NO
CHECKLIST VOCABULARY TEST
Testees have to write Y (for YES) in the box for
each word if they know what it means, or N (for NO) if they do not know what it
means or are not sure.
1 [ ] bridge     2 [ ] modern       3 [ ] curtain
4 [ ] prison     5 [ ] classinate   6 [ ] mascarate
7 [ ] engine     8 [ ] hurt         9 [ ] ugly
To summarize, it seems that a vocabulary test
can focus on either the language or the learner. The Levels Test and the Yes/No
Test both focus on the language, measuring the learner against the target
lexicon, but do not deal with what the learner already knows through knowledge
of his or her L1. With this approach comes standardization and some degree of
comparability, but also some potential cost in exactness with particular groups
of learners. How great is that cost? And how much tighter a fit can be achieved with a principled adaptation of test to learner? The purpose of the present study is to shed light on these questions through an evaluation of the Levels Test as it functioned for placement and program development in an institutional setting. (Endnote 1)
Evaluating vocabulary tests
How should a vocabulary test be evaluated? The
usual way of doing this is to compare a new test to a previous test that has
itself been shown to predict some observable language behaviour. For example,
Schmitt (1995) measured several vocabulary tests against the TOEFL (Test of English as a Foreign Language), which in turn has been shown to predict success in academic study. Instead, the approach here will be to work from principles (what information should a test provide?) and predictions (what behaviour will learners display if they know 1000, 2000, or 3000 words?).
Three principles proposed by Meara (1996, p. 41)
are that a standard test should "answer questions about how many words
people know, how fast their vocabularies grow, and how these factors are
related to other aspects of linguistic competence." From these
prescriptions one can develop evaluation criteria by asking of a specific test
how well it answers these questions. In the present study, only the first and third of Meara's principles will be developed in this way, since the second question, about learning rate, is not applicable in a placement context. These two principles, predicting how many words learners know and how this knowledge relates to other aspects of language proficiency, will be operationalized as measurable behaviours and predictions.
First, regarding how many words are known: if a
vocabulary size test judged a learner to have recognition knowledge of 2000
words, how would one determine whether this claim was valid? Some ways could be
predicted not to succeed. Graduates of the 2000 level might be asked to answer
comprehension questions on a text constrained to 2000-level words as proof that
they knew and could use these words. However, it is well known that extensive word
knowledge is needed to affect reading comprehension to any meaningful extent
(Mezynski, 1983; Stahl, 1991) whereas the Levels Test does not claim to measure
more than basic recognition knowledge. There is a similar problem with asking
subjects to write sentences as proof of knowing words. Miller and Gildea (1987)
found that learners with only recognition knowledge were unable to do this. It
is too much to expect learners to do something with words they may have only
partial recognition knowledge of. One thing we have learned in the last 15
years is that vocabulary knowledge is incremental (Nagy, Herman & Anderson,
1985); hence we must measure recognition knowledge on its own terms.
In the present study, predictions of recognition
knowledge will be evaluated within the domain of recognition knowledge in the
following way: Levels Test scores will be compared to learners' recognition
knowledge needs as expressed by their dictionary look-ups. The assumption is
that if learners have basic recognition knowledge of a word, then they are
unlikely to look it up in a dictionary, and, by extension, that if they have
recognition ability at a certain frequency level then it is unlikely they will
be doing any major part of their dictionary look-ups at that level.
Second, regarding the relationship between
vocabulary knowledge and other aspects of linguistic ability: the strong
version of this relationship (stated or implicit in lexical approaches to
language learning, e.g., Willis, 1990) is that vocabulary is central or even
preconditional to other types of language proficiency, such as reading,
writing, and grammar, and hence even recognition vocabulary should predict
these to some extent. So the two evaluation criteria will be the following:
• Levels Test scores should predict learners' recognition needs as expressed by their dictionary look-ups (few look-ups are expected at a frequency level the learner has passed)
• Levels Test scores should predict, at least moderately, learners' scores on broader proficiency measures such as reading comprehension.
Context of studies
The Levels Test was used as part of a placement
procedure for several hundred francophone students entering English courses at
the Université du Québec à Montréal (UQAM) between 1997 and 1999. The
motivation to introduce the Levels Test was related to both placement and
program development. The present research, consisting of five related studies,
evaluates the usefulness of the Levels Test for these purposes.
Vocabulary testing had never been a part of the
placement procedure in this institution prior to 1997, and yet lexis was one of
the students' self-reported areas of weakness, so it was reasonable that the
placement procedure should include a vocabulary component as a means to more
accurate placement. Further, if testing confirmed the existence of a systematic
vocabulary problem then this would be a rationale for incorporating a more
deliberate focus on vocabulary within language courses. Systematic vocabulary
training had never been included in any École de langues course in the
past, and was out of keeping with communicative language teaching as practiced
in Quebec.
Participants
Most of the students entering language courses
were Canadian francophones
with roughly nine years of classroom English instruction behind them. For most of
them, this instruction had consisted of 2-5 hours per week focused mainly on
oral skills. About 20% of the students were francophone immigrants of diverse
origins and language learning experience. Another 20% were students intending
to enter ESL teacher training programs who wanted or were required to take
additional language training.
The placement test used prior to 1997 had asked
similar cohorts of students to write paragraphs outlining their main problems
with English and their motivation for taking an English course. Lack of
vocabulary was consistently identified as the main problem. Improving academic
reading was the main motivation for 75 per cent, often with a view to
performing well on the reading section of the TOEFL. (Reading is a strongly vocabulary-dependent skill, relative to grammar and oral communication.) In summary,
vocabulary testing and consideration of training were responses to learners'
expressed needs.
Materials
Choice of a standard vocabulary test
The two most plausible candidates for a
vocabulary test were the Levels Test and the Yes/No Checklist. The Levels Test
was not known to have any obvious problem for francophone learners, like the leddy
problem of the Yes/No test. Also, the Levels Test had been used several times
in other places in the context of academic reading (Cobb & Horst, in press;
Sutarsyah, Nation and Kennedy, 1994), had been found the most reliable of
several vocabulary measures, and was the test correlating most highly with
TOEFL scores (Schmitt, 1995). Thus the Levels Test was chosen, or rather two
levels of it, the 2000-word frequency level and the University Word List (UWL).
Only the sections testing these wordlists were used, because together these
lists account for a fairly reliable 90% of the tokens in an academic text. It
was assumed that these learners' lexical needs would probably lie in these two
areas.
An integrated placement measure
A placement test was constructed which consisted
of the 2000 and UWL sections of the Levels Test (18 questions each), a
TOEFL-style reading passage with 10 multiple choice comprehension questions, a
10-sentence grammar error identification task, and a 100-word writing task on
the topic 'What difference would it make to your life, studies, or career if
your English was much better than it is now?' The grammar and reading questions
were submitted to cycles of item analysis and replacement until no question was
answered by more than 85 per cent or fewer than 35 per cent of testees in a
given testing session. The lexis of all test passages and questions was
constrained using Laufer and Nation's (1995) measure of lexical richness, such
that every word on the test was a member of either the 2000 list or UWL. This
test was meant to serve as a multi-dimensional measure that could render three
services:
• place students accurately
• determine whether direct vocabulary
instruction was needed
• evaluate the Levels Test's ability to
predict other aspects of language proficiency.
When finished, the placement test was computerized
to facilitate its delivery to more than 750 students per year and to ease the
collection and processing of data. The entire test of 56 questions and writing
task appeared as a sequence of five standard Macintosh computer screens. The
time allowed was 50 minutes, with 'time remaining' and 'pages remaining'
clearly indicated on all screens. Most testees did not use all the time
available, and very few reported having any difficulty with navigation (knowing
where they were with respect to other parts of the test). All entries were made
by clicking the mouse, with the exception of the 100-word writing task which
required basic keyboard skills (students had the option to write the paragraph
on paper).
Reformatting the Levels Test
The format of the Levels Test posed an interface
design challenge, as it was necessary to ensure that the experience of taking
the on-line version of the test resembled the on-paper task as closely as
possible. In the paper version of the test, entering answers is simple (writing
a number in a space) and all the questions of the same type can be seen at once
(to facilitate the comparison of distractors and the revision of answers). The
computerized test was formatted to present the 18 questions at each level on a
single screen. Each 6x3 question-cluster (see Table 1) was transformed into
three multiple choice questions with the same six choices, answerable with a
mouse click. The original and the adapted format are shown in Table 3, and a
screen picture of the adapted test is shown in Appendix 1. A simple pilot test
was conducted with 10 learners using both paper and computer versions of the
test with a two week interval, and there was no significant difference in
scores. Of the 1500 students who have used the test, fewer than ten have
reported any difficulty with the computer interface. This was determined both
by informal observation and by employing a tracking system in the computer
program that recorded the time between interactions, the number of revised
answers, and other reliable indicators of human-machine interaction (see Cobb,
1997, Ch. 9 and 12).
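Cobb's (1997) tracking system is not documented in detail here; the sketch below shows the general kind of instrumentation described, recording the time between interactions and the number of revised answers per item. All names and details are illustrative assumptions.

```python
import time

class ItemTracker:
    """Illustrative sketch of test-interface instrumentation: latency
    between interactions and count of revised answers per item."""

    def __init__(self):
        self.last_event = None
        self.latencies = {}  # item id -> list of seconds between interactions
        self.revisions = {}  # item id -> count of changed answers
        self.answers = {}    # item id -> current answer

    def record(self, item_id, answer):
        now = time.monotonic()
        if self.last_event is not None:
            self.latencies.setdefault(item_id, []).append(now - self.last_event)
        self.last_event = now
        if item_id in self.answers and self.answers[item_id] != answer:
            self.revisions[item_id] = self.revisions.get(item_id, 0) + 1
        self.answers[item_id] = answer
```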
TABLE 3: ITEM FROM THE ORIGINAL LEVELS TEST
RECODED FOR COMPUTER VERSION
Item (a) is taken from the 2000 level of the
original test, item (b) is the same item from the computer version of the test.
In the computer version, testees click the mouse on the square beside the
appropriate word and the square becomes filled with an "x".
(a)  1. blame
     2. hide           _2_ keep out of sight
     3. hit            ___ have a bad effect
     4. invite         ___ ask
     5. pour
     6. spoil

(b)  1. keep out of sight   [ ] blame   [x] hide   [ ] hit   [ ] invite   [ ] pour   [ ] spoil
     2. have a bad effect   [ ] blame   [ ] hide   [ ] hit   [ ] invite   [ ] pour   [ ] spoil
     3. ask                 [ ] blame   [ ] hide   [ ] hit   [ ] invite   [ ] pour   [ ] spoil
Dictionary look-ups kit
A website was developed where students could
read newspaper stories from the Montreal Gazette, look up words in Merriam
Webster's World Wide Webster dictionary (at http://www.m-w.com), and
submit these to an instructor as part of a class word bank building project
(follow links to Group Lex from http://www.er.uqam.ca/nobel/r21270/4150).
The Gazette has often been used as a source of reading material in the École
de langues and has proven popular with students as it deals with familiar
topics in challenging yet comprehensible English. The look-ups idea follows a
methodology developed by Cohen, Glasman, Rosenbaum-Cohen and Ferrara (1988) and a technology developed by Hulstijn (1993), adapted here for the Internet.
Study 1: Determining the English vocabulary
levels of UQAM students
Given the students' belief that vocabulary was
their main problem with English, and academic reading their main objective, it
was initially expected that vocabulary testing at the École de langues
would lead to the implementation of vocabulary components within reading
courses. It seemed reasonable to expect that the 1000-2000 wordlist might form
the basis of a useful instructional module in intermediate reading courses, and
the UWL in more advanced courses.
Results and Discussion
In 11 testing sessions over the course of a
year, the mean percentage score for 768 students on the Levels Test was 74% (SD
= 16) at the 2000 level and 68% (SD = 18) at the UWL level, both somewhat below
the suggested criterion of 83%.
On the basis of this information, an
experimental UWL module was added to two academic reading courses. However, the
administration of the École de langues did not feel that the 2000-level
scores (mean = 74%) bespoke any clear need to reallocate course time from
reading and skills training to vocabulary work. With roughly 10% of the lowest
placing testees either not being admitted to courses or else soon dropping out,
it was reasonable to assume that the average scores at the 2000 level of
students actually attending English classes would not be much below the 83%
criterion. One typical cohort of testees (n = 37) was tracked through the
registration and drop-out process, and with eliminations the mean 2000-level
score was 82% (SD = 9.3).
However, reading instructors familiar with the
various frequency lists represented in the Levels Test (view these lists at http://132.208.224.131)
observed that much reading class time was in fact devoted to discussing vocabulary items, many of them rather common, some indeed from the 2000 list of the most frequently used words of English. Thus there appeared to be a
discrepancy between test results and the perceptions of both learners and
instructors.
Study 2: Which words do students look up?
It was decided to verify the instructors'
impressions in a more rigorous manner by investigating which words students
were actually looking up in a dictionary. This involved building a dictionary
activity into a reading course and keeping a record of the words students had
sought information about. For one 14-week session, students in two randomly
chosen classes (n = 80) were assigned the task of reading at least five stories
per week on the Montreal Gazette website, summarizing each, looking up
any words found interesting or necessary for comprehension, and submitting five
of these along with definitions and sentence contexts to the on-line word bank
discussed above. If students could not find five words that they felt they needed
to look up, then they looked for words their classmates might be interested in
learning.
Results and discussion
Because of the large number of words collected
in this look-up study (7594 tokens, 4623 types), only words looked up by more
than five students (176 types) are reported (for the complete listings of both the five-or-more and the two-or-more look-ups, see http://www.er.uqam.ca/nobel/r21270/lookups). These 176
words were divided into frequency zones using Hwang and Nation's (1994)
VocabProfile, a text analysis program which assigns words in a text to
frequency lists following the scheme of the Levels Test. The interesting
result, consistent with instructors' intuitions, was that 34.6 per cent of the
look-ups were 2000-level items. That is, more than one third of the 176 words
that students looked up were common items, some even very common items from the
1000 frequency level (e.g., weak, worth, and youth). This
interest in high frequency items is all the more remarkable in view of the fact
that newspaper writing is lexically rich as a genre (Hwang, 1989), containing a
high proportion of less frequent but information bearing words to which
learners’ attention might have been drawn.
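VocabProfile itself is Hwang and Nation's published program; its underlying technique, assigning each token of a text to the first frequency list that contains it and reporting the proportions, can be sketched as follows. The word lists here are tiny stand-ins (the real lists contain thousands of word-family members, and real profiling matches word families rather than bare forms).

```python
import re
from collections import Counter

def profile(text, bands):
    """Percentage of tokens falling in each frequency band.

    `bands` is an ordered list of (name, word_set) pairs; tokens matching
    no band count as off-list. Real VocabProfile matches word families;
    this sketch matches bare forms only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for name, words in bands:
            if tok in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    return {name: round(100 * n / len(tokens), 1) for name, n in counts.items()}

# Tiny stand-in lists, for illustration only:
bands = [("1000", {"the", "a", "is", "about", "weak", "youth"}),
         ("2000", {"lack", "roar"}),
         ("UWL", {"innovation"})]
print(profile("The youth roar about a lack of innovation.", bands))
# {'1000': 50.0, '2000': 25.0, 'UWL': 12.5, 'off-list': 12.5}
```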
Is there any pattern to the words the students
looked up? Table 4 shows the 54 look-up items from the 2000-level list. It
seems clear the students have an unerring aim for the words of English that are
not (or not obviously) cognate with French words. In other words, their
2000-level look-ups are mainly words of Anglo-Saxon origin, which tend to be
well represented in the high frequency zones of English and are not usually
inferable from knowledge of French. Interestingly, two of the words are test
words on the Levels Test (lack and roar).
TABLE 4: 2000-LEVEL WORDS LOOKED UP BY
MORE THAN FIVE STUDENTS
abroad aim bare beam bear beneath boast bold borrow
broad bundle claim curse damp drag eager elderly flood
further illness increase indeed lack length meant nearby plenty
prompted raise request roar seize settle settlement skills slightly
slopes spread steep stir strike swallow sweep thread threat
throat trial urge wage weak worth wound wrap youth
(54 words, or 34.6 per cent of 176 words looked
up by 5 or more students)
How important are the Anglo-Saxon words of
English? They comprise only about 35 per cent of the lexicon as a whole, with
terms of French, Latin and Greek origin comprising most of the rest. However,
in the high frequency zones, Anglo-Saxon weighs in at closer to 50 per cent
(Roberts, 1965, cited in Nation, 1990, p. 18). Since the most frequent 2000
words of English reliably comprise about 80% of the individual words or tokens
in an average text (Carroll, Davies, & Richman, 1971), Anglo-Saxon terms
account for about half of these, or 40% of tokens in an average text. Many of
these are pronouns and other function words that most students could be
expected to know; however, note that the item beneath appears in Table
4, and several other prepositions and conjunctions appear in the larger list of
look-ups. The proportion of Anglo-Saxon words is probably even higher in spoken
language, which leans heavily on the first 1000 words of the language, where
the Anglo-Saxon proportion is 56 per cent. In other words, any systematic
weakness in learners' Anglo-Saxon lexicon could make it difficult for them to
understand English with any precision, and would shed light on their perception
that they are weak in vocabulary. (Endnote 2)
To summarize, this study has identified a
pattern whereby high frequency words were looked up more often than would be
expected given the students' performance on the Levels Test. These words were
non-cognate or not obviously cognate with French words. However, as mentioned above, the Levels Test is not bereft of Anglo-Saxon items, and the next study will report on testees' performance with these.
Study 3: Item facility analysis of the Levels
Test
It is well known that the English lexicon
comprises two main strands, the Greco-Latin and the Anglo-Saxon. How does the
Levels Test reflect this twin inheritance? The test designer has clearly
attempted to represent both strands more or less equally. This became apparent
through the following analysis. Each of the 18 test items consists of a word
and a gloss. A rating of Greco-Latinness (GL) or Anglo-Saxonness (AS) was
established for each word and gloss in each item at the 2000 level by looking
up the words in a standard dictionary offering word etymologies (the Webster
Dictionary). In the glosses, AS prepositions, pronouns and other grammatical
words were assumed known to all testees and ignored. For example, elect =
choose by voting was rated GL-GL (élire = choisir par vote) despite
the presence of by in the gloss. Where a gloss was a phrase containing
both GL and AS terms (e.g., roar = loud deep sound) the classification
was based on simple majority of content words (loud and deep
outweigh sound so the item was rated AS-AS).
The ratings were used to assign a GL strength
to each of the 18 test items (word-gloss pairs) at the 2000 level. A GL word
and GL gloss (GL-GL) was assigned a strength of 3 (total = complete);
GL-AS was assigned a 2 (original = first); AS-GL was assigned a 1 (pride
= having a high opinion of yourself); and AS-AS was assigned a 0 (melt =
become like water). Some decisions are clearly built into these strength
assignments. For example, designating the gloss having a high opinion of
yourself as GL assumes that most learners know that have and avoir,
high and haut, are related. Also, in the mixed pairs, designating
GL-AS items strength 2 but AS-GL items only strength 1 is based on the
expectation that words out of context are more difficult to interpret than
words in phrases (10 of the 18 glosses are phrases). Also, the test is set up
such that all of the glosses but only three of the six words in each set must
be used (see Table 1), so glosses are more likely to be helped by elimination.
Three raters following the guidelines arrived at the same ratings. (See Table
5.)
An interesting point
is raised by designating item 17 (sport = game) and item 18 (victory
= winning) in Table 5 as GL-AS. While game and winning
must clearly be classified as AS words, they are nonetheless known to all
Quebeckers who watch at least some of their ice hockey on English television.
But as test items, these words do not necessarily function as samples
indicating knowledge of other words in the frequency level. In a vocabulary
size test, the tested words are meant to sample knowledge of many more words
beyond just themselves, so test items like game and win may
over-represent the vocabulary knowledge of francophone Canadians. This is an
example of how a standard vocabulary test can interact with culture.
The analysis revealed that at the 2000 level of
the test, there are five GL-GL items (GL strength 3), six GL-AS (strength 2),
two AS-GL items (strength 1), and five AS-AS items (strength 0). The mean GL
strength level amounted to 1.6 (SD = 1.2). By this reckoning, the Levels Test
balances the two strands reasonably well, although with some bias toward GL
items.
Once test scores had been assembled, a facility
index (the percentage of students answering correctly) was calculated for each
item. Based on the results of Studies 1 and 2, it was predicted that GL strength
would correlate significantly with success on test items. If so, this would
suggest that learners had drawn heavily on their knowledge of French cognates
for their test scores. It was further predicted that success with most AS items
would be low and reflect mainly the guessing opportunities afforded by the
elimination of GL items.
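This prediction can be checked directly against the published figures: the sketch below computes the Pearson correlation between the GL strengths and facility indexes listed in Table 5, immediately below. Because the table's values are rounded, the result only approximates the r = .63 reported in the results, which was computed on the full data.

```python
from statistics import correlation  # Python 3.10+

# GL strength and facility index for the 18 items, in Table 5 order
gl_strength = [3, 2, 3, 3, 0, 2, 0, 1, 2, 1, 3, 0, 3, 2, 0, 0, 2, 2]
facility = [.91, .80, .92, .93, .68, .57, .60, .29, .78,
            .77, .69, .62, .95, .82, .40, .89, .89, .92]

# Prints a moderate positive correlation; the study reports r = .63
# on the full, unrounded dataset.
print(round(correlation(gl_strength, facility), 2))
```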
TABLE 5: LEVELS TEST, 2000 LEVEL: GL ASSIGNMENT AND FACILITY INDEX

Test word        | Gloss                                 | AS-GL balance | GL strength | Facility Index (SD)
1. total         | complete                              | GL-GL         | 3           | .91 (.06)
2. original      | first                                 | GL-AS         | 2           | .80 (.07)
3. private       | not public                            | GL-GL         | 3           | .92 (.07)
4. elect         | choose by voting                      | GL-GL         | 3           | .93 (.05)
5. melt          | become like water                     | AS-AS         | 0           | .68 (.08)
6. manufacture   | make                                  | GL-AS         | 2           | .57 (.07)
7. hide          | keep out of sight                     | AS-AS         | 0           | .60 (.08)
8. spoil         | have a bad effect on                  | AS-GL         | 1           | .29 (.06)
9. invite        | ask                                   | GL-AS         | 2           | .78 (.06)
10. pride        | having a high opinion of yourself     | AS-GL         | 1           | .77 (.07)
11. debt         | something you must pay                | GL-GL         | 3           | .69 (.07)
12. roar         | loud, deep sound                      | AS-AS         | 0           | .62 (.10)
13. salary       | money paid regularly for doing a job  | GL-GL         | 3           | .95 (.05)
14. temperature  | heat                                  | GL-AS         | 2           | .82 (.07)
15. flesh        | meat                                  | AS-AS         | 0           | .40 (.09)
16. birth        | being born                            | AS-AS         | 0           | .89 (.05)
17. sport        | game                                  | GL-AS         | 2           | .89 (.05)
18. victory      | winning                               | GL-AS         | 2           | .92 (.06)
Results and discussion
Facility analysis showed that while most testees
had not made a large number of errors on the test, many had made the same
errors. The rightmost column in Table 5 shows the percentage and standard
deviation of the 768 testees who answered each item correctly, with GL strength
rating in the column immediately to the left for easy comparison. The table
shows that 7 out of 18 test items are known to 89% or more of testees (items 1,
3, 4, 13, 16, 17, 18). The mean GL strength of these items is 2.3. Five of the 18 items are known to 62 per cent or fewer of the testees (items 6, 7, 8, 12 and
15). The mean GL strength of these items is .6. In other words, the GL strength
of high success items is almost four times that of low.
There would be nothing irregular in such a distribution in itself, provided there were no systematic basis for it. But
that is not the case here. Comparing the facility index to the content of each
item in Table 5, one can see that most of the items unknown to large numbers of
testees involve AS terms such as melt (Question 5), make
(Question 6), hide, keep, or sight (Question 7), spoil
(Question 8), loud, deep, or roar (Question 12), and meat
or flesh (Question 15). In other words, weakness is not distributed
throughout the system but concentrated on one type of item. Similarly, strength is concentrated on words like total and complete (Question 1), private and public (Question 3), and elect and choose (Question 4).
The pattern is not complete, however: the AS glosses game (Question 17) and winning (Question 18) posed little problem, as noted already, and, less explicably, neither did birth and born (Question 16) or pride (Question 10).
Overall, the correlation between GL strength and item facility is r = .63 (p < .05); in other words, the more GL strength an item has, the more testees know it or can guess it. If the 2000-level test had
consisted only of items having GL strength 1 or 0 (i.e. requiring knowledge of
at least one Anglo-Saxon term), then the average score at this level for these
768 students would have been 63.4% (SD = 7.5), and the case for giving these
students basic vocabulary training would have been clear.
Scores from one testing session (n = 148) were
selected for more detailed examination. This session was randomly selected from
all the testing sessions which had not included future ESL teachers, who were
seen as a separate population (see Study 5 below). The mean score for the 148
testees at the 2000 level was 79.31% (SD = 13.29). As already noted, a mean this
high would not present a strong case for vocabulary training. But with items
broken down into GL and AS components, the picture changes dramatically. The
mean score for GL = 3 and GL = 2 items taken together was 79.30% (SD = 14.24);
the mean score for GL = 1 and GL = 0 items amounted to 39.34% (SD = 22.06). In
other words, success was twice as high on the GL-biased items.
The bar chart in Figure 1 represents the mean GL
and AS scores of individuals from the testing cohort, ranked from left to right
by mean overall placement test score, and divided into three sample subgroups
by ability level: ranks 1-10, ranks 70-80, and ranks 110-120. The columns
representing GL scores are consistently high across the ability range, whereas
the columns representing AS knowledge drop sharply. Something the bar chart does not show for lack of space is that
the drop occurs very shortly after the tenth testee (see
http://www.er.uqam.ca/nobel/r21270/lookups/barchart.htm for data on the
complete set of scores). In other words, the GL-AS difference is not spread
evenly across the ability range, but rather increases as ability decreases.
This same pattern was observed in four other cohorts that were examined.
FIGURE 1: AS AND GL KNOWLEDGE ACROSS THE ABILITY
RANGE
To summarize, these learners are systematically
less likely to know AS than GL items in the high frequency zone. This fact is
not disclosed but rather masked by a test that samples from the lexicon of
English as a whole. This is not to criticize the Levels Test, which clearly
samples the English lexicon representatively. It is rather an argument that a
test will not find L1-specific information if it does not look for it.
But does differential familiarity with the
lexical strands of English have any bearing on these learners' broader ability
to function in English? As mentioned above, AS terms constitute about 40% of
text lexis and as much as 56% of spoken lexis, so weakness in this area could
be expected to affect general linguistic functioning and would help account for
the students' perception that they are weak in vocabulary. This expectation is
tested in the next study.
Study 4: Predicting broader proficiency
What sort of correlation should we expect
between recognition vocabulary knowledge and broader proficiency in a second
language? Given what many now describe as the centrality of lexical knowledge
in all aspects of linguistic functioning, we might expect the correlation to be
substantial. And yet correlations between lexis and broader proficiency are not
commonly reported in the research literature. Possibly the closest we come to a
standard by which other predictions can be evaluated is the L1 work of Anderson
and Freebody (1983, used as a point of reference in Meara & Buxton, 1987),
in which a series of multiple choice vocabulary tests were found to predict
reading comprehension scores at r = .80, r = .75, and r = .66, with a mean correlation of r = .73. Additional support for this figure comes from a study by Qian
(1999), which examined TOEFL recognition vocabulary and reading scores of 217
ESL students, and found the same correlation of r=.73. So, with the
understanding that it is not strictly appropriate to compare correlation
strengths across studies, samples, and instruments, a correlation of r = .73
between passive vocabulary knowledge and reading scores shall serve as a
guideline in the evaluation that follows.
The fourth study includes two analyses. The
first tested the correlation between Levels Test scores (at the 2000-level and
UWL) and reading scores. It was predicted that correlations would be
substantially less than r=.73, because of the test's overestimation of these
learners’ vocabulary knowledge as shown above. The second analysis examined the
independent contributions of testees' 2000-level scores on AS and GL items to
reading scores in a multiple regression analysis. It was predicted that GL
knowledge would account for more score variance than AS knowledge, because of
undetermined amounts of guessing in AS scores and the effects of unrepresentative
items in the test (like sport and win). The data for this part of the study were once again 2000-level placement test results for the cohort used in Study 3 (n = 148), with the 18 items divided into GL and AS components in the manner described above.
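The computation itself is standard least squares. The sketch below, using numpy with invented stand-in scores (five cases shown where the study had 148; these numbers are not the study's data), shows the form of the analysis: reading regressed on the GL and AS percentage sub-scores, with each predictor's zero-order correlation alongside.

```python
import numpy as np

# Invented stand-in percentages, NOT the study's data (the study had n = 148)
gl = np.array([85.0, 90.0, 70.0, 95.0, 60.0])       # GL-item sub-score
as_ = np.array([45.0, 30.0, 35.0, 50.0, 20.0])      # AS-item sub-score
reading = np.array([70.0, 75.0, 55.0, 80.0, 45.0])  # reading comprehension

# Zero-order correlations of each predictor with reading
print("r(GL, reading):", round(np.corrcoef(gl, reading)[0, 1], 2))
print("r(AS, reading):", round(np.corrcoef(as_, reading)[0, 1], 2))

# Multiple regression: reading ~ b0 + b1*GL + b2*AS, by least squares
X = np.column_stack([np.ones_like(gl), gl, as_])
coefs, *_ = np.linalg.lstsq(X, reading, rcond=None)
print("intercept, b_GL, b_AS:", np.round(coefs, 3))
```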
The text on which the reading test was based was
a normal academic text with a lexical profile very much in line with the
frequency zones targeted by the Levels Test. Its profile as determined by
VocabProfile analysis was as follows: 84 per cent of tokens were from the
0-2000 level, 10 per cent were from the UWL level, and the remaining 6 per cent
were topic-specific words judged inferable from the context by three
instructors (assuming knowledge of the 2000 and UWL items comprising the
contexts). A further analysis of the GL-AS composition of the 0-2000 zone,
based on wordlists which can be viewed at http://132.208.224.131, revealed a
roughly equal distribution, as predicted by Roberts (1965). The ten multiple
choice questions were of an almost identical lexical composition to the text,
being drawn from the text with no terms added other than question words. The
questions were inferential in the sense of requiring comprehension of rephrased
or integrated text material but assumed no specialized topic knowledge. In
other words, an effort was made to align the reading task with both the Levels
Test and the normal GL-AS distribution of English texts.
Results and discussion
The Levels Test predicted reading scores only moderately well by the proposed guideline of r = .73. The correlation between reading and overall 2000-level vocabulary was r = .62, and between reading and the UWL, r = .59 (both p < .05). However, when 2000-level scores were broken down into
their GL and AS components, calculated as independent percentages, and entered
into multiple regression analysis with reading scores as the dependent
variable, a different picture emerged: The correlation between GL knowledge and
reading scores was r=.74, while the correlation between AS and reading scores
amounted to only r=.05 (p>.05), accounting for less than 1 per cent of
variance. In other words, these students’ AS knowledge had apparently
contributed almost nothing to their reading scores. The same analysis with
three other test cohorts produced similar results.
This finding, while surprising in its extremity,
is nonetheless consistent with the hypothesis that testees' success with AS
test items was largely due to either guesswork, once GL items were eliminated,
or else to unrepresentative knowledge of AS terms like game and win.
In either case, a correct answer on an AS item would not reflect any
significant amount of broader lexical knowledge, and would not contribute much
to variance on an integrated measure such as a reading comprehension score.
Still, the finding stands in need of confirmation with a test of somewhat more
than 18 items representing 2000 words, in other words with less room for the
operation of token effects.
Pending a further investigation of AS and GL contributions, the overall correlations of 2000-level and UWL vocabulary with reading, at r = .62 and r = .59, are in any case not high (with vocabulary scores
accounting for only about 35% of reading score variance). This in itself is,
arguably, sufficient justification to begin experimenting with other types of
tests. There are numerous directions that the search for an interlanguage
sensitive test could lead in, and the final study explores one of these, once
again in the context of a practical institutional concern.
Study 5: Exploring an L1-specific vocabulary test
Subjects, materials, procedures, predictions
Some courses given at the École de langues
are specifically designed for future ESL teachers to help them perfect their
English language skills. For many of these learners, the test described above, and
especially the vocabulary component, was found not to generate adequate
variance to make good placement decisions. A more demanding test was needed,
and this need presented an opportunity to use the findings of the foregoing
studies in the design of a different type of placement instrument.
The participants in this study were 73
applicants to the ESL teacher training program in the Département de
linguistique et de didactique des langues at UQAM. These applicants were much more proficient in English than the majority of participants in the previous studies, many being near-native speakers. A new multi-skill test including a modified version of the
Levels Test was created for these applicants. The modified vocabulary test
targeted knowledge of English words independent of any ability to exploit
cognates. (Since the vocabulary part of the test was experimental, its results
were not used for placement purposes.)
The Levels Test was adapted in three ways.
First, it was shortened to 20 questions, in line with administrative time
constraints. Second, only 2000-level and UWL items that are not cognate with or
inferable from French words were included as test words (e.g., stretched,
wealth, burst, and slight in Table 6). These were obtained
by picking and choosing AS test words from parallel versions of the Levels Test
(published in a paper by Laufer & Nation, 1999). Some GL items continued to
appear in the wording of the contexts (e.g., difference in Table 6).
Third, the format of the test was changed to the controlled productive version
of the test described by Laufer and Nation, which resembles a c-test (a cloze
passage with the first half of each gap provided). This format activates
whatever memory trace is available for a word, yet renders guessing difficult
or impossible.
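The published items choose their letter cues by hand, giving just enough of each word to rule out alternatives. A naive sketch of the format, exposing roughly the first half of each target (and therefore not reproducing Laufer and Nation's hand-tuned cues), follows.

```python
def c_test_item(sentence, target):
    """Replace `target` with roughly its first half plus a blank.

    A naive approximation of the controlled productive format; the real
    test's cues are chosen by hand (e.g. 'we___' for 'wealth'), not computed.
    """
    stem = target[: (len(target) + 1) // 2]
    return sentence.replace(target, stem + "___", 1)

print(c_test_item("The rich man died and left all his wealth to his son.",
                  "wealth"))
# -> ...left all his wea___ to his son. (the published cue is 'we___')
```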
TABLE 6: CONTROLLED PRODUCTIVE TEST, 2000 AND
UWL LEVELS, NON COGNATE
Testees have to complete the words by typing in
the spaces provided. Spelling and grammar are not important if testees show
they know the word.
This sweater is too tight. It needs to be stre___.
The rich man died and left all his we___ to his son.
If you blow up that balloon any more it will bur___.
The differences were so sl___ that they went unnoticed.
For complete test see Appendix 2, or follow
links to Tests at http://132.208.224.131
The rest of the test, as in the studies reported
above, consisted of reading, writing, and grammar sections, thereby
facilitating a comparison of lexical knowledge with broader language ability.
It was predicted that scores on this AS-biased vocabulary test would reflect
learners' knowledge of English independent of their ability to exploit L1-L2
lexical similarities, and hence would strongly predict scores on the broader
measure. Such a result would have to be interpreted cautiously, however,
because the method adopted to eliminate guessing (the c-test) also turned the
measure into a test of production as well as recognition.
Results and Discussion
The mean score for vocabulary was 58.15% (SD = 25.13) and the average of the broader measure scores was 57.56% (SD = 12.56). The correlation between vocabulary and the averaged broader measures was a substantial r = .90 (p < .001), or 81% of variance. This can be compared to the correlation of r = .59, or 35% of variance, between the regular Levels Test and
reading comprehension reported in Study 4. The probable explanation for this discrepancy is
that the vocabulary knowledge measured in the present study reflects true exposure
to English, while that measured in Study 4 reflects exposure, guessing from
cognates, elimination, and happenstance knowledge in unknown proportions -- statistical
noise.
However, the finding should be interpreted
cautiously; the modified test not only eliminated cognates but also called upon
productive as well as receptive knowledge. 'Deep' (Qian, 1999) or 'productive'
(Cobb, 1998) vocabulary knowledge has been shown to correlate with reading and
other integrated measures more highly than recognition knowledge does. Some
goals for a future study will be to vary the cognate and active-passive factors
independently, to test the same learners with both original and modified
versions of the Levels test, and to compare the independent contributions of
extensive GL and AS knowledge to the broader proficiency of higher level
learners. A practical limitation to a cognate-eliminating test is that it can
only be used with advanced learners. On the evidence above, most intermediate
learners would simply fail such a test, generating no useful variance
whatsoever to aid with their placement.
In the meantime, the finding of this study
demonstrates the advantage in principle of developing vocabulary tests that
incorporate information about specific L1-L2 interactions and provides a reason
to do further research on this topic.
Conclusion
This series of studies began with some questions
about vocabulary testing, gave reasons for choosing Nation's (1990) Levels Test
for use with lower-intermediate francophone learners in Quebec, and then
discussed some problems that were found with the test in this context. The test
did not predict either dictionary look-ups or reading comprehension
particularly well, and most importantly did not reveal a substantial gap in the
high-frequency lexicons of most testees. The explanation provided for this was
that the test allowed learners to answer about half the items correctly on the
basis of knowing French-English cognates, and then a few more on the basis of
guesswork or happenstance knowledge. This interpretation was supported by the
finding that subjects' scores on non-cognate items contributed little or
nothing to their scores on an integrated language task (reading comprehension).
Therefore, it is concluded that a vocabulary test which ignores key facts about
learners and their L1 -- in this case, the fact that francophone learners can
answer questions about English words that they have not necessarily learned
through exposure to English -- will not stand up to validity checks such as the
look-up test or the prediction test. On the other hand, a test that does take
such facts into account seems able to make strong predictions of broader proficiency.
It is important to note, however, that the
method of handling L1-L2 interactions that was developed for Study 5, the
elimination of cognates, could only be used with advanced learners, and that other
methods would have to be developed for the majority of students signing up for
language courses at institutions like the École de langues who are more
typically at intermediate or lower proficiency levels. What sort of vocabulary
test could be used with intermediate francophone students? Ironically, in view
of the arguments made above, an all-purpose language based test like the Levels
Test probably serves these learners, and those who would teach them, rather
well.
It seems clear that these learners' Levels Test
scores are based partly on imaginative work with cognates, but then,
imaginative work with cognates is a legitimate learning strategy that can
hardly be eliminated from the early stages of learning a cognate language.
Furthermore, as was seen in Study 4, variance on GL items seems able to predict
about 55% of the variance in reading scores (r=.74), suggesting that not all
learners are fully aware of cognates (confirming a point made by Lightbown
& Libben, 1984). Such awareness is a useful quality to look for in a
placement test, and where it is found missing or weak it should probably be
instructed (Treville, 1996). On the other hand, at even the earliest stages it
is desirable to know something about learners' knowledge of the parts of English
that cannot be inferred from French. It seems that the design of an improved, interlanguage-sensitive test for these learners will lie in the direction of measuring both cognate-handling skill and cognate-independent knowledge, deliberately and separately.
A suitable test for advanced learners, on the
other hand, as suggested by the strong finding in Study 5, will almost
certainly involve a greater emphasis on independent L2 abilities that can only
be gained through familiarity with the L2 itself. It seems unavoidable, in
other words, that we will need different tests for different levels of
learners.
The broader question underlying the studies
reported here concerned the prospects for standardization in L2 vocabulary
testing. The evidence from the foregoing studies, while far from conclusive,
suggests that if the goal is to have maximally predictive vocabulary tests,
then we should not expect to find a single vocabulary test that will function
across languages, as desirable as this would be for some purposes, nor even a
test that can function across levels within a language.
English in Quebec: An important role for
vocabulary tests
In the broad perspective, the Anglo-Saxon issue
in these studies is merely an example of information that may be missed in an
all-purpose or language-based vocabulary measure. In Quebec, however, the AS
issue also has some importance in its own right
A finding in the present study is that
francophone learners are consistently weak on the AS side of English, and that it
is quite possible for this fact to escape notice even with standard vocabulary
testing. How important is it for Quebec francophones to know the AS side of
English? It is true that most non-cognate terms have a roughly equivalent
cognate version (mess - disorder, give - donate, eager - enthusiastic),
so that learners with an eye to the costs and benefits of their labour might
well decide to pay less attention to alternate versions of words they already
know. This would be especially tempting since most of the irregularity, and hence most of the labour, of learning English is piled up on the AS side (begin - began - begun vs. commence - commenced - commenced), to the extent that the AS strand may even require its own learning principles, i.e., more rote and less rule learning, as proposed by Pinker (1989, Ch. 4). (Endnote 3)
However, a decision to ignore the AS side of the
lexicon would be hasty. English word pairs that seem equivalent rarely are (Sinclair, 1991), particularly on the dimensions of tone and register. Perspiration and sweat may be referentially equivalent, but they are hardly interchangeable across contexts. The twin strands of English are a rich
resource for communicating information, especially pragmatic information. Some
cognitive linguists argue that native words display more of the language's true
nature, encoding the bodily and perceptual metaphors through which English
speakers conceptualize their world and experience (Sweetser, 1990). Whether or
not this is true, it is clear that English speakers employ the native lexicon
heavily in everyday speech, with the result that francophones who know mainly
the cognate side of English plus a few street or sports terms may well be able
to say everything they want to say in English, but may understand less well
what is said to them.
A theme in the literacy discussion in English
Canada is that young anglophones often run up against a "lexical bar" (Corson, 1985, 1997) whereby their lack of familiarity with the Greco-Latin side
of English hinders their educational, economic and even cognitive (Olson, 1994,
p. 109) development. Ironically, a different sort of lexical bar may be operating among young francophones learning English -- a reverse lexical bar, but one no less insidious. The first step in dealing with such a bar is to expose it with appropriate vocabulary testing. The second is to focus instruction on Quebec learners' real lexical needs, since it seems clear that these are not being met through either current instruction or incidental exposure.
Biographical information
Thomas Cobb is a professor of ESL in the Département
de linguistique et de didactique des langues at the Université du Québec
à Montréal. He has taught English for academic and professional purposes at
universities in Saudi Arabia, the Sultanate of Oman, Hong Kong, and British
Columbia. He has a PhD in educational technology from Concordia University in
Montreal, with a specialization in research methods, instructional design, and
computer assisted language learning.
Acknowledgements
Funding for this research was provided by the
Social Sciences and Humanities Research Council of Canada (Grant No.
410-2000-1283). I am grateful to Marlise Horst and CMLR reviewers for helpful
comments on previous drafts.
Notes
(Endnote 1) While the Levels Test has effectively become the standard, or at any rate the most widely used, vocabulary test, it should be mentioned that the test
was developed in an Asian context and that Nation felt it was "not
suitable for learners whose mother tongue is a language which has been strongly
influenced by Latin" (1990, p. 262). The analysis presented here examines
the prospects for any language-based vocabulary measure, and is not intended as
a criticism of the Levels Test, which indeed was originally developed for
specific learners.
(Endnote 2) A note on
terminology: referring to these words as 'Anglo-Saxon' is more a convenience
than an etymological claim. It is well known that many words popularly thought
to be native English actually originate in Latin or even French (McArthur,
1998). No less English a word than beef is from Latin (bos) via French (boeuf), although it is now fully nativized, as indicated by its acceptance of native morphologies (beefy). The point is less etymology than learner perception (Carroll, 1992). In these studies, the term Anglo-Saxon (AS) refers to 'non-cognate with French, usually Anglo-Saxon.'
(Endnote 3) Theories about
the supposed difficulty of mastering the irregularities of AS must be weighed
against Corson's (1995) account of the inherent psycholinguistic ease of acquiring
and processing the AS lexicon. Its morphologies are relatively transparent,
based mainly on compounding of monosyllables (headland, rooftop)
and affixation that does not require additional phonological entries in the
mental lexicon (happy - happiness, neighbour - neighbourhood, as compared to author - authority, certain - ascertain).
References
Abu Rabia, S., & Siegel, L.S. (1995). Different orthographies, different context effects: The effects of Arabic
sentence context in skilled and poor readers. Reading Psychology 16,
1-19.
Al-Hazemi, H. (1993). Low level EFL
vocabulary tests for Arabic speakers. Unpublished PhD thesis, University of Wales.
Anderson, R.C. & Freebody, P. (1983). Reading
comprehension and the assessment and acquisition of word knowledge. In J.T.
Guthrie (Ed.), Comprehension and teaching: Research reviews (pp. 77-117).
Newark, DE: International Reading Association.
Carroll, J.B., Davies, P., & Richman, B.
(1971). The American Heritage Word Frequency Book. Boston: Houghton
Mifflin.
Carroll, S.E. (1992). On cognates. Second
Language Research 8, 93-119.
Cobb, T.M. (1997). From concord to lexicon:
Development and test of a corpus-based lexical tutor. Unpublished PhD
dissertation, Dept. of Education, Concordia University, Montreal. (Available at
http://www.er.uqam.ca/nobel/r21270/thesis0.html).
Cobb, T.M. (1999). Applying constructivism: A
test for the learner-as-scientist. Educational Technology Research and
Development, 47 (3), 15-31.
Cobb, T., & Horst, M. (In press). Carrying
learners across the lexical threshold. In J. Flowerdew & M. Peacock (Eds.),
The EAP curriculum. Cambridge: Cambridge University Press.
Cobb, T., & Horst, M. (1999). Vocabulary
sizes of some City University students. Journal of the Division of Language
Studies of City University of Hong Kong, 1 (1), 59-68. (Also at
http://www.er.uqam.ca/nobel/r21270/cv/CitySize.html.)
Cohen, A., Glasman, H., Rosenbaum-Cohen, P.R.,
Ferrara, J., & Fine, J. (1988). Reading English for specialized purposes:
Discourse analysis and the use of student informants. In P.L. Carrell, J.
Devine, & D. Eskey (Eds.), Interactive approaches to second language
reading (pp. 152-167). New York: Cambridge University Press.
Corson, D. (1985). The lexical bar.
Oxford: Pergamon Press.
Corson, D. (1995). Using English words.
Norwell, MA: Kluwer Academic Publishers.
Corson, D. (1997). The learning & use of
academic English words. Language Learning 47, 671-718.
Hsia, S., Chung, P., & Wong, D. (1995). ESL learners' word organization strategies: A case of Chinese learners of English words in Hong Kong. Language and Education, 9, 81-102.
Hulstijn, J.H. (1993). When do foreign-language readers look up the meaning of unfamiliar words? The influence of task and learner variables. Modern Language Journal 77, 139-147.
Hwang, K. (1989). Reading newspapers for the
improvement of vocabulary and reading skills. Unpublished MA thesis, English Language Institute, Victoria University of Wellington, New Zealand.
Hwang, K., & Nation, P. (1994). VocabProfile:
Vocabulary analysis software. English Language Institute, Victoria
University of Wellington, New Zealand.
Johnson, R.K., & Ngor, Y.S. (1996). Coping with second language texts: The development of lexically based reading strategies. In D.A. Watkins & J.B. Biggs (Eds.), The Chinese Learner (pp. 123-140). Hong Kong: Faculty of Education, Hong Kong University.
Koda, K. (1988). Cognitive processes in
second-language reading: Transfer of L1 reading skills and strategies. Second
Language Research 4, 133-156.
Laufer, B. (1994). The lexical profile of second
language writing: Does it change over time? RELC Journal, 25 (2), 21-33.
Laufer, B., & Nation, P. (1995). Vocabulary
size and use: Lexical richness in L2 written production. Applied Linguistics
16, 307-322.
Laufer, B., & Nation, P. (1999). A
vocabulary-size test of controlled productive ability. Language Testing, 16
(1), 33-51.
Lightbown, P., & Libben, G. (1984). The
recognition and use of cognates by L2 learners. In R. Anderson (Ed.), Second
languages: A cross-linguistic perspective (pp. 123-140).
Rowley, MA: Newbury House.
Meara, P. (1980). Vocabulary acquisition: a
neglected aspect of language learning. Language teaching and linguistics: Abstracts,
13, 221-246.
Meara, P. (1993). Tintin and the world service:
A look at lexical environments. IATEFL: Annual Conference Report, 32-37.
Meara, P. (1996). The dimensions of lexical
competence. In G. Brown, K. Malmkjaer, & J. Williams (Eds.), Performance
and competence in second language acquisition (pp. 35-53).
Cambridge: Cambridge University Press.
Meara, P., & Buxton, B. (1987). An alternative to multiple choice vocabulary tests. Language Testing 4, 142-154.
Meara, P., Lightbown, P.M., & Halter, R.
(1994). The effect of cognates on the applicability of YES/NO vocabulary tests.
Canadian Modern Language Review 50, 296-311.
Mezynski, K. (1983). Issues concerning the
acquisition of knowledge: Effects of vocabulary training on reading
comprehension. Review of Educational Research 53, 253-279.
Miller, G.A., & Gildea, P.M. (1987). How
children learn words. Scientific American, 257 (3), 94-99.
Nagy, W.E., Herman, P.A., & Anderson, R.C.
(1985). Learning words from context. Reading Research Quarterly 20,
233-253.
Nation, P. (1983). Testing and teaching
vocabulary. Guidelines, 5 (1), 12-25.
Nation, P. (1990). Teaching and learning
vocabulary. New York: Newbury House.
Nation, P. (1994). New ways in teaching
vocabulary. Alexandria, VA: TESOL Inc.
Nation, P., & Waring, R. (1997). Vocabulary
size, text coverage, and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge: Cambridge University Press.
Olson, D. (1994). The world on paper: The
conceptual & cognitive implications of writing & reading. New York:
Cambridge University Press.
Qian, D. D. (1999). Assessing the roles of depth
and breadth of vocabulary knowledge in ESL reading comprehension. The Canadian
Modern Language Review, 56 (2), 282-307.
Ryan, A. (1997). Learning the orthographic form
of L2 vocabulary - a receptive and a productive process. In N. Schmitt & M.
McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 181-198). Cambridge: Cambridge University Press.
Ryan, A., & Meara, P. (1991). The case of
the invisible vowels: Arabic speakers reading English words. Reading in a
Foreign Language 7, 531-540.
Schmitt, N. (1995). An examination of the
behaviour of four vocabulary tests. Paper presented at the Dyffryn
Conference, Centre for Applied Language Studies, University of Wales, Swansea,
April 1995.
Sinclair, J. (1991). Corpus, concordance,
collocation. Oxford: Oxford University Press.
Singleton, D. (1997). Learning and processing L2
vocabulary: State of the art article. Language Teaching 30, 213-225.
Stahl, S.A. (1991). Beyond the instrumentalist
hypothesis: Some relationships between word meanings and comprehension. In
Schwanenflugel, P.J. (Ed.), The psychology of word meanings (pp.
157-186). Hillsdale, NJ: Erlbaum.
Sutarsyah, C., Nation, P., & Kennedy, G.
(1994). How useful is EAP vocabulary for ESP? A corpus based case study. RELC
Journal, 25 (2), 34-50.
Sweetser, E. (1990). From etymology to
pragmatics: Metaphorical and cultural aspects of semantic structure. Cambridge:
Cambridge University Press.
Treville, M.-C. (1996). Lexical learning and
reading in L2 at the beginner level: The advantage of cognates. Canadian
Modern Language Review 53, 173-190.
Willis, D. (1990). The lexical syllabus.
London: Collins Cobuild.
Worthington, D., & Nation, P. (1996). Using
texts to sequence the introduction of new vocabulary in an EAP course. RELC
Journal, 27 (2), 1-11.
Appendix A. Computer adaptation of Levels Test, 2000 level (screen picture)
Appendix B. French L1-adapted
Controlled Productive Levels Test, composed of AS items from three versions and
two levels (2000 and UWL).
1. This sweater is too tight. It needs to be stre___.
2. The rich man died and left all his we___ to his son.
3. If you blow up that balloon any more it will bur___.
4. The differences were so sl___ that they went unnoticed.
5. The dress you're wearing is lov___.
6. It's the de___ that counts, not the thought.
7. He is walking on the ti___ of his toes.
8. She wan___ aimlessly in the streets.
9. This year long sk___ are fashionable again.
10. They had to cl___ a steep mountain to reach the cabin.
11. Plants receive water from the soil through their ro___.
12. La___ of rain led to a shortage of water in the city.
13. Many people in Canada mow the la___ on Sunday morning.
14. There has been a recent tr___ among prosperous families towards a smaller number of children.
15. She showed off her slen___ figure in a long narrow dress.
16. It was a cold day. There was a ch___ in the air.
17. His beard was too long. He decided to tr___ it.
18. You'll sn___ that branch if you bend it too far.
19. You must be aw___ that very few jobs are available.
20. The airport is far away. If you want to ens___ that you catch your plane, you'll have to leave early.