The case for computer-assisted extensive reading
Tom Cobb
(designated in the text that follows as ‘Author’)
Dépt de linguistique et de didactique des langues, Université du Québec à Montréal
Abstract
About 10 years ago, Author & Stevens (1996) argued that the flood of text about to go online should be a boon for second language learners, and we proposed a number of ways that computers would be able to not only deliver this expanded supply of text but also enhance the amount of learning the text could provide by processing it in various ways both prior to and during delivery. In 2006, it seems safe to say that the amount, quality, diversity, and availability of such text has exceeded expectations. And yet it is not clear that the computer for its part is serving as more than a delivery vehicle. This is a pity, because just as the text was more than expected, so are the opportunities for computers to do far more than simply download, distribute, and print. Computer programs, accessing large shared text repositories, have a tremendous potential to resolve old questions about language learning for teachers and course designers and to provide new and unique opportunities for large numbers of learners at low cost. I will provide concrete instances of questions resolved and opportunities provided in one exemplary domain, the theory and practice of extensive reading. Some parts of this paper take the form of a response to Krashen, a noted proponent of “buying books, not computers” if it comes to a choice. I hope to convince the reader that books and computers are now complements rather than choices.
Background: An ESL dialogue
In scientific dialogues within applied linguistics, turn-taking can involve a delay of a decade or more. An example is a recent paper from Stephen Krashen (2003) entitled Free voluntary reading: Still a very good idea which criticizes the findings of a study I was involved in that called into question the amount of vocabulary acquisition resulting from pleasurable, meaning-oriented, private extensive reading (Horst, Author, & Meara, 1996, Beyond a Clockwork Orange). The study found that even with all the usual variables of a pre-post, empirical, extensive reading study held down rather more tightly than usual (e.g., more tightly than in some of Krashen’s own studies) the number of new words learned from reading a complete, motivating, level-appropriate book of 20,000 words was not sufficient to be the main or only source of vocabulary growth for a learner expecting to function in English any time soon in an academic or professional setting. (It should be noted at the outset that vocabulary growth, while only one of the potential benefits of extensive reading along with fluency, grammar, and other types of growth, is often used as simply the most measurable of the various outcomes.)
Krashen (2003) argued that studies like ours typically underestimate the amount of lexical growth that takes place as words are encountered and re-encountered in the course of extensive reading, in his understanding of the term. Many words and phrases are learned that do not appear in test results, but that is because of the crude nature of the testing instruments, which typically have no way of accounting for partial or incremental learning. In fact, he argues, word knowledge is bubbling under the surface as one reads and may appear as a known item only some time later.
Over this period, Krashen’s views have remained largely unmodified. He remains convinced of the value of extensive reading yet has never really been able to prove the case, which ultimately rests on a sort of faith. Our views, on the other hand, have undergone some modification, and in some ways have come more closely into line with Krashen’s own and may even provide these with a firmer foundation than he has so far provided himself. A number of studies by Horst (2000) and Horst and Meara (1999) have investigated incremental vocabulary growth from reading through the use of what they describe as a matrix model. This model borrows the notion of vocabulary knowledge as a scale (including points such as, I do not know this word, I have seen this word, I think I know this word, and I know and can use this word in a sentence), specifically the vocabulary knowledge scale (VKS) as developed by Wesche and Paribakht (1996), but extends the scale longitudinally so that fine-grained word knowledge can be tracked over time as the word is encountered and re-encountered. Seen in a matrix, vocabulary growth from reading is indeed more extensive than it may appear. But is it extensive enough to be a sufficient source of vocabulary growth from reading?
The matrix uses numbers from 0 to 3 to indicate the points on a modified VKS scale, as follows:
0 = I definitely don't know what this word means
1 = I am not really sure what this word means
2 = I think I know what this word means
3 = I definitely know what this word means
These numbers are then placed on a simple two-dimensional graph, with the same numbers appearing both top to bottom and left to right, as can be seen in Figure 1.
      |  0  |  1  |  2  |  3  |
   0  |     |     |     |     |
   1  |     |     |  x  |     |
   2  |     |     |     |     |
   3  |     |     |     |     |
Figure 1: From scale to matrix.
Every cell in the matrix is thus an intersection between two numbers; for instance, cell ‘x’ is at the intersection of 1 and 2. When a number representing a learned word is placed in this cell, it means that the word was rated as a 1 (‘I’m not sure’) after a previous reading encounter, but then was rated as a 2 (‘I think I know’) after a subsequent encounter. In other words, cell intersections represent partial word learning as a result of one further encounter with a word. The movement between 1 and 2, or 2 and 3, represents an increase in knowledge of the word, but not enough of an increase to register on most standard vocabulary tests involving multiple-choice or production formats (for a review of standard vocabulary tests, see Read, 2000).
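The bookkeeping behind the matrix is simple enough to sketch in a few lines of code. The following Python fragment is an illustration only, not the testing apparatus used in the studies; it tallies pairs of pre- and post-reading VKS ratings (0 to 3) into a 4 x 4 movement matrix of the kind shown in Figure 1:

```python
# Sketch (not the studies' actual code): tally pre/post VKS ratings
# into a movement matrix. Rows = rating after the previous reading,
# columns = rating after the current reading.

def movement_matrix(prev_ratings, curr_ratings):
    matrix = [[0] * 4 for _ in range(4)]
    for prev, curr in zip(prev_ratings, curr_ratings):
        matrix[prev][curr] += 1
    return matrix

# A word rated 1 ('not sure') before and 2 ('think I know') now lands
# in cell (1, 2), the cell marked 'x' in Figure 1.
m = movement_matrix([1, 0, 3], [2, 0, 3])
```

Each additional reading of the text produces a fresh matrix of this kind, which is what allows movement below the all-or-nothing threshold to be observed.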
Employing a methodology of repeated readings of a literary novella, and a computer-based testing apparatus that allows us to test large numbers of words in a relatively short time, we have been able to trace the ups and downs of word knowledge that normally pass below the radar of conventional tests. In one study, Horst tracked 300 words through several readings of a German novella, and after each reading identified the number of words that were at each knowledge level as compared to the previous reading. Each additional reading produced a new matrix, with the 300 words distributed slightly differently over its 16 cells for every reading. Authentic data for one pair of readings from this study, as reported in Horst (2000), are shown in Figure 2.
      |  0  |  1  |  2  |  3  |
   0  |  75 |  27 |   9 |   3 |
   1  |   4 |  20 |  20 |   6 |
   2  |   2 |   4 |  13 |  35 |
   3  |   0 |   0 |   7 |  75 |
Figure 2: Movement between readings.
The numbers in the cells refer to the number of words inhabiting each intersection point, with the row label indicating the previous knowledge level for those words and the column label indicating current knowledge. The bold numbers on the diagonal represent the number of words that had not moved between readings (75 words in Figure 2 were rated 0, or unknown, in the previous reading, and were still rated the same following a second reading, etc.). Notice that words to the left of the diagonal are words that have lost ground since the previous reading (rated as less known than previously), while words to the right have gained ground.
Through simple addition, one can see that there are more words above the diagonal (27 + 9 + 3 + 20 + 6 + 35 = 100) in Figure 2 than there are below it (4 + 2 + 4 + 7 = 17), and hence that more are gaining ground than are losing ground. Over the several readings and matrix calculations of Horst’s study, it became clear that it was indeed the case that words were generally making progress over the course of several readings, much of which would nonetheless not have registered on a standard vocabulary test with an all-or-nothing assumption about word knowledge (such as Nation’s classic, 1990, Vocabulary Levels Test, or Laufer & Nation’s, 1999, update). Only 44 of Figure 2’s words (3 + 6 + 35 = 44) had moved into “I know this word,” a movement that might have shown up on a vocabulary test, but another 56 had made lesser gains that probably would not have. To summarize, then, while the implications of this methodology are still being worked out and will appear in forthcoming reports, it is already clear enough that Krashen is right: there is more word learning from extensive reading than meets the eye.
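These tallies are easy to verify mechanically. The following Python sketch (with the data transcribed from Figure 2; this is not the authors’ analysis code) re-derives the gains, losses, and “newly known” counts:

```python
# Sketch: re-deriving the Figure 2 tallies. Rows = previous rating,
# columns = current rating; cells hold word counts (Horst, 2000).
figure2 = [
    [75, 27,  9,  3],
    [ 4, 20, 20,  6],
    [ 2,  4, 13, 35],
    [ 0,  0,  7, 75],
]

# Cells above the diagonal (column > row) are words that gained ground;
# cells below it are words that lost ground.
gained = sum(figure2[r][c] for r in range(4) for c in range(4) if c > r)
lost   = sum(figure2[r][c] for r in range(4) for c in range(4) if c < r)

# Words that moved into rating 3 ('I definitely know') from any lower
# rating: the only movement a standard all-or-nothing test might catch.
newly_known = sum(figure2[r][3] for r in range(3))

print(gained, lost, newly_known)  # 100 gained, 17 lost, 44 newly 'known'
```

The 100 versus 17 asymmetry, with only 44 words crossing the testable threshold, is the core of the argument that conventional tests under-report learning from reading.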
This modification of our views, however, is in some ways rather minor, inasmuch as we still do not consider that extensive reading as traditionally practiced could ever be the only or even main source of vocabulary growth for a second language learner. That is because however incremental the learning of encountered words may be, it still requires that words be encountered in sufficient number, and simple corpus research makes it clear that this will not happen.
How many words are enough to begin a serious undertaking in a second language, such as academic study or professional activity? Vocabulary researchers like Laufer (1992) and Hirsch and Nation (1992) tag the number at a minimum of 3000 word families, provided these are carefully selected for frequency and text coverage. It has also been shown that between six and ten encounters are needed for learning to occur (Zahar, Author & Spada, 2001), and in our own matrix work it seems that at least six encounters are needed for words to travel reliably from rating 0 to rating 3 and stabilize there. Will 3000 word families be met six times through extensive reading?
A computer program called Range (developed by Heatley & Nation, 1994; adapted for the Internet by Author; available at the Compleat Lexical Tutor website, http://www.lextutor.ca/range; shown in Figures 3 & 4) takes a user’s text string as input and determines how often this string occurs in a broad corpus of written English. This corpus (the Brown corpus, Kucera & Francis, 1979) is divided into 15 sublists, from science to fiction to law. For the purposes of the present argument, the Brown corpus can represent the largest and most diverse body of text an ESL reader could possibly read in a year or two of extensive reading (most learners would obviously read both much less and much less broadly). The surprise finding is this: after the most frequent 1000 words of English, words thin out quite rapidly and are rarely met again. Here are some figures from the second thousand and Academic Word List components, which Nation and others include in the necessary lexicon of 3000 word families.
Figure 3: Range for word distributions – input (bottom half of screen, all forms of the family ‘abandon’)
Figure 4: Range for word distributions – output
The distribution of the word family abandon throughout the Brown corpus is requested in Figure 3 and shown in Figure 4. The point to notice is that while the item appears in 13 out of 15 sub-corpora, it appears more than six times in only three of them (press editorials, popular lore, and biographies). Patchy patterns like this are quite general for even medium frequency words. Table 1 shows the distributions in the 15 Brown sub-corpora of six word families from the high frequency 1000 list (not including function words), while Tables 2 and 3 show distributions for six word families from higher up in the 2000 list and another six families from the academic word list (AWL). Readers can visit Lextutor for themselves and enter their own words into the program.
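What a Range-style routine computes can be sketched as follows. This Python fragment is an illustration of the idea only, not Lextutor’s actual code, and the corpus texts and word family used here are toy assumptions:

```python
# Sketch (not the actual Range code): count a word family's occurrences
# in each sub-corpus, then flag which sub-corpora reach the 6+ threshold
# thought necessary for reliable learning.

def family_range(sub_corpora, family):
    """sub_corpora: dict of name -> text; family: set of word forms."""
    counts = {}
    for name, text in sub_corpora.items():
        tokens = text.lower().split()
        counts[name] = sum(1 for t in tokens if t.strip('.,;"') in family)
    present = [n for n, c in counts.items() if c > 0]
    six_plus = [n for n, c in counts.items() if c >= 6]
    return counts, present, six_plus

# Toy two-domain 'corpus' for illustration
corpora = {
    'press': 'they abandon the plan and abandoned it again ...',
    'fiction': 'no instances here at all',
}
counts, present, six_plus = family_range(
    corpora, {'abandon', 'abandoned', 'abandons'})
```

Run over the real Brown sub-corpora, this kind of count produces exactly the patchy distributions reported in Tables 1 to 3 below.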
1000-level word family | Occurrences in 1 million words | Present in how many of 15 sub-corpora? | In how many with 6+ occurrences?
car     | 285   | 13   | 10
house   | 760   | 15   | 14
country | 510   | 15   | 15
able    | 216   | 15   | 15
add     | 715   | 15   | 15
admit   | 473   | 15   | 15
Mean    | 493.2 | 14.7 | 14
S.D.    | 200.5 | .7   | 1.8
Table 1: Word distributions for high frequency words
2000-level word family | Occurrences in 1 million words | Present in how many of 15 sub-corpora? | In how many with 6+ occurrences?
accuse   | 46   | 9   | 3
accustom | 15   | 10  | 0
ache     | 4    | 3   | 0
admire   | 10   | 8   | 0
afford   | 58   | 12  | 4
alike    | 20   | 10  | 0
Mean     | 25.5 | 8.7 | 1.2
S.D.     | 19.7 | 2.8 | 1.7
Table 2: Word distributions for medium frequency words
AWL word family | Occurrences in 1 million words | Present in how many of 15 sub-corpora? | In how many with 6+ occurrences?
abandon     | 59   | 13   | 3
academic    | 95   | 13   | 6
accumulate  | 29   | 6    | 3
achieve     | 223  | 13   | 8
acknowledge | 32   | 11   | 1
acquire     | 100  | 13   | 6
Mean        | 89.7 | 11.5 | 4.5
S.D.        | 65.6 | 2.6  | 2.4
Table 3: Word distributions for AWL words
We can now answer the question whether 3000 word families have any reasonable chance of being met six times through extensive reading. If the Brown corpus represents the language at large, as it was designed to do in 1979 and presumably still does, then it seems clear that an extensive reader following his or her interests through one or two text domains or sub-corpora would meet the most frequent 1000 words in great abundance in any domain, but would meet even slightly less frequent words only intermittently and probably not often enough for reliable learning to occur. A word as common as ache would not be met six times in any of the Brown’s 15 sub-corpora; a word as common as accumulate would be met six times in only three of them. In other words, meeting any significant portion of the critical 3000 words six times apiece in extensive reading is rather unlikely.
In summary, while there may be more word learning than meets the eye from random encounters in extensive reading, as Krashen believes, most words will simply not be encountered often enough, so this fact, though interesting, has little practical consequence. The distribution of words in English simply does not allow a sufficient number of encounters to take place in this manner.
There are a number of teaching strategies that can increase the odds of words being encountered and learned. Some of these involve the direct teaching of vocabulary, for example through the classroom use of word lists or supplementary vocabulary course books, and of course through ad hoc teacher attention to vocabulary queries in class. But there are also strategies that can be devised to increase the odds for extensive reading itself, and many of these involve the exploitation of recent developments in computer technology (assuming that the extensive reading materials are in machine-readable format, as is increasingly the case). The rest of this paper will outline several of these strategies. The means for implementing these strategies are available to teachers or researchers on the website mentioned above. The format of the remaining presentation will adopt a framework of problem, solution, research on the solution, and proposals for distributing the solution to others. Inevitably some of the solutions overlap, but they have been given independent treatment if they involve independent research questions.
Problem 1: The number of encounters with new words is lower than it needs to be because learners do not always recognize a word they have met in text when they meet it again in speech.
Many advanced learners have an extensive lexicon of medium and lower frequency items for which they have only weak or uncertain sound representations. For this reason, if they re-encounter in speech a word previously met in reading, they often do not recognize it as a second encounter or capitalize on the further learning opportunity. How big a difference could it make if learners knew the pronunciation of every word they met in reading? Corpus evidence can provide an idea.
It is well established that conversational English comprises mainly (about 90%) 1000-level word families, and conversely that post-1000 items are mainly to be found in written texts (Stanovich & Cunningham, 1992). But this does not mean that less frequent words are totally absent from spoken English. The same sample of words referred to above was put through another of Range’s distribution comparison routines, this time one comparing similar-sized corpora (roughly 1 million words apiece) of spoken and written British English as found in the BNC corpus sampler collections (top left of the screen in Figure 3). Tables 4 and 5 show the pattern of these distributions for 2000-level and AWL-level medium frequency words.
2000-level word family | Occurrences in 1 million words of writing | Occurrences in 1 million words of speech
accuse   | 64    | 7
accustom | 42    | 2
ache     | 5     | 5
admire   | 61    | 14
afford   | 43    | 78
alike    | 20    | 4
SUM      | 235   | 110
MEAN     | 39.17 | 18.33
S.D.     | 23.03 | 29.52
Table 4: Speech vs. writing for medium frequency words
AWL word family | Occurrences in 1 million words of writing | Occurrences in 1 million words of speech
abandon     | 45    | 6
academic    | 81    | 6
accumulate  | 30    | 11
achieve     | 199   | 91
acknowledge | 34    | 14
acquire     | 158   | 10
SUM         | 547   | 138
MEAN        | 91.17 | 23.00
S.D.        | 71.19 | 33.45
Table 5: Speech vs. writing for AWL words
These tables show that post-1000 items are indeed found a great deal more in text than in talk—more than twice as much for 2000-level words, and nearly four times as much in the case of AWL words. From another point of view, however, one can say that if learners knew how every word sounded that they had met and noticed in reading, then they could increase the number of occurrences of new words, in the sense of recognizing them as re-occurrences when they encountered them again in speech, by as much as 46% (110/235 x 100) for 2000-level words and 25% (138/547 x 100) for AWL words. Of course, as in the case of the Brown corpus above, these occurrences may well be unevenly distributed within the BNC sampler corpora, but these are not broken into sub-corpora so this cannot be easily determined. For instance, acquire in Table 5 could well be piled up largely in the second language acquisition corner of the corpus, so that engineering students might be unlikely to encounter this item.
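The percentage figures above are simple ratios of the table totals, as the following Python sketch shows (the sums are transcribed from Tables 4 and 5; the truncation to whole percentages mirrors the figures in the text):

```python
# Sketch: the potential increase in word encounters if items met in
# writing were also recognized when they recur in speech.
written_2000, spoken_2000 = 235, 110   # 2000-level totals, Table 4
written_awl,  spoken_awl  = 547, 138   # AWL totals, Table 5

# int() truncates, matching the 46% and 25% figures quoted in the text
increase_2000 = int(spoken_2000 / written_2000 * 100)  # 46
increase_awl  = int(spoken_awl / written_awl * 100)    # 25
```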
Some distributional information for medium frequency words in speech, at least as employed in academic contexts, can be gathered from the University of Michigan’s MICASE (Michigan Corpus of Academic Spoken English) corpus and website (Simpson, Briggs, Ovens, & Swales, 2003). This corpus is almost double the size of those consulted above, at 1,848,364 words, broken down by topic, situation and speaker type, and domain (although unfortunately not by the Brown domains, so this comparison is rough rather than precise). Table 6 shows a small sample of AWL words as broken down by topic areas. Again, the reader can expand the sample by visiting the site and entering additional words, at http://www.hti.umich.edu/m/micase/.
AWL word family | Occurrences in 1.8 m words of speech (MICASE): Biology-Health Sciences | Arts, Humanities | Social Sciences, Education | Physical Sciences, Engineering | Total in 1.8 m | Occurrences in BNC spoken, 1 m
abandon     | 4     | 12    | 3     | 36    | 55    | 6
accumulate  | 29    | 2     | 21    | 22    | 74    | 11
achieve     | 10    | 13    | 59    | 11    | 93    | 91
acknowledge | 7     | 20    | 7     | 4     | 38    | 14
acquire     | 4     | 27    | 16    | 26    | 73    | 10
SUM         | 54    | 74    | 106   | 54    | 333   | 132
MEAN        | 10.80 | 14.80 | 21.20 | 10.80 | 66.60 | 26.40
S.D.        | 10.47 | 9.36  | 22.30 | 10.47 | 20.89 | 36.23
Table 6: AWL items in academic speech vs. general speech
Two points emerge from this brief look at the MICASE data. The first is that, again, encounters are likely to be piled up rather unpredictably rather than evenly distributed (as acknowledge is piled up mainly in Arts and Humanities in Table 6). The second is that, nonetheless, if a learner is learning English in order to function within an academic environment, then spoken language within this environment yields a somewhat higher proportion of post-1000 items than does the spoken language generally. Accumulate appears 74 times in 1.8 million words of academic speech, as against 11 times in 1 million words of general English (or, if one can extrapolate, 11 x 1.8 = 20 times in a general speech corpus of equal size). Lesser but still substantial advantages for the academic corpus are shown for abandon, acknowledge, and acquire (but not for achieve). To conclude, it seems safe to say that the 25% increase in occurrences shown above for knowing how words are pronounced in general spoken English could be somewhat greater within target domains such as academic speech.
How can we ensure that learners have full access to the pronunciation of any new word they happen to come across in their reading? Lextutor’s builder routines offer two ways of doing this.
Solution 1: If learners are reading a text on a computer screen and have access to the Internet, then Lextutor gives them the means to access the pronunciation quickly and simply for any word in the text. At one of the website’s Hypertext builder routines (available at http://www.lextutor.ca/hypertext/), learners can enter any text they happen to be reading into a text input and click to transform it into a text with literally every word linked to a text-to-speech engine giving a tolerable (or better) pronunciation of the word. This requires a once-only download of a speech plug-in (free from Macromedia), which once completed allows instant pronunciations that should not distract readers from their reading unduly.
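The transformation the Hypertext builder performs can be sketched roughly as follows. This Python fragment is an illustration only, not Lextutor’s implementation, and the speak() handler named in it is a hypothetical stand-in for whatever call the text-to-speech plug-in actually exposes:

```python
# Sketch (not Lextutor's code): wrap every word of a text in a link
# that would invoke a hypothetical client-side speak() handler, so a
# click on any word yields its pronunciation.
import re

def linkify(text):
    def wrap(match):
        word = match.group(0)
        return f'<a href="#" onclick="speak(\'{word}\')">{word}</a>'
    # match alphabetic words (with apostrophes), leaving punctuation alone
    return re.sub(r"[A-Za-z']+", wrap, text)

html = linkify("Buck did not read the newspapers")
```

Every word in the result carries its own link, which is the essential property of the Hypertext output: pronunciation is one click away from anywhere in the text.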
Solution 2: Of course, there is no guarantee that a single hearing of a word leads to a stable memory for its contours or a stable link between its phonetic and semantic features. If learners or their teachers wish to ensure that a particular set of words is heard again soon, they can make use of Lextutor’s Dictator routine (at http://www.lextutor.ca/dictator/), which transforms any word list into a text-to-speech based spelling dictation activity, in either practice or test format. In Figure 5, a learner has created a training exercise to practice spelling the words he or she hears. The learner clicks a word to hear it, tries to spell it, and is given help with any errors. The help is provided by the resident Guidespell tutor (first piloted in Author, 1997a); here, Guidespell tells the student how many letters were correct in the attempt to spell accompany. When ready, the learner can enter the same words into a Test version of the program, shown in Figure 6, where there is no help but simply a score presented when all words have been entered.
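One simple way a Guidespell-style tutor might count correct letters is by position, as in the following sketch. This scoring rule is an assumption for illustration; the actual Guidespell algorithm may well differ:

```python
# Sketch: score a spelling attempt by counting letters correct in
# position (an assumed rule; the real Guidespell may score differently).

def letters_correct(attempt, target):
    return sum(1 for a, t in zip(attempt.lower(), target.lower())
               if a == t)

# A learner attempting 'accompany' with a missing letter: only the
# first two positions still line up under this simple rule.
score = letters_correct("acompany", "accompany")
```

A production tutor would likely use a more forgiving alignment (e.g., edit distance) so that a single dropped letter does not zero out the rest of the attempt; the point here is only the kind of feedback involved.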
Figure 5: Dictator training activity under way.
Figure 6: Dictator in test mode.
Hypertext and Dictator thus provide two approaches to helping learners form sound-spelling correspondences, with the goal of increasing the likelihood that words met in text will be recognized when they are met again in speech, and hence encountered often enough to be remembered and learned.
Research on Dictator: Completed in December 2004, Dictator has not yet been subjected to substantial empirical testing.
Current work on Dictator (December 2005): To fully integrate this somewhat drill-like component within a reading-for-meaning context, an interface is being developed that allows learners to click a word into a text box for later relaying to the Dictator routine, for offline practice online, as it were. A prototype of this feature can be seen at www.lextutor.ca/CallWild/. As readers proceed through the text Call of the Wild, they can alt-click any word into the gray box at the top of the page, and then at some later point send these words to Dictator by clicking ‘spell_it.’ This linkage is shown in Figure 7. Sometime in 2006 it will become possible for learners to link their own texts to this resource.
Figure 7: Linking resources.
Problem 2: Words are often simply forgotten between encounters, even within the same text.
Research indicates that, on the one hand, new words tend to get ignored if they require a great deal of effort to process, but, on the other hand, they tend to get forgotten if they require very little effort to process (Mondria & Wit-de Boer, 1991). In other words, the conditions for retention of words from reading are rather particular and may present themselves only occasionally, which may be part of the reason that a minimum of six and as many as ten encounters are needed even for initial retention. Learners need some way to keep track of whether they have seen a word before and, if possible, to revisit the previous occurrence without incurring a major exit from their current reading.
Solution 1: One solution is to offer readers a quick way of clicking on a word and recording it for later reflection without losing the thread of the story. The user shown working in Figure 8 has clicked (with the Alt key held down) on the words toil and groping from the first chapter of Jack London’s classic tale Call of the Wild, located at http://www.lextutor.ca/CallWild.
Figure 8: Tagging toil and groping for post-reading attention.
Solution 2: Another solution is to link a text to a Story Concordance. A reader clicks on any word in the reading text and receives a full accounting of all the occurrences of the same word, whether already seen or yet to be seen in other parts of the story. The reader in Figure 8 has just found out that the rather odd word toil might be worth paying attention to, as it occurs in five of the chapters to come.
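The chapter-by-chapter lookup behind such a Story Concordance can be sketched minimally as follows (an illustration only, with toy chapter texts; the real routine works over the full tokenized story):

```python
# Sketch: count a clicked word's occurrences in each chapter of a
# story, as a Story Concordance-style lookup might (much simplified).

def chapter_counts(chapters, word):
    """chapters: list of chapter texts; returns per-chapter counts."""
    word = word.lower()
    return [ch.lower().split().count(word) for ch in chapters]

# Toy three-'chapter' story: the clicked word recurs later, which is
# exactly the signal that it is worth a reader's attention.
counts = chapter_counts(
    ["toil and trouble", "no sign of it here", "toil again toil"],
    "toil",
)
```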
Research on Story Concordance: An earlier version of this concept was tested in Author, Greaves, and Horst (2001), with a strong learning effect found for a suite of text-integrated resources, although story concordancing has not been tested independently. There is as yet no Builder version of the program, owing to the complexity of handling user text input in the form of distinct chapters or sections to provide the output shown in Figure 7.
Problem 3: The semantic features comprising word meanings are distributed over several occurrences of words, so that integrated meanings are slow to construct.
As already noted, post-1000 words are distributed thinly in natural language. But the semantic features comprising the meanings of these words are distributed even more thinly. The whole set of semantic features underpinning the concept represented by a word of any complexity is inevitably not present in every occurrence. What this means for a learner building a vocabulary from reading is that even if a word is not forgotten between encounters, there may not be enough information in a single encounter, or even in a number of encounters, to provide more than a partial sense of its meaning.
For instance, the semantic deep structure of a common word like work embraces features ranging from doing a job for pay (‘work at a store’), the job itself (‘it’s my work’), an effort expended not necessarily for pay (‘work on my car’), the correct functioning of a device (‘it works now’), to instances of high art (‘a work of Shakespeare’), but only one or two of these features are present in a sentence like “What are you working on?” The word learner is thus required not only to remember features of words from occurrence to occurrence, but at the same time to be revising, updating, and especially integrating hypotheses about how these fit together.
Solution: One way of showing learners several pieces of a word’s meaning all at once, so they need not attempt to gather them all up for themselves with attendant forgetting and backtracking, is to present the word in a concordance. A concordance for work from the Brown corpus is shown in Figure 9. Even a relatively small concordance reveals to an observant (or possibly to a trained) learner such information as the main parts of speech for the word (the work, we’ll work, to work), several of its senses (take my car to work, work on my kicking, an idea that would not work), and its main collocations (work for, work at, work out, and especially work on).
1 after the board of canvassers completes its work. A difference of opinion arose between Mr
2 Authority bonds for rural road construction work. #A REVOLVING FUND# A01 1310 4 The depa
3 ghes Steel Erection Co. contracted to do the work at an impossibly low cost with a bid that wa
4 lta Sigma Pi at Lamar Tech, and did graduate work at Rhodes University in Grahamstown, South A
5 bomb tore his car apart as he left home for work. Battalion Chief Stanton M. Gladden, 42,
6 blic relations director, resigned Tuesday to work for Lt. Gov. Garland Byrd's campaign. A01 1
7 07 1230 7 #MISSIONARY EXPLAINS# "I don't work for the Government", the American said. "I'm
8 0 12 scrimmaged for 45 minutes. "We'll work hard Tuesday, Wednesday and Thursday", Meek
9 home so that he could take his other car to work. "I'd just turned on the ignition when th
10 school teaching certificate. A normal year's work in college is 30 semester hours. A02 1430
11 A. Berger firm, a Philadelphia builder, for work in the project. The second agreement perm
12 rk out about an hour on Saturday, then we'll work Monday and Tuesday of next week, then taper
13 f cars "might not be realistic and would not work". Mrs. Molvar asked again that the board
14 ARTIST# Mrs. Monte Tyson, chairman, says the work of 100 artists well known in the Delaware Va
15 overhauling of 102 joints. The city paid for work on 75, of which no more than 21 were repaire
16 ly involve failure to perform rehabilitation work on expansion joints along the El track. The
17 e. "This year, coach Royal told me if I'd work on my place-kicking he thought he could use
18 ales will begin and contracts let for repair work on some of Georgia's most heavily traveled h
19 - His miracles - His substitutionary work on the cross - His bodily resurrection fr
20 g, said the transit company is reviewing the work on the El. "We want to find out who knew
21 rty, appeared on payment vouchers certifying work on the project. Varani has been fired on cha
22 as completed after nearly eighteen months of work on the question of the organization of the U
23 bly will have a good scrimmage Friday. We'll work out about an hour on Saturday, then we'll wo
24 several more drafts". Salinger said the work President Kennedy, advisers, and members of
25 aborers go home Tuesday night for some rest. Work resumed Wednesday, he said. Mr. Schaefer
26 , stressed the need for the first two years' work. "Surveys show that one out of three Amer
27 e traditional visit to both chambers as they work toward adjournment. Vandiver likely will men
Figure 9: Lines from the Brown corpus for work.
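A minimal keyword-in-context (KWIC) concordancer of the kind that produced lines like these can be sketched in a few lines. This is an illustration, not the Lextutor concordancer, and context width here is measured simply in characters:

```python
# Sketch: a minimal KWIC concordancer. Each hit is shown with a
# fixed-width window of left and right context, keyword bracketed.

def kwic(text, keyword, width=20):
    lines = []
    lower = text.lower()
    start = 0
    while True:
        i = lower.find(keyword.lower(), start)
        if i == -1:
            break
        left = text[max(0, i - width):i].rjust(width)
        right = text[i + len(keyword):i + len(keyword) + width]
        lines.append(f"{left}[{keyword}]{right}")
        start = i + len(keyword)
    return lines

sample = "We'll work hard, then work out on Saturday."
for line in kwic(sample, "work"):
    print(line)
```

The vertical alignment of the bracketed keyword is what makes the formal patterns (parts of speech, collocations) jump out, and is also the source of the chopped-off-context problem discussed next.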
However, there are at least three major problems with using concordances as an aid to building a second lexicon. First, learners do not usually have a concordancer handy when they are reading, but would have to write the word down with a certain amount of context and look it up in a concordancer later. Second, most full-blown corpora like the Brown are likely to include a high proportion of other words the learner will not know in addition to the one being looked up. Third, the single chopped-off lines of the concordance format, while designed to highlight immediate formal patterns such as collocation, also reduce the amount of semantic context to a level below what learners may need to identify semantic features and integrate meanings of new words.
Figure 10: Making concordance information accessible and comprehensible.
Lextutor offers
teachers and learners responses to each of these problems, all of which can be
seen in further developments of the Call of the Wild Story Concordancer. The first
development addresses the access problem. As already seen, any word in the
story when clicked generates a concordance instantly in a window within the
same eye-span, or frameset. The learner can thus compare several examples of the
word along with the original at the same time with minimal exits from the
story. The second problem, of unknown items within the concordance, is
addressed by the fact that the concordance is recursive (any word clicked in
the concordance itself generates a new concordance which may shed light on the
unknown word), and that it derives not from a general corpus but from a
collection of other works by the same author, in this case Jack London. A
same-author corpus should mean that the range of lexis and contexts is
somewhat constrained relative to a general corpus, that extensive recycling is
built in, and that the tone and style are consistent enough for learners to
habituate themselves to. In the screen print shown in Figure 10, the user has
clicked the link “progeny in
other Jack London stories” (not shown) and is presented with uses of
this word from other works like White Fang and Martin Eden. The
third problem, the small contexts and chopped-off lines, is resolved by
building in a mouse select-and-release feature: the learner selects
several words, releases, and is delivered a series of much expanded contexts,
either from the original text or from throughout the London opus, depending
on where the request is launched (as shown for the phrase helpless progeny
in Figure 11). For truly astute learners, this same feature also allows
exploration of an author’s trademark collocations and grammatical preferences.
Figure 11:
Comprehensibility through same-author corpus
Figure 12: Linking
Brown concordances to user input texts
Research: At least two interesting questions can be asked
about this linked-concordance work. The first is the general question of
whether text-linked resources are a help or a hindrance to second language
readers. Some research suggests that the main effect of adding any resources to
a reading task is simply to increase the cognitive load. This question is
currently receiving a good deal of attention in the research literature (e.g.,
Chun & Payne, 2004). The second and
more specific question is whether working with concordances, however
accessible, however well integrated into an ongoing reading task or linked to a
tailor-made corpus, can actually facilitate concept integration for language learners.
In one of my own
research studies (Author 1997a, 1997b; 1999), I proposed that degree of
transfer of word knowledge to a novel context should reflect the degree to
which a word had achieved a complex semantic representation, inasmuch as a
novel context is unlikely to have the exact semantic features present in the
word’s initial encoding. Subjects in a series of experiments learned words over
several weeks using either small bilingual dictionaries or else purpose-built,
monolingual concordances. They were then asked to match learned words to short
definitions as well as integrate them into a rational cloze passage for a text
they had never seen (that embedded the target lexis in contexts made up of
words that had been previously taught and tested). These tasks are shown in
Figure 13. After an extensive training period, students in both control
(dictionary) and experimental (concordance) groups had improved equally in
their ability to match words to short definitions, but the experimental group
had significantly greater ability to apply learned words to novel contexts.
These results are shown in the line graphs in Figure 14. (These figures, along
with further details, are available online in the author’s doctoral
study.) This result was replicated a number of times and at a number of
levels.
Figure
13: Testing two kinds of lexical knowledge – definitional and contextual
Figure 14: Better
transfer to novel texts for concordancers
Current work on Click-On Concordancing: Builder
versions
Lextutor can incorporate a user’s text into a
suite of reading resources including most
of those seen on the Call of the Wild page, available at http://www.lextutor.ca/hypertext/. A screen print
based on a user’s text is shown in Figure 12. In Figure 12, the corpus accessed
by clicking on words is the Brown corpus, which of course has the problems
mentioned before. It is not currently possible for teachers or learners to
load their own corpora into a web-based concordancer. However, a number of
experiments are under way on Lextutor to allow significant upload of user texts
of up to 50,000 words (about the size of a Jack London story), including a Text
Concordancer (which the reader can inspect at http://www.lextutor.ca/concordancers/text_concord/
). Also, more learner-friendly corpora are being developed to replace the Brown
in the Hypertext routines, including a corpus of simplified readers that has
recently become available.
Current work on Click-On Concordancing:
Better integration with reading for meaning
As with the Dictator routine mentioned above, experiments are under way
to allow learners to store up words for later submission to a concordancer with
no or minimal exit from the ongoing reading task. The problem here of course is
that while Dictator is inherently designed to handle several words at once,
concordancers normally deal with one word or phrase at a time. A current plan
is to send several words to a multi-concordancer all together, from a stored
text box which a learner fills with words for later consideration. A trial
text-box submission can be seen in conjunction with the Academic Word List at www.lextutor.ca/ListLearn (click AWL); multi-concordancing is discussed in greater detail in another
context below.
Problem 5: Beyond
the low and medium frequency levels, words appear so infrequently that learners
have almost no chance of learning any significant portion of them.
As vocabulary
acquisition proceeds beyond the 3000 word level, the likelihood of learners
meeting many of the remaining 20,000 or so word families of English known to
native speakers becomes very poor indeed. Some learners may not aspire to know
all the words that native speakers know, and for these learners 3000 may be enough,
or as counselled by Nation and colleagues, it may be time for their efforts to
focus on strategy development or reading within an academic or professional
domain (Nation, 2001). However, many learners do aspire to full membership in a
second community or culture, and for these learners post-basic vocabulary
growth is a slow and haphazard process.
Figure
15: Sharing new acquisitions.
Solution: Advanced vocabulary acquisition is normally a
solitary process, but it need not be. In a class of 20 advanced learners, if
each one met 50 words in a month of extensive reading, then that would amount
to 1000 words (possibly with some redundancy) for the group as a whole.
Networked computing should in principle make it possible for such a group to
share lexical acquisitions, while at the same time providing for further
encounters, retrievals, and novel contextualizations, in line with points
raised above. Such is the goal of the Group Lex Database at http://www.lextutor.ca/group_lex/demo/, a set of web
pages allowing learners to enter words from their reading, share words with
others, quiz themselves on some or all of the words, and quiz themselves with
the same words in novel contexts. Figure 15 shows the words as initially
entered (in this case, by random visitors to Lextutor). Several areas on the
screen shown are hyperlinked to different sorting and extraction options; for
example, clicking on a name will extract all the entries for that name, or
similarly for a subject area like ‘Arts’ or other groupings. The quiz option
allows a user to select several words for retrieval practice, as shown in
Figure 16. This retrieval is of course
within the original context, but a click on the ‘Tougher Quiz’ option takes
quiz-takers to a new task (Figure 17) that asks them to plug these same words
into gapped multiple concordance lines from the Brown corpus – i.e., to
transfer their meanings to a novel context (to re-visit a theme from
above).
Figure
16: Learner-designed, collaborative instruction
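The ‘Tougher Quiz’ transfer task can itself be sketched programmatically: given a bank of learned words and one fresh concordance line per word, blank the target out of each line and shuffle the word bank. This is a hypothetical sketch, not the Group Lex code:

```python
import random
import re

def make_transfer_quiz(pairs, seed=None):
    """pairs: list of (target_word, concordance_line) tuples, where each
    line is a novel context containing its target. Returns gapped lines
    plus a shuffled word bank for learners to plug back in."""
    gapped = []
    for word, line in pairs:
        gapped.append(re.sub(r"\b" + re.escape(word) + r"\b", "_____",
                             line, flags=re.IGNORECASE))
    bank = [word for word, _ in pairs]
    random.Random(seed).shuffle(bank)
    return gapped, bank
```

The pedagogically interesting part is the choice of lines: drawing them from a corpus the learner has not read (the Brown, in the Lextutor version) is what makes the task a test of transfer rather than recall.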
Builder versions: The complexity of
these multi-page, database-oriented programs has until now delayed the
development of Builder (i.e., user-produced) versions of Group Lex. However,
several dedicated versions have been set up for roughly 25 teachers in various
corners of the ESL world over 2004-2006, some of them reporting on their work
at international conferences (e.g., TESOL 2005).
Research: Some initial
research has been concluded on learner use of Group Lex and is reported in a
paper in Language Learning & Technology (Horst, Author &
Nicolae, 2005). Questions so far investigated include learning effects,
resource use preference, and ability of learners at different levels to
generate contexts and definitions that their peers can make sense of in the
quiz routines.
Further
development: First, programming is almost completed to connect Group Lex directly to
learner texts, just as the various resources are linked to texts in some of the
examples above. Learners will use the mouse to select an example sentence
containing a target word, which on mouse release will be sent to an input form
for Group Lex. Second, code is being developed to allow for teacher-controlled
auto-archiving of a word set once it has reached a certain size, an assigned
text is completed, and so on.
Figure
17: Transfer to novel contexts - revisited
To conclude Part
II, it seems clear that properly designed computer programs, properly used, can
substantially increase the number of exposures to new words through reading. But
is the increase substantial enough? Some quantification and empirical investigation has
been completed, but more remains to be done. In the meantime, computer
programs can not only increase the number of exposures but also help teachers
and learners do more with the exposures available. This is the topic of the
next section.
Up to now this
paper has shown a number of ways computers can increase the number of exposures
to words. Now we turn to a different dimension, what computers can do to
improve the quality of an individual exposure. When a new word is met, there
are two things a learner can do with it if he or she decides to give it some
attention. One is to look it up in a dictionary, and there are many high
quality learner dictionaries now available for this purpose. The other is to
attempt to infer a meaning from the ongoing context. However, both these
strategies present problems. Dictionaries take the reader out of the text,
physically and mentally, and almost certainly disrupt the flow of reading.
Contextual inference (and probably successful dictionary use as well) is only
reliable if 95% of the words in the context are known (Laufer, 1992; Nation,
2001), and this is rarely the case for all but the most advanced learners.
Reading on a computer may be able to address both these problems.
Problem 1:
High-quality dictionaries can improve text readability, but at the same time
they disrupt the flow and possibly the pleasure of reading.
Several publishers
of ESL materials, notably Longman and Cambridge, have recently invested in well
researched and designed learner dictionaries for intermediate and advanced
learners. Nonetheless, studies of dictionary use (e.g., Hulstijn, Hollander, &
Greidanus, 1996) suggest that however beneficial even sophisticated language
learners may believe a dictionary to be, they will not use one extensively
while reading if they believe look-ups entail an exit from the reading task
itself. This of course might happen less if the resources could be
directly integrated into the text that the learner was reading. This ideal has
recently become possible as free online versions of these dictionaries have
become available.
This is precisely
the object of the dictionary option at http://www.lextutor.ca/hypertext.
The reader copies a text into a Web form, chooses from a menu one of four
excellent online dictionaries (including the online versions of the Longman LDOCE
and the new Cambridge Advanced Learner’s), and the program wires text
and dictionary together so that a click on any
word in the text produces the relevant definition in a window just beside
the text. (It should be noted that this any-word feature depends on the fact
that all of these dictionaries are fully lemmatized, or fleshed out as word
families, so that clicking on cats in the text produces the entry for cat,
for instance, as opposed to a ‘Not Found’ notice.) The learner working in
Figure 18 has connected the Cambridge Advanced Learner’s Dictionary to a text
on Cell Phones and Driving, and run immediately into an unknown word, ban.
A click on the word generates a well thought-out definition in roughly one
second. Is this significant disruption, or not?
Figure
18: High-quality any-word click-up definitions online
Research: A repeated-readings case study by Author, Greaves
and Horst (2001) compared two Anglophone learners reading similar-sized
extensive texts in their target languages, one (the German learner) reading a
German novella on paper, and the other (the French learner) reading a French
version of the Call of the Wild page adapted for Guy de Maupassant’s
novella Boule de Suif (www.lextutor.ca/bouledesuif/). Vocabulary
expansion resulting from the readings was used as the measure of reading
success and learning value. The offline reader simply read his text the
required number of times, while the resource-assisted reader could access a
dictionary and several other resources on a click-on basis. More than 60% of
the resources used involved the dictionary. Learning was tracked for several
hundred single-occurrence words in both texts (a quantity of test items made
possible through the employment of a computer). Vocabulary growth was roughly
double for the dictionary-linked reading experience, which was perhaps not
surprising. More interesting, however, for the present research question, was
that time-on-task was no greater for the online reader. In other words,
look-ups were not consuming large amounts of reading time, and seem to have
been an adjunct to reading rather than a disruption of it, as indicated by the
subject’s report as well as the time record.
Research – open
questions: The research reported here was merely
preliminary. First, no similar experiment has yet been undertaken for a larger group of
learners. Second, it is still not fully
established whether easy look-ups do or do not constitute a significant exit
from reading the text (integrating propositions, integrating prior knowledge,
constructing extra-textual inferences, etc.) Third, studies are needed that
compare the effects of online vs. offline resources for both vocabulary growth
and reading comprehension.
While text comprehensibility is somewhat linked to learner strategies,
task demands, and topic familiarity, as a general rule a readable text is one
where 95% of the vocabulary is known to the reader, or in other words where the
new-to-known ratio is no greater than
1:20. This is the point where enough of the text is in focus for
comprehension tests to be passed, new vocabulary reliably inferred, and reading
to become less effortful and more pleasurable (Laufer, 1992; Nation, 2001). But
how can such texts be located, or created?
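The 95 per cent figure is easy to compute for any given text and known-word list. A rough token-level sketch follows; real profiling works with lemmatized word families, which this omits:

```python
import re

def coverage(text, known_words):
    """Proportion of running words (tokens) that are on the known list;
    0.95 corresponds to the 1:20 new-to-known readability threshold."""
    tokens = re.findall(r"[a-z']+", text.lower())
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)
```

A text scoring below 0.95 against a learner's estimated vocabulary is, by this criterion, a candidate for adaptation.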
Graded readers are often able to provide opportunities to meet new words
in low-density environments, and there can be little doubt that these readers
should be in far greater use than they are at present. However, there are two
problems with implementing graded readers: they tend toward children’s interest
levels (adventure stories and the like), and it is difficult to build a
collection that caters to a wide enough range of interests, particularly adult
interests. Successful graded reading programs ultimately depend on a growing
supply of teacher adapted texts. And yet how such texts are to be adapted, even
just from the lexical point of view, is not obvious. The rest of this paper
introduces tools that can help with the job.
Solutions
Successful grading of texts depends, first, on having some way of
defining the lexical levels of both learners and texts, and fortunately it is
possible to do this. On one side, vocabulary tests are available (with
limitations as noted in Part I above) that can indicate a learner’s rough
vocabulary size in terms of 1000 word-frequency levels (as devised by Nation,
1990; Laufer & Nation, 1999; Schmitt, Schmitt & Clapham, 2001; some of
these available at www.lextutor.ca/levels/). On the other side, a
computer program is available that analyses texts in terms of this same
1000-levels scheme. This program is Nation and Heatley’s (1994) Vocabprofile,
also adapted for Internet and available through Lextutor (at www.lextutor.ca/vp). In principle, these
two analyses should put texts and learners into contact with each other. Texts
can be found, written, or adapted to match particular learners’ abilities—not
easily, of course, but this technology at least makes it possible.
For example, the profile
for Chapter 1 of the fictional work Call of the Wild shown in Figure 19
indicates that just over 80% of its words come from the first 1000 word
families of English, so that a somewhat larger vocabulary than 1000 word
families would be necessary for a learner to make any sense of this particular
text. A learner knowing 1000 word families would be facing a new-to-known ratio
of about one word in five, not one word in 20. For such a learner, a text with
90% or higher at the 1000 level would be more suitable.
Figure 19: VocabProfile for Call of the Wild, Chapter 1
One potential problem
with the above analysis, however, is that over the course of an extended text,
words are presumably repeated a good deal and thus have a chance of being
learned on the fly, so that the new-to-known ratio could be substantially
reduced by the end of the story. To what extent does this happen? Are
intermediate learners rewarded with reasonable new-to-known ratios if they
struggle through the first few chapters of a book written for native speakers,
like Call of the Wild?
The answer to this
question is provided by another of Lextutor’s analytic tools, Text_Lex_Compare
(www.lextutor.ca/tools/text_lex_compare/),
which identifies the new (i.e., different) words in a second text in comparison
to a first text or set of previous texts. This program reads in texts of up to
twenty chapters and identifies, counts, and lists the new items appearing in
each. It further automatically links these items to the VocabProfile program
mentioned just above for frequency evaluation. This program input is shown in
Figure 20, with the first two chapters of the same story in position for a
lexical comparison. The output is shown in Figure 21.
Figure 20: Text_Lex_Compare input
Figure 21: Text_Lex_Compare output
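The comparison itself can be sketched in a few lines: collect the types seen in all previous chapters, then count the types and tokens in the next chapter that fall outside that set. This is a simplified token-level sketch of what Text_Lex_Compare does; the real program also lemmatizes and links its output to VocabProfile:

```python
import re
from collections import Counter

def new_lexis(previous_chapters, next_chapter):
    """Types and tokens in next_chapter not seen in any previous chapter."""
    tokenize = lambda t: re.findall(r"[a-z']+", t.lower())
    seen = set()
    for chapter in previous_chapters:
        seen.update(tokenize(chapter))
    new = Counter(t for t in tokenize(next_chapter) if t not in seen)
    return {"new_types": len(new), "new_tokens": sum(new.values())}
```

Feeding chapters in sequence, with each chapter added to the "previous" set before the next comparison, yields exactly the chapter-by-chapter figures reported in Table 7.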
The output shows that
there are 719 different new words in the second chapter, represented in 964
running words or tokens. Clicking the VP button and sending these words
to VP analysis subsequently reveals that 41% of these words are from the most
frequent 1000 words of English. In other words, almost 60% of them are
relatively infrequent for a learner who knows 1000 words. So far, then, this
analysis suggests that the lexis of Call of the Wild offers a fairly unfriendly
lexical ratio for the intermediate learner, but the pattern must be worked out
for the rest of the text – which is what the series of upload inputs in the
bottom half of Figure 20 makes possible. For example, the analysis of Chapter 3
shows how much new lexis it presents with respect to both of the preceding
chapters, and so on. The result of this analysis for the entire volume is shown
in Table 7.
Chapter | New types | New tokens | Per cent 1k items
2       | 743       | 982        | 37
3       | 800       | 983        | 29
4       | 351       | 401        | 27
5       | 632       | 893        | 36
6       | 565       | 733        | 32
7       | 633       | 795        | 25
Mean    | 620.7     | 797.8      | 31.0

Table 7: New lexis chapter by chapter
Figure 22: Little
reduction in the diet of novel lexis
Table 7 clearly suggests
that the flow of new lexis never abates in a text designed for native speakers.
Indeed, the third highest number of new word types appears in the final
chapter. Further, the Vocabprofiles for these new items show them to be mainly
post first-2000 items, that is to say potentially difficult items that are
fairly rare and will not necessarily repay the investment of learning. But the
new-to-known ratio is the biggest problem. With the exception of the third
chapter, these chapters are about 3500 words in length, and the number of new
word tokens averages almost 800. The ratio, in other words, is
about eight new words per 35 running words, or nearly one new word in four,
rather far from one in 20. Similar findings are available for other books written with native
speakers in mind (Conan Doyle’s Hound of
the Baskervilles can be downloaded from Lextutor’s Text_Lex_Compare page
for readers to test this assertion for themselves).
It is a truism that our
learners ought to be reading graded texts, but with the research findings and
technologies now at our disposal we are in a position to say very clearly why
this is so. Text_Lex_Compare was fed the seven chapters of the Penguin/Longman
graded version of the same Call of the Wild, with results as shown in
Table 8 and Figure 23.
Chapter | Total words | New types | New tokens | Per cent 1k items | New/total | Proportion | New-to-known ratio
Ch 2    | 876         | 131       | 193        | 79                | 193/876   | 0.22       | 1 to 4.5
Ch 3    | 1573        | 116       | 199        | 64                | 199/1573  | 0.12       | 1 to 8.5
Ch 4    | 1272        | 69        | 111        | 56                | 111/1272  | 0.08       | 1 to 12.5
Ch 5    | 1178        | 49        | 112        | 40                | 112/1178  | 0.09       | 1 to 11.5
Ch 6    | 1584        | 63        | 139        | 45                | 139/1584  | 0.08       | 1 to 12.5
Ch 7    | 1838        | 50        | 110        | 53                | 110/1838  | 0.059      | 1 to 19
Mean    | 1386.8      | 79.7      | 144.0      | 56.2              |           |            |

Table 8: Results for the graded Call of the Wild
Figure 23: Fewer and declining number of new
word types
The adaptors of this
text have produced a far more manageable proportion of new lexis, more than
half of it within the first 1000 level (56.2%), and moreover a decreasing
amount of it as the novel proceeds, indicating a good deal of recycling of
known items. Most interestingly, by the final chapter, the new-to-known ratio has
actually come close to the 1-in-20 target (1:19 to be precise, as shown in the
final ratio of Table 8). This means that if learners had learned all the words
that appeared in the previous chapters, then by the final chapter they would be
reading with a new-word density of just over one new word in 20. For the other
six chapters, of course, the density is higher than that, which I believe is an
argument for linking even simplified readers to a relevant selection of the
computer based learning resources described above. And this, in turn, is an
argument for providing learners with simplified extensive texts that are
machine-readable.
Where would a varied,
multilevel library of machine readable extensive readings come from? Probably
not from the big commercial publishers. While companies like Longman have
produced an impressive paper collection of graded materials, mainly in the realm
of fiction classics, they have also been quite successful (unlike music
publishers) at making sure that teachers and learners pay for anything they
get. This is not a criticism of the publishers; text adaptation is hard work
and the publishers are entitled to recoup their investment.
The Internet is lacking
in very few types of texts, but one of the few is simplified reading materials
for language learners. While preparing this piece I sent out a plea to the
extensive reading community (via the Extensive Reading Pages website) to inform
me of any free online sources of extensive readings, and learned that there is
apparently only one in existence, an interesting but modest UK site named Blue Yonder. This lack of available
materials led me to conclude that a complete and useful library of graded
readings, particularly at adult interest levels and including non-fiction texts
as well as fiction, with vocabulary level and new word density rates
publicised, can probably only be produced by teachers and course designers
themselves. This would not be easy, but of course the fruits could be shared
over the Internet, perhaps at a dedicated Website. The resource
Text_Lex_Compare both shows us why we need to do this, and along with
VocabProfile provides the tools to get on with the job.
How would the
simplification process work? Assuming a text is in machine readable format, it
can be run through VocabProfile and its potentially difficult or unuseful
vocabulary identified (difficult in terms of the intended readership’s lexical
profile). Decisions would then be made as to whether each word was a proper
noun or place name posing no problem and hence could be recategorized as a
common item, was a crucial or repeated item requiring a contextual gloss in
more basic language, or was neither and should be written out of the text. When
the profile of the text as a whole shows roughly 5 per cent challenging yet
learnable vocabulary, the chapters can then be run through Text_Lex_Compare in
sequence to determine whether the density is proportional over the course of
the text. If not, further modifications are necessary. This is not easy work;
anyone who has done it knows why Longman and the others take such care with the
distribution of their simplified
readers. But with these computational tools it is feasible work.
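The triage just described can be partly automated. Below is a rough sketch, assuming a target-level word list and a crude capitalization test for proper nouns; every decision would still need a human eye, and the function name is my own:

```python
import re

def triage_offlist(text, target_list, min_repeats=2):
    """Sort off-list words into the three decision piles: probable
    proper nouns (capitalized in mid-text), repeated items worth a
    contextual gloss, and one-off items to write out of the text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    counts, caps = {}, set()
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in target_list:
            continue
        counts[low] = counts.get(low, 0) + 1
        if tok[0].isupper() and i > 0:  # crude: misflags mid-text sentence starts
            caps.add(low)
    proper = sorted(w for w in counts if w in caps)
    gloss = sorted(w for w in counts if w not in caps and counts[w] >= min_repeats)
    rewrite = sorted(w for w in counts if w not in caps and counts[w] < min_repeats)
    return proper, gloss, rewrite
```

The adapter would then recategorize the proper nouns as common items, gloss the repeated items in more basic language, and write the one-off items out, re-running VocabProfile after each pass.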
Conclusion
I hope I have
convinced the reader that the role of the computer within the expanded universe
of text can and should go well beyond the functions of delivery, distribution,
and printing.
At the top of this
paper I proposed that “computer programs, accessing large shared text
repositories, have a tremendous potential to both resolve old questions for
teachers/course designers, and provide new and unique opportunities for large
numbers of learners at low cost.” Within one domain, extensive reading, and
using vocabulary growth as the index of the success of extensive reading, I
have shown how corpus analysis can define some of the key problems of growing
a lexicon through reading, and how the networking of different kinds and
forms of texts on the learner’s computer screen should be able to solve them.
The key problems of
learning through extensive reading are clear. Corpus analysis shows that
post-1000 level words are unlikely to be encountered in natural reading in
sufficient numbers for learning to occur. VocabProfile analysis shows that the
amount of new vocabulary in natural texts is likely to be severely at odds with
both the lexical level and learning capacity of intermediate learners.
Text_Lex_Compare further shows that the rate of new-word introduction in a text
designed for native speakers is far more than these learners are able to cope with.
And yet these same tools can also be employed positively, to help with the
adaptation of texts that learners can read and learn from.
The long-term goal
is to build a shared free online universal library of graded reading materials.
The short term goal is to get available texts online and help learners use the
various tools described above that can proliferate encounters, keep track of
encounters, multi-contextualize meanings, and provide minimal-disruption links
to top quality learning resources. Only local teachers and course designers can
accomplish any significant part of this in any coherent way. Ironically, Lextutor’s
records and user correspondence show that the main users of the Website’s
learning tools at present are individual learners. In other words, the market
is there.
Krashen (1989)
remarked in a deservedly famous paper on vocabulary growth from reading that a
number of books can be purchased for the price of one computer, implying that
the books were the wiser choice. In 2005, books and computers are less a choice
than a partnership.
References
Author, T.,
Greaves, C., & Horst, M. (2001). Can the rate of lexical
acquisition from reading be increased? An experiment in reading French with a
suite of on-line resources. In P. Raymond & C.
Cornaire (Eds.), Regards sur la didactique des langues secondes (pp. 133-153). Montréal: Éditions
logique.
Author, T. &
Stevens, V. (1996) A principled
consideration of computers and reading in a second language. In M.
Pennington (Ed.), The power of CALL (pp. 115-136). Houston: Athelstan.
Author, T. (1997a).
Is there any measurable learning from hands-on concordancing? System,
25(3), 301-315.
Author, T. (1997b). From concord to lexicon: Development and test of a corpus-based lexical tutor. Concordia University: Unpublished PhD dissertation. [Available at http://www.nlc-bnc.ca/obj/s4/f2/dsk3/ftp04/nq25913.pdf .]
Author, T. (1999). Applying constructivism: A test for the learner-as-scientist. Educational Technology Research & Development, 47 (3), 15-33.
Blue Yonder website
for extensive reading. [Online]. Accessed 2005 April 20, at http://gradedreading.pwp.blueyonder.co.uk .
Chun, D., & Payne, S. (2004). What makes
students click: Working memory & look-up behavior. System, 32, 481-503.
Extensive Reading Pages website. [Online: http://www.extensivereading.net/.]
Heatley, A. and Nation, P. (1994). Range. Victoria University of Wellington, NZ. [Computer program, available at http://www.vuw.ac.nz/lals/.]
Hirsh, D., & Nation, P. (1992). What
vocabulary size is needed to read unsimplified texts for pleasure? Reading
in a Foreign Language, 8(2), 689-696.
Horst, M. (2000). Text encounters of the frequent kind: Learning L2 vocabulary from reading. University of Wales (UK), Swansea: Unpublished PhD dissertation.
Horst, M., Author, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11(2), 207-223.
Horst,
M., Author, T., & Nicolae, I. (2005). Expanding
Academic Vocabulary with a Collaborative On-line Database. Language
Learning & Technology, 9 (2), 90-110.
Horst, M., & Meara, P. (1999). Test of a model for predicting second language lexical growth through reading. Canadian Modern Language Review, 56 (2), 308-328.
Hulstijn, J. H., Hollander, M., & Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. Modern Language Journal, 80, 327-339.
Krashen, S. (1989).
We acquire vocabulary and spelling by reading: Additional evidence for the
input hypothesis. Modern Language
Journal, 73, 440-464.
Krashen, S. (2003).
Explorations in language acquisition and use: The Taipei lectures.
Portsmouth NH: Heinemann.
Kucera, H., &
Francis, W. (1979). A Standard Corpus of
Present-Day Edited American English, for use with Digital Computers
(Revised and amplified from 1967 version). Providence, RI: Brown University
Press.
Laufer, B. (1992).
How much lexis is necessary for reading comprehension? In P.J. Arnaud & H. Béjoint
(Eds.), Vocabulary and applied
linguistics (pp. 126-132). London: Macmillan.
Laufer, B., & Nation, P. (1999). A vocabulary size test of controlled productive
ability. Language Testing, 16(1), 33-51.
Mondria, J-A., & Wit-De Boer, M. (1991). Guessability
and the retention of words in a foreign language. Applied Linguistics, 12 (3), 249-263.
Nation, P. (1990) Teaching
and learning vocabulary. New York: Newbury House.
Nation, P. (2001). Learning vocabulary in another
language. Cambridge: Cambridge University
Press.
Read, J. (2000). Assessing
vocabulary. New York: Cambridge University Press.
Simpson, R.,
Briggs, S., Ovens, J. & Swales, J. M. (2002) The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The
Regents of the University of Michigan.
Schmitt, N., Schmitt, D., & Clapham, C. (2001).
Developing and exploring the behaviour of two new versions of the Vocabulary
Levels Test. Language Testing, 18(1),
55-89.
Stanovich, K.E., & Cunningham, A.E. (1992). Studying the consequences of literacy within a literate society: The
cognitive correlates of print exposure. Memory
& Cognition, 20, 51-68.
Wesche, M., & Paribakht, S. (1996). Assessing vocabulary knowledge: Depth vs. breadth. Canadian Modern
Language Review, 53(1), 13-40.
Zahar, R., Author, T., & Spada, N. (2001).
Acquiring vocabulary through reading: Effects of frequency and
contextual richness. Canadian Modern Language Review, 57(4), 541-572.