The case for computer-assisted extensive reading

 

Tom Cobb (designated in the text that follows as ‘Author’)

Dépt de linguistique et de didactique des langues

Université du Québec à Montréal

 

cobb.tom@uqam.ca

 

 

 


The case for computer-assisted extensive reading

 

Abstract

 

About 10 years ago, Author & Stevens (1996) argued that the flood of text about to go online should be a boon for second language learners, and proposed a number of ways that computers could not only deliver this expanded supply of text but also enhance the amount of learning the text could provide by processing it in various ways both prior to and during delivery. In 2006, it seems safe to say that the amount, quality, diversity, and availability of such text has exceeded expectations. And yet it is not clear that the computer for its part is serving as more than a delivery vehicle. This is a pity, because just as the text was more than expected, so are the opportunities for computers to do far more than simply download, distribute and print. Computer programs, accessing large shared text repositories, have a tremendous potential to resolve old questions about language learning for teachers and course designers and to provide new and unique opportunities for large numbers of learners at low cost. I will provide concrete instances of questions resolved and opportunities provided in one exemplary domain, the theory and practice of extensive reading. Some parts of this paper take the form of a response to Krashen, a noted proponent of “buying books, not computers” if it comes to a choice. I hope to convince the reader that books and computers are now complements rather than alternatives.

 



 

Background: An ESL dialogue

 

In scientific dialogues within applied linguistics, turn-taking can involve a delay of a decade or more. An example is a recent paper from Stephen Krashen (2003) entitled Free voluntary reading: Still a very good idea which criticizes the findings of a study I was involved in that called into question the amount of vocabulary acquisition resulting from pleasurable, meaning-oriented, private extensive reading (Horst, Author, & Meara, 1996, Beyond a Clockwork Orange). The study found that even with all the usual variables of a pre-post, empirical, extensive reading study held down rather more tightly than usual (e.g., more tightly than in some of Krashen’s own studies) the number of new words learned from reading a complete, motivating, level-appropriate book of 20,000 words was not sufficient to be the main or only source of vocabulary growth for a learner expecting to function in English any time soon in an academic or professional setting. (It should be noted at the outset that vocabulary growth, while only one of the potential benefits of extensive reading along with fluency, grammar, and other types of growth, is often used as simply the most measurable of the various outcomes.)

 

Krashen (2003) argued that studies like ours typically underestimate the amount of lexical growth that takes place as words are encountered and re-encountered in the course of extensive reading, in his understanding of the term. Many words and phrases are learned that do not appear in test results, but that is because of the crude nature of the testing instruments, which typically have no way of accounting for partial or incremental learning. In fact, he argues, word knowledge is bubbling under the surface as one reads and may appear as a known item only some time later.

Over this period, Krashen’s views have remained largely unmodified. He remains convinced of the value of extensive reading yet has never really been able to prove the case, which ultimately rests on a sort of faith. Our views, on the other hand, have undergone some modification, and in some ways have come more closely into line with Krashen’s own; they may even provide his position with a firmer foundation than he has so far provided himself. A number of studies by Horst (2000) and Horst and Meara (1999) have investigated incremental vocabulary growth from reading through the use of what they describe as a matrix model. This model borrows the notion of vocabulary knowledge as a scale (including points such as I do not know this word, I have seen this word, I think I know this word, and I know and can use this word in a sentence), specifically the vocabulary knowledge scale (VKS) developed by Wesche and Paribakht (1996), but extends the scale longitudinally so that fine-grained word knowledge can be tracked over time as the word is encountered and re-encountered. Seen in a matrix, vocabulary growth from reading is indeed more extensive than it may appear. But is it extensive enough to make reading a sufficient source of vocabulary growth?

 

 

 

 

Part I: Computing the vocabulary learning from extensive reading

 

The matrix uses numbers from 0 to 3 to indicate the points on a modified VKS scale, as follows:

 

0 = I definitely don't know what this word means
1 = I am not really sure what this word means
2 = I think I know what this word means
3 = I definitely know what this word means

 

These numbers are then placed on a simple two-dimensional graph, with the same numbers appearing both top to bottom and left to right, as can be seen in Figure 1.

 

 

        0     1     2     3
  0
  1                 x
  2
  3

Figure 1: From scale to matrix.

 

Every cell in the matrix is thus an intersection between two numbers; for instance, cell ‘x’ is at the intersection of 1 and 2. When a number representing a learned word is placed in this cell, it means that the word was rated as a 1 (‘I’m not sure’) after a previous reading encounter, but was rated as a 2 (‘I think I know’) after a subsequent encounter. In other words, cell intersections represent partial word learning as a result of one further encounter with a word. The movement between 1 and 2, or 2 and 3, represents an increase in knowledge of the word, but not enough of an increase to register on most standard vocabulary tests involving multiple-choice or production formats (for a review of standard vocabulary tests, see Read, 2000).

 

Employing a methodology of repeated readings of a literary novella, and a computer-based testing apparatus that allows us to test large numbers of words in a relatively short time, we have been able to trace the ups and downs of word knowledge that normally pass below the radar of conventional tests. In one study, Horst tracked 300 words through several readings of a German novella, and after each reading identified the number of words that were at each knowledge level as compared to the previous reading. Each additional reading produced a new matrix, with the 300 words distributed slightly differently over its 16 cells for every reading. Authentic data for one pair of readings from this study, as reported in Horst (2000), are shown in Figure 2.

 

 

        0     1     2     3
  0    75    27     9     3
  1     4    20    20     6
  2     2     4    13    35
  3     0     0     7    75

Figure 2: Movement between readings

 

The numbers in the cells refer to the number of words inhabiting each intersection point, with the row label indicating the previous knowledge level for those words and the column label indicating current knowledge. The bold numbers on the diagonal represent the number of words that had not moved between readings (75 words in Figure 2 were rated 0, or unknown, in the previous reading, and were still rated the same following a second reading, etc.). Notice that words to the left of the diagonal are words that have lost ground since the previous reading—rated as less known than previously—while words to the right have gained ground.

 

Through simple addition, one can see that there are more words above the diagonal (27 + 9 + 3 + 20 + 6 + 35 = 100) in Figure 2 than there are below it (4 + 2 + 4 + 7 = 17), and hence that more are gaining ground than are losing ground. Over the several readings and matrix calculations of Horst’s study, it became clear that words were indeed generally making progress over the course of several readings, much of which would nonetheless not have registered on a standard vocabulary test with an all-or-nothing assumption about word knowledge (such as Nation’s classic, 1990, Vocabulary Levels Test, or Laufer & Nation’s, 1999, update). Only 44 of Figure 2’s words (3 + 6 + 35 = 44) had moved into ‘I definitely know what this word means’, a movement that might have shown up on a vocabulary test, but another 56 had made lesser gains that probably would not have. To summarize, then, while the implications of this methodology are still being worked out and will appear in forthcoming reports, it is already clear enough that Krashen is right: there is more word learning from extensive reading than meets the eye.
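The matrix arithmetic just described is simple enough to sketch in code. The following is a minimal illustration (the data structures are hypothetical; Horst’s actual testing apparatus is not reproduced here) that builds a transition matrix from two rounds of ratings and sums the cells above and below the diagonal, using the Figure 2 data as a check:

```python
# Minimal sketch of the matrix bookkeeping (hypothetical data structures).
# Ratings use the modified VKS scale, 0-3.

def transition_matrix(previous, current):
    """Count words at each (previous rating, current rating) intersection."""
    matrix = [[0] * 4 for _ in range(4)]
    for word, prev_rating in previous.items():
        matrix[prev_rating][current[word]] += 1
    return matrix

def gains_and_losses(matrix):
    """Cells above the diagonal gained ground; cells below it lost ground."""
    gained = sum(matrix[r][c] for r in range(4) for c in range(4) if c > r)
    lost = sum(matrix[r][c] for r in range(4) for c in range(4) if c < r)
    return gained, lost

# Figure 2 as a matrix: rows = previous rating, columns = current rating.
figure2 = [[75, 27, 9, 3],
           [4, 20, 20, 6],
           [2, 4, 13, 35],
           [0, 0, 7, 75]]

print(gains_and_losses(figure2))  # (100, 17): 100 words gained, 17 lost
```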

 

This modification of our views, however, is in some ways rather minor, inasmuch as we still do not consider that extensive reading as traditionally practiced could ever be the only or even main source of vocabulary growth for a second language learner. That is because however incremental the learning of encountered words may be, it still requires that words be encountered in sufficient number, and simple corpus research makes it clear that this will not happen.

 

How many words are enough to begin a serious undertaking in a second language, such as academic study or professional activity? Vocabulary researchers like Laufer (1992) and Hirsh and Nation (1992) put the number at a minimum of 3000 word families, provided these are carefully selected for frequency and text coverage. It has also been shown that between six and ten encounters are needed for learning to occur (Zahar, Author & Spada, 2001), and in our own matrix work it seems that at least six encounters are needed for words to travel reliably from rating 0 to rating 3 and stabilize there. Will 3000 word families be met six times through extensive reading?

 

A computer program called Range (developed by Heatley & Nation, 1994; adapted for the Internet by Author; available at the Compleat Lexical Tutor website, http://www.lextutor.ca/range; shown in Figures 3 & 4) takes a user’s text string as input and determines how often it occurs in a broad corpus of written English. This corpus (the Brown corpus, Kucera & Francis, 1979) is divided into 15 sublists, from science to fiction to law. For the purposes of the present argument, the Brown corpus can represent the largest and most diverse body of text an ESL reader could possibly read in a year or two of extensive reading (most learners would obviously read both much less and much less broadly). The surprise finding is this: after the most frequent 1000 words of English, words thin out quite rapidly and are rarely met again. Here are some figures from the second thousand and Academic Word List components, which Nation and others include in the necessary lexicon of 3000 word families.

 

 

Figure 3: Range for word distributions – input (bottom half of screen, all forms of the family ‘abandon’)

 

 

 

Figure 4: Range for word distributions – output

 

 

The distribution of the word family abandon throughout the Brown corpus is requested in Figure 3 and shown in Figure 4. The point to notice is that while the item appears in 13 out of 15 sub-corpora, it appears more than six times in only three of them (press editorials, popular lore, and biographies). Patchy patterns like this are quite general for even medium frequency words. Table 1 shows the distributions in the 15 Brown sub-corpora of six word families from the high frequency 1000 list (not including function words), while Tables 2 and 3 show distributions for six word families from higher up in the 2000 list and another six families from the academic word list (AWL).  Readers can visit Lextutor for themselves and enter their own words into the program.
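For readers who prefer a procedural statement of what a Range-style analysis computes, here is a minimal sketch. The sub-corpus texts and the list of family forms below are invented stand-ins; the real program at lextutor.ca works from the full Brown sub-corpora and established word-family lists:

```python
# Sketch of a Range-style routine: in how many sub-corpora does a word
# family appear, and in how many does it appear often enough (6+ times)?
import re

def family_counts(sub_corpora, family_forms):
    """For each sub-corpus (name -> text), count occurrences of any form
    of the word family (e.g. abandon, abandons, abandoned...)."""
    forms = set(family_forms)
    counts = {}
    for name, text in sub_corpora.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[name] = sum(1 for t in tokens if t in forms)
    return counts

def range_summary(counts, threshold=6):
    """(sub-corpora where present, sub-corpora with threshold+ hits)."""
    present = sum(1 for n in counts.values() if n > 0)
    enough = sum(1 for n in counts.values() if n >= threshold)
    return present, enough

corpora = {"press": "the ship was abandoned " * 7,
           "fiction": "he abandons hope",
           "law": "no relevant text here"}
counts = family_counts(corpora, ["abandon", "abandons", "abandoned"])
print(range_summary(counts))  # (2, 1): present in 2, 6+ times in only 1
```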

 

1000-level       Occurrences in       Present in how many     In how many with
word family      1 million words      of 15 sub-corpora?      6+ occurrences?
car                   285                    13                      10
house                 760                    15                      14
country               510                    15                      15
able                  216                    15                      15
add                   715                    15                      15
admit                 473                    15                      15
Mean                  493.2                  14.7                    14
S.D.                  200.5                   0.7                     1.8

Table 1: Word distributions for high frequency words

 

 

2000-level       Occurrences in       Present in how many     In how many with
word family      1 million words      of 15 sub-corpora?      6+ occurrences?
accuse                 46                     9                       3
accustom               15                    10                       0
ache                    4                     3                       0
admire                 10                     8                       0
afford                 58                    12                       4
alike                  20                    10                       0
Mean                   25.5                   8.7                     1.2
S.D.                   19.7                   2.8                     1.7

Table 2: Word distributions for medium frequency words

 

 

AWL              Occurrences in       Present in how many     In how many with
word family      1 million words      of 15 sub-corpora?      6+ occurrences?
abandon                59                    13                       3
academic               95                    13                       6
accumulate             29                     6                       3
achieve               223                    13                       8
acknowledge            32                    11                       1
acquire               100                    13                       6
Mean                   89.7                  11.5                     4.5
S.D.                   65.6                   2.6                     2.4

Table 3: Word distributions for AWL words

 

We can now answer the question of whether 3000 word families have any reasonable chance of being met six times through extensive reading. If the Brown corpus represents the language at large, as it was designed to do in 1979 and presumably still does, then it seems clear that an extensive reader following his or her interests through one or two text domains or sub-corpora would meet the most frequent 1000 words in great abundance in any domain, but would meet even slightly less frequent words only intermittently and probably not often enough for reliable learning to occur. A word as common as ache would not be met six times in any of the Brown’s 15 sub-corpora; a word as common as accumulate would be met six times in only three of the Brown’s sub-corpora. In other words, meeting any significant portion of the critical 3000 words six times apiece in extensive reading is rather unlikely.

 

In summary, while there may be more word learning than meets the eye from random encounters in extensive reading, as Krashen believes, most words will simply not be encountered often enough, and hence the fact is interesting but of little practical consequence. The distribution of words in English simply does not allow a sufficient number of encounters to take place in this manner.

 

Part II: Bringing up the numbers with computing

 

There are a number of teaching strategies that can increase the odds of words being encountered and learned. Some of these involve the direct teaching of vocabulary, for example through the classroom use of word lists or supplementary vocabulary course books, and of course through ad hoc teacher attention to vocabulary queries in class. But there are also strategies that can be devised to increase the odds for extensive reading itself, and many of these involve the exploitation of recent developments in computer technology (assuming that the extensive reading materials are in machine-readable format, as is increasingly the case). The rest of this paper will outline several of these strategies. The means for implementing these strategies are available to teachers or researchers on the website mentioned above. The format of the remaining presentation will adopt a framework of problem, solution, research on the solution, and proposals for distributing the solution to others. Inevitably some of the solutions overlap, but they have been given independent treatment if they involve independent research questions.

 

Problem 1: The number of encounters with new words is lower than it needs to be because learners do not always recognize a word they have met in text when they meet it again in speech.

 

Many advanced learners have an extensive lexicon of medium and lower frequency items for which they have only weak or uncertain sound representations. For this reason, if they re-encounter in speech a word previously met in reading, they often do not recognize it as a second encounter or capitalize on the further learning opportunity. How big a difference could it make if learners knew the pronunciation of every word they met in reading? Corpus evidence can provide an idea.

 

It is well established that conversational English comprises mainly (about 90%) 1000-level word families, and conversely that post-1000 items are mainly to be found in written texts (Stanovich & Cunningham, 1992). But this does not mean that less frequent words are totally absent from spoken English. The same sample of words referred to above was put through another of Range’s distribution comparison routines, this time one comparing similarly sized corpora (roughly 1 million words apiece) of spoken and written British English as found in the BNC corpus sampler collections (top left of the screen in Figure 3). Tables 4 and 5 show the pattern of these distributions for 2000-level and AWL-level medium frequency words.

 

 

2000-level       Occurrences in 1 million     Occurrences in 1 million
word family      words of writing             words of speech
accuse                   64                           7
accustom                 42                           2
ache                      5                           5
admire                   61                          14
afford                   43                          78
alike                    20                           4
SUM                     235                         110
MEAN                     39.17                       18.33
SD                       23.03                       29.52

Table 4: Speech vs. writing for medium frequency words

 

 

 

 

AWL-level        Occurrences in 1 million     Occurrences in 1 million
word family      words of writing             words of speech
abandon                  45                           6
academic                 81                           6
accumulate               30                          11
achieve                 199                          91
acknowledge              34                          14
acquire                 158                          10
SUM                     547                         138
MEAN                     91.17                       23.00
SD                       71.19                       33.45

Table 5: Speech vs. writing for AWL words

 

These tables show that post-1000 items are indeed found a great deal more in text than in talk—more than twice as much for 2000-level words, and nearly four times as much in the case of AWL words. From another point of view, however, one can say that if learners knew how every word sounded that they had met and noticed in reading, then they could increase the number of occurrences of new words, in the sense of recognizing them as re-occurrences when they encountered them again in speech, by as much as 46% (110/235 x 100) for 2000-level words and 25% (138/547 x 100) for AWL words. Of course, as in the case of the Brown corpus above, these occurrences may well be unevenly distributed within the BNC sampler corpora, but these are not broken into sub-corpora so this cannot be easily determined. For instance, acquire in Table 5 could well be piled up largely in the second language acquisition corner of the corpus, so that engineering students might be unlikely to encounter this item.
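The percentage increases cited above can be recovered from the SUM rows of Tables 4 and 5 with one line of arithmetic; a small illustration:

```python
# Reproducing the percentage-increase arithmetic from the BNC counts.
def speech_boost(written, spoken):
    """Extra encounters (as a % of written occurrences) available if words
    met in reading were also recognized when re-encountered in speech."""
    return int(spoken / written * 100)

print(speech_boost(235, 110))  # 46  (2000-level words)
print(speech_boost(547, 138))  # 25  (AWL words)
```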

 

Some distributional information for medium frequency words in speech, at least as employed in academic contexts, can be gathered from the University of Michigan’s MICASE (Michigan Corpus of Academic Spoken English) corpus and website (Simpson, Briggs, Ovens, & Swales, 2003). This corpus is almost double the size of those consulted above, at 1,848,364 words, broken down by topic, situation and speaker type, and domain (although unfortunately not by the Brown domains, so this comparison is rough rather than precise). Table 6 shows a small sample of AWL words broken down by topic area. Again, the reader can expand the sample by visiting the site and entering additional words, at http://www.hti.umich.edu/m/micase/.

 

AWL-level      Occurrences in 1.8 m words of speech in academic contexts (MICASE)      Occurrences
word family    Biology-     Arts,        Social Sciences,  Physical Sciences,  Total   in BNC
               Health Sci.  Humanities   Education         Engineering         (1.8 m) spoken, 1 m
abandon             4           12              3                36               55        6
accumulate         29            2             21                22               74       11
achieve            10           13             59                11               93       91
acknowledge         7           20              7                 4               38       14
acquire             4           27             16                26               73       10
SUM                54           74            106                54              333      132
MEAN               10.80        14.80          21.20             10.80            66.60    26.40
SD                 10.47         9.36          22.30             10.47            20.89    36.23

Table 6: AWL items in academic speech vs. general speech

 

Two points emerge from this brief look at the MICASE data. The first is that, again, encounters are likely to be piled up rather unpredictably rather than evenly distributed (as acknowledge is piled up mainly in Arts and Humanities in Table 6). The second is that, nonetheless, if a learner were learning English in order to function within an academic environment, then spoken language within this environment yields a somewhat higher proportion of post-1000 items than does the spoken language generally. Accumulate appears 74 times in 1.8 million words of spoken academic English, as against 11 times in 1 million words of general English (or, if one can extrapolate, 11 x 1.8 = 20 times in a general speech corpus of equal size). Similar advantages for the academic corpus are shown for abandon, acknowledge, and acquire (but not for achieve). To conclude, it seems safe to say that the 25% increase in occurrences shown above for knowing how words are pronounced in general spoken English could be somewhat greater within target domains such as academic speech.

 

How can we ensure that learners have full access to the pronunciation of any new word they happen to come across in their reading? Lextutor’s builder routines offer two ways of doing this.

 

Solution 1: If learners are reading a text on a computer screen and have access to the Internet, then Lextutor gives them the means to access the pronunciation of any word in the text quickly and simply. With the website’s Hypertext builder routine (available at http://www.lextutor.ca/hypertext/), learners can enter any text they happen to be reading into a text input field and click to transform it into a text with literally every word linked to a text-to-speech engine giving a tolerable (or better) pronunciation of the word. This requires a once-only download of a speech plug-in (free from Macromedia), which once completed allows instant pronunciations that should not distract readers from their reading unduly.
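The transformation such a builder performs can be sketched as follows. This is an illustrative reimplementation, not Lextutor’s actual code, and the say_word JavaScript function is a hypothetical stand-in for the site’s text-to-speech call:

```python
# Sketch: wrap every word of a text in a link that would trigger a
# text-to-speech pronunciation when clicked.
import re
from html import escape

def linkify(text):
    """Return the text with each word wrapped in a TTS-triggering link."""
    def wrap(match):
        word = escape(match.group(0))
        return f'<a href="#" onclick="say_word(\'{word}\')">{word}</a>'
    return re.sub(r"[A-Za-z']+", wrap, text)

print(linkify("Buck did not read the newspapers."))
```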

 

Solution 2: Of course, there is no guarantee that one audition of a word leads to a stable memory for its contours or a stable link between its phonetic and semantic features. If learners or their teachers wish to ensure that a particular set of words is heard again soon, they can make use of Lextutor’s Dictator routine (at http://www.lextutor.ca/dictator/),  which transforms any word list into a text-to-speech based spelling dictation activity, in either practice or test formats. In Figure 5, a learner has created a training exercise to practice spelling the words he or she hears. The learner clicks a word to hear it, tries to spell it, and is given help with any errors. The help is provided by the resident Guidespell tutor (first piloted in Author, 1997a). Guidespell tells the student how many letters were correct in the attempt to spell accompany.  When ready, the learner can enter the same words into a Test version of the program, shown in Figure 6, where there is no help but simply a score presented when all words have been entered.

Figure 5: Dictator training activity under way

Figure 6: Dictator in test mode.
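Guidespell’s letter-level feedback could be approximated along the following lines. This is one plausible scoring scheme, not the documented Guidespell algorithm:

```python
# Sketch of Guidespell-style feedback: how many letters of an attempted
# spelling match the target word, position by position?
def letters_correct(attempt, target):
    """Count letters matching the target in the same position."""
    return sum(1 for a, t in zip(attempt.lower(), target.lower()) if a == t)

print(letters_correct("acompany", "accompany"))   # 2
print(letters_correct("accompany", "accompany"))  # 9
```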

Hypertext and Dictator thus provide two approaches to helping learners form sound-spelling correspondences, with the goal of increasing the likelihood that words met in text will be recognized when they are met again in speech, and hence encountered often enough to be remembered and learned.

Research on Dictator: Realized in December 2004, Dictator has not yet been subjected to substantial empirical testing.

Current work on Dictator (December 2005): To fully integrate this somewhat drill-like component within a reading-for-meaning context, an interface is being developed that allows learners to click a word into a text box for later relaying to the Dictator routine, for offline practice online, as it were. A prototype of this feature can be seen at www.lextutor.ca/CallWild/. As readers proceed through the text Call of the Wild, they can alt-click any word into the gray box at the top of the page, and then at some later point send these words to Dictator by clicking ‘spell_it.’ This linkage is shown in Figure 7. Sometime in 2006 it will become possible for learners to link their own texts to this resource.

 Figure 7: Linking resources


Problem 2: Words are often simply forgotten between encounters, even within the same text.

Research indicates that, on the one hand, new words tend to get ignored if they require a great deal of effort to process, but, on the other hand, they tend to get forgotten if they require very little effort to process (Mondria & Wit-deBoer, 1991). In other words, it seems the conditions for retention of words from reading are rather particular and may present themselves only occasionally, which may be part of the reason that a minimum of six and as many as ten encounters are needed even for initial retention. Learners need some way to keep track of whether they have seen a word before and, if possible, to revisit the previous occurrence, without incurring a major exit from their current reading.

Solution 1: One solution is to offer readers a quick way of clicking on a word and recording it for later reflection without losing the thread of the story, as the user shown working in Figure 8 has clicked (with the Alt-key held down) on the words toil and groping from the first chapter of Jack London’s classic tale Call of the Wild, located at http://www.lextutor.ca/CallWild.

Figure 8: Tagging toil and groping for post-reading attention

Solution 2: Another solution is to link a text to a Story Concordance. A reader clicks on any word in the reading text, and receives a full accounting of all the occurrences of the same word as already seen or yet to be seen in other parts of the story. The reader in Fig. 8 has just found out that the rather odd word toil might be worth paying attention to as it occurs in five of the chapters to come.
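The Story Concordance lookup amounts to counting a clicked word’s occurrences chapter by chapter; a minimal sketch, with a hypothetical chapters dictionary standing in for the novel’s text files:

```python
# Sketch: where (and how often) does a clicked word occur across a
# story's chapters?
import re

def chapters_with_word(chapters, word):
    """Return {chapter: count} for chapters containing the word."""
    hits = {}
    for name, text in chapters.items():
        n = len(re.findall(rf"\b{word}\b", text.lower()))
        if n:
            hits[name] = n
    return hits

chapters = {"ch1": "a life of toil began",
            "ch2": "no rest",
            "ch3": "toil and more toil"}
print(chapters_with_word(chapters, "toil"))  # {'ch1': 1, 'ch3': 2}
```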

Research on Story Concordance: An earlier version of this concept was tested in Author, Greaves & Horst (2001), with a strong learning effect found for a suite of text-integrated resources, although story concordancing has not been tested independently. There is as yet no Builder version of the program, owing to the complexity of handling user text input in the form of distinct chapters or sections to provide the output shown in Fig 7.

 Problem 3: The semantic features comprising word meanings are distributed over several occurrences of words, so that integrated meanings are slow to construct.

As already noted, post-1000 words are distributed thinly in natural language. But the semantic features comprising the meanings of these words are distributed even more thinly. The whole set of semantic features underpinning the concept represented by a word of any complexity is inevitably not present in every occurrence. What this means for a learner building a vocabulary from reading is that even if a word is not forgotten between encounters, there may not be enough information in a single encounter, or even in a number of encounters, to provide more than a partial sense of its meaning.

For instance, the semantic deep structure of a common word like work embraces features ranging from doing a job for pay (‘work at a store’), the job itself (‘it’s my work’), an effort expended not necessarily for pay (‘work on my car’), the correct functioning of a device (‘it works now’), to instances of high art (‘a work of Shakespeare’), but only one or two of these features are present in a sentence like “What are you working on?” The word learner is thus required not only to remember features of words from occurrence to occurrence, but at the same time to be revising, updating, and especially integrating hypotheses about how these fit together. 

Solution: One way of showing learners several pieces of a word’s meaning all at once, so they need not attempt to gather them all up for themselves with attendant forgetting and backtracking, is to present the word in a concordance. A concordance for ‘work’ from the Brown corpus is shown in Figure 9. Even a relatively small concordance reveals to an observant (or possibly to a trained) learner such information as the main parts of speech for the word (the work, we’ll work, to work), several of its senses (take my car to work, work on my kicking, an idea that would not work), and its main collocations (work for, work at, work out, and especially work on). 

1     after the board of canvassers completes its work.    A difference of opinion arose between Mr
2     Authority bonds for rural road construction work. #A REVOLVING FUND#  A01 1310  4    The depa
3    ghes Steel Erection Co. contracted to do the work at an impossibly low cost with a bid that wa
4    lta Sigma Pi at Lamar Tech, and did graduate work at Rhodes University in Grahamstown, South A
5     bomb tore his car apart as he left home for work.    Battalion Chief Stanton M. Gladden, 42, 
6    blic relations director, resigned Tuesday to work for Lt. Gov. Garland Byrd's campaign.  A01 1
7    07 1230  7    #MISSIONARY EXPLAINS# "I don't work for the Government", the American said. "I'm
8    0 12    scrimmaged for 45 minutes.    "We'll work hard Tuesday, Wednesday and Thursday", Meek 
9     home so that he could take his other car to work.    "I'd just turned on the ignition when th
10   school teaching certificate. A normal year's work in college is 30 semester hours.  A02 1430  
11    A. Berger firm, a Philadelphia builder, for work in the project.    The second agreement perm
12   rk out about an hour on Saturday, then we'll work Monday and Tuesday of next week, then taper 
13   f cars "might not be realistic and would not work".    Mrs. Molvar asked again that the board 
14   ARTIST# Mrs. Monte Tyson, chairman, says the work of 100 artists well known in the Delaware Va
15   overhauling of 102 joints. The city paid for work on 75, of which no more than 21 were repaire
16   ly involve failure to perform rehabilitation work on expansion joints along the El track. The 
17   e.    "This year, coach Royal told me if I'd work on my place-kicking he thought he could use 
18   ales will begin and contracts let for repair work on some of Georgia's most heavily traveled h
19        - His miracles    - His substitutionary work on the cross    - His bodily resurrection fr
20   g, said the transit company is reviewing the work on the El.    "We want to find out who knew 
21   rty, appeared on payment vouchers certifying work on the project. Varani has been fired on cha
22   as completed after nearly eighteen months of work on the question of the organization of the U
23   bly will have a good scrimmage Friday. We'll work out about an hour on Saturday, then we'll wo
24     several more drafts".    Salinger said the work President Kennedy, advisers, and members of 
25   aborers go home Tuesday night for some rest. Work resumed Wednesday, he said.    Mr. Schaefer 
26   , stressed the need for the first two years' work.    "Surveys show that one out of three Amer
27   e traditional visit to both chambers as they work toward adjournment. Vandiver likely will men

Figure 9: Lines from the Brown corpus for work
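Concordance lines like those in Figure 9 are produced by a KWIC (key word in context) routine; a bare-bones sketch, not the Lextutor implementation:

```python
# Sketch of a KWIC concordancer: the keyword is aligned in a fixed
# column, with chopped-off context on either side.
import re

def kwic(text, word, width=30):
    """Return fixed-width concordance lines for each occurrence of word."""
    lines = []
    for m in re.finditer(rf"\b{word}\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

sample = "We'll work hard Tuesday. The work of 100 artists was shown."
for line in kwic(sample, "work"):
    print(line)
```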

However, there are at least three major problems with using concordances as an aid to building a second lexicon. First, learners do not usually have a concordancer handy when they are reading, but would have to write the word down with a certain amount of context and look it up in a concordancer later. Second, most full-blown corpora like the Brown are likely to include a high proportion of other words the learner will not know, in addition to the one being looked up. Third, the single chopped-off lines of the concordance format, while designed to highlight immediate formal patterns such as collocation, also reduce the amount of semantic context to a level below what learners may need to identify semantic features and integrate meanings of new words.

Figure 10: Making concordance information accessible and comprehensible

Lextutor offers teachers and learners responses to each of these problems, all of which can be seen in further developments of the Call of the Wild Story Concordancer. The first development addresses the access problem. As already seen, any word in the story when clicked generates a concordance instantly in a window within the same eye-span, or frameset. The learner can thus compare several examples of the word along with the original at the same time, with minimal exits from the story. The second problem, of unknown items within the concordance, is addressed by the fact that the concordance is recursive (any word clicked in the concordance itself generates a new concordance, which may shed light on the unknown word), and that it derives not from a general corpus but from a collection of other works by the same author, in this case Jack London. A same-author corpus should mean that the range of lexis and types of contexts is somewhat constrained relative to a general corpus, has extensive re-cycling built into it, and offers a consistency of tone and style that learners can habituate themselves to. In the screen print shown in Figure 10, the user has clicked the link “progeny in other Jack London stories” (not shown) and is presented with uses of this word from other works like White Fang and Martin Eden. The third problem, the small contexts and chopped-off lines, is resolved by building in a mouse select-and-release feature whereby the learner selects several words, releases, and is delivered a series of much expanded contexts either from the original text or from throughout the London opus, depending on where the request is launched (as shown for the phrase helpless progeny in Figure 11). For truly astute learners, this same feature allows them to explore an author’s trademark collocations and grammar preferences.

Figure 11: Comprehensibility through same-author corpus
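The mechanics behind such a click-on concordance are straightforward to sketch. The following is a minimal keyword-in-context (KWIC) routine in Python; it is an illustration of the general technique, not Lextutor's actual code, and the tiny corpus is invented for the example.

```python
import re

def concordance(keyword, corpus_text, width=40):
    """Return KWIC (keyword-in-context) lines for every occurrence of keyword,
    with a fixed span of characters on each side."""
    lines = []
    pattern = r'\b%s\b' % re.escape(keyword)
    for m in re.finditer(pattern, corpus_text, re.IGNORECASE):
        left = corpus_text[max(0, m.start() - width):m.start()]
        right = corpus_text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}}[{m.group()}]{right:<{width}}")
    return lines

corpus = ("The wolf watched over its helpless progeny. "
          "Progeny of the wild, they learned quickly.")
for line in concordance("progeny", corpus):
    print(line)
```

A recursive concordance of the kind described above is just this routine called again on any word the learner clicks within the output lines.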

 

Figure 12: Linking Brown concordances to user input texts

Research: At least two interesting questions can be asked about this linked-concordance work. The first is the general question of whether text-linked resources are a help or a hindrance to second language readers. Some research suggests that the main effect of adding any resources to a reading task is simply to increase the cognitive load. This question is currently receiving a good deal of attention in the research literature (e.g., Chun & Payne, 2004). The second and more specific question is whether working with concordances, however accessible and however well integrated into an ongoing reading task or linked to a tailor-made corpus, can facilitate concept integration for language learners.

In one of my own research studies (Author, 1997a, 1997b, 1999), I proposed that the degree of transfer of word knowledge to a novel context should reflect the degree to which a word had achieved a complex semantic representation, inasmuch as a novel context is unlikely to have the exact semantic features present in the word’s initial encoding. Subjects in a series of experiments learned words over several weeks using either small bilingual dictionaries or else purpose-built, monolingual concordances. They were then asked to match learned words to short definitions as well as integrate them into a rational cloze passage for a text they had never seen (one that embedded the target lexis in contexts made up of words that had been previously taught and tested). These tasks are shown in Figure 13. After an extensive training period, students in both control (dictionary) and experimental (concordance) groups had improved equally in their ability to match words to short definitions, but the experimental group had significantly greater ability to apply learned words to novel contexts. These results are shown in the line graphs in Figure 14. (These figures, along with further details, are available online in the author’s doctoral study.) This result was replicated a number of times and at a number of levels.


Figure 13: Testing two kinds of lexical knowledge – definitional and contextual

Figure 14: Better transfer to novel texts for concordancers

Current work on Click-On Concordancing: Builder versions

Lextutor can incorporate a user’s text into a suite of reading resources including most of those seen on the Call of the Wild page, available at http://www.lextutor.ca/hypertext/. A screen print based on a user’s text is shown in Figure 12. In Figure 12, the corpus accessed by clicking on words is the Brown corpus, which of course has the problems mentioned above. It is not currently possible for teachers or learners to load their own corpora into a web-based concordance. However, a number of experiments are under way on Lextutor to allow significant upload of user texts of up to 50,000 words (about the size of a Jack London story), including a Text Concordancer (which the reader can inspect at http://www.lextutor.ca/concordancers/text_concord/). Also, more learner-friendly corpora are being developed to replace the Brown in the Hypertext routines, including a corpus of simplified readers that has recently become available.

Current work on Click-On Concordancing: Better integration with reading for meaning

As with the Dictator routine mentioned above, experiments are under way to allow learners to store up words for later submission to a concordancer with no or minimal exit from the ongoing reading task. The problem here of course is that while Dictator is inherently designed to handle several words at once, concordancers normally deal with one word or phrase at a time. A current plan is to send several words to a multi-concordancer all together, from a stored text box which a learner fills with words for later consideration. A trial text-box submission can be seen in conjunction with the Academic Word List at www.lextutor.ca/ListLearn (click AWL); multi-concordancing is discussed in greater detail in another context below.

Problem 5: Beyond the high- and medium-frequency levels, words appear so infrequently that learners have almost no chance of learning any significant portion of them.

As vocabulary acquisition proceeds beyond the 3000 word level, the likelihood of learners meeting many of the remaining 20,000 or so word families of English known to native speakers becomes very poor indeed. Some learners may not aspire to know all the words that native speakers know, and for these learners 3000 may be enough, or as counselled by Nation and colleagues, it may be time for their efforts to focus on strategy development or reading within an academic or professional domain (Nation, 2001). However, many learners do aspire to full membership in a second community or culture, and for these learners post-basic vocabulary growth is a slow and haphazard process.

Figure 15: Sharing new acquisitions.

Solution: Advanced vocabulary acquisition is normally a solitary process, but it need not be. In a class of 20 advanced learners, if each one met 50 words in a month of extensive reading, then that would amount to 1000 words (possibly with some redundancy) for the group as a whole. Networked computing should in principle make it possible for such a group to share lexical acquisitions, while at the same time providing for further encounters, retrievals, and novel contextualizations, in line with points raised above. Such is the goal of the Group Lex Database at http://www.lextutor.ca/group_lex/demo/, a set of web pages allowing learners to enter words from their reading, share words with others, quiz themselves on some or all of the words, and quiz themselves with the same words in novel contexts. Figure 15 shows the words as initially entered (in this case, by random visitors to Lextutor). Several areas on the screen shown are hyperlinked to different sorting and extraction options; for example, clicking on a name will extract all the entries for that name, and similarly for a subject area like ‘Arts’ or other groupings. The quiz option allows a user to select several words for retrieval practice, as shown in Figure 16. This retrieval is of course within the original context, but a click on the ‘Tougher Quiz’ option takes quiz-takers to a new task (Figure 17) that asks them to plug these same words into gapped multiple concordance lines from the Brown corpus – i.e., to transfer their meanings to a novel context (to re-visit a theme from above).

Figure 16: Learner-designed, collaborative instruction
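The data model behind such a shared word bank can be sketched in a few lines of Python. The class and field names below are hypothetical, chosen only to illustrate the deposit, extraction, and gapped-retrieval operations described above; they are not Lextutor's implementation.

```python
import random

class GroupLex:
    """Illustrative sketch of a shared class word bank: learners deposit
    words with contexts and definitions, then quiz themselves on them."""

    def __init__(self):
        self.entries = []

    def add(self, learner, word, context, definition):
        """Deposit one word with its original context and a definition."""
        self.entries.append({"learner": learner, "word": word,
                             "context": context, "definition": definition})

    def by_learner(self, learner):
        """Extract all the entries deposited by one class member."""
        return [e for e in self.entries if e["learner"] == learner]

    def quiz(self, n=5):
        """Retrieval practice: blank the target word out of its context."""
        items = random.sample(self.entries, min(n, len(self.entries)))
        return [(e["context"].replace(e["word"], "_____"), e["word"])
                for e in items]

bank = GroupLex()
bank.add("Ana", "progeny", "The wolf guarded its helpless progeny.",
         "offspring or descendants")
bank.add("Ben", "abate", "The storm did not abate until morning.",
         "to become less strong")
gapped, answer = bank.quiz(1)[0]
```

The 'Tougher Quiz' variant would simply draw the gapped lines from a corpus concordance rather than from the entry's original context.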

Builder versions: The complexity of these multi-page, database-oriented programs has until now delayed the development of Builder (i.e., user-produced) versions of Group Lex. However, several dedicated versions have been set up for roughly 25 teachers in various corners of the ESL world over 2004-2006, some of them reporting on their work at international conferences (e.g., TESOL 2005).

Research: Some initial research has been concluded on learner use of Group Lex and is reported in a paper in Language Learning & Technology (Horst, Author & Nicolae, 2005). Questions so far investigated include learning effects, resource use preference, and ability of learners at different levels to generate contexts and definitions that their peers can make sense of in the quiz routines.

Further development: First, programming is almost completed to connect Group Lex directly to learner texts, just as the various resources are linked to texts in some of the examples above. Learners will use the mouse to select an example sentence containing a target word, which on mouse release will be sent to an input form for Group Lex. Second, code is being developed to allow for teacher-controlled auto-archiving of a word-set when it has reached a certain size, or when an assigned text is completed, etc.

Figure 17: Transfer to novel contexts - revisited

To conclude Part II, it seems clear that properly designed computer programs, properly used, can substantially increase the number of exposures to new words through reading. But is the increase enough? Some quantification and empirical investigation has been completed, but more remains to be done. In the meantime, computer programs can not only increase the number of exposures but also help teachers and learners do more with the exposures available. This is the topic of the next section.

 

Part III: Improving the quality of individual exposures

Up to now this paper has shown a number of ways computers can increase the number of exposures to words. Now we turn to a different dimension, what computers can do to improve the quality of an individual exposure. When a new word is met, there are two things a learner can do with it if he or she decides to give it some attention. One is to look it up in a dictionary, and there are many high quality learner dictionaries now available for this purpose. The other is to attempt to infer a meaning from the ongoing context. However, both these strategies present problems. Dictionaries take the reader out of the text, physically and mentally, and almost certainly disrupt the flow of reading. Contextual inference (and probably successful dictionary use as well) is only reliable if 95% of the words in the context are known (Laufer, 1992; Nation, 2001), and this is rarely the case for all but the most advanced learners. Reading on a computer may be able to address both these problems.

Problem 1: High-quality dictionaries can improve text readability, but at the same time they disrupt the flow and possibly the pleasure of reading.

Several publishers of ESL materials, notably Longman and Cambridge, have recently invested in well researched and designed learner dictionaries for intermediate and advanced learners. Nonetheless, studies of dictionary use (e.g., Hulstijn, Hollander, & Greidanus, 1996) suggest that however beneficial even sophisticated language learners may believe a dictionary to be, they will not use one extensively while reading if they believe it entails an exit from the reading task itself. This avoidance might diminish if the resources were directly integrated into the text the learner was reading. This ideal has recently become possible as free online versions of these dictionaries have become available.

This is precisely the object of the dictionary option at http://www.lextutor.ca/hypertext. The reader copies a text into a Web form, chooses from a menu one of four excellent online dictionaries (including the online versions of the Longman LDOCE and the new Cambridge Advanced Learner’s), and the program wires text and dictionary together so that a click on any word in the text produces the relevant definition in a window just beside the text. (It should be noted that this any-word feature depends on the fact that all of these dictionaries are fully lemmatized, or fleshed out as word families, so that clicking on cats in the text produces the entry for cat, for instance, as opposed to a ‘Not Found’ notice.) The learner working in Figure 18 has connected the Cambridge Advanced Learner’s Dictionary to a text on Cell Phones and Driving, and run immediately into an unknown word, ban. A click on the word generates a well thought-out definition in roughly one second. Is this significant disruption, or not?

Figure 18: High-quality any-word click-up definitions online
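The any-word feature just described amounts to a two-step lookup: map the clicked surface form to its head word, then fetch that head word's entry. A minimal sketch follows; the tiny lemma map and entries are invented stand-ins for a real lemmatized learner dictionary.

```python
def lookup(clicked_form, lemma_map, dictionary):
    """Resolve an inflected surface form to its head word, then return the
    dictionary entry, mimicking a lemmatized click-up lookup."""
    form = clicked_form.lower()
    head = lemma_map.get(form, form)  # fall back to the form itself
    return dictionary.get(head, "Not Found")

# Illustrative word-family data: every family member points to its head word
lemma_map = {"cats": "cat", "banned": "ban", "banning": "ban", "bans": "ban"}
dictionary = {"cat": "a small furred domestic animal",
              "ban": "to forbid something officially"}

print(lookup("Cats", lemma_map, dictionary))    # entry for 'cat'
print(lookup("banned", lemma_map, dictionary))  # entry for 'ban'
```

Without the lemma map, only exact head-word clicks would succeed; with it, any member of a word family resolves to the same entry.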

Research: A repeated-readings case study by Author, Greaves and Horst (2001) compared two Anglophone learners reading similar-sized extensive texts in their target languages, one (a German learner) reading a German novella on paper, and the other (a French learner) reading a French version of the Call of the Wild page adapted for Guy de Maupassant’s novella Boule de Suif (www.lextutor.ca/bouledesuif/). Vocabulary expansion resulting from the readings was used as the measure of reading success and learning value. The offline reader simply read his text the required number of times, while the resource-assisted reader could access a dictionary and several other resources on a click-on basis. More than 60% of the resources used involved the dictionary. Learning was tracked for several hundred single-occurrence words in both texts (a quantity of test items made possible through the employment of a computer). Vocabulary growth was roughly double for the dictionary-linked reading experience, which was perhaps not surprising. More interesting, however, for the present research question, was that time-on-task was no greater for the online reader. In other words, look-ups were not consuming large amounts of reading time, and seem to have been an adjunct to reading rather than a disruption of it, as indicated by the subject’s report as well as the time record.

Research – open questions: The research reported here was merely preliminary. First, no similar experiment has yet been undertaken for a larger group of learners. Second, it is still not fully established whether easy look-ups do or do not constitute a significant exit from reading the text (integrating propositions, integrating prior knowledge, constructing extra-textual inferences, etc.) Third, studies are needed that compare the effects of online vs. offline resources for both vocabulary growth and reading comprehension.

Problem 2: While the proportions of known-to-unknown words that are conducive to learning to read at different levels are reasonably well known, there is almost no way that this information can be utilized in the development of reading materials.

While text comprehensibility is somewhat linked to learner strategies, task demands, and topic familiarity, as a general rule a readable text is one where 95% of the vocabulary is known to the reader, or in other words where the new-to-known ratio is no greater than 1:20. This is the point where enough of the text is in focus for comprehension tests to be passed, new vocabulary to be reliably inferred, and reading to become less effortful and more pleasurable (Laufer, 1992; Nation, 2001). But how can such texts be located, or created?
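The 95% criterion is easy to operationalize once a learner's known-word list is assumed. A minimal sketch, with an invented known-word set for illustration:

```python
def readability_check(text_tokens, known_words, threshold=0.95):
    """Return the proportion of running words known to the reader, and
    whether the text meets the 95%-known criterion for comfortable reading."""
    known = sum(1 for t in text_tokens if t.lower() in known_words)
    coverage = known / len(text_tokens)
    return coverage, coverage >= threshold

tokens = "the dog ran to the old barn near the frozen creek".split()
known_list = {"the", "dog", "ran", "to", "old", "barn", "near"}
coverage, readable = readability_check(tokens, known_list)
print(round(coverage, 2), readable)  # 0.82 False
```

Here 9 of 11 running words are known, a coverage of about 82%, so this text would fall well short of the 1:20 target for this (hypothetical) learner.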

Graded readers are often able to provide opportunities to meet new words in low-density environments, and there can be little doubt that these readers should be in far greater use than they are at present. However, there are two problems with implementing graded readers: they tend toward children’s interest levels (adventure stories and the like), and it is difficult to build a collection that caters to a wide enough range of interests, particularly adult interests. Successful graded reading programs ultimately depend on a growing supply of teacher-adapted texts. And yet how such texts are to be adapted, even just from the lexical point of view, is not obvious. The rest of this paper introduces tools that can help with the job.

Solutions

Successful grading of texts depends, first, on having some way of defining the lexical levels of both learners and texts, and fortunately it is possible to do this. On one side, vocabulary tests are available (with limitations as noted in Part I above) that can indicate a learner’s rough vocabulary size in terms of 1000 word-frequency levels (as devised by Nation, 1990; Laufer & Nation, 1999; Schmitt, Schmitt & Clapham, 2001; some of these available at www.lextutor.ca/levels/). On the other side, a computer program is available that analyses texts in terms of this same 1000-levels scheme. This program is Nation and Heatley’s (1994) Vocabprofile, also adapted for Internet and available through Lextutor (at www.lextutor.ca/vp). In principle, these two analyses should put texts and learners into contact with each other. Texts can be found, written, or adapted to match particular learners’ abilities—not easily, of course, but this technology at least makes it possible.

For example, the profile for Chapter 1 of the fictional work Call of the Wild shown in Figure 19 indicates that just over 80% of its words come from the first 1000 word families of English, so that a somewhat larger vocabulary than 1000 word families would be necessary for a learner to make any sense of this particular text. A learner knowing 1000 word families would be facing a new-to-known ratio of about one word in five, not one word in 20. For such a learner, a text with 90% or higher at the 1000 level would be more suitable.


Figure 19: VocabProfile for Call of the Wild, Chapter 1
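The logic of a profile like the one in Figure 19 can be sketched in a few lines, assuming the 1000-level frequency lists are available. The tiny band lists below are stand-ins for the real 1000-word-family lists, so this is an illustration of the classification step only.

```python
def vocab_profile(tokens, bands):
    """Classify running words into frequency bands (e.g., first 1000,
    second 1000) and return the percentage of tokens in each band."""
    counts = {name: 0 for name in bands}
    counts["offlist"] = 0
    for t in tokens:
        for name, words in bands.items():
            if t.lower() in words:
                counts[name] += 1
                break
        else:  # no band matched this token
            counts["offlist"] += 1
    total = len(tokens)
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

bands = {"1k": {"the", "dog", "ran", "home"}, "2k": {"swiftly"}}
profile = vocab_profile("The dog ran swiftly home".split(), bands)
print(profile)  # {'1k': 80.0, '2k': 20.0, 'offlist': 0.0}
```

With real 1000-level lists substituted for the toy bands, the same routine yields the kind of first-1000 percentage reported for Chapter 1 above.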

One potential problem with the above analysis however, is that over the course of an extended text, words are presumably repeated a good deal and thus have a chance of being learned on the fly, so that the new-to-known ratio could be substantially reduced by the end of the story. To what extent does this happen? Are intermediate learners rewarded with reasonable new-to-known ratios if they struggle through the first few chapters of a book written for native speakers, like Call of the Wild?

The answer to this question is provided by another of Lextutor’s analytic tools, Text_Lex_Compare (www.lextutor.ca/tools/text_lex_compare/), which identifies the new (i.e., different) words in a second text in comparison to a first text or set of previous texts. This program reads in texts of up to twenty chapters and identifies, counts, and lists the new items appearing in each. It further automatically links these items to the VocabProfile program mentioned just above for frequency evaluation. This program input is shown in Figure 20, with the first two chapters of the same story in position for a lexical comparison. The output is shown in Figure 21.
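The core comparison that Text_Lex_Compare performs can be sketched as a set operation over tokenized chapters. This is a simplified illustration of the technique, not the actual program, and the two miniature "chapters" are invented.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def new_lexis(previous_text, next_text):
    """Count the word types and running tokens in next_text that never
    appeared in previous_text: the core of a chapter-by-chapter comparison."""
    seen = set(tokenize(previous_text))
    fresh = [t for t in tokenize(next_text) if t not in seen]
    return len(set(fresh)), len(fresh)

ch1 = "Buck did not read the newspapers."
ch2 = "Buck read the newspapers, but the dog days were over, over for good."
new_types, new_tokens = new_lexis(ch1, ch2)
print(new_types, new_tokens)  # 7 8
```

Running the comparison cumulatively, with each chapter checked against all preceding chapters joined together, produces chapter-by-chapter new-word counts of the kind reported below.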


Figure 20: Text_Lex_Compare input


Figure 21: Text_Lex_Compare output

The output shows that there are 719 different new words in the second chapter, represented in 964 running words or tokens. Clicking the VP button and sending these words to VP analysis subsequently reveals that 41% of these words are from the most frequent 1000 words of English. In other words, almost 60% of them are relatively infrequent for a learner who knows 1000 words. So far, then, this analysis suggests that the lexis of Call of the Wild offers a fairly unfriendly lexical ratio for the intermediate learner, but the pattern must be worked out for the rest of the text – which is what the series of upload inputs in the bottom half of Figure 20 makes possible. For example, the analysis of Chapter 3 shows how much new lexis it presents with respect to both of the preceding chapters, and so on. The result of this analysis for the entire volume is shown in Table 7.

Chapters    New types    New tokens    Per cent 1k items
2           743          982           37
3           800          983           29
4           351          401           27
5           632          893           36
6           565          733           32
7           633          795           25
Mean        620.7        797.8         31.0

Table 7: New lexis chapter by chapter

Figure 22: Little reduction in the diet of novel lexis

Table 7 clearly suggests that the flow of new lexis never abates in a text designed for native speakers. Indeed, the third highest number of new word types appears in the final chapter. Further, the Vocabprofiles for these new items show them to be mainly post first-2000 items, that is to say potentially difficult items that are fairly rare and will not necessarily repay the investment of learning. But the new-to-known ratio is the biggest problem. With the exception of the third chapter, these chapters are about 3500 words in length, and the number of new word tokens averages almost 800. The ratio, in other words, is about eight new words per 35 running words, or roughly one new word in every four or five, rather far from one in 20. Similar findings are available for other books written with native speakers in mind (Conan Doyle’s Hound of the Baskervilles can be downloaded from Lextutor’s Text_Lex_Compare page for readers to test this assertion for themselves).

It is a truism that our learners ought to be reading graded texts, but with the research findings and technologies now at our disposal we are in a position to say very clearly why this is so. Text_Lex_Compare was fed the seven chapters of the Penguin/Longman graded version of the same Call of the Wild, with results as shown in Table 8 and Figure 23.

Chapters   Total words   New types   New tokens   Per cent 1k items   New-known   Proportion   Ratio
Ch 2       876           131         193          79                  193/876     0.22         1 to 4.5
Ch 3       1573          116         199          64                  199/1573    0.12         1 to 8.5
Ch 4       1272          69          111          56                  111/1272    0.08         1 to 12.5
Ch 5       1178          49          112          40                  112/1178    0.09         1 to 11.5
Ch 6       1584          63          139          45                  139/1584    0.08         1 to 12.5
Ch 7       1838          50          110          53                  110/1838    0.059        1 to 19
Mean       1386.8        79.7        144.0        56.2

Table 8: Results for graded ‘Call of the Wild’

 

 Figure 23: Fewer and declining number of new word types

The adaptors of this text have produced a far more manageable proportion of new lexis, more than half of it within the first 1000 level (56.2%), and moreover a decreasing amount of it as the novel proceeds, indicating a good deal of recycling of known items. Most interestingly, by the final chapter, the new-to-known ratio has actually come close to the 1-in-20 target (1:19 to be precise, as shown in the final ratio of Table 8). This means that if learners had learned all the words that appeared in the previous chapters, then by the final chapter they would be reading with a new-word density of just over one new word in 20. For the other six chapters, of course, the density is higher than that, which I believe is an argument for linking even simplified readers to a relevant selection of the computer based learning resources described above. And this, in turn, is an argument for providing learners with simplified extensive texts that are machine-readable.

Where would a varied, multilevel library of machine readable extensive readings come from? Probably not from the big commercial publishers. While companies like Longman have produced an impressive paper collection of graded materials, mainly in the realm of fiction classics, they have also been quite successful (unlike music publishers) at making sure that teachers and learners pay for anything they get. This is not a criticism of the publishers; text adaptation is hard work and the publishers are entitled to recoup their investment.

The Internet is lacking in very few types of texts, but one of the few is simplified reading materials for language learners. While preparing this piece I sent out a plea to the extensive reading community (via the Extensive Reading Pages website) to inform me of any free online sources of extensive readings, and learned that there is apparently only one in existence, an interesting but modest UK site named Blue Yonder. This lack of available materials led me to conclude that a complete and useful library of graded readings, particularly at adult interest levels and including non-fiction texts as well as fiction, with vocabulary level and new word density rates publicised, can probably only be produced by teachers and course designers themselves. This would not be easy, but of course the fruits could be shared over the Internet, perhaps at a dedicated Website. The resource Text_Lex_Compare both shows us why we need to do this, and along with VocabProfile provides the tools to get on with the job.

How would the simplification process work? Assuming a text is in machine readable format, it can be run through VocabProfile and its potentially difficult or unuseful vocabulary identified (difficult in terms of the intended readership’s lexical profile). Decisions would then be made as to whether each word was a proper noun or place name posing no problem and hence could be recategorized as a common item, was a crucial or repeated item requiring a contextual gloss in more basic language, or was neither and should be written out of the text. When the profile of the text as a whole shows roughly 5 per cent challenging yet learnable vocabulary, the chapters can then be run through Text_Lex_Compare in sequence to determine whether the density is proportional over the course of the text. If not, further modifications are necessary. This is not easy work; anyone who has done it knows why Longman and the others take care with the distribution of their simplified readers. But with these computational tools it is feasible work.
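The per-word decision procedure just described can be sketched as a simple triage function. The band numbers and the repetition threshold below are illustrative assumptions, not a published rule:

```python
def triage_word(word, frequency_band, is_proper_noun, occurrences_in_text):
    """Decide the fate of a word flagged as difficult by a vocabulary
    profile. Returns 'keep' (recategorize as a common, known item),
    'gloss' (explain in more basic language in context), or
    'rewrite' (write it out of the text)."""
    if is_proper_noun:
        return "keep"                # names and places pose no real problem
    if frequency_band <= 2:          # within the first 2000 word families
        return "keep"
    if occurrences_in_text >= 3:     # crucial or repeated item
        return "gloss"
    return "rewrite"                 # rare, incidental item

print(triage_word("Yukon", 9, True, 1))         # keep
print(triage_word("progeny", 5, False, 4))      # gloss
print(triage_word("perambulate", 9, False, 1))  # rewrite
```

Applied word by word and re-profiled after each pass, a procedure like this converges on the roughly 5 per cent challenging-yet-learnable target described above.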


Conclusion

I hope I have convinced the reader that the role of the computer within the expanded universe of text can and should go well beyond the functions of delivery, distribution, and printing.

At the top of this paper I proposed that “computer programs, accessing large shared text repositories, have a tremendous potential to both resolve old questions for teachers/course designers, and provide new and unique opportunities for large numbers of learners at low cost.” Within one domain, extensive reading, and using vocabulary growth as the index of the success of extensive reading, I have shown how corpus analysis can define some of the key problems of growing a lexicon through reading, and how the networking of different kinds and forms of texts on the learner’s computer screen should be able to solve them.

The key problems of learning through extensive reading are clear. Corpus analysis shows that post-1000 level words are unlikely to be encountered in natural reading in sufficient numbers for learning to occur. VocabProfile analysis shows that the amount of new vocabulary in natural texts is likely to be severely at odds with both the lexical level and learning capacity of intermediate learners. Text_Lex_Compare further shows that the rate of new word introduction in a text designed for native speakers is far more than these learners are able to cope with. And yet these same tools can also be employed positively, to help with the adaptation of texts that learners can read and learn from.

The long-term goal is to build a shared free online universal library of graded reading materials. The short term goal is to get available texts online and help learners use the various tools described above that can proliferate encounters, keep track of encounters, multi-contextualize meanings, and provide minimal-disruption links to top quality learning resources. Only local teachers and course designers can accomplish any significant part of this in any coherent way. Ironically, Lextutor’s records and user correspondence show that the main users of the Website’s learning tools at present are individual learners. In other words, the market is there.

Krashen (1989) remarked in a deservedly famous paper on vocabulary growth from reading that a number of books can be purchased for the price of one computer, implying that the books were the wiser choice. In 2005, books and computers are less a choice than a partnership.

References

Author, T., Greaves, C., & Horst, M. (2001). Can the rate of lexical acquisition from reading be increased? An experiment in reading French with a suite of on-line resources. In P. Raymond & C. Cornaire (Eds.), Regards sur la didactique des langues secondes (pp. 133-153). Montréal: Éditions logique.

Author, T., & Stevens, V. (1996). A principled consideration of computers and reading in a second language. In M. Pennington (Ed.), The power of CALL (pp. 115-136). Houston: Athelstan.

Author, T. (1997a). Is there any measurable learning from hands-on concordancing? System, 25(3), 301-315.

Author, T. (1997b). From concord to lexicon: Development and test of a corpus-based lexical tutor. Concordia University: Unpublished PhD dissertation. [Available at http://www.nlc-bnc.ca/obj/s4/f2/dsk3/ftp04/nq25913.pdf .]

Author, T. (1999). Applying constructivism: A test for the learner-as-scientist. Educational Technology Research & Development, 47 (3), 15-33.

Blue Yonder website for extensive reading. [Online]. Accessed 2005 April 20, at http://gradedreading.pwp.blueyonder.co.uk .

Chun, D., & Payne, S. (2004). What makes students click: Working memory and look-up behavior. System, 32, 481-503.

Extensive Reading Pages website. [Online: http://www.extensivereading.net/.] 

Heatley, A., & Nation, P. (1994). Range. Victoria University of Wellington, NZ. [Computer program, available at http://www.vuw.ac.nz/lals/.]

Hirsch, D., & Nation, P. (1992). What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language, 8(2), 689-696.

Horst, M. (2000). Text encounters of the frequent kind: Learning L2 vocabulary from reading. University of Wales (UK), Swansea: Unpublished PhD dissertation.

Horst, M., Author, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11(2), 207-223.

Horst, M., Author, T., & Nicolae, I. (2005). Expanding Academic Vocabulary with a Collaborative On-line Database. Language Learning & Technology, 9 (2), 90-110.

Horst, M., & Meara, P. (1999). Test of a model for predicting second language lexical growth through reading. Canadian Modern Language Review, 56 (2), 308-328.

Hulstijn, J. H., Hollander, M., & Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. Modern Language Journal, 80, 327-339.

Krashen, S. (1989). We acquire vocabulary and spelling by reading: Additional evidence for the input hypothesis. Modern Language Journal, 73, 440-464.

Krashen, S. (2003). Explorations in language acquisition and use: The Taipei lectures. Portsmouth NH: Heinemann.

Kucera, H., & Francis, W. (1979). A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Revised and amplified from 1967 version). Providence, RI: Brown University Press.

Laufer, B. (1992). How much lexis is necessary for reading comprehension? In P.J. Arnaud & H. Béjoint (Eds.), Vocabulary and applied linguistics (pp. 126-132). London: Macmillan.

Laufer, B., & Nation, P. (1999). A vocabulary size test of controlled productive ability. Language Testing, 16(1), 33-51.

Mondria, J-A., & Wit-De Boer, M. (1991). Guessability and the retention of words in a foreign language. Applied Linguistics, 12 (3), 249-263.

Nation, P. (1990). Teaching and learning vocabulary. New York: Newbury House.

Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Read, J. (2000). Assessing vocabulary. New York: Cambridge University Press.

Simpson, R., Briggs, S., Ovens, J., & Swales, J. M. (2002). The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.

Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55-89.

Stanovich, K.E., & Cunningham, A.E. (1992).  Studying the consequences of literacy within a literate society: The cognitive correlates of print exposure. Memory & Cognition, 20, 51-68.

Wesche, M., & Paribakht, S. (1996). Assessing vocabulary knowledge: Depth vs. breadth. Canadian Modern Language Review, 53(1), 13-40.

Zahar, R., Author, T., & Spada, N. (2001). Acquiring vocabulary through reading: Effects of frequency and contextual richness. Canadian Modern Language Review, 57(4), 541-572.