Language learners know there are many words out there for them to learn, and they are ever enjoined by their instructors to learn these words by contextual inference from text rather than from the small bilingual dictionaries they usually prefer. Concordance analysis is able to present learners with words in a wide variety of contexts quickly and efficiently, and also indicate by the number of occurrences roughly how much learning attention each deserves, so a first draft of a falsifiable hypothesis is that some variant of the corpus-concordance configuration could be an effective tool for learning words through inference.
It is often difficult to find a relevant literature to review for novel uses of technology like concordancing. In this case, while the application of corpus technology to vocabulary acquisition may be a novel idea, learning words from written context is not. Learning from context is one of the most written-about topics in language instruction, and it is mainly in this literature that issues relevant to instructional concordancing can be found.
In the 1970s there was a consensus based on very little research that learning words from context was natural and easy, following the so-called psycholinguistic reading theory of Goodman (1967) and Smith (1971). Goodman (1973) argued that the principles of this theory were universal and therefore applied to second language reading as much as first; Clarke and Silberstein (1977) and Coady (1979) proposed specific ways that the theory could be adapted to second language pedagogy. The practical outcome of this reasoning was that vocabulary instruction changed from direct word-training to indirect skills-training, the main skill being the guessing of word meanings from context. Words themselves were no longer taught in any systematic way; the notion of vocabulary control in instructional materials virtually disappeared, and presenting students with lists of words came to seem ridiculous. Why make students labour over specific words, when with one high-level generative skill all the words can be had for free?
However, when the context idea was subjected to empirical investigation, particularly with regard to second language reading, it was quickly complicated by a mass of complex findings. A series of studies in the early 1980s made guessing seem a very dubious activity for learners to engage in or instructors to promote. For example, a study by Haynes (1983) found that most types of context were actually of quite limited use to most learners. The reasons for this and many similar findings became clear when researchers looked into the thought processes behind learners' inferences. Laufer and Sim (1985) had Hebrew learners of English talk aloud their answers to comprehension questions on a text, and listened for a pattern in the learners' approach to new words. The typical learner "tends to look for cues in the word itself, its morphology, and its resemblance to words in other languages, rather than using contextual clues." For example, reading about people "who took their holidays in spas where they spent their time relaxing in the hot water pools," learners glossed "spas" as "space" on the basis of an orthographic resemblance while ignoring the total mismatch with the larger context.
Of course, some guessers were more successful than others. Van Daalen-Kapteijns and Elshout-Mohr (1981) performed an experiment with Dutch second-language learners who had previously been classed high and low verbal by an IQ test. The researchers presented each subject with a neologism and then a series of sentences using that neologism. After each new sentence was added, subjects were asked what the neologism meant in the light of the total information. Two very different ways of handling the task emerged: high-verbal learners integrated the growing information supply, searching for a core of invariance while expanding, adjusting, and integrating the contexts. Low-verbal learners were buffeted by each new context, working through a succession of unrelated theories about the word's meaning. In other words, inference may be a game for the bright.
Or, it may be a game especially not for the bright. Language learners in a study by Parry (1991) talked aloud their ways of dealing with novel words in text, showing reasonable ability to get partial sense of meanings. Parry assumed these partial meanings would be integrated with others when the words were encountered a second and third time, but found unexpectedly that after even a short delay the words had often been forgotten. Moreover, it was often the best guessers who were the worst forgetters. Good guessers were able to perceive the main lines of a text very quickly, fill in semantic gaps left by unknown words, and then show no vocabulary gain between pretest and posttest.
But individual differences are not the only source of variance in inferring word meanings. Beck, McKeown and McCaslin (1983) demonstrated that many natural contexts reveal little or nothing about what words mean, regardless of the amount of verbal ability applied to them. They targeted random words in a first-language basal reader, classifying each as having one of four levels of contextual support ranging from totally redundant to totally misdirective, all of which were about equally present. The target words were blacked out, and adults guessed the missing words from the context. Predictably, their success tracked the level of contextual support for the target word, from misdirective context (3% correct) to redundant (86%). There is no reason to think this finding would not be replicated in a second-language context, only more so, since the proportion of low-support contexts would rise with the proportion of unfamiliar words.
In the face of the type of evidence just cited, steps were taken to rescue guessing theory. One idea was to present learners with to-be-learned words in very clear, almost redundant contexts specially designed for word-learning, an idea influentially promoted by Schouten-van Parreren (1985) under the heading "pregnant contexts." However, a test of the idea by Mondria and Wit-de Boer (1991) demonstrated that while a very rich context may make a word easy to guess, it also makes it hard to remember. The reason is possibly that when the meaning of the overall sentence or passage is utterly clear, learners assume they know the constituent words and pay no attention to them. In other words, "the inherent difficulty of guessing in highly pregnant contexts is too low to bring about a positive learning effect" (p. 262). Sharwood Smith (1986), Haastrup (1989), and Stein (1993) provide related arguments.
Another problem with pregnant contexts is that they prevent learners from developing what Beck and McKeown (1991, p. 809) call "one of the most important insights we [teachers] can pass on" about context, namely the conscious or metacognitive ability to distinguish helpful from unhelpful contexts. The importance of this skill comes to the fore when learners leave the classroom and must continue to acquire vocabulary on their own with no one devising pregnant contexts for them.
So, if there are problems with both natural and staged contexts, how are words ever learned? One response to the problem of variable learning conditions in natural contexts is just to accept that this is life and the way words get learned, to the extent that they do. Sternberg's (1987) view is that contextual inference is not the best or even a very good way to learn words, just the way most words get learned. Children simply know too many words to have been taught them directly or looked them up in dictionaries. By the end of school, they know at least partially many or most of the 88,700 words of "printed school English," as calculated by Nagy and Anderson (1984), a number instruction could not have much effect on. Whether in a first or later language, the disparity between words taught and words learned leaves either incidental or inferential acquisition from listening or reading the default word-teacher. And since conversations, situations, and even television tend to be lexically repetitious (West and Stanovich, 1991), vocabulary expansion must take place mainly through incidental exposure during reading.
Logically, incidental learning over time makes sense. In texts, words are visible, noticeable, repeated, reviewable, and so on, and over time a learner will meet words in every type of context (level of support, degree of memorability, etc.) required for learning. Empirically, however, the default argument has been hard to demonstrate, let alone build a pedagogy on, and several experimental studies have actually cast doubt on whether it takes place to any great degree (for example, Jenkins, Stein, and Wysocki, 1984).
In a long series of experiments starting from a study by Anderson and Freebody (1979), Anderson, Nagy and colleagues at the Center for the Study of Reading at the University of Illinois developed a methodology and instrumentation to prove the existence of incidental acquisition. Acquisition of vocabulary from reading, they argue, is hard to demonstrate only because of the way word knowledge is measured. The finding that words are not normally learned from reading (Jenkins and colleagues) depends on a crude binary measure of word knowledge: a word is either known or unknown. If the measure of word knowledge is ability to define a new word or use it correctly in a sentence, yes or no, then it is easy to show that no learning results from meeting a new word once or twice in reading. However, words are not learned all at once but incrementally, with productive or definitional knowledge at the end of the process not the beginning. Finer-grained measurements can be developed that are sensitive to these increments over the course of learning.
For example, a child reads, "The protagonist went out looking for the lion that had been terrorizing the villagers." The child will then display different understandings of "protagonist" depending on whether he or she is asked to define the word, use it in a novel sentence, or answer questions like, "Is the protagonist a person?" or "Does the protagonist like the villagers?" With such questions as the pre-post instrument, it will be seen that measurable learning results from meeting the word in just one context.
Nagy, Herman and Anderson (1985) demonstrated that some appreciable learning takes place on almost every encounter with a new word. The probability of a word being learned in a single occurrence is as low as .15, later revised down to .05 (Herman, Anderson, Pearson, and Nagy, 1987) or 1 chance in 20. Nevertheless, with an average exposure to a million words of text per year at school, incidental acquisition was shown by simple arithmetic to account for adult vocabulary size.
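The "simple arithmetic" can be sketched as follows. Only the .05 per-encounter probability comes from the studies cited; the share of unknown tokens and the years of schooling are illustrative assumptions, not figures from the literature:

```python
# Back-of-envelope version of the incidental-acquisition arithmetic.
# Only p_learn comes from the literature (Herman et al., 1987); the share of
# unknown tokens and the years of schooling are illustrative assumptions.

words_read_per_year = 1_000_000   # average school reading exposure (Nagy et al.)
p_learn = 0.05                    # chance a new word is learned per encounter
share_unknown = 0.02              # assumed share of tokens new to the reader
years_of_schooling = 12           # assumed span of school reading

unknown_encounters = words_read_per_year * share_unknown   # 20,000 per year
learned_per_year = unknown_encounters * p_learn            # 1,000 per year

print(f"{learned_per_year * years_of_schooling:.0f} words")  # prints "12000 words"
```

Under these assumptions, a 1-in-20 chance per encounter compounds into a vocabulary on the order of the adult sizes reported by Nagy and Anderson (1984).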
And what about the problems of learning from context, the misuse of context clues and the varying clarity of natural contexts? Nagy and colleagues argue that these problems are simply absorbed by the volume over the course of growing up, the million words of running text per year. Of course, not all students read that much, but (and here is the pedagogy) every student should be encouraged to read as much as possible. Wide reading is the only way to assure that clear contexts outweigh unclear over the long run, and that incremental learning proceeds all the way to roughly standard adult understandings of words.
In other words, Goodman and Smith were right about learning from context, except that it is hard, not easy, it takes a long time, not a short time, and there is no guarantee that the learning will go all the way.
This first-language research was imported directly into second-language theorizing, notably by Krashen (1989). Empirically, Nagy and colleagues' results have been more or less replicated in second language contexts, although with slower rates of acquisition. A typical finding is that on the basis of a single encounter, 3 out of 28 (11%) new words in a text achieve some sort of appreciable learning for fairly advanced European learners (Pitts, White, and Krashen, 1989), perceptible but minute learning. Horst (1995) replicated the study with Arab students, obtaining an even smaller gain between pre-test and post-test, an average of 1.3 words learned out of 16 (8%), though one still significant (t(25) = 2.66, p < .05).
Looking not just at products but also processes, Parry (1991) confirmed the point that when fine measures of word knowledge are applied then slow incremental word learning is revealed. In a series of talk-aloud studies of academic second-language learners glossing novel words in text, Parry found that the meaning representations derived from a single exposure tended to be not so much wrong as partial: "Each [learner] recorded a substantial proportion of correct glosses and got more than half of the total at least partly correct" (p. 640). And what happens to these partial meanings gleaned from texts? Parry hypothesizes "that a trace of each inference will remain to be modified in subsequent encounters with the word." The partial, semi-correct trace representations will gradually add up to complex, correct ones, provided the word is encountered several more times, presumably within some sort of time limit to accommodate memory.
So, second-language learners should be encouraged to read more, a lot more, to parlay minute learning into functional lexicons. Krashen (1989) and many of the contributors to Carrell, Devine and Eskey (1988) propose various schemes for massively increasing the amount of reading second-language learners will do.
However, it is not clear that the advocates of wide reading in second language have worked out the math in days and hours, particularly for settings outside Europe and North America. Meara (1988) argues that Nagy and colleagues' figures could hardly be directly applicable to second language learning situations, where few learners are exposed to a million words of running text per year, and indeed "the figure is more likely to be in the region of a few thousand" (p. 11). And the years available for lexical acquisition are more likely to be in the region of one or two rather than 10 or 15.
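Meara's objection is easy to make concrete in numbers. In the sketch below, only the .05 per-encounter learning probability comes from the literature; the exposure figures and the share of unknown tokens are illustrative assumptions:

```python
# Contrast between first-language exposure (Nagy and colleagues' scenario) and
# Meara's estimate for second-language learners outside Europe/North America.
# Only p_learn is from the literature; other figures are illustrative.

p_learn = 0.05          # per-encounter learning probability
share_unknown = 0.05    # assumed share of running words that are new

exposures = {"L1 reader": 1_000_000, "L2 learner": 5_000}  # words per year
for label, words in exposures.items():
    learned = words * share_unknown * p_learn
    print(f"{label}: roughly {learned:.0f} new words per year")
```

On these assumptions the second-language learner picks up barely a dozen words a year incidentally, which is why exposure, not mechanism, is the bottleneck Meara identifies.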
If there are so many problems with learning from context in a second language, why not simply return to learning from dictionaries, the cognitive tool of choice over the centuries during which words and languages somehow got learned? And of course electronic dictionaries are now widely available, monolingual and bilingual.
Dictionaries have their uses in vocabulary expansion, but there are logical and empirical reasons not to rely on them as the primary tool of word learning. Logically, the traditional genus-and-differentia structure of a definition is inherently unsuited to learning, especially in the case of the high-frequency words of a language that typically occupy second-language learners. A dictionary definition starts by categorizing the look-up word at the next higher order of generality, i.e. the next lower order of frequency, so that words are explained via others even less likely to be known ("a car is a vehicle which..."). If learners are looking up "car," what hope is there that they know "vehicle"? Or take Merriam-Webster's "give": "applicable to any passing over of anything by any means." Experiments are under way with the performance properties of definitional formats (Crystal, 1986; Sinclair, 1987b; McKeown, 1993; Cumming, Cropp and Sussex, 1994; Nist and Olejnik, 1995) but with no firm recommendations as yet.
Empirically, several studies have determined that neither young nor second-language learners are always able to get good information from dictionaries. In a study of first-language children's dictionary use, Miller and Gildea (1985) showed children consistently short-circuiting even simplified definitional information. The task was to read a definition and then write a sentence incorporating the word. In a large number of cases, children fixated on a familiar word or phrase in the definition and then built their sentence around that, ignoring the rest of the information, a strategy the researchers called "kidrule." For example, a fifth-grader read that "erode" means "eat out, eat away," and, since "eat out" was familiar, wrote "Our family erodes a lot." Kidrule was consistent enough to suggest that the definitional format itself presents a barrier to learning, at least in children. However, the strategy is also used by adult second-language learners. Nesi and Meara (1994) replicated the finding with academic language learners in Britain, and Horst (1994) with similar learners in Oman.
With an on-line or CD-ROM dictionary, the kidrule problem could be predicted to be worse not better. Kidrule is an attempt to limit exposure to the large amount of information contained in a definition, and of course one of the vaunted advantages of electronic dictionaries is the removal of the space limitations of paper so that definitions can contain even more information. Sub-senses can proliferate and examples abound, all the more for kids and language learners to ignore. As Nesi and Meara (1994) suggest, "longer entries may create their own particular problems; it is possible that only part of a longer entry will be attended to, and this part may not even be the kernel definition, but may be an example phrase which simply provides context" (p. 5).
But of course the usual information-management strategy of language learners is not to use kidrule with a real dictionary, but to access brief translation equivalents with a small bilingual dictionary. While this strategy may be justifiable in the early stages of language learning, it is ultimately limiting, resting on an assumption that terms in two languages have identical semantic coverage as they rarely do. Electronic bilingual dictionaries are now available, but more sophisticated technology is of little use unless it somehow promotes a more sophisticated learning strategy. So far, there is no sign that it does; Bland, Noblitt, Armstrong, and Gray's (1990) study of learners' use of on-line bilingual dictionaries reveals mainly the extent of their "naive lexical hypothesis" (that words map one-to-one between languages) and its costs in misunderstanding.
Whatever the quality of a definition or the capacity of the learner to use it, there is evidence that definitional knowledge is in any case not the most useful kind of knowledge to have about words. A counter-intuitive but often replicated finding in first-language reading studies is that in spite of the correlation between reading comprehension and vocabulary size, merely learning to state the meanings of words does not in itself affect the comprehension of text using those words. The classic papers on definitions and comprehension are Mezynski's (1983) review and Stahl's (1991) update. After examining several approaches to vocabulary instruction, Mezynski concludes as follows:
The results from eight vocabulary training studies demonstrated that it is relatively easy to increase students' word knowledge, at least to the extent that they can give definitions of words. However, several studies indicated that students could know definitions, yet apparently be unable to use the words to comprehend textual information (p. 272).
The few training methods examined by Mezynski that actually did produce word knowledge that affected comprehension had just two features in common, multiple contextualizations for each word and some way of getting learners to become active seekers of information about words. Of course, both of these can be produced in a classroom. The classic instance is a first-language training program developed and tested by Beck, Perfetti and McKeown (1982) and re-tested by McKeown, Beck, Omanson and Perfetti (1983). Beck and colleagues had students recycle words several times as definitions and example sentences, go outside the classroom and find instances in the community, and so on, all with a major teacher involvement. This elaborate training produced strong gains on all forms of word knowledge, including comprehension, transfer to novel contexts, even speed of lexical access.
However, only 104 words could be taught to this "rich" level in 5 months of instruction, about 20 words a month. In other words, the instructional pace was not much swifter than natural acquisition from reading, as detailed by Nagy and Anderson.
J. B. Carroll (1964) expressed long ago a wish that a way could be found to mimic the effects of natural contextual learning, except more efficiently. Beck and colleagues' training program appears to mimic, but not much more efficiently. Maybe efficiency is impossible in this area. Krashen (1989) argued that there are no shortcuts, whether definitions, mnemonic strategies, wordlists, or training in context clues:
It thus appears to be the case that vocabulary teaching methods that attempt to do what reading does-give the student a complete knowledge of the word-are not efficient, and those that are efficient result in superficial knowledge (p. 450).
Somehow, learners must put in the time, whether it is extensive reading or taking part in a training program like the one described by Beck and colleagues.
And yet there are many educational situations where the time for either is unlikely to be found, and as noted above one of these is second-language learning, especially where the goal is to get on with English-medium academic courses with the least possible delay. Martin (1984) summarizes the problem and poses the question to be answered in this study: "The luxury of multiple exposures to words over time in a variety of meaningful contexts is denied to second and foreign language students. They need prodigious amounts of information within an artificially short time. How can this enormous amount of information be imparted?" (p. 130)
The main idea of education, arguably, and educational technology definitely, is that learning processes can be made more efficient than they would be left to themselves as trial-and-error sequences. The proposal here is that corpus and concordance software might be able to mimic the main features of natural lexical acquisition from text, so that word knowledge affected comprehension, but more efficiently than through either massive reading or an intensive training program. Mezynski's two distinguishing features of successful off-line training regimes were multiple contexts and active learning set, both of which are integral parts of a concordance program that could form the basis of a lexical tutor.
A corpus-based lexical tutor could also respond to some of the specific problems and paradoxes of word learning raised above:
1. If the time is just not there for massive exposure to text as in first language acquisition, then some sort of compressed exposure to a large corpus might be a substitute. A corpus that comprised, say, a complete term's reading would let learners view or review the texts of their courses from a time-collapsed focus on individual words, bringing occurrences together for integration that are otherwise distributed through time and likely to be forgotten.
2. The problem of variability in contextual support might be less problematic in a large corpus than it is in smaller texts, since with multiple contexts on display one of them, or some combination of them, would probably make sense. This follows Krashen's (1982, 1989) idea that while learning takes place only through "comprehensible input," learners are nonetheless capable of selecting or negotiating (Hatch, 1978; Larsen-Freeman and Long, 1991) from raw input the parts that are comprehensible to them. A concordance might function as an interactive inference-support tool, helping readers make text comprehensible just as conversation with native speakers helps them make spoken input comprehensible.
3. The problem of learners not knowing helpful from less helpful contextual information might be reduced by concordancing, since searching through several contexts gives practice in making this distinction.
4. The problem of pregnant contexts would not occur. A new word is unlikely to disappear into its context if it is the focus of the exercise, and a natural corpus of any size is unlikely to contain only or mainly pregnant contexts.
5. The problem of good readers gliding along on meaning, unfocused on individual words, would be unlikely to occur, since the text displayed on a computer screen is long enough to make an inference but too short to build up much glide-speed.
6. A concordance offers little encouragement to the naive lexical hypothesis, exposing in plain view as it does the many semantic and collocational differences between roughly similar words in different languages. For example, the Arabic for "thing" is "shay," but if a learner looks at a concordance before consulting his bilingual dictionary the one-to-one hypothesis is unlikely to be confirmed:
"Things" in Alice in Wonderland come in many guises, most of them unequatable to Arabic uses of "shay" in any simple way: "seen such a thing" (unlikely occurrence); "poor little thing" (child); "learnt several things" (facts); "things had happened" (events); and so on. In the concordance tutorial proposed here, no attempt is planned to ban dictionaries of whatever sort from the word-learning process, but merely to delay their entry until several natural contexts have been considered (a sequence also proposed by Anderson and Nagy, 1991).
In other words, a corpus tutor might be uniquely able to overcome some of the specific problems and paradoxes of vocabulary acquisition in a second language. Further, the comprehension issue now allows a refinement in the hypothesis of this study. The draft hypothesis in the previous chapter was that concordancing can simulate important aspects of vocabulary acquisition from natural reading but in a reduced time frame. The refined hypothesis specifies "important aspects"-words will be learned so that they can later be comprehended in novel contexts.
However, one problem raised above that a concordance tutor definitely would not solve is the suggestion raised by Van Daalen-Kapteijns and Elshout-Mohr (1981) that low-verbal learners will just never be able to integrate multi-contextual information into complex representations of word meaning. If true, this would limit participation to a certain type of learner, and argue against the development of a corpus-based lexical tutor in any setting where resources were constrained to any degree.
After years of speculation concerning the special mental powers needed to learn new words from context (Jensen, 1980; Sternberg, 1985), or alternatively about the ease and naturalness of learning new words from context (Goodman, 1967; Smith, 1971), some light has recently been shed on this dark area. It now appears that people of average intelligence are moderately able to learn new words from written contexts, provided (1) they see the word a few times, and (2) they are familiar with most of the other words in the context. In a study of several factors affecting learners' ability to use contextual information, Shefelbine (1990) showed that intelligence is relatively minor, with the size of one's existing vocabulary, i.e. the base for making inferences, claiming the main variance. Similarly, West and Stanovich (1991) plotted several factors against vocabulary size in a regression analysis, and found that amount of print exposure claimed more variance than score on the Scholastic Aptitude Test (SAT).
But has not the verbal intelligence problem gone away only to be replaced by one just as serious for inferential learning, that learners are unlikely to know the words in the contexts they are inferring from? It is indeed a tough paradox that you need words to learn words, but it can be softened by three factors. First, as mentioned, with several contexts accessible, a learner is likely to find one where he or she knows enough ambient words to make a useful inference. Second, a finer-grained picture of exactly how many words are needed to make inferences is becoming available through corpus analysis, an idea to be explored in this study. Third, a corpus tutor can be designed to contain elements of both direct instruction and dictionary work in the initial bootstrapping phase.
There is a space in instructional research for concordancing as a word-learning tool. Exploring this space has suggested some of the design parameters for such a tool, to be discussed in a subsequent chapter. But first, why should meeting a word in several contexts be so important to comprehending it in a novel context? An examination of relevant learning research is the subject of the next chapter.