Do corpus-based electronic dictionaries replace concordancers?

 

Tom Cobb

Dépt de linguistique et de didactique des langues

Université du Québec à Montréal

Canada

In B. Morrison, G. Green, & G. Motteram (Eds.) (2003).
Directions in CALL: Experience, experiments, evaluation (pp. 179-206).
Hong Kong: Polytechnic University.

 

Costs and benefits of examples in language learning

It is universally acknowledged that language learning proceeds by examples. Learners need to meet and process many, many examples of the words and the structures of the language they are learning. There is an argument that they also need to meet explanations, but at present the evidence seems to favour examples. Of course, useful examples of the different linguistic features are not so easy to come by or make sense of, and one of the things language instruction can do for learners is to assemble, contextualize, clarify, and sequence the examples, or inputs, that fuel language learning.

 

A major source of both explanations and examples for many learners, particularly in moments of genuine curiosity, has always been the dictionary. The examples that accompany a definition are probably the main source of learning from a dictionary consultation, possibly assisted by the categorization of examples which the format provides, with the role of the definition itself remaining unclear. However, for as long as there have been dictionaries there has been a premium on page space that has limited the number of examples that can be provided. But there is no such premium on electronic space. In a print medium, every additional example is an additional cost, but in an electronic medium it is free of charge. For example, a concordance program linked to a text corpus can provide any number of examples for a word or structure of interest (see Figure 4).

 

The problem with the examples provided by a concordance program is that, while numerous, these are somewhat raw and have not been pre-processed for learning purposes. Despite this, several studies over the 1990s have shown that language learners can use concordance examples to good effect for such purposes as lexical acquisition (Cobb 1997, 1999; Cobb & Horst, 2001), error correction (Gaskell, 2002; Ng & Burton, 2001), and translation (Bowker, 1999). Some of these studies show greater learning benefits from concordances than from dictionaries for equivalent tasks, although it is probable that the concordances were mainly used in conjunction with dictionaries.

 

Recently, the cost of examples has dropped dramatically in ESL lexicography. The main publishers of learner dictionaries (e.g., Cobuild, Longman, and Cambridge) have now made their products available in electronic format, and at the same time they have devoted considerable resources to building and analyzing corpora as the basis of the information they offer. So the concordancer's monopoly on large numbers of cheap examples is, in principle, over. The potential impact of this change was probably limited in the days when electronic meant dedicated single-machine resources, for concordancers and dictionaries alike, but both are now available online. Pioneers like Chris Greaves at The Hong Kong Polytechnic University made Web-based concordancing available in the late 1990s, and now the main learner dictionaries are available online as well. These dictionaries can support their entries with as many examples as seem pedagogically indicated, from their own corpora, with no limitations on space. So it would seem that learners, at least of English, are in line for a long-awaited bonanza of examples, free of charge. Is there, then, any further need for learners to work with the raw and often messy data of concordance output?

 

The purpose of this paper is to investigate the treatment of examples in some of the new online learner dictionaries, in terms of what we know about learning from examples. The investigation asks questions about the quantity, quality, accessibility, and completeness of examples provided by three major online ESL learner dictionaries. In answering these questions, I will refer to research and media development work that my students and I have been doing since the ITMELT 2001 conference.

 

 

Question 1: What is the quantity of examples in an online entry?

Some corpus-based dictionaries advertise the number of examples they include. The Cobuild Learner’s contains 55,000 examples and its Advanced Learner’s 105,000 (according to the commercial website at http://titania.cobuild.collins.co.uk/cat-dictionaries.html). But how do these large numbers divide up into typical entries? To investigate this, I will compare entries from the three main online English learners' dictionaries for a single word, complain, a word with different forms (complaint, complainant), collocations (complain of, complain about), and senses (have a complaint, suffer from a complaint). Figures 1 through 3 are screen entries for complain from three leading learner dictionaries.

 

 

Figure 1: Cambridge Learner’s Online Dictionary


Figure 2: Cobuild Learner’s Online Dictionary


Figure 3: Longman Dictionary of Contemporary English (LDOCE) Online

 

There are some interesting things to notice about these treatments of examples, particularly if one has the paper version of the same dictionary handy for comparison (although it is not always clear which edition of the offline dictionary an online version corresponds to). I was struck by the following:

 

·         None of the online entries gives the learner more examples than the corresponding offline version, and one actually gives fewer. Longman has eight examples, one for each major argument structure, exactly as in the paper version. Cobuild online has about half as many examples for complain as the corresponding entry in the paper dictionary (four online as opposed to seven on paper, p. 327).

·         The main purpose of at least one of the dictionaries (Cambridge) seems to be to direct the reader to an offline purchasing opportunity. Advertising occupies most of the screen space that could otherwise have carried more examples of the headword.

 

·         Two of the entries give no hint that there might be other forms of complain (complaint, complainant). Only Longman has a "next entry" feature (accessed with the two arrows at the bottom of the entry) allowing the user to explore for other members of a word family, at least to the extent that its members are nearby alphabetically. Neither Cobuild nor Cambridge offer any easy way to explore for related forms. Offline, of course, exploring up and down the page for related forms is one of the pleasures of a dictionary. Cambridge even forces users to back up a page just to choose complain of, should they happen to remember it was one of the options on the page they came from.

 

·         An exploration option provided by Longman that is not visible in Figure 3 is that the learner can click on any word in an entry to get a definition of that word from elsewhere in the dictionary. Longman also invites users to share computer code that will let them click-connect their own texts to the dictionary in the same way (although obtaining the code requires a response by email, which I had not received after nine weeks).

 

To summarize, it seems that none of these dictionaries has done much to exploit the electronic medium's capacity, and only Longman has done much to exploit its interactivity. Longman provides the most examples and has also extended some of its best ideas from the paper version. One of these is to make entries comprehensible with a 2000-word defining vocabulary, logically extended online by the within-entry clicking opportunity to define any word in a definition. So, if we take Longman online as the best of the entries, with eight examples and fairly easy access to more, is there anything missing even here that a concordancer can provide, or is this number of examples adequate? Figure 4 shows a concordance for starts-with complai~ from the Brown corpus, using Chris Greaves' concordancer on my Lexical Tutor website.

 

Concordances for starts with complai = 60

1    f them, moreover, are beginning to complain about the scarcity of Western amu
2    lt an inward shiver. "I sure can't complain about the service in this place",
3    chermen like himself. They did not complain at the inhuman hour of starting (
4    king a bath. He says the neighbors complain, but I don't believe it. Why don'
5    ey add up to headlines? You should complain".    He crossed the street and wa
6    hosen me.    Actually, I shouldn't complain, I told myself in the shaving mir
7    plaint "most strongly against" the complainant. In other words the burden of 
8    13.8 gm., but on Oct. 20, 1958, he complained of "caving in" in his knees. By
9     very few letters in which he ever complained of Meynell, Thompson told Patmo
10   f 734 Hartford Avenue, Providence, complained of shoulder pains after an acci
11   d purity of her style. And when he complained of the lack of time for all he 
12   of season, in venison pies. No one complained of the white wine either: at th
13   ck of interest and attention, Mary complained often that he didn't help aroun
14   e counselor. This woman repeatedly complained she was "too tired" for marital
15   en his bodyguard, Yankee Schwartz, complained that he had been snubbed by Dav
16   ew not to their taste, others have complained that he makes the Tory traditio
17   would impair contracts.    He also complained that not enough notice was give
18   tment, hard-bitten Russian experts complained that the Capitol was out of its
19    Democrats, the Woonsocket Patriot complained that the Virginia authorities s
20   erred. Miriam sniffed at this, and complained that Wright had said unkind thi
21   id she would only hurt herself. He complained to me once that I must talk to 
22   vant maid of Gorton. The old woman complained to the deputy governor, who ord
23   8 1040 11    it go free.    Morgan complained to Washington about the men det
24   he agility of panthers. But no one complained when they wound up, regardless 
25   vice president of the City Council complained yesterday that there are "defic
26    "Argiento, this is senseless", he complained, not liking to work on the wet 
27   t know where he went. Not that she complained, or had any cause to. Four or f
28   rsting my lungs for you", Mr. Jack complained. He was standing in front of th
29   ead of the usual three weeks. Pels complained: "Litigants and witnesses were 
30    shareholders of these four funds, complaining about mistakes in their accoun
31   s. You about ready"?    "What's he complaining about"? Bake asked. "They're d
32   time, and two drinks later, he was complaining bitterly about his wife, He wa
33   numerable postmen, who already are complaining of heavy loads and low pay, an
34   refused to approach the armadillo, complaining- in ad-lib- that "it smelled".
35   g at the Hanoverian Succession, he complains, are allowed to pass unnoticed. 
36   rsonal quarrel with Swift. Thus he complains, with considerable justice, that
37   ays in the army, but I get by. Who complains? Many times I tried to reach you
38   te procedure by construing a vague complaint "most strongly against" the comp
39   t risk obviously. But there was no complaint from the Dominican crowds which 
40   ecalled sympathetically the Duke's complaint in Browning's "My Last Duchess".
41   any Village atmosphere. But Krim's complaint is important because not only in
42   ar  B22 1420  1    deficit.    Our complaint is that in many crucial areas th
43    10    #CHARGE LISTS 3 CHECKS# The complaint on which the warrant was issued 
44   lone failed to change the skeletal complaint or the severe muscle weakness.  
45   r, and it gave further grounds for complaint to his overtaxed subjects, who w
46   ture, was pronounced, and a common complaint was "difficulty in stepping up o
47   four years in the mid-1950's, this complaint was heard rumbling up from the S
48   rney Macon Weaver said the federal complaint, charged that the juror gave fal
49   h whole-wheat bread", was an early complaint. Of course they learned in time 
50   s they did in 1960 there can be no complaint. They shouldn't be asked to carr
51    mind?    What is the common man's complaint? Let's take a panoramic look bac
52   quack devices and 10 times as many complaints compared with two years ago.  F
53    down, are come hither also". Upon complaints from the Lower House of Convoca
54   etween faculty and administration, complaints of a lack of communication pers
55   been more to the point. Noting the complaints of inventors and members of the
56   nd John B. Turner of Miami.    "No complaints or charges have been filed duri
57   aimless tacking. Once more, Juet's complaints were the loudest. Hudson's repl
58   ems a pity to have to register any complaints. Still a demurrer or two must b
59   itants of Africa was struck out in complaisance to South Carolina and Georgia
60   utine by having his own firm and a complaisant partner, his work in New York 
Figure 4: Concordance for complain family
 
 

There is a lot of extra information buried in these lines that is vague or absent even in a good dictionary, at least for the learner who knows how to dig it out. The enterprising learner might use the concordancer to discover that:

·         there are several members in the complain family;

·         the complain family is used quite frequently in English, with about 60 lines per million words. A curious concordance user might notice, for example, that complain racks up about the same number of lines as chair (66) but fewer than table (205) in the same corpus;

·         the verb complain is about twice as frequent in English as the noun complaint;

·         the collocations complain of and complain that are roughly equivalent in frequency;

·         the noun complainant is used infrequently (only once in a million words);

·         a complaint in the sense of an illness, while given as a major sense in two of the entries shown, features only twice in the concordance (lines 10 and 44) suggesting the usage may be on the verge of quaintness; and,

·         complain is used more in speech than in writing, with 84 lines in the British National Corpus (BNC) spoken sample corpus, and 48 in the written (both accessible from the same interface).

 

Some of this information, of course, is unlikely to be sought or noticed by an untrained learner, and some of it might even be confusing. For example, the cost of expanding the complain family with the complai~ search is that it brings the unrelated complaisant along for the ride. But this would also be true of the Longman next-entry feature: the next item alphabetically may or may not be a member of the same family. The Longman entry also requires some learner preparation, for example to interpret the codes "S2 W3" (in Figure 3, meaning that the word is within the most frequent 2000 words in speech and 3000 in writing) or some of the grammatical terminology ("transitive not in passive"). Training is required in either case.

 

In terms of quantity, then, it seems that two of the three online learner dictionaries have done little, either with the huge corpora they have assembled or with the capacity and interactivity of the Internet, to give learners any more or better examples of the target language. The third dictionary, Longman's, on the other hand, has used the interactive opportunities of the medium to extend the already strong features of their paper version, if not giving more examples then at least giving better access to the examples already available on paper.

 

 

Question 2: What is the quality of examples in a learner dictionary?


As can be seen in Figures 1 to 3, the examples in all of the entries are clear and understandable. The contexts have been chosen from natural corpora, but chosen carefully in order to make the various meanings of complain evident to the learner. Maybe they have been made too evident.

 

Contexts can make word meanings clear to varying degrees, as argued in a classic paper by Beck, McKeown and McCaslin (1983) entitled Not all contexts are created equal. At one end of a continuum, the new word can be so enmeshed with its context as to be effectively redundant ("His head was cold so he put his ____ on"). At the other, it can be so loosely integrated as to be semi-opaque ("He was cold and put his ___ on"), or even misleading ("He expected people to put coins in his ___"). The dictionary example contexts plainly aspire to total clarity. But it cannot be assumed that clear contexts are the only kind that support learning. A number of second-language reading studies (e.g., Parry, 1991; Mondria & Wit-De Boer, 1991) show that when new words are easy to interpret in fully redundant or "pregnant" contexts, they are often not noticed, let alone retained.

 

The role of different types of contexts in vocabulary acquisition was explored in a study of extensive reading by school learners in Montreal (Zahar, Cobb & Spada, 2001). The learners read a story, The Golden Fleece, and their acquisition of new words in the story was tracked along with the characteristics of the words learned. The words available for learning in the story were identified in a pretest, and the frequency of each was tabulated, as well as the degree of contextual support for each occurrence in the story. Four degrees of support were calculated, according to native speakers' ability to replace each word when removed from its context. Table 1 shows the frequency and contextual support of the most and least learned words. For example, the word centaur, which was central to the mythic tale and was learned by almost all the readers, had appeared seven times in the text in a variety of contextual richness levels from 2 (low) to 4 (high), with a mean of 2.6. Youth, on the other hand, appeared only three times in the text but with consistently strong contextual support (a mean of 3.3). The words that got learned were mainly those that had appeared frequently and in a mix of context support levels; the words that did not get learned were those that had appeared less frequently but sometimes with good contextual support (e.g., drew and youth). What learners find in their learner dictionaries is, of course, a small number of very clear examples.

 

Table 1: Most and Least Learned Words, their Frequency and Context Ratings

Most acquired     Frequency   Contextual Support Ratings        Mean    s.d.
centaur (n)       7           4,3,3,2,2,2,2                     2.60    .58
dove (n)          5           3,3,3,2,2                         2.60    .48
fleece (n)        15          4,4,2,2,2,2,2,2,2,2,2,2,2,2,2     2.30    .56
oracle (n)        9           4,3,3,3,2,2,2,2,2                 2.50    .86
plow (n)(v)       4           3,3,3,2                           2.75    .44
sow (v)           2           3,2                               2.50    .50

Least acquired    Frequency   Contextual Support Ratings        Mean    s.d.
drew (v)          4           4,3,3,3                           3.25    .44
oars (n)          2           2,2                               2.00    .00
sheep (n)         2           2,2                               2.00    .00
youth (n)         3           4,3,3                             3.30    .44

Data from Zahar, Cobb & Spada (2001) in Canadian Modern Language Review 57 (3), 541-572.

 

This finding has since been replicated twice with much larger data sets (by Horst, 2000). There are a number of mechanisms that might explain the phenomenon, but we favoured this one: less clear contexts open up learning spaces for new words, which clearer contexts then fill when they appear (a process that may be recursive).

 

Let us now apply this finding to the examples a learner meets in a good dictionary entry like the Longman entry for complain in Figure 3. Most learners open up their dictionaries precisely when they have met a new word in an unclear context ("He still suffered from the same complaint"). In their dictionaries they then meet some version of the word again but in a clear context ("Dan's been complaining of severe headaches recently"), plus receive extra definitional and grammatical information. So far, so good; this is the desired sequence, a learning space opened up and then filled.

 

However, curious learners will then go on to read above, below, and around the entry they came for, particularly if an interactive computer format has made this an attractive option (Béjoint found that 55% of L2 consulters of monolingual dictionaries were doing this in 1981, so the number is presumably greater now), and here is where they will meet low numbers of new items in high-strength contexts. Our research predicts that the meanings and uses of these new words and senses will seem obvious at the time but will not be retained. This would also be the case when working with the dictionary workbooks that accompany most learner dictionaries, and of course with related Web activities, both of which offer the learner practice with words either in isolation or in pregnant contexts.

 

To summarize, these online learner dictionaries do not provide any great quantity of examples, despite their corpus foundations and freedom from space constraints; nor do they provide the kinds of examples needed for learning, and in principle they cannot. They provide high-quality examples in small quantities, while learners need mixed-quality examples in large quantities.

 

 

Question 3: How accessible are the examples in an online dictionary?

Sometimes learners need to meet quite specific examples of language phenomena, particularly to help with the acquisition of structure. If a learner is writing a sentence that starts, "She always complains…" and is uncertain about how to arrive at "… the weather," then a simple consultation of the entry for complain in any of our online dictionaries will suggest about as the glue to bind the ends together. But if he then writes, "She always goes to home early*," and the teacher flags the word to for revision, the learner might have more trouble finding a useful example involving both go and home to see what the problem is. Looking in a dictionary under either entry would not necessarily indicate how the two words are to be linked, since the phrase is not an especially frequent or interesting collocation. All three of our dictionaries probably contain, somewhere in their server databases, useful examples of "go home" or "goes home," but none offer phrase searches or cross-referenced searches, and hence the needed examples are inaccessible.

 

A concordance search of any medium-sized corpus can, on the other hand, easily produce several examples of go in the environment of home, which will disclose all the grammatical and collocational requirements of the construction. However, setting up the search may require more linguistic sophistication than the learners who need the information possess. In other words, while numerous corpus examples are accessible in principle, they may not be accessible to learners.

 

So dictionaries and concordances both have their own accessibility problems. But recently there have been interesting developments on both fronts. One is some new thinking from Cambridge about the structure of online dictionaries that may make precise searches possible. The other is some progress by my research students on the question of whether, when, and to what effect learners are able to undertake concordance searches independently. The investigation begins with the concordance research, which involves giving concordance feedback on errors in learner writing via precast hyperlinks, as a way of training learners to access specific examples for themselves. Then the question is posed whether similarly useful hyperlinks could be devised to target specific information in an online dictionary.

 

Online concordance as tailored feedback

Interest in concordance-as-feedback stems from an experiment in the late 1990s at the City University of Hong Kong (CityU) in getting learners to use concordances to correct their errors in writing (described by Peggy Ng & Pauline Burton at the previous ITMELT conference). For example, if a learner had written a composition containing the words "He goes to home*," the instructor might suggest a concordance search on go~ + home, and this would reveal that to never appears in this construction. Learners filled in a form stating the error, the search words used to explore the error, a sample of the concordance output, and the correction. CityU students seemed to enjoy this activity, and it seemed to be effective (although both relative and long-term learning effects are yet to be determined).

 

However, this method of giving feedback was even more time-consuming than traditional written correction for the teacher. It involved a lot of writing in the margins to suggest the search parameters, it involved testing out some of the concordance searches, and it involved taking students to a computer lab where each machine was set up with Microconcord (Johns, 1986) offline software and related corpora. It was hoped that learners would gradually be able to conduct these searches for themselves, thus reducing labour for the teacher and increasing learning for the learner.

 

Since that time, online concordancing has created a number of potential improvements in this procedure. First, it eliminates the class trip to the lab: the concordance work can now be assigned for Web homework. Second, the initial search suggestions can be pre-coded for the learner and delivered to them in a click-on format. This is possible because the various search parameters of a concordance (e.g., what we are looking for, in which corpus, and so on) take the form of a URL. The parameters that can be controlled are these:

·         the corpus to be searched;

·         the search word (or part of word, or set of words);

·         whether search word must be a complete word or can be the beginning or end of a word;

·         any associated word(s) in the environment;

·         whether associate is right or left and within how many words of the keyword; and,

·         the number of examples (concordance lines).

 

Submitting the search form produces a URL with these parameters (such as "searchword=cat") appended to the end of it. Once created, this URL can be stored, re-used, or sent to someone else like any other URL. Third, with the increasingly common practice of online submission, the URL can be embedded as a hyperlink right in the learner's document, at the point of greatest relevance or visibility, and the document returned for further attention. And fourth, the turnaround time between doing the writing and getting the feedback can be vastly reduced (delay was a major factor in Truscott's (1996) argument that writing feedback is ineffective).

 

For example, suppose a learner has submitted a text with a number-agreement error, "He is one of the best teacher I ever had*." The text is returned to the student thus: "He is one of the best teacher [concord #1] I ever had." When the learner clicks on the underlined hyperlink, the information in Figure 5 appears in a window on the computer screen.

 

1    os Angeles he became one of the boys, a bigger hero than he ever had be
2    CER# He attacked one of the officers and was restrained. About five min
3    he project.    One of the agreements calls for the New Eastwick Corp. t
4     locally.    One of the first things he would do, he said, would be to 
5    Friday.    One of the largest crowds in the club's history turned out t
6    .    "This is one of the major items in the Fulton County general assis
7    n worsens.    One of the first moves made after a cabinet decision was 
8    ment "one of the most serious causes of family breakdown, desertion, an
9    imself as one of the guiding spirits of the House of Delegates.    MARY
10   e's one of the most valuable members of the Longhorn team that will be 
11   Betsy Parker was one of the speakers on the panel of the Eastern Women'
12   ed for one of the original 13 states, perhaps is not the most impressiv
13   hich one of the three Kowalski girls present held for her mother, becau
14   ne. And one of the Milwaukee rookies sighed and remarked, "Wish I was 4
15   ould assign one of the rescue trucks to the Riverside section of the ci

Figure 5: Guided access to highly specific examples


This concordance output is designed to show the learner that "one of the" is invariably (but not always immediately) followed by a plural noun. The output represents a search of the Brown corpus for 15 instances of word-ending s where the associated phrase "one of the" occurs within three words to the left. The URL that generates Figure 5 is as follows, with the variables in bold:

 

http://132.208.224.131/scripts/cgi-bin/wwwassocwords.exe?Corpus=Brown.txt&Maximum=15&Associate=Left&D1=3&SearchType=ends with&SearchStr=s&AssocWord=one of the

 

Of course, the learner does not see all this code, but only the surface of the link. The variables can be changed to generate an unlimited number of concordances, one of which will shed light on almost any construction a learner can come up with. While unwieldy looking, these long URLs can be assembled fairly simply, either directly in the browser window or through the concordancer's form inputs, and a teacher can speed the process further by building and sharing a catalogue of such URLs for recurring error patterns. Additionally, of course, as in Ng and Burton's work, the goal is eventually to have learners conduct the concordance searches for themselves.
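The mechanics of assembling such a link can be sketched in a few lines of Python. This is only an illustration: the parameter names (Corpus, Maximum, Associate, D1, SearchType, SearchStr, AssocWord) are taken from the example URL above, the helper function is hypothetical, and the actual concordancer may expect spaces encoded differently from urlencode's default "+".

```python
from urllib.parse import urlencode

# Base address of the Web concordancer, as given in the example URL above.
BASE = "http://132.208.224.131/scripts/cgi-bin/wwwassocwords.exe"

def concordance_url(corpus, search_str, search_type="starts with",
                    assoc_word=None, assoc_side="Left", span=3, maximum=15):
    """Assemble a click-on concordance search as a single URL (illustrative)."""
    params = {
        "Corpus": corpus,           # which corpus to search
        "Maximum": maximum,         # number of concordance lines to return
        "SearchType": search_type,  # complete word / starts with / ends with
        "SearchStr": search_str,    # the keyword, or part of a word
    }
    if assoc_word:                  # optional associated word in the environment
        params["Associate"] = assoc_side  # left or right of the keyword
        params["D1"] = span               # within how many words
        params["AssocWord"] = assoc_word
    return BASE + "?" + urlencode(params)

# The "one of the ___s" search behind Figure 5:
url = concordance_url("Brown.txt", "s", search_type="ends with",
                      assoc_word="one of the", span=3, maximum=15)
```

A teacher could paste the resulting string into a learner's returned document as a hyperlink, or keep a catalogue of calls like this one for recurring error patterns.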

 

The example shown in Figure 6 is from a study conducted by one of my Montreal research students (Gaskell, 2002) with Chinese learners studying English composition as preparation for academic study in Canada. The learners submitted their weekly assignments online, and the instructor returned them with sentence errors indicated and concordance links inserted in appropriate places. We found that the learners enjoyed this way of getting feedback, felt they learned from it, and were generally able to make an adequate correction on the basis of the concordance information.

 

One of our main questions was whether learners could be trained to access specific examples from a corpus for themselves. We saw the precast URL links as a sort of training regime that would motivate and prepare learners to form their own concordance searches. To test the success of this training, we stopped providing the URLs at a certain point in the 15-week experiment and instead simply indicated where errors had occurred, having shown learners how to work out solutions with the concordancer independently. Predictably, some learners continued to seek concordance information by forming their own searches, while others resorted to strategies like copying and guessing. However, and encouragingly, the distinction between persisters and non-persisters was not random: t-tests showed that persisters had made significantly fewer errors at pretest than non-persisters, leading us to hope that a proficiency condition can be established for independent use of this and similar resources.

 

To summarize, a concordance can access extremely specific examples of different language features; learners seem able to use these examples for concrete learning tasks like error correction; and there is some indication they can access such examples themselves once they have reached a certain proficiency level.

 

Online dictionary as tailored feedback

Now, what about the accessibility of examples in our online dictionaries? Each of these dictionaries contains many examples of correct lexical, collocational, grammatical and even pragmatic usage on its pages, which are referenced by URLs like everything else on the Web. But how accessible are these examples?

 

Unfortunately, all three of the dictionaries under investigation would be quite poor for the purpose of giving learners highly specific examples, because their pages can only be accessed whole and not via particular examples or any other page components. Suppose a learner had written, "He complained about he was never allowed to speak*," and his or her document was returned with the error marked and a link to the LDOCE entry in Figure 3. The link would have to be to the entire page, since the individual pieces of information (even complain, complain about, complain that) cannot be accessed separately. Which part of the entry is the learner supposed to look at?

 

None of the three dictionaries has separate URLs leading to complain about or any other piece within the entry; that is, they do not allow targeting of specific lexical or grammatical information. Searching for complain about will either generate an error or else lead to the general entry for the first word. A concordance, of course, can target very specific information, whether several words, parts of words, or either of these separated by intervening words. The dictionary pages are precast wholes, while concordance pages are constructed dynamically, from small pieces, on demand. But is this dictionary limitation one of principle or just of current technology?
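The dynamic construction of concordance output from small pieces can be illustrated with a minimal keyword-in-context (KWIC) routine. This is a sketch of the general technique, not any particular concordancer's implementation, and the toy corpus and regular-expression queries are invented for illustration.

```python
import re

def kwic(corpus: str, pattern: str, width: int = 30, limit: int = 10):
    r"""Return keyword-in-context lines for a regex pattern.

    Unlike a precast dictionary page, the pattern can target multi-word
    units ("complain about"), parts of words (r"complain\w*"), or
    discontinuous patterns with an intervening word, as below.
    """
    lines = []
    for m in re.finditer(pattern, corpus, flags=re.IGNORECASE):
        left = corpus[max(0, m.start() - width):m.start()]
        right = corpus[m.end():m.end() + width]
        lines.append(f"{left:>{width}}  {m.group(0)}  {right}")
        if len(lines) >= limit:
            break
    return lines

# Toy corpus standing in for a real text collection.
corpus = ("She complained about the noise. He complains bitterly about "
          "the delays. They had complained to the manager about it.")

# Any form of 'complain', optionally with one intervening word, then 'about'.
for line in kwic(corpus, r"complain\w*(\s+\w+)?\s+about"):
    print(line)
```

The page the learner sees is thus assembled on demand from whatever matches the query, which is why a concordance can serve requests no lexicographer anticipated.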

 

As we saw with Question 1, online dictionaries give their readers no more examples than equivalent offline versions do, despite the absence of capacity limitations. Now we see that these dictionaries give no better access than their offline equivalents; access is via the headword alone. However, this may not always be the case. It is common for paper media to go online initially in their paper formats (as has happened with newspapers and several other document types), only gradually being adapted to the additional possibilities that the electronic medium opens up. It could be expected that dictionaries might follow this pattern. For example, the Longman dictionary resembles its offline version in headword access, but as mentioned also has within-entry hyperlinks as an electronic extension of its classic defining vocabulary. Perhaps general within-entry access is on the horizon too?

 

Toward more differentiated online dictionaries

A recent addition to the Cambridge online series offers greater accessibility than any of the three dictionaries we have focused on hitherto, as well as greater exploitation of the Internet medium. This dictionary, the Cambridge International Dictionary of English (CIDE, 2002), is not strictly classified as a learner dictionary, but it has much in common with the Longman LDOCE including the within-entry hyperlink (click any word inside the entry and you are led to that word's definition), and this technology will no doubt be reused in the long-awaited Cambridge Advanced Learner's Dictionary. The CIDE has structured its information in far smaller pieces than any other online dictionary, and it has given the pieces their own URLs. The prospects for finer-grained access may be good.

 

As can be seen on the left side of Figure 6, the main CIDE entry for love (to take a fresh example) asks the enquirer to refine his or her search into one of three directions (love somebody, love something, and love as a tennis score). The main entry is accessed via a URL of the usual kind (http://dictionary.cambridge.org/cmd_search.asp?searchword=love), and each separate sense has its own URL (e.g., love something is via http://dictionary.cambridge.org/define.asp?key=love*2+0), as does each of several multi-word units involving the term (love triangle, labour of love, and 18 others). In all, the entry has been broken into 23 separate Web pages, all with their own URLs. All of these can be fairly easily found, copied, and embedded into a learner's text to provide information about a revision. For example, the learner who writes he is "in love with ice-cream" could be sent to the love something sub-page for reasonably specific help with a revision.

 

So, does breaking entries into several URLs solve the accessibility problem for dictionaries (and make concordances somewhat redundant)?

 

 

Figure 6: More examples, smaller information pieces

 

The inherent limits of dictionary as feedback

More pages and more URLs will no doubt produce a better dictionary. In terms of convenience, as entries expand in size the greater number of separate pages will reduce the need for scrolling (and the risk that learners will miss something off the bottom of the screen). In terms of learning, having meaning-senses on their own pages will help bring important distinctions to learners' attention. But in terms of access to the very detailed examples that learners need, the difference will be minimal. That is because the learner has more questions than can ever be anticipated and pre-coded.  

 

Take for example the 20 multi-word units listed in Figure 6. Most learner errors involve multi-word constructions (such as one of the + plural noun in Figure 5). Indeed, most of any language is made up of multi-word constructions of various kinds (Wray, 2002), each with its own requirements and degree of fixity; learners will make many mistakes with these and profit from seeing well-formed instances of them. But no dictionary will ever be able to precode a page for all such units or even a significant portion of them. For example, one of the is a common structure in English, if not a particularly colourful one, and yet it is not among the online CIDE's 50 or so phrases listed with one. But a concordance search for this phrase in any medium-sized corpus will generate dozens if not hundreds of examples, which can be coded as a URL and sent to a learner, or found by learners themselves fairly easily.

 

 

Question 4: How complete are the examples in an online dictionary?


The listing of 20 phrases involving love in Figure 6 clearly represents a selection from the many that are possible in English. They are probably some of the more frequent or colourful ones, and no claim to being exhaustive is stated or implied. Concordance output, on the other hand, can claim to be exhaustive in some ways. If a phrase is not among the dictionary entry's 20 samples, this tells us nothing about whether it exists or not; but if a word or construction is not in a 1-million-word general corpus, then either it does not exist or else it is quite rare, and the argument strengthens as the corpus lengthens. In other words, a concordance can provide some indication of what is not in a language, also known as negative evidence.

 

Learner dictionaries provide learners with some explicit negative evidence (usually in a "Do not say…" box). So far, these do not seem to feature in the online versions. At any rate, given the infinite creativity of learners' interlanguage grammars, a dictionary can hardly plan in advance for all of the specific injunctions that might be needed. Concordances, on the other hand, can provide clear if implicit negative information, provided the learner possesses one simple insight: the concordance is drawn from a corpus which is a sample of everything there is in a language, so that a construction that is not present in the corpus is probably not a common word or construction. A construction that is not present in a dictionary, on the other hand, may simply be infrequent or uninteresting.

 

For example, our learner needs help with his sentence, "I am in love with ice cream." Compare the concordance output in Figure 7 to the dictionary information in the right panel of Figure 6, as feedback on the error. The dictionary offers loves ice cream but does not say or imply that in love with ice cream is not possible. Indeed, since it suggests (on the love someone page, not shown) that we can both love and be in love with our husbands, wives or lovers, the dictionary offers no strong negative evidence to prohibit in love with ice cream. But in the concordance from the Brown corpus (Figure 7), there is one instance of someone being in love with a steam shovel (line 1) and nine of people being in love with other people, so it is fairly obvious that in love with normally takes a human object outside humorous contexts.
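This kind of implicit negative evidence can be read off a corpus mechanically, by counting what actually follows a phrase. The snippet below is a rough sketch using a toy corpus that echoes the Figure 7 lines, not the actual Brown corpus; with a real corpus loaded as a string, the same function would apply unchanged.

```python
import re
from collections import Counter

def collocates_after(corpus: str, phrase: str) -> Counter:
    """Count the word immediately following a phrase: frequent followers
    are positive evidence, and absent ones are (weak) negative evidence."""
    pattern = re.escape(phrase) + r"\s+(\w+)"
    return Counter(w.lower() for w in re.findall(pattern, corpus, re.IGNORECASE))

# Toy stand-in for the Brown corpus lines shown in Figure 7.
corpus = ("falls hopelessly in love with a steam shovel ... fell in love "
          "with Amy ... in love with Clayton ... fallen in love with her "
          "Mamma ... fall in love with him ... fallen in love with Paula")

counts = collocates_after(corpus, "in love with")
print(counts.most_common())
```

A zero count for a candidate follower like ice does not prove the combination is impossible, but, as argued above, the larger the corpus, the stronger the inference.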

 

1  mmer operator who falls hopelessly in love with a middle-aged steam shovel. A biti
2  sky. The cast:      Everybody fell in love with Amy again last night at the Warwic
3  ft shoe tunes ever invented, "Once in Love with Amy" is also, of course, one of th
4  s wife had said to him: "Nellie is in love with Clayton Roy. He wouldn't even danc
5  ove, just as Papa had first fallen in love with her Mamma before he chose her; and
6  as doing. And she was made to fall in love with him again there in the rutted dirt
7  reetcar pulled away, he had fallen in love with Paula.  A letter awaited her at 
8  etta, and signed himself Joe- fell in love with pretty sister Rachel.  Henrietta
9  t the kind to go violent. Were you in love with that girl"? L01 0280 6    "Wo
10 and marries Sibylla who had fallen in love with the beautiful knight the moment sh

Figure 7: Implicit negative evidence

 

 

Conclusion: Complementary information types

To conclude, there are several important categories of examples that are not likely to be found in dictionaries. Online, corpus-based, capacity-unlimited dictionaries do not presently replace the value of concordances for language learners, and they are unlikely to do so in the future. This is not to denigrate the online learner dictionaries; I hope I have given some idea of the progress that is being made with these, and yet at the same time of their limitations in principle.

 

Learners need many kinds of information to make progress in their acquisition of a second language, and dictionaries can provide some of these kinds in increasingly accessible and attractive formats. A study by Hulstijn, Hollander and Greidanus (1996) showed that learners can make strong use of dictionary information, but nevertheless do not take the trouble to look up very many words, mainly because it is somewhat laborious to do so. Therefore, easier dictionary access is a key issue, probably more than other issues that have received much attention (such as the formats of definitions; McKeown, 1993). So, the increased accessibility of some of the new online dictionaries, particularly Longman and Cambridge with their within-entry second clicks, is likely to be of real value to learners.

 

But learners need to access information within entries, not just entries per se; they need to meet new words in a mix of example types, not just clear examples; they need evidence about what is not in a language, as well as what is in it; they need information about lexical phrases and collocations, not just individual words or a handful of selected phrases; and they need information about parts of words and groups of words that may extend across distributed patterns, not just about continuous patterns. The concordance can provide all this, albeit in a raw form that not all learners will be able to make sense of. The dictionary provides more organized information, but at the price of leaving out important information.

 

Therefore, the ideal electronic resource for language learning is a blend of dictionary and concordance. A simple way of achieving this blend, now that dictionaries are corpus-based, would be to allow searches of the dictionary publisher's corpus from individual entry pages. One might find, for example, on the CIDE's page of 50 or so selected phrases involving one, an input box allowing the user to search for other recurring phrases involving one in the Cambridge corpus (or maybe a corpus dedicated to learning purposes). Then, if one of the was not in the list of pre-coded phrases leading to dedicated entries, the learner would still be able to dig down and get several examples of the construction.

 

Such a blend is well within current technology. Chris Greaves' English-Chinese bilingualized dictionary on the Virtual Language Centre web site offers a version of it; John Milton (at the Hong Kong University of Science and Technology) has a plan to link phrase searches to the online CIDE within a single interface; and I also have some related developments to show at the ITMELT 2003 conference. If the dictionary industry is watching the academic developers for innovative ideas, then the future looks bright for principled and accessible learning resources.  Meantime, click here for an example of a paper worksheet that combines dictionary and concordance work.

 

References

Beck, I., McKeown, M., & McCaslin, E. (1983). Vocabulary development: All contexts are not created equal. Elementary School Journal, 83, 177-81.

 

Béjoint, H. (1981).  The foreign student’s use of monolingual English dictionaries. Applied Linguistics 2, 207-22.  

 

Bowker, L. (1999). Exploring the potential of corpora for raising language awareness in student translators. Language Awareness 8(3), 160-73.

 

Cambridge Learners Dictionary Online [at http://dictionary.cambridge.org].

 

Cambridge International Dictionary of English Online [at http://dictionary.cambridge.org/].

Cobb, T., & Horst, M. (2001). Growing academic vocabulary with a collaborative online database.  In Morrison, B., Gardner, D., Keobke, K., & Spratt, M. (Eds). ELT Perspectives on IT & Multimedia (pp. 189-226). Hong Kong: The English Language Centre, The Hong Kong Polytechnic University.

 

Cobb, T. (1999). Applying constructivism: A test for the learner-as-scientist. Educational Technology Research & Development 47 (3), 15-31.

 

Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System 25, 301-315.

 

Collins-Cobuild English Dictionary (New Edition), (1995). London: HarperCollins.


Cobuild Learner's Dictionary Online [at http://www.linguistics.ruhr-uni-bochum.de/ccsd].

 

Compleat Lexical Tutor, Université du Québec à Montréal  [at http://132.208.224.131].

 

Gaskell, D. (2002). Can and will elementary ESL learners use concordance information to correct their written errors? Unpublished M.A. dissertation. Montréal: Université du Québec à Montréal.

 
Horst, M. (2000). Text encounters of the frequent kind: Learning L2 vocabulary through reading. Unpublished doctoral dissertation, University of Wales, Swansea.

 
Hulstijn, J., Hollander, M., & Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. Modern Language Journal 80, 327-39.


Johns, T. (1986). Micro-concord: A language learner's research tool. System, 14 (2), 151-62.


Longman Dictionary of Contemporary English (Third Edition). (1995). Harlow, UK: Longman.

 

Longman Dictionary of Contemporary English Online [at http://www.longmanwebdict.com/].

 

Mondria, J.A., & Wit-De Boer, M. (1991). Guessability and the retention of words in a foreign language. Applied Linguistics 12 (3), 249-63.


McKeown, M. (1993). Creating effective definitions for young word learners. Reading Research Quarterly 28 (1), 17-31.


Ng, P. & Burton, P. (2001). Developing language awareness through concordancing: Action research and materials development for first-year students in Hong Kong. Paper presented at the ITMELT conference, Hong Kong Polytechnic University, Hong Kong.

 

Parry, K. (1991). Building a vocabulary through academic reading. TESOL Quarterly 25 (4), 629-53.

 

Truscott, J. (1996). The case against grammar correction in L2 writing classes. Language Learning 46, 327-369.

 

Virtual Language Centre, Hong Kong Polytechnic University [at http://vlc.polyu.edu.hk].

 

Wray, A. (2002). Formulaic language and the lexicon. Cambridge UK: Cambridge University Press.


Zahar, R., Cobb, T., & Spada, N. (2001). Acquiring vocabulary through reading: Effects of frequency and contextual richness. Canadian Modern Language Review 57 (3), 541-72.