Review of Tony McEnery, Richard Xiao, & Yukio Tono, Corpus-based language studies: An advanced resource book

Review of Tony McEnery, Richard Xiao, & Yukio Tono, Corpus-based language studies: An advanced resource book. (2006). New York: Routledge. Pp. 386 + xx. $33.95.

Reviewed by Tom Cobb

Dépt de linguistique et de didactique des langues

Université du Québec à Montréal

May 31, 2007

The corpus-driven revolution in applied linguistics continues apace, and along with it the paradox that as corpora change the face of applied linguistics (most dictionaries, grammars, and course books now claim to be corpus-based), they do so largely without the participation of practitioners. Only a few teachers or researchers have ever built a corpus or delved through concordance lines. Possibly in a bid to remedy this hands-off problem, the book display at the conference of the American Association for Applied Linguistics (AAAL) in California this spring offered roughly a dozen books promising to enlist language teachers and graduate students in the revolution at a more basic level. The two I walked off with were the volume under review and O’Keeffe et al.’s (2007) From corpus to classroom.

It quickly became apparent that the latter volume was basically a set of conclusions about how corpus evidence can inform teaching practice, based on insights drawn from the impressive Cambridge and Nottingham CANCODE corpus of written and spoken British English, but that there would be no hands-on opportunities to construct further insights of one’s own, owing to the total inaccessibility of this corpus to the broader professional community. So it was a relief to find that access and hands-on work were main themes in the other volume.

The McEnery et al. volume is one in a new series from Routledge/Taylor and Francis whose volumes deal with applied linguistics themes (intercultural communication, translation, grammar and context, and second language acquisition) and bear the common subtitle “an advanced resource book.” Access and support for hands-on projects are main series themes, and arguably key to a new-approach show of force from a publisher (the new owners of Lawrence Erlbaum) bidding to enter the language education field as competition for Cambridge. So how accessible and hands-on does the McEnery volume get?

The series format involves first presenting concepts and procedures, then related excerpts from actual research articles, and finally fully worked example research projects that can be replicated. Topics include corpus types, corpus building, corpus tagging (adding POS or part-of-speech markers), corpus statistics, corpus controversies, and corpus analysis for various purposes. Research article excerpts include a thoughtful and relevant collection of true area classics by Widdowson, Biber, Stubbs, Carter, McCarthy, and others. Research projects unpacked in detail include a methodology for corpus-based lexicography; a study in the sociolinguistics of British swearing made possible by the huge and finely subdivided British National Corpus (BNC); and a replication, using the Longman Learners’ Corpus, of the empirical morpheme acquisition sequence studies of Krashen and others in the 1970s, to mention fewer than half of those available. No other volume that I know of has gone so far to bring corpus research into focus for practitioners, or to make at least the beginning of a serious research project actually possible.

But there are probably limits to how accessible such an involved methodology can actually be made. For one thing, many of the book’s analyses involve the use of Mike Scott’s text analysis program WordSmith Tools, which must be purchased to get beyond the demo limitations, for about CAD $100. There are of course Web concordancers, and indeed the swearing study mentioned above is based on the University of Zurich’s Web concordancer, apparently running the full BNC. However, getting on to this site requires a password, and the book does not actually mention how to get one; when I emailed the site administrator I received an irritated reply complaining about the number of users this book was sending them, as well as a reluctantly given user name and password (that did not work).

Software is not of course the biggest access problem in this area (a number of excellent free concordancers are available for download, such as Laurence Anthony’s AntConc), but rather corpora, such as the Japanese EFL division of the Longman Learners’ Corpus that was used in the morpheme study already mentioned. That study begins with a casual mention that this now rather dated corpus is publicly available and launches straight into a description of some complex operations that must be performed to tag and otherwise bring it up to date for a modern treatment. But where to get the corpus is not mentioned. There are two companion websites for the book (the publishers’ and the authors’), so perhaps it can be downloaded from one of these? These sites do offer corpus materials specifically linked to some of the projects in the book, but in the case of the Longman Learners’ Corpus, after 20 minutes of scrolling and clicking I finally managed to get merely a link to the Longman site (which was “under construction”). Readily available, however, is a tool for processing this corpus in a way that the analysis requires, should the reader find a way to obtain the corpus. So no one should buy this book expecting to start a complex research project tomorrow morning! Some preparatory digging will still be required.

Nor should anyone buy this book expecting to find awareness-raising or other hands-on activities for their language learners. The book is about doing research. There is admittedly a smallish section on pedagogical uses of corpora, which raises an interesting quibble with the authors, with which I will end. Throughout, the book promotes the benefits of using tagged corpora for every purpose, where for instance the difference between bank_VERB and bank_NOUN is coded directly in the corpus so that an alphabetical sorting of these homonyms will keep them distinct. It is probably important for research concerns that the corpus be pre-parsed in this and many other ways, but the point of corpus work for learners is that they deal with the raw word form (as they must in real life) and use the context to decide for themselves which bank is involved in a particular utterance. I have always believed that applied linguists in the tradition of this volume do not give adequate place to the uses of an untagged or flat corpus as a constructivist learning tool. Thus it is satisfying that none of the pedagogical studies cited appears to involve learners working with a tagged corpus!
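The contrast between a tagged and a flat corpus can be made concrete with a minimal sketch in Python. Everything in it is an assumption made for illustration: the two toy sentences, the word_TAG format and tag labels, and the function names are invented, and come neither from the book nor from any real tagger or concordancer. The tagged view keeps the two kinds of bank apart in a sorted type list, while the flat view produces keyword-in-context lines in which a learner must use the surrounding words to decide which bank is meant.

```python
from collections import Counter

# Toy corpus in word_TAG format (sentences and tags invented for illustration).
TAGGED = (
    "I_PRON bank_VERB with_PREP a_DET small_ADJ bank_NOUN ._PUNCT "
    "You_PRON can_AUX bank_VERB on_PREP the_DET river_NOUN bank_NOUN ._PUNCT"
)


def split_token(token):
    """Split 'bank_VERB' into ('bank', 'VERB') at the last underscore."""
    word, _, tag = token.rpartition("_")
    return word, tag


def tagged_counts(text):
    """Count word_TAG types, so that homographs stay distinct."""
    return Counter(text.split())


def flat_tokens(text):
    """Strip the tags to recover the raw word forms a learner would meet."""
    return [split_token(token)[0] for token in text.split()]


def kwic(tokens, keyword, width=3):
    """Return keyword-in-context lines from an untagged token list."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Researcher's view: sorting the tagged types separates noun from verb.
counts = tagged_counts(TAGGED)
bank_types = sorted(t for t in counts if t.startswith("bank_"))

# Learner's view: one undifferentiated word form, disambiguated by context.
flat = flat_tokens(TAGGED)
lines = kwic(flat, "bank")

for line in lines:
    print(line)
```

In the tagged counts, bank appears as two separate types; in the flat concordance it is a single form with four hits, and the work of telling them apart is left to the reader, which is the pedagogical point at issue.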

Bravo on a fine book that gives more access than ever before to a fascinating and important set of research methods and questions.

Reference

O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. Cambridge: Cambridge University Press.
