Review of Nadja Nesselhauf (2005), Collocations in a Learner Corpus.
Studies in Corpus Linguistics 14. Amsterdam: John Benjamins. 331 pp + xii.
If vocabulary was flavour of the month in last-decade applied
linguistics, in this one it is the multiword unit (MWU). This refinement
follows a certain logic, because if claims made for said unit are even half
accurate then all levels of the language teaching industry are in for a
significant rethink. Ideas to be incorporated would be that grammars emerge
from phrases not vice versa, that the main tasks in language acquisition are
piecemeal not rule-based, and that functioning lexicons consist not in manageable
handfuls of words but in vast arrays of combinations lexicalised to varying degrees
and operating within mazes of apparently random restrictions. No surprise that the
rethink has hardly begun, with progress held up until recently by the
lack of clear terms and an empirical database. To contribute to ongoing work on
both fronts is the purpose of Nadja Nesselhauf’s book, based on her doctoral
study of one type of multiword unit in the written production of advanced
German-speaking ESL students.
Nesselhauf has assembled a corpus of advanced learner writing with a
view to inspecting one of its MWUs, following procedures for learner corpus
research established by Granger (1998). But Nesselhauf goes beyond anything
published to date in her delimitation of phenomena and her generation of
comparable data. In a detailed but (largely) readable account of her
methodology, she carefully separates out collocation as the type of MWU she
will look for, and within that verb-noun collocations (e.g., ride a bike
not *drive a bike), with the specification that the restriction (on ride)
be fully arbitrary rather than meaningful. A catalogue of such collocations is
hand-extracted from her learner corpus, and native collocations separated from
learner deviations by native raters; collocations are counted in terms of
frequency and range, and deviations are categorized by type and probable intended
meaning. All data can be traced back through individual writers to a background
questionnaire itemizing years of ESL study, extent of exposure to English
abroad, conditions of writing (timed or untimed), and whether a dictionary was used.
To call this “a lot of work” is an understatement, and indeed the amount of
handwork involved raises the question whether this approach can be scaled up to
a larger corpus (than her 200,000 words) as Nesselhauf proposes.
But even a smallish corpus carved with instruments this fine can
generate interesting information. A predictable finding is that collocation
remains a serious problem well into advanced learning. Less predictable is that
neither years of instruction, nor years abroad, nor writing with or without
time pressure, with or without a dictionary, has any effect on the number of
collocations employed or the number of deviations. Particularly interesting are the
deviations exposed by imputing intended meaning – like when a learner writes “I
don’t take care of carrots,” which is a good collocation, except that he
probably means “I don’t care for carrots” (a deviation that a computer match of
learner strings against a standard corpus would have missed).
So, an even worse problem than we thought, but what solution?
Nesselhauf explores awareness vs. learning as solutions. Learners are not in
the habit of scanning language to become aware of restrictions on word
combination, nor are they encouraged to be. The collocation problem resembles
one from vocabulary research, that a word met in rich contexts can have a
meaning so obvious that the word itself does not register in memory: “ride a
bike” paints a picture so clear there is little motivation to notice it was
ride not drive.
A lengthy pedagogical implications section suggests ways of promoting awareness
as well as developing a collocational syllabus.
Interspersed in the treatment are attempts to clarify unresolved issues
in the MWU agenda. One concerns Kjellmer’s (1991) idea that while natives
process language in prefabricated sequences, learners rely on grammars and
lexicons, which leaves them “sounding odd.” A problem is that if fluency is
impossible without access to MWUs (Sinclair, 1991), but learners do become
fluent users of second languages, then either they employ such units or else
fluency can be achieved on a words-and-rules basis. One way through the paradox
is the frequent finding (e.g., Cobb, 2003) that learners do use MWUs, including
collocations, and indeed over-use the few that they have, which is why their
language sounds odd. Nesselhauf proposes another angle on the Kjellmer
question, which there is no space to mention and anyway a review should not
give away too much!
It is to be hoped that the clear thinking and methodological exactitude of
Nesselhauf’s study will be taken up in further studies. Should others accept
the challenge to advance the MWU agenda through hard and careful work as
Nesselhauf has done, there are some things to watch for in the write-up. First,
while detail and precision clearly advance the research, the reading can be
heavy going (e.g., pp. 240-241 offer two pages of closely reasoned linguistics
with just one example to give a breather). Second, some of the apparatus of a
thesis is out of place for a book audience (e.g., 30 pages of endnotes). Third,
with all the pains taken to quantify her data, Nesselhauf nonetheless relies
entirely on descriptive statistics even when making comparisons (e.g., between
collocation counts in timed and non-timed writing). She even describes the
results of comparisons with folkloric expressions: the length of stay in an
English-speaking country “does not seem to lead to” an increased use of
collocations (p. 236); the percentage of deviant collocations for users and
non-users of dictionaries was “exactly the same” at 36.1% (p. 231). Isn’t the
point of t-tests and the like to tell us which differences are really different?
These criticisms are just to say that no study can do everything and
much work remains to be done in this area. The methods developed here are
eminently replicable, and the holes to plug are obvious. This is not easy research,
but careers will be made in it – this is the first act in a drama that will
unfold for years to come.
Jan 3, 2006
By Tom Cobb
Dépt. de linguistique et de didactique des langues
Université du Québec à Montréal
For Canadian Modern Language Review.
References
Cobb, T. (2003). Analyzing late interlanguage with learner corpora:
Quebec replications of three European studies. Canadian Modern Language
Review, 59(3), 393-423.
Granger, S. (Ed.). (1998). Learner English on computer. London:
Longman.
Kjellmer, G. (1991). A mint of phrases. In Aijmer, K., & Altenberg,
B. (Eds.), English Corpus Linguistics (pp. 111-127). London: Longman.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford
University Press.