Elke Teich & Peter Fankhauser

Darmstadt University of Technology, Germany
Fraunhofer IPSI Darmstadt, Germany

Supporting lexical cohesion analysis using WordNet

Manually analyzing large amounts of text in terms of lexical cohesion (Halliday & Hasan 1976; Hasan 1984; Hoey 1991) is extremely tedious and error-prone. In-depth investigations of lexical cohesion, however, require large sets of reliable data.

In this paper, we propose an automatic method of lexical cohesion analysis using WordNet (Fellbaum 1998), an electronic lexical resource that organizes the vocabulary of English content words in terms of basic sense relations (synonymy, hyponymy/hypernymy, antonymy, etc.). Taking the sense-tagged version of the Brown corpus as our data, we enrich texts with potential lexical ties by matching the semantic neighbourhood in WordNet of each token against the tokens that follow it. By introducing a number of constraints, e.g., on part of speech, semantic distance and degree of specificity, we automatically determine potential lexical chains or rule them out.
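The matching step can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` type, the hand-written `TOY_NET` dictionary (a stand-in for WordNet), the sense identifiers and the `max_gap` parameter are all hypothetical. Only two constraints are sketched, identical part of speech and a maximum token gap, and the semantic neighbourhood is limited to directly related senses; a constraint on degree of specificity is not modelled.

```python
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str
    pos: str
    sense: str  # hypothetical sense identifier, as in a sense-tagged corpus


# Toy stand-in for WordNet: sense -> {relation: set of related senses}.
TOY_NET = {
    "dog.n.1": {"hypernym": {"animal.n.1"}, "synonym": {"hound.n.1"}},
    "hound.n.1": {"hypernym": {"animal.n.1"}, "synonym": {"dog.n.1"}},
    "animal.n.1": {"hyponym": {"dog.n.1", "hound.n.1"}},
    "hot.a.1": {"antonym": {"cold.a.1"}},
    "cold.a.1": {"antonym": {"hot.a.1"}},
}


def neighbourhood(sense, net):
    """Map each directly related sense to the relation that links it."""
    related = {sense: "repetition"}
    for relation, targets in net.get(sense, {}).items():
        for target in targets:
            related.setdefault(target, relation)
    return related


def potential_ties(tokens, net, max_gap=50):
    """Return (i, j, relation) for each token i tied to a later token j."""
    ties = []
    for i, source in enumerate(tokens):
        related = neighbourhood(source.sense, net)
        # Only look at the next max_gap tokens (assumed distance constraint).
        for j in range(i + 1, min(len(tokens), i + 1 + max_gap)):
            target = tokens[j]
            if target.pos != source.pos:  # part-of-speech constraint
                continue
            relation = related.get(target.sense)
            if relation is not None:
                ties.append((i, j, relation))
    return ties


tokens = [
    Token("dog", "n", "dog.n.1"),
    Token("hot", "a", "hot.a.1"),
    Token("hound", "n", "hound.n.1"),
    Token("animal", "n", "animal.n.1"),
    Token("cold", "a", "cold.a.1"),
]
ties = potential_ties(tokens, TOY_NET)
```

With a real lexical resource, `neighbourhood` would query WordNet instead of a dictionary, and the cut-off values would need to be tuned against manually analyzed texts.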

We also show how we use the annotated corpus to carry out numerical analyses of lexical cohesion patterns, asking, for example: What are the most substantive chains (length, number of elements)? Which types of lexical cohesion are used predominantly? Are there register-specific lexical cohesion patterns?
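As a rough illustration of such counts, the sketch below groups ties into chains and tallies relation types. It rests on several assumptions that are not taken from the paper: ties are given as hypothetical `(i, j, relation)` triples over token positions, a chain is treated as a connected component of the tie graph, "number of elements" is the number of positions in a chain, and "length" is the distance between its first and last position.

```python
from collections import Counter


def build_chains(ties):
    """Group token positions into chains (connected components of the ties)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in ties:
        parent[find(i)] = find(j)

    chains = {}
    for position in list(parent):
        chains.setdefault(find(position), []).append(position)
    return sorted((sorted(c) for c in chains.values()), key=lambda c: c[0])


def chain_stats(chain):
    """Number of elements and textual span (last minus first position)."""
    return {"elements": len(chain), "span": chain[-1] - chain[0]}


# Hypothetical ties for a five-token text.
sample_ties = [
    (0, 2, "synonym"),
    (0, 3, "hypernym"),
    (1, 4, "antonym"),
    (2, 3, "hypernym"),
]

chains = build_chains(sample_ties)
stats = [chain_stats(c) for c in chains]
relation_counts = Counter(relation for _, _, relation in sample_ties)
```

Comparing registers would additionally require a register label for each text, so that the same counts can be aggregated per label.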

References:

Fellbaum C. (ed.), 1998. WordNet: An Electronic Lexical Database. MIT Press.

Halliday M.A.K. & Hasan R., 1976. Cohesion in English. Longman.

Hasan R., 1984. Coherence and cohesive harmony. In Flood J. (ed.), Understanding Reading Comprehension, pp. 181-219. IRA.

Hoey M., 1991. Patterns of lexis in text. Oxford University Press.
