Elke Teich & Peter Fankhauser
Darmstadt University of Technology, Germany; Fraunhofer IPSI Darmstadt, Germany
Supporting lexical cohesion analysis using WordNet
Manually analyzing large amounts of text in terms of lexical cohesion (Halliday & Hasan 1976; Hasan 1984; Hoey 1991) is an extremely tedious and error-prone task. Yet in-depth investigations of lexical cohesion require large sets of reliable data.
In this paper, we propose a method for automatic lexical cohesion analysis using WordNet (Fellbaum 1998), an electronic lexical resource that organizes the vocabulary of English content words in terms of basic sense relations (synonymy, hyponymy/supernymy, antonymy, etc.). Taking the sense-tagged version of the Brown corpus as our data, we enrich texts with potential lexical ties by matching each token's semantic neighbourhood in WordNet against the tokens that follow it. By introducing a number of constraints, e.g., on part of speech, semantic distance and degree of specificity, we determine potential lexical chains automatically (or rule them out).
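The tie-matching and chaining steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. A small hand-coded hypernym graph (`SYNSETS`) stands in for WordNet. Each token is a hypothetical `(lemma, synset)` pair of the kind a sense-tagged corpus provides. Semantic distance is taken to be path length over hypernym/hyponym edges, specificity is approximated by a synset's depth below a root, and chains are formed as connected components of the tie graph. The function names `find_ties` and `build_chains`, the parameters `pos`, `max_dist` and `min_depth`, and their default values are all illustrative choices.

```python
from collections import Counter, deque

# Hypothetical toy stand-in for WordNet: synset id -> (part of speech, direct hypernyms).
SYNSETS = {
    "entity.n.01": ("n", []),
    "organism.n.01": ("n", ["entity.n.01"]),
    "animal.n.01": ("n", ["organism.n.01"]),
    "dog.n.01": ("n", ["animal.n.01"]),
    "cat.n.01": ("n", ["animal.n.01"]),
    "artifact.n.01": ("n", ["entity.n.01"]),
    "car.n.01": ("n", ["artifact.n.01"]),
    "run.v.01": ("v", []),
}

# Inverse relation: synset -> direct hyponyms.
HYPONYMS = {s: [] for s in SYNSETS}
for s, (_, hypers) in SYNSETS.items():
    for h in hypers:
        HYPONYMS[h].append(s)


def depth(synset):
    """Shortest number of hypernym steps from synset up to a root (a specificity proxy)."""
    hypers = SYNSETS[synset][1]
    return 0 if not hypers else 1 + min(depth(h) for h in hypers)


def ancestors(synset):
    """All transitive hypernyms of synset."""
    direct = SYNSETS[synset][1]
    return set(direct).union(*(ancestors(h) for h in direct))


def neighbourhood(synset, max_dist):
    """Synsets within max_dist hypernym/hyponym edges of synset, with distances (BFS)."""
    dist = {synset: 0}
    queue = deque([synset])
    while queue:
        s = queue.popleft()
        if dist[s] == max_dist:
            continue
        for t in SYNSETS[s][1] + HYPONYMS[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


def tie_type(a, b):
    """Classify the tie between two (lemma, synset) tokens."""
    (lemma_a, syn_a), (lemma_b, syn_b) = a, b
    if lemma_a == lemma_b:
        return "repetition"
    if syn_a == syn_b:
        return "synonymy"
    if syn_b in ancestors(syn_a) or syn_a in ancestors(syn_b):
        return "hyponymy"  # covers both directions (hyponymy/supernymy)
    if set(SYNSETS[syn_a][1]) & set(SYNSETS[syn_b][1]):
        return "co-hyponymy"
    return "related"


def find_ties(tokens, pos="n", max_dist=2, min_depth=2):
    """Match each token's neighbourhood against all subsequent tokens -> (i, j, type)."""
    ties = []
    for i, (_, syn_i) in enumerate(tokens):
        # Part-of-speech and specificity constraints on the source token.
        if SYNSETS[syn_i][0] != pos or depth(syn_i) < min_depth:
            continue
        near = neighbourhood(syn_i, max_dist)  # semantic-distance constraint
        for j in range(i + 1, len(tokens)):
            syn_j = tokens[j][1]
            if syn_j in near and SYNSETS[syn_j][0] == pos and depth(syn_j) >= min_depth:
                ties.append((i, j, tie_type(tokens[i], tokens[j])))
    return ties


def build_chains(n_tokens, ties):
    """Chains = connected components (size >= 2) of the tie graph, via union-find."""
    parent = list(range(n_tokens))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in ties:
        parent[find(i)] = find(j)
    groups = {}
    for k in range(n_tokens):
        groups.setdefault(find(k), []).append(k)
    return [g for g in groups.values() if len(g) > 1]


text = [("dog", "dog.n.01"), ("car", "car.n.01"), ("run", "run.v.01"),
        ("animal", "animal.n.01"), ("automobile", "car.n.01"),
        ("cat", "cat.n.01"), ("dog", "dog.n.01")]
ties = find_ties(text)
print(build_chains(len(text), ties))  # [[0, 3, 5, 6], [1, 4]]
print(Counter(kind for _, _, kind in ties))
```

On this toy fragment, "dog", "animal", "cat" and the second "dog" form one chain and "car"/"automobile" another, while the verb "run" is excluded by the part-of-speech constraint. The specificity threshold keeps very general synsets such as `entity.n.01` or `organism.n.01` from acting as tie targets. Per-chain figures such as the number of elements (`len(chain)`) or the span in tokens (`chain[-1] - chain[0]`) fall out directly from this representation.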
We also show how we use the annotated corpus to carry out numerical analyses of lexical cohesion patterns, addressing questions such as: Which are the most substantive chains (in terms of length and number of elements)? Which types of lexical cohesion are used predominantly? Do lexical cohesion patterns vary according to register?
References:
Fellbaum C. (ed.), 1998. WordNet: An Electronic Lexical Database. MIT Press.
Halliday M.A.K. & Hasan R., 1976. Cohesion in English. Longman.
Hasan R., 1984. Coherence and cohesive harmony. In Flood J. (ed.), Understanding Reading Comprehension, pp. 181-219. IRA.
Hoey M., 1991. Patterns of Lexis in Text. Oxford University Press.