Corpus linguistics in the Greek papyri: developing a corpus to study variation and change in the post-classical Greek complementation system
The aim of this PhD project is to advance the corpus-linguistic study of the Greek papyri, a large diachronic corpus (3rd century BC – 8th century AD) of non-literary Greek. The thesis consists of two main parts. The first part focuses on corpus design: starting from the transcribed (XML) version of these texts, it describes a pipeline model that enriches the papyri step by step with linguistic information, using natural language processing (NLP) techniques. The components of this pipeline are described in the individual chapters. First, the texts are tokenized (divided into individual words). Next, the part of speech, morphology and lemma of each token are determined automatically. The following step is syntactic parsing: the individual sentences are converted into syntactic dependency trees. Finally, a number of techniques for automatic semantic analysis are investigated, including the creation of so-called distributional vector models of the individual lemmas, which represent their lexical meaning, and the labeling of the semantic relations in the sentence (“semantic role labeling”).
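The idea behind the distributional vector models mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible variant, raw co-occurrence counts within a fixed window compared by cosine similarity, which is not necessarily the model used in the thesis. The function names, the window size and the transliterated toy sentences are all hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each lemma to a Counter of the lemmas found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, lemma in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[lemma][sentence[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented, already lemmatized toy sentences (transliterated), NOT corpus data.
toy_sentences = [
    ["grapho", "epistole", "adelphos"],   # "write", "letter", "brother"
    ["pempo", "epistole", "adelphos"],    # "send", "letter", "brother"
    ["agorazo", "sitos"],                 # "buy", "grain"
]

vectors = cooccurrence_vectors(toy_sentences, window=2)

# Lemmas that share contexts receive similar vectors; lemmas that do not, do not.
sim_close = cosine(vectors["grapho"], vectors["pempo"])
sim_far = cosine(vectors["grapho"], vectors["agorazo"])
print(sim_close, sim_far)
```

In this toy setting the two verbs that occur with the same neighbours come out as near-identical, while the verb from an unrelated context has no overlap at all. A realistic model would normally add a weighting scheme and dimensionality reduction on top of the raw counts.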
The second part describes how these automatically analyzed texts can be used for corpus research. The central linguistic topic is variation and change in the verbal complementation system of Greek, studied within a usage-based framework. In a first, introductory chapter, I analyze how verbal complementation constructions should be defined and how they can be retrieved from the corpus data. Special attention is paid to the question of how constructions that are vague between complements and adverbials fit into this analysis. The next two chapters focus on two important loci of variation in the Greek complementation system. The first analyzes complementizer choice: I describe how a number of exploratory quantitative techniques can be used to gain an overview of the main extra- and intra-linguistic factors determining the choice between a large number of possible complementizers. This chapter also examines how systematic the Greek complementation ‘system’ truly is. The second focuses on verbal stem choice: by means of a specific case study (speech verb constructions), I investigate which temporal, aspectual and modal constraints govern the choice between the four verbal stems (present, aorist, perfect and future).
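As a rough illustration of what a first exploratory step for complementizer choice might look like, the sketch below cross-tabulates complementizers by period and converts the counts into relative frequencies. It assumes that complement clauses have already been retrieved from the parsed corpus as (period, complementizer) pairs. The function name, the period labels and the observations are invented for the example and are not results from the thesis.

```python
from collections import Counter, defaultdict


def complementizer_shares(observations):
    """Return, per period, the relative frequency of each complementizer."""
    counts = defaultdict(Counter)
    for period, complementizer in observations:
        counts[period][complementizer] += 1
    shares = {}
    for period, period_counts in counts.items():
        total = sum(period_counts.values())
        shares[period] = {c: n / total for c, n in period_counts.items()}
    return shares


# Invented observations for illustration only, NOT corpus findings.
toy_observations = [
    ("period_A", "infinitive"),
    ("period_A", "infinitive"),
    ("period_A", "infinitive"),
    ("period_A", "hoti"),
    ("period_B", "infinitive"),
    ("period_B", "hoti"),
]

shares = complementizer_shares(toy_observations)
for period in sorted(shares):
    print(period, shares[period])
```

A table of this kind only describes one factor at a time. Weighing several extra- and intra-linguistic factors against each other would require multivariate techniques, which this sketch does not attempt.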
In a concluding chapter, I review the main findings of the corpus-linguistic approach I developed, discussing its strengths and shortcomings and how these shortcomings can be overcome in the future.