Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2015-04-07T15:26:20+0000
Authors: Elise Thorsen (University of Pittsburgh) and David J. Birnbaum (University of Pittsburgh)
Quantitative metrics, and particularly the statistical study of meter and rhyme, has been a core research methodology in Russian verse theory and scholarship at least since the early twentieth century, both among Russian scholars (e.g., Belyj, Taranovski, Gasparov) and abroad (e.g., Shaw, Scherr, Friedberg). Until recently, the methods have had to rely largely on the laborious human identification and tagging or recording of all individual stress and rhyme phenomena, which have then served as input into the (often computer-assisted) statistical analysis of synchronic patterns and diachronic trends in meter and rhyme. Not only is this sort of manual effort at corpus preparation not scalable, but more often than not the raw data underlying published scholarship have not been shared, which means that the results cannot be replicated and the conclusions cannot be verified. Almost the entire corpus of Russian classical verse is now freely accessible on the Internet in authoritative scholarly digital editions, and computational tools could therefore be used to relieve scholars of the human labor previously required to prepare and collect the data for studies in quantitative versification, and to perform the analysis. To the extent that the data preparation and analysis proceed algorithmically, intermediate results can be saved and examined and the entire process can be replicated and verified.
The principal limitation to using available poetic texts for this purpose has been that the place of stress in words, which in Russian is not predictable without linguistic knowledge and which is not part of standard orthographic practice, must be known before the orthographic representation can be converted algorithmically to a useful phonetic representation, which is, in turn, a prerequisite for identifying meter and rhyme automatically. To address this need, the authors have built a network of tools, all freely available as web services, that automate this process as much as possible. The system begins with a full-text dictionary of Russian, consisting of approximately 100,000 headwords with all inflected forms, including information about the place of stress, which can be accessed through an API to add stress information to Russian-language texts in natural, native orthography. The stressed texts can be used for other purposes (such as in readers for language learners), but our principal goal was for them to serve as input into API-driven web services that are capable of producing descriptive statistics and visualizations of the metrical patterns in individual poetic texts and in corpora. These integrated resources thus serve as a workstation for the formal and quantitative exploration of Russian versification in a way that is consistent with current best practice for research data management. All web services are publicly accessible and all data and other materials are available under a Creative Commons license.
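As a rough illustration of the dictionary lookup step, the following Python sketch adds stress marks to text in native orthography. It is not the project's API: the tiny dictionary, the function names, and the choice of a combining acute accent (U+0301) as the stress mark are all assumptions made for the example.

```python
import re

VOWELS = "аеёиоуыэюя"
ACUTE = "\u0301"  # combining acute accent, assumed here as the stress mark

# Hypothetical toy dictionary: word form -> 1-based index of the stressed syllable.
TOY_STRESS = {"дядя": 1, "правил": 1, "когда": 2}


def mark_stress(word, stressed_syllable):
    """Insert the accent after the vowel of the given (1-based) syllable."""
    seen = 0
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            seen += 1
            if seen == stressed_syllable:
                return word[: i + 1] + ACUTE + word[i + 1 :]
    return word  # fewer vowels than expected: leave the word unchanged


def annotate(text, stress_dict):
    """Add stress marks to every Cyrillic word found in the dictionary."""

    def repl(match):
        word = match.group(0)
        index = stress_dict.get(word.lower())
        return mark_stress(word, index) if index else word

    return re.sub(r"[А-Яа-яЁё]+", repl, text)


annotated = annotate("Мой дядя правил", TOY_STRESS)
```

Words that are absent from the dictionary (here "Мой") pass through unmarked, which is the gap that any later inference step would have to fill.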
There are other tools and services that address some of the same issues as our system, but none that enables users to process their own texts by means of a convenient pipeline of API-enabled web services that can convert corpora of texts in native Russian orthography into tagged output, descriptive statistical reports, and visualizations. Furthermore, our innovative two-pass methodology enables us to use regularities in the poetic structure to compensate for lacunae in the dictionary; the ambient meter that emerges on a first pass in the case of metrically regular poetry can be used, in a second pass, to infer the place of stress for words that either are not in the dictionary or return ambiguous results. The feedback portion of our system that corrects for lacunae and ambiguities does presuppose largely regular metrical and rhyme patterns, which means that this component of the system performs most reliably, effectively, and usefully with the largely regular syllabotonic verse that dominates Russian poetry from the beginning of the nineteenth well into the twentieth century. We also note, in response to early reader comments, that although the Russian National Corpus (RNC) provides a poetry subcorpus that purports to be able to return metrical information, in fact the RNC output merely layers the predetermined dominant metrical scheme of a poem on top of the text without regard to actual linguistic stress phenomena, which means that the output is of no value for determining which potential (metrical) stresses are realized (linguistically) and which are not.
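The two-pass idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the authors' implementation: it assumes a binary meter (iamb or trochee), a hypothetical toy dictionary, and made-up function names, and it ignores monosyllables because their stress is metrically ambiguous.

```python
import re

VOWELS = set("аеёиоуыэюя")

# Hypothetical toy dictionary: word form -> 1-based index of the stressed syllable.
# "самых" and "занемог" are left out on purpose to simulate dictionary lacunae.
TOY_STRESS = {"дядя": 1, "честных": 1, "правил": 1, "когда": 2, "шутку": 1}

LINES = ["Мой дядя самых честных правил,", "Когда не в шутку занемог,"]


def words_with_offsets(line):
    """Yield (word, syllable_count, syllables_before_word) for one line."""
    pos = 0
    for word in re.findall(r"[а-яё]+", line.lower()):
        n = sum(ch in VOWELS for ch in word)
        yield word, n, pos
        pos += n


def ambient_meter(lines, stress_dict):
    """First pass: vote on whether known stresses fall on even or odd syllables."""
    even = odd = 0
    for line in lines:
        for word, n, pos in words_with_offsets(line):
            stressed = stress_dict.get(word)
            if stressed is not None and n > 1:
                if (pos + stressed) % 2 == 0:
                    even += 1
                else:
                    odd += 1
    if even == odd:
        return None  # no clear ambient meter
    return "iamb" if even > odd else "trochee"


def infer_missing(lines, stress_dict, meter):
    """Second pass: place stress in unknown words on the only strong position."""
    strong_parity = 0 if meter == "iamb" else 1
    inferred = {}
    for line in lines:
        for word, n, pos in words_with_offsets(line):
            if word in stress_dict or n < 2:
                continue
            candidates = [k for k in range(1, n + 1) if (pos + k) % 2 == strong_parity]
            if len(candidates) == 1:  # leave ambiguous words unresolved
                inferred[word] = candidates[0]
    return inferred


meter = ambient_meter(LINES, TOY_STRESS)
inferred = infer_missing(LINES, TOY_STRESS, meter) if meter else {}
```

In this toy run the known stresses all fall on even syllables, so the first pass reports an iambic ambient meter, and the second pass places the stress of "самых" on its first syllable. The three-syllable "занемог" spans two strong positions and is therefore left unresolved, which shows why the feedback step depends on regular verse and cannot settle every case on its own.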
Our conference presentation will illustrate the use of the system to answer types of research questions that are common in Russian versification studies, involving synchronic and diachronic properties and regularities of poets, periods, and forms and styles of poetry. In particular, metrical analyses have produced the concept of semantic halo, a term coined by Mikhail Gasparov to describe intertextual meanings conventionally associated with a given meter. This is an observation enabled by a form of close reading performed numerous times, and because these observations of verse have by necessity been made by counting by hand, sample sizes have thus far been limited. The poetic canon is small enough that counting manually is feasible; however, scansion by hand could not begin to account for the large body of amateur and pulp poetry written from the mid-nineteenth century onward, or the extent to which this poetry does or does not reflect conclusions drawn from analyses of major nineteenth- and twentieth-century poets.
While the obscurity of most of these authors means their works remain undigitized, there is an ever-growing body of poetry published digitally. For example, there are more than 30,000 poems self-published by members of the online journal Poezia.ru. A corpus of this size offers the opportunity to elucidate the relationship of the larger population of those who consume and produce poetry with the poetic canon and contemporary critically acclaimed poets. With large-scale data about the use of meter, rhythm, and characteristic vocabulary, our research questions address the extent to which metrical semantic halos operate in this corpus and the extent to which they reflect patterns of readership and influence from the canon. Which semantic halos are rarefied phenomena, and which are more universally accepted as verse norms?