Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2015-04-07T15:27:07+0000
[By Lawrence D. Adams and David J. Birnbaum. Originally published in Text technology 7, 1 (1997): 1–17. The present version has been updated to correct errors, standardize the transliteration of Russian text, and remove links to external resources that are no longer available. The goal of the original argument was to explore how a computer scientist and a humanist might conceptualize a solution to the same problem in different ways, and Russian versification was for that purpose an arbitrary test-bed task. The article nonetheless describes our first implementation of the algorithm that we continue to use, in modified form, to identify rhyme.]
While text crunching has always been a component of computer science, a growing interest in humanities computing has led to the increasing involvement of humanities scholars in hands-on programming projects. In many cases, the humanities scholars in question have little or no formal training in computer science or programming, and, instead, teach themselves the skills required to achieve their computer-dependent research goals. This socio-academic phenomenon led us to wonder whether the different backgrounds of autodidact humanities scholars and professional computer scientists would lead to different interpretations, methods, and solutions for programming projects in humanities computing.
In an effort to examine how the communities of computer scientists and humanities scholars might approach a specific humanities computing task, we undertook a small joint project to develop a system for extracting rhyme schemes from Russian verse. The authors of this report are Lawrence D. Adams, a computer scientist with a knowledge of Russian language and literature, and David J. Birnbaum, a Slavic linguist with experience in humanities computing. We were interested in examining how we would confront the Russian rhyme problem, and in learning where our approaches would be similar or different. For the practical task at hand, Birnbaum assumed primary responsibility for analyzing the Russian data and Adams assumed primary responsibility for program development and coding, but both researchers conferred frequently at different stages of the project.
Certain differences in skills were predictable: in most cases, a Slavic linguist would have more experience in studying the linguistic structures of the Russian language, and would therefore be better equipped to determine an algorithm for identifying rhyme. Similarly, in most cases a computer scientist would have more coding experience, and would therefore be better able to implement this algorithm in code. But other issues were subtler, involving the general analysis of the overall problem, how it should be divided into subtasks, and algorithm development.
The first difference that we recognized involved our choice of programming languages, with Birnbaum inclined toward SPITBOL and Adams toward C. It seemed to us that the choice of a programming language in both cases was governed by one assumption and two criteria. Our assumption was that there were likely to be several computer languages that were equally capable of solving our programming task. Our criteria, which may lead in contradictory directions, were that 1) some languages are better suited to certain projects than others, where suitability is reflected in the availability of primitive operators that correspond to the specific tasks required; and 2) the best language for a project is the language the programmer knows best. Thus, Birnbaum favored SPITBOL both because he knew it well and because it had powerful pattern-matching and string-processing operators, while Adams chose C because he knew it well and for several other reasons, even though C’s poor suitability for string processing vis-à-vis other programming languages is well known:
Adams’s attention to portability and general availability of the development language reflects, among other things, his background in corporate programming, where collaboration and long-term maintenance may be crucial. Humanities scholars may be more likely to work individually on smaller tasks, where ease of collaboration with other programmers becomes much less important.
A subtler issue affecting the choice of programming language involves the relationship between the syntax of a specific programming language and the way programmers conceptualize specific problems. SPITBOL, which is weakly typed (performing type conversion behind the scenes where needed) and loosely structured (supporting GOTOs, which may be conditioned by the success or failure of functions), corresponds to the way some humanities scholars may conceive of text-processing problems. For example, SPITBOL, like a human, treats strings of digits as either strings or numbers, depending on the operation performed, avoiding the need for explicit type casting. Humanities scholars can certainly learn to deal with typed variables, but may find strong typing more of an inconvenience than a virtue.
Russian rhyme is not inferable from Russian orthography for several reasons:
The consequences of these features for program development are discussed below.
We conceived of our ultimate project as analyzing the rhyme schemes of unstressed, orthographically normal Russian text for two reasons, one practical and one philosophical. The practical reason is that this is the format most commonly encountered in Russian electronic text archives and in Russian printed sources (which could be converted to electronic format with optical character recognition). This means that most Russian material available for analysis would consist of unstressed text, and that a useful program would have to be able to analyze rhyme without relying on orthographic stress marks. The philosophical reason is that human readers determine rhyme from written text that does not bear overt stress marks, and we felt that our program would be more useful if it were able to analyze rhyme using the same input as humans.
The preceding facts led us to modularize this project into five general stages, of which only the last four were implemented:
We believe that the first of these tasks can be solved with the help of lexical reference materials, which would let us parse a word-form in an unstressed electronic text, look it up in an on-line lexicon, and return the position of stress. Because there are Russian words that are spelled identically and differ only in stress, and because an on-line lexicon would be unlikely to include all possible words (especially proper nouns), a small number of ambiguities would remain, which would have to be addressed either through more sophisticated context analysis or with the help of human intervention. Nevertheless, the process could be automated substantially.[1]
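The lookup stage described above was not implemented in this project, but its logic can be sketched briefly. The sketch below is in Python rather than the C of the implementation described in this article, and the toy lexicon, the function name `lookup_stress`, and the convention of recording stress as a zero-based vowel index are all assumptions made purely for illustration.

```python
# Toy lexicon (hypothetical): unstressed word-form -> possible stress
# positions, counted as zero-based vowel indexes within the word.
LEXICON = {
    "nebo": {0},       # nébo
    "zamok": {0, 1},   # zámok 'castle' vs. zamók 'lock'
}


def lookup_stress(word_form):
    """Return a (status, positions) pair for one unstressed word-form."""
    positions = LEXICON.get(word_form.lower())
    if positions is None:
        # E.g., a proper noun that is missing from the lexicon.
        return ("unknown", [])
    if len(positions) == 1:
        return ("resolved", sorted(positions))
    # Homographs that differ only in stress need context analysis
    # or human intervention.
    return ("ambiguous", sorted(positions))
```

Only the "resolved" case can be tagged automatically; the other two statuses correspond to the residual ambiguities mentioned above.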
This modularization was intuitively obvious to both authors, but anecdotal evidence suggests that this may reflect the fact that we are both more familiar with each other’s areas of expertise than might be the case with other linguists and computer scientists. Our modularized analysis may also reflect considerable Unix experience on the part of both authors, which led us naturally to conceive of the project as a series of filters operating on one another’s output. Although the two authors of the present study understood and modeled the problem in question similarly, this similarity may not easily be generalized to humanities or computer specialists who may have less practical experience in each other’s areas of expertise.[2]
For the present project, we left the problem of determining the place of stress automatically for later, and provisionally inserted stress tags into our test corpus manually. Birnbaum then developed a text-to-pronunciation algorithm, which Adams implemented, after which the representations of pronunciation were compared to ascertain which lines might rhyme in a particular poem or corpus. We hypothesized that Adams’s implementation might incorporate revisions of Birnbaum’s algorithm for reasons of design or implementation efficiency, but Adams’s code turned out to follow the general structure of Birnbaum’s original algorithm very closely. The algorithm and implementation were then modified in response to errors or inconsistencies discovered during testing.
Because rules of Russian pronunciation and rhyme vary with both time and individual, we adopted a compromise definition of rhyme based on modern standard pronunciation.[3] A later extension of the program might allow the user to specify different rules for conversion from orthography to pronunciation.
Our general model of Russian rhyme required exact phonetic correspondence from the last stressed vowel of a line until the end of the line. In the case of masculine rhymes ending in a vowel sound, the consonant sounds immediately preceding the stressed vowels of rhyming lines must also correspond.[4] Deviations from these patterns in the works of Russian poets were illuminating, showing that inexact rhymes may sometimes deviate only in specific, restricted ways from exact correspondence. These facts are well known to scholars of Russian versification, but the need to specify the degree and type of phonetic correspondence required for rhyme made obvious certain subtleties of inexact rhyme, which illustrates the commonplace that programming enforces an attention to detail that might prove valuable to linguists and literature scholars.
Our system is divided into four modules: Stress, Orth, Weight, and Report.
Stress reads each line from the original input file and isolates the portion of the line that is relevant for rhyme: the last stressed vowel, all information following that vowel, and, in the case of open masculine rhymes, the consonant preceding the stressed vowel. All punctuation is stripped. Although Russian rhyme may cross word boundaries, spaces were retained until certain features of pronunciation that depend on word boundaries could be addressed.
Finding the final stressed vowel was generally straightforward, although preserving a supporting consonant for open masculine rhymes proved more complicated. The consonant sound /j/ is not always spelled with a separate consonant letter, and may instead be incorporated into a vowel letter that spells a sequence of /j/ plus a vowel sound, a property that depends on the letter preceding the soft vowel letter in question.[5] This is awkward because Stress otherwise operates on letters, but the notion of a supporting consonant in Russian is based on sounds, and there is not always a one-to-one correspondence between sounds and letters.
Stress also outputs a string of vowels and stress marks in a separate field. This information is not used in the rhyme analysis program, but would be needed for more general metrical analysis, and can be generated at no significant extra cost, enabling Stress to serve as part of an eventual metrical analysis system.
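The behavior of Stress described above might be sketched as follows. This is a minimal Python illustration, not the C code of the actual module. It assumes that stress is tagged with a combining acute accent (U+0301) placed after the stressed vowel, that the text is in scholarly transliteration, and that a supporting consonant is always a single letter, which sidesteps the /j/ complication discussed above. The function names are hypothetical.

```python
STRESS = "\u0301"                      # combining acute accent (assumed stress tag)
SOFT_SIGN = "\u2032"                   # prime, the transliterated soft sign
VOWELS = set("aeiouy") | {"\u00e8"}    # transliterated vowel letters, including è


def clean(line):
    """Lower-case a line and strip everything except letters, spaces, and tags."""
    kept = (ch for ch in line.lower()
            if ch.isalpha() or ch in (STRESS, SOFT_SIGN, " "))
    return "".join(kept).strip()


def rhyming_tail(line):
    """Return the portion of a line that is relevant for rhyme, or None."""
    text = clean(line)
    mark = text.rfind(STRESS)
    if mark < 1:
        return None                    # no stress tag in this line
    start = mark - 1                   # position of the last stressed vowel
    tail = text[start:]
    # An open masculine rhyme ends in the stressed vowel itself.
    open_masculine = (mark == len(text) - 1)
    if open_masculine and start > 0:
        before = text[start - 1]
        if before.isalpha() and before not in VOWELS:
            tail = before + tail       # keep the supporting consonant letter
    return tail


def vowel_skeleton(line):
    """Return only the vowels and stress tags, for later metrical analysis."""
    return "".join(ch for ch in clean(line) if ch in VOWELS or ch == STRESS)
```

Under these assumptions a line ending in `krasu` with a stress tag on the final vowel yields the tail `su` plus the tag, because the rhyme is open and masculine, while a line ending in `krapive` stressed on the `i` yields `ive` plus the tag, with no supporting consonant.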
Orth converts the output of Stress, which is the (potentially) rhyming portion of each line, from its orthographic representation to a phonetic one, according to the following algorithm:
Russian non-distinctive voicing: a stocktaking, Russian linguistics 17, 1 [1993]: 1–14 for linguistic details). Also working stepwise from the right, dental consonants (t d s z n l) and soft palatals (č šč) undergo regressive softness assimilation (although there is no corresponding hardness assimilation).[6]
This conversion is imperfect in several respects, none of which led to significant error during field testing:
The regressive obstruent voicing assimilation rule must be implemented from right to left; Adams implemented all of the other procedures from left to right in the input string. This choice was largely arbitrary: in those cases where a string search was implemented using the system function strchr, the search was necessarily left to right. In all other cases, the string was tested character by character, and Adams arbitrarily chose to implement the search from left to right in order to mimic a human’s natural reading order and make for a simpler implementation. Had Birnbaum implemented these procedures in SPITBOL, which has a primitive left-to-right scanning function but no right-to-left counterpart, it would have been most sensible to handle obstruent voicing assimilation by using a SPITBOL primitive to reverse the string and then employing primitive left-to-right scanning functions. Reversing the text and then scanning from left-to-right might not be an intuitive strategy for a humanities scholar.[7]
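Why the voicing rule runs from right to left can be seen in a small sketch. The Python below is an illustration only, not the C implementation. It assumes one character per consonant sound, applies word-final devoicing before assimilation, treats v as undergoing but not triggering assimilation (as in standard Russian pronunciation), and ignores soft signs and word boundaries. The table and function names are hypothetical.

```python
# Voiced -> voiceless obstruent pairs (one character per sound, an assumption).
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s",
           "\u017e": "\u0161",          # ž -> š
           "v": "f"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}
UNPAIRED_VOICELESS = {"c", "\u010d", "x"}   # c, č, x: no voiced partner here


def assimilate_voicing(word):
    """Apply final devoicing, then regressive voicing assimilation."""
    chars = list(word)
    if chars and chars[-1] in DEVOICE:
        chars[-1] = DEVOICE[chars[-1]]      # word-final devoicing
    # Walk leftward so that each sound sees the already-adjusted sound
    # to its right.
    for i in range(len(chars) - 2, -1, -1):
        nxt = chars[i + 1]
        if nxt in VOICE or nxt in UNPAIRED_VOICELESS:
            chars[i] = DEVOICE.get(chars[i], chars[i])   # devoice before voiceless
        elif nxt in DEVOICE and nxt != "v":
            chars[i] = VOICE.get(chars[i], chars[i])     # voice before voiced
    return "".join(chars)
```

The form `izb` shows the dependence on direction: the final b devoices to p, and only then does z devoice to s, giving `isp`. A left-to-right pass would examine z while b was still voiced and leave it unchanged.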
Weight compared the potentially rhyming portions of all lines within each quatrain to one another, and rhymes were considered perfect if the output of Orth for two lines was identical. As was noted above, stressed o and e were considered identical, but rhymes involving these letters were tagged so that they could be verified later, as necessary. It was assumed that all rhymes are isosyllabic, so that two lines cannot be considered rhyming unless the final stressed vowels coincide and the lines have the same number of syllables after the final stressed vowel.[8] Deviations from perfect rhyme may include the following, each of which was recorded in a set of bits allocated for each line-to-line comparison (the term feminine in this list refers to all non-masculine rhyme, e.g., feminine, dactylic, hyperdactylic, etc.):
Broken rhymes, spread across word boundaries, may be of two types. If the words constitute a single stress group (that is, if one of the words has no lexical stress, and is usually pronounced as part of a neighboring word), they function as a single word for metrical purposes, so that Sergej Esenin (1895-1925) rhymes, for example, initially-stressed nébo with né byl. These stresses are not poetic artifacts; they correspond to normal Russian speech, and reflect an imperfect rhyme characterized by a lack of correspondence of unstressed post-tonic vowels and of final open and closed syllables.
The situation is more complex where lexical stress conflicts with metrical stress. Readers of Russian poetry sense the metrical cadence of a line in two ways: by the metrical pattern implied by its context (e.g., an unvarying iambic cadence leads the reader to expect further iambs) and by the place of lexical stress within the line. These principles may conflict in one of two ways: a lexically stressed syllable might be metrically unstressed, or vice versa. These deviations from consistent metrical patterns, leading to, for example, a trochee in the middle of an iambic line, or other adjustments, are a significant factor in preventing verse from sounding "singsong."[9]
This raises important complications for a general metrical analysis algorithm. On one level, readers identify patterns of metrical stress as a consequence of lexical stress; e.g., a sequence of naturally (lexically) iambic words imposes an iambic metrical pattern, creating an expectation on the part of the reader. But on a different level, metrical stress is more assertive than lexical stress, so that a single trochee in a strongly iambic line will not prevent a reader from recognizing the line as nonetheless iambic. The force of a prevailing metrical pattern can depend on features outside the particular line being analyzed, such as the metrical structure of a line with which it rhymes (most rhymes in pre-modern Russian verse involve lines of identical basic metrical structure), a general metrical pattern imposed by the stanza, or the metrical structure of the lines occupying the same position in other stanzas of the same poem. What this ultimately means is that basic metrical patterns are induced from lexical stress, but lexical stress may deviate in certain ways from metrical stress without undermining the metrical pattern.
In the case of the present rhyme-oriented project, clashes between lexical and metrical stress play a very small role. Such clashes are extremely rare at the ends of lines because the final stress, which, after all, is crucial for determining rhyme, is generally felt to be the strongest in the line, and the most resistant to tension between metrical and lexical stress. But broken rhymes that involve dissonance between metrical and lexical stress nonetheless occur, as in Zinaida Gippius’s (1869-1945) rhyme of vjálaja and žálo já.
Because stress was marked manually for the present project, it would have been possible to force lines such as Gippius’s to rhyme simply by not marking stress on ja, even though it is normally a lexically stressed word. We feel that this would be the wrong approach, because an eventual autostressing module based on lexical parsing and dictionary lookup would have to return a stress for this word. Our program currently would err by failing to recognize the Gippius rhyme, and we believe that this error should be corrected at a later stage by modifying the Stress module to look at the penultimate stress of certain lines, and perhaps at the stress patterns of adjacent lines, where appropriate. (The preceding was what we wrote in 1997, but we would now be inclined to regard the phonetic stress of certain types of words, including personal pronouns, as negotiated between the underlying lexical stress of the word form and whether it occurs in strong or weak position, that is, whether the general metrical structure of the poem leads the reader to expect a stress or not.)
Whether imperfect rhymes should count as rhyming or not is determined in the Report module. User configuration options specify which imperfections count as rhyming and which do not. Imperfections are specified negatively (e.g., presence of post-tonic consonants that fail to correspond), rather than positively (e.g., all post-tonic consonants correspond), because not all possibilities for imperfections will be present in all cases. This means that the evaluation module in Report can scan the bit fields, assuming that a set bit means a deviation from perfect rhyme, while an unset bit means either that the sounds in question correspond perfectly or that there are no such sounds in a particular pair of lines (e.g., there would be no check for supporting consonant in rhymes that are not masculine and open).
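The negative encoding of imperfections lends itself to a simple bit-mask test, sketched here in Python rather than the C of the implementation. The flag names are hypothetical and cover only deviations mentioned in this discussion; the actual bit layout used by Weight and Report is not reproduced here.

```python
from enum import IntFlag


class Deviation(IntFlag):
    """Hypothetical deviation flags; a set bit marks a departure from perfect rhyme."""
    NONE = 0
    POSTTONIC_VOWEL = 1         # unstressed post-tonic vowels fail to correspond
    POSTTONIC_CONSONANT = 2     # post-tonic consonants fail to correspond
    SUPPORTING_CONSONANT = 4    # supporting consonants differ (open masculine only)
    FINAL_OPEN_CLOSED = 8       # one line ends in an open syllable, the other closed


def counts_as_rhyme(deviations, tolerated):
    """True if every deviation recorded for a pair of lines is one the user tolerates."""
    return (int(deviations) & ~int(tolerated)) == 0
```

Because an unset bit covers both "the sounds correspond" and "there are no such sounds to compare", a pair of feminine rhymes simply never sets `SUPPORTING_CONSONANT`, and the test needs no special case for it.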
Our initial implementation looks for rhymes within stanzas, which are delimited by blank lines in the input file. Comparisons of lines proceed pairwise through a stanza, so that, for example, a quatrain would undergo six comparisons (lines 1-2, 1-3, 1-4, 2-3, 2-4, and 3-4). This combination of n lines, taken two at a time, implies O(n^2) comparisons. This number of comparisons does not scale well even for relatively small n; simplifying heuristics will eventually be needed to deal with long non-stanzaic poetry. Currently, the program attempts to minimize the number of comparisons by not revisiting a line once it is determined that it rhymes perfectly with line(s) preceding it in the stanza. More important, the present strategy does not check for rhyme outside stanzas, and will require modification to deal with rhyme schemes that may cross stanzas, such as terza rima.
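The pairwise strategy and the skipping heuristic can be illustrated with a short sketch. It is written in Python, not the C of the implementation, and it reduces "rhymes perfectly" to simple equality of two strings standing in for the output of Orth. The function names are hypothetical.

```python
def full_comparison_count(n):
    """Number of unordered pairs among n lines: n(n-1)/2."""
    return n * (n - 1) // 2


def compare_stanza(tails):
    """Compare lines pairwise, skipping lines already matched perfectly.

    Returns the list of perfectly rhyming (earlier, later) index pairs
    and the number of comparisons actually performed.
    """
    matched = set()
    pairs = []
    comparisons = 0
    for i in range(len(tails)):
        if i in matched:
            continue                 # already rhymes with a preceding line
        for j in range(i + 1, len(tails)):
            if j in matched:
                continue
            comparisons += 1
            if tails[i] == tails[j]:
                pairs.append((i, j))
                matched.add(j)
    return pairs, comparisons
```

For a quatrain rhyming ABAB the sketch performs four comparisons instead of six. The saving is modest, and the worst case (no perfect rhymes at all) still requires all n(n-1)/2 comparisons, which is why further heuristics would be needed for long non-stanzaic texts.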
Report outputs a rhyme scheme (e.g., of the type AbAb) with additional information about types of imperfect rhyme. Perfect rhymes are reported using capital letters; imperfect rhymes are reported using lower-case letters (note that this supersedes the traditional use of capitalization to distinguish feminine from masculine rhyme, which Report does not report). Further, rhyming lines which contain Cyrillic e are marked with an asterisk, in order to signify that the system assumes that the e is pronounced either as /e/ or as /o/, as appropriate with the rhyming line.
The following examples illustrate these reporting conventions:
| Original text | Report output |
|---|---|
| Vot už večer. Rosa | A |
| Blestit na krapive. | B |
| Ja stoju u dorogi, | C |
| Prislonivšis′ k ive. | B |

| Original text | Report output |
|---|---|
| Tam, gde kapustnye grjadki | A |
| Krasnoj vodoj polivaet vosxod, | B |
| Klenenoček malen′kij matke | A |
| Zelenoe vymja soset. | B* |

| Original text | Report output |
|---|---|
| Pod zatumanennoju dymkoj | A |
| Ty kažeš′ devič′ju krasu, | B |
| I treplet veter pod kosynkoj | a |
| Ryževolosuju kosu. | B |
The first stanza is an example of a perfect rhyme; Report marks this stanza as A B C B. In the second stanza, Report notes that lines two and four of the quatrain rhyme if the e in soset is pronounced /o/. Report therefore considers the rhyme to be perfect, and reports the stanza’s rhyme structure as A B A B*. In the third stanza, dymkoj rhymes imperfectly with kosynkoj, as the former possesses an /m/ phoneme where the latter contains an /n/. Report notes this imperfect rhyme with a lower-case a: A B a B.
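The labeling convention illustrated by these three stanzas can be sketched as follows. This Python fragment is an illustration and not the Report module itself. It assumes that the pairwise verdicts are supplied by a caller-provided function `relation(i, j)` returning `"perfect"`, `"imperfect"`, or `None`, and that the caller also supplies the set of rhyming lines whose match relied on the e/o assumption. All of these names are hypothetical.

```python
from string import ascii_uppercase


def rhyme_scheme(n_lines, relation, eo_lines=frozenset()):
    """Build a scheme such as 'A B a B' from pairwise rhyme verdicts."""
    labels = []
    letters = iter(ascii_uppercase)      # enough for stanzas of up to 26 rhyme classes
    for j in range(n_lines):
        label = None
        for i in range(j):               # look for the first earlier rhyming line
            kind = relation(i, j)
            if kind is not None:
                base = labels[i].rstrip("*").upper()
                label = base if kind == "perfect" else base.lower()
                break
        if label is None:
            label = next(letters)        # a new rhyme class
        if j in eo_lines:
            label += "*"                 # the match depends on reading e as /e/ or /o/
        labels.append(label)
    return " ".join(labels)
```

Given the verdicts described above, the sketch yields `A B C B` for the first stanza, `A B A B*` for the second, and `A B a B` for the third.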
Our test file was a corpus of a few hundred lines by Sergej Esenin (1895-1925), chosen because it was readily available and is characterized by frequent imperfect rhyme. The file was encoded by George Fowler in a proprietary character coding scheme, which is currently hard-coded into our system. The ready availability of Jan Labanowski’s translit program, which supports arbitrary recoding of files, obviated the need to support additional code pages; files in other encodings can be piped through conversion scripts to Fowler’s encoding. (translit is no longer available at the address we used when preparing our original publication.)
Adams originally considered supporting Brjabrin’s alternative encoding,[10] which is a de facto standard on personal computers in Russia (although not on the Internet), but decided to remain with Fowler’s system during development because it was easier to read on ASCII terminals. (We have updated this report to use the scholarly transliteration and Unicode, instead of the hybrid ASCII representation we employed originally.) Our system could be expanded to support alternative lookup tables, which would allow the user to select an encoding system at runtime. A more extensible system could eschew internal mapping tables entirely in favor of an external mapping file, similar to those used by Labanowski for translit.
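An external mapping file of the kind proposed here might be handled along the following lines. The Python sketch is purely illustrative: the file format (one whitespace-separated source and target sequence per line, with `#` comments) and the function names are invented for this example, and do not reproduce the format used by translit or the details of Fowler's encoding.

```python
def load_mapping(table_text):
    """Parse 'source target' lines into a dict, ignoring blanks and # comments."""
    mapping = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()
        mapping[source] = target
    return mapping


def recode(text, mapping):
    """Recode text, always preferring the longest matching source sequence."""
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            out.append(text[i])          # pass unmapped characters through
            i += 1
    return "".join(out)
```

Trying longer source sequences first matters whenever one sequence is a prefix of another, as with a hypothetical table that maps both `sh` and `shch`.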
The implementation process was characterized by analysis and initial coding, followed by cycles of testing and revision. Our modular structure proved crucial here, since it allowed Adams to tinker with details of implementation without substantial system-wide adjustments. For example, our initial plan involved combining Weight and Report into a single module, with additive or subtractive weights assigned for different correspondences (or lack thereof) between pairs of lines. When we realized that we might later wish to expand the inventory of types of imperfect rhymes, we separated the two modules and incorporated an open string of bit flags for imperfect rhyme patterns. This new design permits the inventory of patterns to be expanded without substantial program modification. Report and Weight share only the information regarding the numerical line comparison rhyme rating, and a flag indicating whether the e/o issue was a factor. As such, we make all the decisions in the Weight module, and in Report we just bring together the original text with an output convention relaying Weight’s analysis.
One feature of this enterprise was that both authors were comfortable treating text strings as data objects, capable of being manipulated in ways that are independent of their relationship to human language. The most efficient computer implementation of an operation might differ considerably from the way a human would pursue the same result, and it is unclear, for example, whether alternative strategies discussed above for locating the last stress in a line would occur naturally to someone without both programming experience and an understanding of data structures. Indeed, the most efficient implementation in a particular computer language may depend on idiosyncrasies of the language, a design concept that finds no obvious analogy in the way humans read and understand the features of literary texts.
Furthermore, in order to determine a phonetic representation for a given line, the system makes a number of passes through each line of text. This approach was dictated by our interest in keeping the source code readable and maintainable; a human, on the other hand, would likely scan each line of text only once. Additionally, due to the relatively small length of input lines (3-6 characters per rhyme on average), it was felt that this multiple-pass strategy, which requires only a single disk access, would not affect the performance of the system.
Our experiment necessarily remains anecdotal, first because it is questionable whether there is such a thing as a representative humanities scholar or computer scientist, and second because the present authors probably know too much about each other’s fields of expertise to qualify as representative of their own disciplines. Nevertheless, we can draw certain qualified conclusions.
The question of whether humanities scholars and computer scientists would analyze and implement the same problem in different ways is misstated. It inappropriately concentrates on biography, and infers conclusions about algorithm design and programming style from academic training or employment. What underlies this question is an assumption that these two types of scholars will bring different perspectives to their work, and, in particular, that humanities scholars will tend to regard electronic poetry files primarily as text, while computer scientists will tend to regard these same files primarily as data objects. Because one tends to do different things with literary texts than with computational data objects, one might reasonably ask whether a scholar’s perceptual habits might create blind spots in places where it might be necessary to look at files in a new way.
There is, of course, no reason why one cannot be both a humanities scholar and a computer scientist, or, to put it less professionally, both a competent reader and a competent programmer. Our anecdotal observations have been that humanities scholars without programming experience may tend toward algorithms that mimic human processing strategies, while computer scientists without much literary experience may initially overlook subtleties of literary structure. In general, though, we may conclude that what is required for successful humanities computing is not just a knowledge of literature (for example) and a knowledge of programming, but the flexibility to recognize that the same piece of text may represent both a human data type (e.g., a line of poetry) and a computational one (e.g., a string of characters), each with its own properties and subject to operations appropriate for interacting with those properties. Human and computational processes of identifying rhyme schemes may use the same input data and generate the same output, and may even follow the same high-level algorithm (e.g., convert text to speech, then find specific corresponding sound patterns in the speech). But the difference in data types may suggest different lower-level algorithms, a perspective that will only be accessible to someone who understands the data structures involved.
[1] Some words also admit variant stresses, which are disambiguated in metrical verse by the overall meter.
[2] Two anecdotes may be illustrative. A Slavic linguist colleague had previously expressed his frustration with the difficulty of analyzing rhyme in Russian text that was not tagged for stress, but did not immediately recognize that analyzing tagged text is not just a different task from dealing with untagged text, but also a partial solution to the untagged-text problem. And an information science colleague initially suggested that English rhyme might be analyzed through orthographic comparison, overlooking the complex relationship of orthography to pronunciation that would have been immediately apparent to a linguist.
[3] In particular, certain changes in the pronunciation of unstressed vowels were not taken into consideration in early rhyme, so that, for example, early nineteenth-century poets rarely rhymed unstressed o and a or unstressed e and i. Earlier poetry was sometimes based on Church Slavonic, rather than vernacular Russian, pronunciation, and corresponded even more closely to orthography.
[4] See chapter 6 of Boris O. Unbegaun, Russian versification, Oxford: Clarendon, 1956 for details. Some examples in our discussion are taken from this source.
[5] E.g., the letter ja spells the sounds /ja/ in mojá, but in spját it identifies the preceding p as palatalized (see below) and then spells the vowel sound /a/, with no /j/ sound.
[6] Palatalization assimilation in Russian is complex and varied, but regressive softness assimilation in contiguous dental consonants seems to be consistent. An alternative heuristic might involve imposing regressive softness assimilation on all consonant sequences.
[7] Two alternative algorithms may seem more natural to someone who is not accustomed to thinking in terms of computational efficiency:
[8] Vladimir Majakovskij (1893-1930) and other twentieth-century poets sometimes employed heterosyllabic rhymes, such as górode/mórde. Our system currently assumes that all rhyme is isosyllabic, but could be modified to admit heterosyllabic rhyming or any other deviation from perfect rhyme.
[9] Vladimir Nabokov (Notes on prosody, Princeton: Princeton University Press, 1964) refers to lexical stress as accent and metrical stress as stress, emphasizing that these different types of prominence belong to different systems. This treatise, originally an appendix to the author’s four-volume critical translation of Aleksander Puškin’s (1799–1837) Eugene Onegin, offers a clear introduction to Russian prosody, with particular attention to differences between Russian and English scansion.
[10] V. M. Brjabrin, I. Ja. Landau, M. E. Nemenman. "O sisteme kodirovanija dlja personal′nyx ÈVM," Mikroprocessornye sredstva i sistemy, 4 (1986): 61-63.
The authors are grateful to Julia West for literary consultations.