This collection briefly introduces Perseus 6.0, the first version of Perseus 6 (Beyond Translation): Perseus 6 represents the most extensive step forward of any version since Perseus 1.0 appeared in 1992.
Perseus 6.0: March 15, 2023
This collection of documents briefly introduces features offered by the Beyond Translation Project, the latest major release of the Perseus Digital Library. While Beyond Translation is the sixth major version of Perseus (Perseus 6.0 being the initial iteration of Perseus 6), it marks the biggest change of any new version and arguably a new beginning for a project that began in 1985.
We announce Perseus 6.0 on March 15, 2023, five years after the release of the Scaife Viewer in 2018 and 15 years after the loss of Ross Scaife (1960-2008), a pioneer in Digital Classics and a friend to many of those whose work contributes to Perseus 6. Current NEH funding for Beyond Translation will run through August 2023. Our goal is to introduce the features that Beyond Translation offers with enough time left in the current NEH-funded phase of development so that we can respond, insofar as possible, to feedback from the community.
James Tauber and Jacob Wegner, first as part of Eldarion and now as members of Signum University, have, in collaboration with Perseus staff, been responsible for the software development of Perseus 6. Members of Tufts computing (such as Patrick Florance, Peter Nadel and Christopher Barnett) have contributed as well, playing key roles in planning and in thinking through our priorities.
Perseus 6.0 contains an initial version of each major feature that we prioritized for this release. We have used the Homeric Iliad and Odyssey as initial testbeds because of the wealth of openly licensed data available for these works, but we also include:
Beowulf (under “Anonymous,” for now), illustrating token-level annotations with dictionary forms, grammatical codes, and glosses for each word;
sections of Bodin’s Six Books of a Commonweale in French and English, aligned at the section level;
the poetry of Hafez (with word- and phrase-level alignments to an English translation by Maryam Foradi);
the Passionate Shepherd of Christopher Marlowe (included because we were able to use, without modification, TEI XML textual notes encoded by Hilary Binda in 1998, thus demonstrating, even to our own surprise, how sustainable complex encoding can be).
Perseus 6 is designed to bring openly licensed, machine-actionable micro-publications together in an integrated reading environment. Other projects have for years (in many cases, for more than a decade) allowed users to create particular types of annotation:
Perseus, Alpheios and Perseids have provided various tools to create treebanks that follow the Perseus Ancient Greek and Latin Dependency Treebanks annotation scheme. As the center of gravity moves towards the newer Universal Dependencies framework, we gain access to annotation and analytical environments such as INCEpTION (which builds upon the previous WebAnno system).
Pelagios (drawing on the Pleiades gazetteer) and ToposText have made it possible to link names in source texts to specific locations (e.g., Alexandria as the famous city in Egypt rather than Alexandria, Virginia) and, to some extent, to particular people (ToposText uses Wikidata).
The Homer Multitext Project also used the broader CITE architecture to align images of the Venetus A manuscript with transcriptions of both the lines of the Iliad and the medieval annotations about them. Many other institutions are now using IIIF-based tools to generate comparable data.
For more than fifteen years, the University of Chicago has made openly licensed materials from Perseus and other sources available through its PhiloLogic system, with its own browsing and advanced searching capabilities. Chicago also has a complementary project, Logeion, that provides access to a range of Greek and Latin lexica in a variety of modern languages. Helma Dik and her colleagues at Chicago made thousands of updates to Richard John Cunliffe’s Lexicon of the Homeric Dialect and to his dictionary of Homeric Proper and Place Names.
David Chamberlain created his own system to publish metrical analyses for 250,000 lines of Greek and Latin poetry under an open license on his Hypotactic site.
Every class of annotation involves scholarly decisions, and these decisions need to be explained. We need to be able to add commentary in traditional narrative prose. The New Alexandria Foundation’s Open Commentary Platform fills this niche. The Open Commentary Platform is designed to support annotations, each with its own credits and metadata, for any text that is compliant with the CTS data model, and it uses the same CTS library (CapiTainS) upon which Perseus 5, the Scaife Viewer, is based. As a result, the Open Commentary Platform has been integrated into Perseus 5 and Perseus 6, with real-time access from either version of Perseus to comments published in the Open Commentary Platform.
Projects such as those above all change the way in which we engage with ancient sources. We no longer need to depend upon static texts and lexica (whether in print or print-like formats such as PDF). Curated linguistic annotations for more than 1,400,000 words of Greek are available. The Harvard Undergraduate Greek and Latin Reading Lists, by comparison, amounted to c. 150,000 words combined and were abolished because they were too large. Researchers such as Vanessa Gorman (Reading Ancient Greek in the Digital Age), Toon van Hal (Pedalion), Neven Jovanovic (Greek Morphology: a reader for first year of undergraduate Ancient Greek course, short prose excerpts), and Farnoosh Shamsian (Shamsian 2022) are doing pioneering work as they explore new ways for audiences to explore and internalize Greek. No topic may, in fact, be more important for the study of ancient cultures than the question of how audiences can engage earlier with sources in the original language and internalize such knowledge as they need for their particular purposes with as much speed and satisfaction as possible.
Perseus 6 was designed to build upon open data projects such as those listed above. Building particularly upon the work of the Alpheios Project (which also funded the beginning of the Perseus Greek Treebank; see Figure 2 below), Perseus 6 addresses the task of bringing such disparate publications together.
The contents of Perseus 6.0 represent years and even decades of work (plans for the Perseus Treebank, for example, were announced in summer 2002, more than twenty years ago) that has evolved in separate projects but can now be viewed together.
Major features of Perseus 6 include the following:
A new reading environment that supports a middle path, between mastery of the source language and complete reliance upon translation. Perseus 6 challenges its audience to engage directly with source texts in languages that they have not learned. Perseus 6.0 uses Homeric poetry and, in particular, book 1 of the Iliad and book 5 of the Odyssey, to illustrate what is possible, bringing together born-digital aligned translations, treebanks, dynamic maps, multiple lexica, metrical annotations and performance, explanations, links to high resolution scans of a major manuscript, as well as commentaries ancient and modern.
Perseus 6 now has the ability to publish critical editions with machine actionable textual notes (rather than just treating these as footnotes with raw text). This is possible because the code base of Perseus 5 (the Scaife Viewer) is open source and the commercial publisher Brill added features to that open code base to serve the needs of its own scholarly editions platform. The content of Brill scholarly editions may require subscriptions but Brill’s investment in the Perseus 5 codebase now allows Perseus 6 to provide full support for traditional critical editions in a range of languages.
Application of the same natural language processing tools to ancient and modern languages. A surge in the power of machine learning has transformed natural language processing, so that the same philological methods that we apply to ancient texts become powerful tools for working with modern languages as well. While we use Greek as our initial example, the methods that we present are immediately applicable to dozens of languages. For students of Greco-Roman culture, this enhances access to sources not only in Greek and Latin but also in traditionally recognized languages of scholarship such as French, German and Italian and in many other languages within Europe (e.g., Croatian and Latvian) and beyond (e.g., Arabic and Chinese).
Most of the new content in Perseus 6 is born-digital and cannot be fully represented in a format that strictly mimics print models. The most important new contributions to Perseus 6, such as treebanks, translation alignments, and geospatial annotation, go beyond the models of, and cannot be meaningfully represented in, print form. In earlier work, we mined print resources for machine actionable knowledge. That earlier extracted information provides a starting point still, but the center of gravity is now increasingly born-digital. We can — and, indeed, must — write articles and monographs about these born-digital objects. To this end, we have created the Perseus Journal of Data Preservation and Sustainability so that we have a vehicle by which to explain what publications are in Perseus, how they work and how they can be reused in the future. Such articles and monographs are only meaningful insofar as they enhance the value of those born-digital resources.
Perseus 6 depends upon a new, rapidly expanding, much more decentralized culture of intellectual production. The most important direct contributors to Perseus 6.0 are often not professional researchers but students and scientists. Automated systems play a key role in providing initial analyses for large bodies of textual information. Professional researchers and faculty play a key role in educating and providing feedback.
But Perseus 6 would not be possible if students and members of the public had not dedicated their time to creating complex, vital new resources such as editions, linguistic analyses, aligned translations, and geospatial annotations. Expertise, both academic and professional, comes not only from students and faculty in traditional fields of the humanities but also from those engaged in the computational and data sciences.
Perseus 6 shows the possibilities for a study of the past that engages contributors from outside of established academic centers in Europe, North America and the developed world. Perseus 6 includes not only the first born-digital, but the first direct, translation of the first book of the Iliad from Greek into Persian. Perseus 6 uses the Didakta Modular Grammar for Greek that Farnoosh Shamsian developed as a compact resource optimized for translation and localization into Persian and other languages in which few sources about Ancient Greek exist. While Perseus 6 appears as an English resource, it is designed to evolve into a multilingual space in which speakers of many modern languages work with sources, ancient and modern, from around the world.
Perseus 6 is an exercise not only in data integration but in preservation and sustainability. Perseus 6 represents a concrete demonstration that born-digital content can not only be preserved as static files in a digital archive but can be sustained as evolving components within new efforts that draw upon, augment and then themselves recirculate openly licensed content of considerable complexity.
The Beyond Translation Project addresses the challenge of living in a world with more languages than any of us will ever be able to study, much less master. A growing number of readers now regularly use machine translation to explore ideas in new languages. Machine translation, however, aims to match the performance of human translators. The Beyond Translation Project views any translation, whatever its sources, as important insofar as it allows readers to push beyond that surface and to explore the source text in its original form. Beyond Translation focuses on what translations cannot communicate and challenges readers to see how far they themselves can go.
Beyond Translation explores a third way of interacting with the human record, one that occupies a space between using our own internalized knowledge of a language and relying upon translation. As we make the first soft launch of Beyond Translation, we have implemented an initial model of a next-generation reading environment which draws upon rich linguistic annotation, alignments at the word and phrase level between source text and translation(s), links from references to people, places, ethnic groups and other named entities to knowledge bases that allow automatic generation of maps, social networks and other visualizations, full representation of metrical structures and one or more accompanying performances, dictionaries, grammars, and traditional explanations in narrative prose.
When documents cover topics with established, cross-cultural terminologies, and especially where the goal of any researcher is to summarize key ideas, translation may be largely adequate. Beyond Translation aims for that point where translation begins to break down because we are working with cultural terms that can overlap but inevitably differ as cultural differences mount, and for those points where superficially identical concepts (e.g., “bread” and “rain”) evoke different meanings based upon very different experiences. Iranian collaborators with Beyond Translation have, for example, expressed shock the first time they heard people in Germany complain about rain, because rain is all too rare and a rainy day is a good day in their experience.
The audience for Beyond Translation is broad and includes anyone who has compared source texts and translations with frustration. Song lyrics provide one example. Millions of human beings every day listen over and over again to songs in languages that they do not know, whether a Bach cantata, a song on the death of the last speaker of Old Prussian by a Latvian heavy metal band, or a music video in Bambara celebrating Great Mali. Others may have favorite movies in languages that they do not know and that contain richly expressive features (such as the honorifics in Japanese films set in the medieval period). One goal of Beyond Translation would be to reveal the structure and cultural background for each word in such songs and allow audiences, who may have already memorized the words, to internalize a far deeper understanding than they had ever thought possible.
Within academia, Beyond Translation aims to serve anyone who needs to understand language in the terms of its producers. This includes historians of all aspects of human activity, including politics, social relations, science, religion, philosophy, and music. Students of World Literature conventionally rely upon translation so that they can work with literary sources from around the world. Aside from the fundamental limitations of translation noted above, this approach depends upon both the existence and the quality of the translations available: many sources are only available in indirect translation (e.g., a Persian translation of an English translation of Xenophon’s Life of Cyrus the Great). Comparative Literature, by contrast, normally demands that readers develop mastery of the languages of the literatures on which they focus. But where linguistic expertise may give this approach added rigor and insight, the need to master languages drastically reduces the range of the student. Few among us could master the 24 official languages of the European Union and fewer still could also master the 22 scheduled languages of India, much less the hundreds of languages spoken in Indonesia or the thousands of languages that are still spoken.
In the United States, three ancient languages report by far the most enrollments: Ancient Greek, Latin and Biblical Hebrew. Ancient Greek and Latin are the most widely studied (ranking 12th and 9th, respectively, among the most commonly studied foreign languages) according to the most recent available figures (published by the Modern Language Association in 2019 and providing coverage through 2016). With total reported enrollments of 13,264 and 24,866 (combined 38,130), Ancient Greek and Latin together were almost 4 times as large as Biblical Hebrew (with 9,587), the next most widely studied historical language.
Most historical languages are rarely taught and, when they are, are almost always available only in a handful of elite institutions. Other than Ancient Greek, Latin and Biblical Hebrew, no historical language reported enrollments that even approached 1,000 (indeed, all reported enrollments were at least an order of magnitude smaller than those for Biblical Hebrew and roughly two orders of magnitude smaller than the combined Greek/Latin figures). All appeared in the list of less commonly taught languages, with the highest reported enrollments being in Ancient Aramaic (453), Sanskrit and Vedic (391), Classical Chinese (298), and Akkadian (119). Enrollments for Ancient, Middle and Late Egyptian reported for 2016 amounted to 36, 6, and 0.
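The comparisons above can be checked directly from the reported figures. The short Python sketch below uses only the enrollment numbers quoted in the text; the variable names are ours and are purely illustrative.

```python
import math

# Enrollment figures as quoted in the text (MLA report, coverage through 2016).
enrollments = {
    "Latin": 24_866,
    "Ancient Greek": 13_264,
    "Biblical Hebrew": 9_587,
    "Ancient Aramaic": 453,
    "Sanskrit and Vedic": 391,
    "Classical Chinese": 298,
    "Akkadian": 119,
}

greek_latin = enrollments["Ancient Greek"] + enrollments["Latin"]
hebrew = enrollments["Biblical Hebrew"]
aramaic = enrollments["Ancient Aramaic"]  # largest of the remaining languages

# Combined Greek/Latin enrollments relative to Biblical Hebrew.
ratio_to_hebrew = greek_latin / hebrew

# Gaps expressed in orders of magnitude (base-10 logarithm of each ratio).
hebrew_vs_aramaic = math.log10(hebrew / aramaic)
greek_latin_vs_aramaic = math.log10(greek_latin / aramaic)

print(f"Greek + Latin: {greek_latin:,}")
print(f"Greek + Latin vs. Biblical Hebrew: {ratio_to_hebrew:.2f}x")
print(f"Biblical Hebrew vs. Ancient Aramaic: 10^{hebrew_vs_aramaic:.2f}")
print(f"Greek + Latin vs. Ancient Aramaic: 10^{greek_latin_vs_aramaic:.2f}")
```

Because Ancient Aramaic has the largest enrollment among the remaining languages, the gaps for Sanskrit, Classical Chinese and Akkadian are larger still.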
But if Ancient Greek and Latin are the most widely taught historical languages in the US, they face their own profound challenges. First, opportunities to study Greek and Latin are themselves not widely available. The US Department of Education reports that there are approximately 6,000 postsecondary institutions that participate in its Title IV program. A February 2023 search of the National Center for Education Statistics (NCES: https://nces.ed.gov/) for universities with programs in ancient or classical studies turns up 278 results. Courses in Greek and Latin can be offered without such programs and biblical Greek is taught in seminaries, but not every program teaches Greek and Latin. These figures suggest that students at roughly 95% of US postsecondary institutions have no access to courses on the two most widely taught ancient languages.
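The estimate follows from the two counts quoted above. The Python sketch below reproduces the arithmetic; it treats the number of ancient or classical studies programs as a proxy for access to Greek and Latin courses, which, as noted, is only an approximation. The variable names are ours.

```python
# Figures as quoted in the text; both are approximate.
title_iv_institutions = 6_000  # "approximately 6,000" Title IV institutions
classics_programs = 278        # NCES search results, February 2023

share_with_program = classics_programs / title_iv_institutions
share_without_program = 1 - share_with_program

print(f"Institutions with a program: {share_with_program:.1%}")
print(f"Institutions without a program: {share_without_program:.1%}")
```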
Second, the study of Greco-Roman antiquity is, itself, too big: in 2021, Princeton provoked controversy when it removed any language requirements from its Classics major, but its move reflected the fact that traditional study of Greek and Latin takes too much time to be practical. At the same time, advanced study in this field assumes the ability to work with multiple modern languages other than English (typically, French, German, Italian and Spanish). Beyond Translation focuses on new opportunities for those interested in this field to work directly not only with primary sources but with discussions in a growing range of modern languages, including, but extending beyond, those of Europe. Machine translation alone already makes it possible to track ideas (e.g., a Persian language Twitter thread discussing how Thomas Jefferson understood and used Xenophon’s Education of Cyrus).
Third and perhaps most importantly, the field of Classical Studies is too narrowly construed. The Princeton Classics Department still states on its website that it “investigates the history, language, literature, and thought of ancient Greece and Rome.” The equation of Classics and Classical Studies with a focus on Greco-Roman culture reflects a narrow Eurocentric world view that is, at the least, problematic. Harvard, in fact, has a “Department of the Classics,” where the use of the definite article drives home the deeply problematic idea that only Greek and Latin warrant the term classical. (I try to use a term such as Greco-Roman Studies when I describe this narrow view of the field.)
If we are to use terms such as Classics or Classical in countries that acknowledge the diversity of their societies, we must adopt a more expansive conception of this term, one that includes not only established literary languages such as Classical Chinese, Sanskrit, the cuneiform languages of the Middle East, the various forms of Egyptian and Persian, and Classical Arabic, but also sources from and about Africa and the Western Hemisphere itself: the K’iche’ Mayan Popol Wuj, the productions of West African oral performers in a range of languages, and Classical Arabic histories of the Songhai Empire produced by writers from West Africa. We must expand our linguistic and cultural range and, if we rely upon traditional approaches, we will become increasingly dependent upon the choices and authority of translators.
But if Beyond Translation pursues strategies with very broad applications in contemporary culture, we began our work with the opening of the continuous literary tradition of Europe, the Iliad and Odyssey attributed to Homer. In part, this reflects my own training as a specialist in Ancient Greek with a particular interest in Homeric epic. At the same time, Homeric poetry proved a strategic starting point because a small, but growing, community had created complementary, openly licensed digital resources of various kinds. Each of these represented a substantial achievement. All of these resources, when recombined in different ways, could (and were designed to) support uses not possible for any one of them on their own. We worked to create an initial integrated reading environment in which audiences could begin to see what was now possible.
Beyond Translation can be viewed as a sixth generation digital library in at least two ways.
First, consider digital libraries as cumulative layers:
catalogues with data about collections (such as print card catalogues that directed users to locations in a print library);
scanned images of the pages from the printed books accompanied by textual transcriptions automatically generated from the images by Optical Character Recognition software;
curated transcriptions with markup that identifies chapter and section breaks, distinguishes source text from footnotes, and can encode a wide range of features within the text;
machine actionable annotations, such as detailed linguistic annotations identifying various functions of each word in a text, that go beyond what was possible in print and that human experts manually compile;
annotations automatically generated at scale by systems that use the manual annotations as training data; the fourth and fifth levels interact, with training data leading to automatic first drafts that human editors correct and that automated systems in turn use as new training data (the Pedalion Greek Treebank provides an example of this);
systems (such as Perseus 6) that integrate different classes of annotation (named entities, translation alignments, linguistic, metrical, etc.) to support new forms of human reading and automated analysis.
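The sixth layer in the list above, integration, can be pictured as merging separately published annotation layers that refer to the same token. The Python sketch below is illustrative only: the CTS-style token reference, the layer names, the annotation values and the `integrate` function are all hypothetical and do not describe the actual Perseus 6 code base or its data.

```python
from collections import defaultdict

# Hypothetical CTS-style reference to the first word of Iliad 1.1.
TOKEN = "urn:cts:greekLit:tlg0012.tlg001:1.1@μῆνιν"

# Each layer is produced independently and keyed by the same token reference.
# All values are illustrative examples, not data taken from Perseus.
layers = {
    "treebank": {TOKEN: {"lemma": "μῆνις", "pos": "noun", "relation": "OBJ"}},
    "alignment": {TOKEN: {"translation": "wrath"}},
    "metrical": {TOKEN: {"syllables": ["long", "short"]}},
}


def integrate(layers):
    """Merge independently produced annotation layers into one record per token."""
    merged = defaultdict(dict)
    for layer_name, annotations in layers.items():
        for token_ref, annotation in annotations.items():
            merged[token_ref][layer_name] = annotation
    return dict(merged)


reading_view = integrate(layers)
for layer_name, annotation in sorted(reading_view[TOKEN].items()):
    print(layer_name, annotation)
```

The design point of the sketch is that no layer needs to know about the others: a shared, stable way of citing the text is what allows a reading environment to combine them.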
At the same time, from a more pragmatic perspective, Beyond Translation represents the sixth major version of the Perseus Digital Library.
1992: Perseus 1 appeared on CD ROM and Videodisc as a publication that could be ordered from Yale University Press.
1996: a much expanded Perseus 2.0, now without the analogue videodisc and spread over 4 CD ROMs, followed (with a version available for Windows as well as Macs in 2000).
1995: Even as Perseus 2.0 was being prepared for publication, Perseus began to move to the then new world wide web in 1995, appearing publicly before the earlier, CD ROM-based Perseus 2 could appear from the publisher. Perseus 3.0 describes the online digital library which David A. Smith developed in Perl.
2003: (the Perseus Hopper) Eight years later in 2003 David Mimno, shifting to Java, created the initial version of Perseus 4. Perseus 4 evolved for ten years and remains the most widely used version, with more than 300,000 unique users a month during the 2022/2023 academic year.
2018: (the Scaife Viewer) Perseus 5, the Scaife Viewer, was released in 2018.
2023: (Beyond Translation) Perseus 6.0 is announced on March 15, 2023, exactly 5 years after the release of the Scaife Viewer and 15 years after the passing of Ross Scaife.
Crane, Gregory. “Don’t Miss the Lexicographers for the Treebanks: Philology in an Electronic Age.” Conference on the Cambridge Greek Lexicon, July 2002, https://www.academia.edu/82054421/Dont_miss_the_lexicographers_for_the_treebanks_Philology_in_an_electronic_age.
Looney, Dennis, and Natalia Lusin. Enrollments in Languages Other Than English in United States Institutions of Higher Education, Summer 2016 and Fall 2016. Modern Language Association, June 2019, https://www.mla.org/Resources/Research/Surveys-Reports-and-Other-Documents/Teaching-Enrollments-and-Programs/Enrollments-in-Languages-Other-Than-English-in-United-States-Institutions-of-Higher-Education.
Shamsian, Farnoosh, and Gregory Crane. “Open Resources for Corpus-Based Learning of Ancient Greek in Persian.” Journal of Interactive Technology and Pedagogy, vol. 21, 2022, https://cuny.manifoldapp.org/read/open-resources-for-corpus-based-learning-of-ancient-greek-in-persian/section/.
Wood, Graeme. “Princeton Dumbs Down Classics.” The Atlantic, 9 June 2021, https://www.theatlantic.com/ideas/archive/2021/06/princeton-greek-latin-requirement/619136/.