Tuesday 24 September 2013

Pelagios 3 Overview

After our initial announcement we thought it would be useful to say a little more about our work plans for Pelagios 3. This post summarizes the stages ahead, and as we begin each new phase of the project we will provide additional detail about the relevant workpackage. You can also read a PDF of the full project description.

Mission

The mission of Pelagios 3 is to annotate, link and index place references in digitized Early Geospatial Documents (EGDs). EGDs are documents that use written or visual representation to describe geographic space prior to 1492.

Primary objectives:

(i) provide an index of toponyms attested, and the places they refer to (where known), in all available EGDs, accessible both as Linked Open Data and via the Pelagios Web Service;
(ii) create an open and semi-automated toolset that allows the scholarly community to enhance and refine the index incrementally, by annotating place references in further historical sources;
(iii) develop a freely available analysis workbench and contextualization widgets that enable researchers to bring together spatial documents in new and innovative ways and provide key contextual information as embedded content in third-party websites.

We will carry this out through a series of nine workpackages. Three Infrastructure Workpackages (IWPs) will deal with the mechanics. Six Content Workpackages (CWPs) will deal with content related to specific historical regions and periods.



Workpackages


IWP1: Gazetteer Infrastructure

IWP1 will establish the common gazetteer infrastructure necessary to form the bodies of Pelagios annotations. Pelagios is grounded in the idea of a “Gazetteer ecosystem”: URI-based gazetteers that are specific to a spatial, temporal or cultural milieu and maintained and curated by their respective research communities, but aligned through the principles of Linked Data and a common, overarching referencing framework. (Hereafter we refer to all such URI-based gazetteers simply as ‘gazetteers’). In order to arrive at an initial, pragmatic version of such an infrastructure, two challenges need to be addressed: (i) a common, generic gazetteer data model needs to be identified which suits the needs of the different individual stakeholders involved; (ii) referencing frameworks need to be agreed, through which different gazetteers can cross-link to each other.
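To make the idea of a cross-linking framework a little more concrete, here is a minimal Python sketch of how records from different gazetteers might be aligned. It assumes a hypothetical record type that carries a list of "close match" URIs pointing into other gazetteers, and groups every URI that is linked directly or transitively, so that a reference to any one of them resolves to the same place. The Pleiades URI is the one the Dickinson College Commentaries use for the Alps; the example.org URIs are placeholders. This illustrates the general principle only, not the data model Pelagios 3 will adopt.

```python
from dataclasses import dataclass, field


@dataclass
class GazetteerRecord:
    """Hypothetical gazetteer entry: its own URI plus links into other gazetteers."""
    uri: str
    title: str
    close_matches: list = field(default_factory=list)


def align(records):
    """Map every URI to the set of URIs that refer to the same place.

    Union-find: if A links to B and B links to C, then A, B and C form one group.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps the trees shallow
            u = parent[u]
        return u

    def union(a, b):
        parent[find(a)] = find(b)

    for rec in records:
        find(rec.uri)  # register records that have no links at all
        for other in rec.close_matches:
            union(rec.uri, other)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return {uri: group for group in groups.values() for uri in group}


records = [
    GazetteerRecord("http://pleiades.stoa.org/places/783", "Alpes",
                    ["http://example.org/gazetteer-b/alps"]),
    GazetteerRecord("http://example.org/gazetteer-b/alps", "Alps",
                    ["http://example.org/gazetteer-c/42"]),
]
aligned = align(records)
```

Because the alignment is transitive, the Pleiades record and the third gazetteer's record end up in one group even though neither links to the other directly.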

IWP2: Annotation Toolkit

IWP2 will facilitate pragmatic solutions to the issues of transcription and identification by assembling a toolkit of both automated and manual methods and technologies that can be tailored to a specific document. The following software tools will be the results of IWP2:

  • an assistive image processing tool that automatically pre-identifies toponym candidates on digitized old maps;
  • a tool (integrated into the same browser interface as the image processing tool) to visually enhance pre-identified toponym candidates to aid manual transcription;
  • manual annotation and transcription tools that focus specifically on simplifying navigation and selection within high-resolution digitized EGDs;
  • a recommender system that proposes plausible toponym options to the annotator (seamlessly integrated with the overall annotation browser interface);
  • a management dashboard to extract, compile, edit and export lists of annotations, and prepare them for linking and upload into the Gazetteer infrastructure;
  • a publishing tool to present annotated items online.
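The recommender could work in many ways. As one hypothetical illustration (not necessarily the method IWP2 will use), the sketch below ranks gazetteer names by plain string similarity to a transcribed toponym, using Python's standard difflib module. The three-entry gazetteer is a stand-in: its Pleiades URIs are those the Dickinson College Commentaries use for the Alps, the Boii and Mt. Olympus, whereas a real index would hold many more names and spelling variants.

```python
import difflib

# Stand-in gazetteer: normalised toponym -> place URI
GAZETTEER = {
    "alpes": "http://pleiades.stoa.org/places/783",
    "boii": "http://pleiades.stoa.org/places/197173",
    "olympus mons": "http://pleiades.stoa.org/places/491677",
}


def recommend(transcription, gazetteer=GAZETTEER, n=3, cutoff=0.6):
    """Propose up to n (name, uri) candidates for a transcribed toponym, best first.

    Names whose similarity ratio to the transcription falls below `cutoff` are dropped.
    """
    key = " ".join(transcription.lower().split())  # normalise case and whitespace
    names = difflib.get_close_matches(key, list(gazetteer), n=n, cutoff=cutoff)
    return [(name, gazetteer[name]) for name in names]
```

With these values, recommend("Alpis") proposes only the Alpes entry, while a reading as remote as "Roma" yields no candidates, which would be the annotator's cue to search manually.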


IWP3: EGD Workbench

IWP3 will develop tools that allow end-users to navigate, visualise, interpret and compare the annotations generated in CWP1-6. These tools will operate on top of the Pelagios API, which will be extended to support the updated Pelagios 3 annotation data model. Concrete visualization software components to be developed will include:

  • a browser interface containing a synchronized map-, timeline-, and network-based visualization; 
  • a tool to drill down to explore specific properties of an annotation set (i.e. one or more collections or specific EGDs), such as its spatial coverage or the sequence of the toponyms contained within it, and compare it against other annotation sets;
  • a visual search interface which enables end users to discover collections that are particularly salient with regard to a specific area and time of interest.
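As a rough, hypothetical illustration of the kind of comparison the drill-down tool is meant to support, the sketch below treats an annotation set as an ordered list of place URIs (a simplifying assumption about the data) and reports two measures: how far two sets' spatial coverage overlaps (Jaccard similarity of the places they mention) and how closely their sequences of toponyms agree (difflib's SequenceMatcher applied to the URI lists). The example.org URIs are placeholders.

```python
from difflib import SequenceMatcher


def compare(set_a, set_b):
    """Compare two annotation sets, each an ordered list of place URIs."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        # Jaccard similarity: shared places / all places mentioned in either set
        "coverage_overlap": len(a & b) / len(union) if union else 0.0,
        # 0..1 score for how closely the order of toponyms agrees
        "sequence_similarity": SequenceMatcher(None, set_a, set_b).ratio(),
    }


P = "http://example.org/place/"  # placeholder URIs
itinerary_a = [P + "1", P + "2", P + "3", P + "4"]
itinerary_b = [P + "1", P + "3", P + "2", P + "5"]
result = compare(itinerary_a, itinerary_b)
```

Here the two hypothetical itineraries share three of five places (coverage overlap 0.6) but visit them in a different order, so the sequence score is lower (0.5).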


CWP1: Latin Tradition

Example EGDs - Antonine Itineraries, Ravenna Cosmography, Bordeaux Itinerary, Vicarello goblets, Natural History (Pliny), Chorographia (Pomponius Mela), Peutinger Table, Divisio Orbis Terrarum, Dimensuratio provinciarum, Notitia Dignitatum, Ora Maritima (Avienus), Periegesis (Priscian), De Mirabilibus Mundi (Solinus)

CWP2: Greek Tradition

Example EGDs - Geography (Strabo), Armenian Geography, Suda, A Sketch of Geography in Epitome (Agathemerus), Manual of Geography (Ptolemy), Description of Greece (Pausanias), Synecdemus (Hierocles), Christian Topography (Cosmas), Epitome of the Ethnica (Stephanus of Byzantium), Description of the Roman World (George of Cyprus), the Madaba Mosaic, the Dura Europos Shield, the Iliad, the Odyssey, texts in Minor Greek Geographers vols. 1 & 2.

CWP3: Early Christian Tradition

Example EGDs - Gough Map, Italie Provincie Modernus Situs, Description of the World (Marco Polo), On the Vicissitudes of Fortune (Niccolo de Conti), Fra Mauro Map, Erdapfel (Martin Behaim), World Map (Henricus Martellus Germanus), Genoese Map, De Virga world map, Vesconte World Map, Bianco World Map, approx. 320 sundry EGDs from the British Library

CWP4: Early Maritime Tradition

Example EGDs - Le Liber (portolano), Lo Compasso (portolano), approx. 180 Portolan charts (Pujades 2007), Catalan Atlas (Cresques Abraham).

CWP5: Early Islamic Tradition

Example EGDs - Image of the Earth (Al Khwarizmi), al-Kashgari World Map, Tabula Rogeriana (al-Idrisi), Book of Curiosities, Maps of the Balkhi School

CWP6: Early Chinese Tradition

Example EGDs - Yujitu (‘Map of the Tracks of Yu’), Songhuiyao, Chinese Buddhist Temple Gazetteers, ‘Record of Buddhistic Kingdoms’

Monday 16 September 2013

Pelagios Chapter 3: Early Geospatial Documents

A few weeks ago we trailed that we had some exciting news and now we can finally announce it. Thanks to the generosity of The Andrew W. Mellon Foundation, Pelagios is entering a third, even more ambitious phase. We will be extending the Pelagios approach to all early geospatial documents up to 1492 (a game-changing year for the history of cartography). This means that we'll be dealing with texts and maps, not only from the ancient Greco-Roman worlds, but also the early Byzantine, Christian, Maritime, Islamic and Chinese traditions.


With a digital index of the places in maps and descriptions of the world, researchers and the general public will be able to explore online the historical significance of both famous and obscure places in the history of geography. As just one example, Claudius Ptolemy used London as one of his primary reference points for global time zones in the late second century, just as we do today. While such coincidences may be rare, and many places in early maps and texts are unidentified, or existed only in the popular or religious imagination, our aim is to help their rich biographies to be told. With such an unprecedented variety of data linked together, it will be possible to trace in broad terms the continuities (and discontinuities) of people's responses to the world around them. Equally exciting, thanks to the continuing annotation of data by Pelagios' growing community of partners, you'll also be able to bring together the disparate fragments of a place's life history: its connections with other places, its stories and its imagery.

The project raises significant technological challenges as well. First of all we will need to make sure that URI-based gazetteers (standardised lists of places) are available for all of our periods and regions, and aligned with one another so that they can be cross-referenced. This means working not only with our old friends at Pleiades, but also with new ones at the China Historical GIS and PastPlace. Second, we will need to use a raft of methods, old and new, to identify toponyms in texts and images, and in a range of languages. Optical Character Recognition (OCR) – a computer-based method for the automatic recognition of text in digitized images – is inadequate for use with medieval handwritten script. We are therefore developing new, semi-automatic methods, which employ image processing and statistical approaches to eliminate as much of the tedious manual work of transcription as possible. Third, we will need to relate those place references to the gazetteers, building on the knowledge and expertise of a network of experts, along with a few tricks of our own. Places that we can't identify we intend to throw open to the public, along with any clues we have available, to invite the wider community to have a go. Finally, we will continue to work on the Pelagios search API and web interface so that the results become ever easier to work with and to incorporate into other digital resources online.

In addition to the continually growing community of projects providing content about all these places, we will be working in collaboration with specialists from all around the world, including from the British Library, Queen Mary, University of London, King's College London, the University of Portsmouth, the University of Edinburgh, the Orient Institute of Beirut, the Institute for the Study of the Ancient World, Drew University and Harvard University. If you would like to get involved in any way, please do contact us!


Tuesday 3 September 2013

How Dickinson College Commentaries linked up with Pelagios

Thanks to the Pelagios Project, Dickinson College Commentaries has recently stepped up into the world of linked geographical data, and I am very grateful to Elton Barker, Rainer Simon, and Leif Isaksen at Pelagios, and to Tom Elliott, Sean Gillies, and Sebastian Heath at Pleiades for making it possible. In this post I want to talk about how Pelagios and Pleiades have helped us and our users, and to say a little bit about the workflow on our end.

screen shot of Dickinson College Commentaries

DCC explores a model of textual commentary that tries to take full advantage of the digital medium, harnessing the best of traditional philological, historical, and archaeological scholarship, and focusing on the user experience in a way that enhances reading, rather than just searching. We’re not really a database, but a reading environment, so we try not to bury the user in information, but to offer scholarly guidance informed by teaching experience. We also have some limitations, financial and institutional. We are lucky to have an endowment at the Department of Classical Studies at Dickinson, on which we can draw to hire undergraduate students. And we have a strong support system in the Academic Technology unit at Dickinson, where Ryan Burke built the structure of our site in Drupal, and helps to maintain and improve it. But we have no graduate students, no dedicated programmers or web developers, and no full-time staff. I teach a full load at Dickinson and do this in my spare time, as it were, with the help of a number of colleagues at other institutions who are on our editorial board. This is all to say that I have to be careful about not getting in over my head when it comes to site maintenance. I value user functionality and solid content above all, but simplicity runs a very close third.

Pelagios, with its machine linking of places mentioned in our commentaries to the unique place identifiers in Pleiades, delivers simplicity itself. On our end, what needed to be done was to create a single file that listed all of our geographical annotations, with their locations (URLs). We already had Google Earth maps, made in summer 2012 by Dickinson student Merri Wilson, that contained placemarks for all places mentioned in two of the existing commentaries, each placemark annotated with Pleiades URIs (unique identifiers). A third Google Earth map, for Caesar’s Gallic War, did not have the Pleiades URIs, and all the linkages in the other two commentaries (Sulpicius Severus’ Life of St. Martin and Book 1 of Ovid’s Amores) had to be checked for errors. Archaeology and Classics major Dan Plekhov was perfect for this job, which required good knowledge of ancient geography, Latin and Greek, and solid research skills. He worked in Carlisle for 8 weeks in the summer of 2013, with approximately two weeks devoted to this aspect of the project.

Meanwhile, computer science major Qingyu Wang investigated the RDF format we were to use for the comprehensive file, and the very specific formatting required by Pelagios. This is not exactly the kind of thing computer science majors do all day, but she taught herself the skills she needed to complete the work, spending about a week on it all told. She was aided by good advice from Sebastian Heath at New York University and Rainer Simon of Pelagios. We had to invent a human-readable code for our specific type of annotations, so that we could keep track of things and every annotation would have a unique designation, and then put all that into a format that Pelagios could deal with. My role was deciding on concise but informative conventions that fit our material. Once we had figured all that out, Qingyu created the RDF file that links each unique ancient place, as identified in Pleiades, to a specific annotation on a page of our site. Now, when you go to that place in Pleiades (Gallia, for instance), under "Related Content from Pelagios" you will see "Pleiades urls Dickinson College Commentaries." So someone exploring Gaul can now go straight to DCC, read Caesar’s account, or watch our little video of the famous opening paragraph of the Gallic War.
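The human-readable codes appear as the rdf:ID values in the examples that follow (e.g. caesar-bg-1.5-boii): an abbreviation for author and work, the passage reference, and the place name, joined by hyphens. The Python sketch below reproduces that pattern and checks that no code is used twice. The slug rule (lower-case, keep only letters, digits and full stops) is inferred from the three published examples rather than taken from DCC's documented conventions, and the function names are hypothetical.

```python
import re
from collections import Counter


def slug(part):
    """Lower-case and keep only letters, digits and full stops ('Olympus Mons' -> 'olympusmons')."""
    return re.sub(r"[^a-z0-9.]", "", part.lower())


def annotation_id(author, work, passage, place):
    """Join author, work, passage and place into one hyphenated, human-readable code."""
    return "-".join(slug(p) for p in (author, work, passage, place))


def check_unique(ids):
    """Raise if two annotations would share a code; otherwise return the list unchanged."""
    dupes = sorted(i for i, n in Counter(ids).items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate annotation IDs: {dupes}")
    return ids


ids = check_unique([
    annotation_id("SulpicSev", "Martin", "5.4", "Alpes"),
    annotation_id("Caesar", "BG", "1.5", "Boii"),
    annotation_id("Ovid", "Amores", "1.2.39", "Olympus Mons"),
])
```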

Here are some examples of the lists of references we adapted from the Pelagios template. The first is a reference to the Alps in Sulpicius Severus' Life of St. Martin, section 5.

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="sulpicsev-martin-5.4-alpes">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/783"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/sulpicius-severus/section-5"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Sulpicius Severus, Life of St. Martin 5.4</dcterms:title>
</rdf:Description>

The Gallic tribe the Boii in Caesar, Gallic War 1.5:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="caesar-bg-1.5-boii">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/197173"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/caesar/book-1/chapter-1-5"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Julius Caesar, Gallic War 1.5</dcterms:title>
</rdf:Description>

Mt. Olympus in Ovid, Amores 1.2.39:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="ovid-amores-1.2.39-olympusmons">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/491677"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/ovid-amores/amores-1-2"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Ovid, Amores 1.2.39</dcterms:title>
</rdf:Description>

Our full .rdf file is available here.
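For a file with many annotations, writing each record by hand invites errors. Below is a minimal Python sketch of how records like the ones above could be generated with the standard library's xml.etree.ElementTree. It assumes the full file wraps the rdf:Description elements in a single rdf:RDF root element (the usual RDF/XML layout); the function names are hypothetical, and this is not the script actually used to build the DCC file.

```python
import xml.etree.ElementTree as ET

# Namespaces used in the examples above
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OAC = "http://www.openannotation.org/ns/"
DCTERMS = "http://purl.org/dc/terms/"

# Serialise with the familiar rdf:, oac: and dcterms: prefixes instead of ns0:, ns1:, ...
for prefix, uri in (("rdf", RDF), ("oac", OAC), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)


def annotation(ann_id, place_uri, target_url, title,
               creator="http://dcc.dickinson.edu/"):
    """Build one rdf:Description linking a Pleiades place to a page of the site."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}ID": ann_id})
    ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": OAC + "Annotation"})
    ET.SubElement(desc, f"{{{OAC}}}hasBody", {f"{{{RDF}}}resource": place_uri})
    ET.SubElement(desc, f"{{{OAC}}}hasTarget", {f"{{{RDF}}}resource": target_url})
    ET.SubElement(desc, f"{{{DCTERMS}}}creator", {f"{{{RDF}}}resource": creator})
    ET.SubElement(desc, f"{{{DCTERMS}}}title").text = title
    return desc


def rdf_document(descriptions):
    """Wrap annotation records in a single rdf:RDF root and return the XML as text."""
    root = ET.Element(f"{{{RDF}}}RDF")
    root.extend(descriptions)
    return ET.tostring(root, encoding="unicode")


xml_text = rdf_document([
    annotation("caesar-bg-1.5-boii",
               "http://pleiades.stoa.org/places/197173",
               "http://dcc.dickinson.edu/caesar/book-1/chapter-1-5",
               "Julius Caesar, Gallic War 1.5"),
])
```

Generating the titles as plain text also avoids stray quotation marks creeping into the dcterms:title values.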

Another aspect of that process, in a sense the reverse of it, was the automatic channeling of data from Pleiades into DCC, via the addition of thumbnail pop-ups on the names of places mentioned in the notes fields. As of this summer, when you mouse over such a linked place name in DCC, a thumbnail with a small map pops up, with the link to Pleiades.

screen shot of text and notes to Sulpicius Severus with thumbnail popup to Pleiades


The beauty of this is that one does not have to navigate away from the text to get a rough idea of where the place is; at the same time, Pleiades is only a click away. Qingyu and Ryan Burke made this happen, using a bit of CSS code created by Sebastian Heath for use in his ISAW Papers. One nagging issue is that when viewed on an iPad, the pop-ups do not go away, and one must reload the page to get rid of them. But I view this as a superb use of the digital medium to enhance the reading experience. Geographical knowledge is delivered on time, as needed, unobtrusively, right there beside the text, in a way simply impossible in print. And all that is required, once the CSS code is in place, is to create a normal HTML link in the Drupal editor.

I’m here at a liberal arts college doing digital humanities at a fairly small scale, compared to what’s going on at large research universities, or at a well-funded outfit like the Perseus Project. Small size has certain advantages, I suppose, but the biggest danger is probably isolation. On an organizational level I try to avoid that by reaching out to colleagues at other institutions and getting them involved, as the Bryn Mawr Classical Review has done so successfully. But Pelagios offers DCC and projects like it an equally potent way to combat isolation, by allowing our small project to make a contribution to the much larger world of linked geographical data. Maybe someday there will be a similar infrastructure of sharing linked data about ancient persons, texts, and material objects as well, and I’d like to be there adding to it.

Chris Francese (francese@dickinson.edu)