Thursday 13 November 2014

Bringing About the SEA CHANGE

About two weeks ago, on Friday October 31, we held the first of two annotation workshops funded through the Open Humanities Awards, designed to gather data through our Recogito "crowdsourcing" interface. The Heidelberg University Institute of Geography kindly agreed to be our host for this inaugural event. A big thank you goes to Lukas Loos for setting up our visit and taking care of local organization, and to Armin Volkmann for his spontaneous decision to merge his geo-archaeology seminar with our workshop on that day.

And what an effect it had. We were blown away by the results! In just two hours, our 27 participants made 6,620 contributions to 51 different documents (19 texts and 32 maps). We've written a comprehensive report over at the DM2E blog. Be sure not to miss it!

Tuesday 14 October 2014

Greece (is the Time, is the Place, is the Motion)


It turns out The Bee Gees were right. We've wrapped up work (for now) on early Greek geographic documents and the experience has made it clear that time, place and motion do indeed feature heavily.

First a few statistics. Our objective - as always - has been to identify sources for as many documents as we could, both in the original Greek and in modern translation. Wherever possible we have used open access, online materials so that people can access the texts and read them for themselves. This time we have identified some 66 works, and were able to obtain digital texts for 42 of them (8 in both languages). You can see our list of available texts on the Recogito public site and we’d be very happy to hear any suggestions for working with those texts which are still missing. Pau has been working like Greased Lightning over these long Summer Nights to produce a remarkable 48,000 edits (and counting)!

Pau has not been alone in this work either. We’ll talk more about the new Recogito Editors group in a future blog post, but for now we’d like to say an especially big thank you to Brady Kiesling who donated a large number of pre-annotated texts from his wonderful ToposText project, and even did some translation to boot. Shout-outs also go to Bruce Robertson, Greta Franzini and Monica Berti for their help in OCR’ing Greek geographic texts.

Thanks to Rainer’s hard work, the Recogito interface is really starting to shape up. Not only are new features such as detailed user- and document-stats being added regularly, but there’s now a tutorial for users, and various small enhancements were made to the front page (e.g. temporal ordering of documents, so that you can start to see the development of ancient geography at a glance). There are other major changes afoot for our third Content Workpackage on the early Christian tradition… but you’ll have to wait for another blog post to hear more about that.

Just like last time, we’ve generated a preliminary heatmap of our work on the Greek sources so far. Even incomplete as it is, it’s fascinating to see our authors focus not only on the Aegean Sea, Magna Graecia and the Black Sea, but also on explorations along the Red Sea, the Atlantic and even the Silk Road.



So what about those sources? The list of documents we’ve been working with includes some of the biggest and most important in the history of geography, including Strabo, Herodotus and the immense Suda. We said that Greece was the place, but in fact what we are really talking about, and what emerges from these early investigations, is just how many places the "Greek world" comprises and how many places "Greek knowledge" extends to. Time also plays an essential role. From Ptolemy’s "Hour Intervals", which divide up the world like the face of a huge celestial clock, to the Spartan Cleomenes's alarming realisation that it was not a matter of days to travel to the Persian capital but months, time is used to try to make sense of, or express bewilderment at, the vast distances being talked about. And Greek geography is not just static, but frequently in motion, with stadiasmoi, periploi, itineraries and even the occasional International Business Traveller.

We hope you enjoy exploring these documents as much as we do. If you’d like to get involved and help us annotate the rest, please do get in touch. We'll go together like....

Thursday 2 October 2014

“How many miles to Babylon?”

The answer to this famous nursery rhyme – “three score and ten”, i.e. 70 km – seems outrageously high for a day's journey, no matter “if your heels are [exceptionally] nimble and light”. (Even swift-footed Achilles would struggle to cover 70 km in a day!) So, what are we to make of it? How can we evaluate such a distance figure?

This is where the "database ancient measurements" comes in. The project was initially sponsored by Berlin's excellence cluster TOPOI and is managed now by my IT whiz Rainer Streng who set it up in MS Access and programmed data exports into "Google Earth" and applications in "ArcGIS 10". Irina Tupikova, who by day is an astronomer and mathematician, is also working on Ptolemy's data, recalculating the spherical coordinates to the original measurements.

As far as we know, there is no comparable collection of this kind. Right now, our database includes nearly 100 ancient authors and their works, especially ancient geographers and historians (Strabo, Pliny, Herodotus, Thucydides etc.), but also minor authors like the pseudo-Aristotelian work de mundo or Horace's Satires. All in all we have in our database 2466 "distances", i.e., attested routes with two points and a number. (Among them eleven routes for Babylon, and even bigger figures for a "day's journey" than the 70 km in the nursery rhyme, if you are interested!)

What can one do with these data? We think: a lot! To start with:

  • How accurate and reliable were ancient measurement data?
  • What units are attested and how do they relate to each other? This is a basic and notorious question in the field of ancient metrology.
  • Who measured or rather estimated distances in antiquity? Soldiers, explorers, merchants? Were there any attempts to map a whole country or empire and standardize the many distances in antiquity? If so, was this a "bottom-up" process done by practitioners like seamen or merchants or a "top-down" one, organized by a central administration?

But there are potentially far more searching questions, such as:

  • How does an ancient author employ numbers, especially distances as a means to engage with his readership, in order to bring home his own ideas or concepts? Authors like Herodotus or Thucydides were very careful (and sometimes even deceptive!) in using numbers in their narratives.
  • How can we use measurement data to explore one of the most basic, important and comparable properties of space: its extension, its spatiality? If researchers concern themselves with spaces, they should not ignore this aspect (as they mostly do). Distances are a means to evaluate the different concepts of space the ancients had in mind. They also allow us to reconstruct not only the real maps of ancient geographers but also the "mental maps" of merchants, soldiers, intellectuals etc.
  • How can a corpus of ancient measurement data allow us to reconstruct ancient routes and waterways and, in addition, social phenomena like migration or mobility? The ambitious application Rainer is working on now is a network of ancient routes and waterways (something like Orbis or Omnes Viae, but based on our measurement data).

To give just a small example:

The green line depicts the route between Tridentum and Rome, a route which, according
to the "codex Theodosianus" (6.28.1), can be covered in 34 days. 34 days of travel
are "normally" equivalent to c. 850 km (a day's journey calculated as 25 km). The
linear distance according to Google Earth is 477.58 km. But our ArcGIS model shows that
the route on known Roman roads is in fact 90 km longer, i.e. 567.40 km.
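The arithmetic behind this example is easy to check. The following is a minimal Python sketch using only the figures quoted above; the function and variable names are illustrative assumptions and are not taken from the database itself.

```python
# Figures quoted in the example above. The 25 km/day rate is the
# conventional day's journey used in the post.
DAYS_ALLOWED = 34
KM_PER_DAY = 25
LINEAR_KM = 477.58   # straight-line distance (Google Earth)
ROAD_KM = 567.40     # distance along known Roman roads (ArcGIS model)


def nominal_distance_km(days, km_per_day=KM_PER_DAY):
    """Distance implied by a travel time at a fixed daily rate."""
    return days * km_per_day


def daily_rate_km(distance_km, days):
    """Average daily distance needed to cover a route in the given time."""
    return distance_km / days


nominal_km = nominal_distance_km(DAYS_ALLOWED)
detour_km = ROAD_KM - LINEAR_KM
road_rate = daily_rate_km(ROAD_KM, DAYS_ALLOWED)

print(f"Nominal distance for {DAYS_ALLOWED} days: {nominal_km} km")
print(f"Road route exceeds linear distance by {detour_km:.2f} km")
print(f"Daily rate needed on the road route: {road_rate:.1f} km/day")
```

On these numbers the 34 days allowed would cover the road route at well under the conventional 25 km per day, which is the kind of discrepancy the database makes visible.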

The meetings with the nimble-footed Pelagios team (zigzagging between several locations all over Berlin in one and a half days) helped us sharpen our own profile and scientific approach tremendously. Fine-tuning our data and making it compatible and interoperable with the other Pelagios partners will be undertaken over the upcoming months. Watch this space!

Monday 29 September 2014

Taking to the high seas: introducing Pelagios phase 4

This month sees the start of another new and exciting phase of Pelagios. With funding from the Arts and Humanities Research Council's Digital Transformations programme, we will be exploring the transformative potential of our linked open data network for doing research. In short our brief is to address the question, "ok, now we can link stuff online—so what?"

In response to the challenge posed by "data silos" (the mass of independently produced material uploaded onto the Web), since 2011 we have been developing the means of linking online resources via their common references to place. This has involved "annotating" the place names found in documents and aligning those references to a global gazetteer service (for the ancient world, this is Pleiades). Using Pleiades's Uniform Resource Identifiers (or "social security numbers") for each ancient place as our glue, it is now possible to agree that places mentioned in different materials are one and the same (e.g. Classical Athens and not "Athens, Georgia"). Users are now able to move seamlessly between and search the records of a growing list of international partners.
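To make the idea of a shared identifier as "glue" concrete, here is a minimal Python sketch. The collection names and the `example.org` URIs are placeholders invented for illustration (they are not real Pleiades identifiers), and the data structure is an assumption rather than the format Pelagios actually uses.

```python
from collections import defaultdict

# (resource, name as written in the source, gazetteer URI) - all hypothetical.
annotations = [
    ("collection-a/text-1", "Athens", "https://example.org/places/athens-attica"),
    ("collection-b/coin-7", "Athenai", "https://example.org/places/athens-attica"),
    ("collection-c/gazette", "Athens", "https://example.org/places/athens-georgia"),
]


def link_by_place(annotations):
    """Group resources by the gazetteer URI they refer to."""
    index = defaultdict(list)
    for resource, _name, uri in annotations:
        index[uri].append(resource)
    return dict(index)


def related_resources(resource, annotations):
    """Find other resources that mention any of the same places."""
    index = link_by_place(annotations)
    related = set()
    for res, _name, uri in annotations:
        if res == resource:
            related.update(index[uri])
    related.discard(resource)
    return sorted(related)


print(related_resources("collection-a/text-1", annotations))
```

The two records that spell the name differently are linked because they share a URI, while the record with the same spelling but a different URI is kept apart.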

Thus each place annotation made in the document doesn’t just attach useful spatial information to a resource; it also provides a way of linking to other resources. But, as Andrew Prescott, leader of the AHRC’s Digital Transformations strand, has recently written: 'Scholarship is much harder than [the ability to link]: we need to be clear about why we are linking data, what sort of data we are linking, and our aim in doing so'. Our one-year grant from the AHRC looks to unlock the potential of our place network to reveal previously unknown connections between different places and different documents (texts, databases, maps, etc.).

In particular what we want to do is to use these new links between different documents to rethink key periods in the history of cartography. Until now digital resources have largely concerned issues of accuracy and visualization; i.e. to pinpoint the locations of ancient places with respect to our contemporary topography. What we want to do, rather, is to try to reconstruct and interpret the markedly different ways in which pre-modern authors and mapmakers conceptualized the world. Turning the spotlight on to five moments in time, Pelagios 4 will explore how ancient or pre-modern authors used various means to grasp, represent and communicate spatial knowledge of the world around them.

To conduct this research Pelagios is happy to announce the following scholarly collaborators:

  • Pascal Arnaud, Professor of History at Université Lyon 2 and senior member of the Institut universitaire de France (IUF), is the leading specialist in ancient geography and navigation.
  • Tony Campbell is former head of the British Library’s ‘Map Room’ and the pre-eminent expert on Portolan Charts.
  • Marianne O'Doherty, Lecturer in English at the University of Southampton, has published on medieval European travel narratives, geography and cartography.
  • Klaus Geus, Chair of Ancient Geography at FU Berlin, co-ordinates the TOPOI Excellence Cluster in ‘Common Sense Geography’. He is joined by Irina Tupikova, a leading mathematical astronomer with an interest in the history of science.

We look forward to working with these scholars and rethinking the ways in which geographic space was imagined and represented before the advent of modern Cartesian cartography.

Portolan chart by Jorge de Aguiar (1492), the oldest known
signed and dated chart of Portuguese origin.

Citation: "Jorge Aguiar 1492 MR" by Jorge de Aguiar - Beinecke Rare Book and Manuscript Library, University of Yale, New Haven, USA. Licensed under Public domain via Wikimedia Commons

Tuesday 24 June 2014

What Have the Romans Ever Mapped for Us? Results from the Latin Geographic Tradition

Having recently completed our first content workpackage (CWP1), dedicated to Early Geospatial Documents from the Latin Tradition, we'd like to take this opportunity to share the annotation data that we've compiled so far. Overall we have completed annotating place references in 33 documents (41 if we include additional language versions of the same document). Within these documents, we've identified 19,880 toponyms, and were able to establish mappings to Pleiades in 15,721 cases (79%).

Spatial distribution of toponyms annotated in CWP1 - Latin Tradition

You can find the complete list of our documents, along with a download link for the data, below. The annotations are stored in CSV format - i.e. they can be opened in a spreadsheet application, or imported into a database or GIS.
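As a rough illustration of working with such a file outside a spreadsheet, the Python sketch below computes the share of toponyms that have a gazetteer mapping. The column names `toponym` and `gazetteer_uri` and the sample rows are assumptions made for the example; check the header row of the downloaded file for the real column names.

```python
import csv
import io

# A tiny stand-in for a downloaded annotation file (hypothetical columns).
SAMPLE_CSV = """toponym,gazetteer_uri
Roma,https://example.org/places/1
Tridentum,https://example.org/places/2
Thule,
"""


def mapping_rate(csv_file):
    """Return (mapped, total) counts for the rows of an annotation CSV."""
    rows = list(csv.DictReader(csv_file))
    mapped = sum(1 for row in rows if row["gazetteer_uri"].strip())
    return mapped, len(rows)


mapped, total = mapping_rate(io.StringIO(SAMPLE_CSV))
print(f"{mapped} of {total} toponyms mapped ({100 * mapped / total:.0f}%)")

# The same calculation on the totals reported above for CWP1.
print(f"CWP1: {100 * 15721 / 19880:.0f}% of toponyms mapped")
```

To run this against a real download, pass an opened file (for example `open(path, newline="", encoding="utf-8")`) instead of the `io.StringIO` sample.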

An additional part of our work in CWP1 (which will be important too for all our content work packages) has been to identify additional relevant documents as we go along. Our list of "geospatial documents" has therefore grown quite substantially. We have included these documents in our annotation tool Recogito. You can follow their status directly on Recogito's Latin Tradition landing page.

Now that we have finished work on the first of our six traditions, we are keen to get your feedback. In particular:

  1. We look forward to seeing how you make use of these data. We're sure that you'll use them in ways that we can't anticipate, and we'd love you to share that with us!
  2. The large number of documents, the ambiguity of the evidence and the comparatively short space of time mean that some of our identifications will inevitably be wrong or open to debate. We are planning a mechanism to allow people to propose alternative identifications or to indicate agreement and disagreement. In the meantime, feel free to contact us if you have proposals for corrections. We're also happy to hear suggestions for other early Latin geographic documents which we may have missed.
  3. Since, as we expected, we could not fully annotate or geo-resolve all of our documents, we're interested in hearing from people who might be willing to join us in the challenge. If you would like to join in and help find places in the incomplete documents, feel free to get in touch and we may be able to provide you with a Recogito account. We'll have to roll this out slowly since Recogito is not a 'community tool' as such (with features such as full moderation, user profiles, etc) so please be aware that there may be a wait if we get a lot of volunteers!

Thursday 22 May 2014

Future Footnotes, Reverse References and Bottomless Maps

Pelagios and the Graph of Historical Data* are continually evolving in a number of directions that can sometimes make it hard to answer the simple question: ‘What are the benefits?’ Regular readers will know that these are many and various (and by no means all accounted for) but in this blogpost we’d like to outline two important ones that we've been thinking about for a while. We call these ‘Future Footnotes’ and ‘Reverse References’ (more on Bottomless Maps at the...well...bottom).

Footnotes and references are pretty much the defining feature of textual academic discourse. It isn’t enough to have a bright idea or discover something remarkable – we have to relate our ideas and discovery to a wider body of research in order to locate them in the scholarly debate. But both suffer from severe limitations: footnotes can only point backwards and references can only point outwards.

Footnotes allow us to provide additional information that supplies context or authority for an idea in a text (or even an image, in the form of captions). Sometimes they are descriptive, but more often than not they are cross-references to previously written material – as an author we obviously can’t know about future material at the time of writing. Because the graph of historic data is continually growing, Pelagios links act as Future Footnotes. Annotation allows us (and anyone else in fact) to create hooks that connect our texts to both old and new material as it becomes available, without having to manually update those links ourselves. So when we annotate a reference to Londinium, for example, we don’t just say ‘here are the other references to Londinium that I was aware of at the time I wrote this’, but ‘here are other references to Londinium that the community is aware of, at the time you are reading it’.

References, on the other hand, allow us to provide evidence that backs up a point that we are making. They unilaterally point outwards because we only have the opportunity to refer to other work. In contrast, by annotating content we are simultaneously contributing our information to a wider cloud and so we create a Reverse Reference – i.e. it becomes directly available to other people through their own annotations. And to flip the logic of Future Footnotes, we don’t merely make it available to works that will be annotated in the future, but we make it available to works that have already been annotated as well. Thus we immediately make our work more accessible to precisely the people who might find it interesting.

So the benefits of Pelagios, and the Graph of Historical Data in general, are that they both future-proof and mutualise the cross-referencing that underpins academia in a way that has never been possible before. The analogies to footnotes and references aren’t quite perfect because they don’t account for the authorial stance – i.e. the desire for an author to selectively identify content, but they do indicate how radical this development in one of the most fundamental practices of academia can be. 

There is one additional benefit that we also think is essential: because the graph is open and distributed, it’s possible to create services that allow for seamless interlinking between online resources. In other words, you don’t have to go to a centralised portal or search service to discover relevant material. You simply discover it naturally, through hyperlinked references and footnotes in online books or articles (or webpages, or pictures, or maps, or songs or videos...and so forth). 

Of course curated portals and search services are valuable too, which leads us to our Bottomless Maps. There has been much discussion of the Deep Map in recent years – interactive digital maps that contain content that extends beyond the visual surface. Bottomless Maps, like the Pelagios heatmap for example, link an ever-growing (and thus to all intents and purposes infinite) quantity of content to the places they depict. While the scope of the Pelagios project is restricted to historic geographic concepts, the model which we have been collaboratively developing is applicable to any other kind of reference (people, periods, classifications, canonical text citations, and so forth). An ecology of other projects is now springing up to support this, so the combined (and evolving) graph of humanities data will ultimately become much more significant than Pelagios itself. We look forward to a future in which such cross-referencing is just as commonplace as footnotes and references are today.

*We here expand the notion of the Graph of Ancient World Data to include any content of a historical, classical or archaeological nature.

Friday 7 March 2014

Greeking Out

This week marks a new and exciting milestone in the Pelagios 3 project - the start of work on the ancient Greek geographic tradition. There's more Latin to do of course: our work packages run on a staggered, overlapping 6-month basis, and, while we already have 19 documents in the system (some in both Latin and their modern language translation), future additions will include some major itinerary lists (including the Antonine Itineraries and Ravenna Cosmography) as well as a number of smaller but fascinating geographic sources such as the Haidra mosaic, some more inscribed vessels, and the Piazzale delle Corporazioni at Ostia.

But from today we'll start introducing Greek documents into the system. Ancient Greek traditions of knowledge about geography extend far beyond Plato's "frogs around a pond" metaphor for Greek settlements around the Aegean Sea. From Homer's Odyssey, Greek texts push the boundaries of travel, exploration and knowledge, and Odysseus, the man who 'saw the cities of many men and knew their minds', stands as the archetypal explorer for Greeks who settled in places as far off as the Black Sea, Massalia (Marseille) and Libya. Later Greek authors like Hecataeus, Herodotus, Aristotle, Pytheas, Eratosthenes, Hipparchus, Posidonius, Artemidorus and Ptolemy are largely responsible for the way we conceptualise geography today (indeed, Eratosthenes invents the discipline), and we still use the terms that they came up with: terms such as equator, meridian, parallel, latitude and longitude. At the same time, much Greek geography is almost cosmological in nature, an attempt to understand the form of the earth and its place in the universe.

Remarkably, however, given the number and detail of these ancient witnesses, almost no Greek maps survive, and it is debated whether maps were even a feature of Greek traditions of geographical knowledge. (A map documented in Herodotus's Histories, carried by a certain Aristagoras of Miletus, becomes the site of contestation and debate, while Herodotus himself 'laughs at' the schematic representations of his contemporaries.) Instead Greek conceptualizations of the world were almost exclusively in a narrative form, from numerous periploi (sailing itineraries) to Strabo, whose Geographica remains central to our understanding of global geography in the transition to Empire.

Working with Ancient Greek texts will introduce some new challenges for us to tackle. To begin with, the change in alphabet will take a little getting used to for some of the team! Fortunately recent work by Bruce Robertson, Greg Crane and others on OCRing ancient Greek means that we should be able to include a range of previously inaccessible texts. We can also draw on experience from the Hestia project and a promising new approach developed by Thomas Efer at the University of Leipzig that can identify toponyms in a Greek text by comparing it to a previously marked up English text. We don't yet know what will be the most efficient combination of methodologies but at least we have plenty to choose from.

We have enormously enjoyed working with the Latin texts and will continue doing so, but the possibilities for analysis opened up by annotating documents from these two strongly related yet radically divergent traditions are incredibly exciting.

Jerusalem depicted in the Madaba Mosaic (6th C. AD). Image from Wikimedia Commons.

Tuesday 25 February 2014

Latin Groove

In our two previous posts we introduced Recogito, a tool we are developing in order to efficiently extract, annotate and verify geographic references in texts. The development of Recogito is still continuing at full steam, and the team (and Leif in particular ;-) is feeding our feature backlog with a steady flow of new ideas & requirements. But despite the fact that there’s still a slight ambience of a busy construction site around Recogito, we have not just been developing. We have also been using it heavily to annotate new documents.

Prior to the start of Pelagios 3, we assembled a list of potential ancient sources to work on in each content work package. The sources we selected are specifically geographical works, i.e. documents where the authors give accounts of their world in their time. For some of the more extensive sources (such as Pliny’s Natural History), we restricted ourselves to only the specifically geographical chapters.

At the moment, we are about halfway through our first content work package, dealing with the Latin tradition (3 months out of 6). It’s therefore a good time to share with you the progress we have made so far. We have already introduced the first three documents: the Vicarello Beakers, the Bordeaux Itinerary and Pliny’s Natural History. We've since found our groove and the list has grown much longer. Here are some documents we are currently working on:

Fig.1. The Bordeaux Itinerary (Part 1) in Recogito (» View Map)

Pomponius Mela: De Chorographia (around 43 AD)

Pomponius Mela lived during the reign of Claudius and presumably died around the year 45 AD. His most famous work, cited by other great geographers such as Pliny the Elder, was De Chorographia. This work comprises three volumes and was written during the 40s AD. Each of its books is dedicated to an area of the known Roman world. In the first volume, Mela gives a general description of the world and its regions, and of the Mediterranean coasts of Africa and the Near East, starting from the Strait of Gibraltar. The second volume describes the coasts from the Near East to Hispania, taking in Greece, Italy and Gaul. Finally, the third volume describes the Atlantic territories, Britannia, and all remote territories, such as the German Limes, Arabia and India. » Map in Recogito

Laterculus Veronensis (AD 304-324?)

The Laterculus Veronensis is a list of the Roman provinces that existed during the reigns of Diocletian and Constantine. It therefore dates to between the years 284 and 337. The work is named after the single surviving manuscript, which is preserved in the Library of Verona. This source describes twelve dioceses comprising a total of over 100 provinces. » Map in Recogito

Avienus: Ora Maritima (4th century AD)

Rufius Avienus Festus was an Etrurian poet, astronomer and geographer who lived in the 4th century AD. He wrote several books and poems, the most prominent of which was Ora Maritima. This work is based on the account of the voyage of the Greek Euthymenes of Massalia from the sixth century BC. Avienus also used other sources, such as the work of the fourth-century BC Greek historian Ephorus. The use of such ancient sources has introduced much confusion, making some places difficult to locate, and resulting in a mix of parts originating from very different times. » Map in Recogito

Rutilius Namatianus: A Voyage Home to Gaul (AD 416)

Rutilius Namatianus was born in southern Gaul, probably at the beginning of the 5th century AD. He was a poet, but his only preserved work is the poem De reditu suo libri duo. It must have been written between 416 and 420 AD, and is composed in elegiac meter. Originally written in two volumes, the poem describes a coastal voyage from Rome to Gaul. Unfortunately, however, many parts (especially of the second volume) are lost, and the extant text stops at the port of Luna. » Map in Recogito

Jordanes: Getica (6th century AD)

Jordanes lived during the sixth century AD and was of partially Gothic origin. It is believed that during his public career he was a notary and that he might further have had a religious career, becoming a bishop. Jordanes' fame comes from two major works, De regnorum ac Temporum successione, a world history from the creation to the 6th century, and De Origine et Rebus Getarum Gestis, better known as Getica. We have included the latter in Pelagios 3 (restricted to the chapters with geographic descriptions). It is the only preserved source that explains the origin and characteristics of the Goths. » Map in Recogito

Bede: The ecclesiastical history of our island and nation (AD 731)

Bede, also referred to as Saint Bede, was born in England in the seventh century AD. He was a monk in the kingdom of Northumbria. Bede is known for his work Historia Ecclesiastica gentis Anglorum, completed around the year 731 AD. This work consists of multiple volumes. It begins with the invasion of Caesar in 55 BC and ends with the fifth book, in the time of Bede himself. In Pelagios 3, we have included only the first chapters of this source, which are devoted to a geographical description of the British Isles. » Map in Recogito

Ammianus Marcellinus: Roman History (before 391)

This is a document we are currently starting to work on. Ammianus Marcellinus was a historian in the fourth century AD, probably born in Antioch. After a military career, he wrote one of the most famous histories of antiquity. His Res Gestae described the history of Rome from the accession of Nerva in 96 to the death of Valens in 378. Unfortunately, the first thirteen books were lost, and the remaining eighteen contain gaps. Only these last books survive; they are dedicated to the events between the years 353 and 378. As in other cases, we only included those chapters where the geographic aspect was most prominent. » Map in Recogito

In numbers, we have already progressed to a total of 20,164 annotations (as of today), with an overall verification rate of 37.3% (which means we've confirmed more than 7,500 place references so far). But there are more Latin sources on our list which we have yet to address over the next three months. And our Greek content work package is about to start as well. So lots of exciting work ahead of us.

You can follow our progress live at http://pelagios.org/recogito!

- Ada, Pau & Rainer

Tuesday 21 January 2014

There's Pliny of Room at the Bottom¹ - Introducing Recogito Pt. 2

In our last post, we introduced Recogito, a tool we built to verify and correct the results of our automatic text-to-map conversion process. Last time, we focused primarily on Recogito's map-based interface, in which we clean up the results of geo-resolution - the step that automatically assigns gazetteer IDs to toponyms.

In this post, we want to talk about Recogito's second view: the text annotation interface. And as usual, we'd like to seize the opportunity to introduce our next Early Geospatial Document along with it: the Natural History by Pliny the Elder.

Naturalis Historia

The Natural History (Naturalis Historia) by Pliny the Elder is an encyclopedia published ca. AD 77–79. This amazing work covers the Roman civilization's knowledge about astronomy, geography, zoology, botany, medicine and mineralogy. In total, it consists of 37 books, and builds on more than 400 sources from the Latin and Greek worlds. Books 3, 4, 5 and 6 focus on geography. In these books, Pliny describes the known world from the Atlantic to the Near East, and from the North of Europe to Africa. He records all the peoples and cities known, with all the geographic features prominent in each territory, such as rivers, mountains, gulfs, or islands.

Fig. 1. Pliny Books 3 and 4 - work in progress in Recogito.

Recogito Text Annotation UI

The Natural History is the largest text we have addressed so far. Fig. 1 shows our current progress with it. (In numbers, we have worked through 98% of the toponyms in Book 3, and have just started Book 4 - now at 5.5%.) It also differs from our previous itinerary texts, in the sense that it's prose, and not structured into an almost 'tabular' format. Time to enter our 'reading view' in Recogito: the text annotation interface.

Fig. 2. Recogito text annotation interface.

The text annotation interface (see Fig. 2) is the place where we inspect and correct the results of geo-parsing - the automatic processing step that identifies toponyms in our source texts. Initially, when we start off with a new document, this view shows us our source text, marked up with grey 'highlights' wherever the geoparser thinks it has identified a toponym. We can then remove false matches, annotate toponyms the geoparser has missed, or modify things the geoparser got wrong (e.g. merge multiple identifications into one, turning separate consecutive identifications such as 'Mount' and 'Atlas' into a single toponym 'Mount Atlas').
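The 'Mount Atlas' case can be sketched in a few lines of Python. This is only an illustration of the idea of merging adjacent highlights: representing highlights as character-offset pairs, and the function name `merge_highlights`, are assumptions made for the example and say nothing about how Recogito is implemented internally.

```python
text = "Beyond Mount Atlas lies the desert."

# Hypothetical geoparser output: 'Mount' and 'Atlas' found separately,
# stored as (start, end) character offsets into the text.
highlights = [(7, 12), (13, 18)]


def merge_highlights(highlights, sel_start, sel_end):
    """Replace every highlight inside the selection with a single one."""
    kept = [(s, e) for s, e in highlights if e <= sel_start or s >= sel_end]
    return sorted(kept + [(sel_start, sel_end)])


# The annotator selects the whole phrase, from the start of 'Mount'
# to the end of 'Atlas'.
merged = merge_highlights(highlights, 7, 18)
toponyms = [text[s:e] for s, e in merged]
print(toponyms)
```

Highlights that lie entirely outside the selected range are left untouched, so the same function also covers a text with many other toponyms in it.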

Going through the source texts is a time-consuming task, and we have made every attempt to make the process as quick and painless as possible. The video above shows how the interface works in practice. Select text in the user interface as you would normally (using click and drag with your mouse, or double click), and confirm the action in the dialog window that pops up. Depending on what you select, the tool will automatically perform the appropriate action: either create a new annotation, delete one, or modify the annotation(s) in the selection. To speed up work even further, there is also an 'advanced' mode that skips the confirmation step.

There is one more thing you can see in Fig. 2: annotations are coloured to indicate their 'sign-off status'. We have already talked about this briefly in our previous post. It's a consequence of our practice of manually checking every annotation before releasing it into the wild. Green annotations are those we have verified, and for which we have confirmed a valid gazetteer ID. Yellow are the ones we've verified as valid toponyms, but for which we have not yet been able to identify a suitable gazetteer ID. Grey are the ones we've either not looked at yet, or which are still 'work in progress' because we haven't verified their gazetteer mapping.

Combined with the map-based interface you can think of this as creating the two parts of an annotation. The text annotation interface presents us with a reference to a place in a document (the 'target' of the annotation in Open Annotation terminology), while the map interface identifies a place in a gazetteer (the 'body' of the annotation). Although there are two steps to the process, they are fairly quick and easy. Maybe even fun!
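The two parts of an annotation can be sketched as a small data structure. This is a simplified illustration, assuming a JSON-like representation: the `oa:Annotation`, `oa:hasTarget` and `oa:hasBody` terms come from the Open Annotation vocabulary, while the `make_annotation` helper, the offset-based target and the example.org URIs are assumptions made up for this sketch.

```python
import json


def make_annotation(document_uri, start, end, gazetteer_uri=None):
    """Build a simplified annotation: a target in the text, a body in a gazetteer."""
    annotation = {
        "@type": "oa:Annotation",
        # Target: where in the document the place reference occurs
        # (result of the text annotation step).
        "oa:hasTarget": {"source": document_uri, "start": start, "end": end},
    }
    if gazetteer_uri is not None:
        # Body: the gazetteer entry the toponym was resolved to
        # (result of the map-based step).
        annotation["oa:hasBody"] = {"@id": gazetteer_uri}
    return annotation


# Hypothetical URIs, for illustration only.
unresolved = make_annotation("http://example.org/texts/pliny-book-3", 120, 131)
resolved = make_annotation(
    "http://example.org/texts/pliny-book-3", 120, 131,
    gazetteer_uri="http://example.org/gazetteer/places/1",
)
print(json.dumps(resolved, indent=2))
```

An annotation without a body corresponds to a toponym that has been marked in the text but not yet matched to a gazetteer entry.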

1 "There's Plenty of Room at the Bottom" was a lecture given by physicist Richard Feynman in 1959. The talk is considered to be a seminal event in the history of nanotechnology, as it inspired the conceptual beginnings of the field decades later.

Monday 13 January 2014

From Bordeaux to Jerusalem and Back Again: Introducing Recogito (Pt. 1)

Welcome back to another update from our Infrastructure Workpackage 2 - "Annotation Toolkit", affectionately known as IWP2. In our previous IWP2 post, we talked a little bit about the basics of annotating place references in early geospatial documents. We also presented a first sample dataset based on the Vicarello Beakers. What we did not talk about yet, however, is how we actually annotate our documents in the first place.

The general plan behind the Pelagios annotation workflow is this:

  1. We use Named Entity Recognition (NER) to identify a first batch of place names automatically in our source texts. This step is also called "geo-parsing", and tells us which toponyms there are in our text, and where in the text they occur. We implemented NER using the open source Stanford NLP Toolkit, and presently restrict this step to English translations of our documents. In a later project phase, we intend to cross-match the data gathered from the English translations to the original-language versions, which is likely to be more feasible within the lifetime of the project than attempting Latin-language NER.
  2. NER gives us the toponyms. What it does not tell us, however, is which places they represent, or where these places are located. Next, we therefore look up the toponyms in our gazetteer, and determine the most plausible match. This step is called "geo-resolution", and - like NER - is also fully automated.
  3. Naturally, neither geo-parsing nor geo-resolution works perfectly. Therefore, we need to manually verify the results of our automatic processes, correct erroneous NER or geo-resolution matches, and fill gaps where NER or geo-resolution have failed to produce a result at all. And this is where our new tool Recogito comes in.
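The three steps above can be sketched as a toy pipeline. Everything here is a stand-in: the real geo-parsing step uses the Stanford NLP Toolkit, whereas this sketch simply treats capitalised words as candidate toponyms, and the two-entry gazetteer with its `gaz:` identifiers is hypothetical.

```python
import re

# Hypothetical mini-gazetteer: lower-cased toponym -> made-up place ID.
TOY_GAZETTEER = {"burdigala": "gaz:1", "mediolanum": "gaz:2"}


def geoparse(text):
    """Step 1 (stand-in for NER): capitalised words become candidate toponyms."""
    return [(m.group(), m.start()) for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def georesolve(toponym):
    """Step 2: look the toponym up in the gazetteer (None if nothing matches)."""
    return TOY_GAZETTEER.get(toponym.lower())


def process(text):
    """Run both automatic steps; every result still awaits manual verification."""
    return [
        {
            "toponym": toponym,
            "offset": offset,
            "gazetteer_id": georesolve(toponym),
            "status": "NOT_VERIFIED",  # step 3 happens by hand
        }
        for toponym, offset in geoparse(text)
    ]


records = process("From Burdigala the road leads to Mediolanum")
for record in records:
    print(record)
```

The crude stand-in also flags 'From' as a toponym, which is exactly the kind of false detection the manual verification step is meant to catch.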

Fig. 1: data from the Bordeaux Itinerary in Recogito (interactive version in Latin and English).

The Itinerarium Burdigalense

The first document we've tackled entirely in Recogito is the Itinerarium Burdigalense (or Bordeaux Itinerary), a travel document that records a pilgrim route between the cities of Bordeaux and Jerusalem. It is considered the oldest Christian pilgrimage document, dated to AD 333 - which is just 20 years after the Edict of Milan of 313, when the Emperor Constantine granted religious liberty to Christians (and other religions). Formally, this document is similar in some respects to the Itinerarium Provinciarum Antonini Augusti: both of them are compiled as a list of places with the distances between them. Additionally, the Itinerarium Burdigalense marks all the places as mutatio, mansio or civitas (change, halt or city), in a similar way to the Peutinger Table. The format of the document changes when the journey reaches Judea, where it offers detailed descriptions of places important to Christian pilgrims. So we can consider it an itinerarium in the tradition of Greek and Roman writing, except for its Christian emphasis. (We've compiled a detailed bibliography for the Itinerarium Burdigalense here. The text of an English translation can be found, for example, on this Website.)

Annotating the Bordeaux Itinerary with Recogito

Recogito presents the results of our automatic processing steps in two flavours: in a text-based user interface, which is primarily designed to inspect and correct what the geoparser has done; and in a map-based interface, which is used to work with the results from the geo-resolution step. A screenshot of the latter is shown in Fig. 2, and we will explore it in more detail below. The former interface (which benefits from a little prior knowledge of the map-based interface) we will discuss in a separate blog post.

Geo-Resolution Verification & Correction

The map-based interface separates the screen into a table listing the toponyms, and a map that shows how they are mapped to places. The primary work area for us in this interface is the table: here, we can scroll through all the toponyms and quickly check the gazetteer IDs they were mapped to. As a matter of policy, we want to explicitly keep track of which toponyms have been looked at by someone, and which haven't. To that end, each entry in the table can be 'signed off' as either a verified gazetteer match, an unknown place, or a false NER detection. (In addition, there is also a generic 'ignore' flag, for toponyms that may be correctly identified in a technical sense, but which we don't want to appear in the map for whatever reason.)
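As a rough illustration of this bookkeeping, the sign-off states could be modelled as an enumeration, with progress measured as the share of entries somebody has looked at. The `SignOff` and `progress` names are hypothetical, and treating 'ignore' as a fifth state (rather than a separate flag) is a simplification made for this sketch.

```python
from enum import Enum


class SignOff(Enum):
    NOT_VERIFIED = "not verified"          # nobody has looked at it yet
    VERIFIED = "verified gazetteer match"
    UNKNOWN_PLACE = "unknown place"
    FALSE_DETECTION = "false NER detection"
    IGNORE = "ignore"                      # correct, but kept off the map


def progress(statuses):
    """Fraction of entries that have been signed off in any way."""
    if not statuses:
        return 0.0
    done = sum(1 for s in statuses if s is not SignOff.NOT_VERIFIED)
    return done / len(statuses)


table = [
    SignOff.VERIFIED,
    SignOff.NOT_VERIFIED,
    SignOff.FALSE_DETECTION,
    SignOff.UNKNOWN_PLACE,
]
share = progress(table)
print(f"{share:.0%} of entries signed off")
```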

Fig. 2: Recogito map-based geo-resolution correction interface.

Double-clicking an entry in the table opens a window with details for the toponym (Fig. 3): the window shows the previous automatic gazetteer match (if any), the latest manual correction, and a text snippet showing the toponym in context. A list of suggestions for other potential gazetteer matches, along with a small search widget, allows us to quickly re-assign the gazetteer match in case it is incorrect. The change history for each toponym is recorded so we know who has changed what (and when), and whether there are places that see substantially more edits than others in the long run. Furthermore, manual changes are recorded separately from the initial automatic results. This way we will be able to benchmark the performance of NER and automatic geo-resolution later on. Detailed figures for the Bordeaux Itinerary are not yet out - but our initial figures suggest that NER has caught about 2/3 of all toponyms, and that approx. 80% of NER results were correct detections. The automatic geo-resolution correctly resolved between 30% and 40% of the toponyms.
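Because manual and automatic results are kept apart, the two NER figures quoted above can be derived from simple counts. The sketch below is an assumption about how such a benchmark might be computed, not our actual evaluation code; the record layout, the `ner_benchmark` name and the counts (10 NER results, 8 of them correct, 4 toponyms added by hand) are invented so that the outcome mirrors the rough proportions mentioned in the text.

```python
def ner_benchmark(records):
    """Return (share of NER results that were correct, share of toponyms NER caught)."""
    ner = [r for r in records if r["source"] == "ner"]
    correct = [r for r in ner if r["status"] != "false_detection"]
    added_by_hand = [r for r in records if r["source"] == "manual"]

    # All real toponyms = correct NER detections + those NER missed.
    all_toponyms = len(correct) + len(added_by_hand)

    correct_share = len(correct) / len(ner) if ner else 0.0
    caught_share = len(correct) / all_toponyms if all_toponyms else 0.0
    return correct_share, caught_share


# Made-up counts for illustration only.
records = (
    [{"source": "ner", "status": "verified"}] * 8
    + [{"source": "ner", "status": "false_detection"}] * 2
    + [{"source": "manual", "status": "verified"}] * 4
)
correct_share, caught_share = ner_benchmark(records)
print(f"correct detections: {correct_share:.0%}, toponyms caught: {caught_share:.0%}")
```

In information retrieval terms the first number is the precision of the NER step and the second its recall.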

Fig. 3: toponym details.

While Recogito is still under heavy construction, Pau is already deeply buried in the next document - which we will present in one of our next blogposts, together with an overview of the text-based interface.

Tuesday 7 January 2014

The day of Pelagios: Berlin 11.12.13

Before the seasonal break of mince pies and Glühwein, the Pelagios team held a meeting in Berlin to address a range of issues relating to geospatial data aggregation and analysis. That we held it in Berlin reflected the fortunate co-presence there of a number of different digital humanities initiatives. Our hosts at the German Archaeological Institute (DAI) were the ICT Director, Reinhard Förtsch, and his researchers Philipp Gerth and Wolfgang Schmidle. Others joining us were:
The meeting presented us with the opportunity to talk first about Pelagios and its evolution. The Pelagios model of phases 1 and 2 uses annotations to facilitate linking (in our case through common references to places) rather than trying to unify different models. By enabling linking, each partner’s site also serves as a gateway to another, thereby maximizing the potential discoverability of these resources and avoiding fruitless attempts at creating individual portals that are supposed to do everything. Yet, even if we are decentralized, for linking to be facilitated we need a lightweight structure.

In Pelagios phase 3, work is concentrating on three areas. Since we are extending our model into new regions and time periods, gazetteers - essentially databases of place names - are crucial. Again, our approach is to enable linking between resources rather than trying to build a super gazetteer that contains all place names over time. With the aim of aligning gazetteers, we are currently investigating interoperability: what might a gazetteer 'ecosystem' look like? Options include using popular gazetteers as a backbone, though each comes with drawbacks (the Getty Thesaurus of Geographic Names is heavily curated, minimizing community involvement, while Geonames includes extraneous information like every hotel in Berlin), and using the SKOS vocabulary 'close match' label to enable links between gazetteers. For the meeting we brought along a first preview of our 'cross gazetteer search', which runs on top of the linkages between the datasets from Pleiades and DARE. A screenshot of the user interface to the system is shown below.
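The idea of searching across gazetteers via 'close match' links can be sketched in a few lines. This is not the code behind the preview: the two tiny gazetteers, their example.org URIs and the `build_index` and `search` helpers are all hypothetical. The only thing borrowed from SKOS is the property that a close match holds in both directions.

```python
from collections import defaultdict

# Hypothetical place records from two gazetteers: URI -> known names.
NAMES = {
    "http://example.org/gazetteer-a/1": ["Roma", "Rome"],
    "http://example.org/gazetteer-a/2": ["Athenae"],
    "http://example.org/gazetteer-b/7": ["Roma"],
}

# Hypothetical 'close match' links between entries of the two gazetteers.
CLOSE_MATCHES = [
    ("http://example.org/gazetteer-a/1", "http://example.org/gazetteer-b/7"),
]


def build_index(pairs):
    """Index the links in both directions, since a close match is symmetric."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def search(name, names, index):
    """Find entries by name, then add the entries directly linked to them."""
    wanted = name.lower()
    hits = {
        uri for uri, labels in names.items()
        if wanted in (label.lower() for label in labels)
    }
    linked = set()
    for uri in hits:
        linked |= index.get(uri, set())
    return sorted(hits | linked)


index = build_index(CLOSE_MATCHES)
rome = search("Rome", NAMES, index)
print(rome)
```

Searching for 'Rome' finds the entry in the first gazetteer by name and the entry in the second one through the link, even though the second gazetteer does not list that spelling. Only direct links are followed here, because a close match is not meant to be chained across several hops.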

Figure 1. Cross-Gazetteer Search Preview UI

Our second task is to enable annotations to be made on primary data (both textual and visual), so that place names can be identified. Initial attempts at building a toolkit for annotating texts will be discussed in forthcoming posts on this blog. As for the challenge of annotating maps, two questions are particularly relevant: where can we get computers to do the heavy lifting? And where do humans have to come into the loop? Finally, we are also investigating ways of visualizing the resources in our network. Our heat map provides an early indication not only of the spatial spread but also the intensity of the resources.

These three areas—relating to gazetteer interoperability, annotation methods and visualization—were the subjects of discussion.

Gazetteers
The DAI started work in May to build a gazetteer of the Institute’s archaeological and bibliographical records. They have also been working with Wikidata and Wikimedia to explore how knowledge about the Roman frontier (the ‘Limes’) can be aggregated and used. One such example is an interactive timeline (seen below), showing how the border changed over time. Markus Schnöpf is currently working on a gazetteer for the Islamic world, which could help provide the basis for future Pelagios activity with Islamic texts. Meanwhile, at Stanford, Josh Ober’s team are developing a digital version of Mogens Hansen’s Polis inventory, which will not only provide a comprehensive dataset of settlements in ancient Greece, but also allow them to be searched in various ways using a simple browser plug-in map. (Watch this space for developments.) These projects join a list that includes Pleiades, the Digital Atlas of the Roman Empire, Chinese Historical GIS, and Past Place, as the key protagonists taking the first steps towards creating a gazetteer ecosystem.

Figure 2. An interactive timeline of the Roman ‘Limes’ (frontier)

Annotation methods
With Greg Crane’s Humboldt Professorship at the University of Leipzig, various new initiatives are being launched with the aim of utilizing digital resources for the study of the ancient world. One of these, the Historical Languages eLearning Project, is experimenting with e-learning strategies for teaching ancient Greek and Latin based around annotation. Pelagios could work with this team to help disambiguate names that prove too challenging for our automated workbench, or to experiment with using games to scale up annotation over larger numbers of documents. The ARIADNE project, here represented by Martin Doerr and Gerald Hiebel, is laying the foundations for inferencing over data rather than just data retrieval (which is what Pelagios focuses on). In particular, the CIDOC-CRM model adopted by ARIADNE uses a formal structure for describing concepts and relationships that, while more complex semantically, is compatible with the Pelagios annotation model; moreover, the results of Pelagios can be used as the basis for CRM-compliant data.

Visualization
Throughout the discussion, we were also concerned with visualization developments that can help in the understanding and analysis of potentially massive datasets. Dirk Wintergrün presented GeoTemCo, a platform for visualising spatio-temporal data. This looks potentially very powerful, and will be especially interesting once temporal content (derived from e.g. publication dates, person references and other sources) is combined with place annotations. We give one example below, since it provides a new way of looking at data that members of the Pelagios team have produced in a previous project, GAP. Figure 3 shows GAP data from Herodotus and Pausanias in GeoTemCo, enabling the analysis and comparison of geographical referencing in these different books. In particular, Marian Dörk demonstrated a wide range of exciting visualization possibilities that could answer specific research questions and, more broadly, appeal to the general public.

Figure 3. A comparison of places in Herodotus and Pausanias, using GAP data in GeoTemCo