data models – kgeographer

Representing Place in World Historical Gazetteer

Here are the slides for an invited talk I gave at my alma mater, U.C. Santa Barbara’s Center for Spatial Studies (spatial@ucsb) on October 27th — virtually of course. I have annotated them with speaker notes, outlined with dashed red lines.

GeoJSON-T: adding time to GeoJSON

GeoJSON-T is a proposed extension to the GeoJSON data standard (spec) widely used “for encoding a variety of geographic data structures,” principally in web maps. GeoJSON-T is described in the README file of its GitHub repository, but there is not yet a versioned specification. Now would be a good time to make refinements and write one. To join that discussion, see Issue #3 in the GeoJSON-T GitHub repository.

GeoJSON-T was initially developed in 2017 [1], motivated particularly by requirements of historical researchers. Many geographic features of interest have a temporal scope, and a standard way of describing geographic and temporal extents together would benefit creators of web applications meant to view and analyze such data [2]. A pilot map+time application, now called Linked Paths, was developed at that time to test and demonstrate its implementation.

The temporal attributes of “typical” geographic features such as countries, regions, provinces, cities, buildings, and monuments are important for understanding change over time. But we also routinely map and analyze spatial patterns in many other kinds of historical phenomena: events, and “event-like” features including conflict, births and deaths, finds of archaeological objects, as well as many kinds of geographic movement, including journeys and flows.

Just use “properties”?

The temporal attributes of a GeoJSON Feature can be represented among its “properties,” and web map applications are routinely built to parse temporal data there and use it to display change over time. For example:

{ "type": "Feature",
  "geometry": {"type":"Point", "coordinates": [0.0,0.0]},
  "properties": {
    "name": "Null Island",
    "start": "1492",
    "end": "1500",
    "prop1": "some value", ...
  }
}

But there are several shortcomings inherent in this approach, including:

1) A given feature might change location or shape over time; this can currently only be handled by creating multiple Features for the same thing/place, each with different time properties. Better to put all of a place’s geometries in a GeometryCollection and let each have its own “when.”

2) Likewise, other properties of a feature might change over time, for example name and type (e.g. villa to town to city). These too can be temporally scoped with their own “when.”

3) The labels used for temporal property keys (‘start’ vs. ‘begin’ vs. ‘year’ etc.) vary per dataset, so data from one project can’t be linked to or combined with data from another unless there is prior coordination. Without a standard vocabulary and structure, there can’t be a generic, open-source “time map” library for rendering maps with various temporal visualizations [2].

4) There are no conventions for expressing the various types of uncertainty found in historical data, including vagueness, imprecision, unknown values.

5) Movement features such as journeys, lifepaths, and flows can contain multiple nodes and edges, each with distinctive associated timestamps or intervals. In some cases, sequence is known but dates are not.
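To make shortcoming 3 concrete, here is a minimal Python sketch of what a consumer has to do today when temporal keys differ per dataset. The dataset names, the key pairs in `TEMPORAL_KEYS`, and the `read_interval` helper are all hypothetical, invented for illustration; the point is only that every new source needs its own hand-written mapping.

```python
# Hypothetical per-dataset key mappings; each new dataset needs its own entry.
TEMPORAL_KEYS = {
    "project_a": ("start", "end"),
    "project_b": ("begin", "finish"),
    "project_c": ("year", "year"),
}


def read_interval(feature, dataset):
    """Return (start, end) as strings, using None for a missing value."""
    start_key, end_key = TEMPORAL_KEYS[dataset]
    props = feature.get("properties") or {}
    start, end = props.get(start_key), props.get(end_key)
    return (
        None if start is None else str(start),
        None if end is None else str(end),
    )


# The same interval, labeled two different ways by two imaginary projects.
a = {"type": "Feature", "geometry": None,
     "properties": {"start": "1492", "end": "1500"}}
b = {"type": "Feature", "geometry": None,
     "properties": {"begin": 1492, "finish": 1500}}

interval_a = read_interval(a, "project_a")
interval_b = read_interval(b, "project_b")
```

Both calls yield the same pair of strings, but only because the mapping table was written in advance for each project; a shared vocabulary would make the table unnecessary.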

Just add “when”

What GeoJSON-T does is specify vocabulary and structure for a “when” object, which can be added in several locations outside of the “properties” element required by GeoJSON (as a “foreign member,” in the vocabulary of the GeoJSON spec):

1) At the level of a Feature, applying to all geometries within it (simple example here; for all options see the draft specification):

{ "type": "Feature",
  "geometry": {"type":"Point", "coordinates": [0.0,0.0]},
  "properties": { "name": "Null Island", "prop1": "some value", ... },
  "when": {
    "timespans": [{"start": "1492", "end": "1500"}]
  }
}

2) At the level of an individual geometry within a GeometryCollection, e.g.

{ "type": "Feature",
  "geometry": {
    "type": "GeometryCollection",
    "geometries": [
      {"type": "Point",
       "coordinates": [0.0,0.0],
       "when": {"timespans": [{"start": "1492", "end": "1495"}]}
      },
      {"type": "Point",
       "coordinates": [1.0,2.0],
       "when": {"timespans": [{"start": "1498", "end": "1500"}]}
      }]
  },
  "properties": { "name": "Null Island", "prop1": "some value" }
}

3) At the level of a FeatureCollection, applying to all of its Features

A “when” object can optionally include one or more timespans, and/or one or more named periods from a time period gazetteer. These and other optional properties are described in the repo README. Some proposed changes are listed in Issue #3 in the repo, which is open for comment.

Existing GeoJSON-compatible apps and libraries will simply ignore “when” objects, wherever they might be. Support for “when” would be included in any new software and libraries supporting the GeoJSON-T extension.
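As a rough illustration of how new software might read “when” at the three levels listed above, here is a small Python sketch. It assumes a nearest-scope rule (a geometry’s own “when” wins over its Feature’s, which wins over the FeatureCollection’s); that precedence is my assumption for the example, not something the draft settles. The `effective_timespans` function and the “Elsewhere” feature are hypothetical.

```python
def effective_timespans(collection):
    """Yield (name, coordinates, timespans) per geometry.

    Assumed rule: the nearest enclosing "when" applies
    (geometry, then Feature, then FeatureCollection).
    """
    coll_when = collection.get("when")
    for feature in collection.get("features", []):
        feat_when = feature.get("when", coll_when)
        geom = feature.get("geometry") or {}
        if geom.get("type") == "GeometryCollection":
            parts = geom.get("geometries", [])
        else:
            parts = [geom]
        name = (feature.get("properties") or {}).get("name")
        for part in parts:
            when = part.get("when", feat_when) or {}
            yield name, part.get("coordinates"), when.get("timespans", [])


collection = {
    "type": "FeatureCollection",
    # "when" at the FeatureCollection level (location 3 above)
    "when": {"timespans": [{"start": "1492", "end": "1500"}]},
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
         "properties": {"name": "Null Island"}},
        {"type": "Feature",
         "geometry": {"type": "GeometryCollection", "geometries": [
             {"type": "Point", "coordinates": [1.0, 2.0],
              "when": {"timespans": [{"start": "1498", "end": "1500"}]}}]},
         "properties": {"name": "Elsewhere"}},
    ],
}

resolved = list(effective_timespans(collection))
```

Here the first feature inherits the collection-wide timespan, while the geometry inside the second feature keeps its own. Software that knows nothing about “when” would still read the same object as an ordinary FeatureCollection.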

Current and planned adoption

In recent months, GeoJSON-T has received increasing attention, a couple of test implementations, and plans for more. These have highlighted some issues, and the draft format needs a closer examination for it to become a versioned specification of a standard format.

Linked Places format
The GeoJSON-T patterns for time were adopted for Linked Places format (LPF), developed in 2018 for contributions to the World Historical Gazetteer and Pelagios project’s Recogito platforms. LPF is not only valid GeoJSON but also valid JSON-LD, a syntax of RDF, and it standardizes the representation of several additional dimensions of Place beyond geometry and when: names, types, relations, descriptions, depictions, and links to corresponding external place records.

Linked Places format is therefore a specialized superset of GeoJSON-T, intended for historical gazetteer platforms specifically.

WebMaps-T
This early-stage project led by the British Library [3] seeks to develop a standard, customizable library for map+time visualizations in web applications, and would render data formatted as GeoJSON-T.

IIIF Maps Community Group
The International Image Interoperability Framework (IIIF) now has a working group dedicated to “defining best practice in associating geographical information with IIIF materials.” The Maps group is “explor(ing) creating a JSON schema [meeting] the needs of the IIIF community” specifically for map images, and is at least informally evaluating GeoJSON-T toward that end.

Notes

[1] In 2016, supported by a small Resource Development Grant from the Pelagios project, I met with colleagues Lex Berman and Rainer Simon to outline a GeoJSON format extension that could handle features representing historical geographic movement. GeoJSON-T and the Linked Paths pilot were products of that work. That project, titled “Linking Linked Places,” is documented in this blog post.

[2] The Timemap.js library developed by Nick Rabinowitz in 2008 joined the SIMILE Timeline library with several web mapping libraries of that era. It has fallen into disuse because its dependencies are outdated.

[3] An initiating hackathon for WebMaps-T took place in London in 2019, hosted at the British Library by Gethin Rees and Adi Keinan-Schoonbaert, with funding from a Pelagios Working Group small grant. Work to refine its early products continues, albeit slowed by lack of further funding. WebMaps-T is intended to have a modular structure permitting several types of temporal visualizations, including multiple timeline styles and histograms.

Event Centrality

I have been making a case for events in geo-historical information systems, periodically so to speak, over the past several years. Since I went “alternative-academic” (alt-ac) after completing my Geography PhD in 2010, I have neglected to publish material out of my dissertation—something I was cautioned strenuously against. I will begin rectifying that in bite-sized chunks on this blog; in time, maybe even write an article for an open access journal.

After nearly five years of building interactive scholarly works [1] others had in mind as a digital humanities research developer at Stanford, I’m moving on this Fall, and revisiting some of the things I was going on about in that dissertation. Oddly enough, I took a geography degree in order to better understand human history. Also odd perhaps is that I still like a lot of what I wrote.

It is titled “Representing Historical Knowledge in Geographic Information Systems,” and it is worth noting I was referring to (lower case, generic) geographic information systems, not GIS software packages like QGIS or ArcMap. I had no idea the two were conflated so completely outside the discipline. Anyway, I began the introduction this way:

“The conceptual vehicle by means of which historians construct or analyze the contingency and temporal fatefulness of social life is the event. Historians see the flow of social life as being punctuated by significant happenings, by complexes of social action that somehow change the course of history.”

(Sewell 2005:8)

Complex historical events are dynamic geographic phenomena: they comprise human activity associated with particular locations on the earth surface, and their participants’ locations and attributes over time are integral to their analysis.

Events are the central and most comprehensive container for information about dynamic geo-historical phenomena. To describe an event well is to account for its purpose and results, its participants’ roles in component activities during some interval, its setting in terms of space-time locations and relevant condition states, and its relation to other events, including as elements of historical processes.

The representation of large numbers of events along those dimensions will enable a powerful “faceted browsing” capability, and spatiotemporal analyses supporting the discovery of underlying processes. The power of events as information containers will stem in large part from typing them along their numerous dimensions.

After four chapters explaining my ontology engineering methodology and analyzing print historical atlases to discover some of what I called “the stuff of history,” in Chapter 5 I finally got around to elaborating the case and citing the articulate originators of CIDOC-CRM:

If we make a simple graph of the things in atlas maps—say events, people, material objects, and settings—we see the only object directly connected to all the others is the event. Virtually all relationships between persons, things and places are a function of, or mediated by, events (Figure 5-1).

Doerr and Iorizzo (2008) describe how its (CIDOC-CRM) event-centered approach permits “a picture of history as a network of lifelines of persistent items meeting in events in space/time […],” an “extraordinarily powerful” model supporting a “surprising wealth of inferences” (p. 5-8).

Figure 5.1 Event centrality
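The “simple graph” in that passage can be sketched in a few lines of Python. The three links below are my own illustrative reading of the figure (each relation passing through the event), not data taken from the dissertation, and the link labels in the comments are assumptions.

```python
# Undirected links between the four kinds of things named above.
# Illustrative assumption: every relation is mediated by the event.
links = [
    ("event", "person"),           # participation
    ("event", "material object"),  # things involved in the event
    ("event", "setting"),          # where and when it happened
]

nodes = {n for link in links for n in link}
neighbors = {n: set() for n in nodes}
for a, b in links:
    neighbors[a].add(b)
    neighbors[b].add(a)

# A node is "central" here if it is directly linked to every other node.
central = [n for n in nodes if neighbors[n] == nodes - {n}]
```

With these links, the event is the only node that qualifies; a person, object, or setting reaches the others only by way of an event.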

From that simple conceit, I went on to postulate that:

… just as physical objects are composed of one or more material substance, discrete temporal objects like events are composed of activity. At least it will be useful in the design of certain information systems to represent things as such, whether or not in actuality there is something in the temporal realm corresponding to matter in the physical realm.

Then I drew a slightly crazy figure to illustrate that notion (a = activity, e = events, pr = processes, pd = historical periods, s = states):

Following that (skipping a step or two), I framed the engineering tasks to follow by enumerating six primitive “constructs” (I prefer “patterns” now) that are key elements of historical knowledge representation: Events and Participation; Place; Groups and Membership; Historical Periods; Historical Processes; and Attribution.

Not coincidentally, I’m finding that the research I’ve pursued since has focused on a few of these, which are interwoven when one approaches particular systems, such as historical gazetteer services and applications blending maps and timelines.

In particular, a new (to me) pattern, Setting, which merges Place and Historical Period, has emerged in response to requirements of particular systems and applications. My work on Topotime is a result. I do think this is the way ontology design patterns are supposed to work: as modular and often connecting pieces expressing conceptual and theoretical bases for data models and interchange formats.

[1] Interactive Scholarly Works is a term I came up with back in the day, when Elijah Meeks and I worked out a taxonomy comprising Interactive Scholarly [Objects, Works, Publications] reflecting the range of things we were building at Stanford.

References

Doerr, M. & Iorizzo, D. (2008). The dream of a global knowledge network—A new approach. ACM Journal on Computing and Cultural Heritage 1(1): 5.1-5.23.

Sewell, W. H. (2005). Logics of history. Chicago: University of Chicago Press.

The Orbis Initiative: A Pelagios for Networks? [Take 2]

NOTE: This is a “refresh” of the earlier post of the same title, edited to reflect some new terminology (indicated by red) and replace the conceptual model figure.

A small sampling of historical network datasets

I believe there would be widespread interest in a global collaboratively developed system, organized similarly to Pelagios, aimed at creating and linking data records for attested historical journeys (e.g. itineraries, and flows of people, commodities, information, correspondence) and ways (roads, rivers, canals, sea currents). In this provisional semantics, a journey is evidence of some person(s) or thing(s) moving from here to there (then there, etc.), at a known, approximate or estimated time and/or in a particular sequence, as attested in some source. A way is the physical medium for journeys.

Both journeys and ways can be represented as two or more places and one or more segments (nodes and edges in network parlance). Place nodes are necessarily “geographically embedded” and typically represented by feature centroids. The geometry of ways between nodes for various types of journeys may be known, estimated, or in the case of some flow data, of no concern.
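One way to picture that structure is as a handful of small record types. The Python sketch below is only an assumption about how places, segments, and journeys might be held in memory; the class and field names (`Place`, `Segment`, `Journey`, `source_text`) and the placeholder coordinates are hypothetical, not part of any proposed format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Place:
    """A geographically embedded node, represented by a centroid."""
    name: str
    lon: float
    lat: float


@dataclass
class Segment:
    """An edge between two places; geometry is None when unknown or of no concern."""
    source: Place
    target: Place
    geometry: Optional[List[List[float]]] = None  # [[lon, lat], ...]


@dataclass
class Journey:
    """Two or more places joined by one or more segments, as attested in a source."""
    label: str
    segments: List[Segment]
    source_text: Optional[str] = None

    def places(self) -> List[Place]:
        """Places in travel order, assuming segments are stored in sequence."""
        if not self.segments:
            return []
        return [self.segments[0].source] + [s.target for s in self.segments]


# Placeholder places and coordinates, for illustration only.
here = Place("here", 0.0, 0.0)
there = Place("there", 1.0, 2.0)
then_there = Place("then there", 3.0, 1.0)

journey = Journey(
    label="example itinerary",
    segments=[
        Segment(here, there),  # geometry not known
        Segment(there, then_there,
                geometry=[[1.0, 2.0], [2.0, 2.5], [3.0, 1.0]]),
    ],
)
```

A way could reuse the same `Segment` shape, with geometry normally present; dates and sequence are left out here and would need their own fields.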

Historical gazetteers in the Pelagios ecosystem represent only named places. Most are point-like features (e.g. settlements, sites); increasingly, polygonal features are included as well (e.g. regions, administrative areas). But what of historical movement—journeys between named places along ways? The simple data models used for the Pelagios interchange format and for most gazetteers do not accommodate journeys and ways.

Not surprisingly, the first early geographic document geo-parsed in Pelagios’ Recogito tool describes an itinerary: “Itinerarium Burdigalense: the Itinerarium Burdigalense (or Bordeaux Itinerary) […] a travel document that records a Pilgrim route between the cities of Bordeaux and Jerusalem.” Although we know each attested place was part of a traveled route, by virtue of its association with a text having “itinerarium” in its title, those relationships are not recorded formally in gazetteers, and therefore not readily discoverable and analyzable as routes and components of networks.

The Orbis Initiative

In February, 2015 I submitted a proposal to the National Science Foundation for a fairly large grant ($1.6m over 3 years) to develop the Orbis Initiative. Although reviews were quite positive, it was not funded. The project was designed to facilitate the creation, archiving, discovery, linking, and analysis of historical geospatial network data for “everywhere and every when” [1-page summary]. The project name was borrowed from an interactive scholarly web application I helped build, originally published by Stanford University Libraries in 2012 and significantly upgraded in 2014, ORBIS: The Stanford Geospatial Network Model of the Roman Empire (hereafter, ORBIS: Rome).

Whereas ORBIS: Rome is a model of travel and transport for a particular region and period aimed at answering the research questions of one Classical scholar, Walter Scheidel–and built by Scheidel and Elijah Meeks–the Orbis Initiative would instead be a system for creating, storing, and linking geospatial network data spanning potentially all places and periods—a distributed repository along with a set of relatively simple interactive web-based tools to facilitate its use. The design and proposed development of the Orbis Initiative is a response to researchers who have expressed a desire to build ORBIS: Rome-like applications for their own areas and periods of study. Importantly, the intent is not to expand the ORBIS: Rome network transport model, but to provide a generic data infrastructure and tools to facilitate development of other models and modeling approaches.

I remain convinced this would be a worthwhile undertaking and subsequently, two opportunities have emerged to begin some of the work described in the grant proposal, at a much smaller initial scale; I’ll discuss one of them here.

A Community of Interest?

Writing the Orbis Initiative grant entailed recruiting collaborators with varied exemplar datasets being developed for ongoing research. Several of those projects are concerned with processes of cultural diffusion and commercial activity—separately and in concert—in East and Central Asia and between Asia and Europe over extended periods. Their aggregated temporal extent is 7th century BCE to 16th century CE. Researchers in those groups, and now a few others, have indicated an immediate pragmatic interest in exposing and linking their data for common benefit. Meetings to discuss next steps have begun.

Something Like Pelagios

An Orbis Initiative would replicate several aspects of the Pelagios Project, which has gained terrific momentum in developing online resources, methods and software for linking historical gazetteers. I believe Pelagios’ success is due in large part to its “ground-up” nature—the fact it answers some immediate requirements of a distinct community of interest for the Classical Mediterranean. Its spatial and temporal extents and software tool development scope are growing organically, expanding upon smallish proofs-of-concept that people find useful. Tools developed so far facilitate data creation (Recogito) and data discovery (Peripleo). The Pelagios approach offers a stark contrast with some “build it and they will come” data repository projects attempted in recent years.

In the same vein, a pragmatic start to an Orbis Initiative could be seeded by meeting the requirements of the above-mentioned community of interest to link (and in a sense gather) their historical geospatial network data: connections by road, river, canal, and sea route between the places attested in Pelagios-compatible gazetteers.

A Conceptual Model

So, networks of journeys and flows are different in kind from place locations as commonly understood, and as such require a different, somewhat more elaborate data model. Furthermore, while all spatial data may include temporal attributes, some network data—itineraries for example—are inherently temporal; in fact they are events. Flows are essentially aggregated movement events.
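The idea that flows are aggregated movement events can be shown with a short Python sketch. The movement records below are made-up placeholders, and `aggregate_flows` is a hypothetical helper; a real model would also have to decide how to bin by time and how to treat uncertain dates.

```python
from collections import Counter

# Hypothetical movement events: (origin, destination, year).
movements = [
    ("A", "B", 1492),
    ("A", "B", 1493),
    ("B", "C", 1493),
    ("A", "B", 1495),
]


def aggregate_flows(events):
    """Count movement events per (origin, destination) pair."""
    return Counter((origin, dest) for origin, dest, _year in events)


flows = aggregate_flows(movements)
```

Each count in `flows` stands for a flow between two places; the individual events, with their own dates, remain available underneath it.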

In my experience a helpful first step in data modeling is to create a conceptual model of the entities and relations of what is being represented—an ontology design pattern if you will. Typically a collaborative undertaking, the resulting visualization provides a basis for the data schemas to follow, be they relational or graph. I’ve taken a ~~first~~ second stab at such a model, borrowing a bit from a recently published trajectory pattern (Hu, et al 2013); input is invited and essential.

Data Format

The GeoJSON data format is in common use and provides a good starting point for a standardized representation of trajectories and paths. Granting that much data is initially gathered in spreadsheets, by and large if it is to be mapped or analyzed spatially, it makes its way into human-readable GeoJSON or the binary shapefile. GeoJSON represents geographic Features in a FeatureCollection, and spatial attributes are represented in a required Geometry object, but time is not accounted for natively. Although temporal attributes of a Feature can be recorded as one or more of a Feature’s Properties there is no norm or best practice for this and mapping software that consumes GeoJSON does not typically look for or make use of temporal attributes.

This can potentially be remedied by an extension to GeoJSON, such as the Topotime format I’ve been developing. Topotime data is valid GeoJSON, but it includes a new, optional When object, and leverages the sparingly used GeometryCollection object that is found in the GeoJSON specification.

One of my tasks at hand—which I welcome collaborative input on—is testing the efficacy of the Topotime model for the several types of historical geospatial network data found in the wild. I’ve begun posting some sample data to the Orbis Initiative GitHub repo.

The Basics of Topotime

Topotime was initially conceived as a means for representing historical temporal data that is vague and otherwise uncertain, for visualization in browser timeline software and for the analysis of probabilistic relationships between and amongst events and periods.

The goals of the Topotime project have recently both broadened and simplified considerably—it is now aimed at extending the GeoJSON format to account for time (including some of the difficult historical cases), without breaking GeoJSON. That is, Topotime data would be recognized as GeoJSON by any software that supports GeoJSON. The work-in-progress described on the Topotime repo is now a little behind samples I’m pushing to the Orbis Initiative repo (kgeographer/oi).

I am working through varied and more complex data examples. When a suitable data format is settled, I’ll write some basic software that accesses Topotime’s unique attributes to browse and search several exemplar datasets.

The following is a snippet to give a sense of it:

Next Steps

This effort does not have institutional support at this time, but if enough people feel it’s worth pursuing, we should seek it. UPDATE: A small group of colleagues and I will be submitting grant proposals soon.

As mentioned above, a small group representing several active research projects focused on Asian maritime and land routes will be meeting soon to assess whether Topotime or something like it is appropriate for a “Pelagios for Networks.” We will make our results public for discussion, through this blog and the Pelagios Linked Past SIG forum. More later…and comments are welcome.

The Orbis Initiative: a Pelagios for Networks?

A small sampling of historical network datasets

I believe there would be widespread interest in a global collaboratively developed system, organized similarly to Pelagios, aimed at creating and linking data records for attested historical trajectories (e.g. itineraries, routes, commercial flows, correspondence) and paths (roads, rivers, canals, sea currents). In this provisional semantics, a trajectory is evidence of some person(s) or thing(s) moving from here to there (then there, etc.), at a known, approximate or estimated time and/or in a particular sequence, as attested in some source. A path is the physical medium for trajectories.

Both paths and trajectories can be represented as two or more places and one or more segments (nodes and edges in network parlance). Place nodes are necessarily “geographically embedded” and typically represented by feature centroids. The geometry of paths between nodes for various types of trajectories may be known, estimated, or in the case of some flow data, of no concern.

Historical gazetteers in the Pelagios ecosystem represent only named places. Most are point-like features (e.g. settlements, sites); increasingly, polygonal features are included as well (e.g. regions, administrative areas). But what of historical movement—trajectories between named places along paths? The simple data models used for the Pelagios interchange format and for most gazetteers do not accommodate trajectories and paths.

The Orbis Initiative

In February, 2015 I submitted a proposal to the National Science Foundation for a fairly large grant ($1.6m over 3 years) to develop the Orbis Initiative. Although reviews were quite positive, it was not funded. The project was designed to facilitate the creation, archiving, discovery, linking, and analysis of historical geospatial network data for “everywhere and every when.” The project name was borrowed from an interactive scholarly web application I helped build, originally published by Stanford University Libraries in 2012 and significantly upgraded in 2014, ORBIS: The Stanford Geospatial Network Model of the Roman Empire (hereafter, ORBIS: Rome).

Whereas ORBIS: Rome is an authored model of travel and transport for a particular region and period aimed at answering the research questions of one Classical scholar (Walter Scheidel), the Orbis Initiative is instead a system for creating, storing, and linking geospatial network data spanning potentially all places and periods—a distributed repository along with a set of relatively simple interactive web-based tools to facilitate its use. The design and proposed development of the Orbis Initiative is a response to researchers who have expressed a desire to build ORBIS: Rome-like applications for their own areas and periods of study. Importantly, the intent is not to expand the ORBIS: Rome network transport model, but to provide a generic data infrastructure and tools to facilitate development of other models and modeling approaches.

A Community of Interest?

Something Like Pelagios

A Conceptual Model

So, networks of trajectories are different in kind from place locations as commonly understood, and as such require a different, somewhat more elaborate data model. Furthermore, while all spatial data may include temporal attributes, some network data—itineraries for example—are inherently temporal; in fact they are events. Flows are essentially aggregated movement events.

In my experience a helpful first step in data modeling is to create a conceptual model of the entities and relations of what is being represented—an ontology design pattern if you will. Typically a collaborative undertaking, the resulting visualization provides a basis for the data schemas to follow, be they relational or graph. I’ve taken a first stab at such a model, using a recently published trajectory pattern (Hu, et al 2013) as a point of departure; input is invited and essential.

Data Format

One of my tasks at hand—which I welcome collaborative input on—is testing the efficacy of the Topotime model for the several types of historical geospatial network data found in the wild.

The Basics of Topotime

The goals of the Topotime project have recently both broadened and simplified considerably—now essentially aimed at extending the GeoJSON format to account for time (including some of the difficult historical cases), without breaking GeoJSON. That is, Topotime data would be recognized as GeoJSON by any software that supports GeoJSON.

Work-in-progress is described, with a few toy examples, at https://github.com/kgeographer/topotime. I’m planning to work through varied and more complex data examples soon, then write some basic software that accesses Topotime’s unique attributes.

The following is a snippet to give a sense of it:

Next Steps

This effort does not have institutional support at this time, but if enough people feel it’s worth pursuing, we should seek it.

As mentioned earlier, a small group representing several active research projects focused on Asian maritime and land routes will be meeting soon to assess whether Topotime or something like it is appropriate for a “Pelagios for Networks.” We will make our results public for discussion, possibly through the Pelagios SIG infrastructure. More later… and comments are welcome.