topotime – kgeographer

I’ve recently been co-developing, with colleague Elijah Meeks, something called Topotime, which at this stage is experimental software for rendering timelines and doing some computational reasoning about historical timespans, such as calculating overlap. The first adjective we use to describe this work is pragmatic: we felt we had thought hard enough about time versus temporality for digital humanities work [1], and built enough temporal data models and timelines, that we should take some concrete steps to “operationalize” [2] our views and personal wishlists in working software. The results to date have just been publicly released on GitHub, and we hope others will participate in its further development. Elements of the Topotime data model and software are novel (we think), but it is built around a couple of common and successful design patterns.

First, Topotime models Periods in PeriodCollections, much as GeoJSON models Features in FeatureCollections. GeoJSON Features have a typed geometry and an unlimited number of user-defined properties. Topotime Periods have typed timespans (tSpan) and an unlimited number of user-defined properties. Topotime can be written as a JSON object, just as GeoJSON is. I find the symmetry between the representation requirements for spatial things and temporal things astonishing, although it would probably not surprise physicists. For starters, both have names and metrical representations (geometries, even), and both are usefully typed. The close relationship between places and periods will be a refrain on this blog.
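To make the GeoJSON parallel concrete, here is a minimal sketch of what a PeriodCollection might look like when built in Python and serialized as JSON. Only the names PeriodCollection, Period and tSpan come from the description above; every other key ("type", "periods", "id", "label", "properties") and all of the values are illustrative assumptions, not the published Topotime format.

```python
import json

# Hypothetical layout: only PeriodCollection, Period and tSpan are names
# used in the post; the remaining keys and values are assumptions.
period_collection = {
    "type": "PeriodCollection",
    "periods": [
        {
            "type": "Period",
            "id": "p1",
            "label": "Example lifespan",
            # Typed timespan, playing the role of a GeoJSON geometry.
            "tSpan": {"s": "1832-04", "e": "1874-11"},
            # Free-form user properties, as in a GeoJSON Feature.
            "properties": {"css_class": "lifespan"},
        }
    ],
}

# Serialize to JSON text and read it back unchanged.
text = json.dumps(period_collection, indent=2)
round_tripped = json.loads(text)
```

The point of the sketch is only the shape: a collection wrapping typed timespans plus open-ended properties, exactly as a FeatureCollection wraps typed geometries.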

The second borrowed pattern is representing the uncertain boundaries of intervals as intervals themselves, not “instants” (there aren’t very many instants in historiography). The result is a quad of start (s), latest start (ls), earliest end (ee), and end (e). The first and third of these can be stated in natural language as “not before,” and the second and fourth as “not after.” This pattern appears in Simile Timeline and in several scholarly works I cited in an earlier blog post.

Topotime extends that pattern to allow any of these to be qualified as “about” or “approximately” (~) some day, month or year. It also parses an elaboration of the starting and ending spans (sls and eee respectively). The result is a function returning a probability y for any time x. The area under the function’s curve, although not a useful number in and of itself, can be used to good effect in computing overlap with other period or event timespans, and with query areas (as discussed in this short paper [PDF], and earlier demonstrated by Kauppinen et al [3]). The Topotime model also permits specifying intermittent, multi-part timespans which can be cyclical or irregular.
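As a rough illustration of the idea, the sketch below treats the quad (s, ls, ee, e) as a trapezoid: 0 outside s..e, 1 between ls and ee, with straight ramps in between. The linear ramps, the numeric years, the midpoint-rule integration and the pointwise-minimum overlap measure are all assumptions made for this example; they are not Topotime’s actual functions, and the “about” (~) and sls/eee elaborations are not modeled.

```python
def membership(x, s, ls, ee, e):
    """Trapezoid over the quad: 0 outside [s, e], 1 on [ls, ee]."""
    if x < s or x > e:
        return 0.0
    if x < ls:
        return (x - s) / (ls - s)   # rising ramp across the start span
    if x <= ee:
        return 1.0
    return (e - x) / (e - ee)       # falling ramp across the end span


def area(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def overlap_ratio(a, b, steps=10_000):
    """Share of a's area that lies under b's curve as well (0 to 1)."""
    lo, hi = min(a[0], b[0]), max(a[3], b[3])
    fa = lambda x: membership(x, *a)
    fb = lambda x: membership(x, *b)
    shared = area(lambda x: min(fa(x), fb(x)), lo, hi, steps)
    return shared / area(fa, lo, hi, steps)


# Quads as (s, ls, ee, e) in years; the values are made up.
a = (1800, 1810, 1840, 1850)
b = (1900, 1905, 1910, 1920)
```

With these definitions, overlap_ratio(a, a) is 1 and overlap_ratio(a, b) is 0 because the two spans are disjoint; anything in between gives a graded answer that a query interface could rank by.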

Meeting of minds (and conceptualizations)

Topotime’s name, courtesy of Elijah, stems from our wish to capture certain topological relations between periods (between their timespans, actually). We may know that one period or event began after another without knowing exactly when, yet still wish to represent and reason about that adjacency. Similarly, we may know that two events (lives, for example) overlapped, but have only minimal information about their starts and ends.

As it turned out, tackling that issue led to a more involved data model. It’s hard to know where to put the bounds on development projects, due to the EAGER principle we live by here: Everything’s A Graph and Everything’s Related. Both Elijah and I have been working on event data models for multiple projects for several years, and this was an opportunity to operationalize some of our individual perspectives, which differ but seem to have important overlaps as well.

These are a couple of the agreements and how they’ve appeared in Topotime so far:

There are temporal things, which include events, historical periods, and lifespans of things, people and groups (e.g. nations). They all share some representation requirements, so in software we can make a super-class for them, potentially specializing distinctive differences in sub-classes later. But for the time being every temporal thing is a Period, for lack of a better all-encompassing term, and we don’t do anything different for events, lifespans and historical periods. If you add an attribute like class or css_class to the generic periods you can make them render distinctively in a timeline app.
Periods have meaningful relationships to other Periods, some of which are non-topological. For this, Topotime recognizes a relations[] array of simple subject-predicate-object triples. This will be written as JSON-LD soon, and will therefore be Semantic Web compatible. That is, although relationships between the timespans of two events are metrical, measured, and possibly incidental (they overlap, abut, are disjoint, etc.), relationships between periods are a different thing. The most basic is compositional, or mereological (Peter Simons’ Parts: A Study in Ontology is fascinating, and short). Events are composed of or contained by other events. We use a part_of relation for this.
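A minimal sketch of how such triples might be stored and walked follows. The relations array and the part_of predicate are named above; the dictionary keys ("subject", "predicate", "object"), the period identifiers and the ancestors helper are hypothetical, chosen only for illustration.

```python
# Hypothetical triples; only the part_of predicate comes from the post.
relations = [
    {"subject": "battle", "predicate": "part_of", "object": "campaign"},
    {"subject": "campaign", "predicate": "part_of", "object": "war"},
    {"subject": "treaty", "predicate": "part_of", "object": "war"},
]


def ancestors(period_id, triples):
    """Follow part_of links upward and list every containing period."""
    found = []
    frontier = [period_id]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if (
                t["predicate"] == "part_of"
                and t["subject"] == current
                and t["object"] not in found   # also guards against cycles
            ):
                found.append(t["object"])
                frontier.append(t["object"])
    return found


containers = ancestors("battle", relations)
```

Because the triples are plain subject-predicate-object records, other predicates could sit in the same array without changing the traversal.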

Other relationships researchers might wish to encode include caused, required, led_to, etc., none of which we deal with yet. At minimum we might like to visualize our understandings and arguments about these in timeline interfaces (perhaps along the lines of Nowiskie and Drucker’s PlaySpace 2003 [1]). Quite possibly, we can find further interesting ways to compute over them, but they first must find their way into data models.

[1] Although not as hard as Bethany Nowiskie and Johanna Drucker! I only recently came across a trove of their interesting work theorizing time v. temporality, and building out pilots for novel timeline applications for digital humanities. For example, the Temporal Modelling Project and PlaySpace 2003 [screenshots]

[2] A term with plenty of history, but recently the subject of a really nice Stanford Lit Lab pamphlet by Franco Moretti.

[3] Tomi Kauppinen, Glauco Mantegari, Panu Paakkarinen, Heini Kuittinen, Eero Hyvönen, and Stefania Bandini (2010). Determining Relevance of Imprecise Temporal Intervals for Cultural Heritage Information Retrieval. International Journal of Human-Computer Studies, Volume 68, Issue 9, pp. 549-560, Elsevier. Preprint PDF

semi-intervals_1 — Fig. 1 – Christian Freksa’s (1992) semi-intervals – Allen’s interval relations as components of temporal conceptual neighborhoods, discussed below

When my colleague Elijah Meeks recently tweeted about the possibility of a temporal topology data standard (“topotime” as he called it), my reaction was: Fantastic! Maybe the time has arrived, so to speak, for a proper Period datatype in relational databases like PostgreSQL, to meet the needs of historical scholarship—a comprehensive means for qualitative reasoning about historical time. And while we’re at it, how about a generic Period ontology design pattern that could be used in any RDFS/OWL representations? It’s not that a start towards topotime hasn’t been made, only that we can advance things considerably if we as a community get specific about general requirements. Hmm…specifics about generality.

Our standard options in relational databases at the moment are to use one or more ISO 8601 date fields or integer fields to cobble together something that meets our immediate requirements: for example, either a single DATE or YEAR, or START and END fields in the form of either yyyy‑mm‑dd or nnnn. We can then use the operators <, >, and = to readily compute the 13 relations of Allen’s interval algebra (before, meets, overlaps, starts, during, finishes, and their inverses, plus equals). In RDF-world, we find the Allen relations are present in CIDOC-CRM.
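To show how far plain comparisons get us, here is a small sketch that names the Allen relation between two intervals using only <, > and ==. It assumes integer years (the nnnn form) and proper intervals whose start is strictly before their end; the function name and the inverse labels ("after", "met-by" and so on) are my own choices for the example.

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Name the Allen relation of interval A to interval B.

    Assumes a_start < a_end and b_start < b_end.
    """
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # From here on the two intervals share some interior.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For instance, allen_relation(1800, 1830, 1820, 1850) gives "overlaps", while swapping the two intervals gives "overlapped-by".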

What more could we (humanist representers of time and temporality) possibly want? That question was the topic of a short talk I gave in a recent panel at DH2013 in Lincoln, NE. How about a single Period field for starters: a compound date?

In fact, an existing extension for PostgreSQL written by Jeff Davis provides this (https://github.com/jeff-davis/PostgreSQL-Temporal), and I’ve used it several times. Davis provides, along with operators for standard Allen relations, several more to get finer grain, e.g. to differentiate between before (overlaps-or-left-of) and strictly-before. There are also numerous functions for computing relationships in SQL statements. A Period is entered as a date array that looks like this:

[ (yyyy-mm-dd), (yyyy-mm-dd) ]

The begin and end dates (and parts thereof) are still accessible using first(period) and last(period) functions, and these can be used in concert with PostgreSQL’s built-in date-part and interval functions to calculate periods of interest on the fly. For example, in a recent project we converted birth and death dates to Period lifetimes and calculated contemporaries as individuals who were adults ( >= 17 ) at the same time: overlaps( (first(lifetime)::date + 17 years, last(lifetime)), (1832-01-01, 1874-11-23)).
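The same contemporaries test can be sketched outside the database. The code below is plain Python with datetime, not the extension’s API: the helper names are hypothetical, the overlap test assumes closed spans that share at least one day, and the birth and death dates are invented. Only the age threshold of 17 and the comparison window come from the example above.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def adult_span(birth, death, adult_age=17):
    """Return (start, end) of adulthood, or None if death came first."""
    start = add_years(birth, adult_age)
    return (start, death) if start <= death else None


def overlaps(a, b):
    """True when two closed (start, end) spans share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


# Comparison window taken from the example in the text.
window = (date(1832, 1, 1), date(1874, 11, 23))

# Invented lifetimes, for illustration only.
early = adult_span(date(1810, 5, 1), date(1860, 1, 1))
late = adult_span(date(1860, 1, 1), date(1900, 1, 1))
child = adult_span(date(1820, 1, 1), date(1830, 6, 1))
```

Here early overlaps the window, late does not (adulthood begins in 1877), and child is None because the person died before turning 17.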

semi-intervals_2 — Fig. 2 – The “survived-by” conceptual neighborhood merges several semi-interval relations

If you happen to be using PostgreSQL, this helps with many use cases, but we can and should go much further. I made a baby step in the course of dissertation research by writing a series of Postgres functions to perform some minimal computation over Christian Freksa’s temporal conceptual neighborhoods (sets of 13 semi‑intervals) using the Period datatype (Fig. 2). These neighborhoods are sets of semi-interval relations corresponding to some common (and not so common) reasoning tasks. For example, survived-by merges less‑than, meets, overlaps(left), starts, and during. Freksa’s algebra has many more elements that I didn’t use but that should be considered going forward.
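One appeal of semi-intervals is that a neighborhood like survived-by needs only half the information. As I read it, A is survived by B whenever A ends before B ends, whatever the starts are doing; treat that reading as an assumption of this sketch. The code checks it against one hand-built example of each of the 13 Allen relations (the labels and the integer intervals are mine, with "before" standing in for less-than).

```python
def survived_by(a, b):
    """A is survived by B when A's end precedes B's end."""
    return a[1] < b[1]


# One (A, B) example per Allen relation, as (start, end) pairs.
examples = {
    "before": ((1, 2), (3, 4)),
    "meets": ((1, 2), (2, 4)),
    "overlaps": ((1, 3), (2, 4)),
    "starts": ((1, 2), (1, 4)),
    "during": ((2, 3), (1, 4)),
    "finishes": ((3, 4), (1, 4)),
    "equals": ((1, 4), (1, 4)),
    "finished-by": ((1, 4), (3, 4)),
    "contains": ((1, 4), (2, 3)),
    "started-by": ((1, 4), (1, 2)),
    "overlapped-by": ((2, 4), (1, 3)),
    "met-by": ((2, 4), (1, 2)),
    "after": ((3, 4), (1, 2)),
}

# Relations for which the single end-to-end comparison holds.
merged = sorted(name for name, (a, b) in examples.items() if survived_by(a, b))
```

The result is the five relations listed above: before, during, meets, overlaps and starts. A single comparison of ends therefore stands in for five separate interval tests.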

Now, what of uncertainty in its many forms: the vague, probabilistic, and contested data we routinely encounter? The many classes of uncertainty were outlined in a fairly exhaustive taxonomy a decade ago by historical geographer Brandon Plewe (2002), and that work should be helpful in future modeling efforts. If an event began “most likely in late Spring, 1832 (Jones 2013),” when should its representation appear in a dynamic interactive visualization having a granularity of months? When it appears in a time-filtering application, how should it differ from an event that began in “April, 1832 (Smith 2012)”?

Application logic to do something about such cases would need an underlying temporal entity having a probability (0 – 1) and/or some kind of ‘confidence’ weight. If we’re talking about the span of the event, it’s a period bounded not by instants (dates) but by periods, each with an author and probability/confidence value.
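A minimal sketch of such an entity might look like the following. Everything here is an assumption made for illustration: the class and field names, the day ranges chosen to stand for “late Spring” and “April,” and the confidence values of 0.8 and 1.0. The month test simply reports the bound’s confidence if the bound touches that month, which is only one of many possible display rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bound:
    """A period boundary that is itself a period, with attribution."""
    earliest: date
    latest: date
    author: str
    confidence: float  # 0.0 to 1.0

    def __post_init__(self):
        if self.earliest > self.latest:
            raise ValueError("earliest must not follow latest")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")

    def weight_in_month(self, year, month):
        """Confidence if the bound touches the given month, else 0."""
        first = date(year, month, 1)
        next_first = date(year + month // 12, month % 12 + 1, 1)
        touches = self.earliest < next_first and self.latest >= first
        return self.confidence if touches else 0.0


# Illustrative encodings of the two attested starts; dates and weights
# are invented stand-ins for the verbal descriptions.
jones = Bound(date(1832, 5, 15), date(1832, 6, 20), "Jones 2013", 0.8)
smith = Bound(date(1832, 4, 1), date(1832, 4, 30), "Smith 2012", 1.0)
```

A month-granular timeline could then draw the Jones start at reduced opacity in May and June and the Smith start at full opacity in April, keeping both attestations visible side by side.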

In fact, some very nice research to formalize such temporal objects using periods bounded by periods has been done in the context of historical/heritage applications. Members of the FinnOnto group (Kauppinen et al. 2010) have developed a formal representation and algebra for fuzzy historical intervals (Fig. 3).

kaupinnen_1 — Figure 3 – The period ‘‘from around the beginning of the 1st century B.C. to the first half of the 1st century A.D.’’ represented as a fuzzy temporal interval. The fuzzy bounds for start and end are 10- and 14-year periods respectively.

holmen_1 — Fig. 4 – Deduction rule for A1 < A2, where A1, A2 are two points in time modeled as intervals.

In the realm of semantic (ontological) representations, Holmen and Ore (2009) have developed a database system based on the event-centric CIDOC-CRM that includes an algebra (Fig. 4) and temporal analyzer module to reduce fuzziness and aid in the creation of event sequences as “Stored Story Objects.” Like the previous work, period starts and ends are represented as intervals.

Ceri Binding (2009) developed a CIDOC-CRM based representation of multiple attestations of historical periods and their extents for the archaeological project, STARS.

All of the work I’ve mentioned seems to me compatible in fundamental respects. I believe that as a community of interest we can collaboratively develop a few shared resources that would be very helpful for many research projects. One example is a Linked Data repository of historical periods along the lines of what Pleiades/Pelagios does for places in the Classical Mediterranean. Lex Berman of the Harvard Center for Geographical Analysis has given this a lot of thought and done some prototype work, as have others. What is the right venue for making this happen?

Another concrete goal is extending the Period datatype for PostgreSQL to allow a probability or confidence term for each bounding period. Once that is worked out, someone might even port it to ArcGIS. Yeah.

NOTE: These and related topics are among those to be addressed by a proposed new GeoHumanities SIG for the Alliance of Digital Humanities Organizations (ADHO), which I’m co-instigating with Kathy Weimer of Texas A&M. Further word on that within a week or so.

Cited works

Binding, C. (2009). Implementing archaeological time periods using CIDOC CRM and SKOS. CAA 2009 Proceedings (http://hypermedia.research.southwales.ac.uk/media/files/documents/2010-06-09/ESWC2010_binding_paper.pdf)

Freksa, C. (1992). Temporal reasoning based on semi-intervals. Artificial Intelligence 54: 199-227 (http://cindy.informatik.uni-bremen.de/cosy/staff/freksa/publications/TemReBaSeIn92.pdf)

Kauppinen, T., Mantegari, G., Paakkarinen, P., Kuittinen, H., Hyvönen, E., Bandini, S. (2010). Determining relevance of imprecise temporal intervals for cultural heritage information retrieval. International Journal of Human-Computer Studies 68(9): 549–560 (http://kauppinen.net/tomi/temporal-relevance-ijhcs2010.pdf)

Holmen, J., and Ore, C. (2009). Deducing event chronology in a cultural heritage documentation system. In CAA 2009 Proceedings (http://www.edd.uio.no/artiklar/arkeologi/holmen_ore_caa2009.pdf)

Plewe, B. (2002). The Nature of Uncertainty in Historical Geographic Information. Transactions in GIS, 6(4): 431-456. (http://dusk.geo.orst.edu/buffgis/TGIS_uncertainty.pdf)

Plewe, B. (2003). Representing Datum-level Uncertainty in Historical GIS. Cartography and Geographic Information Science, 30(4):319-334


Topotime and Place

Topotime: Qualitative reasoning for historical time