kgeographer

Computing Place at six months

2026-06-13T00:00:00+02:00

Computing Place is a personal research program I launched in early 2026 following my theoretical "retirement" in 2025. It has two project tracks: Environmental Dimensions of Place (EDOP) is in active development; its companion Cultural Dimensions of Place (CDOP) is largely deferred, awaiting products of EDOP, which is going well (EDOP Project Summary). Version 0.3 of the EDOP service (EDOPS), which delivers environmental signatures, is now public as a sandbox web site, an operational API, and a GitHub repository hosting code and additional documentation.

Computing Place?

Place is a wonderfully vague concept with no single canonical meaning. I view named places principally as Earth locations as experienced by people. Santa Fe, New Mexico (where I am writing this) is a concept held in the minds of everyone who has spent time here, but the form and content of that concept is different for each of us. So from this experiential viewpoint how can place be computed?

As an inveterate traveler I have always been driven to experience places first-hand, trying to improve my understanding of the variety of human experience across the globe. So I fell into academic geography naturally, and my research goals became centered on improving digital models of place to facilitate the exploration, pattern-seeking, and analyses I wanted to undertake. Digital models are expressions of conceptual models, and over time my own conceptual model of the dimensions of place has grown to be very broad.

Another common conceptualization of place is to me equally valid, holding that Santa Fe is not only a location experienced by people, but also a "thing in the world," albeit a complex one: an ever-changing physical Earth setting with an untold number of physical parts and things lying within its fuzzy borders: physical geography, buildings, trails and roads, people, animals, commodities in transit.

Furthermore, people's experience of place can be, and routinely is, a subject of computational analysis. It has been recorded in tangible creative works including travel writing, novels, descriptive gazetteers, visual arts, and music. Digital tools like natural language processing (NLP), computer vision, and feature extraction can summarize the content of these works formally such that they can be compared and become dimensions of the places they describe or evoke.

But my conceptual model of computational place does not end there. Events and activity occur in places, and there are empirically grounded formal records of them. Historians catalog events in the course of their work, and anthropologists have cataloged many kinds of cultural practice at places in their fieldwork.

Environment v. culture

For the Computing Place initiative I have made a division between environmental and cultural dimensions, driven by the need to contain scope and by the software engineering principle of “separation of concerns.” The two broad aspects of place are thoroughly entwined, so the division is not strict in practice. Agricultural fields where there were once open grasslands are products of human activity (cultural), and they are also an environmental characteristic of, for example, a named region – i.e. place – at some time.

This is not a novel observation! Scholars in history, anthropology, archaeology, environmental history, historical ecology, and geography have long studied the relations between environment and culture from distinctive perspectives. Computing Place aims at novel methodology – to make many of those relations formally computable and develop tools and features for an accessible and open web platform.

So EDOPS is being developed as a service delivering computable descriptors of place that are largely environmental. CDOP will introduce computable descriptors of place that are largely cultural. The hope is they will meet in an analytical space that proves useful for many kinds of investigations.

Of research goals and EDOPS use cases

EDOPS is intended to support investigations of the relationships between environment and culture-writ-large over time. Like any software project its design will be informed by use cases, and there are many.

The primary initial product of EDOPS is a signature delivered programmatically by an API: request one for a given place – optionally for a given year or time period – and receive an ordered JSON representation of those dimensions of its environmental setting you requested. Given good API documentation, that will suffice for the relatively technologically adept researcher who can design their query, send it in a script, parse the results for their own purposes, and integrate the result in analyses that answer their questions. An example raw signature: Kaifeng, Northern Song Dynasty in the period 1000-1010 CE. To see how the EDOPS sandbox currently renders one, select an example on the pilot Lookup page.

There are however many other use cases, including but not limited to:

Researchers from any discipline who need guidance in designing their API request based on their question(s) and/or assistance generating maps and charts
Educators and students wishing to explore data and generate maps, charts, or tables on the fly
Developers of software projects and platforms wishing to consume environmental signatures for their own purposes, for example digital gazetteers
Any user wishing to generate an interpretation of signature results delivered by a Large Language Model (LLM) API service

A sandbox and a dashboard and then...

In the early stages of project design, a public EDOPS sandbox Lookup page was developed to test the API's delivery of signatures and experiment with ways to render the data comprehensible in maps and charts. A new Explorer page has been added recently following completion of the Data Characterization phase of work. All work to date and plans for phases to follow are described in the EDOP Project Summary.

The sandbox will continue to evolve as things progress, but the kinds of features suggested by the use cases above mean a more polished and elaborate web interface (a platform, really) is necessary. Broadly, a dashboard has to:

guide users in fitting API requests to research questions, with rubrics and worked examples informing their choices of variables and scale
generate several kinds of outputs for display and download, including JSON signatures, tables, charts and maps

Design and implementation of the dashboard feature is now a key phase in the EDOPS workplan.

Sustainability

The EDOPS project was initially conceived as a personal one, intended to support only my own research into associations of cultural practices with environmental settings. It is now being supported in part by the Institute for Spatial History Innovation at the University of Pittsburgh (Pitt), with the prospect of the service being hosted for the long term on university infrastructure.

Given that possibility and the elevated importance of the dashboard, development requirements have expanded and become more rigorous. Input from environmental and geographic information scientists is now essential, as is that from spatial humanities practitioners. Towards that end, specialist meetings are planned at intervals during the coming year.

Postscript: EDOP and Claude

Yes, Claude is involved in Computing Place development, and EDOPS especially. NB: It did not contribute to this blog post. I acknowledge there are differences of opinion about the use of AI tools in education and research. To me the appearance of generative AI is reminiscent of the allegory about several blind people experiencing an elephant very differently. Briefly, I think it presents very difficult challenges for educators and many other fields but the capabilities for enhancing software development workflows are undeniable. Even there, there are strong divisions of opinion. For the kind of work I do – research software development – there are few downsides that I can see, for people (like myself) who already have considerable expertise in all facets of that practice. That is, nothing that Claude contributes – code, methodological suggestions, conceptual feedback – is swallowed whole. Every response I get from Claude gets my appraisal and discussion or rejection where warranted. The products of EDOPS, and Computing Place generally, are my responsibility to defend, to alter in response to future (human) peer review, and to take primary credit for if and when they are found to be useful. I view this as being PI, and first author.

The positive impact of Claude on my work, particularly following release of Opus 4.7, is difficult to overstate. I believe a project with the scope of EDOPS – let alone Computing Place – would be impossible to take on as a solo researcher without it. It is far from complete, and I hope for human collaborators in time, but taking this to a proof-of-concept potential collaborators can evaluate is a realistic goal thanks to the new agentive AI tools.

Computing Place: Toward Systematic Environmental Characterization for Cultural Research

2026-03-30T00:00:00+02:00

This March 2026 prospectus describes the initiative as it was originally conceived. For current state, see [this new post] and the latest EDOP Project Summary.

1. Introduction

"One has not fully understood the nature of an area until one has learned to see it as an organic unit, to comprehend land and life in terms of each other."

-- Carl Sauer (1925). Morphology of Landscape.

The Computing Place initiative introduced here was originally motivated by geographer Carl Sauer's canonical paper Morphology of Landscape. In it, he asserted that "...area or landscape is the field of Geography, because it is a naively given important section of reality, not a sophisticated thesis." He introduced to American geographers the term "cultural landscape" as a translation of "Kulturlandschaft," a conceptual framing well-known to German geographers at the time. Its content, he wrote, is found "...in the physical qualities of area that are significant to man and in the forms of his use of the area, in facts of physical background and facts of human culture." But Sauer sought to constrain the discipline to one direction of analysis: "...we are not concerned in geography with the energy, customs, or beliefs of man but with man's record upon the landscape."

I wondered what Sauer might have done if some of the now common technological tools and methods had been available to him, and I thought that given sufficient "facts of physical background and facts of human culture," both directions of analysis are possible and warranted. Sauer did allow that the relations between environment and culture are bi-directional, but left the cultural elements to anthropology. Generously, but I think short-sightedly.

Historians, archaeologists, and anthropologists invoke environmental context constantly, but almost always qualitatively. What would it mean to make those invocations in some degree computable and reproducible? Computing Place envisions digital infrastructure--data and software tools--that supports both directions of inquiry.

Cultural analysis with a geographic lens in this context requires a solid environmental data foundation. Computing Place development is organized around two components, Environmental Dimensions of Place (EDOP) and Cultural Dimensions of Place (CDOP). EDOP is naturally the first priority because the environmental foundation must precede the cultural analysis. Most of what follows describes work in progress towards an EDOP "signature," computable for any given terrestrial location. The overall project is designed to surface patterns and expose what can and cannot be characterized computationally, without presupposing results.

2. What "computing place" means, and what it doesn't

The subject matter of Computing Place is indeed well-trodden intellectual terrain. Scholars in history, anthropology, archaeology, environmental history, historical ecology, and geography have long studied the relations between environment and culture. Computing Place aims at novel methodology--to make many of those relations formally computable and develop tools and features for an accessible and open web platform. EDOP signatures are reproducible environmental characterizations that can be compared systematically across thousands of locations and time periods, linked to cultural datasets from the CDOP component, and tested against independent evidence.

While there are clear relations between cultural phenomena of all kinds and their environmental settings, the Computing Place project deliberately steers clear of environmental determinism. An underlying premise is that while environment generally defines what activities a geographic area affords and constrains, culture determines which possibilities are realized.

3. The environmental signature concept

3.1 Signature components

For any terrestrial location, the EDOP service delivers a set of values for selected environmental attributes drawn from one or more global datasets. At present the principal source is the global BasinATLAS dataset, which compiles a wide range of hydro-environmental attributes from existing global datasets in a consistent, globally applicable format. BasinATLAS is organized hierarchically in 12 "levels" of increasing granularity (decreasing area). Initially, 47 of the 281 BasinATLAS attributes were drawn from all six of its categories (hydrology, physiography, climate, landcover, soils & geology, anthropogenic) at Level 08 (~190k basins). For EDOP purposes, the attributes have been grouped into four "persistence bands," intended to reflect potential temporal validity. Signature requests can include the values of any combination of bands:

A - Physiographic bedrock (millennia) [elevation, slope, stream gradient, lithology, karst extent]. Indicative of the energy cost of movement, defensive advantages of terrain, and raw materials available for construction and agriculture, stable over millennia.
B - Hydro-climatic baselines (centuries) [discharge, basin area, groundwater depth, natural vegetation potential, soil texture]. The potential of a landscape; while specific values fluctuate over time, the relative hierarchy (e.g., "Basin A is always wetter than Basin B") is usually stable across historical eras.
C - Bioclimatic proxies (decadal/cyclical) [precipitation, temperature, aridity, wetland extent, permafrost, ecoregion membership]. Potentially useful as a baseline "modern average" of values. Interpretation typically requires accounting for known historical anomalies.
D - Anthropocene markers (last 50-100 years) [reservoir volume, land cover, cropland/pasture, pop density, human footprint, GDP/HDI]. Typically omitted from signature requests for most historical contexts.
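As a minimal sketch, the four bands above can be held in a small lookup structure so that a request for "any combination of bands" becomes a simple filter. The band labels and attribute names are copied from the list; the `Band` class and the `select_bands` helper are hypothetical names used only for illustration and are not part of the EDOPS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    label: str
    persistence: str
    attributes: tuple


# Band contents copied from the list above; the structure itself is illustrative.
BANDS = {
    "A": Band("Physiographic bedrock", "millennia",
              ("elevation", "slope", "stream gradient", "lithology", "karst extent")),
    "B": Band("Hydro-climatic baselines", "centuries",
              ("discharge", "basin area", "groundwater depth",
               "natural vegetation potential", "soil texture")),
    "C": Band("Bioclimatic proxies", "decadal/cyclical",
              ("precipitation", "temperature", "aridity", "wetland extent",
               "permafrost", "ecoregion membership")),
    "D": Band("Anthropocene markers", "last 50-100 years",
              ("reservoir volume", "land cover", "cropland/pasture",
               "pop density", "human footprint", "GDP/HDI")),
}


def select_bands(codes):
    """Return {band code: attribute names} for the requested bands, in A-D order."""
    requested = set(codes)
    unknown = requested - set(BANDS)
    if unknown:
        raise ValueError(f"unknown band(s): {sorted(unknown)}")
    return {code: BANDS[code].attributes for code in BANDS if code in requested}


# A request for a historical context would typically leave out Band D.
historical = select_bands("ABC")
print(list(historical))
```

Keeping the persistence label next to each band makes it easy to show a user why Band D is normally dropped for historical questions.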

What distinguishes an EDOP signature from a simple attribute lookup is a process orientation: the aim is to characterize not merely what surrounds a location but what it experiences through directed spatial processes. In the hydrological case, upstream values indicate what a river system delivers to a location, often seasonally. For example, a signature query for ancient Ur returns 94mm/yr local precipitation and an aridity index of 5--certainly hyper-arid. But the catchment feeding it receives 258mm/yr. That gap between local and upstream is a more complete environmental characterization. Alluvial civilizations are places where those two values diverge sharply.
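The Ur example can be worked through in a few lines. This sketch uses only the two precipitation values quoted above; the `upstream_contrast` function and its two measures (difference and ratio) are illustrative assumptions, not outputs that EDOPS is known to return.

```python
def upstream_contrast(local_mm_yr, upstream_mm_yr):
    """Compare local precipitation with that of the contributing catchment."""
    if local_mm_yr <= 0:
        raise ValueError("local precipitation must be positive")
    return {
        "difference_mm_yr": upstream_mm_yr - local_mm_yr,
        "ratio": round(upstream_mm_yr / local_mm_yr, 2),
    }


# Values quoted in the text for ancient Ur.
ur = upstream_contrast(local_mm_yr=94, upstream_mm_yr=258)
print(ur)
```

For Ur the catchment receives 164 mm/yr more than the site itself, roughly 2.7 times the local value, which is the kind of divergence the paragraph describes.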

Going forward, spatial data and descriptive text from ecoregion datasets will be incorporated as options in user-configured EDOP service API payloads.

3.2 Signature extents

The EDOP service (EDOPS hereafter) will provide signatures for any terrestrial location, given as geographic coordinates. An integrated component facilitates place name resolution using the API of World Historical Gazetteer. If coordinates are of a representative point, for example of a settlement, site, or anthropological observation, the signature returned will be that of a single containing basin--normally less than useful. Environmental context is not intrinsic to a point; it is a modeled neighborhood. EDOPS will therefore treat neighborhood definition as a transparent, swappable parameter, offering multiple ways of defining one on the fly, per query: for example, (i) a single basin at a requested scale level, (ii) immediate siblings, (iii) basins overlapping a user-supplied area or buffer, or (iv) basins within the containing watershed.

When a user supplies a polygon (e.g., a polity boundary, an urban footprint, or a drawn region), the polygon itself defines the neighborhood. EDOPS computes a composite signature by intersecting the polygon with hydrologic basins at a given level, applying area-weighted aggregation of environmental variables, and returning a structured signature. There are several open research questions regarding this step, including (i) which basin level/scale is most appropriate, (ii) how weighting of partial basins is computed, (iii) how the heterogeneity of a resulting set of multiple signatures is represented and computed over, and (iv) how scale sensitivity is evaluated.
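The area-weighted aggregation step can be sketched as follows, assuming the polygon has already been intersected with the basins and each overlap area is known. The record layout, the variable names, and the use of a weighted standard deviation to express heterogeneity are all assumptions made for illustration; the basin values are invented.

```python
from math import sqrt


def area_weighted_signature(parts, variables):
    """Aggregate basin values over a polygon, weighting by overlap area.

    parts: list of dicts, each with 'overlap_km2' plus one numeric entry per variable.
    Returns {variable: {'mean': ..., 'std': ...}} using area weights.
    """
    total_area = sum(p["overlap_km2"] for p in parts)
    if total_area <= 0:
        raise ValueError("polygon does not overlap any basin")
    signature = {}
    for var in variables:
        mean = sum(p["overlap_km2"] * p[var] for p in parts) / total_area
        variance = sum(p["overlap_km2"] * (p[var] - mean) ** 2 for p in parts) / total_area
        signature[var] = {"mean": mean, "std": sqrt(variance)}
    return signature


# Two hypothetical basins partly covered by a user-supplied polygon.
overlaps = [
    {"basin": "b1", "overlap_km2": 30.0, "precip_mm_yr": 100.0},
    {"basin": "b2", "overlap_km2": 10.0, "precip_mm_yr": 200.0},
]
composite = area_weighted_signature(overlaps, ["precip_mm_yr"])
print(composite)
```

A weighted mean of this kind suits continuous quantities such as precipitation or elevation; categorical attributes such as lithology would need a different summary, for example the area share of each class.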

4. The EDOPS process orientation, more broadly

As mentioned above, a novel aspect of EDOPS signatures is the aim to summarize formally what a place experiences, not only what surrounds it—"action-at-a-distance" (Goodchild 2026). The hydrological case for Ur mentioned above can be expanded; considering upstream values is only one half of a broader directional frame. Moving forward, other aspects available in the data that could be considered include (i) how far a location sits from its marine outlet, and (ii) whether its linked basins terminate in the sea at all. In many cases, marine resources are an important aspect of environmental affordances a place and its resident cultures may experience.

Beyond the hydrological case, other process types can follow the same logic: terrain slope and gradient as indicators of intrinsic movement cost and accessibility, or atmospheric conditions shaped by prevailing winds. Social connectivity structured by route networks is a longer-term possibility. Each process type will have its own geometry and its own characteristic distance decay.
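One simple way to express a "characteristic distance decay" is an exponential weight that shrinks with distance along the process path. The sketch below is only one possible form: the exponential kernel, the `scale_km` parameter, and the sample values are assumptions for illustration, and the project has not stated which decay function it will use.

```python
from math import exp


def decay_weighted_mean(observations, scale_km):
    """Average values with weights exp(-distance / scale_km).

    observations: list of (distance_km, value) pairs measured along the
    process path (for example, upstream distance from the place).
    """
    if scale_km <= 0:
        raise ValueError("scale_km must be positive")
    if not observations:
        raise ValueError("no observations supplied")
    weights = [exp(-distance / scale_km) for distance, _ in observations]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, observations))
    return weighted_sum / sum(weights)


# Hypothetical upstream values: nearby basins dominate when scale_km is small.
upstream = [(0.0, 100.0), (1000.0, 300.0)]
near_dominated = decay_weighted_mean(upstream, scale_km=50.0)
print(round(near_dominated, 3))
```

A different process type would plug in its own distance measure and its own scale, which is what makes the decay "characteristic" of the process.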

Spatial data and descriptive text from OneEarth ecoregion datasets--including Wikipedia-derived summaries for all 847 ecoregions--are already incorporated into the EDOP prototype and will be available as optional parameters in the EDOPS API.

5. The EDOP data infrastructure

The BasinATLAS and OneEarth ecoregions data are both available under open CC-BY licenses, as are Wikipedia ecoregion articles. The current EDOP prototype utilizes BasinATLAS Level 08 data, which partitions the terrestrial Earth surface into 190,675 nested drainage units (catchments). BasinATLAS defines a 12-level hierarchy (Levels 01--12), spanning scales from continental basins to fine-grained local catchments.

A well-known characteristic of spatial data is the Modifiable Areal Unit Problem (MAUP), where "the results of mapping or statistical analysis may differ when using different spatial units of aggregation." (cf. UCGIS Body Of Knowledge). A signature computed for the watershed containing Rome looks different from one computed for the small sub-basin immediately beneath a single representative point given for the Capitoline Hill. A systematic scale sensitivity analysis across multiple BasinATLAS levels is the next planned analytical contribution, and will inform which level is most appropriate for which research contexts. When multiple levels are considered, sharp signature changes across them can provide useful information about a place's positioning at edges of ecological zones. Although providing real-time multi-level responses will require significant computing and storage capability, a one-time analysis across representative samples should provide useful guidelines for users' choice of level.

The OneEarth project, self-described as a "global network of climate strategists and storytellers," has incorporated data for the 845 "widely cited" ecoregions developed by an international consortium of conservation scientists (Dinerstein et al., 2017) into a new bioregion framework, presented in maps and essays on their web platform. Ecoregions classify the terrestrial surface into biogeographically distinct areas sharing characteristic species assemblages and ecological conditions. The spatial data for these ecoregions is freely available. Comprehensive articles describing virtually all of them, authored during an earlier World Wildlife Fund effort, are available in Wikipedia, and will be incorporated into the EDOPS prototype as optional contextual outputs alongside numerical signatures.

6. Validation: do the signatures capture something real?

EDOPS signatures represent computable and configurable summarizations of environmental conditions at given locations utilizing rigorously developed data. In effect EDOP is a model of suitability for human occupation and settlement, which must itself be validated. How do EDOP signatures correspond to known human settlements and regions of transitory occupation? The required methodology is similar to that used by ecologists in species distribution modeling (SDM) and by archaeologists predicting likely settlement locations from environmental variables.

Attempts to predict known settlement locations are sure to have mixed results, because humans have adapted to a great variety of environmental settings. The test is simple in principle: do environmentally favorable locations correspond to where people actually settled? Defining 'favorable' independently of the settlement record itself is the methodological challenge, one the planned validation study will address using held-out data and established SDM techniques. Failure is diagnostic; success will build confidence in the model. The residuals (differences between predicted settlement probability and the actual settlement record) will be more revealing than the matches. In cases of genuine historical absence we can ask: Was there poor connectivity to diffusion networks? Competitive exclusion? Is there an explanatory variable the current signature doesn't provide? In cases of dense or persistent settlement but an apparently unfavorable signature, were there strategic imperatives like trade or defense? Were some localized resources not captured in the model, or was it sheer historical contingency?
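The residual analysis described above can be sketched as a simple sorting of sites into matches and the two kinds of mismatch. Everything here is hypothetical: the 0.5 threshold, the group names, and the sample sites and probabilities are placeholders, and no validation results exist yet.

```python
def classify_residuals(records, threshold=0.5):
    """Group sites by agreement between predicted probability and the record.

    records: iterable of (site_id, predicted_probability, settled) tuples.
    The residual is observed (1 or 0) minus predicted probability.
    """
    groups = {"consistent": [], "unexpected_absence": [], "unexpected_presence": []}
    for site, predicted, settled in records:
        residual = (1.0 if settled else 0.0) - predicted
        if settled and predicted < threshold:
            key = "unexpected_presence"   # settled despite an unfavorable signature
        elif not settled and predicted >= threshold:
            key = "unexpected_absence"    # favorable signature but no settlement
        else:
            key = "consistent"
        groups[key].append((site, round(residual, 2)))
    return groups


# Placeholder records, not real predictions.
sample = [
    ("site_1", 0.9, True),
    ("site_2", 0.8, False),
    ("site_3", 0.1, True),
    ("site_4", 0.2, False),
]
result = classify_residuals(sample)
print(result)
```

The two "unexpected" groups are the ones that prompt the follow-up questions: connectivity and exclusion for absences, strategic or unmodeled resources for presences.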

Existing datasets identified for use in a validation process include (i) point locations of anthropological fieldwork in indigenous societies in D-PLACE, (ii) polygon geometry for over 800 temporally scoped historical polities, from the recent Cliopatria dataset developed by the Seshat Global History Databank project, and (iii) temporally scoped point locations for roughly 1700 historical settlements spanning 6,000 years, developed by Reba et al. (2016).

One important caveat is that given the dimensions of the current EDOP model, signatures are most likely to be predictive of fixed settlements relying on terrestrial resources. Hence there will be blind spots that need to be accounted for. We can expect signatures across the Eurasian steppe would read as unfavorable, despite it having been occupied intensively by mobile cultures whose movement was adaptive to apparently hostile conditions. Likewise, the terrestrial signature for Tierra del Fuego is undoubtedly relatively harsh: hyper-humid and wind-battered, with low growing-season temperatures and minimal agricultural potential, yet the rich marine resources allowed the Yaghan to thrive there for millennia, and the presence of guanaco in the interior supported the Selknam societies.

7. Culture, CDOP and the larger goal

Figure 1 - Computing Place architecture. Lines indicate connections in prototype platform as of March, 2026

The Cultural Dimensions of Place (CDOP) component of Computing Place complements EDOP, and the exploratory and analytical space linking them requires EDOP signatures. EDOP records environmental affordances and constraints; CDOP contributes what cultures have done within particular physical settings (Fig 1). CDOP work has begun by identifying and preparing spatialized datasets representing cultural phenomena of various kinds.

These core datasets, mentioned above as being integral to EDOP signature validation work, will see further use in seeking patterns of co-relation:

D-PLACE: "...contains cultural, linguistic, environmental and geographic information for over 1400 human 'societies'. A 'society' in D-PLACE represents a group of people in a particular locality, who often share a language and cultural identity."

Cliopatria: "...a comprehensive open-source [temporally scoped] geospatial dataset of worldwide states, political groups, events, and rulers from 3400BCE to the present day. It is part of the Seshat Global History Databank project."

Chandler-Modelski historical population: "...the first spatially explicit dataset of urban settlements from 3700 BC to AD 2000...", published by Reba et al. (2016) and derived from work by Chandler and Modelski.

Experiments have begun in the Computing Place platform prototype with descriptive text for 258 UNESCO World Heritage Cities, and similar semantic embedding experiments are planned using nomination documents for Intangible Cultural Heritage listings.

The linking of EDOP and CDOP data can facilitate answering many kinds of questions, including but not limited to: Do cultural traits cluster in particular environmental regimes? How do environmental gradients correspond to linguistic, social, or economic variation? How stable are signatures across historical change?

One early demonstrator developed sets of signatures for the extents of the Northern Song Dynasty over an 18-year period, as recorded in the Cliopatria dataset. Mapping the aridity value of contained basins illustrates what was arguably a deliberate move to acquire more arable territory (Fig 2).

Figure 2: Expansion of Northern Song Dynasty into territory with greater moisture availability (isolating BasinATLAS 'Global Aridity Index'), 962-980CE

8. Current state and next steps

The Computing Place project began in early January, 2026 and produced a prototype web platform to display work-in-progress a month later at cedop.kgeographer.org/edop. Early work included developing:

Three ways of specifying a place to obtain a basin signature: integrated WHG toponym lookup, small (97k) internal gazetteer lookup, and selecting one of 258 World Heritage Cities (WHC).
Signature service returning a summary profile and full Band A-D attribute groupings per submitted place.
Tool for finding and mapping cities with similar profiles in the WHC set of 258.
Tool for browsing and mapping basin type clusters developed with principal components analysis.
Tool for drilling down through the OneEarth bioregion/ecoregion hierarchy, displaying maps, and at the ecoregion level, Wikipedia summary descriptions.
Experimental interface to D-PLACE data, mapping societies according to two of the many dimensions of its data: dominant subsistence and high gods.
Search/browse for World Heritage Cities, returning signatures and optional lookup of cities with similar EDOP signatures and semantic content derived from Wikipedia article embeddings for four themes: environment, history, culture, and 'modern'.
A public API exposing documented endpoints
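The similar-profile lookups in the list above can be illustrated with a generic nearest-neighbor search. This is a sketch under stated assumptions, not the prototype's method: it standardizes each attribute to z-scores and ranks places by Euclidean distance, and the place names and values are invented.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column so attributes on different scales are comparable."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid errors.
    spreads = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, spreads)] for row in rows]


def most_similar(names, rows, target, k=3):
    """Return the k places whose standardized profiles are closest to the target."""
    scaled = dict(zip(names, zscore_columns(rows)))
    reference = scaled[target]
    distances = [
        (sqrt(sum((a - b) ** 2 for a, b in zip(reference, profile))), name)
        for name, profile in scaled.items()
        if name != target
    ]
    return [name for _, name in sorted(distances)[:k]]


# Hypothetical profiles: [precipitation mm/yr, mean temperature C].
places = ["place_p", "place_q", "place_r"]
profiles = [[100, 10], [110, 11], [900, 30]]
nearest = most_similar(places, profiles, "place_p", k=1)
print(nearest)
```

Standardizing first matters because raw precipitation values would otherwise swamp attributes measured on smaller scales.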

Next steps

Refining the EDOP component of Computing Place is now the top priority--work that includes three efforts mentioned earlier: studies of scale sensitivity and upstream/downstream distance weighting, and validation against settlement datasets.

EDOP is intended to have broad utility to researchers in several disciplines, and ideally will be supported on a permanent basis by one or more institutions with that capability and inclination. Inquiries are being made.

Feedback is most welcome on either of the Computing Place components, as well as on the overall project. A more technical overview is available on request.

In closing

"An ordered presentation of the landscapes of the earth is a formidable undertaking." -- Carl Sauer (1925)

"Nothing for it but to do it" -- Karl Grossner (2026)

Computing Place is admittedly ambitious, perhaps overly so. It is driven by personal research questions about culture and geography that require a way of summarizing environmental settings efficiently but robustly--namely EDOP--so that is where the initial focus is. The CDOP work is so far only meanderings, a search for patterns that may or may not lead to empirical findings and even theory. But first things first, as the saying goes.

References

Dinerstein, E., Olson, D., Joshi, A., Vynne, C., Burgess, N. D., Wikramanayake, E., & Saleem, M. (2017). An ecoregion-based approach to protecting half the terrestrial realm. BioScience, 67(6), 534-545.

Goodchild, M. F. (2026, February). Personal communication.

Reba, M., Reitsma, F., & Seto, K. C. (2016). Spatializing 6,000 years of global urbanization from 3700 BC to AD 2000. Scientific data, 3(1), 160034.

Introducing Computing Place

2026-02-09T00:00:00+01:00

Computing Place is a new research initiative aimed at building rich, computable descriptions of places, linking environmental and cultural data to support systematic analyses across those numerous dimensions. This work starts from a simple premise: places differ, and the quality of comparative analyses depends upon the quality and completeness of formal descriptions. Environment and culture are inextricably intertwined, as studied and understood from numerous disciplinary perspectives. I will be experimenting with ways to represent places computationally without collapsing them into a single metric or theory.

Some background

My reading of geographer Carl Sauer's 1925 article, The Morphology of Landscape, has had a significant influence on my conception of place as it developed over the years. Sauer introduced the term cultural landscape to American readers, having borrowed it from the Kulturlandschaft conception of earlier German geographers.

"The content of landscape is found therefore in the physical qualities of area that are significant to man and in the forms of his use of the area, in facts of physical background and facts of human culture." (emphasis added, p.325)

Sauer's morphological method was most concerned with the chorological description of the evidence of cultures' impact upon the natural landscape over time and less so with the impact of environment upon culture. He did however allow for other perspectives, noting that "the continued synthesis of phenomena by morphologic method has been employed with greatest success perhaps in anthropology" (p. 327).

That concession matters here because the blended conceptualization of place used in this project borrows from the phenomenological view of Yi-Fu Tuan ("place as experienced space" as I paraphrase it), and Doreen Massey's poetic framing of place as a "meeting up of histories." Computing Place is therefore not operationalizing Sauer's cultural landscape concept so much as extending it, treating place as a coupled record of environmental setting and cultural traces, supporting questions in both directions without reverting to determinism.

The development of the Computing Place platform has begun in stepwise fashion, first with an EDOP module (Environmental Dimensions of Place) and then with tentative steps for a CDOP module (Cultural Dimensions of Place). These datasets and their respective analytic and visualization tools will in time offer both distinct and unified API endpoints that can be consumed internally in features of a Computing Place platform, and by external applications.

Environment and Culture (EDOP & CDOP)

EDOP (Environmental Dimensions of Place)

A prototype environmental signature has been developed using 42 of the roughly 300 properties for 190,000 Level 8 subbasins in the BasinATLAS v1.0 portion of the HydroATLAS dataset. These have been grouped in four rough temporally scoped bands corresponding to relative persistence and applicability to successive historical eras: A - Physiographic bedrock, B - Hydroclimatic baselines, C - Bioclimatic proxies, and D - Anthropocene markers. Point locations for places can be submitted to a web tool in a few ways so far, including a World Historical Gazetteer (WHG) lookup. Lists of places similar in either environmental terms or in semantic dimensions derived from Wikipedia text embeddings are returned on request.

CDOP (Cultural Dimensions of Place)

A first step at integrating cultural data has been instantiated within the EDOP module, by linking the 1291 indigenous societies from the D-PLACE dataset to their containing ecoregions on two sample attributes: "Dominant subsistence" and "High Gods."
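Linking a society's point location to its containing ecoregion is a point-in-polygon join. The sketch below uses a plain ray-casting test to show the idea; the identifiers and square "ecoregions" are invented placeholders, and real work would more likely rely on a GIS library or spatial database than on hand-rolled geometry.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring, a list of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray cast from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def link_to_regions(societies, regions):
    """Map each society id to the first region containing its point, else None."""
    links = {}
    for soc_id, (lon, lat) in societies.items():
        links[soc_id] = next(
            (name for name, ring in regions.items()
             if point_in_polygon(lon, lat, ring)),
            None,
        )
    return links

# Hypothetical geometries and ids, for illustration only.
regions = {
    "region_west": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "region_east": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
societies = {"soc_a": (5, 5), "soc_b": (15, 2), "soc_c": (30, 30)}
links = link_to_regions(societies, regions)
```

Once each society carries a region key, attributes such as "Dominant subsistence" can be tallied per ecoregion with an ordinary group-by.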

An important next step for CDOP is determining what attributes of culture to work with. Evidence of culture is for me a very broad domain, including but not limited to descriptive narrative texts (travel writing, folklore, mythology, and other literature, encyclopedic sources, etc.), folk and "fine" art, and architectural motifs. That said, CDOP experiments will be constrained by what data may be readily available.

Linked Traces

In 2019 considerable work was done to develop the Linked Traces annotation format (LTF) in coordination with the Linked Places Format (LPF) that became a contribution and interconnection standard for World Historical Gazetteer and other projects associated with the Pelagios Network.

LTF has seen limited uptake but is likely now applicable in the Computing Place platform architecture. A trace in LTF is defined as a "web-published resource concerning historical entities of any kind, and conventionally, to the entities themselves." Trace data takes the form of W3C web annotations, as explained on the LTF Github repository. LTF follows Linked Open Data principles in extending the relatively constrained place descriptions of digital gazetteers with all manner of related thematic material. EDOP and CDOP data would seem to fit that framing well.
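To make the idea of a trace concrete, here is a minimal sketch of a generic W3C Web Annotation linking a place record to an external dataset. It follows the general Web Annotation shape (context, type, body, target) only; the exact fields LTF requires are documented in its repository and may differ, and both URIs below are hypothetical.

```python
import json

def make_trace_annotation(place_uri, resource_uri, media_type="application/json"):
    """Build a generic W3C Web Annotation: body = described resource, target = place."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",
        "body": {
            "id": resource_uri,      # e.g. an environmental signature document
            "type": "Dataset",
            "format": media_type,
        },
        "target": place_uri,         # the gazetteer place record being annotated
    }

# Hypothetical URIs, for illustration only.
annotation = make_trace_annotation(
    place_uri="https://example.org/places/12345",
    resource_uri="https://example.org/edop/signatures/12345",
)
serialized = json.dumps(annotation, indent=2)
```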

Computing Place and World Historical Gazetteer

Computing Place is a research initiative that can also be understood as a downstream application in the ecosystem that World Historical Gazetteer (WHG) was designed to support. WHG’s core contribution is to provide stable, reconcilable place anchors—identifiers and attested names, geometries, and basic relations—while leaving richer thematic data and interpretive detail to external project contributors. Computing Place aligns with that division of labor. It will treat WHG place records as a reference layer, and attach additional evidence to them as linked “traces”: environmental signatures, cultural descriptors, and other place-referenced materials that can be analyzed, compared, and explored. These annotations will be patterned after the Linked Traces Format mentioned above, developed in 2019 under the aegis of the Pelagios Network.

Integration between the two can take several forms over time. In the simplest case, Computing Place’s EDOP module will provide API endpoints that WHG and other clients can call on demand—sending a representative geometry and retrieving environmental context in the form of a concise profile derived from EDOP’s basin-based signatures. Computing Place can also potentially contribute curated sets of place records to WHG that link back to rich Computing Place landing pages for those places; at a larger scale, it can publish an annotation index keyed to WHG identifiers.

In all cases, the intent is not to expand WHG into a thematic encyclopedia, but to use WHG’s place framework as the anchoring substrate linking to a variety of rich, computable descriptions of place, much as many WHG contributors do now.

The Long Roundabout: Adventures in Digital Nomadism

2025-11-26T10:00:00+01:00

This is a tale of my planned and unplanned travel since 2021, an ongoing digital nomad experience I have documented only on Bluesky with captioned photos, and named The Long Roundabout for reasons that will become obvious.

A planned emigration

A little over four years ago, anticipating a migration to Italy, I put my possessions in storage and hit the road. The first three years of grant-funded World Historical Gazetteer development had just ended, and I anticipated co-authoring more grant applications with my colleague Ruth Mostern, while continuing some WHG work on a very part-time basis with modest support from our Dutch partners at the time and some internal sources at Pitt. Already at "retirement age" I imagined that if no further support for WHG was forthcoming I would find a relaxed lifestyle in Italy and look for occasional contracts in Europe. Covid-19 was still rampant, with vaccine certificates and masks required everywhere. I set out for Italy to scout cities and neighborhoods, by a circuitous route as per usual.

Hitting the road, then a roadblock

My first stop was Uzbekistan for 10 days or so, visiting a friend now working and living in Tashkent. He led another friend and me around the major sites in Tashkent, Khiva, Bukhara, and Samarkand. From there it was Vienna for a short visit with colleagues and friends in a place I'd grown very fond of, then Paris to pick up a long-term rental car for the remainder of the 90 days allotted by Schengen rules. From Paris I drove to Zurich, touching base with a friend there, then Bern to visit the Zentrum Paul Klee (yet again). Next it was south through the Alps, luckily with no snow, to my first candidate city, Pavia, and some great pizza with a friend from nearby Milan. Pavia was foggy (the norm, I learned) but the scale and feel of the place felt good and I learned the neighborhoods. My Airbnb hosts said they would be glad to help me get settled in Pavia when the time came.

From there I made my way to Lecce for a month-long stay, my first time in Puglia, with brief stops in some familiar places: Firenze, Siena, and Ascoli Piceno. On the return drive north to Paris I stopped in Pavia again, then overnight at a hotel Napoleon had slept at in lovely Auxerre. With my Schengen budget spent, I returned to Denver and holidays with family, planning to put together my immigration papers for Italy, having settled on Pavia. It was then I discovered belatedly that Italy's tax burden made that plan an impossibility. Drat!

Staying mobile

With my possessions in storage and plans already set for future travel (California visiting relatives, Pittsburgh to help compose a new NEH grant proposal with my WHG teammates, a World History Association conference in Bilbao), I decided to keep moving and let ride the question of where, if not Italy, to ultimately land. Spain and Portugal were possibilities, so I followed up the grant-writing charrette and the Bilbao conference with a long road trip to those countries. Toledo was a long shot but nothing else stuck.

August 2022 brought a new contract for WHG work from the KNAW group in the Netherlands, and I stopped in Vienna for several weeks to do that work. Then, with another KNAW contract in place and my 90-day Schengen budget spent, I went to Zagreb to work on that. A few months' break followed, with visits to Georgia (Tbilisi, Kutaisi, etc.) and Chiang Mai, Thailand, then a return to Denver for the holidays.

WHG gets an NEH renewal!

In January 2023 the WHG team learned we had been awarded a new, quite large NEH grant(!), so I was Pittsburgh-bound again to plan the way forward. During this interval, I was invited by an old friend/colleague to come to Vienna for six months, where he had just become chair of the Geography and Regional Research Department at University of Vienna. So in March 2023, from a desk on campus and a great flat in Döbling, I continued my half-time contract work on WHG extensions, staged an international hackathon for WHG contributions, and met often with my friend and his students to discuss possible collaborations. The question of where to settle was being continually deferred, as it was very clear that I could be very productive working as a digital nomad, and the periodic change of scenery was exhilarating, albeit tiring at times.

Why stop now?

The digital nomad lifestyle combined travel with very productive work periods and conferences and meetings with colleagues in interesting locales. Most of the following year was split between Vienna, the UK, and Turkey, with occasional breaks to the US for family visits. A major Version 3 release was the goal, and with the help of a new developer on the team, Stephen Gadd, we launched V3 in July 2024, a year and a half into the three-year NEH grant. Stephen was now willing and more than able to take over my WHG roles, and so he did. And I "semi-retired" once again, after seven years at the WHG technical helm.

Semi-retirement and GLOS

Never one to stay idle, I began scratching a long-time itch to investigate the distribution of global folklore concepts and motifs, and to use that opportunity to investigate the latest AI methodologies. I had been using ChatGPT exclusively for coding help for several months, but the potential for NLP work with text embeddings beckoned. I conceived a personal project I called Geographic Lens on Stories (GLOS), and spent the next year plus devoted to that, working from Vienna and Thailand with side trips to Japan and Vietnam. GLOS is now paused, and details about it live in a few blog posts (here, here, and here), and on the Github site. A couple of early tools live on a pilot site, glos.kgeographer.org.

Onward

In Spring of 2025, ready to settle again, I made a valiant effort to gain residency in Vienna, but was thwarted at the last minute by Austria's very stringent health insurance requirements. At this writing I am looking closely into a little-known and very promising pathway for immigration and residency in the Netherlands. With GLOS paused for the moment, I am back in the US and preparing to "hang out a shingle" and seek an occasional interesting contract or two, working on DH projects. In any event, The Long Roundabout will almost certainly come to an end mid-2026 after a fruitful and unexpectedly long run as a true digital nomad. My long-time abbreviated research agenda "Computing Place" might now be amended to "Computing Place, in Places."

NB

This post was authored by me alone; however, a chatbot was enlisted to check spelling and punctuation.

Where the Lens Turned Back: Reflections on GLOS

2025-11-20T00:00:00+01:00

When I began the Geographic Lens on Stories (GLOS) project, I imagined building a global atlas of creation myths as a way to map how different cultures have conceived the origins of the world, both conceptually and geographically. I hoped that by using modern NLP tools and embeddings, I could visualize conceptual relationships among myths: a geography of meaning.

The TL;DR: the work hit an epistemological roadblock. Its methods could only reveal the interpretive processes of LLMs and not the conceptual content of the myths themselves in any empirical sense.

Phase One

I spent months digitizing and normalizing 69 myths from Barbara Sproul’s Primal Myths, designing schemas, segmenting texts, and creating embeddings using Anthropic and OpenAI APIs. The work proceeded in phases, beginning with a 20-myth sampling.

The initial schema divided each myth across high-level dimensions: Primordial State (incl. entities present), Creation Sequence (events, incl. participants and locale), Cosmic Structure (incl. social and ecological hierarchies, dualities), and Distinctive Elements (free text). Each myth was submitted to an LLM with an elaborate prompt that requested values (or nulls) for each attribute. Significant manual normalization was required, as the models often described the same concept with divergent terminology.

I then generated whole-myth and section-level embeddings from these extracted values and performed clustering analyses to explore patterns of similarity across cultures and regions. The early results were promising, but the schema itself had been derived from LLM responses and my own non-expert judgment. It lacked grounding in any shared motivating theory and quickly became unwieldy.

Phase Two

In the second phase I added 49 additional myths from around the world and anchored a new schema in a respected work on comparative mythology, The Truth of Myth (Thompson & Schrempp). They propose “points to consider” for new students of mythology along five axes: time, space, quantity, quality, relation. My prompts followed these axes closely, requesting a mixture of free-text and standardized responses.

For example, the space axis asked for narrative location, landscape type, place correspondence, spatial symbolism, and spatial boundaries — all in free text.

Preliminary results suggested that this organization could inform a new “Schema_v2.” But eventually the realization came to me that every attribute in both schemas was again the result of a large language model’s interpretation. I wasn’t extracting meaning; I was eliciting it.

At that point it became clear that what I was analyzing was not the myths themselves, but the hermeneutic machinery of the model.

From Extraction to Interpretation

That realization changed the direction of the project. GLOS became less a study of world mythology and more a study of machine hermeneutics — how an AI trained on vast textual corpora constructs (or invents) meaning when asked to interpret a myth.

I shifted to asking broader, less constrained questions, along four dimensions: Central Metaphors and Oppositions, Cultural Lessons or Functions, Distinctive Features, and Brief Interpretive Commentary. Anthropic’s model Claude summarized the pattern succinctly:

“The model’s interpretive tone emerged clearly. Its commentary style was fluent and persuasive, yet strangely generic: every myth seemed to be about chaos giving way to order, the joining of opposites, the restoration of balance. It had learned the language of comparative mythology and was applying it reflexively. In essence, it was channeling the Lévi-Straussian worldview back at me.”

After reading some Lévi-Strauss, I agreed.

Embeddings as a Mirror of Interpretation

I continued the embedding analysis to see what the geometry of those interpretations might reveal. Each myth now had four section embeddings plus a whole-myth vector. Comparing them showed a clear pattern:

Metaphors and Commentary were the closest pair in embedding space.
The Metaphors → whole-myth similarity was nearly as strong.

A scatterplot across all myths produced a correlation of 0.89 between Commentary–Metaphor similarity and Whole-Myth–Metaphor similarity. Astonishingly high — and deeply telling. It showed that the model’s overall conception of a myth is dominated by its perception of oppositional structure. It was quantitative confirmation of what Lévi-Strauss proposed qualitatively: mythic meaning is organized through oppositions.

Scatterplot of similarity correlations between Commentary–Metaphor and Whole-Myth–Metaphor similarity.
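For readers who want to see how such a figure is computed, here is a minimal sketch: a per-myth cosine similarity for each pair of vectors, then a Pearson correlation across myths. The dictionary keys and the tiny two-dimensional vectors are invented for illustration; the real embeddings have far more dimensions, and this is not the project's actual analysis code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def similarity_correlation(myths):
    """Correlate Commentary-Metaphor similarity with Whole-Myth-Metaphor similarity."""
    commentary_metaphor = [cosine(m["commentary"], m["metaphors"]) for m in myths]
    whole_metaphor = [cosine(m["whole"], m["metaphors"]) for m in myths]
    return pearson(commentary_metaphor, whole_metaphor)

# Toy vectors, invented for illustration only.
myths = [
    {"commentary": [1.0, 0.0], "metaphors": [1.0, 0.0], "whole": [1.0, 0.0]},
    {"commentary": [1.0, 0.0], "metaphors": [1.0, 1.0], "whole": [1.0, 0.5]},
    {"commentary": [1.0, 0.0], "metaphors": [0.0, 1.0], "whole": [1.0, 0.2]},
]
r = similarity_correlation(myths)
```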

Mapping the Model’s World

In the final stage, I turned that insight outward and mapped these embedding similarities across regions. The 69×69 similarity matrices and regional heatmaps showed where the model perceived conceptual proximity — first in commentary (interpretive tone), then in metaphor (conceptual structure).

The results were both revealing and humbling.

In the commentary heatmap, global similarity hovered around 0.6–0.7 across nearly all regions — evidence of how homogenized the model’s interpretive voice is.

In the metaphor map, more structure appeared: China, Japan, and Australia formed tight internal clusters, while Siberia (and again Australia) showed lower similarity to other regions. Whether this reflects genuinely distinctive mythic styles or simply uneven model familiarity is impossible to know.
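A regional heatmap of this kind reduces a myth-by-myth similarity matrix to mean similarities per pair of regions. The sketch below shows one way to do that aggregation, assuming each myth has one vector and one region label; the ids, labels, and vectors are placeholders rather than project data.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def regional_means(vectors, regions):
    """Mean pairwise similarity for each unordered pair of regions.

    vectors: {myth_id: vector}; regions: {myth_id: region label}.
    Each myth pair is counted once and self-pairs are excluded.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    ids = sorted(vectors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            key = tuple(sorted((regions[a], regions[b])))
            sums[key] += cosine(vectors[a], vectors[b])
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Placeholder data, for illustration only.
vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0]}
regions = {"m1": "region_x", "m2": "region_x", "m3": "region_y"}
means = regional_means(vectors, regions)
```

A high within-region mean alongside low cross-region means is the "tight internal cluster" pattern described above.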

In truth, these maps visualize the model’s own geography of interpretive coherence — its zones of certainty and its blind spots — not the world’s mythological landscape.

What Remains

So what has GLOS accomplished? Not the empirical discovery I had once imagined, but something subtler: a demonstration that large language models had spontaneously reproduced the structuralist insight that oppositions organize mythic thought. They do this not because they understand the myths, but because they have internalized the linguistic and conceptual patterns through which humans have described them.

The work has also produced a reusable framework for structuring interpretive text into analyzable semantic layers — and that may prove useful in future collaborations with scholars in folklore or comparative mythology.

But ultimately, this experiment turned the lens back on me. Because I don't possess sufficient expertise in comparative mythology to evaluate the models' interpretations, the GLOS project is suspended unless and until I find a better approach to realizing its goal.

Closing the Circle

One of my earliest GLOS posts was co-authored with ChatGPT, acknowledging from the outset that this was a collaborative experiment in meaning-making. It feels appropriate that this concluding reflection should also be co-written.

I began with the hope of charting a geography of creation myths.
I end with a geography of how a machine thinks about creation myths.

Perhaps that is still a kind of map — a topography not of the world’s stories, but of the interpretive imagination that both humans and machines now share.

Toward a Schema for Creation Myths: Structuring Conceptual Content with Large Language Models

2025-07-04T00:00:00+02:00

The GLOS Project (Github repo) is an inquiry into the geography of stories—developing experimental computational methods to analyze the conceptual content of folklore, beginning with creation myths. At the core of this effort are two premises:

that narrative texts encode cultural knowledge in ways that are neither purely linguistic nor purely thematic, but discoverable as conceptual structures, albeit ones that vary across traditions, locales, and genres.
that the narrative themes and motifs associated with a place are themselves dimensions of that place—offering a lens into its cultural and experiential character

The project’s current focus, the "Phase B" referenced in an earlier post, is on using large language models (LLMs) to help induce a schema—a structured, reusable framework—for describing and comparing the conceptual content of creation myths across cultures. This effort complements traditional folkloristic indexing systems—such as the Aarne-Thompson-Uther (ATU) tale type index and the Thompson Motif Index (TMI)—which classify stories by recurring plot patterns and narrative motifs. While those systems remain foundational, they were not designed with cross-cultural conceptual analysis in mind, and they are limited in the kinds of inferences they enable computationally. Phase A developed pilot tools for exploring those indexes and connections between them (see the GLOS tools site).

From Motif Lists to Conceptual Structures

A schema in this context is a structured set of conceptual categories—like primordial state, cosmic structure, creation sequence, and dualities and oppositions—each of which can be represented with values drawn from narrative evidence or left null when absent. Such a schema would allow myths from very different traditions to be compared not only by plot similarity but by conceptual architecture: what roles are enacted, what principles govern creation, what kinds of boundaries are drawn between divine and mortal, chaos and order, known and unknown.

To approach this, GLOS uses LLMs as instruments for conceptual induction. Specifically, I initially prompted the Claude Sonnet 3.7 model to read a sampling of 20 entire myth texts and extract provisional conceptual structures: clusters or categories that represent the “conceptual skeleton” of each story. These outputs were not accepted uncritically, but served as raw material for further refinement. They were drafted directly from the machine-reading of the corpus, then refined manually, rather than derived from a top-down taxonomy designed in advance.

The resulting initial CreationSchema v1 (view on Github) was used to generate JSON representations of the 20 sample myths. The sample myths were presented to the LLM with an elaborate prompt, requesting that values for each of the schema’s conceptual elements be extracted. These were then manually reviewed and normalized, and normalized terms will inform development of a standard vocabulary of allowed values for several of the fields. A number of other fields allow free text. (An example result)
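As a rough illustration of what a schema-encoded myth and the normalization step might look like, here is a simplified sketch. The four field names echo the attribute categories named in this post, but the flat structure, the `myth_id` field, and the small vocabulary mapping are assumptions; the actual CreationSchema v1 on Github is richer.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class MythRecord:
    """Flattened, hypothetical stand-in for one schema-encoded myth."""
    myth_id: str
    primordial_state: Optional[str] = None      # None when the myth gives no evidence
    creation_sequence: list = field(default_factory=list)
    cosmic_structure: Optional[str] = None
    distinctive_elements: Optional[str] = None  # free text

# Hypothetical controlled vocabulary: variant term -> standard term.
VOCAB = {
    "watery chaos": "primordial waters",
    "primeval ocean": "primordial waters",
}

def normalize(value, vocabulary):
    """Map a model-supplied term to its standard form; unknown terms pass through."""
    if value is None:
        return None
    return vocabulary.get(value.strip().lower(), value)

record = MythRecord(
    myth_id="myth_001",
    primordial_state=normalize("Primeval Ocean", VOCAB),
    creation_sequence=["separation of sky and earth"],
)
as_json = json.dumps(asdict(record), indent=2)
```

Keeping absent attributes as explicit nulls, rather than omitting them, lets later comparisons distinguish "not present in the myth" from "not yet extracted".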

Following that step embeddings were generated using OpenAI's text-embedding-ada-002 model: first for entire myths, then for four of the principal attribute categories: primordial_state, creation_sequence, cosmic_structure, and distinctive_elements. These were used in preliminary similarity comparisons, with promising early results.

Competency Questions

CreationSchema v1 is a first draft. Designing a useful schema or ontology requires identifying a set of questions we wish to ask of the data, known in ontology engineering as competency questions. Work towards a CreationSchema v2 will begin by establishing a set of questions that can be used to evaluate schema effectiveness. Given the current corpus, schema, and metadata, GLOS already supports queries like:

Is creation portrayed as intentional, accidental, or emergent?
Are humans created deliberately, incidentally, or not at all—and from what materials?
Do acts of disobedience or conflict play a creative, destructive, or transformative role in the myth?

With further enrichment of cultural metadata—such as standardizing language families, geographic regions, or religious traditions—more comparative questions may be posed, such as:

Do hierarchical pantheons appear more frequently in Indo-European cultures?
Is divine collaboration more prominent in island or coastal societies?
Are transgressive creation acts less common in oral-tradition cultures than in literate ones?

These questions point to the promise of combining conceptual modeling with lightweight cultural classification not to “solve” mythologies, but to create new instruments for their comparison and exploration.

Improving the Schema

The method described above is exploratory and iterative, and is only the first phase. The next phase involves using clustering and topic modeling tools, specifically BERTopic and Stanford’s LLOOM library, to potentially refine and enrich the CreationSchema v1 organization. These tools allow for the grouping of semantically similar elements and may reveal latent structures that aren’t obvious at the level of a single myth. The aim is to generate a plausible formal CreationSchema v2 that can support nuanced comparative queries.

Working in Parallel: Building the Corpus

In parallel with these modeling experiments, I’m expanding the working corpus. The first source is Barbara Sproul’s Primal Myths: Creation Myths Around the World, a mid-20th century anthology that assembles over 140 creation narratives from cultures around the world. These are being digitized, cleaned, and lightly annotated to support both human and machine-readable analysis. Each myth is treated as a case study for schema testing—both a source of conceptual content and a challenge to the universality of any proposed schema.

Evaluation and Reflexivity

A key concern—especially in a field like folklore—is the epistemological validity of any structure generated by a model trained on vast, culturally uneven corpora. LLMs may reflect dominant narrative tropes or Western scholarly assumptions, and can sometimes hallucinate, interpolate, or over-regularize conceptual structures that are in fact deeply culture-bound.

For that reason, a core component of the GLOS project moving forward is the evaluation of LLM-assisted conceptual induction. This is not simply about whether the models “get it right,” but whether they support the identification of meaningful cultural differences, point to interesting structural convergences, or enable new questions. It is also about testing how different models behave: GPT-3.5 versus GPT-4, Claude versus Gemini, with and without agentive prompt scaffolding.

This reflexive approach—using LLMs not just to generate structure but to interrogate the limits of that structure—is a point of intersection with colleagues in Vienna, where I am planning an extended research residency. That collaboration will focus explicitly on evaluating LLMs as instruments in cross-cultural semantic modeling.

Grounding in Canonical Resources

Although this project diverges from traditional motif indexing, it also draws strength from it. A major early task in GLOS was digitizing the ATU and TMI indexes and developing tools for exploring their conceptual terrain. In response to a request from folklorists at Indiana University's Folklore Institute, I recently developed an expanded interface for browsing and querying those indexes. This work will continue, and these canonical systems remain a vital reference point for validating or challenging what emerges from LLM-based methods.

Looking Ahead

The immediate next steps in GLOS include:

Completing the Primal Myths corpus digitization
Creating a preliminary list of competency questions
Refining CreationSchema V1 into CreationSchema V2, potentially informed by BERTopic and LLOOM results
Testing schema coherence across multiple myths
Evaluating consistency across different LLMs and prompts
Integrating geographic and cultural metadata referents into analyses, where available
Building a lightweight interface for navigating schema-encoded myths

GLOS is a toolmaking project, grounded in a geographical view of stories and a computational view of cultural concepts. It is not folklore theory, and it is not anthropology. But it may hopefully become a meaningful contribution to how we might structure and compare the conceptual content of narratives at scale. NB. The use of the generic term "stories" is deliberate - there are many other kinds to consider!

Postscript

I am aware that this work raises questions—about the role of AI in interpretive scholarship, about cultural specificity and universality, and about the appropriateness of schema models for complex cultural forms. These are not afterthoughts but part of the project’s evolving architecture. I welcome critical perspectives, and I am grateful for those already offered.

GLOS: A Geographer’s Computational Journey into Story

2025-06-11T00:00:00+02:00

This post is co-authored with ChatGPT, as promised.

Introduction

In a recent post, I described how large language models (LLMs) and chatbots like ChatGPT have begun playing a role in my work. This is the follow-up I promised—a deeper look into that work itself. It’s called GLOS, short for Geographic Lens on Stories. What began as vaguely formed ideas about applying computational methods to cultural narrative has taken on shape, tools, and even a kind of mission.

As a retired geographer with a lifelong interest in folklore and myth, and not a trained folklorist, I approach this as a professional interloper—albeit a respectful one. GLOS isn’t a grant-funded lab project; it’s a post-career intellectual venture shaped by curiosity, technical skills, and the freedom to explore.

What Is GLOS?

At its core, GLOS asks a set of simple but ambitious questions:

Can computational methods help reveal cultural, geographic, and conceptual patterns in traditional stories? And, are stories dimensions of place?

To explore this, GLOS is unfolding in and across two interrelated phases:

Phase A: Digitizing and computationally modeling existing reference systems in folkloristics (ATU and TMI)
Phase B: Building a structured, analyzable corpus of global creation myths

The two phases are not strictly sequential—they proceed in parallel, informing and shaping each other as they go.

Phase A: Indexing the Indexes

The starting point was to bring structure and semantic accessibility to two foundational resources in folk narrative studies:

The Aarne–Thompson–Uther (ATU) Index: classifying folktales by tale type
The Thompson Motif Index (TMI): a massive catalog of narrative elements, or motifs

Harvard's Library Research Guide for Folklore and Mythology outlines the structure of both of these canonical indexes.

These were digitized into a normalized relational database and enriched using machine learning techniques. Specifically, I generated embeddings for:

46,245 motifs
2,232 tale types

This set the stage for two early tools. The first was the Concept Matcher, which allows users to input a snippet of text and retrieve the nearest motifs or tale types in semantic space. While intriguing, it quickly became clear that:

Motif descriptions are often too short to yield meaningful embeddings
Precision and recall were mediocre—good enough for a demo, but not for serious inference
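The matching step can be sketched as a nearest-neighbor search in vector space. To stay self-contained, the example below swaps in a toy bag-of-words vector for a real embedding model, and the catalog entries are invented placeholders rather than actual TMI motifs.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(query, catalog, k=2):
    """Return the k catalog entries most similar to the query text."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(desc)), entry_id) for entry_id, desc in catalog.items()),
        reverse=True,
    )
    return [(entry_id, round(score, 3)) for score, entry_id in scored[:k]]

# Placeholder catalog, for illustration only.
catalog = {
    "motif_1": "earth created from water",
    "motif_2": "theft of fire",
    "motif_3": "flood destroys the earth",
}
matches = nearest("earth rises from the water", catalog)
```

With descriptions only three or four words long, a single shared word moves the score substantially, which gives a feel for why very short motif descriptions make for unstable matches.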

This led to a pivot. In response to feedback from academic folklorists, I developed a second tool:

The ATU–TMI Cross-Reference Tool: lets users view tale types alongside the motifs they contain in a structured, explorable interface
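A cross-reference of this kind is a many-to-many join between tale types and motifs. The sketch below shows one plausible normalized layout using an in-memory SQLite database; the table names, column names, and rows are assumptions for illustration, not the actual GLOS schema or index content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tale_types (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE motifs (id TEXT PRIMARY KEY, description TEXT);
-- Junction table: one row per (tale type, motif) pairing.
CREATE TABLE tale_type_motifs (
    tale_type_id TEXT REFERENCES tale_types(id),
    motif_id TEXT REFERENCES motifs(id),
    PRIMARY KEY (tale_type_id, motif_id)
);
""")

# Placeholder rows, for illustration only.
conn.executemany("INSERT INTO tale_types VALUES (?, ?)",
                 [("type_1", "placeholder tale type one"),
                  ("type_2", "placeholder tale type two")])
conn.executemany("INSERT INTO motifs VALUES (?, ?)",
                 [("motif_1", "placeholder motif one"),
                  ("motif_2", "placeholder motif two"),
                  ("motif_3", "placeholder motif three")])
conn.executemany("INSERT INTO tale_type_motifs VALUES (?, ?)",
                 [("type_1", "motif_1"), ("type_1", "motif_2"),
                  ("type_2", "motif_2"), ("type_2", "motif_3")])

def motifs_for(connection, tale_type_id):
    """List (motif id, description) pairs attached to one tale type."""
    rows = connection.execute(
        """SELECT m.id, m.description
           FROM tale_type_motifs AS x
           JOIN motifs AS m ON m.id = x.motif_id
           WHERE x.tale_type_id = ?
           ORDER BY m.id""",
        (tale_type_id,),
    )
    return rows.fetchall()

type_1_motifs = motifs_for(conn, "type_1")
```

The same junction table answers the reverse question (which tale types contain a given motif) by filtering on `motif_id` instead.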

This tool will soon be presented informally to folklorists for feedback and further refinement. Phase A, in that sense, is very much ongoing.

Phase B: Creation Myths and Conceptual Modeling

While Phase A engaged with existing reference systems, Phase B turns to narrative itself—specifically, the genre of creation myths.

Using Barbara Sproul’s Primal Myths: Creation Myths Around the World as a source, I've built a curated test corpus of myths from numerous societal traditions. Each was scanned, OCR-processed, cleaned, and structured into JSON-LD files with key metadata. Then came the conceptual modeling.

Using LLMs, I extracted recurring elements from these myths to begin assembling a draft structured schema and vocabulary. While not a formal ontology, this preliminary schema has been used to guide an LLM in distinguishing several elements of creation myths:

Key events in sequence
Entities as participants in events, including classes of actors in roles (creator, transformer, rebel, etc.), and various artifacts and natural objects
Cosmic structure (sky, sea, underworld)
Thematic dualities (order/chaos, male/female, light/dark)

It’s a speculative schema, and clearly only applicable at this stage to creation myths. But it lays groundwork for a kind of computational comparative mythology.

Much depends on future validation by experts in comparative mythology to determine whether these structures resonate or miss the mark.

Looking Ahead: Retrieval, Visualization, and Place

Three significant methodological tracks are in GLOS’s future, beginning with refinement of the schema-induction experiments.

Refining the Schema Induction method

Early results from Phase B are promising, yet underscore the need for deeper experimentation. Upcoming work will explore tools like BERTopic, Stanford’s LLOOM package for concept induction, and other unsupervised methods to identify, cluster, and validate conceptual components across myth texts. The goal is to move toward a reusable, structured schema capable of supporting large-scale cross-cultural comparisons.

Retrieval-Augmented Generation (RAG)

RAG allows language models to incorporate external knowledge in real-time, and it’s a promising way to improve LLM reasoning across the GLOS corpus. Instead of relying purely on embedding similarity, RAG workflows can support question answering, summarization, or inference grounded in structured data.
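In outline, a RAG workflow retrieves the most relevant records first and then hands them to the model as context. The sketch below shows only those two steps, with no model call; the token-overlap scorer is a stand-in for embedding search, and the corpus snippets are invented placeholders.

```python
def score(question, passage):
    """Toy relevance score: count of shared lowercase tokens (stand-in for embedding search)."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def retrieve(question, corpus, k=2):
    """Return the k (doc_id, text) pairs that score highest against the question."""
    ranked = sorted(corpus.items(),
                    key=lambda item: score(question, item[1]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, corpus, k=2):
    """Assemble a prompt that grounds the question in the retrieved passages."""
    context = "\n".join(f"[{doc_id}] {text}"
                        for doc_id, text in retrieve(question, corpus, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}")

# Placeholder corpus, for illustration only.
corpus = {
    "myth_001": "a creator shapes humans from clay",
    "myth_002": "the sky is lifted away from the earth",
    "myth_003": "humans emerge from a cave",
}
prompt = build_prompt("what are humans made from", corpus, k=2)
```

The resulting string would then be sent to whichever model is in use; tagging each passage with its id makes it possible to ask the model to cite the records it relied on.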

Geography and Visualization

The geographic component of GLOS—its namesake lens—is both central and complex. Cultural metadata for tales and myths varies widely: sometimes it's a country, sometimes a language family, a religion, or a historical polity.

While this diversity resists clean mapping, it doesn’t preclude it. There will be geographic maps—but they will be embedded in a dashboard of visualizations, including:

Thematic clustering
Narrative structure maps
Motif density distributions
Cross-cultural timelines

Together, these will offer multiple ways to “see” how stories are structured and shared across cultures and regions.

Why Do This?

Because it’s deeply interesting.

Because the stories people tell—about the beginning of the world and of particular societies, the meaning of life and death, the sources of knowledge—are variously universal and profoundly local.

Because stories may themselves be considered and analyzed as dimensions of place.

And because the tools now available to us—LLMs, embeddings, RAG, semantic visualization—create a new frontier for exploring these stories at scale.

GLOS is also becoming a meta-project, one that will evaluate the performance of AI models in aspects of the humanities. Can LLMs meaningfully support cultural analysis? Where do they falter? This evaluative angle may become one of the project’s most significant contributions.

Co-Authoring with ChatGPT

This blog post—like much of GLOS—is co-authored with a chatbot. That’s not a gimmick, and it’s not outsourcing. It’s a method.

ChatGPT acts as an idea bouncer, a paragraph generator, a software design and coding assistant, and sometimes a devil’s advocate. It doesn’t know what myths mean, but it can help me sort them, structure them, and propose ways to think about them.

This post was written through dialogue—my prompts, my revisions, my judgment—but its speed and shape were made possible by AI.

What’s Next?

Expand the myth corpus, especially with better cultural metadata
Refine the conceptual schema with feedback from scholars and new methods
Build out RAG-based tools for interacting with the data
Develop the visualization dashboard, including geographic maps
Continue probing the limits and affordances of AI in folklore research
Invite collaborators from digital humanities, folklore, and geography

Closing

GLOS is an evolving experimental project, but it has already shown (me) that traditional stories, approached computationally, yield surprising structures and fascinating patterns.

For more, visit:

🌐 glos.kgeographer.org
🧾 GLOS on GitHub

Inquiries of all kinds always welcome: karl[at]kgeographer[dot]org

GLOS and the Machine

2025-04-27T00:00:00+02:00

I will soon be posting an update here on my Geographic Lens on Stories (GLOS) project as a follow-up to this one, and it will be co-authored with OpenAI’s ChatGPT (!).

GLOS has very clearly become a collaboration with both ChatGPT and Anthropic’s Claude, one that has evolved in unexpected ways I reflect on here (unaided by a bot…well apart from a word tweak or two). But first, as a reminder, the broad goals for GLOS include:

Developing a methodology for extending the dimensions of place computationally to include the conceptual content of text and spoken word emerging from places—beginning with folkloric text;
Learning about the recently emerging NLP and cultural analytic methods based on LLMs, embeddings, and machine learning generally;
Experiencing, evaluating, and grappling with the practice of collaborating with a machine in the design and execution of a research project and publication of its results and tools.

I use the term “collaborate” advisedly. Collaboration with a machine is on its face an odd concept, but as the project has evolved over several months it has become clear that is what is happening. With all (deserved) humility, what has been accomplished so far could not have happened without both ChatGPT and Claude (“the bots”). The “Path Forward” described in the forthcoming project update post will likewise rely on both. That is, while I am capable of designing and building a GLOS project alone, it would be inferior and take far longer.

Some roles the bots have played so far include:

providing helpful background and references to help familiarize me with the field of folkloristics (I have no training in it), and more specifically, the computational folkloristics practiced by a relative handful of scholars to date.
tutoring me in (i) the ins and outs of LLM-based methods, contrasting them with my decade-old NLP methods; (ii) the API services provided by OpenAI and Anthropic; (iii) approaches to statistical validation of results
providing encouraging and at times helpful feedback to my conceptual framing of the GLOS project
generating (at impossible speed) unlimited high quality Python scripts to implement tasks I design, then troubleshooting and/or refining them to my specification to make them work as required
drafting elaborate natural language prompts to (i) derive categorical structure from a natural language text corpus; (ii) derive natural language prompts used to create embeddings, from the data held in that categorical structure

This entire exercise has not been without challenges, hiccups, and a bit of nonsense. As everyone knows, bots using LLMs can and do get things wrong sometimes. To the extent I can catch errors they will earnestly try to fix them.

It is hard for me to express how extraordinary this new technology’s impact has been. I think folks who don’t write code may not appreciate the bots’ capabilities in that respect. ChatGPT is no slouch at coding, but Claude is astounding. It is not only a question of incredible time savings, but quality. Both know more Python than I ever will (NB: I’m pretty handy with it). Beyond that, they know what is “pythonic,” a principle defined by ChatGPT as code that is “beautiful, idiomatic, clear, and respectful of Python’s strengths.”

In short, while the bots I’m working with are not partners in any human sense, they have become indispensable in ways I hadn’t anticipated. In the next post you’ll see how they "think" of what we’ve done so far and plan to do going forward.

GLOS (Geographic Lens on Stories)

2024-11-30T00:00:00+01:00

I have begun work on a new project, tentatively named GLOS, standing for Geographic Lens on Stories. The broad goal is to develop digital tools to aid in the comparative analysis of folkloric text traditions from around the world, with a focus on the relationship between stories and the places and societies they emerge from. One guiding premise is that the elements of folklore emerging from societies in a place are in some sense descriptive dimensions of that place. These elements include all categories of folkloric material, but the immediate concern of the GLOS project is stories: folktales, fairy tales, myths, epics, and so forth.

Qualifiers, more premises, and expectations

I do not have training in, or specialized knowledge of, folkloristics, nor of the anthropological themes closely associated with folklore. I approach this from a geographic perspective; I do have considerable knowledge and experience in representing place computationally, and I do have experience and skills in various natural language processing (NLP) methods. While I do expect GLOS work to elicit interesting distributional patterns, explanations for differences and similarities between story motifs and types will always include non-spatial factors I am not competent to analyze. Nevertheless, I expect to be able to generate and publish interesting patterns and will invite their analysis by others who do have interest and expertise.

Another premise of GLOS is that some of the emerging methodologies in NLP and machine learning, for example embeddings and large language models (LLMs), can be usefully applied to folkloric text in novel ways.

Initial data preparation

I began by making digital text representations of the two canonical folktale indexes I'm aware of: Motif-index of folk-literature (Stith Thompson, 1966) and The Types of International Folktales: A Classification and Bibliography (Hans-Jörg Uther, 2011). The latter is commonly referred to as the "ATU index" because it folds together work from Antti Aarne's The Types of the Folk-tale (1928), Thompson's motif index (TMI), and Uther's own considerable efforts. In each case, following OCR and cleaning, I parsed the cleaned contents into structured and normalized formats amenable to storing in a relational database. The results include tables with the unique identifiers and text content of the TMI motifs and the ATU tale types.

There is also considerable metadata to be found in both the TMI and ATU, including references to the sources of cited tales, the societies and geographic areas they are associated with, and in the ATU, elaborate cross-referencing to related tale types and to the TMI motifs. These too have been parsed and stored in relational tables.
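
A relational layout along these lines might look like the following minimal sketch, using Python's built-in sqlite3 module. The table and column names (`motif`, `tale_type`, `tale_type_motif`, `content`) and the placeholder identifiers are assumptions for illustration, not the actual GLOS schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE motif (
        motif_id TEXT PRIMARY KEY,
        content  TEXT NOT NULL
    );
    CREATE TABLE tale_type (
        type_id TEXT PRIMARY KEY,
        content TEXT NOT NULL
    );
    -- Cross-reference table: which motifs a tale type cites.
    CREATE TABLE tale_type_motif (
        type_id  TEXT NOT NULL REFERENCES tale_type(type_id),
        motif_id TEXT NOT NULL REFERENCES motif(motif_id),
        PRIMARY KEY (type_id, motif_id)
    );
    """
)

# Hypothetical placeholder rows, not real index entries.
conn.execute("INSERT INTO motif VALUES (?, ?)", ("M1", "placeholder motif text"))
conn.execute("INSERT INTO tale_type VALUES (?, ?)", ("T1", "placeholder tale type text"))
conn.execute("INSERT INTO tale_type_motif VALUES (?, ?)", ("T1", "M1"))

# Follow the cross-reference from a tale type to its motifs.
rows = conn.execute(
    """
    SELECT m.motif_id, m.content
    FROM tale_type_motif AS tm
    JOIN motif AS m ON m.motif_id = tm.motif_id
    WHERE tm.type_id = ?
    """,
    ("T1",),
).fetchall()
print(rows)
```

Source citations and place references could hang off the same identifiers in further tables of the same shape.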

One of the outstanding challenges is establishing a geographic location for given motifs and tale types, so that they may be mapped and analyzed spatially. The data includes, variously, 1) names of countries, provinces, and historical regions; 2) names of indigenous tribes and societies; 3) languages. The task of establishing some kind of normalized gazetteer of place references is therefore quite complex, and a comprehensive "where" attribute remains incomplete. For example, what is the geographic footprint of a tale type tagged only as "Hebrew" or "Spanish"? How can the locations of tribal territories or historical regions be represented given the dearth of spatial data for many of them? This is a major challenge that will require considerable research and expert consultation.

Initial computation

Leaving aside geography for the time being, I proceeded to generate vector embeddings for the text of each of the 46,234 motifs in the TMI and 2,232 tale types in the ATU, using the OpenAI model, text-embedding-3-small. These are stored in relational tables, and wired to a very simple pilot web interface that allows a user to paste any piece of text into a form, and find the 10 motifs or tale types (a user choice) that are nearest neighbors in that 1536-dimension vector space.
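
The nearest-neighbor lookup can be illustrated with a minimal pure-Python sketch. It assumes cosine similarity as the distance measure and uses toy three-dimensional vectors with hypothetical identifiers in place of the real 1536-dimension embeddings; the function names are likewise illustrative, not the pilot tool's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, items, k=10):
    """Return the k (item_id, score) pairs most similar to query_vec, best first."""
    scored = [(item_id, cosine_similarity(query_vec, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for stored embeddings (hypothetical IDs and values).
toy_motifs = {
    "motif_a": [1.0, 0.0, 0.0],
    "motif_b": [0.9, 0.1, 0.0],
    "motif_c": [0.0, 1.0, 0.0],
}

# In practice the query vector would come from embedding the pasted text.
top = nearest_neighbors([1.0, 0.0, 0.0], toy_motifs, k=2)
print(top)
```

A brute-force scan like this is workable at the scale of tens of thousands of items; a vector index becomes worthwhile only for much larger collections.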

Initial results

This first GLOS tool (tentatively Embedding Explorer) generates some interesting but essentially inadequate results, indicating that other, more sophisticated approaches are necessary.

One goal of the tools is to find conceptual similarity without relying on term frequencies and co-occurrence, as embedding technology promises. A search for a term like "altruism" should identify motifs and tale types that concern that concept in any of a number of societally differentiated ways. For example, an altruistic act in one tradition might involve communal sharing of resources, while in another it might emphasize individual sacrifice for the greater good. Such nuances are important for understanding the cultural lenses through which altruism is interpreted. The tool succeeds to some extent - results for "altruism" include motifs and tales concerning charity, hospitality, and various acts of giving - but not all of the 10 nearest neighbors are recognizably conceptual neighbors, and it seems likely the method will miss some important matches.

Possible next steps

Some things I am considering:

Project corpora and fine-tuning. For instance, a corpus could include mythological texts from specific regions or curated examples of folk narratives with well-documented motifs and themes. Fine-tuning an LLM on this corpus could help improve the model’s ability to detect nuanced conceptual similarities and cultural context in motif analysis. This step might also involve aligning embeddings with external ontologies of folklore and mythology.
Enhance and extend the Embedding Explorer tool. Once results are improved, add the ability to "drill down" and explore connections. The data includes relational information, like co-occurrence of motifs and types, and could support various follow-up queries and graph navigation of those relationships.
Once geographic information is made comprehensive, many interactive maps will be possible.

Feedback and suggestions are welcome!

The GLOS project has just begun, and I am feeling my way forward, reading disciplinary literature, coming up to speed on the fast-expanding AI methodologies that could help, and searching for prospective corpus material.

If any of this interests, or you have questions or comments (including critiques), please get in touch: karl [at] kgeographer [dot] org.

Another Dimension of Place: Exploring the Geography of Folkloric Motifs

2024-09-02T13:10:00+02:00

In this new "semi-retirement" phase of my life as a professional geographer and Digital Humanities Research Developer™, I have begun planning an ambitious new project that continues my passion for "computing place." [1] The project will leverage my expertise in geospatial, geo-semantic and textual methods to further develop and explore formalizations of another dimension of place: the folkloric motifs that emerge from and are associated with different regions and societies.

The core of this project involves creating a vector database of embeddings [2] for the approximately 2,500 folklore motifs outlined in the Aarne-Thompson-Uther (ATU) Folkloric Index [3]. With these embeddings, the system can generate a 2,500-dimensional motif "signature" for any given piece of folkloric text. By combining these motif signatures with spatial and temporal metadata, I hope to characterize and compare the conceptual content of folklore as it relates to specific places. The system would be made freely available and its development open to collaboration.
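
One way to read the "signature" idea is as a list of similarity scores, one per motif, in a fixed motif order. The sketch below takes that reading as an assumption; the toy two-dimensional vectors and the function names are hypothetical, and a real signature would have one entry for each of the roughly 2,500 motifs.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it has zero length)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else list(vec)


def motif_signature(text_vec, motif_vecs):
    """One cosine-similarity score per motif, in the fixed order of motif_vecs."""
    unit_text = normalize(text_vec)
    return [
        sum(t * m for t, m in zip(unit_text, normalize(motif_vec)))
        for motif_vec in motif_vecs
    ]


# Three toy motif embeddings; the signature therefore has three entries.
toy_motif_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
signature = motif_signature([2.0, 0.0], toy_motif_vecs)
print(signature)
```

Because every text is scored against the same ordered motif list, two signatures can be compared entry by entry, which is what would make place-to-place comparison possible.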

This approach treats the narratives and motifs emerging from a place as essential components of its identity, offering a novel way to describe, compare, and contrast regions based on their folkloric output. The scope would be global. Notwithstanding the extraordinary work in this vein done by Timothy Tangherlini and colleagues over the past two decades [4], I believe that considering the conceptual content of literature and vernacular narrative as a dimension of place is a relatively under-explored notion — certainly at a global scope. I believe it holds great potential for deepening our understanding of how cultural narratives shape and reflect the character of places.

The embeddings for each narrative motif will be generated using a Large Language Model (LLM), prompted with terms from the ATU index to create detailed synopses [5]. An essential early step in this process will be expert evaluation of the material generated by the LLM, leading to refinement of the prompts or, if necessary, reconsideration of the method. My goal is to ensure that the embeddings accurately capture the conceptual content of the ATU motifs they represent.

While my academic background is in Geographic Information Science, my professional journey as a research developer has consistently involved developing models to better represent the dynamic nature of place, in the context of collaborative projects from other fields, including History, Archaeology, Political Science, Literary Studies, and Environmental Science. The recent and ongoing explosion of capabilities in machine learning and generative language models is presenting opportunities for novel approaches to place-based cultural analytics.

I’m reaching out to scholars and practitioners in folklore studies, literary analysis, and related fields to gather preliminary feedback on this initiative. I am certainly not expert in these areas, but rather a methods-focused researcher keen on applying new tools to the study of place. I believe this project has the potential to offer valuable insights into how the stories we tell are tied to the places we inhabit, and I would greatly appreciate your thoughts and advice as I move forward.

[1] my favorite Rumi quote is "Start a huge, foolish project. Like Noah."

[2] Embeddings are a way to represent complex data, like the words and sentences representing concepts, as multi-dimensional vectors (essentially, lists of numbers) in such a way that similar items are closer together in that vector "space." These relative positions capture semantic relationships between concepts in a format that computers can process. A vector database stores these embeddings, allowing for efficient searching and comparison based on their semantics.

[3] See Library Research Guide for Folklore and Mythology: Tale-Types & Motifs

[4] Timothy Tangherlini: Bio; academia.edu Folklore Macroscope Tools

[5] Prompts could take two forms, requesting either 1) a structural description of the motif outlining the essential components and patterns that define it, such as character roles, key events, conflicts, resolutions, and themes, or 2) an example narrative representative of it. The first is likely to be the most effective.

Turning the page

2024-08-26T10:00:00+02:00

Having moved on from World Historical Gazetteer, I am now experimenting with graphs, discrete global grids (S2 specifically), and various AI techniques, aimed at new approaches to "computing place." Still in digital nomad mode, I am writing this from Vienna, Austria where I'll be through October 2024.

Graphs

I've developed a script to serialize WHG data as Turtle RDF, for import into a GraphDB database. It works fine for an initial export of 20k sample records; now to run it for all 2.2 million WHG places. After that I'd like to add all of the Getty Thesaurus of Geographic Names (TGN), Wikidata places, and GeoNames. Records for the same place in different datasets will be linked, passively so to speak, by virtue of shared URIs. WHG currently does quite a bit of linking in this way, but a graph database should allow for more sophisticated recursive queries and analytics. If this proves useful, I'll propose it as a new feature for WHG.
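
As a rough illustration of the idea (not the actual export script), the sketch below writes two place records as Turtle by hand. The record structure, the example.org URIs, and the choice of `rdfs:label` and `skos:closeMatch` as predicates are all assumptions; a real export would more likely use an RDF library and the WHG data model.

```python
PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)


def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def place_to_turtle(place):
    """Serialize one place record (a plain dict) as a Turtle subject block."""
    statements = [f"rdfs:label {turtle_literal(place['title'])}"]
    if place["links"]:
        targets = ", ".join(f"<{uri}>" for uri in place["links"])
        statements.append(f"skos:closeMatch {targets}")
    body = " ;\n    ".join(statements)
    return f"<{place['uri']}>\n    {body} ."


# Two hypothetical records from different datasets that cite the same external URI.
sample_places = [
    {
        "uri": "http://example.org/dataset-a/1",
        "title": "Sample Place",
        "links": ["http://example.org/shared/123"],
    },
    {
        "uri": "http://example.org/dataset-b/9",
        "title": 'Sample "Place"',
        "links": ["http://example.org/shared/123"],
    },
]

turtle_doc = PREFIXES + "\n" + "\n\n".join(place_to_turtle(p) for p in sample_places) + "\n"
print(turtle_doc)
```

Once loaded into a graph store, the two records are connected through the shared URI without any explicit reconciliation step, which is the passive linking described above.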

Discrete Global Grids

I've been experimenting with Google's S2 geometry library, which provides a way to index the surface of a sphere. I've begun computing a set of one or more S2 grid IDs for each WHG place, which may prove useful for visualizing the rough extents of historical regions. Early stages, but I'm curious about the possibilities.

AI

I've begun investigating Retrieval Augmented Generation (RAG) as a method for augmenting prompts made to large language models (LLMs) with high-quality contextual information. I'm interested in the potential for using this to enhance descriptions of historical places--possibly with WHG data, but also with other sources.

Cultural Heritage, NLP and AI

One dataset I've nearly finished developing comprises textual descriptions of UNESCO's 730 Intangible Cultural Heritage (ICH) elements, drawn from short synopses and much longer published nomination documents. Each element is associated with some places or named geographic areas at varying scales. This is a baby step towards a long-held goal of mine: to represent and geographically index cultural practices and traditions, for use in historical research and education.

There have been advances in topic modeling since I last did several projects using Latent Dirichlet Allocation (LDA). I'm looking forward to trying out some of the newer techniques, and to integrating them with the graph database and S2 grid experiments.