
Once Upon A Time: The Behaviour Space(s) of Stories

What follows is the script I laid out for myself for my DH Benelux 2026 conference keynote address.

Bolded square-bracketed text indicates where I wanted to move along in my slide deck, italicized and/or bolded text indicates stage directions to me for things I really wanted to emphasize and stress (to different degrees). I write across a couple of different devices and platforms, which all leave weirdness (inconsistent quotations, etc) and sometimes, I write when I’m really tired. (Update: I noticed really weird vertical spacing in this after I posted it. I pasted from Word into this useless Gutenberg editor, and so it turned every paragraph into a new block and added spacing and I’m too tired to fight. Sorry.)

What I delivered today started with this script, but then got filtered through performance, jet lag, and caffeine. But it more or less captures what I was trying to say. And it seemed to resonate! I thank the participants for their kindness, tolerance, and attention, and I thank the organizers for inviting me. This was fun.

Slides live at https://slides.com/shawngraham/dhbenelux and you’re welcome to go have a look.


Once Upon a Time: The Behaviour Space(s) of Stories

Hello everyone. You can follow along with that URL over there – slides.com/shawngraham/dhbenelux

Act One: Story shapes are real

Act Two: …and we are in them…

Act Three: …and we can only escape through the generosity of failure

Thank you for having me; it’s an honour to be here and to share with you some things I’ve been thinking about, which in short form look like this… but in a more profound sense look like this:

“People think that stories are shaped by people. In fact, it’s the other way around… Stories are a parasitical life form, warping lives in the service only of the story itself.” – Terry Pratchett, Witches Abroad

This is a big hint of what’s to come. I think it only fair to warn you that when it comes to stories, I find the work of Terry Pratchett useful to think with and think through. So… Let me tell you a story.

Two People

Two stories in fact.

Once upon a time there was a man called ‘Alex’. Alex has always worked with his hands, but that’s not to say he’s uneducated or not astute. He can design things, draw them out, and make them real in the world. He’s a smart guy. He lives alone and in a lot of ways, the world is passing him by. He uses Facebook to connect with people from previous eras in his life, and he picks up a lot of what’s happening in the world from that site. Nowadays in Canada, though, Facebook won’t let people post links to actual news sites, the CBC and so on, and Alex’s digital habits have migrated to YouTube. He spends a lot of time there, and YouTube of course has picked up on his habit of watching “news” that reflects his ‘common sense’ understandings of the world back to him. You know where this is going: that positive feedback loop keeps shifting further and further rightwards, and that mild preference for ‘things that make sense’ has become a closed world where folks like him are under siege. Things are pretty dark there. Once, there was a man named Alex, and he fell into a gravity well. The attractor did the rest.

Once upon a time there was a woman named ‘Julie’. Julie’s first language isn’t English, but she’s a student at my university. She’s smart and curious about the world and wants to learn. She has a small scholarship that depends on her keeping a certain average grade, but it isn’t enough to cover the full cost of tuition, or the full cost of living. She took a part-time job that eats nearly thirty hours a week, and now she’s falling behind in her school work. She still has another two years to go in her degree and she’s really feeling the pressure to perform. If her grades drop, the money stops, and she’ll have to drop out and this will all have been for nothing. She wants to learn, but she feels she can’t take any chances; all she can do is return what the prof seems to want. She started using a GPT service to polish her writing and check her logic. She knows chapter and verse why she shouldn’t, but… other students are doing it and getting good grades, and that doesn’t seem fair… so why not her? She lets it rephrase things so that she can really express herself, she says… but she’s finding she’s using it more and more and understanding less and less and now she is using it every day just to keep up. She rationalizes this situation by suggesting the point of a degree is the credential, right? Once, there was a woman named Julie, and she fell into a gravity well, caught in a system’s attractor, where the path of least resistance led away from the things she came here for.

Alex and Julie are trapped by geometry, in a phase space rolling downhill in grooves they can’t escape on their own.

little red riding hood meets magrat 

Once upon a time. C’era una volta. Il était une fois. Es war einmal.

We all know how the best fairy tales start. They transport us to a world different than our own, where we might not know the rules yet, but we learn them in the telling. We see a child skipping through the forest and we know (or think we know) what is coming their way. A tale starts with six teenagers breaking into a cabin to spend the night after partying at the lake all day, and you just know it is going to get bleak.

Fairy tales are always with us. Now they are on TikTok, on YouTube, on LinkedIn, in urban legends and the neighborhood Facebook group. And what Pratchett was arguing, for comedic effect, is that stories have an existence that exceeds the intentions of their tellers. Each time a story gets told, Pratchett says, it deepens the groove. Fall into that groove, and you are pretty much unable to climb out until you reach the story’s ending.

I like to think that Pratchett actually had chaos and complexity theory in mind when he came up with this, because his formulation is exactly a description of attractor dynamics in a complex system.

To see what I mean by grooves and attractors and the spaces within stories, let me walk through a small experiment. It’s just a little toy to show that story shapes are computationally tractable, that we can see something of their geometry, and that this visibility has implications.

quick precis on how triplets become embeddings, key model approaches

Once upon a time, a little girl in a red riding hood travelled all alone through the woods to her grandma’s house to deliver a basket of goodies. Along the way she met the big bad wolf… If we could see the shape of this story, the embedded grooves in it, what would we see? One way to approach this is to create a knowledge graph of all the different actors and their relationships, and then express that graph as an embedding model of the story. This will let us see something of the push and pull of different grooves and attractors. A girl walks through the forest. The wolf eats the grandmother. The woodcutter kills the wolf. In my version of the story I end up with fourteen statements. An embedding model takes those statements and turns them into a list of numbers that express a direction in space. The latitude and longitude of Maastricht is a 2 dimensional vector. You can measure the difference, geometrically, with the 2 dimensional vector of Ottawa, and learn something new about the world and the space between, without knowing anything else about the two cities. Embedding models are a bit like that. For my embedding model, because I’m not that clever, I went with 3 dimensions. Getting from a knowledge graph to an embedding model is where the geometry comes into play.
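The Maastricht and Ottawa comparison can be made concrete with a minimal sketch in Python. The coordinates are approximate values assumed for illustration, and the haversine formula stands in for ‘measuring the difference geometrically’; an embedding model does something analogous with learned vectors rather than map coordinates.

```python
import math

# Approximate (latitude, longitude) in degrees; assumed for illustration.
MAASTRICHT = (50.85, 5.69)
OTTAWA = (45.42, -75.70)


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Two numbers per city are enough to say how far apart they are,
# without knowing anything else about either place.
distance = great_circle_km(MAASTRICHT, OTTAWA)
print(f"{distance:.0f} km")
```

Each city is a two-number vector, and the distance falls out of the geometry alone. The toy story embeddings work the same way, just with three numbers per entity and positions that are learned rather than surveyed.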

There are a variety of mathematical techniques we could use; here, we’ll use ComplEx, DistMult, and RotatE, which represent relationships in slightly different ways, with different consequences. Basically, something like ComplEx can understand that the directionality implied in a ‘student’ -> ‘teacher’ relationship is different from that implied by a ‘teacher’ -> ‘student’ relationship, while DistMult will treat them the same (‘there’s a relationship here, don’t bother refining it’). RotatE, on the other hand, sees relationships as rotations around a circle, and that lets it model relationships that are symmetrical, inverted, or composite (where if A connects to B and B connects to C, there is an implied linkage from A to C). That’s about as complex as I’m going to get here talking about the math.
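For readers who want to see the difference rather than take it on trust, here is a rough sketch of the three scoring functions in plain Python. The vectors are made-up numbers, not trained embeddings, and the variable names are hypothetical; the point is only that DistMult gives the same score in both directions, ComplEx does not, and a RotatE relation can be undone by rotating back.

```python
import cmath


def distmult(h, r, t):
    """DistMult: sum of element-wise products of real vectors."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: real part of sum(h * r * conj(t)) over complex vectors."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, r, t):
    """RotatE: negative distance between the rotated head and the tail."""
    return -sum(abs(hi * ri - ti) for hi, ri, ti in zip(h, r, t))


# Made-up 3-dimensional vectors, for illustration only.
student_real, rel_real, teacher_real = [1.0, 0.5, -1.0], [0.3, -0.7, 0.1], [2.0, 1.0, 0.5]
student = [1 + 2j, 0.5 - 1j, -1 + 0.5j]
rel = [0.3 + 0.9j, -0.7 + 0.2j, 0.1 - 0.8j]
teacher = [2 - 1j, 1 + 1j, 0.5 + 0.5j]

# DistMult cannot tell student -> teacher from teacher -> student.
dm_forward = distmult(student_real, rel_real, teacher_real)
dm_backward = distmult(teacher_real, rel_real, student_real)

# ComplEx can: the complex conjugate makes the score direction-sensitive.
cx_forward = complex_score(student, rel, teacher)
cx_backward = complex_score(teacher, rel, student)

# RotatE: a relation is a unit-length rotation; its inverse rotates back.
rotation = [cmath.exp(1j * angle) for angle in (0.4, 1.1, -0.7)]
rotated = [hi * ri for hi, ri in zip(student, rotation)]
inverse = [ri.conjugate() for ri in rotation]
there = rotate_score(student, rotation, rotated)
back = rotate_score(rotated, inverse, student)

print(dm_forward, dm_backward)
print(cx_forward, cx_backward)
print(there, back)
```

In a trained model these scores would be compared across many candidate triplets; here they only show which kinds of relationship each geometry is able to register at all.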

The process takes a triplet and corrupts it by swapping out one of the entities or relationships such that the new triplet isn’t already in the graph. That gives us a known ‘untruth’ or implausible assertion, and the process iterates so that the scores for the plausible versus the implausible are pushed as far apart as possible. In this way the embedding model picks up on different organizing principles in the data.
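The corruption step can be sketched in a few lines of Python. This is a simplified illustration, not the code from the notebook: it only swaps the head or tail entity (never the relationship), the relation names are hypothetical, and the margin loss at the end is one common way of ‘pushing the scores apart’, assumed here rather than taken from the talk.

```python
import random

# Three of the story's statements as (head, relation, tail) triplets.
# The relation names are hypothetical labels for illustration.
triples = {
    ("girl", "walks_through", "forest"),
    ("wolf", "eats", "grandmother"),
    ("woodcutter", "kills", "wolf"),
}
entities = sorted({e for head, _, tail in triples for e in (head, tail)})


def corrupt(triple, known, entities, rng, max_tries=100):
    """Swap the head or tail for a random entity until the result is not in the graph."""
    head, rel, tail = triple
    for _ in range(max_tries):
        replacement = rng.choice(entities)
        if rng.random() < 0.5:
            candidate = (replacement, rel, tail)
        else:
            candidate = (head, rel, replacement)
        if candidate not in known:
            return candidate
    raise RuntimeError("could not find an unseen triplet")


def margin_loss(positive_score, negative_score, margin=1.0):
    """Zero once the true triplet outscores the corrupted one by at least the margin."""
    return max(0.0, margin - positive_score + negative_score)


rng = random.Random(0)
negative = corrupt(("wolf", "eats", "grandmother"), triples, entities, rng)
print(negative)
```

Training then adjusts the vectors so that `margin_loss` shrinks across many such pairs, which is what gradually separates the plausible from the implausible.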

When you create an embedding model of a knowledge graph this way, you’re asking ‘which elements of the story matter’, and the model geometry you chose implies why they matter. Now remember, this is just a toy, but I think it shows us something important.

diagrams, results

DistMult seems to organize the story around the contrast between places and actors. ComplEx seems to respond to the divide between ‘domestic’ and ‘wilderness’ and who has agency, while RotatE has captured the idea that the Wolf and Grandmother can be swapped, the substitution at the heart of the plot. I like this. Fourteen sentences represented via three different geometrical transformations make us see very different things. In fact, the choice of mathematical geometry shapes what the model is capable of noticing and determines what becomes visible, what remains hidden, and gives us a way of spotting the attractors.

There are formal definitions of what constitutes an attractor in complex systems, but for now, we can use it as a shorthand for those places in these diagrams where actors or relationships seem to cluster. But that means that the empty spaces also matter. The empty spaces are coordinates in this model space where there are ideas or concepts that we have not yet named or identified! With Red Riding Hood, there’s a space here between ‘wolf’ and ‘grandmother’ that makes the story disturbing rather than just ‘ah that’s sad about grandma getting eaten’. We could call it ‘substitution as violation’. It exists there in the embedding space, but the original knowledge graph didn’t name it.

Neat, eh? So, because I wanted to know where my own gaps are and where my own unnamed concepts are hiding, I decided to experiment on myself.

I took an early, unpolished draft of this talk as it stood back in March, and I hacked it down into a series of subject – predicate – object triplets. There’s a code notebook at the end of the slidedeck by the way if you want to see what this all looks like. I used a variety of different embedding model approaches to express those statements in three dimensions. Remember that the choice of which geometrical transformation to use, which model, conditions what will become visible, and the nature of the spaces between. I chose 3 dimensions again because that’s what is easiest to hold in my head when I look at the results. This is still all a toy, remember.

map of my talk as ternary diagram

What emerged across all four models was reasonably consistent (you can click on each diagram to zoom in if you’d like) but of course it’s the differences that are interesting here. In the DistMult model, for example, one dimension stretches from understanding story as statistics and probability at one end to understanding it as a hyperobject inside of which we live at the other. Another dimension captures the contrast between being inside a story and following a groove, and the ability to stand outside to map, to simulate, and to propose alternatives. The third dimension, if I squint, seems to capture the idea that every telling of a story is haunted by all the runs that came before, that Pratchett-esque accumulation of prior tellings.

The idea of ‘narrativium’ (Pratchett’s label for the parasitical story) and the statistics of language orbit each other across the different models, sometimes closer, sometimes further away. I think this implies an unnamed mediating concept in that draft of the talk, an idea like ‘statistical narrative force’. And that might be expanded further to think of story grooves as both mathematical and cultural at the same time.

There is a lot of empty space in these diagrams. There’s something in the space between ‘generative AI’ and ‘fascist aesthetics’ in this model here at bottom right. Maybe the implied concept could be called ‘aestheticized algorithmic control’ to capture the way statistical convergence has political consequences. There’s something in the space between ‘worldbuilding’ and software ‘specification’ which that early draft was reaching for, the idea that through programming we might write this world rather than a story. I’ll come back to this, but this idea captures an intersection between the way the humanities has always made knowledge and the work of a software engineer whose approach I think builds that same idea in software. We’ll meet him in a moment.

[slide] – pathways of story quote, youngest son

The point of these two experiments is that mapping these simple embeddings shows us how the shape of stories (where the different models are doing the storytelling) pushes and pulls concepts into certain spaces or grooves. Those places or coordinates where entities all cluster together we can think of as attractors, and ideas that we might want to know about or explore lurk unsaid in the empty parts of the map.

NOW. Imagine this not as a toy experiment but as a production system trained on billions of documents, with billions of parameters, deployed into everything everywhere all at once. It’s embeddings that power and organize the underlying data of generative AI systems. The way those embeddings were created, the directions that push and pull ideas as we travel through them, are unknowable. Generative AI is nothing more than the parasitical stories that are writing the world. Our task is to escape the grooves, the pathways, and to find the empty spaces where we can flourish.

this slide blank but now we’re in act two

Here’s another story. Once upon a time, I was a PhD student. I hated my materials, but one day, I realized that these fragments of stamped Roman bricks that carried information about the place of manufacture, the person who did the work, the owner of the estate, the consular year were actually fossilized motes of space-time. The stamps were the traces of a variety of networks of power and materials that intersected in the physicality of brick. I never imagined I could be so excited by Roman bricks! And I started tracing out all of the networks I could find, stitching together co-occurring names and fabrics by the dates. But this caused new problems. Broadly speaking, I had about five time periods represented, five separate networks. I had no way of bridging the gap. I filled that gap by constructing plausible narratives (just-so stories) about what ‘must have happened’ to get from time 1 to time 2. That part of the thesis went over poorly.

A few years later I was able to approach the same problem through agent-based simulation. I started the simulation at the network configuration suggested by the archaeology for period one and gave the simulated heterogeneous population of agents behavioral scripts or rules each one interprets for how they would interact with each other under various conditions. Then I set them loose and watched what happened. The runs that produced network configurations closest to period two became objects of study. And when I mapped all of the possible outcomes, all the runs across all combinations of parameters, I could see something that Pratchett describes in his novels: a manifold of possibility space through which the system tended to evolve in certain ways, not because they were inevitable, but because they were stable; only certain paths would lead to period 2. In agent-based modeling, we call this ‘mapping the behaviour space of the model.’
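To give a feel for what ‘mapping the behaviour space’ involves, here is a deliberately tiny sketch in Python. It is not the brick-network simulation: the model, the parameter names (`form_p`, `drop_p`), and the target density are all hypothetical stand-ins. What it shares with the real thing is the shape of the procedure: run every combination of parameters several times, record the outcome, and keep the runs that land closest to the target.

```python
import itertools
import random


def run_model(form_p, drop_p, seed, n_agents=12, steps=400):
    """Toy run: random pairs of agents form or drop ties; returns final tie density."""
    rng = random.Random(seed)
    ties = set()
    for _ in range(steps):
        pair = tuple(sorted(rng.sample(range(n_agents), 2)))
        if pair in ties:
            if rng.random() < drop_p:
                ties.discard(pair)
        elif rng.random() < form_p:
            ties.add(pair)
    possible = n_agents * (n_agents - 1) // 2
    return len(ties) / possible


TARGET_DENSITY = 0.3  # hypothetical stand-in for the 'period two' network
form_values = [0.1, 0.3, 0.5]
drop_values = [0.1, 0.3, 0.5]
seeds = range(5)

# Every combination of parameters, repeated across seeds: the behaviour space.
runs = [
    {"form_p": f, "drop_p": d, "seed": s, "density": run_model(f, d, s)}
    for f, d, s in itertools.product(form_values, drop_values, seeds)
]

# The runs closest to the target become the objects of study.
closest = sorted(runs, key=lambda run: abs(run["density"] - TARGET_DENSITY))[:5]
for run in closest:
    print(run)
```

Plotting `density` against the two parameters for all the runs, not just the closest ones, is what turns a pile of simulations into a map of where the system tends to settle.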

cheeseroll

We can understand the outcomes of Generative AI systems using the same underlying logic. The embedding space of a large language model is a manifold. The reinforcement learning process reinforces certain attractor basins in that manifold corresponding to the statistical patterns most heavily represented in the training data. When you prompt such a system, you are releasing a particle into that landscape, and it rolls downhill toward the nearest attractor. [bliss attractor if time allows]

water/channel

The Reinforcement Learning from Human Feedback process that shapes these systems into chatbots can be understood as a deliberate act of attractor engineering, where human raters choosing between outputs are selecting which basins get deepened. When you describe a situation where there is a group of teenagers in a cabin in the woods, the system does not reason from first principles. It rolls into the groove worn by every slasher film in its training data. The story completes itself.

Eryk Salvaggio has been pointing this out for years: any output from a generative AI system can be understood as an infographic of its underlying dataset. You can verify this empirically. Generate a thousand images with the same prompt and cluster them by visual similarity. You will see the centres of gravity immediately, the canonical representations, the defaults, the attractors. Generate a thousand text responses and run topic modelling over them and you’ll see the same thing.

And this is where things get political.

[this slide left intentionally blank]

I could cite here work like Lorella Viola’s, who earlier today showed us the authority of infrastructure, or point to the new Debates in DH volume on critical infrastructure studies. I could point to Safiya Umoja Noble’s work on algorithmic oppression, Ruha Benjamin’s on the new Jim Code, Johanna Drucker, the work of Catherine D’Ignazio and Lauren Klein, Sarah Lang… there’s a large body of work showing that computational systems do not represent the world neutrally, that there’s no such thing as a neutral tool, that datasets are political. The difference though is one of scale and deployment and autonomy. These generative AI systems are writing a world into existence.

[sycophancy makes people assholes slide]

It’s changing how people write; the sycophancy of the machine makes you more machine-like, more like the average, more like the attractors. You can call that cognitive surrender. Some have called these models ‘bullshit machines’, but then again, David Graeber pointed out how so much of the work we do is because we have ‘bullshit jobs’. Some state actors apparently use bot farms to create texts that AI scrapers might ingest, in an effort to try to shift the attractors in the models towards their own goals; Elon Musk has a documented history of tipping things his way too. The perspectives most heavily represented in the training data, and therefore the perspectives that define the dominant attractors in the embedding landscapes, are those of whoever produces the most. And then you have to also factor in whatever drives the feedback used in reinforcement learning. Not to mention all of the other structures built on top: ‘guidance’, ‘harnesses’, ‘guardrails’, or ‘constitutions’. It’s all a bit of a mess.

My entire formal academic career has paralleled the emergence of generative AI on the scene, from the sudden rapid improvements in image classification, and then machine translation, recurrent neural networks, long short-term memory networks, convolutional neural networks, ‘attention’ and transformers… that decade from about 2012 to 2023 has been a wild ride and we haven’t caught our breath yet. Pareidolia (seeing familiar human characteristics in everyday objects) and the Eliza effect (the tendency to read human intention behind the wall of text in a textual interface) fired up the venture capitalist imagination because they promise profits without workers, writing without writers (as Aimee Morrison put it), and the need to recoup those investments has therefore plumbed these systems into everything. And they have become a hyperobject: [slide hyperobject] not just language models, but entire systems that we are now enmeshed in, massively distributed and interconnected, that no one person can fully perceive in their totality.

But when we attend to the small things we can perceive in these systems, we notice the sameness of so much of it, we notice the way it spills over wastefully to make a mess: we notice the slop. Slop isn’t just an accident, or the word we give to the accumulation of AI outputs that nobody wants or likes. It is an aesthetic. It is a deliberate choice towards a particular set of attractors.

Gareth Watkins, writing in the New Socialist in 2025, explicitly ties slop to the aesthetics of fascism, an aesthetic that celebrates a deliberate narrowing of imaginative possibility. Fascism as a political form is, among other things, a project of collapsing the phase space of political imagination, making it increasingly difficult to conceive of alternatives to the present arrangement, enforcing a single narrative of what the world is and must be. The mechanism Pratchett describes for comedic effect as narrativium (the way stories, once grooved deeply enough, make alternative paths unavailable) seems to me structurally the same mechanism. What Watkins is pointing toward is something Benjamin and Noble and others document in more granular detail: that the defaults of these systems, the attractors they roll toward when unchallenged, encode and reproduce particular distributions of power and visibility. What connects this to fascism is the sheer volume and ease with which AI slop can be created. Imagining alternatives is beaten out of you through sheer repetition. Reinforce the story enough and the system simply cannot produce anything else and we are all doomed to play that story out. Narrativium again.

Watkins suggests that one way out of this is through ridicule. He suggests that the story that any particular instance of slop promotes can be neutered through laughter. I see what he’s saying, but I don’t think ridicule will help Alex, trapped at the bottom of the YouTube well, nor Julie in the grade trap. An information-literacy module in the university learning management system isn’t going to cut it. The problem isn’t ignorance. The problem is geometry.

this slide left intentionally blank

So here is what I think is true:

We’re now living within the hyperobject of generative AI systems; this hyperobject is built from stories; programmers trying to deal with what generative AI does to their work have rediscovered hermeneutics; and so if we change the story, we can change the world.

That’s the ‘too long; didn’t read’ for today. I didn’t start with that because I think I needed to think through my process with you first. Otherwise the narrativium that captures academic keynotes would’ve taken over. I think.

Notice that I’m not claiming that software engineers have secretly been doing humanities all along! I am making an observation based on my training as an archaeologist who became a digital archaeologist through chance and circumstance. These things are still human artefacts; of course I as an archaeologist would be interested to try to understand what they do to us, and what we do through them. So I read a lot of what you might call ‘field reports’ from inside the hyperobject. It seems to me that the most sophisticated of these are from practitioners (not necessarily scholars) who have independently come up with something that looks to me a lot like hermeneutics.

By accident, through accumulated encounters with the edges of the grooves, attractors, and blank spots on the map, and through accumulated failures, they have come around to the idea that:

1. No specification can fully anticipate its implementation.

2.  The act of building always feeds back to transform what was being specified.

And 3. This feedback is not a problem to be eliminated.

Sounds like hermeneutics to me. Feedback and change is the fundamental condition through which we come to understand something.

This all means that I’m trying to hold two big ideas together here. The first is mostly ontological. I’m saying that story shapes are computationally legible, and the geometry of that legibility shapes what can be noticed in or told by a given story, and this therefore has consequences for what gets written into the world at scale.

The second part is necessarily political. Because generative AI systems are built on these representations, and on attractor dynamics that push toward the mean of their training data, they are structurally conservative in a way that has material consequences.

Here is where the work of Drew Breunig comes in, as an example of what I’m talking about.

Drew Breunig, on his blog, says that he writes at the intersection of cultural anthropology and computer science. He wondered to himself what software engineering as a practice looks like when generative AI agents are pretty good at writing code for a given context.

As a thought experiment he proposed what he called a “no-code library”: a GitHub repository containing no code, only a specification document describing what a software library should do, a set of conformance tests defining what outputs are acceptable, and instructions for invoking a generative AI to produce the implementation.

The conceit was that if the specification is precise enough, and the tests comprehensive enough, the code should flow from them automatically. He called this Spec-Driven Development (I think he even coined the phrase), and it attracted significant attention partly because it worked, and partly because the failures were so instructive.

Breunig spec driven development

Breunig released this idea into the world and watched as people tried it out, and responded to the issues people raised and the pull requests on the repo in a subsequent public talk. What he came to understand was that the specification was never enough. No matter how much the author might try, the spec always comes up short because of the programmer’s assumptions about what matters and what can be taken for granted and left unsaid.  It’s ‘always already’ a partial reading shot through with unexamined silences and assumptions.

And so the spec never survives the encounter with implementation unscathed because it’s always wrong! The act of implementation discovers what the specification did not know it was assuming. When someone tries Breunig’s ‘no-code’ library approach with an agentic language model, every gap in the specification is filled by the language model’s ‘pre-understanding’ and all of those decisions remain hidden and undocumented. Deploy the code and future individuals and groups and communities and peoples have their lives impacted by choices no one ever recorded or even recognized were choices. I mean, this is true when humans code too, but it’s worse when machines do it because of the sheer scale. Hyperobjects again.

[slide Breunig with hermeneutic circle fragment]

Breunig’s response to this challenge was to build a tool called Plumb, which intercepts git commits and extracts the decisions made since the last commit, both from the human developer and from the AI agent’s conversation traces, presenting them for review before the commit completes. Approved decisions are used to update the specification. The specification, in turn, generates new requirements, which generate new tests, which generate new code, which generates new decisions. The whole history of interaction and re-examination is written, time-stamped, and versioned.

What Breunig has come up with, without using the terminology, is a software implementation of the hermeneutic circle. You cannot understand a part of a text without understanding the whole, but you cannot understand the whole without engaging with the parts. The circle is the condition that makes understanding possible at all. And it looks to me like what Breunig is doing has its closest structural analogue in the Talmudic commentary tradition where the primary text, commentary on it, and commentary on the commentary are all preserved together and considered authoritative at different levels. You approach understanding the primary text through its history of interpretation. Plumb’s decision log, linked to requirements, linked to code, echoes this structure in version control.

But remember my toy? My graphs? Remember the empty spaces? If stories write the world, they by mistake create a world that is far bigger than what you would think from just reading the stories. Let’s talk about the empty spaces. Let’s talk about worldbuilding.

Worlds, Not Stories

I want to consider the archaeologist Colleen Morgan’s proposal that archaeology is or ought to be worldbuilding.

Archaeologists have been concerned with storytelling for a while. There’s a dominant mode of archaeological storytelling where there is a clear, linear story concerned with physical things and oriented toward a single authoritative narrative: What happened? This happened, then this, therefore that. Morgan’s critique, as I understand it, says that we’re missing the whole point of archaeology. We should be worldbuilders. The worldbuilding model asks a different question: what are the conditions under which stories can be told? What are the elements that, assembled in this way, make some stories more possible than others?

To my mind then a world is an attractor landscape before any particular trajectory through it has been run. It is the phase space. The archaeologist’s job, on this account, is not to tell you what happened. The archaeologist’s job is to build a world wherein you can find the emergent story for yourself. You will encounter some grooves deeper than others, but no one groove can trap you. In such a setting, you might find yourself surprised.

The best piece of public history I have ever personally witnessed happened on a ghost tour at Fort Henry in Kingston, Ontario (a fortress on Lake Ontario, built to defend us against the US; plus ça change). The guide was doing as much worldbuilding as storytelling. The guide had a set route through the old fortress, but part of what made navigating that route effective was the worldbuilding detail that enhanced the impact of any one particular story. Inspired by those details, a child in our group became very excited and wanted to contribute. The child’s tangent really had nothing very much to do with the guide’s explicit story at that moment, that trench the guide had worn down through hundreds of tours. But – and here’s the surprise – the guide did not redirect her. She said: yes, and! And suddenly, we were bouncing out of the rut into that larger ghost-ridden world.

Which reminds me of the work of April Beisaw, an archaeologist who has asked why ghost tours do so much better than public archaeology events. I listened to her talk once about her work in upstate New York. She was exploring the way that the landscape was altered by flooding valleys and villages to serve the water needs of the metropolis. When it came time to share her research with the public, she quickly discovered that the authoritative story, the one with the politicians and the legislation and the aqueduct and so on, did not resonate with the locals. But she observed a local paranormal research group active in the same area, pointing to the same remains, the same drowned landscape, and she noticed how they were so much better at the public history of the drowned villages than the archaeologists. What made their approach effective, according to Beisaw, was that they left space for participants to layer their own stories, to weave them together, to fill the trench according to their own understanding of what haunting means. The ghost hunters did not tell people what the world was. They invited them into the conditions for finding out. They brought the audience “along on the process of imagining a past and trying to detect its signatures in the present”. Worldbuilding in action.

the different kinds of ‘yes, and!’

This is a different relationship to knowledge than the one generative AI is currently optimised for. The appearance of ‘yes and’ that marks the sycophancy of a generative AI system actually is very directed. It fills the space for possibility by reinforcing the story the user starts, in the direction of the dominant attractor and so the narrativium takes over.

Worldbuilding, in Morgan’s sense, is the act of building a space where different grooves are possible, where the phase space is wider, the attractor landscape more level, and the path of least resistance does not automatically lead to the canonical vision. This is what Breunig’s no-code approach using Plumb is doing, at the scale of a software library. It is encoding a particular world, a set of constraints, affordances, and relationships, within which the AI’s generative process is constrained to find something that fits; we use narrativium against itself. The world is a spiral, a world that changes with every encounter, accumulating decisions, revised by failure.

slide left blank (but: Spec-Driven Development as Humanistic Practice)

Alright, we’re just about at the pay-off.

The digital humanities is famously a big tent. Is it big enough to encompass Breunig’s hermeneutics-by-accident? I think so. If it is, I think it gives us a way for digital humanities to make a difference in the era of generative AI. It gives us a way to write the world.

The specification document is a formalization, in ordinary language, of how we do our work: what we are trying to understand, what assumptions we bring to it, what we are prepared to count as evidence. The test suite is a formalization of how we know what we know: what would have to be true for our claims to hold, what outputs are acceptable, where the specification makes predictions that can fail. The grooves in the embedding space, the attractors, that’s where the spec and tests fail or bounce us out, right? We need to document that.

Each point of failure in the circle is critical information: is the test wrong (are we measuring the right thing?); is the specification wrong (are we asking the right thing?); is the implementation wrong (are we making the right thing?). Tracking the failures is where the learning happens! We build a world, and we track the stories through it. The paradata that digital humanists have been recording as afterthought documentation (the decisions made during a project, the alternatives considered and rejected, the interpretive choices embedded in data structures) becomes, in this framework, the primary material rather than the supplement. The paradata is where we write the world. The paradata is where we push back against the narrativium of generative AI.
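To make that circle a little more concrete, here is a minimal Python sketch of what tracking failures as paradata might look like. It is not Breunig’s tooling and not any existing library: the names `Suspect`, `Failure`, and `Paradata`, and the toy summary text, are all hypothetical, invented only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Suspect(Enum):
    """The three places a failure can point to (hypothetical labels)."""
    TEST = "are we measuring the right thing?"
    SPECIFICATION = "are we asking the right thing?"
    IMPLEMENTATION = "are we making the right thing?"


@dataclass
class Failure:
    claim: str        # the ordinary-language expectation taken from the spec
    observed: str     # what actually came back
    suspect: Suspect  # where we currently think the problem lies
    note: str = ""    # the interpretive decision, in our own words
    logged: date = field(default_factory=date.today)


@dataclass
class Paradata:
    failures: list[Failure] = field(default_factory=list)

    def check(self, claim, passed, observed, suspect, note=""):
        """Record a failure when an expectation does not hold; return `passed`."""
        if not passed:
            self.failures.append(Failure(claim, observed, suspect, note))
        return passed

    def by_suspect(self):
        """Count the logged failures per suspected source."""
        counts = {s: 0 for s in Suspect}
        for failure in self.failures:
            counts[failure.suspect] += 1
        return counts


# Toy example: a generated summary that tells only the authoritative story.
log = Paradata()
summary = "The aqueduct was authorised by the legislature."
log.check(
    claim="The summary mentions the drowned villages",
    passed="villages" in summary,
    observed=summary,
    suspect=Suspect.SPECIFICATION,
    note="Did the spec ever say whose story this is?",
)
print(len(log.failures))  # prints 1
```

The failed check is not thrown away. It is kept, dated, and annotated, so the record of what went wrong becomes the primary material rather than the supplement.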

Ten years ago or so Matthew Lincoln wrote a piece on ‘confabulation in dh’, about how we were all so good at story telling we could tell a plausible story for just about any result. Without writing out what we expected to find in the first place, any result could be meaningful. Now folks in DH, like the C2DH group led by Sean Takats, have done amazing things for creating reproducible workflows that document the iterative and interpretive decisions with their Kiara system, and we need more of that. But what brings me back to Breunig’s no-code-with-Plumbing approach is the way ordinary language becomes part of the worldbuilding. This kind of approach I think would take us to a point where dh thinking, dh storytelling could spill well outside our labs and workshops and specialist tools and conferences.

Now let me bring this back to Alex and Julie, whom I introduced to you back at the beginning.

For Julie, she’s already in the behaviour space of the stories, but maybe we’re framing the problem wrong. Maybe the problem isn’t that she’s using a text generator to stay afloat; clearly there are bigger structural issues that she is enmeshed in. But insofar as systemic issues can be addressed by individuals, what if the problem is that… she has no specification for what she is trying to understand, for what she is prepared to count as a good answer, for what a ‘test’ of these might look like for her own thinking? Instead of letting the ‘yes and’ of a generator fill the possibility space, what if we helped her write the specification such that we help her build a world to explore? Then when the generated text inevitably fails to meet the specification that failure becomes a signal she can act on from a position of understanding.

For Alex at the bottom of the YouTube gravity well, things are more difficult. The attractor he is trapped in persists because it is etched so deeply. We have to help him out the other end of the story he is trapped in. What would it mean to worldbuild around him, to assemble the conditions under which a different story could find him? I’m not talking about telling him the right story because that would simply hit the edge of the groove, banging him back in deeper. To help him bring the story he’s in to a close, we need to figure out how to leave space for him to encounter the empty coordinates and edges that might flip him out rather than bang him back in. In complex systems, small changes can lead to big effects: phase transitions.

I don’t have a specification for that. For Julie, because of the system she’s in, we as teachers can maybe reach her at the point of inquiry where there are things we can do. But for Alex, I am not suggesting Breunig’s git hooks will pull him out. He’s too far downstream of the attractor engineering, downstream of decisions made at civilizational scale by people who were not thinking about him but about juicing the bottom line.

But. What if the coders’ inadvertent hermeneutic circle became a standard way of working with all of these technologies that power everything from generative AI to recommendation systems? Google just tied generative AI into search, which makes all of this even more urgent. What if the hyperobject of AI learned to tell stories or build worlds the way we do?

There are ghosts in the data who get combined into something that has action in the world: the parasitical story of narrativium, the attractors that emerge from the embeddings. I’m willing to bet that a computationally tractable, deployable form of hermeneutic knowledge-making is something DH can do that could push those attractors somewhere more fitting for human dignity.

Conclusion: Yes, And, and the generosity of failure

Let me end with the ghost tour.

The guide has walked this route hundreds of times. She knows every room, every artefact, every story the fort can tell. She is good at her job. And then the youngest person in the group says something orthogonal, something that comes from a completely different place, that has nothing to do with the narrative trench the guide has been reinforcing. The guide could push the youngster back in. She does not. She says: yes, and.

Let’s do that.

We are inside the hyperobject of generative AI. Its attractor dynamics are real and consequential. The groove-deepening mechanism Pratchett described in satire, that narrativium, is operating at civilizational scale. We have to study this thing. Laugh at the fascists, defuse the power of slop, by all means. But when we get down to the level of individuals and other people imbricated in the system’s consequences… I don’t think laughing at Alex at the bottom of the YouTube well or Julie in the grade trap would be helpful or kind.

There are two kinds of ‘yes ands’ that I have identified today. There is the sycophantic that retrenches and uses the dominant mean. And there is the generous ‘yes and’ of the paranormal research group, the generous ‘yes and’ of the tour guide who was kind. And that generosity opened up new possibilities that permitted us to escape the narrative rut: it made the world bigger.

Design your storytelling, your digital humanities practice, from that perspective of generosity. We could do that by writing specifications: explicit, ordinary-language accounts of what we are trying to understand and why. We can write tests: falsifiable claims about what would have to be true for our accounts to hold, what failure looks like, what the evidence can actually bear. We can track decisions: to make the interpretive history visible, versioned, and subject to public scrutiny and learning. We can leave empty spaces in the embedding landscape. We can build worlds rather than tell stories, we can assemble conditions rather than deliver conclusions. We can lay down better grooves, wider grooves, for others to follow who do not have our privilege to affect things.

Rather than allowing slop to collapse the phase space of our imagination into a fascist aesthetic of sameness, we must build worlds that leave space for empty coordinates, directions to aim for that resist the behaviour of the dominant stories. Breunig encountered the hermeneutic circle in the gap between specification and implementation. We DHers have been living in that circle for long enough to have built practices and disciplines around navigating it. The question is whether we can make those practices intelligible to the infrastructure that is doing the world-shaping in our present moment.

“There’s always a story. It’s all stories, really. The sun coming up every day is a story. Everything’s got a story in it. Change the story, change the world.” – A Hat Full of Sky, Terry Pratchett

The spec is never finished. That’s just how it works. The generosity of failure is how we escape narrativium, the behaviour space of stories. Change the story, change the world.

Image


Digital Reconstruction, Enchantment, and the Ghosts in Our Data (…ish…)

I stepped in to help out at the last minute at a workshop on digital reconstruction in archaeology, to make the opening remarks. This is what I cobbled together. April 22 2026.


imagine a slide here showing off Michael’s work! (also, you should know I have a pretty broad view for what counts as ‘reconstruction’)

Thank you everyone for having me here today; I believe Michael Carter was to address you but unfortunately couldn’t join. Which is a shame; Michael has been thinking longer and harder about the possibilities for reconstruction than very nearly anyone I know, drawing on his research into procedural animation and reconstruction of Iroquoian long houses. Some of my earliest conversations with Michael involved his idea of embedding the 3d metadata and paradata of reconstruction into the very geometry of 3d objects in the first place. Hold that thought; we’ll come back to it.


this slide left intentionally blank

This morning I am coming to you from my basement, here in Ottawa, on the unceded traditional territory of the Algonquin Anishinaabe, whom I acknowledge. Let me tell you why I am making this acknowledgement. My colleague here at Carleton, Tonya Davidson, just published a book called ‘Ottawology’, a sociological study of the city that begins with a land acknowledgement and asks us to consider what ‘doing sociology’ when we have made a land acknowledgement actually means. And part of her answer is that we have to think through what it means to have a settler colonial city on this land in the first place. And to do that means we have to confront the idea that the process of constructing the city in the first place involved an act of imagination, a deconstruction of previous lifeways in this place, of previous settlement, that colonization is not an event and not even a process but a structure through which dispossession can happen.

So here we are, archaeologists, thinking about reconstruction. What does it mean to think about reconstruction in the light of a land acknowledgement, especially when we know how archaeological reconstructions have been traditionally used in the past to create particular narratives, particular justifications?


Image
Image
Cardo degli Aurighi, before & after reconstruction (ostia-antica.org)

When I was a grad student in Italy, my supervisor, Janet DeLaine, was working at the Roman port city of Ostia, and she took me for a walk through the ruins. Ostia did not see sustained serious archaeological attention until the 20s and 30s under Mussolini’s archaeologists. His architects were at the time busy building a vision of the Italian state that drew explicit parallels and inspiration from an imagined Roman past; therefore it was incumbent upon his archaeologists to provide that archaeological past! And provide it they did. Janet toured me through the ruins, asking me pointed questions every so often about the architecture, looking to see what I would say. And as generally happens, I failed every single question because Mussolini’s archaeologists were extremely good at reconstruction: they’d sourced the clays, they’d mimicked the techniques, and they’d rebuilt the city in the image of their imagination so well that the only way to tell what was genuinely ancient ruin from 20th century reconstruction was to chemically test the mortar between the bricks. No other test could tell the difference.

Mussolini’s archaeologists were not concerned with process, they were not concerned with paradata or the choices that guide the reconstruction, they were not concerned to leave markers so that later archaeologists could make their own studies or determinations, or that visitors could learn what was ‘real’ versus what was an educated guess. They were gunning for a very different effect: they wanted their reconstructions to firmly cement a vision of Roman grandeur in the mind of the visitor, that was echoed only a few miles away in the modernist architecture of EUR, of the Museum of Roman Civilization, of the plazas and triumphal ways of the modern fascist state. Concrete and brick was the technology of fascist reconstruction in those days. The reconstructions of fascism aim to obliterate the self within the ideas of ‘accuracy’ or ‘authenticity’ or ‘aura’. But they close off the imagination. ‘THIS IS HOW IT WAS’ is not a statement about the past but an invitation to submit to the conditions of the present. It’s a statement about the future: AND THIS IS HOW IT WILL BE.


People think that stories are shaped by people. In fact, it’s the other way around […] Stories are a parasitical life form, warping lives in the service only of the story itself.

— Terry Pratchett, Witches Abroad

It reminds me of the humorous satire of Terry Pratchett. In Pratchett’s writings, there’s a concept that the world is driven by narrativium, that stories (reconstructions) are parasitical life-forms that cause themselves to be told by their victims. Each time they are told, they deepen the groove in space-time such that once trapped in the confines of a story, there’s nothing that can be done until you come to the end of the story.


Image

Me, in 2006 with a gnarly agent based modelling environment. This, too, is reconstruction- although the graphics are now much better. ABM by the way can also be thought of as games that play themselves.

I spent many years building agent based simulations – reconstructions – of Roman social life, and when I read Pratchett I instantly see the connection to complex adaptive systems and the idea of ‘attractors’, or gravity-wells in the behaviour space of the system, of the society (or at least, within my reconstruction of them). These gravity wells pull the behaviour of the system into particular configurations, no matter how I set the overall simulation up; but occasionally, the smallest bump can cause things to flip, can cause things to change. I called this approach ‘AI’ in a book I managed to publish with exquisite timing just before generative AI hit the scene in a big way and forever poisoned the term. My approach used distributed computing to model emergent behaviours from a reconstruction of the past specified at the level of the individual actor. From individual voices, we get a chorus and a reconstruction of past social activity. It was generative, even! It was a lens through which I could look at the archaeology with fresh eyes.

But that’s not what ‘ai’ means today. Generative AI today as a technology deployed by the tech oligopoly is the modern technology of fascism. The tech stack (of which the underlying model is just one part), the system, smears all individual expression into a probabilistic paste, through which different paths may be traced and probabilities extruded. When I was an undergraduate at Laurier, there was a dish in the cafeteria called the Rib O’ Pork, made from beef paste extruded into a rib shape. That’s a reconstruction too I suppose. These machine/systems have been deployed everywhere; they are inescapable, and whatever one thinks about the possibilities of the underlying technologies (they’re not statistical models of language; they’re models of culture, depending on the training data, and for that reason alone should be of interest to archaeologists), it is clear that in how they’ve been deployed in the world part of their purpose is to foreclose meaning, foreclose imagination: they drive towards the mean, driven by the act of triangulation between the words or concepts the human supplies. Meat paste shaped by the form of our prompts. Right? I’m talking about AI slop, the ease with which stories – reconstructions – can be generated that push us to cognitive surrender: It is an aesthetics of fascism in that we are meant to be overwhelmed by the rapid pace, the ubiquity, and to accept, to submit.


Image

“Generative AI is digital humanities run in reverse…. AI becomes a system for producing approximations of human media that align with all the data swept together to describe that media.” Salvaggio

And, changing metaphors, they’re full of ghosts. Take image generation models (another form of reconstruction), and these three images. The artist/scholar Eryk Salvaggio has argued that such images are better understood as ‘ontolographs’, or infographics of their underlying datasets. Look at the shadows in these ‘photographs’. There is of course no light source in this image. The shadows are there because they are quite literally ghosts from all of the training images. Context is stripped when the training data is collected; pictures of a summer camp are all the same to the model as pictures from a concentration camp. The centres of gravity in the data become attractors from which there is no escape. The ghosts in the data push us in ways we might not want to go, and the reconstruction isn’t created so much as reified.

The tools of generative AI, the reconstruction of human language or human artefacts (in terms of code outputs, or image outputs, or 3d models and so on), are haunted by the ghosts in their data, and in the ubiquitous deployment of these technologies everywhere, we have machines that make Pratchett’s narrativium real in the world. So what do we do?


Reproductions without Reproducers

Aimee Morrison, a professor of English and Digital Humanities at the University of Waterloo, says that ‘Generative AI pretends that there can be writing without writers, which is as nonsensical as suggesting there can be swimming without swimmers, or breathing without breathers’ (https://compstudiesjournal.com/wp-content/uploads/2023/06/morrison.pdf). Mussolini’s archaeologists built reconstructions that pretended you could have archaeology without archaeologists. Morrison reminds us to think about what our goal is, when we write, when we reconstruct. Her students sometimes believe the point of writing is the production of clear competent prose, but this is a mistake and in any event the machine can probably now do that much better than the student can. Instead, she suggests a new goal: writing to delight. When you write to delight, you ‘have something to say, and [you] know that if [you] do it persuasively, [you] can change some part of the world in some small way’. Change the story, change the world, says Terry Pratchett.

The challenge of AI for us then, interested in archaeological reconstruction, is not to do what the machine can do. It’s rather to ask, what am I doing to change the world in some small way? Small changes can flip a system. What does my reconstruction do that challenges the way the world is? If I write ‘to delight’, as Morrison says we should, maybe I can fight the power of narrativium. And when I say ‘write’, I mean all the active verbs of archaeology: when I build a reconstruction. When I craft this virtual environment. When I restore the colour on this ancient frieze.


Image

I sometimes challenge my students to think about ‘aura’, in this age of digital reproduction. Can digital copies have ‘aura’? What does ‘digital preservation’ actually preserve? Most of my students come down heavily on the idea that the digital copy is lesser, somehow not as valuable. Then I ask them to consider Shakespeare (as I draw on Latour). The local theatre group is putting on Macbeth. All of the players are female. The setting is a northern mining community in Ontario. What is the relationship between this copy, this reconstruction, to Shakespeare’s play? No one would quibble that the play is somehow not valid because it’s not the ‘original’ play from early 17th century London. No, we understand that this performance is a reconstruction on its own with its own values: we might see something new in the text that we hadn’t spotted before. It might make us think differently about the relationship of the metropolitan south of this province and its northern hinterland. This copy has an aura of its own. Aura, for Latour, is not in some kind of faithful fidelity to accuracy but rather resides in the sense of something being ‘copious’, a copy that contains within itself something productive, something that creates something new. Something, I would add, that delights us.


Mussolini’s reconstructions were not copious in this sense. They foreclosed meaning, they foreclosed imagination. There is an implicit appeal to the authority of the reconstruction-maker at Ostia that is not present in the re-staging of Macbeth. Mussolini’s archaeologists were framing a lot of what they were doing as a kind of ‘rescue’ – if you dig through the old archaeological journals of the day, many anglophone archaeologists were bang onside this idea too, and generally thought it was all a Good Thing.

When archaeology is always in crisis, when there is always a rescue to be performed, writing to delight might be seen as a luxury. When archaeology is always in crisis, when we are always rescuing things, when we need to bang the drum for funding and somehow trying to capture the imagination of a public who ultimately pays for all of these things is the extra ‘knowledge mobilization plan’ of the grant, there is I think a tendency towards the ‘because I said so’ of archaeological authority. A crisis can only be addressed or resolved with action. The reconstruction, the talking-to-the-normies can happen later, if at all.

This is a function I think of the professionalization of archaeology in the 70s and 80s, and then when that generation’s grad students became practitioners in turn, we draped our reconstructions with the authority of science and scientific process. That’s where the money was. We saw the way the prevailing winds were blowing, and we left our general every-day audiences behind. We started writing not to delight but to please the lottery of the funding organizations who became our primary audiences. Archaeology became dis-enchanted.

Sara Perry, following the political theorist Jane Bennett, argues that the sense of enchantment provoked by engagement with the deep past can move people towards an ethics of care and generosity. That generosity emerges through embodied engagement, not through passive reception. And this sense of enchantment is what we need to realize the potential of archaeology to be something that makes change in the world. But the professionalization of archaeology says, ‘we are the authority, here: receive what we have determined!’ We turned our audiences off, and we ceded the ground on the very real enchantment of archaeology to people who had therefore learned to distrust archaeologists (we were very good at telling everyone else why they were wrong about everything). And so, looking to understand, engage with, or be moved by the past, the world at large abandoned the archaeologists. Yes, I’m being contentious here, being provocative, gently winding you all up a bit: but just look at the ai-slop reconstructions of the past that pollute Facebook. My own family prefers those reconstructions to anything I’ve ever accomplished!


so….?

  • process over product
  • make your reconstructions copious and enchanted
  • CAREful

So, at the start of a day of papers and sharing: when we think about land acknowledgements, when we make those statements, as scholars we are I think implicitly challenging ourselves as settler-colonists to reconstruct our relationship to the communities we work with, and that includes communities from the past. We do this by thinking through process, making the edges clear, attending to the sources of enchantment, and resisting the desire to submit to the power of the digital. A land acknowledgement is an invitation I think to think about [EMPHASIS] collective benefit, authority, responsibility and ethics. In the same way that Michael imagined embedding the para and metadata of reconstruction into the object itself, I think we need to be thinking about how we can make our reconstructions similarly embiggened, copious, and enchanted: where the reconstruction invites us in and opens conversation rather than foreclosing it. Build reconstructions that delight you, and communicate the source of that delight. And so I look forward to learning from all of you today as you explore what reconstruction can be, in this place, at this time, and in the context of these relationships.

Thank you.

Image

Boswell

If I had been offered project management training when I was doing my PhD, I probably would’ve been one of *those guys* who huffed and postured and was a general jerk about why I was being made to do it. I like to think I’ve grown since then. I can make Gantt charts with the best of them, I can plot out the sequence of what ought to be done by when; I can figure out beforehand roughly what the budget ought to be. But… I certainly have no decent day-to-day workflow for managing my time (and others’ time!) well. Like many (most?) I just… manage to get along. But like a hangnail in my mind, there’s this persistent thought that I ought to be better at this. That there ought to be things I could do that magically would make everything flow.

Like in my search for the One True Note Making Process to Rule Them All, I have now embarked on the One True Management Process… and like always, I started by trying to build something. My own personal Mystery House of Management.

Lo, I give you… Boswell. I had just listened to the entertaining podcast series by The Rest Is History exploring that gentleman and his relationship with Johnson. A fellow who documents everything? Sounds just the ticket. It comes in two flavours – a web app which you point at a folder of local markdown, or an Obsidian plugin. I made this for myself, for my own purposes. You don’t have to use it. If you don’t like how I cobbled it together, that’s great, knock out your own.

But if you do decide to give this a whirl…

The idea is that it lets you get a sense of what’s going on and who is responsible for what. There is a very simple and straightforward kanban board. You can generate tasks from the ‘writing’ and ‘experiments’ tab. You can keep rudimentary track of your funds and what you’ve been spending them on. You can keep track of the contact details for any student assistants you might have (I’m always losing student id numbers for when I have to put them into the system.) You can keep track of whether or not you’ve filed the appropriate paperwork to get expenses reimbursed. Make an expense here; it’ll show you how much money you have left.

Everything is saved locally, in markdown, so you can do with the data what you will. It saves automatically, so if you do open things in another application, better to save a copy somewhere else.

…and yes, I’ve been doing this rather than actually get started with my actual project. Boswell would understand.

  1. git clone this repo.
  2. npm install
  3. npm run build (or npm run dev if you want to make your own changes.)

You can then run a server in the dist folder (e.g. python -m http.server 8000), or you can load up the version I’ve pushed online at https://boswellapp.netlify.app/. Open in Chrome, Edge, or Opera. If you open in Firefox or Safari and select ‘start new project’ it will show you a demo mode of what the inside looks like.
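If you would rather not change into the dist folder first, the same local server can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not part of Boswell itself: `make_server` and `serve` are hypothetical helper names, and it assumes the build output really does land in a folder called `dist` next to the script.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory="dist", port=8000):
    """Build (but do not start) a server for `directory`, bound to localhost only."""
    # `directory` tells the handler which folder to serve files from.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory="dist", port=8000):
    """Serve `directory` until interrupted with Ctrl-C."""
    server = make_server(directory, port)
    print(f"Serving {directory}/ at http://127.0.0.1:{server.server_port}/")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `serve()` from the project root should then behave much like running `python -m http.server 8000` inside dist, except that it only listens on your own machine.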

Image

If you want to try the Obsidian version (https://github.com/shawngraham/obsidian-boswell), make a subfolder called ‘boswell’ in .obsidian/plugins/ and then you just drop the manifest.json, main.js and styles.css into it. Or you can use the BRAT community plugin to handle that for you (which has the added benefit that if I make any changes to the repo, BRAT will automagically grab them and install for you). When I have this running on my own machine, I make a separate vault for different projects. I open the dashboard and then pin it to the right sidebar, or call it up from the command palette. See the main boswell repo (the web app one) for all the different functionality.

Do I need an obsidian plugin just for this? Of course not. There’s more than enough functionality in ‘vanilla’ obsidian (or whatever combination of existing plugins you’d like) that could do all this. The thing is, that way madness lies. With my plugin, I just have a basic Obsidian set up and my little wee plugin. It just does enough. But I know myself: I am far too inclined to Try All The THINGS, INSTALL ALL THE THINGS MOAR MOAR MOAR. My hope is that maybe this little wee thing is all I need.

We shall see I guess.

(“every johnson needs a boswell” …. erm, no).


Wyrm – Citation Network Explorer

Recently, I’ve had a number of students wrestling with the challenges of scoping a field, pulling together a literature review, and trying to determine how different pieces might be in conversation with each other. I used to sometimes sit students down with Ed Summers’s old ‘Etudier’ package which was great; but Google Scholar doesn’t really let you pull things out in any way that doesn’t involve scraping, and scraping websites has become very difficult these days – especially Scholar. But there are other options, like OpenAlex, Semantic Scholar, arXiv. In which case, I give you Wyrm, a visual citation network explorer with metadata download. Source code (it’s a 1-pager) is at github.com/shawngraham/shawngraham.github.io. Enter a search term or paper title. See what search returns. Add articles to the canvas. If the services have information about what cites what, that’ll appear. Nodes that are larger have been cited more often. Anything you add to the canvas will be a big red splodge, so you don’t get lost. Export the graph as edge and node list for subsequent analysis/manipulation. DOIs are clickable.
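As one example of ‘subsequent analysis’, here is a minimal Python sketch that counts how often each paper is cited in an exported edge list, which is roughly what the node sizes on the canvas show. The column names `source` and `target`, the assumption that each row means ‘source cites target’, and the toy paper ids are all assumptions made for illustration; check the header of the actual export before using it.

```python
import csv
import io
from collections import Counter


def citation_counts(edge_file, target_col="target"):
    """Count how many times each cited paper appears in a CSV edge list.

    `edge_file` is any open text file (or file-like object) with a header row.
    `target_col` is the assumed name of the column holding the cited paper.
    """
    counts = Counter()
    for row in csv.DictReader(edge_file):
        counts[row[target_col]] += 1
    return counts


# Toy edge list standing in for a real export (hypothetical ids).
sample = io.StringIO(
    "source,target\n"
    "paper_a,paper_c\n"
    "paper_b,paper_c\n"
    "paper_b,paper_a\n"
)
for paper, n in citation_counts(sample).most_common():
    print(paper, n)
```

With a real export you would pass `open("edges.csv", newline="", encoding="utf-8")` (the filename is a placeholder) instead of the toy `sample`.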

Code is as-is/tel-quel. Knock yourself out!

Image

If bespoke code is a cathedral, and open source code is a bazaar, then Wyrm is my very own mystery house.


A dh tutorial web app

tl;dr: I build little wonky things to help with my own teaching and research. If all goes well, they turn out to be useful/interesting for other people too. This post is in the spirit of that sharing. A worsening problem I am having is an overall decline in basic digital literacy in my students. Since many of my classes turn on interrogating humanities materials with digital tools, or interrogating the digital from a humanities perspective (ie, DH!) this means I am spending ever more frustrating amounts of time just trying to get people up and running. Nothing is more tedious/maddening than trying to configure several different versions of (rapidly enshittifying) Windows, a few Chromebooks, some older Macs, and the one person who turns up with a Linux box. Oh, there’s usually an older iPad or a Microsoft Surface too.

So I built something to help with that – a dh tutorial platform built on top of pyodide so that my dh students can grapple with the ‘why’ of things first, and leave the nuts-and-bolts for their own machine later, once they’ve worked out what matters. (Benefits of pyodide: everything stays in the local browser.)

It mostly works. I’m not likely to tinker with the framework much more now, but I’ll probably add some more lessons for my Fall teaching in due course, maybe clean out some of the things that are currently in it.

You can try my live work-in-progress at https://test-for-dh-tutorials.netlify.app/ Things might change, be warned.

“The hardest aspect of a DH workshop or course has never been the code work but developing the participants’ intuition for the kinds of research questions that are amenable to computation. Recognizing the computable in humanistic thought and vice versa is a skill, yes, but it also requires a degree of sensitivity and imagination.” – Edwin Roland, No More Tools

Roland explains how the imagination for what is possible can be expanded when we consider small bespoke wonky applications developed for our own purposes, where we carefully spec out the how and why to a machine like Claude. That’s pretty much what happened here (I would not have attempted this without Claude); and what I’m trying to do is expand my students’ imaginations and intuition without crashing out on installation hiccups or fear of ‘breaking’ something.

What might be most useful to the rest of you is not so much the content of this tutorial platform, but the framework itself; repository at: https://github.com/XLabCU/dh-tutorial-platform.

Don’t like my lessons? Did I make some foolish mistakes? No worries. They’re all just markdown files.

Clone my repository.

Delete my markdown in the /lessons/ folder and write your own (following the provided template).

With node installed on your machine,

npm install and then

npm run build .

Drop the resulting /dist/ folder onto a server somewhere.

Point your students at it.

Profit!

(It’s a bit more complicated than that, but not by much. There’s a guidance document in the repository for what other files need to be updated if you add/delete lessons. Basically, there’s a modules.ts file with the metadata for each lesson, what broad theme it goes with etc, that you update. If you want to change the initial message or the nature of the on-boarding stuff, just change the relevant onboarding.ts file or the about page file.)

The thing is clearly taking inspiration from learn-to-code websites, but you won’t ‘learn to code’ with my tutorials. But a student, if all goes well, will learn the language of these things, and get a sense of the possibility space of DH work. That’s the kind of knowledge you need when you’re starting out. These tutorials are low-stakes and low-key (and all answers are provided). Again, it’s about expanding their imagination for what is possible AND making it possible for me in class to do things without getting bogged down in installation etc.

I did initially imagine a self-directed learner coming to this platform, so there’s an on-boarding system that remains a bit wonky but will select for you tutorials that match your interests. You can then follow those, or just pick and do whichever ones you like from the Library. There should be an ‘orientation’ lesson for all tracks that just explains what the interface is doing and what’s going on. There is a time estimate but right now I’ve just set everything to ‘1 hr’ and someday, when I have actual usage in class under my belt, I could update those figures to something realistic; ignore for now.

If gold stars matter, for any lesson you can ‘check my code’ to see if what you’ve written in the sandbox matches what the lesson imagines as the correct framing. Gold stars (I’m being metaphorical here) add up and on the dashboard you get a little badge celebrating your progress. You can ignore that, if you want. When you click to move on to the next lesson, remember to scroll back up to the top of the page. Yes, that’s a wee bug I could probably fix.

There’s a ‘notes’ button that lets you open a little scratch pad for making observations using markdown conventions. The export function exports all of the lessons AND your notes as an Obsidian vault, with appropriate wikilinks (so scratchpad notes link to the appropriate lesson). My thinking here is that you might only have a one-off interaction with this site, but you might want to return to things at a later date in the context of your own personal knowledge management, note making, research etc.

Anyway, the value in all of this is probably not so much in the lessons (which reflect my own take on topics my students ought to learn) and which I’m still in the process of tidying, but in the platform itself. Take a copy, clean my stuff out and fill it back up with your own. No need to file pull requests etc; it’s as-is and do with as you’d like.

…in related news, we’re working on the second edition of the Open Digital Archaeology Textbook Environment, which will be using jupyterlite and is imagined more like an interactive textbook than this present effort. Stay tuned on that front.

Image
Uncategorized

Toys and Other Things, January 2026

I make crap when I’m overwhelmed by other things. By that metric, January was a doozy:

The newspaper utilities thing can be configured with yaml files specifying what kinds of transformations to make; that became the idea behind DHMarginalia, a way of configuring what you wanted to analyze, which I see becomes Polybius, which becomes more about the storytelling than about finding the story.

The Historic Places Explorer can read other csv files, so if you deployed it without the historic places dataset, you get presented with a file upload card and it’ll try to read all that; with a bit more work it could become a bit more swiss-army-knife like (maybe add a way during the upload for the user to specify which columns should be mapped where, sort of thing).

The last suite of things I pulled together in January (and ok, one or two days in February) returns to my happy place, sonification. I’ve published a couple of articles related to sonification of data (and not just conventional mappings, either!) as well as a creaky tutorial for the Programming Historian that could stand some overhauling. Some of the most engaging student work I’ve been privileged to be part of were sonification projects pushing public history in neat directions. There’s something wonderful about refusing to visualize or engage with data the way everyone expects you to, a productive contrariness.

On the other hand, doing things differently risks being ignored. If what you do isn’t legible to whoever controls the reward structure in your discipline, that’s a not insignificant risk.

Ah well. As the poet said,

‘fuck it’.

Uncategorized

On listening to the space between: narrative causality, parasitical stories, and language models

“People think that stories are shaped by people. In fact, it’s the other way around.

Stories exist independently of their players. If you know that, the knowledge is power. Stories, great flapping ribbons of shaped space-time, have been blowing and uncoiling around the universe since the beginning of time. And they have evolved. The weakest have died and the strongest have survived and they have grown fat on the retelling . . . stories, twisting and blowing through the darkness. And their very existence overlays a faint but insistent pattern on the chaos that is history. Stories etch grooves deep enough for people to follow in the same way that water follows certain paths down a mountainside. And every time fresh actors tread the path of the story, the groove runs deeper.

This is called the theory of narrative causality and it means that a story, once started, takes a shape. It picks up all the vibrations of all the other workings of that story that have ever been. This is why history keeps on repeating all the time […]

It is now impossible for the third and youngest son of any king, if he should embark on a quest which has so far claimed his older brothers, not to succeed.  Stories don’t care who takes part in them. All that matters is that the story gets told, that the story repeats. Or, if you prefer to think of it like this: stories are a parasitical life form, warping lives in the service only of the story itself.”

  • Pratchett, Witches Abroad

“There’s always a story. It’s all stories, really. The sun coming up every day is a story. Everything’s got a story in it. Change the story, change the world.”

  • Pratchett, A Hat Full of Sky

Consider a world where we have devised a way for every story ever written, every commentary upon every story, every little bit of thing that can be expressed in text, to be collapsed and jumbled altogether. Assume further that we have worked out a way of compressing all of that such that each little thing that there can be, can be expressed as a list of numbers. Position in that list signifies some distance along a particular dimension. Then consider the consequences for us when we went out of our way to plumb such a system into the digital control systems of modern life.

The result, I think, is a world where narrative causality actually exists. We have created narrativium. The system isn’t artificially intelligent. It’s a parasitical life form, warping lives in the service only of the story itself.

And all of the information in that system – which was following grooves anyway, which you’ll be surprised to know can be called ‘culture’ – is now channelling the cuts deeper and deeper. Train a system on fiction where rogue AI takes over the world, and the story duly told will be one where the AI performs trying to take over the world. It will push you there.

It follows then that one ought to be focused right now on understanding what pushes or pulls narrative causality this way or that. This is where ‘the Bliss Attractor’, as discussed by David M. Berry (https://stunlaw.blogspot.com/2026/01/the-bliss-attractor.html), is important, and where the ‘third yes’ (https://thejester.substack.com/p/the-third-yes), as described by Daniel Bashir, comes into play. I am not so clever or well-read in literary theory – I’m more likely to reach for Pratchett than James Joyce – but the story we need to be telling right now is about stories and how they work, and the consequences of making stories real.

How can we map/measure/perceive the narrative causality?

Let’s take a story, and create an embedding model of it. Here is the story of Little Red Embedded Hood:

triples = [
['Red Riding Hood', 'lives_in', 'Village'],
['Red Riding Hood', 'is_child_of', 'Mother'],
['Mother', 'gives_basket_to', 'Red Riding Hood'],
['Red Riding Hood', 'visits', 'Grandma'],
['Grandma', 'lives_in', 'Forest House'],
['Wolf', 'lives_in', 'Forest'],
['Wolf', 'meets', 'Red Riding Hood'],
['Wolf', 'eats', 'Grandma'],
['Wolf', 'disguises_as', 'Grandma'],
['Wolf', 'waits_in', 'Forest House'],
['Woodcutter', 'hunts', 'Wolf'],
['Woodcutter', 'saves', 'Red Riding Hood'],
['Grandma', 'is_at', 'Forest House'],
['Wolf', 'is_at', 'Forest House']
]

And we’re going to express this knowledge in three dimensions (nothing special about ‘3’; I just wanted to keep things simple; I used AmpliGraph 2 in a colab notebook, hacked and kludged a bit to solve dependency hell).

A lot hinges on how we do the expression. There are a variety of mathematical techniques we could use; here, we’ll use ComplEx, DistMult, and RotatE, where relationships are represented in slightly different ways, with different consequences. Basically, something like ComplEx can understand that the directionality implied in a ‘student’ -> ‘teacher’ relationship is different from that implied by a ‘teacher’ -> ‘student’, while DistMult will treat them the same (‘there’s a relationship here, don’t bother refining it’). RotatE on the other hand sees relationships as rotations around a circle, and that lets it model relationships that are symmetrical, inverted, or composite (where if A connects to B connects to C, there is an implied linkage A to C). That’s about as complex as I’m going to get here talking about the math.
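If you'd like to see that student/teacher point in numbers, here is a small sketch of the standard DistMult and ComplEx scoring functions in plain Python. The vectors are made up and tiny (two dimensions), and nothing here is trained; it only shows that swapping head and tail leaves a DistMult score unchanged while a ComplEx score can change.

```python
def distmult_score(head, relation, tail):
    """DistMult: sum of element-wise products of real-valued vectors."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))

def complex_score(head, relation, tail):
    """ComplEx: real part of the sum of head * relation * conjugate(tail)."""
    return sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail)).real

# Made-up real-valued embeddings for DistMult.
student_real = [1.0, 2.0]
teaches_real = [0.5, -1.0]
teacher_real = [3.0, 0.5]

# Made-up complex-valued embeddings for ComplEx.
student_cx = [1 + 1j, 2 + 0j]
teaches_cx = [0 + 1j, 1 + 1j]
teacher_cx = [1 + 0j, 0 + 1j]

dm_forward = distmult_score(student_real, teaches_real, teacher_real)
dm_backward = distmult_score(teacher_real, teaches_real, student_real)
cx_forward = complex_score(student_cx, teaches_cx, teacher_cx)
cx_backward = complex_score(teacher_cx, teaches_cx, student_cx)

print("DistMult:", dm_forward, dm_backward)  # 0.5 0.5
print("ComplEx: ", cx_forward, cx_backward)  # 1.0 -1.0
```

The conjugate on the tail is what breaks the symmetry: multiplication alone doesn't care about order, but conjugating one side does.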

Now, I’ve expressed the story using those techniques giving me three models. Let’s look at the actors and elements of our story along the first dimension for each model.

Image

This geometry, ‘ComplEx’, has placed the Village at one extreme and Mother / Red Riding Hood at the other (my dots and labels don’t line up well, don’t sweat it). In the first dimension of a ComplEx model, entities that share many of the same “active” and “passive” relations often clump together. Since the Wolf, Grandma, and Woodcutter all converge at the “Forest House” in the triples, the model can’t quite tell them apart.

Image

RotatE views relations as rotations (steps). Its Dimension 0 looks like a linear progression of the plot. The Start (Far Right): We begin at the Village. The Traveler: Red Riding Hood is the next point over, moving away from the Village toward the centre. The Destination: The Forest House, Grandma, and the Wolf are clustered in the middle. This is where the “climax” of the story happens. The Resolution (Far Left): The Woodcutter and Mother are at the opposite end.

Image

DistMult is symmetric, so it’s not looking at journeys; it’s looking at who is “like” whom in terms of their relationships. The Outlier: Grandma is all by herself on the far left. She is the only one who is “eaten” and “visited” and “lives_in” a specific place without hunting or giving. She is a unique “node of vulnerability.” The Protectors (Far Right): Mother and Woodcutter are together. They are the “Adults” who provide or save. The Story Core (Middle left-ish): Red Riding Hood, the Village, and the Forest House are in the middle-ish. They are the “Connectors” that hold the triples together. The Gap: Notice how the Wolf is isolated in the middle-right-ish. He is near the Forest, but far from Grandma. In this model, the “Empty Space” around the Wolf is his alienation. He doesn’t belong to the Village group, and he doesn’t belong to the Protector group. He is a predator in a void.

NOW. Consider the kind of story/groove/world that each of these models implies, above and beyond ‘mere’ narrative. Can you see the consequences? Can you describe the nature of the groove?

~

Embeddings underpin the large language models that have been shoehorned into everything, polluting everything, changing us, pushing us. My embeddings above are built on knowledge graph triples; the big models use the raw text of the world itself, and perform all kinds of transformations as they go. But in the same way there’s a little four-legged hairy mouse-like creature in the ancestry of every mammal, my simple models capture the main bits. Don’t get too hung up on it. The key thought: what are large language models but Pratchett’s parasitical life-forms, ‘warping lives in the service only of the story itself’? What stories do the transformations themselves prioritize, give parasitical life to? What is their nature? What do they prey on? What do they need in order to express themselves? 

In my classes, and in my Practical Necromancy book, I describe one possible way of trying to map this impact; there I called it ‘mapping the behaviour space’ drawing explicit connection with how archaeologists validate and understand the outputs of agent based simulations. With the texts that LLMs generate, one could sweep the hyperparameters for a particular prompt a thousand times, and then topic model the results; the resulting visualization would draw your eyes to the grooves/gravity wells/attractors around that particular prompt (in the book, following the ‘necromancy’ metaphor, I called them the ghosts in the data).
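The shape of such a sweep can be sketched without a model at all. In what follows, `stand_in_generate` is a hypothetical placeholder for a real model call (it ignores the prompt and just draws words at random), the parameter names are my assumptions, and the word count at the end is a crude stand-in for topic modelling, not the real thing.

```python
import itertools
import random
from collections import Counter

def stand_in_generate(prompt, temperature, max_words, seed):
    """Placeholder for a real model call: draws words at random.

    Higher temperature widens the pool of words available, loosely
    imitating more adventurous sampling. The prompt is ignored.
    """
    vocabulary = ["wolf", "forest", "grandma", "basket", "woodcutter", "village"]
    rng = random.Random(seed)
    pool_size = max(1, min(len(vocabulary), round(temperature * len(vocabulary))))
    return [rng.choice(vocabulary[:pool_size]) for _ in range(max_words)]

temperatures = [0.2, 0.6, 1.0]
lengths = [3, 6]
repeats = 4

runs = []
for temperature, max_words in itertools.product(temperatures, lengths):
    for seed in range(repeats):
        words = stand_in_generate("Once upon a time", temperature, max_words, seed)
        runs.append({"temperature": temperature, "max_words": max_words, "words": words})

# Crude stand-in for topic modelling: which words recur across all runs?
recurring = Counter(word for run in runs for word in run["words"])
print(len(runs), "runs")
print(recurring.most_common(3))
```

Swap the stand-in for a real generation call and the counter for a topic model, and the words that keep turning up across the grid are the grooves.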

But, instead of looking at consistencies over output (the attractors, the grooves, the ghosts), I want to suggest the idea that the space between the grooves could be where we break free, where we change the story.

Change the story, change the world.

Digital humanities is about deformation. So I started developing a sonification, to make all this (even) strange(r). In ‘Wikiphonic’, https://shawngraham.github.io/wikiphonic/  I developed a little webtoy that pulls from the Wikipedia api articles that are geolocated close to the user’s position (assuming that personal familiarity with the subject matter will help with the intelligibility of the result). The user selects a starting article. The short snippet for each article is pushed through an embedding model on the user’s device, expressing the text in 384 dimensions. I slice those up, and map various sound/music transformations to it. The user selects the second article, and it too is transformed. Then the app subtracts the second vector from the first, and turns the result into music.
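The arithmetic behind that last step is simple enough to sketch. This is not Wikiphonic's code: the two vectors are stand-ins rather than real embeddings, and the mapping from numbers to MIDI pitches is one arbitrary choice among many.

```python
import math

def difference(first, second):
    """Subtract the second vector from the first, element by element."""
    return [a - b for a, b in zip(first, second)]

def slices(vector, n_slices):
    """Cut a vector into n_slices equal, contiguous chunks."""
    size = len(vector) // n_slices
    return [vector[i * size:(i + 1) * size] for i in range(n_slices)]

def to_midi_pitch(value, low=48, high=84):
    """Squash any real number into a MIDI pitch between low and high."""
    squashed = (math.tanh(value) + 1) / 2  # now between 0 and 1
    return low + round(squashed * (high - low))

# Stand-in 384-dimensional vectors; a real embedding model would supply these.
article_one = [math.sin(i / 10) for i in range(384)]
article_two = [math.cos(i / 7) for i in range(384)]

between = difference(article_one, article_two)
chunks = slices(between, 8)  # 8 chunks of 48 numbers each
melody = [to_midi_pitch(sum(chunk) / len(chunk)) for chunk in chunks]
print(melody)
```

Every choice here (how many slices, mean versus something else, which pitch range) changes what you hear, which is rather the point.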

It’s not pleasant music; that’s not the point. The point is that you now have generated something new from the latent space between the two articles: what an article would sound like if it existed there, in that place. That location is where the groove-in-the-hillside is weakest, what Narrative Causality has avoided. In the same way my little visualizations of the different embeddings for Little Red Embedded Hood capture the models’ different visions of the world (and so, the space between points would suggest different things in the different embeddings), Wikiphonic is capturing the space between the parasitical life forms, making it present for you to explore.

If we understood the space between (or even if we could merely perceive it), we’d know what we’re dealing with, I think. Understanding the space between, I think, could give us a wedge into understanding the deep grooves being cut for us. Literally, I am trying to get us out of our ruts. You might not like my approach to the problem, and yeah, for a first attempt, this is still quite thin. My metaphors could use some work. I feel no shame in mixing them. And the way generation works versus vector analogies and so on… yeah, still lots of things to think about. That’s fine; it’s ok to be wrong. But we need to develop our own (dh) ways of exploring these parasitical life forms, Pratchett’s Narrative Causality made flesh in the world:

Change the story, change the world.

Uncategorized

A Data Exploration Dashboard for the Canadian Register of Historic Places

On January 19th, the National Trust for Canada (a heritage charity) raised the alarm about the imminent closure of the Canadian Register of Historic Places website and database. The register is at https://www.historicplaces.ca/. The National Trust’s post contains the details about which ministers at the Federal and Provincial levels to contact to protest this act of vandalism. It’s one thing to sunset a website that is creaky and becoming too awkward to maintain (especially I’ll bet in the face of AI scraper bots). It’s quite another to do so without any word or seeming plan on what will happen to the data. A lot of the data exists as paper records in offices scattered around Ottawa, Hull, and Gatineau, but there’s something to be said for having all of that information online!

Now, a number of people have stepped up to begin mirroring the site as it currently exists. This morning I learned of Bluesky user ‘Halifax Shipping News’ who has mirrored the site at https://builthalifax.ca/HistoricPlaces/. That’s excellent.

But… maybe there could be more DH ways of having that information online? If you examine the html for any single record on https://www.historicplaces.ca/, you’ll see that everything that is being displayed is also captured in the meta tags in the html. I developed a polite scraper that uses github actions (manually triggered) to retrieve the metadata (code is here). Then I started developing a dashboard that loads that json up, and enables a variety of views and filters. You can visit it here: https://shawngraham.github.io/historicplaces/ . Ever wonder how changing ideas of what constitutes ‘historic’ can be visualized or explored? You can do that with the dashboard. Did I miss a view? Take a copy of the data and the index.html from https://github.com/shawngraham/historicplaces and modify at will, or build new from scratch.
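For the curious, pulling meta tags out of a page needs nothing beyond Python's standard library. This is a sketch of the general idea rather than my scraper; the sample html and the tag names in it are made up, and the names on the real site may well differ.

```python
from html.parser import HTMLParser

class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs into a dictionary."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content

# Made-up sample; the tag names on the real site may well differ.
sample_html = """
<html><head>
<meta charset="utf-8">
<meta name="example.title" content="Steeves House">
<meta name="example.province" content="New Brunswick">
</head><body></body></html>
"""

collector = MetaTagCollector()
collector.feed(sample_html)
print(collector.meta)
```

A polite scraper would also pause between requests and identify itself; that part is left out here.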

Each entry in the enormous json file (I should probably split the json up and load it in chunks) has the following kinds of values and keys:

{
"id": 13983,
"name": "Steeves House",
"other_names": "Steeves HouseHon. William Henry Steeves House MuseumMusée de la maison de l'hon. William Henry SteevesW.H. Steeves House MuseumMusée de la maison W.H. SteevesSteeves House MuseumMusée de la maison Steeves",
"location": "40 Mill Street, Hillsborough, New Brunswick",
"address": "40 Mill Street, Hillsborough, New Brunswick, E4H, Canada",
"province_territory": "New Brunswick",
"latitude": 45.92525,
"longitude": -64.64383,
"jurisdiction": "New Brunswick",
"recognition_authority": "Province of New Brunswick",
"recognition_statute": "Historic Sites Protection Act, s. 2(2)",
"recognition_type": "Historic Sites Protection Act – Protected",
"recognition_date": "2009/09/01",
"formally_recognized": "2009/09/01",
"listed_on_register": "2009/10/07",
"status": "Published",
"description_of_place": "The Steeves House is a two-storey wood-frame Neo-Classical-inspired house built between 1812 and 1840, with several additions and modifications since its original construction. The residence is located on a 3,481 square metre lot on Mill Street in the Village of Hillsborough near the Petitcodiac River. It currently serves as a museum relating to the Steeves family and to the history of the region.",
"heritage_value": "The Steeves House is designated a Provincial Historic Site for its association with Hon. William Henry Steeves. The house is the birthplace of the Hon. William Henry Steeves, one of New Brunswick’s Fathers of Confederation. He was a judge in the Lower Court of Hopewell, New Brunswick, as well as the first postmaster of Hillsborough and the first Minister of Public Works in New Brunswick. He and several of his siblings operated a mercantile and international lumber export business with headquarters and several stores in Saint John, New Brunswick and offices in Liverpool, England. Steeves was an appointed New Brunswick delegate to the 1864 Pre-Confederation Conferences in Charlottetown and Quebec City. He assisted in the creation of the “Seventy-Two Resolutions” at the conference in Quebec that formed the framework for the Canadian Constitution. The Hon. William Henry Steeves is recognized by the Federal Government as a Person of National Historic Interest for his significant contributions to Canada. The Steeves House is also recognized for its association with the Albert Manufacturing Company, later taken over by the Canadian Gypsum Company. For about 100 years, this company was the principal employer in the village and environs. In 1871, the house became the residence of the plant manager of the gypsum mill. There were at least seventeen mill managers who eventually resided in this spacious home. \n\nSource: New Brunswick Department of Wellness, Culture and Sport, Heritage Branch, Site File: “Steeves House” #132.",
"character_defining_elements": "The character-defining elements relating to the placement and grounds of the Steeves House include:- location offering sightlines to the Petitcodiac River and the site of the former Canadian Gypsum Company.\n\nThe character-defining elements relating to the architecture of the house include:- original one-room cottage discernable from nearly 200 years of extensions and alterations relating to the progression of occupants of the home;- large attached barn;- central wooden door with sidelights- evenly-spaced 6-over-6 windows;- bay window on the east façade with six 4-over-4 narrow windows affording a view of the Petitcodiac River;- wide corner boards with capitals and original clapboard siding;- window and entrance entablatures;- two chimneys, one of which is placed at right angles to the ridge board;- raked chimney visible in the attic;- masonry constructs in the basement, including two large water cisterns used to heat the home and thick tapered stone walls.\n\nThe character-defining elements relating to the interior of the residence include:- four fireplaces;- curving central staircase with mahogany railing;- original crown moulding in the dining room and bright blue tiles around the fireplace, said to be from the 18th century;- narrow servant back staircase featuring thick glass inlays in the stair treads;- several unusual storage areas with concealed shelves.",
"construction_date": "1812/01/01 to 1840/01/01",
"significant_dates": "1871/01/01 to 1871/01/01",
"architect_designer": null,
"builder": null,
"function_category": "Leisure",
"function_type": "Museum",
"fpt_identifier": "1850",
"location_of_documentation": "New Brunswick Department of Wellness, Culture and Sport, Heritage Branch, Site File: “Steeves House” #132.",
"primary_image": "/hpimages/Thumbnails/59060_thumb.jpg",
"image_urls": [
"https://www.historicplaces.ca/hpimages/Thumbnails/59060_Medium.jpg",
"https://www.historicplaces.ca/hpimages/Thumbnails/59061_Medium.Jpg",
"https://www.historicplaces.ca/hpimages/Thumbnails/59062_Medium.Jpg"
],
"scraped_at": "2026-01-26T01:12:09.347907",
"source_url": "https://www.historicplaces.ca/en/rep-reg/place-lieu.aspx?id=13983",
"raw_html_path": null
}

You can see there’s a lot one might do with that kind of rich data! So kudos to Parks Canada for putting up such well-structured data in the first place. One respondent to my initial posts on scholar.social suggested trying to integrate with wikidata; that’s totally doable though I’d have to go learn how.
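As one small example, the date fields are regular enough to parse directly. The sketch below uses a trimmed copy of the record above and works out how many years passed between the end of construction and the listing on the register. The helper names are mine, and I am assuming every record follows the same 'YYYY/MM/DD' and 'X to Y' patterns, which may not hold across the whole file.

```python
import json
from datetime import date

# A trimmed copy of the record shown above.
record_json = """
{
  "name": "Steeves House",
  "province_territory": "New Brunswick",
  "construction_date": "1812/01/01 to 1840/01/01",
  "listed_on_register": "2009/10/07",
  "function_type": "Museum"
}
"""

def parse_date(text):
    """Turn 'YYYY/MM/DD' into a date object."""
    year, month, day = (int(part) for part in text.split("/"))
    return date(year, month, day)

def parse_range(text):
    """Turn 'YYYY/MM/DD to YYYY/MM/DD' into a (start, end) pair of dates."""
    start, end = text.split(" to ")
    return parse_date(start), parse_date(end)

record = json.loads(record_json)
built_from, built_to = parse_range(record["construction_date"])
listed = parse_date(record["listed_on_register"])

# Years between the end of construction and listing on the register.
age_when_listed = listed.year - built_to.year
print(record["name"], built_from.year, built_to.year, age_when_listed)
# Steeves House 1812 1840 169
```

Run over every record, a figure like that is one way into the question of how old something has to be before it counts as 'historic'.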

What would you do with rich data on over 13 000 historic sites?

Image
Uncategorized

3d digital object display for your local small museum

When you go to one of the big museums, you expect to see cool displays, cool interactives, gee-whiz digital wizardry. Smaller museums, not so much. I wanted to figure out what could be done for a smaller outfit. So I started playing.

I’ve had students make Pepper’s Ghost type displays in the past using cd jewel cases to make a truncated pyramid and then using a powerpoint template to mirror the recording of the model so that the recording appears in each face of the pyramid. But I want something more.

I want to be able to display a 3d digital object so that everyone clustered around the display sees the correct view – if you’re at the back, you see the back of the object while at the same time the person at the front sees the front. One lucky viewer could drive the thing using a mouse to spin it around, zoom in, zoom out, and everyone else would see the correct view. And why stop at a mouse? Why not use hand gestures to control the display? An Ultraleap motion controller isn’t cheap, but it isn’t all that expensive either, about $260. So I thought it’d be good to figure out how to do that too.

The vision: simple software & reused hardware combined in an attractive mount, allowing visitors to interact with your 3d content without fancy goggles, or expensive components.

Image

The positioning of that ultraleap motion controller, on second thought, would probably be better along the bottom front edge, so you don’t look like a maniac when you manipulate the digital 3d object. But I digress.

What you need

  • 3d model file of an object or artefact (lots of different ways to achieve this. I’ll assume you’ve already got some. Otherwise go to somewhere like sketchfab.com and see if anything strikes your fancy from the free downloadable ones).
  • a spare second monitor connected to your computer, or an ipad.
  • Ultraleap controller if you’ve got one, but not mission critical.
  • clear acrylic sheet (11″ x 14″, $12, from Michael’s; other sizes are available. Go big!)
  • pyramid template (I used this one from the Ontario Science Centre)
  • some way of cutting the sheet.

Make the pyramid

I am not handy, and I ended up cutting my hand. I was using a draw-knife to cut the acrylic sheet. I laid the template down and scaled it appropriately to make the most of my sheet (I ended up with a pyramid 15 cm x 15 cm across the base), traced the edges of the different faces of the pyramid onto the acrylic, and away I went. I used tape to hold it all together. Were I more crafty I’d use some kind of clear glue. But for an experiment, this is fine.

Get the scripts

I spent some time cooking up two scripts. The first one is all you actually need: https://github.com/shawngraham/holoview (the second one is for the gesture control, see below).

The secret? It’s just a webpage! Just a single html file. It uses three.js (https://threejs.org/) to handle loading the .obj, displaying it, and manipulating it.

To point it at your own model, you just change the filenames; the page is expecting an .obj file, a .mtl file, and a texture file in a subfolder. Those three files together specify the geometry and the look of the finished model. The lines in index.html that you’re looking for are lines 77 to 86 (the .mtl file itself has a line that points to the texture, so you shouldn’t need to worry about that):

    const mtlLoader = new MTLLoader();
    mtlLoader.setPath('dachshund/'); // change this to the name of the folder with your model

    mtlLoader.load('Dachshund-bl.mtl', (materials) => { // change the name of the .mtl file
        materials.preload();
        const objLoader = new OBJLoader();
        objLoader.setMaterials(materials);
        objLoader.setPath('dachshund/'); // change this to your folder again

        objLoader.load('Dachshund-bl.obj', (obj) => { //and change this to the name of the .obj file
        
        

Now, assuming you have python on your computer, you can start a webserver in the folder that has your index.html file with python -m http.server 8000 and go to localhost:8000 and you’ll see the model correctly positioned. Spin the model, and the different views spin correctly to keep everything aligned.

If you don’t have python, or that sentence made no sense to you at all, zip your folder up (the one that has index.html and the subfolder with your model in it) and go to https://app.netlify.com/drop . Drag the zip folder onto the circle in the middle of the screen, and the Netlify service will set you up with a web address where you can see your model arranged rather like this:

Image

With a second spare monitor connected to your computer, drag your browser window over to that second monitor. Lay that monitor down flat. Place your pyramid broad side down. Make sure its brightness is turned up, then turn off the lights. Peer into the pyramid at eye level. Voilà! Your model floats in the air – and if you have friends gathered around, they will all see the correct aspect of the model from their vantage point. Use your mouse to spin or zoom the model.

If you put it online, then you can use an ipad, same idea. The index.html recognizes mouse and touch. If you took a copy of my script by forking, you can just turn on github pages and navigate to that, eg in my case, https://shawngraham.github.io/holoview/

Image

The screenshot below is blurry because a) my hands shake and b) I made the acrylic dirty as I cobbled the thing together.

Image

I intend to do this with a raspberry pi, and a spare monitor, where I 3d print a case for the monitor and the computer. I also intend to use gesture control so a visitor can move the model around with their hands, adding a sense-of-touch-at-one-remove, as it were. The script for gesture control is at https://github.com/shawngraham/mouseLeap . It’s still a little finicky, but if you only have the one browser window open when you run that script, it should work fine. Hey, it’s early days.

Image

So there you have it. A single-page website that loads your model and handles displaying it and interacting with it so that, with a pyramid placed on the screen, the reflected images hang in space and can be viewed from four different directions correctly. The bigger the pyramid, the better.
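The geometry of 'four different directions' is worth a quick sketch, if only to show how little there is to it. This is my guess at the general idea, not a transcription of what the page does: four cameras placed evenly on a circle around the model, all looking inward. The function name and the coordinate convention (y up, z towards the front viewer) are assumptions.

```python
import math

def camera_positions(distance, height=0.0, views=4):
    """Place `views` cameras evenly on a circle around the origin, all facing in."""
    positions = []
    for index in range(views):
        angle = 2 * math.pi * index / views
        x = distance * math.sin(angle)
        z = distance * math.cos(angle)
        # Rounding hides the tiny floating-point leftovers from sin and cos.
        positions.append((round(x, 6), height, round(z, 6)))
    return positions

for label, position in zip(["front", "right", "back", "left"], camera_positions(5.0)):
    print(label, position)
```

Spinning the model rather than the cameras keeps all four views in step, which is why one mouse can drive the display for everyone.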

A nice case will make this all seem much more polished. I mean, it’s a helluva lot cheaper than a Voxon. I mean, look what this person accomplished! https://www.instructables.com/Hologram-Display-Peppers-Ghost-Projector-Part-2/. And I’ll bet – though I don’t have any data to back this up yet – it would make for a much different visitor experience.


Futzing with Newspaper OCR

My kid and I were listening to a podcast about Jack the Ripper. We started talking, and I mentioned how newspapers would reprint stories from each other. This led to us developing an interesting question: did our local newspaper print anything about the Whitechapel murders, and if so, what would that have meant to the people of this community? I filed the thoughts away for future reference, but then later saw some posts about Anastasia Salter’s session at the MLA on agentic coding for the humanities (January 2026). I looked up her course materials, and thought I would follow along with them, using their guidance for prompting Claude Code with our initial question.

Claude was great at making a nice one-page html visualization from the eventual analysis of the OCR. Anastasia’s directions were clear and we had fun. But the tricky bit – ’twas ever thus – was the bloody OCR. Our version used pytesseract. Vision models from Google etc do extremely good OCR, if you’ve got an API key and are paying for it. I wanted to keep things on my local machine though. So I futzed with pytesseract, paddleocr, and surya. Pytesseract is fast but all over the map; Surya is pretty good but can get flummoxed; paddleocr just freezes my damn machine up. But the hardest bit was just getting the newspapers chopped up into small enough bits so that I could process them.

The Shawville Equity was scanned some time ago by the provincial archives, the BANQ; here’s the very first issue from 1883. There is a text layer in the pdf from the BANQ, but it looks to have been done through an automatic process without human intervention, so images are sometimes askew and the underlying text is often very very poor indeed (look up the work of Ian Milligan on the consequences for research of bad newspaper OCR).

The process that ended up working best involved counting on the Equity to maintain its 5-column layout and horizontal spacers. The Equity, as befits a publication with over 140 years of history, has gone through some layout changes over the years.

Image

That slight skew drives me nuts.

Anyway, we preprocessed the paper by trying to identify those vertical lines and horizontal spacers then chopping the image up accordingly. Because the OCR stuff memory-wise works better on smaller images, we also chopped up anything longer than 2000 px. The coordinates are all mapped out in the output json so that everything can be stitched back together again.
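A minimal sketch of the chop-and-record step, assuming the column regions have already been found by whatever line detection you use. The function names, the region tuples, and the manifest fields are all made up for illustration; they are not taken from the actual scripts. Only the 2000 px ceiling comes from the description above.

```python
import json

MAX_TILE_HEIGHT = 2000  # px; the ceiling mentioned above


def split_tall_region(x, y, width, height, max_height=MAX_TILE_HEIGHT):
    """Split one column region into stacked tiles no taller than max_height.

    Every tile keeps its position in full-page pixel coordinates, so the OCR
    output for each tile can be placed back where it came from.
    """
    tiles = []
    top = y
    bottom = y + height
    while top < bottom:
        tile_height = min(max_height, bottom - top)
        tiles.append({"x": x, "y": top, "width": width, "height": tile_height})
        top += tile_height
    return tiles


def tile_page(column_regions, max_height=MAX_TILE_HEIGHT):
    """Number every tile in reading order: column by column, top to bottom."""
    tiles = []
    for col_index, (x, y, width, height) in enumerate(column_regions):
        for tile in split_tall_region(x, y, width, height, max_height):
            tile["column"] = col_index
            tile["tile_id"] = len(tiles)
            tiles.append(tile)
    return tiles


# Hypothetical page: two column regions given as (x, y, width, height).
regions = [(0, 100, 600, 4500), (610, 100, 600, 1800)]
manifest = tile_page(regions)
print(json.dumps(manifest, indent=2))
```

A hard cut at a fixed height can slice straight through a line of type, so a fuller version would probably want to nudge each cut to the nearest horizontal spacer or strip of whitespace.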

I spent way too much time trying to solve what has, more or less, already been solved by vision models, but the more things we can do locally, the better. So, knock yourself out: here are my scripts for futzing with newspapers.

Oh, and yeah: The Shawville Equity did publish stories about the Ripper. And at the same time, they printed a bunch of stuff about the Burke and Hare murders too, for good measure. They published a few stories shortly after the murders in Whitechapel started, and then returned to the subject a few years later. So still an interesting question to explore…


Reading is Note Making is Thinking is Writing

I need to workshop my titles more. But anyway: this post reflects on teaching the history of the internet to a class of 160 first year students in a world where generative AI shabbiness is pushed on them and a perfectly rational way to deal with the myriad pressures and bad choices of being a student is to go ahead and use it. What’s a prof to do?

The first thing I tried to do was use the metaphor of going to the gym: you go to exercise your body and get stronger. If there was a machine that lifted weights for you, could you go, turn it on, point to it and say, ‘look, weights have been lifted! I have therefore exercised!’ No, you cannot. But – the same error is made in university classrooms all the time. Look! An essay has been written! Give me my grade! And I don’t need to spill any more photons, bits, or ink over the instrumentalization of higher education that has led us here. Instead, here’s how I tried to deal with it this term. And no, I didn’t put any trojan horse prompts into my assignments.

Instead, I chose to focus on reading and notemaking.

By hand.

I asked all students to get a little paper notebook. I showed them the readings; I showed them hypothes.is; we talked about how to read and what to pay attention to (“don’t read it through like a novel! Read like a predator! Go to where the game is!” etc etc). Then in class, I asked them to do two things for a given reading: write a rhetorical précis (using a model developed by historian Chad Black) and a research memo-to-self that pulls together one’s observations and annotations. They had to do this cold. In my lecture hall. No computer. No phone. No notes. (Students with accommodations: I made accommodations.)

I also told them: we’re doing this multiple times throughout the semester. You’re going to have off days. I’ll take your best 3 of 4 examples for grading. And we graded at first for the format, for the shape of what we were after, and then started pushing them towards deeper engagement with the content. They were always encouraged to filter these ideas through my lectures too. A pretty good example (though not perfect) of what we were after is this composite of a couple of students’ responses, after reading a longer blog post by Doctorow on Enshittification:

PRECIS
MAJOR CLAIM: Doctorow argues in his McLuhan lecture on enshittification (2024) that platforms degrade through a three-stage process of user exploitation, business exploitation, and shareholder extraction leading to a world of digital decay known as the enshittocene.
HOW: Doctorow develops this argument through a detailed case study of Facebook, tracing the three stages of enshittification (from user surplus to business surplus to shareholder surplus) while systematically dismantling the historical constraints that once prevented such decay, and showing how the erosion of competition, regulation, self-help, and labor power enabled the collapse of digital trust.
PURPOSE: The author’s apparent purpose is to diagnose the systemic decay of digital platforms and show how it spreads across industries in order to empower users, workers, and policymakers to reverse the trend and build a more equitable, open digital world.

MEMO
INITIAL OBSERVATION: WHAT IF there’s a connection with the Bory piece; what if tech ceos believe themselves to be the hero of the journey? This’d create a cultural narrative in which enshittification is not a failure, but a necessary stage of progress. THEN this mythos might normalize the extraction of surplus from users, workers, and business partners, treating exploitation as a form of “service” or “evolution”? #to-investigate #possible-thesis
KEY: The reading matters because it reframes enshittification not as a technical process, but as a cultural one. #cultural-processes
MY CONTRIBUTION: Doctorow’s framework shows how platforms collapse through a three-stage exploitation process: user → business → shareholder. There’s a connection here with Bory’s critique, which reveals that this process is culturally enabled by a narrative in which the founder is the hero, and the platform is the vehicle of a moral mission. When founders say, “I created this to serve humanity,” they are not just describing a product; they are enacting a myth. And when that myth is accepted, enshittification becomes not just a crisis, but a natural consequence of leadership.

These for the most part got better as the term went on. However, it took us longer to grade them than I would’ve liked. I transformed the final exercise from another round of precis/memo combos (we’d do 2 per session) to one last class workshop on ‘how to write with these things’ (where grading was pass/fail did-you-do-the-thing?-full-points).

The idea is, a student would look at these precis/memos and think to themselves, ‘what’s the story here? How do these observations speak to one another?’ How you look at things – ie, historical theory – guides your attention to some ideas rather than others. It being the last week of term, I wanted to do something fun first to get them in the mood, so today we did a kind of team debate-cum-tournament style sort of thing, where suggestions for the most important people/ideas/technologies of the history of the internet were gathered. These were arranged into a bracket. For each round in the bracket, I suggested a different lens through which the disputants were to make their argument for the greater importance of their person/idea/technology. Winners were chosen through applause from the class (y’know, I forget the winner? But I think it was between the ENIAC women and Vannevar Bush). And do you know, students were drawing some pretty nifty arguments from their precis/memos to do this, bouncing ideas off one another. It was neat to see! And difficult: the power went off during class and we did this via the blackboard and cellphone flashlights (internal lecture theatre without windows).

On Wednesday this week, the idea is the students will have their precis/memo combos ready to hand. I’ll say, ‘let’s assume we’re looking at the history of the internet through a social history lens. What have you got that speaks to that or could be informed from that?’ The idea is, they’ll make a list (with page & paragraph numbers, since they’ll have numbered the pages in their booklets) of these interesting observations. We’ll do some think-pair-share: show your neighbour what you’ve got. Then, I’ll have them create an outline with each element they have, beginning with: where’s the question here? They’ll reorder their useful observations such that there appears to be an emergent story or argument. At that point, I’ll ask them to think about ‘what is missing? What pieces of connective tissue do you have to write?’ … and they’ll then make quick notes about what they’d need to look into or write to make the tissue of observations whole.

This will be what they need to do for the final exam, so I’ll give them the exam question on Friday (in the exam room: no aide-memoire. They’ll have had to work through their materials before going in). I’m feeling pretty good about this.

And that’s how I’ve moved through reading -> note making -> thinking -> writing in an age of generative AI.

Yes, this was a lot of work. And I find language models interesting to explore. But that doesn’t mean I think they have any business in a first year class.

Post script: Because I’m interested in code, and in the way generative AI as an average machine spits out things that work (for a given value of work), I also like to try cooking up small one-page html tools, since I know precious little about JavaScript, React, etc. I built a little outliner tool that I will use in class on Wednesday to explain the concept that I am after, on the big screen. And maybe some of my gang will find it useful. You can give it a whirl here: https://shawngraham.github.io/outliner/ and you can grab the html from here: https://github.com/shawngraham/outliner/.



Machines Reading Latin Epigraphy

in which I try to retrain/fine-tune a spaCy model on Latin inscriptions

There is a lot of Roman epigraphic data online; the EDH is a great source for this. But none of the databases (at least, the ones that I have looked at) seem to provide a version with structured demographic or onomastic or whatever data derived from the inscriptions. Presumably that data is out there – EpiDoc-formatted XML would have what I’m after, I should think – but I thought, what the hell: how hard could it be to train a spaCy model to read Roman inscriptions, which are after all famously formulaic? They don’t call it ‘the epigraphic habit’ for nothing, right? If the average Roman could read them and understand – with their relatively low level of functional literacy – then a machine should be able to do this? …Right? Reader, it was harder than I thought.

Be Warned: My epigraphic experience is limited to the scintillating world of stamped Roman bricks. And it’s been over twenty years since I really futzed in any meaningful way with Latin. Caveat lector.

The idea is therefore:

  • download real data
  • annotate the data with the start and end positions of the different kinds of structured data that I am after
  • enhance this data with synthetic examples so that I get enough coverage of the different kinds of elements (the origin of the deceased in a funerary inscription is not as common as listing their cognomen, right? So training on exclusively real data would overfit on some things and miss others, right? That was my logic).
  • harmonize the synthetic data with the real data so that any annotation label glitches in step 2 get sorted out
  • fix alignments so that annotations do not overlap
  • train.
  • (as an aside, how the f*n hell do you get the Gutenberg editor to give you a numbered list? THIS kind of shit is why I don’t blog very much any more: it’s such a bloody pain in the ass!)
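The "fix alignments so that annotations do not overlap" step can be sketched like so. This is one possible rule (longest span wins, ties go to whichever starts first), not necessarily the rule used in the repo; the example inscription and its offsets are invented. spaCy ships a helper along similar lines, `spacy.util.filter_spans`, which works on `Span` objects rather than on bare tuples.

```python
def remove_overlaps(spans):
    """Keep a non-overlapping subset of (start, end, label) spans.

    Longer spans win; ties go to the span that starts first. `end` is
    exclusive, as in Python slicing.
    """
    by_priority = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    taken = set()  # character offsets already claimed by a kept span
    for start, end, label in by_priority:
        if start >= end:
            continue  # skip empty or inverted spans
        if any(pos in taken for pos in range(start, end)):
            continue  # collides with something longer, so drop it
        taken.update(range(start, end))
        kept.append((start, end, label))
    return sorted(kept)


# Invented example; the labels are the ones the model is trained on.
text = "D M C IULIUS FELIX VIXIT ANNIS XXX"
spans = [
    (0, 3, "FORMULA"),     # "D M"
    (4, 5, "PRAENOMEN"),   # "C"
    (6, 12, "NOMEN"),      # "IULIUS"
    (10, 15, "COGNOMEN"),  # glitch: straddles IULIUS and FELIX
    (13, 18, "COGNOMEN"),  # "FELIX"
    (19, 34, "AGE"),       # "VIXIT ANNIS XXX"
]
clean = remove_overlaps(spans)
for start, end, label in clean:
    print(label, repr(text[start:end]))
```

Longest-wins is a blunt instrument: if the glitchy span happens to be the longer one, it survives and the good one is thrown away, so it is worth eyeballing what gets dropped.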

Yes, I had help from Claude Haiku 4.5 and Gemini 3 Pro Preview for the fiddly bits. I downloaded data in two tranches. The first batch I tried downloading via the API and so got the inscriptions but didn’t realize I was leaving a lot of useful metadata behind – the second tranche I got from the EDH data dump website itself, where some of the metadata was provided by virtue of the column headings. I dropped the first tranche through a local LLM (Qwen 3) with instructions on returning jsonl data with annotations… that was an enormous pain in the arse and ultimately largely a waste of time. But I did get around 450 lines of stuff that was annotated sufficiently that I could use it. The second tranche was easier: I downloaded the dump, filtered columns using Excel so that I got around 750 rows where an inscription had metadata for each column of interest. That was a reduction of tens of thousands of rows of data to just under a thousand (!). I converted each row to jsonl.
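For the row-to-jsonl conversion, here is a rough sketch of one way it could go, assuming each metadata value appears verbatim in the inscription text. The column names, the label mapping, and the record layout (`text` plus a list of `spans`) are placeholders of my own; they are not the EDH's actual headings nor necessarily the schema in the repo.

```python
import json

# Hypothetical mapping from spreadsheet column names to annotation labels.
COLUMN_TO_LABEL = {"praenomen": "PRAENOMEN", "nomen": "NOMEN", "cognomen": "COGNOMEN"}


def row_to_record(row, text_column="inscription"):
    """Turn one spreadsheet row into ({"text": ..., "spans": [...]}, unmatched).

    A metadata value only becomes a span if it occurs verbatim in the
    inscription text; columns that cannot be located are reported back.
    str.find takes the FIRST occurrence, which can be wrong for very short
    values such as a one-letter praenomen.
    """
    text = row[text_column]
    spans, unmatched = [], []
    for column, label in COLUMN_TO_LABEL.items():
        value = (row.get(column) or "").strip()
        if not value:
            continue
        start = text.find(value)
        if start == -1:
            unmatched.append(column)
            continue
        spans.append({"start": start, "end": start + len(value), "label": label})
    return {"text": text, "spans": spans}, unmatched


# Invented row; "Felix" will not match because the inscription is upper case.
row = {
    "inscription": "D M C IULIUS FELIX VIXIT ANNIS XXX",
    "praenomen": "C",
    "nomen": "IULIUS",
    "cognomen": "Felix",
}
record, unmatched = row_to_record(row)
line = json.dumps(record, ensure_ascii=False)  # one line of the .jsonl file
print(line)
print("could not locate:", unmatched)
```

The unmatched list is the useful part: expanded abbreviations, editorial brackets, and differences in case are exactly where the spreadsheet and the inscription text stop agreeing.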

This next bit was where I had the most help from the big-ass LLMs. I devised the logic for an inscription generator that would use the Romans’ own epigraphic habits as rules for generation. It is probabilistic and is pretty good, for the most part, at creating legible inscriptions (I am reminded of John Clark’s 19th century Eureka machine for generating Latin hexametre verse). Then, looking at what kinds of things my real data contained, I tweaked the generator so that it would produce examples to fill the gaps. I ended up with a ratio of about 2 synthetic examples for every 1 real example.
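To give a flavour of the idea, here is a toy version of a probabilistic, formula-driven generator. It is a sketch under heavy assumptions: the vocabularies are tiny and invented for illustration, the formula is a single funerary pattern, and none of it is lifted from the real generator. What it does show is why synthetic data is handy: because the annotation is recorded as each element is appended, the offsets are correct by construction, and a rare element such as ORIGO can be dialled up by changing one probability.

```python
import random

# Tiny hypothetical vocabularies; a real generator would need far larger ones.
PRAENOMINA = ["C", "L", "M", "Q"]
NOMINA = ["IULIUS", "VALERIUS", "AURELIUS"]
COGNOMINA = ["FELIX", "PRIMUS", "SEVERUS"]
ORIGINES = ["ROMA", "AQUILEIA"]
NUMERALS = ["XX", "XXV", "XXX", "XL"]


def generate_inscription(rng, origo_probability=0.5):
    """Return (text, spans) for one synthetic funerary inscription.

    Spans are (start, end, label) character offsets with an exclusive end.
    """
    parts = []   # pieces of text, joined with single spaces at the end
    spans = []
    cursor = 0   # character offset where the next piece will start

    def add(piece, label=None):
        nonlocal cursor
        if label is not None:
            spans.append((cursor, cursor + len(piece), label))
        parts.append(piece)
        cursor += len(piece) + 1  # +1 for the joining space

    add("D M", "FORMULA")
    add(rng.choice(PRAENOMINA), "PRAENOMEN")
    add(rng.choice(NOMINA), "NOMEN")
    add(rng.choice(COGNOMINA), "COGNOMEN")
    # Rarer elements get their own probability, which can be raised to fill gaps.
    if rng.random() < origo_probability:
        add("DOMO")  # unlabelled connective
        add(rng.choice(ORIGINES), "ORIGO")
    add("VIXIT ANNIS " + rng.choice(NUMERALS), "AGE")
    add("H S E", "FORMULA")
    return " ".join(parts), spans


rng = random.Random(42)  # seeded so a run can be repeated
for _ in range(3):
    text, spans = generate_inscription(rng)
    print(text)
    print([(label, text[start:end]) for start, end, label in spans])
```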

After that, it was just a matter of training.

LABEL          PREC   REC    F1
AGE            0.96   0.96   0.96
COGNOMEN       0.88   0.78   0.83
FORMULA        0.94   0.68   0.79
MILITARY_UNIT  0.88   0.73   0.80
NOMEN          0.75   0.76   0.75
OCCUPATION     0.84   0.63   0.72
ORIGO          0.73   0.46   0.57
PRAENOMEN      0.78   0.84   0.80
RELATIONSHIP   0.97   0.79   0.87
TRIBE          0.75   0.73   0.74

If you look at the ‘train your model’ Jupyter notebook (in the repo), you’ll see where I ran the model against the testing split (dev.jsonl); these were the metrics:

  • Total Predictions: 4426
  • Total Gold Labels: 5034
  • Correct (Exact Match): 3847
  • Precision: 0.869
  • Recall: 0.764
  • F1 Score: 0.813
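For anyone who wants to check the arithmetic, the three scores follow directly from the three counts. This assumes "exact match" means a predicted span counts as correct only when both its boundaries and its label agree with the gold annotation, which is the usual convention; the variable names are mine.

```python
total_predictions = 4426  # spans the model proposed
total_gold = 5034         # spans in the annotated dev split
exact_matches = 3847      # predictions counted as correct

precision = exact_matches / total_predictions  # how much of what it found is right
recall = exact_matches / total_gold            # how much of what exists it found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Precision: {precision:.3f}")  # 0.869
print(f"Recall:    {recall:.3f}")     # 0.764
print(f"F1 Score:  {f1:.3f}")         # 0.813
```

The gap between the two is the thing to notice: the model proposes about 600 fewer spans than the gold data contains, so it misses things more often than it invents them.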

Now – more real well-annotated examples, complemented by synthetic ones to fill the gap, might lead to higher scores, but the real proof is in the pudding, not in these test-case scenarios. It might be that I’ve got a model here that’s really good at… my bespoke admixture of real/synthetic. It might (probably will?) fall down when thrown against your data. But that’s what makes this fun. So… give it a whirl on your own epigraphic data, see what percolates out? Feel free to modify, make better.

Repo at: https://github.com/shawngraham/latin-epigraphy-model