Text Mining the Republic of Letters
Tags: Digitization, Editions, Electronic Enlightenment, France, Mapping the Republic of Letters, Networks, Text Mining, Visualization
Podcast available on the seminar page!
In the fourth paper of our seminar series on Thursday 17 May, Dr Glenn Roe – formerly of the University of Chicago, and current Mellon Fellow in Digital Humanities at Oxford’s OERC – gave a sophisticated and suggestive paper on ‘Text-Mining Electronic Enlightenment: Influence and Intertextuality in the Eighteenth-Century Republic of Letters’.
Building on his recent work with the Electronic Enlightenment corpus and other online repositories of long-form historical text, Glenn began his talk by observing an irony: the recent efflorescence of big data, culturomics, network analysis, and other quantitative approaches to culture, which in many cases favour the macro interpretation of metadata over content, has authorized and promoted a convention of ‘not reading’ within the digital humanities, in which historical texts themselves can be marginalized or effaced altogether by the superabundance of information. Correspondence is highly susceptible to such an approach, both because letters are readily modelled as a finite number of abstract datapoints (sender, recipient, and so on) and because of the vast quantities of diverse and often disorganized information exchanged within epistolary systems.

Glenn during discussion.

Visualizing influence.
As a supplement to this ‘distant’ reading, Glenn went on to demonstrate the potential of the latest machine-learning technologies to render significant volumes of transcription meaningful via text mining and the automated creation of patterns, frequencies, statistical models, and other forms of ‘mediated’ or ‘directed’ reading. Glenn distinguished between three kinds of text mining: predictive classification (used to generate new categories from unprocessed texts); comparative classification (used to correct and refine existing categories within processed texts); and similarity (used to measure broader similarities between documents and parts of documents, especially to identify meaningful borrowing and instances of intertextuality). He then illustrated each approach with a rich series of examples drawn from his work with the ARTFL Encyclopédie Project and, most recently, Electronic Enlightenment, before concluding his analysis by presenting – with caveats – some preliminary radial visualizations of textual influence generated using the D3 JavaScript library.
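For readers curious about what the third, similarity-based kind of text mining can look like in practice, here is a minimal sketch in Python. It is not the method Glenn presented, only one simple and common way of spotting possible borrowing: it breaks two texts into overlapping three-word sequences and reports those they share, along with a Jaccard score for the overlap. The function names (`word_ngrams`, `shared_passages`, `jaccard`) and the two sample sentences are invented for illustration.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of n-word sequences (shingles) found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(a, b, n=3):
    """Return the n-grams common to both texts, a crude signal of borrowing."""
    return word_ngrams(a, n) & word_ngrams(b, n)


def jaccard(a, b, n=3):
    """Jaccard similarity of the two texts' n-gram sets, from 0.0 to 1.0."""
    x, y = word_ngrams(a, n), word_ngrams(b, n)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


# Invented sample sentences (assumption: not drawn from any real corpus).
letter = "I have read with great pleasure the article on natural law"
article = "The article on natural law was read in Paris last winter"

for ngram in sorted(shared_passages(letter, article)):
    print(" ".join(ngram))
print(round(jaccard(letter, article), 3))
```

A real corpus would call for more care than this (spelling variation, stop words, longer matching windows), but the sketch shows the basic idea: shared word sequences are the raw evidence, and a human reader still has to decide which of them count as meaningful intertextuality.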