
Text Mining the Republic of Letters

Podcast available on the seminar page!

In the fourth paper of our seminar series on Thursday 17 May, Dr Glenn Roe – formerly of the University of Chicago, and current Mellon Fellow in Digital Humanities at Oxford’s OERC – gave a sophisticated and suggestive paper on ‘Text-Mining Electronic Enlightenment: Influence and Intertextuality in the Eighteenth-Century Republic of Letters’.

Building on his recent work with the Electronic Enlightenment corpus and other online repositories of long-form historical text, Glenn started his talk by observing an irony: the recent efflorescence of big data, culturomics, network analysis, and other quantitative approaches to culture – focusing in many cases on the macro interpretation of metadata over content – has authorized and promoted a convention of ‘not reading’ within the digital humanities, in which historical texts themselves can be marginalized or effaced altogether by the superabundance of information. The ready modelling of letters as a finite number of abstract datapoints (sender, recipient, and so on) and the vast quantities of diverse and often disorganized information exchanged within epistolary systems make correspondence highly susceptible to such an approach.


Glenn during discussion.


Visualizing influence.

As a supplement to this ‘distant’ reading, Glenn went on to demonstrate the potential of the latest machine-learning technologies to render significant volumes of transcription meaningful via text mining and the automated creation of patterns, frequencies, statistical models, and other forms of ‘mediated’ or ‘directed’ reading. Glenn distinguished between three kinds of text mining: predictive classification (used to generate new categories from unprocessed texts); comparative classification (used to correct and refine existing categories within processed texts); and similarity (used to measure broader similarities between documents and parts of documents, especially in terms of the identification of meaningful borrowing and instances of intertextuality). He then demonstrated each kind of approach within a rich series of examples drawn from his work with the ARTFL Encyclopédie Project, and most recently Electronic Enlightenment, before concluding his analysis by presenting – with caveats – some preliminary radial visualizations of textual influence generated using the D3 JavaScript library.
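For readers unfamiliar with the third kind of text mining, the idea of measuring similarity and flagging possible borrowing between two documents can be illustrated with a very small sketch. This is not the method Glenn presented; it is a generic technique (overlapping word n-grams, or ‘shingles’, compared with Jaccard similarity), and the function names and the two sample sentences are invented purely for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shared_passages(text_a, text_b, n=3):
    """Return the n-grams that occur in both texts (candidate borrowings)."""
    return shingles(text_a, n) & shingles(text_b, n)


# Toy inputs, made up for this example (not drawn from any corpus).
letter_a = "Man is born free, and everywhere he is in chains."
letter_b = "As he writes, man is born free, yet he remains bound."

score = jaccard(shingles(letter_a), shingles(letter_b))
shared = shared_passages(letter_a, letter_b)

print(f"Jaccard similarity: {score:.3f}")
for gram in sorted(shared):
    print("shared:", " ".join(gram))
```

On real correspondence one would expect to need longer n-grams, some normalization of historical spelling, and a threshold for what counts as ‘meaningful’ borrowing; those choices are left open here.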