A £9.2m grant has been awarded to the ‘Living with Machines’ AI research project at Queen Mary University, which will use new artificial intelligence techniques to analyse historical sources.

The project will focus on the century after the first Industrial Revolution, from 1780 to 1918, looking at the changes in all aspects of society brought about by the advance of technology.

New methods of research in artificial intelligence and data science will be used to analyse historical resources and digitised collections.

It will take place over five years and will see Queen Mary work with The Alan Turing Institute and the British Library.

The project will give insight into the debates and discussions happening around today’s digital industrial revolution.

Collaboration between humanities teams and scientists

The scope of the changes seen during the Industrial Revolution, which began around 1760, is in many ways similar to the changes happening today.

The pace of change has actually quickened, but society is more familiar with the idea of technology and the progress it brings.

The Living with Machines project will use AI capabilities in analysis and data science to help curators, historians, geographers and computational linguists collaborate and devise new research methods.

It will also try to break down barriers between academic traditions by bringing data scientists and software engineers from The Alan Turing Institute together with curators from the British Library.

A new research paradigm

Queen Mary University senior lecturer in Renaissance Studies and project lead Dr Ruth Ahnert said: “For me, this is more than just a research project.

“It is also a bold proposal for a new research paradigm.

“That paradigm is defined by a radical collaboration that seeks to close the gap between computational sciences and the arts and humanities by creating a space of shared understanding, practices, and norms of publication and communication.

“We want to create both a data-driven approach to our cultural past and a human-focused approach to data science.”

Dr Ahnert added: “The overarching challenge is to break down a humanly expressed research question into logical steps we can operationalise in a big data and machine learning environment. That’s not as easy as it seems.

“One truly new thread is that of embracing uncertainty: data science can often be working in a black and white, true or false world: one where a bank balance will always add up to the same value, no matter what.

“Living with Machines is about exploring the past and reflecting on society; it will always be subjective.

“What is key here is to let those biases filter through the whole research process, to the point where a search for a particular topic will, alongside results, also say what’s not there. For example, are the voices of women or particular political views not found? Can the machine prompt a more balanced query?”

New methods in data science and AI

The Queen Mary University project will use natural language processing to model language in the sources, automatically identifying when new topics emerge, such as industrialisation, and showing how they are represented in the discourse.

It will also consider how discussion around topics changes over time, and between geographies and communities.
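To give a rough sense of what tracking a topic over time can involve, the sketch below counts how often a term appears per decade in a toy corpus and reports the first decade in which it shows up. It is a minimal illustration and not the project's actual method: the `DOCS` sample, the function names and the decade-level grouping are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs standing in for digitised pages.
DOCS = [
    (1785, "The mill by the river grinds corn for the village."),
    (1812, "New machinery arrives at the cotton mill."),
    (1841, "Machinery and the railway transform the town."),
    (1843, "The railway brings machinery and workers to the factory."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency_by_decade(docs, term):
    """Return {decade: share of that decade's tokens that equal `term`}."""
    hits = Counter()
    totals = Counter()
    for year, text in docs:
        decade = year // 10 * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term)
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


def first_decade_seen(docs, term):
    """Return the earliest decade in which `term` occurs, or None."""
    for decade, share in relative_frequency_by_decade(docs, term).items():
        if share > 0:
            return decade
    return None


if __name__ == "__main__":
    for term in ("mill", "machinery", "railway"):
        print(term, first_decade_seen(DOCS, term))
```

The same counts could be grouped by place of publication or by newspaper title instead of by decade to compare geographies and communities.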

Ahnert said it is vital to capture the sentiment from various perspectives, groups and people.

The research will distinguish the names of people, places and processes, which is useful for searching, and will work out, for example, where Smith is a person and not a blacksmith.

This disambiguation will rely on algorithms that take the surrounding context into account.
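As a simple illustration of context-based disambiguation, the sketch below labels each occurrence of "Smith" as a person or an occupation by looking only at the word before it. This is a deliberately naive heuristic written for this article, not the project's algorithm; the word lists `TITLES` and `DETERMINERS` and the function `classify_smith` are hypothetical.

```python
import re

# Hypothetical cue words; a real system would learn such signals from data.
TITLES = {"mr", "mrs", "miss", "dr", "rev"}
DETERMINERS = {"a", "the", "our", "village", "local"}


def classify_smith(sentence):
    """Label each 'smith' token as 'person', 'occupation' or 'unknown',
    using only the word that comes directly before it."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    labels = []
    for i, token in enumerate(tokens):
        if token.lower() != "smith":
            continue
        previous = tokens[i - 1] if i > 0 else ""
        if previous.lower() in TITLES:
            labels.append("person")        # e.g. "Mr Smith"
        elif previous.lower() in DETERMINERS:
            labels.append("occupation")    # e.g. "the smith"
        elif token[0].isupper() and previous[:1].isupper():
            labels.append("person")        # e.g. "John Smith"
        else:
            labels.append("unknown")       # not enough context to decide
    return labels


if __name__ == "__main__":
    print(classify_smith("John Smith met the village smith."))
```

A single preceding word is often too little context, which is why the function returns "unknown" when no cue is present instead of guessing.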

This means that human interest stories otherwise buried within millions of pages of newspapers can be pulled out of the sources.

There is also a technique to mix distant and close reading, exploring texts at a higher level and then zooming into the content to study people and communities and then individuals.

Ahnert described this as flipping the research process: instead of the researcher searching for a keyword they already know, the data tells a story.

The team expects 100 terabytes of textual content. An additional challenge will be giving everyone access to it, so that the setting is as good as possible for collaboration and developing new ideas.