The Impact of Missing Data on Network Centrality Measures

By Joe Chick (University of Warwick). This blog post is based on the author’s presentation to the Economic History Society’s 2021 annual conference (session NRIE).

Social network analysis is at the cutting edge of historical research in the twenty-first century. Originating in sociology, the method allows researchers to explore relationships between people. In practical terms, it involves collecting the names of people who appear together in source material and treating each person as a ‘node’ linked to those they appear alongside. My research uses this method to explore the level of interaction between top-level civic officials and lower-status inhabitants.
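
To make the idea of linked nodes concrete, here is one way co-appearance in wills might be turned into a network. It is a minimal sketch, assuming each will has already been transcribed as a list of the names it mentions (for example the testator, executors and witnesses); the function name build_network and the toy wills are hypothetical illustrations, not the data or software used in this research. Every pair of people named in the same will is linked once, however often they appear together.

```python
from itertools import combinations


def build_network(documents):
    """Link every pair of people named together in the same document.

    `documents` maps a (hypothetical) document id to the names it mentions.
    Returns an adjacency dict: person -> set of people they are linked to.
    """
    adjacency = {}
    for names in documents.values():
        people = sorted(set(names))  # ignore repeat mentions within one will
        for person in people:
            adjacency.setdefault(person, set())  # keep people with no links too
        for a, b in combinations(people, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy wills, not real Reading data.
wills = {
    "will_1": ["Testator A", "Witness B", "Executor C"],
    "will_2": ["Testator D", "Witness B"],
}
network = build_network(wills)
print(sorted(network["Witness B"]))
```

A fuller model might weight links by the number of shared documents or distinguish roles such as witness and executor; this sketch keeps links unweighted for simplicity.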

The use of social network analysis in pre-modern history has emerged only recently, so many questions about its robustness remain unexplored. Rather than presenting the historical conclusions of my research, this paper tests the robustness of the dataset, which comprises 298 surviving Reading wills dated between 1490 and 1589. These years were chosen because the dissolution of Reading Abbey, the town’s manorial lord, falls at the midpoint, allowing pre- and post-dissolution data to be compared. The wills were registered with two different courts: wealthier testators registered their wills with the Prerogative Court of Canterbury, while less wealthy ones used the Berkshire Archdeaconry Court. The balance of wills between the two courts differs between the pre- and post-dissolution periods, which was one motivation for testing the extent to which source survival affects the robustness of network analysis.

A common statistical tool in network analysis is centrality, which scores how central, or well connected, each person is within the network. Researchers must choose between several ways of measuring it, such as degree, betweenness, closeness and eigenvector centrality. Because virtually every medieval dataset is incomplete, choosing a measure that is not overly distorted by missing data is an important consideration. Omissions in medieval datasets are frequently not random, and there is also the problem of how to handle the loss of entire collections of documents. Instead of deleting data at random, in this paper I removed the entire collection of Berkshire Archdeaconry wills to mirror the situation that medievalists face. This collection was chosen because its testators tended to be of lower social status, representing the type of individual often missing from medieval sources.
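
The deletion test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author’s actual procedure: each will is tagged with the court it was registered in, the network is rebuilt with one court’s wills removed, and the rankings of the people who survive in both networks are compared. Spearman’s rank correlation is my own choice of comparison here (a value near 1 means the ordering is largely preserved). Degree and eigenvector centrality are implemented by hand so the example needs no libraries; the eigenvector version uses power iteration on A + I, a standard shift that avoids oscillation on bipartite graphs without changing the eigenvectors. The court labels, the five wills and all function names are hypothetical.

```python
import math
from itertools import combinations


def build_network(name_lists):
    """Adjacency dict linking every pair of people named in the same will."""
    adjacency = {}
    for names in name_lists:
        people = sorted(set(names))
        for person in people:
            adjacency.setdefault(person, set())
        for a, b in combinations(people, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Number of distinct people each person is linked to."""
    return {person: len(neighbours) for person, neighbours in adjacency.items()}


def eigenvector_centrality(adjacency, iterations=500, tol=1e-10):
    """Power iteration on (A + I): a person scores highly if their neighbours do."""
    if not adjacency:
        return {}
    scores = {person: 1.0 for person in adjacency}
    for _ in range(iterations):
        new = {p: scores[p] + sum(scores[n] for n in adjacency[p]) for p in adjacency}
        norm = math.sqrt(sum(v * v for v in new.values()))  # > 0: scores stay positive
        new = {p: v / norm for p, v in new.items()}
        converged = max(abs(new[p] - scores[p]) for p in adjacency) < tol
        scores = new
        if converged:
            break
    return scores


def ranks(values):
    """Rank values (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when every score is tied
    return cov / (sx * sy)


def robustness(wills, dropped_court, centrality):
    """Compare centrality before and after dropping every will from one court.

    Only people who still appear in the reduced network are compared.
    """
    full = centrality(build_network(names for _, names in wills))
    reduced = centrality(
        build_network(names for court, names in wills if court != dropped_court)
    )
    common = sorted(reduced)
    return spearman([full[p] for p in common], [reduced[p] for p in common])


# Hypothetical toy wills as (court, names mentioned). Not real Reading data.
wills = [
    ("PCC", ["A", "B", "C"]),
    ("PCC", ["A", "D"]),
    ("PCC", ["B", "D", "E"]),
    ("Archdeaconry", ["A", "F", "G"]),
    ("Archdeaconry", ["C", "F"]),
]

for measure in (degree_centrality, eigenvector_centrality):
    print(measure.__name__, round(robustness(wills, "Archdeaconry", measure), 2))
```

On this toy data, person A has the highest degree in the full network but falls behind B once the Archdeaconry-style wills are removed, even though the overall rank correlation for degree stays positive (about 0.70). A broadly preserved ordering can still hide large moves by individual people.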

This paper explores the relative robustness of the different centrality measures and makes the case that eigenvector centrality is a promising option. It also considers whether centrality in general is a useful tool when based on incomplete datasets, or whether all measures are too skewed by missing data to be meaningful. My results suggest that even with large sections of data missing, centrality still places people in broadly meaningful positions, but they also call for certain caveats in how researchers phrase their analysis. Specifically, it is dangerous to hinge a historical argument on the centrality of a single actor. It is safer to analyse the average centrality of groups of actors who share an attribute, such as burgesses, officeholders, or members of a particular trade. It is also preferable to describe actors as being towards the top, middle, or bottom of the hierarchy rather than citing the precise scores that network software packages provide.
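
Following that advice, centrality can be reported as group averages and coarse tiers rather than exact scores. The sketch below assumes scores have already been computed (by any measure) and that each person has been tagged with an attribute; the scores, group labels and function names are invented for illustration and are not results from the Reading wills. Splitting the ranking into equal thirds is one simple, hypothetical way to define top, middle and bottom.

```python
from statistics import mean


def group_means(scores, attribute):
    """Average centrality of the people who share each attribute value."""
    groups = {}
    for person, score in scores.items():
        groups.setdefault(attribute[person], []).append(score)
    return {group: mean(values) for group, values in groups.items()}


def tiers(scores):
    """Label each person as in the top, middle or bottom third of the ranking.

    Tied scores keep their input order, so tied people may straddle a boundary.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    labels = {}
    for position, person in enumerate(ranked):
        if position < n / 3:
            labels[person] = "top"
        elif position < 2 * n / 3:
            labels[person] = "middle"
        else:
            labels[person] = "bottom"
    return labels


# Hypothetical scores and attributes, for illustration only.
scores = {"A": 0.61, "B": 0.52, "C": 0.40, "D": 0.35, "E": 0.20, "F": 0.10}
attribute = {"A": "officeholder", "B": "officeholder", "C": "burgess",
             "D": "burgess", "E": "other", "F": "other"}

print(group_means(scores, attribute))
print(tiers(scores))
```

Because tiers and group means depend only on broad position, they should be less affected than any single person’s exact score when one individual moves a few places up or down the ranking.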

To contact the author: J.Chick@warwick.ac.uk
