Extracting Meaningful Data from Decomposing Bodies

Alison Langmead, Paul Rodriguez, Sandeep Puthanveetil Satheesan, and Alan Craig

Decomposing Bodies (https://sites.haa.pitt.edu/db/) is a large-scale, lab-based digital humanities project housed in the Visual Media Workshop at the University of Pittsburgh. It examines the system of criminal identification introduced in France in the late 19th century by Alphonse Bertillon. Bertillon, a French criminologist, statistician, and inventor, developed this practice by using anthropometrical measurement to classify human beings on individual standardized cards (see figure 1). Each card used a pre-established set of eleven anthropometrical measurements (such as height, length of left foot, and width of the skull) as an index for other identifying information about each individual (such as the crime committed, their nationality, and a pair of photographs). “Bertillonnage,” as this system is commonly known, was the first measurement-based, state-controlled system used for criminal identification.

[Figure 1]

Currently, studying the data contained in these cards in bulk requires the time-consuming process of manual transcription. Many of the research goals of Decomposing Bodies would clearly benefit from a complete transcription of these handwritten records, a task that computer automation might perform efficiently. Automation, however, is not only computationally expensive but also demands considerable expertise in system training and machine learning. To approach the problem of data extraction from this angle, the Pittsburgh-based Decomposing Bodies team formed a collaboration with a computer-science-focused XSEDE/ECSS team (for more on XSEDE, please see: https://www.xsede.org/home).

The collaboration worked for approximately two years on an end-to-end system for extracting handwritten text and numbers from scanned Bertillon cards in a semi-automated fashion. To the best of our knowledge, this was the first project to attempt to automate Bertillon card analysis by applying standard image analysis tools and methods for field segmentation and denoising, together with machine learning techniques that have shown some success in recognizing handwritten printed digits or words. Such a system would enable historians and humanities researchers to study the data produced by the Bertillon system with ease. In this talk, we will present our current results on performing document analysis on a selected set of scanned Bertillon cards from the Ohio State Reformatory and Ohio Penitentiary, a process that produced both successes and disappointments. In addition, we will offer reflections on the inner workings of this collaboration from both the technologists’ and the digital humanist’s points of view.
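For readers unfamiliar with the terms, the following is a minimal, illustrative sketch of what "denoising" and "field segmentation" can mean on a binarized scan. It is not the project's actual pipeline: the function names (`majority_denoise`, `segment_columns`), the 3x3 majority filter, the column projection profile, and the toy image are all assumptions chosen for illustration.

```python
# Illustrative only: a toy binary "scan" where '#' is ink and '.' is background.
# Two ink blobs stand in for handwritten fields; the lone '#' in the first row
# stands in for scanner noise.
ROWS = [
    ".....#......",
    ".###...###..",
    ".###...###..",
    ".###...###..",
    "............",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in ROWS]


def majority_denoise(image):
    """Apply a 3x3 majority filter (a median filter for binary pixels).

    A pixel is kept as ink only if more than half of the pixels in its
    neighborhood (clipped at the image border) are ink.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = 1 if sum(window) * 2 > len(window) else 0
    return out


def segment_columns(image, min_ink=1):
    """Split an image into horizontal spans using a column projection profile.

    Returns a list of (start, end) column ranges, end exclusive, where each
    column in the range contains at least `min_ink` ink pixels.
    """
    profile = [sum(column) for column in zip(*image)]
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x
        elif ink < min_ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


raw_fields = segment_columns(img)
clean_fields = segment_columns(majority_denoise(img))
```

On the toy image, segmenting the raw pixels treats the noise speck as a third field, while segmenting after denoising yields only the two ink blobs. Real card scans would additionally need binarization, skew correction, and layout knowledge of the card's printed form, none of which this sketch attempts.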