Session 20: Data Curation

Moderator: Angela Murillo (University of North Carolina Chapel Hill)
Open Government, Government Administrative Data and the Record Keeping Role in the UK: New Research / Elizabeth Shepherd (University College London)

Professor Elizabeth Shepherd, UCL Department of Information Studies, London

Researchers at UCL in the Department of Information Studies are engaged in several research projects which focus on the management of records and administrative data in the public sector. This paper will report interim findings of research using government administrative data (part of an Economic and Social Research Council-funded five-year project, ADRC-E) which seeks to promote wider research access to, and innovative linkage between, datasets held by UK government departments and agencies, thereby transforming the way data are currently converted into knowledge and evidence for public and economic policy. Early findings from a case study focusing on students’ educational data will be discussed. This research (carried out by Shepherd with colleagues Dr Oliver Duke-Williams and Alexandra Eveleigh) draws upon in-depth interviews with academic researchers across a range of social scientific disciplines, data collection agents, civil servants responsible for assessing and administering access requests, and representatives from charitable foundations who commission research using government administrative data.

I will present some preliminary conclusions. I will also report on findings from another project, part of InterPARES Trust, which is examining the management of records and information in an open government environment, in particular in local government, and provide some analysis of the similarities and differences between the two cases.

Data Sharing Across Disciplines: Communicating About Data in an Interdisciplinary Research Group / Morgan Daniels (Vanderbilt University)

Data sharing and reuse have become a priority for many grant funding organizations, research institutions, publishers, and individual researchers. The potential for new combinations and analyses of data to reveal novel patterns and findings has made data sharing an important concern across a spectrum of research topics and methods. Methods for sharing research data range from person-to-person correspondence about data to datasets added to repositories created for a given discipline, data type, publication venue, or institution. These methods permit various kinds of communication about data as well, from informal communication to the standardized metadata and documentation required by some repositories. Data sharing infrastructure is being developed through the collaboration of numerous stakeholders, often from a particular research context with a deep understanding of the practices they intend to support. Anticipating the needs of interdisciplinary data reuse, however, is more complicated.

This study examines the ways that members of an interdisciplinary research team share their data with each other and, in particular, the ways they communicate about the data they collect in order to make datasets meaningful and usable to other researchers in the team. This presentation is based on interviews with, and observations of, members of an academic research team examining people’s responses to water quality problems in a community in the developing world. They are environmental scientists and sociologists using both observational data, collected from a variety of instruments, and interview data, obtained from members of the community. While team members understand how to use shared data emerging from their own disciplinary methods, they have a difficult time understanding how to analyze the data created by their colleagues: the sociologists are unsure how to make use of the environmental science data, and vice versa. By uncovering the communication tactics of this team, this presentation will shed light on some of the necessary elements for making data reuse possible outside a single discipline.

The Concept of Provenance as Portrayed in Traditional Archives, Digital and Data Curation, and Computer Science, or How Archivists Can Talk to Each Other and Still Talk to Technologists / Lorraine Richards (Drexel University)

The concept of provenance as a controlling principle in archival theory and practice has remained the profession’s mainstay for generations (Bearman and Lytle 1985; Nesmith 1993). However, in recent decades we have seen the term “provenance” expand to represent the capture of contextual and chain of custody information within digital environments, in addition to serving as a principle of arrangement in traditional, paper-based environments. Continuum theorists speak of “multiple” and “parallel” provenance (Evans et al. 2005). Post-modern and post-colonial thinkers have developed notions of “social provenance” in communities of records (Bastian 2006) and “provenance as ethnicity” (Wurl 2005). In fact, with the advent of digital curation and digital preservation, “provenance” has also come to refer to a specific type of metadata (Hedges et al. 2012). Even computer scientists use the term to represent the granular scientific workflows that support the development of scientific data sets (Amsterdamer et al. 2012). This paper will present a comparative view of a variety of conceptualizations of provenance being used in traditional archives, digital preservation, digital and data curation, and computer science. The goal is to help archivists and digital curators communicate better among themselves, as well as with the information technologists and computer scientists with whom they work when engaging in digital and data curation activities.

References:

Amsterdamer, Y., S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. “Putting Lipstick on a Pig: Enabling Database-Style Workflow Provenance.” Proceedings of the VLDB Endowment (2012) 5 (4): 346-357. Available at http://arxiv.org/pdf/1201.0231.pdf.

Bastian, J. “Reading Colonial Records Through an Archival Lens: The Provenance of Place, Space and Creation.” Archival Science (2006) 6: 267-284.

Bearman, D. and R. Lytle. “The Power of the Principle of Provenance.” Archivaria (1985) 21: 14-27.

Evans, J., S. McKemmish, and K. Bhoday. “Create Once, Use Many Times: The Clever Use of Recordkeeping Metadata for Multiple Archival Purposes.” Archival Science (2005) 5: 27-42.

Hedges, M., T. Blanke, S. Fabiane, G. Knight, and E. Liao. “Sheer Curation of Experiments: Data, Process, Provenance.” Journal of Digital Information (2012) 13 (1). Available at https://journals.tdl.org/jodi/index.php/jodi/article/view/5883/5890. Date accessed: 06 Feb. 2015.

Nesmith, T. Canadian Archival Studies and the Rediscovery of Provenance. Metuchen, New Jersey: Association of Canadian Archivists and SAA, 1993.