Session 17: Access, Classification, and Privacy

Moderator: Sue McKemmish (Monash University) 
Reimagining Archival Access: Building Search Engines for the Archival Enterprise / Douglas W. Oard (University of Maryland)

The traditional role of a search engine is much like the traditional role of a library: generally, the objective is to help people find things. Archives, by contrast, sometimes need to limit what may be seen, when it may be seen, and by whom it may be seen. Born-digital records such as email, however, pose two new challenges that push us beyond what present access control processes can reasonably accommodate. First, the size of collections is growing at an astounding rate. For example, the Clinton administration generated around 32 million emails from the Executive Office of the President, but it has been estimated that the Obama administration will generate a billion. Second, records that could be opened now are often intermixed with records that require protection and with vast quantities of non-record material, with no obvious way of efficiently determining which items fall into each category. This large-scale intermixing poses a dilemma: the only way we can protect what must be protected is to deny access to what could, and properly should, be seen. If we are to develop archival access systems in the future that are worthy of the name, we will need search engines that act less like a library and more like an archive. In this talk, I will introduce the idea of “search among secrets,” in which the goal is to help some users find some content while protecting some content from some users (or from some uses). We’ll start by looking at how two such processes actually work today: review for privilege in e-discovery, and review for redaction in declassification. We’ll then draw on those examples to begin to think through how risks might be balanced as a matter of policy, how affordability and effectiveness might be balanced with the help of automation, and how we might measure how well we are doing at striking the balances we have sought to achieve.
I’ll conclude with an invitation for archivists and information retrieval researchers to begin to think together about how best to address these challenges, and a few thoughts on what we need to learn from each other in order to do so.

Who Controls the Bits: Enabling Access to Authentic Born-Digital Records While Protecting Sensitive Data / Cal Lee (University of North Carolina Chapel Hill)

Archivists are increasingly obtaining materials through the acquisition of media such as hard disks and removable magnetic and optical media, and are also processing backlogs of materials stored on such media. The bitstreams stored on digital media often contain personally identifying information, records of communication, location data, and other forms of data that relevant stakeholders may deem sensitive. There are many potential approaches to preventing inadvertent access to such sensitive data, including permanent destruction, redaction at the time of request, filtering through user access systems, and enforcement of temporary access restrictions (i.e., closure periods). All of these approaches depend on methods for identifying the specific patterns (e.g., names, numbers) that are considered sensitive.

I will discuss various forms of information that can be personally revealing: not just patterns such as Social Security numbers, but also more seemingly mundane traces such as geographic coordinates, network path information, and social media artifacts. I will also elaborate on methods and tools for providing access to born-digital data while removing or masking sensitive information. Finally, I will suggest implications both for the workflows of archivists and for future directions in archival research and education. This presentation will report on the work of the BitCurator Access project (2014-2016), which is funded by the Andrew W. Mellon Foundation.
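As a rough illustration of the pattern-based identification and masking this abstract describes, the following sketch finds and masks strings shaped like US Social Security numbers using a regular expression. It is a hypothetical example, not a BitCurator tool: the pattern and the names `SSN_PATTERN`, `find_ssns`, and `mask_ssns` are assumptions for illustration, and real tools must cope with many more patterns, formats, and false positives.

```python
import re

# Hypothetical pattern: a US Social Security number written in the common
# 3-2-4 digit layout with hyphens (e.g. 123-45-6789). \b keeps the match
# from starting or ending inside a longer run of digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssns(text: str) -> list[str]:
    """Return every substring of text that matches the SSN-like pattern."""
    return SSN_PATTERN.findall(text)


def mask_ssns(text: str, mask: str = "XXX-XX-XXXX") -> str:
    """Replace each SSN-like match with a fixed mask, leaving other text intact."""
    return SSN_PATTERN.sub(mask, text)
```

For example, `mask_ssns("Employee 123-45-6789 called.")` yields `"Employee XXX-XX-XXXX called."`. Masking in place, rather than deleting the match, preserves the surrounding text and signals to a reader that something was withheld.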

Looking at Classified Information: Archival Infrastructure as Cultural Technique / Stacy Wood (UCLA)

Classified information is a category of record deemed by a government agency or body to contain sensitive information. This form of information has a unique relationship to archival theory and practice, shifting and challenging persistent expectations of public knowledge and accountability. The current normative sets of protocols governing the confidentiality, integrity, and availability of that information have come into being through an ad hoc process with roots in the First World War. Through a series of Executive Orders, Legislative Acts, and legal precedents, the infrastructure of classified information has expanded and contracted. Historians, political scientists, and scholars in science and technology studies have approached classified information primarily as a source of frustration, carefully monitoring the declassification process in the hope that newly released information will expand existing narratives or augment lost scientific knowledge. Those in archival studies have confronted classified information as a problem of both management and professionalism. While there is a tradition in both media studies and science and technology studies of focusing on the material features of bureaucratic structures and systems of documentation, for the most part the attitude towards classified information can be summed up by Ben Kafka’s observation that historians have discovered all sorts of interesting and important things by looking through paperwork, but have seldom paused to look at it. This paper seeks to move beyond seeing government structures, and specifically archival bodies, as institutional frames for the production and transmission of documents, and instead to identify the infrastructure of classified information as a cultural technique that is materially constituted and expressed through executive orders, legislative acts, manuals, and various apparatus for marking, transmission, and storage.
The concept of cultural techniques has seen a resurgence in recent German media theory, focusing primarily on the relationship between technologies and users. When used as a theoretical tool to address bureaucratic systems, this concept expands notions of secrecy, openness, and agency and allows for a focus on the complexity of socio-techno-legal systems. Because of this iterative and distributed process of enactment, cultural techniques naturalize themselves, and this naturalization serves to erase their historical and cultural contexts. Leveraging the concept of cultural techniques allows us to move beyond the characterization of classified information as an anti-epistemology and to account for the complexity of a system in which operations, the subjects performing those operations, and the objects being manipulated are constituted mutually and iteratively. Executive Orders, as one of the primary foundational pieces of the classified information infrastructure, are a generative source of multiple epistemologies. Classified information facilitates a particular form of knowing that is intimately related to identity, status, and temporal and spatial orientations.