Session 11: Archival Description and Access Systems 

Moderator: Karen Anderson (Mid Sweden University)
Archival Description in a New Technology Environment / Jinfang Niu (University of South Florida)

Since its creation about two decades ago, EAD has played a vital role in publishing archival finding aids online. Despite this success, EAD has been criticized in several respects. Because EAD was created to convert paper finding aids into an online format, it imitates the content and layout of paper finding aids and employs a document-centric approach that differs markedly from the record-centric approach used by most other metadata schemas. An EAD finding aid contains not only metadata but also formatting and structural information such as lists and paragraphs. This incompatibility with other metadata standards makes it difficult to convert EAD finding aids to other metadata formats and causes interoperability issues. In addition, users, especially novice users, have trouble understanding the archival jargon used in EAD finding aids and become lost in their complex hierarchical structure. Some researchers have also pointed out that monolithic EAD files with a deep hierarchical structure make it difficult to access a particular component directly without first traversing the whole hierarchy. Notwithstanding these criticisms and the shifting technology environment, the most recent revisions of EAD by the Technical Subcommittee for Encoded Archival Description and the Schema Development Team are mostly minor adjustments and do not address most of these problems. This presentation proposes two solutions to overcome these limitations. One is to modify the current EAD schema based on the entity-relationship model defined in the Australian series system. The other is to replace EAD with another standard, the Open Archives Initiative Object Reuse and Exchange (OAI-ORE), which can be used to produce more flexible archival descriptions in linked data format. This presentation is based on a paper published in the journal Archives & Manuscripts.
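
To make the OAI-ORE option more concrete, the following is a minimal sketch (not taken from the presentation; the example.org URIs, the series, and its titles are hypothetical) of how one archival series and its components could be expressed as an ORE aggregation in linked data, here built with the Python rdflib library.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    ORE = Namespace("http://www.openarchives.org/ore/terms/")

    # Hypothetical URIs for a resource map, the aggregation it describes,
    # and two components of an archival series.
    rem = URIRef("http://example.org/findingaids/series42/rem")
    agg = URIRef("http://example.org/findingaids/series42")
    item1 = URIRef("http://example.org/findingaids/series42/file-001")
    item2 = URIRef("http://example.org/findingaids/series42/file-002")

    g = Graph()
    g.bind("ore", ORE)
    g.bind("dcterms", DCTERMS)

    # The resource map describes the aggregation (the archival series).
    g.add((rem, RDF.type, ORE.ResourceMap))
    g.add((rem, ORE.describes, agg))

    # The aggregation groups its components without imposing a single
    # monolithic document structure; each component has its own URI.
    g.add((agg, RDF.type, ORE.Aggregation))
    g.add((agg, DCTERMS.title, Literal("Correspondence series (hypothetical)")))
    g.add((agg, ORE.aggregates, item1))
    g.add((agg, ORE.aggregates, item2))
    g.add((item1, DCTERMS.title, Literal("Letters, 1901-1910")))
    g.add((item2, DCTERMS.title, Literal("Letters, 1911-1920")))

    print(g.serialize(format="turtle"))

Because every aggregated resource carries its own URI, an individual component can be retrieved directly, which is one way to avoid the problem of monolithic finding aids that must be traversed as a whole.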

Controlled Vocabularies: Why Archives Need Them More Than Ever? / Zdenka Semlič Rajh (University of Ljubljana)

In the field of records management and archives, efforts to provide faster access to documents appeared very early, encompassing the creation of headings, thesauri and classifications and the compilation of indexes to archival finding aids. These efforts have been pursued in different ways, leading to the development of various systems for managing the content of documents in both paper and electronic form. Methodological problems were typically solved individually, with each institution searching for a workable solution and creating its own system. Such systems, however, suffice only to provide faster access to documents described in traditional paper-based finding aids. With the introduction of new IT solutions, and especially with the building of shared archival databases, this approach is not only unsuitable but also inefficient and ineffective.

Systems for content identification may be grouped, according to their purpose, objectives and implementation, into headings, thesauri and classifications. Each has its own internal logic of creation and use. Their importance for content of archival value lies in their rational placement within the system for managing archival material as a whole, as well as within the system of professional work. Headings, thesauri and classifications are in fact crucial in the arrangement, description and transfer of records and archives.

Descriptors undoubtedly represent an important tool in creating objective information about archives. As such, they serve different purposes in the arrangement, description and use of the preserved archives. The values of individual descriptors must therefore be standardized both in content and in the way they are recorded in the system. For this reason, the Slovenian archival service should continue its good practice of adopting and importing descriptors from trusted external sources and incorporating them into the Slovenian archival information system. Where import from external sources is not possible or not meaningful, descriptors must be created freely during data capture, but in accordance with the needs and requirements of professional standards. In doing so, archivists must follow the rules for capturing data in the respective system. Links between descriptors themselves, or between descriptors and other entities in the information system, must be established so that no misunderstanding arises about their content or form during capture, amendment and use. Archivists must decide in which cases they will use additional descriptors and how many they will use.

The objective in establishing a unified system of descriptors should be the preparation of guidelines for creating descriptors and a thesaurus for content of archival value, which can be used in practice for describing archives with any software tool. Since the software tool is likely to change, content must be processed in a way that is not affected by a change of system.
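
As a purely illustrative sketch (not part of the abstract; the vocabulary namespace, URIs and labels are hypothetical), the following Python code uses rdflib and the SKOS vocabulary to record a small set of descriptors with standardized labels and relationships in a form that does not depend on any particular archival software tool.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    # Hypothetical namespace for an archival descriptor vocabulary.
    VOC = Namespace("http://example.org/archival-descriptors/")

    g = Graph()
    g.bind("skos", SKOS)

    scheme = VOC["scheme"]
    g.add((scheme, RDF.type, SKOS.ConceptScheme))
    g.add((scheme, SKOS.prefLabel, Literal("Archival descriptors (illustrative)", lang="en")))

    # Two descriptors with standardized preferred labels and a broader/narrower link.
    education = VOC["education"]
    schools = VOC["primary-schools"]
    for concept in (education, schools):
        g.add((concept, RDF.type, SKOS.Concept))
        g.add((concept, SKOS.inScheme, scheme))

    g.add((education, SKOS.prefLabel, Literal("šolstvo", lang="sl")))
    g.add((education, SKOS.prefLabel, Literal("education", lang="en")))
    g.add((schools, SKOS.prefLabel, Literal("osnovne šole", lang="sl")))
    g.add((schools, SKOS.prefLabel, Literal("primary schools", lang="en")))
    g.add((schools, SKOS.broader, education))
    g.add((education, SKOS.narrower, schools))

    # Serializing to Turtle keeps the vocabulary independent of the tool in use.
    print(g.serialize(format="turtle"))

Because the descriptors are exchanged as plain RDF, they can be adopted from a trusted external source or re-imported into a replacement system without being tied to the software that originally captured them.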

Infrastructure for Supporting Exploration and Discovery in Web Archives / Jimmy Lin (University of Maryland)

Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. However, unlocking the potential of web archives requires tools that support exploration and discovery of captured content. These tools need to be scalable and responsive, and to this end we believe that modern “big data” infrastructure can provide a solid foundation. We present Warcbase, an open-source platform for managing web archives built on HBase, a distributed “big data” store. Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge. Tight integration with the Hadoop platform provides powerful tools for analytics and data processing. Relying on HBase for storage infrastructure simplifies the development of scalable and responsive applications.
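
The following is a rough sketch of the kind of HBase layout such a system might use (a hypothetical illustration, not Warcbase's actual schema or API; the table name, column families, row-key convention and host are assumptions), written with the happybase Python client.

    import happybase

    # Hypothetical connection and table layout; column families separate raw
    # content ('raw'), metadata ('meta') and extracted knowledge ('ext').
    connection = happybase.Connection("localhost")
    connection.create_table(
        "webarchive",
        {"raw": dict(), "meta": dict(), "ext": dict()},
    )
    table = connection.table("webarchive")

    # Row key: domain-reversed URL plus capture timestamp, so that all captures
    # of the same page sort together and prefix scans stay cheap.
    row_key = b"org.example/index.html:20150321120000"
    table.put(
        row_key,
        {
            b"raw:content": b"<html>...</html>",
            b"meta:mime_type": b"text/html",
            b"meta:status": b"200",
            b"ext:links": b"org.example/about.html",
        },
    )

    # A prefix scan retrieves every capture of one page in temporal order.
    for key, data in table.scan(row_prefix=b"org.example/index.html:"):
        print(key, data[b"meta:mime_type"])

Keeping all captures of a page under one key prefix lets a responsive application fetch a single capture or scan a page's temporal history directly, while batch analytics over the full table can still run on the Hadoop platform.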