Towards A Web-Centric Approach for Capturing the Scholarly Record

Researchers at Los Alamos National Laboratory (LANL) and the New Mexico Consortium are concerned with archiving scholarly work done over the internet. Research communication was once a paper-based endeavor, but over time scholars have transitioned to web-based methods for communicating and sharing their work.

Increasingly, scholars across disciplines and throughout the research life cycle are using a wide variety of online portals such as GitHub, FigShare, Publons, and SlideShare to conduct aspects of their research and to communicate research outcomes. However, these portals, whether dedicated to scholarly use or general purpose, exist outside of the traditional scholarly publishing system, and no infrastructure exists to systematically and comprehensively archive the deposited artifacts. We know from previous work that, without adequate infrastructure, scholarly artifacts will vanish from the web in much the same way, and with similar frequency, as “regular” web resources do.

The Prototyping Team of the Research Library at LANL has partnered with the Computer Science Department at Old Dominion University to address this problem in the Andrew W. Mellon Foundation-funded research project “Towards A Web-Centric Approach for Capturing the Scholarly Record”. In this project, we assume the perspective of institutions interested in collecting scholarly artifacts created by their researchers. As such, we are designing an institutional pipeline to track, capture, and archive such artifacts. The tracking part is crucial because institutions are usually unaware of the artifacts their researchers create in online portals. For the capture process, we are designing a novel framework called Memento Tracer, which plays a crucial role in creating high-fidelity archival copies (Mementos) of artifacts. With Memento Tracer, a human curator interacts with a web-based artifact to establish its essential components, and these interactions are recorded as Traces. A Trace can then serve as instructions for automatic web archiving frameworks to capture other artifacts of the same class. In addition, Traces can be shared with a community of practice, enabling a new level of collaboration among artifact archiving institutions. These characteristics give Memento Tracer the potential to bring about significant progress for high-quality web archiving at scale.
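The idea of a Trace as reusable capture instructions can be illustrated with a small Python sketch. It is not the Memento Tracer format or API. The `Trace` and `TraceStep` classes, the `capture_plan` function, the URL pattern, and the CSS selectors are all hypothetical names chosen for illustration. The sketch only shows how interactions recorded once for a single artifact might be replayed against any other artifact of the same class.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    """One recorded curator interaction (hypothetical structure)."""
    action: str    # e.g. "click"
    selector: str  # CSS selector of the element the curator interacted with


@dataclass
class Trace:
    """Recorded interactions for one class of artifacts (hypothetical structure)."""
    portal: str
    url_pattern: str  # regex describing URLs that belong to this artifact class
    steps: list = field(default_factory=list)

    def applies_to(self, url: str) -> bool:
        # A trace is reusable for every URL of the class it was recorded on.
        return re.fullmatch(self.url_pattern, url) is not None


def capture_plan(trace: Trace, url: str) -> list:
    """Return the ordered actions an automated crawler would replay for `url`."""
    if not trace.applies_to(url):
        raise ValueError(f"Trace for {trace.portal} does not cover {url}")
    plan = [("load", url)]
    plan += [(step.action, step.selector) for step in trace.steps]
    return plan


# Illustrative trace: the portal, URL pattern, and selectors are made up.
slides_trace = Trace(
    portal="example-slides",
    url_pattern=r"https://slides\.example\.org/[^/]+/[^/]+",
    steps=[
        TraceStep("click", "button.next-slide"),
        TraceStep("click", "a.download"),
    ],
)

for action, target in capture_plan(slides_trace, "https://slides.example.org/alice/talk-1"):
    print(action, target)
```

In this sketch the curator's effort is spent once per artifact class, and the resulting trace object is plain data, so it could in principle be serialized and shared among institutions.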

To demonstrate the potential of this approach, we have established a pilot. We shared a few insights gained from this pilot at the CNI 2019 Spring meeting.

For questions, comments, or other feedback, please contact the project PI, Martin Klein.

This research is a joint project between Old Dominion University, Los Alamos National Laboratory, and the New Mexico Consortium.