Web Data Analysis

Research on web knowledge and data analysis at the New Mexico Consortium includes:

Towards a Web-Centric Approach for Capturing the Scholarly Record

Herbert Van de Sompel, LANL Staff Scientist and NMC Affiliate
Michael Nelson, Old Dominion University
Michele Weigle, Old Dominion University
Harihar Shankar, LANL Staff Scientist and NMC Affiliate
Shawn Jones, Graduate Student, LANL, NMC Affiliate
Over the past two decades, research communication has transitioned from a paper-based endeavor to a web-based digital enterprise. More recently, the research process itself has started to evolve from a largely hidden activity into one that is plainly visible on the global network. These transitions come with significant social, economic, legal, and technical challenges, and even raise the question of what exactly the scholarly record is when all scholarship and scholarly communication are conducted on the global network. Irrespective of what the scholarly community eventually decides regarding the delineation of the scholarly record on the web, an essential requirement will be to archive it.
The goal of this research is to extend current web archiving activities so that they are better suited to archiving the full range of scholarly materials. Although web archiving services and technologies have proliferated recently, they remain ill-suited to scholarly materials beyond the conventional PDF: resources in GitHub, SlideShare, YouTube, figshare, Twitter, etc. are not archived, even though these platforms (and others like them) are the preferred channels for scholars. Furthermore, if scholarly resources are paywalled, we can detect and archive unencumbered preprint versions, and describe, record, and archive the relationship between the preprint and the official version. This project will also investigate methods for measuring and conveying archival quality as well as verifiability.

Specifically, the objectives of this project are to create a technical architecture, demonstrators, techniques, and interoperability specifications for capturing resources in web-based scholarly communication, which is a prerequisite for fulfilling the Archival function. The proposed effort includes web-scale experimentation, modeling of information and system interoperability, prototyping of potential solutions in credible settings, performance and cost quantification, operation of demonstrator services, specification, and standardization.

This research is a joint project between Old Dominion University, Los Alamos National Laboratory, and the New Mexico Consortium. 

© 2017 New Mexico Consortium