Internet Archives as a Tool for Research: Decay in Large Scale Archival Records

Hai Nguyen, Matthew S. Weber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Web archiving provides social scientists and digital humanities researchers with a data source that enables the study of a wealth of historical phenomena. One of the most notable efforts to record the history of the World Wide Web is the Internet Archive (IA) project, which maintains the largest repository of archived data in the world. Understanding the quality of archived data and the completeness of each record of a single website is a central issue for scholarly research, and yet there is no standard record of the provenance of digital archives. Indeed, although present-day records tend to be quite accurate, archived Web content deteriorates as one moves back in time. This paper analyzes a subset of archived Web data, measures the degree of degradation in that subset, and proposes statistical inference to overcome such limitations.

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
Editors: Latifur Khan, Carminati Barbara
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 724-727
Number of pages: 4
ISBN (Electronic): 9781467372787
State: Published - Aug 17 2015
Externally published: Yes
Event: 4th IEEE International Congress on Big Data, BigData Congress 2015 - New York City, United States
Duration: Jun 27 2015 to Jul 2 2015

Publication series

Name: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Other

Other: 4th IEEE International Congress on Big Data, BigData Congress 2015
Country/Territory: United States
City: New York City
Period: 6/27/15 to 7/2/15

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Keywords

  • analytics
  • archival data
  • big data
  • research
  • statistical validity
