Academic researchers access commercial web sites to collect research data, and this practice is likely to increase. Is it appropriate? Is it legal? Commercial web sites are maintained to achieve business objectives, whereas research access consumes site resources for other purposes. Web site administrators may therefore deem academic data collection inappropriate. Is there a process that would make research access more open and acceptable to web site owners and administrators? These are significant issues. This article clarifies the problems and suggests possible approaches for handling them with sensitivity and openness. Research access to commercial web sites may be manual (using a standard web browser) or automated (using automated data collection agents), and the two approaches affect web sites differently. Researchers using manual access tend to make a limited number of page requests because manual access is costly to perform, whereas researchers using automated methods can request large numbers of pages at low cost. Web site administrators therefore tend to view manual and automated access very differently. Because of their volume and nonbusiness purpose, automated research requests are sometimes blocked by site administration through a variety of means, both technological and legal. This paper details the pertinent legal issues, including trespass, copyright violation, and breach of contract. It also explains the nature of express and implied consent by site administration to research access. Based on these issues, guidelines are proposed to help researchers reduce objections to research activities, facilitate communication with web site administration, and obtain express or implied consent. These guidelines include notifying web site administration of intended automated research activity, posting a description of the research project as a web page, and clearly identifying automated requests for web pages.
To encourage good research practices in automated data collection, suggestions are made for disclosing data collection methods in research papers and for self-regulation by academic associations.
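The guideline on clearly identifying automated requests can be illustrated with a minimal sketch, assuming the researcher's agent is written in Python and uses the standard library's `urllib.request`. The agent name `AcademicResearchBot`, the project page URL, and the contact address are hypothetical placeholders, not part of the original guidelines. Each request carries a `User-Agent` string that points to the posted project description and a standard HTTP `From` header with a contact email, so an administrator reading server logs can see who is collecting data and why.

```python
import urllib.request

# Hypothetical placeholders: substitute the real project page and contact address.
PROJECT_PAGE = "https://example.edu/research/crawler-project"
CONTACT_EMAIL = "researcher@example.edu"


def build_research_request(url: str,
                           project_page: str = PROJECT_PAGE,
                           contact_email: str = CONTACT_EMAIL) -> urllib.request.Request:
    """Return a GET request that identifies itself as automated research traffic."""
    # The User-Agent names the (hypothetical) agent and links to the posted
    # description of the research project.
    user_agent = f"AcademicResearchBot/0.1 (+{project_page})"
    headers = {
        "User-Agent": user_agent,
        # "From" is a standard HTTP header carrying the email address of the
        # person responsible for the user agent.
        "From": contact_email,
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_research_request("https://www.example.com/catalog?page=1")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    print(req.get_header("From"))
    # Actually fetching the page (not done here) would look like:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     body = resp.read()
```

Keeping identification in one helper means every request the agent sends is labeled the same way, which makes the research traffic easy for site administration to recognize, contact, or selectively block.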
- Automated data collection