Despite its potential, web scraping is not a methodology that many consumer researchers feel they can consider. It remains an untapped opportunity, a black box. One reason is that there are few standards or uniform methods for evaluating web scraping research. With little consensus, we researchers are, as James Madison put it, “…in a wilderness without a single footstep to guide us.” Scraped data is very different from the data generated by conventional consumer research methods. Furthermore, the current literature does not adequately describe the decision-making and judgment calls that web scraping requires, which makes it difficult to replicate methods or compare findings.
The internet plays an increasingly central role in consumers’ daily lives. Every second, consumers create terabytes of data containing rich information about their opinions, preferences and consumption choices. The massive volume and variety of consumers’ digital footprints present many opportunities for researchers to examine and test theories about consumer processes and behaviors in the field.
In this paper, Johannes Boegershausen, Abhishek Borah and Andrew Stephen outline the key challenges, state-of-the-art remedies, best practices and corresponding evaluation standards for web scraping in consumer research. They provide a structured workflow designed to bring consistency and standardization to how web scraping is conducted, documented, reported and evaluated in both the research and peer review processes.
The authors propose four interdependent facets necessary for generating credible, scientific findings from web scraping research:
- Design transparency.
- Analytic reproducibility.
- Analytic robustness.
- Effect replicability and generalizability.
The structured workflow the authors outline offers a pathway for generating interesting, impactful and credible consumer research findings. A shared workflow of this kind allows more researchers to embrace web scraping as an avenue for producing timely and credible scholarly knowledge about consumer behavior.
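To make the first two facets more concrete, here is a minimal sketch of how a researcher might record the provenance of each scraped page so that a data set can be documented and later checked. It is an illustration under stated assumptions, not the authors' workflow: the `ScrapeRecord` fields, the `make_record` and `to_log_line` helpers, and the example URL are all hypothetical, and the page content is passed in as a string rather than fetched.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ScrapeRecord:
    """Hypothetical provenance entry for one scraped page."""
    url: str
    retrieved_at: str      # ISO 8601 timestamp of retrieval
    scraper_version: str   # version label of the scraping script (assumed convention)
    content_sha256: str    # fingerprint of the raw page content


def make_record(url: str, raw_html: str, scraper_version: str,
                retrieved_at: Optional[datetime] = None) -> ScrapeRecord:
    """Build a provenance record; defaults to the current UTC time."""
    when = retrieved_at or datetime.now(timezone.utc)
    digest = hashlib.sha256(raw_html.encode("utf-8")).hexdigest()
    return ScrapeRecord(
        url=url,
        retrieved_at=when.isoformat(),
        scraper_version=scraper_version,
        content_sha256=digest,
    )


def to_log_line(record: ScrapeRecord) -> str:
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    # Placeholder URL and content, for illustration only.
    record = make_record(
        "https://example.com/reviews/1",
        "<html>sample review page</html>",
        scraper_version="0.1.0",
    )
    print(to_log_line(record))
```

Writing one such line per retrieved page gives reviewers a record of what was collected and when, and the content hash lets a later analysis confirm that it is working from the same raw material.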
Read the working paper here.