Can mistyped urls deliver representative samples?
An article on the Washington Post’s Monkey Cage blog describes how researchers managed to carry out opinion polls on executions in Bahrain, «one of the most difficult countries in the region for such sensitive research». In order to overcome the difficulties encountered, they ran two ‘innovative surveys’ in partnership with research company RIWI.
RIWI takes advantage of the fact that people sometimes make mistakes when they type a url in the address bar of their browser. If the url they mistakenly go to happens to be controlled by RIWI, they are redirected to a short questionnaire. RIWI claims this is a cheap way to obtain an unbiased sample.
This sounds like a smart approach that might actually work. But does it? Some people have doubts, such as one of the commenters on the Monkey Cage post:
Innovative is certainly one way to describe it. How can you possibly consider Internet typo redirects as a nationally representative sample? Would be very curious to see what the raw demographics look like compared to the population. Hope there was some sophisticated weighting used.
In a recent article in Nature, RIWI founder and CEO Neil Seeman explains his method. In a comment, one Charles Packer observes:
There are no citations here of publications that assess the validity of the company’s claims. Same for the corporate website: no discussion of the mechanics of its methodology.
When I searched the company on Google, I found a lot of articles aimed at investors and very few discussing their research methods. The most detailed description of their methodology I found is in Seeman’s patent application. It explains that, for example, «Google could harvest the many thousands of users who inadvertently type in gogle.com instead of google.com and direct them to an online polling page, instead of simply redirecting them to the google.com web site».
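To make the mechanism concrete, here is a minimal sketch of how one might enumerate the near-miss domains a firm could register for this kind of harvesting. It is an illustration of the idea in the patent application's gogle.com example, not RIWI's actual method; the function names and the single-character-deletion model are my own assumptions.

```python
# Hypothetical sketch: enumerate single-character-deletion typos of a
# domain, the kind of near-misses described in the text (gogle.com for
# google.com, or .cm/.co/.om for .com). Not RIWI's actual method.

def tld_typos(domain: str) -> list[str]:
    """Variants with one character dropped from a .com TLD."""
    name, tld = domain.rsplit(".", 1)
    if tld != "com":
        return []
    # Dropping one letter from "com" yields "om", "cm" and "co".
    return [f"{name}.{tld[:i] + tld[i+1:]}" for i in range(len(tld))]

def name_typos(domain: str) -> list[str]:
    """Variants with one character dropped from the name part."""
    name, tld = domain.rsplit(".", 1)
    variants = {f"{name[:i] + name[i+1:]}.{tld}" for i in range(len(name))}
    variants.discard(domain)  # deleting a repeated letter can reproduce it
    return sorted(variants)

print(tld_typos("google.com"))               # ['google.om', 'google.cm', 'google.co']
print("gogle.com" in name_typos("google.com"))  # True
```

Registering even this small set of variants would, in principle, intercept a steady trickle of visitors who intended to reach the original site.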
The main types of typos RIWI uses seem to be those where people type .cm, .co or .om instead of .com. RIWI uses the respondents’ IP addresses to guess their location. In his patent application, Seeman claims that his approach is successful in reducing bias:
Under the invention, every individual Internet user around the globe has the equal probability of being drawn into the potential respondent pool. This dramatically reduces selection bias and coverage bias as compared to all other current techniques of respondent identification and selection online. There is no reason to believe that the people who fail to randomly fall into the potential survey population (i.e., who do not make the typographical error) have distinct characteristics from the people who do, thus increasing the validity of the results. This makes the process of respondent selection scientifically valid, superior even to random digit telephone dialing.
Is that true? The claims sound plausible, but bias could still creep in: through RIWI’s selection of urls; because people who tend to make typos may differ from people who don’t; or because people who type urls directly into the address bar of their browser may differ from people who prefer to google for sites.
It has been claimed that RIWI has predicted election results in Egypt and Turkey more accurately than other firms. That sounds promising, but it would be helpful to know how many election outcomes RIWI has predicted and how accurate all of these predictions were. RIWI also refers to a validation study of one of their US samples, but the original study seems to have been removed from their website. The website’s FAQ says ‘third party and academic review’ is available, but only on request: «Yes, but please contact us first so we can get a sense of your needs and most applicable information to send you».
It’s quite possible that RIWI’s approach is superior to the survey panels used by other firms, but more openness about their methodology and results would make their case more convincing.