champagne anarchist | armchair activist

‘Open company data played role in downfall of Spanish minister’

How transparent are countries when it comes to company data? Score of the Netherlands on Open Corporates’ Open Company Data Index, compared to other EU countries. Ordered by score, then alphabetically by English name. Source: Open Corporates; chart: dirkmjk.nl.

«There is a delicious irony in Soria being brought down in part by open data», Open Corporates wrote on their blog a week ago. By Soria they refer to former minister José Manuel Soria of the right-wing Partido Popular, who had just stepped down. The story, as summarised by Open Corporates:

Soria was discovered in the Panama Papers, but denied any connection to the Bahamas company referenced in them. It turns out that a company of the same name, UK Lines Limited, had been incorporated in the UK, with officerships linked to him and his family. Further investigation into this company and another UK one, Oceanic Lines Limited, used company filings and shareholder documents to show that these were indeed connected with Soria and his family. Yesterday, newspaper El Mundo nailed the case showing Soria was also director of a Jersey company when he was already a politician.

Information about the UK connection was obtained from Open Corporates. Journalists in other countries - from Nigeria to Argentina - have similarly used data from Open Corporates to make sense of the Panama Papers.

The information they used may well have been available from official databases as well. However, the fact that countries like the UK have opened up company data, and that Open Corporates serves as a portal to such information, makes it much easier to investigate abuses compared to a situation in which you have to buy each document you want to take a look at.

So what about the ‘delicious irony’ mentioned at the beginning of this article? Spain happens to be one of the most secretive countries in the world when it comes to company data, according to Open Corporates:

you can’t even search to see if a company exists without giving your credit card, and they have been adamant that they will not open up the register, still less make it available as open data.

This earns Spain a score of 0/100 on the Open Company Data Index, which is even worse than the embarrassingly low score of 20/100 for the Netherlands. The good news here is that the Dutch Lower House has passed a motion asking the government to see whether it can open up the company register (KvK) as open data, and to report to Parliament this spring.

My entry for the Best Worst Viz competition

Number of tweets with hashtag #BestWorstViz, per date of the month April 2016 and time of the day. Times are UTC, 18 April is the deadline. Data updates every hour; clear browser history to refresh. Entry for Best Worst Viz competition, created by dirkmjk.

I love to hate bad graphs (who doesn’t), and I think Andy Kirk’s idea to organise a Best Worst Viz competition is quite brilliant. As he explains, there’s something fair about creating your own bad graph rather than criticising somebody else’s:

[..] picking on bad visualisation involves work by other people who we might never meet or have a chance to learn about what the true circumstances and intent of a project were. The essence of this challenge is based on your best worst visualisation - the best worst visualisation you can possibly make.

I had to give it a try. But how? An exploding 3D pie chart, truncated y-axis, out-of-control spaghetti chart - it all seemed a bit too obvious. I aimed for something different, drawing inspiration from the blink element of the early days of web design. The shifting colours of the stacked bar chart pointlessly illustrate the direction of time - or whatever. I think it’s pretty bad.

Standalone version of graph here.

Links between businesses and politics II: revolving door and access to ministers

Eline Huisman and Ariejan Korteweg of the Volkskrant have done some good investigative journalism by finding out how often companies, organisations and individuals have visited the current ministers (this data wasn’t publicly available in the Netherlands). It’s interesting to compare the top–10 of companies with access to ministers to the top–10 of revolving door companies (companies where national politicians have or have had a position).

Position of companies on the access to ministers ranking and the revolving door ranking

Company          Access   Revolving door
Air France-KLM   1        6
Rabobank         2        1
Shell            3        2
ING Bank         4        5
ABN AMRO         5        3
Schiphol         6        -
Aegon            7        8
KPN              8        -
SNS Reaal        9        -
KPMG             10       4
NS               -        7
Delta Lloyd      -        9
PGGM             -        10

I’m sure more can be said about this, but the comparison shows there’s considerable overlap between the two lists (for the geeks among you: the Jaccard index is 0.54). The following companies score high on both measures of political ties: Air France-KLM, Rabobank, Shell, ING Bank, ABN AMRO, Aegon and KPMG. Dutch Railways (NS) and PGGM don’t feature in the Volkskrant business ranking because the Volkskrant classifies them as semipublic.
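For those who want to check the Jaccard index: it is the number of companies that appear in both top–10 lists divided by the number that appear in at least one of them. A minimal Python sketch, with the two sets typed in by hand from the rankings in this post (the variable and function names are my own, not from the Volkskrant data):

```python
# Top-10 companies by access to ministers (Volkskrant ranking, as listed in this post)
access = {
    "Air France-KLM", "Rabobank", "Shell", "ING Bank", "ABN AMRO",
    "Schiphol", "Aegon", "KPN", "SNS Reaal", "KPMG",
}

# Top-10 revolving door companies (as listed in this post)
revolving_door = {
    "Rabobank", "Shell", "ABN AMRO", "KPMG", "ING Bank",
    "Air France-KLM", "NS", "Aegon", "Delta Lloyd", "PGGM",
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    return len(a & b) / len(a | b)


overlap = access & revolving_door
print(sorted(overlap))                            # companies on both lists
print(round(jaccard(access, revolving_door), 2))  # 7 shared / 13 in total
```

Seven companies are on both lists and thirteen are on at least one, so the index is 7/13.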

Of course, these lists provide no basis for firm conclusions about cause and effect. However, one can imagine that companies that participate actively in the revolving door could have easier access to ministers.

The details of the Volkskrant investigation can be found in this visualisation, which unfortunately isn’t easily searchable. The underlying data are available here as csv. If you’d classify NS and PGGM as companies in the Volkskrant list, the overlap wouldn’t change because other companies would drop out of the top–10. Further, for comparability I’ve removed industry and lobby organisations such as employers’ organisation VNO-NCW from the access to ministers ranking. Alphabetical order was used where two companies have the same score.

Coursera Data Analysis and Interpretation

I was initially introduced to R by Nathan Yau’s Visualize This, but subsequently I learned a lot about R through some of the courses in Brian Caffo, Roger Peng and Jeff Leek’s Data Science Specialization at Coursera. In fact, the course was a reason for me to postpone switching from R to Python.

By now, I’ve decided to make the switch anyhow, and I think I’ve found another Coursera specialisation that will help me learn the tricks: Lisa Dierker and Jen Rose’s Data Analysis and Interpretation. It’s kind of basic, at least at the beginning, but that’s good. Some of the assignments require you to blog about a project of your choosing, so I’ll be posting about my homework here.

Can mistyped urls deliver representative samples?

An article on the Washington Post’s Monkey Cage blog describes how researchers managed to carry out opinion polls on executions in Bahrain, «one of the most difficult countries in the region for such sensitive research». In order to overcome the difficulties encountered, they ran two ‘innovative surveys’ in partnership with research company RIWI.

RIWI takes advantage of the fact that people sometimes make mistakes when they type a url in the address bar of their browser. If the url they mistakenly go to happens to be controlled by RIWI, they are redirected to a short questionnaire. RIWI claims this is a cheap way to obtain a non-biased sample.

This sounds like a smart approach that might actually work. But does it? Some people have doubts, such as one of the commenters on the Monkey Cage post:

Innovative is certainly one way to describe it. How can you possibly consider Internet typo redirects as a nationally representative sample? Would be very curious to see what the raw demographics look like compared to the population. Hope there was some sophisticated weighting used.

In a recent article in Nature, RIWI founder and CEO Neil Seeman explains his method. In a comment, one Charles Packer observes:

There are no citations here of publications that assess the validity of the company’s claims. Same for the corporate website: no discussion of the mechanics of its methodology.

When I searched for the company on Google, I found a lot of articles aimed at investors and very few discussing their research methods. The most detailed description of their methodology I found is in Seeman’s patent application. It explains that, for example, «Google could harvest the many thousands of users who inadvertently type in gogle.com instead of google.com and direct them to an online polling page, instead of simply redirecting them to the google.com web site».

The main type of typos RIWI uses seems to be those where people type .cm, .co or .om instead of .com. RIWI uses the respondents’ IP addresses to guess their location. In his patent application, Seeman claims that his approach is successful in reducing bias:

Under the invention, every individual Internet user around the globe has the equal probability of being drawn into the potential respondent pool. This dramatically reduces selection bias and coverage bias as compared to all other current techniques of respondent identification and selection online. There is no reason to believe that the people who fail to randomly fall into the potential survey population (i.e., who do not make the typographical error) have distinct characteristics from the people who do, thus increasing the validity of the results. This makes the process of respondent selection scientifically valid, superior even to random digit telephone dialing.

Is that true? While their claims sound plausible, it’s still conceivable that bias occurs. For example, through the selection of urls RIWI uses; because people who tend to make typos may be different from people who don’t; or because people who directly type urls into the address bar of their browser may be different from people who prefer to google for sites.
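To make the second of these concerns concrete, here is a back-of-the-envelope sketch in Python. All numbers are made up for illustration and have nothing to do with RIWI’s actual data: I simply assume two equally large groups, one of which is twice as likely to land on a typo domain and also holds a different opinion.

```python
def population_share(groups):
    """True share of supporters in the whole population."""
    return sum(share * support for share, _, support in groups)


def sample_share(groups):
    """Expected share of supporters among people who end up in the sample."""
    sampled_supporters = sum(share * p * support for share, p, support in groups)
    sampled_total = sum(share * p for share, p, _ in groups)
    return sampled_supporters / sampled_total


# Each tuple: (share of population, probability of making the typo, support rate).
# Hypothetical values, chosen only to illustrate the mechanism.
unequal = [(0.5, 0.02, 0.7), (0.5, 0.01, 0.3)]
equal = [(0.5, 0.01, 0.7), (0.5, 0.01, 0.3)]

print(round(population_share(unequal), 3))  # true value in the population
print(round(sample_share(unequal), 3))      # estimate when typo rates differ
print(round(sample_share(equal), 3))        # estimate when typo rates are equal
```

With equal typo rates the sample matches the population. With unequal rates the estimate shifts towards the opinion of the typo-prone group, unless the results are weighted to correct for it.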

It has been claimed that RIWI has predicted election results in Egypt and Turkey more accurately than other firms. That sounds promising, but it would be helpful to know how many election outcomes RIWI has predicted and how accurate all of these predictions were. RIWI also refers to a validation study of one of their US samples, but the original study seems to have been removed from their website. The website’s FAQ says ‘third party and academic review’ is available, but only on request: «Yes, but please contact us first so we can get a sense of your needs and most applicable information to send you».

It’s quite possible that RIWI’s approach is superior to the survey panels used by other firms, but more openness about their methodology and results would make their case more convincing.
