champagne anarchist | armchair activist

Minister Jeroen Dijsselbloem takes up data visualisation challenge

Every year, Dutch Finance Minister Jeroen Dijsselbloem sends a report to Parliament on state participations - companies that are (partially) owned by the state. Recently, the minister answered questions from the Finance Committee of the Lower House. One of them questioned the use of a stacked bar chart to show dividends, «since this isn’t very clear». The minister acknowledges the problem and takes up the challenge:

In creating this bar chart we aimed at comprehensiveness by including all dividends received from all state participations. Because of the large differences in dividend, this results in sub-optimal readability. For the 2015 annual report, it will be considered whether the readability can be improved without making concessions to comprehensiveness.

I’m sure he’ll be interested in good ideas, so if you have any suggestions for improving the chart, tweet them to @j_dijsselbloem. And if you want to give it a try yourself: here’s the data for 2010–2014.

Update: Jean Adams shows how the chart can be improved. Adams correctly points to a discrepancy between the csv and the original chart: the csv contains data on total dividend paid, whereas the original chart shows the amount received by the state (the two differ for companies that are less than 100% owned by the state).


Solid reputation of Statistics Netherlands (CBS) ‘at risk’

Statistics Netherlands (CBS), the Dutch national statistics office, has always had a solid, if somewhat dull, reputation. The organisation published data, but didn’t do projections and was reluctant to offer interpretations. Meanwhile, it was considered to be among the best statistics offices in the world. But over the past two years, there have been some changes.

In 2014, the newly appointed director of the CBS said in an interview (in Dutch) that he wanted his organisation to participate in public debates. Not to express opinions, he assured, but to correct «inaccurate representations». Asked for an example, he referred to the Piketty debate. He felt that data about inequality had been used to provoke a response of «emotional aversion».

In early 2015, the CBS developed a strategic agenda. Some elements of this agenda were about its core business. For example, the CBS wants to automate in order to become less dependent on spreadsheets and manual data processing - which seems to make sense. But the emphasis was on becoming a «news organisation» with a «prime time focus».

Today, Rutger Bregman of De Correspondent has published an analysis (in Dutch) of the new course of the CBS. The organisation plans to stop collecting data on a wide range of topics, including private debts to car dealers and credit card firms, and patients’ satisfaction with health care. Meanwhile, it has invested in a «newsroom».

Bregman discusses a number of instances where the CBS took a position in charged political debates on topics like inequality and the effects of child care cuts. He argues that its role in those debates was dubious. For example, the CBS said that participants in support programmes for job seekers are more likely to find a job than non-participants, without pointing out that this says nothing about the effectiveness of these programmes. Of course, the broader issue is that the CBS gets caught up in controversies, which may undermine public confidence in its data.

Public funding of the CBS has been cut. Income from external clients has risen from 5% to 15% and is expected to reach 25% by 2019, according to a chart in Bregman’s article. The government has sent a proposal to Parliament to dismantle the independent body that determines the research programme of the CBS (an amendment to preserve the independence of the CBS will be put to a vote on Tuesday). Bregman concludes:

[…] data is easily misused. A statistics office that wants to offer more interpretation, wants to make the headlines more often, wants to earn more money and has less oversight, runs more of a risk to do so, no matter how you look at it. The CBS has become world-class precisely by resisting this temptation.

Bregman says he sent his article to the CBS last week, but apparently they declined to comment. Today, their chief economist responded on Twitter to one of the controversies discussed by Bregman. According to one of their researchers, Bregman’s article has already created quite a stir within the CBS.


Power and buzz: Analysing trade union HQ locations by closeness to power and by convenience store score

When Hans Spekman ran for chairman of the Dutch Social-Democrat party in 2011, he said he wanted to move the party’s headquarters from the posh office at the Herengracht in Amsterdam to a «normal district, a neighbourhood where things happen, like Bos en Lommer». Bos en Lommer is a multicultural neighbourhood in the west of the city, in transition from deprived to gentrified.

I agree with Spekman (at least on this matter) and I think his ideas about locations should also apply to trade union headquarters. Out of curiosity I decided to analyse the headquarters locations of European trade unions, using two criteria. First: closeness to power, operationalised pragmatically as the walking distance from the union office to the national parliament. And second: the liveliness of the neighbourhood. To measure this I propose the convenience store score, which assumes that the number of convenience stores within half a kilometer gives a rough indication of how lively a neighbourhood is. Examples of convenience stores are 7-Eleven and AH to go stores; some ethnic shops will also be classified as convenience stores.

The chart below shows the scores for each union. You can also see the locations of union offices, parliaments and convenience stores on an interactive map, but note that the map may take a while to load - it’s not very suitable for viewing on a smartphone.

The median union headquarters is within 2km walking distance from parliament. For about three-quarters of unions, the distance is below 5km. The general pattern thus seems to be that unions have their national offices close to the institutions of political power. There are exceptions, though. Officials of the major Dutch federations FNV and CNV would have to walk 15 to 68km to reach parliament. And sometimes the distance is even longer: a Basque union has its HQ in Bilbao, a Turkish union in Istanbul, and the Polish union Solidarność near the port of Gdansk, where it originated. But all in all, the large Dutch unions are quite exceptional in that they don’t have their headquarters near the centre of political power.

As for liveliness: the median number of convenience stores within half a kilometer from union headquarters is 2, but about one in three unions have no convenience stores nearby at all. Some of the most lively union office locations are in countries like Romania, Hungary and Bulgaria. Other examples are CFDT (France), TUC (UK), SAK (Finland) and UGT (Spain). Dutch unions are at the other end of the spectrum and have rather dull headquarters locations - judging by the convenience store score.

So where should a union be? I’d say that influencing the government is one of the tasks unions should be doing, and an important one at that. However, this doesn’t depend on having a headquarters close to parliament, but rather on the ability to mobilise workers. I’d argue that the convenience store score is a far better criterion to judge headquarters locations by.

In case you were wondering: Spekman was successful in his bid for the chairmanship of the Social-Democrat party. The party’s headquarters is still at the Herengracht, though: it turned out the lease doesn’t expire until 2018.

Full disclosure: I work at the FNV, at the former FNV Bondgenoten location.


This analysis turned out to be quite a bit more challenging than I initially thought, but it was very instructive. I’m especially happy that I now have a basic understanding of the Overpass API that you can use to retrieve Open Street Map data. OSM has always been a bit of a black box to me but the Overpass API turns out to be a valuable tool.

Measuring neighbourhood characteristics

Initially I wanted to use Eurostat regional stats to analyse neighbourhood characteristics, but Eurostat doesn’t have data beyond the NUTS 3 level (I should’ve known). Level 3 areas may comprise entire cities and are useless for analysing neighbourhoods, so I had to look for alternatives.

Subsequently, I tried getting the name of the smallest area a location is in using the Mapit tool (based on Open Street Map). I thought I might then be able to construct a Wikipedia url by appending the name to the Wikipedia base url. This turned out to work pretty well, not least because Wikipedia is quite good at handling different variants of geographical names. However, while Wikipedia articles tend to be informative, they do not contain a lot of uniform statistical information. Often population, area and population density will be included, but not much beyond that. In addition, the areas vary in size, which poses problems. For example, the population density of a small area cannot be meaningfully compared to the density of a large area. In the end I did add the Wikipedia links to the popups on the map, but I continued looking for other ways to analyse neighbourhood characteristics.
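A minimal Python sketch of that url construction. It assumes English Wikipedia (`https://en.wikipedia.org/wiki/`) as the base url, which is an assumption rather than the url actually used here, and `wikipedia_url` is a hypothetical helper; the Mapit lookup that supplies the area name is not shown.

```python
from urllib.parse import quote

# Assumption: English Wikipedia; the base url used in the analysis is not given.
WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"


def wikipedia_url(area_name: str) -> str:
    """Build a Wikipedia article url from an area name (hypothetical helper)."""
    # Wikipedia article titles use underscores instead of spaces
    title = area_name.strip().replace(" ", "_")
    # Percent-encode non-ASCII characters (e.g. 'Kraków') as UTF-8,
    # leaving characters that commonly appear in titles untouched
    return WIKIPEDIA_BASE + quote(title, safe="_(),'")


print(wikipedia_url("Bos en Lommer"))  # https://en.wikipedia.org/wiki/Bos_en_Lommer
```

Wikipedia redirects many spelling variants to the same article, which is why this naive approach works reasonably well.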

One of the measures I ended up using is closeness to power, operationalised as the walking distance to the national parliament (in countries with a bicameral parliament, I used the location of the lower house). This was a pragmatic choice. An alternative would have been to use the location of ministries, but then I’d have to come up with a way to pick the relevant ministry.

For measuring the liveliness of a neighbourhood, I used the number of convenience stores within half a kilometer, based on data from Open Street Map. Obviously there are some limitations to this method. For example, some countries will be mapped in more detail than others. Also, there will be inconsistencies in how shops are classified (cf. this discussion in Dutch about how to classify stores of chains like Blokker).

Admittedly, the convenience store score has not been properly validated. I’m not even sure whether objective measures of a neighbourhood’s liveliness exist. I checked this list of «coolest» neighbourhoods in Europe and all but one (Amsterdam Noord) have convenience stores nearby, but then again coolness isn’t the same as liveliness (I guess a neighbourhood can be uncool yet lively). Furthermore, being on a list of cool neighbourhoods isn’t necessarily an indicator of coolness.

Ideally, I think a proper assessment of the convenience store score should include a comparison with measurements of criteria derived from Jane Jacobs’s The Death and Life of Great American Cities: mixed primary uses, short blocks, buildings of various ages, and density. I guess it should be possible to measure some of these with OSM data (especially the first two). However, that would require a deeper understanding of OSM classifications than I currently have.

Getting the data

While some of the data was obtained by good old-fashioned googling, some of it could be automated.

The starting point for the analysis was the list of affiliates of the European Trade Union Confederation (ETUC). Note that this includes unions in non-EU countries such as Turkey. Also note that I use the word union but most are in fact union federations (the FNV is a bit more complicated; a recent merger has partly done away with the federation structure).

The ETUC doesn’t seem to have a list of addresses on their website. They do provide urls for most of their affiliates. Still, looking up addresses was a bit of an adventure, especially for countries which use non-Latin alphabets (let me know if you find any errors).

For walking distances I used the Bing API. In a number of cases Bing couldn’t find a walking route or the distance seemed wrong. In those cases I manually looked up the distance in Google Maps. Here’s a sample url for getting information from the Bing API (replace KEY with API key).
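For illustration, here is a sketch of such a request in Python against the Bing Maps REST Routes service (`Routes/Walking`, with waypoints `wp.0` and `wp.1`). The exact request used for this analysis isn't reproduced here, the helper names are hypothetical, and the endpoint may since have changed or been retired.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Bing Maps REST Routes endpoint for walking directions
# (availability may have changed since this analysis)
BING_WALKING_URL = "https://dev.virtualearth.net/REST/v1/Routes/Walking"


def walking_route_url(origin, destination, key):
    """Build a walking-route request; origin and destination are (lat, lon) tuples."""
    params = {
        "wp.0": f"{origin[0]},{origin[1]}",  # start waypoint, e.g. a union HQ
        "wp.1": f"{destination[0]},{destination[1]}",  # end waypoint, e.g. parliament
        "distanceUnit": "km",
        "key": key,  # replace with your own API key
    }
    return BING_WALKING_URL + "?" + urlencode(params)


def travel_distance_km(response):
    """Return the route distance from a parsed response, or None if there is no route."""
    try:
        return response["resourceSets"][0]["resources"][0]["travelDistance"]
    except (KeyError, IndexError, TypeError):
        return None


def walking_distance_km(origin, destination, key):
    """Query the API; urlopen raises HTTPError if the service returns an error status."""
    with urlopen(walking_route_url(origin, destination, key)) as resp:
        return travel_distance_km(json.load(resp))
```

Returning None when no route is found leaves room for the manual Google Maps fallback described above.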

I used the Overpass API (demo) of Open Street Map to get all nodes within 500m from the union HQs, which I used for counting the number of convenience stores. I also used the API for getting the coordinates of all convenience stores in all countries where the ETUC has affiliates. Here’s a sample url for getting all nodes within 500m of a location, and here for getting all convenience stores in a country.
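A sketch of the two kinds of Overpass queries in Python, assuming the public `overpass-api.de` interpreter endpoint and JSON output; the function names are hypothetical and these are not the exact sample urls referred to above. The first query fetches all nodes within 500m of a point, the second all nodes tagged `shop=convenience` inside a country selected by its ISO 3166-1 code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the main public Overpass instance
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def nodes_around_query(lat, lon, radius_m=500):
    """Overpass QL: all nodes within radius_m metres of (lat, lon)."""
    return (
        "[out:json][timeout:60];\n"
        f"node(around:{radius_m},{lat},{lon});\n"
        "out body;"
    )


def country_convenience_query(iso_code):
    """Overpass QL: all convenience store nodes within a country's boundary."""
    return (
        "[out:json][timeout:300];\n"
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        'node["shop"="convenience"](area.country);\n'
        "out body;"
    )


def run_overpass(query):
    """POST the query as form data and return the parsed JSON result."""
    data = urlencode({"data": query}).encode()
    with urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)


def count_convenience_stores(result):
    """Count returned elements tagged shop=convenience."""
    return sum(
        1
        for element in result.get("elements", [])
        if element.get("tags", {}).get("shop") == "convenience"
    )
```

Both queries only return nodes: a shop mapped as a building outline (a way) would be missed, which is one more source of inconsistency in the score.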

A few unions are missing in the final results because of missing data. For example, I couldn’t figure out what the main office of the Belgian ACV is and I couldn’t find the exact location of the parliament of Malta (somewhere along Republic Street, Valletta).

Calculating scores

I calculated the scores from either the walking distance to parliament in kilometers or the number of nearby convenience stores. In both cases I took log10(value + 1). To arrive at a 0 to 10 scale, I multiplied by 10 and divided by the maximum score for each variable. For the closeness to power measure I then subtracted the score from 10, so that a higher score means closer to power.
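The scoring steps can be sketched in Python as follows. This is a reconstruction from the description, not the original code; the function names are hypothetical, and the guard for an all-zero variable is an addition.

```python
import math


def scaled_log_scores(values):
    """log10(value + 1), rescaled so that the highest value maps to 10."""
    logs = [math.log10(v + 1) for v in values]
    top = max(logs)
    if top == 0:  # all values are zero; avoid dividing by zero
        return [0.0 for _ in logs]
    return [10 * s / top for s in logs]


def closeness_to_power_scores(distances_km):
    """Reverse the scale so that a shorter walk to parliament scores higher."""
    return [10 - s for s in scaled_log_scores(distances_km)]


def convenience_store_scores(store_counts):
    """Higher counts of nearby convenience stores give higher scores."""
    return scaled_log_scores(store_counts)
```

With store counts of 0, 9 and 99, for instance, the log values are 0, 1 and 2, which rescale to 0, 5 and 10. Taking the log keeps a few extreme values (a 68km walk, a street full of shops) from compressing everyone else into the bottom of the scale.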


I used Leaflet and D3.js to map the locations of HQs, parliaments and convenience stores. There are over 60,000 convenience stores in the dataset. This turned out to be a bit too much and the browser all but crashed. I found this script that deals with exactly this problem. While I managed to figure out what I needed to change to make the script work with my data, I’m afraid I don’t fully understand how it works. It’s still too slow for mobile, though.

The political effects of financial crises

In a fascinating study, Manuel Funke, Moritz Schularick and Christoph Trebesch analysed the social and political aftermath of 103 financial crises. During the five years following a financial crisis, the following pattern can be expected:

  • The vote share of far right parties increases by 30%. No such effect was found for far left parties. «After a crisis, voters seem to be particularly attracted to the political rhetoric of the extreme right, which often attributes blame to minorities or foreigners».
  • The fragmentation of politics increases and the vote share of coalition parties diminishes.
  • There is more frequent government instability and a higher probability of executive turnover.
  • The average number of anti-government protests almost triples; the number of violent riots doubles (but this effect is lacking in the post-WW2 period) and general strikes increase by at least one-third.

Sounds familiar. Interestingly, the researchers have also looked into long-term effects:

The graphs demonstrate that the political effects are temporary and diminish over time. 10 years after the crisis, almost all variables are back to their pre-crisis levels. The top panel shows that the increase in far-right votes is no longer significantly different from zero after year 8.

The authors ascribe the rise of the Dutch Party for Freedom (5.9% in 2006, 15.5% in 2010) to the crisis of 2008, so the historical pattern suggests their popularity will diminish by 2016.

Or will it? The graph the authors refer to helps to clarify this matter. There’s no evidence that the popularity of far right parties diminishes in the longer term. What the authors are actually describing is that the confidence interval (the grey area) widens, so much so that you can’t really predict on the basis of the available data what will happen after eight years.
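To illustrate the distinction, a small Python sketch with made-up numbers (not taken from the study): the same point estimate can be significantly different from zero with a narrow interval and not significant with a wide one. It uses a normal-approximation 95% interval, which may not match how the authors computed theirs.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval around a point estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate - z * std_error, estimate + z * std_error


def differs_from_zero(estimate, std_error, level=0.95):
    """True if the interval excludes zero, i.e. the effect is statistically significant."""
    low, high = confidence_interval(estimate, std_error, level)
    return low > 0 or high < 0


# Hypothetical numbers: the same estimate with a small and a large standard error
print(differs_from_zero(30, 10))  # True: interval roughly (10.4, 49.6)
print(differs_from_zero(30, 20))  # False: interval roughly (-9.2, 69.2)
```

In both cases the estimated effect is 30; only the uncertainty differs. «Not significantly different from zero» therefore doesn't mean the effect has disappeared.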

Another matter is the interpretation of the effects. Funke et al. consider the political instability following financial crises a «political disaster»:

These developments likely hinder crisis resolution and contribute to political gridlock. The resulting policy uncertainty may contribute to the much debated slow economic recoveries from financial crises.

They seem to imply that governments tend to take appropriate measures and that therefore, having a strong government is good for economic recovery. That’s debatable. People like Paul Krugman and Ewald Engelen argue that the austerity policies of especially European governments have a negative impact on economic recovery.

This is relevant, for previous research found that the same social upheaval Funke et al. associate with financial crises can also be explained as an effect of austerity policies. This raises the question of how causality works here: is social (and political) unrest caused by financial crises, or by the way in which governments respond to these crises? Perhaps the stubborn austerity policies of the European and Dutch governments have contributed to the continuing popularity of the Party for Freedom?

Funke et al. describe their research here; Statewatch has put the original article (pdf) online (I discovered the study via an article by Krugman). The earlier study on austerity and protests was done by Jacopo Ponticelli and Hans-Joachim Voth (I wrote a post on it a couple of years ago).


Collecting data on millions of Facebook users to analyse their psychological traits

The Guardian has revealed how British academics have collected information about millions of Facebook users and used the data to score them on openness, conscientiousness, extraversion, agreeableness and neuroticism. The academics were paid by funders of the campaign of US presidential candidate Ted «Carpet Bomb» Cruz.

The fact that information from public Facebook profiles can be used to create psychological profiles is intriguing but not really new. Researchers have claimed they can assess someone’s personality reasonably well by analysing what they like on Facebook or by analysing personal information, activities and preferences, language features and internal Facebook statistics.

What was new to me (but apparently not to everyone) is how the academics connected to the Cruz campaign went about collecting people’s Facebook data. They used Amazon’s Mechanical Turk platform to recruit people to fill out a questionnaire that would give the researchers access to that person’s Facebook profile. Not only would they download data about the participants themselves, but also about their Facebook friends - even though those friends were unaware of this and hadn’t given permission. Participants were paid about $1 each for access to their Facebook network.

According to the Guardian, Facebook users had on average 340 friends in 2014. Of course, there’s considerable overlap between people’s networks, so it can be assumed that the average participant would yield far fewer than 340 new profiles. Even so, this would seem to be a pretty efficient - if sneaky - way to collect data on Facebook users.
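A toy Python sketch of why overlap matters: with hypothetical participants and friend lists (invented for illustration, not from the Guardian's reporting), counting distinct profiles gives a much lower number than adding up the friend lists.

```python
def new_profiles(friend_lists):
    """Distinct profiles reached via participants' friend lists, excluding the participants."""
    participants = set(friend_lists)
    reached = set().union(*friend_lists.values())
    return reached - participants


# Hypothetical toy network: three participants with overlapping friends
friend_lists = {
    "ann": {"xavi", "yara", "zoe"},
    "bob": {"yara", "zoe", "wim"},
    "cem": {"ann", "zoe"},
}
links = sum(len(friends) for friends in friend_lists.values())
print(links, len(new_profiles(friend_lists)))  # 8 4
```

Eight friend links yield only four new profiles here, because friends are shared and one friend is a participant herself.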

The Guardian doesn’t discuss whether this method would still work today, but I doubt it. Out of concern for the privacy of its users (sure!) Facebook cut off access to users’ friends’ data when it updated its API earlier this year.