champagne anarchist | armchair activist

Rabid feminists, fans and rightwingers

The Oxford Dictionary (the default dictionary on Mac OS X) has been accused of sexism in the examples it provides to illustrate how words are used. The debate focused on its definition of rabid: 1. having or proceeding from an extreme or fanatical support of or belief in something: a rabid feminist. 2. (of an animal) affected with rabies: her mother was bitten by a rabid dog. Why this example? Why portray feminists as rabid?

Apparently, the Oxford Dictionary first ridiculed the critique, but later issued a statement:

We apologise for the offence that these comments caused. The example sentences we use are taken from a huge variety of different sources and do not represent the views or opinions of Oxford University Press. That said, we are now reviewing the example sentence for «rabid» to ensure that it reflects current usage.

«In other words, it’s not the dictionary that’s sexist, it’s the English-speaking world», David Shariatmadari commented in the Guardian. He adds a warning that the review the dictionary plans to do may well find that rabid in fact does occur more often in combination with feminist than with other words (especially if online discussions are included). Even so, the dictionary cannot simply hide behind a word count - they’re still responsible for the editorial choices they make.

And how about the Guardian itself? The table below lists the words that appear most frequently after rabid in Guardian articles since 1999. The words have been stemmed so as to lump together terms like racist and racists.

term count
dog 136
anti 86
fan 63
right 33
support 21
nationalist 18
anim 15
press 14
rightwing 14
tori 13
republican 13
fanbas 13
rightw 12
puppi 11
follow 10
critic 9
antisemit 9
nation 9
racist 9
bat 9
feminist 9
crowd 9
home 9

The term anti deserves a separate analysis. The table below lists the most frequent words matching the pattern rabid anti[\s|\-]([a-z]+), again reduced to their stem.

term count
semit 11
european 10
communist 9
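
For illustration, here is a minimal sketch of how a pattern like the one above can be applied. It is not the code used for the tables: the sample sentences are made up, the function name is hypothetical, and stemming is left out. Note that inside a character class the `|` is a literal pipe, so `[\s|\-]` matches whitespace, a pipe or a hyphen.

```python
import re
from collections import Counter

# The pattern quoted in the post. Inside a character class the "|" is a
# literal pipe, so [\s|\-] matches whitespace, "|" or "-".
PATTERN = re.compile(r"rabid anti[\s|\-]([a-z]+)")


def count_anti_terms(texts):
    """Count the words captured after 'rabid anti' across a list of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase first, because the pattern only matches a-z.
        counts.update(PATTERN.findall(text.lower()))
    return counts


# Made-up sentences for illustration, not Guardian data.
sample = [
    "He was called a rabid anti-communist by his opponents.",
    "A rabid anti European campaign followed.",
    "Rabid anti-communist rhetoric was common.",
]
counts = count_anti_terms(sample)
print(counts.most_common())
```

In a real run, the captured words would still have to be stemmed before counting, as was done for the table.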

Terms like dog, anim[al] and bat obviously have to do with the second meaning of the term rabid (affected with rabies). Other than that, it’s clear that rabid is far more often used in combination with fan or rightwing than feminist. At least so in the Guardian.


I simply adapted the code I wrote earlier to analyse use of the term illegal in the Guardian and the New York Times.


Minister Jeroen Dijsselbloem takes up data visualisation challenge

Every year, Dutch Finance Minister Jeroen Dijsselbloem sends a report to Parliament on state participations - companies that are (partially) owned by the state. Recently, the minister answered questions from the Finance Committee of the Lower House. One of them questioned the use of a stacked bar chart to show dividends, «since this isn’t very clear». The minister acknowledges the problem and takes up the challenge:

In creating this bar chart we aimed at comprehensiveness by including all dividends received from all state participations. Because of the large differences in dividend, this results in sub-optimal readability. For the 2015 annual report, it will be considered whether the readability can be improved without making concessions to comprehensiveness.

I’m sure he’ll be interested in good ideas, so if you have any suggestions for improving the chart, tweet them to @j_dijsselbloem. And if you want to give it a try yourself: here’s the data for 2010–2014.


Solid reputation of Statistics Netherlands (CBS) ‘at risk’

Statistics Netherlands (CBS), the Dutch national statistics office, has always had a solid, if somewhat dull, reputation. The organisation published data, but didn’t do projections and was reluctant to offer interpretations. Meanwhile, it was considered to be among the best statistics offices in the world. But over the past two years, there have been some changes.

In 2014, the newly appointed director of the CBS said in an interview (in Dutch) that he wanted his organisation to participate in public debates. Not to express opinions, he assured, but to correct «inaccurate representations». Asked for an example, he referred to the Piketty debate. He felt that data about inequality had been used to provoke a response of «emotional aversion».

In early 2015, the CBS developed a strategic agenda. Some elements of this agenda were about its core business. For example, the CBS wants to automate in order to become less dependent on spreadsheets and manual data processing - which seems to make sense. But the emphasis was on becoming a «news organisation» with a «prime time focus».

Today, Rutger Bregman of De Correspondent has published an analysis (in Dutch) of the new course of the CBS. The organisation plans to stop collecting data on a wide range of topics, including private debts to car dealers and credit card firms, and patients’ satisfaction with health care. Meanwhile, it has invested in a «newsroom».

Bregman discusses a number of instances where the CBS took a position in charged political debates on topics like inequality and the effects of child care cuts. He argues that its role in those debates was dubious. For example, the CBS said that participants in support programmes for job seekers are more likely to find a job than non-participants, without pointing out that this says nothing about the effectiveness of these programmes. Of course, the broader issue is that the CBS gets caught up in controversies, which may undermine public confidence in its data.

Public funding of the CBS has been cut. Income from external clients has risen from 5% to 15% and is expected to reach 25% by 2019, according to a chart in Bregman’s article. The government has sent a proposal to Parliament to dismantle the independent body that determines the research programme of the CBS (an amendment to preserve the independence of the CBS will be put to a vote on Tuesday). Bregman concludes:

[…] data is easily misused. A statistics office that wants to offer more interpretation, wants to make the headlines more often, wants to earn more money and has less oversight, runs more of a risk to do so, no matter how you look at it. The CBS has become world-class precisely by resisting this temptation.

Bregman indicates that he sent his article to the CBS last week, but apparently they declined to comment. Today, their chief economist has responded on Twitter to one of the controversies discussed by Bregman. According to one of their researchers, Bregman’s article has created quite a stir within the CBS already.


Power and buzz: Analysing trade union HQ locations by closeness to power and by convenience store score

When Hans Spekman ran for chairman of the Dutch Social-Democrat party in 2011, he said he wanted to move the party’s headquarters from the posh office at the Herengracht in Amsterdam to a «normal district, a neighbourhood where things happen, like Bos en Lommer». Bos en Lommer is a multicultural neighbourhood in the west of the city, in transition from deprived to gentrified.

I agree with Spekman (at least on this matter) and I think his ideas about locations should also apply to trade union headquarters. Out of curiosity I decided to analyse the headquarters locations of European trade unions, using two criteria. First: closeness to power, operationalised pragmatically as the walking distance from the union office to the national parliament. And second: the liveliness of the neighbourhood. For measuring this I propose the convenience store score, which assumes that the number of convenience stores within half a kilometer gives a rough indication of how lively a neighbourhood is. Convenience stores could be for example 7-Eleven or AH to go stores and some ethnic shops will also be classified as convenience stores.

The chart below shows the scores for each union. You can also see the locations of union offices, parliaments and convenience stores on an interactive map, but note that the map may take a while to load - it’s not very suitable for viewing on a smartphone.

The median union headquarters is within 2km walking distance from parliament. For about three-quarters of unions, the distance is below 5km. The general pattern thus seems to be that unions have their national offices close to the institutions of political power. There are exceptions though. Officials of the major Dutch federations FNV and CNV would have to walk 15 to 68km to reach parliament. And sometimes the distance is even longer: a Basque union has its HQ in Bilbao, a Turkish union in Istanbul, and the Polish union Solidarność has its HQ near the port of Gdansk, where it originated. But all in all, the large Dutch unions are quite exceptional in that they don’t have their headquarters near the centre of political power.

As for liveliness: the median number of convenience stores within half a kilometer from union headquarters is 2, but about one in three unions have no convenience stores nearby at all. Some of the most lively union office locations are in countries like Romania, Hungary and Bulgaria. Other examples are CFDT (France), TUC (UK), SAK (Finland) and UGT (Spain). Dutch unions are at the other end of the spectrum and have rather dull headquarters locations - judging by the convenience store score.

So where should a union be? I’d say that influencing the government is one of the tasks unions should be doing, and an important one at that. However, this doesn’t depend on having a headquarters close to parliament, but rather on the ability to mobilise workers. I’d argue that the convenience store score is a far better criterion to judge headquarters locations by.

In case you were wondering: Spekman was successful in his bid for the chairmanship of the Social-Democrat party. The party’s headquarters is still at the Herengracht, though: it turned out the lease doesn’t expire until 2018.

Full disclosure: I work at the FNV, at the former FNV Bondgenoten location.


This analysis turned out to be quite a bit more challenging than I initially thought, but it was very instructive. I’m especially happy that I now have a basic understanding of the Overpass API that you can use to retrieve Open Street Map data. OSM has always been a bit of a black box to me but the Overpass API turns out to be a valuable tool.

Measuring neighbourhood characteristics

Initially I wanted to use Eurostat regional stats to analyse neighbourhood characteristics, but Eurostat doesn’t have data beyond the NUTS 3 level (I should’ve known). Level 3 areas may comprise entire cities and are useless for analysing neighbourhoods, so I had to look for alternatives.

Subsequently, I tried getting the name of the smallest area a location is in using the Mapit tool (based on Open Street Map). I thought I might then be able to construct a Wikipedia url by adding the name to the base url of Wikipedia articles. This turned out to work pretty well, not least because Wikipedia is quite good at handling different variants of geographical names. However, while Wikipedia articles tend to be informative, they do not contain a lot of uniform statistical information. Often population, area and population density will be included, but not much beyond that. In addition, the fact that the size of the areas varies poses problems. For example, the population density of a small area cannot be meaningfully compared to the density of a large area. In the end I did add the Wikipedia links to the popups on the map, but I continued looking for other ways to analyse neighbourhood characteristics.

One of the measures I ended up using is closeness to power, operationalised as the walking distance to the national parliament (in countries with a bicameral parliament, I used the location of the lower house). This was a pragmatic choice. An alternative would have been to use the location of ministries, but then I’d have to come up with a way to pick the relevant ministry.

For measuring the liveliness of a neighbourhood, I used the number of convenience stores within half a kilometer, using data from Open Street Map. Obviously there are some limitations to this method. For example, some countries will be mapped in more detail than others. Also, there will be inconsistencies in how shops are classified (cf this discussion in Dutch about how to classify stores of chains like Blokker).

Obviously, the convenience store score has not been properly validated. I’m not even sure whether objective measures of a neighbourhood’s liveliness exist. I checked this list of «coolest» neighbourhoods in Europe and all but one (Amsterdam Noord) have convenience stores nearby, but then again coolness isn’t the same as liveliness (I guess a neighbourhood can be uncool yet lively). Furthermore, being on a list of cool neighbourhoods isn’t necessarily an indicator of coolness.

Ideally I think a proper assessment of the convenience store score should include a comparison with measurements of criteria derived from Jane Jacobs’s The Death and Life of Great American Cities: mixed primary uses, short blocks, buildings of various ages and density. I guess it should be possible to measure some of these with OSM data (especially the first two). However, that would require a deeper understanding of OSM classifications than I currently have.

Getting the data

While some of the data was obtained by good old-fashioned googling, some of it could be automated.

The starting point for the analysis was the list of affiliates of the European Trade Union Confederation (ETUC). Note that this includes unions in non-EU countries such as Turkey. Also note that I use the word union but most are in fact union federations (the FNV is a bit more complicated; a recent merger has partly done away with the federation structure).

The ETUC doesn’t seem to have a list of addresses on their website. They do provide urls for most of their affiliates. Still, looking up addresses was a bit of an adventure, especially for countries which use non-Latin alphabets (let me know if you find any errors).

For walking distances I used the Bing API. In a number of cases Bing couldn’t find a walking route or the distance seemed wrong. In those cases I manually looked up the distance in Google Maps. Here’s a sample url for getting information from the Bing API (replace KEY with API key).

I used the Overpass API (demo) of Open Street Map to get all nodes within 500m from the union HQs, which I used for counting the number of convenience stores. I also used the API for getting the coordinates of all convenience stores in all countries where the ETUC has affiliates. Here’s a sample url for getting all nodes within 500m of a location, and here for getting all convenience stores in a country.
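
As a rough sketch of what such a request and the subsequent count can look like: the snippet below builds an Overpass QL query for all nodes around a point and counts the nodes tagged shop=convenience in a JSON response. The endpoint, the function names and the hand-written response are assumptions for illustration; this is not the original script and it makes no network call.

```python
import json
from urllib.parse import urlencode

# Assumed public Overpass endpoint; other instances exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def nodes_around_query(lat, lon, radius_m=500):
    """Overpass QL for all nodes within radius_m metres of a point, as JSON."""
    return f"[out:json];node(around:{radius_m},{lat},{lon});out;"


def request_url(query):
    """URL that passes the query to the API in the 'data' parameter."""
    return OVERPASS_URL + "?" + urlencode({"data": query})


def count_convenience_stores(response_text):
    """Count nodes tagged shop=convenience in an Overpass JSON response."""
    elements = json.loads(response_text).get("elements", [])
    return sum(
        1 for el in elements
        if el.get("tags", {}).get("shop") == "convenience"
    )


# Hand-written response in the shape Overpass returns; not real data.
fake_response = json.dumps({"elements": [
    {"type": "node", "id": 1, "lat": 52.0, "lon": 4.0,
     "tags": {"shop": "convenience"}},
    {"type": "node", "id": 2, "lat": 52.0, "lon": 4.0,
     "tags": {"shop": "bakery"}},
    {"type": "node", "id": 3, "lat": 52.0, "lon": 4.0},
]})

query = nodes_around_query(52.0, 4.0)
url = request_url(query)
n_stores = count_convenience_stores(fake_response)
print(query, n_stores)
```

Filtering on the tag after downloading all nodes mirrors the description above; the tag filter could presumably also be put in the query itself to keep the response small.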

A few unions are missing in the final results because of missing data. For example, I couldn’t figure out what the main office of the Belgian ACV is and I couldn’t find the exact location of the parliament of Malta (somewhere along Republic Street, Valletta).

Calculating scores

I calculated scores as either walking distance to parliament in kilometers or the number of nearby convenience stores. In both cases I took the log10 of the value + 1. To arrive at a 0 to 10 scale, I multiplied by 10 and divided by the maximum score for each variable. For the distance to power measure I converted the score to 10 minus the score, so that a higher score means closer to power.
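
The calculation described above can be sketched as follows. The input values are invented and the function name is hypothetical; only the steps (log10 of value + 1, rescaling to 0 to 10, inverting the distance score) come from the description.

```python
import math


def scale_scores(values, invert=False):
    """Log-transform raw values and rescale so the largest maps to 10."""
    logged = [math.log10(v + 1) for v in values]
    top = max(logged)
    if top == 0:
        # All raw values are 0; avoid dividing by zero.
        scores = [0.0 for _ in logged]
    else:
        scores = [10 * x / top for x in logged]
    if invert:
        # For distance: a higher score should mean closer to power.
        scores = [10 - s for s in scores]
    return scores


# Invented example values, not the actual union data.
distances_km = [0, 9, 99]
stores = [0, 2, 9]

closeness = scale_scores(distances_km, invert=True)
liveliness = scale_scores(stores)
print(closeness, liveliness)
```

The log transformation compresses the large differences between unions: a distance of 99km ends up only twice as far from the maximum score as a distance of 9km.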


I used Leaflet and D3.js to map the locations of HQs, parliaments and convenience stores. There are over 60,000 convenience stores in the dataset. This turned out to be a bit too much and the browser all but crashed. I found this script that deals with exactly this problem. While I managed to figure out what I needed to change to make the script work with my data, I’m afraid I don’t fully understand how it works. It’s still too slow for mobile, though.

The political effects of financial crises

In a fascinating study, Manuel Funke, Moritz Schularick and Christoph Trebesch analysed the social and political aftermath of 103 financial crises. During the five years following a financial crisis, the following pattern can be expected:

  • The vote share of far right parties increases by 30%. For far left parties, such an effect was not found. «After a crisis, voters seem to be particularly attracted to the political rhetoric of the extreme right, which often attributes blame to minorities or foreigners».
  • The fragmentation of politics increases and the vote share of coalition parties diminishes.
  • There is more frequent government instability and a higher probability of executive turnover.
  • The average number of anti-government protests almost triples; the number of violent riots doubles (but this effect is lacking in the post-WW2 period) and general strikes increase by at least one-third.

Sounds familiar. Interestingly, the researchers have also looked into long-term effects:

The graphs demonstrate that the political effects are temporary and diminish over time. 10 years after the crisis, almost all variables are back to their pre-crisis levels. The top panel shows that the increase in far-right votes is no longer significantly different from zero after year 8.

The authors ascribe the rise of the Dutch Party for Freedom (5.9% in 2006, 15.5% in 2010) to the crisis of 2008, so the historical pattern suggests its popularity will diminish by 2016.

Or does it? The graph the authors refer to helps to clarify this matter. There’s no evidence that the popularity of far right parties diminishes in the longer term. What they’re describing is that the confidence interval (the grey area) widens. So much so that you can’t really predict on the basis of the available data what will happen after eight years.

Another matter is the interpretation of the effects. Funke et al. consider the political instability following financial crises a «political disaster»:

These developments likely hinder crisis resolution and contribute to political gridlock. The resulting policy uncertainty may contribute to the much debated slow economic recoveries from financial crises.

They seem to imply that governments tend to take appropriate measures and that therefore, having a strong government is good for economic recovery. That’s debatable. People like Paul Krugman and Ewald Engelen argue that the austerity policies of especially European governments have a negative impact on economic recovery.

This is relevant, for previous research found that the same social upheaval Funke et al. associate with financial crises can also be explained as an effect of austerity policies. This raises the question how causality works here: is social (and political) unrest caused by financial crises, or by the way in which governments respond to these crises? Perhaps the stubborn austerity policies of the European and Dutch governments have contributed to the continuing popularity of the Party for Freedom?

Funke et al. describe their research here; Statewatch has put the original article (pdf) online (I discovered the study via an article by Krugman). The earlier study on austerity and protests was done by Jacopo Ponticelli and Hans-Joachim Voth (I wrote a post on it a couple of years ago).


Collecting data on millions of Facebook users to analyse their psychological traits

The Guardian has revealed how British academics have collected information about millions of Facebook users and used the data to score them on openness, conscientiousness, extraversion, agreeableness and neuroticism. The academics were paid by funders of the campaign of US presidential candidate Ted «Carpet Bomb» Cruz.

The fact that information from public Facebook profiles can be used to create psychological profiles is intriguing but not really new. Researchers have claimed they can assess someone’s personality reasonably well by analysing what they like on Facebook or by analysing personal information, activities and preferences, language features and internal Facebook statistics.

What was new to me (but apparently not to everyone) is how the academics connected to the Cruz campaign went about collecting people’s Facebook data. They used Amazon’s Mechanical Turk platform to recruit people to fill out a questionnaire that would give the researchers access to that person’s Facebook profile. Not only would they download data about the participants themselves, but also about their Facebook friends - even though those friends were unaware of this and hadn’t given permission. Participants were paid about $1 each for access to their Facebook network.

According to the Guardian, Facebook users had on average 340 friends in 2014. Of course, there’s considerable overlap between people’s networks so it can be assumed that the average participant would yield far fewer than 340 new profiles. Even so, this would seem to be a pretty efficient - if sneaky - way to collect data on Facebook users.

The Guardian doesn’t discuss whether this method would still work today, but I doubt it would. Out of concern for the privacy of its users (sure!) Facebook cut off access to users’ friends’ data when it updated its API earlier this year.


Interactive charts - are Dygraphs or Plotly alternatives for D3?

There are quite a few Javascript libraries that you can use to create interactive graphs (with the added bonus that your graphs look crisp: somehow my PNG images always end up looking blurry). Some of these libraries are based on D3.js and are intended to make coding with D3 easier:

The sheer number of D3 based charting tools gives a good indication of how much people love the D3’s functionality, and yet actually hate coding with D3 directly.

Personally, I don’t hate coding with D3. Still, because I only use it sporadically, I’m constantly figuring out how things worked again (unsurprisingly, this involves a lot of code recycling and a lot of googling). So I wouldn’t be averse to an easier alternative for creating basic line graphs that don’t really require the advanced capabilities of D3.

The other day, I posted an article on how newspapers use the word illegal, which included two rather dense spaghetti graphs (I’m using the term in the non-technical sense of the word). I added dropdowns to link labels to lines. Labour relations scholar Maarten Hermans suggested I take a look at Dygraphs instead. A few days later, I read at Flowing Data that the Plotly.js library is being open sourced. So let’s give them a try (note that I haven’t checked how they do on mobile devices).


As far as I can tell, Dygraphs can only create line graphs. Below is a Dygraphs version of one of the spaghetti graphs from my article on the word illegal.

I have no problem with their somewhat dull colour scheme, although I’d probably change that. Note that you can click-and-drag to zoom in on a part of the chart and double-click to zoom out again.

This chart type isn’t really suitable for a graph with this many lines: when you hover your mouse over the graph, the labels obscure much of the graph. You could move the labels outside the graph area, but then you’d have to reserve quite a bit of space to accommodate them.

So I also did a version of a simpler graph, from an article in which I traced the use of the word machine for bicycle.

In this chart I made some modifications (line colour, position of the labels). I found that making these changes is relatively straightforward and well-documented. I’m not entirely happy yet with how the labels look, but other than that I think the graph looks pretty much OK.


Same approach with Plotly: here’s a version of the spaghetti graph from the article on the word illegal:

Plotly has similar click-and-drag / double-click functionality to zoom in and out as Dygraphs (it works slightly differently in that you can also zoom in vertically). I think the way they show the labels to the right of the chart is OK. If you click them, the line associated with the label disappears and reappears, so you can easily find the line associated with a label. It would be even better if clicking a label made the associated line toggle between grey and coloured.

I’m not happy with what happens when you hover your mouse over the graph. You get a menu to the top that offers functionality that strikes me as superfluous. And the labels that pop up are obviously a mess with a spaghetti graph like this one.

So here’s a Plotly version of the simpler bicycle graph:

That’s better, although I still haven’t figured out how to get the title to left-align (CSS doesn’t seem to work and here they only explain how to set font-family and size). More importantly, I still think there’s too much stuff popping up when you hover your mouse over the graph.


I can see myself using Dygraphs on occasion if I need a simple line graph. As for Plotly, I don’t like all the clutter in their charts and while I’m sure you can get rid of that, I’m afraid that would defeat the purpose of making it easier to create graphs.

Come to think of it, it seems that quite a few data visualisation libraries, and I’m not just thinking of Plotly here, haven’t really outgrown the «look at all the neat tricks I can do» phase. Until they move in the direction of Edward Tufte’s more minimalistic approach to data visualisation, I think I’d rather keep struggling with D3.

Immigrants, filesharing and wiretaps: How newspapers use the word illegal

People should mind their language: an apparently neutral term like immigration has gotten xenophobic overtones as a result of its frequent use in combination with illegal, James Gingel argued in the Guardian. As an illustration, he pointed out that illegal, when typed into a Google search box, will likely get autocompleted to illegal immigrant or illegal immigration.

Earlier, the Guardian had been criticised for using the term illegal immigrant, among other things because it’s dehumanising. David Marsh of the Guardian Style Guide agreed. (The Style Guide itself takes a rather technical position on the matter: «… there is no such thing as an illegal asylum seeker … An asylum seeker can become an illegal immigrant only if he or she remains in Britain after having failed to respond to a removal notice.»)

Personally, I’d be in favour of reappropriating the term illegal immigrant - but it’s not for me to tell other people what strategy to use.

So how does the Guardian use the word illegal? I counted the words that follow the word illegal in their articles. I ignored stop words and in most cases I used stemming to lump together words like download, downloads, and downloading (see Method below).

The chart shows that the term illegal is most often used in combination with immigrant and variants. Other than that, it appears that illegal filesharing is a 2009 thing and that illegal phone [hacking] was an issue in 2011. Unsurprisingly, the expression illegal war started being used in 2003. By the way, what’s the status of that trial?

There’s also a bit of a peak in mentions of illegal thing in 2000. This can be attributed to a series of interviews in which one of the standard questions was «What was the last illegal thing you did?» The answers are somewhat boring, with the exception of «Shot a man in Reno just to watch him die» (a reference to Johnny Cash, of course).

The Guardian’s search API is largely limited to articles that appeared after 1998. For a longer term perspective, let’s turn to the New York Times, which offers access to the lead paragraphs of articles dating back to its origin in 1851.

That’s weird: expressions with the term illegal seem to have been rare until the 1970s. Either that, or I made an error in my analysis of the NYT data. I checked their own Chronicle tool, which confirms that the term illegal wasn’t used very much before the 1970s.

Again, the term illegal is mainly used in combination with aliens (1980s) and immigrants (2000s), but such uses seem to have dropped in the 2010s. My guess would be that this has to do with the growing importance of the «Latino vote», which means that politicians can no longer evoke negative images of immigrants without risking electoral consequences.

Speaking of vote: the expression illegal vote is one of the rare uses of the term illegal in the early days of the New York Times. Illegal voting appears to have been a recurrent concern in 19th century New York, as illustrated by a report from 1888:

Notwithstanding the widespread reports to the contrary and the wholesale issue of warrants for the arrest of illegal voters yesterday’s election in King’s County passed off without unusual excitement.

Tracking the use of the expression illegal strike provides an interesting insight into American social history: wildcat teachers’ strikes in the 1960s, broader public sector strikes in the 1970s and Reagan’s brutal standoff with air traffic controllers in the 1980s. Despite the progressive reputation it enjoys today, the New York Times often sided with law and order, for example in this 1962 report:

It was not a day New York City could be proud of. Half of the city’s 40,000 public school teachers had chosen an outlaw course and stayed away from their classrooms in an illegal strike.

(If you’re wondering why public sector workers resorted to illegal strikes, read this article.)

The 1970s saw a modest peak in the use of the expression illegal wiretaps, often in connection with Watergate. In an article from 1974, the question was raised «whether President Nixon may have knowingly used claims of national security to cloak illegal wiretaps and other illegal surveillance». How modern.

So here’s my preliminary, non-scientific conclusion: newspapers appear to use the term illegal mainly to talk about immigrants, but when those in power really mess up, their actions will occasionally be called illegal too.


I used the search APIs of the Guardian and the New York Times to search for articles with the search term illegal. I counted the words following the term illegal, using the Python nltk library to exclude English stop words and to reduce words to their stem. A practical matter is that stemming will reduce both immigrant and immigration to immigr. Since some of the arguments against using the expression illegal immigrant do not apply to illegal immigration, it makes sense to differentiate between immigration and immigrant. Therefore, I separately counted occurrences of the expression illegal immigrant[s]. Here’s the code.
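
To give an idea of the counting step, here is a simplified sketch that runs without nltk. The short stop word list and the crude suffix stripper are stand-ins I made up for nltk’s stop words and stemmer, the sentences are invented, and the function names are hypothetical; it is not the code linked above.

```python
import re
from collections import Counter

# Tiny stand-in for nltk's English stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def crude_stem(word):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def count_following(texts, term="illegal"):
    """Count stemmed words directly following `term`, ignoring stop words.

    Occurrences of 'illegal immigrant[s]' are also counted separately,
    before stemming, so they can be told apart from 'illegal immigration'.
    """
    following = Counter()
    immigrant = 0
    pattern = re.compile(r"\b" + re.escape(term) + r"\s+([a-z]+)")
    for text in texts:
        for word in pattern.findall(text.lower()):
            if word in STOP_WORDS:
                continue
            if word in ("immigrant", "immigrants"):
                immigrant += 1
            following[crude_stem(word)] += 1
    return following, immigrant


# Invented sentences, not actual newspaper text.
sample = [
    "Illegal downloads are falling.",
    "He was accused of illegal downloading.",
    "The illegal immigrants were detained.",
    "Critics called it an illegal war.",
    "Illegal and unjust, they said.",
]
following, immigrant = count_following(sample)
print(following.most_common(), immigrant)
```

With a real stemmer, immigrant and immigration would both end up as immigr, which is why the separate counter is kept.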


Strava has new maps, created by Mapbox


Strava has new maps. They look good and they are well-designed. Highways, which dominate normal maps, have gotten a modest grey colour. Paths and pedestrian areas are highlighted in yellow. Parks and water bodies serve as orientation marks.

They have removed clutter so you can easily follow the road pattern. And they’ve taken care to make elevation patterns visible (too bad most hills in the Netherlands are too low to qualify for such markings).

The maps have been created by Mapbox, an alternative to Google Maps. Some of the material they use is from Open Street Map and they contribute to Leaflet, an open source Javascript library for creating online maps. When Strava initially switched from Google Maps to Mapbox last summer, they apparently got some complaints about the disappearance of Streetview. Good thing they decided to stick with Mapbox and further improve the maps.

Hopefully the new Strava maps will become available open source for use with Leaflet. Or for your Garmin.

Via @hapee.


Dates in Excel

Microsoft explains:

«Microsoft Excel is preprogrammed to make it easier to enter dates. For example, 12/2 changes to 2-Dec. This is very frustrating when you enter something that you don't want changed to a date ...»


«... Unfortunately there is no way to turn this off.»

Er.. WHY?